WEBVTT 00:00.000 --> 00:12.000 Hello, good afternoon to everybody here. Good morning or hopefully not good middle of the night to folks out on the internet. 00:12.000 --> 00:17.000 I am so obnoxiously excited to be here with you all today. 00:17.000 --> 00:25.000 I'm going to talk to you about multi-lingual speech technologies for a global world, which is a bit like saying ATM machine, global world. 00:25.000 --> 00:30.000 It's a world world, it's kind of the point. 00:30.000 --> 00:37.000 My name is Jess and I work for the Common Voice Project over at the Mozilla Foundation. 00:37.000 --> 00:44.000 I am absolutely needlessly intense about Birmingham being the best British city. 00:44.000 --> 00:53.000 For folks watching on the video, I'm the only one miked, but somebody just, like, signalled their agreement. 00:53.000 --> 00:59.000 I'm also really excited about communication and human languages. 00:59.000 --> 01:03.000 I'm really bad at learning languages and I have no self-control. 01:03.000 --> 01:10.000 Please, please do not talk to me after this and tell me anything interesting about your language. 01:10.000 --> 01:15.000 You will ruin my summer. Please, please, please don't. 01:15.000 --> 01:20.000 We're going to be talking about data and models and multi-lingual models. 01:20.000 --> 01:27.000 So the first thing I wanted to do was just go ahead and make the nod to AI hype. 01:27.000 --> 01:33.000 Are these models going to change the world? Are these part of Gen AI? Is this the most important thing? 01:33.000 --> 01:35.000 Are they going to replace programmers? 01:35.000 --> 01:42.000 Joyously, this is so far above my pay grade that I just get to sort of nod and bump along. 01:42.000 --> 01:52.000 In a non-editorial way though, I'd like to suggest that maybe we could talk about something practical and useful. 01:52.000 --> 01:58.000 I want to talk to you about multi-lingual speech technologies.
01:58.000 --> 02:05.000 And I want to talk to you about speech technologies because hopeful technology in 2025 feels a little bit rare. 02:05.000 --> 02:08.000 So I'd love to see if we can all be excited together. 02:08.000 --> 02:17.000 If any of you are eventually going to be building some of these, I'd love to kill any excuses you may have to not make them linguistically inclusive. 02:17.000 --> 02:26.000 But for those of you who are using these speech technologies, I'd love for you to be really, really aggressive out in the world about asking: why doesn't this understand me? 02:26.000 --> 02:31.000 And why doesn't this understand people who aren't like me? 02:31.000 --> 02:37.000 Speech technologies are so boring. They are joyously boring. 02:37.000 --> 02:43.000 Almost all of them have terrible diagrams, and this was my favorite one, which is quite good. 02:43.000 --> 02:47.000 A research team, Eiman Alsharhan and Allan Ramsay, 02:47.000 --> 02:54.000 did this really fantastic one, which I think walks us through how ASR, automatic speech recognition, 02:55.000 --> 03:04.000 or speech-to-text, works: turning speech waveforms into a type of input a computer can recognize and act on. 03:04.000 --> 03:13.000 So we've got a bucket, because that's how we like our data, just in a cylinder, of transcribed speech data. 03:13.000 --> 03:19.000 So that's going to be clips of people talking and then text associated with that. 03:19.000 --> 03:29.000 And that gets built into a model to say, hey, when you hear these waveforms, it's associated with these words, this orthography. 03:29.000 --> 03:38.000 And then we've got text data that also feeds into a model for what we think a speaker is likely to be saying, and a model for pronunciation. 03:38.000 --> 03:42.000 So in this dialect, in this language, how are things pronounced? 03:42.000 --> 03:48.000 When a speaker comes to a speech technology, the technology will extract out the features.
03:48.000 --> 03:52.000 Usually, we'll get a cool look at the shape of those words. 03:52.000 --> 04:00.000 And all three of these models are going to feed into a decoder, and the thing I love best is, human speech is messy. 04:00.000 --> 04:03.000 Human speech is wild, even in the best of times. 04:03.000 --> 04:09.000 And speech technologies, automatic speech recognition, are going to spit out hypothesized text: 04:09.000 --> 04:11.000 what we think that person said. 04:11.000 --> 04:14.000 Is anybody Scottish? 04:14.000 --> 04:16.000 Cool. 04:16.000 --> 04:20.000 This works better or worse in some languages. 04:20.000 --> 04:25.000 And this works better and worse for some accents and variants and dialects. 04:25.000 --> 04:27.000 I'm not picking on Scottish people. 04:27.000 --> 04:36.000 They just seem to have the hardest time with speech technologies, particularly voice assistants. 04:36.000 --> 04:40.000 When I give this talk at other types of mainstream conferences, I love to ask, 04:40.000 --> 04:45.000 like, hey, who uses those little things that live in your house? 04:45.000 --> 04:48.000 And totally don't listen to you when you don't use the keywords. 04:48.000 --> 04:54.000 Who uses speech assistants, voice assistants, or those, oh, oh, cool. 04:54.000 --> 05:00.000 I mean, I'm, yeah, me too. 05:00.000 --> 05:07.000 I would definitely do the, so speech technologies are boring, but they're also not just 05:07.000 --> 05:09.000 Siri or Alexa. 05:09.000 --> 05:13.000 They're really, really often things that are exciting. 05:13.000 --> 05:22.000 So they're not perfect, but real-time interpretation applications are magical-feeling. 05:22.000 --> 05:28.000 I've seen them used to gather and share livestock disease information for pastoral farmers. 05:28.000 --> 05:31.000 Used for remote medical services platforms. 05:31.000 --> 05:35.000 So the ability to say, hey, I'm bleeding really bad. 05:35.000 --> 05:36.000 What can I do?
05:36.000 --> 05:40.000 It can be quite good when you don't have both hands free at the moment. 05:40.000 --> 05:44.000 Used in assistive technologies, which I absolutely love. 05:44.000 --> 05:47.000 Powering the automatic transcriptions 05:47.000 --> 05:51.000 we're seeing more and more on videos, for whatever value that's worth. 05:51.000 --> 06:00.000 And really, at the end of the day, any time you talk to a machine, we're seeing ASR or speech-to-text at work. 06:00.000 --> 06:05.000 I'm hopeful because this is huge for lowering barriers. 06:05.000 --> 06:08.000 You don't need literacy skills. 06:08.000 --> 06:12.000 And all of us hanging out in a well-developed Western European city, 06:12.000 --> 06:15.000 we know that a lot of people can't read. 06:15.000 --> 06:18.000 This could be due to cognitive challenges. 06:18.000 --> 06:21.000 This could be coming from a background where there's less literacy. 06:21.000 --> 06:28.000 I kind of don't care why. What I do care about is that people deserve access to this technology. 06:28.000 --> 06:33.000 And if we can remove literacy as a barrier and let people interact through speech, 06:33.000 --> 06:37.000 at least to me, that's so dazzlingly exciting. 06:37.000 --> 06:42.000 You don't need to use your hands either, because you don't have use of them now, 06:42.000 --> 06:45.000 or you're doing something else with them. 06:45.000 --> 06:47.000 You can use them. 06:47.000 --> 06:52.000 You can use speech technologies while probably not driving your car, please. 06:52.000 --> 06:53.000 And thank you. 06:53.000 --> 06:57.000 I'm not sure about local laws, but while using equipment or machinery, 06:57.000 --> 07:04.000 while doing other things, and speaking requires less directed focus than writing. 07:04.000 --> 07:09.000 If I have to type something, I need a screen in my field of vision. 07:09.000 --> 07:13.000 I should probably almost definitely not do this while driving or operating machinery.
07:13.000 --> 07:19.000 But I could probably use speech technologies while chatting with you all. 07:19.000 --> 07:26.000 And interactions with speech technologies mirror the ways many of us communicate with each other every day. 07:26.000 --> 07:34.000 This isn't a main point, but the ability to talk to our computers is retro sci-fi magic. 07:34.000 --> 07:42.000 They're probably not going to love us back, but the brief opportunity to feel like they talk to us is pretty cool. 07:42.000 --> 07:46.000 And linguistic exclusion, so saying, 07:46.000 --> 07:52.000 hey, these speech technologies work, but not for everyone, has huge impacts. 07:52.000 --> 07:57.000 When we're seeing people come to the web for the first time, whether in speech or text, 07:57.000 --> 08:02.000 we're seeing people join us on unequal hardware, with unequal connectivity, 08:02.000 --> 08:10.000 but we're also asking users to join us, often in a second or third language, or in a language they don't primarily speak. 08:10.000 --> 08:15.000 And language death is something English speakers bang on about all the time, 08:15.000 --> 08:18.000 but languages die every day. 08:18.000 --> 08:21.000 And the languages that die are the languages that don't get used. 08:21.000 --> 08:26.000 The languages we use are the ones we include in our tooling. 08:26.000 --> 08:38.000 And accent- and variant-based barriers to speech technologies mean that they develop additional barriers based on class and race and region. 08:38.000 --> 08:48.000 Speech technology isn't just voice assistants, but voice assistants are a fantastic sort of lens through which to see what works and what doesn't. 08:48.000 --> 08:56.000 How many languages there are, and what's a language, what's a variant, is kind of the tabs-versus-spaces of linguistics. 08:56.000 --> 09:00.000 But there are definitely more than 7,000 languages in the world. 09:00.000 --> 09:04.000 Voice assistants work really, really well with about 20.
09:04.000 --> 09:15.000 I'd go ahead and gather that we've got first-language speakers of more than 20 different languages here today. 09:15.000 --> 09:25.000 This means that the datasets that power them underrepresent and lock out people of color, 09:25.000 --> 09:30.000 people from indigenous backgrounds, not just different language backgrounds. 09:30.000 --> 09:39.000 While I'm super excited about speech technologies and how hopeful they are, they're not being built for everybody right now. 09:39.000 --> 09:44.000 When we saw this diagram, I was pointing at all of the things I was excited about. 09:44.000 --> 09:58.000 One of the things I'm also excited about is that the research that produced this diagram that I love is actually about the impact of gender and dialect and training size, 09:58.000 --> 10:06.000 gender and dialect in the training data, for Arabic automatic speech recognition. 10:06.000 --> 10:09.000 And this is broken because of the datasets. 10:09.000 --> 10:19.000 When we come back here and see that we've got this transcribed speech data that begins all of our processes for doing automatic speech recognition, 10:19.000 --> 10:27.000 that data not being there, or not being included for these models to be trained on, kind of breaks a lot of things. 10:27.000 --> 10:35.000 Almost all of these datasets are proprietary, closed source, and they are expensive as hell. 10:35.000 --> 10:37.000 Often they're limited in demographic scope. 10:37.000 --> 10:45.000 If you gave me a project where I needed to collect 17 hours of Belgian French data right now, 10:45.000 --> 10:48.000 I'd probably run to a university campus. 10:48.000 --> 10:53.000 I'd probably go talk to younger students who are chill with casual work. 10:53.000 --> 11:05.000 And oftentimes, when we do get regional languages, these are being collected by folks from outside the region who don't understand the language, and are going to wind up with weird data.
11:05.000 --> 11:14.000 I can talk to you about what we did, but in doing so, this is much less of a "hey, use this completely free and CC0 dataset" pitch, 11:14.000 --> 11:22.000 but more of a let's-talk-through the different aspects of the data and the different aspects of linguistic data that I'm really excited about. 11:22.000 --> 11:29.000 So what Common Voice did back in 2017 is start collecting speech data via a crowdsourcing platform. 11:29.000 --> 11:33.000 The big thing I want to stress is this is not our data. 11:33.000 --> 11:41.000 We never add a language unless someone from that language community asks us, and we see ourselves as librarians of it. 11:41.000 --> 11:51.000 We release that data every quarter under a CC0 license because it belongs to the people who made it. 11:52.000 --> 11:58.000 If you did want to collect your own data, let's kind of walk through how we did it and how we've been doing it. 11:58.000 --> 12:13.000 Somebody asks us to add a language; right now we've got 131 languages on Common Voice, which doesn't really stack up to the 7,000 I was shouting about earlier. 12:13.000 --> 12:20.000 Because people are donating their voices on a website, we then ask people to help us localize the platform. 12:20.000 --> 12:30.000 If we ask you to donate your voice, and ask you to agree to the terms in a language you don't speak, that is exceedingly uncool. 12:30.000 --> 12:40.000 Because we're a read speech corpus, we also need copyright-free sentences for folks who come to the website to read in their home language. 12:40.000 --> 12:46.000 We launch a new language, we have a little bit of a remote party in the office, it's fine. 12:46.000 --> 12:49.000 But people come and contribute their voices. 12:49.000 --> 12:54.000 Other people validate them, because that data needs to be validated to be really valuable. 12:54.000 --> 12:59.000 And then, because it's not our data, we just go ahead and kick it out into the world every quarter.
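That contribute-then-validate flow shows up directly in the released data: clips carry community up-votes and down-votes. The sketch below filters a miniature, invented release-style TSV down to validated clips. The column names mirror real Common Voice releases, but the rows and the simple "more up than down" rule are illustrative assumptions — the real validation criteria differ.

```python
import csv
import io

# A miniature stand-in for a Common Voice release TSV.
# Real releases carry more columns (age, gender, accents, ...);
# these rows are invented for illustration.
tsv = (
    "client_id\tpath\tsentence\tup_votes\tdown_votes\n"
    "a1\tclip1.mp3\tHello world\t3\t0\n"
    "b2\tclip2.mp3\tGarbled audio\t0\t2\n"
    "c3\tclip3.mp3\tBore da\t2\t1\n"
)

def validated(rows):
    # Keep clips the community judged correct more often than not
    # (a simplification of the real validation rule).
    return [r for r in rows if int(r["up_votes"]) > int(r["down_votes"])]

rows = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))
print([r["path"] for r in validated(rows)])  # -> ['clip1.mp3', 'clip3.mp3']
```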
12:59.000 --> 13:06.000 If you thought, I want to do this, I want to go ahead and collect my own data: love it, love it, love it, do it. 13:06.000 --> 13:09.000 Let's walk through all this stuff you've got to do. 13:10.000 --> 13:15.000 This will absolutely show exactly how old I am. 13:15.000 --> 13:22.000 But Common Voice has gone for a CC0 license, which is the most YOLO license you could possibly source. 13:22.000 --> 13:27.000 It means that folks can package the data, resell the data, use it for whatever they want. 13:27.000 --> 13:30.000 We thought this was a really good fit for our project. 13:30.000 --> 13:37.000 The big thing I want to shout about for folks looking at speech data collection is: this is not a good fit for every language community. 13:37.000 --> 13:44.000 We've talked to folks from indigenous language backgrounds saying, how do I keep big tech from accessing this? 13:44.000 --> 13:49.000 We say: fantastic, not by using a CC0 license. 13:49.000 --> 13:53.000 So really look at where the data comes from. 13:53.000 --> 13:59.000 While I could not definitively speak to it, when we look at a lot of big pre-trained models these days, 13:59.000 --> 14:05.000 there's a lot of supposition that they're trained on non-consensually accessed data. 14:05.000 --> 14:12.000 So thinking about: I'm using speech technologies, where do I want to get my data from, and is it okay to use the data? 14:12.000 --> 14:16.000 That is something that I really like to encourage folks to think about. 14:16.000 --> 14:24.000 Thinking about the age of the speakers in your speech data is massively important. 14:24.000 --> 14:33.000 For Common Voice, we ask that people be adults to contribute their voices, for a range of different ethical and legal reasons. 14:33.000 --> 14:43.000 And this is true of a lot of open datasets, but this means that speech technologies overwhelmingly work incredibly poorly with young voices.
14:43.000 --> 14:52.000 Speech technologies also tend to work very, very poorly with the elderly, also because these are underrepresented demographics. 14:52.000 --> 15:02.000 Right now, we've got, so: license, age of your speakers, and literally just the languages you want to include. 15:02.000 --> 15:15.000 And this is massive. Right now, we've got 130-plus. I love to ask folks, and y'all have been chatty so far, to guess which language you think we have the most data for. 15:15.000 --> 15:18.000 I don't know, y'all, can y'all, I mean, that's fine. 15:18.000 --> 15:20.000 Oh. 15:20.000 --> 15:24.000 Oh, okay, with that: English, Spanish, Chinese, Hindi. 15:24.000 --> 15:25.000 What? 15:25.000 --> 15:28.000 Yes. 15:29.000 --> 15:41.000 Okay, look, it's cheating if you know. So we got a lot of really good guesses, and usually folks guess English first, because tech tends to optimize for English data first. 15:41.000 --> 15:48.000 But we found that folks from research and language communities tend to be really, really passionate. 15:48.000 --> 15:50.000 I can't pick favorites. 15:50.000 --> 15:57.000 But folks like the Catalan language community, the Welsh language community, have just been really, really passionate about: 15:57.000 --> 16:02.000 hey, these are our languages, we want to make sure we can use tools that represent them. 16:02.000 --> 16:05.000 I'm also a little bit active in English. 16:05.000 --> 16:10.000 Yeah. Sorry, I'm trying desperately not to get sidetracked, but I was like, yeah, did you know? 16:10.000 --> 16:12.000 No, no. 16:12.000 --> 16:15.000 You haven't been doing that for an instant longer than? 16:16.000 --> 16:17.000 Maybe. 16:17.000 --> 16:19.000 See me after class. 16:19.000 --> 16:22.000 So we don't just have to think about the language. 16:22.000 --> 16:28.000 And one really exciting thing is, if you open this up to communities and say, hey, what's your language?
16:28.000 --> 16:34.000 You are going to get a ton of very interesting controversy around standards and what is a language. 16:34.000 --> 16:36.000 Love it, love it, love it. 16:36.000 --> 16:38.000 But also variants. 16:38.000 --> 16:43.000 So the way I speak English may be completely different from the way someone in Glasgow speaks English. 16:43.000 --> 16:48.000 Major language variants can be hugely different. 16:48.000 --> 16:51.000 And then we come down another level, to accent. 16:51.000 --> 16:58.000 So having a look at how to map metadata for accents, define variants and split those out, 16:58.000 --> 17:05.000 and find a way to include languages that represents what people think of themselves as speaking. 17:05.000 --> 17:10.000 What we've done on Common Voice is we tend to disambiguate languages based on ISO code. 17:11.000 --> 17:16.000 So the International Organization for Standardization often gets to choose who has and hasn't a language. 17:16.000 --> 17:21.000 For variants, we use BCP 47. 17:21.000 --> 17:27.000 And for accents, we've got a combination of an optional chance to set your accent, 17:27.000 --> 17:29.000 but it's also a free-text field. 17:29.000 --> 17:33.000 So you get a drop-down of American English, British English, 17:33.000 --> 17:36.000 or you can type whatever is in your heart. 17:36.000 --> 17:38.000 And people do. 17:39.000 --> 17:44.000 Right now, the way we collect data on Common Voice is a read speech corpus. 17:44.000 --> 17:47.000 So people come up to the website, they push the button. 17:47.000 --> 17:48.000 It's very technical. 17:48.000 --> 17:53.000 And they'll get a sentence, which is always extremely normal. 17:53.000 --> 17:59.000 Like: his research largely concerns the eco-physiology of lichens. 17:59.000 --> 18:00.000 Like. 18:01.000 --> 18:10.000 I was going to say this is the first linguistic heckle I've gotten.
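The ISO-code-plus-BCP 47 layering described above can be made concrete with a tiny tag parser. This is a simplified sketch, not a conformant BCP 47 implementation: it only recognizes the common language-script-region shape, while real tags also allow variant subtags, extensions, and grandfathered forms.

```python
# Minimal sketch of splitting a BCP 47-style tag into subtags.
# Handles only language / script / region; real BCP 47 is richer.

def parse_tag(tag):
    parts = tag.split("-")
    result = {"language": parts[0].lower(), "script": None, "region": None}
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            result["script"] = part.title()   # e.g. Cyrl, Latn, Arab
        elif len(part) == 2 and part.isalpha():
            result["region"] = part.upper()   # e.g. GB, BE
    return result

print(parse_tag("en-GB"))    # British English
print(parse_tag("tg-Cyrl"))  # Tajik written in the Cyrillic script
```

Script subtags are exactly what lets a dataset distinguish the multiple Tajik orthographies mentioned a little later in the talk.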
18:10.000 --> 18:13.000 But it's the first non-academic linguistic heckle I've gotten. 18:13.000 --> 18:14.000 And I quite like it. 18:14.000 --> 18:18.000 So our read speech corpus is fantastic for its simplicity. 18:18.000 --> 18:19.000 Folks can come in. 18:19.000 --> 18:22.000 They don't have to think too much about what's going on. 18:22.000 --> 18:24.000 Hey, just read the sentence. 18:24.000 --> 18:26.000 It's not great for a couple of different reasons. 18:26.000 --> 18:29.000 First of all, the sentence is extremely weird. 18:29.000 --> 18:33.000 And I think that speech recognition technology will very rarely be asked 18:33.000 --> 18:36.000 to identify the word lichens. 18:36.000 --> 18:41.000 But also, the way you read something and the way you say something is a very different vibe. 18:41.000 --> 18:43.000 His research largely. 18:47.000 --> 18:48.000 What are you doing? 18:48.000 --> 18:50.000 That's weird. 18:50.000 --> 18:51.000 Oh. 18:51.000 --> 18:58.000 Also, a problem that, I'm so, I'm so sorry if I look really excited about, like, 18:58.000 --> 19:01.000 these technical problems, but they're fantastic. 19:01.000 --> 19:02.000 Oh, this. 19:02.000 --> 19:08.000 And, like, the language ones are even more complicated, because they're people at the end of the day. 19:08.000 --> 19:11.000 So, a read speech corpus as well. 19:11.000 --> 19:13.000 All you've got to do is write it down, right? 19:13.000 --> 19:15.000 That's not a problem. 19:15.000 --> 19:16.000 People know how. 19:16.000 --> 19:19.000 Like, once you write it down, it's not even a big deal. 19:19.000 --> 19:23.000 So for example, if I wanted to write something in Tajik, I would use the Tajik alphabet. 19:23.000 --> 19:24.000 Yeah. 19:24.000 --> 19:30.000 And that's not a problem at all, until we get to the next step, where this is also the Tajik alphabet. 19:30.000 --> 19:35.000 And really, if I wanted to, this is also the Tajik alphabet.
19:35.000 --> 19:39.000 So, looking at how to handle multiple orthographies. 19:39.000 --> 19:44.000 So different communities, different contexts may use different characters, 19:44.000 --> 19:48.000 which is one of my very favorite language problems. 19:48.000 --> 19:56.000 When and how do you give folks the opportunity to switch between those? 19:56.000 --> 20:01.000 Sorry, I'm, yes. 20:01.000 --> 20:05.000 But also, who here comes from an English-first background? 20:05.000 --> 20:08.000 Like, I'm sorry, I need to. 20:08.000 --> 20:15.000 So for folks who come from an anglophone background, we might get very used to using the same language 20:15.000 --> 20:19.000 throughout our conversations, with no code switching at all. 20:19.000 --> 20:27.000 For folks here in Brussels, or for somebody in Nairobi, the opportunity to pop different languages' words in and out is called code switching, 20:27.000 --> 20:30.000 and is immensely common. 20:30.000 --> 20:35.000 So one thing we've just piloted for Common Voice, to deal with the different orthographies, 20:35.000 --> 20:41.000 to deal with code switching, to look at making it a little bit less lichen-y as well, 20:41.000 --> 20:48.000 is we've got a pilot coming out where we've asked people to spontaneously tell us what they think about something instead. 20:48.000 --> 20:50.000 So instead of reading, which is easy. 20:50.000 --> 20:55.000 This is a bit disappointing to give somebody who lives in Britain. 20:55.000 --> 20:59.000 Because it's a very, it's a very short. 20:59.000 --> 21:01.000 But it's fine. 21:01.000 --> 21:04.000 People instead get to record their responses. 21:04.000 --> 21:05.000 Oh, do you know what? 21:05.000 --> 21:12.000 I lived near the equator, so I'm a reasonable person who had 12 hours of sun every day. 21:12.000 --> 21:16.000 But it gives folks the opportunity to talk about something they care about.
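One way to cope with a language like Tajik having several orthographies in the wild is to detect which script a contribution actually uses before routing it to the right validators. This is a rough sketch using Unicode character names from the standard library; a real system would handle mixed-script and code-switched text far more carefully than a single majority vote.

```python
import unicodedata
from collections import Counter

def dominant_script(text):
    # Tally the Unicode name prefix (LATIN, CYRILLIC, ARABIC, ...)
    # of each alphabetic character and return the most common one.
    counts = Counter(
        unicodedata.name(ch).split()[0]
        for ch in text if ch.isalpha()
    )
    return counts.most_common(1)[0][0] if counts else None

print(dominant_script("салом"))  # Tajik greeting in Cyrillic -> CYRILLIC
print(dominant_script("salom"))  # the same word romanized   -> LATIN
```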
21:16.000 --> 21:20.000 It gives other folks the opportunity to transcribe it and learn about it. 21:20.000 --> 21:28.000 And folks get to speak more naturally, with umms and ahs and pauses and code switching. 21:28.000 --> 21:31.000 And this could be a fantastic way to make a ton of money. 21:31.000 --> 21:38.000 You say, oh, Jess, you told me how all these proprietary datasets are really expensive. 21:38.000 --> 21:40.000 Like, I did, I did, yeah. 21:40.000 --> 21:43.000 And all I've got to do is these very simple things and I'll make a ton of money. 21:43.000 --> 21:45.000 I'm like, oh, cool, yeah. 21:45.000 --> 21:49.000 It's very easy. You have a good day. 21:49.000 --> 21:52.000 I wouldn't even worry about the marketing. 21:52.000 --> 21:56.000 Wait, wait, see me after class. 21:56.000 --> 22:00.000 But the big thing I would want to say is, if you are using speech technology, 22:00.000 --> 22:05.000 if you're building something, please, please, please: Common Voice is free. 22:05.000 --> 22:08.000 It's thousands and thousands, it's tens of thousands of hours. 22:08.000 --> 22:13.000 There's no license on it. There's literally no excuse not to use it. 22:13.000 --> 22:17.000 But it's not just me sort of flogging our dataset. 22:17.000 --> 22:22.000 There are so many language, demographic, and domain datasets out there that are free, 22:22.000 --> 22:25.000 that are open source; a lot of them are academic. 22:25.000 --> 22:28.000 In 2025, there are very few excuses, 22:28.000 --> 22:33.000 if you're building models and training models on languages, not to be adding these 22:33.000 --> 22:38.000 very, very free, permissively licensed datasets. 22:38.000 --> 22:44.000 And why I'm pushing this on you so hard is, I like to stand up and be intense about languages, 22:44.000 --> 22:51.000 because one of you is going to come tell me about Georgian verbs after this. 22:51.000 --> 22:56.000 But if it's okay, if it's not desperately uncool,
22:56.000 --> 23:02.000 I'd love the opportunity to be excited and hopeful about tech with you all. 23:02.000 --> 23:07.000 But really, I'd love to see some of you, and some of you out there in the big wide internet, 23:07.000 --> 23:09.000 building speech technologies. 23:09.000 --> 23:13.000 There's a ton of open models. You can come get our data right now. 23:13.000 --> 23:15.000 I can't stop you. 23:16.000 --> 23:22.000 But also, even if you're not coming and telling me, giving feedback to people building the speech technologies 23:22.000 --> 23:26.000 and building the datasets is critically important. 23:26.000 --> 23:33.000 When I screw something up, when we screw something up, please yell at us. 23:33.000 --> 23:39.000 Oftentimes, when you see well-funded technologies coming out of the West, you see a rest-of-world mindset. 23:39.000 --> 23:42.000 Well, we're going to build this for the California market, 23:42.000 --> 23:45.000 and then we're going to come into Europe, and then, do you know what? 23:45.000 --> 23:48.000 Then we're going to talk about the rest of the world. 23:48.000 --> 23:53.000 And that's not the way the world really works. Especially if you're building open source, 23:53.000 --> 23:58.000 especially if you're building interesting, beautiful, useful things 23:58.000 --> 24:03.000 that you hope will change the world, thinking about where and how language comes in, 24:03.000 --> 24:07.000 whether this is text, whether this is text localization, whether this is speech, 24:07.000 --> 24:10.000 is something I would like to politely beg of you. 24:10.000 --> 24:17.000 But even if you don't ever build anything with speech, please take into the world my permission 24:17.000 --> 24:23.000 to get as loud and to get as mad and to get as weird as you want when you're talking to your computer 24:23.000 --> 24:25.000 and it doesn't respect you. 24:25.000 --> 24:27.000 Thank you so much.