WEBVTT 00:00.000 --> 00:15.000 Thank you, everyone. Hi, I don't know if the microphone is... yeah, it seems to be working. 00:15.000 --> 00:19.000 First of all, thank you for coming to our talk. 00:19.000 --> 00:23.000 I hope we can get you excited about the work that we've been doing. 00:23.000 --> 00:27.000 I'm Ujjwal, and he's Eemeli, as we were eloquently introduced. 00:27.000 --> 00:35.000 And here's me trying to make something out of the first draft of the MessageFormat 2 00:35.000 --> 00:36.000 spec. 00:36.000 --> 00:42.000 It's been a while since we started work on this. 00:42.000 --> 00:49.000 So, I'm from New Delhi in India, and I live in A Coruña; it's a fun city. 00:49.000 --> 00:54.000 I love open source software and the web platform; it has taught me everything that I know. 00:54.000 --> 00:59.000 And I hope that we can localize it to serve more people. 00:59.000 --> 01:06.000 I also like badly designed video games and other things. 01:06.000 --> 01:09.000 And I work at Igalia. 01:09.000 --> 01:14.000 Hi, I'm Eemeli. 01:14.000 --> 01:19.000 And I'm working on localization at Mozilla. 01:19.000 --> 01:26.000 And as far as I know, I am the only person to work on both localization and localization. 01:26.000 --> 01:33.000 These are the robotic underwater floats that my dissertation work is based on. 01:33.000 --> 01:36.000 They have absolutely nothing to do with today's talk. 01:36.000 --> 01:42.000 I'm just proud of the fact that I work on both kinds of localization. 01:42.000 --> 01:50.000 So, there are many challenges with localization these days. 01:50.000 --> 02:00.000 The first part of this talk is about presenting some of those that we hope can be solved, or made more solvable, with MessageFormat 2. 02:00.000 --> 02:04.000 Then we're going to go on to talk about what MessageFormat 2 actually is. 02:04.000 --> 02:05.000 What does it look like?
02:05.000 --> 02:11.000 And then we're going to continue with some of the next steps of what can be done here. 02:11.000 --> 02:20.000 So a big challenge of where we are today is that localization is very much siloed. 02:20.000 --> 02:23.000 These are very old silos, but you get the point, I hope. 02:23.000 --> 02:30.000 You end up picking a localization solution most often based on developer needs, 02:30.000 --> 02:36.000 developer desires, and developer presumptions about how translators and translation work. 02:36.000 --> 02:44.000 And then you kind of just go with it, because silos can be old or they can be newer, 02:44.000 --> 02:49.000 but they're still silos, and that's mostly where we are living. 02:49.000 --> 02:54.000 And it becomes difficult to move from one solution to a different one. 02:54.000 --> 03:04.000 You're kind of stuck with it, and the features of the localization system that you end up with are kind of incidental. 03:04.000 --> 03:10.000 And they don't solve all the problems that... 03:10.000 --> 03:12.000 No, no, it was my question. 03:12.000 --> 03:16.000 They don't solve all the problems that we actually have. 03:16.000 --> 03:20.000 And many of those are, honestly, hard problems. 03:20.000 --> 03:24.000 I could not think of a good image to go with this one. 03:24.000 --> 03:32.000 Inflection, for those of you not so deep into localization and internationalization: 03:32.000 --> 03:42.000 inflection is effectively when you have a message and need to transform some of the words depending on other words in the message. 03:42.000 --> 03:45.000 This doesn't happen so much in English. 03:45.000 --> 03:47.000 Take Finnish, for example. 03:47.000 --> 03:50.000 Finnish does this. 03:50.000 --> 03:51.000 We do suffixes. 03:51.000 --> 03:56.000 There are suffixes of suffixes, and those can also have suffixes, and it can go on. 03:56.000 --> 04:01.000 And then, of course, the stem can also change based on them, and it gets a bit complicated.
04:01.000 --> 04:05.000 In English, too, you have the difference between "a" and "an". 04:05.000 --> 04:09.000 So is it "a historical account" or "an historical account", for instance? 04:09.000 --> 04:14.000 So there's not even a strict rule-based solution for how this works. 04:14.000 --> 04:18.000 And these sorts of problems are difficult to solve. 04:18.000 --> 04:22.000 MessageFormat 2 is hopefully providing some better solutions here. 04:22.000 --> 04:30.000 Another dimension in which MessageFormat 2 is hopefully providing a better solution 04:30.000 --> 04:45.000 is when we have messages that you need to localize, but that depend not on just one thing, like the plural category of whether you have one message or three messages, 04:45.000 --> 04:52.000 but where you also need to take into account grammatical gender or other aspects at the same time. 04:52.000 --> 05:03.000 And you end up with either a tree structure or a matrix structure of all of these choices of what it is that you're actually localizing, and of how this works in different languages. 05:03.000 --> 05:14.000 And the dimensions that you need for your source language are not necessarily the dimensions you need for your target language, and it gets complicated. 05:14.000 --> 05:17.000 So we've been trying to solve this. 05:18.000 --> 05:34.000 And the way we've been working on a solution is through standardization: finding and defining a good new standard for localization that we can build a solution, or solutions, on top of. 05:34.000 --> 05:36.000 We're calling that MessageFormat 2. 05:36.000 --> 05:42.000 This work has been ongoing in the Unicode Consortium, under CLDR. 05:42.000 --> 05:49.000 Oh, that's a lot of acronyms; I'll have to skip over them. It's been going on actively for about the last five years.
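The plural-and-gender matrix described above can be sketched as a set of variants keyed by one value per selector, with * as the catch-all. This is a minimal Python sketch under stated assumptions: the English plural rule is simplified, the gender values and message texts are hypothetical, and the ranking (exact keys beat *, earlier selectors take priority) only approximates MessageFormat 2's variant selection rather than implementing the spec.

```python
from typing import Dict, Tuple

CATCH_ALL = "*"


def english_plural_category(n: int) -> str:
    # Simplified English integer rule: "one" for 1, "other" otherwise.
    # Real CLDR plural rules are per-language and much richer.
    return "one" if n == 1 else "other"


def select_variant(variants: Dict[Tuple[str, ...], str],
                   resolved: Tuple[str, ...]) -> str:
    """Pick the variant whose keys match the resolved selector values.

    A key matches if it equals the value or is the catch-all "*".
    Among matches, exact keys beat "*", and earlier selectors take
    priority over later ones (plain tuple comparison).
    """
    matching = [
        keys for keys in variants
        if all(k == v or k == CATCH_ALL for k, v in zip(keys, resolved))
    ]
    if not matching:
        raise LookupError(f"no variant matches {resolved}")
    best = min(matching, key=lambda keys: tuple(
        0 if k == v else 1 for k, v in zip(keys, resolved)))
    return variants[best]


# Hypothetical English message, selected on the owner's grammatical gender
# (first selector) and on the plural category of a count (second selector).
variants = {
    ("masculine", "one"): "{name} added {count} photo to his album",
    ("feminine", "one"): "{name} added {count} photo to her album",
    (CATCH_ALL, "one"): "{name} added {count} photo to their album",
    ("masculine", CATCH_ALL): "{name} added {count} photos to his album",
    ("feminine", CATCH_ALL): "{name} added {count} photos to her album",
    (CATCH_ALL, CATCH_ALL): "{name} added {count} photos to their album",
}

if __name__ == "__main__":
    count = 3
    pattern = select_variant(variants, ("feminine", english_plural_category(count)))
    print(pattern.format(name="Aino", count=count))
```

Even this English message needs six variants, and a translation may need a different set of dimensions than the source.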
05:49.000 --> 06:01.000 For example, the impetus for a lot of the current work came from the JavaScript side of things, where we want something like Intl.MessageFormat to exist in JavaScript. 06:01.000 --> 06:16.000 And the current proposal for what the Intl.MessageFormat API looks like looks an awful lot like the proposal we had for it in 2013, when we decided we weren't quite ready to do this yet. 06:16.000 --> 06:23.000 So even at the pace of standards work, this is slow, but, you know, we're getting there. 06:24.000 --> 06:32.000 And with MessageFormat 2, we're hoping to get the final candidate out within literally a few months from now. 06:32.000 --> 06:34.000 So we're working on it. 06:34.000 --> 06:44.000 But what is this MessageFormat 2 that, you know, is going to solve all the problems of localization in the world? 06:44.000 --> 06:48.000 And for that, you might take the lead. 06:49.000 --> 07:14.000 So, having gotten the nuance and all of that out of the way, let's go into the dirty details and talk about not only how the syntax looks, but how MessageFormat 2 essentially approaches these complex problems that we talked about in localization, and tries to give the most ergonomic tools to developers and translators alike. 07:14.000 --> 07:26.000 So here's a very simple message. It just says "Hello FOSDEM", and this is how it looks in the data model that we have. 07:26.000 --> 07:40.000 And if you've done localization before, you might notice that this is maybe 90% of what you ever do, right? Every single static string in your interface is basically a simple message. 07:41.000 --> 07:52.000 So this doesn't need to be any more complex than this, and we tried to keep it as simple as we could by not requiring any syntax for it. 07:52.000 --> 08:04.000 But once you start getting into some of the more complex things, by which I mean placeholders, you can have variables in your text.
08:04.000 --> 08:14.000 So you could have some simple dynamic text that replaces a variable, or you can have markup in these. 08:14.000 --> 08:25.000 So, for instance, something like this. Since we're in the Inclusive Web room, I think markup means something specific to you. 08:25.000 --> 08:36.000 But in something that has, at least, the aspirations of MessageFormat 2, we cannot make very strong assumptions about markup. 08:36.000 --> 08:47.000 So the markup has been made to be as generally applicable as possible, but there are certainly some patterns that you can see. 08:47.000 --> 09:01.000 These markup elements can be either an opening element, a closing element, or a standalone element that doesn't need to be closed. 09:01.000 --> 09:14.000 Next, we can have more complex expressions. We already saw simple variable replacement earlier, but you can have more interesting things in your expressions. 09:14.000 --> 09:22.000 So here you can see that we're calling a function: it's the :number function, with a style. 09:22.000 --> 09:29.000 And since that style is currency, we're also supplying the currency, so it's going to format the value keeping all of these things in mind. 09:29.000 --> 09:35.000 These function calls are pretty central to our work. 09:35.000 --> 09:46.000 Basically, we hope to provide people with the most common operations that they might need, as building blocks for more complex applications. 09:46.000 --> 09:56.000 And we also give them the ability to bring their own functions, so they can customize these functions or build entirely new ones. 09:57.000 --> 10:19.000 They form the building blocks of the various operations that you can express in the MessageFormat 2 syntax, and a combination of all of these different things, different placeholders, maybe some markup elements, forms what we call a pattern.
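To make the pieces of a pattern concrete (text, placeholders with an optional function and options, and opening, closing or standalone markup), here is a minimal Python sketch of one way they could be modeled and formatted. The class names, the toy number formatter and the HTML rendering are assumptions for illustration, not the MessageFormat 2 data model or any library's API. In MessageFormat 2 source, the example pattern would look something like `Total: {#b}{$amount :number style=currency currency=EUR}{/b}{#br/}`, though option names may differ in the final spec.

```python
import html
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Union


@dataclass
class Expression:
    """A placeholder such as {$name} or {$amount :number ...}."""
    variable: str
    function: Optional[str] = None
    options: Dict[str, str] = field(default_factory=dict)


@dataclass
class Markup:
    """Markup such as {#b}, {/b} or a standalone {#br/}."""
    kind: str  # "open", "close" or "standalone"
    name: str


Part = Union[str, Expression, Markup]


def toy_number(value: Any, options: Dict[str, str]) -> str:
    # Toy stand-in for a locale-aware number formatter; uses no CLDR data.
    if options.get("style") == "currency":
        return f"{options['currency']} {value:,.2f}"
    return f"{value:,}"


# Function registry: callers could add their own custom functions here.
FUNCTIONS: Dict[str, Callable[[Any, Dict[str, str]], str]] = {"number": toy_number}


def format_pattern(pattern: List[Part], args: Dict[str, Any]) -> List[Union[str, Markup]]:
    """Resolve each expression to a string; pass text and markup through."""
    out: List[Union[str, Markup]] = []
    for part in pattern:
        if isinstance(part, Expression):
            value = args[part.variable]
            if part.function is None:
                out.append(str(value))
            else:
                out.append(FUNCTIONS[part.function](value, part.options))
        else:
            out.append(part)
    return out


def to_html(parts: List[Union[str, Markup]]) -> str:
    """One possible consumer: escape text and render markup as HTML tags."""
    rendered = []
    for part in parts:
        if isinstance(part, Markup):
            rendered.append(f"</{part.name}>" if part.kind == "close" else f"<{part.name}>")
        else:
            rendered.append(html.escape(part))
    return "".join(rendered)


if __name__ == "__main__":
    total = [
        "Total: ",
        Markup("open", "b"),
        Expression("amount", "number", {"style": "currency", "currency": "EUR"}),
        Markup("close", "b"),
        Markup("standalone", "br"),
    ]
    print(to_html(format_pattern(total, {"amount": 1234.5})))
```

Keeping markup as separate parts, rather than turning it into strings during formatting, leaves it to the caller to decide what `b` or `br` means, in line with the point that MessageFormat 2 cannot assume HTML.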
10:19.000 --> 10:32.000 So any relatively simple message like this is a pattern: a combination of text, maybe some expressions, and some markup elements. 10:32.000 --> 10:42.000 And the most important thing about patterns is that, at the end of the day, every single message in MessageFormat 2 is the result of formatting a pattern. 10:42.000 --> 11:00.000 It can either be a simple pattern like this, or it can be one of many patterns. What I mean by that is that you can have matchers. So here we have a variable called count. 11:00.000 --> 11:13.000 This variable is being supplied by whatever workflow you have, and what you can do is have a match statement on it. 11:13.000 --> 11:21.000 In this case, since we are using :number, it defaults to doing plural selection, but there are various different things you can do with that. 11:21.000 --> 11:33.000 I mean, you can do simple string matching, or you could do things like ordinal matching, following all of the different rules that exist in a language. 11:33.000 --> 11:50.000 You can build different patterns for each of the cases that you want to handle, and then write a message using a matcher, so that it finds out which case applies and formats that pattern. 11:50.000 --> 12:03.000 So as you can see, we can have different variants, and these variants come together as a select message, where the final result is one of the variants. 12:03.000 --> 12:08.000 So, okay, over to you. 12:08.000 --> 12:16.000 So, one question to ask here: what comes next? 12:16.000 --> 12:45.000 After these initial steps, we have a syntax here. You saw some of it. You're not expected to remember all of it, but we've done a lot of work here, and we hope that it is a sufficiently good syntax that it's going to get adopted and get used.
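The select message on the count variable can be sketched as follows. This is a hypothetical Python illustration, not a real API: it uses a simplified English plural rule and mirrors two properties of MessageFormat 2's :number selection, namely that an exact numeric key such as 0 is preferred over a plural category such as one, and that a catch-all * variant must exist. The MF2 source in the comment is approximate, since syntax details changed between spec drafts.

```python
from typing import Dict


def english_plural_category(n: int) -> str:
    # Simplified English integer rule; real CLDR rules vary per language.
    return "one" if n == 1 else "other"


def select(count: int, variants: Dict[str, str]) -> str:
    """Pick a variant: an exact numeric key (e.g. "0") wins over the plural
    category (e.g. "one"), which wins over the "*" catch-all."""
    for key in (str(count), english_plural_category(count), "*"):
        if key in variants:
            return variants[key]
    raise LookupError("a select message needs a '*' catch-all variant")


# Roughly corresponds to MF2 source along the lines of:
#   .input {$count :number}
#   .match $count
#   0   {{You have no messages}}
#   one {{You have {$count} message}}
#   *   {{You have {$count} messages}}
variants = {
    "0": "You have no messages",
    "one": "You have {count} message",
    "*": "You have {count} messages",
}

if __name__ == "__main__":
    for n in (0, 1, 5):
        print(select(n, variants).format(count=n))
```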
12:46.000 --> 13:14.000 Because ultimately, what we have now defines the behavior of a single message, but when you localize, you often have not one but a whole pile of messages, possibly relating to each other, that need to be somehow defined together. 13:14.000 --> 13:24.000 And for this work, we are bootstrapping and moving onwards with a message resource specification. 13:24.000 --> 13:36.000 That is going to define specifically how a MessageFormat 2 message is best placed together with others, in a file or otherwise. 13:36.000 --> 13:56.000 MessageFormat 2 can be embedded pretty much anywhere, in any format that currently allows for embedded messages, but it's got properties, like being naturally multi-line in places, that make it not fit in quite so well. 13:56.000 --> 14:11.000 And there are other needs that we honestly don't have the time to get into, but we've also identified a lack: the metadata about a message is not very well defined in the general case. 14:11.000 --> 14:24.000 There are some specific instances where there are decent definitions of the context for a message, coming from a developer or otherwise providing information for translators. 14:24.000 --> 14:31.000 But for this, we need to define a better system as a part of the message resource definition work. 14:31.000 --> 14:52.000 And to do that, we have been building tools, and are providing tools, that hopefully are going to be used to pull things together, but we're not really providing a ready solution here; we're providing building blocks 14:52.000 --> 15:09.000 for you to put the solutions together. We're not providing a monolith that does everything that you need; we're doing kind of improvements on the [inaudible]. 15:09.000 --> 15:13.000 He's very eager. 15:13.000 --> 15:19.000 Also, we didn't practice this bit as well as we should have.
15:19.000 --> 15:33.000 So putting messages together is sometimes challenging, and part of that is because we think messages are complicated and complex, but they're actually not. 15:33.000 --> 15:48.000 One thing that was visible in the earlier part, where we were showing you the syntax, is that we also have a data model definition of what the syntax means. 15:48.000 --> 16:02.000 And one benefit that might not be obvious, and was not really an intended goal when we started, but that we identified during the work on MessageFormat 2, is that the data model we have for messages 16:02.000 --> 16:12.000 is actually really good for representing all messages in all formats that currently exist, so you can 16:12.000 --> 16:29.000 effectively parse messages from any localization format used by anyone into this one data model, retain all of the information that you had there originally, and then move on from there. 16:29.000 --> 16:47.000 So with that, we're also trying to provide a building block for systems to be put together, to provide the solutions, to provide better monoliths, better silos, because 16:47.000 --> 17:00.000 ultimately we are not really here to provide you with a solution for your silo; we're going to make the internals of your silo work better, because 17:00.000 --> 17:23.000 even if you currently don't use MessageFormat 2 for your messages, because honestly none of you do yet, you could parse your messages into the MessageFormat 2 data model, and even use a MessageFormat 2 runtime for the runtime formatting, or do all sorts of other operations, for which 17:23.000 --> 17:33.000 we are building tools, other people are building tools, and we're looking to build a whole ecosystem here. 17:34.000 --> 17:40.000 One of the places where we're ultimately hoping to go here is to be able to
17:40.000 --> 18:02.000 bring MessageFormat 2 and message resources directly into HTML, so that in HTML you could declaratively say that this file depends on these localization messages, much like we do with CSS these days, and then within the HTML say that this item 18:02.000 --> 18:23.000 should use this message reference to provide its localized content. That's kind of cool, but this of course depends on a whole stack of standards, some of which are really ready and some of which are less ready. 18:23.000 --> 18:33.000 And another interesting direction that this is going to enable is reconsidering what happens with 18:33.000 --> 18:43.000 translation memory: that is, systems that tell you that this thing that you're localizing or translating 18:44.000 --> 18:53.000 has been localized before, and this is how it was localized, so translators don't need to repeat the same work over and over again. 18:53.000 --> 19:10.000 And with one data model representing all messages from all formats, it's entirely doable to consider improvements in matching messages, not just in the format that you're using now but across what you have previously, and also 19:10.000 --> 19:22.000 improving how matching is done when a message has multiple variants, when it's got one or multiple things it depends on, and building from there. 19:22.000 --> 19:39.000 And we're nearly done with the first part of this, which is the MessageFormat 2 syntax, and here's a very small call to action: you still have a small window of opportunity to tell us that we've been doing this all wrong. 19:39.000 --> 19:49.000 Because the MessageFormat 2 spec is still under review, this means that if you...
19:50.000 --> 20:05.000 The first link there is to the MessageFormat Working Group. If you tell us there that this thing is wrong and we should fix this thing or that thing, we might still be able to take it into account, or we'll say we're not going to do it because, you know, 20:05.000 --> 20:13.000 of this reason and that reason, because honestly we've been doing this for like five years, and we've talked about many things. 20:13.000 --> 20:16.000 But you still have 20:16.000 --> 20:28.000 a little bit of time before we finalize the spec to make sure that we take into account whatever we might not have taken into account. 20:28.000 --> 20:37.000 And there's a lot of other work ongoing. These slides are linked from the 20:37.000 --> 20:43.000 presentation page, so you don't need to take a photo of this, but you can. 20:43.000 --> 20:44.000 There's the... 20:44.000 --> 20:57.000 I mentioned about inflection that we're not providing a ready solution for inflection, because that depends on a lot of data, but the Unicode Consortium is booting up a working group, 20:57.000 --> 21:01.000 with quite a bit of data shared, by Apple in particular, 21:01.000 --> 21:15.000 and a system for working with inflection, but that of course requires a whole pile of data per language to make it work, so it's not a universal solution for all systems. 21:15.000 --> 21:23.000 The Intl.MessageFormat proposal is proceeding in TC39 for JavaScript. 21:23.000 --> 21:29.000 That's another place that you might be interested to look at and voice your thoughts. 21:29.000 --> 21:38.000 The message resource working group is hopefully going to finally get transferred to the Unicode organization relatively soon.
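As for the translation memory idea mentioned earlier, one way a shared data model could help is by comparing messages variant by variant, with every placeholder collapsed to one token so that messages differing only in variable names still match. This Python sketch is an assumption-laden illustration, not an algorithm from the talk: the Var type, the scoring (the average of difflib.SequenceMatcher ratios, with 0 for a variant present on only one side) and the example data are all hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Var:
    """A placeholder referring to a variable by name."""
    name: str


Pattern = List[Union[str, Var]]
# Variant keys (one per selector) -> pattern; a message without
# selectors would use the single key ().
Message = Dict[Tuple[str, ...], Pattern]


def normalize(pattern: Pattern) -> str:
    # Collapse every placeholder to the same token, so messages that differ
    # only in variable names compare as identical.
    return "".join("{}" if isinstance(part, Var) else part for part in pattern)


def pattern_similarity(a: Pattern, b: Pattern) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def message_similarity(a: Message, b: Message) -> float:
    """Average per-variant similarity; a variant on one side only scores 0."""
    keys = set(a) | set(b)
    if not keys:
        return 1.0
    scores = [pattern_similarity(a[k], b[k]) if k in a and k in b else 0.0
              for k in keys]
    return sum(scores) / len(scores)


def best_match(query: Message, memory: List[Message]) -> Tuple[int, float]:
    """Return (index, score) of the most similar message in a non-empty memory."""
    scored = [(i, message_similarity(query, m)) for i, m in enumerate(memory)]
    return max(scored, key=lambda item: item[1])


query: Message = {
    ("one",): ["You have ", Var("n"), " new message"],
    ("*",): ["You have ", Var("n"), " new messages"],
}
memory: List[Message] = [
    {("*",): ["You have ", Var("count"), " new messages"]},
    {("one",): ["You have ", Var("count"), " new message"],
     ("*",): ["You have ", Var("count"), " new messages"]},
]

if __name__ == "__main__":
    print(best_match(query, memory))
```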
21:39.000 --> 21:57.000 Beyond that, there's a JavaScript implementation of the MessageFormat 2 spec that is pretty much complete, and there's a set of tools in Python that we've been working on for our localization needs, specifically at Mozilla. 21:57.000 --> 22:03.000 It doesn't do formatting, but it does a lot of everything else you could imagine doing with resources. 22:03.000 --> 22:05.000 That might be interesting. 22:05.000 --> 22:10.000 Are there any more things we have to point out? 22:10.000 --> 22:34.000 Oh yeah, and then ICU, the International Components for Unicode, provided by Unicode, is going to provide formatting for MessageFormat 2 together with the release of the spec, in the C++ and Java implementations, with the ICU4X Rust implementation upcoming as well. 22:35.000 --> 22:40.000 But yeah, that's it for us. 22:40.000 --> 22:43.000 Thank you for your time and attention.
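The talk's claim that messages from existing formats can be parsed into one data model can be illustrated with a deliberately tiny Python sketch. Two hypothetical parsers, one for simple ICU MessageFormat 1 arguments like {count, number} and one for Python %-style placeholders like %(count)d, produce the same list of text and expression parts. Both handle only small subsets of their formats (no quoting, escapes, or plural/select arguments), and mapping %d to a number function is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class Expression:
    """A placeholder: a variable plus an optional function annotation."""
    variable: str
    function: Optional[str] = None


Part = Union[str, Expression]

# Simple ICU MessageFormat 1 arguments: {name} or {name, type}.
# Ignores apostrophe quoting and nested plural/select arguments.
MF1_ARG = re.compile(r"\{\s*(\w+)\s*(?:,\s*(\w+)\s*)?\}")
# Python %-style named conversions: %(name)s or %(name)d.
# Ignores %% escapes and width/precision flags.
PERCENT_ARG = re.compile(r"%\((\w+)\)([sd])")


def _split(source: str, regex: re.Pattern,
           to_expr: Callable[[re.Match], Expression]) -> List[Part]:
    """Split source into text parts and Expression parts."""
    parts: List[Part] = []
    pos = 0
    for m in regex.finditer(source):
        if m.start() > pos:
            parts.append(source[pos:m.start()])
        parts.append(to_expr(m))
        pos = m.end()
    if pos < len(source):
        parts.append(source[pos:])
    return parts


def parse_mf1(source: str) -> List[Part]:
    return _split(source, MF1_ARG, lambda m: Expression(m.group(1), m.group(2)))


def parse_percent(source: str) -> List[Part]:
    # Assumption for illustration: %d means "format as a number".
    return _split(source, PERCENT_ARG, lambda m: Expression(
        m.group(1), "number" if m.group(2) == "d" else None))


def serialize_mf1(parts: List[Part]) -> str:
    """Write the parts back out as simple ICU MessageFormat 1."""
    out = []
    for part in parts:
        if isinstance(part, Expression):
            inner = part.variable if part.function is None else f"{part.variable}, {part.function}"
            out.append("{" + inner + "}")
        else:
            out.append(part)
    return "".join(out)


if __name__ == "__main__":
    mf1 = "You have {count, number} new messages, {name}."
    py = "You have %(count)d new messages, %(name)s."
    print(parse_mf1(mf1) == parse_percent(py))  # both map to the same parts
    print(serialize_mf1(parse_percent(py)))
```

Because the model keeps the function annotation, writing it back out as MessageFormat 1 reproduces the original source in this example; a real converter would also have to preserve whitespace, comments and other details.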