WEBVTT 00:00.000 --> 00:14.000 Hi everyone, my name is Son, and today I'm going to talk about a fun project that I did, like 00:14.000 --> 00:18.000 a weekend project that I started. 00:18.000 --> 00:24.000 It brings llama.cpp to the web using WebAssembly. 00:24.000 --> 00:30.000 So my talk will be divided into these points: first I introduce myself, 00:30.000 --> 00:36.000 then why I did this, then I show you how it works and some challenges 00:36.000 --> 00:40.000 that I faced on the web, and my plan for the future. 00:40.000 --> 00:46.000 So my name is Son, I'm a software engineer at Hugging Face. I joined Hugging Face recently, so I 00:46.000 --> 00:48.000 still find myself very new. 00:48.000 --> 00:54.000 And I'm one of the active llama.cpp contributors. 00:54.000 --> 01:02.000 Here's my GitHub: it's ngxson, and my slogan is doing AI for fun, not for profit. 01:02.000 --> 01:06.000 I actually had this slogan way before I joined Hugging Face. 01:06.000 --> 01:15.000 Okay, so some of my work on llama.cpp: the first big thing that I did for the 01:15.000 --> 01:20.000 project was chat template support, and then I did a refactoring of the 01:20.000 --> 01:24.000 support for low-rank adaptation (LoRA). 01:24.000 --> 01:28.000 I'm also one of the core maintainers of llama-server. 01:28.000 --> 01:36.000 That's how we are able to bring it into something called Hugging Face Inference Endpoints. 01:36.000 --> 01:46.000 One very, very big task that I'm actively working on is refactoring the 01:46.000 --> 01:54.000 multimodal support, especially the vision part. It's still ongoing, but it's a big thing. 01:54.000 --> 02:02.000 And one of the things I really want to do is to add the web UI back into 02:02.000 --> 02:04.000 the main project. 02:04.000 --> 02:08.000 And if you go to GitHub, here's me, by the way. 02:08.000 --> 02:13.000 Okay, so: "But Son, this is serious stuff. 02:13.000 --> 02:17.000 Why do a fun thing like this?"
02:17.000 --> 02:21.000 So let me show you what I mean by fun. 02:21.000 --> 02:26.000 Yeah, so you all know this guy from like 20 years ago. 02:26.000 --> 02:33.000 And now, what if I make it just a little bit smarter? 02:33.000 --> 02:35.000 Yeah. 02:35.000 --> 02:38.000 So hopefully this demo works. 02:38.000 --> 02:40.000 So it's Clippy. 02:40.000 --> 02:43.000 Yeah, nice. 02:43.000 --> 02:47.000 So the fun thing is that this runs directly in the browser. 02:47.000 --> 02:51.000 The model is already downloaded into local storage. 02:51.000 --> 02:57.000 It's not really local storage, but yeah, it's cached in the browser. 02:57.000 --> 03:01.000 And the inference is done using WebAssembly. 03:01.000 --> 03:05.000 Yeah, so that's the fun thing. 03:05.000 --> 03:11.000 And behind this demo is a project that I made in my free time. 03:11.000 --> 03:13.000 It's called wllama. 03:13.000 --> 03:14.000 Yeah. 03:14.000 --> 03:17.000 So let's get started. 03:17.000 --> 03:20.000 It might be interesting. 03:20.000 --> 03:27.000 Yeah, so why did I create it in the first place? Long story: a long time ago, 03:27.000 --> 03:30.000 before I joined Hugging Face, 03:30.000 --> 03:34.000 I was GPU poor, like very poor, not just poor. 03:34.000 --> 03:40.000 And I also wanted to push llama.cpp to the limit. 03:40.000 --> 03:45.000 Actually, to the limit of the hardware that I had at the time. 03:45.000 --> 03:49.000 And I was very inspired by whisper.cpp, 03:49.000 --> 03:53.000 the WebAssembly version that you can run directly in the browser. 03:53.000 --> 03:55.000 And also it's just so fun. 03:55.000 --> 03:56.000 It's just for fun, 03:56.000 --> 04:01.000 to make my point, not to compete with the production-ready frameworks out there, 04:01.000 --> 04:05.000 like [inaudible] or WebLLM. 04:05.000 --> 04:06.000 Okay. 04:06.000 --> 04:08.000 So what is the goal? 04:08.000 --> 04:11.000 The first goal is to create a wrapper.
04:11.000 --> 04:12.000 It's a 04:12.000 --> 04:13.000 TypeScript library 04:13.000 --> 04:16.000 that you, as a web developer, 04:16.000 --> 04:18.000 can use in your project 04:18.000 --> 04:21.000 by running just one command, npm install. 04:21.000 --> 04:23.000 It's strongly typed, 04:23.000 --> 04:24.000 TypeScript, 04:24.000 --> 04:25.000 and has zero dependencies. 04:25.000 --> 04:26.000 It's amazing. 04:26.000 --> 04:31.000 So let me show you what that means. 04:31.000 --> 04:34.000 So this little demo that I showed you, 04:34.000 --> 04:37.000 that you see here, 04:37.000 --> 04:39.000 how did I make it? 04:39.000 --> 04:43.000 Under the hood, I import this. 04:43.000 --> 04:46.000 By the way, this is a React project. 04:46.000 --> 04:48.000 So I import it. 04:48.000 --> 04:49.000 It's a library. 04:49.000 --> 04:51.000 And then I download this model. 04:51.000 --> 04:52.000 I define it here. 04:52.000 --> 04:54.000 I just pull the model from Hugging Face. 04:54.000 --> 04:57.000 Then I create a chat completion. 04:57.000 --> 05:00.000 I have a system message that says, hey, you are 05:00.000 --> 05:03.000 Clippy here. 05:03.000 --> 05:05.000 Okay. 05:05.000 --> 05:06.000 So that's it. 05:06.000 --> 05:09.000 I also have another demo that you can see 05:09.000 --> 05:14.000 right on the GitHub page here. 05:14.000 --> 05:16.000 And this demo is more functional. 05:16.000 --> 05:19.000 You have a list of models that you can try. 05:19.000 --> 05:21.000 And then you can chat with them. 05:21.000 --> 05:22.000 Yeah. 05:22.000 --> 05:23.000 This is the demo. 05:23.000 --> 05:25.000 So yeah. 05:26.000 --> 05:29.000 So now the complicated part. 05:29.000 --> 05:33.000 It's a technical, like, deeply technical part. 05:33.000 --> 05:38.000 It might not be very interesting, but yeah, it's quite unique. 05:38.000 --> 05:39.000 Okay. 05:39.000 --> 05:43.000 So there's a tool called Emscripten.
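The Clippy demo pulls a GGUF model from Hugging Face and then starts a chat whose system prompt makes the model play Clippy. The sketch below is not wllama's API, because the talk does not show the library's actual calls. Under that assumption, it only illustrates the two inputs such a call needs. The first is a direct download URL, built with Hugging Face's public `/resolve/{revision}/` file URL pattern. The second is a typed list of chat messages. `hfResolveUrl`, `ChatMessage`, and the repo and file names are hypothetical.

```typescript
// Hypothetical helpers for illustration; this is NOT wllama's API.

/** One chat turn, in the role/content shape common to chat-completion APIs. */
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

/**
 * Build the direct-download URL for a file in a Hugging Face repo.
 * Hugging Face serves raw files at https://huggingface.co/{repo}/resolve/{revision}/{path}.
 */
function hfResolveUrl(repo: string, file: string, revision = "main"): string {
  // Encode each path segment on its own so "/" between subfolders survives.
  const path = file.split("/").map(encodeURIComponent).join("/");
  return `https://huggingface.co/${repo}/resolve/${encodeURIComponent(revision)}/${path}`;
}

// A system prompt like the one in the demo, followed by a user turn.
const messages: ChatMessage[] = [
  { role: "system", content: "You are Clippy, the helpful paperclip assistant." },
  { role: "user", content: "Hi! Who are you?" },
];

// Hypothetical repo and file names, used only to show the URL shape.
const modelUrl = hfResolveUrl("some-user/some-model-GGUF", "model-q4_k_m.gguf");
console.log(modelUrl, messages.length);
```

Keeping the messages typed means a typo such as `role: "sytem"` fails at compile time rather than silently producing a malformed prompt.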
05:43.000 --> 05:50.000 It's the tool that allows users to compile C 05:50.000 --> 05:55.000 or any C++ project into WebAssembly. 05:55.000 --> 05:57.000 So that should be simple. 05:57.000 --> 06:00.000 I just take Emscripten and compile. 06:00.000 --> 06:01.000 Right? 06:01.000 --> 06:04.000 Turns out it's not that straightforward. 06:04.000 --> 06:09.000 I had many challenges along the way. 06:09.000 --> 06:13.000 But there are four main challenges 06:13.000 --> 06:14.000 that I found 06:14.000 --> 06:16.000 and that I want to share with you today. 06:16.000 --> 06:17.000 Yeah. 06:17.000 --> 06:21.000 So here's how it works. 06:21.000 --> 06:27.000 Firstly, from the perspective of WebAssembly, everything is a number, 06:27.000 --> 06:31.000 not a string or anything else. 06:31.000 --> 06:35.000 So what I ended up doing in the first place 06:35.000 --> 06:40.000 was to add a small wrapper, a thin wrapper, 06:40.000 --> 06:46.000 that accepts JSON from the JS worker. 06:46.000 --> 06:50.000 And then I translate it into native API calls, 06:50.000 --> 06:55.000 C++ API calls to the llama.cpp library. 06:55.000 --> 06:57.000 It works pretty well. 06:57.000 --> 07:02.000 But then I realized that it's a lot of work, because I have to 07:02.000 --> 07:05.000 parse the JSON on the JS side 07:05.000 --> 07:08.000 and also on the C++ side. 07:08.000 --> 07:10.000 And it becomes extremely slow. 07:10.000 --> 07:15.000 For example, when I call the tokenizer's tokenize function, 07:15.000 --> 07:19.000 which returns a bunch of tokens, like thousands of tokens, 07:19.000 --> 07:20.000 it starts to slow 07:20.000 --> 07:21.000 down. 07:21.000 --> 07:24.000 So I am now moving away from that, 07:24.000 --> 07:26.000 in favor of a binary protocol, 07:26.000 --> 07:29.000 which is inspired by an existing protocol. 07:29.000 --> 07:30.000 But I invented my own; 07:30.000 --> 07:32.000 I'm not copying it. 07:32.000 --> 07:34.000 So I am inventing a new standard here. 
07:34.000 --> 07:35.000 Yeah.
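To show why a binary protocol helps in the tokenizer case, here is a minimal sketch that passes a token list across the JS/Wasm boundary as raw little-endian integers instead of JSON. The wire format (`[u32 count][i32 token...]`) and the `encodeTokens`/`decodeTokens` helpers are hypothetical illustrations, not wllama's actual protocol.

```typescript
// Hypothetical wire format, for illustration only (not wllama's protocol):
//   [u32 count][i32 token_0][i32 token_1]...   all little-endian

function encodeTokens(tokens: number[]): Uint8Array {
  const buf = new ArrayBuffer(4 + 4 * tokens.length);
  const view = new DataView(buf);
  view.setUint32(0, tokens.length, true); // header: number of tokens
  tokens.forEach((t, i) => view.setInt32(4 + 4 * i, t, true));
  return new Uint8Array(buf);
}

function decodeTokens(bytes: Uint8Array): number[] {
  // Respect byteOffset so this also works on a view into a larger buffer.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const count = view.getUint32(0, true);
  const out = new Array<number>(count);
  for (let i = 0; i < count; i++) out[i] = view.getInt32(4 + 4 * i, true);
  return out;
}

// Compare payload sizes for a thousand six-digit token IDs.
const tokens = Array.from({ length: 1000 }, (_, i) => 100000 + i);
const asJson = new TextEncoder().encode(JSON.stringify({ tokens }));
const asBinary = encodeTokens(tokens);
console.log(asJson.length, asBinary.length);
```

Besides being smaller, the binary form needs no text parsing on either side. The receiving code reads fixed-width integers at known offsets, which is what makes it cheap for large token lists.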
07:35.000 --> 07:40.000 Next on my list is something called the file system. 07:40.000 --> 07:41.000 Yeah. 07:41.000 --> 07:44.000 We don't think about the file system until it causes problems; 07:44.000 --> 07:46.000 we only notice it when things go wrong. 07:46.000 --> 07:50.000 The problem is that in the early days, 07:50.000 --> 07:51.000 okay, 07:51.000 --> 07:56.000 I just loaded the model using the default file system 07:56.000 --> 07:58.000 of Emscripten. 07:58.000 --> 07:59.000 It works. 07:59.000 --> 08:02.000 But it uses too much memory. 08:02.000 --> 08:03.000 Why? 08:03.000 --> 08:05.000 I started to dive deeper into the code. 08:05.000 --> 08:09.000 And it turns out it's the mmap function. 08:09.000 --> 08:13.000 llama.cpp uses this function for a reason: 08:13.000 --> 08:19.000 we want to minimize copies of the data. 08:19.000 --> 08:20.000 Okay. 08:20.000 --> 08:24.000 So when I looked into the source code of Emscripten, 08:24.000 --> 08:25.000 what is there? 08:25.000 --> 08:28.000 It does exactly the opposite. 08:28.000 --> 08:30.000 It first allocates memory. 08:30.000 --> 08:32.000 That is why. 08:32.000 --> 08:33.000 So what is it doing here? 08:33.000 --> 08:35.000 It's basically a copy. 08:35.000 --> 08:38.000 It copies chunks of the file. 08:38.000 --> 08:40.000 So why is this not nice? 08:40.000 --> 08:45.000 Because we end up using double the memory when we load a model. 08:45.000 --> 08:49.000 If you have a 200 megabyte model, you end up using 400. 08:49.000 --> 08:50.000 Yeah. 08:50.000 --> 08:51.000 So yeah. 08:51.000 --> 08:52.000 This is how it looks. 08:52.000 --> 08:53.000 I drew it. 08:53.000 --> 08:54.000 I've already explained it, 08:54.000 --> 08:57.000 but let me go through it again. 08:57.000 --> 09:00.000 It copies the file into the worker, 09:00.000 --> 09:04.000 where it's stored temporarily in a buffer in JavaScript. 
09:04.000 --> 09:08.000 And each time we call mmap, it copies from that buffer into the 09:08.000 --> 09:10.000 heap memory of the WebAssembly runtime.
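Here is a toy model of the copy-based loading just described. It assumes a greatly simplified view of Emscripten's default file system, whose real code paths are more involved. The whole file sits in a JS-side buffer, and each mmap call allocates new space and copies the requested bytes into it. The `CopyingLoader` class is hypothetical, and sizes are scaled down from megabytes to bytes.

```typescript
// Toy model with hypothetical names: sizes are in bytes, not megabytes.
class CopyingLoader {
  private readonly fileInJs: Uint8Array; // whole file, held in a JS buffer
  private heapCopies = 0; // bytes copied onto the (simulated) Wasm heap

  constructor(fileInJs: Uint8Array) {
    this.fileInJs = fileInJs;
  }

  /** Copy-based mmap: allocate `length` fresh bytes, then copy into them. */
  mmap(offset: number, length: number): Uint8Array {
    const region = new Uint8Array(length); // stands in for a heap allocation
    region.set(this.fileInJs.subarray(offset, offset + length));
    this.heapCopies += length;
    return region;
  }

  /** Total bytes held: the JS-side file plus every heap copy. */
  totalBytes(): number {
    return this.fileInJs.byteLength + this.heapCopies;
  }
}

// "200 MB" scaled to 200 bytes: mapping the whole file doubles the footprint.
const file = new Uint8Array(200).fill(7);
const loader = new CopyingLoader(file);
const mapped = loader.mmap(0, 200);
mapped[0] = 1; // writes land in the copy, not in the original file buffer
console.log(loader.totalBytes(), file[0]);
```

The last two lines show that the mapped region is an independent copy. Both buffers stay alive, which is where the 200 to 400 doubling comes from.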
09:10.000 --> 09:11.000 Okay. 09:11.000 --> 09:13.000 So what is the solution? 09:13.000 --> 09:16.000 I invented my own thing, called HeapFS, 09:16.000 --> 09:21.000 which uses the Streams API of the browser. 09:21.000 --> 09:27.000 Instead of temporarily writing the file into a buffer 09:27.000 --> 09:29.000 inside the worker, inside JavaScript, 09:29.000 --> 09:37.000 I stream each chunk directly into the heap memory of the WebAssembly runtime. 09:37.000 --> 09:39.000 So what now? 09:39.000 --> 09:41.000 What about the mmap function? 09:41.000 --> 09:49.000 I ended up patching the function to return a pointer directly to the location 09:49.000 --> 09:51.000 of the file in heap memory. 09:51.000 --> 09:55.000 So this is how the function looks. 09:55.000 --> 09:58.000 You don't need to care about the first part. 09:58.000 --> 10:02.000 You just need to care about the last two lines, 10:02.000 --> 10:05.000 where I return the pointer: 10:05.000 --> 10:09.000 the heap pointer to the file plus the position in the file 10:09.000 --> 10:11.000 that I want to map. 10:11.000 --> 10:13.000 So it just returns a pointer, 10:13.000 --> 10:15.000 not a copy. 10:15.000 --> 10:16.000 Okay. 10:16.000 --> 10:19.000 The next thing is file storage. 10:19.000 --> 10:20.000 Okay. 10:20.000 --> 10:24.000 We never want, each time you run it, 10:24.000 --> 10:25.000 each time you open the app, 10:25.000 --> 10:30.000 to re-download a one-gigabyte model or something. 10:30.000 --> 10:35.000 So in the early days, I used something called Cache Storage, 10:35.000 --> 10:38.000 which is a very nice thing. 10:38.000 --> 10:40.000 It's easy to use. 10:40.000 --> 10:43.000 But the storage is very limited. 10:43.000 --> 10:45.000 And also it does not support streaming. 10:45.000 --> 10:47.000 Each time I read something, 10:47.000 --> 10:48.000 it makes a copy: 10:48.000 --> 10:49.000 first it loads the file 10:49.000 --> 10:52.000 into browser memory.
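The HeapFS idea can be sketched in two parts. First, chunks from a stream reader are written straight into a byte array that stands in for the Wasm heap. In a real Emscripten build this would presumably be `Module.HEAPU8` at an address obtained from `_malloc`, which is an assumption about how the build exports them. Second, mmap becomes pointer arithmetic. `streamIntoHeap`, `mmapPointer`, `readerFromChunks`, and `ChunkReader` are hypothetical names, not wllama's code.

```typescript
/** Minimal reader shape; a ReadableStream<Uint8Array> reader satisfies it. */
interface ChunkReader {
  read(): Promise<{ done: boolean; value?: Uint8Array }>;
}

/**
 * Write each streamed chunk straight into `heap` starting at `filePtr`,
 * so no JS-side buffer ever holds the whole file. Returns bytes written.
 */
async function streamIntoHeap(reader: ChunkReader, heap: Uint8Array, filePtr: number): Promise<number> {
  let written = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done || value === undefined) break;
    if (filePtr + written + value.byteLength > heap.byteLength) {
      throw new RangeError("file does not fit in the reserved heap region");
    }
    heap.set(value, filePtr + written); // copy this chunk into place
    written += value.byteLength;
  }
  return written;
}

/** Pointer-returning mmap: no allocation and no copy, just base + offset. */
function mmapPointer(filePtr: number, offset: number): number {
  return filePtr + offset;
}

/** Test helper: a reader that yields the given chunks in order. */
function readerFromChunks(chunks: Uint8Array[]): ChunkReader {
  let i = 0;
  return {
    read: async () => (i < chunks.length ? { done: false, value: chunks[i++] } : { done: true }),
  };
}
```

In a browser, the reader would typically come from `file.stream().getReader()`. Each chunk is touched once on its way into the heap, and every later mmap call is a constant-time addition instead of an allocation plus copy.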
10:52.000 --> 10:53.000 Okay. 10:53.000 --> 10:56.000 So I turned my attention to something called IndexedDB. 10:56.000 --> 10:58.000 Yes, it's better. 10:58.000 --> 11:02.000 But the problem is that, at least on Firefox, 11:02.000 --> 11:05.000 when the data is stored to disk, 11:05.000 --> 11:08.000 it's stored inside an SQLite database. 11:08.000 --> 11:11.000 It also does not support streaming. 11:11.000 --> 11:15.000 Maybe it does, but it requires hacking. 11:15.000 --> 11:18.000 The nice thing is that it does not have a cap, 11:18.000 --> 11:21.000 a maximum capacity. 11:21.000 --> 11:22.000 Yeah. 11:22.000 --> 11:24.000 But still, it doesn't support streaming. 11:24.000 --> 11:27.000 So it's not useful to me. 11:27.000 --> 11:32.000 So in the end, I turned my attention to something called OPFS, 11:32.000 --> 11:34.000 the origin private file system. 11:34.000 --> 11:37.000 What is nice about it, 11:37.000 --> 11:41.000 and a fun thing, is that when you store a file in OPFS, 11:41.000 --> 11:45.000 it actually stores it as a real file on the file system, 11:45.000 --> 11:46.000 on your disk. 11:46.000 --> 11:50.000 I reverse engineered it to find out; I can show it to you. 11:50.000 --> 11:51.000 Yeah. 11:51.000 --> 11:53.000 It also supports streaming. 11:53.000 --> 11:55.000 Yes, I will show you later. 11:55.000 --> 11:56.000 It's very cool. 11:56.000 --> 11:57.000 Okay. 11:57.000 --> 12:01.000 And it also does not have a maximum capacity. 12:01.000 --> 12:04.000 Most browsers just set the maximum capacity 12:04.000 --> 12:08.000 to most of the free space on your disk. 12:08.000 --> 12:09.000 Okay. 12:09.000 --> 12:14.000 So how it works now is that when I open the file, 12:14.000 --> 12:17.000 the browser gives me a File object. 12:17.000 --> 12:21.000 Think of it a bit like a file descriptor, not a copy of the file. 12:21.000 --> 12:23.000 It's like a pointer to the file. 12:23.000 --> 12:24.000 Yeah.
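Here is a sketch of the OPFS flow: stream a downloaded model into a file, then reopen it later as a `File` without reading it into memory. The calls `getFileHandle(name, { create: true })`, `createWritable()`, `write()`, `close()`, and `getFile()` are standard methods on OPFS handles. The small interfaces below mirror only the parts used, so the sketch also runs against an in-memory mock. The helper names `saveModel` and `openCachedModel` are hypothetical.

```typescript
// Structural stand-ins for the parts of the OPFS handle types used here;
// the real FileSystemDirectoryHandle / FileSystemFileHandle fit them.
interface WritableFileLike {
  write(data: Uint8Array): Promise<void>;
  close(): Promise<void>;
}
interface FileHandleLike {
  getFile(): Promise<Blob>;
  createWritable(): Promise<WritableFileLike>;
}
interface DirectoryLike {
  getFileHandle(name: string, options?: { create?: boolean }): Promise<FileHandleLike>;
}
interface ChunkReader {
  read(): Promise<{ done: boolean; value?: Uint8Array }>;
}

/** Stream a download into a file chunk by chunk; nothing is buffered whole. */
async function saveModel(dir: DirectoryLike, name: string, reader: ChunkReader): Promise<void> {
  const handle = await dir.getFileHandle(name, { create: true });
  const out = await handle.createWritable();
  for (;;) {
    const { done, value } = await reader.read();
    if (done || value === undefined) break;
    await out.write(value);
  }
  await out.close(); // the written data becomes visible once the stream is closed
}

/** Reopen a cached model as a File (a Blob), or null if it isn't there. */
async function openCachedModel(dir: DirectoryLike, name: string): Promise<Blob | null> {
  try {
    // Without { create: true }, a missing file rejects with "NotFoundError".
    const handle = await dir.getFileHandle(name);
    return await handle.getFile();
  } catch (err) {
    if ((err as { name?: string }).name === "NotFoundError") return null;
    throw err;
  }
}
```

In a browser, `dir` would be `await navigator.storage.getDirectory()` and the reader would be `(await fetch(url)).body!.getReader()`. The returned `File` can then be streamed onward without ever materializing the whole model in JS memory.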
12:24.000 --> 12:28.000 And then I can stream it back to the browser. 12:28.000 --> 12:32.000 Sorry, to the worker, into the WebAssembly heap. 12:32.000 --> 12:34.000 So there's no copy. 12:34.000 --> 12:35.000 Yeah. 12:35.000 --> 12:36.000 Yeah. 12:36.000 --> 12:40.000 So here is how that looks, in a little more detail. 12:40.000 --> 12:42.000 Just look at the function signature. 12:43.000 --> 12:48.000 It returns a promise of the File, or null in case, 12:48.000 --> 12:50.000 for example, the file does not exist. 12:50.000 --> 12:54.000 But the most important thing is that it returns a File. 12:54.000 --> 12:55.000 Okay. 12:55.000 --> 12:58.000 So what's very cool about File 12:58.000 --> 13:00.000 is that now 13:00.000 --> 13:04.000 I can not only use 13:04.000 --> 13:07.000 the files provided by OPFS, 13:07.000 --> 13:10.000 but I can also load my own file. 13:14.000 --> 13:18.000 Every browser has this button, the file input, for when you want to upload a file to 13:18.000 --> 13:19.000 the Internet. 13:19.000 --> 13:23.000 There, the browser also returns a File object with a stream. 13:23.000 --> 13:26.000 Yeah. 13:26.000 --> 13:27.000 Let's go. 13:27.000 --> 13:30.000 But then it's too slow, 13:30.000 --> 13:32.000 even with 13:32.000 --> 13:34.000 these two flags. 13:34.000 --> 13:36.000 The first flag enables support for something called 13:36.000 --> 13:43.000 SIMD, single instruction, multiple data, and the second flag activates the 13:43.000 --> 13:52.000 level-three optimization of the compiler, which should allow 13:52.000 --> 13:56.000 something called auto-vectorization. So what do I mean by 13:56.000 --> 14:01.000 vectorized? The compiler picks up a loop and tries to use 14:01.000 --> 14:08.000 SIMD on it. But even with the second flag, the compiler failed to do that.
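The talk does not name the SIMD flag. For Emscripten it is presumably `-msimd128`, which is an assumption here. A module built with it fails to load on engines without WebAssembly SIMD. A common guard is to validate a tiny probe module containing one SIMD instruction before choosing which build to load. This is a sketch, and the `.wasm` file names are hypothetical.

```typescript
// Smallest module that uses a SIMD instruction: one function returning a
// v128 built with i8x16.splat + i8x16.popcnt. Engines without Wasm SIMD
// reject it, so WebAssembly.validate() works as a feature test.
const SIMD_PROBE = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic "\0asm"
  0x01, 0x00, 0x00, 0x00, // version 1
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7b, // type section: () -> v128
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x0a, 0x0a, 0x01, 0x08, 0x00, // code section: one 8-byte body, no locals
  0x41, 0x00, // i32.const 0
  0xfd, 0x0f, // i8x16.splat
  0xfd, 0x62, // i8x16.popcnt
  0x0b, // end
]);

/** True if this engine accepts WebAssembly SIMD (128-bit vector) instructions. */
function supportsWasmSimd(): boolean {
  return WebAssembly.validate(SIMD_PROBE);
}

// Choose a build; both file names are hypothetical.
const wasmFile = supportsWasmSimd() ? "llama-simd.wasm" : "llama-scalar.wasm";
console.log(wasmFile);
```

Validation only parses and type-checks the bytes. It never compiles or runs anything heavy, so the check is cheap enough to do on every page load.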
14:08.000 --> 14:12.000 The solution is obvious: rewrite these functions with 14:12.000 --> 14:16.000 SIMD. Some of you might already know where I'm going. 14:16.000 --> 14:21.000 Yeah, exactly. But then I only have one 14:21.000 --> 14:24.000 weekend, and my weekend is not just for the computer. 14:24.000 --> 14:27.000 I also have a wife. 14:27.000 --> 14:40.000 Okay, so. Yeah. I just asked it. And that works, just 14:40.000 --> 14:48.000 kind of, maybe. But I got at least a 2x speedup, double the speed on 14:48.000 --> 14:53.000 most of the quantization types. And almost triple the speed on 14:53.000 --> 14:56.000 one particular quantization type, 14:56.000 --> 15:00.000 a [inaudible]-bit quantization. Okay. 15:00.000 --> 15:03.000 Some things I plan to do in the future. Firstly, 15:03.000 --> 15:08.000 WebGPU. It's still in very early development. 15:08.000 --> 15:12.000 I still have a lot of trouble with it. But I have a very simple 15:12.000 --> 15:16.000 proof-of-concept pull request open for now. I have to rewrite the kernels 15:16.000 --> 15:19.000 in something called WGSL, the WebGPU Shading Language. 15:19.000 --> 15:24.000 I have to fix some issues on mobile, most importantly 15:24.000 --> 15:29.000 on iOS devices. Then I also want it to be 15:29.000 --> 15:32.000 compatible with something called WASI, which is the 15:32.000 --> 15:36.000 WebAssembly System Interface. It should allow it to, 15:36.000 --> 15:39.000 for example, read the file directly from the file 15:39.000 --> 15:42.000 system. Here I'm integrating it so it can run 15:42.000 --> 15:46.000 without a browser. Yeah. I also want to 15:46.000 --> 15:49.000 support LoRA adapters, which should be 15:49.000 --> 15:51.000 something very simple to do. I just don't have 15:51.000 --> 15:54.000 the time. Yeah. It's a weekend project, by the 15:54.000 --> 15:57.000 way. It's still on the list. Okay. And the last thing I 15:57.000 --> 16:01.000 want is... well, it's not really the last thing.
It's just 16:01.000 --> 16:04.000 the last thing on this list. The support for 16:04.000 --> 16:08.000 multimodal is still waiting for my 16:08.000 --> 16:13.000 refactoring in llama.cpp. But yeah, I want it to be done 16:13.000 --> 16:17.000 very seriously, so I'm taking my 16:17.000 --> 16:20.000 time on this. So, that's it. Thank you for your 16:20.000 --> 16:28.000 attention. Thank you. Thank you very much. We have 16:28.000 --> 16:32.000 time for one question, if somebody has a 16:32.000 --> 16:36.000 question and enough energy to ask it. Okay. Thank 16:36.000 --> 16:39.000 you very much.