All right everyone, Jen Looker here is going to be talking about porting GGML to NUX, a kernel development framework. Are we ready? Nope. Just staring around. There we go. All right everyone, give Jen Looker a round of applause.

Okay everyone, my name is Jen Looker. The title will make sense as I go along.

About me: well, the older I get, the harder it gets to say what my specialization is. I've been working on various skills, but whatever I do, I tend to naturally drift towards the border between software and hardware, whatever it might be. I'm currently working for a RISC-V startup, but this has nothing to do with my employers: my personal way to express my collective madness during the lockdown was building analog synthesizers. Well, that is my personal project too, so it has nothing to do with my past or current employers.

And so, about this talk. This talk is mostly actually about how to run GGML in a constrained environment. And given the title, I guess your question is: what is NUX? That's the first thing I will try to answer. And then, once we've seen what NUX is and why it is such a constrained environment to run on, we'll look at how to run GGML on it.

The NUX kernel is the privileged-space component, so, you know, it handles interrupts, exceptions, and requests from user space, and then there's a proper user space, where you essentially have what people call programs.
And yes, the model that NUX has at the start, right after boot, is that you just have one user space running, but of course the kernel can support creating new programs and new address spaces as it goes.

Right. So what does the bootloader look like? The bootloader, again, I wanted to make it portable, so the way I did it, I made it an ELF loader; ELF is the binary format, and there are two ways to see an ELF. One is the link view of an ELF, which is based on sections, but then there's the loader view of an ELF, which is based on program headers. So what I do is that, when you load the kernel, there are special program headers that tell the loader where the kernel wants things like the framebuffer, the information page, the memory map and other data, so it's completely separate from the kernel itself, but in this way also very portable. What I have is the bootloader library, and then various machine-dependent, platform-dependent functions, and you can just link the core library plus one of these, you know, gray boxes, which handle for example EFI or SBI booting. Right now it is targeting mostly x86 and RISC-V.

So, the second part: after the bootloader, we have a kernel. What is a kernel? Well, in the most abstract way, a kernel is an executable that is loaded by the bootloader and essentially reacts to various events that happen in the system.
So, in order to achieve that, and to make it as simple as possible to build a new kernel, because that was my goal, I have, of course, two libraries that are dependent on the platform. One is the HAL, which is the hardware abstraction layer; it's mostly meant to bootstrap the system, so boot essentially jumps into the beginning of libhal, and it essentially abstracts the CPU away. And then there's the platform: the platform is the part that mostly controls the interrupt controller and provides a timer, because this is usually a fundamental part. Again, it supports x86, 32-bit and 64-bit, and RISC-V in both of them; the SBI version for RISC-V is coming, but it's not ready yet.

And then, of course, I have the portable library, libnux, which is the interface that the custom kernel code you write usually interfaces with. And, you know, it provides all the stuff you need when you're programming a kernel: like, you know, memory allocators, mappings, the stack frames you use to switch threads, and of course page tables to switch address spaces, the global kernel memory mappings, and, you know, panics, which is what you mostly hit when you run your own kernel.

Okay, so what is it like to write a kernel, for people interested in NUX? Well, the way I see it, as I said before, a kernel is essentially code, C code in this case, but it could be ported to other things, that reacts to various events.
One is the timer, the others are the interrupt controller, exceptions, and the user program calling into the kernel, which is a syscall. And essentially, in order to write a kernel under NUX, you just need to implement these functions, and there you go: you can just boot it and print things.

Of course, a kernel without a user space you could call a unikernel of sorts, and this demo is partly about that. But I also provide a standard way to program, at least, the beginning of a user program, and so, essentially, there's a libnux-user that just abstracts out the syscall mechanism that the kernel uses. And yes, as I said earlier, there's only one user space at the beginning, and, of course, if you want to run multiple programs, the kernel needs to have a fork and exec, or whatever model you need.

Right, this next bit is important, despite looking like a minor detail, because it matters for the rest of the talk. There's also a libc: a C library that I put together, mostly taking bits of NetBSD and writing the rest in a simple way. It is a very small libc, because in order to create a binary you usually need the C runtime, which is crt0 mostly, plus the crtend part for the various constructors, and then the libc itself.
So, you know, you have a main function, and the main function is actually called by crt0, by the C runtime in general, and then you usually have the various functions, like printf and the rest, that are part of the libc. And so this is the part that you need to create a binary, and if you put together everything that I've explained, from the user space, to the kernel, to the bootloader itself, you have the whole picture. So this is, this is really what makes a NUX-based system boot.

Right. So, now that we have this idea of what NUX is, how do I actually run GGML on it? So, let's start with a look at what GGML is, at least in my view, in the most simple way. Right. The way I think of GGML personally, and I'm definitely not an expert, is that, in order to build the most minimal GGML, you can look at these components; I don't know what to officially call them, but this is how I abstract them. For one, there's definitely an OS abstraction part, which is something like, you know, ggml_time, all of the functions that actually abstract the actual low-level calls. Then there are utility functions, which are, you know, there's a lot of functions to do file I/O, also for GGML, for loading models from files and things like this.
And then there are the functions with which you actually build the model, so you build the graph; then there's a thread-pool implementation, which is the one that actually starts running the compute tasks; and then there are GGML's compute functions themselves. And, I'm only talking about, you know, CPU execution, so I'm not talking about backends and stuff like this, but in general, this is the model that I have of GGML and that I'm going to use, right?

So let's have a look at the software side of it: what are the dependencies? Well, GGML is written in C, at least this part of it, with a very minimal set of C++, luckily. And so this is what you get when you try to compile GGML. There is some standard C, so there's definitely a libc, that's of course not shockingly; there are mathematical operations, and usually the libc doesn't implement these floating-point operations, the standard functions live in a separate library called libm, which you need. For the C++ part, you usually need the standard C++ library and the C++ runtime, and actually, as I will say later, it's quite difficult to separate the two of them. And then GGML's time functions use POSIX, so, you know, the more Unix-specific way of doing things, calls like CLOCK_MONOTONIC, which go back to the POSIX standard.
The good news I had, what made this port possible, is the fact that the subset of the C++ standard library used is very small; it's mostly basic containers, you know, it doesn't go into the crazy stuff that C++ can give to programmers.

Right, so what does it take to port GGML to NUX? Let's have a look again at how the software of a NUX system is architected, at what you get when you're actually running a system under NUX, after you've booted. Well, we support SMP, so we have multiple CPUs, and as in most modern architectures, the focus is mostly on two privilege levels: one is the kernel, and the other is the user. And, as I said, when NUX boots, we will have a kernel image mapped on each CPU, each being the same. And, of course, you can run different code on each CPU, you can specialize it, but at the beginning you get this default, where you can be sure that each CPU has the same address map everywhere.

Right, so I found myself thinking: I have GGML, how would I like to run it? And this layout is incredibly flexible, because you can do everything, and so I wanted to create a compute platform, because I just wanted to say, okay, let me see what I can get if I run it today.
So, I decided that I actually liked the idea of dedicating entire CPUs that run uninterrupted inside the system. So, instead of having threads that get created and scheduled, I can just simply dedicate entire CPUs as compute threads. And then, of course, it would not be very useful to have a machine that boots and starts doing its calculations without the ability to communicate or to save things. So I can use the bootstrap CPU, which, most of the time, is simply the CPU that starts the system, to run a system interface, the most minimal one you can think of, or whatever you want to run. Unikernels would be a good idea here, because essentially they just run one thing. And so, this is the plan, this is the model that I wanted for this approach, to get GGML to make some sense on NUX.

Right, so here's the thing. Looking at a compute CPU: as I said, we have kernel and user space, and there are two ways to run things. One, I can just directly run GGML, and whatever executes the compute vector, in kernel mode, so in the highest privilege mode, which in the NUX model actually means that it will not be interrupted, ever, not even by interrupts. Or, I can do something which is, you know, more sane, and more usual, which is a kernel that is just compute support, and then the GGML compute is running in user space, so that it can request stuff via syscalls, but it could be interrupted.
So, usually, the reason to run things in user space is that if there's corruption in user-space mode, usually the kernel is not affected, so the stability of the system is not affected. It's also easier to port libraries, because you can just compile way more stuff as-is, and have the compute-support kernel implement the stuff that's needed. The downsides are the possibility of being interrupted, and also the fact that, you know, for some privileged operations you need to do syscalls, back and forth. But this was a demo, so I decided to go for the first option, running everything in the kernel, and see what happens.

So, this is what the software stack looks like in the end. We've got GGML first; of course, we need to have libm, because, as we said, it needs the math functions.
Then libnux-compute is the first library that I wrote there, on top of libnux, which is essentially the part that allows me to allocate a thread just by scheduling a CPU. So the CPUs are waiting, and when I need a thread, essentially the work to compute is assigned to a CPU: okay, now start. And of course it stops, and it implements the whole condition-waiting for the various CPUs, so essentially it kind of emulates a thread pool. And then I had to write something that is awful to look at, but it works, which is libggml-nux, which is where, well, the dirty work is done, and so everything that was actually needed and was missing is in there. One part is the C++ runtime, which is some set of completely-missing functions that you need to implement for the compiled C++ code to run, okay, yes, so you need that. And then you need to extend the libc, because the libc is minimal, so, you know, sometimes you need things like qsort and stuff like this in GGML, and so I said, okay, instead of adding to the libc, I'm just going to add another library that just includes the missing headers, and so I did that, and implemented only the functions needed. And then, yes, of course, I had to map the threading calls onto libnux-compute, and I had to implement GGML's time functions directly using the libnux timer.

So, putting it all together, this is what I did, this is what our system looks like right now. So right now, in order to test this, as you can see, I have all of GGML running on the CPUs, in the compute part, and I needed a quick demo, so I took the GPT-2 example, and of course, the model being loaded in memory already, I could just run it directly, booting from hardware. Yeah, you can find the code there. Originally, I was trying to port pthreads directly to NUX, then I said, okay, you know what, I can actually do it directly by implementing the thread pool, it's going to be simpler, and it was. And yeah, so it's a prototype; I just wanted to say, look, we can do it, which was my goal, and also to learn what it takes to run GGML in a very minimal environment, what you need to look at in the various dependencies in order to understand it. So yeah, the code is there, for anyone who wants to read it.

And the final considerations: well, honestly, porting it was much, much easier than expected. When you see that there is C++ code, usually you do not expect that, but actually it is very sane, and it's still very, very valuable to port it to various different environments. The only things that I would suggest are simple modifications, but unfortunately they are intrusive to the core architecture. For example, ggml.c contains everything from file I/O to the various OS-specific section definitions, and probably separating it into different files would be very nice; as it is, it was very difficult, and that's why I had to copy pieces to implement the thread pool. It would be very nice if the thread pool were just in a separate file, so you could just change it. And again, the pthread section is defined in ggml.c, and it
would need some #ifdefs to define all those things, like the pthread interface, differently. And that's pretty much it. This is where you can find more information about it. Thank you!