So we now have Andrea Righi, who's going to give a talk on Rustifying the Linux kernel scheduler. So please give it up.

Hello — you can hear me, I guess. So yeah, we're going to talk about the kernel scheduler, but in Rust. How many kernel developers are here, or people that have played with the kernel? A few? A few? Okay, that's good. Okay, you don't need to understand the kernel. That's a good thing.

Yeah, so here's the agenda. I want to get to the cool stuff, but before the cool stuff I need to tell you something about scheduling in general, and then we'll see how we can use this technology called sched_ext to implement a kernel scheduler in Rust.

But first of all, what is a scheduler? A scheduler is the kernel component that determines where each task needs to run, when, and for how long. It seems fairly easy, conceptually speaking, but in practice it can be really hard. Scheduling is a very non-trivial problem, particularly because there are different architectures and different workloads, and it's really difficult to design a scheduler that works for everyone and in every possible scenario. One of the challenges is fairness, if you want to design a scheduler that is as generic as possible, because you need to give each task a chance to run within a bounded time if you want something that generic.

Now, what's the situation in Linux? The policy in Linux is to maintain just a single scheduler: one scheduler to rule them all. We used to have CFS before 6.6; now we have another scheduler called EEVDF. CFS stands for Completely Fair Scheduler, and it's based on the fairness idea from the previous slide: it gives a weight-based bandwidth allocation to each task, that's it. Now we've moved to EEVDF, which is a deadline-based scheduler; deadlines allow it to perform better on latency-sensitive workloads, so it's better for latency. But I'm not going into the details of these schedulers; if you're interested, there's another talk tomorrow in the kernel devroom where I talk more about them, and play video games.
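To make those two policies slightly more concrete, here is a minimal sketch of the standard accounting they are built on. This is general CFS/EEVDF background, not code from the talk, and the constant and names are assumptions:

```rust
// CFS-style fairness (background sketch): each task has a virtual runtime
// that advances more slowly the heavier (higher-priority) the task is, and
// the scheduler always picks the task with the smallest vruntime.
const NICE_0_WEIGHT: u64 = 1024; // weight of a nice-0 task

fn charge_runtime(vruntime: &mut u64, delta_exec_ns: u64, weight: u64) {
    // Heavier tasks accumulate vruntime more slowly, so they get more CPU.
    *vruntime += delta_exec_ns * NICE_0_WEIGHT / weight;
}

// EEVDF keeps the weighting, but additionally gives each task a virtual
// deadline and runs the eligible task whose deadline comes first, which is
// what helps latency-sensitive workloads.
fn virtual_deadline(eligible_time_ns: u64, slice_ns: u64, weight: u64) -> u64 {
    eligible_time_ns + slice_ns * NICE_0_WEIGHT / weight
}
```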
Anyway, the fact is that there's just one scheduler, and therefore it's really difficult to conduct experiments, because if you want to do some tests or experiments with the Linux scheduler, you need to patch the kernel, recompile, and reboot the system. That's not always easy to do, especially in production. Even just rebooting in production — if you think of large cloud environments — means you reboot, you have to re-warm the caches, you need to have downtime. So it's really difficult to conduct experiments, and it's really difficult to upstream changes, because you may find something that works really well for you, but if it doesn't work for everyone, your change is not going to get merged, because you may introduce regressions.

So the single Linux kernel scheduler is all compromises: it's trying to do its best, and it's the best generic scheduler we could possibly have, but sometimes you may want to relax some constraints and say: I'm willing to accept an unfair scheduler just to solve my problem. And until now it was impossible to do that unless you maintained your own scheduler. There are big companies that can afford that, but it's painful even for big companies, because maintaining a patch to something as impactful as the scheduler is not a joke; it's quite time consuming.

So here come sched_ext and BPF. sched_ext, for those that don't know, is a new technology in the Linux kernel that allows you to implement a scheduling policy as a BPF program — and it needs to be GPLv2, by the way. It's not just a formal requirement: the BPF verifier will not accept your scheduler if it's not licensed as GPLv2. As for BPF — I think you're all familiar with BPF, I don't want to go... no? Okay. BPF is like a JIT in the kernel. You can see BPF as kind of a virtual machine, and you can write programs in C, for example, attach them to hooks, and test changes to the kernel without crashing the kernel, so it's safe. Of course, you can't write directly into kernel memory, but you do get read access to kernel memory.

And sched_ext leverages BPF to hand the connections to the scheduler over as callbacks. The scheduler in the Linux kernel is implemented as a class — as a struct of function pointers — and those function pointers are, via sched_ext, redirected to the BPF program. So you can write a BPF program that implements these callbacks, which are invoked by the sched_ext machinery, and that's how you implement your scheduling policy in BPF.
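As an illustration of that ops table, here is a rough Rust rendering of the shape of the interface. The real thing is a C struct (struct sched_ext_ops) whose function pointers are backed by your BPF program; the trait below is only for intuition:

```rust
// Illustrative only: the kernel interface is a C struct of function pointers
// that sched_ext redirects to your BPF program. A trait conveys the idea.
type Pid = i32;
type Cpu = i32;

trait SchedOps {
    /// Invoked when a task becomes runnable and must be queued somewhere.
    fn enqueue(&mut self, task: Pid);
    /// Invoked when a CPU is ready to accept tasks and needs the next one.
    fn dispatch(&mut self, cpu: Cpu);
    /// Invoked to pick a CPU for a task that is waking up.
    fn select_cpu(&mut self, task: Pid) -> Cpu;
}
```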
Of course, the benefits of this technology are that you can design custom schedulers and load them at runtime: you don't need to recompile the kernel or reboot, you just start the program. And this leads to rapid experimentation. Usually when you develop a program in user space, you edit, compile, run, then edit, compile, run again, and the turnaround of this edit-compile-run cycle is really fast in user space. It's really slow in kernel space, because you need to reboot and whatnot. But with this technology, you get the same feeling as developing a user-space application, while you are actually changing kernel code — in particular, the scheduler.

So how does it work? There's a component that runs in the kernel, a kernel subsystem called sched_ext. And in BPF, you just need to implement a few callbacks. For example, there's a callback called enqueue that is invoked every time a task wants to run. There's a callback called dispatch that is invoked every time a CPU is ready to accept tasks, and so on and so on. Implementing these callbacks allows you to write a BPF program that implements a scheduler. But like I mentioned, there are restrictions, because of the way BPF provides this concept of safety.

There's this project, Rust for Linux, that tries to bring Rust into the kernel. Here I'm trying to do the opposite: I'm trying to bring the kernel into Rust. And since Rust for Linux is controversial, this is even more controversial.

So here's the idea — I had to give you a little introduction about sched_ext and BPF just to explain this slide. My idea was: what if we use sched_ext to implement a BPF scheduler that does nothing, except bounce all the scheduling events to a user-space program? Because a cool feature of BPF is that the user-space program that loads the BPF bytecode via the bpf() syscall shares an address space with BPF, and BPF provides data structures called maps that can be used as a message-passing interface between BPF and the user-space program.

So the idea is: let's write a minimal layer in BPF that just bounces the scheduling events to user space, and then user space can make all the scheduling decisions. You actually implement the scheduler in user space. You pass the results of the scheduling back to BPF, and BPF performs the actions decided by the user-space program, passing everything back to the kernel. In this way, we have offloaded complexity from BPF into a user-space program. We can still load and unload the scheduler at runtime in the usual sched_ext way, but we have a user-space program now. So there are micro-kernel vibes here, because we're actually moving part of the kernel into user space.
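A minimal sketch of the user-space side of that bounce, with plain channels standing in for the BPF maps (all names here are made up for illustration; the real scx_rustland plumbing is different):

```rust
use std::sync::mpsc::{Receiver, Sender};

// Stand-ins for the messages exchanged over BPF maps.
struct QueuedTask { pid: i32 }                          // kernel -> user space
struct Dispatched { pid: i32, cpu: i32, slice_ns: u64 } // user space -> kernel

// The whole scheduling policy lives in ordinary user-space Rust: drain the
// queued events, decide, and send the decisions back to the BPF layer.
fn run(queued: Receiver<QueuedTask>, dispatched: Sender<Dispatched>) {
    for task in queued {
        let decision = Dispatched { pid: task.pid, cpu: 0, slice_ns: 5_000_000 };
        dispatched.send(decision).expect("BPF side went away");
    }
}
```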
The good consequence of this is that now we have unlocked access to any kind of library and language, because if the kernel scheduler is a user-space program, I can write this user-space program in any language. I can write it in Python, in Java, in Rust, for instance. And I can use all the user-space libraries; in Rust we have crates, so I can use any kind of crate, and whatnot.

So, we have seen the previous slide where there was only the left part; now we also have the right part, which is basically the user-space scheduler. The BPF scheduler is only implementing this kind of message-passing interface, and it uses libbpf, which is a C library — but there's also libbpf-rs, which is the Rust binding to libbpf. So the user-space scheduler becomes the program down here — where am I? — down here. And this is a regular Rust program: I can literally cargo build this thing and run it, and it will replace the in-kernel scheduler with my program.
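For a sense of what that frontend looks like, loading a BPF object from Rust with libbpf-rs goes roughly like this. The skeleton name is hypothetical (it would be generated by libbpf-cargo from the C source), and exact method signatures vary across libbpf-rs versions, so treat this as an approximate sketch:

```rust
// Rough sketch: load and attach a BPF scheduler skeleton via libbpf-rs.
// `SchedSkelBuilder` stands in for generated skeleton code.
use libbpf_rs::skel::{OpenSkel, Skel, SkelBuilder};

fn start_scheduler() -> anyhow::Result<()> {
    let mut skel = SchedSkelBuilder::default()
        .open()?   // parse the embedded BPF object
        .load()?;  // run the verifier and load it into the kernel
    skel.attach()?; // from here on, scheduling events reach our program
    Ok(())
}
```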
So, as a proof of concept, I implemented a scheduler that is called scx_rustland. Initially there was a scheduler called scx_userland that was written in C, so I decided to do the same in Rust, and it's called scx_rustland. The scheduler itself is a deadline-based scheduler, so it's better for latency. It's not fair — compared to the in-kernel scheduler it's very unfair: it prioritizes latency-sensitive tasks a lot. But the implementation of the scheduling policy is a secondary goal here. I just wanted to prove that it was possible to implement a kernel scheduler in user space using this technology, so it's supposed to be a proof of concept.

And I tested it with a video game, because video games are cool. Let's see how fast a video game can go if I replace the Linux scheduler with this complete mess that is moving everything to user space. I was expecting it to do, I don't know, 5 FPS or something like that. But then I actually posted this video showing this unusual workload where I'm playing Terraria — it's a video game — while building the kernel, which is not something you usually do, because if you're a gamer you want to do gaming, unless you're also a kernel developer and you build kernels in the background. The part on the left, EEVDF, is the in-kernel scheduler, the default Linux scheduler. The video, which is also available on the sched_ext website, shows that the game is very choppy at around 26 FPS — and it's not just choppy, it's also very inconsistent: it goes from 10 FPS to 40 FPS, sometimes 30, 20. So it's really a bad gaming experience. And with the user-space scheduler, the Rust scheduler, it's doing 60 FPS, which is insane.

But the thing is: it's just the scheduling algorithm that is different, and it's heavily prioritizing latency-sensitive work. So it's just the scheduling. And this was to prove not that this scheduler is better — it's better for this particular use case, and worse for everything else — but that it's possible to implement user-space schedulers that can outperform the in-kernel scheduler, which is an interesting thing. And with this one you don't have to recompile the kernel; you just load it, and it works.

I was actually planning to show you a demo, if I can. Let's see if I can. So we have this fish tank — this is the default scheduler. Let's see what happens if I start... do you see it? Let me try building the kernel. That's the same example, but shown live, which is cooler. You see, the FPS goes down to 9 FPS; it's really bad. Now, I should have it here... sorry, I prepared everything in advance, but I had to reboot. So, okay — scx_rustland is the scheduler. Okay, now let's go back. Okay, I start the build; I'm doing 10 FPS. Now I just start this program down here, and... okay, now the Rust scheduler is running, and you see, it's doing 40 FPS. You know, I don't care if the scheduler is not fair. If I stop the program, you see, it goes back to the bad performance. Start the program again, and it goes back up. If you imagine doing this while also changing the scheduler, this is really powerful, because you can literally do this in production. If I intentionally insert a bug in my scheduler — well, there are two things. There's the verifier: if there's a memory problem, it won't load my scheduler. And see, it's actually faster now — this is live.

Now, why is it better? What is happening? Is it because of Rust magic? No — yeah, some of you would say yes, and I wish I could say yes, but unfortunately the answer is: not really. The trick is in the algorithm, not in the language. The algorithm is different, and you can see it here. This is perfetto — a big shout-out to Google for making this tool, which is amazing. It tracks everything that happens: on the y-axis you see all the CPUs, and along the x-axis you see the tasks over time. The yellowish one is Terraria, the main game thread, and you can see at the top that the in-kernel scheduler is trying to be fair with all the tasks that are running — all the clang processes that are compiling the kernel, and also the main Terraria task. You know, it's fair; it's a fair scheduler.
So there is some logic there — it's still deadline-based, so there is logic to prioritize latency for the Terraria task, and you can see that periodically it gets some CPU time. But down below, scx_rustland is just giving a lot of CPU time to the main Terraria thread compared to the other tasks. It's also a lot more bouncy: it constantly bounces tasks here and there, which is probably not a good idea, because for cache locality this is bad. But, you know, it's trying to find all the possible free slots, and it's very work-conserving: as soon as there's a CPU available, you can see it trying to schedule the main game thread onto that available CPU. So that's why it's working better here.

But what's the really powerful thing about scx_rustland? Because I thought: okay, this is a cool idea, why don't I generalize it? Why don't I make a crate that people can use to design and write their own schedulers? So I generalized the backend of the scx_rustland scheduler and implemented a Rust crate, scx_rustland_core, which is available on crates.io. You can use it without having to learn all the sched_ext and BPF boilerplate; it provides an easier API. And the cool thing is that you can literally cargo init your project and use this crate. I also wrote a template — the link is at the top — so you can literally git clone the template and just cargo build it. The template implements a FIFO scheduler using the scx_rustland_core crate, so it's a really easy to understand template, where you don't have to learn anything about the underlying layers of sched_ext and the BPF kernel machinery; it's a very abstracted interface.

And I think I have... okay, that's the design, in more detail. We have the callbacks here that are implemented — this is the BPF part — and the crate contains a backend, which implements the BPF part, and then it uses libbpf-rs to implement a frontend, and your program sits here.
Whoops. Yeah, so this is cool, I wanted to show this: this is a working FIFO scheduler for sched_ext, implemented with scx_rustland_core, and it fits in a slide. I thought that was a cool achievement — this is a kernel scheduler, and we can run it. Yeah, I can show you this live: it's the rustland scheduler, so this program running here is printing some statistics, but basically the code is this one, and right now we are running this presentation on a FIFO scheduler implemented in Rust.

And what it's doing is just this: the enqueue part consumes the tasks that want to run; select_cpu asks the backend, give me a CPU that is available — idle — and I assign that CPU to the task; if select_cpu doesn't give me a CPU, I tell the backend to just dispatch the task on the first CPU available. Then I assign a time slice that is inversely proportional to the number of tasks that are waiting to be scheduled, and then I dispatch the tasks in order. That's it. That's a kernel scheduler written in Rust that fits in a slide — I think it's pretty cool.
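The slide itself isn't reproduced in this transcript, so here is a sketch of the FIFO policy as just described, written against a simplified stand-in for the scx_rustland_core API (the trait, the constants, and the base slice value are assumptions; see the crate and the template for the real signatures):

```rust
use std::collections::VecDeque;

struct QueuedTask { pid: i32 }

const RL_CPU_ANY: i32 = -1;           // "first CPU available" (assumed)
const BASE_SLICE_NS: u64 = 5_000_000; // 5 ms base slice (assumed)

// Stand-in for the scx_rustland_core backend.
trait Backend {
    fn dequeue_task(&mut self) -> Option<QueuedTask>;           // tasks that want to run
    fn select_cpu(&mut self, task: &QueuedTask) -> Option<i32>; // an idle CPU, if any
    fn dispatch_task(&mut self, task: QueuedTask, cpu: i32, slice_ns: u64);
}

fn schedule(bpf: &mut impl Backend, queue: &mut VecDeque<QueuedTask>) {
    // 1. Drain all newly runnable tasks into our FIFO queue.
    while let Some(task) = bpf.dequeue_task() {
        queue.push_back(task);
    }
    // 2. Time slice inversely proportional to how many tasks are waiting.
    let slice_ns = BASE_SLICE_NS / queue.len().max(1) as u64;
    // 3. Dispatch in FIFO order, preferring an idle CPU.
    while let Some(task) = queue.pop_front() {
        let cpu = bpf.select_cpu(&task).unwrap_or(RL_CPU_ANY);
        bpf.dispatch_task(task, cpu, slice_ns);
    }
}
```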
And yeah, if you have questions... actually, no, first these are the takeaways. I told you this already: the important point here is that scx_rustland is not a better scheduler in general; it's a proof of concept to show the potential of this technology, which you can literally use in production to do scheduler testing — something that never happened before, because the scheduler was a very monolithic part of the Linux kernel, difficult to change. Rust itself doesn't make the scheduler better, but having access to Rust, especially in user space, allows you to use the whole language without any restriction. It's not like Rust for Linux, which still has restrictions — it needs the proper abstractions to be implemented in the kernel as kernel code. Here you're actually in user space, so you don't have the Rust for Linux restrictions. It's easy to experiment: I mean, the FIFO scheduler fits in a slide, and you just need to compile it, run it, and stop it whenever you want. And potentially you can import any crate that you'd have access to in a regular user-space program; you can import libraries. Maybe, if you're crazy enough, you can plug an entire AI or a large language model into this program and let it decide your scheduling. It's going to be more CPU-expensive than the tasks you're trying to run, but, I mean...

[Audience question, partly inaudible: how does the user-space scheduler itself get scheduled?] So, sched_ext has logic that says: if nothing is running, dispatch the user-space scheduler on the first CPU available. And, you know, the user-space scheduler has a queue internally with all the tasks that are waiting to run. So it's scheduled by the BPF program: it runs, it tells the BPF program, okay, this guy goes here, this guy goes here — then BPF tells sched_ext the results, and the scheduling happens. But it's a good question: it's the BPF backend that decides to schedule the user-space program.

[Audience] How can you tell that Terraria has tighter deadlines than everything else?

Sorry — the microphone is for the video, but you still need to speak up. Okay, so: how can you tell that Terraria has tighter deadlines than compiling the kernel?
Okay, the thing is: usually, tasks that voluntarily release the CPU before their time slice expires are latency sensitive. Let's say video games are very cyclic, and these workloads are very periodic. Usually, you're supposed to render frames at 60 FPS, so every 1/60 of a second — about every 16 milliseconds — a task needs to run: typically the compositor, or Xwayland, needs to run and draw a frame, send the frame to the video card, and then it just sleeps. So latency-sensitive tasks work in bursts of CPU and then sleep. So you can model a deadline that prioritizes tasks with this sleepy behavior, and you can do that in many ways; for example, one is measuring the average runtime between a wakeup and a sleep. If this average runtime is very short, you're probably facing a latency-sensitive task, and you can model the deadline as: take a vruntime, which gives you proportional fair share, and add, for example, this average runtime. So if a task is using small amounts of CPU time, it gets a shorter deadline; and if a task — like a clang that is building the kernel — uses its entire time slice, its deadline will be longer. With this trick you can prioritize latency-sensitive tasks. It doesn't work every time — you would need to predict the future to write the best deadline-based algorithm — but that's the best you can do.
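In code, the heuristic described here comes down to something like this (a minimal sketch; the field names and the moving-average weights are assumptions):

```rust
struct TaskStats {
    vruntime_ns: u64,    // proportional fair-share virtual runtime
    avg_runtime_ns: u64, // average CPU burst between a wakeup and a sleep
}

// One common way to keep the average: an exponential moving average,
// updated each time the task goes back to sleep.
fn update_avg(t: &mut TaskStats, burst_ns: u64) {
    t.avg_runtime_ns = (t.avg_runtime_ns * 3 + burst_ns) / 4;
}

fn deadline_ns(t: &TaskStats) -> u64 {
    // Bursty, frequently sleeping tasks (games, compositors) get earlier
    // deadlines; tasks that burn their whole slice (clang) get later ones.
    t.vruntime_ns + t.avg_runtime_ns
}
```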
[Audience] I imagine that you need some context switches between the kernel-space BPF part and the user-space scheduler, right? So how much of an overhead is it?

Yeah, it's a good question. Of course, the user-space scheduler is a task in its own right, so there are context switches when you need to make scheduling decisions. I mentioned almost at the beginning that the scheduler itself is not a CPU-intensive application; it works almost like a latency-sensitive task. That means when it needs to run, it needs to run, and then it just stops. The decisions that the scheduler makes are very small: you need to say, okay, this guy goes here, this guy goes here, and then I go to sleep. So in the end the context switches don't impact the scheduler itself much, but they can impact the user-space applications that are running. Specifically, the more interruptions you have — like if you decide to use a fine-grained, smaller time slice, you get more context switches — but that doesn't depend on this scheduler; it depends on how small you define your time slice, right? A shorter time slice will probably give you better responsiveness, but you also have more overhead, so throughput will be worse, right? And, you know, the scheduler here is just yet another task running in the system, but you have the same problem with the in-kernel scheduler, because at some point the in-kernel scheduler also has to run, with the same constraints.

[Audience follow-up about the extra switch while the scheduler itself runs.] The same happens with the in-kernel scheduler, because... oh, I see what you mean: you're switching to a different task first. Yes, that's correct. So potentially the context-switch penalty is bigger with this approach — yes, that's correct.

[Audience] I have a question with regards to how you ship scx_rustland_core, for example. There are two questions behind it: your BPF program — is it written in Rust, or is it C? And second question: do you ship a blob in the crate, or is it built? How do you build it?

Yeah, so the BPF part is currently written in C, and what I ship is the .c file. There's a build.rs that calls Clang and compiles the BPF part, so it becomes BPF bytecode on your machine when you build it; that's how it works. Now, I'm looking at a library called Aya that generates BPF code from Rust, and I was investigating whether that would be better, because then everything would be Rust, down to the BPF bytecode in the backend. And, yeah, it's probably not a super polished crate; I'm sure there's something that is not right, even if it passes cargo test and everything, because I'm shipping the .c as an artifact that is included in the crate and compiled when you build your scheduler.

One more thing: if, for any reason, the user-space scheduler is blocked, the whole system won't make progress; everything would be deadlocked. One case is page faults. Let's say you do allocations in the user-space scheduler, and you hit a page fault: to resolve the page fault, the kernel needs to schedule a kernel thread, but the scheduler is the guy that is blocked, so there's a deadlock. That's a problem, but it was solved. I solved this one by mlocking all the memory of the user-space scheduler — the backend is actually mlocking all the memory — so you can still do allocations; they will just happen in an arena of pre-allocated memory. And I'm using — how is it called — GlobalAlloc, which is an abstraction that allows you to redirect all the allocations. That was really cool, I didn't know that. When I learned this in Rust I was like, oh, Rust is the best, because I can redirect all the allocations into whatever allocation algorithm I want to use, and I did this trick to solve the page fault problem.
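A minimal sketch of that arena plus GlobalAlloc idea (my reconstruction, not the real scx_rustland allocator; the arena size is an assumption, the real backend additionally mlock()s the memory, and this bump allocator never reuses freed memory):

```rust
use std::alloc::{GlobalAlloc, Layout};
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};

const ARENA_SIZE: usize = 64 * 1024 * 1024; // assumed arena size: 64 MiB

struct Arena(UnsafeCell<[u8; ARENA_SIZE]>);
unsafe impl Sync for Arena {} // all mutation goes through the atomic bump pointer

static ARENA: Arena = Arena(UnsafeCell::new([0; ARENA_SIZE]));
static NEXT: AtomicUsize = AtomicUsize::new(0);

struct ArenaAllocator;

unsafe impl GlobalAlloc for ArenaAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let align = layout.align();
        // Atomically bump the offset, respecting alignment.
        let res = NEXT.fetch_update(Ordering::Relaxed, Ordering::Relaxed, |off| {
            let start = (off + align - 1) & !(align - 1);
            let end = start.checked_add(layout.size())?;
            (end <= ARENA_SIZE).then_some(end)
        });
        match res {
            Ok(prev) => {
                let start = (prev + align - 1) & !(align - 1);
                (ARENA.0.get() as *mut u8).add(start)
            }
            Err(_) => std::ptr::null_mut(), // arena exhausted
        }
    }
    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        // Never freed in this sketch; a real allocator would recycle memory.
    }
}

// GlobalAlloc lets you redirect every allocation in the program:
#[global_allocator]
static GLOBAL: ArenaAllocator = ArenaAllocator;
```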
Okay, if you have further questions, please find Andrea in the hallway — or wherever, if you don't have a hallway, I guess. But thank you very much.