WEBVTT 00:00.000 --> 00:20.000 All right, I think we're good to go. Let's learn about Rust on Linux. 00:20.000 --> 00:25.080 Yeah, welcome to my talk on Rust for Linux. I'll give you an overview of the project, what 00:25.080 --> 00:35.880 it is. First of all, my name is Anisse. Anisse Astier. I'm the live blogger of the Kernel Recipes conference, 00:35.880 --> 00:41.640 a great kernel conference in Paris, and I invite you to check it out. Small disclaimer, I'm not 00:41.640 --> 00:47.320 a Rust for Linux contributor. I just did this talk to learn more about Rust for Linux. I invite you, 00:47.320 --> 00:54.040 on the contrary, to look at Miguel Ojeda's talk yesterday, it was in Janson. It was recorded, of course, 00:54.200 --> 01:00.600 a very interesting presentation. So what is Rust for Linux? First of all, it's a meta project, 01:00.600 --> 01:05.880 just like the kernel. You know, the kernel is a group of multiple projects that merge every 10 weeks, 01:06.680 --> 01:12.440 and Rust for Linux is just that: it's a set of multiple different projects that want to use Rust 01:12.440 --> 01:20.200 in the Linux kernel. The goal of the project is to make Rust the second main language for the kernel 01:21.160 --> 01:30.600 in general, not just for drivers, but in general in the kernel. And making sure 01:30.600 --> 01:38.280 everything is upstream is the core goal of this project. A quick history of Rust for Linux: it started 01:38.280 --> 01:47.480 in 2013 with a demo, it was a small loadable module, rust.ko. Basically, 01:47.480 --> 01:53.400 you can click the links if you go check out the slides. They are online on the page, so every link 01:53.400 --> 02:01.480 you see is clickable. Then, fast forward to 2019, Alex Gaynor and Geoffrey Thomas gave a talk 02:01.480 --> 02:08.760 at the Linux Security Summit, inviting kernel developers to use Rust for security reasons. 
02:08.760 --> 02:15.080 A year later in 2020, there was a discussion at the Linux Plumbers Conference with many people already 02:15.160 --> 02:23.720 involved. A year later, Miguel Ojeda sent the first pull request with many, many contributors. 02:24.440 --> 02:31.480 He did the project announcement along with the pull request, and the Rust for Linux experiment 02:31.480 --> 02:36.920 was merged into Linux 6.1, so it was three years ago, almost three years ago. 02:37.880 --> 02:44.520 And ever since 6.1, at every release, of course, there were new changes. At first, 02:44.520 --> 02:47.800 at the beginning there was nothing, there was just the infrastructure to be able to build 02:49.080 --> 02:53.080 things with Rust, but there was nothing to build. Now there's a bit more and we'll talk about it. 02:55.400 --> 03:01.960 First of all, why Rust? Usually when you pick a mainstream programming language, you have to 03:01.960 --> 03:06.760 make this trade-off. You have to pick two of those. Do you want to be able to do dynamic memory 03:06.760 --> 03:12.280 allocations? Is your program going to have everything statically hard-coded or not? Do you want to 03:12.280 --> 03:19.720 have memory safety, to prevent buffer overflows, data races, things like that? And do you want to 03:19.720 --> 03:24.840 have garbage collection or not, a language that can be freestanding, that you can run without any 03:25.000 --> 03:31.000 runtime at native speed? Well, with Rust, you don't need to pick, you get all three, and that sets 03:31.880 --> 03:36.440 it apart among mainstream languages. There's no other one that comes to mind that has exactly the same 03:37.320 --> 03:47.320 properties. Of course, memory safety is just one aspect of the equation. At the last XDC, 03:47.320 --> 03:53.240 Lyude Paul, she's a kernel developer, said that memory safety is the least convincing point. 
03:54.280 --> 04:00.360 Linux kernel developers have been using C for decades, they know it's unsafe, so it feels like 04:00.360 --> 04:06.280 they're being talked down to when you keep repeating the same arguments about memory safety. 04:06.280 --> 04:13.320 Ergonomics are, she says, a bit more interesting, because Rust can be a kernel 04:13.320 --> 04:20.120 maintainer. You can encode properties of your kernel subsystem, of your drivers, of your APIs, 04:20.680 --> 04:27.080 in Rust, which you can't do in C, where you have to keep them in your mind, thanks to the many features 04:27.080 --> 04:33.240 in Rust: you have enums, pattern matching, traits, you have ownership and lifetime tracking, 04:34.280 --> 04:41.080 borrow checking. Of course, there are reasons why one would not want to use Rust. For example, 04:41.160 --> 04:47.240 it's a new language, so if you already know C, learning takes time, and if you've been programming 04:47.240 --> 04:53.160 for a long time, you know you can usually pick up a new language pretty quickly. With Rust, 04:54.120 --> 04:58.440 it's a bit different: you hit walls, you find things you are used to that you can't do. 04:59.880 --> 05:05.320 So it takes a bit of time. There are other reasons, of course. For example, the Linux kernel 05:05.400 --> 05:13.240 supports 22 or 23 architecture families, just in general, and rustc has about half of those 05:15.240 --> 05:20.680 up to tier 2 targets. So there's a bit more, but it's not exactly the same number 05:20.680 --> 05:25.800 of architectures, so some architectures supported by Linux won't be able to build Rust code. 05:26.840 --> 05:33.160 Until we have GCC, at least. So right now, for the main architectures, you can build the kernel with 05:33.240 --> 05:41.320 Clang and GCC, and GCC support for Rust code is coming via two projects. 
So there's 05:41.320 --> 05:47.240 gccrs, which is working with upstream to add Rust support directly into GCC, 05:49.080 --> 05:56.200 and there's rustc_codegen_gcc, which is another project whose goal is to change the rustc 05:56.200 --> 06:04.440 backend to use GCC for code generation instead of LLVM. There are, by the way, other rustc backends, 06:04.440 --> 06:12.280 like Cranelift, but that's out of scope here. What is the strategy of the Rust for Linux 06:12.280 --> 06:19.240 project for working with the Linux kernel? First, the project wants to lead by example. So it 06:19.320 --> 06:27.320 wants to have documentation everywhere, with tests and safety comments. When you're writing Rust, 06:27.320 --> 06:33.960 some parts might be unsafe, and every unsafe block in the Linux kernel should have 06:33.960 --> 06:41.320 a safety comment explaining why it's used. It uses bindgen for FFI layer generation. FFI 06:41.320 --> 06:45.400 means foreign function interface. It's when you want to bridge two languages: you usually write 06:45.400 --> 06:51.640 an interface, either by hand or not. Usually not, you prefer to have this automated. So there's 06:51.640 --> 06:58.840 a tool called bindgen that will parse the C headers or C files and generate a Rust FFI. So it's 06:58.840 --> 07:06.680 basically Rust code with unsafe functions. As for those unsafe functions, the Rust for Linux 07:06.680 --> 07:12.680 project says we should not use them directly in the kernel. We should build safe abstractions 07:12.680 --> 07:21.240 on top of those bindings. So you don't call the bindings directly. What are those safe abstractions? 07:21.240 --> 07:27.240 It's exactly what we were talking about. It's a way to encode the constraints 07:27.240 --> 07:32.920 which you have in the C code, the C layer, and you need to understand those constraints to put them 07:34.120 --> 07:41.000 in Rust types. That's where you really need kernel domain expertise. 
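The layering described here (raw bindings underneath, a safe abstraction on top, a safety comment on every unsafe block) can be sketched in plain userspace Rust. This is only an illustration: the `bindings` module is a hand-written, hypothetical stand-in for what bindgen would generate from a C header, and `counter`, `counter_inc` and `Counter` are invented names, not kernel APIs.

```rust
// Hypothetical stand-in for bindgen output: a C-layout struct and a raw,
// unsafe function operating on a raw pointer. Not a real kernel binding.
mod bindings {
    #[allow(non_camel_case_types)]
    #[repr(C)]
    pub struct counter {
        pub value: u32,
    }

    /// # Safety
    ///
    /// `c` must point to a valid, initialized `counter` that no one else
    /// is accessing for the duration of the call.
    pub unsafe fn counter_inc(c: *mut counter) -> u32 {
        // SAFETY: the caller guarantees `c` is valid and exclusively accessed.
        unsafe {
            (*c).value += 1;
            (*c).value
        }
    }
}

/// Safe abstraction: it owns the C-layout struct, so the pointer handed to
/// the binding is always valid, and `&mut self` gives exclusive access.
pub struct Counter {
    inner: bindings::counter,
}

impl Counter {
    pub fn new() -> Self {
        Counter {
            inner: bindings::counter { value: 0 },
        }
    }

    pub fn inc(&mut self) -> u32 {
        // SAFETY: `self.inner` is initialized and lives as long as `self`;
        // `&mut self` guarantees nobody else touches it during the call.
        unsafe { bindings::counter_inc(&mut self.inner) }
    }
}

fn main() {
    let mut c = Counter::new();
    c.inc();
    println!("counter is now {}", c.inc());
}
```

Callers only ever see `Counter::inc`, which has no safety preconditions: the constraints of the C side are encoded once, in the wrapper type.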
Expertise, 07:41.000 --> 07:46.920 that is, in the specific kernel domain you are abstracting. So usually it's not trivial to do, 07:46.920 --> 07:53.720 but it depends. In fact, there are already quite a few abstractions in Linux, for many APIs. 07:53.720 --> 08:00.840 For example, you might know about workqueues, device, miscdevice, which was merged recently, 08:00.840 --> 08:06.120 platform device for embedded developers and anyone else using platform devices. 08:07.080 --> 08:13.880 There's an alloc module to do allocations. There's a PID namespace abstraction, which was done by 08:13.880 --> 08:22.840 Christian Brauner for the Android Binder driver. Anyway, there are many others, it's just a selection, 08:22.840 --> 08:27.880 and this list is quite long. There are a lot of things you can do in Rust, but for many use cases, 08:27.880 --> 08:33.160 it will still be too short. We're still at the beginning and you might find that you want to 08:33.160 --> 08:38.600 use something in the kernel, some API that's not abstracted, and this would need to be written. 08:40.200 --> 08:46.840 So where are we with drivers? First of all, in the Linux kernel, there's this rule 08:46.840 --> 08:50.840 called the no-duplicate rule. If you want to merge a driver, there should not be 08:52.120 --> 08:57.560 another driver which does the same thing for the same hardware. Of course, this rule has been 08:57.560 --> 09:03.640 relaxed in some cases, and in the Rust for Linux project, it's being relaxed for what 09:04.520 --> 09:11.720 are called reference drivers. These reference drivers serve as examples of how to write 09:11.720 --> 09:20.200 Rust code. One of those drivers is the null block driver, which was merged upstream recently, 09:20.200 --> 09:25.000 so it's a reimplementation of the C one. For now, the two 09:25.080 --> 09:30.680 exist in the Linux kernel. There are other things that were merged. 
For example, the DRM 09:31.560 --> 09:38.680 panic QR code generation. So it's kind of like a Linux blue screen of death, except not. So 09:40.680 --> 09:44.680 when you panic, you have a wall of text with a lot of information, the stack trace, 09:44.680 --> 09:49.480 the state of the registers. And instead of taking a picture or copying that by hand, 09:49.480 --> 09:53.640 you just scan the QR code, and you have all the data, you can copy-paste it and send it in your 09:53.640 --> 10:00.120 bug report. And this QR code generation is done in Rust. There are two PHY drivers for 10:00.120 --> 10:06.600 networking that have already been merged. And of course, that's just 10:06.600 --> 10:11.400 the tip of the iceberg. There are many upcoming drivers, many things that are being worked on. 10:12.280 --> 10:20.600 One, which is well known, is the Asahi Linux Apple GPU driver. It's not upstream yet, 10:20.680 --> 10:25.800 but it's being shipped: if you use Asahi Linux, it's being shipped to everyone 10:25.800 --> 10:33.880 running Asahi Linux. There's a reimplementation of the Android Binder driver, which is a core driver 10:33.880 --> 10:40.680 that is upstream in Linux. It's being rewritten with the aim of completely replacing the C 10:40.680 --> 10:49.560 implementation. The same goes for NVMe. There's an upcoming driver, again for GPUs, for NVIDIA GPUs, 10:50.680 --> 10:55.720 and many others. Yeah, the list is quite long, with many work-in-progress projects. 10:57.560 --> 11:05.480 In the recent updates, we've had a few changes over the last Linux versions. For example, 11:07.080 --> 11:12.120 it used to be that, to build the Rust code, you had to use one 11:12.120 --> 11:18.680 specific Rust compiler version. It's no longer the case since two or three releases. Now you use 11:18.840 --> 11:28.040 Rust 1.78 or later, I think. 
It was picked very consciously because it's packaged in almost every 11:28.040 --> 11:39.240 distribution; maybe not Debian stable, but it's in testing. There are now a few types to wrap the Linux 11:39.240 --> 11:45.320 allocators. So there's KVec and KBox, which, if you know a thing or two about Rust, 11:45.320 --> 11:52.760 are just like Vec and Box, but for the kernel. They have a slightly different API, and that's 11:52.760 --> 11:58.120 done to be able to pick the type of allocator, because there might be multiple allocators in the 11:58.120 --> 12:06.120 kernel, and the allocation flags. Another update, which was done recently, is that the Rust project, 12:06.840 --> 12:14.200 the Rust compiler, now builds the Linux kernel in CI. So every PR is tested with a Linux kernel 12:14.520 --> 12:20.760 build to make sure that there is no breakage. The Rust project made Rust for Linux 12:20.760 --> 12:25.960 one of its flagship goals for the second half of 2024, and it probably will be for the 12:25.960 --> 12:36.600 first half of 2025. What are kernel maintainers thinking about Rust? What do they think? 12:36.600 --> 12:42.680 Globally, I'd say there's a positive outlook. At the last Maintainers Summit, it seemed that 12:43.800 --> 12:48.120 it was globally positive. There are many supportive maintainers. I gave you the example of 12:49.240 --> 12:57.000 Christian Brauner, who even wrote an abstraction to encode the properties of a domain 12:57.000 --> 13:04.200 he knew very well. Linus Torvalds told Rust developers to steam right ahead, so write code, 13:04.200 --> 13:11.400 even if it's still an experiment. I invite you again to watch Miguel Ojeda's talk; 13:11.400 --> 13:17.880 he did a really great segment yesterday, where he interviewed many different kernel developers 13:17.880 --> 13:25.000 and gave quotes on what they think of Rust for Linux. Is supporting Rust for Linux still optional, 13:25.000 --> 13:32.280 since it's still an experiment? Well, it depends. 
It depends on the subsystem. If the maintainer 13:32.360 --> 13:38.680 is supportive, it's not optional. It's part of the features, and the Rust for Linux developers never 13:38.680 --> 13:46.280 said that you can break Rust wherever you want. In the worst-case scenario, if there's a disagreement 13:46.280 --> 13:51.640 over something, then you can reach this point, but it's not something that was ever said. 13:53.640 --> 14:00.280 So that concludes the first part of this talk. Now we'll show a few code examples to see 14:00.280 --> 14:05.320 why Rust is interesting in the Linux kernel, and we'll start with this, which is a direct 14:05.320 --> 14:14.360 copy-paste from the presentation by Alice Ryhl and Carlos Llamas on the Binder driver rewrite. 14:14.360 --> 14:20.360 So you have a comparison between C code and Rust code. In the Rust code, you have just 14:20.360 --> 14:27.640 a closing brace, and that's perfectly normal: that's because of lifetime tracking, of ownership 14:27.720 --> 14:33.240 tracking, and the way the Drop trait works. And of course the C code shown is only a part, 14:33.240 --> 14:43.640 it's not even complete. Let's now look at the minimal Rust driver. So this is directly 14:43.640 --> 14:49.640 from the kernel source tree; if you go into the examples, you see how to write a module or a driver. 14:49.720 --> 15:01.320 For that, you will first use a prelude. It will import quite a few things that the 15:01.320 --> 15:06.680 Rust for Linux developers deem important into your scope. What's important, for example: you have 15:06.680 --> 15:16.360 the module! macro, which allows you to declare your driver. Your driver has a type, it's a structure, 15:16.360 --> 15:23.640 we'll see that later, and it has some metadata: it has a name, a license, a description, 15:23.640 --> 15:35.640 everything you usually have in C code. 
And if you look a bit more, I told you the module has a type; 15:35.640 --> 15:43.240 this is basically a structure, and this is the state of the driver itself. Here it's an example, 15:43.240 --> 15:52.600 so the state is an array of numbers, a dynamic array of signed 32-bit integers, which is a 15:52.600 --> 16:00.440 KVec; I talked to you about it a bit earlier. And then, for this structure, 16:00.440 --> 16:06.680 you need to implement what we call a trait in Rust. The trait is called Module, 16:06.680 --> 16:13.000 so you implement the Module trait, and what that means is that you need to have this function, 16:13.080 --> 16:19.320 the init function; that's what implementing this trait means. And inside this function, 16:19.320 --> 16:26.440 you will see the pr_info! macro. It's basically almost the same thing as the pr_info Linux 16:27.320 --> 16:32.520 kernel function. Well, it's not actually a function, it's also a macro, but that's not important. Then you 16:32.520 --> 16:42.760 declare your dynamic array, so this will be done on the stack, and then you push things to it: 16:42.760 --> 16:50.920 you call the push function, like with std in Rust, except this function can do allocations, 16:50.920 --> 16:55.800 and you need to pass the allocation flags, because in the kernel, depending on the 16:55.800 --> 16:59.800 context, you might need to have different flags, for example, if you are in interrupt context. 17:00.760 --> 17:07.080 So you pass the flags, and of course this allocation is fallible, so it can return an error, 17:07.080 --> 17:16.120 and you propagate this error to the return of the function itself. Then you return the state, 17:16.120 --> 17:28.040 so you return the structure you declared, with the numbers inside. 
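The shape of that init function (a trait with a fallible constructor, a push that takes allocation flags and can fail, and `?` to propagate the error) can be approximated in userspace Rust. This is a sketch under stated assumptions, not the kernel sample: `Module`, `Flags`, `GFP_KERNEL` and `push` below are hypothetical stand-ins written for this illustration, a standard `Vec` with `try_reserve` replaces `KVec`, and the pushed values are arbitrary.

```rust
use std::collections::TryReserveError;

/// Hypothetical userspace stand-in for the kernel's `Module` trait:
/// one constructor-like function that may fail.
trait Module: Sized {
    fn init() -> Result<Self, TryReserveError>;
}

/// Hypothetical stand-in for kernel allocation flags; it carries no meaning here.
#[derive(Clone, Copy)]
struct Flags;
const GFP_KERNEL: Flags = Flags;

/// Fallible push: reserve space first, so an allocation failure comes back
/// as an `Err` instead of aborting the program.
fn push(v: &mut Vec<i32>, value: i32, _flags: Flags) -> Result<(), TryReserveError> {
    v.try_reserve(1)?;
    v.push(value);
    Ok(())
}

/// The state of the "driver": a dynamic array of signed 32-bit integers.
struct Minimal {
    numbers: Vec<i32>,
}

impl Module for Minimal {
    fn init() -> Result<Self, TryReserveError> {
        println!("minimal sample (init)");

        // The empty vector itself lives on the stack; pushing allocates.
        let mut numbers = Vec::new();
        // Each `?` forwards an allocation error to the caller of `init`.
        push(&mut numbers, 1, GFP_KERNEL)?;
        push(&mut numbers, 2, GFP_KERNEL)?;
        push(&mut numbers, 3, GFP_KERNEL)?;

        Ok(Minimal { numbers })
    }
}

fn main() {
    match Minimal::init() {
        Ok(state) => println!("numbers: {:?}", state.numbers),
        Err(e) => eprintln!("init failed: {e}"),
    }
}
```

Making every push return a `Result` is what forces the caller to decide what happens on allocation failure, which is the point of the slightly different API compared to the standard library.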
The end of this driver is 17:29.000 --> 17:35.320 the exit. For exit, it was chosen to use the Drop trait, which is a built-in trait, 17:35.320 --> 17:41.240 which means that when something goes out of scope, it's called. That's how you do the automatic 17:41.240 --> 17:48.280 cleaning, as in the example that was shown before. So the structure will implement the trait, 17:48.280 --> 17:55.080 it has one function, and this function calls the print again, and you print the state that you had before. 17:55.160 --> 18:01.160 So that's basically it, that's the Rust minimal sample that's in the Linux kernel 18:01.160 --> 18:13.160 source tree. Let's now show a slightly more complex example from the Asahi Linux GPU driver. 18:13.880 --> 18:21.400 One of the reasons that was cited by Asahi Lina, and even later by Dave Airlie and Danilo Krummrich, 18:21.400 --> 18:27.720 is that Rust is very interesting because it allows us to abstract the firmware layer, 18:27.720 --> 18:34.360 the interaction between the kernel driver and the firmware. On Apple platforms, and it's 18:34.360 --> 18:41.720 true also for NVIDIA GPUs, the kernel developers don't control the firmware, it's controlled by 18:41.720 --> 18:48.120 the hardware vendor, and with every iteration, they might break some things. In order to prevent 18:48.120 --> 18:56.440 breaking the kernel driver, there's an abstraction inside Linux to help, and this will be a macro. 18:57.320 --> 19:03.080 This macro is a proc macro. We won't show it; we will show how to use it, but not how it's implemented. 19:03.080 --> 19:09.320 First of all, it's not this one; this one is just a macro to tell the Rust compiler: I want this 19:09.320 --> 19:14.520 structure to be represented like in C, so it will have a C-like representation in memory. 
19:15.240 --> 19:21.560 This is the macro, and it has one argument, which is AGX, and we'll come back to it later. 19:22.520 --> 19:28.520 Inside the definition of the macro, there's this AGX version, so it's the same, 19:28.520 --> 19:35.480 it's referring to exactly that, and the firmware can be hardware dependent 19:36.200 --> 19:41.320 or version dependent. The G would be the generation and V the version, and you have 19:41.720 --> 19:49.240 three hardware generations that are supported, and six firmware versions. Of course, we don't want to 19:49.240 --> 19:54.520 support all of those, it would be like 18 things, and it would grow even faster, so there are only 19:54.520 --> 20:03.560 five that are supported in Linux, for this out-of-tree driver. Let's go back and take a look 20:03.640 --> 20:12.200 at what this macro allows you to do. You can now add an annotation; this will be, again, 20:12.200 --> 20:18.360 interpreted at build time by the macro, and this annotation allows saying: I want this 20:18.360 --> 20:24.200 next field, or this next expression in general, to be only for this version. And this version 20:24.200 --> 20:32.920 annotation has an expression evaluator that's run at build time. You see here the version: 20:33.000 --> 20:38.920 if we want to have the counter, it has to be greater than or equal to V13. 20:40.520 --> 20:48.120 So if we go and take a look at what happens with this macro, it will generate, for each of the five 20:50.120 --> 20:57.160 possibilities we've seen, different structures at build time. You see the structures 20:57.240 --> 21:03.000 are named slightly differently; the name contains the generation and the firmware version, 21:03.640 --> 21:09.640 and we can see that there's one that contains the counter, because that's how it's 21:09.640 --> 21:15.240 declared in the firmware interface, and the other does not have it. So yeah, that concludes 21:15.240 --> 21:22.360 this example. 
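The effect described here, one source definition expanding at build time into several C-layout structures that differ by a version-gated field, can be imitated with a small declarative macro. This is a much simplified sketch and not the driver's proc macro: `versioned_init_data!`, the struct names and the fields are invented for the illustration, and the version condition is spelled out by hand instead of being evaluated from an expression.

```rust
// Hypothetical stand-in for the driver's proc macro: a declarative macro
// that emits one struct definition per firmware variant.
macro_rules! versioned_init_data {
    // Variants older than V13: the firmware has no `counter` field.
    ($name:ident, without_counter) => {
        #[repr(C)]
        #[derive(Debug, Default)]
        struct $name {
            id: u32,
        }
    };
    // Variants at V13 or later: the firmware expects an extra `counter` field.
    ($name:ident, with_counter) => {
        #[repr(C)]
        #[derive(Debug, Default)]
        struct $name {
            id: u32,
            counter: u64,
        }
    };
}

// Each invocation generates a distinct type whose name carries the
// generation and the firmware version (names are illustrative only).
versioned_init_data!(InitDataG13V12, without_counter);
versioned_init_data!(InitDataG13V13, with_counter);

fn main() {
    let old = InitDataG13V12 { id: 1 };
    let new = InitDataG13V13 { id: 1, counter: 0 };
    println!("id {} without counter: {} bytes", old.id, std::mem::size_of::<InitDataG13V12>());
    println!(
        "id {} with counter {}: {} bytes",
        new.id,
        new.counter,
        std::mem::size_of::<InitDataG13V13>()
    );
}
```

Because each variant is its own type, code that touches `counter` on a variant that lacks it fails to compile, instead of silently writing to the wrong offset in the firmware structure.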
I just want to show a small glimpse of what could be possible in the future in 21:22.360 --> 21:31.000 Linux. Usually when you are using Rust, Rust does not protect against deadlocks. 21:31.000 --> 21:38.360 So you can deadlock; it's not one of the properties. It's perfectly safe to deadlock, depending on the 21:38.360 --> 21:44.200 definition of safe, talking about the Rust definition. Of course, using the Rust programming language, 21:44.280 --> 21:53.560 it is possible to build a structure in such a way that you can never deadlock. This was presented 21:53.560 --> 22:00.120 by Joshua Liebow-Feeser at the latest RustConf, in a talk called Safety in an Unsafe World. 22:00.120 --> 22:05.880 I won't go into details, but it explains how to build the context with two traits and a 22:05.880 --> 22:12.760 macro, and how to build basically a mutex directed acyclic graph, so that the order in which 22:12.840 --> 22:19.800 you can take a mutex always depends on the previous one, and you can never take it in the wrong order: 22:19.800 --> 22:27.720 this has to be satisfied at compile time. And it has no runtime impact, 22:27.720 --> 22:31.640 it's just verified at compile time. If you're a bit curious how it might work, I invite you to 22:31.640 --> 22:41.400 look at it. It works for Fuchsia's netstack, which has 77 mutexes. And that will be it for my presentation, 22:41.800 --> 22:50.920 thanks a lot. 22:55.160 --> 23:03.960 Thank you. The mutex verification: do we need to annotate the mutexes, or does the compiler just 23:03.960 --> 23:10.120 look at all the mutexes and infer that? Or do we have to do it? 23:10.120 --> 23:18.120 It's done manually. It works for this project. They have like 77 mutexes. They went 23:18.120 --> 23:25.560 through every one of those, and they defined an order in a graph.
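The general idea of a lock order that is declared by hand and checked by the compiler can be sketched with marker types and a trait bound. This is not the code from the talk or from Fuchsia: `Unlocked`, `LevelA`, `LevelB`, `LockBefore`, `Ctx` and `OrderedMutex` are all names invented for this illustration, and the ordering graph is written out as trait implementations instead of being generated by a macro.

```rust
use std::marker::PhantomData;
use std::sync::{Mutex, MutexGuard};

// Hypothetical lock levels: zero-sized marker types, never instantiated.
#[allow(dead_code)]
struct Unlocked;
#[allow(dead_code)]
struct LevelA;
#[allow(dead_code)]
struct LevelB;

/// `L: LockBefore<M>` reads: while holding level `L`, taking level `M` is allowed.
/// These impls are the manually declared edges of the ordering graph.
trait LockBefore<M> {}
impl LockBefore<LevelA> for Unlocked {}
impl LockBefore<LevelB> for Unlocked {}
impl LockBefore<LevelB> for LevelA {}
// There is no `impl LockBefore<LevelA> for LevelB`, so B-then-A cannot compile.

/// Zero-sized token recording the highest lock level currently held.
struct Ctx<'a, L>(PhantomData<&'a mut L>);

impl Ctx<'static, Unlocked> {
    fn root() -> Self {
        Ctx(PhantomData)
    }
}

/// A mutex tagged with its level `M` in the ordering.
struct OrderedMutex<T, M> {
    inner: Mutex<T>,
    _level: PhantomData<M>,
}

impl<T, M> OrderedMutex<T, M> {
    fn new(value: T) -> Self {
        OrderedMutex {
            inner: Mutex::new(value),
            _level: PhantomData,
        }
    }

    /// Locking borrows the current token and hands back a token at level `M`.
    /// The bound `L: LockBefore<M>` is the compile-time order check.
    fn lock<'a, L>(&'a self, _ctx: &'a mut Ctx<'_, L>) -> (MutexGuard<'a, T>, Ctx<'a, M>)
    where
        L: LockBefore<M>,
    {
        (self.inner.lock().unwrap(), Ctx(PhantomData))
    }
}

/// Takes A then B, which the declared order allows.
fn sum_in_order() -> u32 {
    let a = OrderedMutex::<u32, LevelA>::new(1);
    let b = OrderedMutex::<u32, LevelB>::new(2);
    let mut root = Ctx::root();
    let (guard_a, mut ctx_a) = a.lock(&mut root);
    let (guard_b, _ctx_b) = b.lock(&mut ctx_a);
    *guard_a + *guard_b
}

fn main() {
    println!("sum = {}", sum_in_order());
}
```

The tokens and markers are zero-sized, so the check costs nothing at run time. Swapping the two `lock` calls in `sum_in_order` is rejected by the compiler because `LevelB: LockBefore<LevelA>` is not implemented.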
And I think it could be done for 23:26.920 --> 23:34.920 some subsystems; it's something that needs to be explored. Yeah, I mean, I'm thinking about 23:34.920 --> 23:42.120 the giant comment we have at the top of rmap.c, which lays out the locking hierarchy inside the 23:42.120 --> 23:48.440 MM, and I'm a little bit scared, but it would be nicer to have that actually verified by the 23:48.440 --> 23:54.600 compiler rather than just a comment that can be ignored. 23:54.760 --> 24:08.520 Other questions? Can you explain why there is a no-duplicate rule for drivers? Why is there 24:08.520 --> 24:13.000 a no-duplicate rule for drivers in the kernel? Because you don't want to have duplicate work. 24:13.000 --> 24:17.160 You don't want to have too many people working on different drivers for the same hardware. 24:17.960 --> 24:22.520 So how does that work when you have platforms that are not supported by Rust and you have 24:22.600 --> 24:27.960 code running on them? Someone might want to contribute a driver written in Rust and you 24:27.960 --> 24:31.880 are running on ARM, or whatever, and it's not supported by Rust. 24:31.880 --> 24:37.640 Okay, wrong example. Sorry. Try to guess an unsupported architecture. PowerPC, 24:37.640 --> 24:44.680 maybe? Some old PowerPC? [inaudible] m68k is not supported by Rust, [inaudible] 24:45.560 --> 24:52.360 It's not supported either. GCC is coming. And of course, if you really want to have, 24:52.360 --> 24:58.840 you know, a Rust driver running, it's up to you to invest in gccrs or rustc_codegen_gcc for your architecture. 25:04.440 --> 25:06.280 All right. Thanks all.