WEBVTT 00:00.000 --> 00:17.160 Welcome. I'll give this talk about static code analysis in the kernel. So last year, I did 00:17.160 --> 00:27.160 just talk about hardware uploading through network caps and using internal kernel APIs 00:27.160 --> 00:36.440 to push rules into hardware. I didn't talk much about last year, but this actually led 00:36.440 --> 00:45.800 to discovery of some performance issues in the kernel, which then fixed shortly after. So this 00:45.800 --> 00:56.080 was in 610 where we implemented a TC bypass feature, which is then refined further in the 00:56.080 --> 01:03.080 next release, which is essentially only enables software processing of TC rules, if all 01:03.080 --> 01:12.400 the rules don't require. It's only enabled if any rules require software processing. And 01:12.400 --> 01:18.760 then after I did this, I also worked a bit on trying to make sure that the validation of 01:18.760 --> 01:23.520 these rules were done properly, whether they could be hard to upload it or not. And this 01:23.520 --> 01:31.960 set me to poke around in five few different drivers. So one of the drivers that I poke 01:31.960 --> 01:41.960 around was the quid driver from Qilotik. And in that I found something like this, where 01:41.960 --> 01:49.600 this looks like a normal static helper function and a current current zero and a current 01:49.600 --> 01:58.520 turn two different error codes. I then had to propagate in any argument to this function. 01:58.520 --> 02:04.560 And that led me to edit all the call sites. And then I discovered the call site looked 02:04.560 --> 02:11.560 like this. So it just checked for none, several X code, and always returned the same error 02:11.560 --> 02:20.920 code. So the unabsuppe would not be possible to get exported from this function and 02:20.920 --> 02:30.680 out into use space. And all's used in this sense in turn on in the kernel. In some 02:30.680 --> 02:38.440 case where you don't specify if you use software hardware, then it will try to do hardware. 02:38.440 --> 02:43.440 If it can, it will say that it's hard to upload it. And otherwise it'll just influence 02:43.440 --> 02:55.600 the software. And it depends on getting the unabsuppe for that to work. So then I found 02:55.600 --> 03:00.880 three different bugs of this kind within this driver, where they had all been there from 03:00.880 --> 03:14.000 the initial commit off this functionality. And in some of the cases, it wasn't an issue yet 03:14.000 --> 03:19.360 because the helper function only did return one error code. And therefore it was fine that 03:19.360 --> 03:27.640 it was all written for the out. But in another case, it did cause an issue later on because 03:27.720 --> 03:35.400 the reviewers didn't really catch that adding a new return code in a function didn't 03:35.400 --> 03:44.160 necessarily propagate all the way through. So I don't blame the review for not catching this. 03:44.160 --> 03:52.360 And in general, if you intend on to use a function in this sense, it's better to have it 03:52.360 --> 03:57.720 be able to enroll you by it only has two extra codes when you just have it in a simple 03:57.720 --> 04:08.120 if statement. So they have been a bunch of other research on this topic. Unfortunately, 04:08.120 --> 04:15.120 the research team from Reconcentre University that have done this have focused only on 04:15.200 --> 04:23.200 the file system and the block layer. So they haven't looked into other subsystems. And 04:23.200 --> 04:32.240 they're not active anymore. And this book that I found was in Houston, Susan 18, but still 04:32.240 --> 04:37.120 been there for quite some time. So therefore I thought it was fun to see if I could do some 04:37.120 --> 04:44.720 static analyzes and detect these because they seem pretty simple to detect. So for doing that, 04:44.800 --> 04:56.000 I've looked into a tool called SPAS, SPAS. It's a tool written by Lune Souss, way back in order 04:56.000 --> 05:04.400 to help him vet patches for the Lune's kernel. And it is also standalone static analyzes tool 05:04.400 --> 05:12.240 that has some building checks. And more or less, the whole Lune's kernel today is clean in this 05:12.240 --> 05:19.840 tool. And by using this code base for doing this, I also know that this code base works well 05:19.840 --> 05:28.880 with the kernel. So I know that it works with the whole code base and all this special ways 05:28.880 --> 05:40.720 that the kernel abuses GCC. So it has a bunch of binaries in the code base, which are useful 05:40.720 --> 05:56.720 inspiration. And the initial commit was this from, yeah. It's not so simple anymore, but it's very 05:56.720 --> 06:05.840 easy to run with K-built. So you can just do a check equals SPAS. And then it will run SPAS for 06:05.840 --> 06:11.840 area of single source file. And this is frequently done in the kernel tree. But it's also very easy 06:11.840 --> 06:21.280 to replace SPAS here with your own SPAS base tool that will do some analyzes. So that's the way 06:21.280 --> 06:29.200 I hooked it in. Unfortunately, this runs one per source file and we'll come back to that data. 06:29.200 --> 06:38.240 But yeah, SPAS has a function looks like this within SPAS. If you dump the instructions, 06:39.280 --> 06:45.040 this is internally we represent a lot of structs and unions that are all in the find with a lot of 06:45.040 --> 06:52.240 pointers, which makes it a bit hard to dump. But more in the later. But this is basically a 06:52.320 --> 06:58.880 returned step in the bottom that returns a register. Then register is set to zero or to an error 06:58.880 --> 07:11.600 code. And up top is a code to a register, which a register is a time from a code for a static function. 07:11.600 --> 07:18.640 And then based on the return code, we choose which branch to take. So this is a simple case 07:18.640 --> 07:23.920 where we can analyze that we only have two possible return codes from this function. This is the 07:23.920 --> 07:29.520 only the brief part of the function that this mentioned on top to fit on this slide. 07:31.360 --> 07:39.280 This is a more complicated code where it's a function pointer that's called where you need to know 07:39.280 --> 07:46.960 which structs is reading in and of which offset you have the different functions defined. 07:47.200 --> 07:54.640 Which is a bit more interesting to, so I've left that as a later exercise. 07:58.640 --> 08:07.280 But basically in order to do an analysis of this, I haven't completely finished yet, unfortunately. 08:07.760 --> 08:14.720 The way I've gotten is that, you know, because this is called once the source file, 08:15.360 --> 08:22.400 then I need to dump all of the information that I need and then retrieve it again in a later processing 08:22.400 --> 08:29.200 stage. And most of it worrying, it's like 95% done. So probably in a week or two, I will publish 08:30.160 --> 08:39.440 how many other bugs I found, unfortunately, life happened. And there was enough time to finish it for this. 08:43.440 --> 08:49.920 But basically it's turned into a big civilisation, deceleration task in order to 08:50.880 --> 09:00.080 be able to do the analysis at a later stage, combining all of the knowledge that was found 09:00.080 --> 09:09.440 in the initial processing stage. But it does show promise and the way I've implemented those two 09:10.160 --> 09:17.600 focus on finding the stages where I know that a bug, like the ones that I have fixed already, 09:18.880 --> 09:27.520 and I will also welcome you to have this website here, where I'll post updates. And I will 09:27.520 --> 09:33.760 welcome you all to look at this a few weeks and see if there are some bugs that you can fix, 09:33.840 --> 09:41.760 because it would be awesome to get newcomers to help contribute some of those fixes back. 09:47.120 --> 09:49.440 So yeah, any questions? 09:49.920 --> 10:11.920 So I love what you're doing. This is fantastic work. Having newcomers do it may be problematic, 10:12.640 --> 10:21.040 because it does require a certain amount of big picture understanding that newcomers may not 10:21.040 --> 10:32.720 have. So the code that we have today in the tree has been tested and does work. And you may find 10:32.720 --> 10:40.560 that in some of these cases returning e-overflow or all over the other, I think that was actually 10:40.640 --> 10:47.920 great user space, because it's not expecting to get blacks. It's not been tested. 10:47.920 --> 10:58.240 We just terrified. So and we may well have situations where we are deliberately throwing away 10:58.240 --> 11:02.800 an error code and returning a different error code. We actually go to all the time in the file system 11:02.800 --> 11:11.040 world. We guess back some very specific errors from scuzzi devices. The scuzzi error return 11:11.040 --> 11:17.520 code, I think in return like 300 different errors. And 99% of the get mapped EIO. And that is 11:17.520 --> 11:23.120 entirely on purpose, because user space cannot cope with, oh well what's actually going on here is 11:23.120 --> 11:28.000 that there's a medium error and blah blah blah, you space can't cope. It just needs to know EIO 11:28.080 --> 11:34.640 this didn't work. Of course. And there are many cases like that. But part of the minimizing 11:34.640 --> 11:44.800 false positives is also to begin with just focusing on taking a looking at this function, 11:44.800 --> 11:49.040 this helper functions, how many places are called. It's all of the places that it's calling 11:49.040 --> 11:55.760 only treating it as a blur. Then it's worth looking into if that can be changed over the return 11:55.840 --> 12:04.000 isably in. If there's only one core site like the functions that I have here, then it should 12:06.320 --> 12:14.640 return by the whole spectrum off them, otherwise it's a bug. So I'm trying to minimize the 12:14.640 --> 12:21.200 amount of false positives by not targeting finding all the bugs, but targeting finding the 12:21.200 --> 12:32.160 bugs that I find are actually probably bugs. And this was in a driver. And when different drivers 12:32.160 --> 12:40.480 are not returning right error codes as the rest of drivers do, then use space which normal 12:40.480 --> 12:47.360 support multiple drivers will normally expect to do the bad thing. That actually sets my mind 12:47.520 --> 12:53.520 disease to a large extent. Thank you. The other thing about trying to get more people to submit 12:53.520 --> 12:58.960 the patches is that if I submit all the patches, then I get blamed by not doing enough reviews. 13:04.720 --> 13:05.520 Any other questions? 13:05.760 --> 13:18.880 It's basically continuous. Can you add a comment? So it stires that you can overwrite 13:18.880 --> 13:23.520 is that you say here, the other code is okay. Yeah. Okay, you have certain. 13:23.520 --> 13:30.960 I don't have that yet. It is a little bit problematic to do because what I can see in sparse 13:31.040 --> 13:40.240 is only the actual code after the preposterous have run. So I might be able to see attributes, 13:42.320 --> 13:47.200 but I haven't looked into that, but yeah, good system. Okay, save me as close in error. 13:47.200 --> 13:52.320 Yeah. It all means that it needs to be run on many different configurations. 13:56.560 --> 13:57.360 Anything else? 13:57.360 --> 14:01.440 All right, thank you.