WEBVTT 00:00.000 --> 00:17.240 All right, so, please settle down. While more people are trickling in, we have 00:17.240 --> 00:28.080 Max already starting with the kernel SBOM. Hey, I hope, I think it's on, yeah. 00:28.080 --> 00:34.200 Hey, I'm Max, I work at an IT consultancy in Munich, Germany, doing some 00:34.200 --> 00:40.920 consulting stuff, and one part was working on creating an SBOM for the Linux 00:40.920 --> 00:47.760 kernel, with the goal to have that tooling be part of the kernel source so that it can 00:47.760 --> 00:54.600 always be generated automatically. The tool, the development part of the tool, is currently 00:54.600 --> 01:02.080 in this repository, and to generate an SBOM, you just take the source tree, the 01:02.080 --> 01:07.880 object tree or the build output, and it will generate an SBOM for you. And the vision 01:07.880 --> 01:12.760 is, once it's merged, once it's contributed, you just call make sbom, and you 01:12.760 --> 01:21.200 have the SBOM of the current kernel. But let me explain how we get the metadata or the build 01:21.200 --> 01:30.940 graph for that SBOM. The kernel build, so that's a specific thing within the 01:30.940 --> 01:41.760 kernel build, is that it generates .cmd files. These .cmd files here, they contain 01:41.760 --> 01:47.800 metadata about the build and about what happened, and we parse them. An example for 01:47.880 --> 01:53.320 that is, so these are two examples, they either just contain the make command that 01:53.320 --> 01:59.840 was used for that specific target, or they even contain additional source and dependency 01:59.840 --> 02:11.320 information. 
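A minimal sketch of reading one such .cmd file, assuming the layout Kbuild typically writes: a `savedcmd_<target> := ...` line (older kernels use `cmd_<target> := ...`), optionally followed by `source_<target>` and `deps_<target>` entries with backslash continuations. The function name `parse_cmd_file` and the sample content are illustrative assumptions, not taken from the tool presented in the talk.

```python
import re

# Illustrative sample only; real .cmd files carry much longer commands.
SAMPLE = r"""savedcmd_init/main.o := gcc -c -o init/main.o init/main.c

source_init/main.o := init/main.c

deps_init/main.o := \
  $(wildcard include/config/SMP) \
  include/linux/types.h \
  include/linux/init.h
"""

_ENTRY = re.compile(r"^(savedcmd|cmd|source|deps)_(\S+)\s*:=\s*(.*)$")


def parse_cmd_file(text):
    """Extract command, source and dependency list from .cmd file text."""
    # Join backslash-continued lines into one logical line.
    joined = re.sub(r"\\\n", " ", text)
    info = {"target": None, "command": None, "source": None, "deps": []}
    for line in joined.splitlines():
        match = _ENTRY.match(line)
        if not match:
            continue
        kind, target, value = match.groups()
        info["target"] = target
        value = value.strip()
        if kind in ("savedcmd", "cmd"):
            info["command"] = value
        elif kind == "source":
            info["source"] = value
        else:
            # Drop $(wildcard ...) config markers, keep real file paths.
            value = re.sub(r"\$\(wildcard\s+[^)]*\)", " ", value)
            info["deps"] = value.split()
    return info


if __name__ == "__main__":
    print(parse_cmd_file(SAMPLE))
```

A file that only records the command (the first of the two cases mentioned) simply yields an empty `deps` list and no `source`, so inputs and outputs would then have to be recovered from the command string itself.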
If we look into that, that's an example, it's the vmlinux file, and 02:11.320 --> 02:19.720 the corresponding .cmd file is shown here, and you see it calls ld, it has a lot of 02:19.720 --> 02:26.120 .o files as inputs, and it builds, as an output, the vmlinux file. From that, we 02:26.120 --> 02:33.560 can start constructing the graph of dependencies, or file-level dependencies, within the Linux 02:33.560 --> 02:44.040 kernel. Another example is, here, the kernel info .o file, and that's one of the examples where 02:44.040 --> 02:51.000 the dependencies are already mostly extracted by the Kbuild build system, so it says, 02:51.000 --> 02:56.640 that's my source file, it's the .s file, and those are my dependencies, some .h files, so 02:56.640 --> 03:09.360 it's easier to extract the graph from the build. And additionally to these two cases, where 03:09.360 --> 03:15.360 we have the metadata, oh, I start slowing down, I see I was a little bit stressed in the beginning, 03:15.360 --> 03:23.960 so I'm now getting relaxed, right. In addition to the .cmd files that we need, or that we have, 03:23.960 --> 03:30.040 where the data is already present, there are two cases where we need to fill some gaps. 03:30.040 --> 03:41.240 There are .S (uppercase S) files that contain .incbin, I think that's include binary, statements. 03:41.240 --> 03:46.920 These we need to parse by hand and add to the build graph that we are building, 03:47.000 --> 03:53.000 and these files that are included, then again, have .cmd files and we can follow the graph 03:53.000 --> 04:00.840 even further. And there are some other cases where we currently still have to hard-code the 04:00.840 --> 04:09.400 dependencies, so there are some gaps that we still need to fill manually, but we hope that 04:09.480 --> 04:18.520 we can improve the Kbuild scripts and other tools to fill the gaps and get rid of this annoying, 04:18.520 --> 04:26.280 ugly hard-coding at some places, but I always link 
the issues in the slides, if you download 04:26.280 --> 04:33.480 them online afterwards. So that's basically the first part of the presentation already done, 04:33.480 --> 04:40.920 I think I am too fast, but yeah, more time for questions. So we build a graph from the 04:40.920 --> 04:49.320 kernel build by looking through the .cmd files, these two cases that I have shown, we have the 04:49.320 --> 04:56.360 include binary statements and some hard-coding, and that results in this build graph. And 04:57.240 --> 05:03.560 we have built tooling, just to understand what's happening, that can visualize that. If you 05:04.120 --> 05:12.440 run the scripts, it's a JavaScript, zoomable visualization of the kernel build, and you can find out 05:12.440 --> 05:19.800 how it all relates. And we also worked on validating whether it is complete, since we might miss some 05:19.800 --> 05:30.120 stuff, so we compared our data against the output of an strace'd kernel build, and saw that we 05:30.120 --> 05:39.160 have a 99.6% overlap. So it's still not 100%, so there are still some gaps, but the strace'd build 05:39.160 --> 05:48.600 also over-reports some parts, so we are still moving towards 100%. And the second thing that we did 05:48.680 --> 05:54.680 is we just removed all files that are not listed, and the kernel still built, so that's also a good sign. 05:59.560 --> 06:07.960 Yeah, that was part one, that's the data and how we get that, how we build the graph, and 06:08.040 --> 06:18.840 part two is what's to be generated from that data. 
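The completeness check against an strace'd build can be pictured as a set comparison. The sketch below is only an assumption about how such an overlap figure might be computed (share of traced files that also appear in the .cmd-derived graph); the talk does not define the metric, and `compare_file_sets` is a hypothetical name.

```python
def compare_file_sets(graph_files, traced_files):
    """Compare files found via the build graph with files seen by a trace."""
    graph = set(graph_files)
    traced = set(traced_files)
    common = graph & traced
    return {
        # Fraction of traced files that the graph also knows about.
        "overlap": len(common) / len(traced) if traced else 1.0,
        # Candidates for gaps still to be filled in the graph.
        "missing_from_graph": sorted(traced - graph),
        # Files the graph lists but the trace never saw.
        "only_in_graph": sorted(graph - traced),
    }


if __name__ == "__main__":
    report = compare_file_sets(
        ["init/main.c", "include/linux/types.h"],
        ["init/main.c", "include/linux/types.h", "include/linux/init.h"],
    )
    print(report)
```

Because a trace also records files that are merely opened or probed, entries in `missing_from_graph` would still need manual review before being treated as real gaps.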
We are generating SPDX files as SBOMs, 06:21.400 --> 06:28.680 I think we have heard a lot about SPDX today already, and matching what we have heard today, 06:28.680 --> 06:36.120 we are generating three different SBOMs. The output SBOM, I think that's not one of the listed 06:36.200 --> 06:43.880 names of types, but it's this single SBOM that describes the package, that contains the 06:43.880 --> 06:50.280 metadata and contains just the high-level information. It is very small and can be shared easily, 06:50.280 --> 06:56.200 it has some of the essential hashes, it has some of the essential metadata, and it's just the 06:56.920 --> 07:02.280 small thing that can be shared. We have the source SBOM that contains the file-level information, 07:03.000 --> 07:09.400 and we have the build SBOM that represents the whole graph that links the sources to the final 07:09.400 --> 07:17.240 output. And yeah, there's an edge case: to distinguish between what is a source file and what is 07:17.240 --> 07:24.040 an intermediate file, we need to have an out-of-tree build so that these two parts are 07:24.040 --> 07:31.720 then in two different directories, that's how we distinguish them. So yeah, and just to also have 07:31.720 --> 07:40.520 a slide with way too small a font, that's how the internal structure is. It's a lot, but I have split it up 07:40.520 --> 07:46.680 into all the individual pieces, and that's the remainder of my presentation, just to explain how 07:46.840 --> 07:54.200 that all looks in detail. 
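The out-of-tree requirement can be illustrated with a small path check: with separate source and output directories, a file's location alone tells whether it is a source or a build product. This is a hedged sketch, not the tool's logic; `classify` and the directory names are hypothetical, and `is_relative_to` needs Python 3.9 or newer.

```python
from pathlib import PurePosixPath


def classify(path, src_tree, obj_tree):
    """Label a file as 'source', 'build' or 'external' by its directory."""
    p = PurePosixPath(path)
    # Check the output tree first in case it is nested inside the source tree.
    if p.is_relative_to(obj_tree):
        return "build"
    if p.is_relative_to(src_tree):
        return "source"
    return "external"


if __name__ == "__main__":
    for f in ("/work/linux/init/main.c", "/work/build/init/main.o"):
        print(f, classify(f, "/work/linux", "/work/build"))
```

With an in-tree build both kinds of file share one directory, so a check like this could not tell them apart, which is the edge case mentioned above.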
So the source SBOM that contains the file information, it's basically 07:54.200 --> 07:59.960 filled with the hashes of the individual files, some of the statically extractable information like 07:59.960 --> 08:08.600 SPDX license identifiers that are in the file, and we do some basic heuristic to guess what type of 08:08.600 --> 08:19.240 file it is, if it is a source file, if it is an asset, all these things we try to guess. And that's 08:19.880 --> 08:26.040 a simple entry of one of these files, just what I've said, machine-readable as JSON-LD, 08:27.640 --> 08:33.640 and the source SBOM is basically a huge list of these, listing exactly the used files, 08:33.720 --> 08:41.880 and they are linked to the licenses with the hasDeclaredLicense relationship, since this is part 08:41.880 --> 08:53.000 of the graph structure within SPDX. And on the other hand, on the other side of the spectrum, 08:53.000 --> 09:01.320 there's this output SBOM that contains the software package for the Linux kernel and for all the 09:01.320 --> 09:12.200 modules that were contained in the build, and these packages are linked to the files that 09:12.200 --> 09:17.960 represent the packages with the hasDistributionArtifact relationship, so that's the first bridge from 09:17.960 --> 09:24.440 the metadata to files in the file system. And these distribution artifacts are also the tip of the 09:24.440 --> 09:35.320 build tree in the end, and that's the high-level build element, since we also 09:35.960 --> 09:47.160 generate the build structure. That's the top of the build tree, it describes the, so it's basically 09:47.240 --> 09:56.360 the entry point of the top-level build. But to go deeper, there's the build SBOM in between 09:56.360 --> 10:04.120 that describes all the real details: what command was used, which files depend on which 10:04.120 --> 10:10.440 file, so this then encodes the whole tree, and this is the middle file, 10:10.840 --> 10:20.200 and all these 
small builds that are listed, or the build objects that are just encoding a single 10:20.200 --> 10:28.600 command that was run during the build, they are linked with the ancestorOf relationship to the high-level 10:28.680 --> 10:42.840 build. And that's an example of that data: it's a build step, it has a comment that describes 10:42.840 --> 10:50.440 what command was used, and it has two relationships, which link to the inputs on the 10:50.440 --> 10:56.520 one hand, and which link to the outputs on the other hand. So in the end it's a build element that 10:56.600 --> 11:03.640 has these two relationships, and that links files in the file system together, and that builds the graph. 11:06.760 --> 11:17.240 So that's the file structure, and that contained a lot of details that I just went over pretty fast, 11:18.200 --> 11:25.880 maybe there are questions in the end. But yeah, what's next with that project? The big goal 11:25.960 --> 11:36.920 currently is to get it into the kernel source, to get it into a subdirectory in the kernel sources, 11:36.920 --> 11:49.560 and make it part of that. There's currently a contribution in process, and if that is 11:50.520 --> 11:59.080 merged in the end, you can just, after a successful kernel build, call make SPDX, no, 11:59.080 --> 12:02.600 make sbom, that's a typo, it should be, make sbom is the command in the end. 12:03.720 --> 12:09.080 I can't, I won't fix that right now, but that's the command in the end, 12:10.280 --> 12:14.680 and that is also described in the email conversation that is currently happening. That's the 12:15.400 --> 12:21.000 current contribution that is in progress, where we are discussing with maintainers 12:21.880 --> 12:27.560 how to adapt, what to fix, and how to get it aligned with the expectations, 12:28.680 --> 12:37.880 and we hope that we soon get a green light and that it gets merged. Not sure in which 12:38.040 --> 12:45.240 release it will arrive in the end, and 
further next steps: we are interested in feedback. 12:45.240 --> 12:53.000 If you want to look at the output files, the CI is generating SBOMs for two, three example builds, 12:53.000 --> 12:59.160 which are uploaded as assets to the CI, so you can look at examples, you can investigate examples. 13:00.040 --> 13:10.600 We are thinking about broadening the support for architectures, since we haven't, for all 13:10.600 --> 13:14.520 architectures, analyzed what the gaps are, what we need to get to 100%, 13:15.640 --> 13:21.800 so there's more work to do. And we want to understand how this could integrate with other 13:22.600 --> 13:29.720 build systems, for example, if Yocto builds something containing the kernel and itself builds 13:29.720 --> 13:35.400 SPDX, how to bridge the gap, how to interlink them, and how to have them pointing at each other. 13:35.400 --> 13:42.120 I see a thumbs up, yay, and that's it. On the right, that's the QR code to the repository, 13:43.000 --> 13:49.960 and on the left are my coordinates, you can reach out to me. And now questions. 13:51.800 --> 14:09.160 So BSI is asking for SHA-512 hashes, is it possible to potentially parameterize it so people can 14:09.240 --> 14:20.920 generate the SHA-512 hashes automatically? We are using, so, we're using SHA-256, yeah. 14:23.640 --> 14:29.160 For sure it's possible, we are using the native Python support for generating hashes. 14:29.160 --> 14:33.160 The whole library is built without any external dependencies, that was a requirement, 14:33.160 --> 14:41.000 that's why we use no other open source SPDX tools, but yeah, I think the support is there, 14:41.000 --> 14:47.160 we just need to add a switch, or we can add parameters. Yeah? 
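Parameterizing the hash algorithm with only the Python standard library could look roughly like this. It is a sketch under the assumption that files are hashed in chunks via `hashlib`; the function name `hash_file` and its parameters are hypothetical and not the tool's actual interface.

```python
import hashlib


def hash_file(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file using a selectable hashlib algorithm."""
    # hashlib.new raises ValueError for algorithm names it does not support.
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # Read in chunks so large build outputs are never fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, hash_file(name, "sha512"))
```

Switching from SHA-256 to SHA-512 is then a matter of passing a different algorithm name, which is the kind of switch or parameter mentioned in the answer.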
15:03.160 --> 15:14.920 So the question was, what do I think about incompatibility, compatibility 15:14.920 --> 15:25.000 around the ecosystem, so for example, if we go to Yocto and the Linux kernel build, 15:25.000 --> 15:39.640 or what ecosystems do you want to compare? From SPDX to CDX, for example, the conversion between 15:41.080 --> 15:51.160 SBOM ecosystems. I think it's a hard problem. I think, for example, the build tree that is 15:51.160 --> 15:56.520 encoded here, that has an ancestorOf relationship that links the high-level build and the 15:56.520 --> 16:03.480 low-level builds, which is a concept, I don't know whether there's something that it can 16:03.480 --> 16:15.160 compare to. It's a dependency in CycloneDX, but at the same time, tools that 16:15.160 --> 16:26.280 understand dependencies, okay, Anthony, your name was Anthony, right? Anthony said that it would 16:26.280 --> 16:33.000 be a dependency in CDX and you would need to add a comment or metadata to encode that it's 16:33.960 --> 16:43.320 basically a dependency of builds, and another dependency of dependencies, so, um, there's a lot to do. Yeah? 16:43.320 --> 16:51.320 Yes, the Linux kernel has put the SPDX IDs in the source code, yeah, and partially in the 16:51.320 --> 17:00.520 deprecated syntax, GPL-2.0 instead of GPL-2.0-only, or -or-later, so I try to 17:00.520 --> 17:11.320 convince them to correct that, because we don't care. But even on the SPDX, the question was that 17:11.880 --> 17:19.480 the Linux kernel still uses GPL-2.0 as, uh, identifiers, which are deprecated, it's deprecated, 17:19.480 --> 17:27.960 the right word, but they are still correct, valid. Yeah, but the kernel people want to be short. 
17:28.760 --> 17:38.040 Yeah, um, I can repeat the answer: the FSF requires the -only, the kernel wants to keep it short, 17:38.040 --> 17:44.280 but I still think that GPL-2.0 is still a valid identifier, deprecated but still 17:44.280 --> 17:53.080 valid SPDX. Yeah, any other questions, uh, in the back there? So, the architecture differences, 17:53.080 --> 17:59.400 what are the main issues for making this work for, say, PowerPC or RISC-V? 18:00.440 --> 18:06.120 Um, it's this completeness analysis. Um, what are the issues with getting it to work for PowerPC 18:06.200 --> 18:13.720 and RISC-V? The issues are that, um, some of the tools behave differently. Um, we have, 18:13.720 --> 18:21.800 I mentioned that we are parsing the, uh, the commands, so this example here, so there's a command 18:21.800 --> 18:26.600 string that we are parsing. If there are different tools, then we need to write different 18:26.600 --> 18:31.560 parsers that support these tools, that is one, um, problem. And the second problem is that, 18:31.800 --> 18:37.800 um, there might be more things to be hard-coded, there might be more edge cases to be supported, 18:37.800 --> 18:44.680 so the whole completeness analysis needs to be done. And it works, but there's less of a 18:44.680 --> 18:49.560 guarantee that it's complete, and it might run into problems if commands cannot be parsed. 18:50.360 --> 18:54.200 The include binary things in your assembly are different in the other assemblies. 18:55.400 --> 18:57.400 So, thank you very much, Max. 19:01.560 --> 19:03.560 Thank you for giving us your work going.