WEBVTT

01:01.000 --> 01:02.000
Thank you very much.

01:05.000 --> 01:14.400
Well, this is for the recording. Yes, again, thank you for coming. I'm not going to repeat the whole thing, right? So we have...

01:18.500 --> 01:21.000
We received lots and lots of

01:22.000 --> 01:29.200
presentation ideas. We could not fit more; we even got a full day, which was amazing, by the way.

01:31.000 --> 01:33.000
Please, after

01:33.000 --> 01:36.000
today or at some point in the next weeks,

01:36.500 --> 01:43.800
write to the FOSDEM organizers and tell them if you liked this workshop and want it to continue, because that's how they decide

01:44.000 --> 01:46.000
which rooms get which...

01:47.000 --> 01:53.200
which devroom proposals get accepted, which rooms they get, and which days or slots they get.

01:54.000 --> 01:56.000
So, yeah,

01:57.000 --> 01:59.000
we usually have

01:59.000 --> 02:01.000
30-minute slots,

02:05.000 --> 02:10.000
but in order to fit even more, we went to

02:11.000 --> 02:17.000
20-minute slots in the middle of the day. And you'll see my co-organizers are here,

02:17.000 --> 02:19.000
here.

02:27.000 --> 02:29.000
So you'll get this afterwards, by the way.

02:37.000 --> 02:46.000
Some very basic information for the speakers: there's a camera there and a microphone here. You should be talking to the camera.
You should be speaking into the mic.

02:46.000 --> 02:48.000
Everything is recorded, everything is streamed,

02:49.000 --> 02:53.000
so yeah, please take care of that. You should stay on the

02:54.000 --> 02:59.000
correct side of this red line here, because otherwise you'll be out of the picture, and

03:01.000 --> 03:04.000
I think those are the basics.

03:05.000 --> 03:07.000
For our staff,

03:08.000 --> 03:10.000
I'm not sure we want to go through...

03:11.000 --> 03:15.000
Oh, and the times that you see here:

03:16.000 --> 03:20.000
as you've probably noticed, the end time is the start time of the next talk, which is not correct, right?

03:21.000 --> 03:25.000
The end time is not when a speaker should end the talk.

03:26.000 --> 03:33.000
By then, the speaker should have finished, the questions should be done, the speaker and the people who are leaving should have left, and we start the next one.

03:34.000 --> 03:37.000
So imagine that all the end times are five minutes

03:40.000 --> 03:45.000
earlier. So yeah, that's why, instead of 9:10,

03:46.000 --> 03:52.000
it's already 9:07 and I'm out of time. All right, any questions?

03:55.000 --> 03:57.000
Good, perfect. Let's have a great day.

05:48.000 --> 05:53.000
Hello, I don't know if it works. It works. Okay, perfect.
Hello, everyone.

05:55.000 --> 06:03.000
First of all, thank you, Alexios, and the organizers for giving me the opportunity to be here today.

06:03.000 --> 06:06.000
And yeah, let's get started.

06:07.000 --> 06:12.000
Today I would like to start with a quick and simple analogy.

06:13.000 --> 06:20.000
Imagine that you're going to a restaurant, and you're going to book for dinner or lunch, whatever,

06:21.000 --> 06:27.000
and you suddenly start to see that each restaurant has its own unique rating system.

06:27.000 --> 06:35.000
For example, some of them use stars, maybe some use numbers,

06:36.000 --> 06:41.000
maybe some of them use emojis, words, and so on.

06:42.000 --> 06:50.000
The question is: how do you even compare them? This is actually the same challenge that we are facing

06:51.000 --> 06:55.000
when dealing with crypto algorithm detection.

06:58.000 --> 07:04.000
And basically, this blocks good security and compliance, and that's why

07:05.000 --> 07:11.000
our work on SBOMs and crypto standardization is so important.

07:12.000 --> 07:20.000
I'm a software engineer.
I work at SCANOSS, where we provide open source software intelligence

07:21.000 --> 07:25.000
by giving corporations access to our knowledge base.

07:27.000 --> 07:36.000
Today I would like to talk to you about two big updates that we have been working on over the last couple of years.

07:36.000 --> 07:42.000
On the one hand, the release of the crypto algorithms open dataset,

07:43.000 --> 07:51.000
and on the other hand, the decision by SPDX to adopt it as an actual standard.

07:52.000 --> 08:01.000
By the end of this talk, I hope you will have learned how to leverage this dataset to make your work easier, and also

08:01.000 --> 08:07.000
to potentially stop building the same tools over and over again.

08:11.000 --> 08:15.000
To begin with, I would like to talk about the impact, and why

08:17.000 --> 08:20.000
standardized crypto identification matters at all.

08:21.000 --> 08:25.000
First, we have a few key stakeholders, which might be

08:26.000 --> 08:34.000
trade compliance teams, security teams, and companies that are increasingly concerned about post-quantum crypto,

08:35.000 --> 08:40.000
maybe because all these requirements tend to grow year over year.

08:41.000 --> 08:44.000
But here's something that is crucial:

08:45.000 --> 08:47.000
standardization saves money.

08:48.000 --> 09:00.000
When we started this journey, we were spending significant resources on maintaining and updating our crypto detection methods.

09:01.000 --> 09:09.000
So this doesn't only save us money at SCANOSS, it also helps the community be more efficient.

09:09.000 --> 09:16.000
This is our proposal for the community.

09:17.000 --> 09:23.000
We have a crypto algorithm definition list, which has a simple data structure

09:24.000 --> 09:29.000
written in a machine-readable format, so it's extensible,

09:30.000 --> 09:38.000
and we even have some reference code, so you can check that out and use it as a starting point.

09:39.000 --> 09:45.000
But the important thing here is that this is not just something theoretical.

09:46.000 --> 09:55.000
It has also been battle-tested by us in production, scanning billions of files and helping big organizations.

10:00.000 --> 10:06.000
These are the key milestones we have reached so far over the last couple of years,

10:06.000 --> 10:14.000
so let's go deeper into each one of them.

10:15.000 --> 10:21.000
In 2021, customers started to ask us if we could help them

10:22.000 --> 10:28.000
identify the crypto algorithms that were present in their open source projects.

10:29.000 --> 10:31.000
At first, it sounds simple, right?

10:31.000 --> 10:35.000
I mean, yeah, but it's not,

10:36.000 --> 10:44.000
because think about scale: when you're scanning billions of files and you have thousands of projects,

10:45.000 --> 10:49.000
you have a big problem, because you have to be as efficient as possible.

10:50.000 --> 11:00.000
But we recognized that this wasn't just an internal need; it was actually coming from real customers and real use cases.

11:01.000 --> 11:12.000
We started with keyword matching: for each crypto algorithm, we created a definition file

11:13.000 --> 11:22.000
which contains some attributes, such as the algorithm ID. I don't know if you can actually see anything on the screen or if it's too small,

11:23.000 --> 11:25.000
but I'll just read it out.

11:26.000 --> 11:37.000
For each algorithm, we created this kind of definition file, which contains the algorithm ID, the name, the security strength (if any),

11:38.000 --> 11:47.000
and, most importantly, the keyword list, which, as you'll see afterwards, is what we use to do the actual matching.

11:48.000 --> 11:58.000
As simple as it may sound, we realized that this was quite effective for large-scale scanning,

11:59.000 --> 12:04.000
and it also allowed us to be quite precise in the detection,

12:04.000 --> 12:14.000
because we could differentiate between AES-128, AES-256, and so on.
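
NOTE
A minimal Python sketch of the keyword-matching approach described above. It is not the repository's reference script: the field names (algorithm_id, name, strength, keywords), the sample definitions, and the helpers scan_text, scan_folder, read_text and report are hypothetical, and the real dataset keeps its definitions in files rather than hard-coding them. Keywords match case-insensitively and only as whole tokens, which is what lets AES-128 and AES-256 be told apart.
```python
import json
import os
import re
from typing import Dict, List, Optional
# Hypothetical definitions: the field names and values are illustrative,
# not the dataset's actual schema.
DEFINITIONS: List[Dict] = [
    {"algorithm_id": "aes-128", "name": "AES-128", "strength": 128, "keywords": ["aes128", "aes-128", "aes_128"]},
    {"algorithm_id": "aes-256", "name": "AES-256", "strength": 256, "keywords": ["aes256", "aes-256", "aes_256"]},
    {"algorithm_id": "fortuna", "name": "Fortuna", "strength": None, "keywords": ["fortuna"]},
]
def compile_keyword(keyword: str) -> "re.Pattern[str]":
    # Case-insensitive match that may not start or end inside a longer
    # identifier, so "aes-128" does not fire on "aes-1280".
    return re.compile(r"(?<![A-Za-z0-9_])" + re.escape(keyword) + r"(?![A-Za-z0-9_])", re.IGNORECASE)
def scan_text(text: str, definitions: List[Dict]) -> List[Dict]:
    """Return one hit per definition whose keyword list matches the text."""
    hits = []
    for definition in definitions:
        for keyword in definition["keywords"]:
            if compile_keyword(keyword).search(text):
                hits.append({"algorithm_id": definition["algorithm_id"], "keyword": keyword})
                break  # the first matching keyword is enough to report this algorithm
    return hits
def read_text(path: str) -> Optional[str]:
    """Read a file as text, skipping undecodable bytes; None if unreadable."""
    try:
        with open(path, encoding="utf-8", errors="ignore") as handle:
            return handle.read()
    except OSError:
        return None
def scan_folder(root: str, definitions: List[Dict]) -> Dict[str, List[Dict]]:
    """Walk root and map each file path to its hits; files without hits are omitted."""
    results: Dict[str, List[Dict]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            text = read_text(path)
            hits = scan_text(text, definitions) if text is not None else []
            if hits:
                results[path] = hits
    return results
def report(root: str, definitions: List[Dict]) -> str:
    """Serialize the scan results as pretty-printed JSON."""
    return json.dumps(scan_folder(root, definitions), indent=2, sort_keys=True)
```
Calling report("demo", DEFINITIONS) returns JSON mapping each file to the algorithm and the keyword that matched. A bare mention such as "Fortuna" in a string or in a commented-out line is still reported, so this sketch shares the context problems described in the demo.
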
12:17.000 --> 12:21.000
Then something interesting happened.

12:22.000 --> 12:28.000
Customers started asking: what about our own open source projects?

12:29.000 --> 12:36.000
Also, a lot of crypto libraries and frameworks started to pop up,

12:37.000 --> 12:45.000
so we were not able to keep the dataset updated as fast as we needed to.

12:46.000 --> 12:54.000
So we realized that we had something valuable, and why keep it private?

12:55.000 --> 13:02.000
The community was already involved; our customers were already helping us

13:03.000 --> 13:09.000
update the dataset rules, so we just needed to open the door.

13:09.000 --> 13:11.000
And so we did.

13:13.000 --> 13:17.000
In 2024, we made two big moves.

13:18.000 --> 13:22.000
First, we released the dataset under the CC0 license,

13:23.000 --> 13:26.000
which is as close to public domain

13:27.000 --> 13:30.000
as there is in terms of copyright law.

13:30.000 --> 13:41.000
And second, since it was already being recognized as the de facto standard for crypto detection,

13:42.000 --> 13:50.000
we started to talk with SPDX about collaborating and having it adopted as an actual standard.

13:50.000 --> 14:01.000
As you know, SPDX has a license list, so basically this is going to be the same thing,

14:02.000 --> 14:04.000
but for crypto detection.

14:06.000 --> 14:12.000
I have a small demo. I was going to do it live, but you know, things happen,

14:13.000 --> 14:15.000
so I preferred to just...

14:16.000 --> 14:19.000
You have a link to the repo there;

14:20.000 --> 14:23.000
if you want, you can also use the QR code,

14:24.000 --> 14:30.000
and I will show you some screenshots of what it looks like.

14:31.000 --> 14:38.000
Basically, in the same repo where the dataset lives,

14:39.000 --> 14:45.000
we have an example script for the actual detection that leverages the dataset,

14:46.000 --> 14:54.000
and we also created a demo folder with some useful examples
that outline some challenges I will talk about later.

14:59.000 --> 15:04.000
Basically, it's very simple: you just execute the script,

15:04.000 --> 15:13.000
pass the folder that you want to scan, and you get JSON files as a result,

15:14.000 --> 15:20.000
which contain the files where the crypto algorithms were found,

15:21.000 --> 15:26.000
as well as the definition file for each one and which keyword was the actual match.

15:27.000 --> 15:32.000
So it's very, very simple; there isn't any fancy integration or anything like that.

15:32.000 --> 15:38.000
Being this simple, it presents some challenges.

15:39.000 --> 15:46.000
As you can see here in this example, which you can check out in the repo,

15:47.000 --> 15:56.000
we have keywords that are being matched but are in a completely different context.

15:57.000 --> 16:03.000
For example, as you can see, we have a match for Fortuna, which is a crypto algorithm,

16:04.000 --> 16:09.000
but in the code it is not actually being used in a crypto context.

16:10.000 --> 16:13.000
So sometimes a keyword is just a coincidence.

16:14.000 --> 16:20.000
The other challenge is that sometimes you have comments in your code

16:20.000 --> 16:27.000
that don't actually do anything; in this case, the match does nothing because it's all commented out.

16:28.000 --> 16:36.000
But sometimes you have comments that are misleading, or that describe something that is actually the opposite of what the code does.

16:40.000 --> 16:41.000
So,

16:41.000 --> 16:45.000
looking ahead, 2025 seems to be very exciting.

16:46.000 --> 16:50.000
We are going to move the dataset to the Software Transparency Foundation,

16:51.000 --> 16:55.000
we are going to see new implementations in different languages,

16:56.000 --> 17:01.000
and we are going to see the community taking ownership.

17:04.000 --> 17:05.000
So,

17:06.000 --> 17:10.000
usually, open source projects tend to get better as more
people help.

17:11.000 --> 17:18.000
So even if you have different skills and experiences, we want you to be involved.

17:19.000 --> 17:22.000
There are a couple of ways you can help us improve the dataset.

17:23.000 --> 17:32.000
You can create new implementations, share real-world use cases, or even explore whether AI has a role to play here

17:32.000 --> 17:41.000
regarding context awareness. And of course, there are more ways.

17:42.000 --> 17:47.000
And yeah.

17:48.000 --> 17:58.000
At the beginning of the talk, I mentioned restaurant ratings. In open source, we try to work together and collaborate,

17:59.000 --> 18:02.000
but to do this effectively, we need to speak the same language.

18:03.000 --> 18:07.000
That is what this is about.

18:08.000 --> 18:13.000
We have this working solution, we know it helps, and we want you to get involved.

18:14.000 --> 18:15.000
The repo is open,

18:16.000 --> 18:21.000
the tools are there, and we are really looking forward to what you are going to build.

18:22.000 --> 18:25.000
And basically, that's it.

18:26.000 --> 18:30.000
Thank you very much, and if you have any questions, please ask.

18:32.000 --> 18:33.000
Questions?

18:34.000 --> 18:39.000
Thank you very much. Actually, I still don't really understand the use case.

18:40.000 --> 18:46.000
The well-known advice is: don't roll your own hash algorithm or crypto, use a library,

18:46.000 --> 18:51.000
a crypto library, and there are tools that can find those.

18:52.000 --> 18:58.000
And probably you don't find [inaudible] use of any library.

18:59.000 --> 19:03.000
So why would someone be interested in it?

19:05.000 --> 19:07.000
I don't know if I understand the question correctly.

19:09.000 --> 19:13.000
The question was: why would we actually use this?

19:13.000 --> 19:15.000
Yeah, what's the use case?

19:17.000 --> 19:19.000
Well, yeah, if you're in,

19:20.000 --> 19:23.000
for example, your export control
department, you want

19:26.000 --> 19:31.000
to know all the crypto algorithms that you have in your composition before you can actually

19:32.000 --> 19:36.000
ship the product to certain countries, because you don't...

19:36.000 --> 19:38.000
That does make sense.

19:39.000 --> 19:42.000
Yes, that's one example. The other one is security compliance: there are certain security compliance

19:43.000 --> 19:47.000
frameworks that require what some people are starting to call a CBOM.

19:50.000 --> 19:57.000
Another important one is that, in order to understand the end of life

19:58.000 --> 20:03.000
of your product, you need to understand the end of life of the crypto side of the product.

20:03.000 --> 20:05.000
There are a lot of people