WEBVTT 00:00.000 --> 00:02.000 You 00:30.000 --> 00:32.000 You 00:48.000 --> 00:50.000 Green 00:50.000 --> 00:52.000 Green is good 00:52.000 --> 00:54.000 So rusty 00:54.000 --> 00:59.000 Thank you 00:59.000 --> 01:02.000 That's completely unnecessary but we'll take it 01:02.000 --> 01:09.000 So yeah, I was just saying we're sorry about the just face stuff. We're a little bit rusty. We haven't spoken since about 2018 when we did the package management 01:09.000 --> 01:13.000 Devroom which I think you've been smashed into doing next year 01:13.000 --> 01:15.000 Back in the day, everybody's coming back to FOSDEM 01:15.000 --> 01:18.000 I just dumped you in that, you can punch me later 01:19.000 --> 01:20.000 So thank you all for coming 01:20.000 --> 01:24.000 I appreciate it. It's a really long event and this is the last session of a very long event 01:24.000 --> 01:27.000 And I'm standing between you and most likely a beer 01:27.000 --> 01:30.000 But yeah, thank you for listening 01:30.000 --> 01:34.000 And thank you also to those of you who don't speak English as your first language 01:34.000 --> 01:36.000 I have a tendency to talk too quickly 01:36.000 --> 01:38.000 This is the last session 01:38.000 --> 01:42.000 It's likely that we'll overrun and that there'll be Q&A at the end for which we have plenty of time 01:42.000 --> 01:47.000 So if I'm talking too quickly, just put your hand up and I'll start talking a little less quickly 01:48.000 --> 01:52.000 But yeah, thank you for coming and seeing us 01:52.000 --> 01:54.000 And the other 01:54.000 --> 01:55.000 Oh god 01:55.000 --> 01:56.000 Really 01:56.000 --> 01:58.000 Okay, so 01:58.000 --> 02:02.000 I did write a bit in this but it's a bit silly 02:02.000 --> 02:05.000 It says I kind of feel like Elton John closing down Glastonbury 02:05.000 --> 02:09.000 Which is basically just a way of introducing who we are and where we're from 02:09.000 --> 02:11.000 So Andrew 02:11.000 --> 02:15.000 Lives about ten
minutes away from Glastonbury, the world's biggest festival 02:15.000 --> 02:17.000 And I live ten minutes away from nowhere 02:17.000 --> 02:19.000 Because I'm in the middle of nowhere in structure 02:19.000 --> 02:23.000 We've been working on open source for about ten years 02:23.000 --> 02:26.000 All started with Heartbleed 02:26.000 --> 02:33.000 And Meredith Whittaker and Ben Laurie who pulled me into some conversations about where the next Heartbleed comes from 02:33.000 --> 02:39.000 And I met Andrew, I think I was talking at a thing and you were building a system called 02:39.000 --> 02:41.000 At the time was it called Libraries? I think it was, wasn't it? 02:41.000 --> 02:42.000 Yeah 02:42.000 --> 02:47.000 That was about being able to point people at good open source projects to kind of contribute to 02:47.000 --> 02:50.000 Years later, you know, we've worked on very many projects 02:50.000 --> 02:54.000 Most recently working on Ecosyste.ms, which was a little bit about 02:54.000 --> 02:56.000 But it's not a sales pitch or anything 02:56.000 --> 02:59.000 We kind of have to in order to get the talk done 02:59.000 --> 03:03.000 But yeah, we've been working for ten years together 03:03.000 --> 03:05.000 And we've got dogs 03:05.000 --> 03:10.000 So on the left is Leo and on the right is Mabel 03:10.000 --> 03:15.000 I've got one dog, you don't, you have, is that Luna or Felix?
03:15.000 --> 03:19.000 That's Felix, that's Blossom, the wookie 03:19.000 --> 03:22.000 That's Luna, the latest to do 03:22.000 --> 03:25.000 So I just have the one dog, Andrew's got four 03:25.000 --> 03:30.000 So our talk is labelled 03:30.000 --> 03:33.000 I'm not afraid of a bit of clickbait, I'm very sorry 03:34.000 --> 03:36.000 But I will defend it if necessary 03:36.000 --> 03:42.000 But the honest fact is that we're treating this genuinely as a kind of research question 03:42.000 --> 03:46.000 So after saying, you know, and working in this space for ten years 03:46.000 --> 03:51.000 Are we any closer to saying that we are supporting our digital infrastructure? 03:51.000 --> 03:55.000 And we're going to treat this as kind of two hypotheses 03:55.000 --> 03:58.000 The first part is going to look at what our opinion is 03:58.000 --> 04:02.000 Which is that usage is the best determinant of software criticality 04:03.000 --> 04:05.000 Which we hope to convince you of, we might not 04:05.000 --> 04:08.000 But genuinely, you know, we're here to try to convince ourselves 04:08.000 --> 04:11.000 We've already kind of convinced ourselves, but that's that 04:11.000 --> 04:14.000 And then part two, if you assume that that is true 04:14.000 --> 04:17.000 We believe that most funding that we see 04:17.000 --> 04:19.000 And there's a very specific word in there 04:19.000 --> 04:22.000 Is being misdirected 04:22.000 --> 04:26.000 It is not being, it's not used to fund critical infrastructure 04:26.000 --> 04:28.000 It's going elsewhere 04:29.000 --> 04:32.000 So what methods do we have at the moment?
04:32.000 --> 04:34.000 This is looking at the first part 04:34.000 --> 04:39.000 So is usage the best determinant of critical kind of open source project 04:39.000 --> 04:44.000 What methods do we currently use for identifying critical infrastructure 04:44.000 --> 04:48.000 Both within an organisation and within the broader landscape 04:48.000 --> 04:51.000 And we've come up with basically two methods that we see in the world 04:51.000 --> 04:53.000 Which we're characterising thusly 04:53.000 --> 04:57.000 The first is the divining rod method in which one person 04:57.000 --> 05:00.000 An organisation, company or maybe an individual 05:00.000 --> 05:03.000 Basically decides, decides what's critical 05:03.000 --> 05:05.000 Decides what's, you know, important in the world 05:05.000 --> 05:09.000 Just kind of looks using the crossed sticks and says that that's that 05:09.000 --> 05:12.000 Or we do the same thing but we have a group of people in the organisation 05:12.000 --> 05:14.000 Which we call the Ouija board method 05:14.000 --> 05:16.000 In which you all place a hand on the Ouija board 05:16.000 --> 05:19.000 And one of us is pushing a little bit more than the other 05:20.000 --> 05:23.000 And we end up, yes, towards your own projects 05:23.000 --> 05:26.000 I'm not going to say the name that you want to say 05:28.000 --> 05:31.000 But yeah, I believe that we can do better 05:31.000 --> 05:34.000 And I believe the evidence-based methods 05:34.000 --> 05:39.000 Because, you know, ultimately we've come from a mathematical background in robotics and computer security 05:39.000 --> 05:44.000 The evidence-based methods are going to be the way in which we can do this 05:44.000 --> 05:46.000 I'm going to make a point about that 05:46.000 --> 05:51.000 But what evidence-based methods can we access currently and what do we see in the world?
05:51.000 --> 05:55.000 Well, we see things like popcon, the popularity 05:55.000 --> 05:59.000 Contest for Debian packages that basically just takes a log 05:59.000 --> 06:03.000 Of who is downloading and using packages in the Debian ecosystem 06:03.000 --> 06:05.000 We've got something similar in Homebrew 06:05.000 --> 06:08.000 Where they show you all the downloads that they've got 06:08.000 --> 06:11.000 And Andrew once scared a room full of Rubyists 06:11.000 --> 06:12.000 Is that what they call them? 06:12.000 --> 06:18.000 In 2018 by doing the same in an npm package, or was it a Ruby package? 06:18.000 --> 06:22.000 I put Google Analytics in my JavaScript packages 06:22.000 --> 06:25.000 So that I could get real time install information 06:25.000 --> 06:29.000 And then just presented the real time analytics page 06:29.000 --> 06:31.000 During your conference 06:31.000 --> 06:33.000 Which scared the life out of everyone in there 06:33.000 --> 06:37.000 Because they didn't realize how closely you could watch all of the activity 06:37.000 --> 06:40.000 Because post-install scripts of 06:40.000 --> 06:43.000 JavaScript packages let you do whatever you like 06:43.000 --> 06:45.000 And they were all in the one room 06:45.000 --> 06:46.000 They were all in the room 06:46.000 --> 06:48.000 Which was good fun 06:48.000 --> 06:51.000 So we've got a bit of a close up of the results there 06:51.000 --> 06:52.000 Thank you very much 06:52.000 --> 06:54.000 This is incredibly impractical 06:54.000 --> 06:57.000 There are other things that we can do to gather data 06:57.000 --> 07:00.000 So we have something that is effectively doing that 07:00.000 --> 07:02.000 But on a larger scale, Scarf 07:02.000 --> 07:05.000 Which is basically a gateway in front of your own kind of download URLs 07:05.000 --> 07:07.000 So that you gain access to the information 07:07.000 --> 07:09.000 Taking ownership a little bit more 07:09.000 --> 07:12.000 Or we can do the Linux
Foundation's approach 07:12.000 --> 07:15.000 Which is aggregating data from FOSSA, Snyk, etc 07:15.000 --> 07:18.000 That is representing proprietary usage 07:18.000 --> 07:20.000 In an aggregate manner 07:20.000 --> 07:22.000 And they can continue to do this 07:22.000 --> 07:24.000 The Linux Foundation in 2022 07:24.000 --> 07:27.000 Published their mobilization strategy that said they want to create 07:27.000 --> 07:30.000 A data lake that companies can basically dump data into 07:30.000 --> 07:33.000 And that was one of their kind of main 07:33.000 --> 07:38.000 Streams of their report that was published just after their second 07:38.000 --> 07:41.000 I think it was meeting in the White House 07:41.000 --> 07:44.000 So you know who will like those 07:44.000 --> 07:48.000 My argument is basically regardless of how much information you 07:48.000 --> 07:52.000 Think you can collect on proprietary usage of open source software 07:52.000 --> 07:56.000 You are never going to be able to say that that's representative of the whole 07:56.000 --> 07:59.000 You are never going to boil that ocean 08:00.000 --> 08:02.000 I don't believe that is the right approach to take 08:02.000 --> 08:05.000 I think we can basically save ourselves some time 08:05.000 --> 08:09.000 By instead trying to gather all of the data that we can see 08:09.000 --> 08:11.000 The available data 08:11.000 --> 08:15.000 Be able to demonstrate a correlation to what is 08:15.000 --> 08:18.000 Representative of overall usage 08:18.000 --> 08:22.000 And then say that that correlation gives 08:22.000 --> 08:28.000 A strong enough indication for us to say that what we can see in the real world 08:29.000 --> 08:33.000 Leads us to believe what, you know, holds for all 08:33.000 --> 08:37.000 Usage of open source, which at this point I'm sure that Andrew can say 08:37.000 --> 08:40.000 A lot better than me with some data 08:40.000 --> 08:45.000 Yes, now we agreed before not to make me the quant of
open source 08:45.000 --> 08:50.000 But what we've ended up with is now I'm going to do the quant section of 08:50.000 --> 08:52.000 Of the talk 08:52.000 --> 08:54.000 Yes 08:54.000 --> 08:56.000 It was your plan all along, wasn't it? 08:56.000 --> 09:05.000 Okay, so we want to be able to use all the data that we have about open source 09:05.000 --> 09:12.000 To try and get a good picture of how people use it, both within open source 09:12.000 --> 09:16.000 Projects and within closed source projects, to be able to try and say 09:16.000 --> 09:20.000 Here is a global representation of usage of open source 09:20.000 --> 09:25.000 Especially from kind of like a relative perspective so you can say these projects are used more than these projects 09:25.000 --> 09:29.000 Which will help us define which projects are critical 09:29.000 --> 09:37.000 What I have been working on is collecting and normalizing data from every different software ecosystem possible 09:37.000 --> 09:44.000 And then mining the dependency data of every open source project that is available 09:44.000 --> 09:49.000 So far I have a database table in Postgres with 20 billion rows in it 09:49.000 --> 09:54.000 And it's a little bit of a headache but the value that you can get out of that 09:54.000 --> 10:07.000 And that we will explain is that we can use data from this open source usage to imply data about proprietary usage as well 10:07.000 --> 10:13.000 By looking at correlations with other measures that include proprietary usage of open source software 10:14.000 --> 10:21.000 And hopefully that will give us a number of 10:21.000 --> 10:26.000 Formulas and kind of statistics that give us confidence enough that we can draw those parallels 10:26.000 --> 10:32.000 Without having to collect all open source usage within proprietary software across the whole world 10:32.000 --> 10:39.000 And instead just use the data that is available and then kind of work just from that 10:39.000 --> 10:44.000 We actually are
able to reproduce this data and share it and other people can do the same thing 10:44.000 --> 10:48.000 And where we don't have the data for downloads and so on 10:48.000 --> 10:54.000 We can infer it using the information that we do have because we've shown strong correlation in other ecosystems 10:54.000 --> 11:07.000 So yes, we believe hypothesis number one: we can work out what this usage figure is to be able to give us a picture of what are the most critical pieces of software 11:08.000 --> 11:19.000 Let's have a look at some examples of kind of the approach that we've taken to suggest that open source usage is a good parallel for 11:19.000 --> 11:25.000 Closed source usage or at least good enough that we can work with that to then move forward 11:25.000 --> 11:31.000 This is a graph of downloads, this one is 11:31.000 --> 11:41.000 Packagist PHP packages, some of the most used PHP packages. As you see, you can imagine this long tail of dots that ends up in zero zero is pretty massive 11:41.000 --> 11:43.000 But 11:43.000 --> 11:51.000 It's simplified just so that I could actually render this in a browser because five million dots is too many dots for Chart.js 11:51.000 --> 11:53.000 Yeah, well 11:53.000 --> 12:05.000 So the other axis that we have here is the number of open source repositories that depend on each one of these PHP packages 12:06.000 --> 12:15.000 Looking at the correlation between downloads which includes everyone who is using PHP packages in their proprietary software and 12:15.000 --> 12:22.000 Their open source software against just how many times they're declared as a dependency in an open source project 12:22.000 --> 12:28.000 You can see this actually has quite a nice trend and we can look and see other ecosystems 12:28.000 --> 12:33.000 This is the Rust ecosystem and actually has an even stronger correlation 12:33.000 --> 12:44.000 It's still a little bit hard to see here, but you can definitely see
the trend. Luckily, we have maths that can actually quantify that in a very useful way 12:44.000 --> 12:50.000 The Pearson correlation coefficient, comparing the number of 12:51.000 --> 12:54.000 It's been a long FOSDEM, the number of 12:54.000 --> 13:03.000 Repos, open source repos that depend on these packages versus the number of downloads that those packages have received over all time 13:03.000 --> 13:13.000 If the correlation is above 0.8 it's very very strong. We can be confident that one will infer the other, if one is large the other will be large 13:14.000 --> 13:25.000 Going down the scale, right? Let's have a look at some examples of what we measured in open source usage for a number of very large software ecosystems 13:25.000 --> 13:39.000 So Rust and Packagist are incredibly correlated. Basically if you know how many people in open source depend upon a PHP package or a Rust package you can guess this is going to be a similar level 13:40.000 --> 13:49.000 Relatively, like it's not going to give you an absolute number, but relatively it's going to be able to go I can infer this.
This is really helpful because as you notice there isn't a Go 13:49.000 --> 13:56.000 Row here, you can't get Go download figures, they're just not available, but we can measure dependent 13:56.000 --> 14:07.000 Repo usage in open source of Go packages. If the correlation is strong we can imply that this, we can kind of infer what the download stats would be if we had them 14:07.000 --> 14:17.000 And people much smarter than us with a statistics background could do a lot more. I'm not really a quant, I'm a Ruby developer 14:17.000 --> 14:26.000 We have all of this data that is available and public for people to build more interesting statistical models than I am able to do for a conference talk 14:26.000 --> 14:36.000 To help us do really interesting insights into different ecosystems and across all of open source potentially 14:36.000 --> 14:44.000 To be able to like really build out strong models that imply what we believe, the correlation looks very strong 14:44.000 --> 14:50.000 To give us a picture of what proprietary usage of open source is even if we don't have that data 14:51.000 --> 14:59.000 And we actually had a funding proposal that we put out, it was turned down unfortunately. Are you sure you want to throw that shade? 14:59.000 --> 15:05.000 But if anyone is interested in funding that research then do come and say hello 15:06.000 --> 15:10.000 You can see 15:10.000 --> 15:16.000 So having established that we believe 15:16.000 --> 15:19.000 You can tell we don't do this often but we do work together a lot 15:20.000 --> 15:31.000 Right, okay, so having proven we believe that you know, there's a very strong correlation between observable usage once saying and kind of all usage of open source software 15:31.000 --> 15:39.000 The next thing is that if you agree with that can we demonstrate that funding for critical infrastructure is being misdirected 15:40.000 --> 15:56.000 At which point I have to obviously say no, don't I, because the data that we do have, being
predominantly from GitHub Sponsors and Open Collective, is not specifically directed for projects on the basis of their criticality 15:56.000 --> 16:07.000 So straight off the bat I have to kind of say no, but we can run with it and we can see where we get to and we can see the funding that we do have available that we can see 16:07.000 --> 16:15.000 We can take a look at that, see where it's directed, see how well it matches up and then we can talk a little bit about you know lack of data effectively 16:15.000 --> 16:22.000 There are plenty of other sources, I'm going to talk about them a little bit in a second but I think now basically I have to sub you back in, sorry 16:22.000 --> 16:33.000 Okay, so let's have a look at some more graphs and I kind of want to give an example of two different data sets and then see what happens when we smash them together 16:33.000 --> 16:37.000 And it's a little bit of a car crash but it should be fun 16:37.000 --> 16:47.000 So this is a graph of all of the accounts that are on GitHub Sponsors by the number of people and other accounts that have sponsored them 16:47.000 --> 16:50.000 We don't have data for how much they gave 16:50.000 --> 16:53.000 GitHub, we'd love that data, that would be great 16:54.000 --> 16:58.000 But for now we can use this as a proxy 16:58.000 --> 17:03.000 It's a very nice graph, you might have seen other graphs like this, the 17:03.000 --> 17:10.000 The kind of long tail of anything ends up looking very much like this where there's one account at the end 17:10.000 --> 17:18.000 That has like more sponsors than absolutely everyone else, trailing off to these accounts that all have like one or two sponsors ever 17:18.000 --> 17:25.000 Again, we don't know exactly how much money ZIG tools could have received, like absolute huge amounts of sponsorship 17:25.000 --> 17:27.000 But we have to go with what we've got 17:27.000 --> 17:31.000 We know that, what was it, 68 million last year was given in sponsors 17:31.000 -->
17:33.000 Yeah, so 17:33.000 --> 17:34.000 I think it was 17:34.000 --> 17:39.000 Yeah, I have these... Abby's not going to answer that question right now, legal told her to stay quiet 17:40.000 --> 17:47.000 We also have very nice data from Open Collective, there's maybe 17:47.000 --> 17:55.000 2,500 collectives in Open Source Collective and again very similar graph 17:55.000 --> 18:02.000 I like this because you know we got two different data sources and the graph comes out looking kind of similar, that's encouraging. The area under this 18:02.000 --> 18:07.000 This line is approximately 50 million dollars 18:07.000 --> 18:12.000 Which is cool, but also like you can't really see the line 18:12.000 --> 18:19.000 I've actually cut this graph off because it goes over into like the next building. It's a very very long tail 18:19.000 --> 18:27.000 What's interesting here is then like we have these funding stats for how much projects have received in 18:27.000 --> 18:33.000 total amount of dollars and let's use Open Collective because we have dollars not just number of sponsors 18:33.000 --> 18:45.000 Let's look at all of the Rust packages that are on Open Collective and how much they have raised versus how much usage they have 18:45.000 --> 18:52.000 I wouldn't pay too much attention to the axis here because the numbers in dollars are not 40 million 18:52.000 --> 19:01.000 Dollars that some of these Rust packages have raised, but I had to squish them to get them to show up, otherwise all the funding looks like nothing compared to the number of downloads 19:01.000 --> 19:05.000 So it's relative, right, the 19:05.000 --> 19:14.000 We have some interesting things, the some packages in Rust that are highly used and you know are on Open Collective are actually receiving a good amount of money 19:15.000 --> 19:22.000 But there's also this kind of like the most downloaded doesn't necessarily have the most dollars collected there 19:22.000 --> 19:32.000 If we step through some other
package managers and other ecosystems you start to see everything goes a little bit wobbly and we start to lose our trends a little bit 19:33.000 --> 19:44.000 The Python one has NumPy on here, but of course NumPy is actually under the NumFOCUS foundation and doesn't have very good data here at all 19:44.000 --> 19:49.000 But NumFOCUS as a foundation has a lot of money to be able to support that project 19:49.000 --> 19:57.000 But we haven't mixed that in because it's really hard to discern exactly how much funding from foundations goes into individual projects 19:58.000 --> 20:05.000 JavaScript is actually really popular on Open Collective. It's like one of the first communities to adopt that kind of open funding 20:05.000 --> 20:07.000 It's absolutely all over the place 20:07.000 --> 20:13.000 You notice a real correlation with some of these bars as well, right, where it's 20:13.000 --> 20:17.000 These are all Babel's projects and they're broken up into small modules 20:17.000 --> 20:22.000 How are you supposed to say like okay, well Babel receives lots of money through Open Collective 20:22.000 --> 20:30.000 Which one goes to which package, how can you then kind of align that is actually quite difficult to do and how to how to 20:30.000 --> 20:35.000 Square that is really painful and then if you go to the Ruby ecosystem 20:35.000 --> 20:38.000 You're like oh no, what's happened here? 20:38.000 --> 20:46.000 We have almost no correlation between this funding data and the amount of usage of these projects. This is this is disastrous 20:46.000 --> 20:51.000 I have no confidence in this graph whatsoever, but it's actually based on real world data, right?
20:51.000 --> 20:55.000 We just don't have good data to be able to have... ten minutes left 20:57.000 --> 21:01.000 So the answer is inconclusive 21:01.000 --> 21:04.000 We don't have enough data to be able to have any confidence in these graphs at all 21:04.000 --> 21:06.000 We're like well, that doesn't make any sense, right? 21:06.000 --> 21:10.000 But how do you know? We don't know is kind of the problem 21:11.000 --> 21:16.000 Thank you, it is tiny, unlike me 21:17.000 --> 21:25.000 So as we said at the start of this section, look, you know the data that we're looking at is not targeted at critical projects 21:25.000 --> 21:32.000 We accept that but even if you were to say that it was, the results are basically, you know, budget and not great 21:33.000 --> 21:37.000 So you know, going back to our original research question 21:37.000 --> 21:42.000 After a decade of saying that we need to support our digital infrastructure, are we any closer to saying that we do? 21:44.000 --> 21:51.000 That's basically my presentation, but unfortunately we've got a few more minutes and you have to listen to me 21:51.000 --> 21:54.000 So what do we need? How are we going to address this? 21:54.000 --> 21:56.000 Martin did such a good job on this. 21:56.000 --> 21:58.000 I know, such good 21:59.000 --> 22:04.000 So there are a few things we need and you know, I'm not saying I come up with all the best ideas 22:04.000 --> 22:09.000 But what I am going to do is just amplify good ideas that other people have had at this point 22:09.000 --> 22:13.000 Over a year ago, just about, I think it was a year and two weeks ago 22:13.000 --> 22:18.000 Frank Nagle and a few others wrote this report on the value of open source software 22:18.000 --> 22:23.000 Got some interesting kind of figures thrown around about the demand side 22:23.000 --> 22:27.000 And supply side value of open source. I suggest that you read it.
It's free 22:27.000 --> 22:31.000 But I would like to shout them out because Frank said exactly what I'm going to say 22:31.000 --> 22:36.000 Which is we need better data for financial support both direct and indirect 22:36.000 --> 22:39.000 So that's direct financial support and indirect financial support 22:39.000 --> 22:45.000 Which might be we're employing someone on 50% of their time at whatever rate that they're employed on 22:45.000 --> 22:51.000 To work on that particular project so that we can map what resources are available to whichever project 22:52.000 --> 22:58.000 Now there are some people who are trying to do this and I think Emmy did not manage to make their talk this morning 22:58.000 --> 23:03.000 Because of a train but shout out to IOI, Invest in Open Infrastructure 23:03.000 --> 23:10.000 They have been trying to map 400 million dollars worth of funding, both state and philanthropic, institutional level 23:10.000 --> 23:16.000 To a bunch of open infrastructure projects, a lot of which are open source projects 23:16.000 --> 23:19.000 I'm not saying they're the same thing, please don't shoot me Caitlin 23:19.000 --> 23:24.000 But yes, I would just like to give them a shout out and shout out to Emmy who 23:24.000 --> 23:27.000 Yeah, I wish I could have heard your talk 23:27.000 --> 23:29.000 And before you're alert on that 23:29.000 --> 23:32.000 It's really hard and it's very very manual to do right now 23:32.000 --> 23:40.000 Yeah, yeah, they've manually done that and there are very many issues including how to map actual money that is going to the project 23:40.000 --> 23:46.000 That is being used in administration, the argument about whether that would actually count as support for a project anyway 23:46.000 --> 23:55.000 It is super super complex and basically what we need is to agree on a way to describe these resources for this project 23:55.000 --> 23:58.000 That can be led from the project side or from the funder side 23:58.000 --> 24:01.000 There
are initiatives that have been spoken about 24:01.000 --> 24:06.000 I think IOI maybe released a report earlier this week or maybe it will be coming next week 24:06.000 --> 24:12.000 That says hey, the 360Giving Data Standard was proposed a while back 24:12.000 --> 24:20.000 This is the method that we think we should use for describing how funding is being used and moving to these projects 24:20.000 --> 24:27.000 I actually worked on the Open Contracting Data Standard in 2012 when I worked in civic tech with the World Bank 24:27.000 --> 24:33.000 That was deployed in 2014 to describe, I think it was, most of New York State spending 24:33.000 --> 24:40.000 It's a similar kind of thing and I'm not proposing another standard. I know the XKCD comic, no worries 24:40.000 --> 24:46.000 I'm just saying that there are ways in which we can do this and basically we need better information 24:46.000 --> 24:49.000 If we had that information what could we do with it? 24:49.000 --> 24:51.000 Do I take this bit or do you take this bit?
24:51.000 --> 24:54.000 Yeah, there's no maths in this bit 24:54.000 --> 24:59.000 So let's imagine a world where my graphs weren't terrible 24:59.000 --> 25:03.000 But also we had enough data that you could be like okay, they're really weird 25:03.000 --> 25:08.000 But at least we can rely on them because like it was representative of actually where the money was flowing 25:08.000 --> 25:11.000 And where other kinds of support were flowing as well 25:11.000 --> 25:15.000 Some measure of that all kind of fluff together as well 25:15.000 --> 25:24.000 That would potentially be able to give you a picture of where critical projects are under-resourced or over-resourced 25:24.000 --> 25:30.000 And then when you come to make decisions about where should funding go to a project or where should resources go 25:30.000 --> 25:37.000 You'd be able to make a data-driven, let's call it, decision rather than using the Ouija board or using the divining rods 25:37.000 --> 25:43.000 To kind of just wave yourself around and land on whichever project happens to have a really easy way to donate to them 25:43.000 --> 25:49.000 You'd actually be able to go this is a critical project to my ecosystem that doesn't have enough support 25:49.000 --> 26:00.000 And I can confidently give them support until they get to the point where they are no longer kind of under-produced that term of it's a make-up end-up end-up 26:00.000 --> 26:05.000 I had this concept of under-production which basically is like these projects are really heavily used 26:05.000 --> 26:09.000 But they're absolutely crushed under the weight of the pressure of open source 26:09.000 --> 26:16.000 We can recognize that and we can give them more support so that they can stand up 26:16.000 --> 26:24.000 The project to make it much stronger and more sustainable which then has a knock-on effect of improving everything in its ecosystem 26:24.000 --> 26:33.000 And kind of being able to scale it up to support for the whole world
of open source because the whole world runs on open source now 26:35.000 --> 26:37.000 Thank you, that was very smooth 26:37.000 --> 26:42.000 Yeah, so in addition to that I just want to say you know, we started with the research question 26:42.000 --> 26:45.000 Like can we say that we're supporting and sustaining all of open source? 26:45.000 --> 26:48.000 I'm not saying that finance is the be-all and end-all of open source 26:48.000 --> 26:56.000 I'm saying that it's a constituent component, we can probably represent some of the other implicit, implied and you know kind of hidden 26:56.000 --> 26:58.000 Kind of support for projects 26:58.000 --> 27:06.000 But ultimately what I want to do is be able to prove for ourselves that the thing that we've been working on in our case for 10 years is actually working 27:07.000 --> 27:09.000 That would be lovely if we could do that. Thank you very much 27:10.000 --> 27:17.000 It's not having a go at anyone, it's not you know, throwing any shade or anything but having spent the time that we've spent in this space 27:17.000 --> 27:21.000 I would really really like to be able to know that I had some sort of impact. Thank you 27:22.000 --> 27:32.000 There's plenty more to come, we're going to continue working on this. I've managed somehow to intertwine a couple of organisations that I work at and am director of 27:33.000 --> 27:35.000 So that we can do that 27:35.000 --> 27:40.000 There's very many opportunities for us to work together with many of you out there 27:40.000 --> 27:47.000 All of the data that we've just spoken about is freely available as is all of the data on Ecosyste.ms 27:47.000 --> 27:56.000 This is not a sales pitch.
It is just an open call to say that we are more than happy to work with you on any of these subjects because we genuinely care about the space 27:56.000 --> 28:03.000 So I think that is about it and now we've got a bit of time for questions if anyone has 28:08.000 --> 28:10.000 That's remember to request you 28:14.000 --> 28:24.000 Yes, I have a question and I hope this fits in the bracket of more to come as opposed to reach forward 28:25.000 --> 28:34.000 If we accept the premise that criticality is this kind of a measure, or sorry, usage is a better measure of criticality 28:34.000 --> 28:42.000 Can you extend that with sort of metrics or indicators that are on the project level 28:42.000 --> 28:47.000 And by that I mean sort of indicators about sort of community or project health 28:48.000 --> 29:00.000 So we have opinions. I will repeat the question, so if we take it as a given maybe that usage is a determinant of criticality 29:00.000 --> 29:14.000 How might we include other scores and be able to maybe complement that data. So I will say the first thing to get out of the way is that some measures appear to actively punish projects 29:14.000 --> 29:21.000 Looking at them for not having maybe a correctly formatted license file or, yes, I know that's another issue but let's just ignore that for a second 29:21.000 --> 29:27.000 Or you know not having great documentation and so on. To which point I think that is probably wrong 29:27.000 --> 29:35.000 If you accept that your project is one of the most used in a particular ecosystem and likely used in industry as a result of that, which we've shown 29:35.000 --> 29:40.000 Why would you want to demote some sort of criticality score because they don't have certain documentation 29:40.000 --> 29:49.000 It's still going to have an impact if it blows up but yes, I think you know we're aware of the work of CHAOSS and the work of OpenSSF 29:49.000 --> 30:02.000 And so on in drilling down into how you might
direct your efforts based on measures that you might see in how well the project is managing the demands that are placed upon it 30:02.000 --> 30:05.000 Right so things like you know how well is 30:10.000 --> 30:12.000 You