WEBVTT 00:00.000 --> 00:12.000 I am very, very delighted to introduce Alexis Jacomy, who is the author of an exciting 00:12.000 --> 00:19.600 project, and his talk is going to be about developing custom UIs to explore graph databases using 00:19.600 --> 00:32.960 sigma.js. Take it away, thanks. Hello, so I'm Alexis, I'm a web developer at OuestWare, 00:32.960 --> 00:40.160 which is a small firm in France, in Nantes, and we mostly develop web applications for data exploration. 00:41.840 --> 00:47.760 We produce some open source code, including sigma.js and a tool which is named 00:47.760 --> 00:57.680 Gephi Lite, the web version of Gephi, something else graph related, and I'm here to talk 00:57.680 --> 01:08.320 about developing custom UIs for exploring graph databases. So let's start with RICardo. RICardo is a 01:08.400 --> 01:18.640 research project from France to explore international commerce in the 19th and early 20th centuries. 01:20.240 --> 01:26.080 It was started, I think, at the Sciences Po médialab, by people who are also big fans of 01:26.800 --> 01:35.840 FOSDEM and open source software, and the main goal is to craft datasets to explore international 01:35.840 --> 01:47.680 trade between sovereign entities from 1830 to 1930. The core dataset is made of trades. 01:48.480 --> 01:54.640 Basically, researchers took huge journals from various countries where they listed 01:55.360 --> 02:04.480 the trades they had with other entities, in varying currencies. In RICardo, we don't care what people buy, 02:04.560 --> 02:14.800 we just care about the monetary flows. This is the core starting block of this project. 
02:16.000 --> 02:23.360 The main issue is that the entities that are reported as partners by sovereign countries are 02:23.360 --> 02:30.640 not always sovereign entities, and this dataset is full of France, Southern France, Eastern Europe, 02:30.720 --> 02:39.760 US Atlantic coast, this kind of thing, and this is not good because people wanted to explore 02:39.760 --> 02:46.640 trades between sovereign entities. So researchers started another project, which is named 02:46.640 --> 02:55.040 GeoPolHist, and it's a database of sovereign countries over time, because sovereignty evolves 02:55.120 --> 03:02.560 in the 19th and early 20th century, so it's even harder to apprehend, and we have this new 03:02.560 --> 03:08.720 dataset with entities and connections between them. Things like Paris is a part of France, 03:10.400 --> 03:19.120 France is sovereign from this date to this date, etc., etc. You can check the page of this project 03:19.200 --> 03:28.480 because they have lots of exploration tools for this specific dataset. Another part that sounds 03:28.480 --> 03:36.800 necessary given what I told you: we have trades reported in varying currencies, in varying countries, 03:36.800 --> 03:45.520 at varying times, so people from RICardo also crafted datasets to get all the 03:45.520 --> 03:52.720 exchange rates over time, and so we can finally have raw, yearly trade reports 03:52.720 --> 03:57.520 with normalized monetary values, and relations between entities, and that's what we're going 03:57.520 --> 04:04.000 to focus on today. We have this huge network between entities, and we have connections that are 04:04.000 --> 04:11.840 trades, and connections that are geopolitical, kind of. So let's put everything in a 04:11.840 --> 04:21.040 Neo4j database (this is the "draw the rest of the owl" slide). We have a Neo4j database, nice, 04:23.040 --> 04:30.480 and we can open Neo4j Browser. I don't know who is familiar with the Neo4j Browser here. 
04:33.520 --> 04:40.480 So I opened it in the Neo4j Browser, and the first image I got when I opened it was something like this, 04:41.440 --> 04:48.800 because the issue is that for each year, for each pair of entities that reported trades 04:50.000 --> 04:59.200 with each other, I have one edge, and I have 105 years, so if I want to just draw the network, 04:59.200 --> 05:07.920 it's directly unreadable. So we need to find better strategies to represent this mess. 05:08.240 --> 05:17.840 Also, we want to extract, so in the RICardo project, they have lots of heuristics, 05:17.840 --> 05:25.040 etc., and code to actually generate network graphs of trades between sovereign entities. 05:25.040 --> 05:30.080 But here, since I kept the trades, the raw trades, as they were in the initial dataset, 05:30.080 --> 05:40.400 I don't necessarily have direct trades. I can have trades between Paris and the UK, or Belgium and 05:40.400 --> 05:44.320 the North of France, this kind of thing, and we want to be able to actually observe them. 05:46.560 --> 05:55.680 And finally, the Neo4j console was very good to actually spot some issues, because we have 05:55.760 --> 06:04.400 lots of entities that trade with themselves, and that's probably bugs in the scripts that 06:04.400 --> 06:14.560 took the original sources and generated this raw trades dataset. So it's a bit tricky to explore, 06:14.560 --> 06:22.080 and it's a good use case for some custom UIs. So I'm going to talk about sigma.js now. 06:22.400 --> 06:28.560 It's a JavaScript library we develop at OuestWare to draw networks on web pages, 06:29.760 --> 06:37.440 and it's focused, within its ecosystem, on building applications for network analysis. 
06:38.480 --> 06:43.520 It's not very good at drawing schemas, Cytoscape.js will be better. It's not necessarily 06:43.520 --> 06:49.200 great at handling very custom renderings, and interactivity with small networks is 06:49.200 --> 06:55.600 difficult as well, but as soon as you want to display larger graphs, it's a very good tool. 06:55.600 --> 07:02.800 And one of the main reasons for that is that it just handles rendering, and another open source tool 07:03.760 --> 07:10.560 developed at the Sciences Po médialab again, graphology, handles everything that is computation related. 07:11.520 --> 07:14.960 So we have graphology, which is basically a graph model for JavaScript 07:15.680 --> 07:22.080 that provides a lot of algorithms to compute metrics and scores, to apply layouts to the network, 07:22.080 --> 07:28.560 etc. And then we give these refined networks to sigma.js, which is just going to render them on web 07:28.560 --> 07:38.400 pages, and then we can focus on interactivity. So I forgot about the slides. 07:38.480 --> 07:44.480 And now, there are lots of features. If you go to the website, 07:44.480 --> 07:53.120 you will see the list of features that sigma provides, but let's dive into the application. 07:53.680 --> 07:59.760 I think it's already used in many tools, and actually here, Gephi Lite could have been a 07:59.760 --> 08:05.040 good solution, because we don't have that big a network if we merge the edges together. 08:05.040 --> 08:10.720 I mean, we have around 4,000 entities, so Gephi Lite could have been a good solution, 08:10.720 --> 08:15.680 and also G.V(), which is unfortunately not open source, but they do a lot for 08:15.680 --> 08:21.200 open source; in particular, they pay us to actually develop most of the recent features of sigma.js, 08:21.200 --> 08:30.640 so big thanks to them. And that's basically, yeah, the Neo4j Browser on 08:30.640 --> 08:39.200 steroids; that could have been a good solution. But let's go custom. 
Everything, the code of 08:39.200 --> 08:46.960 the application, is on GitHub. I didn't want to host a Neo4j server, so if you want to run it, 08:46.960 --> 08:54.560 you have to run it yourself locally. But all the instructions are in the repository to build 08:54.800 --> 09:05.840 the datasets and run the application. Also, I tried to keep it as vanilla as I could. I used TypeScript, 09:05.840 --> 09:11.440 because JavaScript without TypeScript is too painful for me now, but there's no 09:11.440 --> 09:18.960 Vue, no React, no Angular, and I tried to just integrate sigma with Web Components, which is not 09:18.960 --> 09:28.720 something I'm very used to doing, but it worked well. So, the first view I wanted to draw was 09:28.720 --> 09:36.240 ego networks. Basically, what are the neighbors of a given entity and how are they connected 09:36.240 --> 09:46.640 with each other? This is how it works. We have this Neo4j database. We will run a 09:46.640 --> 09:54.480 Cypher query. We will extract some raw graph data. We will use some graphology magic to get 09:55.840 --> 10:01.440 a drawable network with everything we want, and then we will give it to sigma and get this 10:01.440 --> 10:08.480 interactive view. So, the Cypher query here is: we have a center C. We want to get all neighbors 10:08.480 --> 10:15.200 and edges connecting my center to its neighbors. Sorry. And for each unique pair of neighbors, 10:15.200 --> 10:21.840 I want their relations, basically, and this will give me the ego network. In graphology, 10:21.840 --> 10:29.920 I will aggregate all parallel trades. So, to avoid having all those parallel edges as we saw earlier 10:29.920 --> 10:37.920 in the Neo4j Browser, I aggregate them for all years, and I will have a size of the edge that is 10:37.920 --> 10:46.960 related to its monetary value. And then we set various graphical variables, and that's it for 10:46.960 --> 10:52.480 graphology. 
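The aggregation of parallel trades described above can be sketched without any library. This is only an illustration under stated assumptions, not the code from the talk's repository: the `YearlyTrade` record shape, the `aggregateTrades` name, the size bounds, and the square-root size scale are all hypothetical choices. In the talk, this step is done with graphology.

```typescript
// Hypothetical shape of one yearly trade record coming out of the database query.
interface YearlyTrade {
  source: string;
  target: string;
  year: number;
  value: number; // normalized monetary value
}

interface AggregatedEdge {
  source: string;
  target: string;
  totalValue: number; // sum over all years
  years: number; // number of yearly records merged into this edge
  size: number; // display thickness
}

// Merge every yearly record sharing the same direction (source -> target)
// into one edge, then derive a display size from the summed value.
function aggregateTrades(trades: YearlyTrade[], minSize = 1, maxSize = 10): AggregatedEdge[] {
  const byPair = new Map<string, AggregatedEdge>();
  for (const trade of trades) {
    const key = JSON.stringify([trade.source, trade.target]);
    const edge = byPair.get(key);
    if (edge) {
      edge.totalValue += trade.value;
      edge.years += 1;
    } else {
      byPair.set(key, {
        source: trade.source,
        target: trade.target,
        totalValue: trade.value,
        years: 1,
        size: minSize,
      });
    }
  }
  const edges = Array.from(byPair.values());
  const maxValue = edges.reduce((max, e) => Math.max(max, e.totalValue), 0);
  for (const edge of edges) {
    // Square-root scale (an arbitrary choice) so a few huge flows do not flatten the rest.
    edge.size =
      maxValue > 0
        ? minSize + (maxSize - minSize) * Math.sqrt(edge.totalValue / maxValue)
        : minSize;
  }
  return edges;
}
```

Keeping the direction in the key means exports and imports between the same two entities stay as two distinct edges, which is what later makes curved parallel edges necessary.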
Then in sigma, we just have to write a bit of code to handle parallel edges, 10:52.480 --> 10:59.600 because it's easy to tell sigma that all edges are curved or all edges are 11:01.520 --> 11:07.040 arrows, but here I wanted to have straight edges, because I think it's more readable, that's 11:07.120 --> 11:14.080 my opinion, and parallel edges as curved. So, there's a bit of code there. Adding some buttons, 11:14.080 --> 11:21.280 captions, neighbor highlighting, interactions, this kind of thing. So, about the code itself, 11:25.120 --> 11:31.360 we have at some point a "prepare graph" step, which takes a data graph that kind of directly comes from 11:31.920 --> 11:42.400 Neo4j, and we run some graphology code: make parallel edges curved, that's what I said earlier. 11:42.400 --> 11:45.520 For the position of the nodes, I will first put them all on a circle, 11:46.400 --> 11:52.480 then, this comes from the next feature, but if I have some fixed nodes that I want to 11:52.480 --> 11:59.840 polarize my final view, as we will see later, I put them on a larger circle, and then I run the 11:59.920 --> 12:07.600 ForceAtlas2 algorithm. This is a kind of physics algorithm to get positions for the nodes 12:07.600 --> 12:14.560 based on the topology of the network. Most of the graph images we see in the literature, 12:15.600 --> 12:21.280 if they look like hairballs, this is probably the algorithm that has been used. 12:21.280 --> 12:31.920 And yeah, some interactions, I can show more code later if needed, and then we get this kind of 12:31.920 --> 12:47.120 view. So, the application itself looks like this. I want to explore the reported partners of the entity 12:47.280 --> 12:55.360 Belgium. I want to include the center of this ego network. This is not mandatory, because 12:56.320 --> 13:00.960 I know that every node I will have will be connected to this one, so this will bring a lot of 13:00.960 --> 13:08.640 noise, but let's try anyway. 
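The initial node placement described above (all nodes on a circle, fixed nodes on a larger circle, before the force-directed layout runs) can be sketched as follows. This is a dependency-free illustration, not the talk's code: `circlePositions`, `initialLayout`, and the default radius of 3 are hypothetical names and values.

```typescript
interface Position {
  x: number;
  y: number;
}

// Place `ids` evenly on a circle of the given radius, centered on the origin.
function circlePositions(ids: string[], radius: number): Map<string, Position> {
  const positions = new Map<string, Position>();
  ids.forEach((id, i) => {
    const angle = (2 * Math.PI * i) / ids.length;
    positions.set(id, { x: radius * Math.cos(angle), y: radius * Math.sin(angle) });
  });
  return positions;
}

// Regular nodes go on the unit circle; fixed nodes go on a larger circle,
// so that they end up on the outside and "polarize" the final view.
function initialLayout(nodes: string[], fixed: string[], fixedRadius = 3): Map<string, Position> {
  const fixedSet = new Set(fixed);
  const freeNodes = nodes.filter((id) => !fixedSet.has(id));
  const fixedNodes = nodes.filter((id) => fixedSet.has(id));
  const layout = circlePositions(freeNodes, 1);
  circlePositions(fixedNodes, fixedRadius).forEach((pos, id) => layout.set(id, pos));
  return layout;
}
```

A force-directed algorithm needs some non-degenerate starting positions, and a circle gives every node a distinct one. The force-directed pass would then move only the non-fixed nodes.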
And I want all trades on the whole period, but I will only keep 13:08.960 --> 13:17.280 trades over 500,000 dollars. So, if two entities traded for less than this amount, 13:17.280 --> 13:23.520 I will skip the edges. And I will keep only export edges for now. 13:29.600 --> 13:37.440 Okay, and here comes the network. I have small captions shown here. I can hover over 13:39.040 --> 13:46.320 a node to see its neighbors and its context. This one is interesting because it appears a lot of 13:46.320 --> 14:01.680 times in lots of networks. And I wonder what it is. Sorry. Okay. But yeah, we see that we have lots 14:01.760 --> 14:08.320 of various entities that are not sovereign. I don't, maybe this one is, I don't know. 14:09.520 --> 14:20.560 South Africa is, I'm looking for the weird ones. Okay. Is it? I think it's a weird one. 14:21.040 --> 14:32.880 But yeah, this is a bit messy. Let's do the same one, but without the Belgium node in it. 14:36.720 --> 14:46.400 Okay. Yeah. Well, it's a nice hairball. I think we might be able to see more interesting things. 14:46.880 --> 14:58.080 This exact graph actually is the network of, look, this is not supposed to be true. 14:59.760 --> 15:06.000 Basically trades and geopolitical relationships around Belgium, only in the first 17 years of the 15:06.000 --> 15:14.480 dataset. There's still this "World estimation" node that takes most of the information. 15:16.800 --> 15:24.960 Actually, let's just remove the trades to see how it would behave. Okay. This is an easy one. 15:25.680 --> 15:32.880 So at least this gives me all the, thanks, all the direct relations, all the direct political 15:32.960 --> 15:39.600 relations I have between Belgium and other entities, and this starts to get informative. 15:41.360 --> 15:49.680 But we can hope to do more. Okay. Yeah. I did this exact query, but for the United States of America, to 15:49.680 --> 15:56.800 list all the political entities linked to the United States of America. 
16:03.600 --> 16:10.000 Now, we want to see indirect trades between two sovereign entities. So I will take two 16:11.360 --> 16:18.320 entities and I want to see when they trade with each other, when one entity trades with 16:18.320 --> 16:24.960 a part of the other entity, or when parts of both entities trade together. 16:25.920 --> 16:34.160 So I have a new Cypher query that gives me trades between the two entities and all the 16:34.160 --> 16:39.600 paths with depth two and all the paths with depth three, basically. Then I take the same 16:39.600 --> 16:50.080 graphology and sigma scripting code and I get some new networks. So according to the sources we have, 16:51.040 --> 17:02.080 how does India trade with the United Kingdom between 1833 and 1938? I decided for this network 17:02.080 --> 17:07.040 not to display the direct trades. There is an option for it, because they took too much of the 17:08.560 --> 17:14.960 information. But yeah, I see the United Kingdom trades with Bengal, China and Mumbai, while India 17:14.960 --> 17:21.920 trades with British and South Africa and British Borneo. This is quite informative to me about 17:23.120 --> 17:30.640 the dataset. Now I know I can challenge the heuristics of the code in RICardo to see 17:30.640 --> 17:37.520 if the final trades contain all these trades I am observing right now. 17:37.680 --> 17:48.400 Yeah, this was another example, between the United Kingdom and the United States of America, and 17:48.400 --> 17:58.560 in this graph I kept the direct trades, because for once I had significant edges that were 17:58.560 --> 18:07.840 not direct trades, and in the reports the United Kingdom trades a lot with "Atlantic coast United 18:07.840 --> 18:16.000 States of America", and this might be interesting for researchers to know how those trades are reported, 18:16.080 --> 18:25.280 etc. Again, since I have more time, I already demoed some things, I can show more. 
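The three cases described above (two entities trading directly, one entity trading with a part of the other, parts of both trading together) correspond to paths of depth one, two, and three. The talk finds them with a database query. The sketch below does the same classification in memory instead, under simplifying assumptions: `classifyTrades` is a hypothetical name, and `partOf` is a hypothetical map that gives each part a single parent entity and ignores dates.

```typescript
interface TradeLink {
  source: string;
  target: string;
}

interface IndirectTrades {
  direct: TradeLink[]; // depth 1: A trades with B
  viaOnePart: TradeLink[]; // depth 2: A trades with a part of B (or the reverse)
  viaTwoParts: TradeLink[]; // depth 3: a part of A trades with a part of B
}

// Classify the trades that link entities `a` and `b`, directly or through their parts.
function classifyTrades(
  a: string,
  b: string,
  trades: TradeLink[],
  partOf: Map<string, string>
): IndirectTrades {
  // Which of the two entities does this trade endpoint belong to, if any?
  const sideOf = (entity: string): string | undefined =>
    entity === a || entity === b ? entity : partOf.get(entity);

  const result: IndirectTrades = { direct: [], viaOnePart: [], viaTwoParts: [] };
  for (const trade of trades) {
    const s = sideOf(trade.source);
    const t = sideOf(trade.target);
    const linksBothSides = (s === a && t === b) || (s === b && t === a);
    if (!linksBothSides) continue;
    // Count how many endpoints are parts rather than the entities themselves.
    const parts = (trade.source !== s ? 1 : 0) + (trade.target !== t ? 1 : 0);
    if (parts === 0) result.direct.push(trade);
    else if (parts === 1) result.viaOnePart.push(trade);
    else result.viaTwoParts.push(trade);
  }
  return result;
}
```

Hiding the direct trades, as is done for one of the networks above, then amounts to drawing only `viaOnePart` and `viaTwoParts`.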
18:28.560 --> 18:34.960 Yeah, I won't do too much, but, I don't know, let's take Belgium and the United Kingdom. 18:34.960 --> 18:40.720 The main issue is that if I take Belgium and the United Kingdom, I kind of expect just to get 18:40.720 --> 18:46.720 the direct trades, because the countries are very close to each other. Yeah, okay, there are just 18:46.720 --> 18:55.600 some very minor trades with British Western Africa in this time span. But most of the time, 18:55.600 --> 19:00.320 the network in this view will look like this, basically, which is also informative. 19:01.200 --> 19:12.480 And so I'm going to show a bit of code, because I am early and I speak fast. 19:16.000 --> 19:25.200 For graphology, I showed some things. Once I have this graph, basically, sigma is 19:25.200 --> 19:33.760 instantiated. It's a class; you spawn it by giving it some settings. Here, these things 19:33.760 --> 19:41.200 are documented, but it says that yes, I want to order nodes and edges by depth, 19:41.920 --> 19:50.240 I want to render the edge labels, etc. And then in my web component, when I have a new graph, 19:50.320 --> 19:59.520 I basically just give it to sigma and I tell it to refresh. And this is some advanced code to handle 19:59.520 --> 20:08.560 this: I want to highlight the network of neighbors when I hover a node, and to display the 20:08.560 --> 20:17.760 edge labels, but it would be usable without it. Honestly, the part that was most annoying to me 20:18.720 --> 20:23.280 was the Cypher queries, because I'm not used to writing them. But it worked well. 
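The neighbor highlighting on hover mentioned above can be sketched as a small pure function. sigma accepts a `nodeReducer` setting, a function that receives a node key and its display data and returns the data to actually render. The sketch below only builds a function of that general shape: the `NodeDisplay` interface, the `makeHoverReducer` name, and the faded color are hypothetical, and the wiring to sigma's hover events and refresh is left out.

```typescript
// Hypothetical subset of the display attributes a renderer would use for a node.
interface NodeDisplay {
  label: string;
  color: string;
  size: number;
}

// Build a reducer that leaves the hovered node and its neighbors untouched
// and fades every other node (grey color, no label).
function makeHoverReducer(
  hovered: string | null,
  neighbors: Set<string>,
  fadedColor = "#dddddd"
): (node: string, data: NodeDisplay) => NodeDisplay {
  return (node, data) => {
    if (hovered === null || node === hovered || neighbors.has(node)) return data;
    return { ...data, label: "", color: fadedColor };
  };
}
```

In an application, the reducer would be rebuilt whenever the hovered node changes, with `neighbors` taken from the graph model, and the renderer would then be asked to refresh.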
20:29.040 --> 20:39.360 So one of the good things, the good thing with writing custom UIs, is that we do have 20:39.440 --> 20:48.640 control. With most solutions that are plug and play, like the Browser, or I also think about Neo4j 20:48.640 --> 20:53.760 Bloom for instance, everything has to be done in the query, and here we can cheat, because 20:54.400 --> 21:00.160 the data I will receive will be small enough to be displayed, or at least we hope so, because 21:00.720 --> 21:06.480 if it's not small enough to be displayed, no tool can actually work, kind of, and 21:06.480 --> 21:10.800 we can use graphology to do things that are easier to do in graphology after the query. 21:11.760 --> 21:17.760 It kind of splits the difficulty. So in my case here, I didn't want to merge things in the 21:17.760 --> 21:23.120 Cypher query because it was too hard for me, and I just did it with graphology afterwards, and it worked 21:23.120 --> 21:29.120 well, and it was easier to implement than if I had only one solution, which was the Cypher query. 
21:29.200 --> 21:37.840 Scripting graphology is really, I really love this tool, and that's why sigma is built on it as well, and 21:38.640 --> 21:49.360 there are very many different algorithms, and yeah, having this library just handling 21:49.360 --> 21:57.200 computation, this makes things so easy, I think. Then sigma just does, yeah, rendering and 21:57.280 --> 22:07.280 interaction. Also, when I displayed the second view, where I wanted to see all the 22:07.280 --> 22:13.280 indirect trades between two sovereign entities, I actually couldn't find examples where I have 22:14.160 --> 22:20.400 three levels. I have never found any case where two entities both have a part 22:21.280 --> 22:29.360 that trade together, and I was curious, but it makes sense, because the reporting entities 22:29.360 --> 22:37.360 are most of the time sovereign entities. So if I come back to this one, the reports I have 22:37.360 --> 22:45.040 from the United Kingdom are from the United Kingdom and not from British Western Africa, so that explains 22:45.040 --> 22:49.040 why I didn't find it, and I'm sad because I wrote the query for nothing. 22:50.480 --> 23:07.360 And thank you very much. If you have questions, we have time. Thanks. 23:07.440 --> 23:28.240 Yeah, yeah, so the question was: can I explain the difference between sigma.js and 23:28.320 --> 23:38.160 Cytoscape. 
Cytoscape is really focused on drawing networks for schemas, I'd say, so they have lots 23:38.160 --> 23:49.840 of tools to actually draw lines in more schematic ways, to get some huge diagrams. But if you use Cytoscape 23:49.920 --> 23:58.080 for huge networks and force-directed layouts, first of all, Cytoscape is all in one: 23:58.080 --> 24:05.920 it handles computation and rendering, to my knowledge, and it doesn't scale that much. 24:07.280 --> 24:14.480 It's a generic tool to draw network diagrams, I'd say, while sigma is a better tool to handle 24:14.480 --> 24:30.080 visual network analysis. I don't know if that's clear. Oh yeah, sorry. 24:44.560 --> 25:03.200 Yeah, so the question was: is it possible with sigma.js to write applications where we 25:03.840 --> 25:10.720 modify the graph at runtime and have animations, etc. Yes, it's completely possible. Here, for this, 25:11.280 --> 25:21.600 I wanted to keep the demo as low code as I could, but I could actually go directly to the website. 25:25.600 --> 25:34.800 You can run the layout algorithm live, this is an example, 25:35.120 --> 25:44.880 this kind of animation. And about altering the graph itself by adding data, I think we have 25:44.880 --> 25:56.240 a story somewhere, like, hmm, okay, I can move, and here I did some code to say: when I click the 25:56.240 --> 26:04.160 stage, it adds a node and connects it to nearby nodes. I see that the Storybook is broken, but 26:04.160 --> 26:10.880 you have examples of this kind of interaction, yes. 26:34.240 --> 26:41.520 So the question is how sigma.js behaves with graphs of thousands of nodes. It behaves very well, actually. 26:41.520 --> 26:47.440 The thing is, it uses WebGL, so if you don't alter the network, if you don't modify the data, 26:48.480 --> 26:55.200 zooming in, zooming out, panning and rotating the application is all done by the GPU. 26:55.840 --> 27:04.400 So if I come to this example, we have an example for this, so 
let's put 50,000 nodes and 27:05.840 --> 27:14.720 100,000 edges. If I run the algorithm, this is going to be very, very slow, 27:17.040 --> 27:25.120 but for zooming in, zooming out, no problem. So if your node positions are already 27:25.440 --> 27:36.080 handled, yeah, this works very well. It's WebGL too. The labels are rendered in canvas, so 27:36.080 --> 27:42.320 this can slow things down sometimes if you have too many of them, but there are heuristics to mitigate 27:42.320 --> 27:55.040 that. Yes? How do you compare sigma.js with Cytoscape? That was already 27:55.040 --> 28:02.640 asked, but no, no worries. Basically, Cytoscape, I think, has been really designed to 28:03.520 --> 28:14.720 handle rendering network diagrams with hierarchy, with more deterministic things, things laid out most of 28:14.720 --> 28:21.760 the time in squares or rectangles and this kind of thing, and sigma doesn't handle network diagrams 28:21.760 --> 28:31.440 very well. It is more made for what we call visual network analysis, so it's more for: 28:31.440 --> 28:38.560 I have a network of things that are connected and I want to see patterns emerge by themselves. 28:39.680 --> 28:48.480 So most of the time when you see sigma, it's with this kind of network. 28:49.280 --> 28:56.560 I think basically Cytoscape is way more flexible and you can render things in way more 28:56.560 --> 29:03.440 different ways, while sigma is less flexible but scales better, I'd say.