WEBVTT

00:00.000 --> 00:14.800 And without further ado, Ruben will be talking about expanding GGML hardware support using the Vulkan API. Take it away, Ruben.

00:14.800 --> 00:19.800 Thank you.

00:19.800 --> 03:36.680 All right. So basically, we've already heard quite a bit about GGML, and I'm here to give you a different perspective, from a developer's point of view. As the title says, I contributed the Vulkan backend to llama.cpp and the GGML project, and that was basically my first major open source contribution. Quickly, the structure: a little bit about myself and about the project, or the two projects, a few problems I ran into and how I solved them, and at the end a little bit about the community itself. Very briefly, my name is Ruben, I'm 27 years old, from Germany. I've got a Master of Science in Computer Science from the University of Magdeburg, and ever since I've been a software engineer with a focus on C++ and Python. The llama.cpp project, I'm not going to say too much about that, we've already seen quite a bit about it, but it was started by Georgi Gerganov to run LLaMA models on the CPU. I think it was aimed at MacBooks at first, because they don't have support for CUDA, which was previously what was mostly used for running large language models. So at first it was just for CPU inference. The llama.cpp project itself acts as a kind of playground for GGML development: all of the experimental stuff happens in llama.cpp and is then synchronized with the GGML project, which is the more basic tensor library, or machine learning library, for running models. It's basically C/C++ based, optimized for CPUs, and at this point it also has a lot of support for running on other accelerators, mostly GPUs, but as we've already heard there's work on other hardware as well. I joined the project at some point in early 2023, I think. It was very early, all of the backend stuff that you heard about didn't exist yet. And in the early days we had a big problem: when you ran something using PyTorch, as you did previously, you had to put the entire model onto the GPU, because as soon as you ran a little bit of the PyTorch model on the CPU, it would slow down terribly. llama.cpp solved that, but it came with its own problems.
03:36.680 --> 06:46.680 In this case it ran on the CPU, and it ran very quickly. But there are two parts, two stages basically, to running a large language model, to executing a prompt: parsing the prompt, which is called prompt processing, and text generation. You can think of this as reading and writing, basically. The writing part was very quick on the CPU even from the beginning, because it basically doesn't need that much compute; it just runs as fast as it can read the model from RAM. But the prompt processing part on the CPU was very slow in the beginning, and if you just run on the CPU it's still not very fast. So, as someone with at least some development experience, a master's degree, and some experience from university and that kind of thing, I felt that I could probably solve that in some way. The idea was to go back to the thing we had just left behind, GPUs, and use them to speed up the slowest operations, which are matrix multiplications. That's the main thing that CPUs are slow at, and that's what slows down the reading part, the prompt processing. There's a number of APIs you could use for that. The biggest one, of course, is CUDA, by Nvidia. AMD has an alternative called ROCm, and Intel is doing something similar with oneAPI, which we've also already seen here, I think in the FPGA talk, and there are a few more. And then there are the open variants: OpenCL, SYCL, which is the newer one, and Vulkan. Because I wanted something that runs on any kind of hardware, I decided to use Vulkan in the end, but at first I went for OpenCL, because it was much easier to implement. We've also already heard about BLAS libraries, linear algebra libraries which provide fast, accelerated matrix operations. In this case, for OpenCL, there was already such a library, CLBlast. So in, I think, March 2023, I just replaced the CPU-based BLAS acceleration that already existed with a GPU-based CLBlast implementation. I think I was, by like a week maybe, the first one to publish that. But this was simple to implement, and it's very far away from what we now call a GPU backend; basically, it just moved the matrix multiplication part from the CPU to the GPU.
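(For illustration, this is roughly what that early BLAS-style offload does conceptually. It is a minimal sketch, not the actual llama.cpp code, and the gpu_* helpers are hypothetical stand-ins for the real OpenCL/CLBlast calls.)

```cpp
#include <cstddef>

// Hypothetical helper API standing in for the real OpenCL/CLBlast calls.
struct gpu_buffer;
gpu_buffer * gpu_upload(const float * src, std::size_t bytes);        // host -> device copy
gpu_buffer * gpu_alloc(std::size_t bytes);
void gpu_sgemm(gpu_buffer * a, gpu_buffer * b, gpu_buffer * c, int m, int n, int k);
void gpu_download(gpu_buffer * src, float * dst, std::size_t bytes);  // device -> host copy
void gpu_free(gpu_buffer * buf);

// Offload a single matrix multiplication C = A * B to the GPU.
// Steps 1 and 3 are the data transfers discussed below.
void matmul_offload(const float * A, const float * B, float * C, int m, int n, int k) {
    gpu_buffer * dA = gpu_upload(A, (std::size_t)m * k * sizeof(float)); // 1. copy inputs to VRAM
    gpu_buffer * dB = gpu_upload(B, (std::size_t)k * n * sizeof(float));
    gpu_buffer * dC = gpu_alloc((std::size_t)m * n * sizeof(float));

    gpu_sgemm(dA, dB, dC, m, n, k);                                      // 2. multiply on the GPU

    gpu_download(dC, C, (std::size_t)m * n * sizeof(float));             // 3. copy the result back

    gpu_free(dA); gpu_free(dB); gpu_free(dC);
}
```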
06:46.680 --> 09:39.680 You copy the data from CPU memory to GPU memory, you execute the matrix multiplication on the GPU, and then you move the result data back to the CPU. And of course, you spend a lot of time transferring data with that. The solution is: instead of transferring data, keep the data in VRAM and run more of the model on the GPU, and that's basically what we started working towards. But at that point I found out that OpenCL, while it's a nice idea, is a little limited. The driver support isn't great in many cases, and there are specific extensions which are not supported by some of the vendors. For example, on Nvidia GPUs you cannot use 16-bit floats, which are very important for machine learning calculations. So in the end I decided to basically abandon OpenCL and try again with something else, and I went for Vulkan. If you've already looked into graphics programming, you've probably run into Vulkan. It's a graphics API, but modern games and modern graphics applications also rely a lot on compute shaders, which is basically general purpose compute, so you can use that to run machine learning stuff as well. The support is actually much better than OpenCL, and that's because of the gaming part: every GPU vendor wants people to be able to play games on their hardware, so they support Vulkan. Some of it is very complex, because it operates very close to the hardware, but much of that can be avoided. Since I don't need the graphics part, I don't need to look into swapchains and images and that kind of thing; I can just avoid all of it. I just need to be able to run compute shaders. One of the advantages that Vulkan actually has over CUDA is that the binaries that result from it are relatively small. With CUDA you always end up with a lot of device code, and if you support more devices, that's more device code, and it adds up very quickly. You end up with binaries that are hundreds of megabytes large, or even larger; I think cuBLAS itself, which is CUDA's BLAS library, is more than a gigabyte or so of device code. With Vulkan you don't immediately have that problem, because the driver interprets and creates the device code on the fly. So, the first obstacle I ran into is that Vulkan is extremely verbose.
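(To give an idea of what that verbosity means in practice, here is a minimal, illustrative sketch of just creating an instance and listing the GPUs with the raw Vulkan C API. Error handling, layers and extensions are mostly omitted, and the application name is made up.)

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    // Describe the application: the first of many "fill a struct, pass it on" steps.
    VkApplicationInfo app_info = {};
    app_info.sType            = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    app_info.pApplicationName = "ggml-vulkan-demo";  // illustrative name
    app_info.apiVersion       = VK_API_VERSION_1_2;

    // The instance create info wraps the application info; this is also where
    // instance extensions and validation layers would go during development.
    VkInstanceCreateInfo instance_info = {};
    instance_info.sType            = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    instance_info.pApplicationInfo = &app_info;

    VkInstance instance;
    if (vkCreateInstance(&instance_info, nullptr, &instance) != VK_SUCCESS) {
        std::fprintf(stderr, "failed to create Vulkan instance\n");
        return 1;
    }

    // Enumerate the physical devices (the GPUs in the system) and print their names.
    // A real backend would go on to pick a compute queue family and create a
    // logical VkDevice: more structs again.
    uint32_t device_count = 0;
    vkEnumeratePhysicalDevices(instance, &device_count, nullptr);
    std::vector<VkPhysicalDevice> devices(device_count);
    vkEnumeratePhysicalDevices(instance, &device_count, devices.data());

    for (VkPhysicalDevice dev : devices) {
        VkPhysicalDeviceProperties props;
        vkGetPhysicalDeviceProperties(dev, &props);
        std::printf("found device: %s\n", props.deviceName);
    }

    vkDestroyInstance(instance, nullptr);
    return 0;
}
```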
09:39.680 --> 12:12.680 For any step, like in this case, you always initialize a struct, fill it with some data, run some operation, put the result into another struct, and so on. So in this case, you initialize the application info struct, you pick instance extensions, and there's stuff like validation layers, which basically warn the developer when he's done something wrong, that kind of thing. Then you create the next thing, which in this case would be an instance create info struct, and you actually create the instance, then you query the physical devices, and there's another differentiation there between physical and logical devices, and so on. That chain just keeps on going, and you end up with a lot of code that you have to write just to get going, basically. So you need to hide all of this behind functions. When you use CUDA, you already have stuff like cudaMemcpy, where you can copy data from the CPU to the GPU; in this case, you have to write that yourself. There are some libraries which help you with that, but as a little bit of a personal challenge, I decided to just use the Vulkan API itself and do everything myself. So that ends up being a lot of code, a lot of work, just for the boilerplate stuff: for example, instance initialization, buffer creation, shader loading, command buffers, shader invocations and copying. All of that needs to be hidden away, and only then can you start actually using the device for anything useful. The second obstacle was that GLSL, the shader language that you use to write compute shaders, is not as user-friendly as CUDA's. In CUDA you can just write the device code directly into the C++ file; in GLSL you need to write a separate compute shader in its own file, compile it to SPIR-V, and then somehow load that intermediate representation and send it to the driver, and the driver compiles it into something that it can execute.
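(As a sketch of that load-and-hand-to-the-driver step: the shader is compiled offline to SPIR-V, for example with glslc, and the host code then wraps the binary in a VkShaderModule. The file names here are illustrative, not the actual llama.cpp shader names.)

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <vector>

// Load a SPIR-V binary that was compiled offline, e.g. with:
//   glslc -fshader-stage=compute matmul.comp -o matmul.spv
static std::vector<uint32_t> load_spirv(const char * path) {
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) {
        throw std::runtime_error("cannot open SPIR-V file");
    }
    const std::size_t size = static_cast<std::size_t>(file.tellg());
    std::vector<uint32_t> code(size / sizeof(uint32_t));
    file.seekg(0);
    file.read(reinterpret_cast<char *>(code.data()), size);
    return code;
}

// Hand the intermediate representation to the driver, which compiles it
// into device code on the fly.
static VkShaderModule create_shader_module(VkDevice device, const std::vector<uint32_t> & code) {
    VkShaderModuleCreateInfo info = {};
    info.sType    = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
    info.codeSize = code.size() * sizeof(uint32_t);  // size in bytes
    info.pCode    = code.data();

    VkShaderModule module;
    if (vkCreateShaderModule(device, &info, nullptr, &module) != VK_SUCCESS) {
        throw std::runtime_error("failed to create shader module");
    }
    return module;
}
```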
12:12.680 --> 14:38.680 One of the big differences here is that there are no pointers, at least in the base language, and that also means there's no pointer casting. There was already a CUDA backend that I could use as a reference, but since it based a lot of its operation on pointer magic, pointer casting and that kind of thing, I had to translate all of that into something else that avoided it, which at times can be very verbose. You also need to take care, because Vulkan covers a big chunk of hardware, but not every piece of hardware supports the same things. I mentioned 16-bit floats earlier: they are supported by most Vulkan devices, but not by all of them. So in the shaders, where it matters, you need to provide both a 16-bit implementation and a 32-bit implementation, and the only way to do that in GLSL is with macros, so in some cases you end up with code that's rather hard to read. But the advantage of doing all of this in GLSL is that the Vulkan support goes back a long way, to the first generation of GPUs with Vulkan support, which was, I think, 2011. That means we're still maintaining support for cards like Nvidia Kepler, which is the GTX 600 series I think, and of course Intel integrated GPUs; all of them can, in the end, run the code that you write here. The third, and probably largest, obstacle I ran into is that there is no BLAS library for Vulkan. So there was no easy way out of doing the matrix multiplication myself: in CUDA you can use cuBLAS, and in OpenCL I was able to use CLBlast, but here I couldn't, so I had to write the matrix multiplication kernels myself. Luckily, there was some existing work by another contributor, who had spent time optimizing CUDA matrix multiplication kernels to try to reach the speed of cuBLAS, and I was able to port some of that work into GLSL and use it to do matrix multiplications for llama.cpp.
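(Coming back to the 16-bit float point for a moment: on the host side, whether a device can run the FP16 shader variants at all can be checked through the standard Vulkan feature query. A minimal sketch, assuming a Vulkan 1.2 capable device:)

```cpp
#include <vulkan/vulkan.h>

// Returns true if the device supports 16-bit floats in shaders, so the FP16
// shader variants can be used; otherwise the 32-bit fallbacks are needed.
static bool device_supports_fp16(VkPhysicalDevice device) {
    VkPhysicalDeviceShaderFloat16Int8Features fp16_features = {};
    fp16_features.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SHADER_FLOAT16_INT8_FEATURES;

    VkPhysicalDeviceFeatures2 features2 = {};
    features2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2;
    features2.pNext = &fp16_features;  // chained extension struct, the usual Vulkan pattern

    vkGetPhysicalDeviceFeatures2(device, &features2);
    return fp16_features.shaderFloat16 == VK_TRUE;
}
```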
14:38.680 --> 16:33.680 All in all, it took me around six months of working in the evenings and on weekends, besides my job, to get this finished and to implement all of the operations. The pull request in llama.cpp was absolutely massive; I think it was like 7,000 lines of code or something, and a lot more came with it. But at that point, you could run actual language models using Vulkan, on GPUs that previously couldn't run any large language models at all, because they have no support for CUDA and no support for ROCm. Vulkan is basically the last option they have; OpenCL might kind of work in some cases, but here, Vulkan actually made the difference. One last thing here: Vulkan code is supposed to be, like, write once, run anywhere, basically, as long as the hardware supports everything you do. But sadly it's of course not that easy, and each implementation has its quirks, so a lot of the time after that was spent debugging stuff. I had to build a server with GPUs from every vendor, in this case an Intel Arc at the bottom, a Radeon above that, and an RTX Nvidia GPU at the top, just so I can run the code I write on every device and check that it actually works, or, if someone reports a bug, so I can debug it. And there's still a lot of stuff to do; there are like seven open pull requests right now that I don't have time to work on. But yeah, just quickly, the community, since there are a lot of developers here, in case someone wants to join it: all of the interaction happens on GitHub, it's just issues and discussions there.
16:33.680 --> 17:41.680 Basically, you can contribute anything you want in pull requests, and in my experience the discussions there are with a lot of nice people that will help you and give you good feedback. There's a base team of maintainers and some backend maintainers; I'm responsible for what I wrote there and try to maintain the Vulkan backend, and there are a lot of smaller contributors that do their own thing in smaller chunks. Recently I got support from Jeff Bolz from Nvidia, which is very nice; he has much more experience than me writing and debugging Vulkan code and was able to speed up a lot of the code, and I'm still trying to understand exactly what he's doing and learn from it how to optimize GPU code. There were also some GCN optimizations, and quantization methods that were previously not supported, like the IQ ones we heard about earlier. Yeah, that's it. Thank you. If you have some questions, you can reach me on Discord or GitHub. Yeah. That's it.
17:45.680 --> 18:13.680 And Ruben should be walking in that direction, so if you have any questions for him, you can catch him over there.