WEBVTT

00:00.000 --> 00:14.800 And without further ado, Ruben will be talking about expanding GGML hardware support using the Vulkan API. Take it away, Ruben.

00:14.800 --> 00:19.800 Thank you.

00:19.800 --> 03:36.680 All right. So basically, we've already heard quite a bit about GGML, and I'm here to give you a different perspective, from a developer's point of view. As the title says, I contributed the Vulkan backend to llama.cpp and the GGML project, and that was basically my first major open source contribution. Quickly, the structure: a little bit about myself and about the project, or the two projects, a few problems I ran into and how I solved them, and at the end a little bit about the community itself. Very briefly, my name is Ruben, I'm 27 years old, from Germany. I've got a Master of Science in Computer Science from the University of Magdeburg, and ever since I've been a software engineer with a focus on C++ and Python. The llama.cpp project, I'm not going to say too much about that, we've already seen quite a bit about it, but it was started by Georgi Gerganov to run LLaMA models on the CPU. I think it was aimed at MacBooks at first, because they don't have support for CUDA, which was previously what was mostly used for running large language models. So at first it was just for CPU inference. The llama.cpp project itself acts as a kind of playground for GGML development: all of the experimental stuff happens in llama.cpp and is then synchronized with the GGML project, which is the more basic tensor library, or machine learning library, for running models. It's basically C/C++ based, optimized for CPUs, and at this point it also has a lot of support for running on other accelerators, mostly GPUs, but as we've already heard there's work on other hardware as well. I joined the project at some point in early 2023, I think. It was very early, all of the backend stuff that you heard about didn't exist yet. And in the early days we had a big problem: when you ran something using PyTorch, as you did previously, you had to put the entire model onto the GPU, because as soon as you ran a little bit of the PyTorch model on the CPU, it would slow down terribly. llama.cpp solved that, but it came with its own problems.
03:36.680 --> 06:46.680 In this case it ran on the CPU, and it ran very quickly. But there are two parts, two stages basically, to running a large language model, to executing a prompt: parsing the prompt, which is called prompt processing, and text generation. You can think of this as reading and writing, basically. The writing part was very quick on the CPU even from the beginning, because it basically doesn't need that much compute; it just runs as fast as it can read the model from RAM. But the prompt processing part on the CPU was very slow in the beginning, and if you just run on the CPU it's still not very fast. So, as someone with at least some development experience, a master's degree, and some experience from university and that kind of thing, I felt that I could probably solve that in some way. The idea was to go back to the thing we had just left behind, GPUs, and use them to speed up the slowest operations, which are matrix multiplications. That's the main thing that CPUs are slow at, and that's what slows down the reading part, the prompt processing. There's a number of APIs you could use for that. The biggest one, of course, is CUDA, by Nvidia. AMD has an alternative called ROCm, and Intel is doing something similar with oneAPI, which we've also already seen here, I think in the FPGA talk, and there are a few more. And then there are the open variants: OpenCL, SYCL, which is the newer one, and Vulkan. Because I wanted something that runs on any kind of hardware, I decided to use Vulkan in the end, but at first I went for OpenCL, because it was much easier to implement. We've also already heard about BLAS libraries, linear algebra libraries which provide fast, accelerated matrix operations. In this case, for OpenCL, there was already such a library, CLBlast. So in, I think, March 2023, I just replaced the CPU-based BLAS acceleration that already existed with a GPU-based CLBlast implementation. I think I was, by like a week maybe, the first one to publish that. But this was simple to implement, and it's very far away from what we now call a GPU backend; basically, it just moved the matrix multiplication part from the CPU to the GPU.
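(For illustration, this is roughly what that early BLAS-style offload does conceptually. It is a minimal sketch, not the actual llama.cpp code, and the gpu_* helpers are hypothetical stand-ins for the real OpenCL/CLBlast calls.)

```cpp
#include <cstddef>

// Hypothetical helper API standing in for the real OpenCL/CLBlast calls.
struct gpu_buffer;
gpu_buffer * gpu_upload(const float * src, std::size_t bytes);        // host -> device copy
gpu_buffer * gpu_alloc(std::size_t bytes);
void gpu_sgemm(gpu_buffer * a, gpu_buffer * b, gpu_buffer * c, int m, int n, int k);
void gpu_download(gpu_buffer * src, float * dst, std::size_t bytes);  // device -> host copy
void gpu_free(gpu_buffer * buf);

// Offload a single matrix multiplication C = A * B to the GPU.
// Steps 1 and 3 are the data transfers discussed below.
void matmul_offload(const float * A, const float * B, float * C, int m, int n, int k) {
    gpu_buffer * dA = gpu_upload(A, (std::size_t)m * k * sizeof(float)); // 1. copy inputs to VRAM
    gpu_buffer * dB = gpu_upload(B, (std::size_t)k * n * sizeof(float));
    gpu_buffer * dC = gpu_alloc((std::size_t)m * n * sizeof(float));

    gpu_sgemm(dA, dB, dC, m, n, k);                                      // 2. multiply on the GPU

    gpu_download(dC, C, (std::size_t)m * n * sizeof(float));             // 3. copy the result back

    gpu_free(dA); gpu_free(dB); gpu_free(dC);
}
```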
06:46.680 --> 09:39.680 You copy the data from CPU memory to GPU memory, you execute the matrix multiplication on the GPU, and then you move the result data back to the CPU. And of course, you spend a lot of time transferring data with that. The solution is: instead of transferring data, keep the data in VRAM and run more of the model on the GPU, and that's basically what we started working towards. But at that point I found out that OpenCL, while it's a nice idea, is a little limited. The driver support isn't great in many cases, and there are specific extensions which are not supported by some of the vendors. For example, on Nvidia GPUs you cannot use 16-bit floats, which are very important for machine learning calculations. So in the end I decided to basically abandon OpenCL and try again with something else, and I went for Vulkan. If you've already looked into graphics programming, you've probably run into Vulkan. It's a graphics API, but modern games and modern graphics applications also rely a lot on compute shaders, which is basically general purpose compute, so you can use that to run machine learning stuff as well. The support is actually much better than OpenCL, and that's because of the gaming part: every GPU vendor wants people to be able to play games on their hardware, so they support Vulkan. Some of it is very complex, because it operates very close to the hardware, but much of that can be avoided. Since I don't need the graphics part, I don't need to look into swapchains and images and that kind of thing; I can just avoid all of it. I just need to be able to run compute shaders. One of the advantages that Vulkan actually has over CUDA is that the binaries that result from it are relatively small. With CUDA you always end up with a lot of device code, and if you support more devices, that's more device code, and it adds up very quickly. You end up with binaries that are hundreds of megabytes large, or even larger; I think cuBLAS itself, which is CUDA's BLAS library, is more than a gigabyte or so of device code. With Vulkan you don't immediately have that problem, because the driver interprets and creates the device code on the fly. So, the first obstacle I ran into is that Vulkan is extremely verbose.
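(To give an idea of what that verbosity means in practice, here is a minimal, illustrative sketch of just creating an instance and listing the GPUs with the raw Vulkan C API. Error handling, layers and extensions are mostly omitted, and the application name is made up.)

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    // Describe the application: the first of many "fill a struct, pass it on" steps.
    VkApplicationInfo app_info = {};
    app_info.sType            = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    app_info.pApplicationName = "ggml-vulkan-demo";  // illustrative name
    app_info.apiVersion       = VK_API_VERSION_1_2;

    // The instance create info wraps the application info; this is also where
    // instance extensions and validation layers would go during development.
    VkInstanceCreateInfo instance_info = {};
    instance_info.sType            = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    instance_info.pApplicationInfo = &app_info;

    VkInstance instance;
    if (vkCreateInstance(&instance_info, nullptr, &instance) != VK_SUCCESS) {
        std::fprintf(stderr, "failed to create Vulkan instance\n");
        return 1;
    }

    // Enumerate the physical devices (the GPUs in the system) and print their names.
    // A real backend would go on to pick a compute queue family and create a
    // logical VkDevice: more structs again.
    uint32_t device_count = 0;
    vkEnumeratePhysicalDevices(instance, &device_count, nullptr);
    std::vector<VkPhysicalDevice> devices(device_count);
    vkEnumeratePhysicalDevices(instance, &device_count, devices.data());

    for (VkPhysicalDevice dev : devices) {
        VkPhysicalDeviceProperties props;
        vkGetPhysicalDeviceProperties(dev, &props);
        std::printf("found device: %s\n", props.deviceName);
    }

    vkDestroyInstance(instance, nullptr);
    return 0;
}
```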
09:39.680 --> 12:12.680 For any step, like in this case, you always initialize a struct, fill it with some data, run some operation, put the result into another struct, and so on. So in this case, you initialize the application info struct, you pick instance extensions, and there's stuff like validation layers, which basically warn the developer when he's done something wrong, that kind of thing. Then you create the next thing, which in this case would be an instance create info struct, and you actually create the instance, then you query the physical devices, and there's another differentiation there between physical and logical devices, and so on. That chain just keeps on going, and you end up with a lot of code that you have to write just to get going, basically. So you need to hide all of this behind functions. When you use CUDA, you already have stuff like cudaMemcpy, where you can copy data from the CPU to the GPU; in this case, you have to write that yourself. There are some libraries which help you with that, but as a little bit of a personal challenge, I decided to just use the Vulkan API itself and do everything myself. So that ends up being a lot of code, a lot of work, just for the boilerplate stuff: for example, instance initialization, buffer creation, shader loading, command buffers, shader invocations and copying. All of that needs to be hidden away, and only then can you start actually using the device for anything useful. The second obstacle was that GLSL, the shader language that you use to write compute shaders, is not as user-friendly as CUDA's. In CUDA you can just write the device code directly into the C++ file; in GLSL you need to write a separate compute shader in its own file, compile it to SPIR-V, and then somehow load that intermediate representation and send it to the driver, and the driver compiles it into something that it can execute.
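(As a sketch of that load-and-hand-to-the-driver step: the shader is compiled offline to SPIR-V, for example with glslc, and the host code then wraps the binary in a VkShaderModule. The file names here are illustrative, not the actual llama.cpp shader names.)

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <vector>

// Load a SPIR-V binary that was compiled offline, e.g. with:
//   glslc -fshader-stage=compute matmul.comp -o matmul.spv
static std::vector<uint32_t> load_spirv(const char * path) {
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) {
        throw std::runtime_error("cannot open SPIR-V file");
    }
    const std::size_t size = static_cast<std::size_t>(file.tellg());
    std::vector<uint32_t> code(size / sizeof(uint32_t));
    file.seekg(0);
    file.read(reinterpret_cast<char *>(code.data()), size);
    return code;
}

// Hand the intermediate representation to the driver, which compiles it
// into device code on the fly.
static VkShaderModule create_shader_module(VkDevice device, const std::vector<uint32_t> & code) {
    VkShaderModuleCreateInfo info = {};
    info.sType    = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
    info.codeSize = code.size() * sizeof(uint32_t);  // size in bytes
    info.pCode    = code.data();

    VkShaderModule module;
    if (vkCreateShaderModule(device, &info, nullptr, &module) != VK_SUCCESS) {
        throw std::runtime_error("failed to create shader module");
    }
    return module;
}
```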
12:12.680 --> 14:38.680 One of the big differences here is that there are no pointers, at least in the base language, and that also means there's no pointer casting. There was already a CUDA backend that I could use as a reference, but since it based a lot of its operation on pointer magic, pointer casting and that kind of thing, I had to translate all of that into something else that avoided it, which at times can be very verbose. You also need to take care, because Vulkan covers a big chunk of hardware, but not every piece of hardware supports the same things. I mentioned 16-bit floats earlier: they are supported by most Vulkan devices, but not by all of them. So in the shaders, where it matters, you need to provide both a 16-bit implementation and a 32-bit implementation, and the only way to do that in GLSL is with macros, so in some cases you end up with code that's rather hard to read. But the advantage of doing all of this in GLSL is that the Vulkan support goes back a long way, to the first generation of GPUs with Vulkan support, which was, I think, 2011. That means we're still maintaining support for cards like Nvidia Kepler, which is the GTX 600 series I think, and of course Intel integrated GPUs; all of them can, in the end, run the code that you write here. The third, and probably largest, obstacle I ran into is that there is no BLAS library for Vulkan. So there was no easy way out of doing the matrix multiplication myself: in CUDA you can use cuBLAS, and in OpenCL I was able to use CLBlast, but here I couldn't, so I had to write the matrix multiplication kernels myself. Luckily, there was some existing work by another contributor, who had spent time optimizing CUDA matrix multiplication kernels to try to reach the speed of cuBLAS, and I was able to port some of that work into GLSL and use it to do matrix multiplications for llama.cpp.
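(Coming back to the 16-bit float point for a moment: on the host side, whether a device can run the FP16 shader variants at all can be checked through the standard Vulkan feature query. A minimal sketch, assuming a Vulkan 1.2 capable device:)

```cpp
#include <vulkan/vulkan.h>

// Returns true if the device supports 16-bit floats in shaders, so the FP16
// shader variants can be used; otherwise the 32-bit fallbacks are needed.
static bool device_supports_fp16(VkPhysicalDevice device) {
    VkPhysicalDeviceShaderFloat16Int8Features fp16_features = {};
    fp16_features.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SHADER_FLOAT16_INT8_FEATURES;

    VkPhysicalDeviceFeatures2 features2 = {};
    features2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2;
    features2.pNext = &fp16_features;  // chained extension struct, the usual Vulkan pattern

    vkGetPhysicalDeviceFeatures2(device, &features2);
    return fp16_features.shaderFloat16 == VK_TRUE;
}
```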
14:38.680 --> 16:33.680 All in all, it took me around six months of working in the evenings and on weekends, besides my job, to get this finished and to implement all of the operations. The pull request in llama.cpp was absolutely massive; I think it was like 7,000 lines of code or something, and a lot more came with it. But at that point, you could run actual language models using Vulkan, on GPUs that previously couldn't run any large language models at all, because they have no support for CUDA and no support for ROCm. Vulkan is basically the last option they have; OpenCL might kind of work in some cases, but here, Vulkan actually made the difference. One last thing here: Vulkan code is supposed to be, like, write once, run anywhere, basically, as long as the hardware supports everything you do. But sadly it's of course not that easy, and each implementation has its quirks, so a lot of the time after that was spent debugging stuff. I had to build a server with GPUs from every vendor, in this case an Intel Arc at the bottom, a Radeon above that, and an RTX Nvidia GPU at the top, just so I can run the code I write on every device and check that it actually works, or, if someone reports a bug, so I can debug it. And there's still a lot of stuff to do; there are like seven open pull requests right now that I don't have time to work on. But yeah, just quickly, the community, since there are a lot of developers here, in case someone wants to join it: all of the interaction happens on GitHub, it's just issues and discussions there.
16:33.680 --> 17:41.680 Basically, you can contribute anything you want in pull requests, and in my experience the discussions there are with a lot of nice people that will help you and give you good feedback. There's a base team of maintainers and some backend maintainers; I'm responsible for what I wrote there and try to maintain the Vulkan backend, and there are a lot of smaller contributors that do their own thing in smaller chunks. Recently I got support from Jeff Bolz from Nvidia, which is very nice; he has much more experience than me writing and debugging Vulkan code and was able to speed up a lot of the code, and I'm still trying to understand exactly what he's doing and learn from it how to optimize GPU code. There were also some GCN optimizations, and quantization methods that were previously not supported, like the IQ ones we heard about earlier. Yeah, that's it. Thank you. If you have some questions, you can reach me on Discord or GitHub. Yeah. That's it.
17:45.680 --> 18:13.680 And Ruben should be walking in that direction, so if you have any questions for him, you can catch him over there.