WEBVTT

00:00.000 --> 00:32.000
Moderator: ...that talk on file systems was fascinating. It's nine o'clock exactly, so I'm going to hand over to Tomas, who's going to talk about performance improvements in Postgres over the last 20 years, and hopefully everyone can hear us on the live stream. Tomas, your mic is working?

Tomas: Yeah, I think it's working fine. Everyone can hear me? Cool.

00:32.000 --> 01:46.000
So, hello, welcome to my talk. I'm Tomas Vondra. I'm a long-term Postgres contributor, developer and committer, and at the moment I'm working for Microsoft, where I still work on the open source stuff. But first of all, I'm a developer. This is a talk about the development of Postgres, looking back over the last 20 years of improvements. The slides are already online at that URL, but that doesn't matter, I think. If you have any questions during the talk, please ask them right away, because there are going to be a lot of different topics discussed, so it would just be confusing to ask them at the end. Just shout during the talk and I will try to answer. This talk is not really something that will teach you how to do new stuff, so if you are here expecting to learn how to tune Postgres or something like that, that's not going to happen.

01:46.000 --> 03:07.000
This is the rough agenda of the talk. First, I think I need to explain why I do these talks, because this is not the first talk of this type that I've done, and it's simply to inform myself. When you have been working on a project for a long time, you kind of lose track of how much it actually improved over a longer period, because we only deal with individual releases. You install a new release and you see whether it improves performance. For developers it's even worse, because we only deal with individual patches. We benchmark individual pieces of code, and how exactly does that compose into a long-term view? I have no idea. So I do these kinds of benchmarks to inform myself. I've been asking myself: how did the performance of Postgres improve for typical workloads over a longer period of time? This is the result.

03:07.000 --> 03:40.000
I will be talking about two kinds of, I wouldn't say typical, but extreme workloads. The first one is really transactional, OLTP: small transactions, small queries, updates, single-row lookups, that kind of thing. Chances are your application does something more complicated; you are probably loading or processing larger amounts of data.
03:40.000 --> 04:26.000
It's somewhere on the spectrum between OLTP and OLAP, and OLAP is the second workload that I've been benchmarking. For that I've been using a subset of TPC-H, which is a data warehousing benchmark. It loads larger amounts of data, tens of gigabytes in this case, and runs queries that aggregate the data and so on. Each query is much more expensive than the queries in OLTP. And usually, applications are somewhere in between: they do a bit of each.

04:26.000 --> 04:57.000
So, I will show you how the throughput of Postgres improved over the years for these two benchmarks, and then I will also speculate a bit about the future: we saw this behavior in the past, so what's going to happen next? What would I, as a developer, expect? Well, I already spoke about the motivation.

04:57.000 --> 06:03.000
I think it's actually quite a tricky question to ask how performance changed over a longer period of time. First, because we just don't do the kind of benchmarks that give you an answer. We compare two commits, maybe two minor versions or major versions, but never all the versions at once. You might think that if you are running an application, you would have a better overview, but that's not really the case, because applications are not static. You are adding new features, you are improving the application in various ways, and, most importantly, I very much doubt you are still running the application on the hardware you had 20 years ago. You have done multiple upgrades, which completely change the behavior of the database. So this is why I do these benchmarks.

06:03.000 --> 07:18.000
Not only is it a tricky question, it's also not a very fair one, because when you develop a piece of software, you develop it in the context of the hardware you have at that time. The first multi-core CPUs came in roughly 2005. That may not be perfectly accurate: in some cases, I think SPARC had more cores before that, and Intel initially shipped things like hyper-threading, which is not a full core. But the point is that before 2004 or 2005 it didn't really make sense to optimize for multi-core performance, so no one did that, at least in the Postgres world. Taking software from that time and benchmarking it with benchmarks that actually require good multi-core scalability doesn't make much sense. Well, it's not fair, right?
07:19.000 --> 08:37.000
The other thing is, of course, that Postgres is not living in a vacuum. Postgres is one of the applications running on the operating system, and there are a lot of things we rely on that improved: the kernel, file systems, various parts of the system in general. All of these improved significantly, and I'm sure there are talks about those subsystems. We benefit from that. Postgres relies on the page cache in the kernel, so if that became more scalable, we get the benefit. There have been massive improvements in file systems; we automatically get those benefits too, and so on. And, of course, users do not see those improvements in isolation: you get a new system with a new kernel and new storage, and you see only the combination of all of it. So I've been trying to isolate just the Postgres part.

08:37.000 --> 09:04.000
I have to warn you, I'm going to present a lot of charts. I'm not going to show you raw numbers, but charts that hopefully visualize the throughput, the performance. The short version is: it's much faster. If you just want the conclusion, the executive summary, that's it.

09:04.000 --> 10:11.000
First, let's talk about OLTP, the simple transactional workload. That's TPC-B: if you are using pgbench with the default workload, this is what you get. The question was which hardware to use, and I happen to have some machines at home that I use for benchmarking. They happen to come from just about the middle of the 20 years I decided to cover, because the first Postgres version I benchmarked is, I think, 8.0, which is almost exactly 20 years old. So I ended up using a machine which is roughly in between. It's not the biggest machine nowadays; I think it has 44 cores, or 88 threads, 64 GB of RAM, some SSDs, and a current kernel on Debian. It's a fairly typical machine, nothing terrible.

10:11.000 --> 10:51.000
TPC-B simulates a simple application, like a banking application. I usually benchmark three different data sizes, because if you are familiar with Postgres, you know there are shared buffers, the cache managed by Postgres itself. The first size is when the data set fits into shared buffers. Then there's a medium data set, which fits into RAM but is larger than shared buffers. And then there's a large one, which is much larger than the available memory, which means it's going to do a lot of I/O.
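For reference, a quick way to see where a data set falls relative to these thresholds is to compare the configured cache size with the on-disk size of the tables. A minimal sketch, assuming the standard pgbench tables:

    -- the buffer cache managed by Postgres itself
    SHOW shared_buffers;

    -- on-disk size of the main pgbench table, including its indexes;
    -- compare this against shared_buffers and against total RAM
    SELECT pg_size_pretty(pg_total_relation_size('pgbench_accounts'));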
10:51.000 --> 11:15.000
Each of those data set sizes usually hits a slightly different bottleneck in the database. The small data set usually hits locking issues, the medium data set usually hits CPU bottlenecks, and the large one is usually I/O-bound.

11:15.000 --> 11:54.000
Then, of course, we have different combinations of parameters. We run either read-only transactions or updates, and you can test with different client counts: it's quite different to run with a single connection versus a hundred connections all working concurrently. I had to do fairly short runs, single-digit minutes, because of all these combinations; in total, a single run of the benchmark took about two weeks, so I just needed to limit the runs. Ideally, a proper benchmark would span multiple checkpoints and all of that. I had to compromise there.

11:54.000 --> 12:54.000
I used a configuration which is roughly the same on all the versions, starting from 8.0 up to the current development version. On new versions I should probably use larger shared buffers, but on 8.0, 2 gigabytes is the largest value that's actually possible. So it's a trade-off; it should work reasonably well. And the benchmarking tool I always use is pgbench from Postgres 18, because there have been a lot of improvements even in the benchmarking tool, and I don't want to be benchmarking that; I want consistent results.

12:54.000 --> 13:56.000
This is how it looks on the small, read-only data set. It's just 1.5 gigabytes, so it fits into shared buffers. All the OLTP charts I'm going to show look like this: each color is a fixed client count, the X axis is the version of Postgres up to 18, which is the current development version, not released yet, and the Y axis is the throughput in transactions per second. You can see how much the throughput improved: we did maybe 50,000 or 100,000 transactions per second on 8.0, and now we are doing about a million. It's a massive speed-up. This chart is for the simple query mode.
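For reference, the default pgbench workload is a TPC-B-like transaction. It looks roughly like this script, modulo pgbench version details:

    -- pgbench picks random account, teller and branch ids per transaction
    \set aid random(1, 100000 * :scale)
    \set bid random(1, 1 * :scale)
    \set tid random(1, 10 * :scale)
    \set delta random(-5000, 5000)
    BEGIN;
    UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
    SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
    UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
    UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
    INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
        VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
    END;

The read-only variant is just the SELECT; the -M prepared option runs the same statements as prepared statements.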
13:56.000 --> 15:15.000
With prepared statements it's like 50% faster, and that was almost always the case. You can also see that we did some weird stuff in the past: on 9.2 and 9.3 there was actually a regression, a pretty significant one. I didn't look into what exactly happened there, but 9.2 and 9.3 actually regressed. There are also cases where you switch to prepared statements and it gets slower. In retrospect: well, we shouldn't have done that; that's something we missed during testing. But since 9.2, I think, where Robert Haas introduced fast-path locking, which is a simpler, faster way to take locks that scales much better, we have been in a really, really good situation. It's almost perfectly stable. And this workload is super simple, so it's optimized about as much as possible at this point.

15:16.000 --> 15:45.000
So that was the small read-only data set. If you look at the small read-write one, you also see some weird stuff happening, but again, since roughly Postgres 10 we are in a really stable situation. The same holds with prepared statements. So that's good.

15:45.000 --> 16:56.000
I will skip the medium data set, because in this case it shows almost the same results as the small one. If you look at the large data set, which is I/O-bound, you again see very significant improvements initially, and then it gets mostly stable. There is a bit more jitter here, with runs a bit slower or faster, mostly because the runs were so short that it didn't even out. The same holds for prepared statements. Also, prepared statements don't help that much in this case, because the workload is I/O-bound, and prepared statements matter most for CPU-bound queries.
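To make the simple-versus-prepared distinction concrete: a prepared statement is parsed and planned once and then only executed, which is what pgbench's -M prepared mode does under the hood. A minimal sketch:

    -- parsing and planning happen once, at PREPARE time or on early executions
    PREPARE account_balance (int) AS
        SELECT abalance FROM pgbench_accounts WHERE aid = $1;

    -- later executions skip parsing and can reuse a cached plan,
    -- which is why this helps most when queries are CPU-bound
    EXECUTE account_balance(1);
    EXECUTE account_balance(2);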
16:56.000 --> 17:45.000
Can you still hear me? Okay, you didn't break it. Do I need to stand in a certain place? Okay, I will stay here. So, this was prepared statements for simple OLTP.

Audience: On the previous slide, what does that mean?

Tomas: Yeah, I don't know. I didn't look into that. It might have been a hiccup in the benchmarking script, I don't know. It seems weird, I agree. Good that you noticed.

17:45.000 --> 19:07.000
Anyway, TPC-B is not the only OLTP workload you might be running. It's simple; there are no joins, for example. Chances are that even with an OLTP workload you are still joining multiple tables, and a quite typical pattern is a star join. Star joins are usually mentioned in connection with data warehousing applications, but you can actually have a star join in OLTP, because you look up data in one table and then enrich it with additional information from other tables. That's a star join; the fact that you just filter the fact table doesn't really change that. So I actually tried a star join, which is a very common pattern in OLTP applications. I did a simple test with a tiny amount of data, because the assumption is it's all in memory. You have one main table, the fact table, and then ten dimension tables: it might be customer info with an address, transaction info, that kind of stuff.

19:07.000 --> 20:02.000
This is roughly how the query looks: a simple join of ten tables. And it only does 2,000 transactions per second on Postgres, even on 44 cores. That's terrible, right? I've been looking into why this happens, and we are just spending so much time going through all the possible combinations of the join order; that's ten-factorial options right there. So that's terrible. And if you just use prepared statements, which means you only do the planning once, then the execution is super fast: you are suddenly getting half a million transactions per second.

20:02.000 --> 20:33.000
You can actually see the main scalability improvements on this chart. In 9.2 we got fast-path locking, so that's a huge jump here. Then there were a lot of improvements to the scalability of MVCC and snapshots and all of that, so I don't have a single feature I could point to here; it's a combination of those.
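The star-join query from the slide looked roughly like the following; the table and column names here are illustrative, not the actual test schema:

    -- look up one fact row, then enrich it from ten dimension tables
    SELECT f.*, d1.info, d2.info, d3.info  -- ... and so on up to d10
      FROM fact f
      JOIN dim1 d1 ON d1.id = f.d1_id
      JOIN dim2 d2 ON d2.id = f.d2_id
      JOIN dim3 d3 ON d3.id = f.d3_id
      -- seven more dimension joins in the same pattern
     WHERE f.id = 42;

Without a cached plan, the planner repeats the join-order search on every execution, which is where the time goes.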
20:33.000 --> 21:02.000
Then, in 18, I committed an improvement of fast-path locking to scale to even more relations, so in 18 it will get faster again. But that doesn't change the fact that this only works if you are using prepared statements, and maybe you can't: there are problems with prepared statements and connection pooling, for example.

21:02.000 --> 21:46.000
The thing is, there are a couple of simple hacks you can do. For example, you can say: well, I'm not going to use plain joins, I'm going to use LEFT JOINs. That restricts the possible join orders a little bit. You won't get half a million transactions, but you do get about five times the original throughput, which is good. There are some other things you can do, but really, this is something we need to fix or improve in Postgres. It's a limitation of the planner.
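Besides rewriting the query with LEFT JOINs, a related knob, not mentioned on the slide, is join_collapse_limit, which caps how much reordering the planner attempts. A sketch, again with made-up names:

    -- with join_collapse_limit = 1 the planner keeps the join order
    -- exactly as written, skipping the join-order search entirely
    SET join_collapse_limit = 1;

    SELECT f.*, d1.info, d2.info
      FROM fact f
      LEFT JOIN dim1 d1 ON d1.id = f.d1_id
      LEFT JOIN dim2 d2 ON d2.id = f.d2_id
     WHERE f.id = 42;

The trade-off is that the order you wrote had better be a good one, because the planner will no longer fix it for you.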
21:46.000 --> 22:56.000
So, to give you a brief summary of OLTP: there have been some massive improvements in scalability, and I mean an order of magnitude or more. We did maybe 50,000 transactions per second on 8.0, and we are doing a million now. That's really good. There have been some small regressions in the past, where something weird happened: either the throughput decreased with a new major version, or you got lower throughput with prepared statements compared to full parsing and full planning. That's mostly gone since about Postgres 10, and I think that's a testament to how much we are testing Postgres, and also how much we improved our development processes and how much more careful we are.

22:56.000 --> 23:59.000
What could happen in the future? Should you expect more massive growth? For me this is a bit like famous last words, because every time I make a prediction, immediately the next day someone sends a patch to the mailing list proving me wrong; usually it's Andres. But I think we are mostly out of low-hanging fruit. I don't think there's a minor fix left that would result in massive improvements in throughput. There are things we can improve, but I would expect them to be more incremental. Right now we often talk about patches that improve something by, say, 5%. Sure, if you do a lot of those, it adds up to a significant improvement, but it's not like committing fast-path locking and getting ten times faster.

24:04.000 --> 24:49.000
This was just a single-number result; it only showed throughput. I haven't shown the consistency of the results, in the sense of how stable they are over time. You could have violently jumping results and still get the same average in the end. I think we also got much better at that; it's much smoother now. That's partially because we are just so much faster in absolute numbers that the variation doesn't really matter that much. So I will skip that.

24:49.000 --> 26:45.000
The other workload I want to talk about is OLAP, which you can imagine as large data sets processed by an analytical workload. TPC-H is a fairly traditional benchmark. It's really simplistic and has a lot of flaws, but it's also very simple to understand and, in fact, sufficiently good. For this benchmark I used a much smaller machine, again roughly from 10 or 15 years ago. I don't have this machine anymore, because I replaced it, but it had only four cores, 16 GB of RAM, and so on. The logic here is that this benchmark is not really about concurrency. It's not about running a hundred connections at once; it's about one connection running a query, maybe with parallel workers. So the number of cores doesn't really matter, and it doesn't take super large amounts of data to show the behavior of aggregations, for example. Other than that, it's again running a current Debian, ext4 and so on. TPC-H is very simple to set up and run. It has just 22 queries, which are large analytical queries using different types of conditions, aggregations, joins and so on. And I tested not just the 22 queries but also the initial data load, because that also matters: how fast you can actually load data into Postgres is important.
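The load phase being measured is essentially a bulk COPY followed by index builds. A minimal sketch, with the file path made up; max_parallel_maintenance_workers only exists since Postgres 11:

    -- bulk load: COPY is the fastest built-in way to ingest data
    COPY lineitem FROM '/data/lineitem.csv' WITH (FORMAT csv);

    -- index builds dominate the rest of the load; give them memory
    -- and, on Postgres 11 and later, parallel workers
    SET maintenance_work_mem = '1GB';
    SET max_parallel_maintenance_workers = 4;
    CREATE INDEX ON lineitem (l_orderkey);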
26:45.000 --> 27:12.000
If you are looking for a great paper about TPC-H, I highly recommend the one by Peter Boncz, Thomas Neumann and Orri Erling. They look at the individual queries and point out the bottleneck each query is actually designed to hit. It's a wonderful paper.

27:12.000 --> 28:56.000
So, first, let's talk about data loads. These charts look a bit different: again, the X axis is the version of Postgres, and the P means parallelism, that is, we have enabled parallelism. Postgres started to support parallel queries, parallel workers and parallel maintenance operations in 9.6, and then added more support over time. The Y axis is duration in seconds, and this is the ten-gigabyte data set. You can see there's a huge improvement initially, and then some smaller improvements here, where we optimized how we do bulk loading and that kind of stuff. The blue part is the initial COPY: it improved significantly here and then remained mostly the same, because we simply can't improve the parsing of the CSV much further. The other part is CREATE INDEX, the yellow one. You can see it improved quite a bit even without parallelism, and then at some point, in 11, we started to support parallel index builds. Since then it's mostly stable: we have optimized most of the stuff, and it doesn't improve as fast as in the past.

28:56.000 --> 30:07.000
If you look at the queries: again the Y axis is seconds, and this shows, if you run all the queries and sum the durations, how long it takes. You can see massive speed-ups initially, between 8.0 and 8.2; 8.1 is not on the chart because it's so long ago that it didn't work for me. I've been told this early improvement is probably the support for bitmap index scans. We didn't have bitmap index scans before, so this probably was thanks to that. Then, in 8.4, bitmap index scans started to support prefetching: we introduced effective_io_concurrency, which determines how aggressively we prefetch data that will be needed soon, kind of like asynchronous I/O, a little bit. Then we started supporting index-only scans, and again there's a significant speed-up here.
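For illustration, an index-only scan answers a query from the index alone, without touching the heap, as long as the visibility map is reasonably current. A sketch on a hypothetical table:

    -- the index covers every column the query references
    CREATE INDEX orders_customer_idx ON orders (customer_id, amount);

    -- VACUUM keeps the visibility map up to date, which index-only
    -- scans rely on to skip heap fetches
    VACUUM orders;

    EXPLAIN SELECT amount FROM orders WHERE customer_id = 42;
    -- expected: Index Only Scan using orders_customer_idx on orders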
30:07.000 --> 30:37.000
Then there are a couple of releases where the improvements are a bit smaller, and each is a combination of multiple changes. Since then, it's again mostly stable. You can see that parallelism helps a bit; in this case it was actually a bit slower here. But generally it's quite stable, and you get good performance for a row-oriented database.

30:38.000 --> 31:44.000
I can even show a chart where the duration is broken down by query: each color here is one of those 22 TPC-H queries. You can see that initially there were many queries that made up a substantial fraction of the run, and most of those have essentially disappeared. The only one that really remains is query 1, which is just a huge aggregation of a large amount of data. There are no joins, nothing; it's purely about the raw power of processing data. Parallelism is very effective at handling that, but to actually improve it further we would need something like columnar storage with columnar execution, which we don't have. We are still a row-store database.
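For reference, query 1 is essentially one big scan with aggregation. Simplified from the TPC-H specification, it has roughly this shape:

    SELECT l_returnflag,
           l_linestatus,
           sum(l_quantity)                         AS sum_qty,
           sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
           avg(l_quantity)                         AS avg_qty,
           count(*)                                AS count_order
      FROM lineitem
     WHERE l_shipdate <= date '1998-12-01' - interval '90' day
     GROUP BY l_returnflag, l_linestatus
     ORDER BY l_returnflag, l_linestatus;
    -- no joins: the cost is pure per-row scan and aggregate work,
    -- which is where a columnar executor would shine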
31:44.000 --> 32:47.000
I didn't talk about planning before, but here I think I need to talk about OLAP regressions, because the queries in TPC-H are more complex and therefore more susceptible to changes in the query plan. The queries in TPC-B are all lookups by primary key, index scans, that kind of stuff. With OLAP it's more complicated, and it's susceptible not just to mistakes but also to changes in configuration parameters and in how we interpret them. Here is an example, though I wouldn't call it a real regression, because it's an intentional change and a consequence of it.

32:47.000 --> 33:55.000
We have increased the default value of effective_cache_size. That doesn't mean the database is going to use more memory; it's just information about how much data is cached. It makes random I/O look cheaper, which means the database is more likely to pick index scans. And if you are on a system where random I/O is actually not as cheap as the database believes, that is more likely to lead to regressions. Similarly, effective_io_concurrency: in Postgres 14 we changed a bit how the configuration value is interpreted, so if you just copied your configuration file over, the database would suddenly prefetch much less data. If you look into the release notes, well, it's there, but people don't actually read those. It surprised me, and I'm an experienced Postgres operator.

33:55.000 --> 34:49.000
And there are additional improvements that are, of course, meant to help the database. If the database knows about foreign-key relationships during query planning, for example, that gives it additional information, but maybe the old, worse plan was, by accident, actually faster. That can happen. And there are probably more. I ran into a couple of those. I wouldn't say those are problems, like bugs in Postgres; it's more a limitation of the technology in general. You improve the configuration and, by accident, it makes something worse. Sometimes two wrongs are actually better.
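Both settings in question are planner and executor hints, not allocations, so they silently shift plan choices and prefetch behavior. A sketch with illustrative values and a hypothetical table:

    -- a planner hint about how much data the OS and Postgres likely cache;
    -- allocates nothing, but higher values make index scans look cheaper
    SET effective_cache_size = '48GB';

    -- prefetch depth used by bitmap heap scans; note that newer releases
    -- interpret the value differently, so a copied config prefetches less
    SET effective_io_concurrency = 32;

    EXPLAIN SELECT * FROM orders WHERE customer_id BETWEEN 1 AND 1000;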
34:53.000 --> 35:40.000
So, for OLAP, what's the summary and what about the future? Just like for OLTP, we have seen some significant improvements over the years: 8.4 got prefetching for bitmap scans, 9.2 got index-only scans, 9.6 got parallelism, and there are some more that I didn't actually mention, like the parallel index builds in Postgres 11. Since Postgres 11 it's mostly unchanged. There are some small improvements, of course; I'm not saying there aren't.

35:40.000 --> 36:39.000
But for the future, I would expect mostly incremental improvements: people optimizing individual operations to get a couple more percent. Those are mostly small gains. I also think we will see some improvements in tooling, things where Postgres doesn't change the code but we start, for example, optimizing the binaries. Not just the optimizations built into Postgres directly: there is PGO, which is profile-guided optimization, and BOLT, which is a binary layout optimizer, and those can actually help a lot with data-intensive applications.

36:40.000 --> 37:58.000
But for really fundamental improvements, I think we would need to do something like what the analytical databases are doing. Postgres is a general-purpose database, which is very capable, and I love it, but it's still a row store. Most of the workloads are probably transactional, OLTP. It can do a lot of analytics, but it can't really compete with databases that have been designed specifically for that use case, for that workload. So we would need to adopt something like column storage, and not just the storage, but also change the executor to actually leverage the efficiencies of the columnar format. Or maybe not: Postgres is very extensible and very flexible, and we have a long history of letting Postgres be combined with other databases or other engines. We have had foreign data wrappers for a long time, which let you interact quite efficiently with other databases.
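In its simplest form, a foreign data wrapper setup looks like this; the server, credentials and table names are of course made up:

    CREATE EXTENSION postgres_fdw;

    CREATE SERVER warehouse
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'dw.example.com', dbname 'analytics');

    CREATE USER MAPPING FOR CURRENT_USER
        SERVER warehouse
        OPTIONS (user 'reporting', password 'secret');

    -- queries against this table are shipped to the remote server,
    -- with filters (and, on newer versions, joins and aggregates) pushed down
    CREATE FOREIGN TABLE remote_sales (
        sale_id bigint,
        amount  numeric
    ) SERVER warehouse OPTIONS (table_name 'sales');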
37:58.000 --> 38:36.000
Or we could develop something completely new. For example, I know there are people working with the DuckDB engine, actually using it to run queries from Postgres, offloading some of the queries that can benefit from DuckDB, and stuff like that. So I don't know what is going to happen. People often have really, really crazy ideas. But I think, if we are to improve this, some fundamental change needs to happen.

38:36.000 --> 39:54.000
So, to sum all of this up: I'm quite happy with the improvements in Postgres over 20 years. It makes me feel good as a developer that we have achieved that. What do I expect in the future? Incremental changes, but also people surprising me with some fundamental, massive improvements. This talk was about performance, about throughput and some benchmarks, but I want to make it very clear that that's not the only thing that makes a database successful. There is a lot of other stuff that actually matters to users, maybe more than just the raw performance: how easily you can actually operate it. If you have a database which is super fast, but it's a shit-ton of work to keep it running, you will not run it in production. We are trying to improve that all the time too, but that's not what this talk was about. So, I hope you enjoyed it.

39:54.000 --> 40:37.000
Well, there are a couple of questions. This one was actually already asked. I don't know what that is; it would be interesting. There's an opportunity to optimize some special queries, like a star join. I think that might actually be a good project, a project through which to learn about Postgres. And there are things we could maybe use in Postgres, like BOLT, which I did actually try on the OLTP workload, and it actually improved throughput by 40%.
40:37.000 --> 41:24.000
So there's potential. I'm not sure I would actually do that in production, or in the builds. There is plenty of NUMA stuff, too; Andres actually had a really nice talk at PGConf.EU. It's recorded, it's on YouTube. If you want to understand how the database needs to deal with machines that have non-uniform memory access, which nowadays is everything, not just machines with multiple sockets but even single-socket machines, then I think that is a wonderful introduction to the topic.

41:24.000 --> 42:09.000
For OLAP, I was surprised that the optimization of the binaries didn't actually help that much. I don't know why. It might be due to how we do the evaluation of expressions, which might be incompatible with what that optimization is supposed to be doing. I think more radical architectural ideas need to be implemented. And there are some ways in which complex plans confuse the database optimizer a little bit, so I'm not sure.

42:09.000 --> 42:30.000
So: there are slides, links and so on, a couple of blog posts about this kind of stuff, and if you have any questions, I'm happy to answer them, either now or, I think, I will be hanging around the booth.

42:31.000 --> 42:57.000
Moderator: Thank you very much, Tomas. We've got a little bit of time for Q&A. The format: please shout out your question. If you shout it out in the room, we can all hear it, but unfortunately only the microphone reaches the people online, so Tomas is going to repeat the question and summarise it.

42:58.000 --> 43:11.000
Audience: So, TPC has an audited process. Are there any Postgres vendors, or other vendors, that still go through that?

Tomas: I have no idea. Moderator: You didn't repeat the question. Tomas: Yeah, okay.
43:11.000 --> 45:08.000
So, the question is whether any Postgres vendors, or vendors in general, still go through the TPC auditing. Both TPC-H and TPC-B are clearly specified and require that, if you want to use the results, you go through the TPC organization and have the results actually audited. You are not supposed to just run the benchmarks and publish the results on your own. I don't know if anyone is still doing that. It wasn't my intention to get anything like that, because it's a fairly expensive process, a long one, and I don't need audited results for my own use. I haven't seen, for a long time, any vendor showing in their marketing materials, like, "we have this number". I think that was probably more relevant in the times of on-premise databases, of systems developed and provided as a complete solution, because the vendors were giving you the database, but also the hardware and all the services around it. So it made sense to have a stamp of the performance. I don't think people are doing that now, because people are just taking a piece from here, a piece from there, running it on the cloud provider of their choice, and I don't think it's very relevant anymore.

45:08.000 --> 45:12.000
Moderator: Thank you. Any more questions?

45:13.000 --> 45:47.000
Audience: Hello. The question is about migration and upgrades, regarding AWS. Current versions, for example 13.14, have updates, global updates, in AWS, and we will move all our data. For example, Chef uses an external PostgreSQL, and will this migration pass smoothly? I mean, if Chef already says in its releases that it uses version 13.18, okay.
45:47.000 --> 46:07.000
And AWS says that all versions from 14 on will be updated to 18. So, should I run any tests before doing this migration in production?

46:07.000 --> 46:48.000
Tomas: I'm not sure I understood the question, the sound is really bad here, but let me try to answer what I think the question is. When you are upgrading Postgres, it should be binary compatible; you don't need to export and import anything. And there's a tool called pg_upgrade that you can use to transfer the catalogs. We have only changed the catalogs; the data files are still the same, and you shouldn't need to copy out, copy in, or anything like that. Was that an answer, about the versions?

46:48.000 --> 47:48.000
Audience: Yeah. Tomas: I think it should work. I mean, Postgres has a lot of testing exactly for the pg_upgrade cases. It can break, I have no doubt about that, but it shouldn't; we do our best to make sure it's reliable, and I haven't seen a pg_upgrade failure for a long time. Usually it's not because of Postgres but because of a problem in the operating system, like using link mode in cases where that's not possible, and stuff like that. So if you run into an issue, please report it and we will fix it. But that's about it.

47:48.000 --> 47:54.000
Moderator: Thank you. Any more questions? Maybe at the back, yeah. Put a hand up.
47:54.000 --> 48:29.000
Moderator: Shout. Really shout.

Audience: (mostly inaudible) In the pgbench charts, the one-client line is almost flat. It doesn't improve, right?

48:29.000 --> 49:01.000
Tomas: It's a very simple benchmark, so the per-client throughput didn't improve that much. Actually, it regressed a little bit, but that's not visible here. Most of the improvements are really about concurrency, about handling things like locking between sessions more efficiently. Those are most of the improvements. Yes? Thank you.

49:01.000 --> 49:18.000
Moderator: We have time for just one more question; we have to finish at 9:50 exactly, I believe. Any more questions? Okay, maybe I can give you one minute of your time back. Thank you very much, Tomas. That was really interesting.