WEBVTT

00:00.000 --> 00:32.000
Moderator: ...that talk on file systems was fascinating. It's nine o'clock exactly, so I'm going to hand over to Tomas, who's going to talk about performance improvements in Postgres over the last 20 years, and hopefully everyone can hear us on the live stream. Tomas, your mic is working?

Tomas: Yeah, I think it's working fine. Everyone can hear me? Cool.

00:32.000 --> 01:46.000
So, hello, welcome to my talk. I'm Tomas Vondra. I'm a long-term Postgres contributor, developer and committer, and at the moment I'm working for Microsoft, where I still work on the open source stuff. But first of all, I'm a developer. This is a talk about the development of Postgres, looking back over the last 20 years of improvements. The slides are already online at that URL, but that doesn't matter, I think. If you have any questions during the talk, please ask them right away, because there are going to be a lot of different topics discussed, so it would just be confusing to ask them at the end. Just shout during the talk and I will try to answer. This talk is not really something that will teach you how to do new stuff, so if you are here expecting to learn how to tune Postgres or something like that, that's not going to happen.

01:46.000 --> 03:07.000
This is the rough agenda of the talk. First, I think I need to explain why I do these talks, because this is not the first talk of this type that I've done, and it's simply to inform myself. When you have been working on a project for a long time, you kind of lose track of how much it actually improved over a longer period, because we only deal with individual releases. You install a new release and you see whether it improves performance. For developers it's even worse, because we only deal with individual patches. We benchmark individual pieces of code, and how exactly does that compose into a long-term view? I have no idea. So I do these kinds of benchmarks to inform myself. I've been asking myself: how did the performance of Postgres improve for typical workloads over a longer period of time? This is the result.

03:07.000 --> 03:40.000
I will be talking about two kinds of, I wouldn't say typical, but extreme workloads. The first one is really transactional, OLTP: small transactions, small queries, updates, single-row lookups, that kind of thing. Chances are your application does something more complicated; you are probably loading or processing larger amounts of data.
03:40.000 --> 04:26.000
It's somewhere on the spectrum between OLTP and OLAP, and OLAP is the second workload that I've been benchmarking. For that I've been using a subset of TPC-H, which is a data warehousing benchmark. It loads larger amounts of data, tens of gigabytes in this case, and runs queries that aggregate the data and so on. Each query is much more expensive than the queries in OLTP. And usually, applications are somewhere in between: they do a bit of each.

04:26.000 --> 04:57.000
So, I will show you how the throughput of Postgres improved over the years for these two benchmarks, and then I will also speculate a bit about the future: we saw this behavior in the past, so what's going to happen next? What would I, as a developer, expect? Well, I already spoke about the motivation.

04:57.000 --> 06:03.000
I think it's actually quite a tricky question to ask how performance changed over a longer period of time. First, because we just don't do the kind of benchmarks that give you an answer. We compare two commits, maybe two minor versions or major versions, but never all the versions at once. You might think that if you are running an application, you would have a better overview, but that's not really the case, because applications are not static. You are adding new features, you are improving the application in various ways, and, most importantly, I very much doubt you are still running the application on the hardware you had 20 years ago. You have done multiple upgrades, which completely change the behavior of the database. So this is why I do these benchmarks.

06:03.000 --> 07:18.000
Not only is it a tricky question, it's also not a very fair one, because when you develop a piece of software, you develop it in the context of the hardware you have at that time. The first multi-core CPUs came in roughly 2005. That may not be perfectly accurate: in some cases, I think SPARC had more cores before that, and Intel initially shipped things like hyper-threading, which is not a full core. But the point is that before 2004 or 2005 it didn't really make sense to optimize for multi-core performance, so no one did that, at least in the Postgres world. Taking software from that time and benchmarking it with benchmarks that actually require good multi-core scalability doesn't make much sense. Well, it's not fair, right?
07:19.000 --> 08:37.000
The other thing is, of course, that Postgres is not living in a vacuum. Postgres is one of the applications running on the operating system, and there are a lot of things we rely on that improved: the kernel, file systems, various parts of the system in general. All of these improved significantly, and I'm sure there are talks about those subsystems. We benefit from that. Postgres relies on the page cache in the kernel, so if that became more scalable, we get the benefit. There have been massive improvements in file systems; we automatically get those benefits too, and so on. And, of course, users do not see those improvements in isolation: you get a new system with a new kernel and new storage, and you see only the combination of all of it. So I've been trying to isolate just the Postgres part.

08:37.000 --> 09:04.000
I have to warn you, I'm going to present a lot of charts. I'm not going to show you raw numbers, but charts that hopefully visualize the throughput, the performance. The short version is: it's much faster. If you just want the conclusion, the executive summary, that's it.

09:04.000 --> 10:11.000
First, let's talk about OLTP, the simple transactional workload. That's TPC-B: if you are using pgbench with the default workload, this is what you get. The question was which hardware to use, and I happen to have some machines at home that I use for benchmarking. They happen to come from just about the middle of the 20 years I decided to cover, because the first Postgres version I benchmarked is, I think, 8.0, which is almost exactly 20 years old. So I ended up using a machine which is roughly in between. It's not the biggest machine nowadays; I think it has 44 cores, or 88 threads, 64 GB of RAM, some SSDs, and a current kernel on Debian. It's a fairly typical machine, nothing terrible.

10:11.000 --> 10:51.000
TPC-B simulates a simple application, like a banking application. I usually benchmark three different data sizes, because if you are familiar with Postgres, you know there are shared buffers, the cache managed by Postgres itself. The first size is when the data set fits into shared buffers. Then there's a medium data set, which fits into RAM but is larger than shared buffers. And then there's a large one, which is much larger than the available memory, which means it's going to do a lot of I/O.
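For reference, a quick way to see where a data set falls relative to these thresholds is to compare the configured cache size with the on-disk size of the tables. A minimal sketch, assuming the standard pgbench tables:

    -- the buffer cache managed by Postgres itself
    SHOW shared_buffers;

    -- on-disk size of the main pgbench table, including its indexes;
    -- compare this against shared_buffers and against total RAM
    SELECT pg_size_pretty(pg_total_relation_size('pgbench_accounts'));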
10:51.000 --> 11:15.000
Each of those data set sizes usually hits a slightly different bottleneck in the database. The small data set usually hits locking issues, the medium data set usually hits CPU bottlenecks, and the large one is usually I/O-bound.

11:15.000 --> 11:54.000
Then, of course, we have different combinations of parameters. We run either read-only transactions or updates, and you can test with different client counts: it's quite different to run with a single connection versus a hundred connections all working concurrently. I had to do fairly short runs, single-digit minutes, because of all these combinations; in total, a single run of the benchmark took about two weeks, so I just needed to limit the runs. Ideally, a proper benchmark would span multiple checkpoints and all of that. I had to compromise there.

11:54.000 --> 12:54.000
I used a configuration which is roughly the same on all the versions, starting from 8.0 up to the current development version. On new versions I should probably use larger shared buffers, but on 8.0, 2 gigabytes is the largest value that's actually possible. So it's a trade-off; it should work reasonably well. And the benchmarking tool I always use is pgbench from Postgres 18, because there have been a lot of improvements even in the benchmarking tool, and I don't want to be benchmarking that; I want consistent results.

12:54.000 --> 13:56.000
This is how it looks on the small, read-only data set. It's just 1.5 gigabytes, so it fits into shared buffers. All the OLTP charts I'm going to show look like this: each color is a fixed client count, the X axis is the version of Postgres up to 18, which is the current development version, not released yet, and the Y axis is the throughput in transactions per second. You can see how much the throughput improved: we did maybe 50,000 or 100,000 transactions per second on 8.0, and now we are doing about a million. It's a massive speed-up. This chart is for the simple query mode.
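For reference, the default pgbench workload is a TPC-B-like transaction. It looks roughly like this script, modulo pgbench version details:

    -- pgbench picks random account, teller and branch ids per transaction
    \set aid random(1, 100000 * :scale)
    \set bid random(1, 1 * :scale)
    \set tid random(1, 10 * :scale)
    \set delta random(-5000, 5000)
    BEGIN;
    UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
    SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
    UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
    UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
    INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
        VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
    END;

The read-only variant is just the SELECT; the -M prepared option runs the same statements as prepared statements.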
13:56.000 --> 15:15.000
With prepared statements it's like 50% faster, and that was almost always the case. You can also see that we did some weird stuff in the past: on 9.2 and 9.3 there was actually a regression, a pretty significant one. I didn't look into what exactly happened there, but 9.2 and 9.3 actually regressed. There are also cases where you switch to prepared statements and it gets slower. In retrospect: well, we shouldn't have done that; that's something we missed during testing. But since 9.2, I think, where Robert Haas introduced fast-path locking, which is a simpler, faster way to take locks that scales much better, we have been in a really, really good situation. It's almost perfectly stable. And this workload is super simple, so it's optimized about as much as possible at this point.

15:16.000 --> 15:45.000
So that was the small read-only data set. If you look at the small read-write one, you also see some weird stuff happening, but again, since roughly Postgres 10 we are in a really stable situation. The same holds with prepared statements. So that's good.

15:45.000 --> 16:56.000
I will skip the medium data set, because in this case it shows almost the same results as the small one. If you look at the large data set, which is I/O-bound, you again see very significant improvements initially, and then it gets mostly stable. There is a bit more jitter here, with runs a bit slower or faster, mostly because the runs were so short that it didn't even out. The same holds for prepared statements. Also, prepared statements don't help that much in this case, because the workload is I/O-bound, and prepared statements matter most for CPU-bound queries.
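To make the simple-versus-prepared distinction concrete: a prepared statement is parsed and planned once and then only executed, which is what pgbench's -M prepared mode does under the hood. A minimal sketch:

    -- parsing and planning happen once, at PREPARE time or on early executions
    PREPARE account_balance (int) AS
        SELECT abalance FROM pgbench_accounts WHERE aid = $1;

    -- later executions skip parsing and can reuse a cached plan,
    -- which is why this helps most when queries are CPU-bound
    EXECUTE account_balance(1);
    EXECUTE account_balance(2);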
16:56.000 --> 17:45.000
Can you still hear me? Okay, you didn't break it. Do I need to stand in a certain place? Okay, I will stay here. So, this was prepared statements for simple OLTP.

Audience: On the previous slide, what does that mean?

Tomas: Yeah, I don't know. I didn't look into that. It might have been a hiccup in the benchmarking script, I don't know. It seems weird, I agree. Good that you noticed.

17:45.000 --> 19:07.000
Anyway, TPC-B is not the only OLTP workload you might be running. It's simple; there are no joins, for example. Chances are that even with an OLTP workload you are still joining multiple tables, and a quite typical pattern is a star join. Star joins are usually mentioned in connection with data warehousing applications, but you can actually have a star join in OLTP, because you look up data in one table and then enrich it with additional information from other tables. That's a star join; the fact that you just filter the fact table doesn't really change that. So I actually tried a star join, which is a very common pattern in OLTP applications. I did a simple test with a tiny amount of data, because the assumption is it's all in memory. You have one main table, the fact table, and then ten dimension tables: it might be customer info with an address, transaction info, that kind of stuff.

19:07.000 --> 20:02.000
This is roughly how the query looks: a simple join of ten tables. And it only does 2,000 transactions per second on Postgres, even on 44 cores. That's terrible, right? I've been looking into why this happens, and we are just spending so much time going through all the possible combinations of the join order; that's ten-factorial options right there. So that's terrible. And if you just use prepared statements, which means you only do the planning once, then the execution is super fast: you are suddenly getting half a million transactions per second.

20:02.000 --> 20:33.000
You can actually see the main scalability improvements on this chart. In 9.2 we got fast-path locking, so that's a huge jump here. Then there were a lot of improvements to the scalability of MVCC and snapshots and all of that, so I don't have a single feature I could point to here; it's a combination of those.
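The star-join query from the slide looked roughly like the following; the table and column names here are illustrative, not the actual test schema:

    -- look up one fact row, then enrich it from ten dimension tables
    SELECT f.*, d1.info, d2.info, d3.info  -- ... and so on up to d10
      FROM fact f
      JOIN dim1 d1 ON d1.id = f.d1_id
      JOIN dim2 d2 ON d2.id = f.d2_id
      JOIN dim3 d3 ON d3.id = f.d3_id
      -- seven more dimension joins in the same pattern
     WHERE f.id = 42;

Without a cached plan, the planner repeats the join-order search on every execution, which is where the time goes.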
20:33.000 --> 21:02.000
Then, in 18, I committed an improvement of fast-path locking to scale to even more relations, so in 18 it will get faster again. But that doesn't change the fact that this only works if you are using prepared statements, and maybe you can't: there are problems with prepared statements and connection pooling, for example.

21:02.000 --> 21:46.000
The thing is, there are a couple of simple hacks you can do. For example, you can say: well, I'm not going to use plain joins, I'm going to use LEFT JOINs. That restricts the possible join orders a little bit. You won't get half a million transactions, but you do get about five times the original throughput, which is good. There are some other things you can do, but really, this is something we need to fix or improve in Postgres. It's a limitation of the planner.
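Besides rewriting the query with LEFT JOINs, a related knob, not mentioned on the slide, is join_collapse_limit, which caps how much reordering the planner attempts. A sketch, again with made-up names:

    -- with join_collapse_limit = 1 the planner keeps the join order
    -- exactly as written, skipping the join-order search entirely
    SET join_collapse_limit = 1;

    SELECT f.*, d1.info, d2.info
      FROM fact f
      LEFT JOIN dim1 d1 ON d1.id = f.d1_id
      LEFT JOIN dim2 d2 ON d2.id = f.d2_id
     WHERE f.id = 42;

The trade-off is that the order you wrote had better be a good one, because the planner will no longer fix it for you.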
21:46.000 --> 22:56.000
So, to give you a brief summary of OLTP: there have been some massive improvements in scalability, and I mean an order of magnitude or more. We did maybe 50,000 transactions per second on 8.0, and we are doing a million now. That's really good. There have been some small regressions in the past, where something weird happened: either the throughput decreased with a new major version, or you got lower throughput with prepared statements compared to full parsing and full planning. That's mostly gone since about Postgres 10, and I think that's a testament to how much we are testing Postgres, and also how much we improved our development processes and how much more careful we are.

22:56.000 --> 23:59.000
What could happen in the future? Should you expect more massive growth? For me this is a bit like famous last words, because every time I make a prediction, immediately the next day someone sends a patch to the mailing list proving me wrong; usually it's Andres. But I think we are mostly out of low-hanging fruit. I don't think there's a minor fix left that would result in massive improvements in throughput. There are things we can improve, but I would expect them to be more incremental. Right now we often talk about patches that improve something by, say, 5%. Sure, if you do a lot of those, it adds up to a significant improvement, but it's not like committing fast-path locking and getting ten times faster.

24:04.000 --> 24:49.000
This was just a single-number result; it only showed throughput. I haven't shown the consistency of the results, in the sense of how stable they are over time. You could have violently jumping results and still get the same average in the end. I think we also got much better at that; it's much smoother now. That's partially because we are just so much faster in absolute numbers that the variation doesn't really matter that much. So I will skip that.

24:49.000 --> 26:45.000
The other workload I want to talk about is OLAP, which you can imagine as large data sets processed by an analytical workload. TPC-H is a fairly traditional benchmark. It's really simplistic and has a lot of flaws, but it's also very simple to understand and, in fact, sufficiently good. For this benchmark I used a much smaller machine, again roughly from 10 or 15 years ago. I don't have this machine anymore, because I replaced it, but it had only four cores, 16 GB of RAM, and so on. The logic here is that this benchmark is not really about concurrency. It's not about running a hundred connections at once; it's about one connection running a query, maybe with parallel workers. So the number of cores doesn't really matter, and it doesn't take super large amounts of data to show the behavior of aggregations, for example. Other than that, it's again running a current Debian, ext4 and so on. TPC-H is very simple to set up and run. It has just 22 queries, which are large analytical queries using different types of conditions, aggregations, joins and so on. And I tested not just the 22 queries but also the initial data load, because that also matters: how fast you can actually load data into Postgres is important.
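The load phase being measured is essentially a bulk COPY followed by index builds. A minimal sketch, with the file path made up; max_parallel_maintenance_workers only exists since Postgres 11:

    -- bulk load: COPY is the fastest built-in way to ingest data
    COPY lineitem FROM '/data/lineitem.csv' WITH (FORMAT csv);

    -- index builds dominate the rest of the load; give them memory
    -- and, on Postgres 11 and later, parallel workers
    SET maintenance_work_mem = '1GB';
    SET max_parallel_maintenance_workers = 4;
    CREATE INDEX ON lineitem (l_orderkey);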
26:45.000 --> 27:12.000
If you are looking for a great paper about TPC-H, I highly recommend the one by Peter Boncz, Thomas Neumann and Orri Erling. They look at the individual queries and point out the bottleneck each query is actually designed to hit. It's a wonderful paper.

27:12.000 --> 28:56.000
So, first, let's talk about data loads. These charts look a bit different: again, the X axis is the version of Postgres, and the P means parallelism, that is, we have enabled parallelism. Postgres started to support parallel queries, parallel workers and parallel maintenance operations in 9.6, and then added more support over time. The Y axis is duration in seconds, and this is the ten-gigabyte data set. You can see there's a huge improvement initially, and then some smaller improvements here, where we optimized how we do bulk loading and that kind of stuff. The blue part is the initial COPY: it improved significantly here and then remained mostly the same, because we simply can't improve the parsing of the CSV much further. The other part is CREATE INDEX, the yellow one. You can see it improved quite a bit even without parallelism, and then at some point, in 11, we started to support parallel index builds. Since then it's mostly stable: we have optimized most of the stuff, and it doesn't improve as fast as in the past.

28:56.000 --> 30:07.000
If you look at the queries: again the Y axis is seconds, and this shows, if you run all the queries and sum the durations, how long it takes. You can see massive speed-ups initially, between 8.0 and 8.2; 8.1 is not on the chart because it's so long ago that it didn't work for me. I've been told this early improvement is probably the support for bitmap index scans. We didn't have bitmap index scans before, so this probably was thanks to that. Then, in 8.4, bitmap index scans started to support prefetching: we introduced effective_io_concurrency, which determines how aggressively we prefetch data that will be needed soon, kind of like asynchronous I/O, a little bit. Then we started supporting index-only scans, and again there's a significant speed-up here.
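For illustration, an index-only scan answers a query from the index alone, without touching the heap, as long as the visibility map is reasonably current. A sketch on a hypothetical table:

    -- the index covers every column the query references
    CREATE INDEX orders_customer_idx ON orders (customer_id, amount);

    -- VACUUM keeps the visibility map up to date, which index-only
    -- scans rely on to skip heap fetches
    VACUUM orders;

    EXPLAIN SELECT amount FROM orders WHERE customer_id = 42;
    -- expected: Index Only Scan using orders_customer_idx on orders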
30:07.000 --> 30:37.000
Then there are a couple of releases where the improvements are a bit smaller, and each is a combination of multiple changes. Since then, it's again mostly stable. You can see that parallelism helps a bit; in this case it was actually a bit slower here. But generally it's quite stable, and you get good performance for a row-oriented database.

30:38.000 --> 31:44.000
I can even show a chart where the duration is broken down by query: each color here is one of those 22 TPC-H queries. You can see that initially there were many queries that made up a substantial fraction of the run, and most of those have essentially disappeared. The only one that really remains is query 1, which is just a huge aggregation of a large amount of data. There are no joins, nothing; it's purely about the raw power of processing data. Parallelism is very effective at handling that, but to actually improve it further we would need something like columnar storage with columnar execution, which we don't have. We are still a row-store database.
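For reference, query 1 is essentially one big scan with aggregation. Simplified from the TPC-H specification, it has roughly this shape:

    SELECT l_returnflag,
           l_linestatus,
           sum(l_quantity)                         AS sum_qty,
           sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
           avg(l_quantity)                         AS avg_qty,
           count(*)                                AS count_order
      FROM lineitem
     WHERE l_shipdate <= date '1998-12-01' - interval '90' day
     GROUP BY l_returnflag, l_linestatus
     ORDER BY l_returnflag, l_linestatus;
    -- no joins: the cost is pure per-row scan and aggregate work,
    -- which is where a columnar executor would shine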
31:44.000 --> 32:47.000
I didn't talk about planning before, but here I think I need to talk about OLAP regressions, because the queries in TPC-H are more complex and therefore more susceptible to changes in the query plan. The queries in TPC-B are all lookups by primary key, index scans, that kind of stuff. With OLAP it's more complicated, and it's susceptible not just to mistakes but also to changes in configuration parameters and in how we interpret them. Here is an example, though I wouldn't call it a real regression, because it's an intentional change and a consequence of it.

32:47.000 --> 33:55.000
We have increased the default value of effective_cache_size. That doesn't mean the database is going to use more memory; it's just information about how much data is cached. It makes random I/O look cheaper, which means the database is more likely to pick index scans. And if you are on a system where random I/O is actually not as cheap as the database believes, that is more likely to lead to regressions. Similarly, effective_io_concurrency: in Postgres 14 we changed a bit how the configuration value is interpreted, so if you just copied your configuration file over, the database would suddenly prefetch much less data. If you look into the release notes, well, it's there, but people don't actually read those. It surprised me, and I'm an experienced Postgres operator.

33:55.000 --> 34:49.000
And there are additional improvements that are, of course, meant to help the database. If the database knows about foreign-key relationships during query planning, for example, that gives it additional information, but maybe the old, worse plan was, by accident, actually faster. That can happen. And there are probably more. I ran into a couple of those. I wouldn't say those are problems, like bugs in Postgres; it's more a limitation of the technology in general. You improve the configuration and, by accident, it makes something worse. Sometimes two wrongs are actually better.
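Both settings in question are planner and executor hints, not allocations, so they silently shift plan choices and prefetch behavior. A sketch with illustrative values and a hypothetical table:

    -- a planner hint about how much data the OS and Postgres likely cache;
    -- allocates nothing, but higher values make index scans look cheaper
    SET effective_cache_size = '48GB';

    -- prefetch depth used by bitmap heap scans; note that newer releases
    -- interpret the value differently, so a copied config prefetches less
    SET effective_io_concurrency = 32;

    EXPLAIN SELECT * FROM orders WHERE customer_id BETWEEN 1 AND 1000;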
34:53.000 --> 35:40.000
So, for OLAP, what's the summary and what about the future? Just like for OLTP, we have seen some significant improvements over the years: 8.4 got prefetching for bitmap scans, 9.2 got index-only scans, 9.6 got parallelism, and there are some more that I didn't actually mention, like the parallel index builds in Postgres 11. Since Postgres 11 it's mostly unchanged. There are some small improvements, of course; I'm not saying there aren't.

35:40.000 --> 36:39.000
But for the future, I would expect mostly incremental improvements: people optimizing individual operations to get a couple more percent. Those are mostly small gains. I also think we will see some improvements in tooling, things where Postgres doesn't change the code but we start, for example, optimizing the binaries. Not just the optimizations built into Postgres directly: there is PGO, which is profile-guided optimization, and BOLT, which is a binary layout optimizer, and those can actually help a lot with data-intensive applications.

36:40.000 --> 37:58.000
But for really fundamental improvements, I think we would need to do something like what the analytical databases are doing. Postgres is a general-purpose database, which is very capable, and I love it, but it's still a row store. Most of the workloads are probably transactional, OLTP. It can do a lot of analytics, but it can't really compete with databases that have been designed specifically for that use case, for that workload. So we would need to adopt something like column storage, and not just the storage, but also change the executor to actually leverage the efficiencies of the columnar format. Or maybe not: Postgres is very extensible and very flexible, and we have a long history of letting Postgres be combined with other databases or other engines. We have had foreign data wrappers for a long time, which let you interact quite efficiently with other databases.
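In its simplest form, a foreign data wrapper setup looks like this; the server, credentials and table names are of course made up:

    CREATE EXTENSION postgres_fdw;

    CREATE SERVER warehouse
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'dw.example.com', dbname 'analytics');

    CREATE USER MAPPING FOR CURRENT_USER
        SERVER warehouse
        OPTIONS (user 'reporting', password 'secret');

    -- queries against this table are shipped to the remote server,
    -- with filters (and, on newer versions, joins and aggregates) pushed down
    CREATE FOREIGN TABLE remote_sales (
        sale_id bigint,
        amount  numeric
    ) SERVER warehouse OPTIONS (table_name 'sales');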
37:58.000 --> 38:36.000
Or we could develop something completely new. For example, I know there are people working with the DuckDB engine, actually using it to run queries from Postgres, offloading some of the queries that can benefit from DuckDB, and stuff like that. So I don't know what is going to happen. People often have really, really crazy ideas. But I think, if we are to improve this, some fundamental change needs to happen.

38:36.000 --> 39:54.000
So, to sum all of this up: I'm quite happy with the improvements in Postgres over 20 years. It makes me feel good as a developer that we have achieved that. What do I expect in the future? Incremental changes, but also people surprising me with some fundamental, massive improvements. This talk was about performance, about throughput and some benchmarks, but I want to make it very clear that that's not the only thing that makes a database successful. There is a lot of other stuff that actually matters to users, maybe more than just the raw performance: how easily you can actually operate it. If you have a database which is super fast, but it's a shit-ton of work to keep it running, you will not run it in production. We are trying to improve that all the time too, but that's not what this talk was about. So, I hope you enjoyed it.

39:54.000 --> 40:37.000
Well, there are a couple of questions. This one was actually already asked. I don't know what that is; it would be interesting. There's an opportunity to optimize some special queries, like a star join. I think that might actually be a good project, a project through which to learn about Postgres. And there are things we could maybe use in Postgres, like BOLT, which I did actually try on the OLTP workload, and it actually improved throughput by 40%.
40:37.000 --> 41:24.000
So there's potential. I'm not sure I would actually do that in production, or in the builds. There is plenty of NUMA stuff, too; Andres actually had a really nice talk at PGConf.EU. It's recorded, it's on YouTube. If you want to understand how the database needs to deal with machines that have non-uniform memory access, which nowadays is everything, not just machines with multiple sockets but even single-socket machines, then I think that is a wonderful introduction to the topic.

41:24.000 --> 42:09.000
For OLAP, I was surprised that the optimization of the binaries didn't actually help that much. I don't know why. It might be due to how we do the evaluation of expressions, which might be incompatible with what that optimization is supposed to be doing. I think more radical architectural ideas need to be implemented. And there are some ways in which complex plans confuse the database optimizer a little bit, so I'm not sure.

42:09.000 --> 42:30.000
So: there are slides, links and so on, a couple of blog posts about this kind of stuff, and if you have any questions, I'm happy to answer them, either now or, I think, I will be hanging around the booth.

42:31.000 --> 42:57.000
Moderator: Thank you very much, Tomas. We've got a little bit of time for Q&A. The format: please shout out your question. If you shout it out in the room, we can all hear it, but unfortunately only the microphone reaches the people online, so Tomas is going to repeat the question and summarise it.

42:58.000 --> 43:11.000
Audience: So, TPC has an audited process. Are there any Postgres vendors, or other vendors, that still go through that?

Tomas: I have no idea. Moderator: You didn't repeat the question. Tomas: Yeah, okay.
43:11.000 --> 45:08.000
So, the question is whether any Postgres vendors, or vendors in general, still go through the TPC auditing. Both TPC-H and TPC-B are clearly specified and require that, if you want to use the results, you go through the TPC organization and have the results actually audited. You are not supposed to just run the benchmarks and publish the results on your own. I don't know if anyone is still doing that. It wasn't my intention to get anything like that, because it's a fairly expensive process, a long one, and I don't need audited results for my own use. I haven't seen, for a long time, any vendor showing in their marketing materials, like, "we have this number". I think that was probably more relevant in the times of on-premise databases, of systems developed and provided as a complete solution, because the vendors were giving you the database, but also the hardware and all the services around it. So it made sense to have a stamp of the performance. I don't think people are doing that now, because people are just taking a piece from here, a piece from there, running it on the cloud provider of their choice, and I don't think it's very relevant anymore.

45:08.000 --> 45:12.000
Moderator: Thank you. Any more questions?

45:13.000 --> 45:47.000
Audience: Hello. The question is about migration and upgrades, regarding AWS. Current versions, for example 13.14, have updates, global updates, in AWS, and we will move all our data. For example, Chef uses an external PostgreSQL, and will this migration pass smoothly? I mean, if Chef already says in its releases that it uses version 13.18, okay.
45:47.000 --> 46:07.000
And AWS says that all versions from 14 on will be updated to 18. So, should I run any tests before doing this migration in production?

46:07.000 --> 46:48.000
Tomas: I'm not sure I understood the question, the sound is really bad here, but let me try to answer what I think the question is. When you are upgrading Postgres, it should be binary compatible; you don't need to export and import anything. And there's a tool called pg_upgrade that you can use to transfer the catalogs. We have only changed the catalogs; the data files are still the same, and you shouldn't need to copy out, copy in, or anything like that. Was that an answer, about the versions?

46:48.000 --> 47:48.000
Audience: Yeah. Tomas: I think it should work. I mean, Postgres has a lot of testing exactly for the pg_upgrade cases. It can break, I have no doubt about that, but it shouldn't; we do our best to make sure it's reliable, and I haven't seen a pg_upgrade failure for a long time. Usually it's not because of Postgres but because of a problem in the operating system, like using link mode in cases where that's not possible, and stuff like that. So if you run into an issue, please report it and we will fix it. But that's about it.

47:48.000 --> 47:54.000
Moderator: Thank you. Any more questions? Maybe at the back, yeah. Put a hand up.
47:54.000 --> 48:29.000
Moderator: Shout. Really shout.

Audience: (mostly inaudible) In the pgbench charts, the one-client line is almost flat. It doesn't improve, right?

48:29.000 --> 49:01.000
Tomas: It's a very simple benchmark, so the per-client throughput didn't improve that much. Actually, it regressed a little bit, but that's not visible here. Most of the improvements are really about concurrency, about handling things like locking between sessions more efficiently. Those are most of the improvements. Yes? Thank you.

49:01.000 --> 49:18.000
Moderator: We have time for just one more question; we have to finish at 9:50 exactly, I believe. Any more questions? Okay, maybe I can give you one minute of your time back. Thank you very much, Tomas. That was really interesting.