WEBVTT

00:00.000 --> 00:16.000
So, this is my talk on LibreOffice threading and some atomic reference counting improvements we made, some work in that area, and some thoughts on all of that.

00:16.000 --> 00:23.000
So, just as background: in general, I know that computers are rubbish at floating point numbers, so I expect nothing from them there.

00:23.000 --> 00:33.000
But I always trusted integers. I know that computers are pretty good at adding one to them and they get bigger and, okay, there are limits, but I'm pretty, pretty confident that the integers are happy.

00:33.000 --> 00:48.000
So, we use them for reference counting things. We know that it's going to be a problem if we try to reference count things with a simple integer, but we have std::atomic, we can get a reliable number that's consistent across threads, and we're happy with that.

00:48.000 --> 01:02.000
Now, in LibreOffice we have a predecessor to std::atomic, but it's the same thing, and it's using the same intrinsics on GCC platforms as std::atomic uses, so let's just call it std::atomic and move on.

01:02.000 --> 01:13.000
The common things we use it for are our strings, our 16-bit strings and our 8-bit strings, and we use it for a whole bunch of other things like formula tokens in Calc and so on.

01:13.000 --> 01:23.000
It's great: it's smaller in memory, it's quicker when copying things, and computers are really good at adding and subtracting one from things, because that never goes wrong.

01:23.000 --> 01:29.000
But then it turns out that it can be rubbish at this too, under certain circumstances.

01:29.000 --> 01:41.000
So, two quick case studies. The first one is a case that slows down badly, especially as we come up to 20 threads, where we've got parallelism in this interpret-formula-group code.

01:41.000 --> 01:47.000
The main work there is done by just classically finding the places where mutexes are taken.
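The std::atomic reference counting described here can be sketched roughly like this — a minimal illustration with made-up names, not LibreOffice's actual classes:

```cpp
#include <atomic>

// Minimal sketch of intrusive atomic reference counting (illustrative names,
// not LibreOffice's real classes). std::atomic guarantees the count stays
// consistent when many threads acquire/release concurrently.
class Referenced
{
    std::atomic<int> m_nRefCount{0};

public:
    void acquire()
    {
        // The increment needs no ordering beyond atomicity.
        m_nRefCount.fetch_add(1, std::memory_order_relaxed);
    }

    // Returns true when this release dropped the count to zero,
    // i.e. the caller is now responsible for destroying the object.
    bool release()
    {
        // acq_rel so the last releaser observes all prior writes.
        return m_nRefCount.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }

    int count() const { return m_nRefCount.load(std::memory_order_relaxed); }
};
```

Even the relaxed increment is a locked read-modify-write instruction on x86, which is the per-operation cost the rest of the talk is about.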
01:47.000 --> 01:51.000
We need to get rid of all of those and do the usual good stuff like adding caching.

01:51.000 --> 01:58.000
So, we make a huge improvement in this 20-thread case by doing that work, but we're still not quite at the target.

01:58.000 --> 02:09.000
So, when we look at Hotspot or something like that, we can see that our reference counting is starting to become an issue once you get up to these large numbers of threads.

02:10.000 --> 02:17.000
In this case, again, it's the same atomic increment and decrement, with the increment in particular taking up significant amounts of time.

02:17.000 --> 02:28.000
So, in this particular scenario, we know that all this reference counting of tokens, going up by one and down by one, is a whole pile of work which is absolutely unnecessary.

02:28.000 --> 02:37.000
Because we know what's supposed to happen: we start off calculating with this bunch of tokens, and by the time we come to the end, we're supposed to have the same reference counts anyway.

02:37.000 --> 02:42.000
We're only incrementing for the sake of decrementing, and vice versa.

02:42.000 --> 02:50.000
So, in this particular area, we know that if we could actually turn off reference counting entirely, we should get pretty much the same result that we did beforehand.

02:50.000 --> 02:58.000
So, in this particular scenario, what we do is turn off reference counting entirely during this parallel calculation zone.

02:58.000 --> 03:07.000
We also know there's a cache involved, but we know that cache is per thread, so we don't need thread-safe reference counting for that section.

03:07.000 --> 03:17.000
So, we have created these three policies: your normal thread-safe reference counting, a thread-unsafe one, and then just, you know, don't do anything at all.

03:17.000 --> 03:26.000
And with those in place, we can get that, say, three seconds back down to two and a half seconds by adding in these policies.
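The three policies described — thread-safe, thread-unsafe for per-thread data, and no counting at all for the calculation zone — could be sketched as a policy template along these lines (the names and the template design are my assumption, not the actual LibreOffice code):

```cpp
#include <atomic>

// Three interchangeable reference-count policies (illustrative sketch).

// Full thread-safe counting: atomic read-modify-write on every change.
struct ThreadSafePolicy
{
    std::atomic<int> n{0};
    void inc() { n.fetch_add(1, std::memory_order_relaxed); }
    bool dec() { return n.fetch_sub(1, std::memory_order_acq_rel) == 1; }
};

// Plain integer: correct only while the object is touched by one thread,
// e.g. entries in a per-thread cache.
struct ThreadUnsafePolicy
{
    int n = 0;
    void inc() { ++n; }
    bool dec() { return --n == 0; }
};

// No counting at all: valid when lifetime is managed elsewhere and the
// count is known to return to its starting value anyway.
struct NoRefCountPolicy
{
    void inc() {}
    bool dec() { return false; } // never signals "destroy me"
};

template <typename Policy>
class RefCounted
{
    Policy m_aPolicy;

public:
    void acquire() { m_aPolicy.inc(); }
    // Returns true when the caller should destroy the object.
    bool release() { return m_aPolicy.dec(); }
};
```

The win comes from picking the cheapest policy that is still correct for each region of the code, instead of paying for atomic operations everywhere.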
03:26.000 --> 03:35.000
So, a significant amount of time was being spent adding and subtracting one, which really just seems rather ridiculous.

03:35.000 --> 03:42.000
Similar scenario again: we've got a problem with creating formulas, and we do our standard optimization work.

03:42.000 --> 03:51.000
We bring that down from, whatever, 20 seconds to 10 seconds, but there's still a lot of remaining time, and again we go back and look at where that is going.

03:52.000 --> 04:03.000
And once again it's the same rtl::OUString acquire, back to the same atomic increment again, and we have a lot of these increments going on.

04:03.000 --> 04:20.000
And we can see that a lot of time is going missing in that string code, but we can't really easily see why that string is a problem: is it just because we are using a huge number of strings, or is it because we've got a smaller number of strings

04:20.000 --> 04:28.000
whose reference counts are being contended on by multiple threads? Being able to tell those apart is why I wanted to do this in the first place.

04:28.000 --> 04:35.000
So, we're assuming here that we've got a problem with heavy contention on the same reference count from multiple threads.

04:35.000 --> 04:49.000
In this particular case, we can see that we are returning strings quite a lot, and each return takes a copy of the string, which bumps its reference count; we have, like, whatever, 20 million of them.

04:49.000 --> 05:04.000
But if we still do those 20 million visits to the string and just don't make any copy at all, then we can take, you know, our eight seconds down to five and a half seconds, purely by removing these reference count increments and decrements.

05:04.000 --> 05:09.000
So there's a cost here, and it's a really recurring cost that I keep seeing again and again.
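The fix described — visiting the string without taking a copy, so no reference count is touched — has roughly this shape. Here std::string stands in for rtl::OUString and the class is hypothetical; std::string copies its buffer rather than bumping a reference count, but the call-site change is the same:

```cpp
#include <string>
#include <utility>

// Hypothetical cell holding a shared string value (std::string standing in
// for the reference-counted rtl::OUString).
class Cell
{
    std::string m_aValue;

public:
    explicit Cell(std::string aValue) : m_aValue(std::move(aValue)) {}

    // Before: every call hands back a copy — with a refcounted string type
    // this is an atomic increment (and later decrement) per call.
    std::string GetValueCopy() const { return m_aValue; }

    // After: a const reference — readers touch no shared counter at all.
    // Valid only while the Cell outlives the reference.
    const std::string& GetValue() const { return m_aValue; }
};
```

With 20 million read-only visits, the second form eliminates 20 million increment/decrement pairs on a potentially contended counter, at the cost of a lifetime obligation on the caller.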
05:09.000 --> 05:16.000
There are more sophisticated reference counting schemes out there: people talk about keeping reference counts per thread,

05:16.000 --> 05:21.000
or splitting a reference count up into various sophisticated two-level schemes.

05:21.000 --> 05:26.000
So there are techniques out there; we don't make any use of those techniques.

05:26.000 --> 05:31.000
The fact that they exist is possibly worth looking into, but that's not what I'm doing.

05:31.000 --> 05:38.000
What we're doing here is just trying to identify where the problems are, and then applying simple techniques to work around them for now.

05:38.000 --> 05:52.000
But what I really want is to be able to use perf to find these contended accesses directly, rather than inferring from the flame charts that these are probably the contended ones.

05:52.000 --> 06:01.000
And it's difficult to find this information. The information is actually there if you read it all, or if you search for it in the wrong places and eventually find your way to the right places.

06:01.000 --> 06:10.000
The right place to look is perf list -v, verbosely, and that will then tell you that there are these contended-access events.

06:10.000 --> 06:17.000
And if you then do a bit of further searching on those, it gives you these rather obscure, verbose event names.

06:17.000 --> 06:30.000
And if you then follow those hints and profile: I go back to my cases, I revert the work I did, I profile with those events, and I get the positions where I have particularly problematic thread contention.

06:30.000 --> 06:35.000
I can see that it's those particular strings that are the ones that are causing problems.

06:35.000 --> 06:48.000
And then when I profile the case with the fix back in, I can see that our rtl::OUString thing has disappeared entirely from the profile, and I'm fairly confident that's where the issue was.
06:48.000 --> 06:56.000
And that's as good as I can do on the strings. There are maybe a few more remaining places where there's more contention, and now I have a new technique to find them.

06:56.000 --> 06:58.000
And that's it. Thanks.

07:00.000 --> 07:03.000
Thank you.