WEBVTT 00:00.000 --> 00:07.000 OK, so it's green; we're taking it on faith that it's working. 00:07.000 --> 00:10.000 All right, I think we're starting. 00:10.000 --> 00:13.000 OK, and I have a working mouse, excellent. 00:13.000 --> 00:15.000 And I will stay behind the desk. 00:15.000 --> 00:20.000 OK, so I'm a compiler dev over at AMD. 00:20.000 --> 00:23.000 And I want to float a slightly weird idea 00:23.000 --> 00:25.000 to a bunch of friendly compiler engineers, 00:25.000 --> 00:27.000 and FOSDEM tends to be quite friendly, 00:27.000 --> 00:29.000 and some of you are probably compiler engineers. 00:29.000 --> 00:32.000 So I hope that's going to work out. 00:32.000 --> 00:35.000 Just a brief sanity check for me: 00:35.000 --> 00:36.000 stick your hand in the air 00:36.000 --> 00:40.000 if you've committed something to LLVM or GCC. 00:40.000 --> 00:43.000 OK, that's really good news. 00:43.000 --> 00:46.000 This is a short block of time. 00:46.000 --> 00:48.000 So I can't start with a nice discussion 00:48.000 --> 00:50.000 of what a compiler front end is. 00:50.000 --> 00:52.000 So we're going to go straight for: 00:52.000 --> 00:55.000 I think the compiler back end 00:56.000 --> 01:00.000 is more specialized to the task than it needs to be. 01:00.000 --> 01:03.000 And we would have a happier time as compiler engineers, 01:03.000 --> 01:05.000 and as users of the compiler, 01:05.000 --> 01:09.000 if we gradually threw away the back end, 01:09.000 --> 01:13.000 by stretching the middle end further and further back, 01:13.000 --> 01:17.000 until it actually hits instruction emission. 01:17.000 --> 01:20.000 And that is not a widely popular view. 01:21.000 --> 01:25.000 But it's not completely unfounded. 
01:25.000 --> 01:29.000 So in the spirit of establishing some form of credibility, 01:29.000 --> 01:32.000 I'm going to try to talk about a couple of very strongly 01:32.000 --> 01:35.000 back-end-specific things, which are meant to be done in the back end, 01:35.000 --> 01:38.000 which I didn't do there, and it was fine. 01:38.000 --> 01:40.000 There. 01:40.000 --> 01:44.000 It's a little bit GPU-specific, but not horrendously so. 01:44.000 --> 01:48.000 I think complete ignorance of GPU architectures will be fine. 01:48.000 --> 01:52.000 So that notwithstanding, the first example is memory allocation, 01:52.000 --> 01:54.000 specific to GPUs. 01:54.000 --> 01:56.000 Oh, the contact details. 01:56.000 --> 01:57.000 Excellent. 01:57.000 --> 02:00.000 Yes, please talk to me while I'm speaking, 02:00.000 --> 02:03.000 or after I'm speaking, or using these computer things, 02:03.000 --> 02:07.000 if you must. Questions are welcome anywhere in this. 02:07.000 --> 02:09.000 If none of you ask any questions, 02:09.000 --> 02:11.000 there's going to be a lot of time at the end, 02:11.000 --> 02:14.000 because I probably don't have 20 minutes of content for you all. 02:15.000 --> 02:18.000 So there's a thing called LDS. 02:18.000 --> 02:21.000 This is a small block of very fast memory, 02:21.000 --> 02:26.000 which it's important to use if you want your GPU code to run quickly. 02:26.000 --> 02:31.000 A GPU is structured as a kind of... 02:31.000 --> 02:34.000 more or less a number of completely independent programs, 02:34.000 --> 02:36.000 which can talk to each other a bit, 02:36.000 --> 02:41.000 backed by slightly dubious terminology. 02:41.000 --> 02:46.000 The best way to describe this is as a register allocation problem. 02:46.000 --> 02:50.000 The premise is, one part of your code 02:50.000 --> 02:54.000 has a number on it for how many bytes of this magic memory you want, 02:54.000 --> 02:56.000 and you have to ask for a small number or 02:56.000 --> 02:57.000 it doesn't start. 
02:57.000 --> 03:00.000 But you can say, I want 16 kilobytes, 03:00.000 --> 03:03.000 and you have that, and that's fine. 03:03.000 --> 03:05.000 And then other parts of your code 03:05.000 --> 03:08.000 want to use variables, which are somewhere in this single 03:08.000 --> 03:12.000 block, and they need to find them. 03:12.000 --> 03:15.000 So the original game here 03:15.000 --> 03:19.000 is: if you want to reference LDS from a different function 03:19.000 --> 03:22.000 to the one which allocates it, you can't. 03:22.000 --> 03:25.000 Just, like, tough; doesn't compile. 03:25.000 --> 03:27.000 So you can say 03:27.000 --> 03:29.000 foo has 16 kilobytes, 03:29.000 --> 03:31.000 and if foo calls bar, 03:31.000 --> 03:34.000 and bar wants to reference a variable in LDS, 03:34.000 --> 03:35.000 error, can't. 03:36.000 --> 03:39.000 That was unpopular, because people write functions, 03:39.000 --> 03:41.000 and if the functions 03:41.000 --> 03:44.000 cannot be fully inlined into the caller, 03:44.000 --> 03:48.000 you get a compile error, and it's sad. 03:48.000 --> 03:52.000 So what I want to do here is take 03:52.000 --> 03:55.000 the block of memory allocated in one function, 03:55.000 --> 04:01.000 and scribble out enough metadata that it can be found from other functions. 04:02.000 --> 04:05.000 And the task here amounts to: 04:05.000 --> 04:08.000 x is a double, y is a float. 04:08.000 --> 04:11.000 They both need an address in your single block of memory, 04:11.000 --> 04:15.000 about 10 and 16 or whatever you want, 04:15.000 --> 04:18.000 such that they can be located elsewhere. 04:18.000 --> 04:20.000 And, 04:20.000 --> 04:22.000 for the compiler engineers in the building, 04:22.000 --> 04:25.000 it's really obviously register allocation. 04:25.000 --> 04:29.000 We're used to thinking of registers as: you have 16 distinct values, 04:29.000 --> 04:31.000 and they all have their own special name. 
04:31.000 --> 04:36.000 But it's clearly a single block of memory of length 16 times the width of a register. 04:36.000 --> 04:41.000 And the goal is assigning offsets in this block of memory to the magic names, 04:41.000 --> 04:45.000 so that you can find the stuff in these variables from somewhere else. 04:51.000 --> 04:55.000 And the reason you do register allocation in the back end 04:55.000 --> 04:58.000 is that you want to do it relatively late. 04:59.000 --> 05:02.000 And the reason we did LDS allocation in the back end 05:02.000 --> 05:05.000 is because that's where we did register allocation. 05:09.000 --> 05:12.000 So yeah, there's a small common theme here. 05:12.000 --> 05:14.000 There's a thing called GlobalISel. 05:14.000 --> 05:18.000 LLVM decided that the SelectionDAG-based back end 05:18.000 --> 05:22.000 was ugly and could be improved, 05:22.000 --> 05:26.000 and adopted a more SSA-form alternative IR, 05:26.000 --> 05:29.000 I want to say 10 years ago. 05:29.000 --> 05:32.000 I remember it being announced and being very excited. 05:32.000 --> 05:35.000 And I don't believe we've moved over to it yet. 05:35.000 --> 05:39.000 But if you write code for the AMDGPU back end, 05:39.000 --> 05:42.000 you end up implementing for SDag and for GlobalISel, 05:42.000 --> 05:44.000 and the code path is different and, 05:44.000 --> 05:47.000 I don't want to write all the test cases twice. 05:47.000 --> 05:49.000 So instead, 05:49.000 --> 05:51.000 there's an IR pass, 05:51.000 --> 05:53.000 which does all the regalloc-like 05:53.000 --> 05:56.000 sort of things you'd expect to see. 05:56.000 --> 05:58.000 It walks the call graph going: 05:58.000 --> 06:00.000 who can reach this variable, 06:00.000 --> 06:01.000 do these variables alias, 06:01.000 --> 06:04.000 where shall we allocate these variables? 06:04.000 --> 06:06.000 And the IR pass 06:06.000 --> 06:10.000 emits a table, as IR. 
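A minimal sketch of the layout problem described here, written in Python rather than LLVM IR: each variable gets an offset inside one block, respecting alignment, and the result is flattened into a table of integers. The names `assign_offsets` and `layout_table`, the sort-by-alignment heuristic and the variable sizes are assumptions made for illustration; this is not a description of what the AMDGPU pass actually does.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment (a power of two)."""
    assert alignment > 0 and alignment & (alignment - 1) == 0
    return (offset + alignment - 1) & ~(alignment - 1)


def assign_offsets(variables):
    """Assign each (name, size, alignment) a byte offset in one block.

    Placing the most strictly aligned variables first is a simple
    heuristic to reduce padding; it is an assumption of this sketch.
    Returns ({name: offset}, total_size_in_bytes).
    """
    offsets = {}
    cursor = 0
    for name, size, alignment in sorted(variables, key=lambda v: -v[2]):
        cursor = align_up(cursor, alignment)
        offsets[name] = cursor
        cursor += size
    return offsets, cursor


def layout_table(offsets, order):
    """Flatten the layout into a constant array of integers, one per name."""
    return [offsets[name] for name in order]


if __name__ == "__main__":
    # x is a double (8 bytes), y is a float (4 bytes), as in the talk.
    offsets, total = assign_offsets([("y", 4, 4), ("x", 8, 8)])
    print(offsets, total, layout_table(offsets, ["x", "y"]))
```

Any function that knows a variable's index into the table can then find its address as block base plus offset, which is the "find it from somewhere else" part of the problem.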
06:10.000 --> 06:15.000 It's a constant array of integers. 06:15.000 --> 06:18.000 And that's tested in IR. 06:18.000 --> 06:20.000 It has lots of optimizations, 06:20.000 --> 06:22.000 which try to do a better job of laying out memory. 06:22.000 --> 06:25.000 All done in IR. 06:25.000 --> 06:28.000 And because it's written as a single pass, 06:28.000 --> 06:32.000 the SDag and GlobalISel parts are tiny. 06:32.000 --> 06:35.000 In fact, it's mostly implemented in the assembler, 06:35.000 --> 06:40.000 which had to be taught what a constant number was for LDS. 06:40.000 --> 06:44.000 And it was fine. 06:44.000 --> 06:47.000 I got some pushback from colleagues, 06:47.000 --> 06:50.000 because I was writing it in the wrong place. 06:51.000 --> 06:52.000 But it worked. 06:52.000 --> 06:54.000 It's since been modified by people. 06:54.000 --> 06:57.000 People have now mostly stopped tagging me in reviews for it. 06:57.000 --> 06:59.000 So that means the code is coherent enough 06:59.000 --> 07:03.000 for other people to change it without dragging me into the loop, which is great. 07:03.000 --> 07:06.000 And even though it's been done in the wrong place, 07:06.000 --> 07:08.000 it's fine. 07:08.000 --> 07:12.000 It's kind of a simple instance of register allocation. 07:12.000 --> 07:17.000 But if we can do some forms of register allocation in IR, 07:18.000 --> 07:22.000 we can do other forms of register allocation in IR. 07:22.000 --> 07:26.000 With, like, a brief side note that some other compilers do 07:26.000 --> 07:29.000 in fact do register allocation on SSA form. 07:29.000 --> 07:31.000 And it's okay. 07:31.000 --> 07:34.000 It works out all right. 07:34.000 --> 07:35.000 Here's another example. 07:35.000 --> 07:39.000 The calling convention is very back end. 07:39.000 --> 07:41.000 It's very architecture-specific. 07:41.000 --> 07:44.000 There's lots of scribbling stuff in specific registers, 07:44.000 --> 07:46.000 and carefully managing the stack. 
07:46.000 --> 07:50.000 I'm lucky that the previous talk had pictures of the stack 07:50.000 --> 07:52.000 manipulation code around function calls, 07:52.000 --> 07:55.000 because I don't have any code. 07:55.000 --> 07:59.000 And variadic functions are crufty, awkward things, 07:59.000 --> 08:03.000 where you have to scribble state into particular parts of the stack 08:03.000 --> 08:06.000 and go forth and find it later. 08:06.000 --> 08:08.000 And people want, 08:08.000 --> 08:11.000 strictly speaking, people want printf on the GPU. 08:11.000 --> 08:15.000 The only variadic function anyone ever cares about is printf. 08:16.000 --> 08:18.000 Yeah, it's sad, but 08:18.000 --> 08:21.000 it seems to be the case. 08:21.000 --> 08:24.000 And in fact, printf is currently implemented in, 08:24.000 --> 08:26.000 I think, four different ways on AMD, 08:26.000 --> 08:28.000 for different languages. 08:28.000 --> 08:30.000 Like, HIP and OpenMP and OpenCL 08:30.000 --> 08:31.000 all have their own one. 08:31.000 --> 08:33.000 Oh, and libc has its own one as well. 08:33.000 --> 08:36.000 And all but that one are dreadful. 08:40.000 --> 08:43.000 So I was looking at 08:43.000 --> 08:48.000 implementing variadic functions, for libc mostly. 08:48.000 --> 08:51.000 And a similar theme turned up. 08:51.000 --> 08:53.000 I don't want to do this twice. 08:53.000 --> 08:56.000 Ah, I have fixed that typo under review, 08:56.000 --> 08:58.000 and then failed to update the slides. 08:58.000 --> 08:59.000 Never mind. 08:59.000 --> 09:03.000 And I don't want to write the test code for: 09:03.000 --> 09:05.000 are we lowering variadic calls correctly? 09:05.000 --> 09:08.000 Because I've done that for a simple architecture. 09:08.000 --> 09:10.000 It was dreadful. 09:10.000 --> 09:12.000 It was a huge combinatorial problem. 09:12.000 --> 09:15.000 And you have to do it for SDag and for GlobalISel. 09:15.000 --> 09:17.000 And it's just variadic functions. 
09:17.000 --> 09:19.000 No one cares much about them. 09:19.000 --> 09:22.000 So really, who's got the patience? 09:22.000 --> 09:25.000 Also, that's kind of a common theme. 09:25.000 --> 09:27.000 People don't care about variadic functions. 09:27.000 --> 09:30.000 The standard thing which you get when you're bringing up 09:30.000 --> 09:32.000 a new architecture is a to-do, 09:32.000 --> 09:34.000 or like, fatal error: unimplemented. 09:34.000 --> 09:38.000 Or maybe: if name is printf, do this, otherwise fatal error. 09:39.000 --> 09:43.000 And there's nothing that magic about variadic functions, 09:43.000 --> 09:44.000 inherently. 09:44.000 --> 09:48.000 You can kill these things off in IR. 09:48.000 --> 09:51.000 What a variadic function amounts to is: 09:51.000 --> 09:53.000 take all of the arguments, 09:53.000 --> 09:56.000 and stick them in contiguous memory, 09:56.000 --> 09:59.000 and pass a pointer to that contiguous memory. 09:59.000 --> 10:00.000 That's all we're doing. 10:00.000 --> 10:01.000 That's all AArch64 does. 10:01.000 --> 10:02.000 It's all x86 does. 10:02.000 --> 10:05.000 They have different weird ideas about the pointer. 10:05.000 --> 10:07.000 Shout out to WebAssembly for actually just 10:07.000 --> 10:09.000 doing the really obvious simple thing. 10:09.000 --> 10:11.000 WebAssembly just sticks them all in a struct, 10:11.000 --> 10:13.000 and passes a void star to the struct. 10:13.000 --> 10:15.000 That's it. 10:15.000 --> 10:18.000 And that works for everything. 10:18.000 --> 10:19.000 Right? 10:19.000 --> 10:22.000 So AMDGPU and NVPTX, 10:22.000 --> 10:24.000 they can totally take all the arguments 10:24.000 --> 10:25.000 and put them in a struct, 10:25.000 --> 10:28.000 and pass a pointer to the struct. 10:28.000 --> 10:31.000 And then you don't have to do a bunch of stuff in the back end. 10:31.000 --> 10:33.000 But doing it in the back end would have been even worse, 10:33.000 --> 10:34.000 because this was partly for libc. 
10:34.000 --> 10:35.000 I'd have had to do this for 10:35.000 --> 10:39.000 SDag and GlobalISel, for AMDGPU and for NVPTX, 10:39.000 --> 10:42.000 and I really don't want to write test cases for NVPTX. 10:42.000 --> 10:45.000 So, just not having it. 10:45.000 --> 10:50.000 So I did this as an IR pass. 10:50.000 --> 10:54.000 This actually works out really pretty, 10:54.000 --> 10:57.000 because all the horrible target-specific cruft, 10:57.000 --> 10:58.000 which was very scary, 10:58.000 --> 11:01.000 and made people want to do this in the back end... 11:01.000 --> 11:06.000 For those of you who have actually had to read them, the AArch64 ABI docs 11:06.000 --> 11:09.000 are crazy in how they describe 11:09.000 --> 11:11.000 how you have to lower this stuff. 11:11.000 --> 11:14.000 But all the architecture-dependent stuff 11:14.000 --> 11:17.000 is kind of hidden behind the va_list structure, 11:17.000 --> 11:20.000 which is basically a pointer 11:20.000 --> 11:22.000 that you can walk forwards. 11:22.000 --> 11:24.000 So you can take a variadic function 11:24.000 --> 11:26.000 as represented in IR, 11:26.000 --> 11:29.000 and you can build a struct, 11:29.000 --> 11:30.000 in an alloca, 11:31.000 --> 11:34.000 and you can copy the args into it. 11:34.000 --> 11:38.000 And then you can kill the dot dot dot at the end of the function, 11:38.000 --> 11:41.000 and pass a va_list instead. 11:41.000 --> 11:44.000 And now your variadic function's gone. 11:44.000 --> 11:46.000 LLVM no longer really knows 11:46.000 --> 11:48.000 that it used to be a variadic function; 11:48.000 --> 11:51.000 it thinks you're just passing a pointer to an alloca. 11:51.000 --> 11:54.000 And that means stuff like inlining now works again. 11:54.000 --> 11:57.000 And for AMDGPU and NVPTX, 11:57.000 --> 11:59.000 that's just how it works. 11:59.000 --> 12:01.000 Variadic functions are free. 
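To make the "struct plus pointer" idea concrete, here is a small Python model, not LLVM code: the caller packs the variadic arguments into one contiguous buffer, and the callee walks a cursor forwards through it, which is the role `va_list` plays. `pack_varargs`, `VaCursor`, `sum_ints` and the type codes are hypothetical names for this sketch, and Python's `struct` module stands in for a target's data layout.

```python
import struct

# Hypothetical type codes for this sketch, mapped to `struct` formats.
FORMATS = {"i32": "<i", "i64": "<q", "f64": "<d"}


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def pack_varargs(args):
    """Caller side: copy (type, value) pairs into one contiguous buffer.

    Each value is placed at an offset aligned to its own size, the way
    fields of a C struct would usually be laid out.
    """
    buf = bytearray()
    for type_name, value in args:
        fmt = FORMATS[type_name]
        size = struct.calcsize(fmt)
        buf.extend(b"\x00" * (align_up(len(buf), size) - len(buf)))
        buf.extend(struct.pack(fmt, value))
    return bytes(buf)


class VaCursor:
    """Callee side: a pointer into the buffer that only walks forwards."""

    def __init__(self, buf):
        self.buf = buf
        self.offset = 0

    def next(self, type_name):
        fmt = FORMATS[type_name]
        size = struct.calcsize(fmt)
        self.offset = align_up(self.offset, size)
        (value,) = struct.unpack_from(fmt, self.buf, self.offset)
        self.offset += size
        return value


def sum_ints(count, va):
    """A formerly variadic function: now it just takes a cursor."""
    return sum(va.next("i32") for _ in range(count))
```

A call such as `sum_ints(3, VaCursor(pack_varargs([("i32", 1), ("i32", 2), ("i32", 3)])))` passes an ordinary argument, so nothing about the call is special any more, which is the property that lets inlining treat it like any other call.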
12:01.000 --> 12:03.000 Instead of this crufty, aggravating thing, 12:03.000 --> 12:05.000 they're just weird syntactic sugar 12:05.000 --> 12:07.000 for passing a struct to a function. 12:07.000 --> 12:09.000 For WebAssembly, 12:09.000 --> 12:10.000 actually, at this conference 12:10.000 --> 12:11.000 I need to find someone from WebAssembly 12:11.000 --> 12:13.000 and get them to review a change, 12:13.000 --> 12:15.000 because I implemented a thing for WebAssembly 12:15.000 --> 12:17.000 and couldn't get them to sign off on it. 12:17.000 --> 12:20.000 So variadics should be free on WebAssembly too. 12:20.000 --> 12:23.000 And will be, once I find one of them. 12:23.000 --> 12:26.000 Strictly speaking, I haven't bothered to do this. 12:27.000 --> 12:30.000 I've implemented this for x86-64 and AArch64. 12:30.000 --> 12:31.000 And when I went to check 12:31.000 --> 12:33.000 main, it turns out I haven't pushed it. 12:33.000 --> 12:35.000 So they will be free on those as well, 12:35.000 --> 12:36.000 but currently they're not. 12:39.000 --> 12:41.000 Yeah. 12:41.000 --> 12:43.000 So I like testing code in IR, 12:43.000 --> 12:46.000 because you write the code you've got 12:46.000 --> 12:48.000 and you write the code you expect. 12:48.000 --> 12:50.000 And then you argue with FileCheck's 12:50.000 --> 12:52.000 regex engine for a while. 12:52.000 --> 12:54.000 And then you're done. 12:54.000 --> 12:57.000 And relative to testing in MIR, 12:57.000 --> 12:58.000 it's great. 12:58.000 --> 13:00.000 Relative to testing in Clang, 13:00.000 --> 13:01.000 it's great. 13:01.000 --> 13:04.000 You have an IR pass. 13:04.000 --> 13:05.000 You can print bits of IR. 13:05.000 --> 13:07.000 All the objects you've got, 13:07.000 --> 13:08.000 you can dump. 13:08.000 --> 13:10.000 That is so easy. 13:10.000 --> 13:13.000 And the back end is not always like that. 13:13.000 --> 13:18.000 So this is sort of a question I want to pose to you guys. 
13:18.000 --> 13:23.000 The compiler back end is very specialized to machine code. 13:23.000 --> 13:27.000 Specialized to a specific target's machine code. 13:27.000 --> 13:29.000 You drop out of SSA form 13:29.000 --> 13:33.000 as soon as you do register allocation. 13:33.000 --> 13:36.000 Is that actually necessary? 13:36.000 --> 13:41.000 We got here in a sensible, path-dependent kind of fashion. 13:41.000 --> 13:45.000 But libFirm does regalloc on SSA form. 13:45.000 --> 13:47.000 And it's fine. 13:47.000 --> 13:49.000 You have an infinite set of SSA variables, 13:50.000 --> 13:54.000 and you mark some of them as: this one needs to be in rax, 13:54.000 --> 13:56.000 so does this one. 13:56.000 --> 13:57.000 And you're fine. 13:57.000 --> 14:01.000 It's the same as regalloc always is. 14:01.000 --> 14:03.000 ISel. 14:03.000 --> 14:07.000 My favorite is not a GPU. 14:07.000 --> 14:11.000 My favorite is the crackpot ASIC, which is no longer with us. 14:11.000 --> 14:16.000 But that featured a kind of MIPS-style instruction set, 14:16.000 --> 14:18.000 which is very friendly to work with. 14:18.000 --> 14:23.000 And we just did, I just did an intrinsic for every instruction. 14:23.000 --> 14:26.000 Every instruction you could write in assembly had an intrinsic, 14:26.000 --> 14:29.000 with a name very like the assembly instruction. 14:29.000 --> 14:33.000 So if you didn't want to do instruction selection in the compiler, 14:33.000 --> 14:35.000 and you didn't want to write assembly, 14:35.000 --> 14:39.000 you could just write the sequence of intrinsics 14:39.000 --> 14:40.000 you wanted. 14:40.000 --> 14:42.000 And you got out exactly that sequence of assembly, 14:42.000 --> 14:46.000 because each intrinsic turned into one assembly instruction. 14:47.000 --> 14:49.000 And scheduling's even easier. 14:49.000 --> 14:53.000 We can reorder instructions in SSA form; almost not a problem. 14:53.000 --> 14:55.000 And the SDag combine thing. 
14:55.000 --> 14:57.000 I really like SDag combine. 14:57.000 --> 14:58.000 That's very pretty. 14:58.000 --> 15:02.000 You write your little transform: see this pattern, turn it into a simpler thing. 15:02.000 --> 15:03.000 It's lovely. 15:03.000 --> 15:08.000 But InstCombine is the same thing, with a better DSL. 15:08.000 --> 15:14.000 And an MIR rewrite pass looks an awful lot like an IR rewrite pass. 15:14.000 --> 15:19.000 It's the same premise, right? Just with more awkward notation. 15:19.000 --> 15:21.000 So yeah. 15:21.000 --> 15:25.000 I kind of think we shouldn't do this. 15:25.000 --> 15:29.000 I'm running out of slides and no one's asking any questions. 15:29.000 --> 15:31.000 Which is dreadful. 15:31.000 --> 15:35.000 So I'm going to try to prompt you to ask something here, 15:35.000 --> 15:38.000 because I've just told you wildly contentious things, right? 15:38.000 --> 15:42.000 We've got a compiler back end, it has its own special structures. 15:42.000 --> 15:46.000 I'm saying we didn't have to do that. 15:46.000 --> 15:49.000 No one wants to call me on that? 15:49.000 --> 15:51.000 Wonderful, we have a taker. 15:51.000 --> 15:52.000 You're the one. 15:52.000 --> 15:53.000 Go, you're first. 15:53.000 --> 15:56.000 [Audience member, partly inaudible] Well, I think you're working your business here. 15:56.000 --> 15:57.000 I have to like this. 15:57.000 --> 16:02.000 I have to refer to this idea of being an estimator. 16:02.000 --> 16:05.000 So you work on a formula that I've always scaled up. 16:05.000 --> 16:08.000 I'm trying to realize that I'm a bit in the end. 16:09.000 --> 16:12.000 But on the end, do you mean you're, what the heck it does? 16:12.000 --> 16:16.000 And in terms of the director, so we're dealing with a lot of things. 16:16.000 --> 16:20.000 So that is tomorrow's talk. 16:20.000 --> 16:24.000 The question here is, paraphrasing slightly: 16:24.000 --> 16:27.000 isn't the AMDGPU, isn't the AMD back end, 16:27.000 --> 16:29.000 crazy complicated? 
16:29.000 --> 16:31.000 Which is: yes. 16:31.000 --> 16:36.000 It would be an awful lot simpler if we kept it in IR. 16:37.000 --> 16:42.000 But with the AMDGPU back end I am slowly, solely under my own prodding, 16:42.000 --> 16:45.000 trying to move stuff out of the back end into IR, 16:45.000 --> 16:47.000 because the back end confuses the hell out of me. 16:47.000 --> 16:51.000 And every time I touch it, it breaks in really weird ways. 16:51.000 --> 16:56.000 So I don't think we should throw the back end away in a rewrite fashion. 16:56.000 --> 17:01.000 Though I need to find some MLIR people to see if they're a bit more game for that. 17:01.000 --> 17:05.000 I think we should just bear in mind, when you're writing something in the back end: 17:06.000 --> 17:08.000 wouldn't it be easier in IR? 17:08.000 --> 17:11.000 This madly slow optimization you're writing in MIR: 17:11.000 --> 17:15.000 wouldn't it be much nicer to write it in IR instead? 17:15.000 --> 17:18.000 How much structure do you need to add to the IR 17:18.000 --> 17:21.000 so you can write the optimization in IR instead, 17:21.000 --> 17:24.000 and still get the code out? 17:24.000 --> 17:31.000 And if we can slowly drift things like, I don't know, say, 17:32.000 --> 17:35.000 all the type-munging nonsense around lowering: 17:35.000 --> 17:40.000 you've got an i66, because LLVM thought that was a good idea, 17:40.000 --> 17:42.000 and your target doesn't know what that is. 17:42.000 --> 17:45.000 So you do the type legalising and all the rewriting. 17:45.000 --> 17:47.000 Totally doable in IR. 
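As an illustration of why legalizing an odd integer width is mechanical enough to happen before the back end, this Python sketch adds two 66-bit values using only 32-bit pieces and an explicit carry. The function names and the 32-bit limb width are assumptions made for the example; it is not how LLVM's legalizer is written.

```python
def split_limbs(value, bits, limb_bits=32):
    """Split an unsigned `bits`-wide integer into little-endian limbs."""
    count = (bits + limb_bits - 1) // limb_bits
    mask = (1 << limb_bits) - 1
    return [(value >> (i * limb_bits)) & mask for i in range(count)]


def join_limbs(limbs, bits, limb_bits=32):
    """Reassemble limbs and truncate to `bits`, as an iN value would wrap."""
    value = 0
    for i, limb in enumerate(limbs):
        value |= limb << (i * limb_bits)
    return value & ((1 << bits) - 1)


def add_wide(a, b, bits, limb_bits=32):
    """Add two `bits`-wide integers using only limb-sized adds and a carry."""
    mask = (1 << limb_bits) - 1
    carry = 0
    out = []
    for x, y in zip(split_limbs(a, bits, limb_bits),
                    split_limbs(b, bits, limb_bits)):
        total = x + y + carry          # fits in limb_bits + 1 bits
        out.append(total & mask)
        carry = total >> limb_bits
    return join_limbs(out, bits, limb_bits)
```

For example, `add_wide(a, b, 66)` agrees with `(a + b) % (1 << 66)` for any two unsigned 66-bit inputs. The rewrite only needs the bit width and the legal limb size, both of which are available at the IR level.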
17:47.000 --> 17:51.000 And there's this huge block of really complicated stuff in SDag, 17:51.000 --> 17:55.000 which we could just gently migrate up into IR, 17:55.000 --> 17:58.000 and factor out of SDag and out of GlobalISel, 17:58.000 --> 18:01.000 and then instead of two places you add it in one place, 18:01.000 --> 18:03.000 and you test it more easily, 18:03.000 --> 18:06.000 and you just have two slightly simpler back ends. 18:06.000 --> 18:09.000 And everything is better, except for compile time. 18:09.000 --> 18:15.000 But, you know, everything else is better. 18:15.000 --> 18:17.000 I'm not sure I've got any more slides. 18:17.000 --> 18:19.000 Oh yeah, we should do more of this. 18:19.000 --> 18:21.000 And we've had one question. 18:21.000 --> 18:23.000 Did you have another one? 18:23.000 --> 18:26.000 Oh yes, one person speaks, two more speak. 18:26.000 --> 18:27.000 Yeah. 18:27.000 --> 18:34.000 So if I understand... I want some clarification about where you draw the line 18:34.000 --> 18:39.000 between middle end and back end, because in my... 18:39.000 --> 18:43.000 well, I'm not very much involved with either GCC or LLVM, 18:43.000 --> 18:48.000 but my naive interpretation is something like: the middle end 18:48.000 --> 18:51.000 is the architecture-independent stuff, 18:51.000 --> 18:55.000 and the back end is where the architecture-dependent stuff happens. 18:55.000 --> 18:56.000 Excellent. 18:56.000 --> 18:58.000 It is one way of cutting it. 18:58.000 --> 19:02.000 But I get the impression you use a different way of cutting it 19:02.000 --> 19:03.000 with a sign. 19:03.000 --> 19:07.000 So you would say, middle end is where you are in IR, 19:07.000 --> 19:13.000 which is a mostly machine-independent representation. 19:13.000 --> 19:16.000 And then we have MIR which, if I gather correctly, 19:16.000 --> 19:21.000 means something like machine-dependent IR in LLVM. 
19:21.000 --> 19:24.000 And you are mostly talking about doing stuff 19:24.000 --> 19:27.000 on IR instead of MIR. 19:27.000 --> 19:31.000 But the stuff may still be target dependent, 19:31.000 --> 19:36.000 because you introduce more target-specific... 19:36.000 --> 19:40.000 target-specific intrinsics into IR. 19:40.000 --> 19:45.000 So if I get the point of your talk correctly, 19:45.000 --> 19:48.000 you don't say do everything platform independent, 19:48.000 --> 19:52.000 because there are really things that are platform dependent. 19:52.000 --> 19:55.000 But you are more questioning, 19:55.000 --> 19:57.000 and I think this is a very good question, 19:57.000 --> 19:59.000 and we should consider it: 19:59.000 --> 20:03.000 do we need different kinds of IR for the back end 20:03.000 --> 20:05.000 and for the middle end? 20:05.000 --> 20:10.000 And I think you just showed that most likely 20:10.000 --> 20:12.000 we don't need two kinds of IR. 20:12.000 --> 20:13.000 Excellent. 20:13.000 --> 20:14.000 The main point. 20:14.000 --> 20:16.000 OK, I must try to repeat that. 20:16.000 --> 20:21.000 So the first part of it is: where you draw the line 20:21.000 --> 20:25.000 between the middle end and the back end 20:25.000 --> 20:27.000 is different between different people. 20:27.000 --> 20:30.000 And the general premise of this talk is roughly 20:30.000 --> 20:33.000 that we're drawing it in the wrong place. 20:33.000 --> 20:35.000 One belief is that LLVM IR 20:35.000 --> 20:36.000 is target independent, 20:36.000 --> 20:39.000 and you do the target-independent work in the middle end 20:39.000 --> 20:41.000 on IR. 20:41.000 --> 20:42.000 LLVM IR 20:42.000 --> 20:43.000 is not target independent. 20:43.000 --> 20:47.000 That was a nice idea which we lost 20 years ago. 20:47.000 --> 20:49.000 It interacts really badly 20:49.000 --> 20:51.000 with, like, C. 20:51.000 --> 20:53.000 So you can't have that. 
20:53.000 --> 21:00.000 Currently, we try to keep the very back-end-specific stuff 21:00.000 --> 21:02.000 way down in MIR. 21:02.000 --> 21:04.000 But the way it has worked out, 21:04.000 --> 21:08.000 we increasingly drift information back up towards the middle end. 21:08.000 --> 21:11.000 Because of things like, 21:11.000 --> 21:14.000 the vectorizer is an optimization in the middle end. 21:14.000 --> 21:16.000 You don't want to write that for every target. 21:16.000 --> 21:18.000 But if it doesn't know what registers you've got, 21:18.000 --> 21:20.000 that's not going to work. 21:20.000 --> 21:25.000 So I think we have some slightly historic ideas 21:25.000 --> 21:28.000 about where we should do work in the compiler. 21:28.000 --> 21:31.000 And it should be recognised as such. 21:31.000 --> 21:36.000 There's not so much a right place to do the transform. 21:36.000 --> 21:40.000 It's: where do we currently do similar transforms? 21:40.000 --> 21:43.000 And there's some inertia to moving stuff. 21:43.000 --> 21:51.000 But you can absolutely write the entirety of x86 as SSA form. 21:51.000 --> 21:53.000 And you can do the same with AMDGPU. 21:53.000 --> 21:55.000 And you should not do it with SIMT. 21:55.000 --> 21:57.000 You should do it with vector types. 21:57.000 --> 21:59.000 I'm a fat gentleman. 21:59.000 --> 22:03.000 But if you can describe machine code 22:03.000 --> 22:06.000 literally verbatim in SSA form, 22:06.000 --> 22:10.000 and we do most of our optimizations in SSA form, 22:11.000 --> 22:15.000 why are we accidentally, well, 22:15.000 --> 22:19.000 are we sure that it's important to change 22:19.000 --> 22:21.000 to this much more awkward representation 22:21.000 --> 22:23.000 to shuffle stuff around for a bit? 22:23.000 --> 22:26.000 Because it feels like our testing would be better 22:26.000 --> 22:27.000 if we didn't. 22:27.000 --> 22:29.000 Ah, I've run out of time. 22:29.000 --> 22:30.000 Thank you. 
22:30.000 --> 22:31.000 Your question was wonderful. 22:31.000 --> 22:33.000 It was a summary of the entire talk. 22:33.000 --> 22:35.000 Thank you.