WEBVTT 00:00.000 --> 00:10.460 Hide the system, because we can't just go ahead and change the code without having 00:10.460 --> 00:11.820 the CPU skip it. 00:11.820 --> 00:15.980 If we change the code on one CPU and another CPU executes it, I'm not going to go into the 00:15.980 --> 00:19.340 details why, I've given that talk before, but it can crash the kernel. 00:19.340 --> 00:23.700 So we have to make sure all the CPUs don't see us modify this. 00:23.700 --> 00:27.700 So we put in a little breakpoint, we change the code, and then we replace the breakpoint with 00:27.700 --> 00:31.940 the new thing, and we are able to modify running code. 00:31.940 --> 00:35.140 Now, I gave a talk about this before; in fact, I gave it at Kernel Recipes this year. 00:35.140 --> 00:37.220 You can watch that. 00:37.220 --> 00:38.660 So how does this work? 00:38.660 --> 00:41.580 When we hit the breakpoint, it causes an exception. 00:41.580 --> 00:45.340 Well, an exception just means that the CPU is going along, and we hit the breakpoint, 00:45.340 --> 00:50.340 the CPU is going to switch mode, jump into the kernel, and jump into a vector and call 00:50.340 --> 00:54.060 another kind of trampoline, and say, what do we do with this? 00:54.060 --> 00:55.060 Well, we have a handler. 00:55.100 --> 00:58.580 We have a do_int3 handler inside the kernel. 00:58.580 --> 01:03.980 And inside the kernel, I mean, this is an overly simplified version of it, it will store all 01:03.980 --> 01:10.420 the registers, so basically registers are the state of your code; as you're running, 01:10.420 --> 01:13.660 as your application is running, the registers are changing. 01:13.660 --> 01:16.340 When interrupts happen, or anything else, we just store all the registers, because that's 01:16.340 --> 01:21.380 the state of your processor, and do whatever we want, then we restore the registers and go 01:21.380 --> 01:22.380 back. 
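The patching order described above (breakpoint in, rest of the instruction changed, breakpoint replaced with the new first byte) can be sketched in user-space C on a plain byte buffer. This is only an illustration under assumptions: `patch_insn` and `INSN_SIZE` are hypothetical names, and the cross-CPU synchronization that a real kernel needs between the steps is reduced to comments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INSN_SIZE   5     /* size of the call/nop being patched (from the talk) */
#define INT3_OPCODE 0xCC  /* x86 one-byte breakpoint instruction */

/*
 * Hypothetical helper: rewrite a 5-byte instruction in three steps so that
 * a concurrent reader sees either the old instruction, the breakpoint, or
 * the new instruction, but never a mix of old and new bytes.
 */
static void patch_insn(uint8_t *text, const uint8_t *new_insn)
{
    assert(text != NULL && new_insn != NULL);

    /* Step 1: put the breakpoint on the first byte. */
    text[0] = INT3_OPCODE;
    /* (assumption: real code makes every CPU observe this write here) */

    /* Step 2: change the remaining bytes behind the breakpoint. */
    memcpy(text + 1, new_insn + 1, INSN_SIZE - 1);
    /* (assumption: another synchronization point here) */

    /* Step 3: replace the breakpoint with the new first byte. */
    text[0] = new_insn[0];
}
```

Calling `patch_insn(text, nop5)` on a buffer holding a 5-byte call leaves the buffer holding the 5-byte nop; any CPU that hits the breakpoint in between is dealt with by the handler discussed next.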
01:22.380 --> 01:27.980 But the interrupt handler, or the exception handler, is given a pointer to that state. 01:27.980 --> 01:34.060 So what we do here is, the IP is the instruction pointer, and we add five to it. 01:34.060 --> 01:38.100 Remember, it was five bytes, and then we return, and it returns past it, so we 01:38.100 --> 01:39.220 never actually execute that code. 01:39.220 --> 01:43.780 So basically, we just emulated a nop; no matter what was there, we're able to emulate 01:43.780 --> 01:49.300 a nop, and the kernel is all happy about that. 01:49.300 --> 01:54.420 This is great when you're going from a function call to a nop, and from a nop to a function call. 01:54.420 --> 02:00.020 By emulating a nop, you're emulating one of the states of the transition back or forth. 02:00.020 --> 02:03.620 So when you're calling a function, if you say, I want to make it a nop, and you put 02:03.620 --> 02:06.820 the breakpoint in, you just automatically went to the next state. 02:06.820 --> 02:07.820 We don't care. 02:07.820 --> 02:11.900 Or if you have a nop, and we want to make it a function call, you put the 02:11.900 --> 02:14.380 breakpoint in, you're emulating a nop, you're keeping the same state, and when you 02:14.380 --> 02:16.460 return back, you've switched to the next state. 02:16.460 --> 02:18.660 All that works. 02:18.660 --> 02:21.700 The nop is just part of the state transition. 02:21.700 --> 02:26.740 But what happens if we want to go from calling foo to calling bar? 02:26.740 --> 02:28.740 There's no nop there. 02:28.740 --> 02:33.740 So if we were to do this, if we're calling foo and we have to switch to calling bar, emulating 02:33.740 --> 02:39.700 by moving the instruction pointer five bytes ahead and then coming back doesn't 02:39.700 --> 02:41.700 work. 
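The five-byte skip described above can be sketched as a tiny user-space model. The struct and function names are hypothetical, not the kernel's. One detail the talk simplifies away is stated here as an assumption of the sketch: on x86 the saved instruction pointer already points one byte past the `int3`, so the sketch first backs up to the start of the patched instruction and then moves five bytes forward.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CALL_INSN_SIZE 5  /* the patched site is five bytes (from the talk) */
#define INT3_INSN_SIZE 1  /* int3 itself is a single byte */

/* Hypothetical, heavily reduced stand-in for the saved register state. */
struct saved_regs {
    uint64_t ip;  /* instruction pointer saved by the exception */
    uint64_t sp;  /* stack pointer saved by the exception */
};

/*
 * Emulate a nop at the breakpoint site: resume execution right after
 * the five-byte instruction, without touching anything else.
 */
static void emulate_nop(struct saved_regs *regs)
{
    assert(regs != NULL);

    /* The trap leaves ip just past the int3 byte; find the site start. */
    uint64_t insn_start = regs->ip - INT3_INSN_SIZE;

    /* Returning to this address skips whatever the five bytes held. */
    regs->ip = insn_start + CALL_INSN_SIZE;
}
```

Because only the instruction pointer changes, this is enough for the call-to-nop and nop-to-call transitions, and it is also why it cannot express "call bar instead of foo".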
02:41.700 --> 02:47.340 So what we must do is emulate a function call from that int3 handler, and this is where 02:47.420 --> 02:53.420 the fun begins, because a call doesn't do one thing, it does two things. 02:53.420 --> 02:58.580 I told you earlier, it pushes the return address onto the stack, and then changes the 02:58.580 --> 03:01.820 instruction pointer to go to where you're calling. 03:01.820 --> 03:05.580 And the return does the opposite: it pops from the stack and goes there. 03:05.580 --> 03:09.780 So to emulate the function call, we need to push the return address on the 03:09.780 --> 03:13.020 stack, and that's where things get tricky. 03:13.020 --> 03:17.860 So this is why. 03:17.860 --> 03:23.180 When we're running, and we hit that int3, remember, I told you that we save state; 03:23.180 --> 03:26.620 well, we save a bunch of state on the stack. 03:26.620 --> 03:33.740 Before we ever get to the int3 handler, the hardware will put on the stack the stack segment, the stack 03:33.740 --> 03:38.780 pointer, the flags, all your flags will be saved, like, you know, all the state of your 03:38.780 --> 03:42.660 flags, you know, whether or not you have a compare, you do compares, those are flags, 03:42.660 --> 03:46.900 that's all stored, it will store the code segment, and the instruction pointer is also saved; 03:46.900 --> 03:51.180 the hardware does this, this is even from user space, the hardware does this. 03:51.180 --> 03:55.340 Then we put our own, actually, let me go back, then we put our own register state, and 03:55.340 --> 03:56.980 then we call this guy. 03:56.980 --> 04:03.300 But if we want to go to foo and emulate it, we need to put a return address on the stack 04:03.300 --> 04:04.940 as well. 04:04.940 --> 04:09.260 So you see the problem here: where we need to put the return address is exactly where 04:09.340 --> 04:13.500 the hardware had put its own stack frame. 04:13.500 --> 04:20.100 So it's not just trace events that are a problem. 
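The two things a call does, and the matching return, can be sketched with a simulated CPU in user-space C. All names here (`sim_cpu`, `emulate_call`, `emulate_ret`) are hypothetical, and the stack is just an array indexed by `sp`. The sketch deliberately ignores the hard part from the talk, namely that in the real handler the slot for the return address is already occupied by the hardware-saved frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CALL_INSN_SIZE 5   /* size of the patched call site */
#define INT3_INSN_SIZE 1   /* int3 is one byte */
#define STACK_WORDS    16  /* arbitrary size for the simulated stack */

/* Hypothetical simulated CPU: an instruction pointer plus a small stack. */
struct sim_cpu {
    uint64_t ip;
    size_t   sp;                  /* index into stack[]; grows downward */
    uint64_t stack[STACK_WORDS];
};

/* Emulate "call target" for a breakpoint hit at a five-byte call site. */
static void emulate_call(struct sim_cpu *cpu, uint64_t target)
{
    assert(cpu != NULL && cpu->sp > 0);

    /* Saved ip points one byte past the int3; find the site start. */
    uint64_t insn_start = cpu->ip - INT3_INSN_SIZE;

    /* Thing one: push the return address (the instruction after the call). */
    cpu->sp -= 1;
    cpu->stack[cpu->sp] = insn_start + CALL_INSN_SIZE;

    /* Thing two: change the instruction pointer to the callee. */
    cpu->ip = target;
}

/* Emulate "ret": pop the return address and jump to it. */
static void emulate_ret(struct sim_cpu *cpu)
{
    assert(cpu != NULL && cpu->sp < STACK_WORDS);
    cpu->ip = cpu->stack[cpu->sp];
    cpu->sp += 1;
}
```

With a breakpoint hit at site `0x2000`, `emulate_call(&cpu, bar)` leaves `0x2005` on the simulated stack and `ip` at `bar`, and a later `emulate_ret` resumes at `0x2005`, which is what executing a real call to `bar` would have done.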
04:20.100 --> 04:25.980 When you register a single callback with the function 04:25.980 --> 04:29.460 tracer, it will call your trampoline directly. 04:29.460 --> 04:34.660 What happens is, if I say I want to do function tracing on a single function, it will 04:34.660 --> 04:37.460 dynamically create a trampoline and call that. 04:37.460 --> 04:40.940 So it's all doing direct calls, there are no indirect calls; it will create a trampoline 04:40.940 --> 04:46.620 that calls your callback directly, and then it will have what's called 04:46.620 --> 04:48.860 the nop at the beginning of the function call your trampoline, which calls your 04:48.860 --> 04:52.020 function, and that made it look really, really nice. 04:52.020 --> 04:58.860 Here's the problem: everyone loved it, we got great reviews from everyone, 04:58.860 --> 05:03.300 and Linus was absolutely against it. 05:03.300 --> 05:07.100 So what do we do? 05:07.100 --> 05:18.420 I think the reason why he was against it is that this was the last remaining code from 1991. 05:18.420 --> 05:24.580 We were getting rid of his baby, and he was dead set against us doing this. 05:24.580 --> 05:31.580 Like I said, this is really, really old code, and we didn't realize this, because 05:31.580 --> 05:36.340 we were all going, why is Linus being so hard-headed about this? 05:36.340 --> 05:41.740 This is obviously a good solution, and the solution he came up with, that's 05:41.740 --> 05:44.300 good. 05:44.300 --> 05:53.140 So he wanted these per-CPU variables again, this is almost like the first patch, and so 05:53.140 --> 05:58.460 what it required was that we had to make a special trampoline for every type of context we 05:58.460 --> 06:00.180 could be in. 06:00.180 --> 06:03.940 We had to make a trampoline for when we had interrupts enabled, and we had to make a trampoline 06:03.940 --> 06:06.100 for when we had interrupts disabled. 
06:06.100 --> 06:11.260 And I think Linus seemed excited about this hack, because it was really a hack. 06:11.260 --> 06:16.500 The way this looked was that you had to check to see if you were in an NMI, and you 06:16.500 --> 06:17.780 used one trampoline. 06:17.780 --> 06:21.180 If you weren't in an NMI, you had to see if interrupts were enabled, and you used another 06:21.180 --> 06:22.180 trampoline. 06:22.220 --> 06:25.580 If you were in another NMI, you used another trampoline. 06:25.580 --> 06:32.740 And what the trick was, is that you had to jump to a trampoline, so basically, remember 06:32.740 --> 06:39.340 the stack frame there, when you returned, you were actually 06:39.340 --> 06:42.860 returning back to a trampoline; now you got rid of the hardware stack frame, and then 06:42.860 --> 06:47.900 this trampoline could add the return address onto your stack frame, so you had 06:47.980 --> 06:53.180 to use a per-CPU variable for where your stack is, and jump to this guy. Now, if interrupts 06:53.180 --> 06:57.740 were enabled when this happened, if you jump back, an interrupt could come in and 06:57.740 --> 06:59.540 screw everything up for you. 06:59.540 --> 07:05.020 So what he said was, well, when the exception happens, it automatically disables interrupts. 07:05.020 --> 07:09.580 So what you do is, you return back to the trampoline keeping the interrupts disabled, 07:09.580 --> 07:14.900 so you modify the flags bit to say, okay, jump back with interrupts still disabled, and 07:14.900 --> 07:20.060 then modify the stack, and then enable interrupts from this trampoline 07:20.060 --> 07:22.700 and then jump back. 07:22.700 --> 07:25.060 He thought this was better. 07:25.060 --> 07:32.660 So this is what the trampoline looked like, you had enable calls, something like that. 
07:32.660 --> 07:41.900 Well, Linus obviously liked it, everyone else hated it, and it had another problem: 07:42.900 --> 07:48.740 we have shadow stacks coming soon to prevent stack corruption. It's a hardware 07:48.740 --> 07:53.740 feature, so that you actually can create a shadow stack, and if anything ever 07:53.740 --> 08:00.060 modifies the stack without doing it the normal way, usually, if there's a bug, 08:00.060 --> 08:05.140 a lot of times you'll have a lot of security exploits done by array overflows, where you 08:05.140 --> 08:09.540 have off-by-one bugs, or you overflow, usually get to the stack, modify the stack and do something 08:09.620 --> 08:13.580 different; the hardware is going to keep a shadow stack, and if anything like this modifies 08:13.580 --> 08:18.500 the stack, it will give a fault, so you wouldn't be able to do these things, and that's exactly 08:18.500 --> 08:19.020 what this is doing.