So, hello everybody. My name is Martin Nečas, I work at Red Hat, and I work there as a technical team lead and a maintainer on a project called Forklift. It is an open source project for migrating VMs from various places to Kubernetes and KubeVirt.

I think most of you here are familiar with the VMware situation, but for those who are not: okay, this is what happened. VMware was acquired by Broadcom; it's not that well visible. I love this image. And immediately after that, we started getting articles. These are from Red Hat, from The Register, and from Business Insider. There are many more, and from what I heard from the statistics, I think some of you were also affected by this. So it feels a bit like being held for ransom. So, how to escape this ransom? The answer could be virt-v2v, which I think many of you know, and Forklift. And that's it. Thank you.

Okay, so let's go into the details. virt-v2v is a really well-known tool; it has been developed by Richard Jones for many, many years, focusing lately on the migration from VMware to KVM. Why virt-v2v, why did we go with this choice? The main thing is guest conversion. The VMs running on VMware have their own drivers and their own devices, and we want to use the proper KubeVirt and virtio drivers. So what virt-v2v does is: it removes the VMware Tools, it installs the virtio drivers, installs the guest agent, and does a lot of editing inside the VM and its configuration. It even rebuilds the initramfs, and much more; I'm not the perfect guy for this, Richard Jones would be perfect for it. I'm just a user of virt-v2v.

How does virt-v2v work? Inside it there is a tool called nbdkit. I highly recommend the FOSDEM talk from Richard from a few years back. nbdkit allows you to attach remote devices to your machine as if they were in your local laptop. It allows you to do many things with them, and the NBD protocol itself is straightforward: it allows you to do reads and writes. I have a quick demo, which highly depends on the Wi-Fi, so we'll see.

For example, I have here a Fedora raw.xz file, and using nbdkit I can... oh, that's not that well visible. I apply the curl plugin to the image itself. The file is xz-compressed, so the xz filter uncompresses blocks on demand; it doesn't download the whole file and unpack it, it reads only the blocks which it needs. Then I apply a cow (copy-on-write) filter on top of it, so all writes are done on the local machine, and I can do whatever I want with it.
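Roughly, an invocation of that kind looks like this, assuming the stock curl plugin and the xz and cow filters that ship with nbdkit; the URL, device, and partition number are placeholders:

    # Serve a remote xz-compressed raw image over NBD: curl fetches byte
    # ranges, the xz filter decompresses blocks on demand, and the cow
    # filter keeps all writes in a local overlay, so the remote image is
    # never modified.
    nbdkit --filter=cow --filter=xz curl https://example.com/Fedora.raw.xz

    # In another terminal: attach it as a local block device and inspect it.
    sudo modprobe nbd
    sudo nbd-client localhost /dev/nbd0
    sudo fdisk -l /dev/nbd0          # list the partitions
    sudo mount /dev/nbd0p1 /mnt      # writes land in the local cow overlay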
I can, let me just copy-paste it... I can see the partitions. That Wi-Fi... I can mount it, and now I will not have /tmp... please... I have /tmp anyway. Yeah, I can mount it and do whatever I want with it. The Wi-Fi is slow, so let's move on.

So how does virt-v2v work? That was nbdkit. In virt-v2v we have the VMDK on the VMware side; that is the format through which we connect to nbdkit. VMware has a library called VDDK, which is used to manipulate their disks, and nbdkit has a plugin for it, which allows nbdkit to do the same things I was trying to show with curl. On top of nbdkit there is another tool, called libguestfs, which attaches the nbdkit export to an appliance VM; inside it runs a daemon that allows you to manipulate disks securely and quite stably, and it runs a lot of the conversion scripts. So we have the VMDK, and all the changes that we make are written to a copy-on-write cache. First we do the conversion, so we fail fast and see what goes wrong with the conversion itself. It also allows us to use a tool like fstrim, so we throw away the unnecessary files and unnecessary blocks; they are written as zeroes to the cow cache, so we don't even need to transfer those blocks from the VMDK, and we improve the migration time. Then we do the copy, reading first from the cache and then the remaining blocks from the remote VMDK, and we write to the destination.

The advantages: it fails fast, and it's really good at that if something goes wrong. It supports many operating systems, even various really old Windows versions (that part depends on which virtio drivers are available), and it supports Ubuntu, CentOS, Fedora, and many, many more. And it manages the disk transfers for you: all you need to do is run virt-v2v and you have it, you don't need to do anything fancy with it. The disadvantages: it has high downtime, because the VM needs to be turned off throughout the whole process; otherwise you would get disk corruption. And, again, it manages the disk transfer, so when Forklift came, we wanted to do some tricks around the transfer and we needed to get rid of those parts.
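For comparison, a plain cold conversion pulled straight from vCenter looks roughly like this; the URI, password file, thumbprint, and guest name are placeholders, and the VDDK library itself has to be obtained separately from VMware:

    # Convert a powered-off guest, reading its disks through the
    # nbdkit VDDK plugin, and write the converted result locally.
    virt-v2v \
      -ic 'vpx://administrator@vcenter.example.com/Datacenter/esxi01?no_verify=1' \
      -ip /tmp/vcenter-password \
      -it vddk \
      -io vddk-libdir=/opt/vmware-vix-disklib-distrib \
      -io vddk-thumbprint=AA:BB:CC:DD:EE:FF:00:11:22:33:44:55:66:77:88:99:AA:BB:CC:DD \
      "my-guest" \
      -o local -os /var/tmp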
Okay, so that was virt-v2v. What about Forklift? Forklift is a tool around KubeVirt, and it focuses on migrating from VMware, oVirt, OpenStack, and OVAs over to KubeVirt itself. The whole migration process: you have a configured VMware infrastructure as your source, and you first need to configure the Kubernetes side itself. We have some projects and some POCs to do it for you, but nothing concrete yet. So the administrators need to go and create the storage classes, and they need to create the networking. The system administrators know these the best, so we are relying on them.

Then the user needs to add a provider to Forklift. It immediately scans the infrastructure, lists all the VMs, and sees which networks are used by the VMs and which storage datastores are used. Then we have internal validations: we tell users which features are available and which are not, and whether the VM migration would fail. For example, if a VM uses something we do not support right now, we let the user know immediately. Next, the user needs to create the network and storage mappings. We have a VM with some specific storage, and they need to tell us from which datastore it should be migrated to which storage class, and the same goes for the networks.

We have two migration types, called cold and warm. The cold one is the virt-v2v flow: it shuts down the VM, migrates it, and then boots it up. The downtime is as high as with virt-v2v; it is actually using virt-v2v under the hood and then starting the migration. More interesting is the warm migration. We again have the VMDK on the VMware side, but we have moved the nbdkit part out of virt-v2v into a separate project, the KubeVirt Containerized Data Importer (CDI), which manages the transfer. This allows us to use additional VMware features such as Changed Block Tracking. So we create a snapshot on top of the VM, and we migrate the underlying disk. For example, the blue one could be 500 gigs; we transfer it, and then we transfer only the changes which happened in the meantime, so the orange one can be five gigabytes. Then we do another one, and the red one can be even less, because the migration time of five gigabytes is much shorter than the migration time of 500 gigs. We do this periodically, until the user sets the cutover time. At that point we shut down the VM and do the conversion, using virt-v2v in place, on the already transferred disk. So there is still downtime; it's not live migration, but it's warm, so something in between.

The advantage is low downtime, much lower than with the cold migrations. It removes the virt-v2v disk transfer, and that allows us additional features. virt-v2v will not continue the migration if the guest conversion fails; that's how it was designed. Here, if users want to try booting with emulation instead of the proper drivers and tools, they can still try it. This is work in progress, but the split allows it. Additionally, it allows us to handle shared disks: virt-v2v looks at all the disks attached to the VM and transfers all of them, while here we can select the disks independently. And we are also starting to work on offloading, where we would copy the disks within the storage arrays; right now, with nbdkit, everything goes over the network, and that can take some time. The disadvantages: it fails slow, and it requires Changed Block Tracking enabled on the VM. And it can take a bit longer overall, because there are more steps: we create snapshots, we delete them, and do everything that's needed.
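Under the hood this flow is driven by a handful of custom resources. A rough sketch of a vSphere provider and a warm migration plan could look like this; all the names, namespaces, and URLs here are placeholders, and the Forklift documentation has the authoritative schema:

    # Register a vSphere provider and describe a warm migration plan.
    kubectl apply -f - <<'EOF'
    apiVersion: forklift.konveyor.io/v1beta1
    kind: Provider
    metadata:
      name: my-vsphere
      namespace: konveyor-forklift
    spec:
      type: vsphere
      url: https://vcenter.example.com/sdk
      secret:
        name: my-vsphere-credentials
        namespace: konveyor-forklift
    ---
    apiVersion: forklift.konveyor.io/v1beta1
    kind: Plan
    metadata:
      name: demo-plan
      namespace: konveyor-forklift
    spec:
      warm: true                  # CBT-based warm migration
      targetNamespace: demo-vms
      provider:
        source:
          name: my-vsphere
          namespace: konveyor-forklift
        destination:
          name: host              # the local KubeVirt cluster
          namespace: konveyor-forklift
      map:
        network:
          name: demo-network-map
          namespace: konveyor-forklift
        storage:
          name: demo-storage-map
          namespace: konveyor-forklift
      vms:
        - name: my-vm
    EOF

    # Later, the cutover of a running warm migration can be triggered by
    # setting a timestamp on the Migration object (name is illustrative).
    kubectl patch migration demo-migration -n konveyor-forklift \
      --type merge -p '{"spec":{"cutover":"2025-02-02T15:00:00Z"}}'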
And I have a quick demo. Yes, the Wi-Fi is working. So I create this; this is in the OpenShift UI, but it can also be done in the OKD UI. I create a vSphere provider and enter the credentials. Now it has pulled all the information from the VMware side: I see all my VMs, and I see the validations on top of them; there is a warning that the VM is running.

I create a migration plan with the source provider, I choose the storage mappings, and I need to name it. I can choose which destination namespace the VM should be migrated to, so it can be isolated for separate administrators; we can choose wherever we want to go. There are a lot of other configurations and settings. I enable the warm migration with Changed Block Tracking and start the plan. Forklift creates the resources on the cluster to which we will migrate.

I'll go to the VM. The migration is happening in the background. I write a simple file to it as a test; it should also be on the destination. Then I hit cutover and set it to start immediately. I can see the progress of the transfers. Then it runs the conversion, and at the end it creates the VM. If the VM was turned off, we will not turn it on; if it was running, we turn it on again. We are trying to keep it as consistent as possible. Now I can see in KubeVirt that the VM is running. I'll look into it again... and I have my demo file.
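The same progress can also be followed from the CLI; as a rough sketch, the disk copies show up as CDI DataVolumes in the target namespace (the resource names below are illustrative):

    # Watch the plan status and the DataVolumes carrying the disk copies.
    kubectl get plans.forklift.konveyor.io -n konveyor-forklift
    kubectl get datavolumes -n demo-vms -w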
How is it at large scale? We have tested it with many hundreds of VMs, huge disks, and even high-I/O databases. Those were a bit tricky, but we got there in the end, and now it works without a problem; it works quite well. We are working on the mapping of the networks and trying to add additional operating systems, on the storage offloading, so that we would do the copying not over the network but within the storage arrays, and on some improvements to the transfer speeds. Maybe additional source providers: we have also been asked to add something like AWS and other hyperscalers, but right now we are not working on them. But feel free to go and try it yourself; we have it on GitHub. I really need to make it a bit better. That's it. Thank you very much, everybody.

Yes, please. [Audience question.] The question was whether it is already available in OpenShift. It is available in OpenShift already: it is in OperatorHub, so you can install it as an operator, which maintains and installs everything for you.

Yes, please. [Audience: What are the common problems during the migration?] The question was what the common problems are during the migrations. Most often it is what we have been hitting with various operating systems and various configurations; there is no standardization. Any administrator can do whatever they want with a VM, so we are getting a lot of strange configurations, for which we are creating custom scripts. Another problem is the devices themselves, because between VMware and QEMU there are so many differences. So we need to inject some new udev rules to keep, for example, device names persistent across the migration.
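As an illustration of that kind of injected rule (the MAC address and interface name are placeholders, not what Forklift actually writes), a guest-side udev rule pinning a NIC name could look like:

    # Keep the NIC name stable after the virtual hardware changes,
    # by matching on the MAC address that survives the migration.
    cat > /etc/udev/rules.d/70-persistent-net.rules <<'EOF'
    SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="aa:bb:cc:dd:ee:ff", NAME="eth0"
    EOF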
Yes, in the back. [Audience: Can you install the virtio drivers while the machine is still on VMware, without a reboot, and then move it?] So the question was whether we can install the virtio drivers first on the VMware side, and then shut the VM down and transfer it. You can do it, and it would be much better for us, for example if the conversion were to fail. But we are trying to do this for the users automatically. [Audience: Isn't it good to keep the VMware drivers in place, just in case? What is the problem with removing them?] When we need to remove the VMware Tools and VMware drivers, we are working on a copy, so we are not touching the source VM. The source VM is still staying there. If the migration process fails, or something happens, or you decide it's not working for you, the VM is still there until you delete it.

Yes? [Audience question about exporting VMs out of KubeVirt, partly inaudible.] The question was about exporting from KubeVirt itself. I am not sure if KubeVirt has something like that right now, exporting VMs from KubeVirt, but we do have work that allows the import of OVAs. [Audience: I experimented with the OVA import, but I couldn't find the opposite of it; sometimes you want to take the VM image and continue from there.] Yeah, it's a pity. Exactly.

[Audience: What kind of problems did you have with really high-workload systems, like database systems? Are there limitations?] The question was what problems we encountered with very high-workload systems, like databases with high IOPS. We had some problems with CDI: it wasn't correctly querying the changes using the VMware Changed Block Tracking, and we fixed that a few months ago. It returns not all the changes at once, so we needed to do some additional queries for them. But that was just a technical problem.

Yes? [Audience: Slightly off topic, but if Broadcom had acquired VMware, say, 18 months earlier, do you think Red Hat would still have dropped oVirt?] I have worked on oVirt, and I'm not the right person to answer this, or to say whether they would have revived it or not. [Audience comment, partly inaudible.] Just for the recording, the question was: if Broadcom had announced the VMware acquisition earlier, would Red Hat have kept oVirt? Yeah, you could hear the answers.

Yes? [Audience: Have you tried it on other Kubernetes distributions, like Rancher?] So, whether we have tried it on other Kubernetes distributions, like Rancher: personally, I have not, sorry. Would it work? We are not using anything OpenShift-specific, so I think it would work. And if not, please open an issue on GitHub.
Yes? [Audience: Can you order or group the VMs during a migration? Partly inaudible.] The question was whether we can order or group the VMs throughout the migration. That is a feature which would actually cost us a lot; we have been thinking about it, but not right now. What you can do is create separate plans: you can group VMs together within the plan itself, and then you can create another plan for additional VMs. Cool. Anybody else? Cool. In that case, thank you, everybody. Thank you.