Home Lab Threadripper Build Thread.

I had similar issues when building an ESXi lab on some HP Pro desktops. I tried multiple versions, but in the end got 6.5 working with a BIOS update and some boot parameters.

Also check to make sure you don't have anything related to boot-level security enabled in the BIOS, and maybe also look for legacy BIOS support. From memory, both of those were needed, something to do with bypassing UEFI. It was a while back now, so I can't be more specific, sadly.

Thanks, anything to go on is better than nothing, so I'll see what I can find in the BIOS and will of course update in here as I go with what I have tried etc. I'm at home now, so I'm going to watch GoT and then have a poke around with ESXi and see where I get. I also brought my NAS home, so when/if I get ESXi up and running I'll do a little write-up on that and the firewall. :)

Edit: I was just thinking that in my production environment I am still running 5.5 because of some other issues that littered version 6 alongside the SAN we run, so I might even give that a blast just to see if it stalls at the same point.
 
Finally I got ESXi 6.5 working. The Threadripper ESXi host lives :) It's been a nightmare. I spent nights trawling through the BIOS, not really sure what combination of enabling and disabling stuff might get it going, but nothing seemed to work. So tonight I sat down and decided to take a different approach. I loaded my ESXi 6.5 media onto a USB stick, then in Workstation I created a new ESXi VM and booted it from the USB.

In Workstation it built slowly but without a problem. Knowing the installer hangs at or just after loading vmw_ahci, I set about finding what was next in the boot order, and trawling through boot.cfg I found the list of modules it loads, apparently in load order. Next up was xhci_xhc, i.e. USB3. I had disabled all the USB3 on the Taichi, but apparently that didn't matter, so knowing that this is where it crashes on bare metal, I went about disabling the newer drivers that 6.5 uses.
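Finding "what loads next" just means reading the `modules=` line of boot.cfg in order. Below is a minimal Python sketch, assuming the usual boot.cfg layout where module paths are separated by ` --- `. The sample text is a hypothetical, heavily trimmed example rather than the real 6.5 file, and the helper names are made up for illustration.

```python
from typing import List, Optional

# Hypothetical, trimmed boot.cfg; a real ESXi 6.5 boot.cfg lists many more modules.
SAMPLE = """kernel=/b.b00
modules=/jumpstrt.gz --- /useropts.gz --- /vmw_ahci.v00 --- /xhci_xhc.v00 --- /vmkusb.v00
"""


def parse_modules(boot_cfg_text: str) -> List[str]:
    """Return the modules from the 'modules=' line, in the order they are listed."""
    for line in boot_cfg_text.splitlines():
        if line.startswith("modules="):
            value = line[len("modules="):]
            # Entries are separated by '---' and usually carry a leading '/'.
            return [m.strip().lstrip("/") for m in value.split("---") if m.strip()]
    return []


def module_after(modules: List[str], name: str) -> Optional[str]:
    """Return the module listed right after the first one starting with `name`."""
    for i, module in enumerate(modules[:-1]):
        if module.startswith(name):
            return modules[i + 1]
    return None


if __name__ == "__main__":
    mods = parse_modules(SAMPLE)
    print(module_after(mods, "vmw_ahci"))
```

With the sample above this prints `xhci_xhc.v00`; on a real install you would read the installer media's boot.cfg instead of `SAMPLE`.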



Once I booted back into bare metal, it was game on.
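The exact commands were in screenshots that haven't survived, so the following is only a sketch of the commonly cited fix, not necessarily what was run here. It assumes that disabling the native `vmw_ahci` (SATA) and `vmkusb` (USB) modules with `esxcli system module set` makes ESXi 6.5 fall back to its legacy drivers after a reboot. The Python just prints the commands to paste into the ESXi shell, and the helper name is hypothetical.

```python
from typing import List

# Assumption: these are the 6.5 native drivers whose legacy fallbacks avoid the hangs.
NATIVE_MODULES = ["vmw_ahci", "vmkusb"]


def module_toggle_commands(modules: List[str], enabled: bool = False) -> List[str]:
    """Build esxcli commands that disable (or re-enable) the given modules."""
    state = "true" if enabled else "false"
    return [f"esxcli system module set --enabled={state} --module={m}" for m in modules]


if __name__ == "__main__":
    # Paste these into the ESXi shell (SSH or DCUI), then reboot the host.
    for cmd in module_toggle_commands(NATIVE_MODULES):
        print(cmd)
```

Passing `enabled=True` produces the commands to undo the change, which is handy once a later update ships working native drivers.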



Now that we are at a good point, I'll work on exposing storage and getting some VMs built. I might even have a fiddle with passing graphics etc. through, just because I have never tried it before. I started rebuilding the NAS last night, and it's really not a bad little unit for what it is. It doesn't hold a candle to the QNAP 8-bay I've been using, but it's certainly feature-rich enough: it supports LAN teaming and has the right stuff to be exposed directly to ESXi. For now I've packed it full of 1TB drives and left it building a RAID 5 array. I'll take some screenshots and pics of it tonight to share with you good folks.
 
Great work, ESXi on Threadripper is very nice indeed.

It just goes to prove that if you're prepared to tinker, anything is possible :)

I do think I may have a little problem: I keep looking at the Vega Frontier Edition and have been talking myself out of pressing the go button for days. Perhaps Vega 56 will fill the gap, but that blue card does look nice.... and it's blue, so surely that makes up for the sky-high pricing?

Edit: Did I mention it is blue?
 
There is a thread on the Unraid forums discussing this issue in depth; GPU passthrough is still an issue. I have my rig in the shopping basket with OcUK but need to have this feature working before I pull the pin.

https://forums.lime-technology.com/topic/55150-anybody-planning-a-ryzen-build/?page=26

I think I'm over the guts of the issues. I can now do pretty much all the stuff I need to: create VMs, distribute resources etc. However, I still don't have access to all my core storage. ESXi sees the controllers, NVMe drives and USB drives but nothing more; it doesn't see any SATA hard disks. Right now I'm exposing an NFS share from a NAS but would like more access to local drives. I am pretty sure I can sort this, though, as AHCI controllers are mapped to drivers by hardware ID in a map file. It's documented by a fella called Andreas Peetz if you search for making an unsupported AHCI controller work in ESXi.

I still haven't fully played with GPU passthrough, as I've had quite a lot of ESXi issues up to this point, if I am honest. Where I am at now, though, I have a bunch of servers up and running and can happily switch between ESXi running inside VMware Workstation on Windows and a full bare-metal hypervisor just by restarting, so that's all good. Really, it's just working, and I have as much CPU resource as I need, and more, to run my labs. I keep meaning to spend a few hours attempting to get those AHCI drivers into the build as well.

The firewall is also still not set up, and because of that I'm not quite getting the throughput I would like from the NAS, but for now that is OK.
 
Thanks for the detailed reply.

I need something that will work with VMs and give me no issues.

Do you see that being possible with Ryzen or TR?

"No issues" is going to be a bit of a stretch, as it appears to me that there are some unsupported devices you need to deal with, primarily USB 3.1 and the AHCI drivers for SATA disks. If you buy the right board (the Taichi, for example), it has supported NICs, two of them for good measure, so you should be good there. If you can live with USB2 compatibility and are willing to do some messing about to get AHCI working, then you're golden. To get it up and running you will need to use the 6.5U1 ISO (not the rollup ISO), as the rollup ISO incorporates another AHCI driver, and the rollback puts you at yet another non-working AHCI driver rather than the legacy one.

I am now at a point where, for me, everything (bar local SATA) seems to work, so this has been something of a success. Now, I appreciate that not everybody will have a NAS capable of NFS that can be used as an ESXi datastore, so I really need to step this up to the next level and get the unsupported SATA controllers working; I imagine this should help people who are on the fence about such a build. I guess because I know it can be done, I haven't really spent the time to do it and prove what I think. Over the next few nights, when I can find a tad more time, I will set about getting the controllers running. These are the AMD SATA controller IDs:

0000:09:00.2 SATA controller Mass storage controller: Advanced Micro Devices Inc AMD FCH SATA Controller [AHCI Mode] [vmhba1]
Class 0106: 1022:7901
--
0000:44:00.2 SATA controller Mass storage controller: Advanced Micro Devices Inc AMD FCH SATA Controller [AHCI Mode] [vmhba2]
Class 0106: 1022:7901
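Pulling the vendor and device IDs out of a listing like the one above is easy to script. This Python sketch assumes the layout shown (an address line per controller, followed by a `Class xxxx: vendor:device` line); `parse_controllers` is a hypothetical helper, and the sample is the listing from this post.

```python
import re
from typing import List, Tuple

# The listing from this post: address line, then class code and vendor:device IDs.
LISTING = """0000:09:00.2 SATA controller Mass storage controller: Advanced Micro Devices Inc AMD FCH SATA Controller [AHCI Mode] [vmhba1]
Class 0106: 1022:7901
--
0000:44:00.2 SATA controller Mass storage controller: Advanced Micro Devices Inc AMD FCH SATA Controller [AHCI Mode] [vmhba2]
Class 0106: 1022:7901
"""

ADDRESS = re.compile(r"^([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s")
CLASS_IDS = re.compile(r"^Class ([0-9a-fA-F]{4}): ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})")


def parse_controllers(listing: str) -> List[Tuple[str, str, str, str]]:
    """Return (pci_address, class_code, vendor_id, device_id) for each device."""
    found = []
    address = None
    for raw in listing.splitlines():
        line = raw.strip()
        addr_match = ADDRESS.match(line)
        if addr_match:
            address = addr_match.group(1)
            continue
        ids_match = CLASS_IDS.match(line)
        if ids_match and address is not None:
            found.append((address, *ids_match.groups()))
            address = None  # pair each Class line with exactly one address line
    return found


if __name__ == "__main__":
    for address, cls, vendor, device in parse_controllers(LISTING):
        print(f"{address}: class {cls}, id {vendor}:{device}")
```

Both controllers come back as vendor `1022`, device `7901`, i.e. the same AMD part on two PCI addresses.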

Now, armed with this info, I need to modify the map files to incorporate these devices, which in itself shouldn't be much of an issue. What I am trying to say is that, depending on your storage subsystem and requirements, and forgetting GPU passthrough for now, IMO Ryzen and TR are viable, certainly in a home lab environment at least. In production I would be slightly more apprehensive given the lack of support and info out there.
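Both controllers listed above report the same ID, 1022:7901, so a single map entry should cover them. The sketch below generates entries for a list of IDs. The line template is an assumption modelled on vmklinux-style driver map files (the kind the community AHCI packages add), so compare it with an existing `.map` file on your host before relying on it; `map_lines` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple

# ASSUMED line format (vmklinux-style driver map entry); verify against an
# existing .map file on the host before copying anything across.
MAP_TEMPLATE = (
    "regtype=linux,bus=pci,id={vendor}:{device} 0000:0000,"
    "driver=ahci,class=storage"
)


def map_lines(ids: Iterable[Tuple[str, str]]) -> List[str]:
    """Return one map entry per unique (vendor, device) pair, lower-cased and sorted."""
    unique = sorted({(vendor.lower(), device.lower()) for vendor, device in ids})
    return [MAP_TEMPLATE.format(vendor=v, device=d) for v, d in unique]


if __name__ == "__main__":
    # Both X399 controllers report 1022:7901, so this prints a single line.
    for entry in map_lines([("1022", "7901"), ("1022", "7901")]):
        print(entry)
```

De-duplicating the IDs first means the same controller appearing on several PCI addresses doesn't produce repeated entries.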

Remember, if you do build, following the way I have done it here will save you a significant amount of time: install ESXi within Workstation, make the changes, and then boot into bare metal.

I really need to bring this thread up to date with a bit more info, and seeing as I have a new video card coming tomorrow, I may try to find the time tonight to tweak the final bits. Just to be clear, what I do have now is stable, fast and works.
 
is it blue? :>

It was in fact very blue, and this secretly turned me on a little (shh, don't tell the wife). But the smart man in me stepped in and I settled for a lesser card :( No blue Vega FE for me! I'll just have to settle for a boring black Vega 56.
 
I was thinking on the way home: is anybody in here interested in a TR/Ryzen-ready ISO of 6.5? I have been having a bit of a play and don't think it would be all that much trouble to make one.
 
Even if you don't get many replies outright, I daresay that image would end up being used by quite a few people mate. I'd say go for it (and I don't have either CPU... yet?). Maybe cross post it into the Servers and Enterprise section.

There's always interest in a Ryzen-ready ISO, and with Ryzen gaining popularity it'd be a well-used one. Go for it :)

I think I may put something together. The only real issue I see here is that you would have to run ESXi in community-supported mode, which really isn't what you want for production. I have reached out to a couple of guys at VMware today, as well as a channel partner who has a little clout. I will see what they come back to me with before investing the time. The guy I know heads up the VMware Horizon side of things but with a bit of luck will be able to give me a little insight.

What I do find quite interesting is that VMware claims full support for this device ID.



EDIT: Just had a call back from my guy at VMware, and he is going to do some digging and attempt to give some timelines. Can't ask for more than that really. Now all I have to do is send a detailed email of everything I want to know.
 
Missed getting an email notification, so missed the progress. Vince, was that you posting on the VMware forum with the same issues as here? ;)

Would be interested in the ISO. Mostly interested in IF passthrough works without issues, as with KVM and the NPT issue you have a choice of losing performance in one of two mutually exclusive ways :( For me the whole point of TR is to run multiple VMs, each with a GPU and USB card (most likely, if onboard USB needs to be disabled). I also need to check the IOMMU groupings, as from your posts it looks like they also matter on ESXi, like with KVM??

Great work sharing your experiences!

That was indeed me.

Let me try to answer a few things here, and I'll also try to run a few other tests for you tonight to see where I am at with other things. In terms of USB, the command that I run in ESXi only drops back to a legacy driver (the USB2 driver), so I still have full use of USB in ESXi, just not at USB3 speeds. I have checked and can mount USB devices into the servers without any issue.

Things I'm going to explore this evening:

- GPU passthrough: I now have a couple of GPUs to try this with, so I might try it with both the Vega and the RX 480.
- IOMMU groupings: this is pretty new to me. I had never heard of it, probably because it has never given me any issues in our production environment, but since you mentioned it I have done a little research, and I do mean a little. I haven't had any issues here yet, and in fact I have the feature disabled in my BIOS right now. I suspect I might see some problems when I enable IOMMU and start playing with passthrough. I'll update you later. Tonight I'll install a Windows 7 VM and see what we can do.
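On a Linux host the kernel exposes IOMMU groups under `/sys/kernel/iommu_groups/<n>/devices/`, one entry per PCI address, so checking the groupings takes only a few lines. This Python sketch is not the script referred to in this thread, just an illustration. It returns an empty result when no groups exist, which is what you would expect with IOMMU disabled in the BIOS or kernel.

```python
import os
from typing import Dict, List

IOMMU_ROOT = "/sys/kernel/iommu_groups"


def iommu_groups(root: str = IOMMU_ROOT) -> Dict[int, List[str]]:
    """Map each IOMMU group number to the sorted PCI addresses it contains.

    An empty dict usually means IOMMU is not enabled (BIOS or kernel option).
    """
    groups: Dict[int, List[str]] = {}
    if not os.path.isdir(root):
        return groups
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(devices_dir):
            groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups


if __name__ == "__main__":
    for group, devices in sorted(iommu_groups().items()):
        print(f"IOMMU group {group}: {', '.join(devices)}")
```

Devices that share a group generally have to be passed through together, which is why a GPU landing in its own group matters.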
 
GPU passthrough works well on the Taichi. Last night I used PCI passthrough of my Vega to a Server 2016 machine. I installed the drivers and then had a little issue with the console not being accessible, but I was able to RDP to the machine and confirm that the device was there, which it was. A couple of things to consider: if you pass through your only GPU, ESXi will look like it has crashed on start, because the GPU is handed over for passthrough, which means you lose the ESXi console. I confirmed that all was well with an SSH session into the host as well as by accessing the server's web front end.
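With the only GPU passed through and the console gone, a quick way to confirm the host (or a guest) is alive is to test whether its management ports accept TCP connections. The port list is an assumption based on the usual defaults: SSH on 22, the web client on 443, RDP on 3389 and VNC on 5900. A minimal, standard-library-only Python sketch:

```python
import socket
import sys
from typing import Dict, Iterable

# Usual defaults: SSH, ESXi host client (HTTPS), RDP, VNC.
DEFAULT_PORTS = (22, 443, 3389, 5900)


def reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def check_ports(host: str, ports: Iterable[int] = DEFAULT_PORTS) -> Dict[int, bool]:
    """Report which of the given ports accept connections on `host`."""
    return {port: reachable(host, port) for port in ports}


if __name__ == "__main__":
    # Usage: python check_host.py <host-ip>
    if len(sys.argv) > 1:
        for port, ok in check_ports(sys.argv[1]).items():
            print(f"{port}: {'open' if ok else 'no answer'}")
```

An open port only shows that a service is listening, not that the VM is healthy, but it is enough to tell "hung" apart from "just no console".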

Tonight I'll install a Linux VM and run the script @Methanoid posted on Gigabyte's forum. I can't remember the last time I played with a Linux distro; it must be 10 years or more.

Edit: I think it's important to point out here as well that, in terms of what I do for work, we would never have a requirement for GPU passthrough. In fact, I would guess it wouldn't be a requirement for your average corporate, who would probably not be using it unless they were, for example, rendering or encoding, and then you would most likely be using a commercially supported product anyway.
 
Yeah, but it's always useful to know how each board handles it. Whilst I intend to get the Gigabyte, the ASRock might turn out better for me.

I actually ordered the Gigabyte about a week before Threadripper's release but had problems with supply, so when OcUK had some Taichis in stock I had a look, saw that it had dual Intel NICs on the ESXi support list and decided to take the plunge. I'm going to build a Linux VM tonight and get that info for you; it will be interesting to get your thoughts.
 
Ahh, I forgot to update the other night. I did build a Linux machine, Ubuntu Live, which I got running. I just didn't manage to run the script; I will do that tonight. I built it, then bowed to peer pressure and went for a game of PUBG, after which I went to bed and forgot about it. I have only just got in from work, but I will get back on it later this evening after dinner :)

@Methanoid Question for you mate: you said in the Gigabyte forums post "Make sure you boot with kernel options including "amd_iommu=on" otherwise it wont be enable in OS". I'm a complete Linux noob; can you tell me how to do this?
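For reference, on an installed Ubuntu system (a live session without persistence won't keep changes) kernel options usually go in `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by `sudo update-grub` and a reboot; `/proc/cmdline` then shows what the running kernel was booted with. The Python sketch below only transforms the file's text and checks `/proc/cmdline`. Writing the file back as root is deliberately left out, and the function names are hypothetical.

```python
import re

PARAM = "amd_iommu=on"
KEY = "GRUB_CMDLINE_LINUX_DEFAULT"


def add_kernel_param(grub_text: str, param: str = PARAM) -> str:
    """Return /etc/default/grub text with `param` added to GRUB_CMDLINE_LINUX_DEFAULT."""
    pattern = re.compile(rf'^({KEY}=")([^"]*)(")', re.MULTILINE)

    def _append(match) -> str:
        options = match.group(2).split()
        if param not in options:  # keep the edit idempotent
            options.append(param)
        return f'{match.group(1)}{" ".join(options)}{match.group(3)}'

    new_text, count = pattern.subn(_append, grub_text, count=1)
    if count == 0:
        # No existing line: append one.
        new_text = grub_text.rstrip("\n") + f'\n{KEY}="{param}"\n'
    return new_text


def param_active(cmdline_path: str = "/proc/cmdline", param: str = PARAM) -> bool:
    """After update-grub and a reboot, check the running kernel saw the option."""
    with open(cmdline_path) as f:
        return param in f.read().split()
```

For example, `GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"` becomes `GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=on"`, and running it twice changes nothing further.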
 
Hello there! Thanks for sharing your build. In fact, I have a similar ESXi build to yours planned, with an AMD 1900X and the same ASRock X399 Taichi. If you don't mind me asking, perhaps you could write a simple summary of the major issues you had and the fixes for them? Did you manage to get SATA working in the end? I don't have a spare NAS to work with, so I'd consider it pretty high on my "required to be working" list, along with PCIe (GPU and other PCIe peripherals) and USB passthrough, which I see you have managed to get running.

Hey buddy, I can indeed. The build at this point in time isn't a straightforward plug-and-play job yet, though I expect it will become one with ESXi updates. There are certainly issues, as you point out, some of which I am yet to find a fix for. Here is a brief summary of the issues in the order in which you are likely to hit them (the caveat being that I used the following install file: VMware-VMvisor-Installer-6.5.0.update01-5969303.x86_64.iso. The rollup ISO cannot at this point be used to install ESXi on Threadripper, as its rollback scripts roll back to the driver used in the first ISO, which as we know doesn't work):

1) During installation the installer hangs on vmw_ahci. To get past this I rolled back the AHCI drivers. With the rolled-back drivers I can see the AHCI controllers and even select them for passthrough, but I cannot see the disks in ESXi to use as datastores. M.2 disks do not have this issue and are working a treat in ESXi. Be warned that onboard SATA RAID is probably not an option either unless it is supported by ESXi. I have yet to test this, but I can certainly build a quick array from spare 1TB drives destined for the Members Market to test it.

2) The installer hangs on xhci_xhc during installation. I suspect this is because of the USB 3.1 Type-C support on the chipset. Again I rolled back the drivers, and I am currently running USB at USB2 speeds.

3) GPU and PCI passthrough: all testing indicates that this just works (well, kind of) with AMD GPUs, and I see no reason why the same would not apply to other PCI devices. I have not tested this on NVIDIA GPUs but could if required; I see no reason why they would be any different. Some things to be aware of here: passing through your only GPU will make the ESXi loader look like it has crashed on boot, and it means no direct console to that host. It hasn't crashed! If you look at where it appears to have crashed, you will notice that it stops just as the GPU is passed through. Luckily, by this point I had enabled remote management and SSH on my host, so I opened up a shell and confirmed that the host was in fact up and loaded. Next I used the web client, which BTW I absolutely hate compared to the old 5.5 vSphere client (which still works up to v6.0), logged into the host over the web, and everything was golden. I fired up the VM with the GPU passed through and tried the web console, which wouldn't work for the VM. Again thinking something was up, and having already enabled RDP, I was able to log onto the machine that way and check that all the hardware was installed, which it was. At this point I needed to get a bit creative, as many GPU tasks simply don't work over RDP, so I installed VNC, connected that way and installed the AMD drivers. A bit of a faff, but working. It looks to me like the VMware virtual adapter, which I believe is required for the console, hands its duties over to the new adapter. I am convinced I can fix this but haven't done more testing simply because I haven't had the time.

To summarise, so far everything I have works, save using SATA devices as datastores, which is a pretty big deal and probably a deal breaker for most. Because I had the NAS and a schedule with something to finish, I haven't invested the time to properly fix this, but I think that I can, and at this point I have a couple of methods to try. Firstly, while poking around in the BIOS a few days ago, I noticed an option to run the SATA as a different device ID, which I found very interesting (I'll dig more into this later). Secondly, I have some contacts at VMware who I promised to send info to but haven't yet, as BT have been consuming my days for all the wrong reasons. I do think, though, that I will be able to resolve this issue.

Hopefully some stuff in here will help. I mentioned a Threadripper ISO previously, which is still not off the cards. I also know that all of these issues are 100% correctable, but it might involve some deep dives into incorporating device drivers or community-supported VIB files into the build. In reality that is fine for a home lab, but you would have to be slightly mental, or perhaps slightly sadistic, to run it in production.

I do intend to finish the Linux piece for Methanoid and will dig deeper into the SATA issue, but work and life have been getting in the way and I don't have as much spare time as I would like for these things. I do intend to do a few more updates, though, including some on the Forti, NAS and UPS, and I will of course keep updating this with progress when I get around to trying to resolve other issues.
 
Ah, of course a totally unreasonable request, but have you tried, or could you try, Xen Hypervisor too? It seems the trinity is ESXi/KVM/Xen. We know ESXi has some issues but kind of works, and KVM is a no (until they fix the AMD NPT issue), but I am hearing Xen may be fine....

It's OK, I will give it a shot. I messed about for ages again last night, only to find that the new updated ISO is exactly the same as the one I tried last time. Same hashes and everything... Probably should have checked that properly before tearing it all down :)
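Comparing checksums of a new download against the previous one before reinstalling catches exactly this. A minimal Python sketch using `hashlib` is below; it reads the ISO in chunks so a multi-gigabyte image never sits fully in memory, and the script name and file names in the usage line are placeholders.

```python
import hashlib
import sys


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in `chunk_size` pieces."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_image(path_a: str, path_b: str) -> bool:
    """True when both files have identical contents (matching SHA-256)."""
    return file_sha256(path_a) == file_sha256(path_b)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        print("identical" if same_image(sys.argv[1], sys.argv[2]) else "different")
    else:
        print("usage: compare_iso.py OLD.iso NEW.iso")
```

If the vendor publishes a checksum for the download, comparing `file_sha256` against that value works the same way without keeping the old ISO around.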
 
Thanks for the details, Vince. Hope you can figure out how to make the SATA ports work!

Here's what I've gathered about ESXi X399 board compatibility thus far, hopefully it can help with your troubleshooting:
1) There appears to be similar installer PSOD issues with Gigabyte X399 AORUS Gaming 7 ( https://communities.vmware.com/thread/570854 ).
2) Youtube has a video showing the ASUS ROG Zenith Extreme X399 to be fully functional with ESXi without the need for any hacks ( https://www.youtube.com/watch?v=owQq2XmHiQg ).
3) Reddit contains a post with instructions for enabling GPU-passthrough on the ASUS Prime X399-A so I presume it should be functional too ( https://www.reddit.com/r/Amd/comments/72ula0/tr1950x_gtx_1060_passthrough_with_esxi ).

Cheers for this. I have been playing about with this tonight and actually just commented on that video before coming back here and checking the thread. Unfortunately that video doesn't tell us much at this point, as the SATA controller on X399, as far as I can tell, is exactly the same on every board, so unless there is something glaringly obvious that I am missing, I suspect that many of the same workarounds have probably been applied. I am just about to have another attempt with the latest ISO from the VMware site, but I don't think the results will be any different, as all the hashes are the same as the version I downloaded at the beginning of the project. I can run everything NVMe, NFS, GPUs etc., but with the SATA nobody is really giving much away. That video, for example, shows us nothing at all, and if you look over to the left towards the beginning, it isn't showing any local storage datastores. I have left links to this thread and my one over at the VMware community forums in the comments, so with a bit of luck we might get some info.

I'll update with my findings after giving this new install a bash.
 
From the comments on the video, it appears that SATA works on the ROG board.

Looks like I might have to buy a ROG and test it then. I really didn't want to have to change the board, but if it 100% works on the ROG, then that is what I will do.
 
Right, I have watched that video over and over again, and I have some issues with dropping £500 on a board before I can clear up a few things. The guy says he has VMs running, but his video suggests otherwise. If you look at the first 20 seconds of the video, you will see his install at the same screen as this one from my install.



The grey squares on the left show 0 VMs, 0 available storage and 1 in the VM network area. I am not saying that I don't believe the guy, but dropping £500 on the strength of that video alone is not something I am willing to do without some confirmation that he does actually have available SATA storage. If SATA storage were available, the disks should appear in this list along with the NVMe drive; if you have 0 under storage there, as in that video, then storage is not exposed to ESXi.

I also don't see any datastores on his host.



At this point, unless somebody tells me otherwise with some decent proof (a video of the install would do the job), I am assuming that anybody running ESXi on Threadripper is either on SAN, NAS or M.2 storage and using the workarounds I have published in here and in the VMware community forums.

To update on my situation, I have tried all the images available on the VMware site again, and they are all the same as they were before: same issues, same fixes. I've spent another few hours messing around and have not been able to improve the SATA situation. I am about to try one more thing before I call it a night. I can get along like this for the moment, but I would really benefit from having the SATA available. I could have a look at running RAID and see what happens, but I suspect we won't see much in the way of improvement. It could be an option, though, if VMware can see a single disk or more in RAID 0. I will give that a try, but I will need to shift around some data, so perhaps it's a job for tomorrow.

I will read the manual for the ROG, as it's actually got 2 fewer SATA ports than the Taichi. My bet is that they are the same AMD 0x7901 device used on the Taichi. All the boards seem to be exactly the same, probably because the SATA is provided by the X399 chipset.
 