Home Lab Threadripper Build Thread.

Soldato
Joined
17 Nov 2007
Posts
3,161
When people were trying Ryzen R7 with ESXi it was also failing, not sure on the exact error but disabling SMT resolved the issue and allowed it to boot.

Obviously disabling SMT is not ideal but worth checking.

On the AMD slides during the build-up and release, didn't they have VMware listed as a partner? Maybe the official patches will be released with EPYC.
 
Man of Honour
OP
Joined
30 Oct 2003
Posts
13,229
Location
Essex
The SMT issue with Ryzen 7 was related to an apparent 15-core hard limit in ESXi, which I think was patched about a month ago. Disabling SMT kept the core count down, allowing things to boot.

I did try disabling SMT last night, with little luck. I do, however, wonder what happens if I disable a CCX, essentially turning it into an 1800X. I'll give that a try as well. Life would be much easier if it threw an error, but instead it just sits there. I do wonder if it could be something altogether simpler, like ESXi trying to pass through VGA, in turn crashing what I can see but perhaps still working behind the scenes. I should be able to verify whether it's actually crashing by seeing if I can ping it or beat it into submission in the console this afternoon.
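For anyone wanting to script the "is it actually alive behind a dead screen" check, here is a minimal sketch. It swaps the ping for a plain TCP connect, since ICMP needs raw sockets or the system ping command. The function name and the default port 443 are assumptions for illustration; whether a half-booted installer listens on any port at all is also an assumption, so a False result proves little on its own.

```python
import socket


def host_is_reachable(host: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This is a TCP connect test, not an ICMP ping. Port 443 is only an
    assumed default; pick whatever service the host should be running.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

Calling something like `host_is_reachable("192.168.1.50", 22)` (a placeholder address) from another machine on the LAN would then give a quick yes/no without needing the local console.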
 
Soldato
Joined
17 Nov 2007
Posts
3,161
I had similar issues when building an ESXi lab on some HP Pro desktops. I tried multiple versions, but in the end got 6.5 working with a BIOS update and some boot parameters.

Also check that you don't have anything related to boot-level security enabled in the BIOS, and maybe look for legacy BIOS support as well. From memory both of those were needed, something to do with bypassing UEFI. It was a while back now, so I can't be more specific, sadly.
 
Man of Honour
OP
Joined
30 Oct 2003
Posts
13,229
Location
Essex
Thanks, anything to go on is better than nothing, so I'll see what I can find in the BIOS and I will of course update in here as I go with what I have tried etc. I'm at home now, so I'm gonna watch GoT, then have a poke around with ESXi and see where I get. I also brought my NAS home, so when/if I get ESXi up and running I'll do a little write-up on that and the firewall. :)

Edit: I was just thinking that in my production environment I am still running 5.5 because of some other issues that littered version 6 alongside the SAN we run, so I might even give that a blast just to see if it stalls at the same point.
 
Man of Honour
OP
Joined
30 Oct 2003
Posts
13,229
Location
Essex
Finally I got ESXi 6.5 working. The Threadripper ESXi host lives :) It's been a nightmare, and I spent nights trawling through the BIOS, not really sure what combination of enabling and disabling stuff might get it going, but nothing seemed to work. So tonight I sat down and decided to take a different approach. I loaded my ESXi 6.5 media onto a USB stick, then in Workstation I created a new ESXi VM and booted it from the USB.

In Workstation it built slowly but without a problem. Knowing the installer hangs at or just after loading vmw_ahci, I set about seeing what was next in the boot order. I was trawling through boot.cfg and found this:



A list of the modules loaded, in what appeared to be the load order. Next up was xhci_xhc, or USB3. I had already disabled all the USB3 on the Taichi, but apparently that didn't matter. Knowing that this is where it crashes on the bare metal, I went about disabling the new versions of the drivers 6.5 uses:



Once I booted back into the bare metal it was game on:



Now we are at a good point, I'll work on exposing storage and getting some VMs built. I might even have a fiddle with passing graphics etc. through, just because I have never tried before. I started rebuilding the NAS last night and it's really not a bad little unit for what it is. It doesn't hold a candle to the QNAP 8-bay I've been using, but it's certainly feature-rich enough, supports LAN teaming and has the right stuff to be exposed directly to ESXi. For now I've packed it full of 1TB drives and left it building a RAID 5 array. I'll take some screenshots and pics of it tonight to share with you good folks.
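The "what loads next after the module it hangs on" lookup described earlier in this post can be sketched in a few lines. This assumes the usual boot.cfg layout where a single `modules=` line lists entries separated by `---`. The helper names and the sample text are made up for illustration; the sample is not the real file from this build, which lists far more modules.

```python
from typing import List, Optional


def parse_modules(boot_cfg_text: str) -> List[str]:
    """Return module file names from the modules= line, in listed order."""
    for line in boot_cfg_text.splitlines():
        if line.startswith("modules="):
            entries = line[len("modules="):].split("---")
            return [e.strip().lstrip("/") for e in entries if e.strip()]
    return []


def module_after(modules: List[str], prefix: str) -> Optional[str]:
    """Return the entry listed right after the first one starting with prefix."""
    for i, name in enumerate(modules):
        if name.startswith(prefix):
            return modules[i + 1] if i + 1 < len(modules) else None
    return None


# Hypothetical excerpt, shortened for illustration only.
SAMPLE = """bootstate=0
kernel=/b.b00
modules=/jumpstrt.gz --- /useropts.gz --- /vmw_ahci.v00 --- /xhci_xhc.v00 --- /s.v00
"""

mods = parse_modules(SAMPLE)
print(module_after(mods, "vmw_ahci"))
```

With the made-up sample above, the lookup returns the xhci_xhc entry, which mirrors the reasoning in the post: the last thing seen on screen is vmw_ahci, so the next entry in the list is the first suspect.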
 
Man of Honour
OP
Joined
30 Oct 2003
Posts
13,229
Location
Essex
Great work, ESXi on Threadripper is very nice indeed.

It just goes to prove that if you're prepared to tinker, anything is possible :)

I do think I may have a little problem: I keep looking at the Vega Frontier Edition and have been talking myself out of pressing the go button for days. Perhaps Vega 56 will fill the gap, but that blue card does look nice... and it's blue, so surely that makes up for the sky-high pricing?

Edit: Did I mention it is blue?
 
Associate
Joined
3 Dec 2005
Posts
986
Location
UK
There is a thread on the Unraid forums discussing this issue in depth; GPU passthrough is still an issue. I have my rig in the shopping basket with OcUK but need to have this feature working before I pull the pin.

https://forums.lime-technology.com/topic/55150-anybody-planning-a-ryzen-build/?page=26
 
Man of Honour
OP
Joined
30 Oct 2003
Posts
13,229
Location
Essex
I think I'm over the guts of the issues. I can now do pretty much all the stuff I need to: create VMs, distribute resources etc. However, I still don't have access to all my core storage. ESXi sees the controllers, NVMe drives and USB drives, but nothing more. It doesn't see any SATA hard disks. Right now I'm exposing an NFS share from a NAS, but I would like more access to local drives. I am pretty sure I can sort this, though, as the AHCI controllers are mapped to the driver via hardware IDs in a map file. It's documented by a fella called Andreas Peetz, if you search for making an unsupported AHCI driver work in ESXi.

Still not fully played with GPU passthrough, as I've had quite a lot of ESXi issues up to this point, if I am honest. Where I am at now, though, I have a bunch of servers up and running, and can happily switch between ESXi running inside VMware Workstation on Windows and a full bare-metal hypervisor just by restarting, so that's all good. I guess really it's just working, and I have as much CPU resource as I need and more to run my labs. I keep meaning to spend a few hours attempting to get those AHCI drivers into the build as well.

The firewall is also still not set up, and because of that I'm not quite getting the throughput I would like from the NAS, but for now that is OK.
 
Associate
Joined
3 Dec 2005
Posts
986
Location
UK
Thanks for the detailed reply.

I need something that will work with VMs and give me no issues.

Do you see that as possible with Ryzen or TR?
 
Man of Honour
OP
Joined
30 Oct 2003
Posts
13,229
Location
Essex
No issues is going to be a bit of a stretch, as it appears to me that there are some unsupported devices you need to deal with, primarily USB 3.1 and the AHCI driver for SATA disks. If you buy the right board (the Taichi, for example), it has supported NICs, and two of them for good measure, so you should be good there. If you can live with USB2 compatibility and are willing to do some messing about to get AHCI working, then you're golden. To get it up and running you will need to use the 6.5U1 ISO (not the rollup ISO), as the rollup ISO incorporates another AHCI driver, and the rollback leaves you with yet another non-working AHCI driver rather than the legacy one.

I am now at a point where, for me, everything (bar local SATA) seems to work, so this has been somewhat of a success. I appreciate that not everybody will have a NAS capable of NFS which can be used as an ESXi datastore, so really I need to step this up to the next level and get the unsupported SATA controllers working. I imagine this should help people who are on the fence about such a build. I guess because I know it can be done I haven't really spent the time to do it in order to prove what I think. Over the next few nights, when I can find a tad more time, I will set about getting the controllers running. These are the AMD SATA controller IDs:

0000:09:00.2 SATA controller Mass storage controller: Advanced Micro Devices Inc AMD FCH SATA Controller [AHCI Mode] [vmhba1]
Class 0106: 1022:7901
--
0000:44:00.2 SATA controller Mass storage controller: Advanced Micro Devices Inc AMD FCH SATA Controller [AHCI Mode] [vmhba2]
Class 0106: 1022:7901

Armed with this info, I need to modify the map files to incorporate these devices, which in itself shouldn't be much of an issue. What I guess I am trying to say is that, depending on your storage subsystem and requirements, and forgetting GPU passthrough for now, IMO Ryzen and TR are viable, certainly in a home lab environment at least. In production I would be slightly more apprehensive, given the lack of support and info out there.
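As a rough illustration of the first step, pulling the vendor:device pairs out of a listing like the one above could look like the sketch below. The function name is made up, and it only extracts and de-duplicates the IDs; it does not write a map file entry, since the exact map file syntax is not shown in this thread.

```python
import re
from typing import List, Tuple

# Matches lines such as "Class 0106: 1022:7901" (class, vendor, device).
ID_LINE = re.compile(
    r"Class\s+([0-9a-fA-F]{4}):\s+([0-9a-fA-F]{4}):([0-9a-fA-F]{4})"
)


def sata_controller_ids(listing: str) -> List[Tuple[str, str]]:
    """Return unique (vendor, device) pairs for PCI class 0106 (SATA) devices."""
    seen: List[Tuple[str, str]] = []
    for match in ID_LINE.finditer(listing):
        pci_class, vendor, device = (g.lower() for g in match.groups())
        if pci_class == "0106" and (vendor, device) not in seen:
            seen.append((vendor, device))
    return seen


# The two controllers quoted in the post above.
LISTING = """\
0000:09:00.2 SATA controller Mass storage controller: Advanced Micro Devices Inc AMD FCH SATA Controller [AHCI Mode] [vmhba1]
Class 0106: 1022:7901
--
0000:44:00.2 SATA controller Mass storage controller: Advanced Micro Devices Inc AMD FCH SATA Controller [AHCI Mode] [vmhba2]
Class 0106: 1022:7901
"""

print(sata_controller_ids(LISTING))
```

Both controllers report the same ID, so a single 1022:7901 entry should cover the pair.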

Remember, if you do build, following the way I have done it here will save you a significant amount of time: install ESXi within Workstation, make the changes, and then boot into the bare metal.

I really need to bring this thread up to date with a bit more info, and seeing as I have a new video card coming tomorrow I may try to find the time tonight to tweak the final bits. Just to be clear, what I do have now is stable, fast and works.
 
Man of Honour
OP
Joined
30 Oct 2003
Posts
13,229
Location
Essex
is it blue? :>

It was in fact very blue, and this secretly turned me on a little (shh, don't tell the wife). But the smart man in me stepped in and I settled for a lesser card :( No blue Vega FE for me! I'll just have to settle for a boring black Vega 56.
 
Man of Honour
OP
Joined
30 Oct 2003
Posts
13,229
Location
Essex
I am hoping VMware roll out a TR / EPYC compatible ESXi release.

It wasn't that long ago I was at a VMware/AMD event, so I should really have a look over my VMware contacts and see if I can't get some info. I'll try and make some calls tomorrow.
 
Man of Honour
OP
Joined
30 Oct 2003
Posts
13,229
Location
Essex
I was thinking on the way home, anybody in here interested in a TR/Ryzen ready ISO of 6.5? I have been having a bit of a play and don't think it would be all that much trouble to make one.
 
Soldato
Joined
18 Aug 2007
Posts
9,689
Location
Liverpool
Even if you don't get many replies outright, I daresay that image would end up being used by quite a few people mate. I'd say go for it (and I don't have either CPU... yet?). Maybe cross post it into the Servers and Enterprise section.
 
Man of Honour
OP
Joined
30 Oct 2003
Posts
13,229
Location
Essex
Always interest in a Ryzen ready ISO, and with Ryzen gaining popularity it'd be a well used one. Go for it :)

I think I may put something together. The only real issue I see here is that you would have to run ESXi in community-supported mode, which really isn't what you want for production. I have reached out to a couple of guys at VMware today, as well as a channel partner who has a little clout. I will see what they come back to me with before investing the time. The guy I know heads up VMware's Horizon side of things, but with a bit of luck he will be able to give me a little insight.

What I do find quite interesting is that VMware claim full support for this device ID:



EDIT: Just had a call back from my guy at VMware and he is going to do some digging and attempt to give some timelines. Can't ask for more than that really. Now all I have to do is write a detailed email of everything I want to know.
 
Soldato
Joined
18 Oct 2002
Posts
3,027
Location
Pentonville Prison
Missed getting an email notification so missed the progress. Vince was that you posting on VMware forum with the same issues as here? ;)

Would be interested in the ISO. Mostly interested in whether passthrough works without issues, as with KVM and the NPT issue you have a choice of losing performance in one of two mutually exclusive ways :( For me the whole point of TR is to run multiple VMs, each with a GPU and a USB card (most likely, if onboard USB needs to be disabled). I also need to check the IOMMU groupings, as from your posts it looks like they matter on ESXi too, like with KVM?
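For the KVM side of that comparison, the IOMMU groupings can be read straight from sysfs on a Linux host. The sketch below is Linux only and says nothing about how ESXi groups devices; the function name is made up, and it assumes the standard /sys/kernel/iommu_groups layout with one numbered directory per group.

```python
from pathlib import Path
from typing import Dict, List


def iommu_groups(root: str = "/sys/kernel/iommu_groups") -> Dict[int, List[str]]:
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups: Dict[int, List[str]] = {}
    base = Path(root)
    if not base.is_dir():
        # No such directory: IOMMU is disabled or this is not a Linux host.
        return groups
    for group_dir in base.iterdir():
        devices = group_dir / "devices"
        if group_dir.name.isdigit() and devices.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices.iterdir())
    return groups
```

A GPU or USB card that shares a group with other devices usually has to be passed through together with them, so printing `iommu_groups()` before buying add-in cards for a particular slot layout is a cheap check.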

Great work sharing your experiences!
 