Home Lab Threadripper Build Thread.

So, as a bit of a test, I have just bought myself a PCIe card with a couple of SATA ports based on the ASMedia 1061 controller, which I know should work. I may also buy the Prime and have a look, but I am undecided at this point.
 
Another update. Last night I wiped a disk and changed the SATA mode over to RAID, running the disk solo in a RAID 0 array. Didn't work!

I looked further into the map files (/etc/vmware/ahci.map), and the AMD AHCI device ID is listed there, showing support for the device. However, because I can't get past the loading of the driver, I don't hold out much hope of seeing the hardware attached to the controllers. The controllers and disks can still be passed to VMs via passthrough, as ESXi doesn't need its own drivers for that. Something tells me the problem is more closely tied to chipset support than to CPU support.

As far as I can see, unless you can somehow force vmw_ahci to load without dropping the driver back to legacy, it's not going to work. Do we know if anybody on these here forums is rocking an Asus board and has a spare 2GB-ish USB stick? Somebody who has some time on their hands and is willing to spare an hour or two in the name of science?

If you're reading this and you also have an Asus X399 board and a couple of hours to spare, trust me, I will make it worth your while, perhaps with some free hardware or something.

I will also update my comments on that video asking the fella to drop in here and perhaps join in the conversation.

Also does anybody know if there is an Asus rep on these forums?
 
So that fella replied to my comments on YouTube, and I have asked him to look at this thread for the things the community would love confirmation on.

@Art OF WAR

We really need to see what the ROG is presenting to VMware in terms of storage. The following screenshots from your build would help massively:









Edit: It would also probably be better if these images were at sensible resolutions, unlike mine :)
 
Before you put down money on a new motherboard, may I also check whether you have updated the BIOS on your current ASRock Taichi (latest BIOS version 1.70, released on 2 Oct 2017: http://www.asrock.com/mb/AMD/X399 Taichi/index.asp#BIOS )?

AMD has been working closely with the motherboard manufacturers, and BIOS updates fixing compatibility issues and bugs are released regularly. Perhaps you should at least update the board to the latest BIOS and see if things work better (hopefully the vanilla installer will work too after the update?), as I know some X370 boards also had problems with virtualization until their BIOS was updated.

In addition, I've spotted an MSI X399 Gaming Pro Carbon AC ( https://qiita.com/strat/items/f741774d129206002cfc , Google Translate link: https://translate.google.com.sg/tra...://qiita.com/strat/items/f741774d129206002cfc ) that seems to be working after injecting the xahci drivers using ESXi-Customizer-PS. Perhaps you might want to give driver injection a go and see if things work?
(Also note that after searching further, I found a thread reporting that the MSI board seems to PSOD under load: https://forum-en.msi.com/index.php?topic=293170.0 . I also found another page, whose link I've since lost, where the MSI board was reported to install and run fine but PSOD approximately every 10-15 minutes.)

Money on a new board is of course the last route I want to go down. I like the features of the Taichi, which is why I picked it, but if money fixes the problem then happy days. In terms of the BIOS, I tried an install last night on the latest BIOS, and I have tried every BIOS the board has available (all 3 of them). What I don't get is that the drivers are actually included in the VMware images, as shown by running this command over SSH:

Code:
lspci -v | grep "Class 0106" -B 1

Which outputs the following:

Code:
0000:09:00.2 SATA controller Mass storage controller: Advanced Micro Devices Inc AMD FCH SATA Controller [AHCI Mode] [vmhba1]
         Class 0106: 1022:7901
--
0000:44:00.2 SATA controller Mass storage controller: Advanced Micro Devices Inc AMD FCH SATA Controller [AHCI Mode] [vmhba2]
         Class 0106: 1022:7901

The square brackets (e.g. [vmhba1]) indicate that the device is already in the support list and consequently also in the AHCI device map file.
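If you want to run the same check against a saved copy of that output (for example, to compare two boards side by side), the lspci text can be boiled down with a small shell function. This is a minimal sketch that assumes the layout shown above: a header line starting with the PCI address, followed by an indented `Class 0106: vendor:device` line. The function name `list_sata_controllers` is hypothetical.

```shell
#!/bin/sh
# Summarise SATA/AHCI controllers (PCI class 0106) from `lspci -v` output on stdin.
# Prints: <PCI address> <vendor:device> <vmhbaN, or "unbound" if no adapter name>
list_sata_controllers() {
    awk '
        # A device header line starts with a PCI address such as 0000:09:00.2.
        /^[0-9a-fA-F]+:[0-9a-fA-F]+:[0-9a-fA-F]+\.[0-9a-fA-F]/ {
            header = $0
            addr = $1
            next
        }
        # The indented "Class 0106: vvvv:dddd" line belongs to the header above it.
        /Class 0106:/ {
            hba = "unbound"
            if (match(header, /\[vmhba[0-9]+\]/))
                hba = substr(header, RSTART + 1, RLENGTH - 2)
            print addr, $NF, hba
        }
    '
}
```

On the host, `lspci -v | list_sata_controllers` should print one line per controller with its address, vendor:device ID and adapter name, which makes it easy to diff the result from two boards.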



I have had a good read of what there is on that Japanese site, and it's vague to say the least. I have access to ESXi-Customizer, so I guess that is next on the hit list. What concerns me is that VMware claims full native support for the device ID. With that post being so vague, it's difficult to determine what the person actually did. I will have a SATA card tomorrow, and that should allow me to show what should happen in ESXi when a standard SATA disk is exposed. In the meantime I will look further into what driver I might be able to inject into the installer to make it work. My understanding, though, and this could well be flawed, is that pretty much all we are doing when injecting drivers is modifying the ahci.map file and the corresponding map files.
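To check which map files on a host actually mention a given vendor:device ID (1022:7901 here), a case-insensitive grep is usually enough. This is a rough sketch: the helper name `find_pci_id_in_maps` is hypothetical, and it assumes the IDs are written in the map files as `vvvv:dddd` or `vvvv dddd`, which may not hold for every ESXi release's map format, so a miss means "not found by this pattern" rather than "unsupported".

```shell
#!/bin/sh
# Print the names of map files that mention a PCI vendor:device ID.
# Assumption: IDs appear as "vvvv:dddd" or "vvvv dddd" inside the map files.
# Usage: find_pci_id_in_maps VENDOR DEVICE FILE...
find_pci_id_in_maps() {
    vendor=$1
    device=$2
    shift 2
    # -i: hex digits may be in either case; -l: list matching files only;
    # -E: extended regex for the separator. Unreadable files are ignored.
    grep -ilE "${vendor}[ :]${device}" "$@" 2>/dev/null
}
```

For example, `find_pci_id_in_maps 1022 7901 /etc/vmware/ahci.map` checks the file mentioned earlier in the thread; add any other .map files on the host to the argument list to see whether an injected package has added its own entry.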

There is also a small chance that somebody from VMware might step in to shed some light for us. I reached out again today to somebody I know at VMware and pointed him in the direction of this thread and my one on the VMware community forums. Not being a tech himself, he has assured me he will pass it on to one of the techs there, who hopefully can find some time to tell us what they think the issue might be.
 
From my experience patching my old systems, which all require driver injection for SATA to function, I believe the xahci driver package mentioned is the sata-xahci package available here: https://www.v-front.de/2013/11/how-to-make-your-unsupported-sata-ahci.html

P.S. I suspect the SATA issue may be similar to the Intel i211 and i350 NICs being "detected" but not initialized, as documented here: https://www.v-front.de/2015/08/a-fix-for-intel-i211-and-i350-adapters.html
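For reference, the V-Front page also describes installing sata-xahci directly on a running host from their online depot, rather than building a custom ISO. The sketch below wraps those steps in a function with a dry-run switch so they can be previewed first; the `DRY_RUN` variable and the function names are my own additions, and the depot URL is the one V-Front published, so check the linked page in case it has moved. The host needs a reboot afterwards for the new driver mapping to take effect.

```shell
#!/bin/sh
# Steps from the V-Front sata-xahci page, wrapped so they can be previewed.
# DRY_RUN=1 prints each command instead of executing it (not part of V-Front's instructions).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

install_sata_xahci() {
    # Community-supported VIBs are refused at the host's default acceptance level.
    run esxcli software acceptance set --level=CommunitySupported || return 1
    # Open outbound HTTP so the host can reach the online depot.
    run esxcli network firewall ruleset set -e true -r httpClient || return 1
    # Install the package by name from the V-Front depot; reboot afterwards.
    run esxcli software vib install -d http://vibsdepot.v-front.de -n sata-xahci
}
```

Running `DRY_RUN=1 install_sata_xahci` lists the three commands without touching the host; dropping the variable runs them for real.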

That community-supported VIB at the top there nukes my build. I have tried it a few times without success... I guess one more try can't hurt.
 
Just a random thought: have you tried installing to a SATA disk? I once had an install (Intel NUC DC3217BY) where, if the install was done on USB, it would boot just fine, but on SATA the NUC would just PSOD (tested with both ESXi 5.5 and 6.0). I'm wondering if the opposite would happen here (sort of a "force-enable" for the ESXi drivers: since it detects the controller but doesn't initialize it, maybe booting from it would force it to initialize?)


I completely understand your standpoint. Just like you, I'm hesitant to put down lots of money on a Threadripper build only for it not to be fully functional. Plus, the ASUS ROG Zenith Extreme X399 would essentially mean putting down quite a sizeable sum for fewer "ESXi-usable" features (fewer SATA ports, and paying a ton extra for "useless gaming features" such as an ESXi-unsupported 10G NIC, LED customization, integrated 802.11ad wireless, etc.).

That's really my point: when it comes down to ESXi, of all the boards the Taichi is the one with the most going for it. If we can get over this SATA issue, then we have the ultimate home lab. Without a custom build, ESXi won't even get to the point of allowing an install on this board, unless you get creative with the install process as I did or build custom installers. Speaking of which, I have just injected those drivers into some install media, so it's time to take it back to bare metal for a test.

Well, that community-supported VIB now boots. Still no drives.
 

Just to add: I refuse to look at the MSI board as an option. To be honest, it is unlikely I would ever buy one of their products again; everything MSI I have ever owned has been sub-par. Interesting that they have it working, though; we just need much more info, a list of what's working on what hardware.

I am pretty sure I can't really afford to swap boards over and over again to find out what works and what doesn't. I just don't have the time or patience for that. I mean, I would, but buying and selling boards and losing a load of cash each time to find out isn't that appealing.
 
More updates: I am determined to work out what is actually happening here and just where this is failing. If I can work that out, I may stand a chance of getting it working. I even spent a few hours digging through the vmkernel logs looking for anything that might point me in the direction of what's going wrong.

Firstly, I just want to point out something about the ASRock board: there is an option in the BIOS, shown below:



Now that looks promising, right? The ability to change the device ID the controller runs under. The only problem is that no matter what you set on this screen, none of the options stick across a reboot, which of course you have to do when you save and exit. I've reset the CMOS, loaded UEFI defaults, everything, but if you go back into this screen everything is back at Auto. Perhaps I am misunderstanding what these options do, but it was worth a go, right?

From the kernel logs I don't see any failures after disabling the native AHCI driver with the command:

Code:
 esxcli system module set --enabled=false --module=vmw_ahci
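Since the change is picked up on the next boot, it's worth confirming afterwards which state the host actually ended up in. A minimal sketch, assuming `esxcli system module list` prints a two-line header followed by `Name`, `Is Loaded` and `Is Enabled` columns; the helper name `module_state` is hypothetical.

```shell
#!/bin/sh
# Report the "Is Loaded" / "Is Enabled" flags for one module, reading
# `esxcli system module list` output on stdin. Returns 1 if the module is not listed.
module_state() {
    awk -v mod="$1" '
        $1 == mod { print "loaded=" $2, "enabled=" $3; found = 1 }
        END { if (!found) { print "not listed"; exit 1 } }
    '
}
```

After the reboot, `esxcli system module list | module_state vmw_ahci` shows whether the native driver really stayed disabled (or, after re-enabling it, whether it loaded).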

Weirdly, everything seems to load just fine. Here are a couple of dumps from the vmkernel logs:



At the top there you can see it detecting the controller.



Then you can see it grabbing the controller.

But what you never see anywhere in the logs, and what I am pretty sure you should see, is devices being registered. Below you can see the NVMe drive doing its thing for comparison. I looked and looked before giving up; nothing anywhere in the logs suggests any failures:



Next, I booted up again and re-enabled the native AHCI driver:

Code:
 esxcli system module set --enabled=true --module=vmw_ahci

This effectively forces it to hang so I could probe the logs. Back in Windows I loaded it up in Workstation again, thinking I was being clever, but literally couldn't find anything. I need to do a little reading, as I haven't really needed to dig into VMware logs this much before. I can't be sure whether the vmkernel logs roll over to another log file on each boot or hold a certain amount of time. Either way, I can't find any failures in the logs relating to AHCI, even when it hangs.
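On whether the logs survive a reboot: /var/log/vmkernel.log is rotated, and on a USB-stick install without a persistent scratch location the logs typically live in a ramdisk, so earlier boots may simply be gone. When older copies do exist they are usually gzip-compressed, and their names vary. The sketch below (helper name `grep_ahci_logs` is hypothetical) greps a list of log files, plain or .gz, for AHCI/SATA/vmhba lines and prefixes each hit with its file name. The search pattern is an assumption; widen it if nothing turns up.

```shell
#!/bin/sh
# Grep vmkernel log files (plain or gzip-compressed rotations) for lines that
# mention AHCI, SATA or a vmhba adapter, prefixing each hit with its file name.
grep_ahci_logs() {
    for f in "$@"; do
        [ -f "$f" ] || continue          # skip names that don't exist on this host
        case "$f" in
            *.gz) gunzip -c "$f" ;;      # stream compressed rotations without unpacking
            *)    cat "$f" ;;
        esac | grep -iE 'ahci|sata|vmhba[0-9]+' | sed "s|^|$f: |"
    done
}
```

`grep_ahci_logs /var/log/vmkernel.log` covers the current boot; list the log directory first to see which rotated files exist and pass those as extra arguments.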

All I can do now is wait for the SATA card to arrive today and see if it works; if it does, I can compare what it is doing against the native SATA and go from there. I'm not afraid to say that I still have no idea why the system hangs, and with no PSOD or error message it's feeling more and more like a stab in the dark.

Another interesting thing is what ESXi reports for the devices if you probe them:



"Link N/A", it says. I disagree, but what can you do?
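To compare what each board reports per adapter, the link state can be pulled out of `esxcli storage core adapter list`, and `esxcli storage core device list` then shows whether any disks were actually registered behind it. A small sketch, assuming the output has a two-line header and that HBA name, driver and link state are the first three whitespace-separated columns; the helper name `adapter_links` and its optional driver filter are hypothetical. Comparing the Taichi against a board where SATA works is probably more telling than the link value on its own.

```shell
#!/bin/sh
# Reduce `esxcli storage core adapter list` output (on stdin) to
# "<HBA name> <driver> <link state>", skipping the two header lines.
# Optional $1 keeps only adapters bound to that driver (e.g. vmw_ahci).
adapter_links() {
    awk -v drv="${1:-}" '
        NR > 2 && NF >= 3 && (drv == "" || $2 == drv) { print $1, $2, $3 }
    '
}
```

For example, `esxcli storage core adapter list | adapter_links vmw_ahci` lists only the AHCI adapters.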
 


Hmm.....



Nope! Not happy at all. Does anybody know an inexpensive controller card supported by ESXi? I am hunting but not finding. I guess it's a case of buy cheap, buy twice.
 
Think I'm gonna cave and just buy a couple of 1TB M.2 drives.

Edit: I caved; a couple of 512GB M.2s are en route.
 
I spent a lot of time last night writing ISO files to USB sticks. The plan? Not really sure, tbh, but I am thinking of just trying a few different installs, poking a few things and seeing if I can get some kind of SATA working, most likely my ASM1061 card. I don't actually need it anymore, tbh, given my frivolous spending, but I would still love access to that 16TB of local storage, and I am sure people would be very interested in seeing this SATA issue resolved. There must be some ESXi gurus who frequent these and the VMware forums who can help. I am sure there must be other people out there by now in the same boat.
 
Ah, this is of course a totally unreasonable request, but have you tried (or could you try) the Xen hypervisor too? It seems the trinity is ESXi/KVM/Xen. We know ESXi has some issues but kind of works, KVM is a no (until they fix the AMD NPT issue), but I am hearing Xen may be fine....

Just to say that I will give you an update on this, but it might take me another week or so. Although I failed miserably with the Linux piece on IOMMU, I hopefully won't fail so badly with this. To be honest, I still have no idea how to do what I needed to do in Linux, and how anybody likes Linux is still totally beyond me; in 15 years I have come across more Unix machines than Linux boxes in businesses. That being said, had I spent a lot of time in Linux, I might be able to solve some of these issues more easily.
 
Little update. That fella posted an updated video, so I have purchased an Asus board to test, which will be with me tomorrow. Very happy in this instance to be proved wrong, and glad that there are solutions out there that just work. The Taichi had me totally convinced that the problem ran deeper than the hardware, but I have been wrong before and it looks like I am again. I went for the Prime; if that's no good it will go back and I'll work my way up the stack until I get to the ROG.

Now I just have to deal with the lack of features on the board I have ordered. It will be here tomorrow, so we'll know in a day or so whether the Prime is an option. Dunno what to do with my couple-of-months-old Taichi, but I may just use this as an excuse to build another TR machine for the wife.
 
A bit deflating, but at least there is a solution. Also, if you can get away with another build with the missus, then why not? Mine would kill me :D

There is indeed, but what I would find interesting is whether the Prime has issues that the ROG doesn't. I would be amazed if the only two boards marketed more towards the professional market were not up to scratch with the ROG when it comes to more professional applications. I mean, this is for professional use more than anything, and boards with a gaming focus, gaming in the title, etc. are not really something I want to be buying if I'm doing it through the company, which was the case with the Taichi even if it isn't with the Prime. All the other stuff I'm now having to buy to get my lab up and running 100% is just a tad annoying, but I guess if you want to be on the bleeding edge then you have to suck it up.

One issue I do have, apart from the outrageous prices, and it is superficial to say the least, is that I don't really like the look of the ROG at all, and that includes the Strix. It just does nothing for me aesthetically, whereas the Prime seems much cleaner. I just hope it works.
 
Subbed.

There's a 4-node Oracle 12c RAC build to throw at a suitably endowed ESXi setup on Threadripper one day.

BTW, did you ever get further with tuning the RAM, or is it still at 3066? I'd want it to be a home lab with next to no compromises on the gaming front (tall order, I know :p).

The RAM, which is rated at 3466, will do 3200 in the Taichi with CAS 14 timings and 1.1V SoC voltage. It was stable, but it felt like I was trying to squeeze every last bit out of it to get it there, and it wasn't doing much for temps either, so I went back to 3066. To be honest, others have had a more plug-and-play experience with the same RAM in other boards, so later today I should be able to compare how the Prime fares. Gotta be honest, the last thing I wanted to do was tear down the whole build to replace the board, but tonight that is what will happen.

In terms of gaming, I can't say you would really be compromising with memory at 3000+. No game out there today really loads a 1950X Threadripper CPU beyond about 10%, so to some extent you are constrained by its single-core performance, and while it's not quite up there with the 7700K and 8700K in gaming performance, it's certainly no slouch. You do see gains at 3200, no doubt, but they get smaller and smaller as you go past 3000. I have my 1950X paired with a Vega 56 (running the Vega 64 air BIOS, overclocked and undervolted to 1680/1100), and so far for the majority of games I just run at 4K with most of the eye candy turned up and away she goes. Project CARS 2 at 4K with all the bells and whistles sits nicely pinned at pretty much 60fps; in PUBG I have to rein it back a bit to 2K with medium settings and ultra draw distance, and I sit somewhere around 80 to 100fps. I have installed a few games and it does what you would expect it to do: it goes more than fast enough with masses left in the tank.
 
Couple of pics of the swap over:























Bittersweet for me, as the ASRock is the nicer board, but the Prime installed the ESXi 6.5 Update 1 rollup straight off the bat. I can see all my devices and connect to my NFS datastores, and importantly, SATA works and I can use SATA devices as datastores.
 
Glad to hear that!!!! Congratulations, Vince, we appreciate your contribution.
Did you put together any guidelines from your experience?

I'll keep an eye on the progress... I'm still waiting for the release of the 8C/16T EPYC with a Supermicro motherboard for my configuration.

Thanks, Vince, thanks.

In terms of guidelines, I would say (and this is a guess right now until I dig deeper) that if you're buying a Threadripper board for ESXi, look for boards that don't have the full complement of 8 SATA ports. The one common theme seems to be that the boards with 6 ports, i.e. all the Asus boards, don't have the SATA issue. At first I thought it was down to the number of controllers exposed to ESXi with the device ID 0x7901, but after last night's investigations I don't think that is the case. Both the ASRock Taichi and the Asus Prime expose two devices with the same ID to ESXi, and that device is the AMD FCH SATA Controller. I cannot find any difference between the boards in terms of the devices in ESXi, so right now I am thinking there are firmware/BIOS issues with at least the ASRock that cannot be resolved simply by poking at it.

I can now push forward with further testing. If you guys want, I can do a video or two of, well... whatever you want to see. I am happy to set up a VM with GPU passthrough running a game or two, or to just load up the platform and put it under a ton of stress if that's what you want to see. Basically, from here on out I'll be guided by you good folks in here. Tell me what you want me to test and I'll build it up and put the results in here or in a video on YouTube. In terms of what I need to achieve, I am pretty much there now; my requirements are met. I am not giving up on the Taichi, though, and I think I am going to build another Threadripper machine, which will mainly be for the wife but will also serve as a place to continue testing that board as ASRock/VMware push out updates.

So to clarify, and it pains me to say it because I really liked the ASRock, right now it's just a no-go for ESXi. What I can say for sure is that the Asus Prime X399-A is a very good, relatively low-cost candidate when it comes to building a workstation with ESXi in mind.
 