Home Lab Threadripper Build Thread.

Discussion in 'Project Logs' started by Vince, Aug 12, 2017.

  1. cccy

    Associate

    Joined: Sep 10, 2017

    Posts: 12

    Thanks for the details Vince. Hope you can figure out how to make the Sata port work!

    Here's what I've gathered about ESXi X399 board compatibility thus far, hopefully it can help with your troubleshooting:
    1) There appear to be similar installer PSOD issues with the Gigabyte X399 AORUS Gaming 7 ( https://communities.vmware.com/thread/570854 ).
    2) YouTube has a video showing the ASUS ROG Zenith Extreme X399 to be fully functional with ESXi without the need for any hacks ( https://www.youtube.com/watch?v=owQq2XmHiQg ).
    3) Reddit contains a post with instructions for enabling GPU-passthrough on the ASUS Prime X399-A so I presume it should be functional too ( https://www.reddit.com/r/Amd/comments/72ula0/tr1950x_gtx_1060_passthrough_with_esxi ).
     
    Last edited: Oct 5, 2017
  2. Vince

    Man of Honour

    Joined: Oct 30, 2003

    Posts: 9,150

    Location: Essex

    Cheers for this. I have been playing about with this tonight and actually just commented on that video before coming back here and checking the thread. Unfortunately that video doesn't tell us much at this point: as far as I can tell the SATA controller on X399 is exactly the same on every board, so unless I am missing something glaringly obvious I suspect many of the same workarounds have been applied. I am just about to have another attempt with the latest ISO from the VMware site, but I don't think the results will be any different, as all the hashes are the same as the version I downloaded at the beginning of the project. I can run everything else (NVMe, NFS, GPUs etc.), but with the SATA nobody is really giving much away. That video, for example, shows us nothing at all, and if you look over to the left towards the beginning it isn't showing any local storage datastores. I have left links to this thread and my one over at the VMware community forums in the comments, so with a bit of luck we might get some info.

    I'll update with my findings after giving this new install a bash.
     
  3. cccy

    Associate

    Joined: Sep 10, 2017

    Posts: 12

    From the comments in the video, it appears that SATA works on the ROG board:
     
  4. Vince

    Man of Honour

    Joined: Oct 30, 2003

    Posts: 9,150

    Location: Essex

    Looks like I might have to buy a ROG and test then. I really didn't want to have to change the board, but if it 100% works on the ROG then that is what I will do.
     
    Last edited: Oct 5, 2017
  5. Vince

    Man of Honour

    Joined: Oct 30, 2003

    Posts: 9,150

    Location: Essex

    Right, I have watched that video over and over again and I need to clear up a few things before dropping £500 on a board. The guy says he has VMs running but his video suggests otherwise. If you look at around 20 seconds into the video you will see the same part of his install as shown in this image from mine:

    [​IMG]

    The grey squares on the left show 0 VMs, 0 available storage and 1 in the VM network area. I am not saying that I don't believe the guy, but dropping £500 on the strength of that video alone is not something I am willing to do without some confirmation that he actually has SATA storage available. If SATA storage were available, the disks should be in this list along with the NVMe drive; if you have 0 under storage, as in that video, then storage is not exposed to ESXi.

    I also don't see any datastores on his host like in this image below:

    [​IMG]

    At this point, unless somebody tells me otherwise with some decent proof (a video of the install would do the job), I am assuming that anybody running ESXi on Threadripper is either on SAN, NAS or M.2 storage and using the workarounds I have published in here and in the VMware community forums.

    To update on my situation, I have tried all the images available on the VMware site again and they are all the same as they were before: same issues, same fixes. I've spent another few hours messing around and have not been able to improve the SATA situation. I am about to try one more thing before I call it a night. I can get along like this for the moment, but I would really benefit from having the SATA available. I could have a look at running RAID and see what happens, but I suspect we won't see much in the way of improvement. It could be an option though, if VMware can see a single disk or more in RAID 0. I will give that a try but I will need to shift around some data, so perhaps a job for tomorrow.

    I will read the manual for the ROG, as it has actually got 2 fewer SATA ports than the Taichi. My bet is that they are the same AMD 0x7901 device used on the Taichi. All the boards seem to be exactly the same, probably because the SATA is provided by the X399 chipset.
     
    Last edited: May 4, 2018
  6. Vince

    Man of Honour

    Joined: Oct 30, 2003

    Posts: 9,150

    Location: Essex

    So for a bit of a test I have just bought myself a pci-e card with a couple of sata ports based on the ASMedia 1061 controller that I know should work. I may also buy the prime and have a look but am undecided at this point.
     
    Last edited: Oct 6, 2017
  7. Vince

    Man of Honour

    Joined: Oct 30, 2003

    Posts: 9,150

    Location: Essex

    Another update. I blanked a disk last night, changed over to raid running the disk solo in a raid 0 array. Didn't Work!

    I looked further into the map file /etc/vmware/ahci.map and you can see the device ID for the AMD AHCI controller in there, showing support for the device, but because I can't get past the loading of the driver I don't hold out much hope of seeing the hardware attached to the controllers. The controllers and disks can still be passed to VMs via passthrough, as you don't need the drivers in ESXi for that. Something tells me that the problem is more closely tied to chipset support than it is to support for the CPU.
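
    For anyone wanting to repeat the map file check, a rough sketch of the lookup follows. `map_has_id` is a made-up helper name, and it assumes the ID may be written either as `1022:7901` or run together as `10227901`; the exact map file format is not confirmed here, so treat a miss with caution.

```shell
# Search one or more driver map files for a PCI vendor/device pair.
# Hypothetical helper; the two ID spellings it accepts are assumptions.
# Usage: map_has_id 1022 7901 /etc/vmware/ahci.map
map_has_id() {
  vendor="$1"; device="$2"; shift 2
  # -i: hex digits may be upper or lower case; ":?" makes the colon optional
  grep -i -E "${vendor}:?${device}" "$@"
}
```

    It prints any matching lines and returns non-zero when nothing matches, so it can be used in an `if` as well.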

    As far as I can see, unless you can somehow force the load of vmw_ahci without dropping the driver back to legacy, it's not going to work. Do we know if anybody on these here forums is rocking an Asus board and has a spare 2GB-ish USB stick? Somebody who has some time on their hands and is willing to spare an hour or two in the name of science?

    If you're reading this and you also have an Asus X399 board and a couple of hours you could spare, please trust me, I will make it worth your while, perhaps with some free hardware or something.

    I will also update my comments on that video asking the fella to drop in here and perhaps join in the conversation.

    Also does anybody know if there is an Asus rep on these forums?
     
    Last edited: Oct 6, 2017
  8. cccy

    Associate

    Joined: Sep 10, 2017

    Posts: 12

    Before you put down money on a new motherboard, may I also check if you did update the BIOS of your current ASRock Taichi (Latest BIOS version 1.70, released on 2/Oct/2017 http://www.asrock.com/mb/AMD/X399 Taichi/index.asp#BIOS )?

    AMD has been constantly working with the motherboard manufacturers and BIOS updates fixing compatibility and bugs are constantly being released. Perhaps you should at least get the board updated to the latest BIOS there is and see if things work better (Hopefully the vanilla installer might work too after update?) as I know that some x370 boards were also having problems with virtualization until the BIOS was updated.

    In addition, I've spotted an MSI X399 Gaming Pro Carbon AC ( https://qiita.com/strat/items/f741774d129206002cfc , Google Translate link: https://translate.google.com.sg/tra...://qiita.com/strat/items/f741774d129206002cfc ) that seems to be working after injecting xahci drivers using ESXi-Customizer-PS. Perhaps you might want to give driver injection a go and see if things will work?
    (Also do take note that after searching further, I've found a thread saying that the MSI board seems to PSOD under load https://forum-en.msi.com/index.php?topic=293170.0 . I've also found another page, for which I have lost the link, where the MSI board was reported to install and function fine but PSOD approximately every 10-15 minutes.)
     
  9. Vince

    Man of Honour

    Joined: Oct 30, 2003

    Posts: 9,150

    Location: Essex

    So that fella replied to my comments on youtube and I have asked him to look at this thread for stuff that the community would love confirmation on.

    @Art OF WAR

    We really need to see what the ROG is presenting to VMWare in terms of storage. The following screenshots from your build would help massively:

    [​IMG]

    [​IMG]

    [​IMG]

    [​IMG]

    Edit: It would also probably be better if these images were at sensible resolutions unlike mine :)
     
    Last edited: May 4, 2018
  10. Vince

    Man of Honour

    Joined: Oct 30, 2003

    Posts: 9,150

    Location: Essex

    Money on a new board is of course the last route I want to go down. I like the features of the Taichi, which is why I picked it, but if money fixes the problem then happy days. In terms of the BIOS, I tried an install last night on the latest BIOS and have tried on every BIOS the board has available (all 3 of them). What I don't get is that the drivers are actually included in the VMware images, and this is shown by running the following command over SSH:

    Code:
    lspci -v | grep "Class 0106" -B 1
    Which outputs the following:

    Code:
    0000:09:00.2 SATA controller Mass storage controller: Advanced Micro Devices Inc AMD FCH SATA Controller [AHCI Mode] [vmhba1]
             Class 0106: 1022:7901
    --
    0000:44:00.2 SATA controller Mass storage controller: Advanced Micro Devices Inc AMD FCH SATA Controller [AHCI Mode] [vmhba2]
             Class 0106: 1022:7901
    The name in square brackets, e.g. [vmhba1], indicates that the device is already in the support list and consequently also in the AHCI device map file.
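
    If you want just the address, ID and adapter name from that output, here is a minimal sketch. It assumes the `lspci -v` output looks like the sample above; `sata_summary` is a made-up helper name, not an ESXi command.

```shell
# Summarise SATA controllers from `lspci -v` output read on stdin.
# Prints one line per controller: <pci address> <vendor:device> <alias or "none">
# Hypothetical helper; assumes the two-line layout shown in the sample output.
sata_summary() {
  awk '
    # Device header line, e.g. "0000:09:00.2 SATA controller ... [vmhba1]"
    /^[0-9a-fA-F]+:[0-9a-fA-F]+:[0-9a-fA-F]+\.[0-9]/ {
      addr = $1
      alias = "none"
      if (match($0, /\[vmhba[0-9]+\]/)) {
        # Strip the surrounding square brackets from the alias
        alias = substr($0, RSTART + 1, RLENGTH - 2)
      }
    }
    # Class line for SATA (0106); the last field is vendor:device
    /Class 0106:/ {
      print addr, $NF, alias
    }
  '
}
```

    On the host it would be fed as `lspci -v | sata_summary`. A controller with no bracketed name would show up as `none`.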

    [​IMG]

    I have had a good read of what there is at that Japanese site and it's vague to say the least. I have access to the ESXi customizer so I guess that is next on the hit list. What concerns me is that VMware claim full native support for the device ID. With that post being so vague, it's difficult to determine what the person actually did. I will have a SATA card tomorrow and that should allow me to show what should happen in ESXi when a standard SATA disk is exposed. In the meantime I will look further into what driver I might be able to inject into the installer to make it work. My understanding, though, and this could well be flawed, is that pretty much all we are doing when injecting drivers is modifying the ahci.map file and corresponding map files.

    There is a small chance as well that somebody from VMWare might step in to shed some light for us. I have reached out again today to somebody I know at VMWare and have pointed him in the direction of this thread and my one on the vmware community forums. Not being a tech himself he has assured me he will pass it onto one of the techs there who hopefully might be able to find some time to shed some light on what they think the issue might be.
     
    Last edited: May 4, 2018
  11. cccy

    Associate

    Joined: Sep 10, 2017

    Posts: 12

    From my experience with patching my old systems that all require driver injection for SATA to function, I believe the xahci drivers package mentioned is the sata-xahci package available here: https://www.v-front.de/2013/11/how-to-make-your-unsupported-sata-ahci.html

    P.S: I suspect the SATA issue may be a similar issue to the Intel i211 and i350 NICs being "detected" but not initialized as documented here: https://www.v-front.de/2015/08/a-fix-for-intel-i211-and-i350-adapters.html
     
  12. Vince

    Man of Honour

    Joined: Oct 30, 2003

    Posts: 9,150

    Location: Essex

    That community supported VIB at the top there nukes my build. I have tried it a few times without success... I guess one more try can't hurt.
     
  13. cccy

    Associate

    Joined: Sep 10, 2017

    Posts: 12

    Just a random thought: have you tried installing to a SATA disk? I had an install once (Intel NUC DC3217BY) where, if the install was done on USB, it would boot just fine, but on SATA the NUC would just PSOD (tested with both ESXi 5.5 and 6.0). I'm wondering if the opposite would happen here (sort of a "force-enable" for the ESXi drivers: since it detects the controller but doesn't initialize it, maybe booting from it would force it to initialize).

    I completely understand your standpoint. Just like you, I'm hesitant to put down lots of money on a Threadripper build only for it to not be fully functional. Plus, the ASUS ROG Zenith Extreme X399 would essentially mean putting down quite a sizeable sum for reduced "ESXi usable" features: fewer SATA ports, and paying for a ton of additional "useless gaming features" (ESXi-unsupported 10G NIC, LED customization, integrated wireless AD, etc).
     
    Last edited: Oct 6, 2017
  14. Vince

    Man of Honour

    Joined: Oct 30, 2003

    Posts: 9,150

    Location: Essex

    That's really my point, when it comes down to esxi of all the boards the Taichi is the one with the most going for it. If we can get over this sata issue then we have the ultimate home lab. ESXi without a custom build won't even get to the point of allowing an install on this board unless you get creative with the install process as I did or build custom installers. Talking of which I have just injected those drivers into some media so time to take it back to the bare metal for a test.

    Well that community supported vib now boots. Still no drives.
     
    Last edited: Oct 6, 2017
  15. Vince

    Man of Honour

    Joined: Oct 30, 2003

    Posts: 9,150

    Location: Essex

    Just to add, I refuse to look at the MSI board as an option. To be honest it is unlikely I would ever buy one of their products again; everything MSI I have ever owned has been sub-par. Interesting that they have it working though. We just need much more info: a list of what's working on what hardware.

    I am pretty sure that I can't really afford to swap boards over and over again to find out what works and what doesn't. I just don't have the time or patience for that. I mean I would, but buying and selling boards and losing a load of cash each time to find out isn't that appealing.
     
    Last edited: Oct 6, 2017
  16. Vince

    Man of Honour

    Joined: Oct 30, 2003

    Posts: 9,150

    Location: Essex

    More updates - I am determined to work out what is actually happening here and just where this is failing. If I can work that out I may stand a chance of getting it working. I even spent a few hours digging through the vmkernel logs looking for anything that might point me in the direction of what's going wrong.

    Firstly I just want to point out something about the asrock board, there is an option in the bios shown below:

    [​IMG]

    Now, this looks promising, right? The ability to change the device ID that the controller is running under. The only problem is that no matter what you set in this screen, none of the options stick after a reboot, which of course you have to do when you save and exit. I've reset CMOS, loaded UEFI defaults, everything, but if you go back into this screen everything is back at Auto. Perhaps I am misunderstanding what these options do, but it was worth a go, right?

    From the Kernel logs I don't see any failures when running with the command:

    Code:
     esxcli system module set --enabled=false --module=vmw_ahci 
    weirdly everything seems to load up just fine, here are a couple of dumps from the vmkernel logs:

    [​IMG]

    Up the top there you can see it detecting the controller

    [​IMG]

    Then you can see it grabbing the controller

    But what you don't ever see anywhere in the logs and what I am pretty sure you should see is devices being registered, below you can see the nvme doing its thing, I looked and looked before giving up. Nothing anywhere in the logs suggests any failures:

    [​IMG]

    Next up I fired up again and reversed the AHCI driver:

    Code:
     esxcli system module set --enabled=true --module=vmw_ahci 
    So effectively forcing it to hang so I could probe the logs. Back in Windows I loaded it up using Workstation again, thinking I was all smart, but literally couldn't find anything. I need to do a little reading as I haven't really needed to dig into VMware logs this much before. I can't be sure whether the vmkernel logs roll over to another log file on each boot or are kept for a certain amount of time. Either way I can't find any failures in the logs relating to AHCI, even when it hangs.
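
    To save scrolling through the whole log by eye, a small filter along these lines might help. This is only a sketch: `ahci_log_lines` is a made-up helper name, and it simply assumes that relevant lines mention either "ahci" or the adapter name somewhere in the text.

```shell
# Print log lines that mention the AHCI driver or a given adapter alias.
# Hypothetical helper; reads the log on stdin.
# Usage: ahci_log_lines vmhba1 < /var/log/vmkernel.log
ahci_log_lines() {
  # Default to any vmhba adapter when no alias is given
  adapter="${1:-vmhba}"
  # -i: case-insensitive, -E: extended regex so "|" means "or"
  grep -i -E "ahci|${adapter}"
}
```

    Comparing its output between a boot with vmw_ahci disabled and one with it enabled could show where the two runs diverge, if the log survives the hang.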

    All I can think of is waiting for this sata card to come today and see if it works, if it does I can compare what it is doing against the native sata and go from there. I'm not afraid to say that I still have no idea why the system hangs and with no PSOD or error message its feeling more and more like a stab in the dark.

    Another quite interesting thing is what esxi reports on the devices if you probe them:

    [​IMG]

    Link N/A it says, I disagree but what can you do?
     
    Last edited: May 4, 2018
  17. Vince

    Man of Honour

    Joined: Oct 30, 2003

    Posts: 9,150

    Location: Essex

    [​IMG]

    Hmm.....

    [​IMG]

    Nope! Not happy at all. Anybody know an inexpensive controller card supported by ESXi? I am hunting but I am not finding. I guess it's a case of buy cheap, buy twice.
     
    Last edited: May 4, 2018
  18. Vince

    Man of Honour

    Joined: Oct 30, 2003

    Posts: 9,150

    Location: Essex

    Think I'm gonna cave and just buy a couple of 1TB M.2 drives.

    Edit - I caved, a couple of 512GB M.2's en route.
     
    Last edited: Oct 8, 2017
  19. cccy

    Associate

    Joined: Sep 10, 2017

    Posts: 12

    That's the exact card that's working on my ESXi build! Requires the community drivers to get ESXi to recognize it though! ( https://www.v-front.de/2013/11/how-to-make-your-unsupported-sata-ahci.html )
    Here are the things I've noted using that card:
    1) Do check the instruction manual to ensure the jumpers are in the correct spot, as it is a 2-port card where only 2 of its 4 ports are active, selected via the jumpers.
    2) It cannot be used as a boot drive for ESXi or ESXi will hang on boot. It can be used as a datastore drive though.
     
  20. Vince

    Man of Honour

    Joined: Oct 30, 2003

    Posts: 9,150

    Location: Essex

    I spent much time last night building ISO files onto USB sticks. The plan? Not really sure tbh, but I am thinking I'll just try a few different installs, poke a few things and see if I can get some kind of SATA working, most likely my ASM1061 card. I don't actually need it anymore tbh given my frivolous spending, but I would still love access to that 16TB of local storage and I am sure people would be very interested in this SATA issue being resolved. There must be some ESXi gurus who frequent these and the VMware forums who can help. I am sure there must be people out there by now in the same boat.