
10GbE and jumbo frames help needed!

Discussion in 'Servers and Enterprise Solutions' started by Evil-I, Nov 10, 2017.

  1. Evil-I

    Wise Guy

    Joined: Apr 20, 2003

    Posts: 1,251

    Location: Gloucestershire

    Hi all,

    This has been bugging me for a couple of weeks now and I just can't work out what's going on so could really do with some advice...

    So the background (sorry, might be a bit long-winded...)

    • Recently upgraded my home server to be more robust with a Supermicro Xeon D board (X10SDV-TLN4F) that has both a pair of 1Gb NICs and a pair of 10Gb NICs (as well as an IPMI port)
    • I do high-end video work, so I need to use my server as media storage over a 10Gb network
    • I have two workstations: one a 10-core Xeon, the other a heavily overclocked and watercooled 3930k (4.7GHz 24/7)
    • Both workstations have been fitted with Asus XG-C100C 10Gb NICs
    • Workstations are both running Windows 10
    • Server is running CentOS 7 with Webmin for administration
    • Both workstations are based on the X79 platform (main box is an Asus X79 WS with the 3930k; my test rig/render machine is an Asus Rampage IV Black Edition running a 10-core Xeon)
    • Both workstations have a direct 10Gb cable connection to a 10Gb port on the server, and I map drives via NIC IP address to specify whether I'm using the 10Gb link or the 1Gb link
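
    For anyone curious, a sketch of roughly what one server-side interface config looks like for this kind of direct-link setup on CentOS 7 (the interface name and addresses here are just example values, not the real config):

```shell
# /etc/sysconfig/network-scripts/ifcfg-enp1s0f0 -- example interface
# name and addresses, not the real config. One small subnet (/30) per
# direct link, so each workstation can only reach this address via
# its own 10Gb port.
DEVICE=enp1s0f0
BOOTPROTO=static
IPADDR=192.168.10.1
PREFIX=30
ONBOOT=yes
MTU=9000    # jumbo frames; the Linux MTU excludes the 14-byte Ethernet header
```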

    So to the problem...

    For some reason, as soon as I enable jumbo frames on my main box and the corresponding 10Gb NIC on the server, the link drops and says the cable is unplugged. Yet the same settings on my second workstation work fine... At times it has worked briefly, but then just gets unstable and drops.

    The Asus 10Gb NIC offers jumbo frames up to 9014, so I'd set that and then set the MTU on the Linux box to match for each connection.

    I've swapped cables and swapped which server NIC they connect to, and I always get the same result, suggesting it's something specific to my main rig causing the problem.

    I haven't tried swapping the Asus 10Gb NICs yet, which might be worth doing...

    Performance differences are massive: running CrystalDiskMark on the mapped server share on my Xeon-based test rig/render station gives 626MB/s read and 917MB/s write on Seq Q32T1, but only 272MB/s and 454MB/s on my main rig with jumbo frames disabled. Network usage is also very telling: it tops out at 7Gb/s on the Xeon rig (limited by server drive array speed at this point, I think), but only about 3Gb/s on my main rig.

    So, any thoughts?

    Thanks in advance E-I
     
  2. Hellsmk2

    Mobster

    Joined: Oct 18, 2002

    Posts: 4,256

    I may have missed it, but what switch do you have? Jumbo frames are end to end, so every hop along the route, from server to switch to desktop, must support jumbo frames and have a suitable MTU configured.

    Also, 9014 may be slightly off as an MTU. A lot of devices support much higher MTUs, but a lot will support 9000 max.

    A good way to test is to use a simple ping to send different-sized packets from one end to the other. If any leg of the route doesn't support jumbo frames, or doesn't have them enabled, you'll find the pings fail for an ICMP packet above 1500 bytes. You may also find using this method that something on your network simply doesn't like your existing MTU, and you can use it to find the highest setting it does like.
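
    To make that concrete, the ping payload has to be the MTU minus 28 bytes (20-byte IPv4 header plus 8-byte ICMP header), and fragmentation must be disallowed or the test proves nothing. A quick sketch (the target IP is just a placeholder):

```shell
# Payload size that exactly fills a given MTU: MTU - 20 (IPv4 header)
# - 8 (ICMP header).
MTU=9000
PAYLOAD=$((MTU - 28))   # 8972 for a 9000-byte MTU

# Printed rather than run here; 192.168.10.1 is a placeholder address.
echo "Linux:   ping -M do -s $PAYLOAD -c 3 192.168.10.1"   # -M do = don't fragment
echo "Windows: ping -f -l $PAYLOAD -n 3 192.168.10.1"      # -f = don't fragment
```

    If the 8972-byte ping fails but a 1472-byte one works, something on the path is still at a 1500 MTU.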

    [Edit] Just read that the workstations are directly connected, so no switch is involved. How do they connect to the server, and is the server OS on physical tin?
     
    Last edited: Nov 17, 2017
  3. Hellsmk2

    Mobster

    Joined: Oct 18, 2002

    Posts: 4,256

    Also, 7Gb/s throughput on 10Gb is absolutely spot on. You'll be hard pressed to get any more out of it due to storage/CPU overheads.

    The one getting 3Gb/s should be doing better... I would be looking at the driver there. Anything above 5Gb/s throughput on a 10Gb network is acceptable in my eyes.

    Once you get things working, use a tool like LANbench, which will give you true throughput without disk overheads clouding things. You could also try NetStress, as that has a tool to benchmark at different MTU sizes too... probably very beneficial in your circumstances.

    I'd question the gain you'll see from jumbo frames, though. It's something we never, ever configure in enterprise networks for anything other than iSCSI storage networks, and even then we only see performance gains of 2-10%. Marginal at best.

    (Then again... this is OcUK :)).
     
    Last edited: Nov 17, 2017
  4. Evil-I

    Wise Guy

    Joined: Apr 20, 2003

    Posts: 1,251

    Location: Gloucestershire

    Hi Hellsmk2,

    Thanks for the responses and apologies for taking so long to get back to you!

    Yes, they are directly connected machines. The server is a physical (not virtualised) CentOS 7 box based on Supermicro's X10SDV-4C-TLN4F, which has a pair of 10GbE ports from the Xeon D CPU and a fast data disk system based on an LSI 9270CV RAID controller with WD Reds. The workstation is my main rig (i7 3930k overclocked a little... well, 4.7GHz at the moment). The 10GbE NIC on the workstation is an Asus XG-C100C.

    On the jumbo frames side of things, for video editing (where the server is the media location) jumbo frames are supposed to make a significant difference. When I have had it working (it's been very temperamental) it did seem to perform much better with jumbo frames. In Premiere Pro the timeline responsiveness was noticeably better (we are talking broadcast-quality 4K video here, so big files that need to be accessed quickly).

    On the Windows box there are only three options under jumbo frames for the NIC: 2040, 4088 or 9014. On the CentOS box I can set whatever MTU I'd like.
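
    Worth noting: those Windows driver values are usually full frame sizes that include the 14-byte Ethernet header, while the Linux MTU counts payload only, so matching them digit-for-digit on the CentOS side overshoots by 14 bytes. Assuming the Asus driver follows that common convention (not guaranteed for every vendor), the mapping would be:

```shell
# If the Windows "Jumbo Packet" value is a full frame size (a common
# driver convention, though not guaranteed for every vendor), the
# matching Linux MTU is that value minus the 14-byte Ethernet header.
for FRAME in 2040 4088 9014; do
  echo "Windows $FRAME -> Linux MTU $((FRAME - 14))"
done
```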

    In general I'm very much self-taught on the networking side of things, so there's a good chance it's just not configured correctly :) The main issue is that as soon as both machines have jumbo frames/the larger MTU enabled, the connection drops and says no cable is connected, yet every now and again it works fine... Disable jumbo frames on the Windows box / set the MTU on the CentOS box back to default and it comes back up again.

    The really weird thing is that my test rig, which is a very similar setup to my main rig except that it's got an engineering sample 10-core Xeon installed, seems to work fine with jumbo frames enabled! So same Asus 10GbE NIC, connecting into the server directly via one of its 10GbE ports. I've also tried swapping everything around from the test rig to the main rig: swapped the Asus 10GbE card, cables, server port, configuration... I've completely removed all NIC drivers on my main rig and also disabled the overclock completely to make sure that isn't a factor, and no difference... This is really starting to annoy me now...
     
  5. tres_kun

    Hitman

    Joined: Nov 7, 2012

    Posts: 545

    Location: Glasgow

    Have a read into receive side scaling (RSS).
    It spreads the network load over multiple CPU cores.
    You have competent hardware and might have enough to make do without jumbo frames.

    Personally I have an i3-4150 on one end and am write-limited to 190MB/s, read 900MB/s.
    No clue what to do to fix it other than a CPU upgrade.
     
  6. Hellsmk2

    Mobster

    Joined: Oct 18, 2002

    Posts: 4,256

    I'm not familiar with the Asus NIC, but normally you should be able to edit the MTU from the properties of the NIC in the network control panel applet. Are you using software that came with the NIC?

    What happens if you set all boxes to the lowest value you mentioned, 2040?
     
  7. Evil-I

    Wise Guy

    Joined: Apr 20, 2003

    Posts: 1,251

    Location: Gloucestershire

    Hi All,

    Apologies for taking such a long time to respond, but given everyone's help it seems churlish not to let you know the result of all this.

    I got it all working!

    One of the main issues was that, as I'm not going through a switch, if I powered up one of the workstations while the server was already on with jumbo frames enabled, it would show the connection as having the cable disconnected. However, if I quickly log in to Webmin on the server and re-apply the network settings, the connections come up. I'm not sure why this didn't happen with the standard frame size, and I assume if it was running through a switch this wouldn't be an issue either.
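
    For anyone hitting the same thing: re-applying the settings in Webmin seems to boil down to bouncing the link so it renegotiates once the workstation is up. Roughly this (the interface name is just an example; check yours with `ip link show` and run the printed commands as root):

```shell
# Example only: enp1s0f0 is a placeholder for one of the server's
# 10Gb ports. The commands are printed rather than executed here;
# run them as root on the server after the workstation boots.
IFACE=enp1s0f0
echo "ip link set dev $IFACE down"
echo "ip link set dev $IFACE up"
```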

    Performance-wise I'm easily getting 600MB/s for large file transfers (likely the speed limit of my 4 x WD Reds in RAID5; if I need more speed I can drop more drives into the array), and due to the flash caching built into the LSI RAID card I'm getting over 900MB/s and 100% saturation of the 10Gb LAN when testing with CrystalDiskMark.

    So, all working well for the last few months, other than the slight irritation of having to log in to Webmin each time I boot the workstations to activate the 10Gb links.

    Thanks for everyone's help!

    E-I