Infiniband setup issues

Hi all,

I have fitted my Mellanox card into the PCIe x8 slot and installed Win 7, but it does not seem to detect the card, even after installing the OFED drivers. When I run ibstat it says "winmad.dll cannot be found".

However, on my other Win 7 PC (which has an Asus P6T motherboard) it works fine. On that PC the card is listed in Device Manager under Infiniband Channel Adapters, but that group does not even appear on my server.

Is this an issue with my GB G3 Sniper not supporting x8 cards?
 
Hi all,

I've been following David Hunt's guide but seem to be stuck at the first hurdle. Apparently I should have an ib0 interface, but when I run ifconfig I only have eth0 and lo.

This is what I've done so far:
Edit /etc/modules and add the following modules:

ib_sa
ib_cm
ib_umad
ib_addr
ib_uverbs
ib_ipoib
ib_ipath
ib_qib

Next,

apt-get install opensm

I have passed the Mellanox card through to the VM in ESXi, and the VM detects it in lspci. I am using Ubuntu Server 12.04.
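Two things are worth checking against the steps above. /etc/modules is only read at boot, so the modules listed there are not loaded until a restart or until each one is loaded by hand with modprobe. Also, ib_ipath and ib_qib are drivers for QLogic adapters; a Mellanox InfiniHost card (it shows up as mthca0 later in this thread) is driven by ib_mthca, and the ib0 interface itself is created by ib_ipoib. A minimal sketch, assuming a POSIX shell and a hypothetical helper name, that compares an /etc/modules-style list with /proc/modules and prints whatever is not loaded yet:

```shell
#!/bin/sh
# Print each module named in an /etc/modules-style file that is not
# currently loaded according to a /proc/modules-style file.
# Hypothetical helper; both paths are parameters so it can be tried offline.
modules_not_loaded() {
    wanted_file=${1:-/etc/modules}
    loaded_file=${2:-/proc/modules}
    # Strip comments and blank lines, keep the first word of each line.
    sed -e 's/#.*//' "$wanted_file" | awk 'NF { print $1 }' |
    while read -r mod; do
        # /proc/modules has one module per line, with the name in column 1.
        if ! awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$loaded_file"; then
            echo "$mod"
        fi
    done
}

# Usage on the server (as root), loading anything missing without a reboot:
#   for m in $(modules_not_loaded); do modprobe "$m"; done
```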
 
As requested twice already in your other "Infiniband not working" thread...

What is listed when you run "more /proc/net/dev"?

As I have explained, ifconfig only displays active interfaces.

Try doing "ifcfg eth0 down" (this has to be done on the console, as you will lose the remote connection unless you have more than one NIC and are connecting on one other than eth0).

Then try "ifconfig".

The eth0 interface will not be shown as it is down, but that does not mean it does not exist.

"ifcfg eth0 up" will bring it back up again.

/proc/net/dev lists all the network interface devices regardless of whether they are up, down or configured. If the interface is listed there, you should only have to set the configuration and then it should work. If not, you have a bigger issue, although from what you are saying I expect that will be the case.
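In /proc/net/dev the first two lines are headers and every following line starts with an interface name and a colon, so listing the names is short work. A sketch with a hypothetical function name (on systems with iproute2, "ip -o link show" gives a similar all-interfaces view):

```shell
#!/bin/sh
# List every network interface named in /proc/net/dev, whether up or down.
list_netdevs() {
    devfile=${1:-/proc/net/dev}
    # Skip the two header lines; the name is everything before the first colon.
    tail -n +3 "$devfile" | awk -F: '{ gsub(/[ \t]/, "", $1); print $1 }'
}

# An IPoIB interface would normally appear as ib0, ib1, ...
#   list_netdevs | grep '^ib' || echo "no IPoIB interfaces"
```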

RB
 
Here are some results of the commands people have advised me to run:

"more /proc/net/dev" lists lo and eth0.

"ibstat" lists the 2 ports, with 1 as down and 2 as initializing.

"ibstat -l" lists mthca0

"ibverbs-utils" shows No IB devices found.
 
You may have to fine-tune these, as I am not used to Ubuntu; I use RHEL and CentOS more, and there may be small differences.

Is opensmd running?

What is the output in /var/log/osm.log?

My understanding is that a subnet manager needs to be running, otherwise you will not get a connection.

What is the port state?
cat /sys/class/infiniband/mthca0/ports/X/state (where X is the port number).
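The state file holds both the numeric and the symbolic state, e.g. "4: ACTIVE". A port that stays at "2: INIT" has a physical link but has not yet been activated by a subnet manager, which fits the "initializing" result reported above and is why opensm matters here. A sketch, assuming the standard sysfs layout and a hypothetical function name (the base directory is a parameter only so it can be tried against a copy of the tree), that prints every port of every HCA:

```shell
#!/bin/sh
# Print "<device> port <n>: <state>" for every IB port under a sysfs tree.
ib_port_states() {
    base=${1:-/sys/class/infiniband}
    for portdir in "$base"/*/ports/*; do
        # If the glob matched nothing, portdir is the literal pattern; skip it.
        [ -f "$portdir/state" ] || continue
        port=${portdir##*/}
        dev=${portdir%/ports/*}
        dev=${dev##*/}
        # The state file looks like "4: ACTIVE"; keep only the symbolic part.
        state=$(sed 's/^[0-9]*: *//' "$portdir/state")
        echo "$dev port $port: $state"
    done
}
```

To see whether the subnet manager is actually running, "pgrep -l opensm" is one quick check.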

RB

 
Ok,

Take a look here. It is fairly old but has a fair bit of info that may help you troubleshoot. It is RHEL-flavoured, but you should be fairly OK with it. There is more than one article on Infiniband, so check the article list on the left for the more basic and advanced diagnostic posts, as well as the primer posts. It seems pretty good as a step-by-step guide. My help from here on is limited, as I do not have an Infiniband card at hand and don't really need one at this point.

One thought: if the subnet manager is up and the card's port 2 is listed as active, are you sure the other end of the connection (i.e. the other server) is set up correctly? Have you tried connecting port 1 to port 2 to get communication working on just this server before trying to connect to another server? The diag articles should be able to guide you through this in any case.

RB
 
I have run through the diags, here are my results:

Code:
shell > lsmod | grep ib_
ib_mthca
ib_qib
ib_ipath
ib_uverbs
ib_addr
ib_umad
ib_cm
ib_sa
ib_mad
ib_core

Code:
shell > ibstat
Port 1: Down
Port 2: Active

This is what I expected to see, as only one of the ports is plugged in.

However when I do:

Code:
shell > ifconfig -a

My Infiniband interface is not showing.

When I do:
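Worth noting: ib_ipoib does not appear in the lsmod output above, and that is the module which creates the ib0 network interface, so "ifconfig -a" has nothing to show until it is loaded (e.g. "modprobe ib_ipoib"). Once it is, IPoIB interfaces can be picked out in sysfs by their link type, which is 32 (ARPHRD_INFINIBAND) rather than 1 for Ethernet. A sketch, assuming the standard /sys/class/net layout and a hypothetical function name:

```shell
#!/bin/sh
# Print network interfaces whose link type is InfiniBand (ARPHRD_INFINIBAND = 32).
ipoib_netdevs() {
    base=${1:-/sys/class/net}
    for ifdir in "$base"/*; do
        [ -f "$ifdir/type" ] || continue
        if [ "$(cat "$ifdir/type")" = "32" ]; then
            echo "${ifdir##*/}"
        fi
    done
}
```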

Code:
shell > ibchecknet

There is an error on port 2:
Code:
/usr/sbin/ibcheckerrs:222:exit:illegal number -1
 
Do you know for sure that both adapters work correctly (and are on the same firmware) and that the cable is ok?

I followed David's instructions and it all worked first time. The only difference is I was using the current desktop version of Ubuntu (because I'm a Linux n00b and wanted a friendly way to edit text files :p).

10GbE is much less hassle ;)
 
The OFED Linux user manual is available from Mellanox here (PDF), and it lists all the ib* commands, although not all of the error meanings. Page 189 covers ibcheckerrs.

Have you tried port 1? Have you tried connecting port 1 to port 2 (i.e. loopback) to see if it can talk to itself?

There are man pages for the commands here as well.

I suspect ibcheckerrs is failing because it is not getting a LID for the adapter. Based on the info you have provided, ibcheckerrs seems to be called by ibchecknet.

You could try
ibnetdiscover --Hca_list --ports # should list the topology of the IB network CAs and ports
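One way to test the LID theory without the ibcheck* scripts is to read the LIDs directly from sysfs: each port directory has a "lid" file holding a hex value such as "0x1", and a LID of 0 means no subnet manager has assigned one. A sketch, assuming that layout and a hypothetical function name:

```shell
#!/bin/sh
# Report the LID of each IB port and flag ports with no LID assigned (0x0).
ib_port_lids() {
    base=${1:-/sys/class/infiniband}
    for portdir in "$base"/*/ports/*; do
        [ -f "$portdir/lid" ] || continue
        dev=${portdir%/ports/*}
        dev=${dev##*/}
        lid=$(cat "$portdir/lid")
        # Shell arithmetic accepts the 0x-prefixed hex value that sysfs reports.
        if [ $(( $lid )) -eq 0 ]; then
            echo "$dev port ${portdir##*/}: no LID (is the subnet manager running?)"
        else
            echo "$dev port ${portdir##*/}: LID $lid"
        fi
    done
}
```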

Still awaiting a response on whether the other IB card is working correctly in the other host, and if so, how have you verified it?

My gut feeling is there may be an issue with the cards, as one refused to work in your Linux machine but the other worked fine in the same slot. As Shad mentioned, what firmware is on each card?

RB
 

10GbE is much less hassle ;)

Yeah but the cost is still very high :(.

RB
 
What would you say if I told you I've just configured a 3 node 10GbE network for only a fraction more than the cost of this 2 node Infiniband setup? ;)

3x Mellanox ConnectX-2 EN dual port SFP+ 10GbE adapters, £~50 ea.
4x Avago AFBR-703SDZ 10Gb SFP+ transceiver modules, £~25 ea.
2x Startech 30m LC-LC fibre patch leads, £~20 ea

Plus shipping. The total was something like £50 more than the IB kit, which I've been able to sell on to cover some of the cost. I also lucked out and got the fibre cables at cost price. But still, the overall cost difference is not that great given the added benefit that it's much easier to setup!

So far I've had 9.61 Gbits/sec using iperf from my Windows 7 i7 workstation to a Server 2012 VM on my ESXi host (I could barely crack 1 Gbits/sec with IPoIB, and there is no need for passthrough since the adapters are fully supported in ESX :)). SMB performance from Server 2012/Windows 8 is looking very promising so far, more than double what I could achieve with IPoIB. (Caveat: at the moment I'm running a Windows 8 VM on my workstation using VMware Workstation 9, and the networking seems to be limited to about 30% utilisation of the 10GbE connection. I expect that limitation to rise significantly or disappear when running Windows 8 on bare metal.)

The firmware on my Windows client is OFED 3.1 win7 x64. On my Ubuntu server it is OFED 1.5.4.1
That sounds like the driver package version to me. You need to use the tools in the OFED package to query the card for its current firmware, and the MFT tools (http://www.mellanox.com/content/pages.php?pg=management_tools&menu_section=34) to flash the latest image. In the nicest possible way, RTFM ;)
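For a quick side-by-side check on the Linux side, each HCA also exposes its firmware version in sysfs as "fw_ver" (the same value ibstat prints as "Firmware version"). A sketch, assuming the standard sysfs layout and a hypothetical function name, that prints it along with "board_id" where the driver provides one; flashing a new image still needs the Mellanox tools linked above:

```shell
#!/bin/sh
# Print "<device>: fw <version>[, board_id <id>]" for each HCA under a sysfs tree.
ib_fw_versions() {
    base=${1:-/sys/class/infiniband}
    for devdir in "$base"/*; do
        [ -f "$devdir/fw_ver" ] || continue
        line="${devdir##*/}: fw $(cat "$devdir/fw_ver")"
        # board_id is not guaranteed for every driver, so only add it if present.
        if [ -f "$devdir/board_id" ]; then
            line="$line, board_id $(cat "$devdir/board_id")"
        fi
        echo "$line"
    done
}
```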
 
What would you say if I told you I've just configured a 3 node 10GbE network for only a fraction more than the cost of this 2 node Infiniband setup? ;)

3x Mellanox ConnectX-2 EN dual port SFP+ 10GbE adapters, £~50 ea.
4x Avago AFBR-703SDZ 10Gb SFP+ transceiver modules, £~25 ea.
2x Startech 30m LC-LC fibre patch leads, £~20 ea

Well, the obvious question would be 'where from?' (email in trust). The prices I am seeing for the Mellanox cards are significantly higher, though the SFP+ modules come close to the prices you list. Based on the prices above, yeah, I would probably have a go :D.

So far I've had 9.61 Gbits/sec using iperf from my Windows 7 i7 workstation to a Server 2012 VM on my ESXi host (I could barely crack 1 Gbits/sec with IPoIB, and there is no need for passthrough since the adapters are fully supported in ESX :)). SMB performance from Server 2012/Windows 8 is looking very promising so far, more than double what I could achieve with IPoIB. (Caveat: at the moment I'm running a Windows 8 VM on my workstation using VMware Workstation 9, and the networking seems to be limited to about 30% utilisation of the 10GbE connection. I expect that limitation to rise significantly or disappear when running Windows 8 on bare metal.)

Very nice. I have seen some latency build-up with my iSCSI setup, nothing too serious, and it would be nice to see what more bandwidth will do. I also have Agility 3 120GB SSDs in all three of my servers, mainly because I got them for around 30 quid each and was intending to use them as fast transfer drives between the servers. They are not really used at the moment, as the links are too slow to make use of them.


That sounds like the driver package version to me. You need to use the tools in the OFED package to query the card for its current firmware, and the MFT tools (http://www.mellanox.com/content/pages.php?pg=management_tools&menu_section=34) to flash the latest image. In the nicest possible way, RTFM ;)

Yep, agreed. That looks like the software version of the Open IB packages on each machine.

RB
 
Hmm, it seems those Mellanox cards do not require SFP+ modules at all and can be used with direct-attach cables. IBM version details here. The cables don't look cheap, though.

RB
 
Once you've enabled the SSH service in ESX you can use WinSCP (or anything else similar) to transfer the .vib onto the host.
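Once the .vib is on the host (for example in /tmp), it can be installed from the ESXi shell with "esxcli software vib install -v", which expects an absolute path; driver VIBs usually ask for a host reboot afterwards. A minimal sketch, assuming the ESXi shell and a hypothetical helper name; the .vib file name in the example is made up:

```shell
#!/bin/sh
# Install a VIB that has already been copied onto an ESXi host.
# Hypothetical helper; run it in the ESXi shell after enabling SSH.
install_vib() {
    vib=$1
    case $vib in
        /*) ;;  # esxcli wants an absolute path to the .vib
        *) echo "install_vib: use an absolute path, e.g. /tmp/driver.vib" >&2; return 1 ;;
    esac
    [ -f "$vib" ] || { echo "install_vib: $vib not found" >&2; return 1; }
    esxcli software vib install -v "$vib"
}

# Example (illustrative file name only):
#   install_vib /tmp/mellanox-driver.vib
```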

I'm not sure installing it will achieve what you need, though. My guess is it will either let ESX use the adapter for iSCSI LUNs or expose it as a networking device that you can use on a vSwitch, but I'd expect that to put you back into IPoIB hell.
 

Ah, OK. I was hoping that using the adapter as a networking device attached to a VM, rather than as a passthrough PCI device, might solve my issues!
 