Mellanox SX6036 basics (setup, licensing, noise, power, VPI)

Intro

The Mellanox SX6036 is a 36-port 56 Gb/s FDR InfiniBand/Ethernet switch. Forwarding is handled by the Mellanox SwitchX-2 switch chip, and the management plane is a big-endian PowerPC chip running MLNX-OS (Linux).

The SX6036 is interesting because:

As for supporting hardware, QSFP 40/50 GbE Mellanox ConnectX-4 Lx Ethernet NICs are about $20 all day at the time of writing, and the switch isn’t particularly picky about who made the optics or DACs that you plug into it (though Mellanox DACs are quite cheap: I picked up a buttload of 1 m and 3 m 40/100 GbE DACs and some breakouts for about $100).

It also supports active/active L2 failover (MLAG), which is extremely rare on things that are cheap, good, and use little power. Though I don’t have a way to justify 72 40 GbE ports at the moment, I thought it was worth mentioning MLAG support since it’s one of the features I always look for when buying network switches.

Anyway, this all makes the SX6036 a wonderful (affordable, fast, and good) switch for > 1 GbE connections in a homelab. It’s so cheap and power efficient that it makes 10 GbE switches look terrible in comparison.

Inventory

In this post, I’ll be using:

Updating firmware

My SX6036 arrived running a version of MLNX-OS from 2012. Getting rid of that was obviously the first order of business.

Here’s a direct link to download MLNX-OS 3.6.8012, the final release for the SX1036/6036 from Nvidia.

You can find many older versions of MLNX-OS from HP here. You’ll need to grab a few of these to do an update from 3.2.x to 3.6.8012. I’ll mirror them somewhere eventually.

From start to end, I flashed my switch with:

I’ve read that you can skip 3.5.x and 3.6.1002 and just go straight to 3.6.8012 from 3.4.x. I read this while waiting for mine to update from 3.5.1006 to 3.6.1002, so I didn’t try it.

I plugged a standard Cisco RJ45 console cable into the switch and into a Linux box on my desk, and got the switch on my network.

Here’s the sh inv and sh ver right after getting it booted:

switch-b7b218 [standalone: master] # sh inventory
================================================================================================
Module                Type                  Part number           Serial Number
================================================================================================
CHASSIS               SX6036                MSX6036F-1SFR         MT1248U03520
MGMT                  SX6036                MSX6036F-1SFR         MT1248U03520
FAN                   SXX0XX_FAN            MSX60-FF              MT1247U00857
PS1                   SXX0XX_PS             MSX60-PF              MT1312X01851
PS2                   SXX0XX_PS             MSX60-PF              MT1247U01045
CPU                   CPU                   SA001203              MT1247U01375
switch-b7b218 [standalone: master] # sh ver
Product name:      SX_PPC_M460EX
Product release:   SX_3.2.0330-70
Build ID:          #1-dev
Build date:        2012-12-16 18:14:46
Target arch:       ppc
Target hw:         m460ex
Built by:          doront@fit15

Uptime:            1h 22m 32.880s

Product model:     ppc
Host ID:           0002C9B7B218
System memory:     105 MB used / 1922 MB free / 2027 MB total
Swap:              0 MB used / 0 MB free / 0 MB total
Number of CPUs:    1
CPU load averages: 0.00 / 0.00 / 0.00

This subnet doesn’t have a DHCP server running right now, so I’ll set mgmt0 up manually. Note that if DHCP is enabled on mgmt0, you must first disable it with the no interface mgmt0 dhcp command; otherwise, the static IP you set won’t be applied to the interface. This was the first Mellanox quirk I ran into.

Mellanox MLNX-OS Switch Management

switch-b7b218 login: admin
Password:
Last login: Thu Oct 16 00:47:21 on ttyS0

Mellanox Switch

switch-b7b218 [standalone: unknown] > en
switch-b7b218 [standalone: unknown] # conf t
switch-b7b218 [standalone: unknown] (config) # no interface mgmt0 dhcp
switch-b7b218 [standalone: unknown] (config) # int mgmt0
switch-b7b218 [standalone: unknown] (config interface mgmt0) # ip address 192.168.77.200 255.255.255.0
switch-b7b218 [standalone: unknown] (config interface mgmt0) # exit
switch-b7b218 [standalone: unknown] (config) # ip route 0.0.0.0 0.0.0.0 192.168.77.254
switch-b7b218 [standalone: unknown] (config) # ip name-server 172.27.254.254
switch-b7b218 [standalone: unknown] (config) # exit
switch-b7b218 [standalone: master] # wr mem
switch-b7b218 [standalone: master] # ping 192.168.77.254 -c 1
PING 192.168.77.254 (192.168.77.254) 56(84) bytes of data.
64 bytes from 192.168.77.254: icmp_seq=1 ttl=64 time=0.613 ms

--- 192.168.77.254 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.613/0.613/0.613/0.000 ms
switch-b7b218 [standalone: master] # sh ip route
Destination       Mask              Gateway           Interface   Source
default           0.0.0.0           192.168.77.254    mgmt0       static
192.168.77.0      255.255.255.0     0.0.0.0           mgmt0       interface
switch-b7b218 [standalone: master] # ping wporter.org -c 1
PING wporter.org (185.199.110.153) 56(84) bytes of data.
64 bytes from cdn-185-199-110-153.github.com (185.199.110.153): icmp_seq=1 ttl=56 time=6.76 ms

--- wporter.org ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 6.767/6.767/6.767/0.000 ms

Great! It’s got connectivity. At this point, you should be able to SSH to the switch with the admin user. Let’s get to updating.

I’ve got the files on a Linux box that’s serving them over HTTP. I’ll use the image fetch command on the switch to pull down those images and update the box. I think you can scp them from the switch (pull from remote) but HTTP was simpler in this case.

First, download the new image to the switch with the image fetch command:

switch-b7b218 [standalone: master] # image fetch http://192.168.77.254/mlnx/image-PPC_M460EX-SX_3.3.4100.img
 100.0%  [#################################################################]

Then, you can list the images on the box with show images:

switch-b7b218 [standalone: master] # sh im
Images available to be installed:
  image-PPC_M460EX-SX_3.3.4100.img
  SX_PPC_M460EX SX_3.3.4100 2013-09-16 18:27:29 ppc

Installed images:
  Partition 1:
  SX_PPC_M460EX SX_3.2.0330-70 2012-12-16 18:14:46 ppc

  Partition 2:
  SX_PPC_M460EX SX_3.2.0330-70 2012-12-16 18:14:46 ppc

Last boot partition: 1
Next boot partition: 1

Boot manager password is set.

No image install currently in progress.

Image signing: trusted signature always required
Admin require signed images: yes

Settings for next boot only:
   Fallback reboot on configuration failure: yes (default)

Install the image with the image install command. Optionally, specify a location (the partition to use) if you have a screwed up image on one of your two boot partitions and would like to overwrite it.

image install image-PPC_M460EX-SX_3.3.4100.img (location 1)

View the installed images again with sh image. The system keeps two OS images on flash, and recovering the switch if both get corrupted doesn’t seem to be too difficult.

switch-b7b218 [standalone: master] (config) # sh image
Images available to be installed:
  image-PPC_M460EX-SX_3.3.4100.img
  SX_PPC_M460EX SX_3.3.4100 2013-09-16 18:27:29 ppc

Installed images:
  Partition 1:
  SX_PPC_M460EX SX_3.2.0330-70 2012-12-16 18:14:46 ppc

  Partition 2:
  SX_PPC_M460EX SX_3.3.4100 2013-09-16 18:27:29 ppc

Last boot partition: 1
Next boot partition: 1

Boot manager password is set.

No image install currently in progress.

Image signing: trusted signature always required
Admin require signed images: yes

Settings for next boot only:
   Fallback reboot on configuration failure: yes (default)

Set the switch to boot from the new image with the image boot next command:

switch-b7b218 [standalone: master] (config) # image boot next
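Before reloading, it can be worth confirming which release the switch will actually boot. The helper below is a minimal sketch for that, meant to run on a Linux box rather than on the switch: paste the show images output into it and it prints the release sitting in the "Next boot partition". The function name next_boot_release is made up for this example, and it assumes the output layout shown in this post; other MLNX-OS releases may format it differently.

```shell
#!/bin/sh
# next_boot_release (hypothetical helper): read pasted `show images` output
# on stdin and print the release installed in the next-boot partition.
# Assumes each "Partition N:" header is followed by one image description
# line, as in the output shown in this post.
next_boot_release() {
    awk '
        /^ *Partition [0-9]+:/ {       # remember which partition we are in
            part = $2
            sub(":", "", part)
            want = 1
            next
        }
        want && NF {                   # first non-empty line after the header
            release[part] = $2
            want = 0
        }
        /^Next boot partition:/ { next_part = $4 }
        END {
            if (next_part in release) print release[next_part]
            else exit 1                # layout not recognised
        }
    '
}
```

Usage: save the show images output to a file and run next_boot_release < images.txt. If it still prints the old release, the image boot next step has not taken effect.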

Write memory and reload:

switch-b7b218 [standalone: master] (config) # wr mem
switch-b7b218 [standalone: master] (config) # reload

Confirm the new image is booted:

switch-b7b218 [standalone: master] > sh ver
Product name:      SX_PPC_M460EX
Product release:   SX_3.3.4100
Build ID:          #1-dev
Build date:        2013-09-16 18:27:29

Delete the image file you just installed:

switch-b7b218 [standalone: master] # image delete image-PPC_M460EX-SX_3.3.4100.img

And continue:

switch-b7b218 [standalone: master] > en
switch-b7b218 [standalone: master] #
switch-b7b218 [standalone: master] #
switch-b7b218 [standalone: master] # image delete image-PPC_M460EX-SX_3.3.4100.img
switch-b7b218 [standalone: master] # image fetch http://192.168.77.254/mlnx/image-PPC_M460EX-3.4.3002.img
 100.0%  [#################################################################]
switch-b7b218 [standalone: master] # image install image-PPC_M460EX-3.4.3002.img
Step 1 of 4: Verify Image
 100.0%  [#################################################################]
Step 2 of 4: Uncompress Image
 100.0%  [#################################################################]
Step 3 of 4: Create Filesystems
 100.0%  [#################################################################]
Step 4 of 4: Extract Image
   5.5%  [###                                                              ]

If you try to jump too far ahead, the image will usually fail to extract with an error. For example, I tried to hop from 3.3.4100 to 3.4.3002 at one point, which didn’t work:

% /bin/tar: Unexpected EOF in archive
/bin/tar: Unexpected EOF in archive
/bin/tar: Error is not recoverable: exiting now
*** Could not extract files from /tmp/mnt_image_wi/tmpfs/unzip/image-PPC_M460EX-3.4.3002.tar : 2

All together:

switch-b7b218 [standalone: master] # image del image-PPC_M460EX-3.4.3002.img
switch-b7b218 [standalone: master] # image fetch http://192.168.77.254/mlnx/image-PPC_M460EX-3.5.1006.img
 100.0%  [#########################################################################################################]
switch-b7b218 [standalone: master] # image install image-PPC_M460EX-3.5.1006.img
Step 1 of 4: Verify Image
 100.0%  [#########################################################################################################]
Step 2 of 4: Uncompress Image
 100.0%  [#########################################################################################################]
Step 3 of 4: Create Filesystems
 100.0%  [#########################################################################################################]
Step 4 of 4: Extract Image
 100.0%  [#########################################################################################################]
switch-b7b218 [standalone: master] # conf t
switch-b7b218 [standalone: master] (config) # image boot next
switch-b7b218 [standalone: master] (config) # wr mem
switch-b7b218 [standalone: master] (config) # reload
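Since every hop repeats the same handful of commands, here is a small sketch that prints them for a given image so they can be pasted into the console. It runs on the Linux box, not on the switch. The function name upgrade_commands and its argument order are made up for this example; the CLI commands it prints are the same ones used in the transcript above, and the base URL is whatever HTTP server holds your images.

```shell
#!/bin/sh
# upgrade_commands (hypothetical helper): print the MLNX-OS CLI commands
# for one upgrade hop, in the order used in this post.
#   $1 = base URL of the HTTP server holding the images
#   $2 = image file name to install in this hop
#   $3 = optional: image file left over from the previous hop, deleted first
upgrade_commands() {
    base_url=${1%/}        # drop one trailing slash so we never emit "//"
    image=$2
    previous=${3:-}
    if [ -n "$previous" ]; then
        echo "image delete $previous"
    fi
    echo "image fetch $base_url/$image"
    echo "image install $image"
    echo "conf t"
    echo "image boot next"
    echo "wr mem"
    echo "reload"
}
```

For example, upgrade_commands http://192.168.77.254/mlnx image-PPC_M460EX-3.5.1006.img image-PPC_M460EX-3.4.3002.img prints the sequence shown in the transcript above. It only prints text; nothing is sent to the switch.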

To shut down the switch, issue the reload halt command, then yank the power once it’s done shutting down the OS.

switch-b7b218 [standalone: master] # reload halt
Configuration has been modified; save first? [yes] yes
Configuration changes saved.
Halting system...
switch-b7b218 [standalone: master] #

System shutdown initiated -- logging off.



Mellanox MLNX-OS Switch Management

Stopping pm: [  OK  ]
Shutting down kernel logger: [  OK  ]
Shutting down system logger: [  OK  ]
Starting killall:  [  OK  ]
Sending all processes the TERM signal...
Sending all processes the KILL signal...
Remounting root filesystem in read-write mode:
Saving random seed:
Syncing hardware clock to system time
Running vpart script:
Unmounting file systems:
Remounting root filesystem in read-only mode:
Running vpart script:
Halting system...
Power down.
System Halted, OK to turn off power

Unlocking shell access

The generic LK2-RESTRICTED_CMDS_GEN2-88A1-NEWD-BPNB-1 license key unlocks “access to restricted system functionality”, including access to the Bash shell via the _shell command. Install it from configuration mode with:

switch-b7b218 [standalone: master] # en
switch-b7b218 [standalone: master] # conf t
switch-b7b218 [standalone: master] (config) # license install LK2-RESTRICTED_CMDS_GEN2-88A1-NEWD-BPNB-1
switch-b7b218 [standalone: master] (config) # exit
switch-b7b218 [standalone: master] # sh licenses
License 1: LK2-RESTRICTED_CMDS_GEN2-88A1-NEWD-BPNB-1
   Feature:          RESTRICTED_CMDS_GEN2
   Description:      Access to restricted system functionality
   Valid:            yes
   Active:           yes
switch-b7b218 [standalone: master] # wr mem

Then, you can access the shell on the SX6036 by typing _shell from enable mode, like so:

switch-b7b218 [standalone: master] # _shell
[admin@switch-b7b218 ~]# whoami
admin
[admin@switch-b7b218 ~]# uname -a
Linux switch-b7b218 3.10.94-MELLANOXuni-m460ex PPC_M460EX jenkins #1 2019-02-13 12:36:41 ppc GNU/Linux
[admin@switch-b7b218 ~]# which bash
/bin/bash

License generator

Importantly, this allows us to get at the genlicense utility, which may be used to unlock other things, like Ethernet. genlicense is actually included in most MLNX-OS images, including the one on your SX6036.

[admin@switch-b7b218 ~]# which genlicense
/opt/tms/bin/genlicense

Running genlicense with no args will spit out a help dialog.

[admin@switch-b7b218 ~]# genlicense
Usage: genlicense <license type ID> <type-specific options ...>

genlicense 1 <feature name> <start date> <end date>
             <tied info id> <tied info str> <secret>

    Feature name:   String of characters not including hyphen '-'
    Start date:     0 for no limit, or date like 2003/10/15
    End date:       0 for no limit, or date like 2003/10/15
    Tied info id:   1 for first MAC address
                    2 for host ID
                    3 for a license not tied to any specific machine
    Tied info str:  For tied info id 1: MAC like 11:33:55:77:99:AA
                    For tied info id 2: Host ID like 8bbf35f82b2c
                    For tied info id 3: ignored (use "-")
    Secret:         Shared secret, quoted or escaped if necessary

genlicense 2 <feature name> <secret> [-h <hash type>] [-u <length>] [-i]
             [-f] [-o <option tag id> <option value>]*

    Feature name:   String of capital letters, numbers, and '_'
    Secret:         Shared secret, quoted or escaped if necessary

    Hash type: defaults to hmac_sha256_48 if not specified.
    Use either the number or the string in parentheses to specify.
       1 (hmac_md5_full): HMAC MD5 with full 128-bit length
       2 (hmac_md5_96): HMAC MD5 truncated to 96 bits
       3 (hmac_md5_48): HMAC MD5 truncated to 48 bits
       5 (hmac_sha256_full): HMAC SHA256 with full 256-bit length
       6 (hmac_sha256_128): HMAC SHA256 truncated to 128 bits
       7 (hmac_sha256_96): HMAC SHA256 truncated to 96 bits
       8 (hmac_sha256_48): HMAC SHA256 truncated to 48 bits

    The '-u' option requests to embed a randomly generated number in the
    license, which serves to make it unique from other licenses with the same
    options.  Useful e.g. if you are using a cumulative informational option.
    The parameter is a length in bits, up to 64.

    The '-i' option requests to add a validator identifier to the license.
    This makes the license five characters longer, but may speed up license
    calculations in some cases.

    The '-f' option forces generation of a license that otherwise failed our
    validation criteria: either (a) it does not meet the current requirements
    for hash length and/or algorithm, or (b) it uses an option tag which is
    not relevant to the particular feature chosen.

    The '-o' option can be used zero or more times to specify options
    for the license.  Each option tag has an id (which encodes the data
    type and meaning of the tag) and a value.  There are two classes of
    option tags, which share the same id namespace.  Specify the id either
    by number, or by the string shown in parentheses after the number:

    Activation option IDs:
        1 (start_date): Start date: license not active before this date
        2 (end_date): End date: license not active after this date
        3 (tied_primary_mac): Tie to this MAC address on primary interface
        4 (tied_host_id): Tie to this host ID
        5 (tied_host_id_hex): Tie to this host ID (lowercase hexadecimal)
       49 (efm_sx_serial_num): Chassis serial number for license verification

    Informational option IDs:
       48 (efm_sx_max_num_hca_ports): Maximum number of HCA ports supported by this MLNX-OS SwitchX license
       50 (efm_sx_active_ports): Active ports number supported by this MLNX-OS SwitchX license
       51 (efm_sx_l2_enabled): Eth L2 enabled by this MLNX-OS SwitchX license
       52 (efm_sx_ib_enabled): IB enabled by this MLNX-OS SwitchX license
       53 (efm_sx_eth_enabled): Eth enabled by this MLNX-OS SwitchX license
       54 (efm_sx_gw_ports): GW ports number supported by this MLNX-OS SwitchX license
       55 (efm_sx_max_ufm_ports): Maximum number of UFM ports supported by this MLNX-OS SwitchX license
       56 (efm_sx_ib_speed_sw_limit): IB port SW speed limit enabled by this MLNX-OS SwitchX license
       57 (efm_sx_eth_speed_sw_limit): Eth port SW speed limit enabled by this MLNX-OS SwitchX license
       58 (efm_sx_l3_enabled): Eth L3 enabled by this MLNX-OS SwitchX license
       59 (efm_sx_fcf_enabled): FCF enabled by this MLNX-OS SwitchX license
       60 (oem_lic_10gbps_enable): 10 Gbps ports licensed by OEM license
       61 (oem_lic_25gbps_enable): 25 Gbps ports licensed by OEM license
       62 (oem_lic_100gbps_enable): 100 Gbps ports licensed by OEM license

Note the “informational” option IDs; these are the switch feature toggles you pass to genlicense.

  1. 48 sets the maximum number of HCA (host channel adapter) ports on the switch. Not entirely sure what this does.
  2. 50 sets the maximum number of active ports on the switch.
  3. 51 specifies whether L2 Ethernet is enabled.
  4. 52 specifies whether InfiniBand is enabled.
  5. 53 specifies whether Ethernet (in general) is enabled.
  6. I think 54 specifies how many Virtual Protocol Interconnect (VPI, IB to Ethernet GW) ports are available.
  7. 55 specifies the number of Unified Fabric Manager (UFM) ports that are available.
  8. 56 and 57 specify the speed limit on Infiniband and Ethernet ports, respectively.
  9. 58 specifies whether L3 Ethernet is enabled.
  10. 59 specifies whether the FCoE Forwarder (FCF) feature is enabled.
  11. 60, 61, 62 are OEM licenses that I don’t think are strictly applicable to the SX6036.

Someone ran a trace on the dumplicense binary a long time ago and found that the shared secret is m2l0n%0x9. If you want to look at it yourself, dumplicense is in the same spot as genlicense and takes a license as input. You’ll probably want to extract the x86 version (see the last part of this section) so you can run ltrace on it. Look for the magic number that dumplicense stores in memory.

Anyway.. to generate a generic license with no time or hardware restriction that will enable basically everything:

[admin@switch-b7b218 ~]# genlicense 2 EFM_SX m2l0n%0x9 -o 51 true -o 52 true -o 53 true -o 54 1024 -o 55 1024 -o 58 true -o 59 true
LK2-EFM_SX-5K11-5L11-5M11-5N31-005P-3100-5T11-5U11-88A1-2953-7H6W-T

That license key will work on any SX6036. Install your generated license on the switch with the config mode license install command:

switch-b7b218 [standalone: master] > en
switch-b7b218 [standalone: master] # conf t
switch-b7b218 [standalone: master] (config) # license install LK2-EFM_SX-5K11-5L11-5M11-5N31-005P-3100-5T11-5U11-88A1-2953-7H6W-T
License was installed successfully. Please wait 1 minute before further configurations.
switch-b7b218 [standalone: master] (config) # wr mem
switch-b7b218 [standalone: master] (config) # sh license
License 1: LK2-RESTRICTED_CMDS_GEN2-88A1-NEWD-BPNB-1
   Feature:          RESTRICTED_CMDS_GEN2
   Description:      Access to restricted system functionality
   Valid:            yes
   Active:           yes

License 2: LK2-EFM_SX-5K11-5L11-5M11-5N31-005P-3100-5T11-5U11-88A1-2953-7H6W-T
   Feature:          EFM_SX
   Description:      Generic SX license
   Valid:            yes
   Active:           yes
   Eth L2 enabled:   true
   IB enabled:       true
   Eth enabled:      true
   GW ports number:  1024
   Max num ufm ports supported: 1024
   Eth L3 enabled:   true
   FCF enabled:      true
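If you end up with several licenses installed, a quick way to check them all at once is to paste the sh license output into a small filter. The sketch below runs on a Linux box, not on the switch. The function name check_licenses is made up for this example, and it assumes the field labels (Feature, Valid, Active) appear exactly as in the output above.

```shell
#!/bin/sh
# check_licenses (hypothetical helper): read pasted `show licenses` output
# on stdin, print "<feature> valid=<..> active=<..>" for each license, and
# return non-zero if any license is not both valid and active.
check_licenses() {
    awk '
        function report() {
            if (feature != "") {
                printf "%s valid=%s active=%s\n", feature, valid, active
                if (valid != "yes" || active != "yes") bad = 1
            }
            feature = ""; valid = ""; active = ""
        }
        BEGIN { bad = 0 }
        /^License [0-9]+:/ { report() }     # a new license block starts
        $1 == "Feature:"   { feature = $2 }
        $1 == "Valid:"     { valid = $2 }
        $1 == "Active:"    { active = $2 }
        END { report(); exit bad }
    '
}
```

Usage: check_licenses < licenses.txt, where licenses.txt holds the pasted output. The exit status makes it easy to use in a larger script.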

Extracting an OS image and running genlicense on an x86 machine

I did this to generate a host license before realizing that the shell access license I’d found online is generic (dummy me). You can generate it, but it’ll always be the same:

[liam@m715q ~]$ /tmp/image/opt/tms/bin/genlicense 2 RESTRICTED_CMDS_GEN2 m2l0n%0x9
LK2-RESTRICTED_CMDS_GEN2-88A1-NEWD-BPNB-1

Anyway, I’m going to leave this here because I already wrote it up, and this is probably how you’d want to go about tracing dumplicense to find the magic number if you were so inclined.

Since the license generator is in the MLNX-OS tarball, we can pull down an x86 version of MLNX-OS (I’m using 3.6.8008 for the SB7800 from https://www.mellanox.com/downloads/Software/image-X86_64-3.6.8008.img), use binwalk to disassemble the img file and extract the filesystem tarball, then run the binary on any x86 machine.

I’ve copied the image (“image-X86_64-3.6.8008.img”) to a Linux machine (Alma 10) that has podman, curl, and git installed (we’ll need these to build and run Binwalk in a container).

First, clone the binwalk source:

git clone https://github.com/ReFirmLabs/binwalk && cd binwalk

Edit the docker_build.sh script to alias docker=podman (yeah, you could also replace the single docker call):

#!/usr/bin/env bash

shopt -s expand_aliases
alias docker=podman

docker build --build-arg SCRIPT_DIRECTORY=$PWD -t binwalkv3 .

Run the build script (no need to elevate, since Podman runs rootless):

./docker_build.sh

Wait for the build to complete. I made the mistake of running it on a Zen 1 machine, so it took a little while.

Make a directory in /tmp for the container, copy the .img there:

mkdir /tmp/analysis
mv ~/image-X86_64-3.6.8008.img /tmp/analysis/

Run the container with automatic private relabeling (:Z) and keep-id (so you don’t have to chown 1000:1000 /tmp/analysis):

podman run --userns=keep-id -t -v /tmp/analysis:/analysis:Z binwalkv3 -Me /analysis/image-X86_64-3.6.8008.img

binwalk should spit out its analysis. In my case, it says that:

You can look at what was extracted from the image by navigating to the binwalk output directory:

[liam@m715q ~]$ cd /tmp/analysis/extractions/image-X86_64-3.6.8008.img.extracted/0/
[liam@m715q 0]$ ls -l
total 585440
-rwxr-xr-x. 1 liam liam      7820 Jul  3  2018 build_version.sh
-rwxr-xr-x. 1 liam liam      5467 Jul  3  2018 build_version.txt
-rw-r--r--. 1 liam liam 573662040 Jul  3  2018 image-X86_64-x86_64-x86_64-20180703-203209.tbz
-rw-r--r--. 1 liam liam       212 Jul  3  2018 image_vars.sh
-rw-r--r--. 1 liam liam       502 Jul  3  2018 md5sums
-rw-r--r--. 1 liam liam  21543657 Jul  3  2018 mfg-initrd-X86_64-x86_64-x86_64-20180703-203209
-rw-r--r--. 1 liam liam   4228528 Jul  2  2018 mfg-kernel-X86_64-x86_64-x86_64-20180703-203209
drwxr-xr-x. 3 liam liam      4096 Oct 19 13:42 mfg-kernel-X86_64-x86_64-x86_64-20180703-203209.extracted
-rw-r--r--. 1 liam liam       363 Jul  3  2018 mfg_vars.sh
-rw-r--r--. 1 liam liam      2033 Jul  3  2018 tpkg-manifest
-rw-r--r--. 1 liam liam       455 Jul  3  2018 tpkg-manifest.sig
-rw-r--r--. 1 liam liam        49 Jul  3  2018 tpkg-vars

I’ll extract the image.tbz tarball:

mkdir /tmp/image
tar xf image-X86_64-x86_64-x86_64-20180703-203209.tbz -C /tmp/image/

If you have a look, you’ll see that this is a MLNX-OS filesystem:

[liam@m715q ~]$ ls /tmp/image/
IBswcountlimits.pm  bin  boot  bootmgr  config  data  debug_symbols  dev  etc  home  initrd  iptables  iptables-multi  iptables-restore  iptables-save  lib  lib64  media  mnt  opt  output  proc  root  run  sbin  share  srv  stats-files  stats-reports  sys  tmp  usr  var  vtmp

We can then find and run the genlicense binary.

[liam@m715q ~]$ find /tmp/image/ -name "genlicense" 2> /dev/null
/tmp/image/opt/tms/bin/genlicense
[liam@m715q ~]$ /tmp/image/opt/tms/bin/genlicense
Usage: genlicense <license type ID> <type-specific options ...>

genlicense 1 <feature name> <start date> <end date>
             <tied info id> <tied info str> <secret>

    Feature name:   String of characters not including hyphen '-'
    Start date:     0 for no limit, or date like 2003/10/15
    End date:       0 for no limit, or date like 2003/10/15
    Tied info id:   1 for first MAC address
                    2 for host ID
                    3 for a license not tied to any specific machine
    Tied info str:  For tied info id 1: MAC like 11:33:55:77:99:AA
                    For tied info id 2: Host ID like 8bbf35f82b2c
                    For tied info id 3: ignored (use "-")
    Secret:         Shared secret, quoted or escaped if necessary

genlicense 2 <feature name> <secret> [-h <hash type>] [-u <length>] [-i]
             [-f] [-o <option tag id> <option value>]*

    Feature name:   String of capital letters, numbers, and '_'
    Secret:         Shared secret, quoted or escaped if necessary

    Hash type: defaults to hmac_sha256_48 if not specified.
    Use either the number or the string in parentheses to specify.
       1 (hmac_md5_full): HMAC MD5 with full 128-bit length
       2 (hmac_md5_96): HMAC MD5 truncated to 96 bits
       3 (hmac_md5_48): HMAC MD5 truncated to 48 bits
       5 (hmac_sha256_full): HMAC SHA256 with full 256-bit length
       6 (hmac_sha256_128): HMAC SHA256 truncated to 128 bits
       7 (hmac_sha256_96): HMAC SHA256 truncated to 96 bits
       8 (hmac_sha256_48): HMAC SHA256 truncated to 48 bits

    The '-u' option requests to embed a randomly generated number in the
    license, which serves to make it unique from other licenses with the same
    options.  Useful e.g. if you are using a cumulative informational option.
    The parameter is a length in bits, up to 64.

    The '-i' option requests to add a validator identifier to the license.
    This makes the license five characters longer, but may speed up license
    calculations in some cases.

    The '-f' option forces generation of a license that otherwise failed our
    validation criteria: either (a) it does not meet the current requirements
    for hash length and/or algorithm, or (b) it uses an option tag which is
    not relevant to the particular feature chosen.

    The '-o' option can be used zero or more times to specify options
    for the license.  Each option tag has an id (which encodes the data
    type and meaning of the tag) and a value.  There are two classes of
    option tags, which share the same id namespace.  Specify the id either
    by number, or by the string shown in parentheses after the number:

    Activation option IDs:
        1 (start_date): Start date: license not active before this date
        2 (end_date): End date: license not active after this date
        3 (tied_primary_mac): Tie to this MAC address on primary interface
        4 (tied_host_id): Tie to this host ID
        5 (tied_host_id_hex): Tie to this host ID (lowercase hexadecimal)
        6 (tied_serialno): Tie to this serial number
        7 (tied_uuid): Tie to this UUID (must be RFC 4122 compliant)
       49 (efm_sx_serial_num): Chassis serial number for license verification

    Informational option IDs:
       48 (efm_sx_max_num_hca_ports): Maximum number of HCA ports supported by this MLNX-OS SwitchX license
       50 (efm_sx_active_ports): Active ports number supported by this MLNX-OS SwitchX license
       51 (efm_sx_l2_enabled): Eth L2 enabled by this MLNX-OS SwitchX license
       52 (efm_sx_ib_enabled): IB enabled by this MLNX-OS SwitchX license
       53 (efm_sx_eth_enabled): Eth enabled by this MLNX-OS SwitchX license
       54 (efm_sx_gw_ports): GW ports number supported by this MLNX-OS SwitchX license
       55 (efm_sx_max_ufm_ports): Maximum number of UFM ports supported by this MLNX-OS SwitchX license
       56 (efm_sx_ib_speed_sw_limit): IB port SW speed limit enabled by this MLNX-OS SwitchX license
       57 (efm_sx_eth_speed_sw_limit): Eth port SW speed limit enabled by this MLNX-OS SwitchX license
       58 (efm_sx_l3_enabled): Eth L3 enabled by this MLNX-OS SwitchX license
       59 (efm_sx_fcf_enabled): FCF enabled by this MLNX-OS SwitchX license
       60 (oem_lic_10gbps_enable): 10 Gbps ports licensed by OEM license
       61 (oem_lic_25gbps_enable): 25 Gbps ports licensed by OEM license
       62 (oem_lic_100gbps_enable): 100 Gbps ports licensed by OEM license

This is the same thing that you get on an SX6036, just compiled for x86. I was able to install a license generated with it on my switch.

If you want to trace dumplicense, install gdb and ltrace, then ltrace dumplicense {license}. Look for the repeated memory addresses in memcmp calls. Run gdb and examine the memory location that’s being referenced. gdb is broken on the box I was using so I didn’t go through the whole process, but I found a few addresses of interest.

Changing the system profile to VPI mode (enable Infiniband AND Ethernet)

You can use the system profile commands to change the system from InfiniBand mode to Ethernet or VPI (both) mode. The SX6036, being an InfiniBand switch, ships in ib-single-switch mode. I’ll change mine to “Virtual Protocol Interconnect” (VPI) mode, which allows you to switch ports between IB and Ethernet mode.

switch-b7b218 [standalone: master] (config) # system profile vpi-single-switch
Warning - confirming will cause system reboot and all configuration will be deleted
Type 'yes' to confirm profile change: yes

The management configuration will not be wiped by this change. When the switch reloads, you can set specific ports (or all of them) to Ethernet mode.

A manual for configuring the switch may be found here, from IBM.

Quieting the fans

You can run fae mlxi2c commands to slow the fans in the SX6036 from the CLI:

switch-b7b218 [standalone: master] # fae mlxi2c set_fan /FAN/FAN 1 30
switch-b7b218 [standalone: master] # fae mlxi2c set_fan /PS2/FAN 1 30
switch-b7b218 [standalone: master] # fae mlxi2c set_fan /PS1/FAN 1 30

This brings the switch down to a much more tolerable noise level, comparable to my Cisco 3750X and 3850 1 GbE non-PoE switches (the UPOE multigig ones are much louder). It also drops power consumption on my switch to around 42 W at idle (4 connections up).

On a related note, you can use the show temperature command to see sensor readouts from your switch:

switch-b7b218 [standalone: master] # sh temp
---------------------------------------------------------
Module      Component              Reg  CurTemp    Status
                                        (Celsius)
---------------------------------------------------------
MGMT        SX                     T1   32.00      OK
MGMT        QSFP_TEMP1             T1   25.50      OK
MGMT        QSFP_TEMP2             T1   26.50      OK
MGMT        QSFP_TEMP3             T1   26.00      OK
MGMT        BOARD_MONITOR          T1   31.00      OK
MGMT        CPU_BOARD_MONITOR      T1   30.00      OK
MGMT        CPU_BOARD_MONITOR      T2   62.00      OK

Note the switch chip (SX) and CPU temperatures. The little PowerPC chip gets quite warm.
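If you want to keep an eye on the hottest sensor while experimenting with fan speeds, a tiny filter over the pasted sh temp output does the job. This is a sketch that runs on a Linux box; the function name hottest_sensor is made up for this example, and it assumes the five-column table layout shown above.

```shell
#!/bin/sh
# hottest_sensor (hypothetical helper): read pasted `show temperature`
# output on stdin and print the hottest reading as
# "<component> <register> <temperature>".
hottest_sensor() {
    awk '
        # Data rows have five fields with a numeric temperature in column 4;
        # the header, unit, and separator lines do not match.
        NF == 5 && $4 ~ /^[0-9]+(\.[0-9]+)?$/ {
            if (!seen || $4 + 0 > max) {
                max = $4 + 0
                row = $2 " " $3 " " $4
                seen = 1
            }
        }
        END { if (seen) print row; else exit 1 }
    '
}
```

Fed the table above, it picks out the second CPU_BOARD_MONITOR register, which is the one that runs warm on my switch.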

MLNX-OS seems to use SysVinit, so to make your fan tweaks persistent, you can write a script that sets fan_speed and the maximum at startup, then either call it from rc.local or install it as a service under /etc/init.d.

I’ve created a really simple fanspeed script as a service under /etc/init.d referencing this post on the STH forums. I found that I could just set my PSU fans to 10% and chassis fans to 20% without triggering the temp alarm and a fan speed spike; this may differ depending on your ambient conditions. This quiets my switch down nicely (barely louder than a 1G Cisco 3750X-48T/non-PoE switch that I’ve got kicking around, perfectly acceptable for a closet somewhere).

If you’re feeling really fancy you could probably do something cool and write a PID control loop for fan control; I was not feeling really fancy, and the switch will ramp the fans on its own if it gets too hot. Maybe a future project.
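For what it’s worth, the core of such a loop doesn’t need to be complicated. Here is a rough sketch of just the proportional term (not a full PID controller) as a shell function. The function name and all four constants are made up for illustration and are not tuned for this switch.

```shell
#!/bin/sh
# fan_duty TEMP_C (hypothetical helper): map a temperature in whole degrees
# Celsius to a fan duty cycle in percent. Proportional term only: MIN_DUTY
# at or below TARGET_C, plus GAIN percent per degree above it, capped at
# MAX_DUTY. All four constants are illustrative guesses, not tuned values.
TARGET_C=55
MIN_DUTY=20
MAX_DUTY=100
GAIN=4

fan_duty() {
    temp=$1
    duty=$MIN_DUTY
    if [ "$temp" -gt "$TARGET_C" ]; then
        duty=$(( MIN_DUTY + GAIN * (temp - TARGET_C) ))
    fi
    if [ "$duty" -gt "$MAX_DUTY" ]; then
        duty=$MAX_DUTY
    fi
    echo "$duty"
}
```

A real controller would call this on a timer with the current CPU or switch chip temperature and pass the result along as the fan_speed value; I haven’t tried this on the switch itself.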

To make any changes to the root partition, you’ll need to remount it read/write.

mount -o rw,remount /

Then, dump your new script in:

tee /etc/init.d/fanspeed << 'EOT'
#!/bin/sh
#
# fanspeed.sh custom fan speed setter
#
# chkconfig: 2345 99 99
# description: waits for clusterd (system full boot) then sets fan speed minimums

start() {

    logger "fanspeed.sh: Waiting for clusterd"
    
    count=1
            
    while true; do
    
        if [ "$count" -gt 15 ]; then
            
            logger "fanspeed.sh: FAILED! Timed out waiting for clusterd (15 rep.); aborting."
            break
        
        fi
        
        if pgrep -f "/opt/tms/bin/clusterd" > /dev/null; then

            logger "fanspeed.sh: clusterd is running; system has booted. Starting final 30s wait."
            
            sleep 30
            
            logger "fanspeed.sh: Setting fan speed minimum to 20 (CHA), 10 (PS{1,2})"
                
            /opt/tms/bin/mdreq action /system/chassis/actions/set-fan-speed fan_module string "/FAN/FAN" fan_number int8 1 fan_speed int8 20 set_max uint8 100
            /opt/tms/bin/mdreq action /system/chassis/actions/set-fan-speed fan_module string "/PS1/FAN" fan_number int8 1 fan_speed int8 10 set_max uint8 100
            /opt/tms/bin/mdreq action /system/chassis/actions/set-fan-speed fan_module string "/PS2/FAN" fan_number int8 1 fan_speed int8 10 set_max uint8 100
            
            logger "fanspeed.sh: Set fan speeds. Exiting."
            
            break
            
        else
        
            count=$((count + 1))
            
            sleep 30
            
        fi
        
    done  
    
}

case "$1" in
    start)
        start
        ;;
    stop)
        echo "fanspeed.sh: nothing to stop."
        ;;
    restart)
        $0 stop
        $0 start
        ;;
    *)
        echo "usage: $0 {start|stop|restart}"
        exit 1
        ;;
esac

exit 0
EOT

Make the script executable, then add it to the init list with chkconfig.

chmod +x /etc/init.d/fanspeed
chkconfig --add fanspeed
chkconfig fanspeed on

You shouldn’t get any errors here.

Confirm you can start the fanspeed script:

[admin@switch-b7b218 ~]# /etc/init.d/fanspeed start

And that it’s registered with chkconfig:

[admin@switch-b7b218 ~]# chkconfig --list | grep fanspeed
fanspeed        0:off 1:off 2:on 3:on 4:on 5:on 6:off

Then, reboot to test the script. Once the switch boots up, the fans should quiet down.
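Since the script reports its progress through logger, you can also confirm what happened from the system log rather than by ear. The helper below is a sketch: fanspeed_status is a name I made up, and the /var/log/messages path in the usage note is an assumption about where syslog lands on MLNX-OS.

```shell
#!/bin/sh
# fanspeed_status LOGFILE (hypothetical helper): report whether the init
# script finished, by matching the strings the script passes to logger.
fanspeed_status() {
    if grep -q 'fanspeed.sh: Set fan speeds' "$1"; then
        echo "ok"
    elif grep -q 'fanspeed.sh: FAILED' "$1"; then
        echo "timed out"
    else
        echo "no result yet"
    fi
}
```

For example, fanspeed_status /var/log/messages. Keep in mind the script waits 30 seconds after clusterd appears, so "no result yet" shortly after boot is expected.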

Switching ports to Ethernet

Since I’m running in VPI (Infiniband-or-Ethernet) mode, I’ll have to switch ports from IB mode to Eth mode. This can be done with:

switch-b7b218 [standalone: master] (config) # sh ports type
InfiniBand: 1/1 1/2 1/3 1/4 1/5 1/6 1/7 1/8 1/9 1/10 1/11 1/12 1/13 1/14 1/15 1/16 1/17 1/18 1/19 1/20 1/21 1/22 1/23 1/24 1/25 1/26 1/27 1/28 1/29 1/30 1/31 1/32 1/33 1/34 1/35 1/36
switch-b7b218 [standalone: master] (config) # int ib 1/1-1/36
switch-b7b218 [standalone: master] (config interface ib 1/1-1/36) # shutdown
switch-b7b218 [standalone: master] (config interface ib 1/1-1/36) # exit
switch-b7b218 [standalone: master] (config) # port 1/1-1/36 type ethernet
switch-b7b218 [standalone: master] (config) # int eth 1/1-1/36
switch-b7b218 [standalone: master] (config interface ethernet 1/1-1/36) # no shut
switch-b7b218 [standalone: master] (config interface ethernet 1/1-1/36) # exit
switch-b7b218 [standalone: master] (config) # wr mem
switch-b7b218 [standalone: master] (config) # exit
switch-b7b218 [standalone: master] # sh ports type
Ethernet:   1/1 1/2 1/3 1/4 1/5 1/6 1/7 1/8 1/9 1/10 1/11 1/12 1/13 1/14 1/15 1/16 1/17 1/18 1/19 1/20 1/21 1/22 1/23 1/24 1/25 1/26 1/27 1/28 1/29 1/30 1/31 1/32 1/33 1/34 1/35 1/36

My eth 1/1 port, hooked up to a CX4 LX, links right up at 40 Gbps out of the box.

switch-b7b218 [standalone: master] # sh int eth 1/1

Eth1/1:
  Admin state                               : Enabled
  Operational state                         : Up
  Last change in operational status         : 0:00:48 ago (1 oper change)
  Boot delay time                           : 0 sec
  Description                               : N\A
  Mac address                               : 00:02:c9:8e:e6:ac
  MTU                                       : 1500 bytes (Maximum packet size 1522 bytes)
  Fec                                       : auto
  Flow-control                              : receive off send off
  Actual speed                              : 40 Gbps

If you have a CX3 Pro or a non-LX CX4, force the switchport to speed 56000 to make the two link up at 56 Gbps (cabling permitting). This is a proprietary Mellanox mode, so you need Mellanox cabling/optics and Infiniband HCAs that support Ethernet (since they have hardware support for the FDR, or fourteen data rate, signalling rate).

Connecting a machine with a CX4

I hooked up a computer with a ConnectX-4 LX and another with an SFN7122F (with some breakouts) to verify that things were working.

The CX4-LX links up at 40 Gbps right out of the box (with the in-tree driver in both Fedora 42 and Alma 10).

This NIC doesn’t support the proprietary 56 Gbps Ethernet mode that the full-fat Infiniband HCAs (the non-LX CX4 and some CX3s) do; maybe I’ll pick up some CX3 Pros for my bigger servers (power consumption be damned).

The CX4 LX uses much less power, produces much less heat, and supports ASPM, so giving up 16 Gbps that I wouldn’t saturate isn’t much of a sacrifice anyway.

Anyway, here’s ethtool output:

wporter@a10cx4:~$ sudo ethtool enp1s0np0
Settings for enp1s0np0:
 Supported ports: [ Backplane ]
 Supported link modes:   1000baseT/Full
                         1000baseKX/Full
                         10000baseKR/Full
                         40000baseKR4/Full
                         40000baseCR4/Full
                         40000baseSR4/Full
                         40000baseLR4/Full
                         25000baseCR/Full
                         25000baseKR/Full
                         25000baseSR/Full
                         50000baseCR2/Full
                         50000baseKR2/Full
 Supported pause frame use: Symmetric
 Supports auto-negotiation: Yes
 Supported FEC modes: None  RS  BASER
 Advertised link modes:  1000baseT/Full
                         1000baseKX/Full
                         10000baseKR/Full
                         40000baseKR4/Full
                         40000baseCR4/Full
                         40000baseSR4/Full
                         40000baseLR4/Full
                         25000baseCR/Full
                         25000baseKR/Full
                         25000baseSR/Full
                         50000baseCR2/Full
                         50000baseKR2/Full
 Advertised pause frame use: Symmetric
 Advertised auto-negotiation: Yes
 Advertised FEC modes: None
 Link partner advertised link modes:  Not reported
 Link partner advertised pause frame use: No
 Link partner advertised auto-negotiation: Yes
 Link partner advertised FEC modes: Not reported
 Speed: 40000Mb/s
 Lanes: 4
 Duplex: Full
 Auto-negotiation: on
 Port: Direct Attach Copper
 PHYAD: 0
 Transceiver: internal
 Supports Wake-on: d
 Wake-on: d
 Link detected: yes
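If you only care about the negotiated speed, lane count, and link state (say, to check a handful of machines), you can trim that output down with awk. This is a small sketch. link_summary is a made-up name, and it assumes the "Speed", "Lanes", and "Link detected" labels appear exactly as in the output above.

```shell
#!/bin/sh
# link_summary (hypothetical helper): read ethtool output on stdin and
# print "<speed> <lanes> <link detected>" on one line.
link_summary() {
    awk -F': *' '
        { sub(/^[ \t]+/, "", $1) }          # strip leading indentation
        $1 == "Speed"         { speed = $2 }
        $1 == "Lanes"         { lanes = $2 }
        $1 == "Link detected" { link  = $2 }
        END { print speed, lanes, link }'
}
```

Usage would look like sudo ethtool enp1s0np0 | link_summary.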

For the sake of the example, I’m going to install the proprietary out-of-tree driver. The Ethernet-only “lightweight” driver is no longer supported; it’s been supplanted by the in-tree driver (part of the Linux kernel). The MLNX-OFED driver has been replaced with the DOCA-OFED driver (see the transition guide here (docs.nvidia.com)), and the MLNX-OFED driver’s final LTS release came out at the end of 2024 (no new kernel support).

Since I’m running Alma 10, I’ll need to install the DOCA-OFED driver for RHEL 10. The more complete drivers don’t support EL10 yet (not sure if they ever will - Nvidia really likes Ubuntu).

Install the EPEL repo and activate CRB:

sudo dnf install -y epel-release
sudo /usr/bin/crb enable

Add the GPG key:

wget http://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox-SHA256
sudo rpm --import RPM-GPG-KEY-Mellanox-SHA256

Nvidia’s stock repo file disables gpgcheck. You can re-enable it, as I have below; they do actually sign packages with the GPG key you imported above.

sudo tee /etc/yum.repos.d/doca.repo << 'EOT'
[doca]
name=DOCA Online Repo
baseurl=https://linux.mellanox.com/public/repo/doca/3.1.0/rhel10.0/x86_64/
enabled=1
gpgcheck=1
EOT

Go ahead and install the doca-ofed package now:

sudo dnf install -y doca-ofed

When done, give your machine a reboot to load the new kernel modules. You should then be able to use the Mellanox utilities to work with your NIC.

wporter@a10cx4:~$ sudo mst start
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
Loading MST PCI configuration module - Success
Create devices
Unloading MST PCI module (unused) - Success
wporter@a10cx4:~$ sudo mlxfwmanager
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX4LX
  Part Number:      MCX4131A-GCA_Ax
  Description:      ConnectX-4 Lx EN network interface card; 50GbE single-port QSFP28; PCIe3.0 x8; ROHS R6
  PSID:             MT_2430110032
  PCI Device Name:  /dev/mst/mt4117_pciconf0
  Base MAC:         98039b63e56c
  Versions:         Current        Available
     FW             14.20.1010     N/A
     PXE            3.5.0210       N/A

  Status:           No matching image found

If you want to update firmware (available from Nvidia at network.nvidia.com/support) to get ASPM support, you can do so now.
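When hunting for a firmware image, the PSID is the value that identifies your exact board variant, so it’s handy to pull it (and the current firmware version) out of the query output. A minimal sketch: fw_ident is a made-up name, and it assumes the mlxfwmanager layout shown above with a single device (with several devices, the last one wins).

```shell
#!/bin/sh
# fw_ident (hypothetical helper): read mlxfwmanager query output on stdin
# and print "<PSID> <current FW version>".
fw_ident() {
    awk '$1 == "PSID:" { psid = $2 }   # board identifier
         $1 == "FW"    { fw = $2 }     # second column is "Current"
         END { print psid, fw }'
}
```

Usage would look like sudo mlxfwmanager | fw_ident, which on the card above should give the PSID MT_2430110032 and firmware 14.20.1010.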

Wrapping things up

I think that’s about it. More advanced stuff coming soon.

This switch will be installed soon so it can start serving my clusters, but I like it so much that I might buy another.