Gigabit Switch Performance

We have compared network performance for two inexpensive unmanaged gigabit switches (2003) and two newer managed switches (2006). For these tests we use bidirectional edge exchanges between multiple pairs of nodes: each node in a pair exchanges messages of various sizes with its neighboring node. We report the asymptotic one-way throughput for large messages in MBytes/sec; the theoretical limit for a gigabit link is 125.

We use our own program, netbench, to generate the packets. Although the same capability is found in Netpipe 4.x, the timing in Netpipe is not suitable for our needs. Netpipe measures the time for messages to be written to the send buffer on the transmitter which, in the case of a congested switch, can be orders of magnitude shorter than the time needed to fill the receive buffer. For parallel applications the time to complete the receive is the crucial one, since it determines the elapsed wall-clock time.
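The throughput figure and the difference between send-side and receive-side timing can be illustrated with a little arithmetic. The sketch below is not netbench; the function names, the message count, and both timings are hypothetical values chosen only for illustration. It assumes 1 MByte = 10^6 bytes, which is what makes the gigabit limit come out to 125.

```python
# Illustrative arithmetic only; this is not the netbench source.
GIGABIT_BITS_PER_SEC = 1_000_000_000
BYTES_PER_MBYTE = 1_000_000  # assumption: decimal megabytes


def theoretical_limit_mbytes_per_sec() -> float:
    """Line rate of a gigabit link expressed in MBytes/sec."""
    return GIGABIT_BITS_PER_SEC / 8 / BYTES_PER_MBYTE


def one_way_throughput(message_bytes: int, n_messages: int, elapsed_sec: float) -> float:
    """One-way throughput in MBytes/sec for n_messages of message_bytes each."""
    return message_bytes * n_messages / elapsed_sec / BYTES_PER_MBYTE


limit = theoretical_limit_mbytes_per_sec()

# Hypothetical run: 1000 messages of 256 KBytes each.
message_bytes = 256 * 1024
n_messages = 1000
send_side_sec = 0.5      # hypothetical: time until the data sits in the send buffer
receive_side_sec = 2.9   # hypothetical: time until the receiver holds all the data

apparent = one_way_throughput(message_bytes, n_messages, send_side_sec)
delivered = one_way_throughput(message_bytes, n_messages, receive_side_sec)

print(f"theoretical limit:   {limit:.0f} MBytes/sec")
print(f"send-side timing:    {apparent:.1f} MBytes/sec")
print(f"receive-side timing: {delivered:.1f} MBytes/sec")
```

With these made-up timings the send-side figure exceeds what the wire can carry, which is the kind of misleading result that receive-side timing avoids.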

The switches we have tested are:
  1. Asante GX5-16: 16-port unmanaged Gigabit switch (price in 2003 ~ $2000)
  2. HP 2724: 24-port unmanaged Gigabit switch (price in 2003 ~ $2000)
  3. Force10 S50: 48-port managed Gigabit switch (price in 2005 ~ $4500)
  4. Extreme Networks S400-48T: 48-port managed Gigabit switch (price in 2005 ~ $4500)

Tests on the GX5-16 and 2724 used MPICH, while the more recent benchmarks on the S50 and S400-48T used LAM v7. We used optimum message sizes, typically 128-256 KByte messages. The results are averages over many messages, since there is substantial variation in throughput in these tests, particularly for large numbers of nodes. For the GX5-16 and 2724 we used onboard Intel PRO 1000 NICs, and for the other switches the onboard Broadcom 5721J.

One-way throughput (MBytes/sec) by number of nodes:

# Nodes   GX5-16   2724   S50   S400-48T
      2       90     86    97         97
      4       75     87    93         95
      8       47     60    92         96
     16       38     56    90         96
     32        -      -    86         93

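To put the table in terms of the 125 MBytes/sec limit, the figures can be converted to a percentage of line rate and to the fraction of 2-node throughput retained as nodes are added. This is a minimal sketch: the dictionary layout and function names are illustrative, and it assumes the two 32-node figures belong to the two 48-port switches, since the 16-port and 24-port switches cannot host 32 nodes.

```python
# Throughput figures copied from the table above (MBytes/sec).
# None marks a node count with no reported figure for that switch.
THEORETICAL_LIMIT = 125.0

results = {
    "GX5-16":   {2: 90, 4: 75, 8: 47, 16: 38, 32: None},
    "2724":     {2: 86, 4: 87, 8: 60, 16: 56, 32: None},
    "S50":      {2: 97, 4: 93, 8: 92, 16: 90, 32: 86},   # 32-node column: assumed
    "S400-48T": {2: 97, 4: 95, 8: 96, 16: 96, 32: 93},   # 32-node column: assumed
}


def percent_of_limit(mbytes_per_sec):
    """Throughput as a percentage of the gigabit line rate."""
    return 100.0 * mbytes_per_sec / THEORETICAL_LIMIT


def retained_at(switch, nodes, baseline_nodes=2):
    """Throughput at `nodes` as a fraction of the throughput at the baseline."""
    row = results[switch]
    return row[nodes] / row[baseline_nodes]


for switch, row in results.items():
    cells = []
    for nodes, value in row.items():
        shown = "n/a" if value is None else f"{percent_of_limit(value):.0f}%"
        cells.append(f"{nodes}: {shown}")
    print(f"{switch:9s} " + "  ".join(cells))

for switch in results:
    print(f"{switch:9s} keeps {retained_at(switch, 16):.0%} of its 2-node throughput at 16 nodes")
```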
Some observations:
  1. Price matters. The more expensive switches from Force10 and Extreme Networks easily outperform the cheaper ones, which have a marked tendency to choke when a large number of ports are in use. The utility of the cheaper switches in parallel high-performance computing is therefore limited.
  2. The Force10 switch becomes erratic with larger numbers of ports, sometimes fast, sometimes slower. It works more smoothly with smaller packet sizes, but that reduces throughput. The Extreme switch, on the other hand, maintains high performance with all 48 ports running.
  3. The Extreme Networks S400-48T performed very well in these tests and showed a very low latency, less than 4 microseconds. However, under more rigorous tests after purchase we found significant flaws in this switch as well.
  4. These tests used an earlier version of netbench that did not necessarily saturate all the buffers in the interconnect. Nor did it record the longest time across processes, merely the time on process 0. These data can be used for comparison between these switches, but not necessarily with later netbench results.
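
The effect described in the last observation, timing only process 0 rather than the slowest process, can be sketched as follows. The per-process times and the message volume are hypothetical, not measured values, and the plain list stands in for what an MPI code would normally collect with a reduction such as MPI_Reduce with MPI_MAX.

```python
# Hypothetical per-process elapsed times (seconds) for one exchange;
# index 0 plays the role of process 0. These are not measured values.
elapsed = [2.10, 2.15, 2.90, 2.12]
total_bytes = 256 * 1024 * 1000  # hypothetical: 1000 messages of 256 KBytes


def throughput_mbytes_per_sec(n_bytes, seconds):
    """Throughput in MBytes/sec, taking 1 MByte = 10^6 bytes."""
    return n_bytes / seconds / 1_000_000


# What the earlier netbench would report: the time seen by process 0 only.
rank0 = throughput_mbytes_per_sec(total_bytes, elapsed[0])

# What bounds a parallel application: the slowest process.
slowest = throughput_mbytes_per_sec(total_bytes, max(elapsed))

print(f"process 0 only:  {rank0:.1f} MBytes/sec")
print(f"slowest process: {slowest:.1f} MBytes/sec")
```

When one process lags, the process 0 figure overstates the throughput that a synchronized parallel code would actually see.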