A Tale of Two Switches

We were initially pleased with the performance of the Extreme Networks s400-48t edge switch. When stacked together, four of them offered 192-port switching for less than $20,000. However, as we tested more thoroughly, flaws in this switch emerged as well. We noticed a sharp loss of performance on various global message-passing patterns, which we later traced to the effects of different port arrangements.

Gigabit switches are built from 12- or 16-port modules that sit on a switching backplane. The backplane connection is a critical and expensive component, and typical edge switches cannot support wire-speed connections in all port configurations. Most, like the Force10 S50 and the Extreme Networks s400-48t, use a 10 Gigabit connection to the backplane, so with 12 Gigabit ports per ASIC this connection can be oversubscribed. Since none of these switches can generate stop (pause) frames (they can pass them, but not generate them), some configurations inevitably lose packets.

The tests below illustrate this using different port arrangements. In the A tests, messages are exchanged between sequential pairs of ports, i.e. 1<->2, 3<->4, and so on. In the B tests, the pairs are offset by 12 ports, i.e. 1<->13, 2<->14, and so on. In case A all packets stay within a single ASIC; in case B all packets have to cross the backplane.
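The effect of the two port arrangements can be checked with a little arithmetic. The sketch below is a hypothetical Python model, not a tool we used for these tests, and it only covers the full 48-port case. It assumes ports are numbered 1 to 48, that each consecutive block of 12 ports shares one ASIC, that the B pattern repeats in blocks of 24 ports (1<->13 ... 12<->24, then 25<->37 ... 36<->48), and that every port sends at the full 1 Gbit/s line rate. It counts how much traffic each ASIC must push onto its 10 Gigabit backplane connection.

```python
# Hypothetical model of the A and B port arrangements (not the benchmark code).
# Assumptions: ports numbered from 1, 12 consecutive ports per ASIC,
# 1 Gbit/s per port at full line rate, 10 Gbit/s backplane link per ASIC.
PORTS_PER_ASIC = 12
PORT_GBPS = 1.0
BACKPLANE_GBPS = 10.0


def pairs_a(n_ports=48):
    """Test A: sequential pairs 1<->2, 3<->4, ..."""
    return [(p, p + 1) for p in range(1, n_ports, 2)]


def pairs_b(n_ports=48, offset=12):
    """Test B: pairs offset by 12 ports, 1<->13, 2<->14, ...

    Assumed to repeat in blocks of 2 * offset ports (25<->37, ...).
    """
    pairs = []
    for base in range(1, n_ports + 1, 2 * offset):
        for p in range(base, base + offset):
            if p + offset <= n_ports:
                pairs.append((p, p + offset))
    return pairs


def asic_of(port):
    """Index of the ASIC (port module) a port sits on."""
    return (port - 1) // PORTS_PER_ASIC


def backplane_load(pairs):
    """Gbit/s each ASIC must send onto the backplane, per direction.

    In a bidirectional exchange each endpoint sends PORT_GBPS to its
    partner, so a pair spanning two ASICs adds PORT_GBPS of outgoing
    traffic to each of them. Pairs within one ASIC add nothing.
    """
    load = {}
    for a, b in pairs:
        asic_a, asic_b = asic_of(a), asic_of(b)
        if asic_a != asic_b:
            load[asic_a] = load.get(asic_a, 0.0) + PORT_GBPS
            load[asic_b] = load.get(asic_b, 0.0) + PORT_GBPS
    return load


for name, pairs in (("A", pairs_a()), ("B", pairs_b())):
    worst = max(backplane_load(pairs).values(), default=0.0)
    print(f"test {name}: worst ASIC sends {worst:.0f} Gbit/s, "
          f"load/capacity = {worst / BACKPLANE_GBPS:.2f}")
```

Under these assumptions test A never touches the backplane, while in test B every ASIC must push 12 Gbit/s through a 10 Gbit/s link, a 1.2:1 oversubscription that a switch unable to generate pause frames can only resolve by dropping packets.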

The switches we compared were:
  1. Extreme Networks s400-48t: 48-port managed Gigabit switch (price in 2005 ~ $4,500)
  2. Extreme Networks x450a-48t: 48-port managed Gigabit switch (price in 2006 ~ $6,500)

The bidirectional tests used Intel PRO/1000 NICs (PCI-X) with the MPIGAMMA software layer, since this combination generates the highest data flow of all those we have available. The maximum one-way throughput in a bidirectional test is 116 MBytes/sec, measured with two nodes wired back-to-back. A bidirectional test is more complex than a one-way test, since the acknowledgement packets must mix with the data packets. We used an updated version of netbench, which has more rigorous timing and generates many more packets than the previous version; it tends to report slightly lower throughput.
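For context, the 116 MBytes/sec figure can be compared with what Gigabit Ethernet framing allows. The sketch below is back-of-envelope arithmetic, not part of netbench. It assumes standard Ethernet framing (8 bytes of preamble and start-of-frame delimiter, a 14-byte header, a 4-byte FCS and a 12-byte inter-frame gap), 1500-byte payloads with no VLAN tag or jumbo frames, and it ignores the GAMMA protocol's own header. MBytes are taken as 10^6 bytes, the convention under which 100 MBytes/sec is 80% of the 125 MBytes/sec line rate.

```python
# Back-of-envelope Gigabit Ethernet limits (assumptions noted in comments).
LINE_RATE_BPS = 1_000_000_000  # Gigabit Ethernet line rate, bits/sec
PREAMBLE_SFD = 8               # preamble + start-of-frame delimiter, bytes
HEADER = 14                    # destination MAC, source MAC, EtherType
FCS = 4                        # frame check sequence
IFG = 12                       # minimum inter-frame gap, in byte times


def wire_speed_mbytes():
    """Raw line rate in MBytes/sec (1 MByte = 10**6 bytes)."""
    return LINE_RATE_BPS / 8 / 1e6


def payload_ceiling_mbytes(payload_bytes=1500):
    """Best-case payload throughput for back-to-back frames of this size.

    Ignores any higher-level protocol header (e.g. GAMMA's own), so a real
    messaging layer lands somewhat below this figure.
    """
    on_wire = PREAMBLE_SFD + HEADER + payload_bytes + FCS + IFG
    return wire_speed_mbytes() * payload_bytes / on_wire


measured = 116.0  # back-to-back bidirectional result quoted in the text
ceiling = payload_ceiling_mbytes()
print(f"wire speed       {wire_speed_mbytes():.1f} MBytes/sec")
print(f"payload ceiling  {ceiling:.1f} MBytes/sec")
print(f"measured / ceiling = {measured / ceiling:.1%}")
```

Under these assumptions the back-to-back result sits at roughly 95% of the framing limit, which is why results around 110 MBytes/sec are close to the best this hardware can do.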

Average throughput (MBytes/sec)

             s400-48t      x450a-48t
# Nodes      A      B      A      B
2            110    109    109    109
4            110    108    109    108
8            109    108    109    107
16           107    107    107    107
32           106    106    108    108
48           108    41     107    106

It can be seen that the s400-48t cannot sustain wire speed on all ports in all configurations: its backplane cannot supply the bandwidth needed to prevent oversubscription and packet loss. The consequences of packet loss can be much worse than the table above suggests, since each entry is an average over a large volume of data (40 MBytes). Below we show the minimum throughput as a function of message size for 48 nodes (24 pairs of nodes exchanging messages).

Minimum throughput (MBytes/sec), 48 nodes

                    s400-48t      x450a-48t
Message (KBytes)    A      B      A      B
1                   30     28     28     29
2                   36     34     34     35
4                   39     37     38     39
8                   59     54     59     60
16                  79     61     80     80
32                  95     0.3    96     96
64                  105    33     107    102
128                 73     29     99     92

Stability of throughput across message sizes is important in practical applications, since buffer sizes vary. In this respect the s400-48t can be problematic, since its performance can lag by orders of magnitude in some circumstances: 0.3 MBytes/sec is not a typo. The x450a-48t, which shares hardware with the Black Diamond chassis switch, is qualitatively better, although it too suffers a small performance loss in test B. We have tried several configurations and have not been able to cause the switch significant distress. The last data point may be misleading, since 128 KBytes is the size at which GAMMA would exhaust its credits and require an acknowledgement; in applications we typically use message sizes of up to 96 KBytes.
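The "orders of magnitude" remark can be made concrete by comparing test B with test A for each switch. The short Python sketch below simply transcribes the minimum-throughput table for 48 nodes and finds the message size with the worst B/A ratio; the optional max_size_kb cut-off (an illustrative parameter, not part of any tool we used) reflects the caveat that the 128 KByte point may be misleading.

```python
# Minimum throughput (MBytes/sec) for 48 nodes, transcribed from the table.
SIZES_KB = [1, 2, 4, 8, 16, 32, 64, 128]
MIN_THROUGHPUT = {
    "s400-48t": {"A": [30, 36, 39, 59, 79, 95, 105, 73],
                 "B": [28, 34, 37, 54, 61, 0.3, 33, 29]},
    "x450a-48t": {"A": [28, 34, 38, 59, 80, 96, 107, 99],
                  "B": [29, 35, 39, 60, 80, 96, 102, 92]},
}


def worst_b_over_a(switch, max_size_kb=None):
    """Return (message size in KBytes, B/A ratio) for the worst size.

    max_size_kb optionally drops larger messages, e.g. 96 to stay within
    the sizes typically used in applications.
    """
    rows = MIN_THROUGHPUT[switch]
    candidates = [
        (size, b / a)
        for size, a, b in zip(SIZES_KB, rows["A"], rows["B"])
        if max_size_kb is None or size <= max_size_kb
    ]
    return min(candidates, key=lambda item: item[1])


for switch in MIN_THROUGHPUT:
    size, ratio = worst_b_over_a(switch, max_size_kb=96)
    print(f"{switch}: worst at {size} KBytes, A/B = {1 / ratio:.1f}")
```

For the s400-48t the worst case is the 32 KByte point, where test A runs more than 300 times faster than test B; for the x450a-48t the worst B/A ratio within 96 KBytes stays above 0.95.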

Some observations:
  1. The x450a-48t is fabulous. Not cheap, but I have not tried any other switch that compares. The s400-48t was a good effort, but the x450a is far superior. Interestingly, the key term seems to be "non-blocking" (not "wire speed"): Extreme never claimed the s400 was non-blocking, but they do for the x450 series.
  2. The combination of the Intel PRO/1000, MPIGAMMA, and the x450a-48t gives very high and very stable throughput. For 64 KByte messages you can ALWAYS get at least 100 MBytes/sec (80% of the 125 MBytes/sec Gigabit wire speed), no matter how many ports (up to 48) are used or how they are arranged.
  3. The need for careful benchmarking before purchase cannot be overstated. We tried hard, but our efforts were not enough to get the performance we wanted.
  4. We were fortunate that Extreme Networks was generous in giving us a free swap of all four switches, 6 months after purchase. They didn't have to do that.
  5. The multi-node Netpipe test in v4.x does not distinguish between these switches. Because it times the completion of the send rather than the receive, it fails to notice the switch congestion, even though wall-clock time sometimes increases dramatically.
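The measurement pitfall in the last point can be reproduced in miniature. The Python sketch below is a hypothetical local demonstration, not Netpipe's or netbench's code: a relay thread stands in for a congested switch by holding each message for a fixed delay (switch_delay_s, an arbitrary illustrative parameter) before forwarding it. Stopping the timer when the sender's sendall() returns misses the delay entirely; stopping it when the receiver has the whole message does not.

```python
import socket
import threading
import time


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    chunks, remaining = [], n
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("peer closed before the message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def time_send_vs_receive(message_size=4096, switch_delay_s=0.2):
    """Return (send_elapsed, recv_elapsed) in seconds for one message.

    Path: sender -> relay -> receiver. The relay is a stand-in for a
    congested switch and holds the message for switch_delay_s.
    """
    send_end, relay_in = socket.socketpair()
    relay_out, recv_end = socket.socketpair()
    done = {}

    def relay():
        data = recv_exact(relay_in, message_size)
        time.sleep(switch_delay_s)  # simulated congestion delay
        relay_out.sendall(data)

    def receiver():
        recv_exact(recv_end, message_size)
        done["recv"] = time.perf_counter()

    workers = [threading.Thread(target=relay), threading.Thread(target=receiver)]
    for w in workers:
        w.start()

    start = time.perf_counter()
    send_end.sendall(b"x" * message_size)  # returns once the kernel accepts the data
    send_elapsed = time.perf_counter() - start

    for w in workers:
        w.join()
    for s in (send_end, relay_in, relay_out, recv_end):
        s.close()
    return send_elapsed, done["recv"] - start


send_t, recv_t = time_send_vs_receive()
print(f"timed at send completion:    {send_t * 1e3:.3f} ms")
print(f"timed at receive completion: {recv_t * 1e3:.3f} ms")
```

Real congestion shows up as dropped packets and retransmissions rather than a fixed hold time, but the measurement point matters in the same way: only a timer that stops when the data has arrived sees the slowdown.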