Le Mieux at PSC
The system at the Pittsburgh Supercomputing Center (PSC) entered the TOP500 some 3 years ago as the most powerful supercomputer in academia, with an Rmax of 4463 GigaFlops. It was the first system to deploy the federated QsNet network and comprises 2 rails of Quadrics QsNet. The system is based on the HP AlphaServer SC product and also incorporates Intel processing nodes running Linux, creating a customized heterogeneous QsNet configuration for high-performance graphics and storage.
The Terascale Computing System at the Pittsburgh Supercomputing Center (PSC). Compute power is provided by 750 quad-processor Compaq AlphaServer ES45s connected by two rails of Quadrics interconnect.
In order to scale to this number of servers, a "federated" switch structure is used, which extends the "fat tree" topology of the standard Quadrics switches to multiple switch chassis. Two completely independent Quadrics QsNet networks connect all the servers. This doubles the bandwidth available to applications and improves system resilience.
QsNet Federated Switch Networks
The standard Quadrics 128-way switch implements a 3-stage fat tree within a single switch chassis. However, the architecture of the "Elite 3" switch scales to networks with up to 9 stages supporting over 16000 nodes. The federated switch is a packaging solution that enables very large networks to be implemented with two stages of switch chassis. This requires only simple extensions to the existing QsNet product family.
The switch topology in federated systems remains a fat tree - the physical partitioning of the switch between chassis is not visible to applications. Each factor of 4 increase in system size requires the addition of another level of switches. This does add to the end-to-end latency for longer range communications, but the use of worm-hole routing keeps this to a minimum - approximately 70ns for every additional level of switches. Because the fat tree topology is maintained across large systems, QsNet broadcast operations are supported on federated systems. This gives excellent scaling of global operations. For example, a barrier operation takes 4.7µs across 64 nodes, and 5.3µs across 512 nodes.
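The scaling rule above can be illustrated with a short calculation. This is only a sketch: it takes the two figures quoted in the text (one extra switch level per factor of 4 in system size, roughly 70ns per extra level) at face value, and the function names are hypothetical, not part of any Quadrics software.

```python
# Figures quoted in the text; both are approximate.
LATENCY_PER_LEVEL_NS = 70   # added latency per extra switch level
GROWTH_PER_LEVEL = 4        # system size factor that requires one more level


def extra_switch_levels(base_nodes, target_nodes):
    """Count the switch levels added when growing from base_nodes to target_nodes."""
    if base_nodes < 1 or target_nodes < 1:
        raise ValueError("node counts must be positive")
    levels = 0
    capacity = base_nodes
    # Each additional level multiplies the reachable system size by 4.
    while capacity < target_nodes:
        capacity *= GROWTH_PER_LEVEL
        levels += 1
    return levels


def extra_latency_ns(base_nodes, target_nodes):
    """Approximate added end-to-end latency, in ns, for the larger network."""
    return extra_switch_levels(base_nodes, target_nodes) * LATENCY_PER_LEVEL_NS


if __name__ == "__main__":
    print(extra_switch_levels(64, 1024), extra_latency_ns(64, 1024))
```

Growing from 64 to 1024 nodes is two factor-of-4 steps, so under these assumptions the longest paths gain about 140ns.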
The basic component of the federated switch network is the QM-128F federated switch chassis. It can be configured as one of two types of switch: a node-switch or a top-switch. The node-switch has 4 QM401 16-port switch cards connecting 'down' to 64 separate nodes, 16 QM402 switches providing the third level of switching, and 4 QM407 buffer cards which provide 64 links 'up' to the next level of switches. A 1024-way network requires 16 of these '64 up 64 down' switches, which in turn are connected by 64 independent 16-way switches. This top level of switches is implemented by 64 QM401X switches packaged in 8 QM-128F chassis.
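The chassis counts for the 1024-way network follow from simple port arithmetic, sketched below. The function and parameter names are hypothetical, and the figure of 8 top switches per chassis is inferred from the statement that 64 QM401X switches fit in 8 chassis. The result is only checked against the 1024-way case described in the text; other sizes would be an extrapolation.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def federated_layout(network_ports=1024, links_down=64, links_up=64,
                     top_switch_ports=16, top_switches_per_chassis=8):
    """Estimate chassis counts for a two-stage federated network."""
    # Each node-switch chassis connects 'down' to 64 nodes.
    node_switch_chassis = ceil_div(network_ports, links_down)
    # Every node-switch chassis also exposes 64 links 'up'.
    uplinks = node_switch_chassis * links_up
    # Each independent 16-way top switch terminates 16 of those uplinks.
    top_switches = ceil_div(uplinks, top_switch_ports)
    top_switch_chassis = ceil_div(top_switches, top_switches_per_chassis)
    return {
        "node_switch_chassis": node_switch_chassis,
        "uplinks": uplinks,
        "top_switches": top_switches,
        "top_switch_chassis": top_switch_chassis,
    }


if __name__ == "__main__":
    print(federated_layout())
```

With the defaults this gives 16 node-switch chassis, 1024 uplinks, 64 top switches and 8 top-switch chassis, matching the figures in the text.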
The Terascale Computing System installed at the Pittsburgh Supercomputing Center consists of 750 Compaq ES45 processing nodes, each with four 1 GHz Alpha processors. These are connected by two independent 1024-way Quadrics federated switch networks. Each independent rail has 12 '64 up 64 down' node switch chassis, and a full complement of 8 top switches. The system has a peak performance of 6 Teraflops.
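The quoted figures can be cross-checked with a little arithmetic. The sketch below assumes each Alpha processor completes 2 floating-point operations per clock cycle; that value is not stated in the text and is used here only because it reproduces the 6 Teraflops peak. The names are hypothetical.

```python
NODES = 750
CPUS_PER_NODE = 4
CLOCK_GHZ = 1
FLOPS_PER_CYCLE = 2        # assumption, not stated in the text
NODES_PER_NODE_SWITCH = 64
TOP_SWITCH_CHASSIS_PER_RAIL = 8
RAILS = 2


def peak_teraflops(nodes=NODES, cpus=CPUS_PER_NODE,
                   clock_ghz=CLOCK_GHZ, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak: processors x clock (GHz) x flops per cycle, in Teraflops."""
    gigaflops = nodes * cpus * clock_ghz * flops_per_cycle
    return gigaflops / 1000


def node_switch_chassis_per_rail(nodes=NODES, per_chassis=NODES_PER_NODE_SWITCH):
    """Node-switch chassis needed to give every node one link on a rail."""
    return -(-nodes // per_chassis)  # ceiling division


def total_switch_chassis(rails=RAILS):
    """All QM-128F chassis across both rails (node-switches plus top-switches)."""
    return rails * (node_switch_chassis_per_rail() + TOP_SWITCH_CHASSIS_PER_RAIL)


if __name__ == "__main__":
    print(peak_teraflops(), node_switch_chassis_per_rail(), total_switch_chassis())
```

750 nodes need 12 node-switch chassis per rail (11 would serve only 704 nodes), which agrees with the configuration described above.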