QsNet High Performance Interconnect
QsNetII overview
QsNetII is the leading high-performance interconnect for supercomputer systems. The combination of high bandwidth, ultra-low latency and scalability has made it the network of choice for many of the world's fastest computer systems. Using QsNetII, multi-teraflop systems can be constructed from commodity compute servers. The technology has been developed from the outset to support the requirements of supercomputer-class systems, with the emphasis on performance, resilience, security and data integrity.
QsNetII is designed to connect servers through their high-performance PCIe/PCI-X interfaces. It uses parallel copper links to deliver over 900 Mbytes/s of user-space to user-space bandwidth, and optional optical links extend the maximum link length to over 100 m. QsNetII uses a 'fat tree' topology, which permits scaling up to 4096 nodes. The nodes themselves typically have multiple CPUs, so systems of more than 10,000 CPUs can be constructed. Where high-CPU-count SMP nodes are employed, multiple parallel QsNetII networks can be used in one system to maintain the ratio of compute to communications.

QsNetII hardware is just one part of a complete family of products for building high-performance clusters. Optimized libraries for common distributed-memory programming models exploit the full capabilities of the base hardware, and the kernel communication layer allows system services to take advantage of the performance of QsNet. This software is available as open source for the Linux platform.
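A rough sizing model helps explain the 4096-node limit. The sketch below assumes an idealised quaternary fat tree (a "4-ary n-tree") in which each 8-port switch uses 4 links down and 4 links up; this model is an illustrative assumption, not Quadrics' wiring documentation. With n switch levels such a tree connects 4^n nodes using 4^(n-1) switches per level, so 4096 nodes corresponds to six levels.

```python
def fat_tree_size(nodes: int, radix: int = 4) -> dict:
    """Size an idealised radix-ary fat tree (assumed model).

    Each switch has `radix` down links and `radix` up links, so an
    8-port switch used this way corresponds to radix = 4.
    """
    if nodes < 1:
        raise ValueError("need at least one node")
    # Smallest number of levels whose capacity radix**levels covers `nodes`;
    # an integer loop avoids floating-point log rounding.
    levels, capacity = 1, radix
    while capacity < nodes:
        levels += 1
        capacity *= radix
    switches_per_level = radix ** (levels - 1)
    return {
        "levels": levels,
        "max_nodes": capacity,
        "switches_per_level": switches_per_level,
        "total_switches": levels * switches_per_level,
    }


# Usage: the three network sizes quoted elsewhere on this page.
for n in (256, 1024, 4096):
    print(n, fat_tree_size(n))
```

Sizes that are not a power of 4 round up to the next full tree, which is why the network grows "in powers of 4".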
Network Interface Architecture
QsNetII interfaces to the host computer through the industry-standard PCIe/PCI-X buses. The network interface architecture has been developed to offload the entire task of interprocessor communication from the main processor, and to avoid the overhead of system calls for user-process to user-process messaging. A DMA transfer between two user processes can be initiated with a short sequence of writes to the network interface, with no need for an expensive system call.
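The toy model below illustrates the idea of initiating a DMA without a system call: the user process fills in a descriptor in a command queue mapped into its address space, then "rings a doorbell" with one more ordinary write. The descriptor fields, the 24-byte layout, and the `DmaDescriptor` and `CommandQueue` classes are hypothetical and do not reflect the real Elan4 command format; a `bytearray` stands in for the mapped adapter memory so the sketch runs anywhere.

```python
import struct
from dataclasses import dataclass

# Hypothetical descriptor layout, NOT the real Elan4 command format:
# source VA (u64), destination VA (u64), length (u32), destination node (u32).
DESC_FORMAT = "<QQII"
DESC_SIZE = struct.calcsize(DESC_FORMAT)  # 24 bytes, no padding with "<"


@dataclass
class DmaDescriptor:
    src_va: int
    dst_va: int
    length: int
    dst_node: int


class CommandQueue:
    """Stand-in for a command queue mapped into the process's address space.

    On real hardware the writes below would be stores to adapter memory;
    here a bytearray plays that role. This toy queue does not track
    consumption, so it simply wraps around.
    """

    def __init__(self, slots: int = 8):
        self.slots = slots
        self.mem = bytearray(slots * DESC_SIZE)
        self.tail = 0      # number of descriptors posted so far
        self.doorbell = 0  # last tail value announced to the simulated adapter

    def post(self, d: DmaDescriptor) -> int:
        slot = self.tail % self.slots
        # 1) A few ordinary stores fill in the descriptor...
        struct.pack_into(DESC_FORMAT, self.mem, slot * DESC_SIZE,
                         d.src_va, d.dst_va, d.length, d.dst_node)
        self.tail += 1
        # 2) ...and one more store rings the doorbell. No system call is made.
        self.doorbell = self.tail
        return slot

    def fetch(self, slot: int) -> DmaDescriptor:
        """What the simulated DMA engine would read back from the queue."""
        fields = struct.unpack_from(DESC_FORMAT, self.mem, slot * DESC_SIZE)
        return DmaDescriptor(*fields)


queue = CommandQueue()
posted = queue.post(DmaDescriptor(src_va=0x7F0000001000, dst_va=0x7F0000200000,
                                  length=4096, dst_node=7))
print(queue.fetch(posted))
```

The point of the design is that every step is a plain memory store; protection comes from each process being able to map only its own queue.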
Uniquely, QsNetII can perform I/O to and from paged virtual memory. Users can therefore communicate to and from anywhere in their process space without the overhead of copying or locking down pages.

QsNetII is designed for use within SMP systems: multiple concurrent processes can use the network interface at any time, without any task-switching overhead. Because each client process accesses its own virtual communication processor, each may run its own set of protocols without compromising process-to-process security.

Data transfer is handled by a DMA engine for message output and a hardware input packet handler for message receipt. A dedicated I/O processor offloads protocol handling from the main CPU. Local memory on the PCI card provides storage for buffers, translation tables and I/O adapter code, so that all the available PCI bandwidth is dedicated to data communication.

The actual system performance of QsNetII is determined by the host system's PCIe/PCI-X bus bridge implementation. Performance scales beyond the capacity of a single bus in multi-rail systems, where each PCI segment of an SMP node is connected to an independent QsNetII network. Up to 8 independent rails can be used, provided the base SMP systems have sufficient PCI buses available.

The QsNetII architecture supports 64-bit processor architectures such as the Intel Itanium and AMD Opteron. It provides full support for 64-bit virtual address translation in all network interface functional units, permitting zero-copy transfers across the entire 64-bit address space.

[Figure: Elan4 ASIC]
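Multi-rail operation can be pictured as striping one large transfer across independent networks. The sketch below splits a message into contiguous per-rail chunks; the even split, the 64-byte alignment and the `stripe` helper itself are illustrative assumptions, not a description of how the Quadrics libraries actually schedule traffic across rails.

```python
def stripe(length: int, rails: int, align: int = 64) -> list[tuple[int, int]]:
    """Split a `length`-byte transfer into (offset, size) chunks, one per rail.

    Assumptions for illustration: all rails are equally fast, and every
    chunk starts on an `align`-byte boundary (the last chunk may be shorter).
    Small messages may therefore use fewer rails than are available.
    """
    if not 1 <= rails <= 8:
        raise ValueError("QsNetII supports between 1 and 8 rails")
    if length <= 0:
        return []
    share = -(-length // rails)          # ceiling division
    share = -(-share // align) * align   # round up to a multiple of align
    chunks = []
    offset = 0
    while offset < length:
        size = min(share, length - offset)
        chunks.append((offset, size))  # chunk i would be handed to rail i
        offset += size
    return chunks


# Usage: a 1,000,000-byte transfer over 4 rails
print(stripe(1_000_000, 4))
```

Ideally the aggregate bandwidth grows with the number of rails, but only while each rail sits on its own PCI segment; rails sharing one bus bridge compete for the same bandwidth.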
Software Integration
QsNetII is supported under Linux for the Intel® Xeon® and Itanium® processor families and the AMD Opteron architecture. In addition, QsNet is available for the HP Alpha processor running Tru64 UNIX.

Quadrics MPI provides an optimized implementation of MPI 1.2 that makes full use of the capabilities of the hardware. It is based on MPICH from Argonne National Laboratory, with extensions that use the broadcast and global operations of the QsNetII network and the programmable I/O processor on the network interface. A subset of MPI-2 operations providing one-sided communications is also supported.

The Shmem communications library, whose get and put operations map directly onto remote read and write hardware primitives, provides access to the basic network read and write operations with minimal overhead. Where portability to other interconnects is not an issue, the Quadrics native communication library, libelan, can also be used. Because each application has its own direct and protected access to the network interface, without going through a traditional protocol stack, new communications libraries can be developed and deployed in one part of the machine without compromising the integrity of other applications running on it.

The operating system uses QsNetII through a reliable kernel messaging layer. This supports a range of services such as IP, cluster membership, and bulk data transfer for high-performance file systems.
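To make the one-sided model concrete, here is a single-process toy simulation in which each processing element (PE) owns an identically laid-out memory region, and put and get act directly on a remote PE's memory, mirroring how Shmem put and get map onto remote write and read primitives. The `SymmetricHeap` class and its methods are hypothetical; a real program would call the Shmem library's own routines and run one process per PE.

```python
class SymmetricHeap:
    """Single-process toy model of one-sided put/get (hypothetical class).

    Every processing element (PE) owns a region with the same layout, so an
    offset means the same thing on every PE, as with Shmem's symmetric memory.
    """

    def __init__(self, num_pes: int, words: int):
        self.words = words
        self.mem = [[0] * words for _ in range(num_pes)]

    def _check(self, offset: int, count: int) -> None:
        if offset < 0 or count < 0 or offset + count > self.words:
            raise IndexError("access outside the symmetric region")

    def put(self, dest_pe: int, offset: int, values: list[int]) -> None:
        """Remote write: only the initiator acts; the target does not participate."""
        self._check(offset, len(values))
        self.mem[dest_pe][offset:offset + len(values)] = values

    def get(self, src_pe: int, offset: int, count: int) -> list[int]:
        """Remote read of `count` words starting at `offset` on `src_pe`."""
        self._check(offset, count)
        return self.mem[src_pe][offset:offset + count]


heap = SymmetricHeap(num_pes=4, words=8)
heap.put(dest_pe=2, offset=0, values=[1, 2, 3])   # e.g. PE 0 writes into PE 2
print(heap.get(src_pe=2, offset=0, count=3))
```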
Data Network
The basic component of the QsNetII switch network is an 8-port custom switch ASIC. These can be combined in a 'fat tree' network that scales, in powers of 4, up to many thousands of nodes. Fat tree networks have many properties that make them attractive for high-performance switch fabrics. Most importantly, the bisectional bandwidth of the network scales linearly with network size. The topology is also inherently resilient, with large amounts of redundancy in the higher levels of the switch.

In this topology, packets are routed 'up' the tree to the level from which the destination is reachable; at each stage there are up to 4 alternative up routes. The packet is then routed back down the tree to the destination. As the packet passes through the network it constructs a fast return path for the acknowledgement generated by the destination. The 'up' route is selected by adaptive routing, which sends the packet along the least-loaded alternative path. This ensures efficient use of the network, and also routes around any unconnected or disabled links.

The fat tree topology also enables QsNetII to provide an innovative range-selected broadcast. The packet is routed up the tree to the point at which the entire broadcast range is reachable. As it is routed back down, the switch components automatically copy the message to every destination in the range. The acknowledgements from all the destinations are recombined in the network, so a broadcast succeeds only when all destinations have been reached. Hardware broadcast allows global operations such as barrier synchronisation to be implemented with excellent scaling behaviour. For further information on optimised collectives, see Optimised Collectives on QsNetII.

[Figure: Elite4 ASIC]
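The routing rules can be sketched for an idealised quaternary fat tree. The model assumes node ids are numbered so that a level-k switch reaches a contiguous block of 4^k nodes; a packet then climbs until source and destination fall in the same block, picking the least-loaded usable up link where it has a choice, and descends deterministically. A broadcast to a contiguous range turns at the level where both ends of the range share a block. The numbering, the load figures and the function names are assumptions for illustration, not the Elite4 routing logic.

```python
from typing import Optional, Sequence


def turn_level(src: int, dst: int, radix: int = 4) -> int:
    """Levels a packet climbs before turning down towards `dst`.

    Assumed model: a level-k switch reaches a contiguous block of
    radix**k nodes, so the turn level is the smallest k with
    src // radix**k == dst // radix**k.
    """
    level = 0
    while src != dst:
        src //= radix
        dst //= radix
        level += 1
    return level


def broadcast_turn_level(lo: int, hi: int, radix: int = 4) -> int:
    """A broadcast to the contiguous range lo..hi turns where both ends
    share a block, because every node in between lies in that block too."""
    return turn_level(lo, hi, radix)


def choose_up_link(loads: Sequence[Optional[int]]) -> int:
    """Adaptive choice among the alternative up links.

    `loads` holds a load figure per up link, or None for a link that is
    unconnected or disabled; the least-loaded usable link wins, with ties
    going to the lowest index.
    """
    usable = [(load, i) for i, load in enumerate(loads) if load is not None]
    if not usable:
        raise RuntimeError("no usable up link")
    return min(usable)[1]


print(turn_level(0, 4095))              # → 6: opposite corners of a 4096-node tree
print(broadcast_turn_level(16, 31))     # → 2
print(choose_up_link([3, None, 1, 2]))  # → 2: link 1 is disabled, link 2 is lightest
```

Skipping `None` entries is what lets the same rule route around unconnected or disabled links.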
Stand-alone Systems
Quadrics offers a number of stand-alone switch chassis based on QsNetII technology, for cluster configurations of 8 to 128 nodes. The Enterprise Series (E-Series) switches combine ultra-low latency and high bandwidth with cost-effective configurations, and are targeted at industrial customers as well as research institutions. Potential uses include dedicated ISV codes for industries such as aerospace and automotive. The E-Series is supported under Linux for the Intel Xeon® and Itanium® processor families and the AMD Opteron architecture.

[Figure: QS32 32-port standalone switch]
Federated systems
The basic building block of QsNetII switch networks is the QS5A switch chassis. A single chassis can be configured to provide up to 64 ports of switching, implemented as a 3-stage fat tree network. For switches of more than 64 ports, multiple switch chassis are used in a 'federated' network. Federated switching is a packaging solution that enables very large networks to be implemented with two stages of switch chassis. Although the switch is now physically distributed between multiple chassis, this partitioning is not visible to applications, as the basic switch network topology is unchanged. The lower level of switch chassis, the node switches, have 64 'down links' connected to processing nodes and 64 'up links' connecting to higher levels of the switch network. The up links are connected to multiple independent switches, packaged in the same standard switch chassis.

[Figure: Configuration of a 1024-way system with 16 node-level switches, each connecting 64 nodes, and 8 top-level switches (full bandwidth).]
| Network size | Node switch chassis | Top switch size | Top switch chassis |
|---|---|---|---|
| 256 | 4 | x4 | 2 |
| 1024 | 16 | x16 | 8 |
| 4096* | 64 | x64 | 64 |
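The node switch chassis and top switch size columns of this table follow a simple pattern. The sketch below assumes, as an inference from the table rather than a statement in the text, that each node switch chassis serves 64 nodes and sends one up link to each of 64 independent top-level switches, so every top switch needs one port per node switch chassis. The top switch chassis column depends on how many top switches are packed into each chassis and is not modelled; `federated_layout` is a hypothetical helper.

```python
NODE_CHASSIS_PORTS = 64  # down links (and up links) per node switch chassis


def federated_layout(nodes: int) -> dict:
    """Derive a federated layout for `nodes` (assumed model, see text).

    Each node switch chassis serves 64 nodes and sends one up link to each
    of 64 top-level switches, so every top switch needs one port per node
    switch chassis.
    """
    if nodes <= NODE_CHASSIS_PORTS or nodes % NODE_CHASSIS_PORTS:
        raise ValueError("expects a multiple of 64 nodes, above 64")
    node_chassis = nodes // NODE_CHASSIS_PORTS
    top_switches = NODE_CHASSIS_PORTS
    top_switch_size = node_chassis
    return {
        "node_chassis": node_chassis,
        "top_switch_size": top_switch_size,
        # Full bandwidth: as many top-level ports as there are nodes.
        "top_level_ports": top_switches * top_switch_size,
    }


# Compare with the "Node switch chassis" and "Top switch size" columns.
for n, (chassis, size) in {256: (4, 4), 1024: (16, 16), 4096: (64, 64)}.items():
    layout = federated_layout(n)
    assert (layout["node_chassis"], layout["top_switch_size"]) == (chassis, size)
    assert layout["top_level_ports"] == n
```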
QsNetII Spec
| Parameter | Value |
|---|---|
| Bus interfaces | PCI-X 1.0 / PCIe 1.0a |
| Peak bus bandwidth | 1064 Mbytes/s |
| QsNet link width | 10 bits |
| QsNet line rate | 1.333 Gbaud |
| Sustainable transfer rate | 900 Mbytes/s |
| On-chip cache | 32 Kbytes data + 16 Kbytes instruction |
| Local memory | 64 Mbytes ECC DDR SDRAM |
| Peak memory bandwidth | 2.67 Gbytes/s |
| I/O processor | 200 MHz, 64-bit |
| Physical addressing | 52 bits |
| Virtual addressing | 64-bit VA, 4K/16K contexts |
| MMU | 2 x 64-entry TLB + hash table |
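Several figures in this table follow from simple arithmetic. The sketch below reproduces the 1064 Mbytes/s peak of a 64-bit, 133 MHz PCI-X bus, the 1000 Mbytes/s per-direction peak of a PCIe 1.0a x4 slot (2.5 Gbaud per lane with 8b/10b coding), and the raw signalling rate of a 10-bit QsNetII link at 1.333 Gbaud. Treating the 10 bits as one direction of the link, and lumping line coding and packet protocol into a single overhead, are assumptions; the table does not break the 900 Mbytes/s figure down.

```python
def parallel_bus_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth of a parallel bus: bytes per transfer x million transfers/s."""
    return width_bits / 8 * clock_mhz


def serial_link_mb_s(lanes: int, gbaud: float, coding_efficiency: float = 1.0) -> float:
    """Bandwidth of `lanes` signals at `gbaud`, scaled by the line-code efficiency."""
    return lanes * gbaud * 1000 * coding_efficiency / 8


pci_x = parallel_bus_mb_s(64, 133)           # 64-bit PCI-X at 133 MHz
pcie_x4 = serial_link_mb_s(4, 2.5, 8 / 10)   # PCIe 1.0a x4, 8b/10b, per direction
qsnet_raw = serial_link_mb_s(10, 1.333)      # 10-bit QsNetII link, raw signalling
print(f"PCI-X peak: {pci_x:.0f} MB/s")
print(f"PCIe x4 peak: {pcie_x4:.0f} MB/s per direction")
print(f"QsNetII raw: {qsnet_raw:.0f} MB/s per direction;"
      f" 900 MB/s sustained is {900 / qsnet_raw:.0%} of that")
```

The comparison also shows why the Notes stress the host interface: the sustained link rate is close to what either host bus can deliver.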
[Figures: QM500LP PCI-X adapter; QM509 PCIe adapter; QM700 PCIe adapter]
Network Adapters: QM509 (PCIe), QM500 (PCI-X)
| Feature | QM509/QM500 |
|---|---|
| Processor | Quadrics Elan4 network processor |
| Bus interface | QM509: x4 PCIe Rev. 1.0a. QM500: 64 bit & 128 bit, 133 MHz PCI bus, PCI-X 1.0 |
| Link physical layer | Full duplex 10 bit, 1.3 Gbaud Quadrics QsNetII link. 900 Mbytes/s peak in each direction, after protocol |
| Link logical layer | Remote virtual write (Quadrics proprietary) |
| I/O processor | 200 MHz integrated I/O processor |
| DMA processor | Integrated DMA engine; automatic packetisation and scheduling |
| Input processor | Dedicated input packet processing engine |
| Cache | 32 Kbyte on-chip d-cache, 16 Kbyte on-chip i-cache |
| On-board memory | 64 Mbytes on-board DDR SDRAM with ECC |
| MMU | Dual 128-entry integrated TLB + table walk engine supporting full 64-bit virtual addressing |
| Supported OS | Tru64 UNIX, Linux |
| Communications libraries | MPI 1.2 + MPI 2.0 remote read and write; Shmem*; kernel messaging & IP |
| Physical | Low-profile, half-length PCI card (167.65 mm x 68.90 mm); standard-height (111.15 mm) faceplate also available |
| Power | QM509 (<12 W), QM500 (<10 W), typical |
E-Series Standalone switches
| Model | QS8A | QS32 | QS5A-LA |
|---|---|---|---|
| Number of links | 8 ports | 32 ports | 128 ports |
| Link physical layer | Full duplex 10 bit, 1.3 Gbaud Quadrics QsNetII link | Full duplex 10 bit, 1.3 Gbaud Quadrics QsNetII link | Full duplex 10 bit, 1.3 Gbaud Quadrics QsNetII link |
| Switch architecture | Single stage | 2-stage fully instantiated fat tree | 3-stage fully instantiated fat tree |
| Bisectional bandwidth | 7.2 GByte/s | 28.8 GByte/s | 57.6 GByte/s |
| Physical | 2U 19-inch rack mountable (90mm x 43mm x 260mm) | 4U 19-inch rack mountable (180mm x 430mm x 400mm) | 17U 19-inch rack mountable (750mm x 440mm x 510mm) |
| Power | 40W 120/240 Vac | 180W max 120/240 Vac 50/60Hz | 700W 120/240 Vac 50/60Hz |
QS5A 64 Port QsNetII switch
| Feature | QS5A |
|---|---|
| Number of down links | 16 to 64 ports from up to 4 QM501-C 16-port cards |
| Number of up links | 16 to 64 ports from up to 4 QM502-C 16-port cards |
| Link physical layer | Full duplex 10 bit, 1.3 Gbaud Quadrics QsNetII link |
| Switch architecture | 3-stage fully instantiated fat tree |
| Bisectional bandwidth | 58 GByte/s |
| Physical | 17U 19-inch rack mountable (0.75 m x 0.44 m x 0.51 m) |
| Power | Dual redundant 1.25 kW PSU, 120/240 Vac |
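The bisection figures quoted for the QS8A (7.2 GByte/s), QS32 (28.8 GByte/s) and QS5A (58 GByte/s, rounded) are consistent with a simple rule, assumed here rather than stated by Quadrics: cut the network into two halves, count the ports/2 full-bandwidth paths that cross the cut, and credit each with the sustained 900 Mbytes/s in each of its two directions. The rule also makes the linear scaling of a fat tree's bisectional bandwidth explicit.

```python
LINK_GBYTE_S = 0.9  # sustained per link, per direction (spec table: 900 Mbytes/s)


def bisection_gbyte_s(ports: int) -> float:
    """Bisection bandwidth of a full-bandwidth fat tree with `ports` nodes.

    Cutting the network into two halves severs ports/2 paths, each
    carrying LINK_GBYTE_S in each of its two directions.
    """
    return (ports / 2) * 2 * LINK_GBYTE_S


for name, ports in (("QS8A", 8), ("QS32", 32), ("QS5A", 64)):
    print(f"{name}: {bisection_gbyte_s(ports):.1f} GByte/s")
```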
Notes
QsNet performance is dependent upon the host PCI interface. Performance figures given in this document are indicative of what can be achieved, but do not represent a commitment for any particular system.
Tru64 UNIX is a registered trademark of Hewlett-Packard.
Linux is a registered trademark of Linus Torvalds.
Quadrics® is a registered trademark of Quadrics Ltd.
All other trademarks are the property of their respective owners.
Latest news: Quadrics QsTenG for HPC Interconnect Product Family (13 Nov 2007).