
PLATFORMS SUPPORTED


QsNet Supported Node Types

This page describes the node types for which Quadrics has tested and confirmed the correct operation of our QsNet and QsNetII network adapters (Quadrics network adapters are often known by the names of their ASICs: Elan3 for QsNet and Elan4 for QsNetII). Information is also provided on chipsets, performance considerations and node architecture/firmware issues that should be taken into account when building a high-performance cluster. Known problems that are not subject to NDA are also covered. Appendix 1 covers a number of common questions on selection of a PCI slot. Details on Elan4 bus requirements and performance issues are provided in Appendix 2.
Quadrics customers can select nodes from this document without seeking our help in qualification, although this help is available as part of our pre-sales support service. Customers wishing to bid nodes that are not covered by this document are strongly advised to qualify correct functionality and performance before committing to an order.


NODES WITH GENERIC INTEL CHIPSETS

Quadrics network adapters have been qualified in Intel Xeon nodes that use the Intel 7500, 7501 and 7505 chipsets or the Serverworks GC chipset. A variety of nodes have been tested, including those from HP, IBM, Dell and Serverworks. Quadrics network adapters have been qualified for use in Intel "Tiger" nodes, 2 or 4 processor IA64 nodes using the Intel E8870 chipset.
The Intel PCI-X bridge (P64H2) must be B1 stepping or above on both IA32 and IA64 systems. Early P64H2 chipsets had a bug that caused data corruption, so the Quadrics drivers check the revision of this chipset and generate an error if an older version is detected.
The following versions of the Intel IO Hub (IOH) and Scalable Node Controller (SNC) chips are recommended:
IOH - C2 or newer
SNC - C1 or newer
Performance is reduced on older versions.
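The stepping requirements above can be expressed as a simple comparison. The following is a minimal sketch only, not the check performed by the Quadrics drivers: the function names and the table are hypothetical, and it assumes steppings are ordered by letter first and digit second (A0 < A1 < B0 < B1 < C0 and so on). Note that the P64H2 entry is a hard requirement, whereas the IOH and SNC entries are recommendations.

```python
# Hypothetical table built from the minimum steppings listed above.
MINIMUM_STEPPINGS = {"P64H2": "B1", "IOH": "C2", "SNC": "C1"}


def parse_stepping(stepping: str) -> tuple[str, int]:
    """Split a stepping such as 'B1' into ('B', 1)."""
    s = stepping.strip().upper()
    return s[0], int(s[1:])


def meets_minimum(actual: str, minimum: str) -> bool:
    """True if 'actual' is the same stepping as 'minimum' or a later one."""
    return parse_stepping(actual) >= parse_stepping(minimum)


def chips_below_minimum(found: dict[str, str]) -> list[str]:
    """Return the chips in 'found' whose stepping is older than listed."""
    return [
        chip
        for chip, minimum in MINIMUM_STEPPINGS.items()
        if chip in found and not meets_minimum(found[chip], minimum)
    ]


# Example node: only the P64H2 bridge is too old.
print(chips_below_minimum({"P64H2": "B0", "IOH": "C2", "SNC": "D0"}))
```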
Quadrics is working on support for Intel's 3.0 GHz Xeon EM64T nodes and their associated chipset (E7525). Driver support will be available shortly.

For a full test report please refer to the Compliance and Performance page.


NODES WITH GENERIC AMD CHIPSETS

Quadrics QsNetII network adapters have been qualified in a variety of AMD Opteron nodes with the standard 'Gollum' chipset. We have tested the following node types for use with QsNetII:
  • AMD/Celestica 4U 4P A8440 model 'Quartet'
  • AMD/Celestica 1U 2P A2210 model 'Melody'
  • Newisys 1U 2P 2100 model
  • Tyan 1U B2882
    Only the Elan4 adapter is recommended. The older Elan3 adapters are not recommended for Opteron systems, although they have been tested and are known to work.

    For a full test report please refer to the Compliance and Performance page.


    HP NODES

    Quadrics network adapters are qualified for use in HP AlphaServer SC and XC Linux clusters (see www.hp.com/techservers). Contact your local HP representative for information on these systems.
    Quadrics network adapters have been qualified for use in RX2620 and RX4640 Integrity servers. The RX2600 has one full speed PCI-X slot and three reduced speed slots (100 MHz). Elan3 adapters can be used in any of these slots and two Elan3 adapters may be used in a single node, but an Elan4 adapter requires the full speed slot. The RX4640 has two full speed PCI-X slots; each node can support 1 or 2 Elan3 or Elan4 adapters.
    Quadrics network adapters are also known to operate correctly in HP DL360 and DL380 servers. Performance is reduced in the DL360, which only has a 100MHz PCI-X bus.
    HP DL145 and DL585 Opteron servers are known to work with Elan4.

    For a full test report please refer to the Compliance and Performance page.


    BULL NODES

    Quadrics network adapters are qualified for use in Bull NovaScale nodes, shared memory IA64 nodes with between 4 and 32 CPUs per node.

    Please contact your local Bull representative for information on these systems.


    SGI NODES

    The QsNetII network adapters have been qualified for SGI Altix 350 systems (the 2-processor IA64 nodes). Validation in the 8-processor systems is in progress. Support for Quadrics patches, drivers and kernel modules is included in SGI ProPack 3. Contact your local SGI representative for further information on these systems.


    NEC NODES

    The QsNet network adapters have been qualified in NEC TX-7 systems. The TX-7 is a shared memory IA64 SMP with up to 32 CPUs per node. For further information on these systems please contact:
    NEC High Performance Computing Europe GmbH
    Prinzenallee 11
    D-40549 Düsseldorf
    +49-211-5369-0
    +49-211-5369-199 (fax)
    Email: info@hpce.nec.com
    NEC Corporation
    High Performance Computing Marketing Promotion Division
    7-1 Shiba, 5-chome
    Minato-ku, Tokyo 108-8001 Japan
    +81 81 37 98 91 31
    +81 81 37 98 91 32 (fax)


    OTHER NODES

    Quadrics is working on qualifying Elan4 for use with nodes from a number of other manufacturers. Please contact sales@quadrics.com for further information.


    QUESTIONS AND ANSWERS

    Question: What sort of PCI slot does an Elan adapter require?

    Answer:
    The Elan4 network adapter (QM500) is a Universal, short, 64-bit PCI card that conforms to PCI-X specification 1.0. To obtain the maximum communications bandwidth, it should ideally operate in a PCI-X slot that supports a clock speed of 133 MHz, a 64-bit wide bus and a 3.3 V DC power supply. If such a slot is unavailable, a 64-bit 100 MHz PCI-X slot or a 64-bit 66 MHz PCI slot, at either 5 V or 3.3 V, may be used. The card operates from a 3.3 V DC power supply and consumes less than 10 W.
    The QM400 is a Universal, short, 64-bit PCI card that conforms to PCI specification 2.1. To obtain the maximum communications bandwidth, it should operate in a PCI slot that supports a clock speed of 66 MHz, a 64-bit wide bus and a 3.3 V DC power supply. A PCI-X slot is also recommended. If such a slot is unavailable, a 64-bit, 33 MHz slot at either 5 V or 3.3 V may be used. The card operates from a 3.3 V DC power supply and consumes less than 15 W.
    The choice of slot to use depends on the type of computer system in which the adapter is being installed. It is important to appreciate that Elan adapters make very heavy use of the PCI or PCI-X bus. Inter-node communication performance will be significantly reduced if the PCI or PCI-X bus is servicing other devices. In general, you should arrange for the QM500 or QM400 to have exclusive use of one PCI or PCI-X bus. If this is not possible, try to ensure that it does not share a PCI or PCI-X bus with any other high-speed devices.

    Question: Why do I only get about 500 MBytes/s with an Elan4 card in a 1U server that uses a PCI riser?

    Answer:
    Many 1U and 2U servers make use of small L-shaped risers so that full height PCI-X cards can be installed horizontally above the motherboard. These risers should normally pass the full 133 MHz bus through to the PCI-X card. Unfortunately we have come across several risers that, by grounding one of the edge pins, bring the bus down to 66 MHz. This halves the potential Elan4 performance.
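The figure of about 500 MBytes/s is consistent with the theoretical ceiling of a 64-bit bus at 66 MHz. The following is a rough sketch of that arithmetic, assuming one full-width transfer per clock and ignoring all protocol overhead, so real throughput will be lower than either number. The function name is illustrative only.

```python
def peak_mbytes_per_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak: one transfer of bus_width_bits on every clock."""
    return bus_width_bits / 8 * clock_mhz


full_speed = peak_mbytes_per_s(64, 133)  # 64-bit PCI-X at 133 MHz
downgraded = peak_mbytes_per_s(64, 66)   # same slot forced down to 66 MHz

print(f"133 MHz: {full_speed:.0f} MBytes/s peak")
print(f" 66 MHz: {downgraded:.0f} MBytes/s peak")
```

Under these assumptions the peak falls from 1064 MBytes/s to 528 MBytes/s, which is why a riser that forces 66 MHz operation caps an Elan4 at roughly 500 MBytes/s.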


    OTHER PCI ISSUES

    PCI 2.2
    Because it is PCI-X compliant, the card may also be used in PCI 2.2 buses. This would not allow the full performance of Elan4, but should offer significantly higher performance than Elan3. Elan4 has much larger read buffers than Elan3 and will generate much longer read bursts. This will significantly reduce the effect of a large initial read latency on the total read bandwidth. Elan4 will attempt to generate read bursts of up to 4 KBytes in size.

    Physical Address Bits
    Elan4 will generate full 64-bit physical addresses, where bits 63:50 can take up to 4 different values or patterns and bits 49:0 are generated directly from the output of the Elan4's MMU. In practice most PCI bridge chips use the lower address bits to access main memory and use the high address bits to select different modes of access to main memory. Elan4 can directly address up to 50 bits of physical main memory per node. Elan4 also supports a 64-bit physical MSI address.
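The split between bits 63:50 and bits 49:0 can be illustrated with a few lines of bit arithmetic. This is a sketch only: the helper name is hypothetical, and the meaning of the upper field depends on the PCI bridge in use and is not modelled here.

```python
LOW_BITS = 50                    # bits 49:0 come from the Elan4 MMU
LOW_MASK = (1 << LOW_BITS) - 1


def split_physical_address(addr: int) -> tuple[int, int]:
    """Return (bits 63:50, bits 49:0) of a 64-bit physical address."""
    return addr >> LOW_BITS, addr & LOW_MASK


# 50 address bits cover 2**50 bytes, i.e. 1 PiB of main memory per node.
addressable_bytes = 1 << LOW_BITS
print(addressable_bytes // 2**40, "TiB addressable")

# Hypothetical address: upper field 3, memory offset 0x1234.
upper, offset = split_physical_address((3 << LOW_BITS) | 0x1234)
print(upper, hex(offset))
```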


    QM500 BUS REQUIREMENTS

    This section provides details on requirements of the Elan4 ASIC. See the Elan Reference Manual for further details.

    Configuration space issues

    The Elan4 has two BARs. The standard QM500 card puts the Elan4 into a mode requiring space for two 64-bit PCI BARs. One BAR requests a 26-bit (64 MByte) chunk of PCI address space. This is used to implement an array of command port queues. It is also used to access all the internal Elan4 system control registers and to give the device driver direct access to all the internal register files.
    The other BAR requests a 28-bit (256 MByte) chunk of PCI address space. This is used to provide direct access to the Elan4 local memory. This local memory is implemented with 64 or 128 MBytes of DDR-SDRAM running at 166 MHz. This SDRAM is internally cached in the Elan4 in 32 KBytes of 4-way set associative memory.
    It is possible that a future version of the QM500 card might provide a lot more local DDR-SDRAM. In this case the Elan4 would be set into a mode where the second BAR would request a 31-bit (2 GByte) segment of PCI address space.
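The BAR sizes quoted above follow directly from the number of address bits each BAR requests. The following is a small sketch of that arithmetic, with illustrative names, which may help when checking whether a node has enough free PCI address space for one or more adapters.

```python
MIB = 1 << 20


def bar_size_bytes(address_bits: int) -> int:
    """A BAR that decodes 'address_bits' bits claims 2**address_bits bytes."""
    return 1 << address_bits


register_bar = bar_size_bytes(26)   # command ports and control registers
memory_bar = bar_size_bytes(28)     # window onto Elan4 local memory
future_bar = bar_size_bytes(31)     # possible larger-memory mode

print(register_bar // MIB, "MiB")                  # 64 MiB
print(memory_bar // MIB, "MiB")                    # 256 MiB
print(future_bar // MIB, "MiB")                    # 2048 MiB (2 GiB)
print((register_bar + memory_bar) // MIB, "MiB per standard card")
```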

    PCI-X performance issues

    DMA read

    The Elan4 will issue up to 4 aligned split read requests of up to 512 bytes each. The DMA engine can take the split completion data for each of the requests in any order, but it will only make good progress if the data is returned in the same order in which the requests were issued.
    A large distributed memory system may not return data to the PCI bridge in the same order as the data was requested. This is especially true if the data lives in a dirty processor cache. For maximum performance the PCI bridge should probably start to deliver data as soon as some becomes available, even if this is not the first data requested. If split completion data becomes ready for more than one of the requests, the highest performance is achieved if the PCI bridge starts by delivering the data that was requested first. If the bridge is in the process of delivering data and data becomes ready for an earlier split request, it would be better if the current split completion burst were terminated as soon as possible to allow the earlier data to be delivered instead. The terminated burst could then be completed later. This is not necessary but will improve the total system performance.
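As an illustration of how a larger DMA read might break down into such requests, the following sketch splits a region so that no request exceeds 512 bytes or crosses a 512-byte boundary. This interpretation of "aligned" is an assumption, and the function is a model for reasoning about bus traffic, not a description of the Elan4 DMA engine.

```python
MAX_REQUEST = 512      # maximum bytes per split read request
MAX_OUTSTANDING = 4    # large split reads in flight at once


def split_read_requests(addr: int, length: int) -> list[tuple[int, int]]:
    """Split [addr, addr + length) into (address, size) requests that
    never cross a MAX_REQUEST boundary (assumed meaning of 'aligned')."""
    requests = []
    end = addr + length
    while addr < end:
        boundary = (addr // MAX_REQUEST + 1) * MAX_REQUEST
        size = min(boundary, end) - addr
        requests.append((addr, size))
        addr += size
    return requests


# A 1 KByte read starting 256 bytes into a 512-byte block needs 3 requests,
# so it fits within the 4 requests that may be outstanding.
reqs = split_read_requests(0x100, 1024)
print([(hex(a), n) for a, n in reqs])
print(len(reqs) <= MAX_OUTSTANDING)
```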
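The delivery policy described above can be summarised as "always prefer the earliest-issued request that has data ready". The following is a minimal sketch of that policy as a model of a bridge, using hypothetical request tags; it is not taken from any real bridge implementation.

```python
from typing import Optional, Sequence, Set


def next_to_deliver(issue_order: Sequence[int], ready: Set[int]) -> Optional[int]:
    """Pick the earliest-issued request whose completion data is ready."""
    for tag in issue_order:
        if tag in ready:
            return tag
    return None


def should_preempt(issue_order: Sequence[int], current: int, ready: Set[int]) -> bool:
    """True if data is now ready for a request issued before the one
    currently being delivered, so the current burst could be cut short."""
    best = next_to_deliver(issue_order, ready)
    return best is not None and issue_order.index(best) < issue_order.index(current)


issued = [0, 1, 2, 3]                      # tags in the order Elan4 issued them
print(next_to_deliver(issued, {2, 3}))     # deliver request 2 before 3
print(should_preempt(issued, 2, {1, 3}))   # request 1 became ready: preempt
print(should_preempt(issued, 1, {2, 3}))   # only later requests ready: carry on
```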

    Small reads
    As well as large DMA reads, the Elan4 is optimized to generate high throughput on small reads to main memory. In addition to the 4 split reads of up to 512 bytes, the Elan4 may issue up to a further 8 split read requests of up to 64 bytes each. In practice it is more likely to issue up to 8 split read requests of 8 bytes each. This is useful if random small remote reads are required by an application.
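The benefit of keeping several small reads in flight can be estimated with Little's law: sustained throughput is roughly the amount of data outstanding divided by the round-trip latency. The sketch below uses a purely hypothetical 1 microsecond read latency, which is an illustrative figure and not a measured Elan4 or bridge value.

```python
def sustained_mbytes_per_s(outstanding: int, bytes_each: int, latency_us: float) -> float:
    """Little's law estimate: bytes in flight / latency.
    Bytes per microsecond is numerically equal to MBytes/s (10**6 bytes)."""
    return outstanding * bytes_each / latency_us


LATENCY_US = 1.0  # hypothetical round-trip read latency, for illustration only

one_at_a_time = sustained_mbytes_per_s(1, 8, LATENCY_US)
eight_in_flight = sustained_mbytes_per_s(8, 8, LATENCY_US)

print(one_at_a_time, "MBytes/s with a single 8-byte read outstanding")
print(eight_in_flight, "MBytes/s with eight 8-byte reads outstanding")
```

Whatever the real latency, the ratio between the two cases is what matters: eight outstanding requests give up to eight times the small-read throughput of one.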

    DMA write
    Elan4 has a control bit that affects the way writes are generated. When this bit is clear, the maximum write block size is 128 bytes and the Elan4 is fully compliant with the PCI-X specification. If this bit is set, the Elan4 will generate write block requests of the full 4 KBytes. However, the Elan4 may disconnect at a 128-byte ADB boundary and then never restart the write block transfer to complete the rest of the 4 KByte transfer. This is a violation of a strict interpretation of the PCI-X specification; in practice, however, it is likely that most PCI bridge chips will perform write buffer allocation 'on the fly' and will not lock down data buffers after a write has disconnected.
    The Elan4 will never issue a write with a byte count smaller than the transfer it will attempt to deliver. Where the byte count is greater than the intended transfer size, it will never try to disconnect a write at anything other than a 128-byte ADB boundary.
    This mode has been added because the Elan4 is able to stream data directly from the Elan4 network to the PCI-X bus, and at the time the PCI-X write burst is started the total size of the transfer is unknown. Longer write bursts can significantly increase the maximum write bandwidth and, using this technique, do not significantly increase the latency of the write transfer.
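To make the ADB rule concrete, the following sketch lists the addresses at which a write burst could legally disconnect, given its start address and advertised byte count. The function name is illustrative, and the model covers only the boundary arithmetic, not the Elan4's actual decision about when to disconnect.

```python
ADB = 128  # PCI-X allowable disconnect boundary, in bytes


def adb_disconnect_points(addr: int, byte_count: int) -> list[int]:
    """Addresses strictly inside [addr, addr + byte_count) that fall on a
    128-byte boundary, i.e. where a burst may disconnect."""
    first = (addr // ADB + 1) * ADB
    return list(range(first, addr + byte_count, ADB))


# A 512-byte write starting at 0x40 may disconnect at four boundaries.
print([hex(a) for a in adb_disconnect_points(0x40, 512)])

# A full 4 KByte aligned write has 31 possible disconnect points.
print(len(adb_disconnect_points(0, 4096)))
```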

    PIO read
    The Elan4 supports up to 2 split read requests. It is capable of returning up to 4 KBytes of split completion read data at the full bus bandwidth without a disconnection. However, PIO read performance is not critical to the system, as most processors are not capable of issuing large PIO read requests. If a large transfer of data is required from the Elan4 to main memory, it is far more efficient to program the Elan4 to perform a DMA write burst.

    PIO write
    Low latency, high bandwidth PIO writes are important for good performance in an Elan4 system. The Elan4 is capable of accepting a very long uninterrupted PIO write block transfer, either to Elan4 local memory or as a command sequence for directly generating network packets. The lowest transfer latencies are achieved through the use of PIO writes. The lower the latency from a processor store instruction to the data appearing on the PCI-X bus, the better.
