PCI Express provides a high-speed, high-performance, point-to-point, dual simplex, differential signaling Link for interconnecting devices. Data is transmitted from a device on one set of signals, and received on another set of signals.
The Link - A Point-to-Point Interconnect
As shown in Figure 1-20, a PCI Express interconnect consists of either a x1, x2, x4, x8, x12, x16 or x32 point-to-point Link. A PCI Express Link is the physical connection between two devices. A Lane consists of a differential signal pair in each direction. A x1 Link consists of 1 Lane, or one differential pair per direction, for a total of 4 signals. A x32 Link consists of 32 Lanes, or 32 signal pairs in each direction, for a total of 128 signals. The Link supports a symmetric number of Lanes in each direction. During hardware initialization, the Link width and frequency of operation are negotiated automatically by the devices on opposite ends of the Link. No OS or firmware is involved in Link-level initialization.
Figure 1-20. PCI Express Link
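The Lane-to-signal arithmetic above can be sketched as a small helper (the function name is illustrative, not from the specification):

```python
def link_signal_count(lanes: int) -> int:
    """Each Lane is one differential pair per direction:
    2 wires per pair x 2 directions = 4 signals per Lane."""
    if lanes not in (1, 2, 4, 8, 12, 16, 32):
        raise ValueError("not a defined PCI Express Link width")
    return lanes * 4

# A x1 Link uses 4 signals; a x32 Link uses 128.
```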
Differential Signaling
PCI Express devices employ differential drivers and receivers at each port. Figure 1-21 shows the electrical characteristics of a PCI Express signal. A positive voltage difference between the D+ and D- terminals implies Logical 1. A negative voltage difference between D+ and D- implies a Logical 0. No voltage difference between D+ and D- means that the driver is in the high-impedance tristate condition, which is referred to as the electrical-idle and low-power state of the Link.
Figure 1-21. PCI Express Differential Signal
The PCI Express differential peak-to-peak signal voltage at the transmitter ranges from 800 mV to 1200 mV, while the differential peak voltage is one-half these values. The common mode voltage can be any voltage between 0 V and 3.6 V. The differential driver is DC isolated from the differential receiver at the opposite end of the Link by a capacitor placed at the driver side of the Link. The two devices at opposite ends of a Link may therefore support different DC common mode voltages. The differential impedance at the receiver is matched with the board impedance to prevent reflections from occurring.
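The signaling rules above can be summarized in a tiny decoder sketch (the function and threshold parameter are illustrative, not from the specification):

```python
def decode_differential(d_plus: float, d_minus: float):
    """Map the D+ / D- voltage difference to a logic level.
    Positive difference -> Logical 1, negative -> Logical 0,
    no difference -> electrical idle (driver tristated)."""
    diff = d_plus - d_minus
    if diff > 0.0:
        return 1
    if diff < 0.0:
        return 0
    return None  # electrical idle / low-power Link state
```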
Switches Used to Interconnect Multiple Devices
Switches are implemented in systems requiring multiple devices to be interconnected. Switches can range from a 2-port device to an n-port device, where each port connects to a PCI Express Link. The specification does not indicate a maximum number of ports a switch can implement. A switch may be incorporated into a Root Complex device (Host bridge or North bridge equivalent), resulting in a multi-port root complex. Figure 1-23 on page 52 and Figure 1-25 on page 54 are examples of PCI Express systems showing multi-ported devices such as the root complex or switches.
Figure 1-23. Low Cost PCI Express System
Figure 1-25. PCI Express High-End Server System
Packet Based Protocol
Rather than the bus cycles familiar from PCI and PCI-X architectures, PCI Express encodes transactions using a packet-based protocol. Packets are transmitted and received serially and byte-striped across the available Lanes of the Link. The more Lanes implemented on a Link, the faster a packet is transmitted and the greater the bandwidth of the Link. Packets are used to support the split transaction protocol for non-posted transactions. Various types of packets are defined, such as memory read and write requests, IO read and write requests, configuration read and write requests, message requests and completions.
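Byte striping simply distributes consecutive packet bytes round-robin across the Lanes. A minimal sketch (the function name is illustrative):

```python
def stripe_bytes(packet: bytes, lanes: int) -> list:
    """Distribute consecutive packet bytes round-robin across Lanes,
    as a multi-Lane PCI Express Link byte-stripes a packet."""
    per_lane = [[] for _ in range(lanes)]
    for i, b in enumerate(packet):
        per_lane[i % lanes].append(b)
    return per_lane

# On a x4 Link, bytes 0,4,8,... travel on Lane 0, bytes 1,5,9,... on Lane 1, etc.
```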
Bandwidth and Clocking
As is apparent from Table 1-3 on page 14, the aggregate bandwidth achievable with PCI Express is significantly higher than any bus available today. The PCI Express 1.0 specification supports 2.5 Gbits/sec/lane/direction transfer rate.
No clock signal exists on the Link. Each packet to be transmitted over the Link consists of bytes of information. Each byte is encoded into a 10-bit symbol using 8b/10b encoding, which guarantees that every symbol contains 0-to-1 and 1-to-0 transitions. The receiver uses a PLL to recover a clock from the transitions of the incoming bit stream.
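Because each 8-bit byte travels as a 10-bit symbol, only 8/10 of the 2.5 Gbit/s raw rate carries data. The resulting per-direction bandwidth can be computed as follows (a sketch; the function name is illustrative):

```python
def effective_bandwidth_MBps(lanes: int, raw_gbps: float = 2.5) -> float:
    """Per-direction data bandwidth after 8b/10b encoding:
    each byte is sent as a 10-bit symbol, so 8/10 of the raw
    bit rate is usable data."""
    data_bits_per_sec = lanes * raw_gbps * 1e9 * 8 / 10
    return data_bits_per_sec / 8 / 1e6   # bits/s -> MB/s

# A x1 Link delivers 250 MB/s per direction; a x16 Link delivers 4000 MB/s.
```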
Address Space
PCI Express supports the same address spaces as PCI: memory, IO and configuration address spaces. In addition, the maximum configuration address space per device function is extended from 256 Bytes to 4 KBytes. New OSs, drivers and applications are required to take advantage of this additional configuration address space. Also, a new messaging transaction and address space provides messaging capability between devices. Some messages are PCI Express standard messages used for error reporting, interrupt and power management messaging. Others are vendor-defined messages.
PCI Express Transactions
PCI Express supports the same transaction types supported by PCI and PCI-X. These include memory read and memory write, I/O read and I/O write, configuration read and configuration write. In addition, PCI Express supports a new transaction type called Message transactions. These transactions are encoded using the packet-based PCI Express protocol described later.
PCI Express Transaction Model
PCI Express transactions fall into two categories: non-posted and posted. Non-posted transactions, such as memory reads, implement a split transaction communication model similar to the PCI-X split transaction protocol. For example, a requester transmits a non-posted memory read request packet to a completer. The completer returns a completion packet with the read data to the requester. Posted transactions, such as memory writes, consist of a memory write packet transmitted uni-directionally from requester to completer, with no completion packet returned from completer to requester.
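The two categories can be illustrated with a hypothetical in-memory model (class and method names are invented for illustration; real transactions are packets on the Link):

```python
class Completer:
    """Toy model of a completer device's behavior for the two
    transaction categories."""
    def __init__(self):
        self.mem = {}

    def handle_posted_write(self, addr, data):
        # Posted: the write is accepted; no completion packet is returned.
        self.mem[addr] = data

    def handle_nonposted_read(self, addr):
        # Non-posted (split transaction): a completion packet
        # carrying the read data is returned to the requester.
        return {"type": "Completion", "data": self.mem.get(addr, 0)}

c = Completer()
c.handle_posted_write(0x1000, 0xAB)     # requester expects no reply
cpl = c.handle_nonposted_read(0x1000)   # requester waits for this completion
```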
Error Handling and Robustness of Data Transfer
CRC fields are embedded within each packet transmitted. One of the CRC fields supports a Link-level error checking protocol whereby each receiver of a packet checks for Link-level CRC errors. A packet transmitted over the Link in error is recognized by a CRC error at the receiver, which notifies the transmitter of the error. The transmitter automatically retries sending the packet with no software involvement, normally correcting the error.
In addition, an optional CRC field within a packet allows for end-to-end data integrity checking required for high availability applications.
Error handling on PCI Express can be as rudimentary as the PCI-level error handling described earlier, or robust enough for server-level requirements. A rich set of error logging registers and error reporting mechanisms provides the improved fault isolation and recovery required by RAS (Reliability, Availability, Serviceability) applications.
Quality of Service (QoS), Traffic Classes (TCs) and Virtual Channels (VCs)
The Quality of Service feature of PCI Express refers to the capability of routing packets from different applications through the fabric with differentiated priorities and deterministic latencies and bandwidth. For example, it may be desirable to ensure that isochronous applications, such as video data packets, move through the fabric with higher priority and guaranteed bandwidth, while control data packets may not have specific bandwidth or latency requirements.
PCI Express packets contain a Traffic Class (TC) number between 0 and 7 that is assigned by the device's application or device driver. Packets with different TCs can move through the fabric with different priority, resulting in varying performance. These packets are routed through the fabric using virtual channel (VC) buffers implemented in switches, endpoints and root complex devices.
Each Traffic Class is individually mapped to a Virtual Channel (a VC can have several TCs mapped to it, but a TC cannot be mapped to multiple VCs). The TC in each packet is used by the transmitting and receiving ports to determine which VC buffer to drop the packet into. Switches and devices are configured to arbitrate and prioritize between packets from different VCs before forwarding. This arbitration is referred to as VC arbitration. In addition, packets arriving at different ingress ports are forwarded to their own VC buffers at the egress port. These transactions are prioritized based on the ingress port number when being merged into a common VC output buffer for delivery across the egress link. This arbitration is referred to as Port arbitration.
The result is that packets with different TC numbers could observe different performance when routed through the PCI Express fabric.
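The many-to-one TC-to-VC mapping rule can be captured in a few lines. The mapping values below are one example configuration, not mandated by the specification:

```python
# Example TC-to-VC map: several TCs may share one VC, but each TC
# maps to exactly one VC.
TC_TO_VC = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 2, 7: 3}

def vc_for_packet(tc: int) -> int:
    """Select the Virtual Channel buffer for a packet's Traffic Class."""
    if tc not in range(8):
        raise ValueError("TC must be 0-7")
    return TC_TO_VC[tc]
```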
Flow Control
A packet transmitted by a device is received into a VC buffer in the receiver at the opposite end of the Link. The receiver periodically updates the transmitter with information regarding the amount of buffer space it has available. The transmitter only transmits a packet to the receiver if it knows that the receiving device has sufficient buffer space to hold the next transaction. The protocol by which the transmitter ensures that the receiving buffer has sufficient space available is referred to as flow control. The flow control mechanism guarantees that a transmitted packet will be accepted by the receiver, barring error conditions. As such, the PCI Express transaction protocol does not require support of packet retry (unless an error condition is detected in the receiver), thereby improving the efficiency with which packets are forwarded to a receiver via the Link.
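The credit-based idea can be sketched as follows (a simplified model; real flow control tracks separate credit types per VC, and the class and method names are invented):

```python
class FlowControlLink:
    """Credit-based flow control sketch: the transmitter sends only
    when the receiver has advertised enough free buffer space."""
    def __init__(self, credits: int):
        self.credits = credits           # buffer space advertised by receiver

    def try_send(self, packet_size: int) -> bool:
        if packet_size > self.credits:
            return False                 # hold the packet; no retry needed later
        self.credits -= packet_size
        return True

    def receiver_freed(self, size: int):
        # Periodic flow-control update from the receiver.
        self.credits += size
```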
MSI Style Interrupt Handling Similar to PCI-X
Interrupt handling is accomplished in-band via a PCI-X-like Message Signaled Interrupt (MSI) protocol. A PCI Express device uses a memory write packet to transmit an interrupt vector to the root complex host bridge device, which in turn interrupts the CPU. PCI Express devices are required to implement the MSI capability register block. PCI Express also supports legacy interrupt handling in-band by encoding interrupt signal transitions (for INTA#, INTB#, INTC# and INTD#) using Message transactions. Only endpoint devices that must support legacy functions and PCI Express-to-PCI bridges are allowed to support legacy interrupt generation.
Power Management
The PCI Express fabric consumes less power because the interconnect consists of fewer signals with smaller signal swings. Each device's power state is individually managed. PCI/PCI Express power management software determines the power management capability of each device and manages it individually, in a manner similar to PCI. Devices can notify software of their current power state, and power management software can propagate a wake-up event through the fabric to power up a device or group of devices. Devices can also signal a wake-up event using an in-band mechanism or a side-band signal.
With no software involvement, devices place a Link into a power savings state after a time-out when they recognize that there are no packets to transmit over the Link. This capability is referred to as Active State power management.
PCI Express supports the following device power states: D0, D1, D2, D3-Hot and D3-Cold, where D0 is the full-on power state and D3-Cold is the lowest power state.
PCI Express also supports the following Link power states: L0, L0s, L1, L2 and L3, where L0 is the full-on Link state and L3 is the Link-Off power state.
Hot Plug Support
PCI Express supports hot plug and surprise hot unplug without the use of sideband signals. Hot plug interrupt messages, communicated in-band to the root complex, trigger hot plug software to detect a hot plug or removal event. Rather than implementing a centralized hot plug controller as exists in PCI platforms, the hot plug controller function is distributed to the port logic associated with each hot plug capable port of a switch or root complex. Two colored LEDs, a Manually-operated Retention Latch (MRL), an MRL sensor, an attention button, a power control signal and a PRSNT2# signal are some of the elements of a hot plug capable port.
PCI Compatible Software Model
PCI Express employs the same programming model as the PCI and PCI-X systems described earlier in this chapter. The memory and IO address spaces remain the same as in PCI/PCI-X. The first 256 Bytes of configuration space per PCI Express function are the same as PCI/PCI-X device configuration address space, ensuring that current OSs and device drivers will run on a PCI Express system. PCI Express architecture extends the configuration address space to 4 KB per function. Updated OSs and device drivers are required to take advantage of and access this additional configuration address space.
The PCI Express configuration model supports two mechanisms:
The PCI compatible configuration mechanism, which is 100% compatible with existing OSs and bus enumeration and configuration software for PCI/PCI-X systems.
The PCI Express enhanced configuration mechanism, which provides access to additional configuration space beyond the first 256 Bytes and up to 4 KBytes per function.
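The enhanced mechanism memory-maps 4 KB of configuration space per function. One common address layout packs the bus, device and function numbers above the 12-bit register offset; a sketch (the base address value in the usage example is hypothetical):

```python
def enhanced_cfg_address(base: int, bus: int, dev: int, fn: int, offset: int) -> int:
    """Memory-mapped enhanced configuration address, 4 KB per function.
    Layout: bus -> bits [27:20], device -> [19:15], function -> [14:12],
    register offset -> [11:0]."""
    assert 0 <= bus < 256 and 0 <= dev < 32 and 0 <= fn < 8 and 0 <= offset < 4096
    return base | (bus << 20) | (dev << 15) | (fn << 12) | offset

# e.g. bus 1, device 0, function 0, register 0x100 under a base of 0xE0000000
addr = enhanced_cfg_address(0xE0000000, 1, 0, 0, 0x100)
```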
Mechanical Form Factors
PCI Express architecture supports multiple platform interconnects such as chip-to-chip, board-to-peripheral card via PCI-like connectors and Mini PCI Express form factors for the mobile market. Specifications for these are fully defined. See "Add-in Cards and Connectors" on page 685 for details on PCI Express peripheral card and connector definition.
PCI-like Peripheral Card and Connector
Currently, x1, x4, x8 and x16 PCI-like connectors are defined along with associated peripheral cards. Desktop computers implementing PCI Express can have the same look and feel as current computers with no changes required to existing system form factors. PCI Express motherboards can have an ATX-like motherboard form factor.
Mini PCI Express Form Factor
The Mini PCI Express connector and add-in card implement a subset of the signals that exist on a standard PCI Express connector and add-in card. The form factor, as the name implies, is much smaller and targets the mobile computing market. The Mini PCI Express slot supports x1 PCI Express signals, including power management signals. In addition, the slot supports LED control signals, a USB interface and an SMBus interface. The Mini PCI Express module is similar to, but smaller than, a PC Card.
Mechanical Form Factors Pending Release
As of May 2003, specifications for two new form factors have not been released. Below is a summary of publicly available information about these form factors.
NEWCARD Form Factor
Another new module form factor that will serve both the mobile and desktop markets is the NEWCARD form factor. This is a PCMCIA PC Card type form factor, but nearly half the size, that will support x1 PCI Express signals including power management signals. In addition, the slot supports USB and SMBus interfaces. Two sizes are defined, a narrower version and a wider version, though the thickness and depth remain the same. Although similar in appearance to the Mini PCI Express module, this is a different form factor.
Server IO Module (SIOM) Form Factor
This is a family of modules that targets the workstation and server market. They are designed with future support for larger PCI Express Link widths and bit rates higher than the 2.5 Gbits/s Generation 1 transmission rate. Four form factors are under consideration: the base module in single- and double-width versions, and the full-height module in single- and double-width versions.
PCI Express Topology
Major components in the PCI Express system shown in Figure 1-22 include a root complex, switches, and endpoint devices.
Figure 1-22. PCI Express Topology
The Root Complex denotes the device that connects the CPU and memory subsystem to the PCI Express fabric. It may support one or more PCI Express ports. The root complex in this example supports 3 ports. Each port is connected to an endpoint device or to a switch, which forms a sub-hierarchy. The root complex generates transaction requests on behalf of the CPU, including configuration requests, memory and IO requests, and locked transaction requests. As a completer, the root complex does not respond to locked requests. The root complex transmits packets out of its ports and receives packets on its ports, which it forwards to memory. A multi-port root complex may also route packets from one port to another port but is NOT required by the specification to do so.
The root complex implements central resources such as a hot plug controller, power management controller, interrupt controller, and error detection and reporting logic. The root complex is initialized with a bus number, device number and function number, which are used to form a requester ID or completer ID. The root complex bus, device and function numbers initialize to all 0s.
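The bus/device/function triple packs into a 16-bit requester or completer ID. A sketch (the function name is illustrative):

```python
def requester_id(bus: int, dev: int, fn: int) -> int:
    """16-bit Requester/Completer ID:
    bus -> bits [15:8], device -> [7:3], function -> [2:0]."""
    assert 0 <= bus < 256 and 0 <= dev < 32 and 0 <= fn < 8
    return (bus << 8) | (dev << 3) | fn

# The root complex initializes to bus 0, device 0, function 0 -> ID 0.
```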
A Hierarchy is a fabric of all the devices and Links associated with a root complex that are either directly connected to the root complex via its port(s) or indirectly connected via switches and bridges. In Figure 1-22 on page 48, the entire PCI Express fabric associated with the root is one hierarchy.
A Hierarchy Domain is a fabric of devices and Links that are associated with one port of the root complex. For example in Figure 1-22 on page 48, there are 3 hierarchy domains.
Endpoints are devices, other than the root complex and switches, that are requesters or completers of PCI Express transactions. They are peripheral devices such as Ethernet, USB or graphics devices. Endpoints initiate transactions as requesters or respond to transactions as completers. Two types of endpoints exist: PCI Express endpoints and legacy endpoints. Legacy endpoints may support IO transactions. They may support locked transaction semantics as a completer but not as a requester. Interrupt-capable legacy devices may support legacy-style interrupt generation using message requests, but must in addition support MSI generation using memory write transactions. Legacy devices are not required to support 64-bit memory addressing capability. PCI Express endpoints must not support IO or locked transaction semantics and must support MSI-style interrupt generation. PCI Express endpoints must support 64-bit memory addressing capability in prefetchable memory address space, though their non-prefetchable memory address space is permitted to be mapped below the 4 GByte boundary. Both types of endpoints implement Type 0 PCI configuration headers and respond to configuration transactions as completers. Each endpoint is initialized with a device ID (requester ID or completer ID) which consists of a bus number, device number and function number. Endpoints are always device 0 on a bus.
Multi-Function Endpoints. Like PCI devices, PCI Express devices may support up to 8 functions per endpoint, with at least one of them being function 0. However, a PCI Express Link supports only one endpoint, numbered device 0.
A PCI Express-to-PCI(-X) Bridge is a bridge between the PCI Express fabric and a PCI or PCI-X hierarchy.
A Requester is a device that originates a transaction in the PCI Express fabric. Root complex and endpoints are requester type devices.
A Completer is a device addressed or targeted by a requester. A requester reads data from a completer or writes data to a completer. Root complex and endpoints are completer type devices.
A Port is the interface between a PCI Express component and the Link. It consists of differential transmitters and receivers. An Upstream Port is a port that points in the direction of the root complex. A Downstream Port is a port that points away from the root complex. An endpoint port is an upstream port; root complex ports are downstream ports. An Ingress Port is a port that receives a packet. An Egress Port is a port that transmits a packet.
A Switch can be thought of as consisting of two or more logical PCI-to-PCI bridges, each bridge associated with a switch port. Each bridge implements configuration header 1 (Type 1) registers. Configuration and enumeration software detects and initializes each of these header registers at boot time. The 4-port switch shown in Figure 1-22 on page 48 consists of 4 virtual bridges, internally connected via a bus that is not defined by the specification. The one port of a switch pointing in the direction of the root complex is the upstream port. All other ports, pointing away from the root complex, are downstream ports.
A switch forwards packets in a manner similar to PCI bridges using memory, IO or configuration address based routing. Switches must forward all types of transactions from any ingress port to any egress port. Switches forward these packets based on one of three routing mechanisms: address routing, ID routing, or implicit routing. The logical bridges within the switch implement PCI configuration header 1. The configuration header contains memory and IO base and limit address registers as well as primary bus number, secondary bus number and subordinate bus number registers. These registers are used by the switch to aid in packet routing and forwarding.
Switches implement two arbitration mechanisms, port arbitration and VC arbitration, by which they determine the priority with which to forward packets from ingress ports to egress ports. Switches support locked requests.
Enumerating the System
Standard PCI Plug and Play enumeration software can enumerate a PCI Express system. The Links are numbered in a manner similar to the PCI depth-first search enumeration algorithm. An example of the bus numbering is shown in Figure 1-22 on page 48. Each PCI Express Link is equivalent to a logical PCI bus; in other words, each Link is assigned a bus number by the bus enumeration software. A PCI Express endpoint is device 0 on a PCI Express Link of a given bus number. Only one device (device 0) exists per PCI Express Link. The internal bus within a switch that connects all the virtual bridges together is also numbered. The first Link associated with the root complex is numbered bus 1. Bus 0 is an internal virtual bus within the root complex. Buses downstream of a PCI Express-to-PCI(-X) bridge are enumerated the same way as in a PCI(-X) system.
Endpoints and PCI(-X) devices may implement up to 8 functions per device. Only 1 device is supported per PCI Express Link, though PCI(-X) buses may theoretically support up to 32 devices per bus. A system could theoretically include up to 256 PCI Express Links and PCI(-X) buses.
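The depth-first numbering described above can be sketched over a hypothetical topology (the dictionary-based node representation and function name are invented for illustration):

```python
def assign_bus_numbers(node, next_bus):
    """Depth-first bus numbering, as PCI-style enumeration software does:
    each Link (or switch-internal bus) gets the next sequential number."""
    node["secondary"] = next_bus          # bus directly below this bridge
    next_bus += 1
    for child in node.get("children", []):
        next_bus = assign_bus_numbers(child, next_bus)
    node["subordinate"] = next_bus - 1    # highest bus number below this bridge
    return next_bus

# Root complex with two downstream ports; the second leads to another bridge.
root = {"children": [{"children": []}, {"children": [{"children": []}]}]}
assign_bus_numbers(root, 0)               # bus 0 is the root's internal virtual bus
```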
PCI Express System Block Diagram
Low Cost PCI Express Chipset
Figure 1-23 on page 52 is a block diagram of a low cost PCI Express based system. As of the writing of this book (April 2003) no real life PCI Express chipset architecture designs were publicly disclosed. The author describes here a practical low cost PCI Express chipset whose architecture is based on existing non-PCI Express chipset architectures. In this solution, AGP which connects MCH to a graphics controller in earlier MCH designs (see Figure 1-14 on page 32) is replaced with a PCI Express Link. The Hub Link that connects MCH to ICH is replaced with a PCI Express Link. And in addition to a PCI bus associated with ICH, the ICH chip supports 4 PCI Express Links. Some of these Links can connect directly to devices on the motherboard and some can be routed to connectors where peripheral cards are installed.
The CPU can communicate with PCI Express devices associated with the ICH as well as with the PCI Express graphics controller. PCI Express devices can communicate with system memory or the graphics controller associated with the MCH. PCI devices may also communicate with PCI Express devices and vice versa. In other words, the chipset supports peer-to-peer packet routing between PCI Express endpoints and PCI devices, memory and graphics. It is yet to be determined whether first-generation PCI Express chipsets will support peer-to-peer packet routing between PCI Express endpoints. Remember that the specification does not require the root complex to support peer-to-peer packet routing between the multiple Links associated with the root complex.
This design does not require the use of switches if the number of PCI Express devices to be connected does not exceed the number of Links available in this design.
Another Low Cost PCI Express Chipset
Figure 1-24 on page 53 is a block diagram of another low cost PCI Express system. In this design, the Hub Link connects the root complex to an ICH device. The ICH device may be an existing design which has no PCI Express Link associated with it. Instead, all PCI Express Links are associated with the root complex. One of these Links connects to a graphics controller. The other Links directly connect to PCI Express endpoints on the motherboard or connect to PCI Express endpoints on peripheral cards inserted in slots.
Figure 1-24. Another Low Cost PCI Express System
High-End Server System
Figure 1-25 shows a more complex system requiring a large number of devices to be connected together. Multi-port switches are a necessary design feature to accomplish this. To support PCI or PCI-X buses, a PCI Express-to-PCI(-X) bridge is connected to one switch port. PCI Express packets can be routed from any device to any other device because switches support peer-to-peer packet routing (only multi-port root complex devices are not required to support peer-to-peer routing).
PCI Express Specifications
As of the writing of this book (May 2003) the following are specifications released by the PCISIG.
PCI Express 1.0a Base Specification released Q2, 2003
PCI Express 1.0a Card Electromechanical Specification released Q2, 2002
PCI Express 1.0 Base Specification released Q2, 2002
PCI Express 1.0 Card Electromechanical Specification released Q2, 2002
Mini PCI Express 1.0 Specification released Q2, 2003
As of May 2003, the specifications pending release are: the PCI Express-to-PCI Bridge specification, Server IO Module specification, Cable specification, Backplane specification, updated Mini PCI Express specification, and NEWCARD specification.