10 Gigabit Ethernet technology has arrived; it is real, and it has the potential to change real-time and embedded systems more dramatically than any prior generation of Ethernet.
Ethernet continues its march to ever higher levels of performance and capability. 10 Gigabit Ethernet (10GbE) is its next step forward and is sure to make a bigger impact on real-time and embedded systems than any prior advancement. 10GbE promises an order-of-magnitude increase in performance, compatibility with prior variants, and the potential to displace more specialized data network fabrics.
However, today’s embedded processors are unable to keep up with the protocol overhead associated with even 1GbE pipes, each of which is capable of supporting 250 Mbytes/s of throughput on a sustained basis. Increasing those pipes to 10GbE, a ten-fold increase in capacity, makes a difficult problem an impossible one. A complete offload of the Ethernet protocol stack to silicon (silicon stack technology) will allow the promise of 10GbE technology to be realized.
Upgrading to 10GbE NICs and 10GbE switches offers ten times the performance for an array of bandwidth- and latency-constrained applications. Increasing bandwidth ten-fold while cutting latency by 90 percent takes Ethernet to a whole new level of performance and will compel architects of real-time and embedded systems to consider (or reconsider) Ethernet for even the highest-performance applications. Add to this the low-cost advantages of eventual commoditization and out-of-the-box interoperability with prior generations, and the “promise” of 10GbE is fairly well summarized. On the surface, 10GbE technology is quite compelling.
The Challenge of 10 Gigabit Ethernet
In reality, deploying 10GbE and realizing these benefits will not be easy. Not long ago, 1GbE held a similar promise, only to struggle to deliver on it, particularly in real-time and embedded systems where performance requirements proved difficult to achieve.
The basic problem with Ethernet has nothing to do with the Ethernet technology itself; the switches and NICs are very capable and reliable. The problem is the software-intensive nature of the TCP/IP protocol stack—the software stack. The software stack is host-processor-intensive and thereby limits the throughput that can be achieved. The throughput capacity (potential I/O bandwidth) of Ethernet NIC technology (1 Mbit/s, 10 Mbit/s, 100 Mbit/s, 1GbE and now 10GbE) has been growing faster than CPU technology’s ability to process the protocols associated with the data stream.
With each advance in Ethernet technology, state-of-the-art CPU technology falls farther behind. This problem is even worse for embedded systems, which are typically much more constrained with respect to power consumption or thermal dissipation and therefore are less able to simply toss more CPU cycles at the problem—as can be done in high-end server class processing systems.
To illustrate the protocol processing crisis, consider a conventional 1GbE NIC. The TCP/IP protocol stack consumes roughly 10 CPU cycles for every byte of data coming into or out of the NIC. Viewed from a different perspective, each 1 GHz of CPU clock can process about 100 Mbytes/s of Ethernet I/O. It would therefore require 100 percent of a 2.5 GHz processor to achieve wire-speed throughput on a 1GbE port (full duplex, 125 Mbytes/s of payload in each direction). And that is only a single port of 1GbE—a dual port doubles the problem.
But this is a theoretical example; in practice, it is not realistic to allocate 100 percent of any CPU to the processing of Ethernet traffic. A reasonable allocation depends on the application. Real-time or CPU-intensive applications like signal processing might allow only five percent; less intensive applications might allow up to 20 percent of the CPU to be dedicated to managing the Ethernet interface and implementing the TCP/IP stack.
A 10 percent allocation of a 2 GHz embedded processor to TCP/IP processing would limit that processor to 20 Mbytes/s of Ethernet I/O, which is only eight percent of the potential 250 Mbytes/s bandwidth of a vanilla 1GbE NIC.
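This arithmetic is easy to verify. The short C sketch below models the 10-cycles-per-byte rule of thumb; the constants are the article's rough figures, not measurements, and the helper function is purely illustrative.

    #include <stdio.h>

    /* Rule of thumb from the text: a software TCP/IP stack costs
       roughly 10 CPU cycles per byte of Ethernet I/O. */
    #define CYCLES_PER_BYTE 10.0

    /* Throughput (Mbytes/s) achievable for a given CPU clock (GHz)
       and the fraction of that CPU budgeted to the stack. */
    static double stack_mbytes_per_sec(double cpu_ghz, double cpu_fraction)
    {
        return cpu_ghz * 1e9 * cpu_fraction / CYCLES_PER_BYTE / 1e6;
    }

    int main(void)
    {
        const double gbe_wire = 250.0; /* full-duplex 1GbE, Mbytes/s */

        /* 100 percent of a 2.5 GHz CPU just reaches 1GbE wire speed. */
        printf("2.5 GHz, 100%%: %.0f Mbytes/s\n",
               stack_mbytes_per_sec(2.5, 1.0));

        /* A realistic 10 percent budget on a 2 GHz embedded CPU. */
        double t = stack_mbytes_per_sec(2.0, 0.10);
        printf("2.0 GHz, 10%%: %.0f Mbytes/s = %.0f%% of 1GbE, %.1f%% of 10GbE\n",
               t, 100.0 * t / gbe_wire, 100.0 * t / (10.0 * gbe_wire));
        return 0;
    }

Run as written, it reproduces the figures used here: 250 Mbytes/s at wire speed, and 20 Mbytes/s (eight percent of 1GbE, 0.8 percent of 10GbE) under a realistic budget.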
So using today’s CPU technology, a standard embedded CPU can realistically utilize only eight percent of the I/O bandwidth of its built-in 1GbE NIC. At some point in the future, perhaps 5 or 10 years from now, more powerful embedded CPUs will be able to make full use of that 1GbE interface. But by that time, 100GbE NICs will be available offering 100 times more bandwidth than the processors can keep up with, thus making the CPU loading problem even worse than it is today.
The Arrival of 10 Gigabit Ethernet
10GbE technology has arrived and is following the familiar adoption curve of Ethernet’s prior generations. Table 1 shows the impact that can be expected from upgrading a typical embedded system from 1GbE to 10GbE. Unfortunately, without fully offloading the TCP/IP stack processing, 10GbE technology will provide little or no benefit to real-time and embedded systems. After all, if today’s embedded processors can make effective use of only eight percent of a “conventional” 1GbE NIC, then those very same processors would be able to utilize only 0.8 percent of a “conventional” 10GbE NIC.
The CPU overhead of the conventional software implementation of the protocol stack is what limits the utilization of an Ethernet NIC today. Widening the Ethernet pipe ten-fold will not increase the performance of the interface unless something is done about that limiting factor: the software stack.
The Solution – Silicon Stack Technology
Figure 1 illustrates the use of a hardware (specifically silicon) implementation of the protocol stack. Here the conventional TCP/IP protocol stack processing is moved from the operating system of the host processor (software) to hardware (silicon).
The overhead on the host CPU is substantially reduced because the host processor no longer expends roughly 10 CPU cycles on each byte of information; it needs only to specify that a message be sent, or be notified when a new message arrives. All protocol processing, buffer management, data movement and transaction management are handled by the silicon stack hardware.
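The article does not define a host programming interface, but a transaction-level offload API might look something like the hypothetical C outline below. Every name here (the sstk_ prefix, the descriptor fields, the callback) is an illustrative assumption, not a real vendor API; the point is that the host deals in whole messages while the NIC does the per-byte work.

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical descriptor for one transaction: the host names a
       connection and a payload buffer; the NIC does everything else. */
    typedef struct {
        uint32_t connection_id; /* TCP connection managed on the NIC */
        void    *buffer;        /* message payload in host memory    */
        size_t   length;        /* payload length in bytes           */
    } sstk_descriptor_t;

    /* Post a send and return immediately. The silicon stack segments
       the payload, computes checksums, and handles ACKs and any
       retransmission without further host involvement. */
    int sstk_post_send(const sstk_descriptor_t *desc);

    /* Receive completion callback: invoked once per message, not once
       per packet or per byte, so host overhead scales with the number
       of transactions rather than with bandwidth. */
    typedef void (*sstk_recv_handler_t)(const sstk_descriptor_t *desc);
    int sstk_register_recv_handler(uint32_t connection_id,
                                   sstk_recv_handler_t handler);

Contrast this with a conventional socket, where every byte delivered to the application has already cost the host its share of the roughly 10 cycles per byte of protocol work.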
Moving the host responsibilities away from the conventional byte-level software processing to transaction-level processing allows Ethernet to achieve a level of efficiency and performance that is typically only associated with the more exotic network fabrics such as InfiniBand, Fibre Channel and Serial RapidIO. Silicon stack technology enables a processing system to actually make full use of 10GbE technology.
The silicon stack approach has many benefits. Wire-speed throughput is achieved because the silicon implementation is designed to handle full-rate I/O without being overwhelmed by the data. Latency drops to a fraction of that of a software stack. Reliability under heavy load improves substantially, since packets are no longer lost to overwhelmed software. And determinism improves, since the retransmissions that software-based stacks often suffer under high load are greatly reduced.
The Cost of Performance
When designing a system to handle heavy Ethernet traffic, one must weigh the alternative approaches. Depending on system requirements, it may be more cost-effective to add processors, but often it is more cost-effective to add specialized offload hardware.
Ethernet NICs with silicon stack technology can be selectively used on processor nodes that need the unique performance that it offers, while conventional Ethernet interfaces can be used everywhere else. This allows designers to minimize the overall system cost. In contrast, a specialized network fabric (like InfiniBand) would require all nodes to incorporate the additional hardware.
Many embedded systems are thermally constrained. Low-power CPUs are often required, and as a result fewer cycles are available for Ethernet processing. Here, offload technology has an even greater payback, since it allows the designer to minimize the “thermal cost” of the system.
Table 2 provides an analysis of the dollar and thermal costs of I/O bandwidth for various 1GbE and 10GbE systems. Cost is computed in terms of dollars per Ethernet bandwidth (dollars per Mbyte/s) and also watts per Ethernet bandwidth (watts per Mbyte/s).
As shown in Table 2, the payback of the silicon stack offload is greater for 10GbE interfaces than it is for 1GbE interfaces. 1GbE offload reduces costs from roughly $150 per Mbyte/s to roughly $20 per Mbyte/s, and 10GbE offload takes costs down to roughly $6 per Mbyte/s. The table also shows a similar thermal cost reduction.
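The metric itself is simple division: total cost over the bandwidth a node can actually sustain. The sketch below reproduces the rough round numbers quoted above; the node and NIC prices fed into it are illustrative assumptions, since the inputs behind Table 2 are not reproduced here.

    #include <stdio.h>

    /* Cost of I/O bandwidth: dollars (or watts) spent per Mbyte/s of
       throughput the node can actually sustain. */
    static double cost_per_mbytes(double cost, double effective_mbytes)
    {
        return cost / effective_mbytes;
    }

    int main(void)
    {
        /* Illustrative inputs only; Table 2's actual figures differ. */
        printf("software stack: $%.0f per Mbyte/s\n",
               cost_per_mbytes(3000.0, 20.0));    /* ~$150/Mbyte/s */
        printf("1GbE offload:   $%.0f per Mbyte/s\n",
               cost_per_mbytes(5000.0, 250.0));   /* ~$20/Mbyte/s  */
        printf("10GbE offload:  $%.0f per Mbyte/s\n",
               cost_per_mbytes(15000.0, 2500.0)); /* ~$6/Mbyte/s   */
        return 0;
    }

Substituting watts for dollars in the same formula yields the thermal column of the table.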
Network bandwidth is growing at a faster rate than the ability of CPUs to process the increased data. Network offload technology is quickly moving from a “nice to have” to a “must have” feature—particularly for data-intensive server applications. Moving the TCP/IP stack from software to silicon dramatically improves the performance and reliability of the Ethernet connection – taking Ethernet to the same performance realm as the specialized network technologies such as InfiniBand, Serial RapidIO and Fibre Channel.
Full silicon offload of the TCP/IP stack is useful for certain 1GbE applications but an absolute necessity for 10GbE applications. Software stack implementations simply cannot deliver the high throughput, reliable data transfer and low latency that 10GbE offers. Finally, silicon offload is far more cost-effective and thermally efficient than tossing additional processors at the I/O bandwidth problem. While 10GbE holds the promise of greater performance and compatibility, embedded systems architects must understand how to overcome its inherent challenges in order to fulfill that potential and make the most effective use of the technology.