TCP Segmentation Offload (TSO) is a mechanism by which the
host stack offloads certain portions of outbound TCP packet processing
to the Network Interface Card (NIC), thereby reducing host CPU utilization.
This functionality can significantly reduce the load on the server
for applications that primarily transmit large amounts
of data from the system. Examples include web serving, NFS, and
file transfer applications.
Transport software support for the TCP Segmentation Offload (TSO)
enhancements is included in the TOUR 2.0 release, which became available
for download on software.hp.com in May 2004. Note that TOUR 2.0
provides only the transport software mechanisms needed to support
TSO-enhanced cards and drivers. Additional software is required
to implement TCP Segmentation Offload. For further details and to
download the feature, go to http://software.hp.com and look
under “enhancement releases and patch bundles” for
the enhancement to the igelan driver to support TSO. At the time
of this writing, the TSO enhancement for the igelan driver can be
found at this link: http://software.hp.com/portal/swdepot/displayProductInfo.do?productNumber=GigEtherEnh-01
How It Works
The reduction in CPU utilization is achieved primarily by
allowing the host to pass large frames (frames larger than the
link's Maximum Transmission Unit, or MTU) to the NIC. The NIC then
carves these into smaller, MTU-sized frames before transmission on
the wire. Thus, instead of processing many small MTU-sized frames
during transmit, the host sends fewer, larger VMTU (Virtual MTU)
sized frames, thereby increasing the efficiency of the data transfer
in the host. The VMTU is typically much larger than the link's MTU;
for example, on a typical Ethernet card, the link MTU is 1500 bytes
while a VMTU could be as large as 64 KB. A reduction in CPU utilization
of more than 50% has been observed on some FTP workloads.
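The carving step described above can be illustrated with a minimal sketch. It is not the igelan driver's or the NIC's actual logic. The function name segment_payload is hypothetical, and the sketch assumes 20-byte IPv4 and 20-byte TCP headers with no options, so each 1500-byte frame carries at most 1460 bytes of TCP payload.

```python
from typing import List


def segment_payload(payload: bytes, mtu: int = 1500,
                    ip_header: int = 20, tcp_header: int = 20) -> List[bytes]:
    """Split one large send buffer into chunks that each fit in one MTU-sized frame.

    Header sizes are assumptions (IPv4 and TCP without options).
    """
    # Maximum TCP payload per frame once both headers are accounted for.
    mss = mtu - ip_header - tcp_header
    if mss <= 0:
        raise ValueError("MTU is too small to carry any payload")
    # Slice the buffer into consecutive chunks of at most mss bytes.
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


# One 64 KB buffer handed down by the host in a single operation.
buffer_64k = bytes(64 * 1024)
segments = segment_payload(buffer_64k)
print(len(segments), len(segments[-1]))  # prints "45 1296"
```

Under these assumptions, a single 64 KB buffer becomes 45 frames on the wire. Without TSO the host stack would build each of those 45 frames itself. With TSO it processes one large frame and leaves the splitting to the NIC.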
NOTE: Not all applications benefit from the TSO mechanism.
Only data-intensive applications that transmit large data buffers
using TCP over IPv4 are improved. Other types of applications will
not significantly benefit from the TSO mechanism. Performance improvements
vary depending upon the platform used. Systems which support hardware
partitioning will notice a decrease in per-card throughput in addition
to the significant reduction in CPU utilization.