DAFS And NFSv4 Could Become Semantically Identical

Computer Technology Review, Jan, 2001 by Steve Kleiman, Mark Wittle

This is the third in a series of columns authored by members of the DAFS Collaborative, an industry group formed to create a protocol specification for direct, memory-to-memory data networking.

The Direct Access File System (DAFS) is a file access protocol, based on NFSv4, that is being designed to take advantage of new standard memory-to-memory interconnect technologies such as VI and InfiniBand in high-performance data center environments.

DAFS enables an application to directly access transport resources, and transfer data from its application buffers to the network, bypassing the operating system while still preserving file semantics. This translates into high-performance file I/O, significantly improved CPU utilization, and greatly reduced system overhead due to fewer data copies, context switches and interrupts, and far less network protocol processing.

DAFS also provides data integrity and availability features such as consistent high-speed locking, graceful fail-over of clients and servers, fencing, and enhanced data recovery, to meet the needs of 7x24 machine room environments where clusters of application servers need low-latency access to shared file storage. This article examines the relationship between DAFS and NFSv4.

Starting Point For DAFS

NFSv4 was the logical starting point for DAFS because it is an open standard, offers good failure recovery characteristics, ACL support, and high-performance locking, and accommodates both Unix- and Windows-style access mechanisms.

However, NFS evolved with an increasingly wide-area focus and lacks some of the semantics required in our target "local sharing" environment. Another problem was that NFS is designed with synchronous procedure calls in mind, whereas we wanted to take advantage of the asynchronous nature of VI to allow fully pipelined behavior that is critical in high-throughput environments.

To understand the performance potential of VI, we performed experiments using NFS over a sockets-like protocol on top of VI. Given that VI can reduce network processing overhead by orders of magnitude, the results were disappointing.

The multiple data buffering mechanisms throughout the protocol stack are independent and not easily coordinated. These additional protocol layers were providing little value while preventing access to useful VI features. Tuning did not help much, so we needed a new approach.

We then prototyped a DAFS "native" VI data transfer (i.e., no RPC or sockets layer) and saw a dramatic improvement: 25 µsec/CPU op, three and a half times better than NFS over sockets on VI (Table 1).

Our approach, therefore, has been for DAFS to closely follow NFSv4 semantics while making fairly radical changes in how requests are represented and exchanged, in order to improve performance using memory-to-memory networking technologies.

The Transport Mapping

NFS tolerates low-bandwidth, high-latency, connectionless, unreliable networks at the cost of significant complexity. The RPC layer provides synchronous procedure call semantics (Table 2).

DAFS, in contrast, runs over high-bandwidth, low-latency, reliable, connection-oriented VI transports. DAFS communication has more in common with high-performance messaging systems. Asynchronous requests are the primary mode of access, preserving the asynchronous nature of VI requests and allowing high performance and low overhead without extra threads or processes. In high-performance applications, asynchronous communication is actually simpler than RPC.
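The value of pipelining can be illustrated with a rough timing model. The sketch below is not taken from the DAFS specification: the function names and the latency figures are illustrative assumptions. It compares the time to complete a batch of requests when each one waits for the previous reply with the time when all requests are issued before any reply is awaited.

```python
def synchronous_time_us(n_requests, round_trip_us, service_us):
    """Each request waits for the previous reply before it is sent,
    so every request pays the full round trip."""
    return n_requests * (round_trip_us + service_us)


def pipelined_time_us(n_requests, round_trip_us, service_us):
    """All requests are issued up front and the server works through
    them back to back, so the round trip is paid only once."""
    return round_trip_us + n_requests * service_us


if __name__ == "__main__":
    # Hypothetical figures, chosen only to show the shape of the result.
    n, rtt, svc = 100, 50, 10
    print(synchronous_time_us(n, rtt, svc))  # 6000
    print(pipelined_time_us(n, rtt, svc))    # 1050
```

The model ignores queueing and bandwidth limits; it only shows why a request stream that never stalls on a reply scales with service time rather than with round-trip time.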

In addition, the DAFS protocol is adapted to memory-to-memory networking at a fundamental level, providing, for example, both direct and inline variants of read and write operations. These allow low latency for small operations and efficient data transfer for large operations.
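One way a client might choose between the two variants is sketched below. This is an assumption-laden illustration, not the DAFS wire format: the `WriteRequest` type, the `build_write` helper, the `buffer_handle` field, and the 4096-byte cutoff are all hypothetical. The idea is that small writes carry their data inside the request message, while large writes carry only a reference to the client's buffer so the bulk data can move separately.

```python
from dataclasses import dataclass

# Hypothetical cutoff; a real implementation would tune or negotiate it.
INLINE_LIMIT_BYTES = 4096


@dataclass
class WriteRequest:
    variant: str            # "inline" or "direct"
    length: int             # number of bytes to write
    payload: bytes = b""    # data carried in the message (inline only)
    buffer_handle: int = 0  # stand-in for a registered-buffer reference (direct only)


def build_write(data: bytes, buffer_handle: int) -> WriteRequest:
    """Pick the inline variant for small writes, the direct variant otherwise."""
    if len(data) <= INLINE_LIMIT_BYTES:
        return WriteRequest("inline", len(data), payload=data)
    # Large write: the message stays small and names the buffer instead.
    return WriteRequest("direct", len(data), buffer_handle=buffer_handle)


if __name__ == "__main__":
    print(build_write(b"x" * 100, buffer_handle=7).variant)     # inline
    print(build_write(b"x" * 65536, buffer_handle=7).variant)   # direct
```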

DAFS Performance Improvements

DAFS performance improvements to leverage VI include: use of RDMA, chaining, client-specified byte order, read ahead, and batch writes.

DAFS supports RDMA (remote direct memory access) for requests that might have large data areas, both to avoid bulk data copying on those requests and to limit the size of the control portion of requests and responses. This allows simple, efficient message handling implementations on the client and server.

DAFS chaining provides the benefits of NFSv4 COMPOUND in a low-latency environment. Rather than combining multiple operations into a single request, each chained operation retains its own identity within a series of requests. This takes advantage of the low-latency network and minimizes buffer space requirements, yet preserves the performance and semantic benefits of pipelining.
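The buffer-space argument can be made concrete with a small comparison. The sketch below is a simplified model under stated assumptions: the `Op` type, the helper names, and the operation sizes are hypothetical and do not reflect actual NFSv4 or DAFS encodings. It contrasts one request holding every operation with a series of requests holding one operation each, and reports the largest single request a receiver would need to buffer.

```python
from typing import List, NamedTuple


class Op(NamedTuple):
    name: str
    size_bytes: int  # hypothetical encoded size of the operation


def as_compound(ops: List[Op]) -> List[List[Op]]:
    """COMPOUND style: a single request holding every operation."""
    return [list(ops)]


def as_chain(ops: List[Op]) -> List[List[Op]]:
    """Chaining style: one request per operation, sent back to back."""
    return [[op] for op in ops]


def largest_request_bytes(requests: List[List[Op]]) -> int:
    """Size of the biggest single request the receiver must buffer."""
    return max(sum(op.size_bytes for op in req) for req in requests)


if __name__ == "__main__":
    ops = [Op("lookup", 64), Op("open", 128), Op("read", 96)]
    print(largest_request_bytes(as_compound(ops)))  # 288
    print(largest_request_bytes(as_chain(ops)))     # 128
```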

In a traditional network stack, XDR performs byte swapping to a standard network byte order. For memory-to-memory networking, a better model is to allow the client to send simple, fixed-layout data structures laid out for direct access by languages like C or C++. Whenever the client and server are of the same byte order, no conversion is necessary. When they differ, the client uses its natural order, and the server performs the conversion. This avoids the complexity of serialized message encoding and decoding for both the client and server.
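A minimal sketch of client-specified byte order follows, using Python's standard `struct` module. The header layout (a one-byte order tag, three pad bytes, a 32-bit request id, a 64-bit offset, and a 32-bit length) and the function names are assumptions made for illustration; they are not the DAFS message format. The client packs the header in whatever order it is told to use (its native order by default), and the server reads the tag to decide how to interpret the remaining fields.

```python
import struct
import sys

LITTLE, BIG = 0, 1
_PREFIX = {LITTLE: "<", BIG: ">"}
_LAYOUT = "B3xIQI"  # order tag, 3 pad bytes, request id, file offset, length


def encode_request(request_id, offset, length, order=None):
    """Client side: pack the header in the client's own byte order."""
    if order is None:
        order = LITTLE if sys.byteorder == "little" else BIG
    return struct.pack(_PREFIX[order] + _LAYOUT, order, request_id, offset, length)


def decode_request(message):
    """Server side: read the one-byte tag, then interpret the fields
    in the order the client declared."""
    order = message[0]
    _, request_id, offset, length = struct.unpack(_PREFIX[order] + _LAYOUT, message)
    return request_id, offset, length


if __name__ == "__main__":
    msg = encode_request(7, 1 << 40, 4096)
    print(len(msg))             # 20
    print(decode_request(msg))  # (7, 1099511627776, 4096)
```

Because the order tag is a single byte, the server can read it without knowing the client's byte order in advance.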

Content provided in partnership with Thompson Gale