Braids and fibers: Language constructs with architectural support for adaptive responses to memory latencies
IBM Journal of Research and Development, Mar-May 2006 by Bacon, D F, Shen, X
As processor speeds continue to increase at a much higher rate than memory speeds, memory latencies may soon approach a thousand processor cycles. As a result, the flat memory model that was made practical by deeply pipelined superscalar processors with multilevel caches will no longer be tenable. The most common approach to this problem is multithreading; however, multithreading requires either abundant independent applications or well-parallelized monolithic applications, and neither is easy to come by. We present high-level programming constructs called braids and fibers. The programming constructs facilitate the creation of programs that are partially ordered, in which the partial orders can be used to support adaptive responses to memory access latencies. Braiding is simpler than parallelizing, while yielding many of the same benefits. We show how the programming constructs can be effectively supported with simple instruction set architecture extensions and microarchitectural enhancements. We have developed braided versions of a number of important algorithms. The braided code is easy to understand at the source level and can be translated into highly efficient instructions using our architecture extensions.
Introduction
High-performance microprocessors have, for many years, been designed to present a flat memory abstraction to the programmer. This greatly simplifies the programming model and works well as long as caches are able to hide memory access latencies. However, this abstraction is beginning to break down because the current trend of memory hierarchy is becoming ever deeper while the relative speed of communications between the processor and memory continues to increase. Latencies for DRAM accesses may soon approach a thousand processor cycles. The term memory wall was coined to refer to the increasing gap between CPU and memory speeds [1].
Although optimization techniques, such as prefetching, have been used successfully to hide significant numbers of memory latencies, they usually work well only for highly predictable programs. Ultimately, as the performance of the memory system becomes more and more nonuniform, such techniques for tolerating memory latencies will no longer function.
If there is a line at a restaurant, patrons are given an estimate of the length of time they will have to wait. Similarly, if callers to a company's customer service line are put on hold, they are updated on how long they will remain there. In these real-world scenarios, it seems obvious that information about the length of a delay should be provided. We argue that what is obvious in the restaurant business should be obvious in the computer business.
In this paper, we present a high-level programming model that enables programs to respond to long memory latencies by performing other work while high-latency memory operations are in progress. The fundamental approach is to divide the program into fibers-sections of sequential code that can be interleaved in a partial order. Although fibers are partially ordered with respect to one another, they execute sequentially. This greatly reduces the semantic complexity of the programming model because the programmer does not have to worry about locking or arbitrary interleaving of parallel threads. Instead, the interleaving occurs at intuitive points that are specified and controlled by the programmer.
A braid is a collection of fibers with the same scope of execution. We present high-level abstractions based on braids and fibers and show an exemplary histogram computation algorithm that uses them. Braids are objectlike abstractions, and fibers are method-like abstractions. We show instruction set architecture (ISA) and microarchitecture extensions that are required to support braided code. The fundamental architecture abstractions are memory inquiry operations that provide latency prediction information about potentially lengthy memory operations instead of blocking and waiting for results of the memory operations. This allows the program to respond adaptively and perform other work while waiting for the memory operations to complete (for instance, loading data from the memory into a cache or computing an address translation). To allow efficient interaction between software and hardware resources, the hardware associates memory transaction identifiers with in-flight memory operations. The software can then poll for completion, typically when some other memory operation is deferred.
Programming constructs
A program that is able to adapt to memory latencies generally has to defer some portion of its work while requested data is being fetched. This implies that the program should be able to operate on a partial order. As a result, the programmer must give up explicit control of the ordering of some operations in exchange for higher performance.
In this section, we begin by describing a set of highlevel language constructs for expressing latency-adaptive programs. In subsequent sections, we describe the necessary ISA and microarchitecture enhancements to support these constructs and show how the language constructs can be efficiently compiled to the extended ISA.
- 5 Rules for Immediate Annuities
- Death in the Family: 12 Things to Do Now
- Dumbest Things You Do With Your Money
- 6 Online Networking Mistakes to Avoid
- 401(k) Mistakes to Avoid
- 5 Economic Scenarios to Keep You Up at Night
- The Real ‘Best Places to Retire’
- Best Credit Cards for You
- 12 Tough Questions to Ask Your Parents
- The Real ‘Best Colleges’
- Home Buyer Tax Credit: How to Cash In
- Why You Shouldn’t Bash Cash
- 8 Phony 'Bargains' and Better Alternatives
- Danger: 3 Debit Card Scams to Avoid
- 6 Myths About Gas Mileage
- 29 Fees We Hate Most
- Quick and Easy Ways to Boost Returns
- Best Stocks to Buy Now
- Lower Your Taxes: 10 Moves to Make Now
- New Jobs: 8 Lessons from Real-Life Career Switchers
- The New Job Market: Who Wins and Who Loses?
- Health Care Reform's Public Option: Everything You Need to Know
- Volunteer Work When Unemployed: Should You Work for Free?
- Whose Recovery Is This?
- Long-Term-Care Insurance: 4 Biggest Risks to Avoid
Content provided in partnership with
Most Recent Technology Articles
- INTERVIEW WITH BEN BUTTERS, DIRECTOR OF EUROPEAN AFFAIRS AT EUROCHAMBRES : "A PERFECT ROAD MAP FOR EU CLUSTERS DOES NOT EXIST".
- AGENDA.(Brief article)(Conference notes)
- FIGHT AGAINST INTERNET PIRACY.
- INTERNET : AUTHORS' SOCIETIES URGE ACTION AGAINST PIRACY.
- TELECOMMUNICATIONS : BUSINESSEUROPE HOSTILE TO FURTHER CONTRACTUAL OBLIGATIONS.(Brief article)
Most Recent Technology Publications
Most Popular Technology Articles
- Speed control of separately excited DC motor
- BizRate to monitor in-store customer satisfaction for Office Depot stores - Market Intelligence
- Effects of creative, educational drama activities on developing oral skills in primary school children
- Failed businesses in Japan: a study of how different companies have failed, and tips on how to succeed, in the Japanese market
- Political stability and economic growth in Asia


