Make Computers Faster - The Time Has Come for Transactional RAM (TRAM) Everywhere

Dru Nelson
333 Ravenswood Ave.
Menlo Park, CA 94025
dnelson@redwoodsoft.com
Copyright 1998


Abstract

Today, there is a wide gap in access speed between semiconductor memory and magnetic memory. Semiconductor technologies have been getting denser and quicker, while magnetic memories like hard disk drives are getting denser but not significantly faster. The difference between their access times is large. Why not introduce TRAM into existing computers and speed up every disk operation tremendously? This would especially benefit servers, which are usually performance bound by slow disk IO.
TRAM is synonymous with NVRAM, but I wanted to differentiate it by its particular application of NVRAM. In this form, it is designed to accelerate all forms of transaction processing that would typically use a hard disk or some other non-volatile memory which takes a long time to write. This is not a new idea. Techniques mentioned below have been used by various computer systems in the past. What is interesting is that TRAM is not in widespread use today. The goal of this paper is to address TRAM for use as an accelerator of all transactions and storage.


Table of Contents

Introduction
How it Works
Benefits
Drawbacks
Hardware
Software
Slightly More Exotic Uses
Future directions and Related Work
Impacts
References

Introduction

The commodity PC has clearly taken over the roles of workstation and server. In the past, these roles required systems with additional hardware to support multi-user or server style activity. Today, it is not surprising to find those same designs in commodity PCs. As a result, the relatively low-cost commodity PC has significantly more power than those computers of the past and can take over their roles.

Now look at the gap in speed between semiconductor memory and magnetic memory. Semiconductor technologies have been getting denser and quicker, while magnetic memories like hard disk drives are getting denser but not significantly faster. Consider also that on most common operating systems, all transactions are committed to a disk. In server designs of the past, various hardware designs were implemented to increase this IO throughput. Disk IO throughput is usually the bottleneck for all transaction processing on computers acting in the server role.

With this gap between semiconductor memory speed and disk IO speed widening, the older server designs will become more important.

It is surprising that one of the solutions used for this problem has not become a common sight in the commodity PC world: the use of non-volatile semiconductor memory as a new layer between a computer's main memory and its magnetic memory. I suggest calling this memory TRAM, and I will go over some of the applications that could use it. Some of the applications in this paper wouldn't require a disk at all, so I define TRAM as any high-speed, non-volatile memory which can be used to accelerate transaction processing.

In the first section, I describe how this technology works, along with its benefits and drawbacks. Then I address the hardware aspects of the device in the 'Hardware' section and the software side of implementing it in the 'Software' section. Finally, there are the 'Slightly More Exotic Uses' and 'Future directions and Related Work' sections.


How it Works

This section explains where TRAM fits into a typical system: first the durability requirement that makes disk writes the bottleneck for transactions, then the two common write patterns, random writes and log writes, and the conditions under which TRAM accelerates each, including the 'leaky bucket' model for log writes.

Transactions

In transaction processing systems, there is an acronym called A.C.I.D. Each letter represents an essential property of a transaction. The letter 'D' is the one that concerns us: it represents the 'durable' aspect of transactions. This property requires that a transaction's effects persist after it completes [Gray, ??]. In today's systems, this commonly translates to "write the data to the disk, and then reply". As far as transaction processing is concerned, this write to the disk can take the most time.

In most modern operating systems, writes are cached in memory before they go to disk. However, these writes are in jeopardy if the system fails. As a result, these operating systems provide a mechanism for forcing certain writes to happen immediately. This is required for databases that run on top of such an operating system. Also, some operating system operations have to be transactional. For example, the Unix operating system guarantees that file renaming and other directory operations are done as a transaction. Usually, if many of these operations occur, performance is poor.
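
As a rough illustration (my sketch, not part of any particular database), a durable commit on a Unix system looks something like this; fsync() is the standard call for forcing a write to stable storage:

    /* My sketch of a durable commit on Unix: do not reply until
     * fsync() confirms the record has reached stable storage. */
    #include <stddef.h>
    #include <unistd.h>

    int durable_commit(int fd, const void *record, size_t len)
    {
        if (write(fd, record, len) != (ssize_t)len)
            return -1;          /* only reached the OS cache so far */
        if (fsync(fd) != 0)
            return -1;          /* forcing it to disk is the slow part */
        return 0;               /* now it is safe to reply "committed" */
    }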

In the past, a solution to this problem has been to use fast, semiconductor-based memory which was non-volatile (NVRAM). Usually, transactions would store their operations in the NVRAM. The transaction would then commit very quickly, and a background process would write the transaction to disk or some other slow non-volatile memory.
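
A sketch of that technique, with all names (nvram_base, nvram_head, flush_to_disk, and so on) hypothetical:

    #include <pthread.h>
    #include <stddef.h>
    #include <string.h>
    #include <unistd.h>

    extern char  *nvram_base;           /* mapped non-volatile region */
    extern size_t nvram_head;           /* next free byte, also kept in NVRAM */
    extern pthread_mutex_t nvram_lock;
    extern void flush_to_disk(void);    /* drains committed records */

    void commit(const void *rec, size_t len)
    {
        pthread_mutex_lock(&nvram_lock);
        memcpy(nvram_base + nvram_head, rec, len);  /* just a memory copy */
        nvram_head += len;                          /* record is now durable */
        pthread_mutex_unlock(&nvram_lock);
        /* the caller can reply immediately; no disk IO on the commit path */
    }

    void *flusher(void *arg)            /* background writer */
    {
        (void)arg;
        for (;;) {
            flush_to_disk();            /* move records from NVRAM to disk */
            sleep(1);                   /* at whatever pace the disk allows */
        }
    }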

TRAM Benefits To Disk Writes

TRAM benefits a system with either of the two patterns of writes that exist in a disk-based system. In some systems, the writes are random ("random writes"). In others, the writes are predominantly serial and always to new blocks, as in databases or log-based filesystems ("log writes"). TRAM can benefit both write patterns, depending on certain factors.

If the set of "random writes" fits within the TRAM's size, then the writes can all live in TRAM, and the disk drive wouldn't have to be used at all. So, as long as the "random write" working set fits in some specified amount of TRAM, there is a performance benefit. The factors for this benefit are the size of the TRAM and the size of the set of active blocks. I would call this the 'subset' benefit.

If the system is generating "log writes", it is generating new data blocks at some rate. If the effective rate at which these new blocks are generated is under the rate at which the TRAM device can flush blocks to the disk, then even bursty writes can be committed quickly. For example, if writes are serial at an average rate of 1 megabyte per minute, and the system can write blocks from TRAM to disk at 1 megabyte per second, it is guaranteed that the "log writes" will go to TRAM. The benefit is the quick commit time on the transactions. The factors to consider are: the short-term rate of writes, the rate from TRAM to disk, and the size of the TRAM. This could be called the 'leaky bucket' benefit.
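
The leaky bucket condition is easy to state in code. Here is my formulation (the names and the simple burst model are mine):

    /* Returns 1 if a burst of the given length can be absorbed: writes
     * arrive at `fill` bytes/s, the flusher drains at `drain` bytes/s,
     * and the TRAM of size `capacity` soaks up the difference. */
    int bucket_absorbs(double fill, double drain,
                       double capacity, double burst_seconds)
    {
        double excess = fill - drain;
        if (excess <= 0.0)
            return 1;       /* the flusher keeps up indefinitely */
        return excess * burst_seconds <= capacity;
    }

With the example rates above, fill is about 17 kilobytes per second and drain is 1 megabyte per second, so the excess is negative and the bucket never overflows.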

If there isn't enough TRAM to satisfy a request, the request has to block until it can be satisfied. At that point the speed of the transactions is effectively the same as on a system without TRAM. This is equivalent to an operating system 'thrashing' when it doesn't have enough memory to hold its working set. Systems should be designed so that the TRAM delivers performance without ever causing a block. For most systems, small to large, I think this could be achieved easily.
 

Benefits

The largest benefit of TRAM is pure, phenomenal transaction and write performance. Disk writes would be few and infrequent for most users. For many database users, transactions would be tremendously faster than before. Given that a disk write takes milliseconds while a memory write takes well under a microsecond, the commit path could plausibly be 100 to 1000 times faster overall.

Lower power consumption. Most systems today demand ever-faster disks, which draw more electrical current. If the TRAM is sufficiently large, it may not be necessary for disks to provide that kind of performance; disks drawing less current could be used, and heat generation could be cut down. There will always be applications that need fast disks, but for the vast majority TRAM would ease the pressure. In fact, with 128 megabytes of TRAM, most users might not need to touch the disk for a day or a week. With 1 gigabyte of TRAM, I think it would be safe to say that most users wouldn't hit their disk for weeks at a time. This would help the world's power consumption and the green PC initiative. It would also mean the laptop user's disk doesn't have to spin up or down anywhere near as often, which increases battery time.

Since file writes would be less frequent, the number of writes to a disk would be reduced. This may extend the lifetime of disks even further.

Noise: without the disk running, the overall computer noise level could be brought down. For some drives, this is very noticeable.

The need for fans would be reduced.

Drawbacks

Removable media would require synchronization with the TRAM before being removed. Since this is already the case with most operating systems anyway, it is not a big issue. In fact, it may allow removable disks to be used for more applications, since they are generally not as fast as non-removable disks.
 

Not all applications fit well into current memory sizes, though more than 90% of them do. Applications that wouldn't benefit are those that stream large amounts of data very quickly (video editing, very large scale transaction processing).
 


 

Hardware

TRAM could be implemented in a system in several different ways.

A TRAM, from a hardware point of view, would consist of three sections: the memory, the memory controller, and the power section. All of these parts could be made from off-the-shelf components. However, there aren't any off-the-shelf memory controller designs for this application. Since the parts are off the shelf and the application is essentially just providing more physical memory address space, building these shouldn't be difficult.

Memory

As of today, semiconductor memories have very high densities. 64 megabit DRAM parts are shipping today, and 4 megabit static RAM (SRAM) densities are common. To build a TRAM device, either DRAM or SRAM is recommended. The other technologies, like ferroelectric RAM and Flash, don't appear to be suitable: ferroelectric RAM still hasn't become common, and Flash has a limit on write cycles, while writes are done quite often with a TRAM.

DRAM is the most commoditized component in computers today, and it has the highest density among semiconductor memories. From a pure density-per-dollar point of view, it would be the most cost-effective solution. DRAM's main requirement is a periodic refresh cycle in order to maintain its memory state.

SRAM is the next most suitable commodity semiconductor memory. It is also the type of memory most used in past TRAM designs. Several manufacturers, like Dallas Semiconductor, make SRAMs with a battery and the necessary circuitry to make them non-volatile.

Memory Controller

The memory controller is the part that interfaces with the host machine and also controls the memory system. On the host side, it would interface with whatever bus the system provided (these days that would probably be PCI). When it received read and write requests through that interface, it would apply those signals to the non-volatile memory side.
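
As a purely illustrative sketch, the register layout for such a controller might look like the following; every field here is an assumption, since no off-the-shelf controller exists for this application:

    #include <stdint.h>

    /* Hypothetical register map for a PCI TRAM controller. */
    struct tram_regs {
        volatile uint32_t status;     /* battery OK, ECC error, power-fail bits */
        volatile uint32_t control;    /* enable, interrupt mask */
        volatile uint32_t ecc_count;  /* corrected single-bit errors */
        volatile uint32_t size;       /* memory size in 4 KB pages */
    };
    /* The memory itself would be mapped into the host's physical address
     * space behind these registers, so access is by ordinary loads and
     * stores rather than an IO command protocol. */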

Since SRAM doesn't require a refresh, the memory controller can be a simple gating buffer. This is probably the reason why so many TRAMs in the past have used SRAM. In the future, as this technology becomes popular, we should see designs using DRAM, which has a higher density and is less expensive per bit.

The memory controller could also handle all error detection and correction. ECC is a common scheme in memories these days that provides this functionality.

The hardest part of the design is ensuring that the memory doesn't get corrupted when the power begins to fade. There are designs to accommodate this. The easiest method is to make the TRAM's acceptable power level higher than what the host computer finds acceptable, so the TRAM stops accepting writes before the host's logic becomes unreliable. The software managing a TRAM would also have to make its operations transactional, so that only complete transactions take effect.
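
One way to do this in software, sketched here under my own assumptions, is to write the data first and flip a 'valid' flag last, so a record is either entirely present or ignored on recovery:

    #include <stdint.h>
    #include <string.h>

    struct tram_record {
        uint32_t valid;       /* 0 while the record is being written */
        uint32_t length;
        char     data[504];   /* example payload size */
    };

    /* Caller guarantees len <= sizeof slot->data. */
    void tram_write(struct tram_record *slot, const void *buf, uint32_t len)
    {
        slot->valid = 0;                      /* invalidate before touching data */
        memcpy(slot->data, buf, len);
        slot->length = len;
        __asm__ volatile ("" ::: "memory");   /* keep the stores in this order */
        slot->valid = 1;                      /* one word store commits it */
    }
    /* On recovery, any record with valid == 0 is simply discarded, so a
     * power failure mid-write never leaves a half-visible transaction. */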

Power

The power system is the circuitry that deals with the power sources and the switchover of power to the memory section. It also includes some power source that stays on if external power is lost. With the advent of laptops, there are many off-the-shelf parts that could be used. Many of these parts manage different types of batteries and can determine the amount of time left until a power failure occurs.


Software

Now, assuming a device like the one described in the previous section is built, how could software take advantage of it? In this section, I describe how this technology could be implemented today using 'levels', like the early RAID papers.
 

TRAM Disks - Level 0

The quickest way to implement this technology is as a TRAM disk. In this form, the TRAM appears as a disk resource in the operating system. Generally, it is easy to add such a device to a system. The main beneficiaries would be Unix or Windows NT users, specifically Microsoft SQL Server users: the SQL Server database and log partitions could be placed on the TRAM partition. This would provide much better performance than any magnetic disk technology could.
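
At this level the driver is almost trivial; a minimal sketch (names mine, not from any real driver):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define BLOCK_SIZE 512
    extern char *tram_base;   /* the NVRAM region mapped by the controller */

    void tramdisk_read(uint32_t blkno, void *buf)
    {
        memcpy(buf, tram_base + (size_t)blkno * BLOCK_SIZE, BLOCK_SIZE);
    }

    void tramdisk_write(uint32_t blkno, const void *buf)
    {
        memcpy(tram_base + (size_t)blkno * BLOCK_SIZE, buf, BLOCK_SIZE);
    }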

The upside is that it is quick to implement. The downside is that it doesn't improve the system's performance globally: you have to direct each application to use that disk, and you are limited to a small partition.

Standard Partitions - Special File System Check - Disk Cache - Level 1

The second quickest way to implement this technology is as a disk cache underneath the existing filesystems. All disk writes would be cached in the TRAM. On a read, the system checks the TRAM cache first and reads from it on a hit; on a miss, it reads from the disk. On a write, it checks the cache for the block; if it isn't there, it places it there, evicting old entries if necessary. Beyond that, all that is required is that every tool that accesses the filesystem go through this layer.
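
In pseudocode-level C, the cache logic would look roughly like this; cache_lookup() and the other helpers are hypothetical:

    #include <stdint.h>
    #include <string.h>

    #define BLOCK_SIZE 512

    struct cache_entry { uint32_t blkno; char data[BLOCK_SIZE]; };

    /* Hypothetical helpers, assumed to manage entries inside the TRAM. */
    extern struct cache_entry *cache_lookup(uint32_t blkno);
    extern struct cache_entry *cache_insert(uint32_t blkno);
    extern int  cache_full(void);
    extern void cache_evict_oldest(void);   /* flushes a dirty block to disk */
    extern void disk_read(uint32_t blkno, void *buf);

    void cached_read(uint32_t blkno, void *buf)
    {
        struct cache_entry *e = cache_lookup(blkno);
        if (e)
            memcpy(buf, e->data, BLOCK_SIZE);   /* hit: served from TRAM */
        else
            disk_read(blkno, buf);              /* miss: go to the disk */
    }

    void cached_write(uint32_t blkno, const void *buf)
    {
        struct cache_entry *e = cache_lookup(blkno);
        if (!e) {
            if (cache_full())
                cache_evict_oldest();
            e = cache_insert(blkno);
        }
        memcpy(e->data, buf, BLOCK_SIZE);       /* durable once in TRAM */
    }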

The changes required would be small. The system would act just as it did before, and all operations would behave the same. It would not matter whether the applications or the filesystem was using the disk. If the operating system allowed, the cache could even sit on top of a RAID volume. The key is that the layers stack over the block devices: the first layer is the disk cache, the next is the RAID block device, and under that are the actual disk(s) themselves.

The upside is that this is only a minor change to an OS, and it delivers good global performance. The downside for some systems is that it does not guarantee the filesystems will be in a consistent state at boot after an improper shutdown; the system or application has to bring them back into a consistent state. For a Unix system, this means an 'fsck'. For MSDOS or Windows 95/98, this means a 'scandisk'.

Some Unixes and Windows NT already use a journaled filesystem. In that scenario, the time to bring the system into a consistent state is short. However, these systems tend to be 'log write' heavy, so they require good 'leaky bucket' performance. Still, for these systems, Level 1 would be an adequate solution.

Journaled Transactions - Same Filesystem - Disk Cache - Level 2

The next incremental step is to make recovery from an improper shutdown as painless as possible for existing filesystems. One solution, which requires only a minor change to the Level 1 system, is a journaling partition for the filesystem. Usually, adding a journal to a filesystem means a significant re-architecture of the filesystem. In this case, I am suggesting using the TRAM to our advantage on an existing, unmodified filesystem.

The goal is to eliminate having 'fsck' examine the whole disk in order to restore consistency. To accomplish this, part of the TRAM would be used to store the transactions necessary for filesystem metadata consistency before the data is considered 'committed' to the normal disk portion of the TRAM. The journal only needs enough room for the largest filesystem metadata transaction.

For example, the filesystem locks the appropriate blocks for an atomic directory operation. It performs the changes and writes the new blocks to this small portion of TRAM, preserving the old blocks before they are written over. When the changes are complete, the TRAM updates its internal tables that describe which blocks of memory correspond to which disk blocks. Once that completes, the filesystem code marks the journal as usable for the next transaction.
If the transaction didn't complete, the filesystem checker can restore the filesystem to a consistent state very quickly upon reboot, because it only has to run through one tiny journal. After that, the TRAM and disk are considered consistent.
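
A sketch of the journal layout and recovery, with all structures and names my own invention:

    #include <stdint.h>

    struct journal_header {
        uint32_t sequence;    /* increments once per transaction */
        uint32_t nblocks;     /* old and new blocks captured below */
        uint32_t committed;   /* written last: 0 = in progress, 1 = done */
    };

    extern void roll_back_blocks(struct journal_header *jh);

    void journal_commit(struct journal_header *jh)
    {
        /* all old and new blocks are already in the journal area here */
        __asm__ volatile ("" ::: "memory");
        jh->committed = 1;    /* a single store marks the transaction done */
    }

    /* At mount time only this tiny journal is scanned, never the whole
     * disk, which is what makes the check so fast. */
    void journal_recover(struct journal_header *jh)
    {
        if (!jh->committed)
            roll_back_blocks(jh);   /* restore the preserved old blocks */
    }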

This would allow existing filesystems to do a filesystem check in the sub-millisecond range, per filesystem. This would be nice.


Slightly More Exotic Uses

One exotic use for TRAM is as the preserver of file lock state for a network filesystem protocol. This would allow for stateful servers and clients, and it might be the magic bullet needed to make network filesystems really usable.

Of course, TRAM could be used for all sorts of fault-tolerant applications.

One could easily make a high-performance site pair [GRAY]. Imagine two FreeBSD computers with a high-speed network connection, and assume the method clients use to connect to the proper server is taken care of (a previously solved problem). Now, imagine that any disk write to the primary requires that the data also be successfully written to the secondary of the site pair. If both machines use TRAM, the limit on response time is purely the network infrastructure. Two boxes costing $4000 could provide better performance than machines ten times that cost which provide that kind of fault tolerance (my speculation, but I bet I'm not far off).
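
The commit rule is simple to express; here is a sketch where tram_store(), send_to_secondary(), and await_ack() stand in for the real storage and network code:

    #include <stddef.h>
    #include <stdint.h>

    extern void tram_store(uint32_t blkno, const void *buf, size_t len);
    extern void send_to_secondary(uint32_t blkno, const void *buf, size_t len);
    extern int  await_ack(uint32_t blkno);

    int sitepair_write(uint32_t blkno, const void *buf, size_t len)
    {
        tram_store(blkno, buf, len);          /* durable on the primary */
        send_to_secondary(blkno, buf, len);   /* replicate over the fast link */
        if (!await_ack(blkno))
            return -1;                        /* secondary failed: escalate */
        return 0;                             /* safe on two machines: reply */
    }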

The TRAM could also store the state table of a network address translating router. In essence, this is just like the site pair described above, except that there are no disks: these would be FreeBSD or other boxes acting as routers. Network address translation requires the router to maintain a state table of all the connections passing through it, which demands very quick performance. TRAM could provide that performance, and if it were large enough, there would be no need for a disk.


Future directions and Related Work

The future for TRAM is bright. I foresee almost every desktop, laptop, and server computer in the future having some form of TRAM to bridge the gap between semiconductor memory and disk memory. In fact, making main memory non-volatile may become a standard feature of every computer. (It should.)

Future devices could use the most cost-effective memory, DRAM, as their semiconductor memory. This would allow the devices to be as large as normal computer memories; it is not uncommon to see a 64 or 128 megabyte computer. I suspect that in the next few years, when memories reach the gigabyte range, hard disks will be supplanted in small and mid-sized computers.
 

The devices could also eventually have a content addressable memory unit (as an MMU uses for its TLB) to speed up cache lookups, or other hardware to accelerate common functions so the CPU isn't utilized.
 

Related Work

Previous work on this has been done by many people.

Since most computers in the past were built with core memory, which had the characteristic of being non-volatile, I'm sure there was software that took advantage of this feature. In some computers, like the first internet 'routers', it was the only form of non-volatile storage [2]. It would be kind of 'cool' to have this feature back.

Silicon disks are another related technology [4].

After writing this, I collected any links and papers I found on the topic.

McVoy has an interesting paper online from 1991 that is the most closely related to this paper [3]. He even discusses intelligent disk controller technology in more detail, along with the pros and cons of putting the NVRAM on the disk.

Judging from the abstract of their paper at the Usenix site, Peacock, et al. wrote a paper that is also closely related to this one [8]. Although I thought my idea of using existing filesystems with TRAM was original, their publication was years ahead (good work, guys). I am not a member of Usenix, so it would be nice if someone could email me the paper.
 

 


Impacts

Putting serious transaction performance into the hands of commodity PC manufacturers significantly increases the range of applications that commodity PC equipment can accomplish. This would erode the low and middle range markets of many computer manufacturers.

Just about anybody running a 'server' will easily see the cost justification in an NVRAM board.

Free operating systems like FreeBSD, the other BSDs, and Linux, which run on commodity PC hardware, could provide serious transaction processing power and significantly better fault tolerance for very little money. This would eliminate the notion that a 'serious box' made by some other vendor is required for a solution; this has been something they have lacked the ability to solve.

With performance like this, Network Appliance and Auspex might lose market share in the file server market, as commodity PCs achieve the performance of a portion of their market.

Microsoft and other OS vendors could have a new justification for new OS releases.
 
Since the PC would now fill the role of the mid-range server, Sun, SGI, IBM, and HP would be competing against it.


If laptop users aren't driven by any heavier data storage needs, sufficient NVRAM storage may mean they don't need a disk at all.

When you use your computer, your hard disk may be off for long periods of time.


References

2
Katie Hafner and Matthew Lyon. Where Wizards Stay Up Late: The Origins of the Internet. Touchstone Books, January 1998. ISBN 0684832674.
3
Larry McVoy. Hardware Support for High Performance Filesystems. Sun Microsystems white paper, 1991.
(He was influenced by the Legato Prestoserve as well.)
4
Quantum Corporation. Solid State Disk Drives (SSD).
5
G. Copeland, T. Keller, R. Krishnamurthy, and M. Smith. The Case for Safe RAM. In Proceedings of the Fifteenth International Conference on Very Large Databases, Amsterdam, pages 327-336, 1989.
(I haven't read this, but I saw it mentioned somewhere.)
6
Dave Hitz. An NFS File Server Appliance. Network Appliance, Inc., 1993.
7
Dave Hitz, James Lau, and Michael Malcolm. File Server Design for an NFS File Server Appliance. Network Appliance, Inc., 1995.
8
J. Kent Peacock, Ashvin Kamaraju, and Sanjay Agrawal. Fast Consistency Checking for the Solaris File System. Sun Microsystems, 1998.
http://wwwwswest2.sun.com/products-n-solutions/hw/networking/netra_nfs/netra_nfs_owners/techinfo/fsck.shtml
(This looks a lot like my ideas; unfortunately I'm not a Usenix member so I can't read it. If you can send a copy, that would be great.)

 

A closing question: if we have an NVRAM of 1 gigabyte, do we need a disk at all?
 

Please contact me if you know of any papers or related work that I haven't mentioned...