The cloud is the promised land when it comes to storage. A recent 451 Research report said AWS and Azure will be two of the top five enterprise storage vendors by 2017, with AWS ranked number two overall. But the challenge with using the cloud for primary storage is the latency between that storage and the users and applications that depend on it. Taking advantage of the economics, scale, and durability of cloud storage requires a combination of caching, global deduplication, security, and global file locking to deliver the performance and features organizations require.
"Any time you move your infrastructure somewhere outside of your data center, there's going to be latency involved, and you run in to the speed of light problem: The speed of light can only go so fast,” says Scott Sinclair, analyst with the Enterprise Strategy Group. “But unlike most storage problems, the trick to achieving high-performance cloud storage isn't just to throw more disk drives or flash at the problem. When solving for the speed of light, new technologies need to rely on a specific innovation to solve the problem -- namely, co-locating data very close to compute, or introducing some sort of network optimization or caching mechanism.”
Let’s first take a quick look at AWS S3 as an example of why there is so much hype around cloud storage. S3 provides 11 nines of durability and is designed to sustain the concurrent loss of data in two facilities. AWS also lets customers pay as they grow and immediately take advantage of any price drops in storage -- much different from buying fixed amounts of storage at today’s prices, ahead of the actual need for that storage.
And there are few organizations, if any, that can match the scale of AWS. Every day, AWS installs enough infrastructure to host the entire Amazon e-tailing business as it was in 2004, when the retailer was one-tenth its current size at $7 billion in annual revenue.
With all these advantages, why is cloud storage relegated to a back-up role instead of serving as primary storage? The speed of light between the data center and cloud storage is hard to overcome. There are, however, ways to work around the latency and, in effect, break the speed of light.
Latency normally manifests itself as slow performance. This is where caching, global deduplication, and global file locking come into play.
Caching data locally is the first step in eliminating the effect of latency. Many analysts will tell you that 70% of data has not been accessed in 60 days. When we evaluate the storage of prospective customers, we find that 90% of their data has not been accessed in six months. This means that if you cache the hot, active data in the office, the rest can be stored in the cloud.
The goal is to have as much active data in the cache as possible. This can be accomplished by provisioning enough local storage to hold the active data, by using an efficient caching algorithm, or both. We typically find that customers underestimate the amount of local cache needed, even when they plan for growth: they add more users than they forecast, or they put more types of data in the cache than they originally planned, since data in the cache no longer needs its own back-up, DR, or archive systems.
A caching algorithm can use machine learning to determine what data needs to be cached locally and what data can “recede” into the cloud. Techniques in the caching algorithm can predict which data to keep in cache based on when and how the data was written. The goal is to infer, from the data currently being accessed, what will be needed next and “pre-fetch” data that is not yet in the cache.
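To make the idea concrete, here is a minimal sketch -- purely hypothetical, not any vendor’s actual algorithm -- of a local cache that evicts the least recently used blocks and pre-fetches blocks likely to be read next. The neighbor-based pre-fetch heuristic stands in for the time-of-write and machine-learning signals described above:

```python
from collections import OrderedDict

class EdgeCache:
    """Sketch of a local cache in front of cloud storage.

    Hot blocks stay in an LRU structure; on a miss, the block is fetched
    from the cloud and blocks written close in time to it are prefetched,
    on the assumption they are likely to be read together.
    """

    def __init__(self, capacity, cloud, prefetch_window=4):
        self.capacity = capacity          # max number of blocks held locally
        self.cloud = cloud                # dict-like: block_id -> data
        self.prefetch_window = prefetch_window
        self.blocks = OrderedDict()       # block_id -> data, in LRU order

    def read(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)   # cache hit: mark most recent
            return self.blocks[block_id]
        data = self.cloud[block_id]             # cache miss: fetch from cloud
        self._admit(block_id, data)
        self._prefetch(block_id)                # warm likely-next blocks too
        return data

    def _prefetch(self, block_id):
        # Hypothetical heuristic: blocks with nearby IDs were written around
        # the same time, so pull them before the application asks for them.
        for neighbor in range(block_id + 1, block_id + 1 + self.prefetch_window):
            if neighbor in self.cloud and neighbor not in self.blocks:
                self._admit(neighbor, self.cloud[neighbor])

    def _admit(self, block_id, data):
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict least recently used


# Example: a 3-block cache in front of a 10-block "cloud"
cloud = {i: f"block-{i}" for i in range(10)}
cache = EdgeCache(capacity=3, cloud=cloud, prefetch_window=1)
cache.read(0)                  # miss: fetches block 0, prefetches block 1
assert 1 in cache.blocks       # block 1 is already warm for the next read
```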
Caching does not have to be all or nothing at the file level if global deduplication is used. A global dedup table in the cache lets the caching algorithm leverage common blocks across different files, so it only pulls down the missing blocks of a file that is accessed but not fully in cache. This dramatically reduces the time to access a file that is not fully cached locally.
Global dedup is especially useful when transferring a file from one local cache to another, assuming both caches are connected to the same cloud storage. Since each local cache has its own dedup table, it knows which blocks it is missing from a file being transferred, and only those missing blocks actually cross the wide area network between the two caches. Electronic Arts reduced the transfer times of 10GB-to-50GB game build files from over 10 hours to just minutes because only the new blocks of the files were actually transferred.
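A simplified sketch of how a global dedup table keeps block transfers minimal -- again illustrative only, not Electronic Arts’ setup or any product’s actual code: each site stores blocks by content hash, and when a file is transferred, only the hashes the receiving site does not already hold are pulled across the WAN.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real systems often use variable-length chunking

def split_blocks(data: bytes):
    """Split a file into fixed-size blocks and hash each one."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    return [(hashlib.sha256(b).hexdigest(), b) for b in blocks]

class Site:
    """A local cache that stores blocks by content hash -- the dedup table."""

    def __init__(self):
        self.dedup_table = {}  # hash -> block bytes

    def store_file(self, data: bytes):
        """Store a file locally and return its manifest (ordered block hashes)."""
        manifest = []
        for digest, block in split_blocks(data):
            self.dedup_table[digest] = block
            manifest.append(digest)
        return manifest

    def receive_file(self, manifest, sender):
        """Rebuild a file from its manifest, pulling only blocks not already held."""
        missing = {h for h in manifest if h not in self.dedup_table}
        for h in missing:
            self.dedup_table[h] = sender.dedup_table[h]  # only these cross the WAN
        data = b"".join(self.dedup_table[h] for h in manifest)
        return data, len(missing)


# A new build at site A differs from the old build at site B in 2 of 8 blocks.
old_build = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))
new_build = old_build[:6 * BLOCK_SIZE] + b"".join(bytes([200 + i]) * BLOCK_SIZE for i in range(2))

site_a, site_b = Site(), Site()
site_b.store_file(old_build)               # B already holds the previous build
manifest = site_a.store_file(new_build)    # A stores the revised build
rebuilt, blocks_sent = site_b.receive_file(manifest, site_a)
assert rebuilt == new_build
print(f"{blocks_sent} of {len(manifest)} blocks crossed the WAN")  # 2 of 8
```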
While caching and dedup are a tremendous help, they do not fully solve the latency issue. They eliminate or significantly reduce the time to transfer data, but they do not solve for “application chattiness.” People often talk about chattiness and latency without fully understanding how the combination of the two can have a much bigger performance impact than the data transfer itself. This can be illustrated with a time-and-motion study of a chatty application opening a small 1.5MB file across the country -- from New York to California.
CAD, like other technical applications, performs a significant number of file operations sequentially when a file is opened. In the case of AutoCAD, the most widely used CAD program, nearly 16,000 file operations happen when a file is opened. This is the “chattiness” of the application. If the authoritative copy of the file (which holds the file lock) is 86 milliseconds away -- the round-trip latency from California to New York -- then opening the file takes 16,000 * 86ms, or approximately 22 minutes. The actual data transfer for a 1.5MB file is a tiny fraction of that.
This is where global file locking comes in. When the file lock is transferred from New York to California, it is as though the authoritative copy of the file were stored in California (even though it still resides in New York), so the latency is LAN latency instead of WAN latency and drops from 86ms to 0.56ms. The time to open the file drops accordingly: 16,000 * 0.56ms, for a grand total of roughly 8 seconds.
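The arithmetic behind those two figures is easy to check. The short calculation below is illustrative only; the WAN bandwidth is an assumed number, included just to show how small the raw data transfer is compared with the chattiness penalty.

```python
# Back-of-the-envelope math for the AutoCAD example above. The operation count
# is rounded up to 16,000, so the results come out slightly above the article's
# ~22 minutes and ~8 seconds, which use the exact (slightly lower) count.
file_operations = 16_000   # "nearly 16,000" sequential operations per file open
wan_rtt_ms = 86            # New York <-> California round trip
lan_rtt_ms = 0.56          # round trip once the file lock is held locally
file_size_mb = 1.5
link_mbps = 100            # assumed WAN bandwidth, only to size the raw transfer

wan_open = file_operations * wan_rtt_ms / 1000   # seconds
lan_open = file_operations * lan_rtt_ms / 1000   # seconds
raw_transfer = file_size_mb * 8 / link_mbps      # seconds

print(f"Open with remote lock: ~{wan_open / 60:.0f} minutes")   # ~23 minutes
print(f"Open with local lock:  ~{lan_open:.0f} seconds")        # ~9 seconds
print(f"Raw 1.5MB transfer:    ~{raw_transfer:.2f} seconds")    # ~0.12 seconds
```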
Of course, not every application is as chatty as AutoCAD, but any application that was developed for a high-speed, low-latency local area network will exhibit some chattiness, and that chattiness often causes more performance issues than the transfer of the file data itself.
With these pieces in place, organizations can take advantage of all the benefits of cloud storage for all their files, not just the files they are not using. When this happens, organizations begin to rethink storage in general. Because there is so much durability and redundancy in the cloud itself, customers have to get their heads around the fact that the systems and processes used for back-up, DR, and archiving are really not required anymore. Those functions become a natural byproduct of using the cloud for primary storage once you solve for the speed of light.