The rise of the cloud, the emergence of hyper-convergence and the lightning-fast speeds of non-volatile memory express (NVMe) flash have perhaps masked some of the fundamentals of storage.
But, at root, all storage is categorised as either block, file or object, with those terms derived from how data is accessed in each mode.
Fundamentally, all physical storage shares a common characteristic: it comprises a medium that can register the presence or absence of bits of data, and that’s the same whether it is the slowest magnetic hard drive or super-fast NVMe flash.
Where things start to differ is how that basic bit-level information forms part of the bigger picture, and it’s here where the key differences between block, file and object storage emerge.
File and block share a lot, namely in their relationship to a file system. Object is cut from a different cloth altogether.
File systems rule, OK?
Most of what we know about the way data is stored is based around the decades-old concept of the file system. Block and file storage are defined by their relationship to it.
Block access storage – as deployed in storage-area network (SAN) systems, for example – provides only the means to address blocks of storage from file systems, databases, and so on. When you buy SAN/block storage you are merely buying the storage array and the ability to configure volumes to make them available to applications via a file system resident elsewhere in the software stack.
File access storage – commonly consumed via network-attached storage (NAS) – is most easily understandable in opposition to this. In other words, when you buy a NAS box or a linked cluster of scale-out NAS nodes, they come with their own file system with storage presented to applications and users in the familiar drive letter format. Everything a SAN does is also performed in a NAS system, but it is hidden away.
Object storage is quite different. It is based on a “flat” structure with access to objects via unique identifiers, somewhat similar to the way websites are addressed in the domain name system (DNS). That makes it quite unlike the hierarchical, tree-like, file system structure.
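The contrast between a hierarchical namespace and a flat one can be sketched in a few lines of Python. This is only a toy illustration, not how any real file system or object store is implemented: the nested dictionary, the single flat dictionary, and the function names `read_file`, `put_object` and `get_object` are all assumptions made for the example.

```python
import uuid

# Hierarchical (file-system-like) namespace: nested dictionaries,
# walked one path component at a time.
file_tree = {"home": {"alice": {"report.txt": b"quarterly figures"}}}


def read_file(tree, path):
    """Resolve a path by descending the tree one directory at a time."""
    node = tree
    for part in path.strip("/").split("/"):
        node = node[part]
    return node


# Flat (object-store-like) namespace: a single dictionary keyed by unique ID.
object_store = {}


def put_object(store, data):
    """Store data under a newly generated unique ID and return that ID."""
    object_id = str(uuid.uuid4())
    store[object_id] = data
    return object_id


def get_object(store, object_id):
    """Fetch data directly by its ID; there are no directories to traverse."""
    return store[object_id]


report_id = put_object(object_store, b"quarterly figures")
same_content = read_file(file_tree, "/home/alice/report.txt") == get_object(object_store, report_id)
```

In the hierarchical case the location of the data is encoded in the path, while in the flat case the caller simply has to hold on to the identifier.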
File, block, object: Performance and use cases
Whether storage is file, block or object goes a long way to determine the likely performance and use cases. It’s not the only determinant, however, and with the advent of very fast flash storage, performance wrinkles present in previous times can be ironed out.
But, in general, each storage mode has key characteristics whether used on-premises or in the cloud.
File storage is, as you’d expect, good for storing and accessing files. It suits workloads that access entire files, whether that is general file storage or more specialised cases that require file access, such as movie files. It’s also a good choice for data at the other end of the size scale if it exists as small files, such as might be the case with machine or sensor data you want to run analytics on.
File storage as NAS is also well-suited to working with applications that need file locking or that are written as “traditional” on-premises applications.
Having said all that, object storage also provides access at file level, but without file locking. It is also less likely to be addressable by many applications unless they are written for use with object storage.
Block access via SAN only
Meanwhile, block storage can also do all of this. It does, after all, work with a file system to provide application access to data.
But block storage is at its best when providing access to blocks that form part of larger files. A typical use case here is database access, where many users simultaneously access different parts of what is essentially the same file, with locking operating at the sub-file level.
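To make the idea of addressing part of a larger file concrete, here is a minimal Python sketch that treats an ordinary temporary file as if it were a small volume and reads and writes it one fixed-size block at a time. The 4,096-byte block size and the helper names `write_block` and `read_block` are assumptions for illustration, and the sketch does not attempt the sub-file locking a real database would need.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size, chosen for illustration only


def write_block(path, block_number, data):
    """Overwrite one fixed-size block in place, leaving all others untouched."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block long")
    with open(path, "r+b") as f:
        f.seek(block_number * BLOCK_SIZE)
        f.write(data)


def read_block(path, block_number):
    """Read a single block by number without reading the rest of the file."""
    with open(path, "rb") as f:
        f.seek(block_number * BLOCK_SIZE)
        return f.read(BLOCK_SIZE)


# Create a stand-in "volume" made of eight zero-filled blocks.
fd, volume_path = tempfile.mkstemp()
os.close(fd)
with open(volume_path, "wb") as f:
    f.write(bytes(BLOCK_SIZE * 8))

# Update only block 5, then read back block 5 and its neighbour.
write_block(volume_path, 5, b"x" * BLOCK_SIZE)
block5 = read_block(volume_path, 5)
block4 = read_block(volume_path, 4)
volume_size = os.path.getsize(volume_path)
os.remove(volume_path)
```

Only the addressed block changes; the neighbouring block and the overall size of the volume stay as they were.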
A key characteristic of block storage is its performance, which derives from being lean and efficient, and not having to deal with metadata and file system information etc. So, it’s ideal for low latency, consistent input/output (I/O) performance for database-oriented applications that can include email, as well as virtual machines (VMs) and desktops.
Like file access storage, block-access SAN storage is likely to be familiar to most enterprise applications. It often forms the basis of the most high-end, and therefore pricey, storage systems in the enterprise, usually with flash media and now often its super-fast NVMe variant.
SAN products often have their own skills requirements, with Fibre Channel and iSCSI protocols the ones commonly used.
Object storage emerged as a rival to file-access storage for large quantities of unstructured data when scale-out NAS file systems started to creak under the sheer number of files being stored.
Where file-access storage with its hierarchical file structure can get cumbersome as it grows, object storage brings a “flat” structure with equal access to all objects held, making it eminently suitable for large volumes of unstructured data.
Another characteristic is that objects in object storage can be accompanied by a richer set of metadata than in a traditional file system. That potentially makes data in object storage well-suited to analytics too. Object storage is also better suited to web operations and cloud-native applications than file and block.
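As a rough illustration of why richer metadata can help analytics, the Python sketch below attaches free-form metadata to each object and selects objects by those attributes rather than by name or location. The `StoredObject` class, the `find_by_metadata` helper, the object IDs and the sensor attributes are all invented for the example and do not reflect any particular product’s API.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An object's payload plus arbitrary user-defined metadata."""
    data: bytes
    metadata: dict = field(default_factory=dict)


# Hypothetical objects holding sensor readings, each tagged with metadata.
metadata_store = {
    "obj-001": StoredObject(b"21.4", {"sensor": "turbine-7", "unit": "celsius"}),
    "obj-002": StoredObject(b"0.93", {"sensor": "pump-2", "unit": "bar"}),
    "obj-003": StoredObject(b"22.1", {"sensor": "turbine-7", "unit": "celsius"}),
}


def find_by_metadata(store, **criteria):
    """Return the sorted IDs of objects whose metadata matches every criterion."""
    return sorted(
        object_id
        for object_id, obj in store.items()
        if all(obj.metadata.get(key) == value for key, value in criteria.items())
    )


turbine_ids = find_by_metadata(metadata_store, sensor="turbine-7")
# turbine_ids is ["obj-001", "obj-003"]
```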
Drawbacks compared to file system-based approaches include that object storage has no locking mechanism, and that many existing applications cannot work as easily with it as they can with more traditional modes of access.
In addition, object storage tends to be the least well-performing of all storage modes in part because of the heavier metadata overheads, although that is changing.
Another possible drawback, and one that makes object storage poorly suited to more time-critical operations and certainly to transactional processes, is that it is often not strongly consistent. In other words, object storage is typically only eventually consistent between the mirrored copies that exist, so a read from one copy can briefly return older data than was just written to another.
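Eventual consistency can be shown with a toy Python model. The sketch assumes one primary copy and one mirror, with writes acknowledged as soon as they reach the primary and copied to the mirror later. The class and method names are hypothetical, and real systems replicate in far more sophisticated ways.

```python
class EventuallyConsistentStore:
    """Toy model: writes land on the primary first and reach the mirror later."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}
        self.pending = []  # writes acknowledged but not yet copied to the mirror

    def put(self, key, value):
        """Acknowledge the write once the primary copy has it."""
        self.primary[key] = value
        self.pending.append((key, value))

    def get_from_mirror(self, key):
        """Read from the mirror, which may lag behind the primary."""
        return self.mirror.get(key)

    def replicate(self):
        """Apply all pending writes to the mirror, in the order they arrived."""
        for key, value in self.pending:
            self.mirror[key] = value
        self.pending.clear()


ec_store = EventuallyConsistentStore()
ec_store.put("invoice-42", "v1")

stale_read = ec_store.get_from_mirror("invoice-42")  # None: mirror not yet updated
ec_store.replicate()
fresh_read = ec_store.get_from_mirror("invoice-42")  # "v1": copies have converged
```

The window between the acknowledged write and the replication step is exactly where a transactional application could read out-of-date data.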