Why NFS is not suitable for work anymore


In the early 1980s, Motorola introduced the first commercial mobile phones. They were huge, heavy, cost thousands of dollars, and ran on an analog network called AMPS that was bandwidth-hungry and lacked basic security features to prevent calls from being intercepted or bugged. 

As cutting-edge as they were in their day, nobody in their right mind would still use one now. 

Around the time the Motorola DynaTAC 8000X was introduced, Sun Microsystems developed its Network File System (NFS) as a protocol for client machines to access files on a single centralized server.

About the author

Björn Kolbeck is co-founder and CEO at Quobyte

NFS was a breakthrough at the time, but surely nobody in their right mind would still use it today. Right?

Back then, dial-up connections over modems were measured in bits per second, and local Ethernet LANs peaked at 10 Mbit/s. Today we deal with exponentially more data, faster networks, and more servers than back in the 1980s or even the 1990s.

With the advent of scale-out architectures in IT, or warehouse-scale computing as Google calls it, we have ended up with environments for which even the latest and greatest NFSv4 is not suitable. In fact, NFS has become a liability.

The single biggest problem: NFS was designed for a single centralized server, not for scale-out. Today’s NFSv4, and even parallel NFS, are still based on that centralized model. Not only was NFS designed for clients to talk to a single server; those machines had only a few MB of capacity, file sizes were relatively small, and throughput was relatively low.

Every enterprise IT executive, CIO, and data scientist in the world today has two goals: one, meeting the scale needs of users and applications; and two, ensuring appropriate data safety to address security, compliance, and availability.

Scale-out requires full mesh (n-to-m) communication between clients and storage servers. Without it, bottlenecks and lockups choke performance, particularly in read- or write-heavy workloads, which describes essentially every workload in a modern enterprise.

And this is ultimately its critical flaw: NFS is itself a bottleneck. The NFS server inherently sits directly in the data path and cannot scale performance to accommodate the demands of I/O-intensive computing or many concurrent requests.
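To make the bottleneck concrete, here is a deliberately simplified toy model in Python; the chunk count, per-chunk service time, and server count are invented for illustration rather than drawn from any real deployment. It contrasts funneling every request through one endpoint with striping the same work across several storage servers that clients reach directly:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Toy parameters (invented for illustration): 64 chunks, each taking
# 10 ms of server time to deliver.
NUM_CHUNKS = 64
CHUNK_TIME_S = 0.010

def serve_chunk(_chunk_id):
    """Simulate a storage server spending CHUNK_TIME_S on one chunk."""
    time.sleep(CHUNK_TIME_S)

# Centralized model: every chunk funnels through a single endpoint,
# so total time grows linearly with the amount of data.
start = time.time()
for chunk_id in range(NUM_CHUNKS):
    serve_chunk(chunk_id)
print(f"single server: {time.time() - start:.2f} s")

# Scale-out (n-to-m) model: the same chunks are striped across eight
# storage servers that clients talk to directly, so deliveries overlap.
start = time.time()
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(serve_chunk, range(NUM_CHUNKS)))
print(f"8-way scale-out: {time.time() - start:.2f} s")
```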

Any gateway is a bottleneck too, and NFS gateways are no exception. Architectures based on NFS gateways have severe performance-scaling limits because the gateways must keep their caches consistent with one another to create the illusion of a single NFS server. That illusion is all NFS can offer, and cache consistency is an expensive band-aid that keeps an outdated protocol working instead of fixing the actual problem: NFS itself.

Load “balancing” (I use quotes because most of the time the result is far from balanced) intrinsically demands a distributed environment or system, and since NFS was never intended for distributed systems, load balancing with NFS is painful and disruptive. The protocol simply doesn’t think that way.

Ah, but that’s where parallel NFS comes in. People think it solves all these issues. Sadly, pNFS is still just as broken, and still the opposite of scale-out. Only I/O is spread across multiple servers; there is still a single centralized server for metadata and the control plane. It won’t surprise anyone that the explosion in enterprise data comes with a corresponding explosion in metadata. Performance and scale in metadata processing are especially important in “big data” applications like AI/ML and analytics.

Unfortunately, as I see over and over, pNFS solves only a tiny part of the problem: the data transfer. It may be the most modern iteration, but it arrived 15 years too late and leaves many of the real issues unsolved.

NFS fails at failover as well. Anyone who uses NFS knows the “stale file handle” problem when an NFS failover happens. The protocol, even NFSv4, has no concept of failover (again, it wasn’t created to think that way) and instead relies on fragile IP failover, which is slow and disruptive. Like many critical features, fault tolerance must be designed into a protocol from the start; NFS instead had clunky failover bolted on later, like a badly designed building waiting to collapse.
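In practice, the burden lands on applications: after a failover, operations on handles the client still holds can fail with ESTALE, and the only recourse is to reopen the file by path and try again. The following is a minimal Python sketch of that workaround; the path and retry policy are hypothetical:

```python
import errno

def read_with_estale_retry(path, retries=3):
    """Read a file from an NFS mount, retrying when a failover leaves
    the client holding a stale file handle (ESTALE)."""
    for attempt in range(1, retries + 1):
        try:
            with open(path, "rb") as f:
                return f.read()
        except OSError as exc:
            if exc.errno != errno.ESTALE:
                raise
            # The cached handle no longer maps to a file on the (new)
            # server; reopening by path is the only way forward.
            print(f"ESTALE on attempt {attempt}, reopening {path}")
    raise OSError(errno.ESTALE, "stale file handle persists", path)

# Hypothetical usage against an NFS mount:
# data = read_with_estale_retry("/mnt/nfs/results.csv")
```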

This brings me to the second goal of enterprise IT: data safety, a catch-all term for data integrity, governance, compliance, protection, access control, and so on.

Data safety is a major concern, whether the driver is preventing data breaches or meeting industry regulation. Lately, data breaches have resulted in significant fines for companies subject to the European Union’s GDPR. Enterprises processing personally identifiable information or health data must implement state-of-the-art data protection through encryption.

Here again NFS is a liability, because neither pNFS nor NFSv4 offers proper end-to-end encryption, let alone other security mechanisms like TLS and X.509 certificates, all of which are available today in storage technologies designed for scale-out and safety, including Quobyte’s Data Center File System. NFS is a severe business and compliance risk in comparison.

pNFS and NFSv4 also lack end-to-end checksums to identify data corruption. This, too, is partly a result of the increasing scale of data operations compared to when NFS was developed. In the 1980s, data integrity via checksums wasn’t a question: the data carried in IP packets was small, and TCP checksums were adequate. But the 16-bit TCP checksum is far too weak for today’s volumes, where a packet carries at most 64 KB and an enterprise expects gigabytes per second. Decades later, NFS still doesn’t address data integrity in an adequate way. You probably underestimate how often you get corrupted data from your NFS storage, and tracking down the problem is difficult and time-consuming.
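For illustration, this is the kind of end-to-end integrity check that applications have to bolt on themselves because the protocol does not provide one; a minimal Python sketch, with the file path purely hypothetical:

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file in 1 MiB chunks and return its SHA-256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical usage: record the digest when the data is produced,
# then verify it after reading the file back over NFS.
# expected = sha256_of_file("/mnt/nfs/dataset.bin")
# ...
# assert sha256_of_file("/mnt/nfs/dataset.bin") == expected
```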

Whether it is high-throughput requirements, random or mixed general workloads, or data safety and access, there is nowhere in the modern enterprise where NFS excels. It's time to retire the protocols of the Back to the Future era in favor of alternatives that provide users and applications with the performance and reliability they need.

Björn Kolbeck

Before taking over the helm at Quobyte, Björn spent time at Google working as tech lead for the hotel finder project (2011–2013). He was the lead developer for the open-source file system XtreemFS (2006–2011). Björn’s PhD thesis dealt with fault-tolerant replication.