Depending on how you look at it, a database is a kind of sophisticated storage system or storage is a kind of reduction of a database. In the real world, where databases and storage are separate, there is a continuum of cooperation between the two, for sure. There is no question that relational databases drove the creation of storage systems every bit as much – and drove them in very different directions – as file serving and then object serving workloads have.
What if you didn’t have to make such choices? What if your storage was a real, bona fide, honest to goodness database? What if Vast Data, the upstart maker of all-flash storage clusters that speak Network File System better and with vastly more scale than more complex (and less useful) NoSQL or object stores, was thinking about this from the very moment it was founded – that creating a new kind of storage to drive a new kind of embedded database was always the plan? What if AI was always the plan, and HPC simulation and modeling could come along for the ride?
Well, the Vast Data Platform, as this storage-database hybrid is now called, was always the plan. And that plan was always more than the Universal Storage that was conceived of in early 2016 by co-founders Renen Hallak, the company’s chief executive officer; Shachar Fienblit, vice president of research and development; and Jeff Denworth, vice president of products and chief marketing officer, and launched in February 2019. This is a next platform in its own right, which means that it will have to do clever things with compute as well. So maybe, in the end, it will just be called the Vast Platform? But let’s not get ahead of ourselves.
Then again, why not? The co-founders of Vast Data did way back when.
“Back in 2015, in my pitch deck, there was one slide about storage in that entire deck, which had maybe fifteen slides,” Hallak tells The Next Platform. “One of them had storage in it, the rest of them had other parts that needed to be built in order for this AI revolution to really happen in the way that it should. Eight years ago, AI was cats in YouTube videos being identified as cats. It was not close to what it is today. But it was very clear that if anything big was going to happen in the IT sector over the next twenty years, it would be AI and we wanted to be a part of it. We wanted to lead it. We wanted to enable others to take part in this revolution that looked like it might be confined to a few very large organizations. And we didn’t like that. We want to democratize this technology.”
And that means more than just creating a next-generation, massively scalable NFS file system and object storage system based on flash. It means thinking at ever-higher levels in the stack, and bringing together the concepts of data storage and a database against the large datasets from the natural world that are increasingly underpinning AI applications.
Data is no longer restricted to limited amounts of text and numbers in rows or columns in a database; it now includes high resolution data – video, sound, genomics, whatever – that would break a normal relational database. AI workloads need enormous amounts of data to build models, lots of performance to drive the training of models, and sometimes an enormous amount of compute to run inference on new data as it enters the model. All of this puts tremendous pressure on the storage system to deliver information – something that Vast Data’s Universal Storage, a disaggregated shared nothing implementation of NFS that has a very fine-grained quasi-object store underneath it, can handle.
“Data has a lot more gravity than compute does,” Hallak adds. “It’s bigger and it’s harder to move around. And so for us to play in that AI space, we cannot confine ourselves just to the data piece. We have to know something and have an opinion about how the data is organized. It is about the breaking of tradeoffs, and it is not just a storage thing. If you take out that word storage, and put in the word database, the same type of challenges apply. Cost, performance, scale, resilience, ease of use – these are not storage terms. They’re very generic computer science terms.”
The first inklings of the Vast Data Platform were unveiled in the Vast Catalog, introduced in February of this year, which basically put a SQL front end and semantic system on top of the NFS file system and object storage underpinning the Universal Storage. This was the first hint that a new engine was underneath the covers of the Universal Storage that supported SQL queries. Now, Vast Data is taking the covers completely off, revealing how the data storage and database have been converged into a single platform and how it will eventually have a compute layer.
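To make the idea concrete: Vast has not published the Catalog’s actual schema or query interface in this story, but the core concept – exposing file-system metadata through a SQL front end instead of walking a directory tree – can be sketched with SQLite as a stand-in. The table name, columns, and sample data below are all hypothetical.

```python
import sqlite3

# Illustrative only: a toy "catalog" of file metadata queried with SQL,
# using SQLite as a stand-in. The schema and data here are invented for
# the example; they do not describe the Vast Catalog's real interface.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE catalog (
        path  TEXT PRIMARY KEY,   -- file path in the namespace
        bytes INTEGER,            -- file size in bytes
        owner TEXT,               -- owning user or project
        mtime TEXT                -- last-modified date
    )
""")
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?, ?, ?)",
    [
        ("/datasets/genomes/sample1.bam", 12_884_901_888, "lab-a", "2023-05-01"),
        ("/datasets/video/run42.mp4",      4_294_967_296, "lab-b", "2023-06-10"),
        ("/home/user/notes.txt",                   2_048, "user",  "2023-06-12"),
    ],
)

# Instead of crawling the file tree, ask a SQL question of the metadata:
# which owners hold files larger than 1 GiB, and how much in total?
rows = conn.execute("""
    SELECT owner, COUNT(*) AS files, SUM(bytes) AS total_bytes
    FROM catalog
    WHERE bytes > 1 << 30
    GROUP BY owner
    ORDER BY total_bytes DESC
""").fetchall()
for owner, files, total in rows:
    print(owner, files, total)
```

The point of the sketch is the shape of the interaction: a query planner answers questions about billions of files from indexed metadata, rather than an application recursively stat-ing its way through a namespace.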
And as such, we are going to treat the Vast Data Platform announcement just like we would a server compute engine announcement, giving it an overview to start (that would be this story you are reading) and then a deep dive after we do some digging into the architecture. Technically, we are on vacation at the beach in Hilton Head Island.