I get a kick out of all of the ___-as-a-Service acronyms that the industry has invented. We now have another common phrase: Software Defined <Data Center/Network/Storage>. These are software-based solutions that abstract and pool heterogeneous hardware resources and then layer on intelligent software to provide an easy-to-consume service, typically with rich REST APIs for programmatic access. Hence my creation, Software Defined Objects, to label just one of the capabilities of ViPR. But first we must cover some basics about objects. We will see object storage in more and more enterprises as they transition to mobile, web, and big data applications. Why is object gaining in popularity? Why would someone choose object over a filesystem?
First, a definition of what makes an object. I am borrowing Chuck Hollis’s because I cannot state it any more simply. You can find a very useful analogy in his post. An object looks like this:
- an arbitrary amount of bits (data, code, whatever)
- a unique identifier
- an arbitrary amount of metadata to describe it to different parties
- and some sort of access method to invoke or use it
Access methods are typically REST API based. Common object platforms out there include OpenStack Swift, Amazon S3, EMC Atmos, and EMC Centera.
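To make the four-part definition concrete, here is a minimal sketch of an object and a toy access method in Python. Everything in it is hypothetical and for illustration only: `StoredObject` and `ToyObjectStore` are invented names, the store lives in memory, and `put`/`get` merely stand in for the REST PUT and GET calls a real platform would expose.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StoredObject:
    """Hypothetical object: bits + unique identifier + metadata."""
    data: bytes  # an arbitrary amount of bits
    object_id: str = field(default_factory=lambda: uuid.uuid4().hex)  # unique identifier
    metadata: Dict[str, str] = field(default_factory=dict)  # descriptive metadata


class ToyObjectStore:
    """Hypothetical in-memory store; put/get stand in for a REST access method."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}

    def put(self, data: bytes, metadata: Optional[Dict[str, str]] = None) -> str:
        # Store the bits together with their metadata and hand back the new ID.
        obj = StoredObject(data=data, metadata=dict(metadata or {}))
        self._objects[obj.object_id] = obj
        return obj.object_id

    def get(self, object_id: str) -> StoredObject:
        # Look the object up by its unique identifier alone: no path, no directory.
        return self._objects[object_id]
```

A caller keeps only the returned ID, for example `oid = store.put(b"...", {"content-type": "image/jpeg"})`, and later retrieves the object with `store.get(oid)`.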
File, meet Object
If IT is fortunate, the DevOps team is requesting an object store for a new web application. I say fortunate because all too often the team skips IT entirely and takes the workload to a public cloud service like Amazon, or rolls its own object store in a dreaded “Shadow IT” project. I say embrace object and all of the next-generation capabilities it has to offer. Better yet, offer an object store to the dev team before they ask. You are becoming the IT service broker, after all. Here is why developers want an object store rather than a filesystem.
Objects know nothing about the location restrictions that filesystems must deal with on a daily basis. While a filesystem is constrained to a single location, an object store can span distances, allowing an object to be accessed from anywhere and to live in many locations around the globe. Access from anywhere, combined with locality awareness that serves the copy closest to the user, is a huge advantage of object storage. Imagine an ebay.com auction listing whose seller is in the US and whose potential buyer lives in Europe. If the buyer had to retrieve the listing’s images from across the pond, their experience would be poor. The user need not care where the object lives; they simply request it from the namespace and let the object store do the heavy lifting of locating, retrieving, and serving it from the nearest location that holds a copy. For this reason, replication is core to the architecture of most object platforms; it is not just a method of providing resiliency, although that is a tremendous benefit.
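The locality-aware read described above can be sketched as "pick the replica with the smallest distance to the client." This is a simplification under stated assumptions: the site names and coordinates are made up, and distance is plain great-circle (haversine) distance, whereas a real object platform may weigh network latency, load, or replica health instead.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Site:
    """A hypothetical replica location."""
    name: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_replica(replicas: Sequence[Site], client_lat: float, client_lon: float) -> Site:
    """Return the replica geographically closest to the client."""
    if not replicas:
        raise LookupError("object has no replicas")
    return min(replicas, key=lambda s: haversine_km(client_lat, client_lon, s.lat, s.lon))


# Assumed replica sites for one listing's images.
replicas = [Site("us-east", 40.71, -74.01), Site("eu-west", 51.51, -0.13)]
```

With these assumed sites, a buyer in Paris (`nearest_replica(replicas, 48.86, 2.35)`) is served from `eu-west`, while the seller in Chicago (`nearest_replica(replicas, 41.88, -87.63)`) is served from `us-east`, even though both requested the same object.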
That last point requires a global namespace that spans all locations; http://www.emc.com/namespace/objectguid is an example. No matter where the object is stored or where the application retrieving it runs, the object is found at the same namespace location. Compare this to a CIFS or NFS share, which lives in a specific, single geographic location.
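One way to picture a global namespace is as an index from an object’s identifier to the set of sites that hold a copy. In the hypothetical sketch below (`GlobalNamespace`, `register`, and `resolve` are illustrative names, not any product’s API), the URL names only the object; which sites hold it is looked up at request time, so the same URL works from anywhere. By contrast, an NFS path such as `server:/export/file` bakes a specific host into the name.

```python
from typing import Dict, List, Set, Tuple
from urllib.parse import urlparse


class GlobalNamespace:
    """Hypothetical index: object GUID -> names of sites holding a replica."""

    def __init__(self) -> None:
        self._replicas: Dict[str, Set[str]] = {}

    def register(self, guid: str, site: str) -> None:
        # Record that `site` holds a replica of the object.
        self._replicas.setdefault(guid, set()).add(site)

    def resolve(self, url: str) -> Tuple[str, List[str]]:
        # The last path segment identifies the object; the host part names the
        # namespace, not the physical location of the data.
        guid = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
        sites = self._replicas.get(guid)
        if not sites:
            raise KeyError(guid)
        return guid, sorted(sites)


ns = GlobalNamespace()
ns.register("objectguid", "us-east")
ns.register("objectguid", "eu-west")
```

Here `ns.resolve("http://www.emc.com/namespace/objectguid")` returns `("objectguid", ["eu-west", "us-east"])` no matter which site the caller is in; choosing the closest of those sites is then a separate locality decision.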
Want to store key data about the objects or files you are storing? Most file use cases out there today rely on a relational database that stores the metadata and points back to the file. This does not scale well when dealing with millions or billions of files. Even worse are implementations I have witnessed where the files are actually embedded inside a database structure: unstructured data buried in a structured relational database. Yuck! Object stores keep the metadata with the object on the storage itself, allowing for seemingly endless scale.
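In practice, “metadata stored with the object” means that the metadata rides along on the same request that uploads the object. Amazon S3 accepts user-defined metadata as `x-amz-meta-*` headers on a PUT, and OpenStack Swift uses `X-Object-Meta-*` headers. The sketch below only builds those headers; `metadata_headers` is a hypothetical helper, and no HTTP client, authentication, or request signing is shown.

```python
from typing import Dict

# Header prefixes each API uses for user-defined object metadata.
_META_PREFIXES = {
    "s3": "x-amz-meta-",
    "swift": "X-Object-Meta-",
}


def metadata_headers(metadata: Dict[str, str], api: str) -> Dict[str, str]:
    """Turn a plain metadata dict into the headers sent with the object's PUT."""
    try:
        prefix = _META_PREFIXES[api]
    except KeyError:
        raise ValueError(f"unsupported API: {api!r}") from None
    return {prefix + name: value for name, value in metadata.items()}
```

For example, `metadata_headers({"camera": "X100"}, "s3")` yields `{"x-amz-meta-camera": "X100"}`. Because the store returns these values with the object, no separate database row is needed to describe it.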
Filesystems also have scale-limiting factors: limits on the overall size of a filesystem and on the number of files or directories. Because objects are stored in a flat structure rather than in directories, there is no directory-structure data to maintain, and that data is often why filesystems begin to struggle at scale.
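In a flat namespace, “folders” are just a naming convention applied when listing. The sketch below mimics the prefix-and-delimiter style of listing that S3 popularized: keys are plain strings, and the apparent directories (common prefixes) are computed on the fly, with no directory metadata stored anywhere. It is a linear scan over an in-memory list for illustration; a real object store would serve this from an index.

```python
from typing import Iterable, List, Tuple


def list_keys(keys: Iterable[str], prefix: str = "", delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """Return (objects, common_prefixes) for a flat key space, S3-listing style."""
    objects: List[str] = []
    common_prefixes = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Roll everything up to the next delimiter into one pseudo-folder.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


keys = [
    "logs/2014/01/app.log",
    "logs/2014/02/app.log",
    "logs/readme.txt",
    "images/cat.jpg",
]
```

`list_keys(keys, "logs/")` returns `(["logs/readme.txt"], ["logs/2014/"])`, so `logs/2014/` looks like a folder even though only four flat keys exist.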
ViPR Data Services (DS) layers object and HDFS data services on top of existing enterprise storage infrastructure: today VNX, Isilon, and NetApp NFS filesystems, and soon commodity hardware. Because it is a software-only solution that runs as a scale-out cluster of VMs, IT organizations now have a simple way to deploy, manage, and offer Object and Hadoop as a service to their development teams. The supported object APIs in ViPR Data Services are Atmos, OpenStack Swift, and Amazon S3. Keep in mind that ViPR DS is not managing a standalone instance of OpenStack Swift; it IS the object store itself, and data flows through ViPR DS. This is a common misunderstanding when talking with customers and EMCers. Expect to see future data services added, such as File and Block: think ScaleIO as a ViPR DS! Chad Sakac recently made it very clear that much of our portfolio will run in the data plane as a software-only solution by the end of 2014. My guess is there will be tight integration with the ViPR Controller to manage these software-defined services.
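“One object store, several APIs” can be pictured as a thin protocol layer that maps each API’s request path onto a single internal key space. The sketch is hypothetical and does not describe how ViPR DS maps its APIs internally. It covers only two path conventions, S3 path-style `/{bucket}/{key}` and Swift’s `/v1/{account}/{container}/{object}`, and for simplicity it ignores the Swift account, Atmos, and all authentication.

```python
from typing import Tuple


def to_internal_key(api: str, path: str) -> Tuple[str, str]:
    """Map an API-specific object path to one hypothetical internal (bucket, key) pair."""
    path = path.lstrip("/")
    if api == "s3":
        # S3 path-style addressing: /{bucket}/{key}
        bucket, _, key = path.partition("/")
    elif api == "swift":
        # Swift: /v1/{account}/{container}/{object}; the account is ignored here.
        parts = path.split("/", 3)
        if len(parts) != 4 or parts[0] != "v1":
            raise ValueError(f"not a Swift object path: {path!r}")
        _, _account, bucket, key = parts
    else:
        raise ValueError(f"unsupported API: {api!r}")
    if not bucket or not key:
        raise ValueError(f"path does not name an object: {path!r}")
    return bucket, key
```

Both `to_internal_key("s3", "/photos/2014/cat.jpg")` and `to_internal_key("swift", "/v1/AUTH_dev/photos/2014/cat.jpg")` resolve to `("photos", "2014/cat.jpg")`. The point is that the data sits in one place and only the request format differs, which is different from fronting a separate Swift cluster.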
Access methods battle it out
A powerful option ViPR DS provides is the ability to access stored data simultaneously via Object or HDFS. Don’t battle over where or how data is stored or, worse, move massive amounts of the same data around for different purposes. Think of the duplication of data that occurs if an application writes objects or files but then you want to run analytics in your Hadoop cluster. Traditionally, this requires you to move the data from the object store or filesystem to the Hadoop cluster. Holy duplication of data! Instead, create the objects with the application via the REST API and then access that same data in place with the Hadoop cluster. The result is no duplicate copies and faster access to the data as it exists in real time.
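The in-place idea can be illustrated with a toy, single-process map/reduce that reads the same objects the application wrote instead of copying them into a separate cluster first. This is not Hadoop or HDFS. The dictionary stands in for the shared store, and the `<time> <status> <path>` log format is an assumption made for the example.

```python
from collections import Counter
from typing import Dict, Iterator, Tuple


def map_log(body: bytes) -> Iterator[Tuple[str, int]]:
    """Map step: emit (status_code, 1) per line of an assumed '<time> <status> <path>' log."""
    for line in body.decode("utf-8").splitlines():
        fields = line.split()
        if len(fields) >= 2:
            yield fields[1], 1


def status_counts_in_place(store: Dict[str, bytes], prefix: str) -> Counter:
    """Reduce step: total the counts across every object under `prefix`, read where it lives."""
    totals: Counter = Counter()
    for key, body in store.items():
        if key.startswith(prefix):
            for status, one in map_log(body):
                totals[status] += one
    return totals


# Objects written earlier by the application through its object API (assumed contents).
store = {
    "logs/web1.log": b"2014-04-01T10:00 200 /index\n2014-04-01T10:01 404 /missing\n",
    "logs/web2.log": b"2014-04-01T10:02 200 /cart\n",
    "images/cat.jpg": b"\xff\xd8",
}
```

`status_counts_in_place(store, "logs/")` produces `Counter({"200": 2, "404": 1})` without creating a second copy of the logs. The image object is never read because it falls outside the prefix.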
Briefly, here are a few observations from my interactions with customers.
- Developers are increasingly demanding object rather than block or file storage. DevOps shops would rather work with REST APIs than deal with LUNs or file and folder structures in their code.
- Many customers are pursuing OpenStack Swift as a means to leverage commodity hardware instead of proprietary storage systems, even if object is not the data type they need. This leads to interesting, maybe bizarre, solutions such as placing a file gateway in front of the object store. Now you are back to the same limitations as before. Pay attention to EMC World, as you may hear “Commodity” mentioned.
- Many paid products are sprouting up to assist with, and provide enterprise support for, the free Open Source options. SwiftStack and Inktank are two examples. This goes along with my belief that while free Open Source projects may be a way to get to cheaper commodity platforms, the effort to roll them out and then maintain them is often overlooked. Value-added software options like ViPR can deliver these capabilities to the enterprise faster, more easily, and in a supportable package.
- Use cases include storing large amounts of unstructured data such as log files, images, and web content.
What are you seeing out there?