Get Your Dockerized ECS on in AWS

EMC launched ECS 2.0 at DockerCon 2015 and a great big bonus was included – you can download a Community Edition for FREE.  The dev team was an early adopter of containerization, so this software-defined object storage platform runs as, you guessed it, a Docker container.  Let’s deploy it in AWS.

This post will walk you through the steps to run the free, community-supported, non-production version of ECS in AWS as a Docker container on a CentOS instance.  You can start at section 2 to deploy the container to a bare-metal or vSphere CentOS machine.  If you do, be sure to add a disk of at least 100GB to be used for ECS persistent storage.

Updated 7/13/15: Note – the installation scripts at the EMCECS GitHub page receive regular updates.  Please review the documentation there to ensure you are using the latest commands.

Overview of Steps:

  1. Launch and Configure a CentOS Instance in AWS

  2. Execute ECS Deployment Script

  3. Configure ECS with Script

Launch and Configure a CentOS Instance in AWS

  1. Log in to the AWS Console and navigate to EC2.
  2. Click Launch Instance.  Excited yet?
  3. Choose AWS Marketplace, enter CentOS, and click Select.
  4. Updated 6/29: Thanks @mcowger for pointing out a lower-cost instance option. Select the r3.xlarge instance ($0.35/hr) instead of the m4.2xlarge I originally listed, as it meets the minimum requirements of 30GB of RAM and 4 CPUs.  Then, click Next.  The cloud costs money.  This could add up if you leave it running!  NOTE: ECS can run with less than 30GB of RAM with some tweaks not covered here.
  5. Select or create a new VPC.  Click Next.
  6. Click Add New Volume, enter a size of at least 100GB, and select Magnetic for the volume type.  Again, this surely is not free, captain cloud builder.
  7. Click Next on the following screen.
  8. Configure the security group to allow the ports necessary to access ECS.  I recommend restricting access to specific IP addresses, since the initial deployment will have a well-known default password.
  9. No need to boot from SSD unless you want to pay for it.
  10. OMG Launch it already!  Ain’t the cloudz so simple?  SHAMELESS PLUG ALERT: you too can have a simple on-prem cloud > EMC Federation Hybrid Cloud gets you there quickly.  Tell them I sent you.  If only I got referral payments…
  11. Create a new key pair unless you already have existing keys to use.  Do not lose this or give it to your neighbor; they will hack your ECS instance.
  12. Launch Status.  Houston, we have a new instance in the cloud.  Time to make it rain ECS Swift or S3 objects.
  13. Select your new instance and copy its public address.  You will need this to navigate to your fabulous new cloud instance.  Tell the CIO you are using the cloud.  RUN!
  14. Open Terminal on your Mac (PC users: go get a Mac or an SSH client) and execute:
    ssh -i <yourkey>.pem centos@<youraddressfromabove>
  15. Type yes when prompted to accept the host key.
  16. Install updates.  Yummy ->
    sudo yum update
  17. Install required bits ->
    sudo yum install git tar wget
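
Before kicking off the deployment, it is worth a quick sanity check that the instance really has the resources ECS expects. This is just a sketch using standard Linux tools; on EC2 the extra volume usually shows up as xvdb, but verify on your own instance:

    # Roughly 30GB of RAM and 4 CPUs, per the instance sizing above
    free -g
    nproc

    # The 100GB data volume should appear as an unmounted block device (often xvdb)
    lsblk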

Execute ECS Deployment Script

You can perform these steps on any Docker-capable machine with enough CPU and RAM.

  1. Clone the ECS repository from github:
    git clone https://github.com/EMCECS/ECS-CommunityEdition

  2. cd ECS-CommunityEdition/ecs-single-node/

  3. Get the device name for the volume you added and the NIC name.
    sudo fdisk -l

    then

    ifconfig -a

    Also note the IP address of eth0 for later.

  4. Run the configuration script, substituting the values from the previous commands.  Here is what it looks like on my super CentOS machine in the cloud.
    sudo python step1_ecs_singlenode_install.py --disks xvdb --ethadapter eth0 --hostname <yourhostname>
  5. If you failed to mess up these steps, you will eventually see a prompt instructing you to navigate to the web interface. See what I did there?
    1. SUCCESS!!  You just deployed a cloud-scale object platform with geo-distribution capabilities.  Nice.
    2. Put it on your resume.
  6. You can verify the container is running with a
    sudo docker ps

    command.

  7. You can now navigate to the web interface but be sure to continue on to the next section to configure the object settings.


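Before jumping to the browser, you can confirm the web interface is actually answering straight from the instance. A minimal sketch that assumes the ECS portal listens on HTTPS port 443; adjust the port if your deployment differs:

    # Poll the ECS web UI until it responds (port 443 assumed here)
    until curl -k -s -o /dev/null https://localhost; do
      echo "Waiting for the ECS portal to come up..."
      sleep 30
    done
    echo "ECS portal is responding"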

Configure ECS with Script

  1. This section is quick.  One script.  You can do it.  The main thing you need to be sure to change is the ECSNodes flag; use your NIC IP from above.  The other options can be customized if you want.
    sudo python step2_object_provisioning.py --ECSNodes=10.0.1.10 --Namespace=ns1 --ObjectVArray=ova1 --ObjectVPool=ovp1 --UserName=emccode --DataStoreName=ds1 --VDCName=vdc1 --MethodName
  2. Let the script complete.  Notice it is using the ECS REST API to configure the virtual data center, virtual array, and more.
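
Once the provisioning script finishes, a quick smoke test of the object head does not hurt before you start pointing clients at it. A minimal sketch; 9020 is assumed here as the default unencrypted S3 port, so check your deployment's port list if it does not answer:

    # Poke the S3-compatible endpoint on the ECS node (port 9020 assumed)
    curl -i http://10.0.1.10:9020/
    # Any HTTP response, even an authentication error, means the object service is listening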

Now that you have a fully functional Elastic Cloud Storage platform deployed, go push all of your objects up there.  You can use it for HDFS too.  Check out the ECS 2.0 Product Documentation Index to really geek out.  Also check out the handiwork of the EMC {code} team.

Deploying ViPR Services Docker Container on AWS

Update 7/2/15: The EMC Software Defined Solutions team launched ECS 2.0 at DockerCon and announced the availability of a free, community-supported version of ECS at emc.com/getECS.  This essentially replaced the project I refer to in this post.  Head over to my post that walks you through deploying ECS 2.0 in AWS or directly on CentOS.

EMC is all-in on Open Source, code sharing, and APIs to cater to developers.  The @EMCCode team has many fantastic projects in the works.  Go check them out at http://emccode.github.io.  Some of them, like the project I am writing about here, will be public very soon, but for now it is available internally to EMC only.  Consequently, some details are not available here until the final public release.  This post walks you through everything needed to get set up with an instance running Docker in AWS and then deploy the <secret_codename> Docker container that functions as a standalone ViPR Services instance.  You can use this to test S3, Swift, and Atmos object functionality.  Note that some ViPR Services capabilities are not available in this standalone instance.

If you have not played with Docker, this is a good introduction and will get you using your first container.

Deploy Docker in AWS

  1. Log in to AWS and open the EC2 instance launch wizard.
  2. Select the Amazon Linux AMI or your preferred distribution.
  3. Choose the instance type.  You need 12-16GB of RAM for the instance, so go with the r3.large with 2 CPUs and 15GB of RAM.
  4. Go with the defaults on the Configure Instance Details page unless you have specific networking requirements.
  5. Review and click Launch. We will configure the security group later.
  6. Create a new key pair or use an existing one.  This is what you will use to connect to the new instance.  NOTE: I recommend setting up billing alerts at this point to prevent you from getting a surprise bill the size of your teenage daughter’s texting usage charges.
  7. Launch the instance.
  8. Browse to your instance(s), select the one you created, and click connect for instructions for connecting.
  9. SSH into your instance
    ssh -i key.pem ec2-user@<ip_address>
  10. Install Docker
    sudo yum install -y docker ; sudo service docker start
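
With Docker installed and started, a quick sanity check of the daemon is worthwhile before pulling any images. The usermod line is optional; it just lets ec2-user run docker without sudo after you log out and back in:

    # Confirm the daemon is up and can pull and run images
    sudo docker info
    sudo docker run hello-world

    # Optional: allow ec2-user to run docker commands without sudo
    sudo usermod -aG docker ec2-user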

Deploy <Secret_Codename>

Now that Docker is deployed in your AWS linux instance you are ready to deploy the <secret_codename> Docker container.

  1. Download the container
    sudo docker pull emccode/<secret_codename>
  2. Next, run the container and bind the ports from the container to the host using “-p”
    docker run -tid -p 10101:10101 -p 10301:10301 -p 10501:10501 emccode/<secret_codename>:latest
  3. The container ID is displayed; replace <id> with that value.  This connects to the persistent bash session.  Use <ctrl-p> then <ctrl-q> to leave this session but keep the container running.
     docker attach <id>
  4. You just unleashed the ViPR.
  5. Open the above ports.
    1. Browse to your AWS instance and select the ViPR instance.
    2. Click on the Security Group name below to view the settings.
    3. Click Edit.
    4. Click Add Rule and then specify the ports we used earlier: 10101, 10301, 10501.  Configure the Source to a specific IP, range, or Anywhere.
    5. Save
  6. Get your secret key
    cat /StandaloneDeploymentOutput
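
At this point the container should be publishing the three mapped ports on the host. A quick sketch to confirm from the instance itself; 10101 is the same port s3cmd will proxy through in the next section:

    # The PORTS column should show 10101, 10301, and 10501 mapped to the host
    sudo docker ps

    # Confirm something is actually listening on those ports
    sudo netstat -tlnp | grep -E '10101|10301|10501'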

Client Access

The below s3cmd examples are copied from https://github.com/emccode/<secret_codename>/blob/master/clients.md

  1. Install s3cmd
  2. Configure s3cmd
    • s3cmd --configure
  3. Configure s3cmd to use:
    • Access key: wuser1@sanity.local
    • Secret key: YourSecretKeyHere
    • Encryption password:
    • Path to GPG program:
    • Use HTTPS protocol: no (in simulator version)
  4. Add HTTP Proxy:
    • HTTP Proxy server name: <ip_of_vipr>
    • HTTP Proxy server port: 10101
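
If you would rather edit the configuration directly than re-run the wizard, the same settings live in ~/.s3cfg. A rough sketch of the relevant lines, using the example values above:

    access_key = wuser1@sanity.local
    secret_key = YourSecretKeyHere
    use_https = False
    proxy_host = <ip_of_vipr>
    proxy_port = 10101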

List buckets

  • s3cmd ls

Create bucket

  • s3cmd mb s3://new-bucket

Copy file to bucket

  • s3cmd put README.md s3://new-bucket

Copy file from bucket

  • s3cmd get s3://new-bucket/README.md test.md

Command help

  • s3cmd --help

15 minutes to Storage as a Service

In most organizations, IT is quickly coming to realize that they must become an efficient service provider that is easy and fast to transact with or their workloads will rapidly move to a public cloud provider.  Now before you start throwing stones, I am part of the crowd that strongly believes there are things that should and will not ever move to a public cloud service.  Key intellectual property is a good example.  That is another post….

So you want to build out a service catalog?  Where do you start?  How about a full-blown, heterogeneous storage service catalog in 15 minutes that automates storage management?  That sounds like an excellent start!  Watch the video below as I install ViPR Controller as a vSphere vApp, configure ViPR, discover physical assets, create virtual assets, and finally order storage from the service catalog – all in 15 minutes.  Then visit a recent post where I will be regularly updating links to valuable resources for ViPR and ViPR SRM.  I intentionally did not edit this video so you can see that even with the wait time involved with deploying the OVF, booting ViPR, waiting for services, and so on, you can accomplish all of this easily and quickly.  An edited version with voice-over is coming soon.

I used virtual Isilon nodes, which you can download here or by going to support.emc.com, navigating to Support by Product, searching for Isilon OneFS, selecting Tools, and grabbing the virtual nodes.  I will post a walkthrough of how to deploy them and configure OneFS to use ViPR so you can play with both in a lab.  Note, this is for lab testing only.  Get ViPR for free with no time-bomb at emc.com/getvipr.

Software Defined Objects

I get a kick out of all of the ___-as-a-Service acronyms that the industry has invented.  We now have another common phrase – Software Defined <Data Center/Network/Storage>.  These are software-based solutions that abstract and pool heterogeneous hardware resources and then layer on intelligent software to provide an easy-to-consume service, typically with rich REST APIs for programmatic access.  Hence my creation, Software Defined Objects, to label just one of the capabilities of ViPR.  But first we must cover some basics about objects.  We will see object storage in more and more enterprises as they transition to mobile, web, and big data applications.  Why is object gaining in popularity?  Why would someone choose object over a filesystem?

First, a definition of what makes an object.  I am borrowing Chuck Hollis’ because I cannot state it any simpler.  You can find a very useful analogy on his post. An object looks like this:

  • an arbitrary amount of bits (data, code, whatever)
  • a unique identifier
  • an arbitrary amount of metadata to describe it to different parties
  • and some sort of access method to invoke or use it

Access methods are typically REST API based.  Common examples out there are OpenStack Swift, Amazon S3, EMC Atmos, and Centera.
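
To make that last bullet concrete, here is roughly what retrieving an object by its identifier looks like over a REST-style API. The endpoint and object ID are made up for illustration, and real S3, Swift, or Atmos calls also carry authentication headers, omitted here for brevity:

    # Fetch an object by its unique identifier over HTTP
    curl -O http://objects.example.com/rest/objects/4ee696e9a21b4f2c8d1e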

File, meet Object

If IT is fortunate, the DevOps team is requesting an object store for a new web application.  I say fortunate because often they are taking this object store out to a public cloud service like Amazon or rolling their own in a dreaded “Shadow IT” project.  I say embrace object and all of the next generation capabilities it has to offer.  Better yet, offer an object store to the dev team before they ask.  You are becoming the IT service broker after all. Here is why the developers want an object store as opposed to a filesystem.

Geo-Distribution

Objects know nothing about the location restrictions that filesystems must deal with on a daily basis.  While a filesystem is constrained by a single location, object stores can span distances allowing an object to be accessed anywhere and live in many locations globally.  Access anywhere with locality awareness that serves up the closest copy to the user based on their location is a huge advantage of object storage.  Imagine the listing on ebay.com with images that are made available to the potential buyer of an auction item who lives in Europe but the seller is in the US.  The buyer’s experience would be poor if they retrieved the images across the pond.  The user need not care about the location of the object but simply needs to request it from the namespace and let the object store do the heavy lifting of locating, retrieving, and serving the object from the nearest location if it is available.  Replication is core to the architecture of most object platforms for this reason and is not just a method of providing resiliency although that is a tremendous benefit.

Global Namespace

That last point requires the ability to have a global namespace across all locations.  http://www.emc.com/namespace/objectguid is an example.  No matter the geographic location of the object or application retrieving the object, it will still be found at the same namespace location.  Compare this to accessing a CIFS or NFS share that has a specific, single geographic location.

Meta-Data

Want to store key data about the objects/files you are storing?  Most file use cases out there today have some kind of relational database storing the metadata pointing back to the file.  This does not scale well when dealing with millions or billions of files.  Even worse are implementations I have witnessed where the files are actually embedded inside a database structure.  Unstructured data buried in a structured relational database.  Yuck!  Object stores combine the metadata with the object on the storage itself, allowing for seemingly endless scale.
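
For a feel of how that metadata travels with the object, here is a small illustration using the s3cmd client from the earlier posts. The bucket, file, and header names are made up; any S3-compatible endpoint accepts user metadata as x-amz-meta-* headers on the object itself:

    # Store an image with descriptive metadata attached directly to the object
    s3cmd put scan-0042.jpg s3://radiology \
      --add-header="x-amz-meta-patient-id: 12345" \
      --add-header="x-amz-meta-modality: xray"

    # Depending on your s3cmd version, the metadata shows up in the object info
    s3cmd info s3://radiology/scan-0042.jpg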

Scalability

There are scale-limiting factors in filesystems – the overall size restrictions of a file system and the number of files or directories.  Because objects are stored in a flat structure rather than in directories, there is no need to store directory structure data, which is often what causes filesystems to struggle at scale.

ViPR Object

ViPR Data Services (DS) layers various data structures on top of existing enterprise storage infrastructure – today VNX, Isilon, and NetApp NFS filesystems – and soon commodity hardware.  With this software-only solution that runs as a scale-out cluster of VMs, IT organizations now have a simple-to-deploy and easy-to-manage way to provide Object and Hadoop as a service to their development teams.  The supported Object APIs in ViPR Data Services are Atmos, OpenStack Swift, and Amazon S3.  Keep in mind, ViPR DS is not managing a standalone instance of OpenStack Swift but instead IS the object store itself.  Data traverses through ViPR DS.  This is a common misunderstanding when talking with customers and EMCers.  Expect to see future data services added such as File and Block – think ScaleIO as a ViPR DS!  Chad Sakac recently made it very clear that much of our portfolio will run in the data plane as a software-only solution by the end of 2014.  My guess is there will be tight integration with ViPR Controller to manage these software-defined services.

Access methods battle it out

A powerful option ViPR DS provides is the ability to simultaneously access stored data via Object or HDFS, as sketched below.  Don’t battle over where or how data is stored or, worse, move massive amounts of the same data around for different purposes.  Think of the duplication of data that occurs if an application writes objects or files but then you want to run analytics in your Hadoop cluster.  Traditionally, this requires you to move the data from the object store or filesystem to the Hadoop cluster.  Holy duplication of data!  Instead, create the objects with the application via REST API and then access that same data in place with the Hadoop cluster.  The result: no duplication of data and much faster access to the data as it exists in real time.
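
A rough sketch of that dual-access flow follows. The bucket, namespace, and viprfs:// URI are purely illustrative; the exact Hadoop URI scheme and configuration for your cluster come from the ViPR HDFS client documentation:

    # The application writes an object through the S3 API...
    s3cmd put clickstream-2014-05-01.log s3://weblogs

    # ...and the Hadoop cluster reads the same data in place, no copy required.
    # The viprfs:// path below is illustrative only; consult the ViPR HDFS docs
    # for the URI format used in your deployment.
    hadoop fs -ls viprfs://weblogs.ns1.vdc1/
    hadoop fs -cat viprfs://weblogs.ns1.vdc1/clickstream-2014-05-01.log | head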

Real World

Briefly, here are a couple observations from my interactions with customers.

  • Developers are more frequently demanding object rather than block or file storage.  DevOps shops love REST APIs rather than needing to deal with LUNs or file/folder structures in their code.
  • Many customers are pursuing OpenStack Swift as a means to leverage commodity hardware instead of proprietary storage systems even if object is not the data type they need.  This leads to interesting, maybe bizarre, solutions such as placing a file gateway in front of the object store.  Now you are back to the same limitations as before.  Pay attention to EMC World as you may hear “Commodity” mentioned.
  • Many pay-for products are sprouting up to assist with and provide enterprise support for the free Open Source options.  SwiftStack and Inktank are a couple of examples.  This goes along with my belief that while free Open Source projects may be a way to get to cheaper commodity platforms, the effort to roll these out and then maintain them is often overlooked.  Value-added software options like ViPR can deliver these capabilities to the enterprise faster, easier, and in a supportable package.
  • Use cases include storing large amounts of unstructured data such as log files, images, and web content.

What are you seeing out there?

New venture at EMC for 2011

Today marked the beginning of a new venture as I logged my first day with EMC.  Thrilling, mentally tiring, and massive are words that describe my experience on the first day.  Thrilling as any new job should be, mentally tiring due to extreme information overload, and massive describes the new ecosystem I now belong to.

Here’s to 2011!