As promised, I am back with another post on cloud computing. This time we will talk about storage in the cloud. Since I have worked on Amazon's cloud storage service, Simple Storage Service (Amazon S3), the figures and data in this post are true for Amazon S3 only, but the concepts hold for any provider. We will cover the following:

  1. What do we mean by Cloud Storage?
  2. What benefits does cloud storage provide?
  3. What disadvantages are there?
  4. Conclusion

What is Cloud Storage: Suppose you founded a small startup with a few gigabytes of data to store, and you set up your systems accordingly. At a later stage there are two possibilities:

  1. You did a very good job and your business grew manyfold. You would need new storage servers, more memory, and more money, all at once. Infrastructure and maintenance costs rise suddenly.
  2. You goofed up and have to shut down your business. What would you do with all the infrastructure you built at the start? You feel like you are losing money on all the hardware you bought during the good days.

The solution in both cases is to rent instead of buying everything. Maintenance and infrastructure automatically become the lender’s problem; let the best guy do it for you for a nominal fee. Cloud storage is simply storage space on demand. The best part is that cloud storage is elastic: you can use as much space as you want and pay only for the data you actually store. You need not worry about resizing your storage as your business expands or contracts.

Also, you are neither paying for extra hardware that you bought in optimistic anticipation nor worrying about having less hardware than you need.
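
To make the rent-versus-buy argument concrete, here is a rough sketch in Python. The prices and capacity figures are made up for illustration (they are not Amazon's rates), and the function names are my own. The point is only that with pay-per-use the bill follows the data you actually store, while bought capacity is paid for whether you fill it or not.

```python
# Hypothetical numbers, for illustration only (not real provider prices).
RENT_PER_GB_MONTH = 0.10      # assumed pay-per-use price, per GB per month
OWNED_CAPACITY_GB = 1000      # capacity bought upfront in optimistic anticipation
OWNED_COST_PER_GB = 1.50      # assumed one-time hardware cost per GB


def rented_cost(usage_gb_per_month):
    """Total bill when you pay only for the data stored each month."""
    return sum(gb * RENT_PER_GB_MONTH for gb in usage_gb_per_month)


def owned_cost():
    """Upfront bill for bought hardware, independent of actual usage."""
    return OWNED_CAPACITY_GB * OWNED_COST_PER_GB


# A business that stays small: 10 GB stored every month for a year.
slow_year = [10] * 12
print("rented:", round(rented_cost(slow_year), 2))
print("owned: ", owned_cost())
```

Under these assumed figures the small business pays far less by renting. Plugging in a fast-growing usage list shows the other side: the rented bill grows with the data, but there is never a single large purchase to make in advance.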

Benefits of cloud storage: In four words, I would say it's Bigger-Better-Faster-Cheaper…

Bigger because there is no limit on data; it is pure elastic storage.

Better because of the redundancy that cloud-computing platforms provide (they are up almost all the time). You can focus on your core business; infrastructure becomes someone else's problem.

Faster because it’s available on the fly and on demand: you provision via APIs, not phone calls. The fastest data I/O is from the cloud's own computing infrastructure (EC2 in the case of Amazon AWS).

Cheaper because there are no associated infrastructure and maintenance costs and a reduced need for capital: you focus on OpEx, not CapEx, and the barrier to entry is much lower.

Specific for Amazon S3:

  • Storage is organized in buckets

  1. Like a namespace for the objects it contains
  2. Accessible via http://bucketname.s3.amazonaws.com

  • It’s not file storage; it’s a key-value store

  1. Like a big hash table or dictionary
  2. Key-value pairs
  3. Accessible via http://bucketname.s3.amazonaws.com/keyname

  • Implicit BitTorrent seeding for all keys
  • 5GB limit for each key
  • Official API to operate on buckets in different languages and for different platforms.
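
The two address patterns in the list above can be sketched in a few lines of Python. This is a minimal illustration, not part of any official API: the function names, the bucket name, and the key are all made up, and percent-encoding the key is my own assumption about how to keep the URL valid.

```python
from urllib.parse import quote


def bucket_url(bucket):
    """Address of a bucket, following the pattern given above."""
    return "http://{}.s3.amazonaws.com".format(bucket)


def object_url(bucket, key):
    """Address of one key inside a bucket.

    Keys may contain characters that are not URL-safe, so the key is
    percent-encoded; "/" is kept because keys often look like paths.
    """
    return "{}/{}".format(bucket_url(bucket), quote(key, safe="/"))


# "my-bucket" and the key are hypothetical names for illustration.
print(bucket_url("my-bucket"))
print(object_url("my-bucket", "reports/2010 summary.pdf"))
```

Note that the "folder" in the key is only part of the key's name. The bucket is a flat dictionary of keys, which is why it behaves like a hash table rather than a file system.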

Also, you can specify access for individual keys: whether a particular object is public or private, or some other custom access.

You may choose the level of reliability you want: more reliability costs more, less reliability costs less.

Disadvantages of cloud storage: The biggest disadvantage is that if it’s down, you are down.

Another problem is that bandwidth cost can be very high. If you have very large public objects in S3 and someone gets hold of their URLs, he may bankrupt you by fetching the data continuously. So you should be very careful about access settings and how you distribute keys. Whenever possible, hide your keys. You also have to trust someone else with your critical data; it’s outside your firewall.
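
A quick back-of-the-envelope calculation shows how fast a public object can run up the bill. The transfer price below is an assumed figure for illustration, not Amazon's actual rate, and `download_bill` is just a helper name I picked.

```python
# Hypothetical price for illustration only; check your provider's current rates.
TRANSFER_PRICE_PER_GB = 0.15   # assumed cost of outbound data, per GB


def download_bill(object_size_gb, downloads):
    """Cost of serving one public object `downloads` times."""
    return object_size_gb * downloads * TRANSFER_PRICE_PER_GB


# A 2 GB public object fetched 10,000 times by a script running in a loop.
print(round(download_bill(2, 10_000), 2))
```

With these assumed numbers, 2 GB times 10,000 downloads is 20,000 GB of transfer, so a single leaked URL costs about 3,000 in whatever currency the price is quoted in.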

Specific to S3:

  • Changing one byte means reinserting the entire object
  • Renaming (re-keying) an object also means reinserting
  • New beta API allows for object moves (copy) within a bucket
  • Various bugs in third-party apps and S3 itself
  • Inserting objects between 2 and 4 GB can be difficult
  • Bandwidth can be a significant barrier
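
The first two points in this list follow from the key-value model: an object is stored and replaced as a whole. Here is a toy in-memory model in Python to show what that means in practice. `ToyBucket` and the helper functions are hypothetical stand-ins that I made up for this sketch; they are not the S3 API.

```python
class ToyBucket:
    """In-memory stand-in for a bucket: whole objects stored under keys."""

    def __init__(self):
        self._objects = {}
        self.bytes_uploaded = 0   # counts how much data we had to send

    def put(self, key, data):
        # There is no partial write: the full object is sent every time.
        self._objects[key] = bytes(data)
        self.bytes_uploaded += len(data)

    def get(self, key):
        return self._objects[key]

    def delete(self, key):
        del self._objects[key]


def change_one_byte(bucket, key, offset, new_byte):
    """Edit a single byte by downloading, patching, and re-uploading."""
    data = bytearray(bucket.get(key))
    data[offset] = new_byte
    bucket.put(key, data)          # the entire object is inserted again


def rename(bucket, old_key, new_key):
    """Re-key an object: insert it under the new key, then drop the old one."""
    bucket.put(new_key, bucket.get(old_key))
    bucket.delete(old_key)


bucket = ToyBucket()
bucket.put("log.txt", b"x" * 1000)
change_one_byte(bucket, "log.txt", 0, ord("y"))
rename(bucket, "log.txt", "log-old.txt")
print(bucket.bytes_uploaded)
```

In this toy model, a 1,000-byte object that is edited once and renamed once ends up being uploaded three times in full. For a multi-gigabyte object, that is exactly where the bandwidth barrier bites.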

Conclusion: Cloud storage is a great entry point for new entrants in the industry, with the minimum possible CapEx and no worries about maintenance and infrastructure. Used wisely, it can prove a boon, but in novice hands it may be a curse as well.

PS: The points under the different headings are true not only for storage but for cloud computing in general (leaving the S3-specific points aside).

This is not all; there are other aspects to consider, such as private, public, and hybrid platforms, and the distribution of data. I will cover these in upcoming posts.

