At the beginning of this year, we at Slideshare held an event called ‘Hack Day’. The idea was to spend 24 hours building either an application that solves a real problem or a useful new feature for our product. Teams competed against each other, with at most two members per team. I teamed up with Mayank, and we decided to take on the challenge of real-time analytics.
Analytics is a huge area inside Slideshare, and we could not cover all of it in 24 hours. We decided to focus on one thing: slideshow views by country, in real time.
The next challenge was how to do this. Since we did not want to get stuck with maintenance hassles or scalability issues, we decided to use the cloud for data storage and processing. Even within the cloud there were a few options (I’m only talking about the AWS cloud):
- Amazon SimpleDB (SDB)
- Amazon Relational Database Service (RDS)
- Amazon Simple Storage Service (S3)
Since we didn’t want to just store data in the cloud but to store it efficiently, S3 was ruled out very early. We also didn’t need anything relational, so we ruled out RDS as well. That left SimpleDB. SimpleDB uses a NoSQL key-value structure for storing data. Using a NoSQL store as the persistence layer, and a key-value store in particular, gives a very efficient and scalable solution for logging in the cloud.
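To make the key-value idea concrete, here is a minimal sketch of how a slideshow view could be logged as a named item with free-form attributes and then counted by country. It is a plain in-memory stand-in, not the SimpleDB API and not the code we wrote on Hack Day; the `Domain` class, the item names, and the attribute names (`slideshow_id`, `country`, `referrer`) are all made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical in-memory stand-in for a key-value domain:
    item name -> dict of attributes (all values kept as strings)."""
    items: dict = field(default_factory=dict)

    def put(self, item_name, attributes):
        # Merge attributes into the item; no schema is declared anywhere.
        self.items.setdefault(item_name, {}).update(attributes)

    def views_by_country(self):
        # Each stored item is one view; group them by the 'country' attribute.
        return Counter(
            attrs.get("country", "unknown") for attrs in self.items.values()
        )


views = Domain()
views.put("view-0001", {"slideshow_id": "42", "country": "IN"})
views.put("view-0002", {"slideshow_id": "42", "country": "US"})
# A later event can carry an extra attribute without any schema change.
views.put("view-0003", {"slideshow_id": "7", "country": "IN", "referrer": "twitter"})

print(views.views_by_country().most_common())  # [('IN', 2), ('US', 1)]
```

The third item carries an attribute the first two lack, which is the kind of flexibility a key-value store gives you when logging events whose shape may change over time.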
With Amazon SimpleDB, you store and query data items via simple web services requests, and Amazon SimpleDB does the rest. In addition to handling infrastructure provisioning, software installation and maintenance, Amazon SimpleDB automatically indexes your data, creates geo-redundant replicas of the data to ensure high availability, and performs database tuning on customers’ behalf. Amazon SimpleDB also provides no-touch scaling. There is no need to anticipate and respond to changes in request load or database utilization; the service simply responds to traffic as it comes and goes, charging only for the resources consumed. Finally, Amazon SimpleDB doesn’t enforce a rigid schema for data. This gives customers flexibility – if their business changes, they can easily reflect these changes in Amazon SimpleDB without any schema updates or changes to the database code.
(The paragraph above is copied from the Amazon product page without any changes. I didn’t feel any were needed.)
In the subsequent parts, we will learn more about Amazon SimpleDB and look at a complete solution for analytics in the cloud.