

You can start your big data project tomorrow

Jose Hernandez
Posted 5/16/16

Deploying a data warehouse or big data solution requires having the infrastructure in place to support the implementation. That’s easier said than done. Recently I embarked on a data warehouse project with a local client. The project was greenfield and required a series of servers. We would spend the first few weeks working through requirements and design, giving their infrastructure team some runway. A few days into it, we learned that it would be two months before we would have our servers. Nothing kills project momentum like waiting for servers. There were many reasons given, and they were valid: selecting the appropriate hardware, the burn-in period, installing server software; you get the picture. Is this really necessary? IT is not the core business for this organization, and having and maintaining servers to support today’s technology solutions is complicated.

I started considering all the planning, work and capex required for an organization to maintain an on-premises data center. I came to the conclusion that it doesn’t make sense for many organizations to “own” the infrastructure. Then I thought about Fortune 500 companies; does it make sense for them? I can certainly make a case for larger companies. The expenditures for supporting their own infrastructure would be a small fraction of their overall expenses. Even so, how much infrastructure should they maintain on premises? If such an organization wanted to initiate a Big Data research project and needed a 1,000-node cluster, should they have enough infrastructure on hand to handle it? The research project has a lifespan of 3 months; what would they do with all the hardware afterward? How about explaining to someone that you need to spend $1.5M and need 4 months to buy and deploy a 1,000-node Hadoop cluster for a 3-month research project?
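To put rough numbers on that question, here is a minimal back-of-the-envelope sketch. Only the $1.5M purchase price, the 1,000 nodes, and the 3-month project length come from the scenario above; the hourly rate is a made-up placeholder, not a quoted price from any provider, and the function names are purely illustrative.

```python
# Back-of-the-envelope comparison of buying vs. renting a cluster.
# From the article: $1.5M purchase, 1,000 nodes, 3-month project.
# Everything else (notably the hourly rate) is an illustrative assumption.

HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def cloud_cost(nodes: int, months: float, rate_per_node_hour: float) -> float:
    """Pay-as-you-go cost: nodes x hours x hourly rate."""
    return nodes * months * HOURS_PER_MONTH * rate_per_node_hour


def break_even_rate(purchase_price: float, nodes: int, months: float) -> float:
    """Hourly per-node rate at which renting costs as much as buying."""
    return purchase_price / (nodes * months * HOURS_PER_MONTH)


if __name__ == "__main__":
    assumed_rate = 0.25  # ASSUMPTION: placeholder $/node-hour, not a real price
    rented = cloud_cost(1000, 3, assumed_rate)
    threshold = break_even_rate(1_500_000, 1000, 3)
    print(f"Rented for 3 months at ${assumed_rate}/node-hour: ${rented:,.0f}")
    print(f"Break-even rate vs. the $1.5M purchase: ${threshold:.3f}/node-hour")
```

The sketch deliberately ignores power, cooling, staff, and the 4-month procurement delay, all of which would tilt the comparison further away from buying hardware for a short-lived project.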

It doesn’t make sense, especially if you consider what is possible with cloud computing, SaaS and IaaS. We are well past being skeptical about our data being in the cloud. Consider Salesforce.com; hundreds of thousands of subscribers trust them with their client information (that’s more valuable than gold). And how many email accounts are there between Microsoft, Gmail, and Yahoo? That said, Microsoft Azure and AWS offer a very attractive alternative. Let them worry about infrastructure, pay for what you need when you need it, and spin up even huge environments in hours or days (not months).

Let’s go back to my stalled project. During the initial discovery meetings, we determined that we needed development, QA and production environments. Great: in the cloud, we could spin up the development environment that same day, and when there is code ready to test we could spin up the QA environment. During our requirements meetings we determined that a four-node cluster was needed for production. No problem, while QA is in progress we could spin up the production environment. Oops, after some load testing we determined that we needed twice the horsepower. No worries, spin up four more nodes. You may think I am oversimplifying it, but I’m not. Without getting into the weeds, spinning up servers in Azure and AWS is as easy as a few clicks. Compare that to ordering, installing, and configuring servers for your on-premises computer room!


Let me make my case a bit clearer.  Let’s look at key benefits of analytics in the cloud:

  1. Time to market - Whether that market is internal or external, you get there faster and start reaping rewards sooner.
  2. Zero effort for infrastructure management - Let Microsoft and Amazon worry about the hardware and software support issues: network bandwidth, load balancing, hardware upgrades, power, cooling, etc.
  3. Infrastructure best practices - Microsoft and Amazon have a responsibility to do things right, not just for their customers but for their own reputation. Your infrastructure is not your worry, so you can spend that energy on your core focus.
  4. No CapEx - Using Microsoft Azure or Amazon AWS means that the cost and maintenance of a very robust infrastructure and support team is shared among their many customers. You pay for what you use, only when you need it.
  5. Scalability - Quickly scale up and down without long-term commitments.

Given all these benefits, the choice is pretty clear. I will always strive to convince my customers to build their analytics solutions in the cloud. At least consider it; it’s the right thing to do.

Comments
christiana steves
Nice article, thanks for sharing the info on SAP BI (http://tekslate.com/sap-bi-training/)
Posted on 4/10/17 11:54 PM.
Bdata nam.R
When starting a Big Data analytics project, time is a very important issue. A project may take anywhere from a few weeks to many years, depending on many factors, such as understanding the requirements, choosing the right technology, the complexity of the analytics and many more. An important thing to understand is that a big data analytics solution should be a business decision, not an IT decision. Visit: http://aussieheadlines.com/technology/research-big-data-virtuous-circle-technical-progress/
Posted on 3/23/18 9:37 AM.
