
Hadoop - The Google File System For The Rest Of Us

posted Oct 13, 2009, 6:08 PM by Matt Lankford   [ updated Mar 23, 2010, 10:12 AM ]



Hadoop is an open-source project, created by Doug Cutting and sponsored by the Apache Software Foundation, that lets you build a massive storage and processing cluster for just the cost of the computers and/or running them.

Amazon Web Services even has a service where you can use theirs, paying only for what you use.

The inspiration for Hadoop came from the way Google built its own systems (the Google File System and MapReduce). Google designed them so that nodes can be added without the expense of premium hardware, and without worrying about a machine going down here or there.

Hadoop takes a little while to get your mind around because it is very different from what most people are used to.

Basically, Hadoop is the thing you use when your computing problems are bigger than your computer. It lets you store vast amounts of data and process it far faster than a single machine could, by spreading the work across many machines.

The big problem for most developers is that you have to think about the problems you are attempting to solve differently...

The best way I can describe it would be something like this: take a stack of papers and hand each person in a group a single sheet. Have everyone count the words on their page at the same time, then add up the totals. The whole job now takes only about as long as the slowest person needs to count one page.
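The paper-counting picture can be sketched in a few lines of plain Python. This is not Hadoop code and does not use any Hadoop API; it is just a toy illustration of the same idea on one machine. The names count_words, merge_counts and word_count, and the sample pages, are made up for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(page):
    """The 'map' step: one person tallies the words on one page."""
    return Counter(page.lower().split())


def merge_counts(partial_counts):
    """The 'reduce' step: add everyone's tallies together."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def word_count(pages, workers=4):
    """Hand each page to a worker, then combine the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, pages))
    return merge_counts(partial_counts)


# Hypothetical sample data: each string stands in for one sheet of paper.
pages = [
    "the quick brown fox",
    "the lazy dog",
    "the fox jumps over the dog",
]
totals = word_count(pages)
print(totals["the"], totals["fox"], totals["dog"])
```

On a real cluster the "people" are separate machines and the "pages" are chunks of very large files, but the shape of the solution is the same: split the work, count in parallel, then combine.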

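Hadoop can also help a little with security, because its file system splits each file into pieces (Hadoop calls them blocks, though you can think of them as "shards") and spreads them across machines, so the data sitting on any one machine is not as easy for the casual passerby to use... I would not count on this, but it is a nice little side benefit...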

If you have problems that are bigger than you think a single computer can handle... check it out...