(Apache) Hadoop
About
Apache Hadoop is basically 3 things:
Image comes from Hortonworks
Hadoop:
scales, runs on commodity hardware where you setup a Hadoop cluster (a group of machines connected together over a network)
is fault tolerant.
is good for “embarassingly parallel” problems, such as building an index for the web.
Steps:
setting up a Hadoop cluster and then store huge amounts of data spread across all the machines into the HDFS
processing the scattered data by writing MapReduce programs (or jobs) that goes locally on each machine in order to not relocate the data.
Single threaded writes.
Use cases
Building
The BUILDING.txt in the code repository has all information to build it (also for windows)
Proposition
Oracle: Big Data Appliance
In Oct 2011, Oracle announced the Big Data Appliance, which integrates:
Cloudera's Distribution Including Hadoop (CDH),
Oracle Enterprise Linux,
the R programming language,
and a NoSQL database
with the Exadata hardware.
Azure
Azure - HDInsight (Microsoft's Hadoop)
Documentation / Reference