Hadoop Ecosystem in a Nutshell

Let's look at the Hadoop ecosystem at a very high level and see how these technologies work together. Along the way, let's work out what all the cryptic names in the ecosystem really mean and what each piece is for.
Major Components
Let's briefly touch on each of these technologies. We can group them into three major areas:
- Core Hadoop Ecosystem
- External Data Storage
- Query Engines
Core Hadoop Ecosystem
The pink-colored components are part of Hadoop itself. Everything else is an add-on project that has come out over time and integrates with Hadoop so we can solve specific problems.

HDFS
HDFS stands for the Hadoop Distributed File System. It is the system that lets us distribute the storage of big data across our cluster of computers, so that all of the hard drives in the cluster look like one giant file system. It also maintains redundant copies of the data. If one of the servers happens to crash, it can actually recover from that and it will back itself up to…
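To make the idea of spreading data across machines with redundant copies a bit more concrete, here is a toy Python sketch. It is not how HDFS is implemented: the function names, the node names, the tiny block size, and the round-robin placement are all assumptions made up for illustration (real HDFS uses much larger blocks and a rack-aware placement policy). It only shows why a file stays readable when one machine goes down.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign every block to `replication` distinct nodes, round-robin.

    Hypothetical placement for illustration only, not the HDFS policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_blocks(placement: dict[int, list[str]], failed: set[str]) -> set[int]:
    """Return the blocks that still have at least one copy on a live node."""
    return {
        block_id
        for block_id, holders in placement.items()
        if any(node not in failed for node in holders)
    }


# Illustrative values: a 1000-byte "file", 256-byte blocks, 4 nodes, 3 copies each.
data = b"x" * 1000
nodes = ["node1", "node2", "node3", "node4"]
blocks = split_into_blocks(data, block_size=256)
placement = place_replicas(len(blocks), nodes, replication=3)

# Simulate one server crashing and check which blocks can still be read.
survivors = readable_blocks(placement, failed={"node2"})
print(f"{len(survivors)} of {len(blocks)} blocks still readable")
```

Because every block lives on three different nodes in this sketch, losing a single node never removes the last copy of any block, which is the intuition behind the redundancy described above.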