Hadoop Ecosystem in a Nutshell

Ishan Liyanage
8 min read · May 14, 2021

Let's discuss the Hadoop ecosystem at a very high level and see how its technologies work together. Along the way, let's work out what the cryptic names in the ecosystem actually mean and what each component is for.

Major Components

Let's briefly touch on each of these technologies. We can group them into three major areas:

  1. Core Hadoop Ecosystem
  2. External Data Storage
  3. Query Engines

Core Hadoop Ecosystem

The pink-colored components are part of Hadoop itself. Everything else is a sort of add-on project that has come out over time and integrates with Hadoop to solve specific problems.

HDFS

HDFS stands for the Hadoop Distributed File System. It is the system that lets us distribute the storage of big data across our cluster of computers, so that all of the hard drives in the cluster look like one giant file system. It also maintains redundant copies of the data. If one of the servers happens to crash, it can actually recover from that and it will back itself up to…
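To make the idea of distributed storage with redundant copies a bit more concrete, here is a minimal Python sketch. It is a toy model, not the HDFS API: the function names, the node names, the 4-byte block size and the round-robin placement are all assumptions made up for illustration. Real HDFS uses much larger blocks and a smarter placement policy, but the basic idea of splitting a file into blocks and keeping several copies of each block on different machines is the same.

```python
# Toy model of block storage with replication (illustrative only, not the HDFS API).

BLOCK_SIZE = 4   # bytes per block; tiny on purpose so the example is easy to follow
REPLICATION = 3  # number of copies kept of each block (assumed value for this sketch)


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin.

    Returns a dict mapping block index -> list of node names holding a copy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def readable_after_failure(placement, failed_nodes):
    """A file is still readable if every block has at least one surviving copy."""
    return all(
        any(node not in failed_nodes for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    data = b"hello big data world"                 # hypothetical file contents
    nodes = ["node1", "node2", "node3", "node4"]   # hypothetical cluster
    blocks = split_into_blocks(data)
    placement = place_replicas(len(blocks), nodes)
    for index, replicas in placement.items():
        print(index, blocks[index], replicas)
    print(readable_after_failure(placement, {"node1"}))
```

With three copies of every block, losing any one or two nodes in this toy cluster still leaves at least one copy of each block, which is why a single server crash does not lose data.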

Written by Ishan Liyanage

Passionate Technical Lead, Senior Software Developer and free and open source software advocate. Based in Singapore.
