Before looking at what HDFS is, I would first like to explain what a distributed file system is.
What is a Distributed File System?
Every physical machine has a limited amount of storage. When there is more data than a single machine can hold, we need more than one machine, in other words a network of machines, so that the data can be split among machines that are connected to each other. A file system that manages storage across such a network of machines is known as a distributed file system.
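To make the idea concrete, here is a minimal sketch of spreading data that is too big for any one machine across several machines. It is only a toy: the function names and the capacities are made up for illustration, and a real distributed file system does far more (fault tolerance, metadata, networking).

```python
def spread_across_machines(data: bytes, capacities: list[int]) -> list[bytes]:
    """Fill each machine up to its capacity and spill the rest onto the next one."""
    if len(data) > sum(capacities):
        raise ValueError("not enough total capacity for this data")
    pieces, offset = [], 0
    for capacity in capacities:
        # Slicing past the end of the data simply yields a shorter (or empty) piece.
        pieces.append(data[offset:offset + capacity])
        offset += capacity
    return pieces


def read_back(pieces: list[bytes]) -> bytes:
    """Reassemble the original data by reading the machines in order."""
    return b"".join(pieces)


data = b"0123456789" * 3  # 30 bytes, more than any single machine below can hold
pieces = spread_across_machines(data, capacities=[12, 12, 12])
print([len(p) for p in pieces])  # [12, 12, 6]
```

No single "machine" holds the whole data, yet `read_back(pieces)` returns it unchanged.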
What is HDFS – Hadoop Distributed File System?
Hadoop has its own distributed file system, known as HDFS (renamed from NDFS, the Nutch Distributed File System).
- Hadoop doesn't require expensive hardware to store data; it is designed to run on common, easily available hardware.
- It is designed to store very large files. Indexing the whole web, for example, may require storing files that run to terabytes, petabytes or even more. Hadoop clusters are used to perform this kind of task.
- It is designed for streaming data access: data is typically written once and then read many times.
Hadoop file systems
HDFS is only one of several file systems that Hadoop can work with:
1) Local: A file system for locally attached disks.
2) HDFS: The Hadoop Distributed File System, explained above.
3) HFTP: Provides read-only access to HDFS over HTTP.
4) HSFTP: Like HFTP, but provides read-only access to HDFS over HTTPS.
5) HAR (Hadoop Archives): Used for archiving files.
6) WebHDFS: Provides read and write access to HDFS over HTTP.
7) KFS: A cloud storage system (CloudStore, formerly the Kosmos file system) similar to GFS and HDFS.
8) Distributed RAID: Like HAR, it is also used for archival storage.
9) S3: A file system backed by Amazon S3.
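Hadoop picks the file system from the scheme at the start of a URI. The sketch below only illustrates that lookup in plain Python; it does not use Hadoop at all. The scheme names are the ones I recall from the Hadoop documentation of that era, so verify them against your version. The host names and paths are hypothetical. Distributed RAID is left out because, as far as I know, it is layered on top of HDFS rather than having a scheme of its own.

```python
from urllib.parse import urlparse

# URI scheme -> file system from the list above (scheme names are assumptions to verify).
FILE_SYSTEMS = {
    "file": "Local",
    "hdfs": "HDFS",
    "hftp": "HFTP",
    "hsftp": "HSFTP",
    "har": "HAR",
    "webhdfs": "WebHDFS",
    "kfs": "KFS",
    "s3": "S3",
}


def file_system_for(uri: str) -> str:
    """Return the name of the file system that the URI's scheme points to."""
    scheme = urlparse(uri).scheme
    return FILE_SYSTEMS.get(scheme, "unknown")


# Hypothetical hosts and paths, for illustration only.
print(file_system_for("hdfs://namenode.example.com/user/data.txt"))     # HDFS
print(file_system_for("webhdfs://namenode.example.com/user/data.txt"))  # WebHDFS
print(file_system_for("file:///tmp/data.txt"))                          # Local
```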
HDFS Cluster Nodes
An HDFS cluster has two types of nodes: a namenode and datanodes.
The namenode keeps track of the datanodes and of the file system metadata, which it stores in the form of a tree. Without the namenode, the whole system of storing and retrieving data would not work, because the namenode is responsible for knowing which data is stored where.
Datanodes are used to store the data itself in the form of blocks. They store and retrieve data blocks after communicating with the namenode.
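The division of work between the two node types can be sketched with a toy in-memory model. This is not Hadoop code: the class names, methods, block size and node names are all invented for illustration. It also simplifies heavily, keeping a flat dictionary instead of a tree and only one copy of each block.

```python
class DataNode:
    """Stores the actual data, one block at a time."""

    def __init__(self):
        self.blocks = {}  # block id -> bytes

    def store(self, block_id, payload):
        self.blocks[block_id] = payload

    def fetch(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Stores no file data, only the record of which block lives on which datanode."""

    def __init__(self, datanode_names, block_size):
        self.datanode_names = list(datanode_names)
        self.block_size = block_size
        self.namespace = {}  # file path -> ordered list of (block id, datanode name)
        self._next_id = 0

    def allocate(self, path, size):
        placement = []
        for _ in range(0, size, self.block_size):
            # Toy placement rule: hand blocks to the datanodes in rotation.
            node = self.datanode_names[self._next_id % len(self.datanode_names)]
            placement.append((self._next_id, node))
            self._next_id += 1
        self.namespace[path] = placement
        return placement

    def locate(self, path):
        return self.namespace[path]


def write_file(namenode, datanodes, path, data):
    # Ask the namenode where the blocks go, then send the bytes to the datanodes.
    for i, (block_id, node) in enumerate(namenode.allocate(path, len(data))):
        start = i * namenode.block_size
        datanodes[node].store(block_id, data[start:start + namenode.block_size])


def read_file(namenode, datanodes, path):
    # Ask the namenode where the blocks are, then fetch them from the datanodes.
    return b"".join(datanodes[node].fetch(bid) for bid, node in namenode.locate(path))


datanodes = {name: DataNode() for name in ("dn1", "dn2", "dn3")}
namenode = NameNode(datanodes, block_size=4)
write_file(namenode, datanodes, "/logs/a.txt", b"hello hdfs!")
print(namenode.locate("/logs/a.txt"))  # [(0, 'dn1'), (1, 'dn2'), (2, 'dn3')]
print(read_file(namenode, datanodes, "/logs/a.txt"))  # b'hello hdfs!'
```

If you delete `namenode.namespace`, the blocks are still sitting on the datanodes, but nothing knows how to put them back together into a file. That is the sense in which the system cannot work without the namenode.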