
International Journal of Research and Scientific Innovation (IJRSI) | Volume V, Issue III, March 2018 | ISSN 2321–2705

An Effective Big Data Role of Data Storage and Job Tracking Analysis through the HDFS

S. Janardhan1, Rajeshkumar P.2, B. Madhu Sudhan Reddy3


1Student, Master of Computer Applications, SKIIMS, Srikalahasti, Andhra Pradesh, India
2Research Scholar, Computer Science, Bharathiar University, Coimbatore, India
3Asst. Professor, Master of Computer Applications, SKIIMS, Srikalahasti, India

Abstract: Big data inspires new ways to transform processes, organizations, entire industries, and even society itself, yet extensive media coverage makes it hard to distinguish hype from reality. Big data refers to collections of massive and complex data sets that encompass huge quantities of data, social media analytics, data management capabilities, and real-time data. It includes e-mail messages, photographs, business transactions, surveillance video recordings, social media posts, mobile phone GPS signals, and readings from RFID readers, microphones, cameras, sensors, and activity logs. In short, it is data that exceeds the processing capacity of conventional database systems. Frameworks such as Hadoop process this data using the MapReduce programming paradigm. Whatever the label, organizations are starting to understand and explore how to process and analyze a vast array of information in new ways. This paper explains the HDFS architecture, high-volume data storage, and query processing.

Keywords: Hadoop, HDFS, NameNode, Secondary NameNode, JobTracker, DataNode, TaskTracker

I. INTRODUCTION

Hadoop is an ecosystem of open source components that fundamentally changes the way enterprises store, process, and analyze data. Unlike traditional systems, Hadoop enables multiple types of analytic workloads to run on the same data, at the same time, at massive scale on industry-standard hardware. CDH, Cloudera's open source platform, is the most popular distribution of Hadoop and related projects in the world (with support available via a Cloudera Enterprise subscription). The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems; however, the differences are significant. HDFS[1] is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high-throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project.
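
To illustrate how an application interacts with HDFS, the following minimal sketch uses the standard Hadoop FileSystem Java API to write and read a small file. The NameNode address and the file path shown here are placeholders (assumptions for illustration) and would be taken from the cluster's core-site.xml in practice.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        // Point the client at the NameNode; this address is an assumption and
        // should match fs.defaultFS in the target cluster's core-site.xml.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode-host:9000");

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/demo/sample.txt"); // hypothetical path

        // Write: the client obtains block placement from the NameNode and
        // streams the bytes to DataNodes, which replicate them.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }

        // Read back: the NameNode supplies the metadata, the DataNodes the data.
        try (FSDataInputStream in = fs.open(file);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            System.out.println(reader.readLine());
        }

        fs.close();
    }
}
```

This streaming read/write pattern reflects the design goal noted above: HDFS favors high sustained throughput for large files over low-latency random access.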