
International Journal of Research and Scientific Innovation (IJRSI) | Volume V, Issue IV, April 2018 | ISSN 2321–2705

Result Analysis of Adaptive Replication Management Approach for Duplicate Data Management in Cloud Architecture

Vaishali Pandey*1, Varsha Sharma*2, Vivek Sharma*3


1, 2, 3 Department of SoIT, RGPV, Bhopal, India

Abstract – This paper proposes an approach that dynamically replicates data files based on predictive analysis. Using probability theory, the future utilization of each data file is predicted and a corresponding replication strategy is derived, so that popular files are replicated according to their own access potential. For the remaining low-potential files, an erasure code is applied to maintain reliability. As a result, the approach improves availability while preserving reliability compared with the default replication scheme. Furthermore, a complexity-reduction step is applied to keep the prediction effective when dealing with Big Data.

Keywords: Replication, HDFS, Proactive Prediction, Optimization, Bayesian Learning, Gaussian Process.
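
As a rough, self-contained illustration of the strategy summarized in the abstract, the sketch below predicts each file's access potential from its observed share of accesses, assigns extra replicas to high-potential files, and marks the rest for erasure coding. The file names, the 0.05 threshold, and the replica formula are hypothetical placeholders, not the algorithm developed in this paper.

```python
# Minimal sketch of the probability-driven replication decision summarized
# above; an assumption-laden illustration, not the paper's actual algorithm.
# The threshold and the replica formula are hypothetical.

from dataclasses import dataclass

@dataclass
class FileStats:
    name: str
    accesses: int  # accesses observed in the most recent window

def access_probability(f: FileStats, total_accesses: int) -> float:
    """Predict a file's access potential as its share of all observed accesses."""
    return f.accesses / total_accesses if total_accesses else 0.0

def plan_replication(files, threshold=0.05, min_replicas=3, max_replicas=6):
    """Replicate high-potential files; mark low-potential files for erasure coding."""
    total = sum(f.accesses for f in files)
    replicate, erasure_coded = {}, []
    for f in files:
        p = access_probability(f, total)
        if p >= threshold:
            # More popular files receive more replicas, capped at max_replicas.
            replicate[f.name] = min(max_replicas, min_replicas + round(p * 10))
        else:
            erasure_coded.append(f.name)
    return replicate, erasure_coded

# Example: a frequently read file gets extra replicas, while a cold file is
# protected by erasure coding instead of full replication.
hot, cold = plan_replication([FileStats("logs/part-000", 480),
                              FileStats("archive/2017.tar", 3)])
print(hot)   # {'logs/part-000': 6}
print(cold)  # ['archive/2017.tar']
```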

I. INTRODUCTION

Hadoop is an open-source implementation of MapReduce and includes a distributed file system (HDFS), where application data is stored with replication. Through replication, Hadoop provides a high degree of availability and fault tolerance. Hadoop is also gaining popularity rapidly and has proven to be scalable and of production quality at organizations such as Facebook, Amazon, and Last.fm. In HDFS, data is split into fixed-size blocks (e.g., 32 MB, 64 MB, or 128 MB), and the resulting blocks (chunks) are distributed and stored on multiple data nodes with replication. Hadoop divides each MapReduce job into a set of tasks according to the number of data blocks [2]. The Hadoop scheduler preferentially assigns a task to a node that stores the corresponding data block, but it may assign a task to a node that does not store the data, depending on the Hadoop scheduling policy.

Cloud computing is a new computing paradigm that is gaining increasing popularity. It allows enterprise and individual users to enjoy flexible, on-demand, high-quality services such as high-volume data storage and processing without the need to invest in expensive infrastructure, platforms, or maintenance [2].
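
As a concrete illustration of the block-splitting and data-local scheduling behavior described above, the following sketch derives the number of map tasks from a fixed 64 MB block size and prefers nodes that already hold a replica of the block. It is a simplified, assumption-based model, not Hadoop's actual scheduler code.

```python
# Illustrative sketch (not Hadoop source code) of two points above: a file
# split into fixed-size blocks yields one map task per block, and the
# scheduler prefers a node that already holds a replica of the block.
import math
import random

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, one of the typical HDFS chunk sizes

def num_map_tasks(file_size_bytes: int) -> int:
    """Hadoop creates one map task per data block of the input file."""
    return math.ceil(file_size_bytes / BLOCK_SIZE)

def assign_task(block_replica_nodes, free_nodes):
    """Prefer a data-local node; otherwise fall back to any free node,
    in which case the data block must be transferred over the network."""
    local = [n for n in free_nodes if n in block_replica_nodes]
    return random.choice(local) if local else random.choice(free_nodes)

# A 1 GB input file becomes 16 map tasks; a block whose replicas live on
# nodes 2 and 5 is preferentially scheduled on one of those nodes.
print(num_map_tasks(1 * 1024**3))                       # -> 16
print(assign_task({2, 5}, free_nodes=[1, 2, 3, 4, 5]))  # -> 2 or 5
```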