Title Of Assignment
A. Problem statement feasibility assessment using satisfiability analysis.
B. Mathematical Modeling.
A. Problem Statement:
Big data is generated by various ubiquitous systems. Such data is difficult to handle on a single computer. To reduce computation time and the storage space required, the workload is distributed across two or more computers. MapReduce is a recent programming model for processing data in parallel. The performance of MapReduce depends on how evenly it distributes the workload across the machines, that is, without skew. The workload distribution depends on the algorithm that partitions the data, so the workload of each reducer needs to be determined. The major problem is how to partition data effectively on a distributed system. A technique is therefore needed that addresses the problems of data skew and memory consumption.
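To make the notion of skew concrete, the following minimal sketch counts how many records each reducer would receive under a given partition function and reports the ratio of the heaviest load to the mean load. The function names, the sample keys, and the first-character partitioner are illustrative assumptions, not part of Hadoop or of this project.

```python
from collections import Counter

def reducer_loads(keys, num_reducers, partition):
    """Count how many records each reducer receives under `partition`."""
    loads = Counter(partition(k, num_reducers) for k in keys)
    return [loads.get(r, 0) for r in range(num_reducers)]

def skew_ratio(loads):
    """Heaviest load divided by mean load; 1.0 means perfectly even."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0

def first_char_partition(key, num_reducers):
    # Hypothetical partitioner: route by the first character of the key.
    return ord(key[0]) % num_reducers

keys = ["apple", "avocado", "apricot", "banana", "cherry", "almond"]
loads = reducer_loads(keys, 3, first_char_partition)
print(loads, skew_ratio(loads))
```

In this toy input, four of the six keys start with the same letter, so one reducer receives twice the mean load while the other two are underused.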
Goals and Objectives
The goal of this project is to create Hadoop MapReduce …
TeraSort uses a two-level trie, a tree data structure for storing strings, to partition the data. In a two-level trie, only the first two characters of a string are considered during the partitioning phase. This reduces the quality of load balancing. It also degrades the performance of the MapReduce job, i.e., some nodes run slowly. This is one of the issues with the TeraSort method.
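The effect of looking at only two characters can be sketched as follows. This is a simplified stand-in for the trie lookup, not TeraSort's actual code: it picks cut points from the two-character prefixes of a sample and routes each key by binary search on its prefix. All function names and the sample keys are assumptions made for illustration.

```python
import bisect

def two_char_prefix(key):
    """Return the first two characters, padding short keys with NUL."""
    return (key + "\0\0")[:2]

def build_split_points(sample_keys, num_reducers):
    """Pick num_reducers - 1 cut points from the sorted sample prefixes."""
    prefixes = sorted(two_char_prefix(k) for k in sample_keys)
    step = len(prefixes) / num_reducers
    return [prefixes[int(step * i)] for i in range(1, num_reducers)]

def partition(key, split_points):
    """Route a key using its two-character prefix only."""
    return bisect.bisect_right(split_points, two_char_prefix(key))

sample = ["apple", "apply", "april", "apron", "banana", "cherry"]
split_points = build_split_points(sample, 3)
assigned = [partition(k, split_points) for k in sample]
print(split_points, assigned)
```

Because every key beginning with "ap" has the same prefix, all four land on one reducer however the cut points are chosen, which is the kind of imbalance described above.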
Xtrie:
The problem with TeraSort can be overcome by using the Xtrie method. TeraSort uses only a two-level trie for partitioning data, which degrades the performance of MapReduce. In the Xtrie method, a counter value is maintained for each word in the trie. Using these counters, the partitioner can distribute the total number of keys evenly among the reducers.
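One way to use such counters is sketched below, under the assumption that a counter is kept per trie prefix and that prefixes are handed to reducers in sorted order until each reducer has roughly its share of the keys. The function names, the prefix depth, and the sample keys are hypothetical; the real Xtrie implementation may differ.

```python
from collections import Counter

def build_prefix_counts(sample_keys, depth=2):
    """One counter per trie prefix: how many sampled keys share it."""
    return Counter(k[:depth] for k in sample_keys)

def assign_prefixes(prefix_counts, num_reducers):
    """Walk prefixes in sorted order and move to the next reducer once
    the running total reaches that reducer's cumulative share."""
    total = sum(prefix_counts.values())
    target = total / num_reducers
    assignment, running, reducer = {}, 0, 0
    for prefix in sorted(prefix_counts):
        assignment[prefix] = reducer
        running += prefix_counts[prefix]
        if running >= target * (reducer + 1) and reducer < num_reducers - 1:
            reducer += 1
    return assignment

sample = ["apple", "apply", "banana", "band", "cherry", "chess"]
counts = build_prefix_counts(sample)
assignment = assign_prefixes(counts, 3)
print(assignment)
```

Walking the prefixes in sorted order keeps the reducers' key ranges ordered, which a sort job such as TeraSort relies on. A single very frequent prefix still cannot be split by this sketch.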
Etrie:
To reduce the memory space of the trie, the ReMap algorithm is used, which reduces the memory requirement by using Etrie.
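The text does not describe how ReMap works, so the following is only a sketch of one plausible reading: characters are remapped onto a smaller alphabet so that each trie level needs fewer slots. The reduced alphabet chosen here (digits and letters plus one catch-all slot) and all names are assumptions for illustration.

```python
import string

# Hypothetical reduced alphabet: slot 0 is a catch-all, then digits and letters.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
REMAP = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
FANOUT = len(ALPHABET) + 1  # slots per trie level after remapping

def remap(ch):
    """Map a character to its reduced code; anything unlisted goes to 0."""
    return REMAP.get(ch, 0)

def slot(key, depth=2):
    """Index of a key's prefix in a flat array covering `depth` trie levels."""
    index = 0
    for ch in (key + "\0" * depth)[:depth]:
        index = index * FANOUT + remap(ch)
    return index

full_slots = 256 ** 2        # two levels over all byte values
reduced_slots = FANOUT ** 2  # two levels over the reduced alphabet
print(full_slots, reduced_slots, slot("ab"))
```

Under these assumptions a two-level table shrinks from 65536 slots to 3969, and the memory saved could be spent on additional trie levels. The cost is that characters outside the reduced alphabet become indistinguishable from one another.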