To tackle the issues of parallelization, fault tolerance, and data distribution, the authors adopted the map and reduce primitives from functional programming. A functional model with user-specified map and reduce operations makes it easy to parallelize large computations and to use re-execution as the primary mechanism for fault tolerance.
The programming model accepts a set of input key/value pairs and produces a set of output key/value pairs.
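The model above can be sketched in a few lines. This is a minimal single-machine illustration, not the distributed library itself; the function names (`map_fn`, `reduce_fn`, `run_mapreduce`) are hypothetical, and the word-count example follows the canonical one from the paper:

```python
from collections import defaultdict

def map_fn(key, value):
    """User-specified map: emit (word, 1) for each word in a document."""
    for word in value.split():
        yield word, 1

def reduce_fn(key, values):
    """User-specified reduce: sum all counts emitted for one word."""
    return sum(values)

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Apply map to every input pair, group intermediate values by key
    (the 'shuffle'), then apply reduce to each group."""
    groups = defaultdict(list)
    for k, v in inputs:
        for ik, iv in map_fn(k, v):
            groups[ik].append(iv)
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}

print(run_mapreduce([("doc1", "a b a")], map_fn, reduce_fn))
```

Because map and reduce are pure functions over key/value pairs, the library is free to partition the input, run the maps in parallel, and re-execute any of them after a failure without changing the result.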
To conserve network bandwidth, the scheduler tries to run each map task on a machine that holds a replica of that task's input data; failing that, it attempts to schedule the task near a replica. When running large MapReduce operations on a significant fraction of the workers in a cluster, most input data is read locally and consumes no network bandwidth.
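The locality preference can be sketched as a three-tier choice: a worker holding a replica, then a worker near a replica (here approximated by rack membership), then any idle worker. This is an illustrative sketch, not the library's scheduler; the worker and rack names and the `topology` mapping are made up for the example:

```python
def pick_worker(task_replicas, idle_workers, topology):
    """Pick a worker for a map task, preferring data locality.

    task_replicas: set of workers holding a replica of the task's input split
    idle_workers:  list of currently idle workers, in scheduling order
    topology:      dict mapping worker -> rack (hypothetical layout)
    """
    # Tier 1: a worker that already holds the input data (no network I/O).
    for w in idle_workers:
        if w in task_replicas:
            return w, "local"
    # Tier 2: a worker on the same rack as some replica.
    replica_racks = {topology[r] for r in task_replicas}
    for w in idle_workers:
        if topology[w] in replica_racks:
            return w, "rack-local"
    # Tier 3: fall back to any idle worker, paying full network cost.
    return idle_workers[0], "remote"

topology = {"w1": "r1", "w2": "r1", "w3": "r2", "w4": "r2"}
print(pick_worker({"w4"}, ["w1", "w2", "w3"], topology))
```

With enough replicas and enough idle workers, most tasks land in the first tier, which is why large runs read almost all input locally.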
Task granularity is controlled by subdividing the map phase into M pieces and the reduce phase into R pieces. Ideally, M and R should be much larger than the number of worker machines: having each worker perform many different tasks improves dynamic load balancing and also speeds up recovery when a worker fails, because the many map tasks it has completed can be spread out across all the other worker machines.
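The two knobs can be made concrete with a small sketch: the input is cut into M map splits, and each intermediate key is routed to one of R reduce tasks. The round-robin split and the helper names are illustrative assumptions; the `hash(key) mod R` rule is the default partitioning function described in the paper:

```python
def split_input(records, M):
    """Divide the input into M map splits (round-robin here for brevity;
    the real library splits by byte ranges of the input files)."""
    splits = [[] for _ in range(M)]
    for i, rec in enumerate(records):
        splits[i % M].append(rec)
    return splits

def partition(key, R):
    """Default partitioning: hash(key) mod R decides which of the
    R reduce tasks receives this intermediate key."""
    return hash(key) % R

splits = split_input(list(range(10)), 3)
print(len(splits), partition("some-word", 5))
```

With M and R much larger than the worker count, a failed worker's completed map splits can be redistributed in small pieces rather than as one large chunk.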
Backup tasks address a related problem. As mentioned earlier, when a worker becomes unreachable the MapReduce library re-executes its tasks on other workers, which increases the total running time. A milder but common case is the straggler: a machine that takes an unusually long time to complete one of the last few map or reduce tasks in the computation, delaying the whole operation. Stragglers can arise for a whole host of reasons, such as a bad disk that slows reads, contention for CPU, memory, or network bandwidth, or machine-configuration bugs. When an operation is close to completion, the library therefore schedules backup executions of the remaining in-progress tasks; a task is marked as completed whenever either the primary or the backup execution finishes.
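The backup-task idea reduces to racing two executions of the same (idempotent) task and taking whichever finishes first. This is a minimal single-process sketch using threads, with sleep delays standing in for a straggling primary worker; the function names and delay values are made up for illustration:

```python
import concurrent.futures
import time

def task(delay):
    """Stand-in for a map or reduce task; `delay` simulates machine speed."""
    time.sleep(delay)
    return delay

def run_with_backup(task, primary_delay, backup_delay):
    """Launch a backup execution alongside the primary and return the
    result of whichever copy finishes first. Safe because the task is
    deterministic and idempotent, so either result is acceptable."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(task, primary_delay),
                   pool.submit(task, backup_delay)]
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        return next(iter(done)).result()

print(run_with_backup(task, 0.5, 0.05))
```

The mechanism costs a few percent of extra compute but, as the paper reports, can significantly shorten the tail of large operations.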