HIPI: A Hadoop Image Processing Interface for Image-based MapReduce Tasks
Chris Sweeney

Liu Liu

Sean Arietta

Jason Lawrence

University of Virginia





[Figure 1 diagram labels: images n-k....n, Hipi Image, Map 1 ... Map i, Reduce 1 ... Reduce j]

Figure 1: A typical MapReduce pipeline using our Hadoop Image Processing Interface with n images, i map nodes, and j reduce nodes.



The number of images being uploaded to the internet is rapidly increasing, with Facebook users uploading over 2.5 billion new photos every month [Facebook 2010]. However, applications that make use of this data are severely lacking. Current computer vision applications use a small …
Many image processing and computer vision algorithms are applicable to large-scale data tasks. It is often desirable to run these algorithms on large data sets (e.g. larger than 1 TB), a scale at which they are currently limited by the computational power of one computer [Guo. . . 2005]. These tasks are typically performed on a distributed system by dividing the task across one or more of the following dimensions: algorithm parameters, images, or pixels [White et al. 2010]. Dividing a task along one of these dimensions exposes a high degree of parallelism, and the work can often be perfectly parallel. Face detection and landmark classification are examples of such algorithms [Li and Crandall. . . 2009; Liu et al. 2009]. The ability to parallelize such tasks allows for scalable, efficient execution of resource-intensive applications. The MapReduce framework provides a platform for such applications.
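To make the idea of dividing a task across images concrete, the following is a minimal single-process sketch of the map, shuffle, and reduce steps. It is an illustration only and uses neither Hadoop nor the HIPI API: the image data, the brightness threshold of 128, and the names `map_image`, `shuffle`, `reduce_group`, and `run_job` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical stand-in for an image set: file name -> rows of grayscale pixels (0-255).
IMAGES = {
    "a.png": [[0, 50], [100, 250]],
    "b.png": [[10, 10], [10, 10]],
    "c.png": [[200, 220], [240, 255]],
}


def map_image(name, pixels):
    """Map step: consume one image and emit one (key, value) pair.

    Each call depends only on its own image, so calls could run on
    different nodes without any coordination.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    key = "bright" if mean >= 128 else "dark"  # assumed threshold
    return key, (name, mean)


def shuffle(pairs):
    """Group mapped values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce step: summarize every value that shares a key."""
    names = sorted(name for name, _ in values)
    avg = sum(mean for _, mean in values) / len(values)
    return key, {"count": len(values), "images": names, "avg_mean": avg}


def run_job(images):
    """Run the three steps sequentially; a real cluster would distribute them."""
    mapped = [map_image(name, px) for name, px in images.items()]
    grouped = shuffle(mapped)
    return dict(reduce_group(k, v) for k, v in grouped.items())


result = run_job(IMAGES)
```

Because `map_image` reads nothing but its own input, the list comprehension in `run_job` could be replaced by any parallel executor without changing the result, which is the property that makes per-image tasks a natural fit for MapReduce.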

Keywords: mapreduce, computer vision, image processing


Basic vision applications that use Hadoop's MapReduce framework involve a steep learning curve and overwhelming complexity [White et al. 2010]. The overhead required to implement such applications severely hampers the progress of researchers [White et al. 2010; Li and Crandall. . . 2009]. HIPI hides the highly technical details of Hadoop's system and gives users the familiar feel of an image library together with access to the advanced resources of a distributed system [Dean and Ghemawat 2008; Apache …
