MapReduce Rendezvous Hashing Based Virtual Hierarchies: The Cassandra NoSQL Case Study

The steady growth in data volume has led to the emergence of big data: immense datasets that need to be stored. Traditional relational databases struggle to meet the volume and heterogeneous-structure requirements of big data. NoSQL databases are designed around a different data management model that can store and process huge volumes of data. NoSQL systems provide horizontal scalability by supporting horizontal data partitioning across heterogeneous nodes. In this paper, a MapReduce Rendezvous Hashing Based Virtual Hierarchies (MR-RHVH) framework is proposed for scalable partitioning of Cassandra NoSQL databases. The MapReduce framework is used to implement MR-RHVH on Cassandra to enhance its scalability …
The MR-RHVH framework applies a distributed structure of nodes using MapReduce, which partitions data equally among the Cassandra nodes using mapper and reducer functions. The MR-RHVH framework consists of three main layers, as shown in figure 2, Cassandra/Hadoop Cluster, MRRHVH, Cassandra/Hadoop Data Center and Hadoop/MapReduce applier.
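The excerpt does not give the details of MR-RHVH, so the following is only a minimal sketch of plain rendezvous (highest-random-weight) hashing arranged as a map step and a reduce step. It omits the virtual hierarchies entirely, and the names `weight`, `owner`, `mapper`, `reducer`, the node labels, and the choice of SHA-256 are all assumptions made for illustration, not the authors' implementation.

```python
import hashlib
from collections import defaultdict


def weight(node, key):
    """Pseudo-random score for a (node, key) pair (hypothetical helper)."""
    digest = hashlib.sha256(f"{node}|{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner(key, nodes):
    """Rendezvous choice: the node with the highest score for this key wins."""
    return max(nodes, key=lambda node: weight(node, key))


def mapper(key, nodes):
    """Map step: emit an (owning node, row key) pair."""
    return owner(key, nodes), key


def reducer(pairs):
    """Reduce step: group the row keys by the node that owns them."""
    partitions = defaultdict(list)
    for node, key in pairs:
        partitions[node].append(key)
    return dict(partitions)


# Hypothetical cluster and row keys, used only to exercise the sketch.
nodes = ["node-a", "node-b", "node-c"]
keys = [f"row-{i}" for i in range(1000)]
partitions = reducer(mapper(k, nodes) for k in keys)
```

One property worth noting in this sketch: if a node is removed from `nodes`, only the keys that node owned are reassigned, because every other key still has the same highest-scoring node. That is the usual reason for choosing rendezvous hashing when nodes join or leave a cluster.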
Cassandra client nodes are distributed in this data center. Task Tracker and Data Node services run on each Cassandra client node in the data center. The Task Tracker accepts tasks from the Job Tracker and then retrieves the data it needs from the Data Node. The Data Node provides the Task Trackers with the required data, using HDFS in the MapReduce layer.
Cassandra/Hadoop Cluster:
The Cassandra master node resides in this layer. In the Cassandra/Hadoop Cluster, the Job Tracker service, running on the master node, coordinates job requests sent to and from the Task Trackers on the client nodes using MapReduce. The Name Node in the Cassandra master node keeps a list of all files in the data center and searches for the node that holds a given file or has the capacity to store one. The Name Node is considered a single point of failure (SPOF) in Hadoop/MapReduce: when the Name Node fails, the whole system goes down. A Converter Name Node (CNN) module is used to solve this SPOF in next
