Ejaz Urval - 667116837
Jobert Correia - 668724826
Niharika Dhawad - 661340878
Rutuja Desai - 669292858
Table of Contents
Introduction to MongoDB
Features of MongoDB
Drawbacks of MongoDB
SQL vs MongoDB
Sharding and Replication
Implementation
Retrieval of live tweets
Conclusions
References
Introduction to MongoDB
Traditional relational databases proved insufficient when companies like Facebook, Google, and Amazon.com needed to store large amounts of unstructured data. Databases for storing and retrieving such unstructured data have existed since the late 1960s, but they gained popularity …
It allows developers to focus on programming rather than scaling, since the design already includes automated scaling and management of hardware and software infrastructure. MongoDB is built entirely on the concept of collections of documents, where each document is a set of key-value pairs. It supports APIs in various programming languages, including JavaScript, Python, Ruby, Java, C++, and Perl.
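The idea of a collection of documents made of key-value pairs can be sketched in plain Java. This is a minimal illustration only: the class `DocumentCollectionSketch`, its methods, and the sample field names are hypothetical, and it deliberately uses the standard library rather than the MongoDB Java driver.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a "collection of documents"; plain Java, not the MongoDB driver API.
class DocumentCollectionSketch {
    // A collection is modelled as an ordered list of documents.
    private final List<Map<String, Object>> documents = new ArrayList<>();

    // Insert a document; no schema is checked, so any keys and value types are accepted.
    void insert(Map<String, Object> document) {
        documents.add(new LinkedHashMap<>(document));
    }

    // Return every document whose value for `key` equals `value` (value must be non-null).
    List<Map<String, Object>> find(String key, Object value) {
        List<Map<String, Object>> matches = new ArrayList<>();
        for (Map<String, Object> doc : documents) {
            if (value.equals(doc.get(key))) {
                matches.add(doc);
            }
        }
        return matches;
    }

    int size() {
        return documents.size();
    }

    public static void main(String[] args) {
        DocumentCollectionSketch tweets = new DocumentCollectionSketch();

        Map<String, Object> first = new LinkedHashMap<>();
        first.put("_id", 1);          // integer id
        first.put("user", "alice");
        first.put("text", "hello");

        Map<String, Object> second = new LinkedHashMap<>();
        second.put("_id", "t-2");     // string id in the same collection
        second.put("user", "bob");
        second.put("retweets", 7);    // a field the first document does not have

        tweets.insert(first);
        tweets.insert(second);
        System.out.println(tweets.find("user", "bob"));
    }
}
```

The two documents share a collection even though their keys and value types differ, which is the property the text describes.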
Features of MongoDB
MongoDB has the following features, which suit NoSQL workloads and explain why it is among the most widely used systems for NoSQL database management.
Schema-less: This is a major reason companies adopted NoSQL: without a fixed schema, unstructured data can be stored efficiently. The data can be stored in any structure, and a few key-value pairs may be empty or have different data types, all in the same document. Thus, a document with an integer id can have another key-value pair of character type. This makes it easier to update, delete, and otherwise change the data.
Sharding and Replication: This allows MongoDB to store the data and balance its load across multiple servers, called shards, and to replicate them to prevent loss of …
Shards (S): Shards are the servers across which the data is divided and stored (S_11, …, S_42).
Config servers (C): Each sharded cluster must have its own config servers. These servers keep track of where the data is located on the shards.
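The role of the config servers, keeping track of which shard holds which data, can be sketched as a lookup table from shard-key ranges to shards. This is a simplified model under stated assumptions: the class `ChunkMapSketch`, the integer shard key, and the shard names are hypothetical, and the code does not reflect how MongoDB actually stores this metadata.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of config-server metadata: maps shard-key ranges to shard names.
// Plain Java for illustration; not MongoDB's real metadata format or API.
class ChunkMapSketch {
    // Key: inclusive lower bound of a range of shard-key values; value: owning shard.
    private final TreeMap<Integer, String> chunkLowerBounds = new TreeMap<>();

    // Register a range that starts at lowerBoundInclusive and runs to the next bound.
    void addChunk(int lowerBoundInclusive, String shardName) {
        chunkLowerBounds.put(lowerBoundInclusive, shardName);
    }

    // Find the shard whose range covers the given shard-key value.
    String shardFor(int shardKey) {
        Map.Entry<Integer, String> chunk = chunkLowerBounds.floorEntry(shardKey);
        if (chunk == null) {
            throw new IllegalArgumentException("No chunk covers key " + shardKey);
        }
        return chunk.getValue();
    }

    public static void main(String[] args) {
        ChunkMapSketch config = new ChunkMapSketch();
        config.addChunk(0, "shard-1");      // keys 0..999
        config.addChunk(1000, "shard-2");   // keys 1000..1999
        config.addChunk(2000, "shard-3");   // keys 2000 and above

        System.out.println(config.shardFor(1500));
    }
}
```

A sorted map keeps the lookup logarithmic in the number of ranges; a hash of the shard key could be used instead of the raw value when an even spread matters more than range queries.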
Replication (2): Replication is the process of keeping the same data on multiple servers. It is done to increase data availability and avoid loss of data. MongoDB uses replica sets for this purpose; the members of a replica set host the same data. Two types of nodes are present: the primary (master) node and the secondary (slave) nodes. The primary receives all write operations, while read operations can be served by both the primary and the secondaries, depending on the read preference the client has configured.
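The division of work between primary and secondary nodes can be illustrated with a small in-memory model. This is a sketch, not the project's code: the class `ReplicaSetSketch` and its methods are hypothetical, and it copies writes to the secondaries immediately, whereas real replication happens asynchronously.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy replica set: node 0 is the primary, all other nodes are secondaries.
// Plain Java for illustration; not the MongoDB driver API.
class ReplicaSetSketch {
    private final List<Map<String, String>> nodes = new ArrayList<>();

    ReplicaSetSketch(int nodeCount) {
        if (nodeCount < 1) {
            throw new IllegalArgumentException("A replica set needs at least one node");
        }
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // Writes are applied on the primary and then copied to every secondary.
    // Simplification: the copy is synchronous here.
    void write(String key, String value) {
        nodes.get(0).put(key, value);
        for (int i = 1; i < nodes.size(); i++) {
            nodes.get(i).put(key, value);
        }
    }

    // Reads may be served by any node, primary or secondary.
    String readFrom(int nodeIndex, String key) {
        return nodes.get(nodeIndex).get(key);
    }

    // Simulate one node losing its data; the other nodes still hold a copy.
    void wipe(int nodeIndex) {
        nodes.get(nodeIndex).clear();
    }

    public static void main(String[] args) {
        ReplicaSetSketch replicaSet = new ReplicaSetSketch(3);
        replicaSet.write("tweet:1", "hello");
        replicaSet.wipe(0);                                   // primary loses its data
        System.out.println(replicaSet.readFrom(1, "tweet:1")); // a secondary still has it
    }
}
```

Because every node holds a full copy, losing one node does not lose the data, which is the availability benefit described above.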
Implementation
MongoDB has a paid cloud-based service called Atlas (about 50 cents an hour). The entire project was done on one system. Sharding was implemented on different partitions of the same system, as it was difficult to sync servers across 32-bit and 64-bit systems running different operating systems. The entire project was carried out in the Eclipse IDE and written in Java. The Twitter4J library and MongoDB Java driver were