25 Cards in this Set

  • Front
  • Back
Big Data
It is defined by the six Vs:
- Volume
- Variety
- Velocity
- Veracity
- Variability
- Value
Architecture for Big Data Solutions
What slows BD initiatives
- Limitations of your current platform's processing power.
- Data sources do not comply with the data storage schema
- Reporting needs the freshest data
- You want to work with a schema-on-demand data storage paradigm because of the variety of data types involved.
- The data is arriving so fast at your organization’s doorstep that your traditional analytics platform cannot handle it.
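The schema-on-demand idea above can be sketched roughly as follows: records are stored raw, and a schema is applied only when they are read. This is a minimal illustration, not a description of any particular platform; `RAW_LINES`, `read_with_schema`, and the field names are hypothetical.

```python
import json

# Hypothetical raw records from different sources; none follows a fixed schema.
RAW_LINES = [
    '{"user": "a", "amount": "12.5", "channel": "web"}',
    '{"user": "b", "amount": 7}',
    '{"user": "c", "clicks": 3, "channel": "mobile"}',
]

def read_with_schema(lines, schema):
    """Apply a schema at read time: keep requested fields, cast them, default to None."""
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        yield row

# Two different views over the same stored data, each with its own schema.
sales_view = list(read_with_schema(RAW_LINES, {"user": str, "amount": float}))
channel_view = list(read_with_schema(RAW_LINES, {"user": str, "channel": str}))
```

The stored data never changes; only the schema passed at read time does, which is why a variety of data types can share one store.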
Critical Success Factors for BD Analytics
- A clear business need
- Strong, committed sponsorship
- Alignment between the business and IT strategy
- A fact-based decision-making culture
- A strong data infrastructure
- The right analytics tools
- Right people with right skills
BD Analytics Enablers
- In-memory analytics: Storing and processing the complete data set in RAM
- In-database analytics: Placing analytic procedures close to where data is stored
- Massively parallel processing (MPP): Use of many machines and processors in parallel
- Appliances: Combining hardware, software, and storage in a single unit for performance and scalability
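As a small illustration of the in-database idea, the sketch below computes the same totals two ways: by pulling every row to the client, and by letting the database do the aggregation. It uses Python's built-in `sqlite3` module as a stand-in for a real analytic database; the `sales` table and its rows are made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for an analytic data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 30.0)],
)

def totals_client_side(conn):
    """Move every row to the client, then aggregate in application code."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals

def totals_in_database(conn):
    """Run the aggregation where the data lives; only the summary rows move."""
    rows = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    return dict(rows)
```

Both functions return the same result; the difference is how much data has to leave the database to get it.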
Challenges of BD Analytics
- Data volume: The ability to capture, store, and process the huge volume of data in a timely manner
- Data integration: The ability to combine data quickly and at reasonable cost
- Processing capabilities: The ability to process the data quickly, as it is captured (i.e., stream analytics)
- Data governance (… security, privacy, access)
- Skill availability (… data scientist)
- Solution cost (ROI)
Business Problems Addressed by BD Analytics
- Business Process efficiency and cost reduction
- Brand management
- Revenue maximization, cross-selling/up-selling
- Enhanced customer experience
- Churn identification, customer recruiting
- Improved customer service
- Identifying new products and market opportunities
- Risk management (especially banks and education)
- Regulatory compliance
- Enhanced security capabilities
MapReduce - Definition
MapReduce distributes the processing of very large multi-structured data files across a large cluster of ordinary machines/processors
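A single-machine sketch of the MapReduce pattern, using word counting as the usual teaching example. It only shows the three logical phases (map, shuffle, reduce); a real cluster would run the map and reduce calls on many machines in parallel. The function names are illustrative and do not come from any MapReduce library.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)

def word_count(documents):
    # On a cluster, each document (or file split) would be mapped on a different node.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

For example, `word_count(["big data", "big cluster"])` returns `{"big": 2, "data": 1, "cluster": 1}`.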
MapReduce - Goal
Achieving high performance with “simple” computers usually running Windows or Linux.
MapReduce - Background
Developed and popularized by Google

Good at processing and analyzing large volumes of multi-structured data in a timely manner
MapReduce - Sample Application
Indexing the Web for search, graph analysis, text analysis, machine learning
Hadoop - Definition
Hadoop is an open source framework for storing and analyzing massive amounts of distributed, unstructured data
Hadoop - Background
Originally created by Doug Cutting at Yahoo!

Hadoop is now part of the Apache Software Foundation


Hadoop - How it works
Hadoop clusters run on inexpensive commodity hardware so projects can scale-out inexpensively
How Does Hadoop Work?
- Access unstructured & semi-structured data
- Break data into “parts,” load them into a file system made up of multiple nodes running on commodity hardware using HDFS
- Each “part” is replicated multiple times & loaded into the file system for replication & failsafe processing
- One node acts as the Facilitator & another as the Job Tracker
- Jobs are distributed to the clients, & once complete, the results are collected using MapReduce
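The "break into parts and replicate" steps above can be sketched as follows. This is a toy model under stated assumptions: the round-robin placement is a simplification chosen for the example (HDFS itself uses a rack-aware policy), and `split_into_parts`, `place_replicas`, and the node names are hypothetical.

```python
def split_into_parts(data, part_size):
    """Break the data into fixed-size parts; the last part may be shorter."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]

def place_replicas(num_parts, nodes, replication=3):
    """Assign each part to `replication` distinct nodes (simple round-robin)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for part_id in range(num_parts):
        placement[part_id] = [
            nodes[(part_id + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement

parts = split_into_parts("abcdefg", 3)
placement = place_replicas(len(parts), ["n1", "n2", "n3", "n4"], replication=2)
```

Because every part lives on more than one node, losing a single node does not lose any part, which is the failsafe property the card describes.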
Demystifying Hadoop
- Is about data diversity, not just data volume
- Hadoop complements a DW; it’s rarely a replacement
- Hadoop enables many types of analytics, not just Web analytics
Demystifying HDFS, Hive, and MapReduce
- HDFS is a file system, not a DBMS
- Hive resembles SQL but is not standard SQL
- MapReduce provides control for analytics, not the analytics itself
Data Scientist Skill
BD and DW
Big Data and RDBMS do not go nicely together

Will Hadoop replace DW/RDBMS?
Use Cases for Hadoop
Hadoop as the repository and refinery
Hadoop as the active archive
Use Cases for Data Warehousing
Data warehouse performance
Integrating data that provides business value
Interactive BI tools
When to Use Hadoop vs. DW
Big Data & Stream Analytics - Definition
Analytic process of extracting actionable information from continuously flowing/streaming data
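A minimal sketch of the idea, assuming a stream of numeric readings: each value is analyzed as it arrives, only a small window is kept in memory, and an alert is emitted when the rolling average crosses a threshold. `rolling_alerts`, the window size, and the threshold are illustrative choices, not part of any streaming product.

```python
from collections import deque

def rolling_alerts(stream, window=3, threshold=100.0):
    """Yield (reading, rolling_average) whenever a full window averages above threshold."""
    recent = deque(maxlen=window)  # only the last `window` readings are retained
    for reading in stream:
        recent.append(reading)
        average = sum(recent) / len(recent)
        if len(recent) == window and average > threshold:
            yield reading, average

alerts = list(rolling_alerts([90, 95, 100, 120, 130], window=3, threshold=100.0))
```

The stream is never stored in full; the function works equally well on a generator that produces readings indefinitely.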
Why Stream Analytics
It may not be feasible to store the data, or the data may lose its value before it can be used
Stream Analytics Applications
- e-Commerce
- Telecommunication
- Law Enforcement and Cyber Security
- Power Industry
- Financial Services
- Health Services
- Government