Mastering Big Data Analytics

4.67 (411 Ratings)



About this course

Today, we’re surrounded by data. People upload videos, take pictures on their cell phones, text friends, update their Facebook status, leave comments around the web, click on ads, and so forth. Machines, too, are generating and keeping more and more data. To process such large datasets, there is a need for specialized tools.

This course covers two important frameworks, Hadoop and Spark, which provide some of the most widely used tools for large-scale big data tasks. The first module starts with an introduction to big data and then moves on to big data ecosystem tools and technologies such as HDFS, YARN, MapReduce, and Hive.

In the second module, the course introduces Spark and then dives into Scala and Spark concepts such as RDDs, transformations, actions, persistence, and deploying Spark applications. The course also covers Spark Streaming and Kafka, as well as data formats such as JSON, XML, Avro, Parquet, and Protocol Buffers.

Skills covered

  • MapReduce
  • HDFS
  • YARN
  • Hive
  • Apache Hadoop
  • Spark and advanced Spark
  • PySpark
  • Kafka
  • Spark Streaming
  • Spark SQL
  • Spark MLlib

Course Syllabus

Hadoop: Master your Big Data

  • Big data touch
  • Getting started: Hadoop
  • Hadoop framework: Stepping into Hadoop
  • HDFS: What and Why?
  • Working on HDFS
  • Hadoop 2.x - YARN
  • MapReduce: A programming paradigm
  • A closer look at MapReduce
  • A practical approach to MapReduce
  • Hadoop 1.x vs Hadoop 2.x
  • Hadoop 3.x

Hive: Big Data SQL

  • Apache Hive: Teasing the honey bee
  • Hive illustration: Basics
  • Hive illustration: External tables in Hive
  • Hive illustration: Loading different file formats
  • Hive illustration: Loading data into Hive tables
  • Hive illustration: Simple operations on Hive tables
  • Hive illustration: Query operations on Hive tables
  • Hive illustration: Querying complex structures
  • Hive illustration: Views

Spark: Stream and analyze big data

  • Getting started - Spark basics
  • Spark and Hadoop - Face to face
  • Spark - Architecture
  • RDDs - Building blocks of Spark
  • RDDs continued
  • Spark terminologies
  • PySpark - Getting hands dirty
  • Spark - MLlib
  • PySpark - Clustering
  • Music data - Study the case - 01
  • Music data - Study the case - 02
  • Music data - Study the case - 03
  • Spark Streaming and real-time data analytics
  • Spark Streaming architecture
  • Real-time data analysis on Twitter data: Demo
  • Case study - Ad tech - 01
  • Case study - Ad tech - 02

Apache Kafka - A distributed streaming platform

  • Kafka - What and where?
  • Kafka - Key components: Broker, Producer
  • Kafka - Key components: Topics, Partitions
  • Kafka - Key components: Consumer, Replicas
  • Kafka - APIs and clusters
  • More fun with Kafka
  • ZooKeeper - Basic principles
  • Live Kafka demo with Twitter

Advanced Spark

  • Configuring Spark
  • Spark properties
  • Performance tuning
  • Data serialization
  • Memory tuning
  • Garbage collection
  • Memory usage and levels of parallelism
  • Data locality and broadcasting
  • Job scheduling
  • Modes in cluster management
  • Dynamic resource allocation
  • Decommissioning of executors
  • Application scheduling


Yellow Taxi trip analysis using Hive

The NYC taxi trip analysis project uses a dataset well suited to putting your big data skills to the test. The project lets you practice and master exploratory data analysis on the given dataset. The aim of the project is to derive the highest possible revenue figures using Hadoop and Hive.

Sentiment Analysis on Twitter in Real Time

With over 500 million tweets, each limited to 280 characters, Twitter is home to some of the crispest and most concisely written content on the web. From space tweets to LeBron James on chicken nuggets or Donald Trump’s infamous ‘covfefe’ tweet, it hosts ideas, comments, and sentiments with minimal jargon and plenty of information. This makes it an ideal platform for sentiment analysis using machine learning. This project will enable you to run analysis on real-time tweet data, derive opinions, understand trends across a range of trending topics around the globe, and produce a visual plot using PySpark.

Course Certificate

Get the Mastering Big Data Analytics course completion certificate from Great Learning, which you can share in the Certifications section of your LinkedIn profile, on printed resumes, CVs, or other documents.

Great Learning Academy - Free Online Certification Courses

Great Learning Academy, an initiative taken by Great Learning to provide free online courses in various domains, enables professionals and students to learn the most in-demand skills to help them achieve career success.

Great Learning Academy offers free certificate courses with 1000+ hours of content across 100+ courses in various domains such as Data Science, Machine Learning, Artificial Intelligence, IT & Software, Cloud Computing, Marketing & Finance, Big Data, and more. It has offered free online courses with certificates to 500,000+ learners from 140 countries. The Great Learning Academy platform allows you to achieve your career aspirations by working on real-world projects, learning in-demand skills, and gaining knowledge from the best free online courses with certificates. Apart from the free courses, it provides video content and live sessions with industry experts as well.
