Big Data Fundamentals with Hadoop and Spark

Rating: 4.5 (42 reviews)

This course is part of multiple programs.

Discover the power of big data technologies with IBM's foundational course. Learn to process and analyze massive datasets using industry-standard tools like Hadoop and Apache Spark. Explore distributed processing, parallel programming, and data parallelism concepts. Master practical skills in PySpark, Spark SQL, and streaming analytics. Perfect for IT professionals looking to understand big data processing tools and their applications. Gain hands-on experience with real-world scenarios and learn to leverage these technologies for efficient data analysis.


14,897 already enrolled

Instructors:

Aije Egwaikhide

Karthik Muthuraman

English

This course includes

6 weeks of self-paced video lessons

Beginner level

Completion certificate awarded on course completion

8,650

Audit For Free


What you'll learn

Master fundamental concepts of big data and its impact on organizations

Understand Hadoop architecture and ecosystem components including HDFS and MapReduce

Develop skills in Apache Spark programming and parallel processing

Gain practical experience with PySpark and Spark SQL applications

Skills you'll gain

Big Data

Apache Hadoop

Apache Spark

PySpark

Spark SQL

HDFS

MapReduce

Data Parallelism

Distributed Computing

Apache Hive

This course includes:

Prerecorded video

Graded assignments, exams

Access on Mobile, Tablet, Desktop

Limited access

Shareable certificate

Closed captions

Get a Completion Certificate

Share your certificate with prospective employers and your professional network on LinkedIn.

Created by

IBM

Provided by

edX

Top companies offer this course to their employees

Top companies provide this course to enhance their employees' skills, ensuring they excel in handling complex projects and drive organizational success.

There are 7 modules in this course

This course provides a comprehensive introduction to big data technologies and practices. Students learn about the fundamentals of big data processing, including parallel processing, scaling, and data parallelism. The curriculum covers major platforms like Hadoop and Spark, exploring their architectures, components, and applications. Through hands-on labs and practical exercises, participants gain experience with distributed file systems, MapReduce, PySpark, and Spark SQL. The course also covers advanced topics like performance monitoring and tuning, making it valuable for aspiring data engineers and IT professionals.

Module 1: What is Big Data

Module 2: Introduction to the Hadoop Ecosystem

Module 3: Introduction to Apache Spark

Module 4: DataFrames and SparkSQL

Module 5: Development and Runtime Environment Options

Module 6: Monitoring and Tuning

Module 7: Final Quiz

Fee Structure

This course cannot be purchased individually. To enroll in this course with a certificate, you need to purchase the complete Professional Certificate Course. For enrollment and the detailed fee structure, see: Data Engineering, NoSQL, Big Data and Spark Fundamentals

Instructors

Aije Egwaikhide

4.3 rating

87 Reviews

631,843 Students

6 Courses

Data Scientist Aije Egwaikhide: Empowering Women in STEM and Innovating AI Solutions at IBM

Aije Egwaikhide is a fantastic example of how dedication and passion can lead to a successful career in tech! With her background in Economics and Statistics, paired with advanced qualifications in Business and Management Analytics, she’s truly paving the way in the field of data science. Her work at IBM, particularly in creating innovative machine learning solutions for the Oil and Gas sector, is an inspiring achievement.

Karthik Muthuraman

2 Courses

A Distinguished AI Engineer Advancing Open Source Machine Learning

Karthik Muthuraman serves as a Data Scientist and Developer Advocate at IBM's Center for Open Source Data & AI Technologies (CODAIT), where he focuses on democratizing AI through open-source technologies. After earning his Master's degree in Electrical and Computer Engineering from the University of Michigan, Ann Arbor, with a focus on machine learning and computer vision, he has established himself as an expert in deep learning and AI systems. His work at CODAIT includes developing open-source deep learning models, contributing to frameworks like TensorFlow, and creating innovative applications such as automatic image cropping and age estimation systems.
