Learn the fundamentals of Big Data including key concepts, frameworks, and Hadoop implementation. Perfect for beginners exploring data science applications.
This introductory course is designed for those new to data science who want to understand the Big Data landscape. You'll explore why the Big Data era has emerged and how it's transforming businesses and careers. The curriculum covers the fundamental concepts behind big data problems, applications, and systems, including the three key sources of Big Data: people, organizations, and sensors. You'll learn about the six V's of Big Data (volume, velocity, variety, veracity, valence, and value) and how each impacts various aspects of data management. The course provides a structured 5-step process for approaching data science problems and extracting value from Big Data. You'll also gain an introduction to Hadoop, one of the most common frameworks that has made big data analysis more accessible. Through practical assignments, you'll even get hands-on experience installing and running programs using Hadoop, preparing you to apply Big Data concepts in real-world scenarios.
4.6
(10,919 ratings)
330,290 already enrolled
Instructors: Ilkay Altintas, Andrea Zonca
Language: English
Subtitles: Pashto, Bengali, Urdu, 5 more
What you'll learn
Describe the Big Data landscape and identify real-world big data problems
Explain the six V's of Big Data and their impact on data collection, storage, and analysis
Apply a 5-step process to structure Big Data analysis and extract value
Distinguish between big data problems and traditional data science questions
Explain the architectural components used for scalable big data analysis
Summarize the features and value of core Hadoop components
Skills you'll gain
This course includes:
3.5 hours of pre-recorded video
6 assignments
Access on mobile, tablet, and desktop
Full lifetime access
Shareable certificate
Closed captions
Get a Completion Certificate
Share your certificate with prospective employers and your professional network on LinkedIn.
Created by
Provided by

Top companies offer this course to their employees
Top companies provide this course to enhance their employees' skills, ensuring they excel in handling complex projects and drive organizational success.
There are 6 modules in this course
This comprehensive introduction to Big Data covers both theoretical concepts and practical applications. The course begins by exploring the origins and value of Big Data across various industries, including healthcare and business. Students learn about the key characteristics of Big Data through the six V's framework: volume, velocity, variety, veracity, valence, and value. The curriculum then shifts to a practical approach with a 5-step data science process that helps students understand how to extract meaningful insights from massive datasets. Foundation modules cover distributed file systems, scalable computing, and programming models essential for Big Data processing. The course culminates with hands-on experience using Hadoop and MapReduce, giving students practical skills they can immediately apply. Throughout the modules, students engage with real-world examples and complete assignments that reinforce their understanding of Big Data concepts.
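To give a flavor of the hands-on Hadoop work described above, here is a minimal word-count sketch written in the Hadoop Streaming style, where the framework pipes text to a mapper on standard input and feeds the sorted mapper output to a reducer. This is an illustration only, not course material; the file name wordcount.py and the map/reduce command-line switch are hypothetical.

```python
#!/usr/bin/env python3
# wordcount.py - illustrative MapReduce-style word count (Hadoop Streaming style).
# Hypothetical sketch, not from the course: the framework is assumed to pipe
# input lines to the mapper on stdin and sorted (word, count) pairs to the reducer.
import sys

def mapper():
    # Emit one "word<TAB>1" pair per word in the input.
    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word}\t1")

def reducer():
    # Input is sorted by key, so all counts for a word arrive contiguously.
    current_word, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t", 1)
        if word == current_word:
            count += int(value)
        else:
            if current_word is not None:
                print(f"{current_word}\t{count}")
            current_word, count = word, int(value)
    if current_word is not None:
        print(f"{current_word}\t{count}")

if __name__ == "__main__":
    # Choose a role, e.g. `python3 wordcount.py map` or `python3 wordcount.py reduce`.
    mapper() if sys.argv[1] == "map" else reducer()
```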
Welcome
Module 1 · 25 Minutes to complete
Big Data: Why and Where
Module 2 · 4 Hours to complete
Characteristics of Big Data and Dimensions of Scalability
Module 3 · 2 Hours to complete
Data Science: Getting Value out of Big Data
Module 4 · 3 Hours to complete
Foundations for Big Data Systems and Programming
Module 5 · 1 Hour to complete
Systems: Getting Started with Hadoop
Module 6 · 5 Hours to complete
Fee Structure
Instructors
Distinguished Data Science Leader and Scientific Workflow Pioneer
Dr. Ilkay Altintas serves as Chief Data Science Officer at the San Diego Supercomputer Center (SDSC) at UC San Diego, where she has established herself as a leading innovator in scientific workflows and data science since 2001. After earning her Ph.D. from the University of Amsterdam focusing on workflow-driven collaborative science, she founded the Workflows for Data Science Center of Excellence and has led numerous cross-disciplinary projects funded by NSF, DOE, NIH, and the Moore Foundation. Her contributions include co-initiating the open-source Kepler Scientific Workflow System and developing the Biomedical Big Data Training Collaborative platform. Her research impact spans scientific workflows, provenance, distributed computing, and software modeling, earning her the SDSC Pi Person of the Year award in 2014 and the IEEE TCSC Award for Excellence in Scalable Computing for Early Career Researchers in 2015. As Division Director for Cyberinfrastructure Research, Education, and Development, she oversees numerous computational data science initiatives while serving as a founding faculty fellow at the Halıcıoğlu Data Science Institute and maintaining active research collaborations across multiple scientific domains.
Expert in Cosmology and Scientific Computing
Andrea Zonca leads the Scientific Computing Applications group at the San Diego Supercomputer Center, combining his cosmology expertise with advanced computing skills. His academic foundation includes extensive work analyzing Cosmic Microwave Background data from the Planck Satellite during his Ph.D. and postdoctoral research. At SDSC, he has developed significant expertise in supercomputing, particularly in parallel computing with Python and C++, and maintains widely used community software packages like healpy and PySM. His current role involves leading efforts to help research groups optimize their data analysis pipelines for national supercomputers. He has also built specialized knowledge in cloud computing, particularly in deploying services on platforms like Jetstream using Kubernetes and JupyterHub. As a certified Software Carpentry instructor, he teaches essential computational skills to scientists, including automation with bash, version control with git, and Python programming. His research contributions have been significant, with his work on the healpy package becoming a crucial tool for data analysis on spherical surfaces in Python, garnering widespread use in the scientific community.
Testimonials
Testimonials and success stories are a testament to the quality of this program and its impact on your career and learning journey. Be the first to help others make an informed decision by sharing your review of the course.
Frequently asked questions
Below are some of the most commonly asked questions about this course. We aim to provide clear and concise answers to help you better understand the course content, structure, and any other relevant information. If you have any additional questions or if your question is not listed here, please don't hesitate to reach out to our support team for further assistance.