Big Data Integration and Processing

Learn big data integration and processing using Apache Spark, MongoDB, and Splunk for efficient data analysis.

This comprehensive course focuses on the practical aspects of big data integration and processing, providing you with the skills to tackle real-world data challenges. You'll learn how to retrieve data from various database and big data management systems, including relational databases like Postgres and NoSQL databases like MongoDB and Aerospike. The curriculum covers the connections between data management operations and big data processing patterns essential for large-scale analytical applications. You'll gain hands-on experience with industry-standard tools such as Apache Spark, Splunk, and Pandas, learning to execute data integration and processing tasks. The course includes practical exercises working with real data, culminating in a project analyzing Twitter data using MongoDB and Spark. Whether you're interested in data pipelines, workflow management, or advanced analytics using Spark's MLlib and GraphX tools, this course provides the foundational knowledge needed to work with big data effectively.
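
To make the retrieval side concrete, here is a minimal Python sketch of the first kind of task the course covers: pulling rows from a Postgres database into a Pandas DataFrame for analysis. The connection details, table, and column names are illustrative placeholders, not values from the course materials.

```python
import pandas as pd
import psycopg2

# Connect to a hypothetical local Postgres instance (all credentials are placeholders).
conn = psycopg2.connect(
    dbname="example_db", user="analyst", password="secret", host="localhost"
)

# Retrieve structured data with ordinary SQL, then hand it to Pandas for analysis.
df = pd.read_sql(
    "SELECT user_id, tweet_count FROM activity WHERE tweet_count > 10", conn
)
print(df.describe())

conn.close()
```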

4.4 (2,408 ratings)

79,224 already enrolled

Taught in English

Subtitles: Pashto, Bengali, Urdu, and 4 more

This course includes

17 Hours of self-paced video lessons

Beginner Level

Completion Certificate awarded on course completion

Free course

What you'll learn

  • Retrieve data from various database and big data management systems

  • Describe connections between data management operations and big data processing patterns

  • Identify when a big data problem needs data integration

  • Execute big data integration and processing on Hadoop and Spark platforms

  • Work with relational databases like Postgres and NoSQL databases like MongoDB (see the contrast sketched just after this list)

  • Design and implement big data processing pipelines
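
The objectives above hinge on the difference between SQL-style and document-style querying. Below is a hedged pymongo sketch of how the same filter looks in each model; the collection name ("tweets") and field names are assumptions for illustration, not the course's dataset.

```python
from pymongo import MongoClient

# Connect to a hypothetical local MongoDB instance.
client = MongoClient("mongodb://localhost:27017/")
tweets = client["example_db"]["tweets"]

# Document-model equivalent of the SQL query:
#   SELECT text FROM tweets WHERE retweet_count > 100;
cursor = tweets.find(
    {"retweet_count": {"$gt": 100}},  # filter clause
    {"text": 1, "_id": 0},            # projection: return only the text field
)
for doc in cursor:
    print(doc["text"])
```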

Skills you'll gain

Big Data
Data Management
Databases
Data Analysis
SQL
Apache Spark
MongoDB
Splunk
Data Integration
Data Processing

This course includes:

5.1 Hours of pre-recorded video

10 assignments

Access on Mobile, Tablet, Desktop

Full-time access

Shareable certificate

Closed captions

Get a Completion Certificate

Share your certificate with prospective employers and your professional network on LinkedIn.

Top companies offer this course to their employees

Top companies provide this course to enhance their employees' skills, ensuring they excel in handling complex projects and drive organizational success.

There are 7 modules in this course

This course provides a comprehensive introduction to big data integration and processing techniques. It begins with the fundamentals of data retrieval from relational and NoSQL databases, teaching students how to write effective queries for both structured and unstructured data. Students then explore data integration tools and processes, learning how information from diverse sources can be combined for multichannel analytics. The core of the course focuses on big data processing using Apache Spark, covering the entire Spark ecosystem including Spark Core, Spark SQL, Spark Streaming, MLlib for machine learning, and GraphX for graph processing. Throughout the modules, students gain hands-on experience with industry tools like Postgres, MongoDB, Aerospike, Splunk, and Datameer. The course culminates with a practical project analyzing Twitter data, allowing students to apply their knowledge of MongoDB queries and Spark processing to real-world social media analytics.
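
In the spirit of the capstone project described above, here is a small PySpark sketch that counts hashtags in a file of tweet text, using the Spark SQL functions the course introduces. The input path and the one-tweet-per-line layout are assumptions for illustration, not the actual course dataset.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode, split

spark = SparkSession.builder.appName("tweet-hashtags").getOrCreate()

# Load a hypothetical file with one tweet per line into a DataFrame
# with a single string column named "value".
tweets = spark.read.text("tweets.txt")

# Split each line into words, keep only the hashtags, and count them.
hashtags = (
    tweets.select(explode(split(col("value"), r"\s+")).alias("word"))
    .filter(col("word").startswith("#"))
    .groupBy("word")
    .count()
    .orderBy(col("count").desc())
)

hashtags.show(10)
spark.stop()
```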

Welcome to Big Data Integration and Processing

Module 1 · 1 Hour to complete

Retrieving Big Data (Part 1)

Module 2 · 1 Hour to complete

Retrieving Big Data (Part 2)

Module 3 · 2 Hours to complete

Big Data Integration

Module 4 · 2 Hours to complete

Processing Big Data

Module 5 · 3 Hours to complete

Big Data Analytics using Spark

Module 6 · 2 Hours to complete

Learn By Doing: Putting MongoDB and Spark to Work

Module 7 · 3 Hours to complete

Instructors

Ilkay Altintas

4.7 rating

1,793 Reviews

502,814 Students

14 Courses

Distinguished Data Science Leader and Scientific Workflow Pioneer

Dr. Ilkay Altintas serves as Chief Data Science Officer at the San Diego Supercomputer Center (SDSC) at UC San Diego, where she has established herself as a leading innovator in scientific workflows and data science since 2001. After earning her Ph.D. from the University of Amsterdam with a focus on workflow-driven collaborative science, she founded the Workflows for Data Science Center of Excellence and has led numerous cross-disciplinary projects funded by NSF, DOE, NIH, and the Moore Foundation. Her contributions include co-initiating the open-source Kepler Scientific Workflow System and developing the Biomedical Big Data Training Collaborative platform. Her research impact spans scientific workflows, provenance, distributed computing, and software modeling, earning her the SDSC Pi Person of the Year award in 2014 and the IEEE TCSC Award for Excellence in Scalable Computing for Early Career Researchers in 2015. As Division Director for Cyberinfrastructure Research, Education, and Development, she oversees numerous computational data science initiatives while serving as a founding faculty fellow at the Halıcıoğlu Data Science Institute and maintaining active research collaborations across multiple scientific domains.

Amarnath Gupta

4.7 rating

1,793 Reviews

474,932 Students

10 Courses

Expert in Data Integration and Graph Databases

Dr. Amarnath Gupta is a research scientist at the San Diego Supercomputer Center (SDSC) at UC San Diego, where he directs the Advanced Query Processing Lab. His research focuses on semantic information integration, graph data management, and query processing over large, heterogeneous data sources, with applications spanning scientific data management and social media analytics. He played a leading role in building the Neuroscience Information Framework (NIF), a large-scale effort to integrate neuroscience data from diverse sources. As co-instructor of UC San Diego's big data courses, he brings his expertise in databases, data integration, and NoSQL systems to the topics covered throughout this course.

Testimonials

Testimonials and success stories are a testament to the quality of this program and its impact on your career and learning journey. Be the first to help others make an informed decision by sharing your review of the course.

Frequently asked questions

Below are some of the most commonly asked questions about this course. We aim to provide clear and concise answers to help you better understand the course content, structure, and any other relevant information. If you have any additional questions or if your question is not listed here, please don't hesitate to reach out to our support team for further assistance.