
Big Data - Capstone Project

Apply big data techniques to analyze gaming data through data exploration, classification, clustering, and graph analytics in this hands-on project.

This capstone project offers a comprehensive, hands-on opportunity to apply big data techniques to a real-world scenario involving an imaginary game called "Catch the Pink Flamingo." Throughout the course, you'll walk through the typical big data science workflow—acquiring, exploring, preparing, analyzing, and reporting data. You'll begin by understanding the game's conceptual model and exploring simulated data that mimics real-world big data generated by game users. Using advanced tools including Splunk for data exploration, KNIME for classification analysis, Spark's MLlib for clustering, and Neo4j for graph analytics, you'll solve various business problems, from identifying big spenders to analyzing player chat behavior. The project culminates in creating compelling reports and presentations that showcase your findings and recommendations. This practical approach allows you to integrate and apply the knowledge gained throughout the Big Data specialization to deliver actionable insights from complex datasets. Top-performing students may have the opportunity to present their projects to Splunk recruiters and engineering leadership.

4.4

(397 ratings)

17,830 already enrolled

English

Pashto, Bengali, Urdu, 4 more

Powered by

Big Data - Capstone Project

This course includes

20 Hours

of self-paced video lessons

Intermediate Level

Completion Certificate

awarded on course completion

Free course

What you'll learn

  • Apply the complete big data analysis workflow to real-world gaming data

  • Explore and prepare data using Splunk for effective analysis

  • Build classification models with KNIME to identify valuable player segments

  • Implement clustering techniques with Spark MLlib to understand player behavior

  • Perform graph analytics on player interactions using Neo4j

  • Interpret analytical results to generate actionable business insights

Skills you'll gain

Big Data
Data Exploration
Classification
Clustering
Graph Analytics
Splunk
KNIME
Spark MLlib
Neo4j
Data Visualization

This course includes:

1.5 Hours of pre-recorded video

1 assignment

Access on Mobile, Tablet, Desktop

Batch access

Shareable certificate

Get a Completion Certificate

Share your certificate with prospective employers and your professional network on LinkedIn.

Certificate

Top companies offer this course to their employees

Top companies provide this course to enhance their employees' skills, ensuring they excel in handling complex projects and drive organizational success.


There are 7 modules in this course

This capstone project provides students with a comprehensive opportunity to apply various big data techniques in a realistic scenario. Working with simulated game data from "Catch the Pink Flamingo," students progress through the complete data science lifecycle—from acquisition and exploration to analysis and presentation. The project incorporates multiple analytical approaches including exploratory data analysis with Splunk, classification using KNIME to identify high-value customers, clustering with Spark MLlib to segment the player base, and graph analytics with Neo4j to analyze player interactions. Students gain practical experience with industry-standard tools while addressing business-relevant questions about player behavior, spending patterns, and social interactions. The course emphasizes not just technical analysis but also the critical skills of interpreting results and communicating findings through professional reports and presentations. This hands-on approach ensures students can integrate and apply the diverse knowledge gained throughout the Big Data specialization.
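The clustering stage described above uses Spark MLlib in the course itself. As a rough, tool-agnostic illustration of the underlying technique, the sketch below runs a plain-Python k-means on hypothetical per-player purchase totals to separate casual players from big spenders (all data, names, and thresholds here are invented for illustration, not taken from the course):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal 1-D k-means (the idea behind MLlib's KMeans)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[i].append(p)
        # Move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Hypothetical per-player spend totals: a casual group and a big-spender group.
spend = [1.0, 2.0, 1.5, 2.5, 40.0, 42.0, 45.0, 41.5]
print(kmeans(spend, k=2))  # two cluster centers, one per spending group
```

In the actual project the same idea runs at scale via `pyspark.ml.clustering.KMeans` over a DataFrame of player features rather than an in-memory list.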

Simulating Big Data for an Online Game

Module 1 · 52 Minutes to complete

Acquiring, Exploring, and Preparing the Data

Module 2 · 3 Hours to complete

Data Classification with KNIME

Module 3 · 4 Hours to complete

Clustering with Spark

Module 4 · 4 Hours to complete

Graph Analytics of Simulated Chat Data With Neo4j

Module 5 · 3 Hours to complete

Reporting and Presenting Your Work

Module 6 · 8 Minutes to complete

Final Submission

Module 7 · 3 Hours to complete
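Module 5's graph analytics run in Neo4j, typically via Cypher queries. The core idea—modeling chat messages as edges between players and ranking players by how connected they are—can be sketched in plain Python on a hypothetical edge list (all player IDs below are invented for illustration):

```python
from collections import Counter

# Hypothetical chat edges: (sender, receiver) pairs from simulated chat logs.
edges = [
    ("u1", "u2"), ("u1", "u3"), ("u2", "u3"),
    ("u3", "u4"), ("u3", "u5"), ("u5", "u1"),
]

# Degree centrality: count how many chat links touch each player.
degree = Counter()
for sender, receiver in edges:
    degree[sender] += 1
    degree[receiver] += 1

# Players ranked by connectedness; u3 is the chat hub in this toy graph.
print(degree.most_common(3))
```

In Neo4j the equivalent ranking would be expressed declaratively over `(:User)-[:CHATTED_WITH]->(:User)` relationships instead of being computed by hand.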

Instructors

Ilkay Altintas

4.7 rating

1,793 Reviews

502,814 Students

14 Courses

Distinguished Data Science Leader and Scientific Workflow Pioneer

Dr. Ilkay Altintas serves as Chief Data Science Officer at the San Diego Supercomputer Center (SDSC) at UC San Diego, where she has established herself as a leading innovator in scientific workflows and data science since 2001. After earning her Ph.D. from the University of Amsterdam focusing on workflow-driven collaborative science, she founded the Workflows for Data Science Center of Excellence and has led numerous cross-disciplinary projects funded by NSF, DOE, NIH, and the Moore Foundation. Her contributions include co-initiating the open-source Kepler Scientific Workflow System and developing the Biomedical Big Data Training Collaborative platform. Her research impact spans scientific workflows, provenance, distributed computing, and software modeling, earning her the SDSC Pi Person of the Year award in 2014 and the IEEE TCSC Award for Excellence in Scalable Computing for Early Career Researchers in 2015. As Division Director for Cyberinfrastructure Research, Education, and Development, she oversees numerous computational data science initiatives while serving as a founding faculty fellow at the Halıcıoğlu Data Science Institute and maintaining active research collaborations across multiple scientific domains.

Amarnath Gupta

4.7 rating

1,793 Reviews

474,932 Students

10 Courses

Expert in Cosmology and Scientific Computing

Andrea Zonca leads the Scientific Computing Applications group at the San Diego Supercomputer Center, combining his cosmology expertise with advanced computing skills. His academic foundation includes extensive work analyzing Cosmic Microwave Background data from the Planck Satellite during his Ph.D. and postdoctoral research. At SDSC, he has developed significant expertise in supercomputing, particularly in parallel computing with Python and C++, and maintains widely used community software packages like healpy and PySM. His current role involves leading efforts to help research groups optimize their data analysis pipelines for national supercomputers. He has also built specialized knowledge in cloud computing, particularly in deploying services on platforms like Jetstream using Kubernetes and JupyterHub. As a certified Software Carpentry instructor, he teaches essential computational skills to scientists, including automation with bash, version control with git, and Python programming. His research contributions have been significant, with his work on the healpy package becoming a crucial tool for data analysis on spherical surfaces in Python, garnering widespread use in the scientific community.


Testimonials

Success stories from past learners reflect the quality of this program and its impact on careers and learning journeys. Be the first to help others make an informed decision by sharing your review of the course.

Frequently asked questions

Below are some of the most commonly asked questions about this course. We aim to provide clear and concise answers to help you better understand the course content, structure, and any other relevant information. If you have any additional questions or if your question is not listed here, please don't hesitate to reach out to our support team for further assistance.