CS6220 Big Data Systems and Analytics (Fall 2018)
General Information
Lecture: 1:30pm - 2:45pm TR; Klaus 2447 (Aug. 20 - Dec. 13, Lectures ending on Dec. 4)
Office hours: TR 11am-12 noon or by appointment
Course TAs: Mehmet Emre Gursoy, Stacey Truex
Office hours: by appointment
Course Objectives and Description
Data has been the fastest-growing phenomenon on the Internet over the last decade. Big data demands both high-performance computing and elastic, utility-driven computing. Big data analytics holds the potential to reveal deep insights, such as social influence among customers, by analyzing business transactions, user-generated feedback ratings, and social and geographical data. In the past 40 years, data was primarily used to record and report business activities and scientific events; in the next 40 years, data will also be used to derive new insights, to influence business decisions, and to accelerate scientific discovery. One key challenge is to provide the right platforms and tools to make reasoning about big data easy and simple. Another is to revolutionize the ways of collecting, processing, and analyzing massive data that exceeds the processing capacity of existing computing systems. Big data education should cover big data systems, algorithms, technology, programming, and applications from both research and development perspectives.
This course reviews concepts, techniques, algorithms, and systems issues in big data education and research, with a strong emphasis on systems and analytics. It explores big data opportunities from a variety of science and engineering applications, and examines research problems and challenges that are critical for developing big data systems and big data applications. Main topics include, but are not limited to: fundamentals of data storage systems and optimizations, fundamentals of data mining and knowledge discovery, fundamentals of big-data-aware computing systems and software design, fundamentals of cluster computing and distributed file systems, and fundamentals of geographically distributed data-intensive systems.
We will also cover big data applications that pose new challenges to big data systems and analytics, such as healthcare, mobile commerce, social media, Internet of Things, software-defined computing, cyber manufacturing, and cyber-physical systems, to name a few. This course is designed to provide fundamental training for big data scientists, ranging from high-performance big data computing systems to big data applications and big data analysis and management algorithms, and to look beyond the present status of big data and conjecture what future technologies and applications may evolve. The course will include a significant project component that will typically require Java/C++/CGI/HTML5 programming.
Prerequisites:
Students are expected to have taken Operating Systems (CS 2200 or equivalent) and Introduction to Database Systems (CS 4400/6400 or equivalent). In addition, students are expected to have a solid grasp of Java/C/C++/CGI programming. Sockets programming experience is desirable but not required.
A detailed description of the course structure and administration can be found in the Course Introduction.
Grades will be computed using the tentative weighting scheme below:
The grading policy can be found in the Course Introduction and FAQ (Important! Read Me).
There are 4 homework assignments in total, on average one every 2 weeks. Each assignment is usually posted 2-3 weeks before its due date. For each assignment, students choose one of two types: reading-based or programming-based.
The technology review serves as the take-home final exam for the course. Topics will come from weekly lectures, class discussions, guest presentations, and homework assignments. You are required to write a technology review of 10-15 pages in single-column format with 1.2-1.5pt spacing, including figures and references. The technology review paper is due by 11:55pm on the final exam day.
In principle, you can propose anything you wish for the course project: algorithms, implementation, benchmarking, evaluation, or interesting big data applications, to name a few. Students who currently work part time at companies may propose a work-related project. However, all course project material must be non-proprietary, i.e., I will not sign any non-disclosure agreement just to evaluate a project. You are encouraged to come up with your own project ideas and to discuss them with the instructor.
Important Dates:
The due dates for the project proposal, project demo, project presentation, and project code and documentation deliverables can be found on the TSquare Wiki page.
Useful References and Texts
To be posted under the course area on TSquare.gatech.edu.