CS6220 Big Data Systems and Analytics

Course Introduction

Attention: The information contained in this page is subject to change.

This is a graduate-level class on Big Data Systems and Analytics. The syllabus covers a wide range of topics. The specific course content may change from year to year to reflect cutting-edge technology and the state of the art in big data research and development.

You are encouraged to participate in the class as much as feasible. In addition to attending, you are encouraged to ask questions, share your experiences with the class, and participate actively in class discussions. I value your input on the course format, content coverage and composition, and evaluation criteria.

Course Structure: There are no required textbooks in this course. Course notes and course readings will form the central material for the course. Class readings will consist of materials either handed out in class or made available via the class WWW page. Material not presented in the readings or handouts may also be covered in class; therefore, your attendance is important. The course will consist of two 75-minute lectures per week, with weekly reading and homework assignments.

Homework Assignments
There are five homework assignments. For each assignment, you can choose either the reading-based or the programming-based option. All five assignments are individual homework. Discussion is allowed, but the homework deliverable must be produced by you independently. You are encouraged to complete your assignment by incorporating your own experience with the issues you address. You can certainly use what you learned in class as background knowledge.

Programming-based Homework
Programming homework is designed to help students gain a better understanding of the course materials through hands-on exercises. Each programming assignment typically consists of 1-3 coding or experimentation tasks, ranging from systems and applications to algorithms. For example, students may be asked to implement a given program, or to extend an existing software package with additional functionality. Students may also be asked to download a particular software package, run it on one or more given datasets, and report the execution results and a runtime performance comparison across several use-case scenarios.

Reading-based Homework
Reading-based homework is also designed to strengthen understanding of the course materials by encouraging students to read reference materials. Each reading-based assignment includes three subtasks: (1) Choose two papers on the same subject from the course reading list. (2) Read the two papers and understand how the two pieces of work relate to one another and how their concrete problems or solution approaches differ. (3) Write a one-page reading critique.

The reading critique should have three sections (at least one paragraph per section): (1) Summary: summarize the general problems the authors attempt to solve in the two papers, and discuss the differences, if any. (2) Strong Points: describe the strengths of the two papers in terms of their solution approaches, such as the technical contributions you most enjoyed learning about. (3) Suggestions and/or Weak Points: identify any problems the authors claim to solve but do not deliver as stated, any technical flaws, and any suggestions you want to make.

Submission of Homework: You are required to submit your homework to T-Square by the due date. Late submissions by email to the TA or instructor will not be accepted. Here are the guidelines used for grading:

Although discussion of assignments is allowed, verbatim copying of programming code or critiques will be considered cheating. If text or code is copied from another source, you should credit the source with a direct and correct reference. Verbatim copying from uncredited sources is considered plagiarism. If a case of plagiarism or any other form of academic dishonesty is found, all parties involved will receive a score of zero for the assignment. Repeat offenders will be referred to the Dean's office.

Your project should be significant (non-trivial) and relevant to the course. Generally speaking, you can propose anything you wish: implementation, benchmarking, evaluation, interesting Big Data applications, analytic algorithms, etc. Students from companies can propose projects related to their work, provided that none of the project-related material is proprietary. I will not sign any non-disclosure agreement just to evaluate a project.

The project will count for 50% of your total grade. Grading of your project consists of three components:

  1. Project proposal: 20%
  2. Project presentation: 20%
  3. Final project (including project demo, project report, and code deliverable quality): 60%

The project proposal is due in the fourth or fifth week of the semester. Based on the proposal, I will meet with each project team to negotiate the goals of your project, the evaluation criteria, and the plan for its execution. The project is due by the last week of the semester and must include a demo, the program source code and executable, and a final project report (including a README and documentation). Particularly interesting projects may be extended into the spring semester for research project credit (CS8903 Special Problems) or master's theses. Talk with me if you are interested.

All project presentations are scheduled for the Big Data course workshop, which will be held over consecutive lectures at the end of the semester. Each presentation will be limited to 15-20 minutes, including 2-5 minutes of question time. The workshop schedule will be posted on T-Square.

Each student's project grade is based on the group score plus the quality of the student's in-class presentation, demo participation, and class participation.

The course project should be innovative in design or implementation, in terms of ideas, techniques, or optimizations. You are encouraged to discuss your ideas with me to define a novel and interesting project.

Class participation will be judged primarily by the quality of interaction rather than its quantity. Classes are designed to be interactive, with plenty of opportunities for discussion.

Technology Review

The technology review topic can be selected based on the weekly lecture themes covered in the course as well as the topics of the homework assignments. It can also be combined with the theme of your course project, though this is not required. For example, one could choose a social network mining course project and a technology review on big data and the Internet of Things. The technology review should cover the state of the art in the chosen Big Data Systems and Analytics technology area, with a list of references. It should also contain a discussion section describing your thoughts on, and predictions for, the impact of the technology over the next 10-20 years, based on your survey of the area. The expected length of the technology review is 10-15 pages, single column, with line spacing of 1.2-1.5. Figures are welcome.

Office hours: I will usually be available to answer questions 1 hour before class and half an hour after class. Email (lingliu AT cc.gatech.edu) is the best way to get a quick answer. Appointments for meetings outside office hours are possible upon request.

HOMEPAGE Back to Ling Liu's home page