CS 7476 Advanced Computer Vision
Spring 2017, MWF 12:05 to 12:55, College of Computing 52
Instructor: James Hays
TA: Samarth Brahmbhatt
Course Description
This course covers advanced research topics in computer vision. Building on the introductory material in CS 6476 (Computer Vision), this class will prepare graduate students in both the theoretical foundations of computer vision and the practical approaches to building real computer vision systems. The course investigates current research topics in computer vision with an emphasis on recognition tasks and deep learning. We will examine data sources, features, and learning algorithms useful for understanding and manipulating visual data. Several topics will straddle the boundary between computer vision and computer graphics. Class topics will be pursued through independent reading, class discussion and presentations, and state-of-the-art projects.
The goal of this course is to give students the background and skills necessary to perform research in computer vision and its application domains such as robotics, healthcare, and graphics. Students should understand the strengths and weaknesses of current approaches to research problems and identify interesting open questions and future research directions. Students should also improve their critical reading and communication skills.
Course Requirements
Reading and Discussion Topics
Students will be expected to read one paper for each class. For each assigned paper, students must identify at least one question or topic of interest for class discussion. Interesting topics for discussion could relate to strengths and weaknesses of the paper, possible future directions, connections to other research, uncertainty about the conclusions of the experiments, etc. Questions and discussion topics must be posted to Piazza by 11:59pm the day before each class. Feel free to reply to other comments on Piazza and help each other understand confusing aspects of the papers. The Piazza discussion will be the starting point for the class discussion. If you are presenting, you don't need to post a question to Piazza.
Class participation
All students are expected to take part in class discussions. If you do not fully understand a paper, that is OK. We can work through the unclear aspects of a paper together in class. If you are unable to attend a specific class, please let me know ahead of time (and have a good excuse!).
Presentation(s)
Each student will lead the presentation of one paper during the semester (possibly as part of a pair of students). Ideally, students would implement some aspect of the presented material and perform experiments that help us to understand the algorithms. Presentations and all supplemental material should be ready one week before the presentation date so that students can meet with the instructor, go over the presentation, and possibly iterate before the in-class discussion. For the presentations it is fine to use slides and code from outside sources (for example, the paper authors) but be sure to give credit.
Semester group projects
Students will work alone or in pairs to complete a state-of-the-art research project on a topic relevant to the course. Students will propose a research topic early in the semester. After a project topic is finalized, students will meet occasionally with the instructor or TA to discuss progress. Students will report their progress on their semester project twice during the course, and the course will end with final project presentations. Students will also produce a conference-formatted write-up of their project. Projects will be published on this web page. The ideal project is something with a clear enough direction to be completed in a couple of months, and enough novelty that it could be published in a peer-reviewed venue with some refinement and extension.
Prerequisites
Strong mathematical skills (linear algebra, calculus, probability and statistics) are needed. It is strongly recommended that students have taken one of the following courses (or equivalent courses at other institutions):
- Computer Vision (e.g. 4476 / 6476)
- Computer Graphics
- Computational Photography
Textbook
We will not rely on a textbook, although the free, online textbook "Computer Vision: Algorithms and Applications" by Richard Szeliski is a helpful resource.
Grading
Your final grade will be made up of:
- 20% Reading summaries posted to Piazza
- 10% Classroom participation and attendance
- 10% Intro Project
- 10% Leading discussion for particular research paper
- 10% Semester project updates
- 40% Semester project
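As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes each component is scored on a 0 to 100 scale and that the final grade is a plain weighted sum; the component names, the `final_grade` helper, and the example scores are all hypothetical and not part of the course policy.

```python
# Weights copied from the grading breakdown above, as fractions of the final grade.
# The dictionary keys are illustrative names, not official terms.
WEIGHTS = {
    "reading_summaries": 0.20,
    "participation": 0.10,
    "intro_project": 0.10,
    "leading_discussion": 0.10,
    "project_updates": 0.10,
    "semester_project": 0.40,
}


def final_grade(scores):
    """Return the weighted sum of component scores.

    Assumes every score is on a 0-100 scale (an assumption, not stated policy).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    example_scores = {
        "reading_summaries": 90,
        "participation": 100,
        "intro_project": 85,
        "leading_discussion": 95,
        "project_updates": 80,
        "semester_project": 88,
    }
    print(round(final_grade(example_scores), 1))
```

Because the semester project and its updates together carry half of the weight, they dominate the result under this assumed scheme.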
Office Hours:
James Hays, Wednesday and Friday, 1:00 to 2:00, CCB 315
Samarth Brahmbhatt, Thursday 2:00, CCB 2nd floor common area
Tentative Schedule
Date | Paper | Paper, Project page | Presenter |
Mon, Jan 9 | Introduction | | James |
Wed, Jan 11 | Scene Completion Using Millions of Photographs. James Hays, Alexei A. Efros. ACM Transactions on Graphics (SIGGRAPH 2007). August 2007, vol. 26, No. 3. | project page | James |
Fri, Jan 13 | Photo Clip Art. Jean-Francois Lalonde, Derek Hoiem, Alexei A. Efros, Carsten Rother, John Winn and Antonio Criminisi. ACM Transactions on Graphics (SIGGRAPH 2007). | project page | James |
Mon, Jan 16 | No Classes, Institute Holiday | ||
Wed, Jan 18 | ImageNet: A Large-Scale Hierarchical Image Database. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei. IEEE Computer Vision and Pattern Recognition (CVPR), 2009 | pdf, project page | James |
Fri, Jan 20 | ImageNet Classification with Deep Convolutional Neural Networks. Alex Krizhevsky, Ilya Sutskever, Geoffrey E Hinton. NIPS 2012. | | James |
Mon, Jan 23 | Sketch2Photo: Internet Image Montage. ACM SIGGRAPH ASIA 2009, ACM Transactions on Graphics. Tao Chen, Ming-Ming Cheng, Ping Tan, Ariel Shamir, Shi-Min Hu. | project page | James |
Wed, Jan 25 | How do humans sketch objects? Mathias Eitz, James Hays, and Marc Alexa. Siggraph 2012. | project page | James |
Fri, Jan 27 | BING: Binarized Normed Gradients for Objectness Estimation at 300fps. Cheng, Ming-Ming, Ziming Zhang, Wen-Yan Lin, and Philip Torr. CVPR 2014. | project page | Takuma |
Mon, Jan 30 | What makes Paris look like Paris? Carl Doersch, Saurabh Singh, Abhinav Gupta, Josef Sivic, and Alexei A. Efros. Siggraph 2012. | project page | Shruthi |
Wed, Feb 1 | DeepFace: Closing the Gap to Human-Level Performance in Face Verification. Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf. CVPR 2014. | | Vaibhav |
Fri, Feb 3 | Robust Video Segment Proposals with Painless Occlusion Handling. Zhengyang Wu, Fuxin Li, Rahul Sukthankar, James M. Rehg. CVPR 2015. | project page | Junxian |
Mon, Feb 6 | DynamicFusion: Reconstruction and Tracking of Non-rigid Scenes in Real-Time. Richard Newcombe, Dieter Fox, Steve Seitz. CVPR 2015. | project page | Jonathan |
Wed, Feb 8 | Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. Nguyen A, Yosinski J, Clune J. CVPR 2015. | project page | Apoorva |
Fri, Feb 10 | A Neural Algorithm of Artistic Style. Leon A. Gatys, Alexander S. Ecker, Matthias Bethge. 2015. | implementation, arXiv | Sajid |
Mon, Feb 13 | Understanding deep features with computer-generated imagery. Mathieu Aubry, Bryan Russell. ICCV 2015. | arXiv | - |
Wed, Feb 15 | Learning Visual Biases from Human Imagination. Carl Vondrick, Hamed Pirsiavash, Aude Oliva, Antonio Torralba. NIPS 2015. | project page | Jing Dao and Andrew |
Fri, Feb 17 | FaceNet: A Unified Embedding for Face Recognition and Clustering. Florian Schroff, Dmitry Kalenichenko, James Philbin. CVPR 2015. | arXiv | Jia Yi |
Mon, Feb 20 | Joint Embeddings of Shapes and Images via CNN Image Purification. Yangyan Li, Hao Su, Charles R. Qi, Noa Fish, Daniel Cohen-Or, Leonidas J. Guibas. Siggraph Asia 2015. | project page | Kevin |
Wed, Feb 22 | The Sketchy Database: Learning to Retrieve Badly Drawn Bunnies. Patsorn Sangkloy, Nathan Burnell, Cusuh Ham, James Hays. Siggraph 2016 | project page | Goutam |
Fri, Feb 24 | Do Deep Convolutional Nets Really Need to be Deep and Convolutional? Gregor Urban, Krzysztof J. Geras, Samira Ebrahimi Kahou, Ozlem Aslan, Shengjie Wang, Rich Caruana, Abdelrahman Mohamed, Matthai Philipose, Matt Richardson. ICLR Workshop track 2016. | arXiv | Wengling |
Mon, Feb 27 | Unsupervised Learning of Visual Representations using Videos. Xiaolong Wang, Abhinav Gupta. ICCV 2015. | project page | Murali |
Wed, Mar 1 | XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, Ali Farhadi. ECCV 2016. | arXiv | David |
Fri, Mar 3 | Deep Residual Learning for Image Recognition. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. MS COCO detection challenge winner 2015. | arXiv | Wei |
Mon, Mar 6 | Learning Aligned Cross-Modal Representations from Weakly Aligned Data. Lluis Castrejon, Yusuf Aytar, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba. CVPR. 2016. | project page | Yun |
Wed, Mar 8 | Visually Indicated Sounds. Andrew Owens, Phillip Isola, Josh McDermott, Antonio Torralba, Edward H. Adelson, William T. Freeman. CVPR. 2016. | project page | Zhewei |
Fri, Mar 10 | "What happens if..." Learning to Predict the Effect of Forces in Images. Roozbeh Mottaghi, Mohammad Rastegari, Abhinav Gupta, Ali Farhadi. ECCV 2016. | project page | Meera |
Mon, Mar 13 | YOLO9000: Better, Faster, Stronger. Joseph Redmon, Ali Farhadi. arXiv. Dec. 2016. | arXiv | Nataniel |
Wed, Mar 15 | Visual Dialog. Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, Jose M. F. Moura, Devi Parikh, Dhruv Batra. arXiv. Nov. 2016. | arXiv | Abhishek |
Fri, Mar 17 | Generative Adversarial Nets. Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio. 2014. | arXiv | Naveen |
Mon, Mar 20 | No Classes, Institute Holiday | ||
Wed, Mar 22 | No Classes, Institute Holiday | ||
Fri, Mar 24 | No Classes, Institute Holiday | ||
Mon, Mar 27 | Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Alec Radford, Luke Metz, Soumith Chintala. Nov 2015. | arXiv | Jianan |
Wed, Mar 29 | Object Contour Detection with a Fully Convolutional Encoder-Decoder Network. Jimei Yang, Brian Price, Scott Cohen, Honglak Lee, Ming-Hsuan Yang. CVPR 2016. | project page | Nitin |
Fri, Mar 31 | Context Encoders: Feature Learning by Inpainting. Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell and Alexei A. Efros. CVPR 2016. | project page | Phani and Madhuri |
Mon, Apr 3 | Generative Adversarial Text-to-Image Synthesis. Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee. ICML 2016. | project page | Suraj |
Wed, Apr 5 | Generative Visual Manipulation on the Natural Image Manifold. Jun-Yan Zhu, Philipp Krahenbuhl, Eli Shechtman and Alexei A. Efros. ECCV 2016. | project page | Niranjan |
Fri, Apr 7 | Improved Techniques for Training GANs. Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen. arXiv, June 2016. | arXiv | Hanoi and Anmol |
Mon, Apr 10 | Conditional Image Generation with PixelCNN Decoders. Aaron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu. 2016. | project page | Siddharth |
Wed, Apr 12 | Attribute2Image: Conditional Image Generation from Visual Attributes. Xinchen Yan, Jimei Yang, Kihyuk Sohn, Honglak Lee. ECCV 2016. | project page | Venkat and Si |
Fri, Apr 14 | Semantic Segmentation using Adversarial Networks. Pauline Luc, Camille Couprie, Soumith Chintala, Jakob Verbeek. Nov 2016. | arXiv | Chih-Yao |
Mon, Apr 17 | NetVLAD: CNN architecture for weakly supervised place recognition. Relja Arandjelovic, Petr Gronat, Akihiko Torii, Tomas Pajdla, and Josef Sivic. CVPR 2016. | project page | Weijian |
Wed, Apr 19 | Image-to-Image Translation with Conditional Adversarial Nets. Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros. arXiv, Nov. 2016. | project page | Yifei |
Fri, Apr 21 | StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks. Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaolei Huang, Xiaogang Wang, Dimitris Metaxas. Dec 2016. | project page | Tianyu |
Mon, Apr 24 | Neural Architecture Search with Reinforcement Learning. Barret Zoph, Quoc V. Le. Nov 2016. | arXiv | Albert |
Monday, May 1, 8:00 to 10:50 (Final Exam Slot) | Final Project Presentations | | Everyone |
Friday, May 5 | Final Report or Poster due | | Everyone |
2017 new suggestions
Date | Paper | Paper, Project page | Presenter |
The Cityscapes Dataset for Semantic Urban Scene Understanding. Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele. CVPR 2016 | arXiv, project page | ||
PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization. Alex Kendall, Matthew Grimes, Roberto Cipolla. ICCV 2015. | arXiv | ||
Convolutional Pose Machines. Shih-En Wei, Varun Ramakrishna, Takeo Kanade, Yaser Sheikh. CVPR 2016. | arXiv, project page | ||
Newtonian Image Understanding: Unfolding the Dynamics of Objects in Static Images. Roozbeh Mottaghi, Hessam Bagherinezhad, Mohammad Rastegari, Ali Farhadi. CVPR 2016. | project page | ||
You Only Look Once: Unified, Real-Time Object Detection. Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi. CVPR 2016. | arXiv | ||
SSD: Single Shot MultiBox Detector. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. ECCV 2016. | arXiv | ||
Supersizing Self-supervision: Learning to Grasp from 50K Tries and 700 Robot Hours. Lerrel Pinto, Abhinav Gupta. ICRA 2016. | arXiv | ||
Learning with Side Information through Modality Hallucination. Saurabh Gupta, Judy Hoffman, Jitendra Malik. CVPR 2016. | arXiv |