CS 7476 Advanced Computer Vision

Spring 2016, MWF 12:05 to 12:55, Mason 2117.
Instructor: James Hays
TA: Nam Vo

Course Description

This course covers advanced research topics in computer vision. Building on the introductory materials in CS 6476 (Computer Vision), this class will prepare graduate students in both the theoretical foundations of computer vision and the practical approaches to building real computer vision systems. This course investigates current research topics in computer vision with an emphasis on recognition tasks and deep learning. We will examine data sources, features, and learning algorithms useful for understanding and manipulating visual data. Several topics will straddle the boundary between computer vision and computer graphics. Class topics will be pursued through independent reading, class discussion and presentations, and state-of-the-art projects.

The goal of this course is to give students the background and skills necessary to perform research in computer vision and its application domains such as robotics, healthcare, and graphics. Students should understand the strengths and weaknesses of current approaches to research problems and identify interesting open questions and future research directions. Students will hopefully improve their critical reading and communication skills, as well.

Course Requirements

Reading and Summaries

Students will be expected to read one paper for each class. For each assigned paper, students must write a two or three sentence summary and identify at least one question or topic of interest for class discussion. Interesting topics for discussion could relate to strengths and weaknesses of the paper, possible future directions, connections to other research, uncertainty about the conclusions of the experiments, etc. Reading summaries must be posted to the class blog http://cs7476.blogspot.com/ by 11:59pm the day before each class. Feel free to reply to other comments on the blog and help each other understand confusing aspects of the papers. The blog discussion will be the starting point for the class discussion. If you are presenting, you don't need to post a summary to the blog.

Class participation

All students are expected to take part in class discussions. If you do not fully understand a paper, that is OK. We can work through the unclear aspects of a paper together in class. If you are unable to attend a specific class, please let me know ahead of time (and have a good excuse!).


Presentations

Each student will lead the presentation of one paper during the semester. Ideally, students would implement some aspect of the presented material and perform experiments that help us to understand the algorithms. Presentations and all supplemental material should be ready one week before the presentation date so that students can meet with the instructor, go over the presentation, and possibly iterate before the in-class discussion. For the presentations it is fine to use slides and code from outside sources (for example, the paper authors) but be sure to give credit.

Semester group projects

Students will work in pairs to complete a state-of-the-art research project on a topic relevant to the course. Students will propose a research topic early in the semester. After a project topic is finalized, students will meet occasionally with the instructor to discuss progress. Students will present their progress on their semester project twice during the course and the course will end with final project presentations. Students will also produce a conference-formatted write-up of their project. Projects will be published on this web page. The ideal project is something with a clear enough direction to be completed in a couple of months, and enough novelty such that it could be published in a peer-reviewed venue with some refinement and extension.


Prerequisites

Strong mathematical skills (linear algebra, calculus, probability and statistics) are needed. It is strongly recommended that students have taken one of the following courses (or equivalent courses at other institutions):

If you aren't sure whether you have the background needed for the course, you can try reading some of the papers below or you can simply come to class during the first weeks.


Textbook

We will not rely on a textbook, although the free, online textbook "Computer Vision: Algorithms and Applications" by Richard Szeliski is a helpful resource.


Grading

Your final grade will be made up from

Office Hours:

James Hays, Monday and Wednesday 1-2pm, CCB 315
Nam Vo, Friday 2-4pm, CCB 308L

Final project writeups for 2016

Highlighted project:
Avery Allen and Wenchen Li, Generative Adversarial Denoising Autoencoder for Face Completion. [webpage]

All projects:
Cusuh Ham, Sketch-Based Image Synthesis. [webpage]
John Turner and Siddharth Raja, O'FaMACap dataset (Obama Face&Mouth Image/Audio/Caption) and LSTM-based lipreader. [webpage]
Carl Saldanha, Visual Question Generation. [webpage]
Varun Agrawal and Palash Shastri, Deep Learning on the Yelp Image Dataset. [webpage]
Vasavi Gajarla and Aditi Gupta, Emotion Detection and Sentiment Analysis of Images. [pdf]
Avinash Bhaskaran and Anusha Sridhar Rao, Structure from Motion using Uncalibrated Cameras. [pdf]
Huda Alamri and Julia Deeb, Diving Deeper into IM2GPS. [pdf]
Jonathan Suit, Generating Facial Expressions. [pdf]
Punarva Katte and Prabhudev Prakash, Billboard Content Recognition for Driver Assistance Systems. [pdf]
Sam Seifert, Autocomplete Sketch Tool. [pdf]
Shantanu Deshpande and Naman Goyal, Sketch Based Image Retrieval. [pdf]
Stefano Fenu and Carden Bagwell, Image Colorization using Residual Networks. [pdf]

Tentative Schedule

Date Paper Paper, Project page Presenter
Mon, Jan 11 Introduction James
Wed, Jan 13 Scene Completion Using Millions of Photographs. James Hays, Alexei A. Efros. ACM Transactions on Graphics (SIGGRAPH 2007). August 2007, vol. 26, No. 3. project page James
Fri, Jan 15 Microsoft COCO: Common Objects in Context. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C. Lawrence Zitnick. ECCV 2014. project page, paper James
Mon, Jan 18 No Classes
Wed, Jan 20 Photo Clip Art. Jean-Francois Lalonde, Derek Hoiem, Alexei A. Efros, Carsten Rother, John Winn and Antonio Criminisi. ACM Transactions on Graphics (SIGGRAPH 2007). project page James
Fri, Jan 22 Snow Day
Mon, Jan 25 Learning to predict where humans look. T. Judd, K. Ehinger, F. Durand, and A. Torralba. IEEE International Conference on Computer Vision (ICCV), 2009. project page James
Wed, Jan 27 CVPR 2014 Tutorial on Deep Learning. Graham Taylor, Marc'Aurelio Ranzato, and Honglak Lee. Read only the first two sets of slides, labeled Introduction and Supervised learning. CVPR 2014 tutorial James
Fri, Jan 29 ImageNet Classification with Deep Convolutional Neural Networks. Alex Krizhevsky, Ilya Sutskever, Geoffrey E Hinton. NIPS 2012. pdf James
Mon, Feb 1 ImageNet: A Large-Scale Hierarchical Image Database. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei. IEEE Computer Vision and Pattern Recognition (CVPR), 2009 pdf, project page Carl
Wed, Feb 3 Sketch2Photo: Internet Image Montage. ACM SIGGRAPH ASIA 2009, ACM Transactions on Graphics. Tao Chen, Ming-Ming Cheng, Ping Tan, Ariel Shamir, Shi-Min Hu. project page James
Fri, Feb 5 How do humans sketch objects? Mathias Eitz, James Hays, and Marc Alexa. Siggraph 2012. project page Sam
Mon, Feb 8 Project Status Updates. Everyone
Wed, Feb 10 Project Status Updates. Everyone
Fri, Feb 12 What makes Paris look like Paris? Carl Doersch, Saurabh Singh, Abhinav Gupta, Josef Sivic, and Alexei A. Efros. Siggraph 2012. project page Julia
Learned Representations, ConvNets, Visualizations
Mon, Feb 15 Going Deeper with Convolutions. Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. 2014. arXiv James
Wed, Feb 17 PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing. Connelly Barnes, Eli Shechtman, Adam Finkelstein, and Dan B Goldman. Siggraph 2009. (rescheduled -- not a deep learning paper) project page Anusha
Fri, Feb 19 Deep Neural Decision Forests. Peter Kontschieder, Madalina Fiterau, Antonio Criminisi, and Samuel Rota Bulo. ICCV 2015. Project page Varun
Mon, Feb 22 Learning Deep Features for Scene Recognition using Places Database. B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. NIPS 2014. project page, pdf, demo John T
Wed, Feb 24 Object Detectors Emerge in Deep Scene CNNs. Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba. ICLR, 2015. project page, arXiv James
Fri, Feb 26 Understanding Deep Image Representations by Inverting Them. Aravindh Mahendran, Andrea Vedaldi. CVPR 2015. arXiv James
Object Proposals
Mon, Feb 29 Selective Search for Object Recognition. J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, A. W. M. Smeulders. IJCV 2013. project page Avinash
Wed, Mar 2 DeepBox: Learning Objectness with Convolutional Networks. Weicheng Kuo, Bharath Hariharan, Jitendra Malik. ICCV 2015. arXiv Punarva
ConvNet detection and segmentation
Fri, Mar 4 Diagnosing error in object detectors. Derek Hoiem, Yodsawalai Chodpathumwan, and Qieyun Dai. ECCV 2012. project page Aditi
Mon, Mar 7 Fast R-CNN. Ross Girshick. ICCV 2015. arXiv, code Wenchen
Wed, Mar 9 Fully Convolutional Networks for Semantic Segmentation. Jonathan Long, Evan Shelhamer, Trevor Darrell. CVPR 2015. arXiv Siddharth
Fri, Mar 11 Deep Residual Learning for Image Recognition. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. MS COCO detection challenge winner 2015. arXiv Prateek
Weakly Supervised ConvNets
Mon, Mar 14 Unsupervised Visual Representation Learning by Context Prediction. Carl Doersch, Abhinav Gupta, Alexei A. Efros. ICCV 2015. project page Stefano
Siamese / Ranking / Triplet ConvNets
Wed, Mar 16 Learning Visual Similarity for Product Design with Convolutional Neural Networks. Sean Bell, Kavita Bala. Siggraph 2015. author page, pdf Palash
Data-driven Image Synthesis
Fri, Mar 18 AverageExplorer: Interactive Exploration and Alignment of Visual Data Collections. Jun-Yan Zhu, Yong Jae Lee, Alexei Efros. Siggraph 2014. project page Avery
Mon, Mar 21 No Classes
Wed, Mar 23 No Classes
Fri, Mar 25 No Classes
Mon, Mar 28 Project Status Updates. Everyone
Wed, Mar 30 Project Status Updates. Everyone
Images and Words
Fri, Apr 1 Exploring Nearest Neighbor Approaches for Image Captioning. Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, C Lawrence Zitnick. arXiv, 2015. arXiv Naman
Mon, Apr 4 VQA: Visual Question Answering. S. Antol*, A. Agrawal*, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh. ICCV, 2015. project page, arXiv Vasavi
Wed, Apr 6 Visual Madlibs: Fill in the blank Description Generation and Question Answering. Licheng Yu, Eunbyung Park, Alexander C. Berg, Tamara L. Berg. ICCV, 2015. project page, pdf Huda
Texture, Image Statistics, correspondence
Fri, Apr 8 Dense Semantic Correspondence Where Every Pixel is a Classifier. Hilton Bristow, Jack Valmadre, Simon Lucey. ICCV 2015. arXiv Carden
Generative ConvNets
Mon, Apr 11 Learning to Generate Chairs, Tables and Cars with Convolutional Networks. Alexey Dosovitskiy, Jost Tobias Springenberg, Maxim Tatarchenko, Thomas Brox. CVPR 2015. arXiv Shantanu
Wed, Apr 13 A Neural Algorithm of Artistic Style. Leon A. Gatys, Alexander S. Ecker, Matthias Bethge. 2015. implementation, arXiv Cusuh
Fri, Apr 15 Class cancelled - work on projects!
Mon, Apr 18 Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Alec Radford, Luke Metz, Soumith Chintala. 2015. project page, arXiv Jonathan S
Wed, Apr 20 LSDA: Large Scale Detection Through Adaptation. Judy Hoffman, Sergio Guadarrama, Eric Tzeng, Ronghang Hu, Jeff Donahue, Ross Girshick, Trevor Darrell, Kate Saenko. 2014. arXiv Varun
Fri, Apr 22 The Sketchy Database: Learning to Retrieve Badly Drawn Bunnies. Patsorn Sangkloy, Nathan Burnell, Cusuh Ham, James Hays. Siggraph 2016 paper
Fri, Apr 29th (8:00 to 10:50 Exam slot) Final Project Presentations Everyone

Suggested Topics

Date Paper Paper, Project page Presenter
? Aggregating local descriptors into a compact image representation (VLAD). H. Jegou, M. Douze, C. Schmid, and P. Perez. In Proc. CVPR, 2010. pdf ?
Crowdsourcing and Human Computation
? The Multidimensional Wisdom of Crowds. Welinder P., Branson S., Belongie S., Perona, P. NIPS 2010. pdf ?
? Micro Perceptual Human Computation for Visual Tasks. Yotam Gingold, Ariel Shamir, Daniel Cohen-Or. ACM Transactions on Graphics (ToG) 2012 project page ?
? Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. R. Girshick, J. Donahue, T. Darrell, J. Malik. CVPR 2014. arXiv ?
Siamese / Ranking / Triplet ConvNets
? Learning Deep Representations for Ground-to-Aerial Geolocalization. Tsung-Yi Lin, Yin Cui, Serge Belongie, James Hays. CVPR 2015. pdf ?
? Joint Embeddings of Shapes and Images via CNN Image Purification. Yangyan Li, Hao Su, Charles Ruizhongtai Qi, Noa Fish, Daniel Cohen-Or, Leonidas Guibas. Siggraph Asia 2015. project page ?
Texture, Image Statistics, correspondence
? Internal Statistics of a Single Natural Image. Maria Zontak and Michal Irani. CVPR 2011. pdf, project page ?
? FlowWeb: Joint Image Set Alignment by Weaving Consistent, Pixel-wise Correspondences. Tinghui Zhou, Yong Jae Lee, Stella X. Yu, Alexei A. Efros. CVPR 2015. project page ?
Data-driven Image Synthesis
? PatchNet: A Patch-based Image Representation for Interactive Library-driven Image Editing. Shi-Min Hu, Fang-Lue Zhang, Miao Wang, Ralph R. Martin, Jue Wang. Siggraph Asia 2013. project page ?
? ImageSpirit: Verbal Guided Image Parsing. Ming-Ming Cheng, Shuai Zheng, Wen-Yan Lin, Vibhav Vineet, Paul Sturgess, Nigel Crook, Niloy Mitra, Philip Torr. Transactions on Graphics, 2014. project page ?
Attribute-based Representations
? The SUN Attribute Database: Beyond Categories for Deeper Scene Understanding. Genevieve Patterson, Chen Xu, Hang Su, James Hays. IJCV 2014. project page ?
? Transient Attributes for High-Level Understanding and Editing of Outdoor Scenes. Pierre-Yves Laffont, Zhile Ren, Xiaofeng Tao, Chao Qian, James Hays. Siggraph 2014. project page ?
? Discovering States and Transformations in Image Collections. Phillip Isola, Joseph J. Lim, Edward H. Adelson. CVPR 2015. project page ?
? Discovering the Spatial Extent of Relative Attributes. Fanyi Xiao, Yong Jae Lee. ICCV 2015. pdf ?
? Learning Visual Biases from Human Imagination. Carl Vondrick, Hamed Pirsiavash, Aude Oliva, Antonio Torralba. NIPS 2015. project page ?
? Learning a Discriminative Model for the Perception of Realism in Composite Images. Jun-Yan Zhu, Philipp Krahenbuhl, Eli Shechtman, Alexei A. Efros. ICCV 2015. project page ?
? Sketch-Based 3D Shape Retrieval Using Convolutional Neural Networks. Fang Wang, Le Kang, Yi Li. CVPR 2015. arXiv ?
? Multi-view Convolutional Neural Networks for 3D Shape Recognition. Hang Su, Subhransu Maji, Evangelos Kalogerakis, Erik Learned-Miller. ICCV 2015. project page ?
? Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks. Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus. 2015. project page, arXiv ?

Previous topics (which you should know)

Date Paper Paper, Project page Presenter
Fundamental representations
? Object recognition from local scale-invariant features, David Lowe, ICCV 1999. pdf, project page ?
? Video Google: A Text Retrieval Approach to Object Matching in Videos. Sivic, J. and Zisserman, A. Proceedings of the International Conference on Computer Vision (2003) pdf, project page ?
? Histograms of Oriented Gradients for Human Detection. Navneet Dalal and Bill Triggs. In Proceedings of IEEE Conference Computer Vision and Pattern Recognition, 2005. pdf ?
? Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. S. Lazebnik, C. Schmid, and J. Ponce, CVPR 2006. pdf, slides ?
? LabelMe: a Database and Web-based Tool for Image Annotation. B. C. Russell, A. Torralba, K. P. Murphy, W. T. Freeman. International Journal of Computer Vision, 2008. pdf, project page ?
? 80 million tiny images: a large dataset for non-parametric object and scene recognition. A. Torralba, R. Fergus, W. T. Freeman. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30(11), 2008. pdf, project page ?
? Describing Objects by Their Attributes. A. Farhadi, I. Endres, D. Hoiem, and D.A. Forsyth. CVPR 2009 project page ?
? SUN Database: Exploring a Large Collection of Scene Categories J. Xiao, K. Ehinger, J. Hays, A. Oliva, and A. Torralba. IJCV 2014. project page, pdf ?

Other previous topics

Date Paper Paper, Project page Presenter
? Painting-to-3D Model Alignment Via Discriminative Visual Elements. Mathieu Aubry, Bryan Russell, Josef Sivic. ToG 2013. project page ?
? DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, Trevor Darrell. 2013. arXiv ?
? Visualizing and Understanding Convolutional Networks. Matthew D Zeiler, Rob Fergus. ECCV 2014. pdf ?
? Image Melding: combining inconsistent images using patch-based synthesis. Soheil Darabi, Eli Shechtman, Connelly Barnes, Dan B Goldman, Pradeep Sen. Siggraph 2012. project page ?
? Ground-truth dataset and baseline evaluations for intrinsic image algorithms. R. Grosse, M.K. Johnson, E.H. Adelson and W.T. Freeman. ICCV 2009 project page ?
? Intrinsic Images in the Wild. Sean Bell, Kavita Bala, Noah Snavely. Siggraph 2014. project page ?
? First Person Hyperlapse Videos. Johannes Kopf, Michael Cohen, Richard Szeliski. Siggraph 2014. project page ?
? Depixelizing Pixel Art. Johannes Kopf and Dani Lischinski. Siggraph 2011. project page ?
? Photo tourism: Exploring photo collections in 3D. Noah Snavely, Steven M. Seitz, Richard Szeliski. Siggraph 2006. pdf, project page ?