CS 7476 Advanced Computer Vision
Spring 2018, MW 4:30 to 5:45, Mason 1133
Instructor: James Hays
TA: Cusuh Ham
Course Description
This course covers advanced research topics in computer vision. Building on the introductory material in CS 6476 (Computer Vision), this class will prepare graduate students in both the theoretical foundations of computer vision and the practical approaches to building real computer vision systems.
This course investigates current research topics in computer vision with an emphasis on recognition tasks and deep learning. We will examine data sources, features, and learning algorithms useful for understanding and manipulating visual data. Several topics will straddle the boundary between computer vision and computer graphics. Class topics will be pursued through independent reading, class discussion and presentations, and state-of-the-art projects.
The goal of this course is to give students the background and skills necessary to perform research in computer vision and its application domains such as robotics, healthcare, and graphics. Students should understand the strengths and weaknesses of current approaches to research problems and identify interesting open questions and future research directions. Students will hopefully also improve their critical reading and communication skills.
Course Requirements
Reading and Discussion Topics
Students will be expected to read one paper for each class.
For each assigned paper, students must identify at least one question or topic of interest for class discussion.
Interesting topics for discussion could relate to strengths and weaknesses of the paper, possible future directions,
connections to other research, uncertainty about the conclusions of the experiments, etc.
Questions and discussion topics must be posted to Piazza by 11:59pm the day before each class. Feel free to reply to other comments on Piazza and help each other understand confusing aspects of the papers.
The Piazza discussion will be the starting point for the class discussion. If you are presenting, you don't need to post a question to Piazza.
Class participation
All students are expected to take part in class discussions. If you do not fully understand a paper, that is OK.
We can work through the unclear aspects of a paper together in class.
If you are unable to attend a specific class, please let me know ahead of time (and have a good excuse!).
Presentation(s)
Each student will lead the presentation of one paper during the semester (possibly as part of a pair of students).
Ideally, students would implement some aspect of the presented material and perform experiments that help us to understand the algorithms.
Presentations and all supplemental material should be ready one week before the presentation date so that students can meet with the instructor,
go over the presentation, and possibly iterate before the in-class discussion. For the presentations it is fine to use slides and code from
outside sources (for example, the paper authors) but be sure to give credit.
Semester group projects
Students will work alone or in pairs to complete a state-of-the-art research project on a topic relevant to the course.
Students will propose a research topic early in the semester. After a project topic is finalized, students will meet occasionally with the
instructor or TA to discuss progress. Students will report their progress on their semester project twice during the course and the course
will end with final project presentations. Students will also produce a conference-formatted write-up of their project.
Projects will be published on this web page. The ideal project is something with a clear enough direction to be completed in a couple of months,
and enough novelty such that it could be published in a peer-reviewed venue with some refinement and extension.
Prerequisites
Strong mathematical skills (linear algebra, calculus, probability and statistics) are needed.
It is strongly recommended that students have taken one of the following courses (or equivalent courses at other institutions):
- Computer Vision (e.g. 4476 / 6476)
- Computer Graphics
- Computational Photography
If you aren't sure whether you have the background needed for the course,
you can try reading some of the papers below or you can simply come to class during the first weeks.
Textbook
We will not rely on a textbook, although the free, online textbook "Computer Vision: Algorithms and Applications" by Richard Szeliski is a helpful resource.
Grading
Your final grade will be made up of:
- 20% Reading summaries posted to Piazza
- 10% Classroom participation and attendance
- 10% Leading discussion of a particular research paper
- 20% Semester project updates
- 40% Semester project
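As a rough illustration of how these weights combine, here is a minimal sketch of a weighted-average calculation. The weights are copied from the list above; the 0 to 100 score scale, the component names, and the example scores are assumptions made for illustration and are not part of the course policy.

```python
# Component weights copied from the grading list (fractions of the final grade).
# The dictionary keys are hypothetical names chosen for this sketch.
WEIGHTS = {
    "reading_summaries": 0.20,
    "participation": 0.10,
    "leading_discussion": 0.10,
    "project_updates": 0.20,
    "project": 0.40,
}


def final_grade(scores):
    """Return the weighted average of per-component scores.

    Assumes every component is scored on a 0-100 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example_scores = {
    "reading_summaries": 90,
    "participation": 100,
    "leading_discussion": 80,
    "project_updates": 85,
    "project": 88,
}

print(round(final_grade(example_scores), 1))
```

Because the semester project and its updates together carry 60% of the weight, they dominate the result under this scheme.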
Office Hours:
James Hays, Monday and Wednesday, 10:00 to 11:00, CCB 315
Cusuh Ham, Wednesday, 3:00pm to 4:30pm, CCB third floor inner lab
Tentative Schedule
Date | Paper | Link (paper, project page) | Presenter |
Mon, Jan 8 | No Classes, Institute Closed by Weather | | James |
Wed, Jan 10 | Photo Clip Art. Jean-Francois Lalonde, Derek Hoiem, Alexei A. Efros, Carsten Rother, John Winn and Antonio Criminisi. ACM Transactions on Graphics (SIGGRAPH 2007). | project page | James |
Mon, Jan 15 | No Classes, Institute Holiday | | |
Wed, Jan 17 | No Classes, Institute Closed by Weather | | |
Mon, Jan 22 | Sketch2Photo: Internet Image Montage. ACM SIGGRAPH ASIA 2009, ACM Transactions on Graphics. Tao Chen, Ming-Ming Cheng, Ping Tan, Ariel Shamir, Shi-Min Hu. | project page | James |
Wed, Jan 24 | Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Alec Radford, Luke Metz, Soumith Chintala. November 2015. | arXiv | James |
Mon, Jan 29 | Image Style Transfer Using Convolutional Neural Networks. Leon A. Gatys, Alexander S. Ecker, Matthias Bethge. CVPR 2016. | paper | Vijay |
Wed, Jan 31 | Network Dissection: Quantifying Interpretability of Deep Visual Representations. David Bau*, Bolei Zhou*, Aditya Khosla, Aude Oliva, Antonio Torralba. CVPR 2017. | project page | Gukyeong |
Mon, Feb 5 | Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images. S. Song and J. Xiao. CVPR 2016. | project page | Rodrigo |
Wed, Feb 7 | PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas. arXiv Dec 2016. | arXiv | Erik |
Mon, Feb 12 | Frustum PointNets for 3D Object Detection from RGB-D Data. Charles R. Qi, Wei Liu, Chenxia Wu, Hao Su, Leonidas J. Guibas. arXiv Nov 2017. | arXiv | David |
Wed, Feb 14 | Explaining and Harnessing Adversarial Examples. Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy. ICLR 2015. | arXiv | Dilara |
Mon, Feb 19 | Adversarial Patch. Tom B. Brown, Dandelion Mane, Aurko Roy, Martin Abadi, Justin Gilmer. December 2017. | arXiv | Matthew |
Wed, Feb 21 | Learning Features by Watching Objects Move. Deepak Pathak, Ross Girshick, Piotr Dollar, Trevor Darrell, Bharath Hariharan. CVPR 2017. | project page | Samyak |
Mon, Feb 26 | Matching Networks for One Shot Learning. Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Koray Kavukcuoglu, Daan Wierstra. NIPS 2016. | arXiv | Chia-Wen |
Wed, Feb 28 | Learning from Simulated and Unsupervised Images through Adversarial Training. Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Josh Susskind, Wenda Wang, Russ Webb. CVPR 2017. | arXiv | Nathan |
Mon, Mar 5 | Finding Tiny Faces. Peiyun Hu, Deva Ramanan. CVPR 2017. | arXiv, project page | Sergio and Govin |
Wed, Mar 7 | Image-to-Image Translation with Conditional Adversarial Nets. Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros. arXiv, Nov. 2016. | project page | Rapha and Wenqi |
Mon, Mar 12 | Generative Adversarial Text-to-Image Synthesis. Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee. ICML 2016. | project page | Shijie and Jingjing |
Wed, Mar 14 | Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. Jun-Yan Zhu*, Taesung Park*, Phillip Isola, Alexei A. Efros. ICCV 2017. | project page | Ananya |
Mon, Mar 19 | No Classes, Institute Holiday | | |
Wed, Mar 21 | No Classes, Institute Holiday | | |
Mon, Mar 26 | Feature Pyramid Networks for Object Detection. Tsung-Yi Lin, Piotr Dollar, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie. CVPR 2017. | arXiv | Li Tong |
Wed, Mar 28 | Deformable Convolutional Networks. Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei. arXiv March 2017. | arXiv | Marc |
Mon, Apr 2 | Focal Loss for Dense Object Detection. Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollar. ICCV 2017. | arXiv | Haofeng |
Wed, Apr 4 | Non-local Neural Networks. Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He. November 2017. | arXiv | Chanho |
Mon, Apr 9 | Mask R-CNN. Kaiming He, Georgia Gkioxari, Piotr Dollar, Ross Girshick. ICCV 2017. | arXiv | Haoping and Jihai |
Wed, Apr 11 | Annotating Object Instances with a Polygon-RNN. Lluis Castrejon, Kaustav Kundu, Raquel Urtasun, Sanja Fidler. CVPR 2017. | project page | Rakshith and Nirbhay |
Mon, Apr 16 | Sequential Grouping Networks for Instance Segmentation. S. Liu, J. Jia, S. Fidler and R. Urtasun. ICCV 2017. | pdf | Sainandan |
Wed, Apr 18 | Panoptic Segmentation. Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, Piotr Dollar. arXiv January 2018. | arXiv | Ying |
Mon, Apr 23 | Inferring and Executing Programs for Visual Reasoning. Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick. ICCV 2017. | project page | Sean and Zhiyu |
Wed, Apr 25 | Detect to Track and Track to Detect. Christoph Feichtenhofer, Axel Pinz, Andrew Zisserman. ICCV 2017. | project page | Jason |
Final Exam Slot, Friday Apr 27, 2:50 to 5:40 | Final Project Poster Session | | Everyone |
Friday, May 5 | Final Report or Poster due | | Everyone |
Unused 2018 new paper suggestions
Date | Paper | Link (paper, project page) | Presenter |
| Scribbler: Controlling Deep Image Synthesis with Sketch and Color. Patsorn Sangkloy, Jingwan Lu, Chen Fang, Fisher Yu, James Hays. CVPR 2017. | project page | |
| Aggregated Residual Transformations for Deep Neural Networks. Saining Xie, Ross Girshick, Piotr Dollar, Zhuowen Tu, Kaiming He. CVPR 2017. | arXiv | |
| Be Your Own Prada: Fashion Synthesis with Structural Coherence. S. Zhu, C. Loy, D. Lin, R. Urtasun and S. Fidler. ICCV 2017. | pdf | |
| Light-Head R-CNN: In Defense of Two-Stage Object Detector. Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun. arXiv Nov 2017. | arXiv | |
| MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam. arXiv April 2017. | arXiv | |
| ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun. arXiv July 2017. | arXiv | |
| Photographic Image Synthesis with Cascaded Refinement Networks. Qifeng Chen and Vladlen Koltun. ICCV 2017. | project page | |
| Progressive Growing of GANs for Improved Quality, Stability, and Variation. Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen. arXiv October 2017. | arXiv | |
2017 papers (that aren't on 2018 schedule)
Date | Paper | Link (paper, project page) | Presenter |
| ImageNet: A Large-Scale Hierarchical Image Database. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei. IEEE Computer Vision and Pattern Recognition (CVPR), 2009. | pdf, project page | |
| ImageNet Classification with Deep Convolutional Neural Networks. Alex Krizhevsky, Ilya Sutskever, Geoffrey E Hinton. NIPS 2012. | pdf | |
| How do humans sketch objects? Mathias Eitz, James Hays, and Marc Alexa. SIGGRAPH 2012. | project page | |
| What makes Paris look like Paris? Carl Doersch, Saurabh Singh, Abhinav Gupta, Josef Sivic, and Alexei A. Efros. SIGGRAPH 2012. | project page | |
| Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. Nguyen A, Yosinski J, Clune J. CVPR 2015. | project page | |
| FaceNet: A Unified Embedding for Face Recognition and Clustering. Florian Schroff, Dmitry Kalenichenko, James Philbin. CVPR 2015. | arXiv | |
| The Sketchy Database: Learning to Retrieve Badly Drawn Bunnies. Patsorn Sangkloy, Nathan Burnell, Cusuh Ham, James Hays. SIGGRAPH 2016. | project page | |
| Unsupervised Learning of Visual Representations using Videos. Xiaolong Wang, Abhinav Gupta. ICCV 2015. | project page | |
| Deep Residual Learning for Image Recognition. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. MS COCO detection challenge winner 2015. | arXiv | |
| Visually Indicated Sounds. Andrew Owens, Phillip Isola, Josh McDermott, Antonio Torralba, Edward H. Adelson, William T. Freeman. CVPR 2016. | project page | |
| "What happens if..." Learning to Predict the Effect of Forces in Images. Roozbeh Mottaghi, Mohammad Rastegari, Abhinav Gupta, Ali Farhadi. ECCV 2016. | project page | |
| YOLO9000: Better, Faster, Stronger. Joseph Redmon, Ali Farhadi. arXiv, Dec. 2016. | arXiv | |
| NetVLAD: CNN architecture for weakly supervised place recognition. Relja Arandjelovic, Petr Gronat, Akihiko Torii, Tomas Pajdla, and Josef Sivic. CVPR 2016. | project page | |
Generative Networks |
| Generative Adversarial Nets. Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio. 2014. | arXiv | |
| Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Alec Radford, Luke Metz, Soumith Chintala. Nov 2015. | arXiv | |
| Context Encoders: Feature Learning by Inpainting. Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell and Alexei A. Efros. CVPR 2016. | project page | |
2017 unused paper suggestions
Date | Paper | Link (paper, project page) | Presenter |
| The Cityscapes Dataset for Semantic Urban Scene Understanding. Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele. CVPR 2016. | arXiv, project page | |
| PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization. Alex Kendall, Matthew Grimes, Roberto Cipolla. ICCV 2015. | arXiv | |
| Convolutional Pose Machines. Shih-En Wei, Varun Ramakrishna, Takeo Kanade, Yaser Sheikh. CVPR 2016. | arXiv, project page | |
| Newtonian Image Understanding: Unfolding the Dynamics of Objects in Static Images. Roozbeh Mottaghi, Hessam Bagherinezhad, Mohammad Rastegari, Ali Farhadi. CVPR 2016. | project page | |
Object Detection |
| You Only Look Once: Unified, Real-Time Object Detection. Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi. CVPR 2016. | arXiv | |
| SSD: Single Shot MultiBox Detector. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. ECCV 2016. | arXiv | |
Weakly Supervised Networks |
| Supersizing Self-supervision: Learning to Grasp from 50K Tries and 700 Robot Hours. Lerrel Pinto, Abhinav Gupta. ICRA 2016. | arXiv | |
Cross Domain Learning |
| Learning with Side Information through Modality Hallucination. Saurabh Gupta, Judy Hoffman, Jitendra Malik. CVPR 2016. | arXiv | |