2023
Habitat-Matterport 3D Semantics Dataset.
Karmesh Yadav, Ram Ramrakhya, Santhosh Kumar Ramakrishnan, Theo Gervet, John Turner, Aaron Gokaslan, Noah Maestre, Angel Xuan Chang, Dhruv Batra, Manolis Savva, Alexander William Clegg, Devendra Singh Chaplot.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
Highlight paper
PIRLNav: Pretraining with Imitation and RL Finetuning for ObjectNav.
Ram Ramrakhya, Dhruv Batra, Erik Wijmans, Abhishek Das.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second.
Vincent-Pierre Berges, Andrew Szot, Devendra Singh Chaplot, Aaron Gokaslan, Roozbeh Mottaghi, Dhruv Batra, Eric Undersander.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
Emergence of Maps in the Memories of Blind Navigation Agents.
Erik Wijmans, Manolis Savva, Irfan Essa, Stefan Lee, Ari S. Morcos, Dhruv Batra.
International Conference on Learning Representations (ICLR), 2023.
Outstanding Paper Award
BC-IRL: Learning Generalizable Reward Functions from Demonstrations.
Andrew Szot, Amy Zhang, Dhruv Batra, Zsolt Kira, Franziska Meier.
International Conference on Learning Representations (ICLR), 2023.
Notable top-25% paper
ViNL: Visual Navigation and Locomotion Over Obstacles.
Simar Kareer*, Naoki Yokoyama*, Dhruv Batra, Sehoon Ha, Joanne Truong.
IEEE International Conference on Robotics and Automation (ICRA), 2023.
Best Paper Award at Learning for Agile Robotics Workshop at CoRL, 2022.
Simple and Effective Synthesis of Indoor 3D Scenes.
Jing Yu Koh, Harsh Agrawal, Dhruv Batra, Richard Tucker, Austin Waters, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson.
AAAI Conference on Artificial Intelligence (AAAI), 2023.
2022
ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings.
Arjun Majumdar, Gunjan Aggarwal, Bhavika Devnani, Judy Hoffman, Dhruv Batra.
Neural Information Processing Systems (NeurIPS), 2022.
Variable Experience Rollout: Training Robust Skill Policies for Rearrangement.
Erik Wijmans, Irfan Essa, Dhruv Batra.
Neural Information Processing Systems (NeurIPS), 2022.
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning.
Changan Chen, Carl Schissler, Sanchit Garg, Philip Kobernik, Alexander Clegg, Paul Calamia, Dhruv Batra, Philip W Robinson, Kristen Grauman.
Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks, 2022.
Sim2Real: Lower Fidelity Simulation Leads to Higher Sim2Real Transfer in Navigation.
Joanne Truong, Max Rudolph, Naoki Yokoyama, Sonia Chernova, Dhruv Batra, Akshara Rai.
Conference on Robot Learning (CoRL), 2022.
Cross-Domain Transfer via Semantic Skill Imitation.
Karl Pertsch, Ruta Desai, Vikash Kumar, Franziska Meier, Joseph J Lim, Dhruv Batra, Akshara Rai.
Conference on Robot Learning (CoRL), 2022.
Housekeep: Tidying Virtual Households using Commonsense Reasoning.
Yash Kant, Arun Ramachandran, Sriram Yenamandra, Igor Gilitschenski, Dhruv Batra, Andrew Szot, Harsh Agrawal.
European Conference on Computer Vision (ECCV), 2022.
Memory-Augmented Reinforcement Learning for Image-Goal Navigation.
Lina Mezghani, Sainbayar Sukhbaatar, Thibaut Lavril, Oleksandr Maksymets, Dhruv Batra, Piotr Bojanowski, Karteek Alahari.
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022.
Benchmarking Augmentation Methods for Learning Robust Navigation Agents: the Winning Entry of the 2021 iGibson Challenge.
Naoki Yokoyama, Qian Luo, Dhruv Batra, Sehoon Ha.
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022.
Episodic Memory Question Answering.
Samyak Datta, Sameer Dharur, Vincent Cartillier, Ruta Desai, Mukul Khanna, Dhruv Batra, Devi Parikh.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
Oral Presentation
Around the World in 3,000 Hours of Egocentric Video.
Kristen Grauman et al.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
Best Paper Award Finalist
Learning Embodied Object-Search Strategies from Human Demonstrations at Scale.
Ram Ramrakhya, Eric Undersander, Dhruv Batra, Abhishek Das.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
Is Mapping Necessary for Realistic PointGoal Navigation?
Ruslan Partsey, Erik Wijmans, Naoki Yokoyama, Oles Dobosevych, Dhruv Batra, Oleksandr Maksymets.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
How to Train PointGoal Navigation Agents on a (Sample and Compute) Budget.
Erik Wijmans, Irfan Essa, Dhruv Batra.
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2022.
2021
Habitat 2.0: Training Home Assistants to Rearrange their Habitat.
Andrew Szot, Alex Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Chaplot, Oleksandr Maksymets, Aaron Gokaslan, Vladimir Vondrus, Sameer Dharur, Franziska Meier, Wojciech Galuba, Angel Chang, Zsolt Kira, Vladlen Koltun, Jitendra Malik, Manolis Savva, Dhruv Batra.
Neural Information Processing Systems (NeurIPS), 2021.
Spotlight paper
Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI.
Santhosh K. Ramakrishnan, Aaron Gokaslan, Erik Wijmans, Oleksandr Maksymets, Alex Clegg, John Turner, Eric Undersander, Wojciech Galuba, Andrew Westbury, Angel X. Chang, Manolis Savva, Yili Zhao, Dhruv Batra.
Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks, 2021.
[ https://aihabitat.org/datasets/hm3d/ ]
SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation.
Abhinav Moudgil, Arjun Majumdar, Harsh Agrawal, Stefan Lee, Dhruv Batra.
Neural Information Processing Systems (NeurIPS), 2021.
Model-Advantage Optimization for Model-Based Reinforcement Learning.
Nirbhay Modhe, Harish Kamath, Dhruv Batra, Ashwin Kalyan.
arXiv:2106.14080, 2021.
THDA: Treasure Hunt Data Augmentation for Semantic Navigation.
Oleksandr Maksymets, Vincent Cartillier, Aaron Gokaslan, Erik Wijmans, Stefan Lee, Wojciech Galuba, Dhruv Batra.
International Conference on Computer Vision (ICCV), 2021.
Contrast and Classify: Training Robust VQA Models.
Yash Kant, Abhinav Moudgil, Dhruv Batra, Devi Parikh, Harsh Agrawal.
International Conference on Computer Vision (ICCV), 2021.
Auxiliary Tasks and Exploration Enable ObjectNav.
Joel Ye, Dhruv Batra, Abhishek Das, Erik Wijmans.
International Conference on Computer Vision (ICCV), 2021.
Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation.
Xiaoming Zhao, Harsh Agrawal, Dhruv Batra, Alexander Schwing.
International Conference on Computer Vision (ICCV), 2021.
Waypoint Models for Instruction-guided Navigation in Continuous Environments.
Jacob Krantz, Aaron Gokaslan, Dhruv Batra, Stefan Lee, Oleksandr Maksymets.
International Conference on Computer Vision (ICCV), 2021.
Oral Presentation
Large Batch Simulation for Deep Reinforcement Learning.
Brennan Shacklett, Erik Wijmans, Aleksei Petrenko, Manolis Savva, Dhruv Batra, Vladlen Koltun, Kayvon Fatahalian.
International Conference on Learning Representations (ICLR), 2021.
Success Weighted by Completion Time: A Dynamics-Aware Evaluation Criteria for Embodied Navigation.
Naoki Yokoyama, Sehoon Ha, Dhruv Batra.
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021.
Learning Navigation Skills for Legged Robots with Learned Robot Embeddings.
Joanne Truong, Denis Yarats, Tianyu Li, Franziska Meier, Sonia Chernova, Dhruv Batra, Akshara Rai.
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021.
Bi-directional Domain Adaptation for Sim2Real Transfer of Embodied Navigation Agents.
Joanne Truong, Sonia Chernova, Dhruv Batra.
IEEE International Conference on Robotics and Automation (ICRA), 2021.
IEEE Robotics and Automation Letters (RA-L), 2021.
SOrT-ing VQA Models: Contrastive Gradient Learning for Improved Consistency.
Sameer Dharur, Purva Tendulkar, Dhruv Batra, Devi Parikh, Ramprasaath R. Selvaraju.
Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2021.
Semantic MapNet: Building Allocentric Semantic Maps and Representations from Egocentric Views.
Vincent Cartillier, Zhile Ren, Neha Jain, Stefan Lee, Irfan Essa, Dhruv Batra.
AAAI Conference on Artificial Intelligence (AAAI), 2021.
2020
Rearrangement: A Challenge for Embodied AI.
Dhruv Batra, Angel X. Chang, Sonia Chernova, Andrew J. Davison, Jia Deng, Vladlen Koltun, Sergey Levine, Jitendra Malik, Igor Mordatch, Roozbeh Mottaghi, Manolis Savva, Hao Su.
arXiv:2011.01975, 2020.
Sim-to-Real Transfer for Vision-and-Language Navigation.
Peter Anderson, Ayush Shrivastava, Joanne Truong, Arjun Majumdar, Devi Parikh, Dhruv Batra, Stefan Lee.
Conference on Robot Learning (CoRL), 2020.
Auxiliary Tasks Speed Up Learning PointGoal Navigation.
Joel Ye, Dhruv Batra, Erik Wijmans, Abhishek Das.
Conference on Robot Learning (CoRL), 2020.
Integrating Egocentric Localization for More Realistic Point-Goal Navigation Agents.
Samyak Datta, Oleksandr Maksymets, Judy Hoffman, Stefan Lee, Dhruv Batra, Devi Parikh.
Conference on Robot Learning (CoRL), 2020.
Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data.
Michael Cogswell, Jiasen Lu, Rishabh Jain, Stefan Lee, Devi Parikh, Dhruv Batra.
Neural Information Processing Systems (NeurIPS), 2020.
Where Are You? Localization from Embodied Dialog.
Meera Hahn, Jacob Krantz, Dhruv Batra, Devi Parikh, James Rehg, Stefan Lee, Peter Anderson.
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects.
Dhruv Batra, Aaron Gokaslan, Aniruddha Kembhavi, Oleksandr Maksymets, Roozbeh Mottaghi, Manolis Savva, Alexander Toshev, Erik Wijmans.
arXiv:2006.13171, 2020.
Sim2Real Predictivity: Does Evaluation in Simulation Predict Real-World Performance?
Abhishek Kadian*, Joanne Truong*, Aaron Gokaslan, Alexander Clegg, Erik Wijmans, Stefan Lee, Manolis Savva, Sonia Chernova, Dhruv Batra.
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020.
IEEE Robotics and Automation Letters (RA-L), 2020.
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web.
Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, Dhruv Batra.
European Conference on Computer Vision (ECCV), 2020.
Spotlight
Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments.
Jacob Krantz, Erik Wijmans, Arjun Majumdar, Dhruv Batra, Stefan Lee.
European Conference on Computer Vision (ECCV), 2020.
Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation.
Medhini Narasimhan, Erik Wijmans, Xinlei Chen, Trevor Darrell, Dhruv Batra, Devi Parikh, Amanpreet Singh.
European Conference on Computer Vision (ECCV), 2020.
Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline.
Vishvak Murahari, Dhruv Batra, Devi Parikh, Abhishek Das.
European Conference on Computer Vision (ECCV), 2020.
Spatially Aware Multimodal Transformers for TextVQA.
Yash Kant, Dhruv Batra, Peter Anderson, Alex Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal.
European Conference on Computer Vision (ECCV), 2020.
DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames.
Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra.
International Conference on Learning Representations (ICLR), 2020.
Analyzing Visual Representations in Embodied Navigation Tasks.
Erik Wijmans, Julian Straub, Dhruv Batra, Irfan Essa, Judy Hoffman, Ari Morcos.
arXiv:2003.05993, 2020.
IR-VIC: Unsupervised Discovery of Sub-goals for Transfer in RL.
Nirbhay Modhe, Prithvijit Chattopadhyay, Mohit Sharma, Abhishek Das, Devi Parikh, Dhruv Batra, Ramakrishna Vedantam.
International Joint Conference on Artificial Intelligence (IJCAI), 2020.
Embodied Multimodal Multitask Learning.
Devendra Singh Chaplot, Lisa Lee, Ruslan Salakhutdinov, Devi Parikh, Dhruv Batra.
International Joint Conference on Artificial Intelligence (IJCAI), 2020.
2019
The Replica Dataset: A Digital Replica of Indoor Spaces.
Julian Straub et al.
arXiv:1906.05797, 2019.
Emergence of Compositional Language with Deep Generational Transmission.
Michael Cogswell, Jiasen Lu, Stefan Lee, Devi Parikh, Dhruv Batra.
arXiv:1907.02022, 2019.
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks.
Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee.
Neural Information Processing Systems (NeurIPS), 2019.
Chasing Ghosts: Instruction Following as Bayesian State Tracking.
Peter Anderson*, Ayush Shrivastava*, Devi Parikh, Dhruv Batra, Stefan Lee.
Neural Information Processing Systems (NeurIPS), 2019.
Habitat: A Platform for Embodied AI Research.
Manolis Savva*, Abhishek Kadian*, Oleksandr Maksymets*, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, Dhruv Batra.
International Conference on Computer Vision (ICCV), 2019.
[ https://aihabitat.org/ ]
Best Paper Award Nominee
nocaps: novel object captioning at scale.
Harsh Agrawal*, Karan Desai*, Yufei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson.
International Conference on Computer Vision (ICCV), 2019.
[ https://nocaps.org ]
SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation.
Daniel Gordon, Abhishek Kadian, Devi Parikh, Judy Hoffman, Dhruv Batra.
International Conference on Computer Vision (ICCV), 2019.
Embodied Visual Recognition.
Jianwei Yang*, Zhile Ren*, Mingze Xu, Xinlei Chen, David Crandall, Devi Parikh, Dhruv Batra.
International Conference on Computer Vision (ICCV), 2019.
Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded.
Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Dhruv Batra, Devi Parikh.
International Conference on Computer Vision (ICCV), 2019.
Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning.
Jyoti Aneja*, Harsh Agrawal*, Dhruv Batra, Alexander Schwing.
International Conference on Computer Vision (ICCV), 2019.
Improving Generative Visual Dialog by Answering Diverse Questions.
Vishvak Murahari, Prithvijit Chattopadhyay, Dhruv Batra, Devi Parikh, Abhishek Das.
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
Trainable Decoding of Sets of Sequences for Neural Sequence Models.
Ashwin Kalyan, Peter Anderson, Stefan Lee, Dhruv Batra.
International Conference on Machine Learning (ICML), 2019.
Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering.
Ramakrishna Vedantam, Karan Desai, Stefan Lee, Marcus Rohrbach, Dhruv Batra, Devi Parikh.
International Conference on Machine Learning (ICML), 2019.
TarMAC: Targeted Multi-Agent Communication.
Abhishek Das, Théophile Gervet, Joshua Romoff, Dhruv Batra, Devi Parikh, Michael Rabbat, Joelle Pineau.
International Conference on Machine Learning (ICML), 2019.
Counterfactual Visual Explanations.
Yash Goyal, Ziyan Wu, Jan Ernst, Dhruv Batra, Devi Parikh, Stefan Lee.
International Conference on Machine Learning (ICML), 2019.
CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven Communication.
Jin-Hwa Kim*, Nikita Kitaev*, Xinlei Chen, Marcus Rohrbach, Yuandong Tian, Dhruv Batra, Devi Parikh.
Annual Meeting of the Association for Computational Linguistics (ACL), 2019.
Embodied Question Answering in Photorealistic Environments with Point Cloud Perception.
Erik Wijmans, Samyak Datta, Oleksandr Maksymets, Abhishek Das, Georgia Gkioxari, Stefan Lee, Irfan Essa, Devi Parikh, Dhruv Batra.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[ http://embodiedqa.org ]
Oral Presentation
Multi-target Embodied Question Answering.
Licheng Yu, Xinlei Chen, Georgia Gkioxari, Mohit Bansal, Tamara L. Berg, Dhruv Batra.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[ http://embodiedqa.org ]
Towards VQA Models That Can Read.
Amanpreet Singh, Vivek Natarajan, Meet Shah, Yu Jiang, Xinlei Chen, Dhruv Batra, Devi Parikh, Marcus Rohrbach.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[ https://textvqa.org/ ]
Audio-Visual Scene-Aware Dialog.
Huda Alamri, Vincent Cartillier, Abhishek Das, Jue Wang, Stefan Lee, Peter Anderson, Irfan Essa, Devi Parikh, Dhruv Batra, Anoop Cherian, Tim K. Marks, Chiori Hori.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[ https://video-dialog.com// ]
CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog.
Satwik Kottur, José M. F. Moura, Devi Parikh, Dhruv Batra, Marcus Rohrbach.
Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2019.
Learning Dynamics Model in Reinforcement Learning by Incorporating the Long Term Future.
Nan Rosemary Ke, Amanpreet Singh, Ahmed Touati, Anirudh Goyal, Yoshua Bengio, Devi Parikh, Dhruv Batra.
International Conference on Learning Representations (ICLR), 2019.
2018
Pythia v0.1: The Winning Entry to the VQA Challenge 2018.
Yu Jiang, Vivek Natarajan, Xinlei Chen, Marcus Rohrbach, Dhruv Batra, Devi Parikh.
arXiv:1807.09956, 2018.
Talk the Walk: Navigating New York City through Grounded Dialogue.
Harm de Vries, Kurt Shuster, Dhruv Batra, Devi Parikh, Jason Weston, Douwe Kiela.
arXiv:1807.03367, 2018.
Neural Modular Control for Embodied Question Answering.
Abhishek Das, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra.
Conference on Robot Learning (CoRL), 2018.
Learning to Ask Questions to Learn Visual Recognition.
Jianwei Yang*, Jiasen Lu*, Stefan Lee, Dhruv Batra, Devi Parikh.
Conference on Robot Learning (CoRL), 2018.
Visual Coreference Resolution in Visual Dialog using Neural Module Networks.
Satwik Kottur, José M. F. Moura, Devi Parikh, Dhruv Batra, Marcus Rohrbach.
European Conference on Computer Vision (ECCV), 2018.
Choose Your Neuron: Incorporating Domain Knowledge through Neuron Importance.
Ramprasaath R. Selvaraju, Prithvijit Chattopadhyay, Mohamed Elhoseiny, Tilak Sharma, Dhruv Batra, Devi Parikh, Stefan Lee.
European Conference on Computer Vision (ECCV), 2018.
Graph R-CNN for Scene Graph Generation.
Jianwei Yang*, Jiasen Lu*, Stefan Lee, Dhruv Batra, Devi Parikh.
European Conference on Computer Vision (ECCV), 2018.
Learn From Your Neighbor: Learning Multi-Modal Distributions from Sparse Annotation.
Ashwin K Vijayakumar, Stefan Lee, Anitha Kannan, Dhruv Batra.
International Conference on Machine Learning (ICML), 2018.
Oral Presentation (Long Talk)
Embodied Question Answering.
Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[ http://embodiedqa.org ]
Oral Presentation
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering.
Aishwarya Agrawal, Dhruv Batra, Devi Parikh, Aniruddha Kembhavi.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
Neural Baby Talk.
Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[ code ]
Spotlight
Visual Dialog.
Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, Stefan Lee, José M. F. Moura, Devi Parikh, Dhruv Batra.
Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2018.
[ visualdialog.org | video ]
Neural Guided Deductive Search for Real-time Program Synthesis.
Abhishek Mohta*, Ashwin K Vijayakumar*, Oleksandr Polozov, Dhruv Batra, Sumit Gulwani, Prateek Jain.
International Conference on Learning Representations (ICLR), 2018.
Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models.
Ashwin Vijayakumar, Michael Cogswell, Ramprasaath Selvaraju, Qing Sun, Stefan Lee, David Crandall, Dhruv Batra.
AAAI Conference on Artificial Intelligence (AAAI), 2018.
[ Code | CloudCV demo ]
2017
Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?
Abhishek Das*, Harsh Agrawal*, Larry Zitnick, Devi Parikh, Dhruv Batra.
Computer Vision and Image Understanding (CVIU), 2017.
Resolving Vision and Language Ambiguities Together: Joint Segmentation & Prepositional Attachment Resolution in Captioned Scenes.
Gordon Christie, Ankit Laddha, Aishwarya Agrawal, Stanislaw Antol, Yash Goyal, Kevin Kochersberger, Dhruv Batra.
Computer Vision and Image Understanding (CVIU), 2017.
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model.
Jiasen Lu, Anitha Kannan, Jianwei Yang, Devi Parikh, Dhruv Batra.
Neural Information Processing Systems (NIPS), 2017.
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning.
Abhishek Das*, Satwik Kottur*, José M. F. Moura, Stefan Lee, Dhruv Batra.
International Conference on Computer Vision (ICCV), 2017.
[ visualdialog.org (data, code) | CloudCV demo ]
Oral Presentation
Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization.
Ramprasaath Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra.
International Conference on Computer Vision (ICCV), 2017.
[ Code | CloudCV demo | video ]
Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog.
Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra.
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017.
Best Paper Award
Deal or No Deal? End-to-End Learning for Negotiation Dialogues.
Mike Lewis, Denis Yarats, Yann N. Dauphin, Devi Parikh, Dhruv Batra.
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017.
The Promise of Premise: Harnessing Question Premises in Visual Question Answering.
Aroma Mahendru, Viraj Prabhu, Akrit Mohapatra, Dhruv Batra, Stefan Lee.
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017.
ParlAI: A Dialog Research Software Platform.
Alexander H. Miller, Will Feng, Adam Fisch, Jiasen Lu, Dhruv Batra, Antoine Bordes, Devi Parikh, Jason Weston.
Conference on Empirical Methods in Natural Language Processing (EMNLP) System Demonstrations Track, 2017.
[ parl.ai ]
Evaluating Visual Dialog Agents via Cooperative Human-AI Games.
Prithvijit Chattopadhyay, Deshraj Yadav, Viraj Prabhu, Arjun Chandrasekaran, Abhishek Das, Stefan Lee, Dhruv Batra, Devi Parikh.
AAAI Conference on Human Computation and Crowdsourcing (HCOMP), 2017.
[ visualdialog.org ]
Visual Dialog.
Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[ visualdialog.org | video ]
Spotlight
Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning.
Qing Sun, Stefan Lee, Dhruv Batra.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering.
Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, Devi Parikh.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[ Webpage | video ]
Counting Everyday Objects in Everyday Scenes.
Prithvijit Chattopadhyay, Ramakrishna Vedantam, Ramprasaath R. Selvaraju, Dhruv Batra, Devi Parikh.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
Spotlight
LR-GAN: Layered Recursive Generative Adversarial Networks for Image Generation.
Jianwei Yang, Anitha Kannan, Dhruv Batra, Devi Parikh.
International Conference on Learning Representations (ICLR), 2017.
[ code ]
VQA: Visual Question Answering.
Aishwarya Agrawal*, Jiasen Lu*, Stanislaw Antol*, Margaret Mitchell, C. Lawrence Zitnick, Devi Parikh, Dhruv Batra.
International Journal of Computer Vision (IJCV), Special Issue on Combined Image and Language Understanding, 2017.
[ visualqa.org (data, code, challenge) | CloudCV demo | video ]
2016
Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles.
Stefan Lee, Senthil Purushwalkam, Michael Cogswell, Viresh Ranjan, David Crandall, Dhruv Batra.
Neural Information Processing Systems (NIPS), 2016.
Hierarchical Question-Image Co-Attention for Visual Question Answering.
Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh.
Neural Information Processing Systems (NIPS), 2016.
06/2016: State-of-the-art performance on the VQA dataset (62.06% Open-Ended / 66.33% Multiple-Choice).
Update: No longer at the top of the leaderboard; see the VQA leaderboard for the latest numbers.
Sort Story: Sorting Jumbled Images and Captions into Stories.
Harsh Agrawal*, Arjun Chandrasekaran*, Dhruv Batra, Devi Parikh, Mohit Bansal.
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.
Analyzing the Behavior of Visual Question Answering Models.
Aishwarya Agrawal, Dhruv Batra, Devi Parikh.
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.
Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions.
Arijit Ray, Gordon Christie, Mohit Bansal, Dhruv Batra, Devi Parikh.
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.
Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?
Abhishek Das*, Harsh Agrawal*, C. Lawrence Zitnick, Devi Parikh, Dhruv Batra.
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.
Preliminary version:
International Conference on Machine Learning (ICML) Workshop on Visualization for Deep Learning, 2016.
Best Student Paper
Resolving Language and Vision Ambiguities Together: Joint Segmentation & Prepositional Attachment Resolution in Captioned Scenes.
Gordon Christie, Ankit Laddha, Aishwarya Agrawal, Stanislaw Antol, Yash Goyal, Kevin Kochersberger, Dhruv Batra.
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.
Towards Transparent AI Systems: Interpreting Visual Question Answering Models.
Yash Goyal, Akrit Mohapatra, Devi Parikh, Dhruv Batra.
International Conference on Machine Learning (ICML) Workshop on Visualization for Deep Learning, 2016.
Best Student Paper
Measuring Machine Intelligence Through Visual Question Answering.
Larry Zitnick, Aishwarya Agrawal, Stanislaw Antol, Margaret Mitchell, Dhruv Batra, Devi Parikh.
AI Magazine, 2016.
Object-Proposal Evaluation Protocol is 'Gameable'.
Neelima Chavali*, Harsh Agrawal*, Aroma Mahendru*, Dhruv Batra.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[ project page (data, code) | Object Proposals Library ]
Spotlight
We Are Humor Beings: Understanding and Predicting Visual Humor.
Arjun Chandrasekaran, Ashwin K Vijayakumar, Stanislaw Antol, Mohit Bansal, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
Spotlight
Yin and Yang: Balancing and Answering Binary Visual Questions.
Peng Zhang*, Yash Goyal*, Douglas Summers-Stay, Dhruv Batra, Devi Parikh.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
Simultaneously Discovering Image Clusters and Deep Representations.
Jianwei Yang, Devi Parikh, Dhruv Batra.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[ code ]
Visual Storytelling.
Ting-Hao Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, Larry Zitnick, Devi Parikh, Lucy Vanderwende, Michel Galley, Margaret Mitchell.
Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT), 2016.
[ project page (with data) ]
Also presented at the GroupSight workshop, AAAI Conference on Human Computation and Crowdsourcing (HCOMP), 2016.
A Corpus and Evaluation Framework for Deeper Understanding of Commonsense Stories.
Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli, James Allen.
Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT), 2016.
[ project page (with data) ]
Oral Presentation
Empirical Minimum Bayes Risk Prediction.
Vittal Premachandran, Daniel Tarlow, Alan Yuille, Dhruv Batra.
Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2016.
[ project page | code ]
Reducing Overfitting in Deep Networks by Decorrelating Representations.
Michael Cogswell, Faruk Ahmed, Ross Girshick, Larry Zitnick, Dhruv Batra.
International Conference on Learning Representations (ICLR), 2016.
Pose Tracking by Efficiently Exploiting Global Features.
Ratnesh Kumar, Dhruv Batra.
IEEE Winter Conference on Applications of Computer Vision (WACV), 2016.
2015
Why M Heads are Better than One: Training a Diverse Ensemble of Deep Networks.
Stefan Lee, Senthil Purushwalkam, Michael Cogswell, David Crandall, Dhruv Batra.
arXiv:1511.06314, 2015.
SubmodBoxes: Near-Optimal Search for a Set of Diverse Object Proposals.
Qing Sun, Dhruv Batra.
Neural Information Processing Systems (NIPS), 2015.
VQA: Visual Question Answering.
Stanislaw Antol*, Aishwarya Agrawal*, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh.
International Conference on Computer Vision (ICCV), 2015.
[ visualqa.org (data, code, challenge) | CloudCV demo | video spotlight ]
Optimizing Expected Intersection-over-Union with Candidate-Constrained CRFs.
Faruk Ahmed, Daniel Tarlow, Dhruv Batra.
International Conference on Computer Vision (ICCV), 2015.
CloudCV: Large Scale Distributed Computer Vision as a Cloud Service.
Harsh Agrawal, Clint Solomon Mathialagan, Yash Goyal, Neelima Chavali, Prakriti Banik, Akrit Mohapatra, Ahmed Osman, Dhruv Batra.
Book Chapter, Mobile Cloud Visual Media Computing.
Editors: Gang Hua, Xian-Sheng Hua. Springer, 2015.
[ CloudCV ]
Active Learning for Structured Probabilistic Models with Histogram Approximation.
Qing Sun, Ankit Laddha, Dhruv Batra.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[ project page (data, code, poster, slides) | talk ]
Oral Presentation
VIP: Finding Important People in Images.
Clint Solomon Mathialagan, Andrew C. Gallagher, Dhruv Batra.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[ project page | online demo on CloudCV ]
A Comparative Study of Modern Inference Techniques for Discrete Energy Minimization Problems.
Joerg Kappes, Bjoern Andres, Christoph Schnoerr, Fred Hamprecht, Sebastian Nowozin, Dhruv Batra, Sungwoong Kim, Thorben Kroeger, Bernhard X. Kausler, Jan Lellmann, Bogdan Savchynskyy, Nikos Komodakis, Carsten Rother.
International Journal of Computer Vision (IJCV), 2015.
[ project page + data + code ]
2014
Submodular meets Structured: Finding Diverse Subsets in Exponentially-Large Structured Item Sets.
Adarsh Prasad, Stefanie Jegelka, Dhruv Batra.
Neural Information Processing Systems (NIPS), 2014.
[ data ]
Spotlight
Human Pose Estimation via Multi-layer Composite Models.
Kun Duan, Dhruv Batra, David Crandall.
Signal Processing, 2014.
Empirical Minimum Bayes Risk Prediction: How to extract an extra few % performance from vision models with just three more parameters.
Vittal Premachandran, Daniel Tarlow, Dhruv Batra.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[ project page | code ]
Multimodal Learning in Loosely-organized Web Images.
Kun Duan, David J. Crandall, Dhruv Batra.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
Putting the User in the Loop for Image-Based Modeling.
Adarsh Kowdle, Yao-Jen Chang, Andrew Gallagher, Dhruv Batra, Tsuhan Chen.
International Journal of Computer Vision (IJCV), 2014.
Efficiently Enforcing Diversity in Multi-Output Structured Prediction.
Abner Guzman-Rivera, Pushmeet Kohli, Dhruv Batra, Rob Rutenbar.
International Conference on Artificial Intelligence and Statistics (AISTATS), 2014.
Oral Presentation
2013
Submodular Maximization and Diversity in Structured Output Spaces .
Adarsh Prasad, Stefanie Jegelka, Dhruv Batra.
Workshop on Discrete Optimization in Machine Learning (DISCML),
Neural Information Processing Systems (NIPS), 2013.
Group Norm for Learning Structured SVMs with Unstructured Latent Variables .
Daozheng Chen, Dhruv Batra, William T. Freeman.
International Conference on Computer Vision (ICCV), 2013.
A Systematic Exploration of Diversity in Machine Translation .
Kevin Gimpel, Dhruv Batra, Greg Shakhnarovich, Chris Dyer.
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2013.
Discriminative Re-ranking of Diverse Segmentations .
Payman Yadollahpour, Dhruv Batra, Greg Shakhnarovich.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[ leaderboard link ]
06/2013: State-of-the-art (48.1%) performance on PASCAL VOC Segmentation 2012.
See here for more recent results.
A Comparative Study of Modern Inference Techniques for Discrete Energy Minimization Problems .
Joerg Kappes, Bjoern Andres, Christoph Schnoerr, Fred Hamprecht, Sebastian Nowozin,
Dhruv Batra, Jan Lellmann, Nikos Komodakis, Sungwoong Kim, Bernhard Kausler, Carsten Rother.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[ talk video ]
Oral Presentation
DivMCuts: Faster Training of Structural SVMs with Diverse M-Best Cutting-Planes .
Abner Guzman-Rivera, Pushmeet Kohli, Dhruv Batra.
International Conference on Artificial Intelligence and Statistics (AISTATS), 2013.
[ talk video ]
Oral Presentation
2012
Mode-Marginals: Expressing Uncertainty via Diverse M-Best Solutions .
Varun Ramakrishna, Dhruv Batra.
Workshop on Perturbations, Optimization, and Statistics,
Neural Information Processing Systems (NIPS), 2012.
Faster Training of Structural SVMs with Diverse M-Best Cutting-Planes .
Abner Guzman-Rivera, Pushmeet Kohli, Dhruv Batra.
Workshop on Discrete Optimization in Machine Learning (DISCML),
Neural Information Processing Systems (NIPS), 2012.
Multiple Choice Learning: Learning to Produce Multiple Structured Outputs .
Abner Guzman-Rivera, Dhruv Batra, Pushmeet Kohli.
Neural Information Processing Systems (NIPS), 2012.
Diverse M-Best Solutions in Markov Random Fields .
Dhruv Batra, Payman Yadollahpour, Abner Guzman-Rivera, Greg Shakhnarovich.
European Conference on Computer Vision (ECCV), 2012.
[ talk slides (pptx) | talk video | code ]
Oral Presentation
An Efficient Message-Passing Algorithm for the M-Best MAP Problem .
Dhruv Batra.
The Conference on Uncertainty in Artificial Intelligence (UAI), 2012.
[ talk slides (pptx) ]
Oral Presentation
MaxFlow Revisited: An Empirical Comparison of Maxflow Algorithms for Dense Vision Problems .
Tanmay Verma, Dhruv Batra.
British Machine Vision Conference (BMVC), 2012.
[ project page ]
A Multi-layer Composite Model for Human Pose Estimation .
Kun Duan, Dhruv Batra, David J. Crandall.
British Machine Vision Conference (BMVC), 2012.
[ project page ]
Learning the Right Model: Efficient Max-Margin Learning in Laplacian CRFs .
Dhruv Batra, Ashutosh Saxena.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
2011
M-Best Modes: Diverse M-Best Solutions in MRFs .
Payman Yadollahpour, Dhruv Batra, Greg Shakhnarovich.
Workshop on Discrete Optimization in Machine Learning,
Neural Information Processing Systems (NIPS), 2011.
Group Norm for Learning Latent Structural SVMs .
Daozheng Chen, Dhruv Batra, William T. Freeman, Micah K. Johnson.
Workshop on Optimization in Machine Learning,
Neural Information Processing Systems (NIPS), 2011.
Similarity Sensitive Nonlinear Embeddings .
Dhruv Batra, Greg Shakhnarovich.
Workshop on Kernels and Distances for Computer Vision,
International Conference on Computer Vision (ICCV), 2011.
Interactive Co-segmentation of Objects in Image Collections.
Dhruv Batra, Adarsh Kowdle, Devi Parikh, Jiebo Luo, Tsuhan Chen.
SpringerBriefs in Computer Science, 2011.
Mini-Book [ Springer Page |
project page ]
Dynamic Tree-Block Coordinate Ascent .
Daniel Tarlow, Dhruv Batra, Pushmeet Kohli, Vladimir Kolmogorov.
International Conference on Machine Learning (ICML), 2011.
[ talk slides | code ]
Making the Right Moves: Guiding Alpha-Expansion using Local Primal-Dual Gaps .
Dhruv Batra, Pushmeet Kohli.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
Inference for Order Reduction in Markov Random Fields .
Andrew Gallagher, Dhruv Batra, Devi Parikh.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
Tighter Relaxations for MAP-MRF Inference: A Local Primal-Dual Gap based Separation Algorithm .
Dhruv Batra, Sebastian Nowozin, Pushmeet Kohli.
International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
Scribble Based Interactive 3D Reconstruction via Scene Co-segmentation .
Adarsh Kowdle, Yao-Jen Chang, Dhruv Batra, Tsuhan Chen.
IEEE International Conference on Image Processing (ICIP), 2011.
[ project page |
dataset ]
Oral Presentation
Interactively Co-segmenting Topically Related Images with Intelligent Scribble Guidance .
Dhruv Batra, Adarsh Kowdle, Devi Parikh, Jiebo Luo, Tsuhan Chen.
International Journal of Computer Vision (IJCV), 2011.
[ Springer Page |
project page ]
03/2011: Among the top downloaded articles from IJCV in the last 30 days.
2010
iModel: Interactive Co-segmentation for Object of Interest 3D Modeling .
Adarsh Kowdle, Dhruv Batra, Wen-Chao Chen, Tsuhan Chen.
Workshop on Reconstruction and Modeling of Large-Scale 3D Virtual Environments,
European Conference on Computer Vision (ECCV), 2010.
[ project page |
dataset ]
Beyond Trees: MRF Inference via Outer-Planar Decomposition .
Dhruv Batra, Andrew Gallagher, Devi Parikh, Tsuhan Chen.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
iCoseg: Interactive Co-segmentation with Intelligent Scribble Guidance .
Dhruv Batra, Adarsh Kowdle, Devi Parikh, Jiebo Luo, Tsuhan Chen.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
[ project page ]
2009
Dynamic Planar-Cuts: Efficient Computation of Min-Marginals for Outer-Planar Models .
Dhruv Batra, Tsuhan Chen.
Workshop on Discrete Optimization in Machine Learning,
Neural Information Processing Systems (NIPS), 2009.
Seed Image Selection in Interactive Cosegmentation .
Dhruv Batra, Devi Parikh, Adarsh Kowdle, Tsuhan Chen, Jiebo Luo.
International Conference on Image Processing (ICIP), 2009.
[ project page ]
Cutout-Search: Putting a name to the Picture .
Dhruv Batra, Adarsh Kowdle, Devi Parikh, Tsuhan Chen.
Workshop on Internet Vision,
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
2008
Semi-Supervised Clustering via Learnt Codeword Distances .
Dhruv Batra, Rahul Sukthankar, Tsuhan Chen.
British Machine Vision Conference (BMVC), 2008.
Learning Class-Specific Affinities for Image Labelling .
Dhruv Batra, Rahul Sukthankar, Tsuhan Chen.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
[ project page ]
Space-Time Shapelets for Action Recognition .
Dhruv Batra, Tsuhan Chen, Rahul Sukthankar.
Workshop on Motion and Video Computing (WMVC), 2008,
IEEE Winter Vision Meetings.
[ project page ]
Patents
Finding Important People in Images.
Clint Mathialagan, Dhruv Batra.
U.S. Patent Application No: 62/169,634; Filing Date: June 2, 2015.
Method for Generating Object Cutout for Topically Related Photographs.
Jiebo Luo, Dhruv Batra, Andrew C. Gallagher.
Application number: 12/397,547; Publication number: US 2010/0226566 A1; Filing date: Mar 4, 2009.
[ Google patent page | faqs page ]
Demos
iModel: Object of Interest 3D Modeling via Interactive Co-segmentation on a Mobile Device .
Adarsh Kowdle, Haochen Liu, ShaoYou Hsu, Jason Lew, Charvi Puri, Dhruv Batra, Tsuhan Chen.
Demo session at Computer Vision and Pattern Recognition (CVPR), 2012.
Interactive Cosegmentation by Touch .
Dhruv Batra, Adarsh Kowdle, Kevin Tang, Devi Parikh, Jiebo Luo, Tsuhan Chen.
Demo session at Computer Vision and Pattern Recognition (CVPR), 2009.