Danfei Xu
I am an Assistant Professor at the School of Interactive Computing at Georgia Tech and a (part-time) Research Scientist at NVIDIA AI. I work at the intersection of Robotics and Machine Learning.
I received my Ph.D. in CS from Stanford University, advised by Fei-Fei Li and Silvio Savarese (2015-2021), and my B.S. from Columbia University (SEAS'15). I've spent time at DeepMind UK (2019), ZOOX (2017), Autodesk Research (2016), CMU RI (2014), and Columbia Robotics Lab (2013-2015).
Ph.D. applicants: Our lab is hiring! Apply through the official application portal for the CS Ph.D. program (School of Interactive Computing) and make sure that you mention my name in your application.
Research opportunities: Please fill out this questionnaire and email me once submitted.
We especially encourage applications from students of traditionally underrepresented groups in robotics and AI such as BIPOC (black, indigenous, and people of color), women, and LGBTQ+ communities.
Email / Google Scholar / CV (Sep 2024) / Github / Twitter
Research
We aim to build general-purpose and adaptable "brains" for robots in homes, factories, healthcare, and search-and-rescue missions alike. Our work focuses on endowing robots with both flexible high-level planning abilities ("what to do next") and robust low-level sensorimotor control ("how to do it"). The research draws equally from Robotics and Machine Learning, with the following themes:
- Compositionality: Inspired by humans' ability to "make infinite use of finite means", we aim to enable (good!) combinatorial explosions in robots' ability to solve new tasks. Examples include neural program synthesis for control [a][b], compositional diffusion models for planning [a], and learning neural-symbolic motor skills [a].
- Generative Modeling: Modeling complex, high-dimensional distributions is fundamental to many robotics problems. We aim to advance core generative modeling research and its applications to robotics problems. Examples include modeling human behaviors in human-robot collaboration [a][b], modeling dynamics in manipulation planning [a], and explorative "play" behaviors [a][b].
- Data-centric Robot Learning: While many AI domains have achieved remarkable success with large-scale learning, Robotics is critically limited in data scale and coverage. Our work develops systems and algorithms for data collection [a], generation [a], and quality control [a], as well as standard datasets and benchmarks [a].
- Full-stack Robot Learning: Robotics is as much science as system building. We are committed to developing high-quality, open-source robot hardware and software systems to demonstrate and promote "full-stack" progress in learning-based robotics. We actively maintain Robomimic, a general-purpose Robot Learning framework and benchmark.
|
The Research Group
I direct the Robot Learning and Reasoning Lab (RL2) at Georgia Tech. The current members are:
|
- [Nov 2024] I gave an Early Career Keynote at CoRL 2024 on Robot Learning from Human Data (slides).
- [July 2024] We received support from the National Science Foundation (NSF) to develop generative models for robot training and validation. Thanks NSF!
- [July 2024] We are hosting a workshop on Data Generation for Robotics at RSS 2024.
- [June 2024] Invited talk at Toyota Research Institute.
- [May 2024] LEAGUE awarded IEEE RA-L Best Paper Honorable Mention (5/1200) and Open-X-Embodiment won Best Conference Paper at ICRA'24.
- [Dec 2023] Received support from Autodesk to work on high-precision robotic manipulation. Thanks Autodesk!
- [Nov 2023] MimicPlay awarded Best Paper and Best Systems Paper Finalist at CoRL 2023 (2/270).
- [Nov 2023] We are presenting 6 papers at CoRL 2023.
- [Nov 2023] Invited talk at CoRL 2023 (workshop link).
Publications and Preprints (representative works are highlighted)
|
|
EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision
Jiawei Yang,
Boris Ivanovic,
Or Litany,
Xinshuo Weng,
Seung Wook Kim,
Boyi Li,
Tong Che,
Danfei Xu,
Sanja Fidler,
Marco Pavone,
Yue Wang
ICLR 2024
Unsupervised 4D representation learning from videos.
|
|
Positive-Unlabeled Reward Learning
Danfei Xu,
Misha Denil
(Long version) CoRL 2020
(Short version) Late-Breaking Paper, NeurIPS Deep Reinforcement Learning Workshop 2019
[Video]
An algorithmic framework that simultaneously addresses the reward delusion problem in supervised reward learning and the overfitting discriminator problem in adversarial imitation learning.
|
|
Procedure Planning in Instructional Videos
Chien-Yi Chang,
De-An Huang,
Danfei Xu,
Ehsan Adeli,
Li Fei-Fei,
Juan Carlos Niebles
ECCV, 2020
Learning to plan from instructional videos.
|
|
Learning to Generalize Across Long-Horizon Tasks from Human Demonstrations
Ajay Mandlekar*,
Danfei Xu*,
Roberto Martin-Martin,
Silvio Savarese,
Li Fei-Fei
RSS, 2020
[website]
[video]
[blog post]
Learning visuomotor policies that can generalize across long-horizon tasks by modeling latent compositional structures.
|
|
6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints
Chen Wang,
Roberto Martin-Martin,
Danfei Xu,
Jun Lv,
Cewu Lu,
Li Fei-Fei,
Silvio Savarese,
Yuke Zhu
ICRA, 2020
[website]
[video]
[code]
Real-time category-level 6D object tracking from RGB-D data.
|
|
Regression Planning Networks
Danfei Xu,
Roberto Martin-Martin,
De-An Huang,
Yuke Zhu,
Silvio Savarese,
Li Fei-Fei
NeurIPS, 2019
[code]
[poster]
A flexible neural network architecture for learning to plan from video demonstrations.
|
|
Continuous Relaxation of Symbolic Planner for One-Shot Imitation Learning
De-An Huang,
Danfei Xu,
Yuke Zhu,
Silvio Savarese,
Li Fei-Fei,
Juan Carlos Niebles
IROS, 2019
[blog post]
One-shot imitation learning via hybrid neural-symbolic planning.
|
|
Situational Fusion of Visual Representation for Visual Navigation
William B. Shen,
Danfei Xu,
Yuke Zhu,
Leonidas Guibas,
Li Fei-Fei,
Silvio Savarese
ICCV, 2019
Learning generalizable navigation policies from mid-level visual representations.
|
|
DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion
Chen Wang,
Danfei Xu,
Yuke Zhu,
Roberto Martin-Martin,
Cewu Lu,
Li Fei-Fei,
Silvio Savarese
CVPR, 2019
[website]
[video]
[code]
Dense RGB-depth sensor fusion for 6D object pose estimation.
|
|
Neural Task Graphs: Generalizing to Unseen Tasks from a Single Video Demonstration
De-An Huang*,
Suraj Nair*,
Danfei Xu*,
Yuke Zhu,
Animesh Garg,
Li Fei-Fei,
Silvio Savarese,
Juan Carlos Niebles
CVPR, 2019 (Oral)
[blog post]
Generate executable task graphs from video demonstrations.
|
|
Neural Task Programming: Learning to Generalize Across Hierarchical Tasks
Danfei Xu*,
Suraj Nair*,
Yuke Zhu,
Julian Gao,
Animesh Garg,
Li Fei-Fei,
Silvio Savarese
ICRA, 2018
[website]
[video]
[Two Minute Papers]
[blog post]
Neural Task Programming (NTP) is a meta-learning framework that learns to generate robot-executable neural programs from task demonstration video.
|
|
PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation
Danfei Xu,
Ashesh Jain,
Dragomir Anguelov
CVPR, 2018
End-to-end 3D bounding box estimation via sensor fusion.
|
|
Scene Graph Generation by Iterative Message Passing
Danfei Xu,
Yuke Zhu,
Christopher B. Choy,
Li Fei-Fei
CVPR, 2017
[website] [code]
We propose an end-to-end model that jointly infers object category, location, and relationships. The model learns to iteratively improve its prediction by passing messages on a scene graph.
|
|
3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction
Christopher B. Choy,
Danfei Xu*,
JunYoung Gwak*,
Silvio Savarese
ECCV, 2016
[website] [code]
We propose an end-to-end 3D reconstruction model that unifies single- and multi-view reconstruction.
|
|
Model-Driven Feed-Forward Prediction for Manipulation of Deformable Objects
Yinxiao Li,
Yan Wang,
Yonghao Yue,
Danfei Xu,
Michael Case,
Shih-Fu Chang,
Eitan Grinspun,
Peter K. Allen
IEEE TASE, 2016
[website]
Deformable object manipulation with an application to personal assistive robots.
This is the journal paper of our "laundry robot" series:
ICRA 2015
IROS 2015
ICRA 2016
|
|
Topometric localization on a road network
Danfei Xu,
Hernan Badino,
Daniel Huber
IROS, 2015
Vision-based localization on a probabilistic road network.
|
|
Tactile identification of objects using Bayesian exploration
Danfei Xu,
Gerald E. Loeb,
Jeremy Fishel
ICRA, 2013
Object classification using multi-modal tactile sensing.
|
- Reviewer: CVPR, ICCV, ECCV, IROS, ICRA, RSS, CoRL, T-RO, AAAI, IJRR, TPAMI, RA-L, NeurIPS, ICLR, ICML
|