Multimodal Data Systems
Our current research agenda focuses on multimodal data systems that support workloads combining structured and unstructured data, inference-centric computation, and machine learning models. This includes query processing over videos, text documents, and other multimodal data, efficient execution of ML and LLM workloads on modern hardware, natural language interfaces to databases, and new abstractions for building data-intensive AI applications. Our goal is to develop principled data management techniques that make these workloads more efficient, scalable, and usable, while preserving the performance and rigor traditionally associated with DBMSs.
Video is a particularly important and rapidly growing source of multimodal data in many domains. For example, an analyst at an autonomous vehicle company may want to examine edge cases encountered by their cars, while a neuroscientist may want to understand the behavioral patterns of animals. The volume of visual data collected in these domains makes manual analysis infeasible. We are studying and developing techniques for accelerating large-scale multimodal analytics using deep learning, query optimization, symbolic reasoning, and adaptive query processing.
- Halo: Domain-Aware Query Optimization for Long-Context Question Answering, Preprint
- PRISM: Navigating Cost–Accuracy Trade-offs for NL2SQL, SIGMOD 2026
- TRACER: Efficient Object Re-Identification in Networked Cameras through Adaptive Query Processing, Preprint
- Aero: Adaptive Query Processing of ML Queries, SIGMOD 2025
- Buffer Management for Out-of-GPU LLM Execution, DEEM 2025
- SketchQL: Video Moment Querying with a Visual Query Interface, SIGMOD 2024
- EVA: An End-to-End Exploratory Video Analytics System, DEEM 2023
- SEIDEN: Revisiting Query Processing in Video Database Systems, VLDB 2023
- GPU Database Systems Characterization and Optimization, VLDB 2023
- RAPIDS: Reconciling Availability, Accuracy, and Performance in Managing Geo-Distributed Scientific Data, HPDC 2023
- EVA: A Symbolic Approach to Accelerating Exploratory Video Analytics with Materialized Views, SIGMOD 2022
- FiGO: Fine-Grained Query Optimization in Video Analytics, SIGMOD 2022
- Zeus: Efficiently Localizing Actions in Videos using Reinforcement Learning, SIGMOD 2022
- ODIN: Automated Drift Detection and Recovery in Video Analytics, VLDB 2020
Past Research Areas
Database Reliability and Debugging
The practical art of constructing database systems involves a morass of trade-offs among query execution speed, standards compliance, feature parity, modularity, portability, and other goals. It is no surprise that database systems, like all complex software systems, contain bugs that can adversely affect their performance and correctness. Both developers and users face challenges while dealing with these bugs. We are developing techniques and toolchains for automatically detecting, reporting, and diagnosing bugs in DBMSs.
- APOLLO: Automatic Detection and Diagnosis of Performance Regressions in Database Systems, VLDB 2019
- Automatic Detection of Performance Bugs in Database Systems using Equivalent Queries, ICSE 2022
- SQLCheck: Automated Detection and Diagnosis of SQL Anti-Patterns, SIGMOD 2020
- A Framework For Inferring Properties of User-Defined Functions, ICSE 2024
Automated Query Reasoning and Optimization
Database-as-a-service offerings enable users to quickly create and deploy complex data processing pipelines. In practice, these pipelines often exhibit significant overlap of computation due to redundant execution of certain sub-queries. More broadly, modern query optimizers increasingly need formal and statistical techniques to reason about equivalence, exploit learned signals, and rewrite queries safely. We are designing automated tools for identifying query equivalence, proving semantic properties, and optimizing query execution.
- Automated Verification of Query Equivalence Using Satisfiability Modulo Theories, VLDB 2019
- SPES: A Symbolic Approach to Proving Query Equivalence Under Bag Semantics, ICDE 2022
- SIA: Optimizing Queries using Learned Predicates, SIGMOD 2021
Non-Volatile Memory Database Management Systems
This line of research focuses on non-volatile memory (NVM), a new class of memory technologies that blur the line between volatile memory and durable storage. NVM supports low-latency, byte-addressable accesses similar to DRAM, but all writes are persistent, as on an SSD. Several aspects of NVM make existing DBMS architectures inappropriate for these devices. We investigate how to rearchitect the DBMS from the ground up to take advantage of NVM.
- BzTree: A High-Performance Latch-free Range Index for Non-Volatile Memory, VLDB 2018
- Spitfire: A Three-Tier Buffer Manager for Volatile and Non-Volatile Memory, SIGMOD 2021
- Write-Behind Logging, VLDB 2016
- How to Build a Non-Volatile Memory Database Management System, SIGMOD 2017 (Tutorial)
- Let's Talk About Storage & Recovery Methods for Non-Volatile Memory Database Systems, SIGMOD 2015
Self-Driving Database Management Systems
We are pursuing research on DBMSs that can adapt automatically to changing workloads and hardware environments. Tuning modern DBMSs for a particular workload is a laborious and error-prone task due to the long and growing list of knobs that these systems expose. If the DBMS could tune itself automatically, then it would remove many of the complications and costs involved with deployment. Our research focuses on designing algorithms that allow the DBMS to adapt its physical design, storage layout, and query execution strategies.
- Self-Driving Database Management Systems, CIDR 2017
- Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads, SIGMOD 2016
- An Empirical Evaluation of In-Memory Multi-Version Concurrency Control, VLDB 2017
- SlimDB: A Space-Efficient Key-Value Storage Engine For Semi-Sorted Data, VLDB 2017