RESEARCH

Multimodal Data Systems

Our current research agenda focuses on multimodal data systems that support workloads combining structured and unstructured data, inference-centric computation, and machine learning models. This includes query processing over videos, text documents, and other multimodal data; efficient execution of ML and LLM workloads on modern hardware; natural language interfaces to databases; and new abstractions for building data-intensive AI applications. Our goal is to develop principled data management techniques that make these workloads more efficient, scalable, and usable, while preserving the performance and rigor traditionally associated with DBMSs.

Video is a particularly important and rapidly growing source of multimodal data in many domains. For example, an analyst at an autonomous vehicle company may want to examine edge cases encountered by the company's cars, while a neuroscientist may want to understand the behavioral patterns of animals. The volume of visual data collected in these domains rules out manual analysis. We are studying and developing techniques for accelerating large-scale multimodal analytics using deep learning, query optimization, symbolic reasoning, and adaptive query processing.

  • Halo: Domain-Aware Query Optimization for Long-Context Question Answering, Preprint
  • PRISM: Navigating Cost–Accuracy Trade-offs for NL2SQL, SIGMOD 2026
  • TRACER: Efficient Object Re-Identification in Networked Cameras through Adaptive Query Processing, Preprint
  • Aero: Adaptive Query Processing of ML Queries, SIGMOD 2025
  • Buffer Management for Out-of-GPU LLM Execution, DEEM 2025
  • SketchQL: Video Moment Querying with a Visual Query Interface, SIGMOD 2024
  • EVA: An End-to-End Exploratory Video Analytics System, DEEM 2023
  • SEIDEN: Revisiting Query Processing in Video Database Systems, VLDB 2023
  • GPU Database Systems Characterization and Optimization, VLDB 2023
  • RAPIDS: Reconciling Availability, Accuracy, and Performance in Managing Geo-Distributed Scientific Data, HPDC 2023
  • EVA: A Symbolic Approach to Accelerating Exploratory Video Analytics with Materialized Views, SIGMOD 2022
  • FiGO: Fine-Grained Query Optimization in Video Analytics, SIGMOD 2022
  • Zeus: Efficiently Localizing Actions in Videos using Reinforcement Learning, SIGMOD 2022
  • ODIN: Automated Drift Detection and Recovery in Video Analytics, VLDB 2020

Past Research Areas

Database Reliability and Debugging

The practical art of constructing database systems involves a morass of trade-offs among query execution speed, standards compliance, feature parity, modularity, portability, and other goals. It is no surprise that database systems, like all complex software systems, contain bugs that can adversely affect their performance and correctness. Both developers and users face challenges in dealing with these bugs. We are developing techniques and toolchains for automatically detecting, reporting, and diagnosing bugs in DBMSs.

Automated Query Reasoning and Optimization

Database-as-a-service offerings enable users to quickly create and deploy complex data processing pipelines. In practice, these pipelines often contain significant computational overlap because certain sub-queries are executed redundantly. More broadly, modern query optimizers increasingly need formal and statistical techniques to reason about query equivalence, exploit learned signals, and rewrite queries safely. We are designing automated tools for identifying query equivalence, proving semantic properties of queries, and optimizing query execution.

Non-Volatile Memory Database Management Systems

This line of research focuses on a new category of non-volatile memory (NVM) technologies that blur the gap between volatile memory and durable storage. Like DRAM, NVM supports low-latency, byte-addressable access, but like an SSD, every write to it is persistent. Several aspects of NVM make existing DBMS architectures ill-suited to these devices. We investigate how to re-architect the DBMS from the ground up to take advantage of NVM.

Self-Driving Database Management Systems

We are pursuing research on DBMSs that can adapt automatically to changing workloads and hardware environments. Tuning a modern DBMS for a particular workload is a laborious and error-prone task because of the long and growing list of knobs that these systems expose. A DBMS that could tune itself automatically would remove many of the complications and costs involved in deployment. Our research focuses on designing algorithms that allow the DBMS to adapt its physical design, storage layout, and query execution strategies.