The primary areas of my current research agenda are:

Exploratory Video Analytics

Video is a rapidly growing source of data at scale in many domains. For example, an analyst at an autonomous car company may be interested in examining edge cases for their cars. A neuroscientist may be interested in understanding behavioral patterns of animals. The volume of visual data collected in these domains precludes the possibility of manual analysis. We are studying and developing techniques for accelerating large-scale video analytics using deep learning.

  • Zeus: Efficiently Localizing Actions in Videos using Reinforcement Learning, SIGMOD 2022
  • FiGO: Fine-Grained Query Optimization in Video Analytics, SIGMOD 2022
  • EVA: A Symbolic Approach to Accelerating Exploratory Video Analytics with Materialized Views, SIGMOD 2022
  • ODIN: Automated Drift Detection and Recovery in Video Analytics, VLDB 2020

Debugging Database Management Systems

The practical art of constructing database systems involves a morass of trade-offs among query execution speed, standards compliance, feature parity, modularity, portability, and other goals. It is no surprise that database systems, like all complex software systems, contain bugs that can adversely affect their performance and accuracy. Both developers and users face challenges while dealing with these bugs. We are developing APOLLO, a toolchain for automatically detecting, reporting, and diagnosing bugs in DBMSs.

Automated SQL Solver

Database-as-a-service offerings enable users to quickly create and deploy complex data processing pipelines. In practice, these pipelines often exhibit significant overlap in computation because certain sub-queries are executed redundantly. It is challenging for developers and database administrators to manually detect overlap across queries, since the queries may be distributed across teams, organizational roles, and geographic locations. We are designing EQUITAS, an automated cloud-scale tool for identifying equivalent queries to minimize computation overlap.
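To make "equivalent queries" concrete, here is a small illustrative sketch in Python using the standard sqlite3 module. The orders table, its rows, and the two queries are hypothetical and chosen only for illustration; this is not how EQUITAS works. The sketch merely checks that two differently written queries return the same rows on one sample database, which can suggest equivalence but cannot prove it for all possible inputs.

```python
import sqlite3


def run(conn, sql):
    """Execute a query and return its rows in sorted order (order-insensitive compare)."""
    return sorted(conn.execute(sql).fetchall())


# Hypothetical sample data, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES (1, 'a', 10), (2, 'b', 25), (3, 'a', 40), (4, 'c', 5);
    """
)

# Two syntactically different queries that express the same filter.
q1 = "SELECT customer FROM orders WHERE amount > 20"
q2 = "SELECT customer FROM (SELECT * FROM orders) AS o WHERE NOT (o.amount <= 20)"

# Agreement on one database is evidence of equivalence, not a proof.
same_on_sample = run(conn, q1) == run(conn, q2)
print(same_on_sample)
```

If two such queries ran in separate pipelines, a tool that recognized them as equivalent could let one result be computed once and reused.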

Non-Volatile Memory Database Management Systems

This line of research focuses on non-volatile memory (NVM), a new class of memory technologies that blurs the gap between volatile memory and durable storage. NVM supports low-latency, byte-addressable accesses similar to DRAM, but all writes are persistent, as on SSDs. Several aspects of NVM make existing DBMS architectures a poor fit for it. We investigate how to rearchitect the DBMS from the ground up to take advantage of NVM.

Self-Driving Database Management Systems

We are pursuing a new research direction on designing a self-driving DBMS. Tuning modern DBMSs for a particular workload is a laborious and error-prone task due to the long and growing list of knobs that these systems expose. If the DBMS could automatically tune itself, it would remove many of the complications and costs involved in its deployment. Our research focuses on designing new algorithms that allow the DBMS to tune itself. We apply techniques from machine learning to tune the physical design of the database to accelerate query processing.