RESEARCH

The primary areas of my current research agenda are:

Exploratory Video Analytics
Debugging Database Management Systems
Automated SQL Solver
Non-Volatile Memory Database Management Systems
Self-Driving Database Management Systems

Exploratory Video Analytics

Video is a rapidly growing source of data at scale in many domains. For example, an analyst at an autonomous car company may be interested in examining edge cases for their cars. A neuroscientist may be interested in understanding behavioral patterns of animals. The volume of visual data collected in these domains precludes the possibility of manual analysis. We are studying and developing techniques for accelerating large-scale video analytics using deep learning.

Zeus: Efficiently Localizing Actions in Videos using Reinforcement Learning, SIGMOD 2022
FiGO: Fine-Grained Query Optimization in Video Analytics, SIGMOD 2022
EVA: A Symbolic Approach to Accelerating Exploratory Video Analytics with Materialized Views, SIGMOD 2022
ODIN: Automated Drift Detection and Recovery in Video Analytics, VLDB 2020

Debugging Database Management Systems

The practical art of constructing database systems involves a morass of trade-offs among query execution speed, standards compliance, feature parity, modularity, portability, and other goals. It is no surprise that database systems, like all complex software systems, contain bugs that can adversely affect their performance and accuracy. Both developers and users face challenges while dealing with these bugs. We are developing APOLLO, a toolchain for automatically detecting, reporting, and diagnosing bugs in DBMSs.

APOLLO: Automatic Detection and Diagnosis of Performance Regressions in Database Systems, VLDB 2020

Automated SQL Solver

Database-as-a-service offerings enable users to quickly create and deploy complex data processing pipelines. In practice, these pipelines often exhibit significant overlap of computation due to redundant execution of certain sub-queries. It is challenging for developers and database administrators to manually detect overlap across queries, since the queries may be distributed across teams, organizational roles, and geographic locations. We are designing EQUITAS, an automated cloud-scale tool for identifying equivalent queries to minimize computation overlap.

Automated Verification of Query Equivalence Using Satisfiability Modulo Theories, VLDB 2019

Non-Volatile Memory Database Management Systems

This line of research focuses on non-volatile memory (NVM), a new class of memory technologies that blur the gap between volatile memory and durable storage. NVM supports low-latency, byte-addressable accesses similar to DRAM, but all writes are persistent, as on SSDs. Several aspects of NVM make existing DBMS architectures inappropriate for it. We investigate how to rearchitect the DBMS from the ground up to take advantage of NVM.

BzTree: A High-Performance Latch-free Range Index for Non-Volatile Memory, VLDB 2018
Write-Behind Logging, VLDB 2017
How to Build a Non-Volatile Memory Database Management System, SIGMOD 2017 (Tutorial)
Let’s Talk About Storage & Recovery Methods for Non-Volatile Memory Database Systems, SIGMOD 2015

Self-Driving Database Management Systems

We are pursuing a new research direction on designing a self-driving DBMS. Tuning modern DBMSs for a particular workload is a laborious and error-prone task due to the long and growing list of knobs that these systems expose. If the DBMS could automatically tune itself, it would remove many of the complications and costs involved in its deployment. Our research focuses on designing new algorithms that allow the DBMS to tune itself. We apply techniques from machine learning to tune the physical design of the database to accelerate query processing.

Self-Driving Database Management Systems, CIDR 2017
Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads, SIGMOD 2016