The primary areas of my current research agenda are:
Exploratory Video Analytics
Video is a rapidly growing source of data at scale in many domains. For example, an analyst at an autonomous car company may be interested in examining edge cases for their cars. A neuroscientist may be interested in understanding behavioral patterns of animals. The volume of visual data collected in these domains precludes the possibility of manual analysis. We are studying and developing techniques for accelerating large-scale video analytics using deep learning.
- Zeus: Efficiently Localizing Actions in Videos using Reinforcement Learning, SIGMOD 2022
- FiGO: Fine-Grained Query Optimization in Video Analytics, SIGMOD 2022
- EVA: A Symbolic Approach to Accelerating Exploratory Video Analytics with Materialized Views, SIGMOD 2022
- ODIN: Automated Drift Detection and Recovery in Video Analytics, VLDB 2020
Debugging Database Management Systems
The practical art of constructing database systems involves a morass of trade-offs among query execution speed, standards compliance, feature parity, modularity, portability, and other goals. It is no surprise that database systems, like all complex software systems, contain bugs that can adversely affect their performance and accuracy. Both developers and users face challenges while dealing with these bugs. We are developing APOLLO, a toolchain for automatically detecting, reporting, and diagnosing bugs in DBMSs.
Automated SQL Solver
Database-as-a-service offerings enable users to quickly create and deploy complex data processing pipelines. In practice, these pipelines often exhibit significant overlap of computation due to redundant execution of certain sub-queries. It is challenging for developers and database administrators to manually detect overlap across queries since they may be distributed across teams, organizational roles, and geographic locations. We are designing EQUITAS, an automated cloud-scale tool for identifying equivalent queries to minimize computation overlap.
Non-Volatile Memory Database Management Systems
This line of research focuses on a new class of non-volatile memory (NVM)
technologies that blur the gap between
volatile memory and durable storage. NVM supports low-latency, byte-addressable
accesses similar to DRAM, but all writes are persistent, as on an SSD. Several
aspects of NVM make existing DBMS architectures inappropriate for it. We
investigate how to rearchitect the DBMS from the ground up to take
advantage of NVM.
- BzTree: A High-Performance Latch-free Range Index for Non-Volatile Memory, VLDB 2018
- Write-Behind Logging, VLDB 2017
- How to Build a Non-Volatile Memory Database Management System, SIGMOD 2017 (Tutorial)
- Let’s Talk About Storage & Recovery Methods for Non-Volatile Memory Database Systems, SIGMOD 2015
Self-Driving Database Management Systems
We are pursuing a new research direction on designing a self-driving DBMS.
Tuning modern DBMSs for a particular workload is a laborious and error-prone
task due to the long and growing list of knobs that these systems expose.
If the DBMS could automatically tune itself, then it would remove many of
the complications and costs involved with its deployment. Our research focuses
on designing new algorithms that allow the DBMS to tune itself.
We apply techniques from machine learning to tune the physical design of the
database to accelerate query processing.