Research

Large-Scale Machine Learning (Distributed TensorFlow)
  
 
     We aim to build a fast, robust, and flexible machine learning system. Our ML system builds on top of TensorFlow and extends it with support for model parallelism, elastic computation, and approximate training and model serving. Our current focus is on improving deep/reinforcement learning algorithms in distributed settings. Other research interests include interactive and incremental machine learning, and support for efficient analysis of multi-dimensional data.
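As a concrete illustration of the data-parallel side of such a system, here is a minimal pure-Python sketch of the parameter-server pattern that distributed training builds on. All names are illustrative assumptions, and the concurrent workers are simulated sequentially; a real system would run them as separate processes pushing gradients over the network.

```python
# Sketch of data-parallel training with a parameter server (the
# pattern distributed TensorFlow builds on). Names are illustrative.

class ParameterServer:
    def __init__(self, dim):
        self.weights = [0.0] * dim

    def apply_gradients(self, grads, lr=0.02):
        # Workers push gradients; the server updates the shared weights.
        self.weights = [w - lr * g for w, g in zip(self.weights, grads)]

def worker_gradient(weights, shard):
    # Each worker computes gradients on its own data shard
    # (here: gradient of mean squared error for y = w . x).
    dim = len(weights)
    grads = [0.0] * dim
    for x, y in shard:
        pred = sum(w * xi for w, xi in zip(weights, x))
        err = pred - y
        for i in range(dim):
            grads[i] += 2 * err * x[i] / len(shard)
    return grads

# Two "workers", each holding a shard of y = 3*x data.
shards = [[([1.0], 3.0), ([2.0], 6.0)], [([3.0], 9.0), ([4.0], 12.0)]]
ps = ParameterServer(dim=1)
for step in range(200):
    for shard in shards:  # in reality these run concurrently
        ps.apply_gradients(worker_gradient(ps.weights, shard))
print(round(ps.weights[0], 2))  # -> 3.0, the true slope
```

In a real deployment the two shard updates race against each other; tolerating that staleness is exactly what makes the asynchronous variant "approximate" training.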


Big Data Systems
   We are currently working on three big data systems: a distributed SQL engine for large-scale data analytics, a disk-based graph analysis system, and a query language for large-scale graph processing.

Distributed SQL Engine for Large-Scale Data Analytics
    For the distributed SQL engine project, we aim to build a fast SQL engine for heterogeneous storage devices. We extend the state-of-the-art SQL engine, Presto, with intelligent resource scheduling, elastic data processing, and support for data analytics. Our current focus is to take advantage of emerging technologies (non-volatile memory) as well as existing memory devices (SSDs/SSD arrays); we work on efficiently exploiting these heterogeneous memory devices in distributed clusters.
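To illustrate one policy question such an engine faces, here is a toy sketch of placing table partitions across storage tiers by access frequency. The tier names, capacities, and greedy policy are illustrative assumptions for this sketch, not Presto's actual mechanism.

```python
# Toy sketch of tiered data placement: hot partitions go to the
# fastest device with remaining capacity. Tiers and sizes are
# illustrative, not an actual Presto policy.

def place_partitions(partitions, tiers):
    """partitions: list of (name, access_count);
    tiers: list of (tier_name, capacity_in_partitions), fastest first."""
    placement = {}
    ranked = iter(sorted(partitions, key=lambda p: p[1], reverse=True))
    for tier, capacity in tiers:
        for _ in range(capacity):
            try:
                name, _count = next(ranked)
            except StopIteration:
                return placement  # all partitions placed
            placement[name] = tier
    return placement

parts = [("orders_2024", 900), ("orders_2023", 120), ("orders_2020", 3)]
tiers = [("NVM", 1), ("SSD", 1), ("HDD", 10)]
placement = place_partitions(parts, tiers)
print(placement)  # hottest partition on NVM, coldest on HDD
```

A real engine would make this decision online, re-ranking partitions as access statistics change rather than once up front.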
    We are also interested in accelerating data science tasks with fast system response time, intelligent query suggestions, and high-level abstractions for data analysis.
  


SociaLite: 
High-level query language for large-scale graph processing

SociaLite Talk at Hadoop Summit

     SociaLite is a high-level query language for large-scale graph analysis. SociaLite is based on Datalog, extended both theoretically and practically to make large-scale graph analysis possible. For example, SociaLite supports aggregate functions inside recursive queries as long as they are meet operators; these functions can prune unnecessary computations for faster convergence. Also, its tail-nested tables store graphs compactly for faster data access. With these two optimizations, SociaLite is more than 30 times faster than state-of-the-art Datalog engines, including LogicBlox, a commercial Datalog system. Compared with other distributed frameworks for graph algorithms, SociaLite is more than two orders of magnitude faster than Hadoop and HaLoop, and an order of magnitude faster than Hama and Giraph.
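The recursive-aggregate optimization can be illustrated in plain Python: with a MIN aggregate (a meet operator) evaluated inside the recursion, derivations longer than the best-known distance are pruned rather than enumerated. The sketch below is our own illustration of the semantics, not SociaLite syntax.

```python
# Shortest-path distances with a MIN aggregate inside the recursion:
# the dist map plays the role of the recursive relation, and any
# derivation worse than the current minimum is pruned immediately.

import heapq

def shortest_dist(edges, source):
    dist = {source: 0}
    frontier = [(0, source)]
    while frontier:
        d, node = heapq.heappop(frontier)
        if d > dist.get(node, float("inf")):
            continue  # pruned: a shorter derivation already exists
        for nbr, weight in edges.get(node, []):
            nd = d + weight
            if nd < dist.get(nbr, float("inf")):  # MIN keeps the meet
                dist[nbr] = nd
                heapq.heappush(frontier, (nd, nbr))
    return dist

graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 1)], "c": []}
print(shortest_dist(graph, "a"))  # {'a': 0, 'b': 1, 'c': 2}
```

Without the in-recursion MIN, a Datalog engine would first materialize every path length and only then aggregate, which is exactly the work this pruning avoids.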


Secure Internet of Things (ThingEngine)
 
Helping the world connect the dots, with an open source platform for the Internet of Things

    ThingEngine is our new research project (started in 2015) in the area of the Internet of Things. ThingEngine is an open source implementation of a distributed IoT control hub and personal data store. The main goal of this project is to design an IoT system that gives users ownership of their own data; the system also aims to provide a user-friendly interface that helps users make use of that data. To this end, ThingEngine is designed as a distributed system able to run on a variety of devices, including shared cloud services, private cloud servers, private home servers such as the Raspberry Pi, and even mobile phones, giving users the flexibility of choosing the storage solution they prefer.
 
    Access control is also at the core of ThingEngine, making it possible by design to share data with multiple parties such as doctors and health institutions, in a manner as seamless as possible, while retaining the privilege of revocation. All data is grouped by access control, and parties can subscribe to changes only in the data they have access to at any given time.
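A minimal sketch of this design, with illustrative class and method names: records are grouped by access-control group, and a subscriber receives an update only if it is authorized for that group at delivery time, so revocation takes effect immediately.

```python
# Sketch of ACL-grouped publish/subscribe: authorization is checked
# per delivery, so revoking access silently stops future updates.
# All names here are illustrative, not ThingEngine's actual API.

class DataStore:
    def __init__(self):
        self.acl = {}          # group -> set of authorized parties
        self.subscribers = {}  # group -> list of (party, callback)

    def grant(self, group, party):
        self.acl.setdefault(group, set()).add(party)

    def revoke(self, group, party):
        self.acl.get(group, set()).discard(party)

    def subscribe(self, group, party, callback):
        self.subscribers.setdefault(group, []).append((party, callback))

    def publish(self, group, record):
        for party, callback in self.subscribers.get(group, []):
            if party in self.acl.get(group, set()):  # checked per update
                callback(record)

store = DataStore()
seen = []
store.grant("health", "doctor")
store.subscribe("health", "doctor", seen.append)
store.publish("health", {"hr": 72})
store.revoke("health", "doctor")   # revocation is immediate
store.publish("health", {"hr": 75})
print(seen)  # only the pre-revocation record was delivered
```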

 
    Additionally, a goal of ThingEngine is to be programmable by educated people without a computer science background (such as doctors and caregivers), so that programming logic can be distributed through a crowdsourcing platform (called ThingPedia) to all endpoints running the engine. To this end, we are designing a declarative domain-specific language with an intuitive syntax but also a powerful set of semantics that covers most, if not all, of the use cases of IoT control hubs.

 
    We also do not want to sacrifice the ability to do rich static analysis and automated code generation, with the ultimate goal of automatically suggesting the most useful programs based on observed behavior patterns, as well as inferring the desired pattern from the shared collection based on natural language analysis, using the Sabrina chat interface.


UniFi
 
Dimension Inference and Checking for Object-Oriented Programs.
 
    UniFi is an automatic dimension inference system for Java programs. It uses type inference techniques to infer relationships between the dimensions of variables in a program. It then tries to find dimensionality errors automatically by comparing the inferred dimensions across different versions of a program (or two different programs that have something in common). (This work is done in collaboration with Sudheendra Hangal.)
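The core checking rule can be sketched in a few lines: represent each variable's dimension as a map from base units to integer exponents, propagate it through multiplication, and reject addition of unlike dimensions. This toy Python version only illustrates the rule; UniFi itself analyzes Java and infers the dimensions rather than requiring annotations.

```python
# Toy sketch of dimension checking (assumed names, not UniFi's API):
# a dimension is a map from base units to integer exponents.

def mul_dim(d1, d2):
    # Multiplication adds exponents: m * s^-1 -> {"m": 1, "s": -1}.
    out = dict(d1)
    for unit, exp in d2.items():
        out[unit] = out.get(unit, 0) + exp
        if out[unit] == 0:
            del out[unit]  # drop cancelled units
    return out

def add_dim(d1, d2):
    # Addition is legal only between identical dimensions.
    if d1 != d2:
        raise TypeError("dimension mismatch: %s + %s" % (d1, d2))
    return dict(d1)

length = {"m": 1}
per_second = {"s": -1}
velocity = mul_dim(length, per_second)
print(velocity)                      # {'m': 1, 's': -1}
print(add_dim(velocity, velocity))   # fine: same dimension
try:
    add_dim(velocity, length)        # flagged: adding m/s to m
except TypeError as err:
    print("error:", err)
```

UniFi's twist is that the base dimensions themselves are inferred as type variables, so two program versions can be compared even when neither carries explicit unit annotations.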