The Future of Real-time in Spark. Keynote at Spark Summit. New York City, Feb 2016.
Spark 2.0: What's Next. Keynote at Hadoop and Spark Conference. Tokyo, Feb 2016.
Deep Dive into Project Tungsten: Discusses how we are rearchitecting the Spark backend to achieve 10X performance improvements. Hadoop and Spark Conference. Tokyo, Feb 2016.
State of Spark, and where it is going. Keynote at Strata Hadoop World Asia, covering Spark use cases in Asia. Singapore, Dec 2015.
A look ahead at Spark’s development. Keynote at Spark Summit Europe. Amsterdam, Oct 2015.
Sketching Big Data with Spark: Discusses randomized and sketch algorithms for large-scale data analytics, including Bloom filters, frequent items, and stratified sampling. Strata NYC, Oct 2015.
Spark DataFrames for Large-scale Data Science. DataFrame introduction at Bay Area Spark User Meetup. Feb 2015.
Interfaces, Interfaces, Interfaces. On interface design at Databricks Retreat. Aug 2014.
Big Data and Distributed Data Processing. Guest lecture at Stanford's CS145 (Introduction to Databases). Dec 2017.
Apache Spark and Scala. Keynote at Scala Symposium on the synergy between the two. Oct 2017.
Introduction to Spark. Guest lecture at Stanford's CS347 (Parallel and Distributed Data Management). May 2015.
Big Data Analytics Systems: What Goes Around Comes Around. Guest lecture at Berkeley's CS186 (Database Systems). Apr 2015.
Readings in Databases: I maintain an online list of papers essential to understanding database systems.