May 3-6, 2015, Asilomar, California

Yedalog: Exploring Knowledge at Scale

Brian Chin, Daniel von Dincklage, Vuk Ercegovac, Peter Hawkins, Mark Miller, Franz Och, Christopher Olston, Fernando Pereira

Now that data processing frameworks have made huge progress, human programmers are frequently the bottleneck when analyzing large repositories of data. We introduce Yedalog, a declarative programming language that lets programmers mix data-parallel pipelines and computation seamlessly in a single language. By contrast, most existing tools for data-parallel computation embed a sublanguage of data-parallel pipelines in a general-purpose language, or vice versa. Yedalog extends Datalog, incorporating not only computational features from logic programming but also features for working with data structured as nested records. Yedalog programs can run both on a single machine and distributed across a cluster in batch and interactive modes, allowing programmers to mix different modes of execution easily.
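
To make the Datalog flavor concrete, here is a minimal Python sketch of a recursive Datalog-style rule evaluated over nested records. It is not Yedalog syntax and does not reflect how Yedalog is implemented; the record layout, field names, and function names are hypothetical and chosen only for illustration.

```python
# Hypothetical nested records; the field names are illustrative only.
people = [
    {"name": "ann", "family": {"parent": "bob"}},
    {"name": "bob", "family": {"parent": "cat"}},
    {"name": "cat", "family": {"parent": None}},
]


def parent_facts(records):
    """Base relation parent(Child, Parent), read from a nested field."""
    return {
        (r["name"], r["family"]["parent"])
        for r in records
        if r["family"]["parent"] is not None
    }


def ancestors(parent):
    """Naive fixpoint evaluation of the Datalog rules:

        ancestor(X, Y) :- parent(X, Y).
        ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    """
    result = set(parent)
    while True:
        # Join parent(X, Y) with ancestor(Y, Z) on the shared variable Y.
        derived = {
            (x, z)
            for (x, y) in parent
            for (y2, z) in result
            if y == y2
        }
        new = derived - result
        if not new:
            return result  # No new facts: the fixpoint is reached.
        result |= new


ancestor = ancestors(parent_facts(people))
```

The rules state only what holds; the loop that repeats the join until no new facts appear is the kind of evaluation strategy a declarative language leaves to its engine rather than to the programmer.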