Project: Fine-grained analysis of data provenance in expressive queries
Basic data
Title:
Fine-grained analysis of data provenance in expressive queries (German original: Fein-granulare Analyse der Datenherkunft in ausdrucksstarken Anfragen)
Duration:
01/09/2018 to 31/08/2021
Abstract / short description:
Data provenance uncovers how database queries transform, filter, merge, and aggregate input data to arrive at the final output. With today’s steep growth in both data volume and query complexity, the inner workings of a query quickly become hard to assess and validate: Where in the input did this piece of output originate? Why did the query emit this item but omit another? How did the query produce this result value, and exactly which query constructs participated in the evaluation? Data provenance answers these and further questions. The answers explain query internals (and bugs), aid in data quality assessments, and help to build trust in query results, a critical service to data-dependent science and society.
With provenance, we shift a query’s focus from values and their transformation to the dependencies between output and input data. This research proposal is built on the central hypothesis that abstract interpretation provides an ideal framework in which to reason about, and to implement, this shift of focus. Abstract interpretation, a program analysis discipline first established in the 1970s, ignores all but one or a few selected aspects of a program’s evaluation. This project will adapt these ideas to develop a view of queries and programs in which input/output dependencies, rather than values, assume the primary role.
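The shift from values to dependencies can be illustrated with a minimal sketch. It is not the project's actual method: the tiny expression language and the names `Cell`, `Lit`, `BinOp`, `eval_value`, and `eval_deps` are assumptions made up for this example. The same expression is interpreted twice, once over concrete values and once over an abstract domain of input-cell sets in which values are ignored entirely.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Union


@dataclass(frozen=True)
class Cell:
    """Reference to one input table cell (hypothetical representation)."""
    table: str
    row: int
    column: str


@dataclass(frozen=True)
class Lit:
    """A literal constant; it depends on no input cell."""
    value: float


@dataclass(frozen=True)
class BinOp:
    """Binary arithmetic; op is "+" or "*"."""
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Cell, Lit, BinOp]


def eval_value(expr: Expr, db: Dict[Cell, float]) -> float:
    """Concrete interpretation: compute the value of the expression."""
    if isinstance(expr, Cell):
        return db[expr]
    if isinstance(expr, Lit):
        return expr.value
    left = eval_value(expr.left, db)
    right = eval_value(expr.right, db)
    return left + right if expr.op == "+" else left * right


def eval_deps(expr: Expr) -> FrozenSet[Cell]:
    """Abstract interpretation: ignore values, track only input cells."""
    if isinstance(expr, Cell):
        return frozenset({expr})
    if isinstance(expr, Lit):
        return frozenset()
    # Any binary operator simply merges the dependencies of its operands.
    return eval_deps(expr.left) | eval_deps(expr.right)


price = Cell("items", 0, "price")
qty = Cell("items", 0, "qty")
db = {price: 10.0, qty: 3.0}

total = BinOp("+", BinOp("*", price, qty), Lit(5.0))
value = eval_value(total, db)  # concrete result: 35.0
deps = eval_deps(total)        # the two input cells `price` and `qty`
```

Note that `eval_deps` never consults the database: in this toy setting, the dependency view is obtained purely by reinterpreting the operators.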
The benefits of data provenance grow with the complexity of the query logic it is able to explain. We set out to derive provenance for advanced query language constructs and idioms such as deep nesting, sliding windows, user-defined and built-in functions, and recursion. It is a core goal to embrace practically relevant and complex languages, like modern variants of SQL, where prior work exhibited significant restrictions. We will capitalize on the flexibility of abstract interpretation and design abstract domains that explain provenance at various levels of data granularity, down to individual atomic values (table cells, say). Further adaptations of the abstract domain and the query interpretation rules will allow the exploration of new and notoriously difficult types of data provenance (e.g., the provenance of values absent from the output). Abstract interpretation is both a powerful theoretical tool and a practical one. Building on its practical side, we will study parallel provenance derivation for queries over large data volumes and the seamless integration of data provenance into the query compilers of existing modern database systems.
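To make cell-level granularity concrete, the following sketch derives provenance for a filter-and-aggregate query, roughly `SELECT SUM(price) FROM items WHERE qty > 1`. The table `items`, the function `sum_price_where_qty_gt`, and the split into value and decision dependencies are illustrative assumptions for this example, not the project's design.

```python
from typing import Dict, List, Set, Tuple

# (row index, column name) identifies one cell of the hypothetical table `items`.
CellRef = Tuple[int, str]

items: List[Dict[str, int]] = [
    {"qty": 1, "price": 10},
    {"qty": 3, "price": 20},
    {"qty": 5, "price": 30},
]


def sum_price_where_qty_gt(
    rows: List[Dict[str, int]], threshold: int
) -> Tuple[int, Set[CellRef], Set[CellRef]]:
    """Evaluate the query and record which cells the result depends on.

    value_deps:    cells whose values flow into the sum.
    decision_deps: cells read only to decide whether a row qualifies.
    """
    total = 0
    value_deps: Set[CellRef] = set()
    decision_deps: Set[CellRef] = set()
    for i, row in enumerate(rows):
        # The predicate inspects `qty` of every row, qualifying or not.
        decision_deps.add((i, "qty"))
        if row["qty"] > threshold:
            total += row["price"]
            value_deps.add((i, "price"))
    return total, value_deps, decision_deps


total, value_deps, decision_deps = sum_price_where_qty_gt(items, 1)
# total is 50: only rows 1 and 2 satisfy qty > 1.
```

Row 0 contributes no `price` cell to the result, yet its `qty` cell still appears among the decision dependencies: changing it could make the row qualify and alter the sum.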
Keywords:
query languages
data provenance analysis
query transformation and translation
expressive queries
query debugging
abstract interpretation
program analysis
Involved staff
Managers
Faculty of Science
University of Tübingen
Wilhelm Schickard Institute of Computer Science (WSI)
Department of Informatics, Faculty of Science
Local organizational units
University of Tübingen
Funders
Bonn, Nordrhein-Westfalen, Germany