Symposium on New Frontiers in Knowledge Compilation
ORGANIZATION
This symposium is supported by VCLA and the Wolfgang Pauli Institute.
Symposium chairs:
Pierre Marquis (CRIL-CNRS/Université d’Artois, France)
Stefan Szeider (TU Wien, Austria)
AIMS AND SCOPE
Many reasoning problems in artificial intelligence and automated deduction are computationally intractable in general. Various approaches have been devised to deal with this complexity by exploiting additional conditions that are satisfied in the domain of application at hand. One example is the case where many instances share a common part, which can be preprocessed once and reused across instances. Knowledge compilation is devoted to understanding the potential and the limits of preprocessing, both in computational models and in concrete applications.
In typical reasoning scenarios (for instance, automated configuration) many instances share a common background knowledge that is not very often subject to change. This “constant” piece of information can be compiled into a format that allows for more efficient reasoning, once the “varying” part of the instances becomes available. The time needed to compile the background knowledge together with the cumulative time needed to solve a sequence of instances with the compiled background knowledge can be lower (sometimes by orders of magnitude) than the cumulative time needed to solve the sequence of instances with the non-compiled background knowledge. Consequently, sometimes a relatively expensive compilation process may be worthwhile if its results are amortized by repeated use of the compiled knowledge.
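As a toy illustration of this amortization pattern (hypothetical code, not tied to any particular compilation language), the background knowledge below is "compiled" by enumerating its models once; each subsequent clause-entailment query then reduces to a cheap scan over the stored models:

```python
from itertools import product

def models(cnf, n):
    """Enumerate all models of a CNF over variables 1..n.
    A clause is a set of literals; literal +i means x_i is true, -i false."""
    return [bits for bits in product([False, True], repeat=n)
            if all(any(bits[abs(l) - 1] == (l > 0) for l in cl) for cl in cnf)]

def entails(compiled_models, clause):
    """The CNF entails a clause iff every stored model satisfies it."""
    return all(any(m[abs(l) - 1] == (l > 0) for l in clause)
               for m in compiled_models)

# "constant" background knowledge: (x1 or x2) and (not x1 or x3)
background = [{1, 2}, {-1, 3}]
compiled = models(background, 3)   # expensive step, performed once

# many cheap queries against the compiled form:
print(entails(compiled, {2, 3}))   # True: the background entails (x2 or x3)
print(entails(compiled, {3}))      # False: x3 alone is not entailed
```

Here the exponential enumeration cost is paid once, while each query runs in time linear in the number of stored models.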
Pioneered more than two decades ago, knowledge compilation is now a very active field of research. The aim of this symposium is to bring together researchers who work on knowledge compilation from various angles, including knowledge representation, constraints, theory of algorithms, complexity, machine learning, and databases, as well as researchers from related areas. Lectures and discussions will put all these different approaches into context and will stimulate a fruitful exchange of ideas between researchers from different fields.
FORMAT
The symposium will feature invited talks and provide opportunities for all participants to engage in discussions on open problems and future directions.
Participation is by invitation only. Registration is free of charge and includes attendance at all symposium events and lectures, coffee breaks, and lunches.
The invited talks will be open to the public. Locals who are interested in individual talks but do not wish to attend the entire symposium (and the social program) do not need to register.
INVITED SPEAKERS
- Simone Bova (TU Wien, Austria)
- Guy Van den Broeck (University of California Los Angeles, USA)
- Hubie Chen (Universidad del País Vasco and Ikerbasque, Spain)
- Adnan Darwiche (University of California Los Angeles, USA)
- Hélène Fargier (IRIT-CNRS, Université Paul Sabatier, France)
- Frédéric Koriche (CRIL-CNRS, Université d’Artois, France)
- Stefan Kratsch (Universität Bonn, Germany)
- João Marques-Silva (IST/INESC-ID, Portugal and University College Dublin, Ireland)
- Igor Razgon (Birkbeck University of London, UK)
- Dan Suciu (University of Washington, USA)
DATE AND VENUE
June 4-6, 2015
Vienna University of Technology
Zemanek seminar room (ground floor)
Favoritenstraße 9-11
1040 Vienna, Austria
SCHEDULE
June 4th:
10:20-11:10 Adnan Darwiche: Knowledge Compilation and Machine Learning: A New Frontier.
11:10-12:00 Frédéric Koriche: Affine Decision Trees.
12:00-13:00 Lunch break (sandwiches)
13:00-13:50 Hubie Chen: Parameter Compilation.
13:50-14:15 Ronald de Haan: Parameterized Compilability.
14:15-14:40 Umut Oztok: Exhaustive DPLL for Model Counting and Knowledge Compilation.
14:40-15:05 Friedrich Slivovsky: On Compiling CNFs into Structured Deterministic DNNFs.
15:05-15:35 Coffee break
15:35-16:25 Hélène Fargier: A KC Map of Valued Decision Diagrams – application to product configuration.
16:25-16:50 Alexandre Niveau: Towards a knowledge compilation map for heterogeneous representation languages.
16:50-17:10 Adnan Darwiche: Beyond NP: Keeping up with solvers that reach beyond NP!
June 5th:
9:30-10:20 João Marques-Silva: Prime Compilation of Non-Clausal Formulae.
10:20-10:45 Laurent Simon: SAT and Knowledge Compilation: a Just-in-Time Approach.
10:45-11:15 Coffee break
11:15-11:40 Ondrej Cepek: Complexity aspects of CNF to CNF compilation.
11:40-12:05 Oliver Kullmann: A measured approach towards “good representations”.
12:05-13:30 Lunch break (in the garden if the weather is good, otherwise in Seminar room 388 on first floor)
13:30-14:20 Igor Razgon: On the relationship between Non-deterministic read-once branching programs and DNNFs.
14:20-15:10 Simone Bova: A Strongly Exponential Separation of DNNFs from CNFs.
18:30 Bus departure for restaurant from Favoritenstraße 12 (in front of Hotel Johann Strauß, directly opposite our computer science building)
19:00-22:00 Dinner at Heuriger Werner Welser (Probusgasse 12, 1190 Vienna)
22:00 Bus departs for Favoritenstraße 12, 1040 Vienna
June 6th:
9:15-10:05 Stefan Kratsch: Kernelization: Efficient Preprocessing for NP-hard Problems.
10:05-10:30 Dan Olteanu: Factorized Databases.
10:30-10:45 Short break
10:45-11:35 Guy Van den Broeck: First-Order Knowledge Compilation for Probabilistic Reasoning.
11:35-12:25 Dan Suciu: Query Compilation: the View from the Database Side.
12:30 End of symposium; joint lunch at Restaurant Ischia (Mozartplatz 1, 1040 Vienna) until ~13:30
TALKS
- Simone Bova: A Strongly Exponential Separation of DNNFs from CNFs. [abstract]
Abstract.
Decomposable Negation Normal Forms (DNNFs) are Boolean circuits in negation normal form where the subcircuits leading into each AND gate are defined on disjoint sets of variables. We prove a strongly exponential lower bound on the size of DNNFs for a class of CNF formulas built from expander graphs. As a corollary, we obtain a strongly exponential separation between DNNFs and CNF formulas in prime implicates form. This settles an open problem in the area of knowledge compilation (Darwiche and Marquis, 2002).
This is joint work with Florent Capelli (Universite Paris Diderot), Stefan Mengel (Ecole Polytechnique), and Friedrich Slivovsky (Technische Universitat Wien).
[slides]
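The decomposability condition defined in the abstract is easy to check operationally. As a hypothetical sketch (the nested-tuple circuit encoding is our own, not from the paper), the following code verifies that every AND gate has children over pairwise disjoint variable sets:

```python
def variables(node):
    """Set of variables mentioned in an NNF circuit given as nested tuples:
    ('lit', i) for a literal (+i / -i), ('and', ...) / ('or', ...) for gates."""
    if node[0] == 'lit':
        return {abs(node[1])}
    return set().union(*(variables(child) for child in node[1:]))

def is_decomposable(node):
    """True iff every AND gate has children over pairwise disjoint variables."""
    if node[0] == 'lit':
        return True
    if not all(is_decomposable(child) for child in node[1:]):
        return False
    if node[0] == 'and':
        seen = set()
        for child in node[1:]:
            vs = variables(child)
            if seen & vs:          # shared variable: decomposability fails
                return False
            seen |= vs
    return True

good = ('and', ('lit', 1), ('lit', -2))                    # x1 AND NOT x2
bad = ('and', ('lit', 1), ('or', ('lit', 1), ('lit', 2)))  # both sides share x1
```

Checking the property is easy; the lower bound above shows that *achieving* it can force strongly exponential circuit size.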
- Guy Van den Broeck: First-Order Knowledge Compilation for Probabilistic Reasoning. [abstract]
Abstract.
The popularity of knowledge compilation for probabilistic reasoning is due to the realization that two properties of logical sentences, determinism and decomposability, permit efficient (weighted) model counting. This insight has led to state-of-the-art probabilistic reasoning algorithms for graphical models, statistical relational models, and probabilistic databases, all based on knowledge compilation to either d-DNNF, OBDD, or SDD. The statistical relational and probabilistic database formalisms are probabilistic extensions of first-order logic. When counting the models of a first-order logic sentence, however, these insightful properties are missing. We even lack a formal language to describe and reason about such representations at the first-order level, in the context of knowledge compilation.
To this end, we propose group logic, which extends function-free first-order logic to give groups (i.e., sets of objects) the same status as objects. Group logic facilitates the expression and identification of sentences whose models can be counted efficiently. Indeed, it allows us to lift decomposability and determinism properties to the first-order case, and introduce a new requirement, called automorphism, that is specific to first-order sentences.
[slides]
- Ondrej Cepek: Complexity aspects of CNF to CNF compilation. [abstract]
Abstract.
Knowledge compilation usually deals with transforming some input representation of given knowledge into some other type of representation on the output. In this talk we concentrate on compilation where both the input and the output representation are of the same type, namely the CNF format. In this case the purpose of the compilation process is to add clauses to the input CNF in order to improve its inference properties. We will look at this process in more detail and study its complexity.
[slides]
- Hubie Chen: Parameter Compilation. [abstract]
Abstract.
In resolving instances of a computational problem, if multiple instances of interest share a feature in common, it may be fruitful to compile this feature into a format that allows for more efficient resolution, even if the compilation is relatively expensive. In this talk, we introduce a complexity-theoretic framework for classifying problems according to their compilability, which includes complexity classes and a notion of reduction. The basic object in our framework is that of a parameterized problem, which here is a language along with a parameterization: a map which provides, for each instance, a so-called parameter on which compilation may be performed. Our framework is positioned within the paradigm of parameterized complexity, and our notions are relatable to established concepts in the theory of parameterized complexity. Indeed, we view our framework as playing a unifying role, integrating parameterized complexity and compilability theory. Prior to presenting the framework, we will provide some motivation by discussing our work on model checking existential positive queries (see http://arxiv.org/abs/1206.3902). The talk will be mainly based on the article available at http://arxiv.org/abs/1503.00260.
[slides]
- Adnan Darwiche: Knowledge Compilation and Machine Learning: A New Frontier. [abstract]
Abstract.
Knowledge compilation has seen much progress in the last decade, especially as work in this area has been normalized into a systematic study of tractable languages, their relative succinctness, and their efficient support for various queries. What has been particularly exciting is the impact that knowledge compilation has had on several areas, such as probabilistic reasoning and probabilistic databases. In this talk, I will discuss a new area, machine learning, which is bound to be significantly impacted by knowledge compilation. In particular, I will discuss recent work in which knowledge compilation has been used to learn probabilistic models under massive logical constraints, and over combinatorial objects, such as rankings and game traces. I will further identify and discuss three specific roles for knowledge compilation in machine learning, which arise in defining (a) more structured probability spaces, (b) more expressive queries, and (c) new types of datasets that significantly generalize the standard datasets used in the machine learning literature.
Joint work with Arthur Choi and Guy Van den Broeck.
[slides]
- Adnan Darwiche: Beyond NP: Keeping up with solvers that reach beyond NP! [abstract]
Abstract.
We will discuss in this presentation a new community website, BeyondNP.org, which is planned to launch later this summer. Beyond NP aims to disseminate and promote research on solvers that reach beyond NP, including model counters, knowledge compilers, QBF solvers, and function-problem solvers (e.g., MaxSAT, MUS, and MCS). Beyond NP will serve as a news and information aggregator for such solvers, including a catalog of open-source solvers, repositories of corresponding benchmarks, and news on related academic activities. The presentation aims to raise awareness about this initiative, to discuss its underlying vision and objectives, and to seek input and participation from the broader community.
- Hélène Fargier: A KC Map of Valued Decision Diagrams – application to product configuration. [abstract]
Abstract.
Valued decision diagrams (VDDs) are data structures that represent functions mapping variable-value assignments to non-negative real numbers. Existing languages in the VDD family, including ADD, AADD, and those of the SLDD family, seem to be valuable target languages for compiling utility functions, probability distributions and, in the domain of application we are interested in, cost functions over a catalog of configurable products. This talk first presents a compilation map of such structures and shows that many tasks that are hard on valued CSPs are actually tractable on VDDs.
Indeed, languages from the VDD family (especially ADD, SLDD, and AADD) benefit from polynomial-time algorithms for some tasks of interest (e.g., optimization) for which no polynomial-time algorithm exists when the input is the VCSP considered at the start. However, the efficiency of these algorithms is directly related to the size of the compiled formulae. The target languages and the heuristics under consideration have been tested on two families of benchmarks: additive VCSPs representing car configuration problems with cost functions, and multiplicative VCSPs representing Bayesian nets. It turns out that even if the AADD language is strictly more succinct (from the theoretical side) than SLDD$_{+}$ (resp. SLDD$_{\times}$), the language SLDD$_{+}$ (resp. SLDD$_{\times}$) proves to be good enough in practice when purely additive (resp. purely multiplicative) problems are to be compiled.
This talk is based on a joint work with Pierre Marquis, Alexandre Niveau and Nicolas Schmidt, partially supported by the project BR4CP ANR-11-BS02-008 of the French National Agency for Research:
Hélène Fargier, Pierre Marquis, Nicolas Schmidt: Semiring Labelled Decision Diagrams, Revisited: Canonicity and Spatial Efficiency Issues. IJCAI 2013.
Hélène Fargier, Pierre Marquis, Alexandre Niveau, Nicolas Schmidt: A Knowledge Compilation Map for Ordered Real-Valued Decision Diagrams. AAAI 2014.
[slides]
- Ronald de Haan: Parameterized Compilability. [abstract]
Abstract.
In the framework of Knowledge Compilation (KC), knowledge bases are preprocessed (or compiled) once in order to decrease the computational effort needed for performing queries on the knowledge base. However, in many cases such compilations lead to an exponential blow-up in the size of the knowledge base. Such an incompilability result occurs, for example, in the case of clause entailment (CE), where the knowledge base is a propositional formula and the queries consist of deciding whether a given clause is entailed by the formula. With the aim of relativizing such negative results, following work by Chen (IJCAI 2005), we extend the framework of KC with concepts from parameterized complexity, where structure in the input is captured by a problem parameter. In the resulting framework, we focus on fpt-size compilations whose size is polynomial in the input size, but can depend exponentially (or worse) on the problem parameter. We argue that this approach combines the power of KC and parameterized complexity. Concretely, for the problem of CE, we identify several parameters that allow the problem to be compiled in fpt-size. In addition, we provide evidence that for several other parameters, such compilations are not possible.
Joint work with: Simone Bova, Neha Lodha and Stefan Szeider.
[slides]
- Frédéric Koriche: Affine Decision Trees. [abstract]
Abstract.
Decision trees have received a great deal of interest in various areas of computer science. In this talk, we examine a family of tree-like languages which include decision trees as a special case. Notably, we investigate the class of “affine” decision trees (ADT), for which decision nodes are labeled by affine (xor) clauses, and its extension (EADT) to decomposable and-nodes. The key interest of this family is that (possibly conditioned) model counting can be solved in polynomial-time, by exploiting Gauss elimination. After presenting a knowledge compilation map for this family, we describe a top-down compiler “cnf2eadt”, together with comparative experimental results on various benchmarks for #SAT problems. We conclude by mentioning two current research perspectives: probabilistic inference with weighted EADTs, and structure learning of maximum likelihood EADTs.
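To illustrate why Gauss elimination makes counting tractable on affine clauses, here is a minimal sketch (our own illustration, unrelated to the cnf2eadt compiler) that counts the models of a pure conjunction of xor clauses: over GF(2), a consistent system of rank r over n variables has exactly 2^(n−r) solutions.

```python
def count_xor_models(equations, n):
    """Count assignments of n variables satisfying a conjunction of affine
    clauses. Each equation is (vars, rhs): x_{i1} xor ... xor x_{ik} = rhs.
    Each equation is packed into an (n+1)-bit integer (bit i = coefficient
    of x_i, bit n = right-hand side) and reduced by Gauss elimination over
    GF(2); a consistent system of rank r has 2**(n - r) solutions."""
    rows = [sum(1 << i for i in vs) | (int(rhs) << n) for vs, rhs in equations]
    rank = 0
    for col in range(n):
        pivot = next((i for i in range(rank, len(rows)) if rows[i] >> col & 1),
                     None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i] >> col & 1:
                rows[i] ^= rows[rank]
        rank += 1
    if any(row == 1 << n for row in rows):   # a reduced row reads "0 = 1"
        return 0
    return 2 ** (n - rank)

# x0 xor x1 = 1 over two variables: the two assignments where x0 != x1
print(count_xor_models([({0, 1}, 1)], 2))   # 2
```

The ADT/EADT languages in the talk interleave such affine reasoning with branching and decomposable conjunction, but the Gauss-elimination step above is the tractable core.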
- Stefan Kratsch: Kernelization: Efficient Preprocessing for NP-hard Problems. [abstract]
Abstract.
Efficient preprocessing is a widely applied opening move when faced with a combinatorially hard problem. The framework of parameterized complexity and its notion of kernelization offer a rigorous approach to understanding the capabilities of efficient preprocessing. In particular, it is possible to prove both upper and lower bounds on the output sizes that can be achieved by polynomial-time algorithms. Crucially, by using the perspective of parameterized complexity, these bounds can be given in relation to problem-specific parameters, whereas unless P = NP there can be no efficient algorithm that shrinks every instance of an NP-hard problem.
The talk will give an introduction to kernelization and cover several different problems like \textsc{Point Line Cover}, \textsc{$d$-Hitting Set}, and \textsc{Planar Steiner Tree}. We will discuss some recent examples of kernelizations that may be of particular interest to this meeting. Finally, we will briefly address the basic intuition behind lower bounds for kernelization.
- Oliver Kullmann: A measured approach towards “good representations”. [abstract]
Abstract.
I want to give an overview of the usage of “hardness measures” in the theory of representations of Boolean functions via CNFs. A special focus will be on the separation of classes (given by the levels of the hardness measures), showing that increasing various hardness measures enables much shorter representations. The measures we consider are closely related to SAT solving; that is, making the implicit knowledge explicit happens with SAT solvers in mind. This makes for good connections to proof complexity, but now in a stronger setting: satisfiable clause-sets are the target, and we wish to represent the underlying Boolean function as well as possible. “As well as possible” means that the hidden(!) unsatisfiable subinstances are as easy as possible. Since we are aiming at making life easier for SAT solvers, the concrete nature of the hardness measures becomes important, different from general Knowledge Compilation, where one uses whatever polynomial time offers.
[slides]
- João Marques-Silva: Prime Compilation of Non-Clausal Formulae. [abstract]
Abstract.
Formula compilation by generation of prime implicates or implicants finds a wide range of applications in AI. Recent work on formula compilation by prime implicate/implicant generation often assumes a Conjunctive/Disjunctive Normal Form (CNF/DNF) representation. However, in many settings propositional formulae are naturally expressed in non-clausal form. Despite a large body of work on compilation of non-clausal formulae, in practice existing approaches can only be applied to fairly small formulae, containing at most a few hundred variables. This talk describes two novel approaches, based on propositional Satisfiability (SAT) solving, for the compilation of non-clausal formulae into either prime implicants or prime implicates. These novel algorithms also find application when computing all prime implicates of a CNF formula. The proposed approach is shown to allow the compilation of non-clausal formulae of size significantly larger than existing approaches can handle.
[slides]
- Alexandre Niveau: Towards a knowledge compilation map for heterogeneous representation languages. [abstract]
Abstract.
The knowledge compilation map introduced by Darwiche and Marquis takes advantage of a number of concepts (mainly queries, transformations, expressiveness, and succinctness) to compare the relative adequacy of representation languages to some AI problems. However, the framework is limited to the comparison of languages that are interpreted in a homogeneous way (formulas are interpreted as Boolean functions). This prevents one from comparing, on a formal basis, languages that are close in essence, such as OBDD, MDD, and ADD. To fill the gap, we present a generalized framework in which formally comparing heterogeneous representation languages becomes feasible. In particular, we explain how the key notions of queries and transformations, expressiveness, and succinctness can be lifted to the generalized setting.
The talk is based on the IJCAI’13 paper by Fargier, Marquis, and Niveau.
[slides]
- Dan Olteanu: Factorized Databases. [abstract]
Abstract.
I will overview recent work on compilation of join queries (First Order formulas with conjunction and existential quantification) into lossless factorized representations. The primary motivation for this compilation is to avoid redundancy in the representation of results (satisfying assignments) of queries in relational databases. The relationship between a relation encoded as a set of tuples and an equivalent factorized representation is on a par with the relationship between propositional formulas in disjunctive normal form and their equivalent nested formulas obtained by algebraic factorization.
For any fixed join query, we give asymptotically tight bounds on the size of their factorized results by exploiting the structure of the query, and we quantify the size gap between factorized and standard relational representation of query results. Factorized databases allow for constant-delay enumeration of represented tuples and provide efficient support for subsequent queries and analytics, such as linear regression.
Joint work with Jakub Zavodny.
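A minimal sketch of the idea (hypothetical relations, not the system from the talk): factorizing the result of a single join on its join attribute already shows the redundancy a flat tuple listing incurs, while still supporting simple enumeration of the represented tuples.

```python
# relations R(A,B) and S(B,C); flat join result vs. factorized form
R = [(a, b) for a in range(3) for b in range(2)]
S = [(b, c) for b in range(2) for c in range(4)]
flat = [(a, b, c) for (a, b) in R for (b2, c) in S if b == b2]

# factorize on the join attribute B: each A-branch and C-branch stored once
factorized = {b: ([a for (a, b2) in R if b2 == b],
                  [c for (b2, c) in S if b2 == b])
              for b in range(2)}

def enumerate_tuples(f):
    """Constant-delay enumeration of the tuples a factorized result represents."""
    for b, (a_vals, c_vals) in f.items():
        for a in a_vals:
            for c in c_vals:
                yield (a, b, c)

# the flat result lists 24 tuples (72 values); the factorized form stores
# 2 * (3 + 4) = 14 branch values yet represents exactly the same tuples
print(len(flat))                                             # 24
print(sorted(flat) == sorted(enumerate_tuples(factorized)))  # True
```

This mirrors the DNF-versus-factorized-formula analogy in the abstract: the flat result plays the role of the DNF, the grouped form that of its algebraic factorization.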
- Umut Oztok: Exhaustive DPLL for Model Counting and Knowledge Compilation. [abstract]
Abstract.
DPLL-based methods have played a crucial role in the success of modern SAT solvers, and it is also known that running DPLL-based methods to exhaustion can yield model counters and knowledge compilers. However, a clear semantics of exhaustive DPLL and a corresponding proof of correctness have been lacking, especially in the presence of techniques such as clause learning and component caching. This seems to have hindered progress on model counting and knowledge compilation, leading to a limited number of corresponding systems, compared to the variety of DPLL-based SAT solvers. In this talk, we will present an exhaustive DPLL algorithm with a formal semantics and a corresponding proof of correctness, showing how it can be used for both model counting and knowledge compilation. The presented algorithm is based on a formal framework that abstracts primitives used in SAT solvers in a manner that makes them suitable for use in an exhaustive setting. We will also introduce an upcoming open-source package that implements this framework, which aims to provide the community with a new basis for furthering the development of model counters and knowledge compilers based on exhaustive DPLL.
Joint work with Adnan Darwiche.
[slides]
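As a minimal illustration of the idea (without the clause learning and component caching the talk addresses, and unrelated to the announced package), the following sketch runs DPLL-style branching to exhaustion to count models:

```python
def count_models(cnf, n, assignment=None):
    """#SAT by exhaustive DPLL-style branching. A clause is a list of
    literals; +i / -i means variable i (in 1..n) is true / false."""
    if assignment is None:
        assignment = {}
    open_clauses = []
    for clause in cnf:
        if any(assignment.get(abs(l)) == (l > 0) for l in clause):
            continue                       # clause already satisfied
        rest = [l for l in clause if abs(l) not in assignment]
        if not rest:
            return 0                       # clause falsified: dead branch
        open_clauses.append(rest)
    if not open_clauses:
        # every clause satisfied: all unassigned variables are free
        return 2 ** (n - len(assignment))
    var = abs(open_clauses[0][0])          # branch on a variable from an open clause
    return sum(count_models(open_clauses, n, {**assignment, var: value})
               for value in (False, True))

print(count_models([[1, 2]], 2))           # 3 models of (x1 or x2)
print(count_models([[1], [-1]], 1))        # 0: unsatisfiable
```

Unlike a SAT solver, the search does not stop at the first satisfying branch; giving such exhaustive runs a formal semantics in the presence of learning and caching is exactly the subject of the talk.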
- Igor Razgon: On the relationship between Non-deterministic read-once branching programs and DNNFs. [abstract]
Abstract.
This talk consists of two parts. In the first part I will present a result published in (Razgon, IPEC 2014) stating that for each $k$ there is an infinite class of monotone 2-CNFs of primal graph treewidth at most $k$ for which the equivalent non-deterministic read-once branching programs (NROBPs) require space $\Omega(n^{k/c})$ for some constant $c$. Then I will show that, essentially, replacing $k$ with $\log n$ yields a class of monotone 2-CNFs with pseudopolynomial space complexity of the equivalent NROBPs. Using a well-known result of Darwiche on the fixed-parameter tractability of DNNF space for CNFs of bounded primal graph treewidth, it is easy to show that the space complexity of DNNFs on this class of CNFs is polynomial. Thus we obtain a pseudopolynomial separation between NROBPs and DNNFs.
In the second part of the talk I will show that the above separation is essentially tight. In particular, I will present a transformation of a DNNF of size $m$ with $n$ variables into an equivalent NROBP of size $O(m^{\log n+2})$. It follows from this transformation that an exponential lower bound on the space complexity of NROBPs for any class of functions implies an exponential lower bound for DNNFs for this class of functions. Since NROBPs are much better studied than DNNFs from the lower-bound perspective, with many exponential lower bounds known, I believe this result is significant progress in our understanding of the complexity of DNNFs. The proposed transformation is an adaptation of the approach for transforming a decision DNNF into an FBDD presented in (Beame et al., UAI 2013).
[slides]
- Laurent Simon: SAT and Knowledge Compilation: a Just-in-Time Approach. [abstract]
Abstract.
Knowledge Compilation (KC) principles rely on an off-line phase that rewrites the knowledge base into an appropriate form, ready to be efficiently queried. In our talk, we propose an alternative approach, built on top of an efficient SAT solver. The recent progress in the practical solving of SAT problems allows us to use solvers directly to answer the set of classical queries used in most KC works. We show that this very simple approach gives very good practical results. In addition, the learning mechanism is fully exploited from query to query, amortizing previous calls by speeding up the processing of new queries.
- Friedrich Slivovsky: On Compiling CNFs into Structured Deterministic DNNFs. [abstract]
Abstract.
We show that the traces of recently introduced dynamic programming algorithms for #SAT can be used to construct structured deterministic DNNF (decomposable negation normal form) representations of propositional formulas in CNF (conjunctive normal form). This allows us to prove new upper bounds on the complexity of compiling CNF formulas into structured deterministic DNNFs in terms of parameters such as the treewidth and the clique-width of the incidence graph.
Joint work with Simone Bova, Florent Capelli, and Stefan Mengel.
[slides]
- Dan Suciu: Query Compilation: the View from the Database Side. [abstract]
Abstract.
We study knowledge compilation for Boolean formulas that are given as groundings of First Order formulas. This problem is motivated by probabilistic databases, where each record in the database is an independent probabilistic event, and the query is given by a SQL expression or, equivalently, a First Order formula. The query’s probability can be computed in linear time in the size of the compiled representation, hence the interest in studying the size of such a representation. We consider the “data complexity” setting, where the query is fixed, and the input to the problem consists only of the database instance. We consider several compilation targets of increasing expressive power: OBDDs, FBDDs, and decision-DNNFs (a subclass of d-DNNFs). For the case of OBDDs we establish a dichotomy theorem for queries in the restricted languages FO(\exists, \wedge, \vee) and FO(\forall, \wedge, \vee): for each such query the OBDD is either linear in the size of the database or grows exponentially, and the complexity can be determined through a simple analysis of the query expression. For the other targets we describe a class of queries for which (a) the decision-DNNF is exponentially large in the size of the database, and (b) the probability of the query can be computed in polynomial time in the size of the database. This suggests that the compilation target decision-DNNF is too weak to capture all tractable cases of probabilistic inference. Our lower bound for decision-DNNFs relies on a translation into FBDDs, which is of independent interest.
Joint work with Paul Beame, Abhay Jha, Jerry Li, and Sudeepa Roy.
[slides]
PARTICIPANTS
Gilles Audemard
Simone Bova
Guy Van den Broeck
Florent Capelli
Ondrej Cepek
Hubie Chen
Jessica Davies
Adnan Darwiche
Eduard Eiben
Hélène Fargier
Johannes Fichte
Ronald de Haan
Frédéric Koriche
Stefan Kratsch
Oliver Kullmann
Jean-Marie Lagniez
Neha Lodha
Joao Marques-Silva
Pierre Marquis
Stefan Mengel
Alexandre Niveau
Jakob Nordström
Dan Olteanu
Sebastian Ordyniak
Umut Oztok
Igor Razgon
Marco Schaerf
Laurent Simon
Friedrich Slivovsky
Zeynep Saribatur
Dan Suciu
Stefan Szeider
TRAVEL INFORMATION
Here are some suggestions for hotels close to the symposium venue:
Clima Cityhotel (www.climacity-hotel.com)
Hotel Carlton Opera (www.carlton.at)
Hotel Erzherzog Rainer (www.schick-hotels.com)
Hotel Europa (www.austria-trend.at)
Hotel Astoria (www.austria-trend.at)
Wombats City Hostel (http://www.wombats-hostels.com)
For further information, please have a look at our visitors’ guide.