Blog

NEDB Day 2026 Poster

Traditional database systems are highly effective at managing structured data, but they lack native support for reasoning under uncertainty and for representing complex generative AI models. Although probabilistic extensions of Datalog have been proposed [1] to mitigate these limitations, they generally do not support inference, which remains an open research challenge, particularly at scale. A probabilistic Datalog program typically combines a formal specification of a stochastic tuple-generating process with a set of logical constraints that restrict the space of admissible outcomes. The key challenge is to efficiently approximate the resulting posterior distribution. In this paper, we bridge the gap between declarative data modeling and scalable probabilistic reasoning by introducing a fully automated stochastic variational inference pipeline within the StarfishDB [2] ecosystem. Our approach transforms the inference problem into an optimization task: starting from a generative Datalog theory, our system automatically derives a suitable family of variational distributions and synthesizes an optimization routine to minimize the Kullback-Leibler (KL) divergence. To maximize performance, our method employs knowledge compilation techniques to limit the number of variational parameters without compromising the representational power of the variational surrogate. Empirical evaluations show that our approach provides a fast-converging alternative that rivals specialized, model-specific algorithms. This allows databases to natively perform sophisticated probabilistic reasoning, providing a seamless declarative path from standard data management to complex logical probabilistic programming.

[1] Vince Bárány, Balder ten Cate, Benny Kimelfeld, Dan Olteanu, and Zografoula Vagena. 2017. Declarative Probabilistic Programming with Datalog. ACM Trans. Database Syst. 42, 4, Article 22 (December 2017), 35 pages. https://doi.org/10.1145/3132700

[2] Ouael Ben Amara, Sami Hadouaj, and Niccolò Meneghetti. 2024. StarfishDB: A Query Execution Engine for Relational Probabilistic Programming. Proc. ACM Manag. Data 2, 3, Article 185 (June 2024), 31 pages. https://doi.org/10.1145/3654988
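
To make the "inference as optimization" idea in the abstract concrete, here is a minimal Python sketch on a toy problem. It is not the StarfishDB pipeline and uses none of its APIs. The model (two independent coin flips), the constraint (at least one flip is true), the variational family (a softmax over the admissible outcomes), and all names are assumptions chosen for illustration. The gradient is computed exactly by enumeration rather than stochastically, which is only feasible because the toy has three admissible outcomes.

```python
import math

# Hypothetical toy generative process: two independent coin flips.
PRIOR = {"a": 0.3, "b": 0.6}

def prior_weight(a, b):
    """Unnormalized probability of one outcome under the generative process."""
    pa = PRIOR["a"] if a else 1.0 - PRIOR["a"]
    pb = PRIOR["b"] if b else 1.0 - PRIOR["b"]
    return pa * pb

# Logical constraint (assumed for illustration): at least one flip is true.
worlds = [(a, b) for a in (0, 1) for b in (0, 1) if a or b]
weights = [prior_weight(a, b) for a, b in worlds]

def softmax(logits):
    """Map real-valued variational parameters to a distribution q."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fit(steps=2000, lr=0.5):
    """Gradient descent on KL(q || posterior) with respect to the logits.

    Up to an additive constant, the objective is sum_i q_i * g_i with
    g_i = log q_i - log w_i; its gradient for logit k is q_k * (g_k - E_q[g]).
    """
    logits = [0.0] * len(worlds)
    for _ in range(steps):
        q = softmax(logits)
        g = [math.log(qi) - math.log(wi) for qi, wi in zip(q, weights)]
        mean_g = sum(qi * gi for qi, gi in zip(q, g))
        logits = [l - lr * qi * (gi - mean_g)
                  for l, qi, gi in zip(logits, q, g)]
    return softmax(logits)

# Exact posterior by normalization, available only because the toy is tiny.
exact = [w / sum(weights) for w in weights]
approx = fit()

for world, e, v in zip(worlds, exact, approx):
    print(world, round(e, 4), round(v, 4))
```

On this toy the fitted distribution can be checked against the exact posterior, which is obtained by renormalizing the prior weights of the admissible outcomes. At realistic scale the outcomes cannot be enumerated, which is where stochastic gradient estimates and a more compact variational family become necessary.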

Reproducing StarfishDB's Experiments

Our team recently received exciting news from the ACM SIGMOD 2024 reproducibility committee. Independent researchers have successfully reproduced the experiments from our paper "StarfishDB: A Query Execution Engine for Relational Probabilistic Programming," confirming our findings about this new approach to probabilistic databases.

Poster NEDB 2024 - Ouael Ben Amara

Poster presented by Ouael Ben Amara at NEDB 2024

Poster NEDB 2023 - Sami Hadouaj

Poster presented by Sami Hadouaj at NEDB 2023

Poster NEDB 2023 - Ouael Ben Amara

Poster presented by Ouael Ben Amara at NEDB 2023

Poster SIGMOD 2024 - Sami Hadouaj

Poster presented by Sami Hadouaj at SIGMOD 2024