Blog
NEDB Day 2026 Poster
Traditional database systems are highly effective at managing structured data, but they lack native support for reasoning under uncertainty and for expressing complex generative AI models. Probabilistic extensions of Datalog have been proposed to mitigate these limitations [1], but they generally lack support for inference, which remains an open research challenge, particularly at scale. A probabilistic Datalog program typically combines a formal specification of a stochastic tuple-generating process with a set of logical constraints that restrict the space of admissible outcomes. The key challenge is to efficiently approximate the resulting posterior distribution over admissible outcomes. In this work, we bridge the gap between declarative data modeling and scalable probabilistic reasoning by introducing a fully automated stochastic variational inference pipeline within the StarfishDB [2] ecosystem. Our approach turns the inference problem into an optimization task: starting from a generative Datalog theory, our system automatically derives a suitable family of variational distributions and synthesizes an optimization routine that minimizes the Kullback-Leibler (KL) divergence between the variational distribution and the posterior. To improve performance, our method uses knowledge compilation techniques to limit the number of variational parameters without compromising the representational power of the variational surrogate. Empirical evaluations show that our approach converges quickly and is competitive with specialized, model-specific algorithms. This allows databases to perform sophisticated probabilistic reasoning natively, providing a seamless declarative path from standard data management to complex logical probabilistic programming.

[1] Vince Bárány, Balder Ten Cate, Benny Kimelfeld, Dan Olteanu, and Zografoula Vagena. 2017. Declarative Probabilistic Programming with Datalog. ACM Trans. Database Syst. 42, 4, Article 22 (December 2017), 35 pages. https://doi.org/10.1145/3132700
[2] Ouael Ben Amara, Sami Hadouaj, and Niccolò Meneghetti. 2024. StarfishDB: A Query Execution Engine for Relational Probabilistic Programming. Proc. ACM Manag. Data 2, 3, Article 185 (June 2024), 31 pages. https://doi.org/10.1145/3654988
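To make the idea in the abstract more concrete, here is a toy sketch in Python. It is not StarfishDB code or syntax, and it is not the pipeline described above. The program (three probabilistic `edge` facts, a transitive `path` rule, and the constraint that `path(a,c)` must hold), the prior probabilities, and the helper names `paths`, `admissible_worlds`, `fit_variational` and `marginal` are all hypothetical. The sketch enumerates the outcomes of the tuple-generating process and discards those that violate the constraint. It then fits a variational distribution by gradient descent on KL(q ‖ posterior). For clarity, it uses the exact gradient over all admissible worlds. A stochastic variant would instead estimate the same expectation from samples drawn from q.

```python
import itertools
import math

# Hypothetical toy program (illustrative only, not StarfishDB syntax):
#   edge(a,b) holds with probability 0.5, edge(b,c) with 0.5, edge(a,c) with 0.2
#   path(X,Y) :- edge(X,Y).
#   path(X,Y) :- edge(X,Z), path(Z,Y).
#   constraint: path(a,c) must be derivable.
PRIOR = {("a", "b"): 0.5, ("b", "c"): 0.5, ("a", "c"): 0.2}


def paths(edges):
    """Bottom-up fixpoint of the two path rules over a set of edge tuples."""
    closure = set(edges)
    while True:
        new = {(x, y) for (x, z) in edges for (z2, y) in closure if z == z2}
        if new <= closure:
            return closure
        closure |= new


def admissible_worlds():
    """Enumerate outcomes of the tuple-generating process; keep those satisfying the constraint."""
    facts = list(PRIOR)
    worlds = []
    for bits in itertools.product([0, 1], repeat=len(facts)):
        edges = frozenset(f for f, b in zip(facts, bits) if b)
        # Unnormalized weight: product of independent Bernoulli probabilities.
        weight = math.prod(PRIOR[f] if b else 1 - PRIOR[f] for f, b in zip(facts, bits))
        if ("a", "c") in paths(edges):
            worlds.append((edges, weight))
    return worlds


def softmax(theta):
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def fit_variational(worlds, lr=1.0, steps=2000):
    """Gradient descent on KL(q || posterior) for q = softmax(theta), one logit per world.

    With f_k = log q_k - log w_k (w_k the unnormalized weight), the gradient
    with respect to theta_k is q_k * (f_k - E_q[f]); log Z is constant and drops out.
    """
    log_w = [math.log(w) for _, w in worlds]
    theta = [0.0] * len(worlds)
    for _ in range(steps):
        q = softmax(theta)
        f = [math.log(qk) - lw for qk, lw in zip(q, log_w)]
        mean_f = sum(qk * fk for qk, fk in zip(q, f))
        theta = [t - lr * qk * (fk - mean_f) for t, qk, fk in zip(theta, q, f)]
    return softmax(theta)


def marginal(worlds, probs, fact):
    """Probability that a given edge tuple is present under a distribution over worlds."""
    return sum(p for (edges, _), p in zip(worlds, probs) if fact in edges)


worlds = admissible_worlds()
z = sum(w for _, w in worlds)  # normalizing constant of the conditioned distribution
exact = [w / z for _, w in worlds]  # exact posterior, for comparison
q = fit_variational(worlds)
print(round(marginal(worlds, q, ("a", "c")), 4))  # P(edge(a,c) | path(a,c)) -> 0.5
print(round(marginal(worlds, q, ("a", "b")), 4))  # P(edge(a,b) | path(a,c)) -> 0.75
```

The family in this sketch assigns mass only to admissible worlds. This matters because a fully factorized family over the three facts would also put mass on worlds that violate the constraint, where the posterior is zero, and KL(q ‖ posterior) would then be infinite. The price is one logit per admissible world, which grows exponentially with the number of uncertain tuples. According to the abstract, the knowledge-compilation step is meant to limit the number of variational parameters without losing representational power. This naive family makes no attempt to do that.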
Reproducing StarfishDB's Experiments
Our team recently received exciting news from the ACM SIGMOD 2024 reproducibility committee. Independent researchers have successfully reproduced the experiments from our paper "StarfishDB: A Query Execution Engine for Relational Probabilistic Programming," confirming our findings about this new approach to probabilistic databases.