RECENT RESULTS ON CARDINALITY ESTIMATION AND INFORMATION THEORETIC INEQUALITIES
Author affiliations
DOI:
https://doi.org/10.15625/1813-9663/37/3/16129Keywords:
Shannon-type inequalities, Cardinality Estimation, Conjunctive Queries, Information TheoryAbstract
I would like to dedicate this little exposition to Prof. Phan Dinh Dieu, one of the giants and pioneers of Mathematics in Computer Science in Vietnam.
In the past 15 years or so, new and exciting connections between fundamental problems in database theory and information theory have emerged. There are several angles one can take to describe this connection. This paper takes one such angle, influenced by the author's own bias and research results. In particular, we will describe how the cardinality estimation problem -- a corner-stone problem for query optimizers -- is deeply connected to information theoretic inequalities. Furthermore, we explain how inequalities can also be used to derive a couple of classic geometric inequalities such as the Loomis-Whitney inequality.
A purpose of the article is to introduce the reader to these new connections, where theory and practice meet in a wonderful way. Another objective is to point the reader to a research area with many new open questions.
Metrics
References
S. Abiteboul, R. Hull, and V. Vianu, Foundations of Databases. Addison-Wesley, 1995.
M. Abo Khamis, P. G. Kolaitis, H. Q. Ngo, and D. Suciu, Bag query containment and
information theory," in Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium
on Principles of Database Systems, PODS 2020, Portland, OR, USA, June 14-19, 2020,
D. Suciu, Y. Tao, and Z. Wei, Eds. ACM, 2020, pp. 95{112. [Online]. Available:
https://doi.org/10.1145/3375395.3387645
||, Decision problems in information theory," in 47th International Colloquium on
Automata, Languages, and Programming, ICALP 2020, July 8-11, 2020, Saarbrucken, Germany
(Virtual Conference), ser. LIPIcs, A. Czumaj, A. Dawar, and E. Merelli, Eds., vol. 168.
Schloss Dagstuhl - Leibniz-Zentrum fur Informatik, 2020, pp. 106:1{106:20. [Online]. Available:
https://doi.org/10.4230/LIPIcs.ICALP.2020.106
M. Abo Khamis, H. Q. Ngo, and A. Rudra, FAQ: questions asked frequently," in Proceedings
of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems,
PODS 2016, San Francisco, CA, USA, June 26 - July 01, 2016, T. Milo and W. Tan, Eds.
ACM, 2016, pp. 13{28. [Online]. Available: http://doi.acm.org/10.1145/2902251.2902280
M. Abo Khamis, H. Q. Ngo, and D. Suciu, Computing join queries with functional
dependencies," in Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on
Principles of Database Systems, PODS 2016, San Francisco, CA, USA, June 26 - July
, 2016, T. Milo and W. Tan, Eds. ACM, 2016, pp. 327{342. [Online]. Available:
http://doi.acm.org/10.1145/2902251.2902289
||, What do shannon-type inequalities, submodular width, and disjunctive datalog have to
do with one another?" in Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium
on Principles of Database Systems, PODS 2017, Chicago, IL, USA, May 14-19, 2017,
E. Sallinger, J. V. den Bussche, and F. Geerts, Eds. ACM, 2017, pp. 429{444. [Online].
Available: http://doi.acm.org/10.1145/3034786.3056105
CARDINALTITY ESTIMATION AND INFORMATION INEQUALITIES 15
N. Alon, On the number of subgraphs of prescribed type of graphs with a given number
of edges," Israel J. Math., vol. 38, no. 1-2, pp. 116{130, 1981. [Online]. Available:
http://dx.doi.org.gate.lib.bualo.edu/10.1007/BF02761855
A. Atserias, M. Grohe, and D. Marx, Size bounds and query plans for relational joins," in
FOCS. IEEE Computer Society, 2008, pp. 739{748.
B. Bollobas and A. Thomason, Projections of bodies and hereditary properties of
hypergraphs," Bull. London Math. Soc., vol. 27, no. 5, pp. 417{424, 1995. [Online]. Available:
https://doi-org.gate.lib.bualo.edu/10.1112/blms/27.5.417
W. Cai, M. Balazinska, and D. Suciu, Pessimistic cardinality estimation: Tighter upper
bounds for intermediate join cardinalities," in Proceedings of the 2019 International Conference
on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30
- July 5, 2019, P. A. Boncz, S. Manegold, A. Ailamaki, A. Deshpande, and T. Kraska, Eds.
ACM, 2019, pp. 18{35. [Online]. Available: https://doi.org/10.1145/3299869.3319894
T. Chan, Recent progresses in characterising information inequalities," Entropy, vol. 13, no. 2,
pp. 379{401, 2011. [Online]. Available: https://doi-org.gate.lib.bualo.edu/10.3390/e13020379
T. H. Chan and R. W. Yeung, On a relation between information inequalities and group
theory," IEEE Transactions on Information Theory, vol. 48, no. 7, pp. 1992{1995, 2002.
[Online]. Available: http://dx.doi.org/10.1109/TIT.2002.1013138
F. R. K. Chung, R. L. Graham, P. Frankl, and J. B. Shearer, Some intersection theorems for
ordered sets and graphs," J. Combin. Theory Ser. A, vol. 43, no. 1, pp. 23{37, 1986. [Online].
Available: http://dx.doi.org.gate.lib.bualo.edu/10.1016/0097-3165(86)90019-1
T. M. Cover and J. A. Thomas, Elements of information theory, 2nd ed. Hoboken, NJ: Wiley-
Interscience [John Wiley & Sons], 2006.
I. Csiszar, Axiomatic characterizations of information measures," Entropy, vol. 10, no. 3, pp.
{273, 2008. [Online]. Available: https://doi.org/10.3390/e10030261
E. Friedgut, Hypergraphs, entropy, and inequalities," Amer. Math. Monthly, vol. 111, no. 9,
pp. 749{760, 2004. [Online]. Available: https://doi-org.gate.lib.bualo.edu/10.2307/4145187
E. Friedgut and J. Kahn, On the number of copies of one hypergraph in another," Israel J.
Math., vol. 105, pp. 251{256, 1998. [Online]. Available: http://dx.doi.org.gate.lib.bualo.edu/
1007/BF02780332
G. Gottlob, S. T. Lee, G. Valiant, and P. Valiant, Size and treewidth bounds
for conjunctive queries," J. ACM, vol. 59, no. 3, p. 16, 2012. [Online]. Available:
http://doi.acm.org/10.1145/2220357.2220363
H. Harmouch and F. Naumann, Cardinality estimation: An experimental survey,"
Proc. VLDB Endow., vol. 11, no. 4, pp. 499{512, 2017. [Online]. Available: http:
//www.vldb.org/pvldb/vol11/p499-harmouch.pdf
A. Hertzschuch, C. Hartmann, D. Habich, and W. Lehner, Simplicity done right for join
ordering," in 11th Conference on Innovative Data Systems Research, CIDR 2021, Virtual
Event, January 11-15, 2021, Online Proceedings. www.cidrdb.org, 2021. [Online]. Available:
http://cidrdb.org/cidr2021/papers/cidr2021 paper01.pdf
HUNG Q. NGO
S. Jukna, Extremal combinatorics, 2nd ed., ser. Texts in Theoretical Computer Science. An
EATCS Series. Springer, Heidelberg, 2011, with applications in computer science. [Online].
Available: https://doi-org.gate.lib.bualo.edu/10.1007/978-3-642-17364-6
V. Leis, B. Radke, A. Gubichev, A. Kemper, and T. Neumann, Cardinality estimation
done right: Index-based join sampling," in 8th Biennial Conference on Innovative Data
Systems Research, CIDR 2017, Chaminade, CA, USA, January 8-11, 2017, Online Proceedings.
www.cidrdb.org, 2017. [Online]. Available: http://cidrdb.org/cidr2017/papers/p9-leis-cidr17.
L. H. Loomis and H. Whitney, An inequality related to the isoperimetric inequality," Bull.
Amer. Math. Soc, vol. 55, pp. 961{962, 1949.
F. Matus, Innitely many information inequalities," in IEEE International Symposium on
Information Theory, ISIT 2007, Nice, France, June 24-29, 2007. IEEE, 2007, pp. 41{44.
[Online]. Available: https://doi.org/10.1109/ISIT.2007.4557201
T. Milo and W. Tan, Eds., Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium
on Principles of Database Systems, PODS 2016, San Francisco, CA, USA, June 26 - July 01,
ACM, 2016. [Online]. Available: http://doi.acm.org/10.1145/2902251
H. Q. Ngo, Worst-case optimal join algorithms: Techniques, results, and open problems," in
Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database
Systems, Houston, TX, USA, June 10-15, 2018, J. V. den Bussche and M. Arenas, Eds. ACM,
, pp. 111{124. [Online]. Available: https://doi.org/10.1145/3196959.3196990
H. Q. Ngo, E. Porat, C. Re, and A. Rudra, Worst-case optimal join algorithms: [extended
abstract]," in PODS, 2012, pp. 37{48.
H. Q. Ngo, C. Re, and A. Rudra, Skew strikes back: new developments in the theory
of join algorithms," SIGMOD Record, vol. 42, no. 4, pp. 5{16, 2013. [Online]. Available:
http://doi.acm.org/10.1145/2590989.2590991
J. Radhakrishnan, Entropy and counting," in Computational Mathematics, Modelling and Algorithms,
J. C. Misra, Ed. Narosa Pub House, 2003, pp. 146{168.
R. W. Yeung, Information Theory and Network Coding, 1st ed. Springer Publishing Company,
Incorporated, 2008.
Z. Zhang and R. W. Yeung, On characterization of entropy function via information
inequalities," IEEE Transactions on Information Theory, vol. 44, no. 4, pp. 1440{1452, 1998.
[Online]. Available: http://dx.doi.org/10.1109/18.681320
Downloads
Published
Versions
- 07-10-2021 (2)
- 26-09-2021 (1)
How to Cite
Issue
Section
License
1. We hereby assign copyright of our article (the Work) in all forms of media, whether now known or hereafter developed, to the Journal of Computer Science and Cybernetics. We understand that the Journal of Computer Science and Cybernetics will act on my/our behalf to publish, reproduce, distribute and transmit the Work.2. This assignment of copyright to the Journal of Computer Science and Cybernetics is done so on the understanding that permission from the Journal of Computer Science and Cybernetics is not required for me/us to reproduce, republish or distribute copies of the Work in whole or in part. We will ensure that all such copies carry a notice of copyright ownership and reference to the original journal publication.
3. We warrant that the Work is our results and has not been published before in its current or a substantially similar form and is not under consideration for another publication, does not contain any unlawful statements and does not infringe any existing copyright.
4. We also warrant that We have obtained the necessary permission from the copyright holder/s to reproduce in the article any materials including tables, diagrams or photographs not owned by me/us.