RECURSIVE JOIN PROCESSING IN BIG DATA ENVIRONMENT

Anh-Cang Phan; Thanh-Ngoan Trieu; Thuong-Cang Phan

doi:10.15625/1813-9663/37/2/15889

Keywords:

Apache Spark, big data, recursive join, optimize three-way join

Abstract

In the era of information explosion, big data is receiving increased attention because of its important implications for the growth, profitability, and survival of modern organizations. However, it also poses many challenges in the way data is processed and queried over time. The join is one of the most common operations in data queries. In particular, a recursive join is a join type used to query hierarchical data, but it is considerably more complex and costly. The evaluation of a recursive join in MapReduce consists of a number of iterations, each made up of two tasks: a join task and an incremental computation task. These tasks are expensive and reduce query performance on large datasets because they generate a large amount of intermediate data that is transmitted over the network. In this study, we therefore propose a simple but efficient approach for recursive joins on big data, based on halving the number of required iterations in the Spark environment. This improvement significantly reduces the number of required tasks as well as the amount of intermediate data generated and transferred over the network. Our experimental results show that the improved recursive join is more efficient and faster than the traditional one on large-scale datasets.
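
To make the iteration structure described in the abstract concrete, the following is a minimal sketch in plain Python rather than Spark. It is not the authors' implementation. It evaluates a recursive join (transitive closure) over a set of (source, destination) pairs, where each iteration consists of a join task followed by an incremental computation task that keeps only the tuples not seen before. The function name `recursive_join` and the parameter `hops_per_iteration` are hypothetical. Setting `hops_per_iteration=2` chains two joins inside one iteration, which is only one assumed way of lowering the iteration count and may differ from the method in the paper.

```python
def recursive_join(edges, hops_per_iteration=1):
    """Transitive closure of `edges` by iterated join + incremental computation.

    Illustrative sketch only: names and the two-hop variant are assumptions,
    not the algorithm published in the paper.
    """
    edges = set(edges)

    # Index the base relation by source so the join is a dictionary lookup.
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, set()).add(dst)

    def join_with_edges(pairs):
        # Join task: pairs(x, y) joined with edges(y, z) yields (x, z).
        return {(x, z) for x, y in pairs for z in successors.get(y, ())}

    result = set(edges)   # all tuples found so far
    delta = set(edges)    # tuples found in the previous iteration
    iterations = 0
    while delta:
        iterations += 1
        candidates = set()
        frontier = delta
        for _ in range(hops_per_iteration):
            frontier = join_with_edges(frontier)
            candidates |= frontier
        # Incremental computation task: keep only genuinely new tuples.
        delta = candidates - result
        result |= delta
    return result, iterations


# A small chain 1 -> 2 -> 3 -> 4 -> 5 used as example input.
chain = [(1, 2), (2, 3), (3, 4), (4, 5)]
closure_one_hop, iterations_one_hop = recursive_join(chain)
closure_two_hop, iterations_two_hop = recursive_join(chain, hops_per_iteration=2)
```

Both calls return the same closure. The two-hop variant needs fewer iterations on this input because each iteration extends known paths by up to two edges, at the cost of a second join inside the iteration.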

Published

31-05-2021

How to Cite

[1]

A.-C. Phan, T.-N. Trieu, and T.-C. Phan, “RECURSIVE JOIN PROCESSING IN BIG DATA ENVIRONMENT”, JCC, vol. 37, no. 2, pp. 107–122, May 2021.

Issue

Vol. 37 No. 2 (2021)

Section

Computer Science

License

1. We hereby assign copyright of our article (the Work) in all forms of media, whether now known or hereafter developed, to the Journal of Computer Science and Cybernetics. We understand that the Journal of Computer Science and Cybernetics will act on my/our behalf to publish, reproduce, distribute and transmit the Work.
2. This assignment of copyright to the Journal of Computer Science and Cybernetics is done so on the understanding that permission from the Journal of Computer Science and Cybernetics is not required for me/us to reproduce, republish or distribute copies of the Work in whole or in part. We will ensure that all such copies carry a notice of copyright ownership and reference to the original journal publication.
3. We warrant that the Work is our results and has not been published before in its current or a substantially similar form and is not under consideration for another publication, does not contain any unlawful statements and does not infringe any existing copyright.
4. We also warrant that we have obtained the necessary permission from the copyright holder/s to reproduce in the article any materials, including tables, diagrams or photographs, not owned by me/us.
