A Scalable Distributed Approach for Exploration Global Frequent Patterns
DOI:
https://doi.org/10.32985/ijeces.16.7.1
Keywords:
Data mining, Parallel Processing, Frequent Patterns tree, Communication costs
Abstract
Mining frequent patterns in transactional databases is an essential part of data mining, since it makes it easier to identify significant associations and recurring patterns in datasets. As data volumes continue to grow, processing large databases efficiently requires scalable, high-performance solutions that use parallel computing systems to optimize resource usage and data analysis. To address these issues, this paper presents Exploration Global Frequent Patterns (EGFP), a new parallel algorithm designed to generate global frequent patterns across distributed datasets. By partitioning the data and distributing the workload, the approach reduces communication costs and ensures efficient parallel execution. The approach uses two prefix-tree structures to produce a compact, structured representation of frequent patterns. The first structure, the local-tree, stores local support values so that transaction data can be collected and arranged effectively. Global prefix counts are then aggregated and ranked to improve frequency-based analysis and provide a more organized representation of frequent patterns. To find the globally frequent patterns, a Master site builds a second structure, the global-tree, for each prefix based on this arranged data. Experimental results on large-scale benchmark datasets show that EGFP outperforms existing methods, including CD and PFP-tree, in execution time and scalability, while incurring considerably less communication cost.
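The general idea of combining per-site prefix trees into global support counts can be illustrated with a minimal sketch. This is not the EGFP algorithm itself, whose details the abstract does not give. It assumes that each site counts its items locally, that a master ranks items by their summed counts, that each site builds a prefix tree in that global order, and that the master merges the local trees into a single tree (a simplification of the per-prefix global-tree). All function names are hypothetical, and the mining step is a brute-force enumeration chosen for brevity, not efficiency.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions):
    """Per-site item supports; an item counts once per transaction."""
    counts = Counter()
    for t in transactions:
        counts.update(set(t))
    return counts


def new_node():
    return {"count": 0, "children": {}}


def build_local_tree(transactions, order):
    """Prefix tree over one site's transactions, items sorted by global rank."""
    root = new_node()
    for t in transactions:
        root["count"] += 1
        node = root
        for item in sorted((i for i in set(t) if i in order), key=order.get):
            node = node["children"].setdefault(item, new_node())
            node["count"] += 1
    return root


def merge_trees(target, source):
    """Add the counts of `source` into `target`, node by node."""
    target["count"] += source["count"]
    for item, child in source["children"].items():
        merge_trees(target["children"].setdefault(item, new_node()), child)
    return target


def collect_supports(node, path, supports):
    """For each node, credit its count to every subset of its ancestors
    combined with the node's own item (exponential; illustration only)."""
    for item, child in node["children"].items():
        for r in range(len(path) + 1):
            for subset in combinations(path, r):
                supports[frozenset(subset + (item,))] += child["count"]
        collect_supports(child, path + (item,), supports)
    return supports


def global_frequent_patterns(sites, min_support):
    # Step 1 (sites): local item counts. Step 2 (master): sum and rank them.
    total = Counter()
    for transactions in sites:
        total.update(local_counts(transactions))
    ranked = sorted((i for i in total if total[i] >= min_support),
                    key=lambda i: (-total[i], i))
    order = {item: rank for rank, item in enumerate(ranked)}
    # Step 3 (sites): local trees in global order. Step 4 (master): merge.
    merged = new_node()
    for transactions in sites:
        merge_trees(merged, build_local_tree(transactions, order))
    # Step 5 (master): read supports off the merged tree and filter.
    supports = collect_supports(merged, (), Counter())
    return {p: c for p, c in supports.items() if c >= min_support}


if __name__ == "__main__":
    sites = [
        [["a", "b", "c"], ["a", "b"], ["a", "c"]],
        [["a", "b"], ["b", "c"], ["a", "b", "c"]],
    ]
    for pattern, count in sorted(global_frequent_patterns(sites, 3).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(pattern), count)
```

In this sketch only item counts and compact trees travel between the sites and the master, not raw transactions, which is the kind of saving in communication cost that the abstract refers to.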
License
Copyright (c) 2025 International Journal of Electrical and Computer Engineering Systems

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.