Parallel and Distributed Multi-level Entropy-Based Approach for Adaptive Global Frequent Pattern Mining in Large Datasets

Authors

  • Houda Essalmi, Laboratory of Engineering Sciences, Polydisciplinary Faculty of Taza, University of Sidi Mohamed Ben Abdellah Fez, Morocco
  • Anass El Affar, Laboratory of Engineering Sciences, Polydisciplinary Faculty of Taza, University of Sidi Mohamed Ben Abdellah Fez, Morocco

DOI:

https://doi.org/10.32985/ijeces.17.1.5

Keywords:

Data mining, Distributed Datasets, FP-tree, Communication Overhead, Frequent patterns mining, Binary Entropy, Quartile-based Pruning

Abstract

Frequent pattern mining in distributed settings remains challenging because of high computational cost and high communication overhead. This paper presents AGFPM (Adaptive Global Frequent Pattern Mining), a novel solution that integrates an extensible Master-Slave architecture with an advanced pruning technique based on binary entropy and statistical quartiles. AGFPM introduces two primary data structures: the LP-Tree (Local Prefix Tree) and the GP-Tree (Global Prefix Tree). Each Slave site builds one LP-Tree in a single pass over its local data, and branches with low information value are pruned early using entropy and quartile thresholds. Rather than transferring complete trees, the Slave sites send only succinct metadata to the Master site, where the GP-Tree is built from items sorted globally by their entropy rankings. A significant aspect of AGFPM is its flexible pruning approach: the GP-Tree may or may not be pruned, depending on user criteria. This provides a dynamic trade-off between performance and generality of results, allowing control over the level of compression applied when generating global patterns. Global frequent patterns are then mined recursively from the GP-Tree using conditional sub-GP-Trees. Frequent patterns are extended at each level of the hierarchy by intersecting the common prefix paths, guided by a Global Header Table. AGFPM demonstrates improved execution time, scalability, and robustness at low support thresholds relative to existing methods.
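
The abstract names binary entropy and quartile thresholds as the pruning criteria but does not give the exact rule. The following is a minimal Python sketch of one plausible reading, not the authors' implementation. It assumes that each item is scored by the binary entropy H(p) = -p log2(p) - (1 - p) log2(1 - p) of its relative support p at a site, and that items scoring below the first quartile (Q1) of all entropy scores are dropped. The function names, the choice of Q1 as the cut-off, and the per-item (rather than per-branch) granularity are all assumptions made for illustration.

```python
import math
from collections import Counter
from statistics import quantiles


def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def item_entropies(transactions):
    """Map each item to the binary entropy of its relative support.

    Assumption: p is the fraction of local transactions containing the item.
    """
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    return {item: binary_entropy(c / n) for item, c in counts.items()}


def prune_by_first_quartile(entropies):
    """Keep items whose entropy is at least Q1 of all entropy scores.

    Assumption: Q1 is the threshold; the paper may use a different quartile
    or combine it with a support threshold. Needs at least two items.
    """
    q1 = quantiles(entropies.values(), n=4, method="inclusive")[0]
    return {item: h for item, h in entropies.items() if h >= q1}


# Hypothetical local dataset at one Slave site.
local_transactions = [
    ["a", "b"],
    ["a", "c"],
    ["a", "b", "d"],
    ["a", "b"],
]

scores = item_entropies(local_transactions)
kept = prune_by_first_quartile(scores)
# Items ordered by descending entropy, as a stand-in for an entropy ranking.
ranking = sorted(kept, key=lambda item: (-kept[item], item))
```

In this toy dataset, item `a` appears in every transaction, so its entropy is zero and it falls below Q1, while `b`, `c`, and `d` are kept. Under this reading, only the surviving items and their scores would need to be sent to the Master site, which is consistent with the abstract's point about transferring succinct metadata rather than complete trees.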

Published

2025-12-08

How to Cite

[1]
H. Essalmi and A. El Affar, “Parallel and Distributed Multi-level Entropy-Based Approach for Adaptive Global Frequent Pattern Mining in Large Datasets”, IJECES, vol. 17, no. 1, pp. 49-64, Dec. 2025.

Section

Original Scientific Papers