Open Access archive

PUC: Parallel mining of high-utility itemsets with load balancing on spark

Anup Bhat Brahmavar, Manipal Institute of Technology
Harish Sheeranalli Venkatarama, Manipal Institute of Technology
Geetha Maiya, Manipal Institute of Technology

Document Type

Article

Publication Title

Journal of Intelligent Systems

Abstract

Distributed programming paradigms such as MapReduce and Spark have alleviated the sequential bottleneck in mining massive transaction databases. Of particular importance is the mining of High Utility Itemsets (HUIs), which takes into account the revenue of the items purchased in a transaction. Although a few algorithms exist for mining HUIs in a distributed environment, workload skew and the data transfer overhead caused by shuffling operations remain major issues. This study proposes the Parallel Utility Computation (PUC) algorithm, with novel grouping and load balancing strategies for efficient mining of HUIs in a distributed environment. To group the items, Transaction Weighted Utility (TWU) values are employed as a measure of transaction similarity. These groups are then assigned to nodes across the cluster by taking into account the mining load due to the items in each group. Experimental evaluation on real and synthetic datasets demonstrates that PUC with TWU grouping, in conjunction with load balancing, completes mining faster. Owing to reduced data transfer and its load balancing-based assignment strategy, PUC outperforms other grouping strategies and random assignment of groups across the cluster. PUC is also shown to be faster than the PHUI-Growth algorithm, with a promising speedup.
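
The two ideas named in the abstract, TWU values and load-aware assignment of groups to nodes, can be illustrated with a minimal single-machine sketch. It is not the PUC algorithm: the abstract does not give the grouping or load-estimation details, so the sketch assumes that each item forms its own group, that an item's TWU is a usable proxy for its mining load, and that groups are placed greedily (heaviest first, onto the least-loaded node). The toy data and the function names `transaction_weighted_utility` and `assign_groups` are hypothetical.

```python
from collections import defaultdict
import heapq

# Toy transaction database (hypothetical): each transaction maps item -> utility.
transactions = [
    {"a": 5, "b": 10, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "d": 8},
    {"c": 2, "d": 3},
]

def transaction_weighted_utility(transactions):
    """TWU of an item = sum of the total utilities of all transactions containing it."""
    twu = defaultdict(int)
    for t in transactions:
        tu = sum(t.values())          # utility of the whole transaction
        for item in t:
            twu[item] += tu
    return dict(twu)

def assign_groups(group_loads, num_nodes):
    """Assumed stand-in for load balancing: greedy heaviest-first placement.

    Each group goes to the node with the smallest accumulated load so far.
    """
    heap = [(0, node) for node in range(num_nodes)]   # (accumulated load, node id)
    heapq.heapify(heap)
    assignment = {}
    # Sort by descending load; break ties by group name for determinism.
    for group, load in sorted(group_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, node = heapq.heappop(heap)
        assignment[group] = node
        heapq.heappush(heap, (total + load, node))
    return assignment

twu = transaction_weighted_utility(transactions)
# Assumption: one group per item, with TWU as the load estimate.
assignment = assign_groups(twu, num_nodes=2)
print(twu)
print(assignment)
```

On this toy data the items with the two largest TWU values land on different nodes, which is the kind of skew reduction a load-aware assignment aims for. A random assignment gives no such guarantee.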

First Page

568

Last Page

588

DOI

10.1515/jisys-2022-0044

Publication Date

1-1-2022

Recommended Citation

Brahmavar, Anup Bhat; Sheeranalli Venkatarama, Harish; and Maiya, Geetha, "PUC: Parallel mining of high-utility itemsets with load balancing on spark" (2022). Open Access archive. 5000.
https://impressions.manipal.edu/open-access-archive/5000

