Binary Count Tree: An Efficient and Compact Structure for Mining Rare and Frequent Itemsets

Document Type

Article

Publication Title

Engineered Science

Abstract

Rare and frequent itemsets can be discovered efficiently when the dataset to be processed is stored in main memory. In recent years, various data structures have been developed to represent large datasets, which otherwise could not be held in main memory as a whole, in a compact form. This paper proposes the Binary Count Tree (BIN-Tree), a tree data structure that represents the entire dataset in a compact and complete form without any information loss. Each transaction is encoded and stored as a node in the tree, in contrast to existing algorithms, which store each item as a node. The efficiency of BIN-Tree on datasets of varying size and dimensionality was evaluated against the Single Scan Pattern Tree (SSP-Tree) and the Weighted Count Tree (WC-Tree). The results show that BIN-Tree is 95% and 75% more space-efficient than SSP-Tree and WC-Tree, respectively. For a large dataset, BIN-Tree construction and itemset discovery were 93% and 22% more time-efficient than SSP-Tree and WC-Tree, respectively. BIN-Tree is equally efficient at discovering rare and frequent itemsets from small datasets held in main memory.
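The abstract's central idea, storing one encoded entry per transaction rather than one node per item, can be illustrated with a minimal Python sketch. It assumes each transaction is encoded as an integer bitmask over the sorted item universe and that identical transactions collapse into a single entry with a count. A flat `Counter` stands in for the tree, and the helper names (`encode`, `build_counts`, `support`, `classify_itemsets`) are hypothetical. It also assumes a simple definition of a rare itemset (nonzero support below `min_sup`). This is not the paper's BIN-Tree algorithm; its actual encoding and tree layout are not described in the abstract.

```python
from collections import Counter
from itertools import combinations


def encode(transaction, item_index):
    """Encode a transaction as an integer bitmask: bit i is set if item i is present."""
    mask = 0
    for item in transaction:
        mask |= 1 << item_index[item]
    return mask


def build_counts(transactions):
    """Map each distinct encoded transaction to its number of occurrences.

    Assumption: one entry per distinct transaction pattern (a flat stand-in
    for a tree node per transaction), not one node per item.
    """
    items = sorted({item for t in transactions for item in t})
    item_index = {item: i for i, item in enumerate(items)}
    counts = Counter(encode(t, item_index) for t in transactions)
    return items, item_index, counts


def support(itemset, item_index, counts):
    """Count transactions whose bitmask contains every bit of the itemset's mask."""
    m = encode(itemset, item_index)
    return sum(c for mask, c in counts.items() if mask & m == m)


def classify_itemsets(transactions, min_sup, max_len=2):
    """Split itemsets up to max_len into frequent (support >= min_sup) and
    rare (assumed here: 0 < support < min_sup)."""
    items, item_index, counts = build_counts(transactions)
    frequent, rare = {}, {}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            s = support(combo, item_index, counts)
            if s == 0:
                continue  # itemset never occurs; neither frequent nor rare
            (frequent if s >= min_sup else rare)[combo] = s
    return frequent, rare


if __name__ == "__main__":
    data = [["a", "b"], ["a", "b"], ["a", "c"], ["b", "c", "d"]]
    freq, rare = classify_itemsets(data, min_sup=2)
    print(freq)  # {('a',): 3, ('b',): 3, ('c',): 2, ('a', 'b'): 2}
    print(rare)  # {('d',): 1, ('a', 'c'): 1, ('b', 'c'): 1, ('b', 'd'): 1, ('c', 'd'): 1}
```

In this sketch the two identical transactions `["a", "b"]` share one entry with count 2, and support is computed with a bitwise subset test (`mask & m == m`), so counting an itemset visits one entry per distinct transaction rather than one node per item occurrence.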

First Page

185

Last Page

194

DOI

10.30919/es8d602

Publication Date

1-1-2022