MINING TOP-K FREQUENT CLOSED ITEMSETS IN DATA STREAMS USING SLIDING WINDOW

Authors

  • Z. Rehman Department of Computer Science and Engineering, University of Engineering and Technology, Lahore, Pakistan
  • M. Shahbaz Department of Computer Science and Engineering, University of Engineering and Technology, Lahore, Pakistan

Abstract

Frequent itemset mining has become a popular research area in data mining community since the last few years. There are two main technical hitches while finding frequent itemsets. First, to provide an appropriate minimum support value to start and user need to tune this minimum support value by running the algorithm again and again. Secondly, generated frequent itemsets are mostly numerous and as a result a number of association rules generated are also very large in numbers. Applications dealing with streaming environment need to process the data received at high rate, therefore, finding frequent itemsets in data streams becomes complex. In this paper, we present an algorithm to mine top-k frequent closed itemsets using sliding window approach from streaming data. We developed a single-pass algorithm to find frequent closed itemsets of length between user‟s defined minimum and maximum-length. To improve the performance of algorithm and to avoid rescanning of data, we have transformed data into bitmap based tree data structure.

References

B. Li, “Fining Frequent Itemsets from

Uncertain Transaction Streams (2009) pp.

–335.

B. Yang and H. Huang, Knowledge and

Information Systems 23, No. 2, (May 2009)

p. 225

J. H. Chang and W. S. Lee, Finding Recent

Frequent Itemsets Adaptively Over Online

Data Streams (2003) p. 487.

Yin-Ling Cheung and Ada Wai-Chee Fu,

IEEE Transactions on Knowledge and Data

Engg. 16, No. 9, (Sept. 2004) p. 1052

P. Songram and V. Boonjing, N-Most

Interesting Closed Itemset Mining (2008)

p. 619

D. Lee and W. Lee, Finding Maximal

Frequent Itemsets over Online Data Streams

Adaptively (2005) pp. 266–273.

A. Manjhi, V. Shkapenyuk, K. Dhamdhere,

and C. Olston, Finding (Recently) Frequent

Items in Distributed Data Streams (2004)

pp. 767–778.

G. S. Manku and R. Motwani, Approximate

Frequency Counts Over Data Streams,

Proceedings of the 28th International

Conference on Very Large Data Bases,

(2002) pp. 346–357.

R. Agrawal and R. Srikant, Fast Algorithms

for Mining Association Rules in Large

Databases, VLDB‟94, Proceedings of 20th

International Conference on Very Large Data

Bases, September 12-15, 1994, Santiago de

Chile, (1994) pp. 487–499.

J. Wang, J. Han, Y. Lu and P. Tzvetkov,

IEEE Transactions on Knowledge and Data

Engg. 17, No. 5 (2005) 652.

P. S. M. Tsai, Expert Systems with

Applications 37, No. 10 (2010) 6968.

N. Pasquier, Y. Bastide, R. Taouil, and L.

Lakhal, Discovering Frequent Closed

Itemsets for Association Rules (1999) p. 398

H.-F. Li, C.-C. Ho and S.-Y. Lee, Expert

Systems with Applications 36, No. 2 (2009)

pp. 2451

F. Ao, J. Du, Y.n Yan, B. Liu and K. Huang,

An Efficient Algorithm for Mining Closed

Frequent Itemsets in Data Streams, IEEE 8th

International Conference on CIT (2008)

p. 37.

C. C. Aggarwal, Data Streams Models and

Algorithms. New York: Springer (2007).

M. M. Gaber, A. Zaslavsky and

S. Krishnaswamy, ACM SIGMOD Record,

, No. 2 (2005) 18.

S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong and

Y.-K. Lee, CP-Tree: A Tree Structure for

Single-Pass Frequent Pattern Mining,

Advances in Knowledge Discovery and Data

Mining 5012 (2008) 1022.

T. Hu, S. Y. Sung, H. Xiong and Q. Fu,

Discovery of Maximum Length Frequent

Itemsets, Information Sciences 178, No. 1

(2008) 69.

J. Han, H. Cheng, D. Xin and X. Yan, Data

Mining and Knowledge Discovery 15, No. 1

(2007) 55.

Jia-dong Ren and Ke Li, Online Data Stream

Mining of Recent Frequent Itemsets Based

on Sliding Window Model (2008) pp. 293–

H.-F. Li, C.-C. Ho, M.-K. Shan and S.-Y. Lee,

Efficient Maintenance and Mining of Frequent

Itemsets over Online Data Streams with a

Sliding Window (2006) p. 2672.

C. Leung and Q. Khan, DSTree: A Tree

Structure for the Mining of Frequent Sets

from Data Streams (2006) pp. 928.

Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz,

Moment: Maintaining Closed Frequent

Itemsets over a Stream Sliding Window

(Nov 2004) p. 59.

L. Jin, D. J. Chai, Y. K. Lee and K. H. Ryu,

Mining Frequent Itemsets over Data Streams

with Multiple Time-Sensitive Sliding Windows

(2007) p. 486.

B. Mozafari, H. Thakkar and C. Zaniolo,

Verifying and Mining Frequent Patterns from

Large Windows over Data Streams (2008)

p. 179.

S.K. Tanbeer, C. F. Ahmed, B.-S. Jeong and

Y.-K. Lee, Information Sciences 179, No. 22

(2009) 3843.

H.-F. Li, Expert Systems with Applications

, No. 7 (2009) 10779.

C. Lin, D. Chiu, and Y. Wu, Mining Frequent

Itemsets from Data Streams with a TimeSensitive Sliding Window, SDM (2005).

A.J.T. Lee and C.-S. Wang, Information

Sciences 177, No. 17 (2007) 3453.

C. Leung and Q. Khan, Efficient Mining of

Constrained Frequent Patterns from Streams

(2006) 61.

M. J. Zaki and C.-J. Hsiao, IEEE

Transactions on Knowledge and Data Engg.

, No. 4 (2005) 462.

J. Han, J. Wang, Y. Lu, and P. Tzvetkov,

Mining Top-K Frequent Closed Patterns

without Minimum Support, Proceedings of

IEEE International Conference on Data

Mining (2002) p. 211.

A. Metwally, D. Agrawal, and A. Abbadi,

Efficient Computation of Frequent and Top-k

Elements in Data Streams, Database Theory,

, T. Eiter and L. Libkin, Eds. Berlin,

Heidelberg: Springer Berlin Heidelberg,

(2004) pp. 398

A. W. Fu, R. W. Kwong, F. Renfrew, W.

Kwong, and J. Tang, Mining N-most

Interesting Itemsets (2000)

H.-F. Li and S.-Y. Lee, Expert Systems with

Applications 36 No. 2, (2009) 1466.

C. Lin, D. Chiu, and Y. Wu, Mining Frequent

Itemsets from Data Streams with a TimeSensitive Sliding Window, SDM (2005).

M. Deypir and M. H. Sadreddini, Journal of

Systems and Software 85, No. 3 (2012) 746.

Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz,

Knowledge and Information Systems, 10

No. 3 (2006) 265.

J. Cheng, Y. Ke, and W. Ng, Journal of

Intelligent Information Systems 31, No. 3,

(2007) 191.

H. Li and H. Chen, Data & Knowledge

Engineering 68, No. 5 (2009) 481.

J.-L. Koh and C.-Y. Lin, Concept Shift

Detection for Frequent Itemsets from Sliding

Windows over Data Streams, Database

Systems for Advanced Applications, 5667,

L. Chen, C. Liu, Q. Liu, and K. Deng, Eds.

Berlin, Heidelberg: Springer Berlin

Heidelberg (2009) p. 334.

R. C. Wong and A. W. Fu, Mining TopK Itemsets Over a Sliding window Based on

Zipfian Distribution, SIAM International

Conference on Data Mining (2005).

J. Han, J. Pei, and Y. Yin, Mining Frequent

Patterns without Candidate Gneration (2000)

pp. 1–12.

Downloads

Published

02-09-2013

How to Cite

[1]
Z. Rehman and M. Shahbaz, “MINING TOP-K FREQUENT CLOSED ITEMSETS IN DATA STREAMS USING SLIDING WINDOW”, The Nucleus, vol. 50, no. 3, pp. 229–240, Sep. 2013.

Issue

Section

Articles