MINING TOP-K FREQUENT CLOSED ITEMSETS IN DATA STREAMS USING SLIDING WINDOW
Abstract
Frequent itemset mining has become a popular research area in data mining community since the last few years. There are two main technical hitches while finding frequent itemsets. First, to provide an appropriate minimum support value to start and user need to tune this minimum support value by running the algorithm again and again. Secondly, generated frequent itemsets are mostly numerous and as a result a number of association rules generated are also very large in numbers. Applications dealing with streaming environment need to process the data received at high rate, therefore, finding frequent itemsets in data streams becomes complex. In this paper, we present an algorithm to mine top-k frequent closed itemsets using sliding window approach from streaming data. We developed a single-pass algorithm to find frequent closed itemsets of length between user‟s defined minimum and maximum-length. To improve the performance of algorithm and to avoid rescanning of data, we have transformed data into bitmap based tree data structure.References
B. Li, “Fining Frequent Itemsets from
Uncertain Transaction Streams (2009) pp.
–335.
B. Yang and H. Huang, Knowledge and
Information Systems 23, No. 2, (May 2009)
p. 225
J. H. Chang and W. S. Lee, Finding Recent
Frequent Itemsets Adaptively Over Online
Data Streams (2003) p. 487.
Yin-Ling Cheung and Ada Wai-Chee Fu,
IEEE Transactions on Knowledge and Data
Engg. 16, No. 9, (Sept. 2004) p. 1052
P. Songram and V. Boonjing, N-Most
Interesting Closed Itemset Mining (2008)
p. 619
D. Lee and W. Lee, Finding Maximal
Frequent Itemsets over Online Data Streams
Adaptively (2005) pp. 266–273.
A. Manjhi, V. Shkapenyuk, K. Dhamdhere,
and C. Olston, Finding (Recently) Frequent
Items in Distributed Data Streams (2004)
pp. 767–778.
G. S. Manku and R. Motwani, Approximate
Frequency Counts Over Data Streams,
Proceedings of the 28th International
Conference on Very Large Data Bases,
(2002) pp. 346–357.
R. Agrawal and R. Srikant, Fast Algorithms
for Mining Association Rules in Large
Databases, VLDB‟94, Proceedings of 20th
International Conference on Very Large Data
Bases, September 12-15, 1994, Santiago de
Chile, (1994) pp. 487–499.
J. Wang, J. Han, Y. Lu and P. Tzvetkov,
IEEE Transactions on Knowledge and Data
Engg. 17, No. 5 (2005) 652.
P. S. M. Tsai, Expert Systems with
Applications 37, No. 10 (2010) 6968.
N. Pasquier, Y. Bastide, R. Taouil, and L.
Lakhal, Discovering Frequent Closed
Itemsets for Association Rules (1999) p. 398
H.-F. Li, C.-C. Ho and S.-Y. Lee, Expert
Systems with Applications 36, No. 2 (2009)
pp. 2451
F. Ao, J. Du, Y.n Yan, B. Liu and K. Huang,
An Efficient Algorithm for Mining Closed
Frequent Itemsets in Data Streams, IEEE 8th
International Conference on CIT (2008)
p. 37.
C. C. Aggarwal, Data Streams Models and
Algorithms. New York: Springer (2007).
M. M. Gaber, A. Zaslavsky and
S. Krishnaswamy, ACM SIGMOD Record,
, No. 2 (2005) 18.
S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong and
Y.-K. Lee, CP-Tree: A Tree Structure for
Single-Pass Frequent Pattern Mining,
Advances in Knowledge Discovery and Data
Mining 5012 (2008) 1022.
T. Hu, S. Y. Sung, H. Xiong and Q. Fu,
Discovery of Maximum Length Frequent
Itemsets, Information Sciences 178, No. 1
(2008) 69.
J. Han, H. Cheng, D. Xin and X. Yan, Data
Mining and Knowledge Discovery 15, No. 1
(2007) 55.
Jia-dong Ren and Ke Li, Online Data Stream
Mining of Recent Frequent Itemsets Based
on Sliding Window Model (2008) pp. 293–
H.-F. Li, C.-C. Ho, M.-K. Shan and S.-Y. Lee,
Efficient Maintenance and Mining of Frequent
Itemsets over Online Data Streams with a
Sliding Window (2006) p. 2672.
C. Leung and Q. Khan, DSTree: A Tree
Structure for the Mining of Frequent Sets
from Data Streams (2006) pp. 928.
Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz,
Moment: Maintaining Closed Frequent
Itemsets over a Stream Sliding Window
(Nov 2004) p. 59.
L. Jin, D. J. Chai, Y. K. Lee and K. H. Ryu,
Mining Frequent Itemsets over Data Streams
with Multiple Time-Sensitive Sliding Windows
(2007) p. 486.
B. Mozafari, H. Thakkar and C. Zaniolo,
Verifying and Mining Frequent Patterns from
Large Windows over Data Streams (2008)
p. 179.
S.K. Tanbeer, C. F. Ahmed, B.-S. Jeong and
Y.-K. Lee, Information Sciences 179, No. 22
(2009) 3843.
H.-F. Li, Expert Systems with Applications
, No. 7 (2009) 10779.
C. Lin, D. Chiu, and Y. Wu, Mining Frequent
Itemsets from Data Streams with a TimeSensitive Sliding Window, SDM (2005).
A.J.T. Lee and C.-S. Wang, Information
Sciences 177, No. 17 (2007) 3453.
C. Leung and Q. Khan, Efficient Mining of
Constrained Frequent Patterns from Streams
(2006) 61.
M. J. Zaki and C.-J. Hsiao, IEEE
Transactions on Knowledge and Data Engg.
, No. 4 (2005) 462.
J. Han, J. Wang, Y. Lu, and P. Tzvetkov,
Mining Top-K Frequent Closed Patterns
without Minimum Support, Proceedings of
IEEE International Conference on Data
Mining (2002) p. 211.
A. Metwally, D. Agrawal, and A. Abbadi,
Efficient Computation of Frequent and Top-k
Elements in Data Streams, Database Theory,
, T. Eiter and L. Libkin, Eds. Berlin,
Heidelberg: Springer Berlin Heidelberg,
(2004) pp. 398
A. W. Fu, R. W. Kwong, F. Renfrew, W.
Kwong, and J. Tang, Mining N-most
Interesting Itemsets (2000)
H.-F. Li and S.-Y. Lee, Expert Systems with
Applications 36 No. 2, (2009) 1466.
C. Lin, D. Chiu, and Y. Wu, Mining Frequent
Itemsets from Data Streams with a TimeSensitive Sliding Window, SDM (2005).
M. Deypir and M. H. Sadreddini, Journal of
Systems and Software 85, No. 3 (2012) 746.
Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz,
Knowledge and Information Systems, 10
No. 3 (2006) 265.
J. Cheng, Y. Ke, and W. Ng, Journal of
Intelligent Information Systems 31, No. 3,
(2007) 191.
H. Li and H. Chen, Data & Knowledge
Engineering 68, No. 5 (2009) 481.
J.-L. Koh and C.-Y. Lin, Concept Shift
Detection for Frequent Itemsets from Sliding
Windows over Data Streams, Database
Systems for Advanced Applications, 5667,
L. Chen, C. Liu, Q. Liu, and K. Deng, Eds.
Berlin, Heidelberg: Springer Berlin
Heidelberg (2009) p. 334.
R. C. Wong and A. W. Fu, Mining TopK Itemsets Over a Sliding window Based on
Zipfian Distribution, SIAM International
Conference on Data Mining (2005).
J. Han, J. Pei, and Y. Yin, Mining Frequent
Patterns without Candidate Gneration (2000)
pp. 1–12.