HUPSMT: AN EFFICIENT ALGORITHM FOR MINING HIGH UTILITY-PROBABILITY SEQUENCES IN UNCERTAIN DATABASES WITH MULTIPLE MINIMUM UTILITY THRESHOLDS
Author affiliations
DOI:
https://doi.org/10.15625/1813-9663/35/1/13234Keywords:
High utility-probability sequence, uncertain quantitative sequence database, upper and lower bounds, width and depth pruning strategies,Abstract
The problem of high utility sequence mining (HUSM) in quantitative se-quence databases (QSDBs) is more general than that of frequent sequence mining in se-quence databases. An important limitation of HUSM is that a user-predened minimum tility threshold is used commonly to decide if a sequence is high utility. However, this is not convincing in many real-life applications as sequences may have diferent importance. Another limitation of HUSM is that data in QSDBs are assumed to be precise. But in the real world, collected data such as by sensor maybe uncertain. Thus, this paper proposes a framework for mining high utility-probability sequences (HUPSs) in uncertain QSDBs (UQS-DBs) with multiple minimum utility thresholds using a minimum utility. Two new width and depth pruning strategies are also introduced to early eliminate low utility or low probability sequences as well as their extensions, and to reduce sets of candidate items for extensions during the mining process. Based on these strategies, a novel ecient algorithm named HUPSMT is designed for discovering HUPSs. Finally, an experimental study conducted in both real-life and synthetic UQSDBs shows the performance of HUPSMT in terms of time and memory consumption.
Metrics
References
C.F. Ahmed, S.K. Tanbeer, and B.S Jeong, "Mining high utility web access sequences in dynamic web log data," in Proc. Software Engineering AI Networking and Paral-lel/Distributed Computing (SNPD), 2010 11th ACIS International Conference, 2010, pp. 76-81.
C. F. Ahmed, S. K. Tanbeer, and B. S. Jeong, "A novel approach for mining high-utility sequential patterns in sequence databases," ETRI Journal, vol. 32, pp. 676{686, 2010.
O.K. Alkan, and P. Karagoz, "CRoM and HuspExt: Improving eciency of high utility sequential pattern extraction," IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 10, pp. 2645-2657, 2015.
C.K. Chui, B. Kao, and E. Hung, "Mining frequent itemsets from uncertain data," in The Pacic-Asia Conference on Knowledge Discovery and Data Mining, 2007, pp. 47-58.
B. Dalmas, P. Fournier-Viger, and S. Norre, "TWINCLE: A Constrained Sequential Rule Mining Algorithm for Event Logs," in Proc. 9th International KES Conference (IDT-KES
, 2017, pp. 205-214.
P. Fournier-Viger, A. Gomariz, M. Campos, and R. Thomas, "Fast Vertical Mining of Sequential Patterns Using Co-occurrence Information," in Proc. 18th Pacic-Asia Con-
ference on Knowledge Discovery and Data Mining, PAKDD '2014, 2014, pp. 40-52.
P. Fournier-Viger, A. Gomariz, T. Gueniche, A. Soltani, C. Wu, and V.S. Tseng, "SPMF: a Java Open-Source Pattern Mining Library," Journal of Machine Learning Research, vol. 15, no. 1, 2014, pp. 3389-3393, 2014.
T.P. Hong, C.H. Lee, and S.L. Wang, "Mining High Average-Utility Itemsets," in Proc. of IEEE Int'l Conf. on Systems, Man, and Cybernetics 2009, pp. 2526-2530.
G.C. Lan, T.P. Hong, V.S. Tseng, and S.L. Wang, "Applying the maximum utility mea- sure in high utility sequential pattern mining," Expert Systems with Applications, vol. 41, no. 11, pp. 5071-5081, 2014.
J.C.W. Lin, J. Zhang, and P. Fournier-Viger, "High-utility sequential pattern mining with multiple minimum utility thresholds," in Proc. Asia-Pacic Web (APWeb) and Web-Age Information Management (WAIM) Joint Conference on Web and Big Data, 2017, pp. 215-229.
J. Liu, K. Wang, and B. Fung, "Direct discovery of high utility itemsets without candi-date generation," in Proc. 12th IEEE Intern. Conf. Data Mining, 2012, pp. 984{989.
J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M. Hsu, "Mining sequential patterns by pattern-growth: the PrexSpan approach," , Journal IEEE
Transactions on Knowledge and Data Engineering, vol. 16, no. 11, pp. 1424{1440, 2004.
B.E. Shie, P.S. Yu, and V.S. Tseng, "Mining interesting user behavior patterns in mobile commerce environments," Appl. Intell., vol. 38, pp. 418-435, 2013.
T. Truong, H. Duong, B. Le, and P. Fournier Viger, "Ecient Vertical Mining of High Average-Utility Itemsets based on Novel Upper-Bounds," IEEE Transactions on Knowl-edge and Data Engineering, DOI:10.1109/TKDE.2018.2833478, 2018.
T. Truong, A. Tran, H. Duong, B. Le, and P. Fournier-Viger, "EHUSM: Mining high utility sequences with a pessimistic approach," in Proc. UDM 2018, 24th ACM SIGKDD (KDD 2018) [Online]. Available: http://www:philippe-fournier-viger:com=utility-
mining workshop 2018=program:php.
J. Z. Wang, J.L. Huang, and Y.C. Chen, "On eciently mining high utility sequential patterns," Knowledge and Information Systems, vol. 49, no. 2, pp. 597-627, 2016.
J. Yin, Z. Zheng, L. Cao, Y. Song, and W. Wei, "Eciently mining top-k high utility sequential patterns," in Proc. 2013 IEEE 13th International Conference on Data Mining (ICDM), 2013, pp. 1259-1264.
J. Yin, Z. Zheng, and L. Cao, "USpan: an ecient algorithm for mining high utility sequential patterns," in Proc. 18th ACM SIGKDD international conference on Knowledge discovery and data mining, 2012, pp. 660-668.
S. Zida, P. Fournier-Viger, JC.W. Lin, C.W. Wu, and V.S. Tseng, "EFIM: A Highly Ecient Algorithm for High-Utility Itemset Mining," in Proc. 14th Mexican Intern. Conf.
Articial Intelligence, 2015, pp. 530-546.
M. Zihayat, H. Davoudi, and A. An, "Top-k utility-based gene regulation sequential pattern discovery," in Proc. Bioinformatics and Biomedicine (BIBM), 2016 IEEE Inter-national Conference, 2016, pp. 266-273.
B. Zhang, J.C.W. Lin, P. Fournier-Viger, and T. Li, "Mining of high utility-probability sequential patterns from uncertain databases," PLoS ONE, vol. 12, no. 7, pp. 1-21, 2017.
Downloads
Published
How to Cite
Issue
Section
License
1. We hereby assign copyright of our article (the Work) in all forms of media, whether now known or hereafter developed, to the Journal of Computer Science and Cybernetics. We understand that the Journal of Computer Science and Cybernetics will act on my/our behalf to publish, reproduce, distribute and transmit the Work.2. This assignment of copyright to the Journal of Computer Science and Cybernetics is done so on the understanding that permission from the Journal of Computer Science and Cybernetics is not required for me/us to reproduce, republish or distribute copies of the Work in whole or in part. We will ensure that all such copies carry a notice of copyright ownership and reference to the original journal publication.
3. We warrant that the Work is our results and has not been published before in its current or a substantially similar form and is not under consideration for another publication, does not contain any unlawful statements and does not infringe any existing copyright.
4. We also warrant that We have obtained the necessary permission from the copyright holder/s to reproduce in the article any materials including tables, diagrams or photographs not owned by me/us.