Evaluation of Concept Drift in Poisson Big Data Stream using Adaptive Sliding Windows


Chanintorn Jittawiriyanukoon
Vilasinee Srisarkun

Abstract

Dealing with big data that changes dynamically over time is one of the major problems in big data curation. This research presents an adaptive sliding window for evaluating data streams with memory and variable length. The adaptive sliding window keeps a variable window size so that it holds the most recently read items of a stream arriving as a Poisson process, while older items are dropped according to the algorithm's rules. The approach must account for changes in the meaning of concepts (i.e., concept drift), which matters for data releases and sophisticated data links. Concept drift is therefore reflected in changes of the window size, and the window provides updated statistics computed from recent data. Our simulation runs both fixed and continuous data streams so that the sliding window is applied to different data curation processes. We propose Poisson and random arrival models of the data stream and employ Massive Online Analysis (MOA) to evaluate concept drift measurements. The Stagger stream generator with the Hoeffding bound outperforms the other configurations and yields the highest accuracy, while the Naïve Bayes learner with the Gradually Change generator fits the Poisson arrival pattern.
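The core mechanism described above can be sketched in a few lines. The window grows while the stream is stable and sheds older data when recent statistics change. This is a minimal illustration under stated assumptions and is not the authors' MOA setup. Arrivals follow a Poisson process, with exponential inter-arrival gaps. The monitored value is a 0/1 error indicator of some learner. The window is cut when the means of an older and a newer sub-window differ by more than a Hoeffding-style bound, in the spirit of ADWIN. The names `AdaptiveWindow`, `simulate`, `delta` and `min_sub` are hypothetical choices for illustration, as are the arrival rate, the error rates and the drift point.

```python
import math
import random


def poisson_arrival_times(rate, n, rng):
    """Timestamps of a Poisson process: gaps are i.i.d. exponential with mean 1/rate."""
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate)
        times.append(t)
    return times


class AdaptiveWindow:
    """Simplified ADWIN-style adaptive window over values in [0, 1].

    Hypothetical helper for illustration; not MOA's ADWIN implementation.
    """

    def __init__(self, delta=0.002, min_sub=5):
        self.delta = delta      # confidence parameter of the Hoeffding-style test
        self.min_sub = min_sub  # smallest sub-window size that is tested
        self.window = []

    def _cut_threshold(self, n0, n1):
        # Hoeffding-style bound on |mean(old) - mean(new)| for values in [0, 1]
        m = 1.0 / (1.0 / n0 + 1.0 / n1)
        delta_prime = self.delta / len(self.window)
        return math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))

    def add(self, x):
        """Append x, then drop older data while some split shows a significant change.

        Returns True if the window shrank, i.e. a concept drift was signalled.
        """
        self.window.append(x)
        drift = False
        while True:
            n = len(self.window)
            total = sum(self.window)
            head, cut_at = 0.0, None
            for i in range(1, n):
                head += self.window[i - 1]  # running sum of window[:i]
                n0, n1 = i, n - i
                if n0 < self.min_sub or n1 < self.min_sub:
                    continue
                gap = abs(head / n0 - (total - head) / n1)
                if gap > self._cut_threshold(n0, n1):
                    cut_at = i
                    break
            if cut_at is None:
                return drift
            del self.window[:cut_at]  # forget the older sub-window
            drift = True


def simulate(n=1000, drift_at=500, rate=5.0, seed=1):
    """Error stream (1 = misclassified) on Poisson arrivals; error rate jumps at drift_at."""
    rng = random.Random(seed)
    times = poisson_arrival_times(rate, n, rng)
    win = AdaptiveWindow()
    detections = []
    for k in range(n):
        p_err = 0.1 if k < drift_at else 0.5  # abrupt concept drift
        if win.add(1.0 if rng.random() < p_err else 0.0):
            detections.append((k, times[k], len(win.window)))
    return times, detections, win


if __name__ == "__main__":
    times, detections, win = simulate()
    for k, t, size in detections:
        print(f"drift at item {k} (t={t:.2f}), window size now {size}")
```

The window length itself carries the drift signal. It grows while the error rate is stable and shrinks after the jump at item 500, so statistics read from the window come from recent data only. The quadratic scan over every split point keeps the sketch short. A production detector such as MOA's ADWIN compresses the window into buckets instead of storing every item.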
