Applied and Computational Engineering

- The Open Access Proceedings Series for Conferences


Proceedings of the 4th International Conference on Computing and Data Science (CONF-CDS 2022)

Series Vol. 2, 22 March 2023


Open Access | Article

Improving Performance Parameters of Clusters Using Density-Based Algorithm

Twinkle Keshvani¹, Madhu Shukla²*, Meghnesh Jayswal³, Kishan Makadiya⁴
1 Department of Computer Engineering, Marwadi University, Rajkot-360003, India
2 Department of Computer Engineering - AI, Marwadi University, Rajkot-360003, India
3 Department of Computer Engineering, Marwadi University, Rajkot-360003, India
4 Department of Computer Engineering, Marwadi University, Rajkot-360003, India

* Author to whom correspondence should be addressed.

Applied and Computational Engineering, Vol. 2, 100-110
Published 22 March 2023. © 2023 The Author(s). Published by EWA Publishing
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Citation: Twinkle Keshvani, Madhu Shukla, Meghnesh Jayswal, Kishan Makadiya. Improving Performance Parameters of Clusters Using Density-Based Algorithm. ACE (2023) Vol. 2: 100-110. DOI: 10.54254/2755-2721/2/20220605.

Abstract

With advances in technology, the data generated by non-stationary sources in day-to-day life is massive, continuous and rapid. Many applications, such as IoT, transaction systems, network sensors, video surveillance systems and network intrusion detection systems, generate massive amounts of real-time data. The data used in traditional data mining is static and can be revisited for processing and analysis, whereas the data in data stream mining is dynamic and never stops arriving. Moreover, the characteristics of the generated data may change over long or short periods of time, a phenomenon called concept drift. Analysing such data therefore poses significant inherent challenges that stem from the changing characteristics of the data itself, which in turn result from its fast, continuous change and its sheer volume. To overcome this limitation, modified clustering techniques can be used to support proper data analysis. Clustering is an effective data mining method, but clustering data streams adds further challenges such as limited storage capacity, limited processing time, the single-pass requirement and the rate of arrival. Furthermore, data streams are volatile, so the data must be processed as soon as it arrives. In addition, the number of clusters, which K-means clustering requires in advance, is unknown. Given these characteristics, the data generated in a data stream is non-deterministic and contains noise points or outliers, so developing an effective clustering algorithm for data streams is a crucial task. Density-based methods can work with labelled data in data stream clustering and have the potential to identify clusters of arbitrary shape as well as noise. The motivation for this research is to use such a density-based algorithm to address and overcome the constraints of data streams and to extract the most useful knowledge from them.
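The abstract's central idea is to summarise a stream in a single pass with density-based micro-clusters whose weights fade over time, so that stale structure (concept drift) and sparse noise can be pruned without fixing the number of clusters in advance. The sketch below illustrates that idea under stated assumptions. It is a simplified illustration loosely modelled on the published DenStream approach (Cao et al., 2006), not the authors' algorithm. The class names, the parameters `eps`, `mu`, `beta` and `lam`, and the fixed `min_outlier_weight` threshold are hypothetical choices made for this example. The offline step that groups dense micro-clusters into final arbitrary-shaped clusters is omitted.

```python
import math


class MicroCluster:
    """Faded summary of nearby points: weight, linear sum and squared sum."""

    def __init__(self, x, now):
        self.weight = 1.0
        self.ls = list(x)             # weighted linear sum per dimension
        self.ss = [v * v for v in x]  # weighted squared sum per dimension
        self.last_update = now

    def fade(self, now, lam):
        # Scale every statistic by 2^(-lam * elapsed) so older points count less.
        factor = 2.0 ** (-lam * (now - self.last_update))
        self.weight *= factor
        self.ls = [v * factor for v in self.ls]
        self.ss = [v * factor for v in self.ss]
        self.last_update = now

    def add(self, x):
        self.weight += 1.0
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v

    def center(self):
        return [v / self.weight for v in self.ls]

    def radius_if_added(self, x):
        # Root of the summed per-dimension variance after tentatively absorbing x.
        w = self.weight + 1.0
        var = 0.0
        for i, v in enumerate(x):
            mean = (self.ls[i] + v) / w
            var += (self.ss[i] + v * v) / w - mean * mean
        return math.sqrt(max(var, 0.0))


class DensityStreamSketch:
    """Hypothetical single-pass clusterer loosely modelled on DenStream."""

    def __init__(self, eps, mu, beta, lam, min_outlier_weight):
        self.eps = eps
        self.lam = lam
        self.core_weight = beta * mu  # faded weight needed to count as dense
        self.min_outlier_weight = min_outlier_weight  # assumed fixed threshold
        self.potential = []  # dense micro-clusters: candidate real clusters
        self.outliers = []   # sparse micro-clusters: noise or emerging clusters

    def _nearest(self, pool, x, now):
        best, best_d = None, float("inf")
        for mc in pool:
            mc.fade(now, self.lam)
            d = math.dist(x, mc.center())
            if d < best_d:
                best, best_d = mc, d
        return best

    def insert(self, x, now):
        """Absorb one point; each point is seen once and never stored raw."""
        mc = self._nearest(self.potential, x, now)
        if mc is not None and mc.radius_if_added(x) <= self.eps:
            mc.add(x)
            return "potential"
        mc = self._nearest(self.outliers, x, now)
        if mc is not None and mc.radius_if_added(x) <= self.eps:
            mc.add(x)
            if mc.weight >= self.core_weight:  # dense enough: promote it
                self.outliers.remove(mc)
                self.potential.append(mc)
                return "potential"
            return "outlier"
        self.outliers.append(MicroCluster(x, now))  # may be noise
        return "outlier"

    def prune(self, now):
        """Drop micro-clusters whose faded weight marks them as stale or noise."""
        for mc in self.potential + self.outliers:
            mc.fade(now, self.lam)
        self.potential = [mc for mc in self.potential if mc.weight >= self.core_weight]
        self.outliers = [mc for mc in self.outliers
                         if mc.weight >= self.min_outlier_weight]
```

Here is an example with `eps=0.5`, `mu=2.0`, `beta=1.0`, `lam=0.01` and `min_outlier_weight=0.5`. Feeding the points (0, 0), (0.1, 0), (0, 0.1) and (9, 9) at times 0 to 3 leaves one dense micro-cluster near the origin and one sparse micro-cluster at (9, 9). Calling `prune(now=200)` afterwards discards both, because their faded weights fall below the thresholds. The sketch fades old statistics instead of storing raw points. This keeps memory bounded and lets the summary follow concept drift.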

Keywords

Concept Drift, Pruning, Massive Data, Micro-clusters, Clustering, Data Stream, Arbitrary, Density-based

Data Availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Authors who publish with this series agree to the following terms:

1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.

2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.

3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open Access Instruction).

Volume Title: Proceedings of the 4th International Conference on Computing and Data Science (CONF-CDS 2022)
ISBN (Print): 978-1-915371-19-5
ISBN (Online): 978-1-915371-20-1
Published Date: 22 March 2023
Series: Applied and Computational Engineering
ISSN (Print): 2755-2721
ISSN (Online): 2755-273X
DOI: 10.54254/2755-2721/2/20220605
Copyright: 22 March 2023

Copyright © 2023 EWA Publishing, unless otherwise stated.