A New Approach to Determine Eps Parameter of DBSCAN Algorithm

Fatma Ozge Ozkok, Mete Celik


In recent years, data analysis has become increasingly important as data volumes have grown. Clustering, which groups objects according to their similarity, plays an important role in data analysis. DBSCAN is one of the most effective and popular density-based clustering algorithms and has been applied successfully in many areas. However, it is challenging to determine the values of its two input parameters: the neighborhood radius, Eps, and the minimum number of points, MinPts. The values of these parameters significantly affect the clustering performance of the algorithm. In this study, we propose the AE-DBSCAN algorithm, which includes a new method to determine the value of the neighborhood radius Eps automatically. The experimental evaluations showed that the proposed method outperformed the analytical DBSCAN.
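To illustrate why the choice of Eps matters, the following is a minimal sketch of the standard DBSCAN procedure in plain Python. It is not the AE-DBSCAN method, whose details are not given in this abstract. The function names (`region_query`, `dbscan`, `summarize`), the toy point set, and the three Eps values are all assumptions chosen for illustration.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(j_neighbors)
    return labels

def summarize(labels):
    """Return (number of clusters, number of noise points)."""
    return len({l for l in labels if l != NOISE}), labels.count(NOISE)

# Hypothetical toy data: two tight groups of four points and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (5, 5), (5, 6), (6, 5), (6, 6),
          (10, 0)]

for eps in (0.5, 1.5, 7.0):
    print(eps, summarize(dbscan(points, eps, min_pts=3)))
```

With MinPts fixed at 3, a very small Eps labels every point as noise, an intermediate Eps recovers the two groups and leaves the isolated point as noise, and a large Eps merges everything into a single cluster. This sensitivity is what motivates selecting Eps automatically.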


AE-DBSCAN; Clustering; Data Mining; Density-Based Clustering

Submitted: 2017-08-25 14:51:35
Published: 2017-12-12 13:20:45



Copyright (c) 2017 International Journal of Intelligent Systems and Applications in Engineering

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.