Data Clustering: K-means, MST based
Data clustering is the process of assigning the objects in a data set to groups, or clusters, so that objects in the same cluster are more similar to each other than to objects in other clusters.
A similarity measure is defined over the data to be clustered in order to calculate the proximity between pairs of objects. A clustering algorithm is then chosen to perform the grouping. Depending on the requirements, the algorithm is either hierarchical or partition-based. A suitable algorithm should be chosen based on the type and size of the data and on the available hardware and software.
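The two steps above (define a proximity measure, then run a grouping algorithm) can be illustrated with a minimal sketch of a partition-based method, K-means. This is only an illustration under stated assumptions, not a definitive implementation: Euclidean distance is assumed as the proximity measure, the first k points are assumed as initial centroids, and the names euclidean, kmeans, and the sample points are hypothetical.

```python
import math

def euclidean(a, b):
    """Proximity measure: a smaller distance means more similar objects."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, max_iter=100):
    """Partition points into k clusters; returns one cluster label per point."""
    # Assumption: seed the centroids with the first k points (simple, not robust).
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the partition is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels

# Hypothetical sample data: two well-separated groups, interleaved.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
labels = kmeans(points, k=2)
```

With this sample data, the points near (1, 1) end up sharing one label and the points near (8, 8) the other. In practice the result depends on the initial centroids, so a random or spread-out initialization with several restarts is a common choice.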
Data clustering is applied in many exploratory pattern-analysis, grouping, decision-making, and machine-learning settings, including data mining, pattern classification, document retrieval, and image segmentation.