Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other clusters.
Let's look at two basic and widely used clustering algorithms: hierarchical and k-means clustering.
Initially, each point is its own cluster. Then we repeatedly combine the two nearest clusters into one. This is called hierarchical agglomerative clustering.
The main output of hierarchical clustering is a dendrogram, which shows the hierarchical relationship between the clusters.
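The merge loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation: the sample points, the choice of centroid distance as the "nearness" rule, and the target cluster count `k` are all illustrative assumptions.

```python
# Minimal sketch of hierarchical agglomerative clustering (HAC):
# start with one cluster per point, then repeatedly merge the two
# nearest clusters (here, nearest by centroid distance) until k remain.
# The data and k below are illustrative assumptions.

def centroid(cluster):
    """Mean of a list of 2-D points."""
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)

def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def hac(points, k):
    """Merge the two nearest clusters until only k remain."""
    clusters = [[p] for p in points]   # each point starts as its own cluster
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist2(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge the nearest pair
        del clusters[j]
    return clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(len(c) for c in hac(points, 2)))   # two clusters of 3 points each
```

Recording each merge (which pair, at what distance) instead of discarding it is exactly the information a dendrogram plots.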
Q1) How should we represent a cluster of more than one point?
Q2) How should we determine the 'nearness' of clusters?
Q3) When should we stop combining clusters?
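To make Q1 and Q2 concrete, here is a hedged sketch of the common answers: represent a cluster by its centroid (Q1), and measure nearness with a linkage rule such as single linkage, complete linkage, or centroid distance (Q2). The two small example clusters are illustrative assumptions.

```python
# Q1: represent a cluster by its centroid (the mean point).
# Q2: measure nearness between two clusters with a linkage rule.
# The clusters a and b below are illustrative assumptions.

def centroid(cluster):
    """Mean point of a cluster (works for any dimension)."""
    n = len(cluster)
    return tuple(sum(p[i] for p in cluster) / n for i in range(len(cluster[0])))

def dist(a, b):
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

a = [(0, 0), (0, 2)]
b = [(4, 0), (6, 0)]

pairs = [dist(p, q) for p in a for q in b]
print(min(pairs))                      # single linkage: closest pair -> 4.0
print(max(pairs))                      # complete linkage: farthest pair
print(dist(centroid(a), centroid(b)))  # centroid linkage: distance of means
```

Which rule is "right" depends on the data: single linkage can chain long thin clusters together, while complete linkage favors compact ones.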
One answer to Q3: stop before a merge would produce a cluster with low cohesion, so we never make a bad cluster. This requires a way to measure cohesion, for example the cluster's diameter (the maximum distance between any two of its points) or the average distance from its points to the centroid.
The drawback is that it's too slow. The standard algorithm for hierarchical agglomerative clustering (HAC) has a time complexity of O(n^3) and requires O(n^2) memory, which makes it impractical for even medium-sized data sets.
K-means, by contrast, costs O(kn) per round for n points and k clusters. Linear per round is good, but the number of rounds until convergence can be very large.
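The per-round cost can be seen in a minimal sketch of Lloyd's k-means algorithm: each round assigns every point to its nearest centroid (the O(kn) step), then moves each centroid to the mean of its group. The data, the initial centroids, and the fixed round count below are illustrative assumptions; a real implementation would iterate until the assignments stop changing.

```python
# Minimal sketch of Lloyd's k-means. Each round:
#   1) assignment step: each point goes to its nearest centroid  -> O(kn)
#   2) update step: each centroid moves to the mean of its group
# Data, initial centroids, and round count are illustrative assumptions.

def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points, centroids, rounds=10):
    for _ in range(rounds):
        groups = [[] for _ in centroids]
        for p in points:   # assignment step: k distance checks per point
            i = min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
            groups[i].append(p)
        # update step: recompute each centroid (empty groups are dropped)
        centroids = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
            for g in groups if g
        ]
    return centroids

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(kmeans(points, [(0, 0), (10, 10)])))
```

On this toy data the centroids settle immediately; on real data, convergence can take many rounds, and the result depends on the initial centroids.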
These two algorithms are basic and essential. Many optimization techniques exist for dealing with real-world data, but most of them are derived from these two basic algorithms. That makes these two concepts important as a foundation.