I went through an overview of clustering as a whole.
There are various clustering methods available, but two of the most commonly encountered families are:
- Hierarchical Clustering:
  - Agglomerative: This approach starts with each data point as its own cluster and repeatedly merges the closest clusters into larger ones. The result is a hierarchical structure, often depicted as a dendrogram.
  - Divisive: In contrast, divisive clustering begins with all data points grouped together and then progressively splits them into smaller clusters until individual data points are reached.
- Partitional Clustering:
  - K-Means: K-Means is a widely used partitional clustering method that divides data into ‘k’ clusters, where ‘k’ is a parameter set by the user. It aims to minimize the total squared distance between data points and the center (centroid) of their assigned cluster.
  - DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Strictly speaking a density-based method rather than a partitional one, DBSCAN forms clusters where data points are densely packed and labels points in sparse regions as noise. The number of clusters does not need to be set in advance.
  - Gaussian Mixture Models (GMM): A model-based method that assumes data points originate from a mixture of Gaussian distributions. It estimates the parameters of these distributions and assigns each point a probability of belonging to each cluster.
  - Fuzzy Clustering: Unlike hard clustering, where each data point belongs exclusively to one cluster, fuzzy clustering allows data points to have partial membership in multiple clusters.
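
As a rough illustration of the K-Means idea above, the following is a minimal pure-Python sketch, not a production implementation. The function name `kmeans`, the sample points, and the naive choice of the first `k` points as starting centroids are all assumptions made for this example; real libraries use more careful initialization.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    # Assumption for this sketch: the first k points serve as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # assignments are stable, so stop early
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: two well-separated groups, interleaved.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The result depends on the starting centroids, which is why the sketch fixes them deterministically; with a poor start, K-Means can settle on a worse grouping.
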
I believe hierarchical clustering would be quite beneficial for the project.
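
Since hierarchical clustering is the likely candidate, here is a small sketch of the agglomerative (bottom-up) variant described above. It is only an illustration under stated assumptions: the function name `agglomerative`, the sample points, and the use of single linkage (distance between the closest pair of members) are choices made for this example, and other linkage rules would give different merges.

```python
import math

def agglomerative(points, n_clusters):
    """Bottom-up clustering with single linkage.

    Returns the final clusters (lists of point indices) and the merge
    history, which is the information a dendrogram would draw.
    """
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return clusters, merges

# Hypothetical toy data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (10.0, 0.0)]
clusters, merges = agglomerative(points, n_clusters=3)
print(clusters)  # [[0, 1], [2, 3], [4]]
```

This brute-force version recomputes all pairwise cluster distances on every pass, so it is only suitable for small datasets; it is meant to show the merging logic rather than to be efficient.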