K-parametric clustering algorithms vs. nonparametric clustering algorithms

Examples of K-parametric clustering algorithms:

  • K-Means: performs a heuristic optimization to minimize the within-cluster sum of squares (WCSS), also known as inertia — the sum of squared Euclidean distances from each point to its assigned centroid (see the sketch after this list)
  • K-Medoids (PAM)
  • Gaussian Mixture Models (GMMs) – though they model distributions, you still need to define k
  • Spectral Clustering – often requires k for the number of eigenvectors/clusters
  • 1D K-means with dynamic programming: uses the same within-cluster sum-of-squares objective as K-Means, but in one dimension the globally optimal partition can be computed exactly with dynamic programming (https://medium.com/@andreys95/optimal-1d-k-means-with-dynamic-programming-4d6ff57b6244); a sketch of the DP appears after this list. Its applications include:

    1. Image and Signal Processing

    • Edge detection in 1D signals: Identifying abrupt changes in intensity, such as in electrocardiogram (ECG) signals or sound waveforms.
    • Grayscale image analysis: Clustering pixel intensities for thresholding and segmentation (e.g., Otsu’s method).

    2. Anomaly Detection

    • Outlier detection: Identifying unusual data points in a sequence such as in temperature logs, stock prices, or sensor readings.
    • Network intrusion detection: Anomalous traffic volumes or latencies can be flagged using 1D clustering.

    3. Finance and Economics

    • Price segmentation: Grouping stock prices, customer expenditures, or transaction amounts into clusters for analysis or marketing.
    • Economic indicator binning: Simplifying complex metrics like inflation rates or GDP growth into meaningful ranges.

    4. Healthcare and Medicine

    • Vital sign monitoring: Clustering heartbeat intervals, glucose levels, or other biometric time series to identify normal vs. abnormal ranges.
    • Dosage grouping: Categorizing drug dosages for different patient groups or treatment levels.

    5. Industrial and IoT Applications

    • Sensor data clustering: Classifying temperature, vibration, or pressure readings for predictive maintenance.
    • Energy usage analysis: Segmenting power consumption values to optimize resource distribution.

    6. Education and Testing

    • Score grading: Clustering test scores to assign grades or identify performance bands.
    • Learning analytics: Grouping students by time spent or attempts on a quiz for intervention strategies.

    7. Natural Language Processing (NLP)

    • Word length or frequency clustering: Used in stylometric analysis or feature engineering in text mining.

    8. Retail and Marketing

    • Customer segmentation: Based on a single metric like frequency of purchase or average order value.
    • Pricing strategy: Grouping products by their price points for tiered marketing approaches.
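
To make the WCSS objective above concrete, here is a minimal sketch using NumPy and scikit-learn (an assumption; any K-Means implementation that exposes the objective would do). scikit-learn calls the WCSS `inertia_`:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated blobs in 2D, just for illustration.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(5, 0.5, (50, 2))])

# k must be chosen up front -- this is what makes K-Means "K-parametric".
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# WCSS / inertia: sum of squared distances of samples to their closest centroid.
print("WCSS (inertia):", km.inertia_)

# The same quantity computed by hand, to make the objective concrete.
wcss = sum(np.sum((X[km.labels_ == c] - km.cluster_centers_[c]) ** 2)
           for c in range(2))
print("WCSS (manual): ", wcss)
```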
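And for the 1D dynamic-programming variant: the linked article covers an optimized algorithm; the sketch below is a simpler O(k·n²) version of the same idea, with illustrative names, to show why 1D admits an exact solution (optimal clusters are contiguous runs of the sorted values):

```python
import numpy as np

def kmeans_1d_dp(x, k):
    """Exact 1D k-means via dynamic programming (O(k * n^2) sketch).

    Returns the optimal within-cluster sum of squares (WCSS) for
    partitioning the sorted values into k contiguous clusters.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    assert 1 <= k <= n
    # Prefix sums give the SSE of any contiguous segment in O(1).
    s1 = np.concatenate([[0.0], np.cumsum(x)])        # running sum of x
    s2 = np.concatenate([[0.0], np.cumsum(x ** 2)])   # running sum of x^2

    def sse(i, j):  # SSE of segment x[i..j] (inclusive) around its own mean
        m = j - i + 1
        seg_sum = s1[j + 1] - s1[i]
        return (s2[j + 1] - s2[i]) - seg_sum ** 2 / m

    # D[c][j] = optimal WCSS of the first j+1 points using c+1 clusters.
    D = np.full((k, n), np.inf)
    D[0] = [sse(0, j) for j in range(n)]
    for c in range(1, k):
        for j in range(c, n):
            # The last cluster covers x[i..j]; the rest is a smaller subproblem.
            D[c][j] = min(D[c - 1][i - 1] + sse(i, j) for i in range(c, j + 1))
    return D[k - 1][n - 1]

print(kmeans_1d_dp([1.0, 1.1, 1.2, 8.0, 8.1, 8.2], k=2))  # -> approx. 0.04
```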

Examples of Nonparametric clustering algorithms:

  • DBSCAN – defines clusters based on density, not a fixed k (see the sketch after this list)
  • OPTICS – an extension of DBSCAN, good for varying densities
  • Mean Shift – mode-seeking algorithm that finds clusters around data density peaks
  • Hierarchical Clustering – builds a dendrogram that can be cut at any level to form clusters
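
To see the contrast with the K-parametric family, here is a minimal DBSCAN sketch (again assuming scikit-learn). Note that no k appears anywhere — only the density parameters eps and min_samples:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Three dense blobs plus a few scattered outliers -- no k supplied anywhere.
X = np.vstack([rng.normal(loc, 0.3, (40, 2)) for loc in (0, 4, 8)]
              + [rng.uniform(-2, 10, (5, 2))])

labels = DBSCAN(eps=0.6, min_samples=5).fit_predict(X)

# DBSCAN labels noise points -1; the cluster count falls out of the data.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters, "| noise points:", np.sum(labels == -1))
```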
=========
K-Means++ is a smarter way to initialize centroids for the K-Means algorithm. It improves both the accuracy and stability of clustering by reducing the chance of landing in a poor local minimum. Standard K-Means picks initial centroids uniformly at random, which can lead to bad clusterings (poor local optima) and require multiple restarts to get good results. K-Means++ initialization steps (sketched in code after the list):
  1. Randomly select the first centroid \mu_1 from the dataset.
  2. For each data point x, compute the squared distance D(x)^2 to the nearest already chosen centroid.
  3. Select the next centroid with probability:
    P(x) = \frac{D(x)^2}{\sum_{x' \in X} D(x')^2}
    → This favors points far from existing centroids.
  4. Repeat steps 2–3 until k centroids are chosen.
  5. Run standard K-Means using these initialized centroids.
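
Here is a minimal NumPy sketch of steps 1–4 (the seeding only; the result would then be handed to a standard K-Means routine). The function name and interface are illustrative:

```python
import numpy as np

def kmeans_pp_init(X, k, rng=None):
    """K-Means++ seeding: return k initial centroids chosen from X."""
    rng = np.random.default_rng(rng)
    n = len(X)
    # Step 1: pick the first centroid uniformly at random.
    centroids = [X[rng.integers(n)]]
    for _ in range(k - 1):
        # Step 2: squared distance from each point to its nearest chosen centroid.
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        # Step 3: sample the next centroid with probability D(x)^2 / sum D(x')^2.
        probs = d2 / d2.sum()
        centroids.append(X[rng.choice(n, p=probs)])
    return np.array(centroids)
```

scikit-learn's KMeans performs this seeding by default (init="k-means++"), and it also accepts an explicit array of initial centroids through the same init parameter.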

Example:

  • You’ve selected 1 centroid: \mu_1
  • You have 5 data points with distances to \mu_1:
    D(x_1)^2 = 1,\quad D(x_2)^2 = 4,\quad D(x_3)^2 = 9,\quad D(x_4)^2 = 16,\quad D(x_5)^2 = 0.25
  • Total = 1 + 4 + 9 + 16 + 0.25 = 30.25

Then the probability of picking x_4 as the next centroid is: P(x_4) = \frac{16}{30.25} \approx 0.529
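
The same arithmetic, applied to all five points at once:

```python
d2 = [1, 4, 9, 16, 0.25]                 # D(x_i)^2 for the five points
probs = [d / sum(d2) for d in d2]
print(probs)  # x_4 gets 16 / 30.25, approx. 0.529 -- the largest share
```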

