Step 6 — K-means Clustering

Partitions genes into k groups with similar expression profiles across samples or conditions. Genes that share a cluster are candidates for co-regulation.

Inputs

  • The expression matrix from Step 4 / 5.

Settings

  • k — number of clusters. Choose it from the elbow and silhouette plots described under Diagnostics.

  • Max iterations — default 300.

  • Random seed — fixed at 42 by default, so repeated runs on the same input give the same clusters. Exposed as a setting for reproducibility.
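
As a rough illustration of how the three settings interact, here is a minimal pure-Python sketch of standard k-means (Lloyd's algorithm). It is not this step's actual implementation: the function name kmeans, the initialization by sampling k genes as starting centers, and the toy data are all assumptions made for the example.

```python
import random

def kmeans(profiles, k, max_iter=300, seed=42):
    """Lloyd's algorithm on a list of equal-length expression profiles."""
    rng = random.Random(seed)  # fixed seed: same starting centers on every run
    centers = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each gene joins its nearest center (squared Euclidean).
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in profiles
        ]
        if new_labels == labels:  # converged: no gene changed cluster
            break
        labels = new_labels
        # Update step: move each center to the mean profile of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy data (hypothetical): four genes measured in three conditions.
profiles = [[1.0, 1.1, 0.9], [0.9, 1.0, 1.0], [5.0, 5.2, 4.9], [5.1, 4.8, 5.0]]
labels, centers = kmeans(profiles, k=2)
```

Max iterations only caps the loop; on well-separated data the assignments usually stop changing long before the limit. Because the seed controls only the starting centers, changing it is a quick way to check whether a clustering is stable.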

Diagnostics

  • Elbow plot — within-cluster sum of squared errors (SSE) vs k for k = 2 … 15.

  • Silhouette plot — mean silhouette score vs k.
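
The two quantities plotted above can be sketched for a single clustering as follows. This is an illustration under assumptions, not the step's own code: the function names, the use of Euclidean distance, and the toy data are hypothetical, and a real sweep would evaluate both functions once per candidate k.

```python
from math import dist  # Euclidean distance, Python 3.8+

def within_cluster_sse(points, labels):
    """Sum of squared distances from each point to its cluster mean."""
    total = 0.0
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        center = [sum(col) / len(members) for col in zip(*members)]
        total += sum(dist(p, center) ** 2 for p in members)
    return total

def mean_silhouette(points, labels):
    """Mean of s = (b - a) / max(a, b) over all points; needs >= 2 clusters."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of the point's own cluster.
        same = [dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        if not same:  # singleton cluster: score is defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy data (hypothetical): two tight groups of two genes each.
points = [[1.0, 1.1, 0.9], [0.9, 1.0, 1.0], [5.0, 5.2, 4.9], [5.1, 4.8, 5.0]]
good = [0, 0, 1, 1]  # labels that match the two groups
bad = [0, 1, 0, 1]   # labels that split both groups
```

On this toy data the matching labels give a lower SSE and a higher mean silhouette than the split labels. The elbow plot looks for the k after which SSE stops dropping sharply, and the silhouette plot looks for the k with the highest mean score.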

Outputs

  • clusters/cluster_1.txt … cluster_k.txt — gene lists per cluster.

  • cluster_centers.png — average expression profile per cluster.
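
For orientation, the per-cluster gene lists could be produced from cluster labels along these lines. This is a sketch only: the helper name write_cluster_lists, the one-gene-ID-per-line file format, the 0-based labels, and the gene IDs are assumptions, and the demo writes to a temporary directory rather than the real output location.

```python
import tempfile
from pathlib import Path

def write_cluster_lists(gene_ids, labels, out_dir="clusters"):
    """Write one text file per cluster, one gene ID per line (assumed format)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for c in sorted(set(labels)):
        genes = [g for g, lab in zip(gene_ids, labels) if lab == c]
        path = out / f"cluster_{c + 1}.txt"  # 0-based labels, 1-based file names
        path.write_text("\n".join(genes) + "\n")
        paths.append(path)
    return paths

# Demo with hypothetical gene IDs, written to a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    paths = write_cluster_lists(
        ["geneA", "geneB", "geneC", "geneD"], [0, 0, 1, 1],
        out_dir=Path(tmp) / "clusters",
    )
    names = [p.name for p in paths]
    first_cluster = paths[0].read_text().split()
```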