R - Cluster Generation

About

How to generate clustered data in R.

To generate clustered data, start from a single group of randomly generated points, assign each point to a cluster, and shift it by the mean (center) of that cluster.
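The idea can be sketched in one dimension before moving to the full steps. This is a minimal illustration, assuming two groups and hand-picked offsets; the names `values`, `labels`, `offsets` and `shifted` are hypothetical and do not come from the steps below.

```r
set.seed(1)                           # arbitrary seed, used only for reproducibility
values  <- rnorm(200)                 # one group of standard-normal values
labels  <- rep(1:2, each = 100)       # hypothetical labels: first 100 in group 1, rest in group 2
offsets <- c(-5, 5)                   # hand-picked group means (an assumption of this sketch)
shifted <- values + offsets[labels]   # shift every value by the mean of its group
tapply(shifted, labels, mean)         # the two group means now sit near -5 and 5
```

Indexing `offsets` by `labels` picks the right offset for every value, which is the same trick the 2-dimensional version uses with a matrix of centers.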

Steps

Create data points

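set.seed(101)                        # fix the seed so the example is reproducible
x <- matrix(rnorm(100 * 2), 100, 2)  # 100 points, each with 2 standard-normal coordinates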

where:

 x[1:100,]
              [,1]        [,2]
  [1,] -0.56843578  0.24912228
  [2,]  0.77859810 -0.16461954
  [3,] -0.15684682  0.37593032
  [4,] -1.81059190 -0.79511759
  [5,] -1.90281490 -0.13780093
  [6,]  2.33700231  1.88560945
  [7,] -0.46189692 -0.93481448
  [8,]  0.54721322  1.26122751
  ....................

plot(x,pch=19)

Figure: scatter plot of the random data.
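A quick sanity check, using only base R, can confirm the shape of `x` and that both columns look roughly standard normal:

```r
set.seed(101)
x <- matrix(rnorm(100 * 2), 100, 2)
dim(x)            # 100 rows (points), 2 columns (coordinates)
colMeans(x)       # both values should be close to 0
apply(x, 2, sd)   # both values should be close to 1
```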

Randomly assign each point to one of three clusters

which <- sample(1:3, 100, replace = TRUE)  # cluster label (1, 2 or 3) for each of the 100 points

where:

  [1] 1 3 3 3 1 3 2 1 2 3 3 2 1 1 2 3 2 3 3 1 2 3 2 2 1 3 2 2 1 1 3 3 3 1 3 1 1 1 1 2 3 3 1 2 1 2 1 2 2 3 2 3 3 1
 [55] 1 2 1 1 2 2 3 2 2 1 1 3 2 3 3 2 1 3 3 1 3 3 3 3 1 2 2 3 1 3 3 3 1 2 3 3 2 1 2 1 1 3 2 1 3 3

plot(x,col=which,pch=19)

Figure: the random data, coloured by assigned group (3 groups).
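To see how many points landed in each cluster, the labels can be tabulated. The sketch below uses the hypothetical name `cluster_id` for the label vector, because `which` is also the name of a base R function; the rename is an assumption of this sketch, and the original name works too.

```r
set.seed(101)
x <- matrix(rnorm(100 * 2), 100, 2)              # data points, as in the first step
cluster_id <- sample(1:3, 100, replace = TRUE)   # hypothetical name for the label vector
table(cluster_id)                                # number of points per cluster
```

Because the labels are drawn uniformly, the three counts should be of similar size but not exactly equal.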

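Create 3 random points to serve as cluster centers

xmean <- matrix(rnorm(3 * 2, sd = 4), 3, 2)  # one row per cluster center; sd = 4 spreads the centers apart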

where:

          [,1]        [,2]
[1,] -4.235016 -1.84473873
[2,]  1.632360 -0.03466352
[3,] -1.100477 -7.02588458

Figure: the 3 random points.
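The plotting call for this figure is not shown. One possible way to draw the three centers, assuming a cross marker and the same colour coding as the clusters (the name `cluster_id` is hypothetical and stands in for the label vector):

```r
set.seed(101)
x          <- matrix(rnorm(100 * 2), 100, 2)       # data points
cluster_id <- sample(1:3, 100, replace = TRUE)     # cluster labels (hypothetical name)
xmean      <- matrix(rnorm(3 * 2, sd = 4), 3, 2)   # 3 centers, one per row

# One cross per center, coloured 1, 2, 3 to match the cluster colours.
plot(xmean, col = 1:3, pch = 4, cex = 2, lwd = 2, xlab = "x1", ylab = "x2")
```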

Shift each point by the center of its cluster

xclusterd <- x + xmean[which, ]  # xmean[which, ] is a 100 x 2 matrix holding each point's center
plot(xclusterd, col = which, pch = 19)

Figure: the generated clustered data, coloured by cluster.
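As a check, the per-cluster means of the shifted data should land close to the rows of `xmean`. The sketch below reruns the steps and compares the two; `cluster_id`, `xclustered` and `empirical` are illustrative names, not part of the original steps.

```r
set.seed(101)
x          <- matrix(rnorm(100 * 2), 100, 2)
cluster_id <- sample(1:3, 100, replace = TRUE)
xmean      <- matrix(rnorm(3 * 2, sd = 4), 3, 2)

# xmean[cluster_id, ] repeats the matching center row for every point,
# giving a 100 x 2 matrix that is added to x element-wise.
xclustered <- x + xmean[cluster_id, ]

# Empirical center of each cluster: column means of the rows with that label.
empirical <- t(sapply(1:3, function(k) {
  colMeans(xclustered[cluster_id == k, , drop = FALSE])
}))
round(empirical - xmean, 2)   # differences should be small compared with the spread of the centers
```

The remaining differences come from the standard-normal noise in `x`, averaged over the roughly 33 points in each cluster.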