CLV3W: Clustering around latent variables with Three-Way data

Veronique Cariou

2022-05-28

Clustering around latent variables in the scope of Three-Way data

The functions associated with CLV3W are dedicated to the clustering around latent variables in the context of Three-Way data. Such data are structured as three-way arrays and the purpose is to cluster the second mode corresponding to the various variables (see Wilderjans and Cariou, 2016; Cariou and Wilderjans, 2018).

library(ClustVarLV)
#library(clv3w)

For illustration, we consider the “coffee” dataset which corresponds to consumer emotions associated with a variety of coffee aromas (Cariou and Wilderjans, 2018).

data(coffee)
# 12 coffee aromas rated by 84 consumers on 15 emotion terms
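To make the three-way layout concrete, the following base R sketch builds a toy array with the same dimensions as those given in the comment above. The values are random, not the real coffee ratings, and the ordering of the modes (aromas, consumers, emotion terms) as well as all dimension names are assumptions made for illustration only.

```r
# Toy three-way array: random values, NOT the real coffee data.
set.seed(1)
toy <- array(
  rnorm(12 * 84 * 15),
  dim = c(12, 84, 15),
  dimnames = list(
    paste0("aroma", 1:12),     # mode 1: products (hypothetical names)
    paste0("consumer", 1:84),  # mode 2: the variables to be clustered
    paste0("emotion", 1:15)    # mode 3: descriptors
  )
)
dim(toy)

# One mode-2 slice: the 12 x 15 block of ratings given by a single consumer
slice <- toy[, "consumer1", ]
dim(slice)
```

Clustering the second mode amounts to grouping these 84 consumer slices.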

Clustering of the consumers

The aim is to find groups of consumers. Herein “directional” groups are sought. Each group is associated with a latent component which makes it possible to identify the underlying sensory dimensions.

resclv3w_coffee<-CLV3W(coffee,mode.scale=2,NN=TRUE,moddendoinertie=TRUE,graph=TRUE,gmax=11,cp.rand=1)
# option NN=TRUE means that each consumer within a group must be positively correlated with the group's latent component; otherwise the consumer's loading is set to 0

# Print of the 'clv3W' object 
print(resclv3w_coffee)
# Dendrogram of the CLV3W hierarchical clustering algorithm :
plot(resclv3w_coffee,"dendrogram")

# Graph of the variation of the associated clustering criterion
plot(resclv3w_coffee,"delta")

The graph of the variation of the clustering criterion between a partition into K clusters and a partition into (K-1) clusters (after consolidation) is useful for determining the number of clusters to be retained. Because the criterion clearly jumps when passing from 2 groups to 1 group, a partition into 2 groups is retained.

# Summary of the CLV3W results for a partition into 2 groups
summary(resclv3w_coffee,K=2)
#> $size
#> clusters
#>  1  2 
#> 46 38 
#> 
#> $prop_explained_per_cluster
#> [1] 20.406 25.601
#> 
#> $prop_explained_total
#> [1] 22.756
#> 
#> $comp
#>                 Comp1  Comp2
#> Vanilla         0.320 -0.091
#> B.Rice          0.062  0.223
#> Lemon          -0.527 -0.428
#> Coffee.Flower  -0.440 -0.251
#> Cedar           0.082  0.391
#> Hazelnut        0.113 -0.248
#> Coriander.Seed -0.157 -0.006
#> Honey           0.354 -0.129
#> Medicine        0.252  0.425
#> Apricot        -0.361 -0.284
#> Earth           0.232  0.448
#> Hay             0.069 -0.050
#> 
#> $weight
#>               Comp1  Comp2
#> Calm         -0.208 -0.185
#> Nostalgic    -0.190 -0.221
#> Angry         0.217  0.222
#> Disgusted     0.375  0.372
#> Unique       -0.096 -0.122
#> Excited      -0.142 -0.207
#> Unpleasant    0.335  0.331
#> Disappointed  0.239  0.269
#> Free         -0.227 -0.192
#> Surprised    -0.002  0.015
#> Irritated     0.335  0.307
#> Energetic    -0.221 -0.263
#> Happy        -0.318 -0.320
#> Well         -0.377 -0.338
#> Amused       -0.280 -0.269
#> 
#> $groups
#> $groups$`Cluster 1`
#>    loading cor in.group cor next.group
#> 1    0.217        0.841          0.471
#> 2    0.078        0.619          0.318
#> 3    0.145        0.586          0.126
#> 4    0.032        0.210          0.184
#> 6    0.127        0.607          0.416
#> 8    0.078        0.361          0.166
#> 9    0.069        0.523         -0.076
#> 11   0.000       -0.464         -0.070
#> 12   0.126        0.654          0.204
#> 14   0.151        0.698          0.323
#> 15   0.082        0.340          0.167
#> 16   0.137        0.631          0.617
#> 18   0.234        0.820          0.477
#> 19   0.088        0.370         -0.060
#> 20   0.122        0.522          0.404
#> 22   0.142        0.643          0.366
#> 24   0.209        0.815          0.704
#> 25   0.167        0.733          0.163
#> 26   0.222        0.795          0.269
#> 30   0.211        0.909          0.484
#> 33   0.193        0.790          0.675
#> 34   0.159        0.727          0.330
#> 36   0.156        0.673          0.489
#> 38   0.169        0.762          0.383
#> 40   0.109        0.599          0.283
#> 43   0.053        0.212          0.089
#> 47   0.073        0.409          0.267
#> 48   0.124        0.552          0.167
#> 49   0.083        0.394         -0.108
#> 54   0.136        0.689          0.340
#> 56   0.214        0.833          0.786
#> 58   0.151        0.580          0.451
#> 59   0.225        0.872          0.548
#> 61   0.130        0.717          0.539
#> 62   0.154        0.688          0.425
#> 63   0.161        0.650          0.137
#> 65   0.130        0.551          0.530
#> 66   0.127        0.488          0.252
#> 67   0.159        0.720          0.375
#> 71   0.116        0.618          0.602
#> 73   0.136        0.605          0.089
#> 74   0.110        0.661          0.489
#> 79   0.226        0.855          0.657
#> 80   0.149        0.665          0.552
#> 82   0.178        0.747          0.632
#> 84   0.000       -0.611         -0.279
#> 
#> $groups$`Cluster 2`
#>    loading cor in.group cor next.group
#> 5    0.129        0.551         -0.028
#> 7    0.139        0.603          0.145
#> 10   0.147        0.553          0.276
#> 13   0.160        0.844          0.523
#> 17   0.158        0.621          0.606
#> 21   0.116        0.651          0.118
#> 23   0.059        0.242         -0.069
#> 27   0.123        0.646          0.535
#> 28   0.196        0.839          0.445
#> 29   0.136        0.592          0.439
#> 31   0.147        0.676          0.270
#> 32   0.137        0.562          0.263
#> 35   0.224        0.896          0.639
#> 37   0.113        0.523          0.387
#> 39   0.137        0.598          0.371
#> 41   0.152        0.657          0.530
#> 42   0.062        0.241          0.013
#> 44   0.185        0.681          0.513
#> 45   0.124        0.628          0.245
#> 46   0.204        0.885          0.485
#> 50   0.120        0.520          0.051
#> 51   0.166        0.836          0.688
#> 52   0.251        0.938          0.429
#> 53   0.231        0.921          0.521
#> 55   0.133        0.525          0.238
#> 57   0.173        0.790          0.345
#> 60   0.155        0.701          0.551
#> 64   0.181        0.804          0.412
#> 68   0.145        0.749          0.552
#> 69   0.164        0.754          0.452
#> 70   0.184        0.719          0.567
#> 72   0.199        0.872          0.441
#> 75   0.165        0.767          0.239
#> 76   0.204        0.862          0.618
#> 77   0.192        0.778          0.724
#> 78   0.112        0.546          0.501
#> 81   0.208        0.806          0.304
#> 83   0.133        0.671          0.485
#> 
#> 
#> $cormatrix
#>        Comp.1 Comp.2
#> Comp.1  1.000  0.591
#> Comp.2  0.591  1.000
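The effect of NN=TRUE can be read directly from the summary output above. The sketch below copies four rows of Cluster 1 by hand (consumers 1, 11, 30 and 84) and checks that the consumers with a zero loading are those negatively correlated with their group's latent component. The data frame and its column names are illustrative and are not objects returned by the package.

```r
# Four rows transcribed from the printed $groups$`Cluster 1` table
grp1 <- data.frame(
  consumer     = c(1, 11, 30, 84),
  loading      = c(0.217, 0.000, 0.211, 0.000),
  cor_in_group = c(0.841, -0.464, 0.909, -0.611)
)

# Consumers whose loading was set to 0 under the non-negativity constraint
zeroed <- grp1$consumer[grp1$loading == 0]
zeroed

# Their correlations with the latent component of the group are all negative
grp1$cor_in_group[grp1$loading == 0]
```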

The function plot_var.clv3w() displays the groups of variables in a two-dimensional space obtained by Candecomp Parafac. Several options are available: choosing the axes, adding labels, using symbols instead of colours, and producing either a single plot or one plot per group of variables. When mode3 is set to TRUE, an additional plot shows the coordinates of the mode 1 elements on the global Parafac scores, together with a projection of the mode 3 elements onto them.

# Representation of the group membership for a partition into 2 groups
plot_var.clv3w(resclv3w_coffee,K=2,labels=TRUE,cex.lab=0.8,beside=TRUE,mode3=FALSE)

or

plot_var.clv3w(resclv3w_coffee,K=2,labels=TRUE,cex.lab=0.8,beside=FALSE,mode3=TRUE)

Additional functions:

# Extract the group membership of each variable
get_partition(resclv3w_coffee,2)

# Extract the group latent variables 
get_comp(resclv3w_coffee,2)

# Extract the vector of loadings of the variables 
get_loading(resclv3w_coffee,2)

# Extract the vector of weights associated with mode3
get_weight(resclv3w_coffee,2)

Using the CLV3W_kmeans function

This procedure is less time-consuming when the number of variables (mode 2) is large. The number of clusters must be fixed in advance (e.g. 2).

The algorithm can be initialized at random “init” times, while the number of starts associated with Candecomp Parafac is set with cp.rand:

res.clv3wkm.rd<-CLV3W_kmeans(coffee,2,mode.scale=2,NN=TRUE,init=20,cp.rand=2)

It is possible to compare the partitions according to the procedure used :

table(get_partition(resclv3w_coffee,K=2),get_partition(res.clv3wkm.rd,K=2)) 
#>    
#>      1  2
#>   1 41  5
#>   2  0 38
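As a rough summary of this cross-tabulation, the share of consumers assigned to the same group by both procedures can be computed in base R. The sketch below re-enters the printed counts by hand; the dimension names are illustrative. It assumes that the cluster labels of the two partitions line up, which appears to be the case here, and the counts may differ from run to run because the k-means initialization is random.

```r
# Counts transcribed from the table above
# (rows: hierarchical CLV3W, columns: CLV3W_kmeans)
tab <- matrix(
  c(41, 0, 5, 38),  # filled column by column
  nrow = 2,
  dimnames = list(hierarchical = c("1", "2"), kmeans = c("1", "2"))
)

# Proportion of consumers on the diagonal, i.e. placed in matching groups
agreement <- sum(diag(tab)) / sum(tab)
round(agreement, 3)
```

Here 79 of the 84 consumers fall on the diagonal; the 5 remaining consumers move from group 1 in the hierarchical solution to group 2 in the k-means solution.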

References

Cariou, V., & Wilderjans, T. F. (2018). Consumer segmentation in multi-attribute product evaluation by means of non-negatively constrained CLV3W. Food Quality and Preference, 67, 18-26.

Wilderjans, T. F., & Cariou, V. (2016). CLV3W: A clustering around latent variables approach to detect panel disagreement in three-way conventional sensory profiling data. Food Quality and Preference, 47, 45-53.