Title: | Cluster-of-Clusters Analysis |
---|---|
Description: | Contains the R functions needed to perform Cluster-Of-Clusters Analysis (COCA) and Consensus Clustering (CC). For further details please see Cabassi and Kirk (2020) <doi:10.1093/bioinformatics/btaa593>. |
Authors: | Alessandra Cabassi [aut, cre]
|
Maintainer: | Alessandra Cabassi <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1.0 |
Built: | 2025-02-11 03:27:56 UTC |
Source: | https://github.com/acabassi/coca |
This function creates a matrix of clusters starting from a list of heterogeneous datasets.
buildMOC( data, M, K = NULL, maxK = 10, methods = "hclust", distances = "euclidean", fill = FALSE, computeAccuracy = FALSE, fullData = FALSE, savePNG = FALSE, fileName = "buildMOC", widestGap = FALSE, dunns = FALSE, dunn2s = FALSE )
buildMOC( data, M, K = NULL, maxK = 10, methods = "hclust", distances = "euclidean", fill = FALSE, computeAccuracy = FALSE, fullData = FALSE, savePNG = FALSE, fileName = "buildMOC", widestGap = FALSE, dunns = FALSE, dunn2s = FALSE )
data |
List of M datasets, each of size N X P_m, where m = 1, ..., M. |
M |
Number of datasets. |
K |
Vector containing the number of clusters in each dataset. If given an integer instead of a vector it is assumed that each dataset has the same number of clusters. If NULL, it is assumed that the true cluster numbers are not known, therefore they will be estimated using the silhouette method. |
maxK |
Vector of maximum cluster numbers to be considered for each dataset if K is NULL. If given an integer instead of a vector it is assumed that for each dataset the same maximum number of clusters must be considered. Default is 10. |
methods |
Vector of strings containing the names of the clustering methods to be used to cluster the observations in each dataset. Each can be "kmeans" (k-means clustering), "hclust" (hierarchical clustering), or "pam" (partitioning around medoids). If the vector is of length one, the same clustering method is applied to all the datasets. Default is "hclust". |
distances |
Distances to be used in the clustering step for each dataset. If only one string is provided, then the same distance is used for all datasets. If the number of strings provided is the same as the number of datasets, then each distance will be used for the corresponding dataset. Default is "euclidean". Please note that not all distances are compatible with all clustering methods. "euclidean" and "manhattan" work with all available clustering algorithms. "gower" distance is only available for partitioning around medoids. In addition, "maximum", "canberra", "binary" or "minkowski" are available for k-means and hierarchical clustering. |
fill |
Boolean. If TRUE, if there are any missing observations in one or more datasets, the corresponding cluster labels will be estimated through generalised linear models on the basis of the available labels. |
computeAccuracy |
Boolean. If TRUE, for each missing element, the performance of the predictive model used to estimate the corresponding missing label is computer. |
fullData |
Boolean. If TRUE, the full data matrices are used to estimate the missing cluster labels (instead of just using the cluster labels of the corresponding datasets). |
savePNG |
Boolean. If TRUE, plots of the silhouette for each datasets are saved as png files. Default is FALSE. |
fileName |
If |
widestGap |
Boolean. If TRUE, compute also widest gap index to choose best number of clusters. Default is FALSE. |
dunns |
Boolean. If TRUE, compute also Dunn's index to choose best number of clusters. Default is FALSE. |
dunn2s |
Boolean. If TRUE, compute also alternative Dunn's index to choose best number of clusters. Default is FALSE. |
This function returns a list containing:
moc |
the Matrix-Of-Clusters, a binary matrix of size N x sum(K) where element (n,k) contains a 1 if observation n belongs to the corresponding cluster, 0 otherwise. |
datasetIndicator |
a vector of length sum(K) in which each element is the number of the dataset to which the cluster belongs. |
number_nas |
the total number of NAs in the matrix of clusters. (If the
MOC has been filled with imputed values, |
clLabels |
a matrix that is equivalent to the matrix of clusters, but is in compact form, i.e. each column corresponds to a dataset, each row represents an observation, and its values indicate the cluster labels. |
K |
vector of cluster numbers in each dataset. If these are provided as input, this is the same as the input (expanded to a vector if the input is an integer). If the cluster numbers are not provided as input, this vector contains the cluster numbers chosen via silhouette for each dataset. |
Alessandra Cabassi [email protected]
The Cancer Genome Atlas, 2012. Comprehensive molecular portraits of human breast tumours. Nature, 487(7407), pp.61–70.
Rousseeuw, P.J., 1987. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of computational and applied mathematics, 20, pp.53-65.
# Load data data <- list() data[[1]] <- as.matrix(read.csv(system.file("extdata", "dataset1.csv", package = "coca"), row.names = 1)) data[[2]] <- as.matrix(read.csv(system.file("extdata", "dataset2.csv", package = "coca"), row.names = 1)) data[[3]] <- as.matrix(read.csv(system.file("extdata", "dataset3.csv", package = "coca"), row.names = 1)) # Build matrix of clusters outputBuildMOC <- buildMOC(data, M = 3, K = 6, distances = "cor") # Extract matrix of clusters matrixOfClusters <- outputBuildMOC$moc
# Load data data <- list() data[[1]] <- as.matrix(read.csv(system.file("extdata", "dataset1.csv", package = "coca"), row.names = 1)) data[[2]] <- as.matrix(read.csv(system.file("extdata", "dataset2.csv", package = "coca"), row.names = 1)) data[[3]] <- as.matrix(read.csv(system.file("extdata", "dataset3.csv", package = "coca"), row.names = 1)) # Build matrix of clusters outputBuildMOC <- buildMOC(data, M = 3, K = 6, distances = "cor") # Extract matrix of clusters matrixOfClusters <- outputBuildMOC$moc
This function allows to choose the number of clusters in a dataset based on the area under the curve of the empirical distribution function of a consensus matrix, calculated for different (consecutive) cluster numbers, as explained in the article by Monti et al. (2003), Section 3.3.1.
chooseKusingAUC(areaUnderTheCurve, savePNG = FALSE, fileName = "deltaAUC.png")
chooseKusingAUC(areaUnderTheCurve, savePNG = FALSE, fileName = "deltaAUC.png")
areaUnderTheCurve |
Vector of length maxK-1 containing the area under the curve of the empirical distribution function of the consensus matrices obtained with K varying from 2 to maxK. |
savePNG |
Boolean. If TRUE, a plot of the area under the curve for each value of K is saved as a png file. The file is saved in a subdirectory of the working directory, called "delta-auc". Default is FALSE. |
fileName |
If |
This function returns a list containing:
deltaAUC |
a vector of length maxK-1 where element i is the area under the curve for K = i+1 minus the area under the curve for K = i (for i = 2 this is simply the area under the curve for K = i) |
K |
the lowest among the values of K that are chosen by the algorithm. |
Alessandra Cabassi [email protected]
Monti, S., Tamayo, P., Mesirov, J. and Golub, T., 2003. Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Machine learning, 52(1-2), pp.91-118.
# Assuming that we want to choose among any value of K (number of clusters) # between 2 and 10 and that the area under the curve is as follows: areaUnderTheCurve <- c(0.05, 0.15, 0.4, 0.5, 0.55, 0.56, 0.57, 0.58, 0.59) # The optimal value of K can be chosen with: K <- chooseKusingAUC(areaUnderTheCurve)$K
# Assuming that we want to choose among any value of K (number of clusters) # between 2 and 10 and that the area under the curve is as follows: areaUnderTheCurve <- c(0.05, 0.15, 0.4, 0.5, 0.55, 0.56, 0.57, 0.58, 0.59) # The optimal value of K can be chosen with: K <- chooseKusingAUC(areaUnderTheCurve)$K
This function allows to do Cluster-Of-Clusters Analysis on a binary matrix where each column is a clustering of the data, each row corresponds to a data point and the element in position (i,j) is equal to 1 if data point i belongs to cluster j, 0 otherwise.
coca( moc, K = NULL, maxK = 6, B = 1000, pItem = 0.8, hclustMethod = "average", choiceKmethod = "silhouette", ccClMethod = "kmeans", ccDistHC = "euclidean", maxIterKM = 1000, savePNG = FALSE, fileName = "coca", verbose = FALSE, widestGap = FALSE, dunns = FALSE, dunn2s = FALSE, returnAllMatrices = FALSE )
coca( moc, K = NULL, maxK = 6, B = 1000, pItem = 0.8, hclustMethod = "average", choiceKmethod = "silhouette", ccClMethod = "kmeans", ccDistHC = "euclidean", maxIterKM = 1000, savePNG = FALSE, fileName = "coca", verbose = FALSE, widestGap = FALSE, dunns = FALSE, dunn2s = FALSE, returnAllMatrices = FALSE )
moc |
N X C data matrix, where C is the total number of clusters considered. |
K |
Number of clusters. |
maxK |
Maximum number of clusters considered for the final clustering if K is not known. Default is 6. |
B |
Number of iterations of the Consensus Clustering step. |
pItem |
Proportion of items sampled at each iteration of the Consensus Cluster step. |
hclustMethod |
Agglomeration method to be used by the hclust function to perform hierarchical clustering on the consensus matrix. Can be "single", "complete", "average", etc. For more details please see ?stats::hclust. |
choiceKmethod |
Method used to choose the number of clusters if K is NULL, can be either "AUC" (area under the curve, work in progress) or "silhouette". Default is "silhouette". |
ccClMethod |
Clustering method to be used by the Consensus Clustering algorithm (CC). Can be either "kmeans" for k-means clustering or "hclust" for hiearchical clustering. Default is "kmeans". |
ccDistHC |
Distance to be used by the hiearchical clustering algorithm inside CC. Can be "pearson" (for 1 - Pearson correlation), "spearman" (for 1- Spearman correlation), or any of the distances provided in stats::dist() (i.e. "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski"). Default is "euclidean". |
maxIterKM |
Number of iterations for the k-means clustering algorithm. Default is 1000. |
savePNG |
Boolean. Save plots as PNG files. Default is FALSE. |
fileName |
If |
verbose |
Boolean. |
widestGap |
Boolean. If TRUE, compute also widest gap index to choose best number of clusters. Default is FALSE. |
dunns |
Boolean. If TRUE, compute also Dunn's index to choose best number of clusters. Default is FALSE. |
dunn2s |
Boolean. If TRUE, compute also alternative Dunn's index to choose best number of clusters. Default is FALSE. |
returnAllMatrices |
Boolean. If TRUE, return consensus matrices for all considered values of K. Default is FALSE. |
This function returns a list containing:
consensusMatrix |
a symmetric matrix where the element in position (i,j) corresponds to the proportion of times that items i and j have been clustered together and a vector of cluster labels. |
clusterLabels |
the final cluster labels. |
K |
the final number of clusters. If provided by the user, this is
the same as the input. Otherwise, this is the number of clusters selected via
the requested method (see argument |
consensusMatrices |
if returnAllMatrices = TRUE, this array also returned, containing the consensus matrices obtained for each of the numbers of clusters considered by the algorithm. |
Alessandra Cabassi [email protected]
The Cancer Genome Atlas, 2012. Comprehensive molecular portraits of human breast tumours. Nature, 487(7407), pp.61–70.
Cabassi, A. and Kirk, P. D. W. (2019). Multiple kernel learning for integrative consensus clustering of 'omic datasets. arXiv preprint. arXiv:1904.07701.
# Load data data <- list() data[[1]] <- as.matrix(read.csv(system.file("extdata", "dataset1.csv", package = "coca"), row.names = 1)) data[[2]] <- as.matrix(read.csv(system.file("extdata", "dataset2.csv", package = "coca"), row.names = 1)) data[[3]] <- as.matrix(read.csv(system.file("extdata", "dataset3.csv", package = "coca"), row.names = 1)) # Build matrix of clusters outputBuildMOC <- buildMOC(data, M = 3, K = 5, distances = "cor") # Extract matrix of clusters moc <- outputBuildMOC$moc # Do Cluster-Of-Clusters Analysis outputCOCA <- coca(moc, K = 5) # Extract cluster labels clusterLabels <- outputCOCA$clusterLabels
# Load data data <- list() data[[1]] <- as.matrix(read.csv(system.file("extdata", "dataset1.csv", package = "coca"), row.names = 1)) data[[2]] <- as.matrix(read.csv(system.file("extdata", "dataset2.csv", package = "coca"), row.names = 1)) data[[3]] <- as.matrix(read.csv(system.file("extdata", "dataset3.csv", package = "coca"), row.names = 1)) # Build matrix of clusters outputBuildMOC <- buildMOC(data, M = 3, K = 5, distances = "cor") # Extract matrix of clusters moc <- outputBuildMOC$moc # Do Cluster-Of-Clusters Analysis outputCOCA <- coca(moc, K = 5) # Extract cluster labels clusterLabels <- outputCOCA$clusterLabels
This function allows to perform consensus clustering using the k-means clustering algorithm, for a fixed number of clusters. We consider the number of clusters K to be fixed.
consensusCluster( data = NULL, K = 2, B = 100, pItem = 0.8, clMethod = "hclust", dist = "euclidean", hclustMethod = "average", sparseKmeansPenalty = NULL, maxIterKM = 1000 )
consensusCluster( data = NULL, K = 2, B = 100, pItem = 0.8, clMethod = "hclust", dist = "euclidean", hclustMethod = "average", sparseKmeansPenalty = NULL, maxIterKM = 1000 )
data |
N X P data matrix |
K |
Number of clusters. |
B |
Number of iterations. |
pItem |
Proportion of items sampled at each iteration. |
clMethod |
Clustering algorithm. Can be "hclust" for hierarchical clustering, "kmeans" for k-means clustering, "pam" for partitioning around medoids, "sparse-kmeans" for sparse k-means clustering or "sparse-hclust" for sparse hierarchical clustering. Default is "hclust". However, if the data contain at least one covariate that is a factor, the default clustering algorithm is "pam". |
dist |
Distance used for hierarchical clustering. Can be "pearson" (for 1 - Pearson correlation), "spearman" (for 1- Spearman correlation), any of the distances provided in stats::dist() (i.e. "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski"), or a matrix containing the distances between the observations. |
hclustMethod |
Hierarchical clustering method. Default is "average". For
more details see |
sparseKmeansPenalty |
If the selected clustering method is "sparse-kmeans", this is the value of the parameter "wbounds" of the "KMeansSparseCluster" function. The default value is the square root of the number of variables. |
maxIterKM |
Number of iterations for the k-means clustering algorithm. |
The output is a consensus matrix, that is a symmetric matrix where the element in position (i,j) corresponds to the proportion of times that items i and j have been clustered together.
Alessandra Cabassi [email protected]
Monti, S., Tamayo, P., Mesirov, J. and Golub, T., 2003. Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Machine learning, 52(1-2), pp.91-118.
Witten, D.M. and Tibshirani, R., 2010. A framework for feature selection in clustering. Journal of the American Statistical Association, 105(490), pp.713-726.
# Load one dataset with 300 observations, 2 variables, 6 clusters data <- as.matrix(read.csv(system.file("extdata", "dataset1.csv", package = "coca"), row.names = 1)) # Compute consensus clustering with K=5 clusters cm <- consensusCluster(data, K = 5)
# Load one dataset with 300 observations, 2 variables, 6 clusters data <- as.matrix(read.csv(system.file("extdata", "dataset1.csv", package = "coca"), row.names = 1)) # Compute consensus clustering with K=5 clusters cm <- consensusCluster(data, K = 5)
Expand matrix of cluster labels into matrix of clusters
expandMOC(clLabels, datasetNames = NULL)
expandMOC(clLabels, datasetNames = NULL)
clLabels |
Matrix of cluster labels of size N x M. |
datasetNames |
Vector of cluster names of length M. Default is NULL. |
The output is a list containing:
moc |
the matrix of clusters. |
datasetIndicator |
a vector containing the dataset indicator. |
datasetNames |
an expanded vector of dataset names for the moc. |
Alessandra Cabassi [email protected]
# Load data data <- list() data[[1]] <- as.matrix(read.csv(system.file("extdata", "dataset1.csv", package = "coca"), row.names = 1)) data[[2]] <- as.matrix(read.csv(system.file("extdata", "dataset2.csv", package = "coca"), row.names = 1)) data[[3]] <- as.matrix(read.csv(system.file("extdata", "dataset3.csv", package = "coca"), row.names = 1)) # Build matrix of clusters outputBuildMOC <- buildMOC(data, M = 3, K = 6, distances = "cor") # Extract matrix of clusters clLabels <- outputBuildMOC$clLabels # Impute missing values outputFillMOC <- fillMOC(clLabels, data = data) # Replace matrix of cluster labels with new (full) one clLabels <- outputFillMOC$fullClLabels # Expand matrix of cluster labels into matrix of clusters outputExpandMOC <- expandMOC(clLabels) clLabels <- outputExpandMOC$clLabels
# Load data data <- list() data[[1]] <- as.matrix(read.csv(system.file("extdata", "dataset1.csv", package = "coca"), row.names = 1)) data[[2]] <- as.matrix(read.csv(system.file("extdata", "dataset2.csv", package = "coca"), row.names = 1)) data[[3]] <- as.matrix(read.csv(system.file("extdata", "dataset3.csv", package = "coca"), row.names = 1)) # Build matrix of clusters outputBuildMOC <- buildMOC(data, M = 3, K = 6, distances = "cor") # Extract matrix of clusters clLabels <- outputBuildMOC$clLabels # Impute missing values outputFillMOC <- fillMOC(clLabels, data = data) # Replace matrix of cluster labels with new (full) one clLabels <- outputFillMOC$fullClLabels # Expand matrix of cluster labels into matrix of clusters outputExpandMOC <- expandMOC(clLabels) clLabels <- outputExpandMOC$clLabels
This function fills in a matrix of clusters that contains NAs, by estimating the missing cluster labels based on the available ones or based on the other datasets. The predictive accuracy of this method can also be estimated via cross-validation.
fillMOC(clLabels, data, computeAccuracy = FALSE, verbose = FALSE)
fillMOC(clLabels, data, computeAccuracy = FALSE, verbose = FALSE)
clLabels |
N X M matrix containing cluster labels. Element (n,m) contains the cluster label for element data point n in cluster m. |
data |
List of M datasets to be used for the label imputation. |
computeAccuracy |
Boolean. If TRUE, for each missing element, the performance of the predictive model used to estimate the corresponding missing label is computer. Default is FALSE. |
verbose |
Boolean. If TRUE, for each NA, the size of the matrix used to estimate its values is printed to screen. Default is FALSE. |
The output is a list containing:
fullClLabels |
the same matrix of clusters as the input matrix
|
nRows |
matrix where the item in position (i,j) indicates the
number of observations used in the predictive model used to estimate the
corresponding missing label in the |
nColumns |
matrix where the item in position (i,j) indicates the
number of covariates used in the predictive model used to
estimate the corresponding missing label in the |
accuracy |
a matrix where each element
corresponds to the predictive accuracy of the predictive model used to
estimate the corresponding label in the cluster label matrix. This is only
returned if the argument |
accuracy_random |
This is computed in the same way as |
Alessandra Cabassi [email protected]
The Cancer Genome Atlas, 2012. Comprehensive molecular portraits of human breast tumours. Nature, 487(7407), pp.61–70.
# Load data data <- list() data[[1]] <- as.matrix(read.csv(system.file("extdata", "dataset1.csv", package = "coca"), row.names = 1)) data[[2]] <- as.matrix(read.csv(system.file("extdata", "dataset2.csv", package = "coca"), row.names = 1)) data[[3]] <- as.matrix(read.csv(system.file("extdata", "dataset3.csv", package = "coca"), row.names = 1)) # Build matrix of clusters outputBuildMOC <- buildMOC(data, M = 3, K = 6, distances = "cor") # Extract matrix of clusters clLabels <- outputBuildMOC$clLabels # Impute missing values using full datasets outputFillMOC <- fillMOC(clLabels, data) # Extract full matrix of cluster labels clLabels2 <- outputFillMOC$fullClLabels
# Load data data <- list() data[[1]] <- as.matrix(read.csv(system.file("extdata", "dataset1.csv", package = "coca"), row.names = 1)) data[[2]] <- as.matrix(read.csv(system.file("extdata", "dataset2.csv", package = "coca"), row.names = 1)) data[[3]] <- as.matrix(read.csv(system.file("extdata", "dataset3.csv", package = "coca"), row.names = 1)) # Build matrix of clusters outputBuildMOC <- buildMOC(data, M = 3, K = 6, distances = "cor") # Extract matrix of clusters clLabels <- outputBuildMOC$clLabels # Impute missing values using full datasets outputFillMOC <- fillMOC(clLabels, data) # Extract full matrix of cluster labels clLabels2 <- outputFillMOC$fullClLabels
Choose the number of clusters K that maximises the silhouette, starting from a set of kernel matrices each corresponding to a different choice of K and the corresponding clusterings of the data for each of those values of K.
maximiseSilhouette( kernelMatrix, clLabels, maxK, savePNG = FALSE, fileName = "silhouette", isDistance = FALSE, widestGap = FALSE, dunns = FALSE, dunn2s = FALSE )
maximiseSilhouette( kernelMatrix, clLabels, maxK, savePNG = FALSE, fileName = "silhouette", isDistance = FALSE, widestGap = FALSE, dunns = FALSE, dunn2s = FALSE )
kernelMatrix |
N X N X (maxK-1) array of kernel matrices. |
clLabels |
(maxK-1) X N matrix containing the clusterings obtained for different values of K. |
maxK |
Maximum number of clusters considered. |
savePNG |
If TRUE, a plot of the silhouette is saved in the working folder. Defaults to FALSE. |
fileName |
If |
isDistance |
Boolean. If TRUE, the kernel matrices are interpreted as matrices of distances, otherwise as matrices of similarities. |
widestGap |
Boolean. If TRUE, also computes widest gap index (and plots
it if |
dunns |
Boolean. If TRUE, also computes Dunn's index: minimum separation
/ maximum diameter (and plots it if |
dunn2s |
Boolean. If TRUE, also computes an alternative version
of Dunn's index: minimum average dissimilarity between two cluster / maximum
average within cluster dissimilarity (and plots it if |
The function returns a list containing:
silh |
a vector of length |
K |
the lowest number of clusters for which the silhouette is maximised. |
Alessandra Cabassi [email protected]
This function creates a matrix of clusters, starting from a list of heterogeneous datasets.
plotMOC( moc, datasetIndicator, datasetNames = NULL, annotations = NULL, clr = FALSE, clc = FALSE, savePNG = FALSE, fileName = "moc.png", showObsNames = FALSE, showClusterNames = FALSE, annotation_colors = NA )
plotMOC( moc, datasetIndicator, datasetNames = NULL, annotations = NULL, clr = FALSE, clc = FALSE, savePNG = FALSE, fileName = "moc.png", showObsNames = FALSE, showClusterNames = FALSE, annotation_colors = NA )
moc |
Matrix-Of-Clusters of size N x sumK. |
datasetIndicator |
Vector containing integers indicating which rows correspond to some clustering of the same dataset. |
datasetNames |
Vector containing the names of the datasets to which each column of labels corresponds. If NULL, datasetNames will be the same as datasetIndicator. Default is NULL. |
annotations |
Dataframe containing annotations. Number of rows must be
N. If the annotations are integers, use |
clr |
Cluster rows. Default is FALSE. |
clc |
Cluster columns. Default is FALSE. |
savePNG |
Boolean. If TRUE, plot is saved as a png file. |
fileName |
If |
showObsNames |
Boolean. If TRUE, the plot will also include the column names (i.e. name of each observation). Default is FALSE, since there are usually too many columns. |
showClusterNames |
Boolean. If TRUE, plot cluster names next to corresponding row. Default is FALSE. |
annotation_colors |
Optional. See annotation_colors in pheatmap::pheatmap. |
Alessandra Cabassi [email protected]
The Cancer Genome Atlas, 2012. Comprehensive molecular portraits of human breast tumours. Nature, 487(7407), pp.61–70.
# Load data data <- list() data[[1]] <- as.matrix(read.csv(system.file("extdata", "dataset1.csv", package = "coca"), row.names = 1)) data[[2]] <- as.matrix(read.csv(system.file("extdata", "dataset2.csv", package = "coca"), row.names = 1)) data[[3]] <- as.matrix(read.csv(system.file("extdata", "dataset3.csv", package = "coca"), row.names = 1)) # Create vector of dataset names, in the same order as they appear above datasetNames <- c("Dataset1", "Dataset2", "Dataset3") # Build matrix of clusters outputBuildMOC <- buildMOC(data, M = 3, K = 6, distances = "cor") # Extract matrix of clusters and dataset indicator vector moc <- outputBuildMOC$moc datasetIndicator <- outputBuildMOC$datasetIndicator # Prepare annotations true_labels <- as.matrix(read.csv( system.file("extdata", "cluster_labels.csv", package = "coca"), row.names = 1)) annotations <- data.frame(true_labels = as.factor(true_labels)) # Plot matrix of clusters plotMOC(moc, datasetIndicator, datasetNames = datasetNames, annotations = annotations)
# Load data data <- list() data[[1]] <- as.matrix(read.csv(system.file("extdata", "dataset1.csv", package = "coca"), row.names = 1)) data[[2]] <- as.matrix(read.csv(system.file("extdata", "dataset2.csv", package = "coca"), row.names = 1)) data[[3]] <- as.matrix(read.csv(system.file("extdata", "dataset3.csv", package = "coca"), row.names = 1)) # Create vector of dataset names, in the same order as they appear above datasetNames <- c("Dataset1", "Dataset2", "Dataset3") # Build matrix of clusters outputBuildMOC <- buildMOC(data, M = 3, K = 6, distances = "cor") # Extract matrix of clusters and dataset indicator vector moc <- outputBuildMOC$moc datasetIndicator <- outputBuildMOC$datasetIndicator # Prepare annotations true_labels <- as.matrix(read.csv( system.file("extdata", "cluster_labels.csv", package = "coca"), row.names = 1)) annotations <- data.frame(true_labels = as.factor(true_labels)) # Plot matrix of clusters plotMOC(moc, datasetIndicator, datasetNames = datasetNames, annotations = annotations)