rep_biclustermd {biclustermd}  R Documentation 
Repeat a biclustering to achieve a minimum SSE solution
rep_biclustermd(
  data,
  nrep = 10,
  parallel = FALSE,
  ncores = 2,
  col_clusters = floor(sqrt(ncol(data))),
  row_clusters = floor(sqrt(nrow(data))),
  miss_val = mean(data, na.rm = TRUE),
  miss_val_sd = 1,
  similarity = "Rand",
  row_min_num = 5,
  col_min_num = 5,
  row_num_to_move = 1,
  col_num_to_move = 1,
  row_shuffles = 1,
  col_shuffles = 1,
  max.iter = 100
)
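The data-dependent defaults in the signature above can be reproduced by hand, which helps when checking what a call without explicit arguments would use. The following is a minimal base R sketch; the matrix `toy` and the `default_*` names are hypothetical and are not part of the package.

```r
# Hypothetical toy matrix with row names, column names, and missing values.
set.seed(1)
toy <- matrix(rnorm(20 * 12), nrow = 20, ncol = 12,
              dimnames = list(paste0("r", 1:20), paste0("c", 1:12)))
toy[sample(length(toy), 30)] <- NA

# Defaults exactly as written in the usage signature.
default_col_clusters <- floor(sqrt(ncol(toy)))   # floor(sqrt(12)) = 3
default_row_clusters <- floor(sqrt(nrow(toy)))   # floor(sqrt(20)) = 4
default_miss_val     <- mean(toy, na.rm = TRUE)  # mean of the observed cells

default_col_clusters
default_row_clusters
default_miss_val
```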
data 
Dataset to bicluster. Must be a data matrix containing only numbers and missing values. It should have row names and column names. 
nrep 
The number of times to repeat the biclustering. Default 10. 
parallel 
Logical indicating whether to run the repeats in parallel. Default is FALSE.

ncores 
The number of cores to use if parallel computing. Default 2. 
col_clusters 
The number of clusters to partition the columns into. Default is floor(sqrt(ncol(data))). 
row_clusters 
The number of clusters to partition the rows into. Default is floor(sqrt(nrow(data))). 
miss_val 
Value or function to put in empty cells of the prototype matrix. If a value, a random normal variable with sd = miss_val_sd is used. Default is mean(data, na.rm = TRUE).
miss_val_sd 
Standard deviation of the normal distribution used to fill empty cells when miss_val is a value. Default is 1. 
similarity 
The metric used to compare two successive clusterings. Can be "Rand" (default), "HA" for the Hubert and Arabie adjusted Rand index, or "Jaccard". See RRand for details. 
row_min_num 
Minimum row prototype size in order to be eligible to be chosen when filling an empty row prototype. Default is 5. 
col_min_num 
Minimum column prototype size in order to be eligible to be chosen when filling an empty column prototype. Default is 5. 
row_num_to_move 
Number of rows to remove from the sampled prototype to put in the empty row prototype. Default is 1. 
col_num_to_move 
Number of columns to remove from the sampled prototype to put in the empty column prototype. Default is 1. 
row_shuffles 
Number of times to shuffle rows in each iteration. Default is 1. 
col_shuffles 
Number of times to shuffle columns in each iteration. Default is 1. 
max.iter 
Maximum number of iterations to let the algorithm run. Default is 100. 
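To make the similarity argument above more concrete, here is a minimal base R sketch of the plain Rand index between two partitions of the same items. The function `rand_index` is a hypothetical helper written for illustration; the package itself points to RRand for its implementation, which may differ in detail.

```r
# Hypothetical helper: fraction of item pairs on which two partitions agree,
# i.e. pairs placed together in both or apart in both.
rand_index <- function(a, b) {
  stopifnot(length(a) == length(b), length(a) >= 2)
  pairs  <- combn(length(a), 2)              # all unordered pairs of items
  same_a <- a[pairs[1, ]] == a[pairs[2, ]]   # pair shares a cluster in a?
  same_b <- b[pairs[1, ]] == b[pairs[2, ]]   # pair shares a cluster in b?
  mean(same_a == same_b)
}

prev <- c(1, 1, 2, 2, 3, 3)   # cluster labels from one iteration
curr <- c(1, 1, 2, 3, 3, 3)   # labels after one item has moved

rand_index(prev, curr)

# Relabeling the clusters does not change the partition, so the index is 1.
rand_index(prev, c(2, 2, 3, 3, 1, 1))
```

A value of 1 means the two clusterings are identical up to relabeling; lower values mean more pairs changed status between iterations.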
A list containing the minimum SSE biclustering (best_bc), a vector with the final SSE of each repeat (rep_sse), and the time the function took to run (runtime).
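The idea behind the returned list, repeating a randomly initialized procedure and keeping the lowest-SSE result, can be sketched in base R. This sketch substitutes stats::kmeans for the biclustering algorithm purely as a stand-in, and the names `x`, `fits`, and `result` are hypothetical; it is not the package's implementation.

```r
set.seed(42)

# Two well-separated groups of points as toy data.
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))

nrep  <- 10
start <- Sys.time()

# Run the randomly initialized procedure nrep times.
fits <- lapply(seq_len(nrep), function(i) kmeans(x, centers = 2, nstart = 1))

# Final SSE of each repeat (total within-cluster sum of squares for kmeans).
rep_sse <- vapply(fits, function(f) f$tot.withinss, numeric(1))

# Keep the repeat with the smallest SSE, plus the SSE trace and elapsed time.
result <- list(
  best_fit = fits[[which.min(rep_sse)]],
  rep_sse  = rep_sse,
  runtime  = Sys.time() - start
)

result$rep_sse
```

Plotting `result$rep_sse` shows how much the final SSE varies across random starts, which is the reason for repeating the fit in the first place.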
Li, J., Reisner, J., Pham, H., Olafsson, S., and Vardeman, S. (2019) Biclustering for Missing Data. Information Sciences, Submitted
data("synthetic")

# 20 repeats without parallelization
repeat_bc <- rep_biclustermd(synthetic, nrep = 20,
                             col_clusters = 3, row_clusters = 2,
                             miss_val = mean(synthetic, na.rm = TRUE),
                             miss_val_sd = sd(synthetic, na.rm = TRUE),
                             col_min_num = 2, row_min_num = 2,
                             col_num_to_move = 1, row_num_to_move = 1,
                             max.iter = 10)
repeat_bc

# Plot the best biclustering and the final SSE of each repeat
autoplot(repeat_bc$best_bc)
plot(repeat_bc$rep_sse, type = 'b', pch = 20)
repeat_bc$runtime

# 20 repeats with parallelization over 2 cores
repeat_bc <- rep_biclustermd(synthetic, nrep = 20, parallel = TRUE, ncores = 2,
                             col_clusters = 3, row_clusters = 2,
                             miss_val = mean(synthetic, na.rm = TRUE),
                             miss_val_sd = sd(synthetic, na.rm = TRUE),
                             col_min_num = 2, row_min_num = 2,
                             col_num_to_move = 1, row_num_to_move = 1,
                             max.iter = 10)
repeat_bc$runtime
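For readers curious how independent repeats can be spread over cores, here is a minimal sketch using the base `parallel` package. It is an assumption-laden illustration: how rep_biclustermd parallelizes internally is not stated here, and stats::kmeans again stands in for the biclustering step. The names `x`, `cl`, and `fits` are hypothetical.

```r
library(parallel)

set.seed(42)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))

nrep   <- 20
ncores <- 2

# Start a local cluster and give each worker its own reproducible RNG stream.
cl <- makeCluster(ncores)
clusterSetRNGStream(cl, 123)

# Each repeat is independent, so the repeats can run on separate workers.
# The data are passed explicitly so that they are shipped to the workers.
fits <- parLapply(cl, seq_len(nrep),
                  function(i, x) kmeans(x, centers = 2, nstart = 1),
                  x = x)

stopCluster(cl)

# Collect the final SSE of each repeat and pick the best one.
rep_sse  <- vapply(fits, function(f) f$tot.withinss, numeric(1))
best_fit <- fits[[which.min(rep_sse)]]
rep_sse
```

Because the repeats do not communicate, the speedup is roughly limited by the number of cores and by the overhead of starting the workers, which can dominate for small problems.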