Benchmarking

Benchmarking missing data strategies for k-means clustering

The goal is to compare a few algorithms for missing imputation when used before k-means clustering is performed. For the latter we use the same algorithm as in ClustImpute to ensure that only the computation time of the imputation is compared. In a nutshell, we’ll se that ClustImpute scales like a random imputation and hence is much faster than a pre-processing with MICE / MissRanger. This is not surprising since ClustImpute basically runs a fixed number of random imputations conditional on the current cluster assignment.