Single-Cell Transcriptome Data Clustering via Multinomial Modeling and Adaptive Fuzzy K-Means Algorithm

Single-cell RNA sequencing technologies have enabled us to study tissue heterogeneity at cellular resolution. Fast-developing sequencing platforms such as droplet-based sequencing make it feasible to process thousands of single cells in parallel. Although unique molecular identifiers (UMIs) can remove bias from amplification noise to a certain extent, clustering such sparse, high-dimensional, large-scale discrete data remains challenging.

Most existing deep learning-based clustering methods use the mean square error or a negative binomial distribution, with or without zero inflation, to denoise single-cell UMI count data, which may underfit or overfit the gene expression profiles. In addition, neglecting the molecule sampling mechanism and extracting representations by simple linear dimension reduction with a hard clustering algorithm may distort the data structure and lead to spurious analytical results. In this paper, we combined the deep autoencoder technique with statistical modeling and developed a novel and effective clustering method, scDMFK, for single-cell transcriptome UMI count data.

scDMFK uses a multinomial distribution to characterize the data structure and draws on a neural network to facilitate model parameter estimation. In the learned low-dimensional latent space, we propose an adaptive fuzzy k-means algorithm with entropy regularization to perform soft clustering. Various simulation scenarios and the analysis of 10 real datasets show that scDMFK outperforms other state-of-the-art methods with respect to data modeling and clustering algorithms.
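
To make the two ingredients named above concrete, here is a minimal NumPy sketch. It is not the scDMFK implementation: the function names, the parameter `lam`, and the optional `init_centers` argument are assumptions introduced for illustration. The sketch uses the textbook entropy-regularized fuzzy k-means update, where memberships are a softmax of negative squared distances. The autoencoder and the adaptive component of the paper's loss are not reproduced.

```python
import numpy as np


def multinomial_nll(counts, proportions):
    """Negative multinomial log-likelihood, up to the constant coefficient.

    counts:      (n_cells, n_genes) UMI counts
    proportions: (n_cells, n_genes) non-negative, each row summing to 1
    """
    log_p = np.log(np.clip(proportions, 1e-12, None))  # avoid log(0)
    return -np.sum(counts * log_p)


def entropy_fuzzy_kmeans(z, n_clusters, lam=1.0, n_iter=50,
                         init_centers=None, seed=0):
    """Soft clustering of latent points z with an entropy penalty.

    Minimizes sum_ik u_ik * ||z_i - c_k||^2 + lam * sum_ik u_ik * log(u_ik)
    subject to each row of u summing to 1. `lam` (hypothetical name)
    controls fuzziness: larger values give softer memberships.
    """
    z = np.asarray(z, dtype=float)
    if init_centers is None:
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(z), size=n_clusters, replace=False)
        centers = z[idx].copy()
    else:
        centers = np.asarray(init_centers, dtype=float).copy()

    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        # Closed-form membership update: row-wise softmax of -d / lam.
        logits = -d / lam
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        u = np.exp(logits)
        u /= u.sum(axis=1, keepdims=True)
        # Center update: membership-weighted mean of the points.
        centers = (u.T @ z) / (u.sum(axis=0)[:, None] + 1e-12)
    return u, centers
```

Each row of the returned `u` is a probability vector over clusters, so a hard label, if one is needed, can be taken as `u.argmax(axis=1)`.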

In addition, scDMFK scales well to large single-cell datasets.
