Fig. 1
From: Binning long reads in metagenomics datasets using composition and coverage information

Overall workflow of LRBinner. (Step 1) Feature vectors capturing composition and coverage information are computed from the long reads. The composition vector of a read is its normalized k-mer counts, whereas the coverage vector is its normalized k-mer count histogram. The feature vectors are fed into a variational auto-encoder to obtain low-dimensional latent representations. Note that a variational auto-encoder learns a lower-dimensional representation while learning to reconstruct its input. (Step 2) A seed point (read) is sampled in the latent space and used to estimate a confident cluster (bin) that contains it. Step 2 is repeated until no seed points remain. (Step 3) The remaining unclustered points are assigned to the clusters using a statistical model. Note that the two-dimensional representation of points is for illustration purposes only.
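
The two feature vectors of Step 1 can be illustrated with a minimal Python sketch. This is not LRBinner's implementation: the function names, the k-mer sizes, the number of histogram bins and the bin width are all assumptions chosen for readability. The sketch also assumes that the coverage histogram is built from the dataset-wide counts of the k-mers occurring in a read, which the caption does not spell out.

```python
from collections import Counter
from itertools import product


def composition_vector(read, k=3):
    """Normalized k-mer counts of one read over all 4**k ACGT k-mers.

    k=3 is an illustrative choice, not a value taken from the caption.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    total = sum(counts[km] for km in kmers)  # ignores k-mers with non-ACGT bases
    return [counts[km] / total if total else 0.0 for km in kmers]


def count_kmers(reads, k=5):
    """Count every k-mer across all reads (hypothetical helper)."""
    counts = Counter()
    for read in reads:
        counts.update(read[i:i + k] for i in range(len(read) - k + 1))
    return counts


def coverage_vector(read, dataset_counts, k=5, bins=4, bin_width=1):
    """Normalized histogram of the dataset-wide counts of the read's k-mers.

    Assumption: a k-mer seen c times in the dataset falls into bin
    (c - 1) // bin_width, with the last bin collecting all larger counts.
    """
    hist = [0] * bins
    for i in range(len(read) - k + 1):
        c = dataset_counts.get(read[i:i + k], 0)
        if c > 0:
            hist[min((c - 1) // bin_width, bins - 1)] += 1
    total = sum(hist)
    return [h / total if total else 0.0 for h in hist]


reads = ["ACGTACGT", "ACGTAC"]
dataset_counts = count_kmers(reads, k=5)
comp = composition_vector(reads[0], k=2)
cov = coverage_vector(reads[0], dataset_counts, k=5, bins=4, bin_width=1)
```

In a pipeline like the one in the figure, the two vectors of each read would presumably be combined and passed to the variational auto-encoder; that step is omitted here.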