Fig. 1
From: The paralog-to-contig assignment problem: high quality gene models from fragmented assemblies

The EMS-pipeline explicitly solves the paralog-to-contig assignment problem. Sequence matches to individual TCEs are collected in a stepwise procedure using either tblastn (starting from single sequences of individual TCEs) or hmmsearch (starting from a sequence alignment for each TCE). Depending on the input, pre-processing step (0a) or (0b) is performed before the similarity search. The colored boxes represent TCEs. The pre-processing steps, which are performed separately for every TCE of every paralog, are illustrated here for one paralog encoded by three exons. For a detailed description of the individual steps, see the "Methods" section. AA amino acid sequence, HMMs hidden Markov models, ILP integer linear programming problem, TCE translated coding exon