Articles

Estimating similarity and distance using FracMinHash

The increasing number and volume of genomic and metagenomic data necessitates scalable and robust computational models for precise analysis. Sketching techniques utilizing

Authors: Mahmudur Rahman Hera and David Koslicki

Citation: Algorithms for Molecular Biology 2025 20:8

Content type: Research Published on: 15 May 2025
- View Full Text
- View PDF
AlfaPang: alignment free algorithm for pangenome graph construction

The success of pangenome-based approaches to genomics analysis depends largely on the existence of efficient methods for constructing pangenome graphs that are applicable to large genome collections. In the cu...

Authors: Adam Cicherski, Anna Lisiecka and Norbert Dojer

Citation: Algorithms for Molecular Biology 2025 20:7

Content type: Research Published on: 15 May 2025
- View Full Text
- View PDF
McDag: indexing maximal common subsequences for k strings

Analyzing and comparing sequences of symbols is among the most fundamental problems in computer science, possibly even more so in bioinformatics. Maximal Common Subsequences (MCSs), i.e., inclusion-maximal seq...

Authors: Giovanni Buzzega, Alessio Conte, Roberto Grossi and Giulia Punzi

Citation: Algorithms for Molecular Biology 2025 20:6

Content type: Research Published on: 19 April 2025
- View Full Text
- View PDF
Unbiased anchors for reliable genome-wide synteny detection

Orthology inference lies at the foundation of comparative genomics research. The correct identification of loci which descended from a common ancestral sequence is not only complicated by sequence divergence b...

Authors: Karl K. Käther, Andreas Remmel, Steffen Lemke and Peter F. Stadler

Citation: Algorithms for Molecular Biology 2025 20:5

Content type: Research Published on: 5 April 2025
- View Full Text
- View PDF
The open-closed mod-minimizer algorithm

Authors: Ragnar Groot Koerkamp, Daniel Liu and Giulio Ermanno Pibiri

Citation: Algorithms for Molecular Biology 2025 20:4

Content type: Research Published on: 17 March 2025
- View Full Text
- View PDF
MEM-based pangenome indexing for k-mer queries

Pangenomes are growing in number and size, thanks to the prevalence of high-quality long-read assemblies. However, current methods for studying sequence composition and conservation within pangenomes have limi...

Authors: Stephen Hwang, Nathaniel K. Brown, Omar Y. Ahmed, Katharine M. Jenike, Sam Kovaka, Michael C. Schatz and Ben Langmead

Citation: Algorithms for Molecular Biology 2025 20:3

Content type: Research Published on: 1 March 2025
- View Full Text
- View PDF
Finding high posterior density phylogenies by systematically extending a directed acyclic graph

Bayesian phylogenetics typically estimates a posterior distribution, or aspects thereof, using Markov chain Monte Carlo methods. These methods integrate over tree space by applying local rearrangements to move...

Authors: Chris Jennings-Shaffer, David H. Rich, Matthew Macaulay, Michael D. Karcher, Tanvi Ganapathy, Shosuke Kiami, Anna Kooperberg, Cheng Zhang, Marc A. Suchard and Frederick A. Matsen IV

Citation: Algorithms for Molecular Biology 2025 20:2

Content type: Research Published on: 28 February 2025
- View Full Text
- View PDF
Fractional hitting sets for efficient multiset sketching

The exponential increase in publicly available sequencing data and genomic resources necessitates the development of highly efficient methods for data processing and analysis. Locality-sensitive hashing techni...

Authors: Timothé Rouzé, Igor Martayan, Camille Marchet and Antoine Limasset

Citation: Algorithms for Molecular Biology 2025 20:1

Content type: Research Published on: 8 February 2025
- View Full Text
- View PDF
On the parameterized complexity of the median and closest problems under some permutation metrics

Genome rearrangements are events where large blocks of DNA exchange places during evolution. The analysis of these events is a promising tool for understanding evolutionary genomics, providing data for phyloge...

Authors: Luís Cunha, Ignasi Sau and Uéverton Souza

Citation: Algorithms for Molecular Biology 2024 19:24

Content type: Research Published on: 24 December 2024
- View Full Text
- View PDF
TINNiK: inference of the tree of blobs of a species network under the coalescent model

The tree of blobs of a species network shows only the tree-like aspects of relationships of taxa on a network, omitting information on network substructures where hybridization or other types of lateral transf...

Authors: Elizabeth S. Allman, Hector Baños, Jonathan D. Mitchell and John A. Rhodes

Citation: Algorithms for Molecular Biology 2024 19:23

Content type: Research Published on: 5 November 2024
- View Full Text
- View PDF
New generalized metric based on branch length distance to compare B cell lineage trees

The B cell lineage tree encapsulates the successive phases of B cell differentiation and maturation, transitioning from hematopoietic stem cells to mature, antibody-secreting cells within the immune system. Ma...

Authors: Mahsa Farnia and Nadia Tahiri

Citation: Algorithms for Molecular Biology 2024 19:22

Content type: Research Published on: 5 October 2024
- View Full Text
- View PDF
Metric multidimensional scaling for large single-cell datasets using neural networks

Metric multidimensional scaling is one of the classical methods for embedding data into low-dimensional Euclidean space. It creates the low-dimensional embedding by approximately preserving the pairwise distan...

Authors: Stefan Canzar, Van Hoan Do, Slobodan Jelić, Sören Laue, Domagoj Matijević and Tomislav Prusina

Citation: Algorithms for Molecular Biology 2024 19:21

Content type: Research Published on: 11 June 2024
- View Full Text
- View PDF
Compression algorithm for colored de Bruijn graphs

A colored de Bruijn graph (also called a set of k-mer sets) is a set of k-mers with every k-mer assigned a set of colors. Colored de Bruijn graphs are used in a variety of applications, including variant call...

Authors: Amatur Rahman, Yoann Dufresne and Paul Medvedev

Citation: Algorithms for Molecular Biology 2024 19:20

Content type: Research Published on: 26 May 2024
- View Full Text
- View PDF
ESKEMAP: exact sketch-based read mapping

Given a sequencing read, the broad goal of read mapping is to find the location(s) in the reference genome that have a “similar sequence”. Traditionally, “similar sequence” was defined as having a high alignme...

Authors: Tizian Schulz and Paul Medvedev

Citation: Algorithms for Molecular Biology 2024 19:19

Content type: Research Published on: 4 May 2024
- View Full Text
- View PDF
NestedBD: Bayesian inference of phylogenetic trees from single-cell copy number profiles under a birth-death model

Copy number aberrations (CNAs) are ubiquitous in many types of cancer. Inferring CNAs from cancer genomic data could help shed light on the initiation, progression, and potential treatment of cancer. While suc...

Authors: Yushu Liu, Mohammadamin Edrisi, Zhi Yan, Huw A Ogilvie and Luay Nakhleh

Citation: Algorithms for Molecular Biology 2024 19:18

Content type: Research Published on: 29 April 2024
- View Full Text
- View PDF
Revisiting the complexity of and algorithms for the graph traversal edit distance and its variants

The graph traversal edit distance (GTED), introduced by Ebrahimpour Boroojeny et al. (2018), is an elegant distance measure defined as the minimum edit distance between strings reconstructed from Eulerian trai...

Authors: Yutong Qiu, Yihang Shen and Carl Kingsford

Citation: Algorithms for Molecular Biology 2024 19:17

Content type: Research Published on: 29 April 2024
- View Full Text
- View PDF
Fast, parallel, and cache-friendly suffix array construction

String indexes such as the suffix array (SA) and the closely related longest common prefix (LCP) array are fundamental objects in bioinformatics and have a wide variety of applications. Despite their importance i...

Authors: Jamshed Khan, Tobias Rubel, Erin Molloy, Laxman Dhulipala and Rob Patro

Citation: Algorithms for Molecular Biology 2024 19:16

Content type: Research Published on: 28 April 2024
- View Full Text
- View PDF
PFP-FM: an accelerated FM-index

FM-indexes are crucial data structures in DNA alignment, but searching with them usually takes at least one random access per character in the query pattern. Ferragina and Fischer [1] observed in 2007 that word-b...

Authors: Aaron Hong, Marco Oliva, Dominik Köppl, Hideo Bannai, Christina Boucher and Travis Gagie

Citation: Algorithms for Molecular Biology 2024 19:15

Content type: Research Published on: 10 April 2024
- View Full Text
- View PDF
Space-efficient computation of k-mer dictionaries for large values of k

Computing k-mer frequencies in a collection of reads is a common procedure in many genomic applications. Several state-of-the-art k-mer counters rely on hash tables to carry out this task but they are often optim...

Authors: Diego Díaz-Domínguez, Miika Leinonen and Leena Salmela

Citation: Algorithms for Molecular Biology 2024 19:14

Content type: Research Published on: 5 April 2024
- View Full Text
- View PDF
Infrared: a declarative tree decomposition-powered framework for bioinformatics

Many bioinformatics problems can be approached as optimization or controlled sampling tasks, and solved exactly and efficiently using Dynamic Programming (DP). However, such exact methods are typically tailore...

Authors: Hua-Ting Yao, Bertrand Marchand, Sarah J. Berkemer, Yann Ponty and Sebastian Will

Citation: Algorithms for Molecular Biology 2024 19:13

Content type: Research Published on: 16 March 2024
- View Full Text
- View PDF
Median quartet tree search algorithms using optimal subtree prune and regraft

Gene trees can be different from the species tree due to biological processes and inference errors. One way to obtain a species tree is to find one that maximizes some measure of similarity to a set of gene tr...

Authors: Shayesteh Arasti and Siavash Mirarab

Citation: Algorithms for Molecular Biology 2024 19:12

Content type: Research Published on: 13 March 2024
- View Full Text
- View PDF
Suffix sorting via matching statistics

We introduce a new algorithm for constructing the generalized suffix array of a collection of highly similar strings. As a first step, we construct a compressed representation of the matching statistics of the...

Authors: Zsuzsanna Lipták, Francesco Masillo and Simon J. Puglisi

Citation: Algorithms for Molecular Biology 2024 19:11

Content type: Research Published on: 12 March 2024
- View Full Text
- View PDF
Finding maximal exact matches in graphs

We study the problem of finding maximal exact matches (MEMs) between a query string Q and a labeled graph G. MEMs are an important class of seeds, often used in seed-chain-extend type of practical alignment metho...

Authors: Nicola Rizzo, Manuel Cáceres and Veli Mäkinen

Citation: Algorithms for Molecular Biology 2024 19:10

Content type: Research Published on: 11 March 2024
- View Full Text
- View PDF
SparseRNAfolD: optimized sparse RNA pseudoknot-free folding with dangle consideration

Computational RNA secondary structure prediction by free energy minimization is indispensable for analyzing structural RNAs and their interactions. These methods find the structure with the minimum free energy...

Authors: Mateo Gray, Sebastian Will and Hosna Jabbari

Citation: Algorithms for Molecular Biology 2024 19:9

Content type: Research Published on: 3 March 2024
- View Full Text
- View PDF
Recombinations, chains and caps: resolving problems with the DCJ-indel model

One of the most fundamental problems in genome rearrangement studies is the (genomic) distance problem. It is typically formulated as finding the minimum number of rearrangements under a model that are needed ...

Authors: Leonard Bohnenkämper

Citation: Algorithms for Molecular Biology 2024 19:8

Content type: Research Published on: 27 February 2024
- View Full Text
- View PDF
Unifying duplication episode clustering and gene-species mapping inference

We present a novel problem, called MetaEC, which aims to infer gene-species assignments in a collection of partially leaf-labeled gene trees by minimizing the size of duplication episode clustering (EC)...

Authors: Paweł Górecki, Natalia Rutecka, Agnieszka Mykowiecka and Jarosław Paszek

Citation: Algorithms for Molecular Biology 2024 19:7

Content type: Research Published on: 14 February 2024
- View Full Text
- View PDF
Predicting horizontal gene transfers with perfect transfer networks

Horizontal gene transfer inference approaches are usually based on gene sequences: parametric methods search for patterns that deviate from a particular genomic signature, while phylogenetic methods use sequen...

Authors: Alitzel López Sánchez and Manuel Lafond

Citation: Algorithms for Molecular Biology 2024 19:6

Content type: Research Published on: 6 February 2024
- View Full Text
- View PDF
Global exact optimisations for chloroplast structural haplotype scaffolding

Scaffolding is an intermediate stage of fragment assembly. It consists in orienting and ordering the contigs obtained by the assembly of the sequencing reads. In the general case, the problem has been largely ...

Authors: Victor Epain and Rumen Andonov

Citation: Algorithms for Molecular Biology 2024 19:5

Content type: Research Published on: 6 February 2024
- View Full Text
- View PDF
Co-linear chaining on pangenome graphs

Pangenome reference graphs are useful in genomics because they compactly represent the genetic diversity within a species, a capability that linear references lack. However, efficiently aligning sequences to t...

Authors: Jyotshna Rajput, Ghanshyam Chandra and Chirag Jain

Citation: Algorithms for Molecular Biology 2024 19:4

Content type: Research Published on: 27 January 2024
- View Full Text
- View PDF
Fulgor: a fast and compact k-mer index for large-scale matching and color queries

The problem of sequence identification or matching—determining the subset of reference sequences from a given collection that are likely to contain a short, queried nucleotide sequence—is relevant for many imp...

Authors: Jason Fan, Jamshed Khan, Noor Pratap Singh, Giulio Ermanno Pibiri and Rob Patro

Citation: Algorithms for Molecular Biology 2024 19:3

Content type: Research Published on: 22 January 2024
- View Full Text
- View PDF
Dollo-CDP: a polynomial-time algorithm for the clade-constrained large Dollo parsimony problem

The last decade of phylogenetics has seen the development of many methods that leverage constraints plus dynamic programming. The goal of this algorithmic technique is to produce a phylogeny that is optimal wi...

Authors: Junyan Dai, Tobias Rubel, Yunheng Han and Erin K. Molloy

Citation: Algorithms for Molecular Biology 2024 19:2

Content type: Research Published on: 8 January 2024
- View Full Text
- View PDF
Investigating the complexity of the double distance problems

Authors: Marília D. V. Braga, Leonie R. Brockmann, Katharina Klerx and Jens Stoye

Citation: Algorithms for Molecular Biology 2024 19:1

Content type: Research Published on: 4 January 2024
- View Full Text
- View PDF
EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment

Adding sequences into an existing (possibly user-provided) alignment has multiple applications, including updating a large alignment with new data, adding sequences into a constraint alignment constructed usin...

Authors: Chengze Shen, Baqiao Liu, Kelly P. Williams and Tandy Warnow

Citation: Algorithms for Molecular Biology 2023 18:21

Content type: Research Published on: 7 December 2023
- View Full Text
- View PDF
Correction: Constructing founder sets under allelic and non-allelic homologous recombination

Authors: Konstantinn Bonnet, Tobias Marschall and Daniel Doerr

Citation: Algorithms for Molecular Biology 2023 18:20

Content type: Correction Published on: 6 December 2023

The original article was published in Algorithms for Molecular Biology 2023 18:15
- View Full Text
- View PDF
Quartets enable statistically consistent estimation of cell lineage trees under an unbiased error and missingness model

Cancer progression and treatment can be informed by reconstructing its evolutionary history from tumor cells. Although many methods exist to estimate evolutionary trees (called phylogenies) from molecular sequ...

Authors: Yunheng Han and Erin K. Molloy

Citation: Algorithms for Molecular Biology 2023 18:19

Content type: Research Published on: 1 December 2023
- View Full Text
- View PDF
Automated design of dynamic programming schemes for RNA folding with pseudoknots

Although RNA secondary structure prediction is a textbook application of dynamic programming (DP) and routine task in RNA structure analysis, it remains challenging whenever pseudoknots come into play. Since t...

Authors: Bertrand Marchand, Sebastian Will, Sarah J. Berkemer, Yann Ponty and Laurent Bulteau

Citation: Algorithms for Molecular Biology 2023 18:18

Content type: Research Published on: 1 December 2023
- View Full Text
- View PDF
New algorithms for structure informed genome rearrangement

We define two new computational problems in the domain of perfect genome rearrangements, and propose three algorithms to solve them. The rearrangement scenarios modeled by the problems consider Reversal and Bl...

Authors: Eden Ozeri, Meirav Zehavi and Michal Ziv-Ukelson

Citation: Algorithms for Molecular Biology 2023 18:17

Content type: Research Published on: 1 December 2023
- View Full Text
- View PDF
Relative timing information and orthology in evolutionary scenarios

Evolutionary scenarios describing the evolution of a family of genes within a collection of species comprise the mapping of the vertices of a gene tree T to vertices and edges of a species tree S. The relative ti...

Authors: David Schaller, Tom Hartmann, Manuel Lafond, Peter F. Stadler, Nicolas Wieseke and Marc Hellmuth

Citation: Algorithms for Molecular Biology 2023 18:16

Content type: Research Published on: 8 November 2023
- View Full Text
- View PDF
Constructing founder sets under allelic and non-allelic homologous recombination

Homologous recombination between the maternal and paternal copies of a chromosome is a key mechanism for human inheritance and shapes population genetic properties of our species. However, a similar mechanism ...

Authors: Konstantinn Bonnet, Tobias Marschall and Daniel Doerr

Citation: Algorithms for Molecular Biology 2023 18:15

Content type: Research Published on: 29 September 2023

The Correction to this article has been published in Algorithms for Molecular Biology 2023 18:20
- View Full Text
- View PDF
Efficient gene orthology inference via large-scale rearrangements

Recently we developed a gene orthology inference tool based on genome rearrangements (Journal of Bioinformatics and Computational Biology 19:6, 2021). Given a set of genomes our method first computes all pairwise...

Authors: Diego P. Rubert and Marília D. V. Braga

Citation: Algorithms for Molecular Biology 2023 18:14

Content type: Research Published on: 28 September 2023
- View Full Text
- View PDF
Constructing phylogenetic networks via cherry picking and machine learning

Combining a set of phylogenetic trees into a single phylogenetic network that explains all of them is a fundamental challenge in evolutionary studies. Existing methods are computationally expensive and can eit...

Authors: Giulia Bernardini, Leo van Iersel, Esther Julien and Leen Stougie

Citation: Algorithms for Molecular Biology 2023 18:13

Content type: Research Published on: 16 September 2023
- View Full Text
- View PDF
The solution surface of the Li-Stephens haplotype copying model

The Li-Stephens (LS) haplotype copying model forms the basis of a number of important statistical inference procedures in genetics. LS is a probabilistic generative model which supposes that a sampled chromoso...

Authors: Yifan Jin and Jonathan Terhorst

Citation: Algorithms for Molecular Biology 2023 18:12

Content type: Research Published on: 9 August 2023
- View Full Text
- View PDF
phyBWT2: phylogeny reconstruction via eBWT positional clustering

Molecular phylogenetics studies the evolutionary relationships among the individuals of a population through their biological sequences. It may provide insights about the origin and the evolution of viral dise...

Authors: Veronica Guerrini, Alessio Conte, Roberto Grossi, Gianni Liti, Giovanna Rosone and Lorenzo Tattini

Citation: Algorithms for Molecular Biology 2023 18:11

Content type: Research Published on: 3 August 2023
- View Full Text
- View PDF
A topology-marginal composite likelihood via a generalized phylogenetic pruning algorithm

Bayesian phylogenetics is a computationally challenging inferential problem. Classical methods are based on random-walk Markov chain Monte Carlo (MCMC), where random proposals are made on the tree parameter an...

Authors: Seong-Hwan Jun, Hassan Nasif, Chris Jennings-Shaffer, David H Rich, Anna Kooperberg, Mathieu Fourment, Cheng Zhang, Marc A Suchard and Frederick A Matsen IV

Citation: Algorithms for Molecular Biology 2023 18:10

Content type: Research Published on: 31 July 2023
- View Full Text
- View PDF
On the complexity of non-binary tree reconciliation with endosymbiotic gene transfer

Reconciling a non-binary gene tree with a binary species tree can be done efficiently in the absence of horizontal gene transfers, but becomes NP-hard in the presence of gene transfers. Here, we focus on the s...

Authors: Mathieu Gascon and Nadia El-Mabrouk

Citation: Algorithms for Molecular Biology 2023 18:9

Content type: Research Published on: 30 July 2023
- View Full Text
- View PDF
Mono-valent salt corrections for RNA secondary structures in the ViennaRNA package

RNA features a highly negatively charged phosphate backbone that attracts a cloud of counter-ions that reduce the electrostatic repulsion in a concentration dependent manner. Ion concentrations thus have a lar...

Authors: Hua-Ting Yao, Ronny Lorenz, Ivo L. Hofacker and Peter F. Stadler

Citation: Algorithms for Molecular Biology 2023 18:8

Content type: Research Published on: 29 July 2023
- View Full Text
- View PDF
Locality-sensitive bucketing functions for the edit distance

Many bioinformatics applications involve bucketing a set of sequences where each sequence is allowed to be assigned into multiple buckets. To achieve both high sensitivity and precision, bucketing methods are ...

Authors: Ke Chen and Mingfu Shao

Citation: Algorithms for Molecular Biology 2023 18:7

Content type: Research Published on: 24 July 2023
- View Full Text
- View PDF
Weighted ASTRID: fast and accurate species trees from weighted internode distances

Species tree estimation is a basic step in many biological research projects, but is complicated by the fact that gene trees can differ from the species tree due to processes such as incomplete lineage sorting...

Authors: Baqiao Liu and Tandy Warnow

Citation: Algorithms for Molecular Biology 2023 18:6

Content type: Research Published on: 19 July 2023
- View Full Text
- View PDF
Eulertigs: minimum plain text representation of k-mer sets without repetitions in linear time

A fundamental operation in computational genomics is to reduce the input sequences to their constituent k-mers. For maximum performance of downstream applications it is important to store the k-mers in small spac...

Authors: Sebastian Schmidt and Jarno N. Alanko

Citation: Algorithms for Molecular Biology 2023 18:5

Content type: Research Published on: 4 July 2023
- View Full Text
- View PDF
A classification algorithm based on dynamic ensemble selection to predict mutational patterns of the envelope protein in HIV-infected patients

Therapeutics against the envelope (Env) proteins of human immunodeficiency virus type 1 (HIV-1) effectively reduce viral loads in patients. However, due to mutations, new therapy-resistant Env variants frequen...

Authors: Mohammad Fili, Guiping Hu, Changze Han, Alexa Kort, John Trettin and Hillel Haim

Citation: Algorithms for Molecular Biology 2023 18:4

Content type: Research Published on: 19 June 2023
- View Full Text
- View PDF
