Log on/register
BioMed Central home | Journals A-Z | Feedback | Support | My details
 
Open AccessHighly AccessResearch

Robinson-Foulds Supertrees

Mukul S Bansal1,2 email, J Gordon Burleigh3 email, Oliver Eulenstein2 email and David Fernández-Baca2 email

The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel

Department of Computer Science, Iowa State University, Ames, IA 50011, USA

Department of Biology, University of Florida, Gainesville, FL 32611, USA

author email corresponding author email

Algorithms for Molecular Biology 2010, 5:18doi:10.1186/1748-7188-5-18

Published: 24 February 2010

Abstract

Background

Supertree methods synthesize collections of small phylogenetic trees with incomplete taxon overlap into comprehensive trees, or supertrees, that include all taxa found in the input trees. Supertree methods based on the well established Robinson-Foulds (RF) distance have the potential to build supertrees that retain much information from the input trees. Specifically, the RF supertree problem seeks a binary supertree that minimizes the sum of the RF distances from the supertree to the input trees. Thus, an RF supertree is a supertree that is consistent with the largest number of clusters (or clades) from the input trees.

Results

We introduce efficient, local search based, hill-climbing heuristics for the intrinsically hard RF supertree problem on rooted trees. These heuristics use novel non-trivial algorithms for the SPR and TBR local search problems which improve on the time complexity of the best known (naïve) solutions by a factor of Θ(n) and Θ(n2) respectively (where n is the number of taxa, or leaves, in the supertree). We use an implementation of our new algorithms to examine the performance of the RF supertree method and compare it to matrix representation with parsimony (MRP) and the triplet supertree method using four supertree data sets. Not only did our RF heuristic provide fast estimates of RF supertrees in all data sets, but the RF supertrees also retained more of the information from the input trees (based on the RF distance) than the other supertree methods.

Conclusions

Our heuristics for the RF supertree problem, based on our new local search algorithms, make it possible for the first time to estimate large supertrees by directly optimizing the RF distance from rooted input trees to the supertrees. This provides a new and fast method to build accurate supertrees. RF supertrees may also be useful for estimating majority-rule(-) supertrees, which are a generalization of majority-rule consensus trees.


© 1999-2010 BioMed Central Ltd unless otherwise stated. Part of Springer Science+Business Media.