Molecular homology and multiple-sequence alignment: an analysis of concepts and practice
David A. Morrison (A, D), Matthew J. Morgan (B) and Scot A. Kelchner (C)
(A) Systematic Biology, Uppsala University, Norbyvägen 18D, Uppsala 75236, Sweden.
(B) CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601, Australia.
(C) Department of Biology, Utah State University, 5305 Old Main Hill, Logan, UT 84322-5305, USA.
(D) Corresponding author. Email: david.morrison@ebc.uu.se
Australian Systematic Botany 28(1) 46-62
Abstract
Sequence alignment is just as much a part of phylogenetics as is tree building, although it is often viewed solely as a necessary tool to construct trees. However, alignment for the purpose of phylogenetic inference is primarily about homology, as it is the procedure that expresses homology relationships among the characters, rather than the historical relationships of the taxa. Molecular homology is rather vaguely defined and understood, despite its importance in the molecular age. Indeed, homology has rarely been evaluated with respect to nucleotide sequence alignments, in spite of the fact that nucleotides are the only data that directly represent genotype. All other molecular data represent phenotype, just as do morphology and anatomy. Thus, efforts to improve sequence alignment for phylogenetic purposes should involve a more refined use of the homology concept at a molecular level. To this end, we present examples of molecular-data levels at which homology might be considered, and arrange them in a hierarchy. The concept that we propose has many levels, which link directly to the developmental and morphological components of homology. Of note, there is no simple relationship between gene homology and nucleotide homology. We also propose terminology with which to better describe and discuss molecular homology at these levels. Our over-arching conceptual framework is then used to shed light on the multitude of automated procedures that have been created for multiple-sequence alignment. Sequence alignment needs to be based on aligning homologous nucleotides, without necessary reference to homology at any other level of the hierarchy. In particular, inference of nucleotide homology involves deriving a plausible scenario for molecular change among the set of sequences. 
Our clarifications should allow the development of a procedure that specifically addresses homology, which is required when performing alignment for phylogenetic purposes, but which does not yet exist.
Additional keywords: multiple alignment, nucleotide alignment, sequence homology.