Reconstructing the
Tree of Life
Taxonomy and Systematics of
Species Rich Taxa
9579_C000.fm Page ii Friday, November 17, 2006 12:30 PM
The first of the Systematics Association’s publications, The New Systematics (1940), was a classic
work edited by its then-president, Sir Julian Huxley, that set out the problems facing general
biologists in deciding which kinds of data would most effectively progress systematics. Since
then, more than 70 volumes have been published, often in rapidly expanding areas of science
where a modern synthesis is required.
The modus operandi of the Association is to encourage leading researchers to organize symposia
that result in a multi-authored volume. In 1997 the Association organized the first of its international
Biennial Conferences. This and subsequent Biennial Conferences, which are designed to provide
for systematists of all kinds, included themed symposia that resulted in further publications. The
Association also publishes volumes that are not specifically linked to meetings and encourages new
publications in a broad range of systematics topics.
Anyone wishing to learn more about the Systematics Association and its publications should
refer to our website at www.systass.org.
Other Systematics Association publications are listed after the index for this volume.
Reconstructing the
Tree of Life
Taxonomy and Systematics of
Species Rich Taxa
Edited by
Trevor R. Hodkinson
John A. N. Parnell
Department of Botany
School of Natural Sciences
Trinity College Dublin
Dublin, Ireland
Outside to inside of image: water ermine moth, UK (Spilosoma urticae); barley, UK (Hordeum distichon); fossilised sea
urchins, Tunisia (Mecaster spp.); seeds, unknown origin (Bignoniaceae); purple sea snails, worldwide (Janthina janthina);
fossilised shark teeth, USA (Isurus sp.); and sea urchin, Greece (Arbacia lixula)
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2007 by The Systematics Association
CRC Press is an imprint of Taylor & Francis Group, an Informa business
This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted
with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to
publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of
all materials or for the consequences of their use.
No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or
other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC) 222 Rosewood Drive, Danvers, MA 01923,
978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For
organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Reconstructing the tree of life : taxonomy and systematics of species rich taxa / editors, Trevor R.
Hodkinson and John A.N. Parnell.
p. cm. -- (The Systematics Association special volume series)
Includes bibliographical references and index.
ISBN 0-8493-9579-8 (alk. paper)
1. Biology--Classification. I. Hodkinson, Trevor R. II. Parnell, John A. N.
QH83.R43 2006
578.01’2--dc22 2006048341
Preface
The twenty chapters of this book are based on the theme of the plenary session of the Fourth
Biennial Conference of the Systematics Association, held at Trinity College Dublin (TCD), Ireland,
in August 2003, namely the systematics of species rich taxa. During the five-day conference, there
were stimulating presentations, posters and discussions, covering a broad sample of the ‘tree of
life’; these also influenced the shape and content of this volume. Papers were contributed by a
number of conference delegates and by others subsequently invited to broaden the book’s scope
or address particular theoretical issues.
Consideration of the book’s theme and content began at a conference planning meeting at TCD
in early 2003 with the local conference organiser, Steve Waldren of TCD, and Gordon Curry, the
honorary treasurer of the Systematics Association. These were refined further in discussions with
Alan Warren, the Systematics Association special volumes series editor, and Chris Humphries, the
president of the Systematics Association. We are grateful to all of them for their input and
encouragement, particularly our colleague, Steve. Two anonymous book proposal reviewers also
provided valuable content guidance. We are particularly grateful for the manuscript preparation
input of Sandra Velthuis of Whitebarn Consulting, who has worked long and hard to proofread
chapters and standardise their format, and to the production team, especially Gail Renard, Pat
Roberson and John Sulzycki, at CRC Press, who have been highly supportive and professional.
We also thank Diccon Alexander for the superb cover artwork. Finally we thank all 51 contributing
authors to the book, many of whom also peer reviewed other chapters. We encourage all readers
to support the activities of the Systematics Association (www.systass.org).
Trevor R. Hodkinson
John A.N. Parnell
Department of Botany
School of Natural Sciences
Trinity College Dublin
Ireland
The Editors
Dr Trevor Hodkinson is Senior Lecturer in the Department of Botany, School of Natural
Sciences, Trinity College Dublin (TCD), Ireland. He is head of the Molecular Laboratory and
specialises in the research fields of molecular systematics, genetic resources and taxonomy
(http://www.tcd.ie/Botany/Staff/THodkinson.html).
Professor John Parnell is also from the Department of Botany at TCD. He is curator of the
herbarium and his research interests are mainly in the fields of taxonomy and systematics
(http://www.tcd.ie/Botany/Staff/JParnell.html).
Contributors
T.G. Barraclough
Division of Biology and NERC Centre for Population Biology
Imperial College London, UK

E. Biffin
Division of Botany and Zoology
Australian National University
Canberra, Australia

O.R.P. Bininda-Emonds
Institut für Spezielle Zoologie und Evolutionsbiologie mit Phyletischem Museum
Friedrich-Schiller-Universität Jena
Jena, Germany

K.E. Black
School of Forest Resources
Penn State University
University Park, Pennsylvania, USA

Y. Bouchenak-Khelladi
Department of Botany
School of Natural Sciences
Trinity College Dublin
Dublin, Ireland

J. Brodie
Botany Department
The Natural History Museum
London, UK

G. Cassis
Research and Collections Branch
Australian Museum
Sydney, Australia

M.W. Chase
Jodrell Laboratory
Royal Botanic Gardens, Kew
Richmond, UK

J.J. Clarkson
Jodrell Laboratory
Royal Botanic Gardens, Kew
Richmond, UK

J.A. Cotton
Zoology Department
The Natural History Museum
London, UK

L.A. Craven
Australian National Herbarium
Centre for Plant Biodiversity Research
Canberra, Australia

C.J. Creevey
European Molecular Biology Laboratory
EMBL Heidelberg
Heidelberg, Germany

T.J. Davies
Department of Biology
University of Virginia
Charlottesville, Virginia, USA

R.P.J. de Kok
Herbarium
Royal Botanic Gardens, Kew
Richmond, UK

D.A. Fitzpatrick
Conway Institute
University College Dublin
Dublin, Ireland

G. Fusco
Department of Biology
University of Padova
Padova, Italy

M. Geerts
Burg. Heynenstraat 11
Swalmen, The Netherlands

T.R. Hodkinson
Department of Botany
School of Natural Sciences
Trinity College Dublin
Dublin, Ireland

K.D. Hyde
Centre for Research in Fungal Diversity
Department of Ecology and Biodiversity
The University of Hong Kong
Hong Kong, China

S.W.L. Jacobs
National Herbarium
Royal Botanic Gardens
Sydney, Australia

M.S. Kinney
Department of Botany
School of Natural Sciences
Trinity College Dublin
Dublin, Ireland

A.F. Konings
Cichlid Press
El Paso, Texas, USA

J.O. McInerney
Department of Biology
National University of Ireland Maynooth
Maynooth, Ireland

K.R. McKaye
Appalachian Laboratory
University of Maryland System
Frostburg, Maryland, USA

M.J. O’Connell
Department of Biochemistry
University College Cork
Cork, Ireland

J.A.N. Parnell
Department of Botany
School of Natural Sciences
Trinity College Dublin
Dublin, Ireland

G. Petersen
Botanical Garden and Museum
The Natural History Museum of Denmark
Copenhagen, Denmark

D.E. Pisani
Department of Biology
National University of Ireland Maynooth
Maynooth, Ireland

G. Reid
Botany Department
The Natural History Museum
London, UK

N. Rønsted
Jodrell Laboratory
Royal Botanic Gardens, Kew
Richmond, UK

N. Salamin
Department of Ecology and Evolution
University of Lausanne
Lausanne, Switzerland

V. Savolainen
Jodrell Laboratory
Royal Botanic Gardens, Kew
Richmond, UK

O. Seberg
Botanical Garden and Museum
The Natural History Museum of Denmark
Copenhagen, Denmark

B.D. Shenoy
Centre for Research in Fungal Diversity
Department of Ecology and Biodiversity
The University of Hong Kong
Hong Kong, China

A. Stamatakis
Swiss Federal Institute of Technology
School of Computer and Communication Sciences
Lausanne, Switzerland

J.R. Stauffer, Jr.
School of Forest Resources
Penn State University
University Park, Pennsylvania, USA

M. Steel
Biomathematics Research Centre
University of Canterbury
Christchurch, New Zealand

A.M.C. Tang
Centre for Research in Fungal Diversity
Department of Ecology and Biodiversity
The University of Hong Kong
Hong Kong, China

T.M.A. Utteridge
Herbarium
Royal Botanic Gardens, Kew
Richmond, UK

M.A. Wall
Department of Entomology
San Diego Natural History Museum
San Diego, California, USA

W.C. Wheeler
Division of Invertebrate Zoology
American Museum of Natural History
New York, New York, USA

M. Wilkinson
Zoology Department
The Natural History Museum
London, UK

D.M. Williams
Botany Department
The Natural History Museum
London, UK

E. Yektaei-Karin
Jodrell Laboratory
Royal Botanic Gardens, Kew
Richmond, UK

G.C. Zuccarello
School of Biological Sciences
Victoria University of Wellington
Wellington, New Zealand
Contents
Chapter 1
Introduction to the Systematics of Species Rich Groups
T. R. Hodkinson and J. A. N. Parnell
Chapter 2
Taxonomy/Systematics in the Twenty-First Century
F. R. Schram
Chapter 3
Assembling the Tree of Life: Magnitude, Shortcuts and Pitfalls
O. Seberg and G. Petersen
Chapter 4
Evolutionary History of Prokaryotes: Tree or No Tree?
J. O. McInerney, D. E. Pisani, M. J. O’Connell, D. A. Fitzpatrick and C. J. Creevey
Chapter 5
Supertree Methods for Building the Tree of Life: Divide-and-Conquer Approaches to Large Phylogenetic Problems
M. Wilkinson and J. A. Cotton
Chapter 6
Taxon Sampling versus Computational Complexity and Their Impact on Obtaining the Tree of Life
O. R. P. Bininda-Emonds and A. Stamatakis
Chapter 7
Tools to Construct and Study Big Trees: A Mathematical Perspective
M. Steel
Chapter 8
The Analysis of Molecular Sequences in Large Data Sets: Where Should We Put Our Effort?
W. C. Wheeler
Chapter 9
Species-Level Phylogenetics of Large Genera: Prospects of Studying Coevolution and Polyploidy
N. Rønsted, E. Yektaei-Karin, K. Turk, J. J. Clarkson and M. W. Chase
Chapter 10
The Diversification of Flowering Plants through Time and Space: Key Innovations, Climate and Chance
T. J. Davies and T. G. Barraclough
Chapter 11
Skewed Distribution of Species Number in Grass Genera: Is It a Taxonomic Artefact?
K. W. Hilu
Chapter 12
Reconstructing Animal Phylogeny in the Light of Evolutionary Developmental Biology
A. Minelli, E. Negrisolo and G. Fusco
Chapter 13
Insect Biodiversity and Industrialising the Taxonomic Process: The Plant Bug Case Study (Insecta: Heteroptera: Miridae)
G. Cassis, M. A. Wall and R. T. Schuh
Chapter 14
Cichlid Fish Diversity and Speciation
J. R. Stauffer, Jr., K. E. Black, M. Geerts, A. F. Konings and K. R. McKaye
Chapter 15
Fungal Diversity
A. M. C. Tang, B. D. Shenoy and K. D. Hyde
Chapter 16
Matters of Scale: Dealing with One of the Largest Genera of Angiosperms
J. A. N. Parnell, L. A. Craven and E. Biffin
Chapter 17
Supersizing: Progress in Documenting and Understanding Grass Species Richness
T. R. Hodkinson, V. Savolainen, S. W. L. Jacobs, Y. Bouchenak-Khelladi, M. S. Kinney and N. Salamin
Chapter 18
Collecting Strategies for Large and Taxonomically Challenging Taxa: Where Do We Go from Here, and How Often?
T. M. A. Utteridge and R. P. J. de Kok
Chapter 19
Large and Species Rich Taxa: Diatoms, Geography and Taxonomy
D. M. Williams and G. Reid
Chapter 20
Systematics of the Species Rich Algae: Red Algal Classification, Phylogeny and Speciation
J. Brodie and G. C. Zuccarello
Index
9579_S001.fm Page 1 Monday, October 16, 2006 5:46 PM
Section A
Introduction and General Context
9579_C001.fm Page 3 Wednesday, November 15, 2006 12:10 PM
CONTENTS
1.1 Introduction
1.2 What Is a Species Rich Group?
1.2.1 Quantitative and Objective Definitions
1.2.2 Qualitative and Subjective Definitions
1.2.3 Combining Objective and Subjective Definitions
1.2.4 Large Taxonomic Groups
1.3 Reconstructing and Using the Tree of Life
1.3.1 The Tree of Life
1.3.2 Big Tree Reconstruction for Species Rich Groups: Are Large Phylogenetic Trees Accurate?
1.3.3 Characters and Homology
1.3.4 Patterns and Processes of Diversity and Understanding the Hollow Curve
1.4 Taxonomy of Species Rich Groups
1.4.1 Collecting
1.4.2 Naming, Describing and Classifying
1.5 Conclusions: Blame Evolution and Politicians
Acknowledgements
References
ABSTRACT
To completely document the world’s diversity of species we need to undertake some simple but
mountainous tasks; above all we need to tackle its species rich groups. We need to collect them, name
and classify them, and position them on the tree of life. We need to do this systematically across all
groups of organisms, and because of the biodiversity crisis we need to do it quickly. A qualitative
approach to defining a species rich taxon (such as a species rich genus, family, order, class or
phylum) appears more broadly applicable than a quantitative definition, but combining such categories
of definition also appears useful. We define a species rich group as: ‘a group with a relatively
high number of species in comparison to other groups of the same, and comparable, taxonomic rank’.
This chapter introduces, with examples, the concept of species rich groups and discusses how these
groups are central to efforts to document the world’s diversity of species and to help address the
biodiversity crisis. Naming and describing species rich groups is the first step in placing them on the
phylogenetic tree of life. Phylogenetic trees are becoming bigger (supersized) and methods are being
developed to deal with the computational complexity of such trees. This paper also outlines the wider
context of the book and papers presented herein. With species rich taxa, evolution has set taxonomists
and systematists a difficult, but not unattainable, challenge that must be addressed as a matter of urgency.
1.1 INTRODUCTION
It may be a surprise to many readers that biologists cannot answer two seemingly simple yet
fundamental questions: ‘how many species are there in the world?’ and ‘how do the world’s species
relate to one another in an evolutionary context?’. The first question is a basic challenge for
taxonomists who list, describe and classify the world’s organisms. The second is a challenge for
systematists/phylogeneticists who try to place organisms in an evolutionary framework by inferring
a tree of life such as that shown in Figure 1.1. Activities of both groups of workers are critically
impeded by species rich taxa, as they are often poorly sampled and described, yet make up a high
proportion of total global species richness.
There is a huge variance in the published estimates of the total number of species on Earth. It
could lie anywhere in the region of 4 million to 100 million [1–3]. We cannot even accurately count
the number of species that have so far been described because of synonymy (the same species
unwittingly recorded under different names by different researchers, that is, duplication). For
example, 1.7 million species have been described but levels of synonymy could be in the range of
20–50% [4–6] (but see Cassis et al., Chapter 13, for a higher value). Even for a particular species rich
group, estimates can vary enormously. For example, in the insects with approximately 1 million
described species, estimates of the total number of species have varied from 1.8 million by
Hodkinson and Casson [7] to 80 million by Stork [8]. An intermediate 10 million, proposed by Ødegaard
et al. [9], may well be more appropriate, but such estimates are often based on crude methods (Cassis et al.,
Chapter 13). Furthermore, for many species rich groups, only a low proportion of the total estimated
number of species has been described. For example, approximately 100,000 fungi have been described
but 1.5 million species may exist (Tang et al., Chapter 15), only 15,000–20,000 diatom species (heterokont
algae) have been described but up to 200,000 may exist (Williams and Reid, Chapter 19) and
approximately 5,800 red algae (Rhodophyta) have been described but 20,000 may exist (Brodie and
Zuccarello, Chapter 20).
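The gap between described and estimated species can be made concrete with a short calculation. The following Python sketch uses only the counts quoted in the paragraph above; the function and variable names are illustrative, and the estimated totals are the upper figures given in the text rather than settled values.

```python
# Described and estimated species counts quoted in the paragraph above.
# Ranges are kept as (low, high); single values are repeated.
COUNTS = {
    "fungi": {"described": (100_000, 100_000), "estimated": 1_500_000},
    "diatoms": {"described": (15_000, 20_000), "estimated": 200_000},
    "red algae": {"described": (5_800, 5_800), "estimated": 20_000},
}


def described_percent(described, estimated):
    """Return the (low, high) percentage of the estimated total already described."""
    low, high = described
    return 100 * low / estimated, 100 * high / estimated


def summarise(counts=COUNTS):
    """Map each group to its rounded (low, high) described percentage."""
    return {
        name: tuple(round(p, 1) for p in described_percent(c["described"], c["estimated"]))
        for name, c in counts.items()
    }


if __name__ == "__main__":
    for name, (low, high) in summarise().items():
        label = f"{low}%" if low == high else f"{low}-{high}%"
        print(f"{name}: about {label} of the estimated total described")
```

On these figures, well under a third of the estimated species in each group has been described, and for fungi and diatoms the share is roughly a tenth or less.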
Why do estimates of the number of species in the world vary by an order of magnitude or
more, and why is there such uncertainty? Some of the reasons are covered in the chapters of this
book, particularly Chapter 2 (Schram) and Chapter 3 (Seberg and Petersen), but one problem stands
out above all others, namely that of the species rich groups. It is probably fair to say that taxonomists
have collected representatives of most of the major lineages (groups) of life and that the discovery
of new major branches is a rare event meriting high publicity; for example that surrounding a new
species, Symbion pandora, discovered feeding on the mouth of the Norway lobster and assigned
to a new phylum, Cycliophora [10]. However, there is now a need to fill in the gaps to find and
characterise, in an evolutionary framework, all the other representatives belonging to those groups
and particularly, in the context of this book, its species rich taxa.
Species diversity is not evenly distributed across the range of life forms that have existed on
Earth. If species were distributed evenly between and within major groups of organisms, and if the
taxonomic units were strictly comparable, we could simply and accurately count the number of
species in one section of the tree (Figure 1.2a) and multiply up by the number of comparable
sections so that the whole tree is represented. However, this pattern is not seen in nature, and we
find striking examples of imbalance. Some evolutionary lineages have succeeded while others have
perished. For example the hexapods, a group including the insects, are a species rich group compared
to their closest relatives the myriapods, crustaceans, cheliceriformes and tardigrades (Figure 1.2b)
and all other eukaryotic life (Cassis et al., Chapter 13). Furthermore, there may be as many as
200,000 diatoms (heterokont algae), but their sister group has recently been recognised as a group
of tiny flagellates, Bolidophyceae, which has no more than three to five currently recognised
species [11,12] (see also Williams and Reid, Chapter 19). Therefore, speciation and extinction are not
random processes; some groups of organism have speciated to a staggering degree, while others
have not. The factors leading to such imbalance are discussed throughout this book but especially
in Chapter 10 (Davies and Barraclough), Chapter 11 (Hilu) and Chapter 17 (Hodkinson et al.).
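The "count one section and multiply up" shortcut described above can be sketched in a few lines to show why imbalance defeats it. This is a minimal illustration, not a method from the book: the function name is hypothetical, the balanced example assumes four clades of 15,000 species in the spirit of Figure 1.2a, and the imbalanced example uses the upper figures quoted for diatoms and Bolidophyceae.

```python
def extrapolate_total(clade_sizes, sampled_index):
    """Estimate total richness by counting one clade and multiplying up.

    Returns (estimate, true_total, relative_error).
    """
    true_total = sum(clade_sizes)
    estimate = clade_sizes[sampled_index] * len(clade_sizes)
    relative_error = (estimate - true_total) / true_total
    return estimate, true_total, relative_error


# Balanced case in the spirit of Figure 1.2a; four clades is an assumption.
balanced = [15_000, 15_000, 15_000, 15_000]

# Imbalanced sister pair from the text: up to 200,000 diatoms versus at most
# five recognised species of Bolidophyceae.
imbalanced = [200_000, 5]

if __name__ == "__main__":
    print(extrapolate_total(balanced, 0))    # sampling any clade gives the true total
    print(extrapolate_total(imbalanced, 0))  # sampling the diatoms overshoots
    print(extrapolate_total(imbalanced, 1))  # sampling the sister group undershoots
```

In the balanced case the choice of clade does not matter. In the imbalanced case the estimate is off by nearly 100% in one direction or the other, depending on which sister group happens to be counted.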
[Figure 1.1: tree of life diagram. Labels recoverable from the image: Bacteria, Archaea, Plantae, Primoplantae, Amoebozoa, Unikonts, Fungi, Choanozoa, Sponge-jellyfish grade, Lophotrochozoa, Ecdysozoa, Chordata, Hemichordata, Echinodermata. Chapter annotations: All life, Chapters 1-3 and 5-8; Prokaryotes, Chapter 4; Angiosperms, Chapters 9-11 and 16-18; Animals, Chapter 12; Insects, Chapter 13; Cichlids, Chapter 14; Fungi, Chapter 15; Red algae, Chapter 20.]
FIGURE 1.1 Tree of life. Chapters within the book that relate to specific species rich taxa are indicated. Open
squares represent eukaryotes, the black square represents archaea and the hatched square represents bacteria.
Representatives of the major groups include (1) Bacteria: hydrogenobacteria, blue-green bacteria, green-
sulphur bacteria, spirochaetes; (2) Archaea: korarchaeotes, crenarchaeotes, euryarchaeotes; (3) Discicristates:
euglenids, trypanosomes, acrasid slime moulds; (4) Amitochondriate excavates: parabasalids, diplomonads;
(5) Radiolaria: radiolarians; (6) Cercozoa: cercomonads; (7) Foraminifera: foraminiferans; (8) Chromalveolates:
diatoms, brown algae, oomycetes (water moulds), ciliates, dinoflagellates; (9) Plantae: angiosperms (flowering
plants), gymnosperms, ferns, liverworts, mosses, green algae; (10) Amoebozoa: slime moulds, lobose amoebae
(mycetozoans); (11) Fungi: microsporidians, zygomycetes, basidiomycetes, ascomycetes; (12) Choanozoa:
choanoflagellates, ichthyosporeans; (13) Sponge-jellyfish grade: siliceous ‘sponges’, calcareous ‘sponges’,
corals, jellyfish, acoelomorphs; (14) Lophotrochozoa: gastropods (snails), bivalves (clams), platyhelminths,
rotifers, brachiopods; (15) Ecdysozoa: nematodes, insects, centipedes, crabs, barnacles, spiders, velvet worms;
(16) Chordata: humans, birds, lizards, fish, lancelets, tunicates; (17) Echinodermata: sea urchins, sea cucumbers;
and (18) Hemichordata: acorn worms. (Major groups and representatives adapted from Pennisi [2] and
supergroups of eukaryotes from Baldauf [27].)
[Figure 1.2: two tree diagrams. Panel (a): clades each labelled 15,000. Panel (b): tip labels Hexapods, Myriapoda, Crustacea, Cheliceriformes and Tardigrades.]
FIGURE 1.2 Species richness of phylogenetic groups is not evenly distributed. (a) If speciation and extinction
had proceeded in a stochastic manner we would not expect to see significant levels of variation from the model
shown (in a fully resolved and bifurcating tree). Triangles are drawn in proportion to species richness in that
clade (15,000 species in all clades of Figure 1.2a). (b) An example of imbalance in species diversification
within the animal group that includes the insects. Insects belong to the hexapods and account for three quarters
of all described animal diversity. The hexapod clade is much larger in terms of species number than any of
its sister groups of the same taxonomic rank (Myriapoda, Crustacea and Cheliceriformes). (Figure 1.2b adapted
from Cassis et al., Chapter 13.)
This book concerns the taxonomy and systematics of species rich groups; it is about how to
collect, document, describe and classify them. It is also about the inextricably linked phylogenetic
studies that try to position species rich taxa on the tree of life and represent their diversity. This
introduction defines species rich groups, highlights examples of major species rich groups, introduces
the concept of the tree of life and discusses the problems and prospects of dealing with species rich
groups. It unashamedly focuses on species rich groups. Species poor groups are obviously important
components of world species diversity, but they lie outside the aims and scope of this volume.
considered as big. For example, a large family could be defined as containing at least 5,000 species
or a large order as containing at least 20,000 species. This approach may work within some groups
such as the angiosperms or insects and for comparisons between them. For example, the grass
family (Hilu, Chapter 11 and Hodkinson et al., Chapter 17) and the insect bug family Miridae
(Cassis et al., Chapter 13) both contain approximately 10,000 species and both can be considered,
under this definition, to be species rich families. These families can also be considered big in that
they usually present a mountainous challenge to systematists specialising in the group.
Therefore, a quantitative approach can sometimes work, but it soon runs into difficulties if used
in a wider context. For example, the threshold value given above could not be used to sensibly
describe the largest families of mammal because no mammal families or genera would be considered
big under such a definition; there are only an estimated 5,500 mammal species in approximately
1,000 genera which themselves tend to be small. The largest mammal order, Rodentia, contains
2,000–3,000 species, but the largest mammal family, Muridae (including mice, rats and gerbils),
has approximately 600 species [14–17]. Likewise this threshold figure could not be used for the fish
suborder, Labroidei, containing the cichlids (Stauffer et al., Chapter 14), a group with approximately
1,800 species. Clearly this is unsatisfactory, as the cichlids, in most biologists’ minds, are species
rich (850 species of cichlid have been found in the African Great Lake Malawi alone).
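The behaviour of a fixed numerical threshold can be illustrated with the figures quoted above. The following Python sketch is only an illustration under stated assumptions: the thresholds are the example values from the text (5,000 species for a family, 20,000 for an order), the species counts are the approximate figures given, Poaceae is used as the name of the grass family, and the function and dictionary names are hypothetical.

```python
# Illustrative thresholds taken from the text: a "large" family has at least
# 5,000 species and a "large" order at least 20,000. No other ranks are given.
THRESHOLDS = {"family": 5_000, "order": 20_000}

# (rank, approximate species count) as quoted in the text; where the text
# gives a range, the upper bound is used.
TAXA = {
    "Poaceae (grasses)": ("family", 10_000),
    "Miridae (plant bugs)": ("family", 10_000),
    "Muridae": ("family", 600),
    "Rodentia": ("order", 3_000),
    "Labroidei": ("suborder", 1_800),
}


def is_species_rich(rank, n_species, thresholds=THRESHOLDS):
    """Return True/False against the rank's threshold, or None if the rank has none."""
    threshold = thresholds.get(rank)
    if threshold is None:
        return None
    return n_species >= threshold


if __name__ == "__main__":
    for name, (rank, n_species) in TAXA.items():
        print(f"{name} ({rank}, ~{n_species:,} species): {is_species_rich(rank, n_species)}")
```

The grasses and plant bugs pass, the largest mammal family and order fail, and the cichlid-containing suborder cannot be assessed at all because no threshold exists for its rank, which mirrors the difficulties described in the text.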
A further complication in trying to numerically define a species rich group is that there are no
quantitative ways of defining a particular taxonomic rank. Taxonomic ranks are clearly defined in a
relative hierarchical sense (a genus is a collection of species; a family a collection of genera, and so
on) but not in any absolute numerical sense. Without such common yardsticks, taxonomists can
recognise species and classify them in different ways, and because of this, a taxonomic group in one
rank does not necessarily represent the same degree of distinction (evolutionary divergence) as that
in another taxonomic group of the same rank. For this reason it is often not possible to make meaningful
comparisons from one taxonomic group to another even if they are from the same rank.
The size of taxonomic groups can also be quantified using a phylogenetic approach and sister
clade comparisons. A clade may be large in comparison to its sister clade(s). For example, Hexapoda
in Figure 1.2b are much more species rich than Myriapoda and Crustacea. This approach allows
us to get a relative measure for comparative purposes but is not widely applicable beyond the sister
clades in question. For example, both Myriapoda and Crustacea can be considered large in comparison
to many other animal groups of the same taxonomic rank. This quantitative method is also
open to the same problems of transferability between taxonomic groups as is the basic quantitative
definition of a species rich group discussed above. Thresholds must be chosen in order to say how
big a group has to be to be regarded as species rich in comparison to its sister groups.
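A sister clade comparison reduces to a ratio plus a cut-off that someone has to choose. The sketch below is a minimal illustration: the function names and the default cut-off of 10 are hypothetical (the text prescribes no value), and the example numbers are the described diatom and Bolidophyceae counts quoted in Section 1.1.

```python
def richness_ratio(clade_species, sister_species):
    """Return how many times more species the clade has than its sister clade."""
    if sister_species <= 0:
        raise ValueError("sister clade must contain at least one species")
    return clade_species / sister_species


def is_rich_relative_to_sister(clade_species, sister_species, min_ratio=10.0):
    """Apply a hypothetical cut-off ratio; the text does not prescribe one."""
    return richness_ratio(clade_species, sister_species) >= min_ratio


if __name__ == "__main__":
    # Described diatoms (15,000 to 20,000) versus Bolidophyceae (3 to 5 species).
    lowest = richness_ratio(15_000, 5)
    highest = richness_ratio(20_000, 3)
    print(f"diatoms are roughly {lowest:,.0f} to {highest:,.0f} times richer than their sister group")
    print(is_rich_relative_to_sister(15_000, 5))
```

The ratio only describes the pair being compared. A clade that fails the cut-off against an exceptionally rich sister may still be large in comparison to other groups of the same rank, as the Myriapoda and Crustacea example shows.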
TABLE 1.1
Top Five Species Rich Orders of Insects
Orders Species % of All Insect Species
Note: The five largest orders, representing 6.4% of all insect orders, contain approximately
83.5% of all insect species.
the hollow curve in Section 1.2.3 below, where we attempt to combine quantitative and qualitative
definitions, and in Section 1.3.4, where patterns and processes are tackled. The following examples
serve to illustrate the qualitative definition of species rich groups, namely the species rich insects and
the species rich angiosperms and various subgroups within each.
Within the species rich hexapods (Cassis et al., Chapter 13; Figure 1.2b), the insects dominate,
and so far approximately 1 million species have been described and divided into 31 orders. Insects
also make up approximately three quarters of all animal species that have been described. The
insects are, therefore, clearly species rich hexapods and species rich animals. Within the insects,
the vast majority of species are found in one of five orders (Coleoptera, Diptera, Hymenoptera,
Lepidoptera and Hemiptera). These represent 835,000 of the species and over 80% of all insect
species diversity (Table 1.1). They can without difficulty be called species rich orders. The top five
families account for 21% of the species (Table 1.2) despite representing less than 1% of all insect
families, and 20 insect families account for almost 45% of the insects; these can all legitimately
be termed species rich families. A number of species rich genera can also be identified, such as
Agrilus (Coleoptera) with over 8,000 species, Camponotus (Hymenoptera) with over 1,500 species,
and Megaselia (Diptera) also with over 1,500 species (Wall, personal communication).
Such a pattern of uneven species distribution holds true across all major groups of life. For
example, within the angiosperms (more than 250,000 species in 13,185 genera21), five families
TABLE 1.2
Top Five Species Rich Families of Insects
Families Species % of All Insect Species
Note: The five largest families (all beetles), representing less than 1% of all insect families,
contain approximately 21% of all insect species.
TABLE 1.3
Top Five Species Rich Families of Angiosperms
Families Species % of All Species Genera % of All Genera
Note: The largest five families, representing just 1% of all angiosperm families, contain
31.6% of all angiosperm species.
(beans, coffees, daisies, grasses, orchids) account for 31.6% of the species and 32.1% of the genera
(Table 1.3), so these can be legitimately defined as species rich families. The top 10 angiosperm
genera all have more than 1,000 species and account for 7% of all angiosperm species, the largest
15 for 9.3% (Table 1.4) and the largest 50 for 19.8% (data not shown) despite representing only
TABLE 1.4
Top 15 Species Rich Angiosperm Genera
Rank Genus (Family) Number of Species
Note: All top 10 angiosperm genera have at least 1,000 species, and
together they contain 7% of the angiosperm species despite only repre-
senting 0.075% of the genera. The top 15 largest genera contain 23,149
species (9.3% of all angiosperms) despite representing 0.1% of all
angiosperm genera. Syzygium (Myrtaceae) ranks 16th with 1,041 species
(but see Parnell et al., Chapter 16), and Ficus ranks 31st with 750 species
and is the topic of Chapter 9 (Rønsted et al.).
0.4% of all angiosperm genera (calculated from values given in Frodin13 and Mabberley21). These
can all be defined as species rich genera. The taxonomy and systematics of two species rich
angiosperm genera are explored in more detail within this book: Syzygium with between 1,000 and
1,500 species (Parnell et al., Chapter 16) and Ficus with 750 species (Rønsted et al., Chapter 9).
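The percentages quoted for the largest angiosperm genera can be rechecked with a few lines of arithmetic. The sketch below uses only the totals given in the text (more than 250,000 species, 13,185 genera, and 23,149 species in the 15 largest genera); the variable and function names are illustrative, and because 250,000 is a lower bound the species share is approximate.

```python
# Totals quoted in the text; names are illustrative, not from the source.
TOTAL_SPECIES = 250_000   # "more than 250,000 species", so a lower bound
TOTAL_GENERA = 13_185


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of all angiosperm genera represented by the largest 10, 15 and 50 genera.
genus_share = {n: percent(n, TOTAL_GENERA) for n in (10, 15, 50)}

# Share of all species held by the 15 largest genera (23,149 species, Table 1.4 note).
top15_species_share = percent(23_149, TOTAL_SPECIES)

for n, share in genus_share.items():
    print(f"largest {n} genera = {share:.3f}% of all genera")
print(f"largest 15 genera hold about {top15_species_share:.1f}% of all species")
```

The results agree, to rounding, with the shares of genera and species given in the text and in the note to Table 1.4.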
Whilst we believe that the pattern of uneven distribution does allow for the construction of a
qualitative definition of a species rich group, such a definition is somewhat unsatisfactory in that
it is largely subjective. How are the defining percentages to be set?
hierarchy. For example, a family can be considered a large group because it contains a large number
of genera, with no reference to the number of species. So we can use terms such as ‘genus rich
family’, ‘family rich order’ or ‘order rich class’.
[Figure 1.3 image not reproduced; labels in the figure: Eukaryotes, Archaea, Bacteria; numbered events 1 and 2.]
FIGURE 1.3 Tree of life models. (a) A standard phylogenetic tree showing the three domains of life; within
the triangles a standard tree-like branching pattern is seen. (b) A network tree incorporating reticulation;
reticulations arise from endosymbiotic events (fusion of genomes) and from exchange of genes in gene trees. For
example, one event involved bacteria giving rise to chloroplasts (1) and another event involved bacteria giving
rise to mitochondria (2). (c) A ring of life, a model used to depict evolutionary pattern especially useful for the
prokaryotes and origin of the bigenomic eukaryotes. Small circles within the ring represent defining ancestors
of the major groups. (Figure 1.3a adapted from Woese30, 1.3b from Zimmer36; 1.3c from Rivera and Lake34.)
the idea that a union has occurred between archaebacterial and eubacterial genomes, likely to be
an endosymbiotic association between two prokaryotes. The evolution of prokaryotes and the notion
of a prokaryotic tree are discussed further in McInerney et al. (Chapter 4).
The debate about the shape of the tree, network or ring of life is essential and stimulating.
All models are by definition imperfect, but many are good enough to work from, and a
simple tree is as good a place to start as anywhere else (as it is the simplest of the models). Even
though a network or other model may better explain these patterns, it may not have the same
analytical power or simplicity as a tree (or combination of trees).
1.3.2 BIG TREE RECONSTRUCTION FOR SPECIES RICH GROUPS: ARE LARGE
PHYLOGENETIC TREES ACCURATE?
Most phylogenetic studies have included relatively few species, and only a few studies have included
the large numbers of taxa required for detailed understanding of species rich groups or other large
tree of life problems37,38. The next decade will see the rise of supersized phylogenetic trees
(Hodkinson et al., Chapter 17) because DNA sequencing has become a standard laboratory tech-
nique and costs have dropped. Advances in DNA sequencing techniques are also envisaged.
Phylogenetic analyses will therefore include more characters and more species.
One major concern is whether methods of phylogenetic reconstruction can accommodate large
datasets. The first step in the production of phylogenetic trees often involves applying a method of
phylogenetic reconstruction such as maximum likelihood, parsimony analysis or Bayesian infer-
ence. The second step in maximum likelihood and parsimony (but not Bayesian inference) involves
the assessment of internal support, via resampling methods such as bootstrapping and the
jackknife39,40 so that the investigator can discriminate between groups with clear phylogenetic signals
and those needing more investigation or more data to resolve44. The production of large phylogenetic
trees and assessing internal support via resampling methods are mathematical and computational
challenges because they involve searches of tree space (the total set of possible trees for the relevant
set of taxa), and the number of possible trees grows more than exponentially with the number of
taxa on the tree. This means that, as the number of taxa increases, the job of accurately finding the
optimal trees under some objective function becomes much more difficult because of the
increase in tree space. We must therefore ask whether existing methods can, or will ever be able
to, accurately reconstruct the phylogeny of species rich groups with several thousands and possibly
hundreds of thousands of taxa. These are the topics explored by Wilkinson and Cotton (Chapter 5),
Bininda-Emonds and Stamatakis (Chapter 6) and Steel (Chapter 7) and to a lesser extent by Wheeler
(Chapter 8) and Hodkinson et al. (Chapter 17). Despite the scale of the problem there is cause for
optimism. Increasing the number of characters in a dataset42–44 and the number of taxa sampled45–47
(see also Hodkinson et al., Chapter 17) generally results in more reliable phylogenetic inferences,
if not limited significantly by computational issues. At some point the computational complexity
of the problem must, however, outweigh the benefits of adding taxa (Bininda-Emonds and
Stamatakis, Chapter 6).
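The growth of tree space described above can be made concrete. For fully bifurcating trees on n labelled taxa, the standard counts are (2n-5)!! unrooted trees and (2n-3)!! rooted trees. The sketch below computes these counts; the function names are illustrative, and multifurcating trees are ignored.

```python
def num_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted, fully bifurcating trees on n labelled taxa: (2n-5)!!"""
    if n_taxa < 3:
        return 1
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd factors 3, 5, ..., (2n-5)
        count *= k
    return count


def num_rooted_trees(n_taxa: int) -> int:
    """Number of rooted, fully bifurcating trees on n labelled taxa: (2n-3)!!

    Rooting is equivalent to adding one extra taxon to an unrooted tree.
    """
    return num_unrooted_trees(n_taxa + 1)


# Report only the number of decimal digits, since the counts quickly become huge.
for n in (10, 50, 100, 1000):
    digits = len(str(num_unrooted_trees(n)))
    print(f"{n} taxa: unrooted tree count has {digits} digits")
```

Each added taxon multiplies the number of unrooted trees by an ever larger factor (2n-3 when going from n to n+1 taxa), which is why growth is faster than exponential and exhaustive search is feasible only for very small datasets.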
Empirical and theoretical studies show that existing methods perform relatively well with large
datasets47–49. For example, Salamin et al.44 have shown, using Monte Carlo simulations, that
parsimony and neighbour joining methods retrieve model trees with good accuracy for taxon numbers
up to 13,000 (the number of angiosperm genera and close to the number of species in a large
angiosperm family such as the grasses) if sequences of sufficient length (number of nucleotides)
are used (see Hodkinson et al., Chapter 17). Testing the reliability of phylogenetic inference
using, for example, resampling methods is also a major challenge with large DNA matrices41.
However, existing methods and shortcuts perform relatively well41, and we expect that advances in
tree search methods will facilitate this process.
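The resampling step at the heart of the bootstrap can be sketched in a few lines. The example below assumes a toy alignment with hypothetical taxon names and draws alignment columns with replacement to build pseudoreplicate matrices. The tree search that would be run on each pseudoreplicate, and the tallying of clade frequencies, are deliberately omitted. It is those repeated searches that make support assessment costly for large matrices.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Return one bootstrap pseudoreplicate: columns resampled with replacement."""
    n_sites = len(next(iter(alignment.values())))
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("all sequences must have the same aligned length")
    # The same resampled column indices are applied to every taxon.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}


# Hypothetical toy alignment (4 taxa, 12 aligned sites), for illustration only.
toy = {
    "taxon_1": "ACGTACGTACGT",
    "taxon_2": "ACGTACGAACGT",
    "taxon_3": "ACCTACGAACTT",
    "taxon_4": "TCCTACGAACTA",
}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
print(len(replicates), "pseudoreplicates of", len(toy["taxon_1"]), "sites each")
```

In a full analysis, each pseudoreplicate would be analysed with the chosen reconstruction method, and the support for a group would be the proportion of replicate trees in which it appears.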
Better and more powerful phylogenetic methods are being developed and tested for analysing
large computationally demanding phylogenetic datasets. These methods can be categorised into
supermatrix and supertree methods50–52 (see also Bininda-Emonds and Stamatakis, Chapter 6; Steel,
Chapter 7). Supermatrix and supertree approaches are not mutually exclusive, as supertrees are
essential in many formal divide-and-conquer analysis methods of single datasets (supermatrices).
These divide-and-conquer strategies seek to break down the problem into smaller subproblems (a
process known as decomposition) that are computationally easier to solve (Wilkinson and Cotton,
Chapter 5). The results from these subproblems are then combined to provide an answer for the
initial global problem. Large analyses may incorporate divide-and-conquer search strategies such
as quartet puzzling and disk covering. These methods are likely to become increasingly important
for analyses of large data sets as well as for searches of smaller data sets using more complex and
computationally demanding optimality criteria.
Wilkinson and Cotton (Chapter 5) discuss advances in supertree methodology as part of a
divide-and-conquer strategy. They explore the issue of effective taxon overlap and how it may be
achieved via suitable decomposition, and they present a new fast supertree method. Bininda-Emonds
and Stamatakis (Chapter 6) further discuss theoretical issues surrounding the reconstruction of
large phylogenetic trees. They investigate the potential to reconstruct phylogenies for species rich
groups and ever-larger portions of the tree of life using a range of methods; they explore the
scalability of phylogenetic accuracy with respect to species number. Their results show that taxon
number itself, especially with the implementation of disk covering methods, may not be the
constraining factor in these analyses but that the strategy used to sample taxa may have a larger impact
on both accuracy and analysis time.
Large phylogenetic trees can be used to study pattern and process in evolution and also to address
a whole list of other biological questions. Dobzhansky’s statement that ‘nothing in biology makes
sense except in the light of evolution’52 has almost become a cliché but remains highly relevant
and pertinent.
One of the most commonly used applications of large trees is for classification and taxonomy.
However, they also have wider application to a host of biological and evolutionary questions53,54.
Large trees are convenient from a statistical perspective (Steel, Chapter 7) and there are many
theoretical reasons for using large trees46,55–57. For example, they are required for accurate inferences
of macro-evolutionary processes because in such studies it is desirable to sample most of the
diversity within a study group to reduce the risk of incorrect phylogenetic tree reconstruction and
to allow meaningful comparisons to be made or hypotheses to be tested40,53.
Large phylogenetic trees of species rich taxa are useful tools for detecting diversification rate
variation and extinction and for exploring the processes that may have led to the diversity of the group.
We may, for example, wish to know why some groups have become species rich and others have
either failed to diversify or have perished. The distribution of species richness within a phylogenetic
tree, even between closely related groups of organisms, can vary enormously.
As discussed above, the hollow curve18,19 has been used to describe patterns of diversification
where few taxonomic groups are species rich while the majority are species poor. There may be,
for example, an inverse relationship of large to small genera (that is, lots of small genera and few
large ones). Within the angiosperms the frequency distribution of genera containing increasing
numbers of species (number of species in a genus plotted against the number of genera) approxi-
mates to the logarithmic hollow curve, although the first term is always larger than expected.
Because of this, classifications are generally strongly polarised, having some 80% of the genera
smaller than average but some 80% of the species concentrated in genera larger than average58,59.
Age of a genus, species richness of genera and geographical area that the genus occupies tend to
be correlated, although there are opposing views as to how that correlation maps out. Cronk22
considers large genera to be recent blooms of evolution, whereas Willis interpreted big genera as
being old (for further discussion see Hilu, Chapter 11). Modern phylogenetic reconstruction allows
these alternative hypotheses to be tested. Widespread genera are often larger than continental
genera60,61. Clayton and Renvoize61 suggest that there may be a dichotomy in evolutionary strategies
between large genera speciating in a wide variety of niches and small genera in labile environments
subject to continuing processes of disruption and replacement. These are hypotheses that require
detailed analysis and testing. The properties of the hollow curve and processes leading to it are
discussed in detail by Hilu (Chapter 11) and Parnell et al. (Chapter 16).
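The polarisation described above (most genera smaller than average, most species in genera larger than average) follows directly from a hollow curve. As a rough illustration, the sketch below assumes that genus sizes follow a logarithmic series, in which the relative number of genera with n species is proportional to x^n / n. The parameter value x = 0.99 and the truncation at 5,000 species are arbitrary assumptions, not values fitted to angiosperm data, and the excess of monotypic genera noted in the text is not modelled.

```python
def log_series_summary(x: float = 0.99, max_size: int = 5000) -> tuple[float, float, float]:
    """Summarise a truncated logarithmic series of genus sizes.

    Returns (mean genus size,
             fraction of genera smaller than the mean,
             fraction of species in genera larger than the mean).
    """
    sizes = range(1, max_size + 1)
    genera = [x**n / n for n in sizes]                 # relative number of genera of size n
    species = [n * g for n, g in zip(sizes, genera)]   # species contained in those genera
    total_genera = sum(genera)
    total_species = sum(species)
    mean_size = total_species / total_genera
    small_genera = sum(g for n, g in zip(sizes, genera) if n < mean_size) / total_genera
    species_in_large = sum(s for n, s in zip(sizes, species) if n > mean_size) / total_species
    return mean_size, small_genera, species_in_large


mean_size, small_genera, species_in_large = log_series_summary()
print(f"mean genus size: {mean_size:.1f} species")
print(f"genera smaller than average: {100 * small_genera:.0f}%")
print(f"species in genera larger than average: {100 * species_in_large:.0f}%")
```

Under these assumptions both fractions fall in the region of three quarters to four fifths, which is broadly in line with the "some 80%" figures quoted above. Other values of x shift the exact numbers.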
A number of tests using the temporal and/or topological properties of phylogenetic trees exist
to determine if variation in diversification rate is statistically significant62–65. In the species rich
angiosperms, for example, diversification rates can vary by several orders of magnitude between
clades (Davies and Barraclough, Chapter 10). Furthermore, within any particular angiosperm
family, such as the grasses, diversification rates have also been shown to vary (Hodkinson et al.,
Chapter 17). Factors including key biological traits, coevolution, geography and environmental
variables may have contributed to the variation that exists in net diversification between clades62,65.
Davies and Barraclough (Chapter 10) review studies that explore diversification in flowering plants
using large scale phylogenetic trees. They also discuss further statistical tests to explore these
processes. Rønsted et al. (Chapter 9) discuss the coevolution and cospeciation of Ficus with
hymenopteran wasps belonging to the species rich insect family Agaonidae.
1.4.1 COLLECTING
Collecting trips need to avoid unnecessary duplication and ensure that the maximum species
diversity is sampled. They also need to be shown to be good value for money. Collecting is one
of the main rate determining steps in documenting the world’s species and further characterising
them. The topics of how we should focus and prioritise our collecting efforts to maximise new
species discovery are covered in Chapter 18 (Utteridge and de Kok) and to a lesser degree in
Chapter 2 (Schram) and Chapter 3 (Seberg and Petersen). Collecting is a slow and expensive
process. For example, over 100 grasses were collected in a recent two-week period in New South
Wales and Queensland, Australia, by the first author of this chapter and Surrey Jacobs, a highly
cooperative and experienced grass taxonomist, from the Royal Botanic Gardens, Sydney. One of
the grass species, Alexfloydia repens, is known from only one location in the world. A second,
Homopholis belsonii, is very rare and endangered (Jacobs, personal communication). Both species
took close to a day to track down and collect, entailing considerable financial expense, not to
mention leech attacks, tick infestations and mosquito bites (bloody biodiversity!). Beyond such
anecdotal statements, others have tried to quantify the pace of collecting in an attempt to estimate
the scale of the task. Parnell’s quantification of the costs of collecting66 showed that about 85% of
the costs of collecting a specimen for a number of expeditions were salary associated, with 63%
being direct salary costs. Surprisingly, he showed that expenses such as travel, local living and
postage for a collecting expedition, which is the part external agencies are most likely to be asked
to fund (and without which the expedition simply cannot occur), constituted only about 12–17%
of the total costs. Seberg and Petersen (Chapter 3) and Cassis et al. (Chapter 13) have tried to
quantify the effort required to sample species rich groups by doing some simple calculations based
on the number of people days it will take to collect all remaining species of a species rich group.
Such estimates allow us to see the scale and potential cost of the problem, but we should also
remember this is only part of the process. It covers the resources required for collection, but not
the additional resources needed for describing and classifying the organisms (that could amount to
the same or more again). In reality these figures are also likely to be underestimates because
geographical areas will need to be resampled many times, at different times of the year, with
different methods (with specialist and generalist collectors; see Utteridge and de Kok, Chapter 18)
before we can be sure that we are close to collecting all species in an area.
systems. We must also remember that the digital interface is only a tool and cannot replace well
trained taxonomists or physical resources such as herbaria and museums. These resources will only
work with international cooperation. Such coordinated action at an international level is also needed
to reach consensus over taxonomic nomenclature and accepted names.
The DNA revolution has offered huge potential to taxonomy and systematics, but as with the
digital revolution, we should take care. Obviously we should be prepared to embrace the methods
where they can offer real help. For example, recent developments that may help with the taxonomy
and systematics of species rich groups are DNA barcoding and DNA taxonomy. The slow pace of
species description and taxonomy has led some to call for a modern DNA based taxonomy70–72. In
this method, DNA sequences are used to identify the organism. Sequences are generated and
compared to sequences found in a database that have known identity and are linked to real,
accurately identified specimens in institutions such as herbaria and museums. The appeal of this
fully automated approach is that anybody should be able to identify an organism without specialist
knowledge of the group. It also offers the potential to develop futuristic tools that can instanta-
neously identify an organism by sampling its DNA and making a comparison to a database of
sequences. This would have particular advantages in species rich groups where taxon identification
is often a problem and synonymy a big issue.
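The identification step described above, comparing a query sequence against reference sequences of known identity, can be sketched as a nearest-match search. The example below is a minimal illustration under strong assumptions: the sequences are already aligned and of equal length, distance is the simple proportion of differing sites, and the species names, sequences and the 0.1 cut-off are all hypothetical. Real reference databases and matching tools are considerably more elaborate.

```python
from typing import Optional


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def identify(query: str, reference: dict[str, str],
             max_distance: float = 0.1) -> tuple[Optional[str], float]:
    """Return the closest reference name (or None if too distant) and its distance."""
    best_name, best_dist = min(
        ((name, p_distance(query, seq)) for name, seq in reference.items()),
        key=lambda item: item[1],
    )
    return (best_name if best_dist <= max_distance else None), best_dist


# Hypothetical reference library: each entry would be tied to a vouchered specimen.
reference = {
    "species_A": "ACGTACGTACGTACGTACGT",
    "species_B": "ACGTTCGTACGAACGTACCT",
    "species_C": "TTGTACGAACGTTCGTACGA",
}

print(identify("ACGTACGTACGTACGTACGA", reference))  # close to one reference entry
print(identify("GGGGGGGGGGGGGGGGGGGG", reference))  # no acceptable match
```

The second query shows why a threshold matters: without one, every query would be assigned to some reference species, however distant, which is one way in which insufficient sampling of the reference library can produce misleading identifications.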
However, there are a number of issues with this technology, especially if interpreted in the
strict sense, including concerns about sequence quality, insufficient sampling within and amongst
species, pseudogenes, herbarium specimen quality and availability, type specimen use and common
occurrence of hybridisation and introgression and associated DNA exchange (capture) between
closely related species. Seberg and Petersen (Chapter 3) discuss the pitfalls of DNA technology
and highlight the danger of using it inappropriately as a shortcut in taxonomy. DNA barcoding is
seen by many as a better alternative in that it uses DNA sequences to aid identification but is not
treated as the overriding authority in identification. DNA can certainly facilitate and improve taxonomy.
DNA sequences have the added bonus that they have high potential for phylogenetics, classification
and for providing a phylogenetic framework for developing a meaningful monographic study
(Hodkinson et al., Chapter 17), although caveats may apply73. Phylogenetics, molecular systematics
and taxonomy are therefore inextricably linked.
To document and characterise the world’s species rich groups is one of the largest challenges
of biology and needs financial and political support. The reason this challenge has not been
adequately addressed is partly because evolution has set us an enormous task and partly because
politicians have not prioritised the problem sufficiently highly; we should therefore blame both
evolution and politicians. However, the task is achievable. Schram, in the next chapter, outlines his
vision of how this could be achieved. Readers may not agree with all his points but will hopefully
find some common ground on most of them. It will require the meshing together of phylogenetics
and taxonomy, considerable advances in informatics, improved and increased collecting, training
of taxonomists and significant financial support. We hope that this book goes some way to help
achieve that aim.
ACKNOWLEDGEMENTS
We thank Gerry Cassis, Nicolas Salamin and Michael Wall for comments on this manuscript.
REFERENCES
1. Blackmore, S., Biodiversity update: progress in taxonomy, Science, 298, 365, 2002.
2. Pennisi, E., Modernizing the tree of life, Science, 300, 1692, 2003.
3. Wheeler, Q.D., Taxonomic triage and the poverty of phylogeny, Phil. Trans. R. Soc. Lond. B, 359,
571, 2004.
4. Gaston, K.J. and May, R.M., The taxonomy of taxonomists, Nature, 356, 281, 1992.
5. Solow, A.R., Mound, L.A., and Gaston, K.J., Estimating the rate of synonymy, Syst. Biol., 44, 93, 1995.
6. May, R.M., The dimensions of life on earth, in Nature and Human Society: The Quest for a Sustainable
World, National Academy of Sciences Press, Washington DC, 2000.
7. Hodkinson, I.D. and Casson, D., A lesser predilection for bugs — Hemiptera (Insecta) diversity in
tropical rain-forests, Biol. J. Linn. Soc., 43, 101, 1991.
8. Stork, N.E., Insect diversity: facts, fiction and speculation, Biol. J. Linn. Soc., 35, 321, 1988.
9. Ødegaard, F., Diserud, O.H., and Ostbye, K., The importance of plant relatedness for host utilization
among phytophagous insects, Ecol. Lett., 8, 612, 2005.
10. Funch, P. and Kristensen, R.M., Cycliophora is a new phylum with affinities to Entoprocta and
Ectoprocta, Nature, 378, 711, 1995.
11. Guillou, L. et al., Bolidomonas: a new genus with two species belonging to a new algal class, the
Bolidophyceae (Heterokonta), J. Phycol., 35, 368, 1999.
12. Kühn, S., Medin, M., and Eller, G., Phylogenetic position of the parasitoid nanoflagellate Pirsonia
inferred from nuclear-encoded small subunit ribosomal DNA and a description of Pseudopirsonia n.
gen. and Pseudopirsonia mucosa (Drebes) comb. nov., Protist, 155, 143, 2004.
13. Frodin, D.G., History and concepts of big plant genera, Taxon, 53, 753, 2004.
14. Vaughan, T.A., Ryan, J.M., and Czaplewski, N.J., Mammalogy, 4th ed., Saunders College Publishing,
2000.
15. Michaux, J., Reyes, A., and Catzeflis, F., Evolutionary history of the most speciose mammals:
molecular phylogeny of muroid rodents, Molec. Biol. Evol., 17, 280, 2001.
16. O’Leary, M.A. et al., Building the mammalian sector of the tree of life: combining different data and
a discussion of divergence times for placental mammals, in Assembling the Tree of Life, Cracraft, J.
and Donoghue, M.J., Eds., Oxford University Press, Oxford, 2004, 490.
17. Wilson, D.E. and Reeder, D.M., Eds., Mammal Species of the World, 3rd ed., Johns Hopkins University
Press, 2005.
18. Willis, J.C., Age and Area, Cambridge University Press, Cambridge, 1922.
19. Willis, J.C., The birth and spread of plants, Boissiera, 8, 1949.
20. Dial, K.P. and Marzluff, J.M., Nonrandom diversification within taxonomic assemblages, Syst. Zool.,
38, 26, 1989.
21. Mabberley, D.J., The Plant Book, 2nd ed., Cambridge University Press, Cambridge, 1997.
22. Cronk, Q., Measurement of biological and historical influences on plant classifications, Taxon, 38,
357, 1989.
23. Scotland, R.W. and Sanderson, M.J., The significance of few versus many in the tree of life, Science,
303, 643, 2004.
24. Jackson, D.A., Stopping rules in principal components analysis: a comparison of heuristical and
statistical approaches, Ecology, 74, 2204, 1993.
25. Darwin, C., On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured
Races in the Struggle for Life, John Murray, London, 1859.
26. Haeckel, E., Generelle Morphologie der Organismen, Verlag von Georg Reimer, Berlin, 1866.
27. Baldauf, S.L., The deep roots of eukaryotes, Science, 300, 1703, 2003.
28. Cracraft, J. and Donoghue, M.J., Assembling the Tree of Life, Oxford University Press, Oxford, 2004.
29. Palmer, J.D., Soltis, D.E., and Chase, M.W., The plant tree of life: an overview and some points of
view, Amer. J. Bot., 91, 1437, 2004.
30. Woese, C.R., Kandler, O., and Wheelis, M.C., Towards a natural system of organisms: proposal for
the domains Archaea, Bacteria and Eucarya, Proc. Natl. Acad. Sci. USA, 87, 4576, 1990.
31. Woese, C.R., The universal ancestor, Proc. Natl. Acad. Sci. USA, 95, 6854, 1998.
32. Doolittle, W.F., Phylogenetic classification and the universal tree, Science, 284, 2124, 1999.
33. Martin, W. and Embley, T.M., Early evolution comes full circle, Nature, 431, 134, 2004.
34. Rivera, M.C. and Lake, J.A., The ring of life provides evidence for a genome fusion origin of
eukaryotes, Nature, 431, 152, 2004.
35. Linder, C.R. and Rieseberg, L.H., Reconstructing patterns of reticulate evolution in plants, Amer. J.
Bot., 91, 1700, 2004.
36. Zimmer, C., Evolution: The Triumph of an Idea, William Heinemann, London, 2002, 101.
37. Savolainen, V. and Chase, M.W., A decade of progress in plant molecular phylogenetics, Trends Genet.,
19, 717, 2003.
38. Sanderson, M.J. and Driskell, A.C., The challenge of constructing large phylogenetic trees, Trends
Plant Sci., 8, 374, 2003.
39. Efron, B., Bootstrap methods: another look at the jackknife, Ann. Stat., 7, 1, 1979.
40. Felsenstein, J., Phylogenies and the comparative method, Am. Nat., 125, 1, 1985.
41. Salamin, N. et al., Assessing internal support with large phylogenetic DNA matrices, Molec. Phylogenet.
Evol., 27, 528, 2003.
42. Erdos, P.L. et al., A few logs suffice to build (almost) all trees: part II, Theor. Comp. Sci., 221, 77, 1999.
43. Bininda-Emonds, O.R.P. et al., Scaling of accuracy in extremely large phylogenetic trees, in Pacific
Symposium on Biocomputing 6, Altman, R.B. et al., Eds., World Scientific Publishing Company,
River Edge, New Jersey, 2001, 547.
44. Salamin, N., Hodkinson, T.R., and Savolainen, V., Towards building the tree of life: a simulation study
for all angiosperm genera, Syst. Biol., 54, 183, 2005.
45. Hillis, D.M., Inferring complex phylogenies, Nature, 383, 130, 1996.
46. Hillis, D.M., Taxonomic sampling, phylogenetic accuracy, and investigator bias, Syst. Biol., 47, 3, 1998.
47. Källersjö, M. et al., Simultaneous parsimony jackknife analysis of 2538 rbcL DNA sequences reveals
support for major clades of green plants, land plants, seed plants and flowering plants, Pl. Syst. Evol.,
213, 259, 1998.
48. Soltis, P.S., Soltis, D.E., and Chase, M.W., Angiosperm phylogeny inferred from multiple genes as a
tool for comparative biology, Nature, 402, 402, 1999.
49. Savolainen, V. et al., Phylogeny reconstruction and functional constraints in organellar genomes:
plastid versus animal mitochondrion, Syst. Biol., 51, 638, 2002.
50. Salamin, N., Hodkinson, T.R., and Savolainen, V., Building supertrees: an empirical assessment using
the grass family (Poaceae), Syst. Biol., 51, 136, 2002.
51. Wilkinson, M. et al., The shape of supertrees to come: tree shape related properties of fourteen
supertree methods, Syst. Biol., 54, 419, 2005.
52. Dobzhansky, T., Nothing in biology makes sense except in the light of evolution, Am. Biol. Teach.,
35, 125, 1973.
53. Purvis, A., Using interspecies phylogenies to test macroevolutionary hypotheses, in New Uses for
New Phylogenies, Harvey, P.H. et al., Eds., Oxford University Press, Oxford, 1996, 153.
54. Harvey, P.H. et al., Eds., New Uses for New Phylogenies, Oxford University Press, Oxford, 1996.
55. Rannala, B. et al., Taxon sampling and the accuracy of large phylogenies, Syst. Biol., 47, 702, 1998.
56. Källersjö, M., Albert, V.A., and Farris, J.S., Homoplasy increases phylogenetic structure, Cladistics,
15, 91, 1999.
57. Hillis, D.M. et al., Is sparse taxon sampling a problem for phylogenetic inference? Syst. Biol., 52,
124, 2003.
58. Clayton, W.D., Some aspects of the genus concept, Kew Bull., 27, 281, 1972.
59. Clayton, W.D., The logarithmic distribution of angiosperm families, Kew Bull., 29, 271, 1974.
60. Clayton, W.D., Chorology of the genera of Gramineae, Kew Bull., 30, 111, 1975.
61. Clayton, W.D. and Renvoize, S.A., Genera Graminum: Grass Genera of the World, Her Majesty’s
Stationery Office, London, 1986.
62. Barraclough, T.G. and Nee, S., Phylogenetics and speciation, Trends Ecol. Evol., 16, 391, 2001.
63. Chan, K.M.A. and Moore, B.R., Whole-tree methods for detecting differential diversification rates,
Syst. Biol., 51, 855, 2002.
64. Chan, K.M.A. and Moore, B.R., SYMMETREE: whole-tree analysis of differential diversification
rates, Bioinformatics, 21, 1709, 2004.
65. Moore, B.R., Chan, K.M.A., and Donoghue, M.J., Detecting diversification rate variation in supertrees,
in Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, Bininda-Emonds,
O.R.P., Ed., Kluwer Academic Publishers, Dordrecht, 2004, 487.
66. Parnell, J.A.N., The monetary value of herbarium collections, in Biological Collections and Biodi-
versity, Rushton, B.S., Hackney, P., and Tyrie, C.R., Eds., Linnean Society of London Special Publi-
cation 3, England, 2001, 271.
67. Compton, J.A., Clennett, J.C.B., and Culham, A., Nomenclature in the dock. Overclassification leads
to instability: a case study in the horticulturally important genus Cyclamen, Bot. J. Linn. Soc., 146,
339, 2004.
68. Bisby, F.A. et al., Taxonomy, at the click of a mouse, Nature, 418, 367, 2002.
69. Wheeler, Q.D., Lipscomb, D., and Platnick, N., Terascale taxonomy: cyber-infrastructure and the
Linnaean legacy, in Proc. of the Fourth Biennial Conference of the Systematics Association, Trinity
College Dublin, Ireland, 2003.
70. Tautz, D. et al., DNA points the way ahead in taxonomy, Nature, 418, 479, 2002.
71. Tautz, D. et al., A plea for DNA taxonomy, Trends Ecol. Evol., 18, 70, 2003.
72. Lipscomb, D., Platnick, N., and Wheeler, Q., The intellectual content of taxonomy: a comment on
DNA taxonomy, Trends Ecol. Evol., 18, 65, 2003.
73. Stace, C.A., Plant taxonomy and biosystematics: does DNA provide all the answers? Taxon, 54, 999,
2005.
2 Taxonomy/Systematics in the
Twenty-First Century
F. R. Schram
Department of Biology, University of Washington, Seattle, USA
Formerly of Zoological Museum, University of Amsterdam, The Netherlands
And out of the ground the Lord God formed every beast of the field, and every fowl of the air; and
brought them unto Adam to see what he would call them: and whatsoever Adam called every living
creature, that was the name thereof. And Adam gave names to all cattle, and to the fowl of the air, and
to every beast of the field … (Genesis 2:19–20)
ABSTRACT
Taxonomy/systematics has a history, extending back to the 1880s, of Cassandras issuing dire
warnings about the future of the science, but little hard data exist to document these warnings.
Some institutions have done well, while others have endured severe cutbacks or even disappeared.
Meanwhile, the need for effective biodiversity knowledge is increasing exponentially. The number
of species in many groups is truly staggering, and the use of information technology to manage
terascale volumes of data in the science of taxonomy is inarguably essential. The tools to effectively
move on this need to be developed, and online models for specific groups of organisms including
species rich groups need to be made available. Some unfortunate decisions and trends in the
management of natural history museums and universities have occurred in the recent past. Human
capital and mobility need to be enhanced. The biodiversity crisis is real. Rivalries must be put
aside, and true cooperation must occur if the crisis is to be addressed. An action plan is needed to:
(1) establish an international structure to deal with issues vital to furthering a healthy taxonomy/
systematics community, including a czar to spearhead the plan; (2) increase spending with funding
levels targeted on per capita population; (3) approach staffing needs in universities with proactive
arguments for replacing retiring staff with taxonomists; (4) channel people and training into the
study of understudied groups of organisms; (5) direct training and education at enhancing human
capital in systematics in developing countries; (6) require and facilitate international cooperation
of networks and institutions; and (7) apply information technology on a large scale with the
establishment of super computing centres.
left at all, let alone institutions in which they work? Are modern-day taxonomists the equivalent of
Cassandras to whom we are not listening? Or are they merely crying wolf?
Hard data is often difficult to come by, as Parnell8 pointed out in connection with his observa-
tions concerning the state of tropical systematic botany. Many of the negative observations are
anecdotal. We can point to institutions such as the Field Museum in Chicago, USA, or the Zoological
Museum in Copenhagen, Denmark, which have successfully maintained curatorial staff numbers
for decades. Nevertheless, there are also institutions such as the Zoological Museum in Amsterdam,
the Netherlands, that have been decimated, where there were 12 curators on the staff in the late
1980s–early 1990s, but where at the time of writing there are only three full-time curators and one
half-time curator. Another seriously affected institution is the Philadelphia Academy of Science,
USA, which recently sustained some significant cuts in curatorial staff9.
At the very least, the apparent general reduction in overall staff numbers in natural history
institutions will make it difficult to meet the target of the Convention on Biological Diversity. Are
we now reaping the result of over a century of de-emphasis of organismic biology in favour of the
more stylish and ‘modern’ molecular studies? I would hope not, since the integration of molecular
studies holds promise for a great stimulus for progress in taxonomy/systematics. We stand on the
threshold of a real renaissance in biodiversity studies. However, we cannot refuse to recognise past
problems. If a living dynamic science is to grow, then steps have to be taken to correct these
conditions.
higher level phylogenetic relationships of 10,000 species of grasses. In addition, often there are
curious disjunctions in our knowledge. Nina Rønsted et al. (Chapter 9) admit that, while entomol-
ogists have a good understanding of the phylogenetic relationships of fig wasps, the 750 species
of figs upon which the wasps are symbionts are only now being sorted out.
These numbers are startling enough. However, one example can serve to illustrate what is
involved with terascaling. I have published an inventory of the species of mantis shrimp, or
Stomatopoda11. This is a relatively small group of crustaceans, comprising only about 482 fossil
and living species. Even with the advantage of beginning with an established database, it
nonetheless took me almost two person-years to track down all the valid species and available names. I
also assembled information on species distributions, most of it not yet in geographic coordinates,
let alone in any Geographical Information System (GIS) format. That is, much of it was only
general designations of approximate localities, depth ranges, general habitat (not always available,
I discovered), colour (an important element in the biology of mantis shrimp) and size ranges for
both sexes. I had only mixed success in this effort despite the fact that this is a group that has had
intensive scholarly attention over the last 40 years. Some 25% of the species of mantis shrimp are
known from little more than their original morphological description based on a single specimen
of only one of the sexes in museum collections; in other words, with little or no data yet available
on depth, details of locality and habitat preferences. Whole genera of the mantis shrimps are
incompletely understood due to these parameters.
Now, multiply the work load involved for the assembling of base data for the 482 mantis
shrimp by whatever factor you need to scale up to any of the figures given above, for example,
the 10,000 species of grasses. One can clearly see that to arrive at just a basic species list for
some groups will entail intense effort. Why is this so? I never cease to be amazed at how much
alpha taxonomic science has been done sloppily, an earmark perhaps of its cottage industry
tradition. For example, a major effort in assembling the above mentioned catalogue of stomatopods
entailed tracing down the location of, and catalogue numbers for, the type specimens. A significant
amount of such material was never clearly designated in the original taxonomic descriptions (and
many of these are late twentieth-century papers), and a significant number of the total array of
mantis shrimp type specimens remain ‘lost’. This little group of crustaceans is not unique.
Moreover, assembling the basic species list of life is only the beginning. Wheeler has pointed out
that to terascale our knowledge of the diversity of life, that is, to be able to link the basic species
list to information about nomenclatural history, ecology, behaviour, development and geographic
distribution (with proper GIS coordinates) will be a daunting task. This is not to say we should
not undertake this effort; we must! However, it will not be easy. It will take time and money, as
well as patience and persistence. This effort will also distract taxonomists and slow down their
efforts to describe new species.
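To make the scaling argument concrete, the following is a minimal sketch in Python. It assumes, purely for illustration, that effort grows linearly with the number of species. The two person-years and 482 species are the figures quoted above for the stomatopod catalogue; the function name and the choice of example groups are hypothetical, and real effort per species will vary with the state of the literature and the collections.

```python
# Figures taken from the text: about two person-years for 482 stomatopod species.
PERSON_YEARS_SPENT = 2.0
SPECIES_CATALOGUED = 482


def estimated_person_years(n_species: int) -> float:
    """Scale the stomatopod effort to a group of n_species.

    Linear scaling is an assumption made for illustration only.
    """
    return PERSON_YEARS_SPENT * n_species / SPECIES_CATALOGUED


if __name__ == "__main__":
    # Species counts are those quoted in the chapter for figs and grasses.
    for group, n in [("figs", 750), ("grasses", 10_000)]:
        print(f"{group}: roughly {estimated_person_years(n):.1f} person-years")
```

Under this simple assumption, a basic species list for the grasses alone would take on the order of 40 person-years, before any ecological or geographic data are linked to it.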
Modern information technology can achieve some amazing things. DNA barcoding is one
approach that derives directly from inventory control in consumer marketing. Recent discussions
at conferences have even proposed the possibility that one day inventory barcoding might be
combined with science fiction so that Star Trek-like ‘tricorders’ aimed at specimens in the field are
linked by satellite to central databases of previously identified DNA sequences. This would certainly
be an exciting new tool if it could be perfected, but Wheeler1 warns that, unless it is handled in a
rational way, barcoding could do more harm than good (see also Seberg and Petersen, Chapter 3).
This does not mean we should not undertake barcoding; we just need to be careful that we make
clear to people or agencies demanding this kind of service what the limitations of the tool are. In
this case, the technology might serve to belittle real species as discovered in nature. Barcoded DNA
sequences are rather dry data, but majestic 100-metre redwood trees or dazzling male peacocks in
full colour are real. As Wheeler1 asks, “Why forego all that is intellectually engaging and aesthetically
beautiful to settle for what is clinically efficient?” There is wisdom in these words. There
is a value in standing in front of actual species in nature; it is the principle upon which pilgrimage,
and indeed mass tourism, is based.
Information technology tools must be practically focused. Just because we can do something
does not mean we have to do something. Too many recently funded projects and programmes
involving information technology have been done because they could be done, that is, they were
undertaken because money was available, and someone from the ‘top’ enticed the scientists ‘below’
to undertake it. As an example, a European Union funded programme was recently completed to
produce a species list of the terrestrial and freshwater species in Fauna Europaea (FE)
(http://www.faunaeur.org). Some €4,000,000 was spent over four years. The stated target
audience included public agencies that might have cause to consult these data, such as customs officers
who might want to know if someone is trying to import an endangered species. Much intense
argument occurred amongst cooperating taxonomists within the programme concerning what sorts
of data were to be included in the checklist. Furthermore, an unexpected amount of time was needed
to develop the software to enter the data and to provide online access. However, what was achieved
in the end works better for specialists than for the non-scientist public servants on whose behalf the
development of the database was originally justified.
This programming problem is a critical point. A similar earlier project for the European Register
of Marine Species (ERMS) ran into the same difficulties, which in the end were never resolved.
ERMS settled on merely putting the faunal lists online (http://www.marbef.org/data/erms.php). This
was not necessarily all bad. ERMS has the advantage of ease of use, allowing users to see whole
lists of species without having to know specific Linnaean binomens before the website is further
engaged. On the other hand, FE has a rather sophisticated website, but it assumes that any user
will know a name to begin with before any data can be accessed, and of course the slightest
misspelling (not uncommon with Latin binomens) will yield nothing.
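The misspelling problem described above is one that approximate string matching can soften. The following is a minimal sketch, not a description of how FE or ERMS is implemented: it uses Python's standard difflib module to suggest names from a small checklist when a query has no exact match. The checklist contents, the function name find_names and the similarity cutoff of 0.8 are assumptions chosen for illustration.

```python
from difflib import get_close_matches

# Hypothetical checklist; a real register would hold many thousands of names.
CHECKLIST = [
    "Spilosoma urticae",
    "Hordeum distichon",
    "Janthina janthina",
    "Arbacia lixula",
]


def find_names(query, checklist=CHECKLIST, cutoff=0.8):
    """Return exact or near matches for a binomen, ignoring case and extra spaces."""
    by_lower = {name.lower(): name for name in checklist}
    key = " ".join(query.lower().split())
    if key in by_lower:
        return [by_lower[key]]
    # Fall back to approximate matching; the cutoff is an illustrative assumption.
    hits = get_close_matches(key, list(by_lower), n=3, cutoff=cutoff)
    return [by_lower[hit] for hit in hits]


if __name__ == "__main__":
    print(find_names("Arbacia lixulla"))  # misspelt query
    print(find_names("Quercus robur"))    # name absent from the checklist
```

A lookup of this kind returns candidate names for the user to confirm rather than returning nothing, which matters most for the non-specialist users such databases were meant to serve.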
I do not find these types of top down databases as useful as the bottom up sites that have been
assembled by scientists ‘in the trenches’ on an as needed basis. As an example, the Crayfish Home
Page is amongst the nicest and most useful (http://crayfish.byu.edu). Here one can find all sorts of
practical and useful data including a complete taxonomy of the group, regional faunal lists, biogeo-
graphic distributions (with maps), photographs of species, and links to other sites. It is a well designed
and easy to use site. Its utility is a reflection of its bottom up genesis. The Crayfish Home Page shows
every sign of evolving into the kind of online monograph for which Godfray12 called. Slightly more
complex is Antbase (http://www.antbase.org), which focuses on the 11,000 species of ants. Antbase
is a bit more difficult to navigate than the Crayfish Home Page until one becomes accustomed to the
layout, but it does provide very up-to-date information and other material, including scanned literature
for old species descriptions. At the other end of the spectrum, the Turbellarian Taxonomic Database
(http://devbio.umesci.maine.edu/styler/turbellaria) is simplicity itself to use.
Naturally, these kinds of websites are not easy to develop, but fortunately there are people who
specialise in doing such technical things. It merely requires that the people with the taxonomic
expertise pair with a person with the necessary information technology expertise, and if done
correctly the team can emerge with a very useful tool that not only advances and facilitates research
in that group, but is also attractive, easy to navigate and informative to the general public. We
should look forward to the day that there is a website for every taxonomist, or better yet for every
cooperating network of taxonomists.
intellectual in their focus. They were responsible for the first great blooming of science and natural
history in the 1700s. They persisted in this vein for almost 200 years with the addition of exhibits
in glass cases of excess materials for the edification and education of the public.
All of this began to quickly fall by the wayside on the occasion of a particularly famous royal
visit in 1977, the King Tut Exhibit. This blockbuster travelled the world, drawing in a few weeks
crowds that the host institutions were accustomed to seeing only when spread out
over several years. Suddenly, there was not a museum director or board of trustees in the world
that was not asking ‘Why not us? Why not here?’. The blockbuster exhibit suddenly became a
permanent fixture in the calendars of museum events. This happened without any serious debate
on what museums used to be and the ‘infotainment’ venues they became. There was a practical
downside as well. Carpeting in exhibition halls that was to last years, toilets that were to survive
decades, had to be replaced after King Tut moved on to his next venue, immediately consuming
some of the profits generated from the blockbuster. For natural history museums this was further
exacerbated by the appearance, a few years after Tut’s show, of the marvellously clever dinorobots
that in many places eventually became parts of permanent exhibits.
I enjoy blockbuster exhibitions like everyone else. What I bewail is the impact they have had on
institutions that had formed an important part of fundamental science and real education of the public.
Institutions have handled the impact in different ways. In the mid-1960s, when I first went to work
as a research assistant in the Field Museum, my boss took me down to the personnel and finance
department to be signed on to the payroll. It was located in a single large room on the ground floor
and about six people, including the department head, handled the entire operation. Today, the depart-
ments of personnel, financial operations and development occupy significant portions of that museum.
Conversely, what was a rather modest exhibits department in the mid 1960s, is now staffed by a
considerable number of people well housed in quarters quite expanded from what they were. As a
scientist, I cannot complain about what happened at that particular museum. Significant improvements
were made to the collections storage facilities in the ensuing decades, and the number of active
scientist/curators has been maintained and perhaps somewhat expanded. I do
regret that the amount of exhibition space for the general public has decreased while needs for more
office and marketing space were assuaged. However, not all the museums of natural history in the
world have fared so well. Many smaller collections have been orphaned in the last 40 years, and
whilst their demise was one thing, their impact on the larger institutions that have taken on the curation
of the orphaned collections has been significant13.
There have been some notable good and useful things happening in the world of natural history
museums. The creation of virtual museums online and their increasing linkage through initiatives
such as the Global Biodiversity Information Facility (GBIF) have been very helpful (http://www.gbif.org).
In connection with my stomatopod catalogue mentioned above, I was able to make good
use of online data for the verification of location of type specimens, saving me weeks of time
relying on regular surface post or actual physical visits to collections. There is a slowly growing
development of online collection records. Some of these are marvellously constructed for ease of
use. The American Museum of Natural History has an excellent site that is relatively easy to use
(http://research.amnh.org/informatics) with a hierarchical layout that allows one to navigate the
collections without necessarily having to know Linnaean binomens. The Smithsonian National
Museum of Natural History is another example (http://www.mnh.si.edu/rc/db/colldb.html), although
some sections seem to have organised their data in a more user friendly format than others. The
Royal Botanic Gardens Kew’s ePIC site is another good example (http://www.rbgkew.org.uk/epic/index.htm)
that will soon allow linked online access to images created for the African Plants
Initiative, one of the most exciting digitisation projects in herbaria worldwide (John Parnell, personal
communication). In addition, the Chicago area institutions that cooperated in setting up Plant Base
have done a good job in creating relatively easy access to their herbarium and botanical garden
holdings. If all museums, herbaria, live culture collections and botanic gardens did similar jobs,
the ease of taxonomic research would be increased considerably.
well as the general public). Eventually the scientists surrendered. Now even the hardened profes-
sionals use the word biodiversity without a second thought, but it took years of persistence to get
this acceptance. The biodiversity crisis arises because world events threaten to destroy much of the
evidence of evolution of biological diversity before it can be discovered and described. The
biodiversity crisis is real. Nevertheless, whilst the current crisis is largely attributed to man-made
changes, biodiversity crises have occurred repeatedly in the history of the planet. This is not offered
as an excuse to do nothing, but it does offer an opportunity to examine why the time imperative
is so critical now.
How systematists treat each other is another great problem. All too often, these interactions
entail rancour. Good, old fashioned argument and debate in science is healthy; it makes for progress.
Some sciences are known for it. Particle physics is a case in point, where my physicist friends tell
me that the arguments at meetings and conferences can be intense, but when everyone retires and
drinks some beer, all is harmony. Systematists do this too. The difference is that when particle
physicists return home and receive colleagues’ grant proposals to review, they generally see the
benefit for the field as a whole and provide a good evaluation. All too often when systematists get
proposals to review, they remember the arguments at the meetings rather than the beer that was
drunk afterwards, see not the good of the field but rather take the opportunity to even scores, and
provide bad or cool reviews to colleagues’ proposals. Admittedly, not everyone does this, but enough
do, and I am reliably informed that it remains a problem for maintaining a viable level of funding
for the field of systematics as a whole around the world. Meanwhile, important aspects of the
biodiversity crisis go unaddressed.
Aside from reacting more kindly to each other, there are many things that taxonomists/systematists
can do in their own research that would relieve the workload. Mark Wilkinson and James
Cotton (Chapter 5) demonstrate that there are tricks we can employ to deal with terascale data
while the software and hardware available to systematics are being upgraded. This field, however,
has sustained much argument of late, both in journals and in scientific meetings, as to whether it
is best to use either supertrees or supermatrices (total evidence). This is again an example of one
of those pointless and suicidal tendencies in systematics. Neither method is perfect; each has both
strong and weak points; each is a way to organise and process very large chunks of information.
So, why argue and why not do both?
Systematists have to learn to talk with each other, and listen. Systematists are sometimes like
people on islands, isolated by deep and dangerous waters from the other islands (other systematists).
What taxonomy/systematics needs to develop is a ‘polder model’ for our science. The Dutch replaced
many of their islands with polders. Early in their history, the Dutch discovered they had to sit together
and come to a consensus as to how, and where, to build dams and dikes to sequester areas of the
landscape out of which they could pump the water and unite the high terps into good, productive
polder land. Cooperation in this effort was seen as a benefit to all; so too it should be with systematics.
The purpose of our work is not to pointlessly argue about ‘methodologies’, a term I have come to
loathe, but to discover biodiversity, describe it, and try to understand it! Why is it a do or die proposition
if one uses method A versus method B in one’s work? Use both and get on with it! All too often
there is little sense of balance, or even common sense, in the debates of taxonomists.
many times in the past. Yet, we still read articles every few years bewailing the state of taxonomy/
systematics. We can keep doing this, or we can get organised. What I think we need is an
international coordinating council to push these issues and keep pushing. It should be formed out
of the taxonomically oriented societies and institutions. Maybe this body should be attached to
GBIF, or maybe it should come out of the international commissions; whatever, it needs to be
formed urgently. What we need in effect is a taxonomic czar with dedicated cadres who know
where the buttons are to push on the political and institutional scene and when to push them. The
objective of this commission would be to advance the points to follow.
2.6.3 JOBS
Third, we do need more jobs; permanent positions for taxonomists and systematists the world
over. This has been a repeated theme through the decades. Jobs mean money to hire people. How
this may be achieved is problematic. One avenue meriting further consideration concerns recruit-
ment in universities in North America and Europe. If job adverts are any indication, many of these
positions are calling for floristic and faunistic specialists. While subspeciality is often open, many
are intended for ecologists and behaviourists. It would seem to me, however, that many of these
positions should go to morphologists and taxonomists with some proactive persuasion being exerted
from colleagues and scientific societies when the positions become available.
2.6.7 INFORMATICS
Seventh, the adoption of technology needs to be encouraged and supported even more than it is
now. We need to move quickly from the model project stage to the actuated system stage. This could
ACKNOWLEDGEMENTS
Dr. Ronald Sluys, University of Amsterdam, offered some useful comments on the manuscript. He
and Dr. Cees Hof, University of Amsterdam, and Prof. Koen Martens, now of the Royal Belgian
Institute, Brussels, have offered debate and discussion about many of these issues in the past, from
which I have drawn ideas and inspiration.
REFERENCES
1. Wheeler, Q.D., Taxonomic triage and the poverty of phylogeny, Phil. Trans. R. Soc. Lond. B, 359,
571, 2004.
2. Wheeler, Q.D., Raven, P.H., and Wilson, E.O., Taxonomy: impediment or expedient? Science, 303,
285, 2004.
3. Carvalho, M.R. et al., Revisiting the taxonomic impediment, Science, 307, 353, 2005.
4. Manning, R.B., The importance of taxonomy and museums in the 1990s, Memoirs of the Queensland
Museum, 31, 205, 1991.
5. Feldmann, R.M. and Manning, R.B., Crisis in systematic biology in the ‘age of biodiversity’, J. Paleontol.,
66, 157, 1992.
6. Anonymous, Conference on the Importance of Systematics in Biology, April 22 1953, National
Academy of Sciences, National Research Council, Washington, DC, 1953.
7. Schmitt, W.L., The study of scientific material in the museum, The Museum News, 8, 8, 1930.
8. Parnell, J., Plant taxonomic research, with special reference to the tropics: problems and potential
solutions, Conserv. Biol., 7, 809, 1993.
9. Kaiser, J., Philadelphia institution forced to cut curators, Science, 307, 28, 2005.
10. Almeda, F. et al., Miconia, 1531 species names, 1061 readily distinguishable entities, in Systematics,
Fourth Biennial Conference of the Systematics Association, Program and Abstracts, Trinity College,
Dublin, 2003, 15.
11. Schram, F.R. and Müller, H.G., Catalog and Bibliography of the Fossil and Recent Stomatopoda,
Backhuys Publ., Leiden, 2004.
12. Godfray, H.C.J., Challenges for taxonomy, Nature, 417, 17, 2002.
13. West, R.M., Endangered and orphaned natural history and anthropology collections in the United
States and Canada, Collection Forum, 4, 65, 1988.
14. Wilson, E.O., The biological diversity crisis, Bioscience, 35, 700, 1985.
15. Schram, F.R., The truly new systematics: megascience in the information age, Hydrobiologia, 519,
1, 2004.
CONTENTS
3.1 Introduction
3.2 The Scale of the Problem
3.3 Shortcuts in Systematics: DNA Taxonomy
3.4 The Identification Problem
  3.4.1 DNA Barcoding
  3.4.2 Practical Problems of DNA-Based Methods
3.5 Instability of Linnaean Names
3.6 Taxonomic Bias
3.7 The Taxonomic Impediment
3.8 Inadequacy of Taxonomic Data and Standards in Existing Databases
3.9 Conclusions
Acknowledgements
References
ABSTRACT
Assembling the tree of life is ‘big science’, and this chapter discusses the magnitude of the task.
It also discusses in greater depth DNA taxonomy and DNA barcoding, two recent shortcuts that
have been proposed to achieve the goal of assembling the tree of life. Whilst DNA taxonomy is
largely a futile exercise, DNA barcodes, short standardised portions of the genome, may become
helpful tools in species identification, especially in species rich taxa. However, it is less likely that
barcodes will significantly speed up the discovery of new species.
3.1 INTRODUCTION
Phylogenetic information is central to biology1 and has proven useful in many fields, such as
choosing experimental systems for biological research, tracking the origin and spread of emerging
diseases and their vectors, bioprospecting for pharmaceutical and agrochemical products, preserving
germplasm, targeting biological control of invasive species and evaluating risk factors for species
conservation and ecosystem restoration2.
Acknowledging that many branches in the tree of life remain unanalysed and unresolved, particularly
the species rich groups, and accepting that we have only limited information about most species on
Earth, have been significant factors behind the USA National Science Foundation’s recent initiative to
launch a major programme aimed at assembling the tree of life (http://www.nsf.gov/bio/progdes/bioatol.htm).
Assembling the tree of life is ‘big science’, and its planetary scope makes it mandatory that all
countries realise their responsibility for adding to this endeavour.
TABLE 3.1
The Number of New Species Described per Year, the Total Number
of Accepted/Described Species, and the Estimated Total Number
of Species in Selected Groups
New Accepted/Described4,6 Estimated Total
Note: In some sources, for example Groombridge and Jenkins4 and Scotland and Wortley6, the
terms ‘accepted species’ and ‘described species’ are used interchangeably. The total number of new
species of Mandibulata described covers insects only. Estimates for diatom species numbers range
from 6,000 to 1,000,000 according to Guiry in Parnell10.
Source: Data from Groombridge and Jenkins4; Hall and Hawksworth7; Hammond8; Hawksworth9.
[Figure 3.1: line plot; vertical axis, cumulative number of publications (0 to 2,500); horizontal axis, year (1975 to 2005).]
FIGURE 3.1 Cumulative number of publications in WebSpirs (version 5.02) since 1980 that cite the term
taxonomic revision in the title or abstract (r² = 0.99, P = 1.0).
All these estimates ignore the fact that one has to find the new species first13,14 and that there
is a paucity of field naturalists. If the accumulation of type specimens at the Royal Botanic Gardens,
Kew, UK, and the US National Herbarium is an indication of taxonomic activities, the description
of new plant species has decreased dramatically since 1909 (ref. 15). Even revolutions in taxonomic
methods5 will not solve this problem13. For this reason alone, it seems overly optimistic to assume
that the task of describing all species will be done in only 25 years as the All Species Foundation
(http://www.all-species.org) hopes to do16. It is pertinent to reiterate Raven’s recent statement:
“Finally, nothing will substitute for the activities of the field naturalist. No matter how much we
speak about instant identification through DNA analysis, hand-held keys or other modern
approaches, unless there are very many people who can recognise organisms, find them, go into
the field and find them again, whether they be in the tropical moist forests of the Congo or the
chalk grasslands on the South Downs of England, nothing will work”17.
In marked contrast to the nearly exponential increase in number of papers that deal with
molecular phylogenies1,18, the number of published taxonomic revisions has been remarkably
constant over the last 20 years and has steadily been in the region of 90 per year (Figure 3.1).
Acknowledging the bias of WebSpirs, this figure is verging on the insignificant and is unlikely to
increase in the near to middle future.
It has been suggested that somewhere between 50,000 and 80,000 species19,20 are currently
placed in a phylogenetic analysis. Accordingly, the task of gathering the overwhelming number of
basic building blocks to assemble the tree of life will require an effort that will be orders of
magnitude larger than the work that went into sequencing the human genome. To speed up the
process of describing unknown biodiversity and to make it possible to assemble the tree of life
within reasonable time, it is tempting to develop shortcuts to reach the goals. Two such recent
proposals are DNA taxonomy21,22 and supertrees23–25. Supertree methods construct new trees from the
topologies of existing phylogenetic trees. In this way they replace the collection
of new data with analyses of trees derived from different data sources (occasionally analysed with
incompatible methods) and at least partially different taxon sampling. Supertrees are not dealt with
further here, as their advantages and disadvantages are discussed elsewhere (Hodkinson and Parnell,
Chapter 1; Wilkinson and Cotton, Chapter 5; Bininda-Emonds and Stamatakis, Chapter 6).
as a first approximation, an identification tag, for the species from which the DNA sample was
extracted. The sequence is compared against existing sequences and made available to the scientific
community through appropriate databases together with other types of information, ideally includ-
ing its taxonomic status. The sequence is a standard for future reference and should ideally be
linked with the type specimen and the DNA preparation. As a prerequisite, existing Linnaean names
should be matched with appropriate DNA sequences. However, many or most existing type spec-
imens are not useful for this purpose. In such instances, DNA preparations should be based mainly
on sequences from newly collected individuals, preferably from the type locality, and identified by
experienced taxonomists; it is even suggested that these should have the nomenclatural status of
neotypes and hence should replace the existing types22!
Even if theoretically possible, DNA taxonomy seems to pay no attention to the fact that many
species are known only from the type collection. Although the ‘neotypifications’ proposed by Tautz
et al.22 seem straightforward, it is not necessarily a simple matter given, for example, that 50–70%
of all arthropod species only turn up as one or two specimens in most surveys, approximately 40%
of all beetles are only known from one locality5, and roughly half of the 38,000 known spiders
were originally described on the basis of a single specimen26. Generally, it will be extremely difficult
and labour intensive to find neotypes in all the megadiverse groups. Perhaps even worse, given the
postulated lack of taxonomic experience (and the complete lack of expertise in many fields) one
may wonder who the experienced taxonomists are that should undertake the task of verifying
existing species identifications and identify the new samples in the endless number of instances of
‘neotypification’? Although it is unlikely to happen, consider the havoc that would ensue when
mixing the types of Homo sapiens and Pan troglodytes. There are innumerable examples far less
likely to be spotted.
Whilst in principle acknowledging the importance of morphological information, it emerges that
in DNA taxonomy sequences will have preference over all other types of data, even if it involves total
destruction of the specimen (or type) and replacement by a photograph. Tautz et al.22 suggest that the
routine identification of specimens collected during ecological studies should be done by high through-
put DNA sequencing facilities. Such facilities “could routinely handle c. 1,000 samples per day” at
a cost of €5 per sample (a calculation that of course disregards all other expenses, such as equipment,
technical and scientific staff).
According to its proponents, DNA taxonomy will solve a number of pertinent problems, each
of which is also relevant to DNA barcoding, and these are discussed in the following sections:
DNA barcoding is a technique for characterizing species of organisms using a short DNA sequence
from a standard and agreed-upon position in the genome. DNA barcodes are therefore useful to taxon-
omists who are trying to discover, distinguish and describe new species, and to anyone who is trying
to assign an unidentified specimen to a known species. DNA barcodes can be a powerful addition to the
traditional methods we use to discover new species and identify specimens. They can be used by people
who are not experts on a particular group of organisms, and can be obtained from specimens that are hard
or impossible to identify with traditional methods (like damaged, incomplete, or immature specimens).
(Consortium for the Barcode of Life, CBOL, http://barcoding.si.edu/index_detail.htm).
In striking contrast, DNA taxonomy clearly goes beyond species identification and puts prime
importance on the sequence22. The widespread acceptance of DNA barcoding, even within the
usually very conservative taxonomic community, and the willingness of larger institutions to use
their resources for barcoding is evident from the success of the newly founded CBOL, which was
created as an international initiative in 2004, and in December 2005 the consortium counted 93
member organisations including many botanic gardens, herbaria, natural history museums and zoos.
The major advantages and applications of barcoding may well lie outside the taxonomic commu-
nity, in applied taxonomy34 where it will enable nonspecialists in governmental and intergovernmental
agencies, NGOs and other users of taxonomic information, to produce fast and reliable species
identifications. Within the taxonomic realm its strength lies in its ability to provide researchers with
a new way of identifying potentially new species38–41. Even for the experienced taxonomist, species
identification from material at all life history stages (or from fragments of specimens) may be difficult
if not impossible. In such cases barcoding offers tremendous help40.
Obviously, finding a new barcode sequence does not mean a new species has been
discovered30,31,42; the new sequence may just add to the variation of an already described species, but
it directs attention to the potentially undescribed. It is not the intention, and certainly not necessary
or desirable, to use mtDNA divergence (or divergence of whatever sequence is being used) as “a
primary criterion for recognizing species boundaries”35 (see also Sites and Marshall43). Similarly,
different species may have identical barcode sequences, depending on the choice of sequences used.
For barcoding to work, a number of problems must be solved or minimised to the largest
possible extent. Ideally, a simple, short DNA sequence should be able to identify all known and
unknown species. Thus, a sequence of only 15 base pairs (bps) is in theory able to distinguish four
to the power of 15 (more than a billion) species; a figure that by far exceeds even the most unrealistic
estimates of the Earth’s biodiversity. However, it is evident that no such single, universally applicable
sequence exists. In the animal kingdom, there has been a strong focus and increased consensus on
using a small portion (c. 650 bps) of the mitochondrial COI (cytochrome c oxidase subunit I)
gene38,39, although small subunit ribosomal sequences (SSU) are also in use44,45.
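The arithmetic behind the 15-base-pair figure can be checked with a minimal sketch. It assumes only a four-letter alphabet; the function names and the figure of 100 million species are hypothetical and chosen purely for illustration. The result is a theoretical upper bound, since it presupposes that every possible sequence is used and that each species carries exactly one.

```python
BASES = 4  # A, C, G and T


def distinct_sequences(length_bp: int) -> int:
    """Number of distinct DNA sequences of the given length."""
    return BASES ** length_bp


def min_length_for(species: int) -> int:
    """Shortest sequence length whose sequence space covers `species`."""
    length = 0
    while distinct_sequences(length) < species:
        length += 1
    return length


if __name__ == "__main__":
    print(distinct_sequences(15))        # 1073741824, i.e. more than a billion
    # Hypothetical upper estimate of global species richness (assumption).
    print(min_length_for(100_000_000))   # 14
```

In practice substitutions are neither independent nor evenly distributed across sites, which is one reason why real barcodes need to be far longer than this bound suggests.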
Among higher plants, mitochondrial sequences are unlikely to be useful as barcodes simply
because they are far too invariable. Several attempts to find other suitable regions including
nonprotein coding plastid regions are under way36,46. Finding suitable DNA regions depends on
finding short sequences with sufficient variability to discern even closely related species, but at the
same time intraspecific variation should be minimal (if possible nonexistent)35. In some animal
groups COI meets these criteria, but in others it does not, and it may be necessary to look for
different sequences in different taxonomic groups. In the higher plant community an approach that
rests either on simultaneous use of different barcode sequences, or on a hierarchical system of
identification, seems far more acceptable than among zoologists and is perhaps the only way
forward. It appears likely that the widely used plastid gene rbcL (the large subunit of ribulose-1,5-
bisphosphate carboxylase/oxygenase) will in the majority of cases be able to identify plant species
to genus or family level, but other sequences will be required for identifications at the species level,
most likely even different sequences in different taxa. Obviously, in species identification there will
be instances where barcode sequences behave just as poorly or worse than morphology. Recently
evolved species may lack clear sequence divergence, as has occurred in many island environments47.
Equally, organelle sequences (which are in most instances inherited from single parents) are, despite
their practical advantages, unable to allow the identification of hybrids, instances of introgression
and recent polyploidisation, all of which may blur species boundaries.
For most taxonomists DNA barcodes may never become anything more than a new gizmo in
the toolbox. For researchers in other fields of biology they may become another identification tool
akin to a good illustrated field guide. Obviously, in instances when organisms have no or limited
morphological variation or when they are unable to be cultured, such as most prokaryotes and
nematodes, the only option may be to collect DNA sequence data and invent a classification48,49,
which reflects sequence similarity only. This is of course a caricature of a classification and has
resulted from a simple need for recognition/identification. Needless to say, microbiologists and
other scientists working with such groups would prefer to know considerably more about their
organisms than this, and they know that they can do far better if the organisms can be cultured28.
To use this parody of a classification as an argument for revolutionising taxonomy seems bizarre50.
However, once made, DNA barcodes may obviously be used for purposes other than identification.
Even though the length and variation level of most barcode sequences make them largely inappropriate
for phylogenetic analyses, they do include information that makes them amenable to rough and dirty
estimates of phylogeny22, and they are a resource that may be combined with other data.
It is obvious that DNA barcodes may be viewed as a first step on the slippery slope to reach
the goals of DNA taxonomy34, but this is a route that we wholeheartedly advise against28,29, and
CBOL has wisely avoided it. Implementation of DNA taxonomy and the widespread, naïve attitudes
towards classification as articulated by, for example, Felsenstein in his recent textbook51 in which
he takes a stance on ‘the irrelevance of classification’, will only serve to take us further away from
assembling the tree of life, and represent an unwarranted arrogance and ignorance towards one of
the central issues in biology.
If we imagine that all the described species are valid (that is, no synonymy) and are immediately
available for sequencing, it would require approximately seven years (given 225 work days, of
eight hours per day, per year) to produce a single sequence for each, given the availability of a
single high-throughput DNA sequencing facility, that could routinely handle c. 1,000 samples per
day, as estimated by Tautz et al.22. However, just producing the sequences is insufficient, as they
also have to be checked and read. Even if we assume absolutely perfect sequences of c. 1,000 bps
and a minimal handling time of five minutes per sequence, it would take approximately 10.5 years
to handle just one year’s worth of the generated sequences. This amounts to more than 80 years for
all known species and around 640 years for the estimated total of species, at a cost of approximately
€8 and €70 million, respectively (neglecting all other expenses such as equipment and salaries for
technical and scientific staff).
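The estimates above follow from a few multiplications, sketched below. The throughput, working year and handling time are taken from the text; the cost of 5 per sample is assumed to be in euros, and the species count of 1.6 million is a hypothetical value chosen only because it reproduces a total of about €8 million. All function and constant names are illustrative.

```python
SAMPLES_PER_DAY = 1_000      # throughput quoted from Tautz et al.
WORK_DAYS_PER_YEAR = 225
HOURS_PER_DAY = 8
HANDLING_MINUTES = 5         # minimal checking time per sequence
COST_PER_SAMPLE_EUR = 5      # assumption: the per-sample cost is in euros


def sequencing_years(n_species: int) -> float:
    """Years for one facility to produce one sequence per species."""
    return n_species / (SAMPLES_PER_DAY * WORK_DAYS_PER_YEAR)


def handling_years(n_sequences: int) -> float:
    """Person-years needed to check and read the given number of sequences."""
    hours = n_sequences * HANDLING_MINUTES / 60
    return hours / (HOURS_PER_DAY * WORK_DAYS_PER_YEAR)


def sequencing_cost(n_species: int) -> int:
    """Consumables cost only; equipment and salaries are ignored."""
    return n_species * COST_PER_SAMPLE_EUR


if __name__ == "__main__":
    one_year_output = SAMPLES_PER_DAY * WORK_DAYS_PER_YEAR   # 225,000 sequences
    print(round(handling_years(one_year_output), 1))         # 10.4

    described = 1_600_000    # hypothetical count of described species
    print(round(sequencing_years(described), 1))             # 7.1
    print(sequencing_cost(described))                        # 8000000
```

The totals scale linearly with the species count assumed, so they need not match the rounded figures in the text exactly.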
Evidently, a much higher level of automation is possible at all stages in the process than is
standard, but sequencing such an enormous diversity of species is not a straightforward process, and
DNA extraction (which has not been included in the calculations) is far from trivial. In GenBank,
the overall acquisition rate of sequence(s) from new species has been constant in the period
1995–2003 at 2,088 (r² = 0.99, P = 0.95) new species per year. This also applies to green plants
(Viridiplantae = Chlorobiota) where there has been a constant acquisition rate of 764 (r² = 0.95, P =
0.77) new species per year. In comparison, the accumulation of sequences from the very widely used
rbcL gene has also been constant, but at only a quarter of the total species acquisition rate. However, given
the widespread commitment to barcoding, there is every reason to believe that barcodes will be
obtained at a significantly increased rate.
Existing collections are of course a potentially valuable source of material both for DNA
taxonomy and barcoding, and as stated by Tautz et al.22 and Blaxter44 it may be possible to nonde-
structively sample large animals, insects, most plants and fungi in existing collections. However, the
present, often well justified, reluctance of many curators to accept destructive sampling of many
collections (for example, in the case of small insects or when only the type or very few collections
are known) makes it difficult to believe that such practice will ensue more widely in groups that are
less suitable for DNA extraction. It is an additional complication that a very large fraction of the
existing types are in excess of 100 years old, which would significantly increase the error rates
during PCR and complicate matters when assembling the small fractions of sequence that may be
amplified from low-quality DNA.
Additionally, a large number of specimens are preserved under conditions that make successful
large-scale recovery of DNA unrealistic, such as animals pickled in formalin and plants dried with
alcohol. All specimens have interesting features that go beyond their contents of nucleic acids, and
it would be ill advised to replace them, even if DNA is successfully extracted, with a DNA sample or a photograph.
If the quality of the extraction and the sequence(s) is far from perfect, all we might
be left with is one or more lousy sequences and a photograph. Every practicing taxonomist is aware
of the limited value of types that exist only as descriptions or drawings. For example, although the
actual taxonomic status of the new shrike species, Laniarius liberatus Smith et al.52 was based on
extensive studies of morphology and behaviour, only a minimalist holotype (photographs, moulted
feathers and minute samples of blood) was used to designate the type. This created considerable
furore among ornithologists53–55, and the decision not to preserve a complete specimen seems ill
founded. However, it is important to stress that neither the zoological nor the botanical code
precludes the inclusion of DNA data in the diagnosis or description of species, or in the designation
of tissue samples as types.
One of the alleged advantages of DNA taxonomy and DNA barcoding is that sequence infor-
mation is digital and not influenced by subjective assessment. This is unquestionably true from the
very moment the sequence is entered into the computer, but a series of decisions leading up to this
particular moment are certainly not. Errors occur due to misincorporation of bases, misreading of
chromatograms, mistyping of results and miscommunication of the sequences to the database. Some
of these errors can be quantified, like the misincorporation of bases by the polymerase, but others
are difficult to estimate56. However, the problems are all largely technical, and one way around
them is to make it possible to quantify error rates by including trace files with the sequences (as
implemented in BOLD). These trace files will in turn make it possible to calculate a probability
score for each base call, and archive them with each trace file for every barcode sequence.
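The text does not say how the probability score is defined. A convention widely used in sequencing, and assumed here, is the Phred quality score, Q = -10 log10(P), where P is the estimated probability that a base call is wrong. The sketch below converts between the two and sums the per-base error probabilities over a read; the function names and the uniform quality values are hypothetical.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality score implied by an error probability."""
    return -10 * math.log10(p)


def expected_errors(qualities: list[float]) -> float:
    """Expected number of miscalled bases in a read."""
    return sum(phred_to_error_prob(q) for q in qualities)


if __name__ == "__main__":
    # Hypothetical 650-bp barcode with every base called at Q30 (assumption).
    read = [30] * 650
    print(round(expected_errors(read), 2))   # 0.65
```

Archiving such scores with each sequence would let a later user decide whether a single-base difference between two barcodes is more likely to be biological or technical.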
Additional problems may be caused by the analysis of the sequences themselves such as alignment
and distinguishing orthologous from paralogous sequences. Quantifying similarity between a new
barcode sequence and existing sequences from other specimens may not be a trivial matter, and the
standard practice involves a series of difficult steps, namely sequence alignment, calculation of
pairwise similarities or dissimilarities, and clustering. These problems are theoretical and not easily
solved. A wide range of different alignment algorithms, (dis)similarity measures and clustering
methods are in existence, and phenetics is surely not the only available or potential methodology.
Although the actual species limits are open to interpretation, the 10 groupings, provisionally recog-
nised as cryptic species recovered in the Skipper Butterfly, Astraptes fulgerator, complex using
Neighbour Joining by Hebert et al.40 based on COI sequences, are robust when subjected to a
parsimony approach (personal observation). However, the interspecific sequence variation in some of
these putative cryptic species is comparable to the intraspecific variation, and some of the groupings
gain their credibility from other types of data, such as correlation with the colour patterns of the
caterpillars and their food plants.
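The steps named above (alignment, pairwise dissimilarity, clustering) can be illustrated with a deliberately simple sketch. It is not the neighbour-joining analysis cited above: it assumes sequences that are already aligned, uses the uncorrected proportion of differing sites (p-distance) and groups sequences by single linkage under an arbitrary threshold. The sequences, names and threshold are all hypothetical, and each of these choices is exactly the kind of decision the text describes as open to dispute.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites with a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def cluster(seqs: dict[str, str], threshold: float) -> list[list[str]]:
    """Single-linkage groups: join any pair at or below the threshold."""
    parent = {name: name for name in seqs}

    def find(name: str) -> str:
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Hypothetical 20-bp aligned fragments, not real barcode data.
    toy = {
        "a1": "ACGTACGTACGTACGTACGT",
        "a2": "ACGTACGTACGTACGTACGA",
        "b1": "TTGTACGAACGTTCGTACCT",
        "b2": "TTGTACGAACGTTCGTACCA",
    }
    print(cluster(toy, threshold=0.10))   # [['a1', 'a2'], ['b1', 'b2']]
```

Raising the threshold to 0.25 merges all four sequences into a single group, which shows how strongly the outcome depends on a parameter that has no biological justification of its own.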
Our ability to identify species using barcodes may also be hampered by paralogous sequences,
which may often have a higher level of sequence divergence than the orthologous sequences;
contrary to widely held beliefs, orthology is not an emergent property of a sequence, but a testable
hypothesis. Putative paralogous sequences, nuclear mitochondrial pseudogenes (NUMTS), are of
widespread occurrence in eukaryotes57 and are also known from COI58.
towards lesser known groups, but we must not neglect how important groups like birds, butterflies
and whales are to ecology and the general public. To a large extent these are the kind of organisms
that shape public opinion on threats to biodiversity. However, it is difficult to see how DNA technology
can divert more scientific attention to orphan groups68. Would it be any more interesting to study mite
sequences than mite morphology? It is difficult to persuade anyone that we will know more about
mites because we have a sequence and a photograph of one, than if we know something about its
morphology and life history.
confusion in the phylogeny of Cyperaceae and Juncaceae, are erroneous, at least one of them being
chimeric. The quality of taxonomy and sequences in GenBank relies solely on the quality and
thoroughness of the researchers. Neither problem will be solved by DNA taxonomy or barcoding.
To what extent initiatives like GenBank should enforce standards that require the linking of taxa
with sequences is a contentious issue. In the majority of cases it suffices to require that traditional
voucher information is available, either directly or indirectly through the journal in which the studies
are published. This is of course the responsibility of the editors of refereed journals, who should never
allow publications of sequence-based studies without simultaneous submission of the sequences to a
public database and should never allow publication of such data unless the necessary voucher infor-
mation is available. The cornerstone of scientific inquiry is repeatability79. It can hardly be stressed
enough that specimens used in scientific investigations should be catalogued and vouchered in publicly
accessible sites such as culture collections, herbaria and museums, ensuring that species identification
can be checked79–82. To exclude voucher information from the printed issues of journals and publish
it on web pages or other ephemeral media is a highly detrimental and unfortunate practice that some
journals like the American Journal of Botany have started to implement82.
In connection with DNA barcoding, very strict rules for voucher information have to be
followed, requiring that the barcode sequences must be linked to voucher specimens. At this stage,
the NCBI has accepted that barcode sequences submitted by members of CBOL should carry the
keyword ‘barcode’, and discussions of further improvements are ongoing.
3.9 CONCLUSIONS
DNA data certainly have a major role to play in taxonomy and for documenting and understanding
species rich groups. However, if we know nothing else about the organisms than a tiny part of their
DNA, there are few interesting observations to make, apart from sequence similarities. For millennia
humans have been fascinated and puzzled by morphological diversity; this is a large part of what
needs to be explained, but of course not the only part. Hence, the potential of DNA taxonomy to
relegate taxonomy to a high tech service industry centred around a few DNA sequences will turn
taxonomy away from being an intellectually stimulating hypothesis-driven science into a purely
technical, metaphysical discipline2,28, or at best, into a cataloguing device for other biologists.
It is often erroneously supposed that taxonomy is a descriptive science, but as emphasised by
Wheeler15, taxonomy is hypothesis driven. “The conclusion that the distribution of a homologous
attribute qualifies it as a character of a species or a synapomorphy of a higher taxon is a hypothesis.
A species is a hypothesis. Every clade at every Linnaean rank is a hypothesis . . . [a] species name is
an effective shorthand notation for an explicit hypothesis about the distribution of attributes among
populations of organisms”.
In contrast, it is the goal of DNA taxonomy to reinvent taxonomy and give the sequence prime
importance over all other types of data. Wisely, BOLD has strongly emphasised that barcodes are
not substitutes for, or an attempt to supplant, existing taxonomic practice
(http://phe.rockefeller.edu/BarcodeConference). However, these identifications will never be better than the
available taxonomy. Hence, DNA barcodes may be used to create or test hypotheses about species, as
in the recent investigation on the neotropical skipper butterfly (Astraptes fulgerator)40, but barcodes
are not the arbiter of species status30,31,35,42. Although the study of the Astraptes fulgerator complex
is viewed as a scholarly implementation of the barcoding technique, it is based on more than 25 years
of experiments and has involved the rearing of more than 2,500 caterpillars caught in the wild, and
is supplemented by meticulous studies of the morphology of both caterpillars and imagos.
It is difficult to disagree with Wheeler et al.83 that molecular data, abundant and inexpensive
as they are, have revolutionised phylogenetics but not diminished the importance of traditional
work. The need for this research has largely been masked because molecular researchers have been
able to draw on centuries of banked morphological knowledge.
Likewise the views of May5 are pertinent. “The task of inventorying is sometimes mistaken for
‘stamp collecting’ by thoughtless colleagues in the physical sciences [and, sadly, one might add,
amongst ecologists and microbiologists]. But such information is a prerequisite to the proper formu-
lation of evolutionary and ecological questions, and essential for rational assignment of priorities in
conservation biology. Lacking basic knowledge about the underlying taxonomic facts, we are impeded
in our efforts to understand the structure and dynamics of food webs, patterns in the relative abundance
of species, or, ultimately, the causes and consequences of biological diversity”.
Although we fundamentally agree with the dire need for trained taxonomists and with the
controversial but simple fact that a barcode sequence in itself is of limited value14,84,85, we do not
envisage DNA barcoding as a replacement of classical taxonomy but recognise it as a means to
revitalise it. DNA barcodes are data, and “the future for systematics and biodiversity research is
integrative taxonomy, which uses a large number of characters including DNA and many other
types of data, to delimit, discover, and identify meaningful, natural species and taxa at all levels”34.
However, DNA barcoding at least has the potential to raise public awareness of an increased need
for taxonomy expertise, and hence become a major benefit to the taxonomic community86,87.
ACKNOWLEDGEMENTS
This manuscript has benefited from comments from Chris Humphries (Natural History Museum,
London), Nikolaj Scharff (Natural History Museum of Denmark), Dennis W. Stevenson (New York
Botanical Garden) and Dave Williams (Natural History Museum, London).
REFERENCES
1. Pagel, M., Inferring the historical patterns of biological evolution, Nature, 401, 877, 1999.
2. Cracraft, J.M. et al., Eds., Assembling the Tree of Life, American Museum of Natural History, New York.
3. Hammond, P.M., The current magnitude of biodiversity, in Global Biodiversity Assessment, Heywood,
V.H., Ed., Cambridge University Press, Cambridge, UK, 1995, 113.
4. Groombridge, B. and Jenkins, M.D., World Atlas of Biodiversity: Earth’s Living Resources in the 21st
Century, University of California Press, Berkeley, Los Angeles, 2002.
5. May, R.M., The dimensions of life on Earth, in Nature and Human Society: The Quest for a Sustainable
World, Raven, P.H. and Williams, T., Eds., The National Academy of Sciences, Washington, DC,
1999, 30.
6. Scotland, R.W. and Wortley, A.H., How many species of seed plants are there? Taxon, 52, 101, 2003.
7. Hall, G.S. and Hawksworth, D.L., Resources for microbial biosystematics in Europe, in Systematics
Agenda 2000: The Challenge for Europe, Blackmore, S. and Cutler, D., Eds., Samara Press for the
Linnean Society of London, London, 1996, 5.
8. Hammond, P.M., Species inventory, in Global Biodiversity: Status of the Earth’s Living Resources,
Groombridge, B., Ed., Chapman and Hall, London, 1992, 17.
9. Hawksworth, D.L., Orphans in ‘botanical’ diversity, Muelleria, 10, 111, 1991.
10. Parnell, J., European systematics and the European flora, in Systematics Agenda 2000: The Challenge
for Europe, Blackmore, S. and Cutler, D., Eds., Samara Press for the Linnean Society of London,
London, 1996, 31.
11. Wheeler, Q.D., Systematics, the scientific basis for inventories of biodiversity, Biodivers. Conserv.,
4, 476, 1995.
12. Stork, N.E., Measuring global biodiversity and its decline, in Biodiversity II: Understanding and
Protecting Our Biological Resources, Reaka-Kudla, M.L., Wilson, D.E. and Wilson, E.O., Eds., Joseph
Henry Press, Washington, DC, 1997, 41.
13. May, R.M., Tomorrow’s taxonomy: collecting new species in the field will remain the rate-limiting
step, Phil. Trans. R. Soc. Lond. B, 359, 733, 2004.
14. Scotland, R. et al., The Big Machine and the much-maligned taxonomist, Syst. Biodiv., 1, 139, 2003.
15. Wheeler, Q.D., Taxonomic triage and the poverty of phylogeny, Phil. Trans. R. Soc. Lond. B., 359, 571, 2004.
16. Wilson, E.O., The encyclopedia of life, Trends Ecol. Evol., 18, 77, 2003.
17. Raven, P.H., Taxonomy: where are we now? Phil. Trans. R. Soc. Lond. B, 359, 720, 2004.
18. Hillis, D.M., The tree of life and the grand synthesis of biology, in Assembling the Tree of Life,
Cracraft, J. and Donoghue, M.J., Eds., Oxford University Press, Oxford, 2004, 545.
19. Cracraft, J., The seven great questions of systematic biology: an essential foundation for conservation
and sustainable use of biodiversity, Ann. Missouri Bot. Gard., 89, 127, 2002.
20. Pennisi, E., Modernizing the tree of life, Science, 300, 1692, 2003.
21. Tautz, D. et al., DNA points the way ahead in taxonomy, Nature, 418, 479, 2002.
22. Tautz, D. et al., A plea for DNA taxonomy, Trends Ecol. Evol., 18, 70, 2003.
23. Bininda-Emonds, O.R.P., Ed., Phylogenetic Supertrees: Combining Information to Reveal the Tree of
Life, Kluwer Academic Publishers, Dordrecht, 2004.
24. Bininda-Emonds, O.R.P., Gittleman, J.L. and Steel, M.A., The (super)tree of life: procedures, prob-
lems, and prospects, Annu. Rev. Ecol. Syst., 33, 265, 2002.
25. Sanderson, M.J., Purvis, A. and Henze, C., Phylogenetic supertrees: assembling the tree of life, Trends
Ecol. Evol., 13, 105, 1998.
26. Coddington, J.A. and Levi, H.W., Systematics and evolution of spiders (Araneae), Annu. Rev. Ecol.
Syst., 22, 565, 1991.
27. Andersen, N.M., Publishing in systematic entomology: present and future, Insects Syst. Evol., 34, 1, 2003.
28. Lipscomb, D., Platnick, N. and Wheeler, Q.D., The intellectual content of taxonomy: a comment on
DNA taxonomy, Trends Ecol. Evol., 18, 65, 2003.
29. Seberg, O. et al., Shortcuts in systematics? A commentary on DNA-based taxonomy, Trends Ecol.
Evol., 18, 63, 2003.
30. Funk, D.J. and Omland, K.E., Species-level paraphyly and polyphyly: frequency, causes, and
consequences, with insights from animal mitochondrial DNA, Annu. Rev. Ecol. Syst., 34, 397, 2003.
31. Sperling, F., DNA barcoding: deus ex machina, Newsl. Biol. Surv. Canada (Terrestrial Arthropods),
22, 50, 2003.
32. Will, K.W. and Rubinoff, D., Myth of the molecule: DNA barcodes for species cannot replace
morphology for identification and classification, Cladistics, 20, 47, 2004.
33. Wheeler, Q.D., Losing the plot: DNA ‘barcode’ and taxonomy, Cladistics, 21, 405, 2005.
34. Will, K.W., Mishler, B.D. and Wheeler, Q.D., The perils of DNA barcoding and the need for integrative
taxonomy, Syst. Biol., 54, 844, 2005.
35. Moritz, C. and Cicero, C., DNA barcoding: promise and pitfalls, PLoS Biology, 2, 1529, 2004.
36. Chase, M.W. et al., Land plants and DNA barcodes: short-term and long-term goals, Phil. Trans. R.
Soc. B., 360, 1889, 2005.
37. Savolainen, V. et al., Towards writing the encyclopaedia of life: an introduction to DNA barcoding,
Phil. Trans. R. Soc. B., 360, 1805, 2005.
38. Hebert, P.D.N. et al., Biological identification through DNA barcodes, Proc. R. Soc. Lond. B., 270, 313, 2003.
39. Hebert, P.D.N., Ratnasingham, S. and de Waard, J.R., Barcoding animal life, cytochrome c oxidase
subunit 1 divergences among closely related species, Proc. R. Soc. Lond. B. (Suppl.), 270, 1, 2003.
40. Hebert, P.D.N. et al., Ten species in one: DNA barcoding reveals cryptic species in the neotropical
skipper butterfly Astraptes fulgerator, Proc. Nat. Acad. Sci. USA, 101, 14812, 2004.
41. Stoeckle, M., Taxonomy, DNA, and the barcode of life, BioScience, 23, 2, 2003.
42. Sperling, F., Butterfly molecular systematics: from species definition to higher-level phylogenies, in
Butterflies: Ecology and Evolution Taking Flight, Boggs, C.L., Watt, W.B. and Ehrlich, P.R., Eds.,
The University of Chicago Press, Chicago, 2003, 431.
43. Sites Jr., J.W. and Marshall, J.C., Delimiting species: a renaissance issue in systematic biology, Trends
Ecol. Evol., 18, 462, 2003.
44. Blaxter, M.L., The promise of DNA taxonomy, Phil. Trans. R. Soc. Lond. B., 359, 669, 2004.
45. Blaxter, M.L., Elsworth, B. and Daub, J., DNA taxonomy of a neglected animal phylum: an unexpected
diversity of tardigrades, Proc. R. Soc. Lond. B. (Suppl.), 271, S189, 2003.
46. Kress, W.J. et al., Use of DNA barcodes to identify flowering plants, Proc. Nat. Acad. Sci. USA, 102,
8369, 2005.
47. Givnish, T.J., Adaptive plant evolution on islands: classical patterns, molecular data, new insights, in
Evolution on Islands, Grant, P.R., Ed., Oxford University Press, Oxford, 1998, 281.
48. Blaxter, M. et al., Defining operational taxonomic units using DNA barcode data, Phil. Trans. R. Soc.
Lond. B., 360, 1889, 2005.
49. Markmann, M. and Tautz, D., Reverse taxonomy: an approach towards determining the diversity of
meiobenthic organisms based on ribosomal RNA signature sequences, Phil. Trans. R. Soc. Lond. B.,
360, 1917, 2005.
50. Godfray, H.C.J., Challenges for taxonomy, Nature, 417, 17, 2002.
51. Felsenstein, J., Inferring Phylogenies, Sinauer Associates, Sunderland, Massachusetts, 2004.
52. Smith, E.F.G. et al., A new species of shrike (Laniidae, Laniarius) from Somalia, verified by DNA-
sequence data from the only known individual, Ibis, 133, 227, 1991.
53. Hughes, A.L., Avian species described on the basis of DNA only, Trends Ecol. Evol., 7, 2, 1992.
54. Hughes, A.L., Reply from Austin Hughes (to Peterson, A.T. and Lanyon, S.M., 1992), Trends Ecol.
Evol., 7, 168, 1992.
55. Peterson, A.T. and Lanyon, S.M., New bird species: DNA studies and type specimens, Trends Ecol.
Evol., 7, 167, 1992.
56. Clark, A.G. and Whittam, T.S., Sequencing errors and molecular evolutionary analysis, Mol. Biol.
Evol., 9, 744, 1992.
57. Bensasson, D. et al., Mitochondrial pseudogenes: evolution’s misplaced witnesses, Trends Ecol. Evol.,
16, 314, 2001.
58. Bucklin, A. et al., Taxonomic and systematic assessment of planktonic copepods using mitochondrial
COI sequence variation and competitive, species-specific PCR, Hydrobiologia, 401, 230, 1999.
59. Carpenter, J.M., Critique of pure folly, Bot. Rev., 69, 79, 2003.
60. Keller, R.A., Boyd, R.N. and Wheeler, Q.D., The illogical basis of phylogenetic nomenclature, Bot.
Rev., 69, 93, 2003.
61. Nixon, K.C., Carpenter, J.M. and Stevenson, D.W., The Phylocode is fatally flawed, and the ‘Linnaean’
system can easily be fixed, Bot. Rev., 69, 111, 2003.
62. Flann, C., Phylocode — may the force be with us: an attempt to understand, The Systematist, 24, 9,
2005.
63. Pickett, K.M., The new and improved Phylocode, now with types, ranks, and even polyphyly: a
conference report from the First International Phylogenetic Nomenclature Meeting, Cladistics, 21,
79, 2005.
64. Gaffney, E.S., An introduction to the logic of phylogenetic reconstruction, in Phylogenetic Analysis
and Paleontology, Cracraft, J. and Eldredge, N., Eds., Columbia University Press, New York, 1979, 79.
65. Vane-Wright, R.I., Indifferent philosophy versus almighty authority: on consistency, consensus and
unitary taxonomy, Syst. Biodiv., 1, 3, 2003.
66. Knapp, S., Lughadha, E.N. and Paton, A., Taxonomic inflation, species concepts and global species
lists, Trends Ecol. Evol., 20, 7, 2005.
67. Isaac, N.J.B., Mallet, J. and Mace, G.M., Taxonomic inflation: its influence on macroecology and
conservation, Trends Ecol. Evol., 19, 464, 2004.
68. Hyde, K.D., Who will look after the orphans? Muelleria, 10, 139, 1997.
69. Schram, F.R. and Los, W., Training systematists for the 21st century, in Systematics Agenda 2000:
The Challenge for Europe, Blackmore, S. and Cutler, D., Eds., Samara Press for the Linnean Society
of London, London, 1996, 89.
70. Holst-Jensen, A., Vrålstad, T. and Schumacher, T., On reliability, New Phytol., 161, 11, 2004.
71. Hawksworth, D.L., ‘Misidentifications’ in fungal DNA sequence databanks, New Phytol., 161, 13,
2004.
72. Vilgalys, R., Taxonomic misidentifications in public DNA databases, New Phytol., 160, 4, 2003.
73. Bridge, P.D. et al., On the unreliability of published DNA sequences, New Phytol., 160, 43, 2003.
74. Bridge, P.D., Spooner, B.M. and Roberts, P.J., Reliability and use of published sequence data, New
Phytol., 161, 15, 2004.
75. Harris, D.J., Can you bank on GenBank? Trends Ecol. Evol., 18, 317, 2003.
76. Harris, D.J., Reassessment of comparative genetic distance in reptiles from mitochondrial cytochrome
b genes, Herp. J., 12, 85, 2002.
77. Noor, M.A.F. and Larkin, J.C., A re-evaluation of 12S ribosomal RNA variability in Drosophila
pseudoobscura, Mol. Biol. Evol., 17, 938, 2000.
78. Kristiansen, K.A. et al., DNA taxonomy: the riddle of Oxychloë (Juncaceae), Syst. Bot., 30, 284, 2005.
79. Ruedas, L.A. et al., The importance of being earnest: what, if anything, constitutes a ‘specimen
examined’, Mol. Phyl. Evol., 17, 129, 2000.
80. Agerer, R. et al., Always deposit vouchers, Mycol. Res., 104, 642, 2000.
81. Barkworth, M.E. and Jacobs, S.W.L., Valuable research or short stories: what makes the difference?
Hereditas, 135, 263, 2001.
82. Funk, V.A. et al., The importance of vouchers, Taxon, 54, 127, 2005.
83. Wheeler, Q.D., Raven, P.H. and Wilson, E.O., Taxonomy: impediment or expedient? Science, 303,
285, 2004.
84. Dunn, C.P., Keeping taxonomy based in morphology, Trends Ecol. Evol., 18, 270, 2003.
85. Ebach, M.C. and Holdrege, C., DNA barcoding is no substitute for taxonomy, Nature, 434, 697, 2005.
86. Gregory, T.R., DNA barcoding does not compete with taxonomy, Nature, 434, 1067, 2005.
87. Schindel, D.E. and Miller, S.E., DNA barcoding a useful tool for taxonomists, Nature, 435, 17, 2005.
Section B
Reconstructing and Using the Tree of Life
4 Evolutionary History of Prokaryotes: Tree or No Tree?
J. O. McInerney and D. E. Pisani
Department of Biology, National University of Ireland Maynooth, Ireland
M. J. O’Connell
Department of Biochemistry, University College Cork, Ireland
D. A. Fitzpatrick
Conway Institute, University College Dublin, Ireland
C. J. Creevey
European Molecular Biology Laboratory, EMBL Heidelberg, Germany
ABSTRACT
Prokaryotes are likely to be the most numerous and species rich organisms on the planet1, occupying
a more diverse set of ecological niches than eukaryotes. Knowledge of prokaryote diversity is
severely limited by our inability to recreate the conditions in the laboratory that are needed to
cultivate the majority. Discrepancies between direct microscopical counts and the numbers of
colony-forming units can be as much as 100-fold, leading to speculation concerning how much we
really know about prokaryotes. In contrast, genomic studies of prokaryotes are advanced. So, while
on one hand we know that we have a poor overview of prokaryotic life on the planet, we have,
paradoxically, succeeded in obtaining more completed genomic sequences of prokaryotes than of
eukaryotes. Therefore, even though taxon sampling has been restricted, we have now reached the
stage where we can evaluate whether there is a meaningful prokaryotic phylogenetic tree or
taxonomy. Questions remain as to whether the history of prokaryotic life has been overwritten by
continuous and random interspecies gene transfer and occasional genome fusions, or whether these
events have only been minor contributors, thereby enabling prokaryotic evolutionary history to be
adequately described by a tree.
FIGURE 4.1 Phylogenetic trees of Escherichia, Haemophilus, Neisseria, Pseudomonas (Ps), Pasteurella (Pa), Vibrio and Xylella. On
the left is a phylogenetic tree derived using orthologs from the ribonuclease family. On the right is a phylogenetic tree derived using
orthologs of DNA polymerase III. The completed genomes of these species were searched for orthologs, and all available orthologs
were used. For the DNA polymerase III family, there was no ortholog present in the genomes of V. cholerae and X. fastidiosa.
Therefore, the early part of this century has seen the formation of two camps, one that
emphasises evolution by vertical inheritance and focuses on the identification of ‘core’ genomic
components that tend to be inherited together, using this information to define prokaryotic
relationships (tree thinkers), and the other that emphasises LGT and attempts to accommodate it (net
thinkers or web thinkers).
Figure schematic: S = d1 + d2 + d3 + ...
FIGURE 4.2 Outline of the procedure for evaluating a supertree using DFIT or SFIT measures as
implemented in CLANN. For each input tree, its similarity to an appropriately pruned supertree is measured. The
overall score for the supertree is either the sum or average distance computed for all input trees. The difference
between the DFIT and the SFIT measures is to be found in the way in which the distance is computed. S =
supertree score, d = distance between the input tree and the appropriately pruned supertree.
of Split Fit, or a flip distance in the case of Min Flip40. An alternative is to use a path length
distance-based approach to infer the optimal supertree41. Approaches using path length distances
include the Distance Fit (DFIT) method42, and the Average Consensus43, the latter having the
potential advantage that it can use branch length information if available.
All of these methods (with the exclusion of Min Flip) are implemented in the program
CLANN44, which also implements a fast Neighbour Joining Average Consensus (NJAC) procedure,
and Quartet Fit (QFIT). For the DFIT approach, a supertree can be proposed for the dataset; this
supertree can be randomly generated, or an initial rapid supertree construction method such as
NJAC can be used to provide a starting tree. The proposed supertree can be compared with any input
tree, even when the input tree contains only a subset of the total complement of leaves. This can
be achieved by pruning the supertree appropriately. Once the pruned supertree and the input tree
have the same leaf set, a simple comparison can be made to evaluate their similarity (see Figure 4.2).
The DFIT approach involves the calculation of a path length distance from every taxon to every
other. The distance is simply the number of nodes that separate the taxa on the tree. If the pruned
supertree and the input tree are identical, then the distance matrix that is derived from the pruned
supertree and the distance matrix derived from the input tree will also be identical. If the two are
different, then the distance matrices will be different, and with increasing dissimilarity in tree shape,
there will be increasing dissimilarity in the distances derived from the trees. The supertree that is
chosen is therefore the one that is most similar to the input trees.
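The DFIT comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the CLANN implementation: trees are written as nested tuples of leaf names, the function names (to_graph, prune, path_lengths, dfit_score) are hypothetical, and the difference between the two distance matrices is taken here to be the sum of absolute differences, which the text does not specify.

```python
from collections import deque
from itertools import count


def to_graph(tree):
    """Convert a nested-tuple tree (leaves are strings) to an adjacency dict."""
    adj = {}
    ids = count()

    def walk(node):
        if isinstance(node, str):
            adj.setdefault(node, set())
            return node
        name = ("internal", next(ids))
        adj[name] = set()
        for child in node:
            child_name = walk(child)
            adj[name].add(child_name)
            adj[child_name].add(name)
        return name

    walk(tree)
    return adj


def prune(adj, keep):
    """Restrict a tree to the leaves in `keep` and suppress degree-2 nodes."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            internal = not isinstance(v, str)
            if (not internal and v not in keep) or (internal and len(adj[v]) <= 1):
                for n in adj.pop(v):          # drop unwanted leaf or dangling node
                    adj[n].discard(v)
                changed = True
            elif internal and len(adj[v]) == 2:
                a, b = adj.pop(v)             # bypass a node with only two edges
                adj[a].discard(v)
                adj[b].discard(v)
                adj[a].add(b)
                adj[b].add(a)
                changed = True
    return adj


def path_lengths(adj):
    """Number of internal nodes separating each pair of leaves."""
    leaves = sorted(v for v in adj if isinstance(v, str))
    dist = {}
    for src in leaves:
        seen = {src: 0}
        queue = deque([src])
        while queue:                          # breadth-first search from src
            v = queue.popleft()
            for n in adj[v]:
                if n not in seen:
                    seen[n] = seen[v] + 1
                    queue.append(n)
        for dst in leaves:
            if src < dst:
                dist[(src, dst)] = seen[dst] - 1
    return dist


def dfit_score(supertree, input_trees):
    """Sum, over input trees, of the distance to the pruned supertree (assumed metric)."""
    super_adj = to_graph(supertree)
    score = 0
    for tree in input_trees:
        tree_adj = to_graph(tree)
        keep = {v for v in tree_adj if isinstance(v, str)}
        d_input = path_lengths(prune(tree_adj, keep))
        d_super = path_lengths(prune(super_adj, keep))
        score += sum(abs(d_super[pair] - d_input[pair]) for pair in d_input)
    return score


supertree = ((("A", "B"), "C"), ("D", "E"))
agreeing = (("A", "B"), ("C", "D"))      # same shape as the pruned supertree
conflicting = (("A", "C"), ("B", "E"))   # disagrees with the pruned supertree
print(dfit_score(supertree, [agreeing]))
print(dfit_score(supertree, [agreeing, conflicting]))
```

An input tree that matches the pruned supertree contributes nothing to the score, whereas a conflicting input tree contributes a positive amount, so a search over candidate supertrees would favour those with the lowest total.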
Other methods like QFIT and Split Fit (SFIT), although originally thought of as matrix representation
based methods34,41, can be similarly derived. QFIT involves breaking up the pruned supertree and the
input trees into the quartets they entail. Naturally, the two collections of quartets will be identical in
terms of leaf content. Again, if both the pruned supertree and the input tree have identical topologies,
their quartets will be identical. However, increasing dissimilarity in tree shape will result in fewer
quartets with identical topologies. Therefore for QFIT, the score of any given supertree will be
proportional to the number of quartets that it contains that have identical topologies to those found
in the input trees. SFIT involves breaking up the pruned supertree and the input trees into the splits
they entail. SFIT can then be seen as comparing an appropriately pruned supertree with each input
tree. The measure of similarity in this case will be the Robinson-Foulds distance39, and the best
supertree will be the one minimising the distance between it and the input trees.
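The split-based comparison used by SFIT can be sketched as follows. This is an illustrative sketch rather than the CLANN code: trees are nested tuples of leaf names, the function names (leaf_set, splits, rf_distance, sfit_score) are hypothetical, and pruning is done implicitly by restricting each split of the supertree to the leaves of the input tree. The Robinson-Foulds distance is computed as the number of splits found in one tree but not in the other.

```python
def leaf_set(tree):
    """All leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree, keep=None):
    """Non-trivial bipartitions of the tree, restricted to the leaves in `keep`."""
    all_leaves = leaf_set(tree)
    keep = all_leaves if keep is None else all_leaves & frozenset(keep)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = below & keep
        other = keep - side
        # A split is informative only if both sides hold at least two leaves.
        if len(side) >= 2 and len(other) >= 2:
            found.add(frozenset([side, other]))
        return below

    walk(tree)
    return found


def rf_distance(supertree, input_tree):
    """Robinson-Foulds distance between the input tree and the pruned supertree."""
    leaves = leaf_set(input_tree)
    return len(splits(supertree, leaves) ^ splits(input_tree))


def sfit_score(supertree, input_trees):
    """Sum of Robinson-Foulds distances over all input trees (lower is better)."""
    return sum(rf_distance(supertree, tree) for tree in input_trees)


supertree = ((("A", "B"), "C"), ("D", "E"))
agreeing = (("A", "B"), ("C", "D"))
conflicting = (("A", "C"), ("B", "E"))
print(rf_distance(supertree, agreeing))
print(sfit_score(supertree, [agreeing, conflicting]))
```

Storing each split as an unordered pair of leaf sets means that the two edges adjacent to a binary root are counted once, so the rooted notation does not distort the unrooted comparison.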
The first large-scale supertree for prokaryotes was constructed by Daubin
and coworkers45. The dataset included a total of 33 prokaryotes and four eukaryotes. They indicated
that they could produce a robust supertree when they used ortholog trees with a broader taxon
sampling, that is when they avoided using gene trees with small numbers of leaves, and they also
indicated that this genome phylogenetic tree was very much in agreement with the ribosomal RNA
trees. Subsequently, this work was followed up with an analysis of differences between these
ortholog trees, using a multivariate analysis method to identify a core of gene trees with similar
topologies and then using these gene trees in order to construct an MRP supertree. For many of the
groups on this supertree there is strong support (support being assessed using the bootstrap method);
however, the spine of the tree appeared to have only low to medium levels of support.
Figure labels: Eukaryotes, Bacteria, Archaea.
FIGURE 4.3 A stylised outline of how the evolutionary history of cellular life could be represented using
the ring of life theory.
ACKNOWLEDGEMENTS
This work was supported by a Science Foundation Ireland Research Frontiers Programme grant to
James McInerney, and a Marie Curie Intra European Fellowship to Davide Pisani (contract number
MEIF-CT-2005-01002). The authors would like to thank the two referees for their helpful advice.
REFERENCES
1. Whitman, W.B., Coleman, D.C., and Wiebe, W.J., Prokaryotes: the unseen majority, Proc. Natl. Acad.
Sci. USA, 95, 6578, 1998.
2. Darwin, C., On the Origin of Species by Means of Natural Selection, John Murray, London, 1859.
3. Sapp, J., The prokaryote-eukaryote dichotomy: meanings and mythology, Microbiol. Mol. Biol. Rev.,
69, 292, 2005.
4. Haeckel, E., The History of Creation, Trench and Co., London, 1883.
5. Stanier, R.Y. and van Niel, C.B., The concept of a bacterium, Arch. Mikrobiol., 42, 17, 1961.
6. Holt, J.G., Bergey’s Manual of Determinative Bacteriology, Williams and Wilkins, Baltimore, 1994.
7. Garrity, G.M., Ed., Bergey’s Manual of Systematic Bacteriology, Springer, New York, 2001.
8. Breed, R.S., Murray, E.G.D., and Hitchens, A.P., Bergey’s Manual of Determinative Bacteriology,
The Williams and Wilkins Company, 1948.
9. Zuckerkandl, E. and Pauling, L., Molecules as documents of evolutionary history, J. Theor. Biol., 8,
357, 1965.
10. Sogin, S.J., Sogin, M.L., and Woese, C.R., Phylogenetic measurement in procaryotes by primary
structural characterization, J. Mol. Evol., 1, 173, 1971.
11. Woese, C.R. and Fox, G.E., Phylogenetic structure of the prokaryotic domain: the primary kingdoms,
Proc. Natl. Acad. Sci. USA, 74, 5088, 1977.
12. Sanger, F., Nicklen, S., and Coulson, A.R., DNA sequencing with chain-terminating inhibitors, Proc.
Natl. Acad. Sci. USA, 74, 5463, 1977.
13. Smith, L.M. et al., Fluorescence detection in automated DNA sequence analysis, Nature, 321, 674,
1986.
14. Woese, C.R., Bacterial evolution, Microbiol. Rev., 51, 221, 1987.
15. Lederberg, J. and Tatum, E., Gene recombination in Escherichia coli, Nature, 158, 558, 1946.
16. Wilson, R.K. et al., Development of an automated procedure for fluorescent DNA sequencing,
Genomics, 6, 626, 1990.
17. Fleischmann, R.D. et al., Whole-genome random sequencing and assembly of Haemophilus influenzae
Rd, Science, 269, 496, 1995.
18. Bult, C.J. et al., Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii,
Science, 273, 1058, 1996.
19. Fraser, C.M. et al., The minimal gene complement of Mycoplasma genitalium, Science, 270, 397, 1995.
20. Blattner, F.R. et al., The complete genome sequence of Escherichia coli K-12, Science, 277, 1453, 1997.
21. Hayashi, T. et al., Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and
genomic comparison with a laboratory strain K-12, DNA Res., 8, 11, 2001.
22. Welch, R.A. et al., Extensive mosaic structure revealed by the complete genome sequence of
uropathogenic Escherichia coli, Proc. Natl. Acad. Sci. USA, 99, 17020, 2002.
23. Lawrence, J.G. and Ochman, H., Molecular archaeology of the Escherichia coli genome, Proc. Natl.
Acad. Sci. USA, 95, 9413, 1998.
24. Doolittle, W.F., Phylogenetic classification and the universal tree, Science, 284, 2124, 1999.
25. Doolittle, W.F., Lateral genomics, Trends Cell. Biol., 9, M5, 1999.
26. Kurland, C.G., Canback, B., and Berg, O.G., Horizontal gene transfer: a critical view, Proc. Natl.
Acad. Sci. USA, 100, 9658, 2003.
27. Woese, C.R., The universal ancestor, Proc. Natl. Acad. Sci. USA, 95, 6854, 1998.
28. Woese, C.R., On the evolution of cells, Proc. Natl. Acad. Sci. USA, 99, 8742, 2002.
29. Ge, F., Wang, L.S., and Kim, J., The cobweb of life revealed by genome-scale estimates of horizontal
gene transfer, PLoS Biol., 3, e316, 2005.
30. Gordon, A.D., Consensus supertrees: the synthesis of rooted trees containing overlapping sets of
labeled leaves, J. Classif., 3, 31, 1986.
31. Aho, A.V. et al., Inferring a tree from lowest common ancestors with an application to the optimisation
of relational expressions, SIAM J. Comput., 10, 405, 1981.
32. Semple, C. and Steel, M., A supertree method for rooted trees, Discrete Appl. Math., 105, 2000.
33. Pisani, D.E. and Wilkinson, M., Matrix representation with parsimony, taxonomic congruence and
total evidence, Syst. Biol., 51, 151, 2002.
34. Wilkinson, M. et al., Measuring support and finding unsupported relationships in supertrees, Syst.
Biol., 54, 823, 2005.
35. Wilkinson, M. et al., Some desiderata for meta-analytical supertrees, in Phylogenetic Supertrees:
Combining Information to Reveal the Tree of Life, Bininda-Emonds, O.R.P., Ed., Kluwer Academic,
Dordrecht, 2004, 227.
36. Ragan, M.A., Phylogenetic inference based on matrix representation of trees, Mol. Phylogenet. Evol.,
1, 53, 1992.
37. Ross, H.A. and Rodrigo, A.G., An assessment of matrix representation with compatibility in supertree
construction, in Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life,
Bininda-Emonds, O.R.P., Ed., Kluwer Academic, Dordrecht, 2004, 35.
38. Burleigh, J.G. et al., MRF supertrees, in Phylogenetic Supertrees: Combining Information to Reveal
the Tree of Life, Bininda-Emonds, O.R.P., Ed., Kluwer Academic, Dordrecht, 2004, 65.
39. Robinson, D. and Foulds, L., Comparison of phylogenetic trees, Math. Biosci., 53, 131, 1981.
40. Chen, D. et al., Flipping: A Supertree Construction Method, American Mathematical Society,
Providence, Rhode Island, 2003, 135.
41. Steel, M. and Penny, D., Distributions of tree comparison metrics—some new results, Syst. Biol., 42,
126, 1993.
42. Creevey, C.J. et al., Does a tree-like phylogeny only exist at the tips in the prokaryotes? Proc. R. Soc.
Lond. B. Biol. Sci., 271, 2551, 2004.
43. Lapointe, F.-J. and Cucumel, G., The average consensus procedure: combination of weighted trees
containing identical or overlapping sets of taxa, Syst. Biol., 46, 306, 1997.
44. Creevey, C.J. and McInerney, J.O., Clann: investigating phylogenetic information through supertree
analyses, Bioinformatics, 21, 390, 2005.
45. Daubin, V., Gouy, M., and Perriere, G., Bacterial molecular phylogeny using supertree approach,
Genome Inform. Ser. Workshop Genome Inform., 12, 155, 2001.
46. Lerat, E., Daubin, V., and Moran, N.A., From gene trees to organismal phylogeny in prokaryotes: the
case of the gamma-Proteobacteria, PLoS Biol., 1, e19, 2003.
47. Fitzpatrick, D.A., Creevey, C.J., and McInerney, J.O., Genome phylogenies indicate a meaningful
α-proteobacterial phylogeny and support a grouping of the mitochondria with the Rickettsiales,
Mol. Biol. Evol., 23, 74, 2006.
48. Beiko, R.G., Harlow, T.J., and Ragan, M.A., Highways of gene sharing in prokaryotes, Proc. Natl.
Acad. Sci. USA, 102, 14332, 2005.
49. Rivera, M.C. and Lake, J.A., The ring of life provides evidence for a genome fusion origin of
eukaryotes, Nature, 431, 152, 2004.
50. Lake, J.A. and Rivera, M.C., Deriving the genomic tree of life in the presence of horizontal gene
transfer: conditioned reconstruction, Mol. Biol. Evol., 21, 681, 2004.
51. McInerney, J.O. and Wilkinson, M., New methods ring changes for the tree of life, Trends Ecol. Evol.,
20, 105, 2005.
CONTENTS
5.1 Introduction
5.2 Divide-and-Conquer Methods
5.3 Effective Overlap
5.3.1 The Importance of Phylogeny
5.3.2 The Importance for Experimental Design and Future Sampling
5.4 Fast Quartet-Based Supertree Construction
5.4.1 Voting Systems
5.4.2 Using Fewer Quartets
5.4.3 Quartet Joining
5.5 Conclusion
Acknowledgements
References
ABSTRACT
Reconstructing the tree of life will require fast methods for building very large phylogenetic trees
from patchy data. The leading candidates for such an approach employ supertree methods as part
of a divide-and-conquer strategy. Here, we discuss two aspects of a phylogenetic divide-and-conquer
method: the decomposition of the tree into subproblems and the recombining of these into an
overall solution. In particular, we highlight and explore the issue of effective taxon overlap, how
it might be achieved via suitable decomposition and how it might be used to guide the setting of
priorities for additional data acquisition, and we show how some knowledge of phylogeny is vital
in both contexts. Last, we show that quartet puzzling, the best known phylogenetic divide-and-
conquer method, can perform poorly when not all quartets are available, and we present a new fast
supertree method designed to perform better in this context. Whilst a great deal of work remains,
such an approach has great potential as part of a divide-and-conquer method for reconstructing
large phylogenies on the scale of the tree of life or for large subsets of species rich taxa.
5.1 INTRODUCTION
Reconstructing the complete history of life is an ultimate goal of biology1, and recent interest in
constructing the phylogenetic tree of life reflects the central role of phylogenetic trees in
understanding evolutionary history2,3. Notwithstanding that most living species may be as yet undescribed,
there are major methodological challenges in realising the tree of life. Whilst there has been some
debate4–8, we generally expect accurate reconstruction of phylogenetic trees to become more difficult
as trees become larger. The main reason for this is computational complexity. As is well known,
the number of possible trees grows more than exponentially with the number of taxa on the tree,
so as we seek to identify optimal trees under some objective function the size of tree space (the
set of trees for the relevant set of taxa) in which we hope to locate them becomes impossibly large.
Exact methods such as exhaustive searches or branch-and-bound algorithms are prohibitively time
consuming for all but the smallest phylogenetic problems, and for any substantial problem we are
forced to rely on heuristics to guide a limited search of tree space in the hope of finding good
(optimal or near optimal) trees. As the size of tree space grows, such searches will become more
and more difficult, as they search for one or more needles in an ever expanding haystack. Building
trees as large as the tree of life (of the order of millions of taxa) using any known heuristic will
be unfeasible, and new approaches will be needed.
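The growth of tree space can be made concrete with a short calculation. For n taxa (n of at least 3) the number of distinct unrooted, fully resolved trees is the double factorial (2n − 5)!! = 1 × 3 × 5 × ... × (2n − 5). The sketch below evaluates this standard formula; the function name is a hypothetical one chosen for the example.

```python
def num_unrooted_binary_trees(n_taxa):
    """Number of distinct unrooted, fully resolved trees on n_taxa labelled leaves."""
    if n_taxa < 3:
        return 1  # one or two leaves can be joined in only one way
    total = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        total *= k
    return total


for n in (4, 5, 10, 20):
    print(n, num_unrooted_binary_trees(n))
```

Four taxa give 3 trees, five give 15 and ten already give 2,027,025, which illustrates why exhaustive and branch-and-bound searches are restricted to small problems.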
A second difficulty in reconstructing the tree of life is the patchy availability of data for different
leaves. DNA sequences have become the principal source of data for phylogenetic reconstruction and
are accumulating at a rapid rate. As the number of leaves increases, however, it becomes increasingly
unlikely that a single gene or single source of data is available for all the taxa, or that a single gene
will be effective in reconstructing their relationships. Thus we can expect some information to be
unavailable for some taxa (‘missing data’). Extensive nonrandom missing data may complicate or
compromise analyses9. Furthermore, if different markers are needed for different taxa, then accurate
analysis will probably need to account for heterogeneity between these markers10,11. Modelling this
heterogeneity can be complex and will tend to make methods for analysing such data slow.
A solution to both of these problems may be to use divide-and-conquer approaches in which
large phylogenetic problems are decomposed into subproblems and the solutions of these
subproblems combined to give a global solution. Such approaches reflect the expectation that subproblems
can be more easily analysed separately because they are smaller in size and because they can
include just those taxa for which a particular type of data is available, reducing the problem of
missing data and allowing the process of evolution for particular data to be more accurately
modelled. A decomposition might be a natural one, such as dividing a large molecular dataset into
data from individual orthologous sequences, or could be designed to yield subproblems that should
be easier to solve accurately and that are readily recombined. The problem of combining a set
of phylogenetic trees into a single estimate of phylogeny is addressed by supertree methods, which
are therefore integral to any divide-and-conquer approach to building large phylogenetic trees. For
example, quartet puzzling (QP)12 is perhaps the best known divide-and-conquer approach to
phylogenetic inference, and the puzzling step is a heuristic supertree method.
In this chapter our main aims are to draw attention to the problem of achieving effective overlap
between subproblems and to outline a new fast supertree method. Speed is obviously an important
consideration in building large phylogenetic trees, but the importance of effective overlap and how
it might most efficiently be achieved has been less widely appreciated. We begin with an overview
of divide-and-conquer methods.
approach, such as merge sort and quick sort and the fast Fourier transform. The efficiency of solving
the subproblems and the efficiency of this merging process will determine how effective a divide-
and-conquer approach can be. Whilst classical divide-and-conquer algorithms provide globally
optimal solutions to problems, this is probably an unrealistic aim for phylogenetic methods, given
that most optimisation-based phylogenetic problems are known to be, or likely to be, NP-complete14
(NP = nondeterministic polynomial time) so there is very unlikely to be a polynomial time algorithm
to solve them (NP-completeness has been shown for parsimony15, compatibility16, distance metrics17
and at least one likelihood problem18). Thus we should expect phylogenetic divide-and-conquer
strategies to be heuristic rather than exact algorithms. There is no guarantee that solutions to
subproblems will be accurate and thus no guarantee that they will all be compatible or readily
combinable. Even apparently disjoint subproblems may be incompatible (while quartets with fewer
than three leaves in common must be pairwise compatible, three such quartets can be incompatible).
In the phylogenetic context, there may be choices to be made about the order in which subproblems
are combined, and even about which sets of subproblems to consider at all.
Most uses of supertree methods have been to build larger phylogenetic trees from sets of
previously published trees. Whilst this is divide-and-conquer analysis of a sort (the published trees
can be thought of as the results of a given decomposition of the overall problem), we prefer to
view divide-and-conquer approaches more narrowly as those in which a designed decomposition
of the problem is integral to the analysis. Here, divide-and-conquer analyses offer methods for
inferring trees from large datasets rather than from sets of previously inferred trees, and the criterion
for choosing among alternative inferences, be it parsimony, likelihood or something else, need be
no different from that used by other methods of analysing large datasets.
An important advantage of divide-and-conquer approaches is that they may be relatively
computationally efficient19. Whilst only two supertree methods have been studied in the
divide-and-conquer setting20, the structure of divide-and-conquer algorithms suggests that many existing
supertree methods are perhaps unsuitable for such use. In particular, optimisation supertree methods
such as matrix representation with parsimony (MRP) require time-consuming heuristic searches of
trees, and combining subproblems for large sets of taxa using these methods will take just as long
as solving the problem in a single step (it may or may not be more accurate). These optimisation
methods will not be suitable for the amalgamation step in a divide-and-conquer strategy that hopes
to be quicker than a conventional analysis: we need faster supertree methods that take a length of
time proportional to some polynomial in the number of input taxa. MinCut21 and modified MinCut22
supertree methods are both polynomial time approaches, as is the strict consensus merger (SCM23).
However, whilst saving time is important, accuracy is paramount. MinCut supertrees have been
shown to be less accurate than trees constructed using other methods in simulation studies24 and
show a significant bias with respect to shape25 that might be correlated with poor accuracy. There
is clearly scope for new, fast supertree methods in the context of divide-and-conquer approaches
(see below).
Exact divide-and-conquer algorithms generally break a problem down into the smallest, trivial
subproblems. Similarly, QP breaks a phylogeny problem into the smallest meaningful (unrooted)
problem of quartets of taxa. Quartets can be solved very quickly, as only three different topologies
need to be compared for a four-taxon subproblem. However, some quartets may be difficult to
accurately infer, and a heuristic analysis of larger subproblems might be quicker or more accurate,
leading to an optimal ‘granularity’ of the decomposition for particular problems.
FIGURE 5.1 Two compatible input trees and their strict consensus supertree. Polytomies on the supertree
show where there is no effective overlap between the input trees. Dots indicate the seven different positions
in which leaf 10 could occur on input tree 1 while the two input trees remain compatible. Letters indicate the
same positions for taxa 6, 7 and 8 on tree 2. Sampling leaf 10 for the tree 1 gene would produce a fully
resolved supertree if it was placed in any of these positions. The improvement in overlap with sequencing
leaves 6, 7 or 8 for tree 2 depends on where the taxa appear on this tree. (Adapted from Gordon26.)
the input trees. The set of supertrees that displays all the input trees is termed the span of the input
trees and denoted <S>. The strict component consensus of <S> is referred to here as the consensus
supertree. Figure 5.1 gives an example of two compatible input trees and their consensus supertree
in which there is a mixture of effective and ineffective overlap26. Their span <S> includes seven
fully resolved supertrees that differ only in the placement of leaf 10 with respect to leaves 1, 2, 6,
7 and 8. Although it mostly does a good job of combining the information in the two input trees,
the consensus supertree conveys much less information about the relationships of these leaves than
does tree 1. Dealing with compatible trees allows a natural definition of effective and ineffective
overlap: effective overlap occurs when the consensus supertree displays all of the input trees, whilst
overlap is ineffective to the extent that information present in the input trees is not present in their
consensus supertree. In this example, it is easy to see that there is mostly good overlap, but that
there is not sufficient information in the input trees to determine the relationships of leaves 6–8
from tree 1 to leaf 10 from tree 2.
FIGURE 5.2 Three pairs of four-taxon input trees together with their strict consensus supertrees. The
phylogenetic position of the two shared leaves has a profound effect on the effectiveness of the overlap between
the two trees.
We can show the importance of the phylogeny of the input leaves with some simple examples.
The three simple cases shown in Figure 5.2 each consist of a pair of trees with two leaves in
common, but differ greatly in the relative positions of the common leaves. These examples show
that the common taxa occurring as sister taxa in both trees give no effective overlap between the
trees, and neither do the two taxa occurring widely separated on both trees. The optimal situation
appears to be when a small clade in one tree spans a deep split in the second tree, as in the second
example. Figure 5.3 shows this result more generally. We can define the separation of any pair of
taxa (and mean separations of any sets of taxa) on a given tree by counting the number of internal
edges separating them. This quantity might be helpful in weighting simple co-occurrence metrics
used to assess overlap; if the above result holds in more complicated cases, then it should be optimal
to have taxa with highly different mean separations across different input trees. Measures developed
in different contexts might also prove useful here, such as the proportion of phylogenetic history
sampled by a set of taxa31.
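The separation measure just defined is straightforward to compute. The sketch below is illustrative only: it assumes an unrooted tree supplied as an edge list, with internal nodes given arbitrary labels (`n1`, `n2`, `n3` here are hypothetical), and counts the edges on the path between two leaves whose endpoints are both internal.

```python
from collections import deque


def separation(edges, leaf_a, leaf_b):
    """Number of internal edges on the path between two leaves of a tree."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Breadth-first search from leaf_a, recording each node's predecessor.
    parent = {leaf_a: None}
    queue = deque([leaf_a])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)

    # Walk back from leaf_b, counting edges joining two internal nodes.
    count = 0
    node = leaf_b
    while parent[node] is not None:
        prev = parent[node]
        if len(adj[node]) > 1 and len(adj[prev]) > 1:
            count += 1
        node = prev
    return count


# Hypothetical five-leaf caterpillar tree: (A,B),C,(D,E).
caterpillar = [("A", "n1"), ("B", "n1"), ("n1", "n2"), ("C", "n2"),
               ("n2", "n3"), ("D", "n3"), ("E", "n3")]
print(separation(caterpillar, "A", "B"))  # sister taxa
print(separation(caterpillar, "A", "E"))  # taxa at opposite ends
```

Sister taxa have a separation of zero, while the two ends of the caterpillar are separated by two internal edges; mean separations for sets of taxa follow by averaging over pairs.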
[Plots not reproduced: size of span (vertical axis) against shared leaf (horizontal axis) for tree 1 and tree 2.]
FIGURE 5.3 The size of the span of supertrees inferred from two different input trees combining with two larger trees. The two
small input trees overlap by two taxa (one of which is varied) with the larger trees and have one unique leaf (X). The effectiveness
of overlap between the two trees (measured by the size of the span) varies greatly with which leaves are shared and with the topology
particular taxa that might improve overlap sufficiently to allow the consensus supertree to display
both input trees, and some selections that would be less likely to help. For example, assuming that
the new sequences introduce no conflict in the input trees, then obtaining additional data for leaves
11–14 so as to include them in tree 1 would obviously produce no practical improvement in overlap.
An obvious choice that would provide completely effective overlap would be to sample 10 to include
it in tree 1, but we could alternatively sample from 6, 7 and 8 for the tree 2 gene. Other things being
equal we would target whichever (10 or 6, 7, 8) were most convenient or least expensive to obtain.
If we are constrained by the availability of samples or other resources (for example, if we had sufficient
funding to sample a single gene for a single taxon and gene 1 cannot be sampled for taxon 10), we
can ask which of the remaining targets should be our priority. Thus we would ask whether the available
phylogenetic information suggests that additional data for one of 6, 7 and 8 would be most likely to
provide effective overlap. Sampling taxon 6 and adding it to tree 2 on branches A, B or C gives a
fully resolved supertree if on branch A or B but not on C, while sampling 7 (or 8) on this tree gives
a fully resolved supertree on branch C, but not on either A or B. Sampling taxon 6 is more likely to
resolve the problem and so might be an optimal choice for this gene.
Similar issues need to be addressed when designing a divide-and-conquer algorithm. Different
subproblems need to overlap to some extent if they are to be combined in a global solution, but
too much overlap will lead to the same relationships being inferred many times, making the
algorithm inefficient. The problem of designing a decomposition of a particular tree so as to provide
effective overlap is trivial. Simultaneously providing effective overlap and easily solvable subproblems
is more challenging, particularly without knowledge of the tree. Previous workers have
designed decompositions that give subproblems that are easily solved, and are even provably easy
to solve. Huson et al. created the original disk-covering method (DCM) to produce subproblems
of minimal ‘evolutionary diameter’ in that taxa in a particular subproblem have small pairwise
sequence divergence23. Distance-based methods such as neighbour joining are known to be accurate
for such data, and this allowed Huson et al. to prove a number of theorems about the accuracy of
analysis using their decomposition together with these methods. This DCM decomposition did not
attempt to control the degree of overlap, however, and so performs poorly in the more general
supertree context19,20. A second method, DCM2, identifies a single set of taxa which have bounded
diameter and which produce subproblems of bounded diameters such that the largest subproblem
is as small as possible35. Both DCM2 and a related recursive alternative (Rec-I-DCM3) have been
shown to be remarkably effective divide-and-conquer algorithms20,36 (see also Bininda-Emonds
and Stamatakis, Chapter 6), but it remains unclear whether a strategy in which all subproblems
share a set of common taxa is preferable to one in which pairs of subproblems have different
shared taxa. We note in passing that consensus efficiency37, which is the ratio of the cladistic
information content38 of a consensus to that of a set of trees (such as the span), provides a potential
measure of the efficacy of overlap which could be used to compare different decompositions of
the same dataset.
If different taxonomic groups are studied using different molecular markers, as they inevitably
will be to some extent, then the tree of life can only be inferred by combining individual studies.
It would be helpful to be able to give some guidance to molecular systematists as to how to design
such studies (focusing on their particular taxonomic group of interest) to be easily combinable in
this context. For example, it might be best to sequence a marker for a few closely related taxa (as
is currently done for outgroup rooting for instance), or it might be better for every study to include
a few of a selected set of systematic ‘model organisms’ which might, but need not, coincide with
the model organisms of molecular biologists, for many of which complete sequence data are already
available. Tentatively, it seems that the first solution is likely to be better, and we might encourage
systematists to sample closely related sequential sister groups to their clade of interest in molecular
studies (Figure 5.4). Further work is needed, and our limited discussion and exploration of the
simplest examples is intended simply to highlight this need.
FIGURE 5.4 Choosing new taxa for optimal supertree construction. Overlap might be maximised by sequencing
a few model organisms (indicated by dashed lines within large radiations) when sequencing a particular
marker for a clade of interest (indicated by grey triangle), as many other markers will also be sequenced for
these organisms. If some idea of the relationships of the sequenced organisms is known, more effective overlap
might be obtained by sequencing closely related outgroups that form sequential sister groups to the clade of
interest (perhaps taxa A and B would be the best choices here). This relates to existing taxonomic practice,
in which closely related outgroups are chosen to root phylogenies, but care should be taken that the outgroups
do not form a monophyletic group to the exclusion of the clade of interest (as taxa B and C would), as this
would result in no effective overlap.
FIGURE 5.5 Performance of QP in the supertree setting. (a) Twelve quartets; (b) the unique tree displaying
all 12 quartets; (c) the majority-rule component consensus of trees constructed from the 12 quartets using
1,000 replicates of the voting method of QP. Numbers indicate frequencies of occurrence of splits in the QP
trees, with those for splits entailed by the 12 quartets shown in bold.
growing tree. We find the path between the other two leaves in the growing tree and give a score
of +1 to every edge on that path (Figure 5.6a). This is a vote against the grafting of the new leaf
to the tree on any of those edges. The votes of all relevant quartets are counted, and the new leaf
is attached to an edge with the smallest vote against, ties being broken randomly (Figure 5.6c).
Strimmer et al. subsequently developed an approximate system for weighting the votes of quartets
according to their posterior probabilities that is employed in TREE-PUZZLE42,43.
The inadequacy of the QP voting system in the more general supertree case, that is, where we
do not have the luxury of votes from all possible relevant quartets and have to rely upon a subset
of them, is readily demonstrated and diagnosed. Consider in our example (Figure 5.6) that we have
only the single quartet AE/BC to vote on the position of the new leaf E. The puzzling voting system
leaves a tie between two branches, which are thus equally likely inferred placements of E. However,
only one of these placements (with A) is consistent with what the quartets actually entail about the
relationships of E. The other (with D) actually contradicts the information in the relevant quartet
because it entails AB/CE. That the puzzling voting system does not provide a vote against this
illogical position may not matter when all or nearly all quartets are available, because other quartets
(for example AE/BD) may vote directly against this position, but it is expected to compromise its
performance, as we have already seen, when not all relevant quartets are available.
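The tie described above can be reproduced with a short sketch of the puzzling vote. It is an illustration under stated assumptions, not the TREE-PUZZLE implementation: the growing tree AB/CD is given as an edge list with hypothetical internal node labels `u` and `v`, and each relevant quartet is written as a triple whose first entry is the leaf grouped with the new leaf E and whose other two entries are the remaining leaves.

```python
from collections import deque


def build_adjacency(edges):
    """Turn an edge list into an adjacency dictionary."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def path_edges(adj, start, end):
    """Edges (as frozensets) on the unique path between two nodes of a tree."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    found = []
    node = end
    while parent[node] is not None:
        found.append(frozenset((node, parent[node])))
        node = parent[node]
    return found


def qp_votes(edges, relevant_quartets):
    """Score +1 on every edge of the path between the two leaves of each
    quartet that are NOT grouped with the new leaf."""
    adj = build_adjacency(edges)
    votes = {frozenset(e): 0 for e in edges}
    for _sister, x, y in relevant_quartets:
        for edge in path_edges(adj, x, y):
            votes[edge] += 1
    return votes


def least_opposed(votes):
    """Edges with the smallest vote against them."""
    fewest = min(votes.values())
    return {edge for edge, v in votes.items() if v == fewest}


TREE = [("A", "u"), ("B", "u"), ("u", "v"), ("C", "v"), ("D", "v")]
only_ae_bc = least_opposed(qp_votes(TREE, [("A", "B", "C")]))
with_ae_bd = least_opposed(qp_votes(TREE, [("A", "B", "C"), ("A", "B", "D")]))
print(sorted(sorted(e) for e in only_ae_bc))
print(sorted(sorted(e) for e in with_ae_bd))
```

With AE/BC alone, the pendant edges of A and D are tied; adding AE/BD leaves only the pendant edge of A, in line with the argument in the text.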
FIGURE 5.6 Quartet voting systems. (a) The QP voting system showing the votes cast by two quartets relevant
to the placement of E (on the left) on the quartet AB/CD; (b) the Vinh and von Haeseler44 voting system
showing votes cast by a single quartet relevant to the addition of E to AB/CD; (c) the fully resolved tree these
quartets entail.
An alternative voting procedure, used by Vinh and von Haeseler44 in the somewhat different
context of an algorithm which efficiently elucidates the landscape of possible optimal trees, seems
much better suited for the supertree context. For any three leaves, A, B and C, in a tree there is a
unique node or vertex where the paths connecting each pair of these leaves intersect and which is
subtended by three subtrees (one containing A, one containing B and one containing C), which we
shall call the subtrees of the node. The resolution of a quartet on A, B, C and a new leaf tells us
to which subtree the new leaf must be grafted in order for the quartet to be displayed by the tree.
Other positions contradict the quartet. Thus, instead of voting against branches lying on a particular
path, we can vote either for all branches in the subtree in which any grafting of the new taxon
would display the quartet, or against all branches in the subtrees in which grafting would contradict
the quartet. What is entailed by a pair of quartets is governed by dyadic inference rules of which
there are just two45–47. Vinh and von Haeseler’s44 voting system reflects these simple inference rules
extended to the case where one of the quartets being compared is embedded in a larger tree.
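A minimal sketch of this subtree-based vote follows. It is our own illustration rather than the IQPNNI code, and it makes the same assumptions as before: the tree AB/CD is an edge list with hypothetical internal labels `u` and `v`, and a relevant quartet is a triple whose first entry is the leaf grouped with the new leaf. Every edge outside the subtree containing that leaf receives a vote against.

```python
from collections import deque


def build_adjacency(edges):
    """Turn an edge list into an adjacency dictionary."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def path_nodes(adj, start, end):
    """Nodes on the unique path between two nodes of a tree."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    nodes = [end]
    while parent[nodes[-1]] is not None:
        nodes.append(parent[nodes[-1]])
    return nodes


def meeting_node(adj, x, y, z):
    """The unique node where the three pairwise paths intersect."""
    common = (set(path_nodes(adj, x, y)) & set(path_nodes(adj, x, z))
              & set(path_nodes(adj, y, z)))
    (node,) = common
    return node


def subtree_votes(edges, relevant_quartets):
    """Vote against every edge lying outside the subtree (of the meeting
    node) that contains the leaf grouped with the new leaf."""
    adj = build_adjacency(edges)
    votes = {frozenset(e): 0 for e in edges}
    for sister, y, z in relevant_quartets:
        centre = meeting_node(adj, sister, y, z)
        side = {sister}                 # nodes on the sister's side of centre
        stack = [sister]
        while stack:
            node = stack.pop()
            for nxt in adj[node]:
                if nxt != centre and nxt not in side:
                    side.add(nxt)
                    stack.append(nxt)
        for edge in votes:
            if not (edge & side):       # no endpoint on the sister's side
                votes[edge] += 1
    return votes


TREE = [("A", "u"), ("B", "u"), ("u", "v"), ("C", "v"), ("D", "v")]
votes = subtree_votes(TREE, [("A", "B", "C")])
unopposed = {edge for edge, v in votes.items() if v == 0}
print(sorted(sorted(e) for e in unopposed))
```

For the single quartet AE/BC, only the pendant edge of A escapes a vote against, so the placement of E with D that the puzzling vote leaves open is excluded.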
Although not suggested by Vinh and von Haeseler44, we could use their voting system in place
of the original puzzling step of QP. Taking the twelve quartets in Figure 5.5a as input, a QP-type
analysis using this alternative voting system would return the unique tree defined by the quartets
(Figure 5.5b and Figure 5.7) with maximum support for all splits, a far more satisfactory result
than unmodified QP (Figure 5.5c). The comparative performance of this alternative voting system
in more typical QP-type analysis merits further investigation. Certainly we would expect it to offer
FIGURE 5.7 Quartet joining of the quartet trees shown in Figure 5.5. Choosing a starting quartet at random,
a supertree is built up by sequentially adding a single taxon using the information from relevant quartets. For
the minimal set of quartets used here, there is always only one relevant quartet, and the order in which leaves
are selected does not matter, but it will matter in general.
placements, they defined the k³ important quartets with respect to a node as those including a new
leaf and one leaf from each of the k-representative sets of each of the three subtrees of the node.
The important quartets of a tree are all those that are important quartets of any node in the tree,
and time can be saved by permitting only these quartets to vote.
Important quartets were used by Vinh and von Haeseler44 to vote on the reattachment of leaves
that have been deleted from a tree, as part of a method for exploring tree space. They note that
using only important quartets in QP would yield a decrease in the complexity of puzzling, from
O(n⁴) to O(n²), but because of poor performance in simulations they did not pursue this further.
Better performance might be expected using their alternative voting procedure, given the frailty of
unmodified QP when not all relevant quartets are available. In their study, Vinh and von Haeseler
set k to four, but in the extreme k could be set to one, so that there are just N – 2 important quartets
for the tree. This is the minimum number needed to uniquely specify a tree for N + 1 leaves, but
there is no guarantee that minimal sets of important quartets will specify a tree: there may be conflict.
be quartets; they could be the trees inferred using any designed decomposition, and this obviates
any concern that quartet trees are difficult to infer accurately because of poor taxon sampling. As
with QP, this approach may be sensitive to the starting quartet and to the order in which new leaves
are added, with the extent of any variation reflecting conflict and/or ineffective overlap.
5.5 CONCLUSION
Supertree methods provide ways of combining phylogenetic information in diverse trees. As such
they can be used to produce large-scale phylogenetic trees from sets of trees that are culled from
the literature or produced anew through mining of genomic data, and they are essential to any more
formal divide-and-conquer analysis of single data sets. To date, supertree methods have been used
mostly to produce composite phylogenetic trees from previously published trees, but there has been
a recent increase in their application to the phylogenetic analysis of genomic data49,50. Molecular
data are still available for relatively few genes from relatively few taxa51, but this is rapidly improving
as more complete genomes are sequenced and as ‘shallow genomics’ projects such as expressed
sequence tag (EST) surveys52 and organelle genome sequences are completed53, and we expect the
use of supertree methods to increase along with the available genomic data.
Much has been made of the potential for supertree methods to combine ‘data’ that are otherwise
difficult to combine in a single phylogenetic analysis54. Increasingly, however, most phylogenetic
work will be based on molecular data that could, in principle, be combined so that this justification
for supertree methods will become less important55,56. While there has been a debate between advocates
of supertree methods and those who prefer simultaneous analysis of data, we agree with others in not
seeing a stark choice between mutually exclusive alternatives57,58. This is perhaps most clear in the
use of supertree methods as part of divide-and-conquer approaches to finding best fitting trees for a
given set of data, that is to efficiently and accurately perform simultaneous analysis of large datasets.
Whereas we do not know whether supertrees constructed from published trees are particularly
accurate, we do know that supertree methods embedded in a rationally designed divide-and-conquer
strategy can improve heuristic searches, producing better trees faster19,20,59. We also know that some
sort of supertree analysis will be needed to join together disparate parts of the tree of life inferred
using different markers. We consider the question of how best to achieve effective overlap to be an
extremely important one, because good answers have the potential to help us target our future research
efforts to build the tree of life as efficiently as possible. Efficient supertree construction also requires
polynomial time algorithms. The quartet joining method we have outlined is a very fast method of
supertree construction that should work well in the absence of conflict. However, its accuracy when
confronted with real inference problems is unknown. A priori, one might anticipate some trade-off
between speed and accuracy, given that speed is achieved partly by considering less evidence. Hence,
accuracy might be improved by considering the evidence from multiple relevant quartets, should they
be available. We are currently developing an implementation of quartet joining that will allow the
performance of the method to be investigated when input trees conflict.
ACKNOWLEDGEMENTS
This work was supported by BBSRC grant 40/G18385. We thank Melissa Pentony for running the
QP supertree analysis.
REFERENCES
1. Haldane, J.B.S., Possible Worlds and Other Essays, Chatto and Windus, London, 1927.
2. Cracraft, J. and Donoghue, M.J., Assembling the Tree of Life, Oxford University Press, New York, 2004.
3. Soltis, P.S. and Soltis, D.E., Molecular systematics: assembling and using the tree of life, Taxon, 50,
663, 2001.
35. Huson, D.H., Vawter, L., and Warnow, T., Solving large scale phylogenetic problems using DCM2,
in Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology,
Lengauer, T. et al., Eds., AAAI Press, Menlo Park, CA, 1999, 118.
36. Roshan, U.W. et al., Rec-I-DCM3: a fast algorithmic technique for reconstructing large phylogenetic
trees, in Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference, 2004, 98.
37. Wilkinson, M. and Thorley, J.L., Efficiency of strict consensus trees, Syst. Biol., 50, 610, 2001.
38. Thorley, J.L., Wilkinson, M., and Charleston, M., The information content of consensus trees, in
Advances in Data Science and Classification, Rizzi, A., Vichi, M., and Bock, H.H., Eds., Springer-
Verlag, Berlin, 1998, 91.
39. Pentony, M.M., Quartet Puzzling Supertrees, Ph.D. thesis, National University of Ireland, Maynooth,
2004.
40. Pisani, D. and Wilkinson, M., Matrix representation with parsimony, taxonomic congruence, and total
evidence, Syst. Biol., 51, 151, 2002.
41. Wilkinson, M. et al., Some desiderata for liberal supertrees, in Phylogenetic Supertrees: Combining
Information to Reveal the Tree of Life, Bininda-Emonds, O.R.P., Ed., Kluwer Academic, Dordrecht,
The Netherlands, 2004, chap. 11.
42. Strimmer, K., Goldman, N., and von Haeseler, A., Bayesian probabilities and quartet puzzling, Mol.
Biol. Evol., 14, 210, 1997.
43. Schmidt, H.A. et al., TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and
parallel computing, Bioinformatics, 18, 502, 2002.
44. Vinh, L.S. and von Haeseler, A., IQPNNI: moving fast through tree space and stopping in time, Mol.
Biol. Evol., 21, 1565, 2004.
45. Bryant, D., Building Trees, Hunting for Trees and Comparing Trees, Ph.D. thesis, University of
Canterbury, 1997.
46. Dekker, M.C.H., Reconstruction Methods for Derivation Trees, Masters thesis, Vrije Universiteit,
1986.
47. Wilkinson, M., Cotton, J.A., and Thorley, J.L., The information content of trees and their matrix
representations, Syst. Biol., 53, 989, 2004.
48. Steel, M., The complexity of reconstructing trees from qualitative characters and subtrees, J. Classif.,
9, 91, 1992.
49. Creevey, C.J. et al., Does a tree-like phylogeny only exist at the tips in the prokaryotes? Proc. Royal
Soc. Lond. B, 271, 2551, 2004.
50. Beiko, R.G., Harlow, T.J., and Ragan, M.A., Highways of gene sharing in prokaryotes, Proc. Natl.
Acad. Sci. USA, 102, 14332, 2005.
51. Sanderson, M.J. and Driskell, A.C., The challenge of constructing large phylogenetic trees, Trends
Plant Sci., 8, 374, 2003.
52. Theodorides, K. et al., Comparison of EST libraries from seven beetle species: towards a framework
for phylogenomics of the Coleoptera, Insect Mol. Biol., 11, 467, 2002.
53. Miya, M., Kawaguchi, A., and Nishida, M., Mitogenomic exploration of higher teleostean phylogenies:
a case study for moderate-scale evolutionary genomics with 38 newly determined complete mitochondrial
DNA sequences, Mol. Biol. Evol., 18, 1993, 2001.
54. Sanderson, M.J., Purvis, A., and Henze, C., Phylogenetic supertrees: assembling the trees of life,
Trends Ecol. Evol., 13, 105, 1998.
55. Rokas, A. et al., Genome-scale approaches to resolving incongruence in molecular phylogenies,
Nature, 425, 798, 2003.
56. Scotland, R.W., Olmstead, R.G., and Bennett, J.R., Phylogeny reconstruction: the role of morphology,
Syst. Biol., 52, 539, 2003.
57. Levasseur, C. and Lapointe, F.-J., War and peace in phylogenetics: a rejoinder on total evidence and
consensus, Syst. Biol., 50, 881, 2001.
58. Holmes, S., Statistics for phylogenetic trees, Theor. Popul. Biol., 63, 17, 2003.
59. Fuellen, G., Wägele, J.W., and Giegerich, R., Minimum conflict: a divide-and-conquer approach to
phylogeny estimation, Bioinformatics, 17, 1168, 2001.
6 Taxon Sampling versus Computational Complexity
and Their Impact on Obtaining the Tree of Life
O. R. P. Bininda-Emonds
Institut für Spezielle Zoologie und Evolutionsbiologie mit Phyletischem
Museum, Friedrich-Schiller-Universität Jena, Germany
A. Stamatakis
Swiss Federal Institute of Technology, School of Computer and Communication
Sciences, Lausanne, Switzerland
CONTENTS
6.1 Introduction
6.2 Materials and Methods
6.2.1 Simulation Protocol
6.2.2 Phylogenetic Analysis
6.2.3 Variables Examined
6.2.4 Software Availability
6.3 Results
6.3.1 Resolution
6.3.2 Accuracy
6.3.3 Running Time
6.4 Discussion
6.4.1 Accuracy and Speed
6.4.2 The Importance of Sampling Strategy
6.4.3 Implications for the Divide-and-Conquer Framework
6.5 Conclusions
Acknowledgements
References
ABSTRACT
The scope of phylogenetic analysis has increased greatly in the last decade, with analyses of
hundreds, if not thousands, of taxa becoming increasingly common in our efforts to reconstruct
the tree of life and study large and species rich taxa. Through simulation, we investigated the
potential to reconstruct ever larger portions of the tree of life using a variety of different methods
(maximum parsimony, neighbour joining, maximum likelihood and maximum likelihood with a
divide-and-conquer search algorithm). For problem sizes of 4, 8, 16 … 1,024, 2,048 and 4,096
taxa sampled from a model tree of 4,096 taxa, we examined the ability of the different methods
to reconstruct the model tree and the running times of the different analyses. Accuracy was generally
good, with all methods returning a tree sharing more than 85% of its clades with the model tree
on average, regardless of the size of the problem. Unsurprisingly, analysis times increased greatly
with tree size. Only neighbour joining, by far the fastest of the methods examined, was able to
solve the largest problems in under 12 hours. However, the trees produced by this method were
the least accurate of all methods (at all tree sizes). Instead, the strategy used to sample the taxa
had a larger impact on both accuracy and, somewhat unexpectedly, analysis times. Except for the
largest problem sizes, analyses using taxa that formed a clade generally both were more accurate
and took less time than those using taxa selected at random. As such, these results support recent
suggestions that taxon number in and of itself might not be the primary factor constraining
phylogenetic accuracy and also provide important clues for the further development of divide-and-conquer
strategies for solving very large phylogenetic problems.
6.1 INTRODUCTION
Reconstructing the tree of life accurately and precisely represents the holy grail of phylogenetics
and systematics. However, the impact of obtaining the tree goes well beyond these research fields
to include all of the life sciences because, as it was nicely put recently by Rokas and Carroll1, the
conclusions we make as phylogeneticists form part of the assumptions underlying the analyses of
the other biologists. Evolutionary information is now becoming increasingly included in fields as
diverse as comparative biology, genomics and pharmaceutics. In the past decade, the increasing
accumulation of phylogenetic data, made possible by the molecular revolution, has brought the
dream of realising a highly comprehensive tree of life tantalisingly close.
Currently, however, the continued lack of suitable phylogenetic data represents a proximate
hindrance in our efforts to reconstruct the tree of life. Although whole genomic data is becoming
available at an increasing rate (but more so for prokaryotic organisms with their smaller genomes),
molecular sampling has generally been sparse and restricted largely to model organisms and model
genes2,3. However, even with the prospect of abundant whole-genome sequence data, the ultimate
hindrance is the sheer size of the tree of life itself, which has been estimated to comprise anywhere
from 3.6 million to 100+ million species (but most commonly 10–15 million)4.
It has long been appreciated that the number of possible phylogenetic trees increases superexponentially
with the number of taxa5. For example, there are three distinct rooted phylogenetic
trees for three species, 15 for four species, 105 for five species, and so on. For only 67 species,
the number of possible trees is on the order of 10 to the power of 111 trees, a number that just
exceeds the volume of the universe in cubic Ångstroms (a comparison first heard by the first author
from David Hillis). Phylogenetic analyses are now routinely conducted on data sets of this size
and larger (up to hundreds of taxa). Albeit comparatively rare, analyses of thousands of taxa have
also been performed, mostly as proofs of concept for new algorithmic implementations. These
include a neighbour joining (NJ) analysis of nearly 8,000 sequences6, a maximum likelihood (ML)
analysis of 10,000 taxa7, and a maximum parsimony (MP) analysis of 13,921 taxa8,9. However, we
are unsure of the prospects of achieving a correct or nearly correct answer for studies of this size,
given the literally astronomical size of ‘tree space’.
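The counts quoted above follow from the standard formula (2n − 3)!! for the number of distinct rooted, fully resolved trees on n labelled taxa. The short sketch below simply evaluates that product; the function name is ours, and the formula is assumed to be the one underlying the figures in the text.

```python
def rooted_tree_count(n_taxa):
    """Number of rooted, fully resolved, labelled trees: (2n - 3)!!,
    the product of the odd numbers 3, 5, ..., 2n - 3 (1 for n = 2)."""
    count = 1
    for odd in range(3, 2 * n_taxa - 2, 2):
        count *= odd
    return count


for n in (3, 4, 5):
    print(n, rooted_tree_count(n))

# Order of magnitude for 67 taxa: number of decimal digits minus one.
print(len(str(rooted_tree_count(67))) - 1)
```

Because Python integers have arbitrary precision, the value for 67 taxa is computed exactly; it has 112 digits, that is, it is on the order of 10 to the power of 111.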
Compounding this limitation is the fact that the general problem of reconstructing a tree (or a
network, given that the tree of life is not always tree-like) from a given data set is one of a set of
non-deterministic polynomial time (NP) problems for which no efficient solution is known or, more
pessimistically, one for which no such solution potentially exists (NP-complete)10. Thus, the analysis
of larger data sets requires a disproportionately longer time (or disproportionately more computer
resources) and/or the use of increasingly less efficient heuristic search strategies, with both factors
impacting negatively on our ability to recover the best solution for that given data set. Fortunately,
several studies using empirical and/or simulated data have shown that even phylogenetic analyses
at the high end of the scale currently examined are both tractable and show acceptable, if not
surprising, accuracy with shorter sequence lengths than might be expected11–13, thereby reinforcing
some theoretical work in the latter area14,15. Additionally, advances in computer technology and
architecture such as parallel and distributed computing and programs that exploit them efficiently
in combination with the continual development of faster search strategies promise to make even
larger phylogenetic problems increasingly tractable. However, the NP-completeness of the phylogeny
problem represents a fundamental limitation in our efforts to unearth the tree of life.
As such, we face a dilemma in attempting to reconstruct the tree of life (or even major portions
thereof). Smaller problems are computationally easier to solve, but at the extreme, have been
demonstrated to be susceptible to the adverse effects of taxon sampling and, for parsimony in
particular, long branch attraction16 (for a review of the latter, see Bergsten17). In these cases, the
fact that DNA has only four character states can lead to a high number of convergent changes
(noise) along two long branches leading to unrelated taxa. These convergent changes can pull the
two branches together, thereby leading the phylogenetic analysis astray. Thus, the general consensus
is that, given a suitable sampling strategy18, the addition of species to a phylogenetic analysis is
usually beneficial in terms of accuracy because it ameliorates the effects of these two problems19,20
(see Rosenberg and Kumar21 for a contrary view). At some point, however, the computational complexity
of the phylogeny problem must begin to outweigh the benefits of adding taxa. Although it is
not stated explicitly in the literature, it seems that the general expectation is that phylogenetic accuracy
shows a convex distribution with respect to the number of taxa in the analysis, with taxon sampling
and computational complexity limiting accuracy when species numbers are low and high, respectively.
It remains to be demonstrated whether or not this expectation is true and, if so, at what point
accuracy is maximised, while simultaneously considering the running time of the analysis. Establishing
the latter could be especially important to the further development of the so-called ‘divide-and-conquer’
search strategies such as quartet puzzling22 and disk-covering9,23,24. These strategies
generally seek to solve large phylogenetic problems by breaking them down into numerous smaller
subproblems that are computationally easier to solve precisely because they are smaller with respect
to both the number of taxa and the evolutionary distance between those taxa. The results from the
subproblems are then combined to provide an answer for the initial, global problem. As such,
divide-and-conquer strategies essentially attempt to bridge the gap between the problems of taxon
sampling and computational complexity. However, it is unknown what the optimal sizes of the
subproblems should be in order to achieve the greatest accuracy in the shortest time possible. To
date, subproblem sizes have usually been determined empirically on a case-by-case basis.
Thus, the goal of this chapter is to extend previous analyses examining the scalability of
phylogenetic accuracy with respect to the number of species in the analysis (the ‘size’ of the analysis).
Specifically, we use simulation to investigate the changes in various parameters (accuracy, resolution
and running time) related to the analysis of increasingly larger phylogenetic problems under different
optimisation criteria (NJ, MP and ML) and methods of data set selection (random or clade sampling).
Our results elucidate the prospects for phylogenetic analyses of very large phylogenetic problems, as
might be needed to infer the tree of life or study large and species rich taxa, and provide additional
insights into the potential of divide-and-conquer search strategies within this context.
YULE_C procedure in the program r8s v1.6025. Branch lengths on the tree were modelled assuming
a model of substitution that departs from a molecular clock. Specifically, branch-specific rates of
evolution were determined by drawing random normal variates (mean of 1.0 and standard deviation
of 0.5, truncated outside of [0.1, 2.0]) and multiplying by an overall tree-wide rate of substitution.
Branch lengths were determined by multiplying branch-specific rates with branch durations
obtained from the Yule process model.
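The rate-variation step described above can be sketched as follows. This is a minimal illustration, not the procedure as implemented in r8s: it assumes that "truncated" means out-of-range draws are discarded and redrawn (clamping to the bounds would be another reading), and the branch durations and function names are hypothetical.

```python
import random

def draw_branch_rate(rng, mean=1.0, sd=0.5, low=0.1, high=2.0):
    """Draw a relative rate from N(mean, sd), redrawing until it lies in [low, high].

    Rejection sampling is an assumption about how the truncation was done.
    """
    while True:
        rate = rng.gauss(mean, sd)
        if low <= rate <= high:
            return rate

def branch_lengths(durations, tree_rate, rng):
    """Branch length = branch-specific relative rate x tree-wide rate x branch duration."""
    return [draw_branch_rate(rng) * tree_rate * d for d in durations]

rng = random.Random(1)
durations = [0.2, 0.5, 1.0, 0.05]   # hypothetical durations from a Yule tree
lengths = branch_lengths(durations, tree_rate=0.1, rng=rng)
print(lengths)
```

Because every relative rate lies between 0.1 and 2.0, each branch length is bounded by 0.1 and 2.0 times the clock-like expectation for that branch.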
A model data set was then created by evolving a nucleotide sequence down the model tree
using a standard Markov process model as implemented in Seq-Gen v.1.2.726. The sequence length
was 2,000 bp, which is of sufficient length for simulated data with its stronger signal to achieve
good accuracy for even the largest tree examined herein13, but is also short enough to keep running
times within acceptable limits. Sequences were generated under a Kimura 2-parameter model27
with a transition/transversion ratio (ti:tv) of 2.0, site-to-site rate heterogeneity (that is, Gamma
model) with shape parameter of 0.5, and an overall average rate of evolution of 0.1 substitutions/site,
measured along a path from the root to a tip of the tree. No invariant sites were explicitly modelled.
The model data set was then sampled to create test data sets where the number of taxa varied on
a log2 scale from 4 to 2,048. No sampling of characters was performed so that the sequence length
was always 2,000 bp. Taxon sampling was accomplished by either selecting taxa at random (random
sampling) or by selecting a single clade from the model tree of the same size as the number of taxa
to be retained (clade sampling); all other taxa were pruned from the test data sets. The expectation
is that clade sampling should result in improved accuracy, given that it minimises the evolutionary
diameter of the problem; this is the logic underlying the disk-covering family of divide-and-conquer
methods28. By contrast, random sampling will tend to result in an increased number of long branches
and/or extend the diameter of the problem, especially when the proportion of taxa sampled is very
low. Both factors have been demonstrated to reduce the accuracy of phylogenetic inference.
Clade sampling requires the model tree to possess at least one clade for all the test sizes.
Because this situation was difficult to achieve, clades that were within ±2.5% of the desired size
were used when there was no clade of exactly the size desired. When multiple clades for a given
size existed, one was chosen at random. If the model tree did not contain clades of all the desired
sizes, it was discarded, and a new model tree was generated.
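The two sampling schemes and the clade-size tolerance can be sketched as below. The representation of the model tree as a mapping from clade identifiers to clade sizes, and all function names, are assumptions made for illustration; the actual scripts used in the study are not described in the text.

```python
import random

def taxon_sizes(smallest=4, largest=2048):
    """Test sizes on a log2 scale: 4, 8, ..., 2048."""
    sizes = []
    n = smallest
    while n <= largest:
        sizes.append(n)
        n *= 2
    return sizes

def pick_clade(clade_sizes, target, rng, tolerance=0.025):
    """Return a clade of exactly `target` taxa if one exists, else one within
    +/-2.5% of `target`, else None. Ties are broken at random."""
    exact = [c for c, size in clade_sizes.items() if size == target]
    if exact:
        return rng.choice(exact)
    near = [c for c, size in clade_sizes.items()
            if abs(size - target) <= tolerance * target]
    return rng.choice(near) if near else None

def random_sample(taxa, target, rng):
    """Random sampling: keep `target` taxa chosen without replacement."""
    return rng.sample(taxa, target)

def tree_is_usable(clade_sizes, sizes, rng):
    """A model tree is kept only if every test size has a suitable clade."""
    return all(pick_clade(clade_sizes, s, rng) is not None for s in sizes)

rng = random.Random(0)
clade_sizes = {"c1": 4, "c2": 8, "c3": 1000, "c4": 1040}   # hypothetical clades
print(pick_clade(clade_sizes, 1024, rng))                   # c3 or c4, both within 2.5%
print(tree_is_usable(clade_sizes, taxon_sizes(), rng))      # False: no clade near 16, 32, ...
```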
Each subsampled data set (for both random and clade sampling) as well as the full data set
were analysed using three optimisation criteria, each of which accounted for the model of evolution
Kimura 2-parameter + Gamma (K2P + G) as far as possible: MP, NJ, and ML. For the four largest
matrices (512; 1,024; 2,048; and 4,096 taxa), a ML analysis in conjunction with a disk-covering
divide-and-conquer framework (ML-DCM3) was also used. Bayesian analysis was not examined
due to time and memory constraints29. Because Bayesian analysis samples from the posterior
distribution of trees, it is necessarily significantly slower than the other methods examined here,
especially if a high number of generations is employed to ensure reliable results. Even without
Bayesian analysis, each replicate required just over five days to complete.
Thus, the results for each individual run were based on data matrices all derived from the same
model set of molecular data evolved along the same model tree. This procedure differs substantially
from that used by Bininda-Emonds et al.13, in which model trees of the desired problem size were
generated (that is, there was no sampling performed). Additionally, for each subproblem size and
sampling strategy, the same alignment was analysed by each of MP, NJ, ML and where appropriate
ML-DCM3. In total, 50 runs were conducted, comprising nearly eight CPU months of analysis time.
a thorough heuristic (<256 taxa), the parsimony ratchet (<1,024 taxa31), and finally a greedy heuristic
(≤4,096 taxa). The thorough heuristic consisted of 100 random addition sequences with TBR branch-
swapping, with a maximum of 10,000 trees being retained at any time during the analysis. The
parsimony ratchet consisted of 10 batches of 100 iterative weighting steps, with 25% of the characters
receiving a weight of two at each step. Thereafter, all equally most parsimonious trees were used as
starting trees for a heuristic search using TBR branch swapping and limited to one hour of CPU time.
Each replicate used the same command file for the ratchet, which was created using the Perl script
PerlRat v1.0.9a. However, for the largest matrices, even the parsimony ratchet proved to be too slow
during the test phase, especially because of the use of a step matrix to account for the ti:tv ratio.
Therefore, a greedy heuristic was used, consisting of a single simple stepwise addition sequence
followed by TBR branch-swapping with a maximum of 10 trees being retained at any time.
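The size thresholds for the three MP search strategies can be summarised in a small lookup. The function name and return labels are hypothetical; the thresholds follow the text, under which 256 and 512 taxa fall to the ratchet and 1,024 or more taxa to the greedy heuristic.

```python
def mp_search_strategy(n_taxa):
    """Return the MP search strategy applied to a matrix of n_taxa taxa."""
    if n_taxa < 256:
        # 100 random addition sequences, TBR, at most 10,000 trees retained
        return "thorough"
    if n_taxa < 1024:
        # 10 batches of 100 reweighting steps, then TBR limited to 1 CPU hour
        return "ratchet"
    # single simple addition sequence, TBR, at most 10 trees retained
    return "greedy"

sizes = [2 ** k for k in range(2, 13)]          # 4, 8, ..., 4096
schedule = {n: mp_search_strategy(n) for n in sizes}
for n, strategy in schedule.items():
    print(n, strategy)
```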
NJ analyses were run in QuickTree32, with a Kimura translation used to determine the pairwise distances.
ML analyses used RAxML-V (Randomized Axelerated Maximum Likelihood)33, which is one
of the fastest and most accurate programs for ML-based phylogenetic inference. A key feature of
RAxML is its comparatively low memory consumption29, which in combination with its advanced
search algorithms and accelerated likelihood function33,34 makes it uniquely suitable for ML analyses
of large numbers of taxa. All RAxML analyses used the default hill-climbing search option (-f c)
with an HKY85 substitution model and an estimate of 50 distinct per-site evolutionary rate
categories (CAT). This HKY + CAT model is essentially empirically equivalent to the better known
HKY + I + G model, but requires fewer floating point operations and memory.
Finally, we also performed ML analyses using a divide-and-conquer search algorithm at the
largest problem sizes (512 or more taxa) using RAxML in concert with the Recursive Iterative
Disk Covering Method (Rec-I-DCM3)9. This combination of methods has been more formally
referred to as Rec-I-DCM3(RAxML); however, we use the simpler ML-DCM3 throughout this
chapter. Based on an initial ‘guide tree’ containing all taxa (here, the starting tree for the ML
analyses as computed by RAxML), Rec-I-DCM3 intelligently decomposes the data set into smaller
subproblems that overlap in their taxon sets. These subproblems are then solved using RAxML
(using the same parameters as above), with the respective subtrees merged into a comprehensive
tree with the Strict Consensus Merger23. This global tree was then further improved using RAxML
(using the fast hill climbing heuristic; option -f f) to construct the new guide tree. The processes
of decomposition, subproblem inference, subtree merging and global refinement were repeated for
three iterations. The maximum size of the subproblems was 25% of the size of the full data set,
as suggested in the user notes to Rec-I-DCM3.
A time limit of 12 hours was imposed on each individual analysis. This limit was never invoked
for the NJ analyses and only for the largest matrices for MP (4,096 only), ML (2,048 and 4,096, but
not always for both sizes) and ML-DCM3 (2,048 and 4,096). The use of a time limit will obviously
impact accuracy negatively and potentially penalise the more computationally intensive ML analyses
to a greater extent. However, the reality is that shortcuts of various types (for example, time limits or
less thorough search strategies) must be employed when analysing very large matrices, so this constraint
might represent a reasonable one. To judge the effects of imposing a time limit, one additional run was
performed for the full data set of 4,096 taxa with all methods being allowed to run to completion.
In all cases, the inferred tree was held to be the strict consensus of all equally optimal solutions.
All analyses were conducted on a cluster of unloaded 2.4-GHz Opteron 850 processors, each with
8 GB of RAM, located at the Department of Informatics at the Technical University of Munich.
All programs used (including those used to simulate the data) were compiled as needed for this
platform.
number of clades on the inferred tree relative to the total number of clades on a fully bifurcating
tree of the same size (n – 2 for an unrooted tree, where n = number of taxa). Resolution varies
between 0 and 1, with the former value indicating a completely unresolved bush and the latter
indicating a fully resolved tree. This parameter reflects the decisiveness of the analysis and is most
relevant for the MP analyses. NJ always returns a single, fully resolved tree, and ML analyses
invariably do so as well.
Accuracy was measured as the ability to reconstruct the model tree. In computer science, the
optimality score of an analysis (either in isolation or in relation to that of the model tree) is also often
used as a proxy for accuracy. However, the use of three different optimality criteria in this study
prevents such an approach, and the comparison to a known ‘true’ tree is perhaps more intuitive to
biologists. Accuracy was quantified using both the consensus fork index (CFI35,36) of the strict
consensus of the inferred and model trees and the symmetric difference (or partition metric) between
these two trees (dS37). The CFI indicates the proportion of clades shared between the two trees, whereas
dS indicates the number of clades found on one tree or the other, but not both. To make these values
comparable, dS was normalised according to the number of taxa on the trees (by dividing by 2n – 6,
where n = number of taxa38) and subtracted from one to derive a similarity measure equivalent to
CFI. Although it is not strictly accurate, we continue to refer to this metric as dS for convenience.
CFI and dS differ most importantly in how they treat polytomies in the inferred tree (the model
tree is always fully bifurcating). CFI treats all polytomies as errors, whereas dS essentially ignores
them because they do not specify any unique clades. Thus, in comparing a fully resolved tree with
a fully unresolved one, CFI = 0 and dS = 0.5. As such, the difference between CFI and dS is again
most relevant for the MP analyses, which are the only ones expected to produce trees that are not
fully resolved. For the comparison of two fully resolved trees, CFI = dS.
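The behaviour of the two measures can be checked with a small worked example. The sketch below assumes that each tree is stored as a set of its nontrivial bipartitions, each recorded as the side that excludes a fixed reference taxon (F here), and that both measures are normalised by the n – 3 internal splits of a fully resolved unrooted tree, which is the convention under which the 2n – 6 divisor and the two limiting cases given above hold. The function names and the six-taxon trees are hypothetical.

```python
def split(*taxa):
    """One bipartition, written as the side that excludes reference taxon F."""
    return frozenset(taxa)

def cfi(inferred, model, n_taxa):
    """Proportion of possible splits present in the strict consensus of both trees."""
    shared = inferred & model
    return len(shared) / (n_taxa - 3)

def ds_similarity(inferred, model, n_taxa):
    """One minus the symmetric difference, normalised by 2n - 6."""
    d_s = len(inferred ^ model)        # splits found on one tree but not both
    return 1 - d_s / (2 * n_taxa - 6)

n = 6
model    = {split("A", "B"), split("A", "B", "C"), split("A", "B", "C", "D")}
resolved = {split("A", "B"), split("A", "B", "C"), split("A", "B", "C", "E")}
partial  = {split("A", "B"), split("A", "B", "C")}   # one polytomy
star     = set()                                     # completely unresolved

for name, tree in [("resolved", resolved), ("partial", partial), ("star", star)]:
    print(name, cfi(tree, model, n), ds_similarity(tree, model, n))
```

For the partially resolved tree the dS-based similarity exceeds CFI, because the missing split counts against CFI but contributes only once to the symmetric difference.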
Finally, the running time for each analysis was recorded in seconds. Again, an upper limit of
12 hours (43,200 seconds) was imposed on all analyses. However, analysis times could still
substantially exceed this limit in some cases due to the discrete nature of the stopping mechanisms.
For instance, a search can be terminated only after the completion of an iteration or calculation of
an optimality score, both of which can represent long-running operations at the largest tree sizes.
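The overshoot follows directly from checking the clock only between indivisible steps. A minimal sketch, using a simulated clock so that it runs instantly; the five-hour iteration length and all names are hypothetical and do not describe the stopping code of any of the programs used.

```python
import time

def search_with_limit(iterate, limit_seconds, clock=time.monotonic):
    """Run `iterate` repeatedly; the limit is tested only between iterations."""
    start = clock()
    iterations = 0
    while clock() - start < limit_seconds:
        iterate()                 # cannot be interrupted part-way through
        iterations += 1
    return iterations, clock() - start

class FakeClock:
    """Simulated clock so the example does not have to wait for real time."""
    def __init__(self):
        self.now = 0.0
    def __call__(self):
        return self.now
    def advance(self, seconds):
        self.now += seconds

fake = FakeClock()
# Hypothetical search whose iterations each take five hours at the largest tree size.
iterations, elapsed = search_with_limit(lambda: fake.advance(5 * 3600), 43_200, clock=fake)
print(iterations, elapsed)
```

The third iteration starts at ten hours, inside the limit, and finishes at fifteen, so the recorded time exceeds 43,200 seconds.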
For each variable, results were compared using a multivariate analysis of variance (ANOVA),
with the method of analysis and sampling strategy as factors, and the size of the data set as a
covariate. The level of significance was α = 0.05. Fisher’s protected least significant difference
(PLSD) test was used to determine significant differences between categories within a factor.
• PerlRat.pl: www.uni-jena.de/~b6biol2/ProgramsMain.html
• RAxML: diwww.epfl.ch/~stamatak (under ‘software’)
• Rec-I-DCM3: www.cs.njit.edu/usman/RecIDCM3.html
6.3 RESULTS
6.3.1 RESOLUTION
Resolution was always one for each individual NJ, ML, and ML-DCM3 analysis. MP produced
trees that were significantly less resolved (P < 0.0001 for all pairwise comparisons) and, except
for a tree size of four with random sampling, were never fully resolved on average (Figure 6.1A).
Nevertheless, the MP trees were generally well resolved at all tree sizes, with the average resolution
always being greater than 0.90. Resolution for MP differs significantly with tree size (P < 0.0001),
showing a concave pattern that is noticeably higher at extremely small and extremely large tree
sizes. In the latter case, however, this is an artefact of only 10 trees being retained in analyses of
1,024 or more taxa. Otherwise, it appears that resolution reaches a plateau of about 0.90 for clade
sampling and 0.95 for random sampling. The average resolution for the MP analyses using clade
sampling was always significantly less than that for random sampling (P < 0.0001); the ratio of
the values for clade versus random sampling fell between 0.94 and 0.98 at all tree sizes
(Figure 6.1B). All methods yielded fully resolved trees, or nearly so for MP, in the time-unlimited
analyses (Table 6.1).
FIGURE 6.1 Resolution of trees inferred using MP from data sampled from a model matrix of 2,000 bp for
4,096 taxa. (A) Average resolution over 50 individual runs; error bars represent standard errors. (B) Ratio of
average resolutions from clade sampling as compared to random sampling. Resolution for all other optimisation
criteria was always 1.
[Panel A: average resolution against size of subsampled tree (log scale), for MP with random and clade sampling. Panel B: ratio of average resolution (clade / random) against size of subsampled tree (log scale).]
TABLE 6.1 Statistics Relating to a Time-Unlimited Analysis of the Full Dataset of 4,096 Taxa
Columns: Optimisation Criterion/Method; Resolution; Accuracy (CFI); Accuracy (1 – dS); Time (seconds)
6.3.2 ACCURACY
Accuracy, whether measured by CFI or dS, was generally good at all tree sizes and for all methods
(Figure 6.2A and Figure 6.3A). In all cases, accuracy was greater than 80% on average and often
better than 90%. Tree size had a variable impact on accuracy. It did not influence accuracy for
either ML-DCM3 (P = 0.4812; although only four sizes were tested for this method), MP as
measured by dS (P = 0.4132), or ML (P = 0.1995), but had a significant effect for both NJ (P =
0.0244) and MP as measured by CFI (P = 0.0087). However, the only clear trend is for NJ under
clade sampling where accuracy decreases with the size of the problem. In all the remaining cases,
the curves are reasonably flat and/or sigmoidal. Except for ML-DCM3, allowing all methods to
run to completion in the time-unlimited analyses produced significantly more accurate results when
compared to the 12-hour limited analyses (P < 0.0001 according to a one-sample t-test).
The different optimisation criteria/methods used also had an impact on the accuracy of the
solutions. When CFI was used to measure accuracy (Figure 6.2A), ML and ML-DCM3 were not
significantly different (P = 0.0763), and neither were MP and NJ (P = 0.6982). However, the trees
derived using the former methods were significantly more accurate than those from the latter (P <
0.0001). When dS was used (Figure 6.3A), ML trees were statistically indistinguishable from those
from either MP (P = 0.7037) or ML-DCM3 (P = 0.0618), although the latter two were significantly
different from one another (P = 0.0340). NJ yielded significantly worse trees in all cases (P < 0.0001).
Only the MP analyses showed a difference in accuracy as measured by the two metrics (compare
Figure 6.2A and Figure 6.3A), with the analogous values for dS being either equal to, or more
commonly, greater than those for CFI. The effect was the most pronounced for clade sampling,
which also produced solutions that were less resolved than were those from random sampling
(Figure 6.2B and Figure 6.3B).
For both NJ and MP (dS only), the sampling strategy had a significant effect on accuracy (P <
0.0001), with clade sampling generally leading to increasingly accurate solutions as the size of the
problem decreased. However, the two sampling strategies showed similar performance with respect
to accuracy for trees of 512 or more taxa. No effect was present for MP when accuracy was
measured using CFI (P = 0.1248). Likewise, there was no significant trend for ML with respect to
the sampling strategy (P = 0.0698). Random sampling produced slightly but significantly more
accurate trees with ML-DCM3 at the three relevant problem sizes examined for it (512; 1,024; and
2,048 taxa; P < 0.0001).
FIGURE 6.2 Phylogenetic accuracy of trees inferred using different methods from data sampled from a model
matrix of 2,000 bp for 4,096 taxa. Accuracy was measured as the value of the CFI between the inferred tree
and the model tree upon which the data were simulated; both trees were pruned so as to have identical taxon
sets. (A) Average accuracy over 50 individual runs; error bars represent standard errors. (B) Ratio of average
accuracy from clade sampling as compared to random sampling.
[Panel A: average similarity to model tree (CFI) against size of subsampled tree (log scale). Panel B: ratio of average CFI-similarity (clade / random) against size of subsampled tree (log scale).]
Running times were significantly influenced by all three factors and covariates examined, either
in isolation or in combination (all P < 0.0001). Fisher’s PLSD tests also revealed highly significant
differences (all P < 0.0001) between all pairs of categories within the factors of sampling strategy
and method of analysis.
For all optimisation criteria, running times increased approximately linearly with tree size on
a log-log scale (Figure 6.4A and Figure 6.4C). For each doubling in tree size, the running time of
NJ increased by a factor of about three on average (random sampling: 3.22 ± 0.60 (mean ± SE);
clade sampling: 3.33 ± 0.61). MP showed both the largest and most variable increases in running
time (random sampling: 6.54 ± 2.31; clade sampling: 12.26 ± 5.84). The largest increases for MP
occurred for the comparisons 8–16 and 64–128 taxa (random sampling) and 16–32, 32–64 and
64–128 taxa (clade sampling). Many of these high rates coincided either with the adoption of
a new, less thorough search strategy or with a given search strategy apparently becoming
‘overloaded’ for a given problem size. Finally, the rate increases for ML were intermediate between
those for NJ and MP and showed low variation (random sampling: 4.34 ± 0.69; clade sampling: 5.03 ± 0.87).
FIGURE 6.3 Phylogenetic accuracy of trees inferred using different methods from data sampled from a model
matrix of 2,000 bp for 4,096 taxa. Accuracy was measured as one minus the normalised value of the partition
metric between the inferred tree and the model tree upon which the data were simulated; both trees were
pruned so as to have identical taxon sets. (A) Average accuracy over 50 individual runs; error bars represent
standard errors. (B) Ratio of average accuracy from clade sampling as compared to random sampling.
[Panel A: average similarity to model tree (1 – dS) against size of subsampled tree (log scale). Panel B: ratio of average dS-similarity (clade / random) against size of subsampled tree (log scale).]
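A straight line on a log-log plot corresponds to a power law, t ∝ n^k, and an average increase by a factor f per doubling of tree size implies k = log2 f. The short calculation below converts the per-doubling factors reported above into approximate exponents. Given the standard errors on those factors, especially for MP, the exponents are rough indications only; the dictionary layout is an illustrative choice.

```python
import math

# Mean increase in running time per doubling of tree size, as reported in the text.
doubling_factor = {
    ("NJ", "random"): 3.22, ("NJ", "clade"): 3.33,
    ("MP", "random"): 6.54, ("MP", "clade"): 12.26,
    ("ML", "random"): 4.34, ("ML", "clade"): 5.03,
}

# If t(2n) / t(n) = f for every n, then t(n) is proportional to n ** log2(f).
exponent = {key: math.log2(f) for key, f in doubling_factor.items()}

for (method, sampling), k in exponent.items():
    print(f"{method} ({sampling}): running time grows roughly as n^{k:.2f}")
```

On this reading NJ scales somewhat below quadratically and ML slightly above, whereas the MP exponents are inflated by the jumps that accompany each change of search strategy.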
Compared to the other methods (Figure 6.4A), NJ always produced the shortest running times
(P < 0.0001), with the differences becoming the most marked at tree sizes of 16 taxa or greater.
FIGURE 6.4 Analysis times for trees inferred using different methods from data sampled from a model matrix
of 2,000 bp for 4,096 taxa. The time used for 4,096 taxa derives from the single time-unlimited analysis for
MP, ML and ML-DCM3. (A) Average analysis time over 50 individual runs; error bars represent standard
errors. (B) Ratio of average analysis time from clade sampling as compared to random sampling. (C) Ratio
of average analysis time for a given sample size as compared to previous sample size.
[Panel A: average analysis time in seconds (log scale) against size of subsampled tree (log scale). Panel B: ratio of average analysis time (clade / random) against size of subsampled tree (log scale). Panel C: ratio of average analysis time (sample size to next smaller sample size, log scale) against size of subsampled tree (log scale).]
The running times of MP and ML were roughly comparable, although MP was significantly faster
(P < 0.0001) on the whole. MP tended to run faster for the smallest and largest tree sizes with
clade sampling, whereas ML generally ran faster for all tree sizes under random sampling. In those
cases where the time limit was not exceeded, ML-DCM3 was faster on average than ML, but slower
than MP (in both cases by a factor of two to three).
Analysis times under clade sampling were almost always less than those for random sampling,
with the differences becoming smaller with increasing tree size (Figure 6.4B). For ML, running
times under clade sampling were faster by a factor of at least two for all tree sizes except four and
2,048. The most marked differences for MP were limited to tree sizes of 64 or fewer taxa, where
the differences were the largest for all the methods examined (including a factor of nearly 40 with
16 taxa). Beyond 128 taxa, the respective running times for the different sampling strategies were
approximately equal for MP. For 256 and 512 taxa, at least, this result reflects the more deterministic
nature of the parsimony ratchet, which must perform a set number of iterations.
6.4 DISCUSSION
Our simulation study produced three key findings, some unexpected:
• That accuracy is at best only weakly influenced by the size of the problem
• That the methods of inference examined produce solutions of comparable, and good,
accuracy
• That the sampling strategy employed has a significant effect on both accuracy (to a point)
and, more strongly, on the running time of the analysis
Naturally, caveats abound. Our study used simulated data, which tend to be ‘cleaner’ and contain
more phylogenetic signal than real sequence data. Moreover, the model of evolution used (HKY
+ G) is less complicated, and therefore less computationally intensive, than those more commonly
used. Finally, the methods of inference were refined to match the known model as precisely as
possible. Thus, our results might, in isolation, represent a best-case scenario. However, the comparative
aspects of our study should be accurate.
that the amount of sequence data (number of genes), and not the number of taxa, is the more critical
factor influencing phylogenetic accuracy. In our case, the alignment length of 2,000 bp was
specifically chosen because it apparently contained sufficient phylogenetic signal for all problem
sizes examined here13, thereby minimising the effects of sequence length. In real terms, 2,000 bp
would represent perhaps one or two genes of lengths that are typically used in phylogenetic analysis.
Although this might be an insufficient number of genes to achieve good accuracy1,39, it must again
be remembered that the simulated sequence data often contain much more signal than real data
due to the absence of gaps and a lack of noise that would arise because of alignment errors.
Similarly, good evidence exists that the performance of NJ with respect to accuracy, although
acceptable, is generally inferior to that produced by ML and weighted MP40. This difference in
performance, as well as the vastly shorter running times of NJ, derives from the absence of branch-
swapping in NJ to correct for suboptimal topologies created during the tree construction process.
As such, NJ by itself seems ideally suited as a method to very quickly generate a relatively accurate
starting tree for subsequent and more computationally intensive branch-swapping. Exactly such an
approach is implemented in PHYML41 and could equally well be applied to MP or within a distance
framework (minimum evolution, ME).
Our results also attest to the recent advances in heuristic search strategies, particularly in a ML
framework. Despite the increased complexity of ML as compared to MP (which has long been
viewed as an obstacle to ML analyses), both accuracy and running times were comparable between
the two optimisation criteria. Moreover, the more complex nature of the likelihood surface means
that, at most, only a few equally optimal solutions are usually found (and typically only a single
solution)42, thus providing a more resolved solution than is usually the case for the analogous MP
analyses (Figure 6.1A). Many would see this as being a desirable feature in that the ML estimate
of the phylogeny of a large, species-rich group could be argued to be more decisive and definitive
than the MP estimate.
In addition, whereas MP running times were kept in check by applying increasingly faster
heuristic search algorithms, all ML searches used the same standard hill climbing searches in
RAxML. A less thorough, but faster hill climbing strategy also exists, which was used to optimise
the global tree in the ML-DCM3 analyses based on previous empirical work showing it to work
the best of all methods in this context. As revealed by the single analysis of the full data set, the
use of this option makes the ML analyses faster (by a factor of 7.8) with virtually no loss in accuracy
(see Table 6.1). In fact, the ‘fast’ ML analyses were both faster and more accurate than the MP
analyses performed. However, it should also be realised that faster implementations of MP searches
than those used here, such as those implemented in TNT (available from http://www.zmuc.dk/public/phylogeny/TNT)43,
also exist. Rec-I-DCM3 has also been used in conjunction with TNT9, boosting
performance even further. A faster implementation of NJ than that used here was also recently
published6. At the same time, it should be pointed out that the computational ‘arms race’ is still
ongoing, with the latest version of RAxML, RAxML-VI-HPC (v2.1), showing significant speed
improvements over the version used here, particularly for very large data sets.
Altogether, these findings bode well, not only for reconstructing very large phylogenies, but
also for estimating support for the groups present in those phylogenies44. Analogous to our finding
that accuracy is relatively flat with respect to the size of the problem and, in the case of MP, to the
use of greedier heuristic search strategies, Salamin et al.44 found that estimated bootstrap frequencies
are apparently robust to the use of less effective branch-swapping methods during the tree searching
operations (for example, in decreasing order of searching thoroughness, TBR versus SPR versus
NNI; see Swofford et al.27 for descriptions). Again, the use of NJ offers a means to quickly generate
a reasonably accurate starting tree for further branch-swapping operations. Finally, an additional
option for quickly determining support values in a ML framework is the use of the resampling of
estimated log-likelihood (RELL) approximation45,46, which apparently can estimate bootstrap proportions
for a given tree more accurately than a true bootstrap analysis that uses fast heuristics to
search through tree space47.
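The idea behind the RELL approximation can be sketched in a few lines: the per-site log-likelihoods of each candidate tree are computed once, and bootstrap replicates are formed by resampling those values rather than by re-optimising each tree on resampled data. The sketch below is a simplified illustration under that assumption; the site log-likelihoods are invented, and the function is not taken from any of the programs cited.

```python
import random

def rell_proportions(site_lnl, n_replicates, rng):
    """Estimate bootstrap proportions by resampling site log-likelihoods.

    site_lnl[t][s] is the log-likelihood of site s under candidate tree t,
    held fixed at its original estimate (the RELL approximation).
    """
    n_trees = len(site_lnl)
    n_sites = len(site_lnl[0])
    wins = [0] * n_trees
    for _ in range(n_replicates):
        # Resample sites with replacement, as in an ordinary bootstrap.
        sites = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(tree[s] for s in sites) for tree in site_lnl]
        best = max(range(n_trees), key=totals.__getitem__)
        wins[best] += 1
    return [w / n_replicates for w in wins]

# Hypothetical per-site log-likelihoods for three candidate trees at six sites.
site_lnl = [
    [-2.0, -3.1, -1.5, -2.6, -4.0, -1.2],
    [-2.2, -2.9, -1.6, -2.4, -4.3, -1.3],
    [-2.5, -3.3, -1.4, -2.9, -4.1, -1.5],
]
proportions = rell_proportions(site_lnl, n_replicates=1000, rng=random.Random(42))
print(proportions)
```

Each replicate costs only a resampled sum per tree, which is why the approach is attractive for very large data sets.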
resolved based on our findings that the sampling strategy has a decreasing effect on accuracy and
inference times for proportionately larger sampling sizes, and that the choice of sampling strategy
does not significantly affect the accuracy of the ML analyses, which are known to be more immune
to the adverse effects of taxon sampling and long-branch attraction. As such, it should be possible
in a parallel implementation of Rec-I-DCM3(RAxML) to initially split the alignment naïvely into
relatively large subsets of approximately equal size (comprising approximately 12.5–50% of the
original dataset) based on the guide tree. This strategy should improve load balance without any
undue loss of performance. In turn, these large initial subproblems would then be optimised using
the more intelligent subdivision method employed by Rec-I-DCM3, where the benefits of clade
sampling take on greater importance. The potential utility of a similar naïve division method has
been observed with a proprietary divide-and-conquer algorithm implemented in RAxML
(Stamatakis unpublished data).
FIGURE 6.5 Number of analyses for a given tree size that could be completed in the same time needed for
an analysis of 4,096 taxa. For each tree size, the average running time over the 50 runs (see Figure 6.4) was
compared against the time required to analyse 4,096 taxa from the time-unlimited analyses (see Table 6.1).
The black line represents the number of analyses required such that 4,096 taxa are analysed in total.
[Plot: number of analyses (log scale) against tree size (log scale), for MP, NJ and ML under random and clade sampling.]
size of 32 taxa, this amounts to 128 individual analyses (although a global tree of all 4,096 taxa
could not be derived from these analyses because the trees do not overlap). However, Figure 6.5
reveals that over 20,000 MP analyses of 32 taxa selected using clade sampling could be conducted
in the time taken for a single MP analysis of 4,096 taxa, or 176 sets of 128 analyses. Note also
that these particular numbers are underestimates, given that the MP search strategy used for 4,096
taxa was considerably less robust, and therefore comparatively faster, than that used for 32 taxa.
For all optimisation criteria (including NJ) and at virtually all subproblem sizes, the time savings
are similarly enormous; only MP analyses at tree sizes of 128 to 512 taxa show a decrease in time
(Figure 6.5). Thus, there is tremendous scope in a divide-and-conquer framework for many individual
analyses to ensure high overlap between the trees, a factor that has been shown to improve
the accuracy of the merged supertree51.
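The counts quoted in this paragraph can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the variable names are ours and purely illustrative, and the input figures are those quoted in the text.

```python
# Figures quoted in the text; variable names are illustrative only.
total_taxa = 4096
subproblem_size = 32

# Number of non-overlapping subproblems needed to cover every taxon once.
analyses_per_set = total_taxa // subproblem_size

# The text reports that 176 such sets fit in the time of one 4,096-taxon MP analysis.
sets_in_same_time = 176
analyses_in_same_time = sets_in_same_time * analyses_per_set

print(analyses_per_set)       # 128
print(analyses_in_same_time)  # 22528, consistent with "over 20,000"
```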
Although it is tempting to try to derive the optimal subproblem size from Figure 6.5 under
the assumption that accuracy is approximately flat with respect to the size of the subproblem, it
must be remembered that these numbers do not account for the initial accuracy of the merged
solution and, therefore, the time required for any global optimisations of it. Because such
optimisations are computationally expensive (which accounts for the proportionately longer analysis times
of larger solutions), they represent an important performance bottleneck. For example, the global
optimisation step, even with the use of the fast ML algorithm, consumed the most execution time
for the ML-DCM3 analyses. Moreover, there is a general consensus among researchers involved
in the development of divide-and-conquer algorithms that global optimisations must be applied at
some point to obtain the most accurate trees possible52. As such, the role for divide-and-conquer
strategies will be, as for NJ, to yield as good a starting tree as possible in as little time as possible. Research
should now focus, therefore, on determining the optimal subproblem size and merger method that
maximise the accuracy of the merged tree (so as to minimise the global optimisation time) in as
short a time as possible. To our knowledge, there has been little work in this area (although the
Rec-I-DCM3 user guide suggests a maximum subproblem size of 25% of the global size), nor in
examining the accuracy of the merged tree without any subsequent global optimisation. Additional
benefits would derive from pursuing this course of action with an eye toward the development of
efficient parallel optimisation methods, particularly for the computationally intensive global
optimisation step.
6.5 CONCLUSIONS
Together with other similar findings11–13, the results we present here are encouraging for the
prospects of building ever larger phylogenetic trees in our efforts to reconstruct the tree of life.
Continued advances in computer technology and algorithm design can only increase our
feeling of optimism. Even so, it must be remembered that even 10,000 taxa, the approximate limit
for all simulations performed to date, represent only a minute fraction of the entire tree of life.
Larger problems have been analysed successfully, but without any real knowledge of how accurate
the answers might be. We simply do not know at this point how far the scalability of acceptable
accuracy extends. As such, it seems clear that a divide-and-conquer approach, whereby we can
break the problem down into pieces where we are confident of achieving good accuracy (and in
less time), must form a necessary part of our efforts to obtain the tree of life.
ACKNOWLEDGEMENTS
We thank Trevor Hodkinson and John Parnell for the invitation to contribute to this volume. We
are also grateful to the Department of Informatics at the Technical University of Munich for
providing access to their Infiniband computer cluster and to Usman Roshan for allowing us to
include the (at the time) unpublished Rec-I-DCM3(RAxML) procedure in our simulations. This
work was funded as part of the NGFN-funded project Bioinformatics for the Functional Analysis
of Mammalian Genomes (BFAM) (Olaf Bininda-Emonds).
REFERENCES
1. Rokas, A. and Carroll, S.B., More genes or more taxa? The relative contribution of gene number and
taxon number to phylogenetic accuracy, Mol. Biol. Evol., 22, 1337, 2005.
2. Bininda-Emonds, O.R.P., Supertree construction in the genomic age, in Molecular Evolution:
Producing the Biochemical Data, part B, Zimmer, E.A. and Roalson, E., Eds., Methods in Enzymology,
Vol. 395, Elsevier, Amsterdam, 2005, 745.
3. Sanderson, M.J. and Driskell, A.C., The challenge of constructing large phylogenetic trees, Trends
Pl. Sci., 8, 374, 2003.
4. Wilson, E.O., Taxonomy as a fundamental discipline, Philos. Trans. R. Soc. Lond. B, 359, 739, 2004.
5. Felsenstein, J., The number of evolutionary trees, Syst. Zool., 27, 27, 1978.
6. Mailund, T. and Pedersen, C.N., QuickJoin: fast neighbour-joining tree reconstruction, Bioinformatics,
20, 3261, 2004.
7. Stamatakis, A., Parallel inference of a 10,000-taxon phylogeny with maximum likelihood, in Proceedings
of 10th International Euro-Par Conference (Euro-Par 2004), Springer Verlag, 2004, 997.
8. Coarfa, C. et al., PRec-I-DCM3: a parallel framework for fast and accurate large scale phylogeny
reconstruction, in The First IEEE Workshop on High Performance Computing in Medicine and Biology
(HiPCoMP 2005), Fukuoka, Japan, 2005.
9. Roshan, U. et al., Rec-I-DCM3: a fast algorithmic technique for reconstructing large phylogenetic
trees, in Proceedings of the IEEE Computational Systems Bioinformatics conference (CSB), IEEE
Computer Society Press, Stanford, California, 2004.
10. Garey, M.R. and Johnson, D.S., Computers and Intractability: a Guide to the Theory of NP-Completeness,
W.H. Freeman, San Francisco, 1979.
11. Salamin, N., Hodkinson, T.R., and Savolainen, V., Towards building the tree of life: a simulation study
for all angiosperm taxa, Syst. Biol., 54, 183, 2005.
12. Hillis, D.M., Inferring complex phylogenies, Nature, 383, 130, 1996.
13. Bininda-Emonds, O.R.P. et al., Scaling of accuracy in extremely large phylogenetic trees, in Pacific
Symposium on Biocomputing 2001, Altman, R.B. et al., Eds., World Scientific Publishing Company,
River Edge, NJ, 2000, 547.
14. Erdös, P.L. et al., A few logs suffice to build (almost) all trees (I), Random Struc. Alg., 14, 153, 1999.
15. Erdös, P.L. et al., A few logs suffice to build (almost) all trees: part II, Theoret. Comput. Sci., 221,
77, 1999.
16. Huelsenbeck, J.P. and Hillis, D.M., Success of phylogenetic methods in the four-taxon case, Syst.
Biol., 42, 247, 1993.
17. Bergsten, J., A review of long-branch attraction, Cladistics, 21, 163, 2005.
18. Hillis, D.M., Taxonomic sampling, phylogenetic accuracy, and investigator bias, Syst. Biol., 47, 3,
1998.
19. Pollock, D.D. et al., Increased taxon sampling is advantageous for phylogenetic inference, Syst. Biol.,
51, 664, 2002.
20. Graybeal, A., Is it better to add taxa or characters to a difficult phylogenetic problem? Syst. Biol., 47,
9, 1998.
21. Rosenberg, M.S. and Kumar, S., Incomplete taxon sampling is not a problem for phylogenetic
inference, Proc. Natl. Acad. Sci. USA, 98, 10751, 2001.
22. Strimmer, K. and von Haeseler, A., Quartet puzzling: a quartet maximum-likelihood method for
reconstructing tree topologies, Mol. Biol. Evol., 13, 964, 1996.
23. Huson, D.H., Nettles, S.M., and Warnow, T.J., Disk-covering, a fast-converging method for
phylogenetic tree reconstruction, J. Comput. Biol., 6, 369, 1999.
24. Huson, D.H., Vawter, L., and Warnow, T.J., Solving large scale phylogenetic problems using DCM2,
in Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology,
Lengauer, T. et al., Eds., AAAI Press, Menlo Park, California, 1999, 118.
25. Sanderson, M.J., r8s: inferring absolute rates of molecular evolution and divergence times in the
absence of a molecular clock, Bioinformatics, 19, 301, 2003.
26. Rambaut, A. and Grassly, N.C., Seq-Gen: an application for the Monte Carlo simulation of DNA
sequence evolution along phylogenetic trees, Comput. Appl. Biosci., 13, 235, 1997.
27. Swofford, D.L. et al., Phylogenetic inference, in Molecular Systematics, Hillis, D.M., Moritz, C., and
Mable, B.K., Eds., Sinauer Associates, Sunderland, Massachusetts, 1996, 407.
28. Roshan, U. et al., Performance of supertree methods on various data set decompositions, in
Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, Bininda-Emonds, O.R.P., Ed.,
Kluwer Academic, Dordrecht, Netherlands, 2004, 301.
29. Stamatakis, A., Ludwig, T., and Meier, H., New, fast and accurate heuristics for inference of large
phylogenetic trees, in Proceedings of 18th IEEE/ACM International Parallel and Distributed
Processing Symposium (IPDPS2004), High Performance Computational Biology Workshop, IEEE Computer
Society, Santa Fe, New Mexico, 2004.
30. Swofford, D.L., PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4,
Sinauer Associates, Sunderland, MA, 2002.
31. Nixon, K.C., The parsimony ratchet, a new method for rapid parsimony analysis, Cladistics, 15, 407,
1999.
32. Howe, K., Bateman, A., and Durbin, R., QuickTree: building huge neighbour-joining trees of protein
sequences, Bioinformatics, 18, 1546, 2002.
33. Stamatakis, A., Ludwig, T., and Meier, H., RAxML-III: a fast program for maximum likelihood-based
inference of large phylogenetic trees, Bioinformatics, 21, 456, 2005.
34. Stamatakis, A.P. et al., AxML: a fast program for sequential and parallel phylogenetic tree calculations
based on the maximum likelihood method, Proc. IEEE Comput. Soc. Bioinform. Conf., 1, 21, 2002.
35. Colless, D.H., Congruence between morphometric and allozyme data for Menidia species: a
reappraisal, Syst. Zool., 29, 288, 1980.
36. Colless, D.H., Predictivity and stability in classifications: some comments on recent studies, Syst.
Zool., 30, 325, 1981.
37. Robinson, D.F. and Foulds, L.R., Comparison of phylogenetic trees, Math. Biosci., 53, 131, 1981.
38. Steel, M., The complexity of reconstructing trees from qualitative characters and subtrees, J. Classif.,
9, 91, 1992.
39. Rokas, A. et al., Genome-scale approaches to resolving incongruence in molecular phylogenies,
Nature, 425, 798, 2003.
40. Hillis, D.M., Huelsenbeck, J.P., and Cunningham, C.W., Application and accuracy of molecular
phylogenies, Science, 264, 671, 1994.
41. Guindon, S. and Gascuel, O., A simple, fast, and accurate algorithm to estimate large phylogenies by
maximum likelihood, Syst. Biol., 52, 696, 2003.
42. Rogers, J.S. and Swofford, D.L., Multiple local maxima for likelihoods of phylogenetic trees: a
simulation study, Mol. Biol. Evol., 16, 1079, 1999.
43. Goloboff, P.A., Analyzing large data sets in reasonable times: solutions for composite optima,
Cladistics, 15, 415, 1999.
44. Salamin, N. et al., Assessing internal support with large phylogenetic DNA matrices, Mol. Phylogenet.
Evol., 27, 528, 2003.
45. Hasegawa, M. and Kishino, H., Accuracies of the simple methods for estimating the bootstrap
probability of a maximum-likelihood tree, Mol. Biol. Evol., 11, 142, 1994.
46. Kishino, H., Miyata, T., and Hasegawa, M., Maximum likelihood inference of protein phylogeny and
the origin of chloroplasts, J. Mol. Evol., 31, 151, 1990.
47. Waddell, P.J., Kishino, H., and Ota, R., Very fast algorithms for evaluating the stability of ML and
Bayesian phylogenetic trees from sequence data, in Genome Informatics 2002, Lathrop, R. et al.,
Eds., Universal Academy Press, Tokyo, 2002, 82.
48. Kim, J., General inconsistency conditions for maximum parsimony: effects of branch lengths and
increasing numbers of taxa, Syst. Biol., 45, 363, 1996.
49. le Vinh, S. and Von Haeseler, A., IQPNNI: moving fast through tree space and stopping in time, Mol.
Biol. Evol., 21, 1565, 2004.
50. Du, Z. et al., Parallel divide-and-conquer phylogeny reconstruction by maximum likelihood, in High
Performance Computing and Communications: First International Conference, HPCC 2005, Sorrento,
Italy, September, 21–23, 2005, Proceedings, Dongarra, J. et al., Eds., Springer Verlag, Berlin, 2005,
776.
51. Bininda-Emonds, O.R.P. and Sanderson, M.J., Assessment of the accuracy of matrix representation
with parsimony supertree construction, Syst. Biol., 50, 565, 2001.
52. Roshan, U., Fast Algorithmic Techniques for Large Scale Phylogenetic Reconstruction, Ph.D. thesis,
University of Texas at Austin, 2004.
ABSTRACT
This chapter describes some of the ways in which mathematical techniques provide insights and useful
tools for reconstructing and analysing large trees and networks that will be required for species rich
groups. Classically, ‘mathematical biology’ conjures up images of complex systems of differential
equations; however, in phylogenetics quite different approaches are appropriate. Discrete mathematics,
particularly algorithmic methods, graph theory and combinatorics, along with probability theory and
statistics are the tools of choice. We describe how they can be used to address a range of topical
issues relevant to constructing a tree of life. Why might large trees be useful? Does it even make
sense to talk about a tree in the presence of reticulate evolution (and how does tree incongruence
allow one to quantify the extent of reticulation?). How can one best combine trees on overlapping
sets of taxa into supertrees or supernetworks? And how is biodiversity lost as species go extinct? At
present most of these questions have only partial answers that are undergoing constant revision. Their
full solution is a challenge for the future that will involve a close interplay between (at least) four
disciplines: biology, mathematics, statistics and computer science.
from its set of clusters). Often the edges of the trees (rooted or unrooted) will have a (branch) length,
corresponding perhaps to the expected amount of evolutionary change on that edge. For a rooted
tree, if the sum of these branch lengths from the root to any leaf is the same (for each leaf) we say
the branch lengths satisfy a molecular clock. Of course, for any tree (including ones constructed
from non-molecular data), if the vertices all have (temporal) dates and the leaves represent
contemporaneous taxa, then assigning each branch the difference between the dates of its two vertices
also gives branch lengths that satisfy a molecular clock.
Each viewpoint conveys some truth, but none is compelling in itself. For example, the traditionalist
imposes a hierarchy a priori, whilst a more objective approach would allow the data to reveal how
well it fits this scheme; experience with genetic data has shown that numerous relationships have
turned out to violate a traditional classification (for example, see Maley and Marshall9 and the
references therein). The logician is technically correct (it is a mathematical theorem that the
collection of rooted trees on triples of species determines uniquely the global tree), but this view ignores
a statistical reality: namely, one simply cannot infer all 3-taxon trees accurately due to sampling
effects, model violation and site saturation on long branches (more about this below). Similarly,
the view concerning life as a tangled network championed by Ford Doolittle and colleagues10,11 is
no doubt correct up to a point; however, it may still make sense to talk about a tree of life, and we
describe below two ways this can be done.
is that evolution has involved reticulate processes such as horizontal gene transfer and the formation
of hybrid species (for example in certain plant, insect and animal species25), gene fusion and
endosymbiosis. Furthermore, a species is not a single entity, but rather a population of individuals,
and under sexual reproduction recombination can further complicate a tree-like description of
ancestry. These and other details throw into doubt the plausibility of constructing any well defined
notion of a tree of life, as noted by several authors (for example, Bapteste10). Wayne Maddison26
has also explored the related question of what really constitutes a ‘phylogeny’.
In this section we describe a simple mathematical result that shows how an underlying tree of
life always can be defined (and exists) even in the presence of these various complications. To
explain this result, first recall that a hierarchy C on X is a collection of subsets of X, containing X,
and satisfying the property
$$A, B \in C \;\Rightarrow\; A \cap B \in \{\emptyset, A, B\},$$
that is, the sets in C are nested; if they have one or more species in common then one set is a
subset of the other. It is a classical result that a hierarchy on X forms a tree whose leaves are
labelled with subsets of X that partition X.
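The nesting condition is easy to test mechanically. The following Python sketch checks whether a collection of subsets of X is a hierarchy in the sense just defined; the function name `is_hierarchy` and the example clusters are ours, chosen only for illustration.

```python
from itertools import combinations


def is_hierarchy(clusters, X):
    """Return True if the collection contains X and every two members are
    either disjoint or nested (one is a subset of the other)."""
    sets = {frozenset(c) for c in clusters}
    if frozenset(X) not in sets:
        return False
    # A ∩ B must be empty, A, or B for every pair A, B in the collection.
    return all((a & b) in (frozenset(), a, b) for a, b in combinations(sets, 2))


X = {"a", "b", "c", "d"}
nested = [X, {"a", "b"}, {"c", "d"}, {"a"}]
overlapping = nested + [{"b", "c"}]  # {b, c} overlaps {a, b} without nesting

print(is_hierarchy(nested, X))
print(is_hierarchy(overlapping, X))
```

The first collection is nested and so passes the check; the second fails because {b, c} and {a, b} share b but neither contains the other.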
One can define a tree of life and avoid problematic notions concerning the definition of species
by working at the level of individual organisms. Furthermore, all one needs to assume about
evolution for this definition is that each organism on earth (now or in the past, excluding the first
organisms) had at least one parent who originated before that organism. We show now how this
assumption alone allows one to define an underlying tree. This tree does not represent the detailed
history of ancestry of individual organisms (after all, sexual reproduction is inherently reticulate
and so is represented by a pedigree graph rather than a tree). Rather, we describe a coarser structure,
based on subsets of extant organisms that form nested clusters (and hence a tree) according to a
property of their ancestry.
Of course this definition of a tree of life should not be taken too literally; the purpose here is
much more modest, namely to show that one can define such a tree even in the face of the many
complications of evolutionary biology mentioned above. Also, although the tree we discuss can in
principle be computed (and in polynomial time), it requires knowing some detailed information
about ancestry, and is unlikely to be feasible, at least at present. Nevertheless it is interesting to
speculate what the tree we describe here looks like, and the approach may provide some enticing
questions and fruitful approaches for future work.
Let X be the set of all extant living taxa, that is, all living organisms currently on Earth. Note
that we are not regarding here X as a set of species or populations, but of individuals. Let Ω denote
the (large but finite) collection of all living organisms throughout the history of life on Earth, and
for any real number t ≥ 0, let Ωt denote all organisms that were alive anytime up to t years ago.
Thus X = Ω0. For x ∈ Ω, let t(x) denote the time when organism x first arose (i.e., was born),
measured, say, in years. Thus:
$$t(x) = \max\{t \geq 0 : x \in \Omega_t\}.$$
We suppose that each organism that has ever existed arose from one or more parent organism(s)
either:
• By haploid reproduction, or
• By diploid (sexual) reproduction, or
• By some higher-level process involving two or more parent organisms (for example a
complex endosymbiosis event), or
• By being part of the initial population P0 that constituted an origin of life.
Stated formally, for each organism x ∈ Ω – P0 there is a subset p(x) of Ω (of size 1, 2 or
possibly higher) and with the following property (for some fixed ε > 0):
which merely formalises a familiar fact: parents originate before their offspring.
We refer to the triple L = ((Ωt, t ≥ 0), P0, p) as a history of life; it is essentially a pedigree on
a grand scale showing all organisms and their parents back through time to the origin of life. There
are two ways to define a natural system of clusters from L, and we will see below (Proposition 7.1.1)
that they are actually equivalent and form a tree.
For a ∈ X, let Pt(a) denote the set of organisms that lived up to t years ago and which have a
as a descendant. Formally, Pt(a) is the subset of Ωt consisting of those x in Ωt for which there is
a sequence of organisms, x = x0, x1, x2 ,…, xk = a (for some k) and with xi ∈p(xi+1) for each i. We
say that a set of extant organisms A ⊆ X is an L-cluster if it satisfies the following property: there
is some time t for which the following holds for all a, a′ ∈ A and all x ∈ X − A:
$$P_t(a) \cap P_t(a') \neq \emptyset \quad \text{and} \quad P_t(a) \cap P_t(x) = \emptyset. \qquad (7.1)$$
In words, this property states there was some time t (measured, say, in years) for which any two
organisms in A shared an ancestor that lived at most t years ago, and such that any organism that
was an ancestor of both an organism in A and an organism not in A lived more than t years ago.
Let CL denote the set of L-clusters of X.
The second way to define a collection of subsets of X uses distances. Define a ‘distance’ dL
on X as follows: let dL(x, y) be the first time before the present when x and y shared an ancestral
organism. Formally:
$$d_L(x, y) := \min\{t \geq 0 : P_t(x) \cap P_t(y) \neq \emptyset\}.$$
Proposition 7.1.1 For any life history L satisfying (P1), the set CL is precisely the set of Apresjan
clusters of dL and so forms a hierarchy (or equivalently a rooted tree). Furthermore, CL can be
reconstructed from dL in $O(|X|^2)$ time.
Proof. Suppose that A is an L-cluster of X. Let tA denote a value of t for which (7.1) holds for all
a, a′ ∈ A and x ∈ X − A. Then for all a, a′ ∈ A, x ∈ X − A, we have $\max\{d_L(a, a') : a, a' \in A\} \leq t_A$ and,
by (P1), $\min\{d_L(a, x) : a \in A, x \in X - A\} > t_A$, so that A is an Apresjan cluster of dL. Conversely, suppose
that A is an Apresjan cluster of dL. Let $t = \max\{d_L(a, a') : a, a' \in A\}$. Then t satisfies (7.1) for
all a, a′ ∈ A and x ∈ X − A, and so A is an L-cluster. The last part of Proposition 7.1.1 follows from
Corollary 2.1 of Bryant and Berry27 that provides an explicit algorithm to reconstruct the Apresjan
clusters from any distance function.
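To make the notion of an Apresjan cluster concrete, the following Python sketch enumerates them by brute force. It assumes the usual definition, which is not restated in this passage: A is an Apresjan cluster of d if d(a, a′) < d(a, x) for all a, a′ ∈ A and all x ∉ A. This is a naive search written for clarity; it is not the O(|X|²) algorithm of Bryant and Berry, and the function names and the example distances are hypothetical.

```python
def is_apresjan(ball, taxa, d):
    """Check d(u, v) < d(u, x) for all u, v inside the set and all x outside."""
    outside = [x for x in taxa if x not in ball]
    return all(d(u, v) < d(u, x) for u in ball for v in ball for x in outside)


def apresjan_clusters(taxa, d):
    """Return every Apresjan cluster of the symmetric function d on taxa."""
    taxa = list(taxa)
    found = set()
    for a in taxa:
        # Under the assumed definition, any cluster containing a must be a
        # 'ball' {x : d(a, x) <= r}, so only those candidates are tested.
        for r in sorted({d(a, x) for x in taxa}):
            ball = frozenset(x for x in taxa if d(a, x) <= r)
            if is_apresjan(ball, taxa, d):
                found.add(ball)
    return found


# Hypothetical 'first shared ancestor' times for four extant organisms.
PAIRS = {frozenset("ab"): 1, frozenset("cd"): 2}


def d_example(x, y):
    if x == y:
        return 0
    return PAIRS.get(frozenset((x, y)), 5)


clusters = apresjan_clusters("abcd", d_example)
for c in sorted(clusters, key=lambda s: (len(s), sorted(s))):
    print(sorted(c))
```

For these distances the clusters are the four singletons, {a, b}, {c, d} and the full set, which are nested and therefore correspond to a rooted tree.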
Although it may be reassuring to know that a tree of life can still be defined in the presence
of complications such as reticulation, such a tree will inevitably miss much of the detail and richness
of evolutionary history, and may be largely unresolved in places. To describe a second type of tree
that underlies evolution, though at a higher species level, we first need to talk about the sorts of
networks that have been proposed to represent reticulate evolution.
Proposition 7.1.2 Let H = (V, E) be a hybrid phylogeny on X. Then the collection $\{c(v) : v \in V_T\}$ forms
a hierarchy on X, and so forms a tree.
$$h(H) = \sum_{v \in V - \rho} \left(d^{-}(v) - 1\right)$$
where $d^{-}(v)$ is the in-degree of vertex v. It is easily seen that h(H) = 0 precisely if H is a tree. For
the hybrid phylogeny H in Figure 7.1, h(H) = 2.
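The quantity h(H) is straightforward to compute from a list of directed edges. The sketch below is a minimal Python illustration; the function name and both example graphs are hypothetical, and the network is not claimed to be the one drawn in Figure 7.1.

```python
from collections import Counter


def hybridisation_number(edges, root):
    """Sum of (in-degree - 1) over all non-root vertices of a rooted digraph.

    edges is a list of (parent, child) pairs.
    """
    indegree = Counter(child for _, child in edges)
    vertices = {v for edge in edges for v in edge}
    return sum(indegree[v] - 1 for v in vertices if v != root)


# A rooted tree: every non-root vertex has exactly one parent.
tree = [("r", "u"), ("r", "c"), ("u", "a"), ("u", "b")]

# A hypothetical network in which x and y each have two parents (u and v).
network = [
    ("r", "u"), ("r", "v"),
    ("u", "a"), ("u", "x"), ("v", "x"), ("x", "b"),
    ("u", "y"), ("v", "y"), ("y", "c"), ("v", "d"),
]

print(hybridisation_number(tree, "r"))     # 0
print(hybridisation_number(network, "r"))  # 2
```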
Now, suppose we have a collection of phylogenetic trees constructed from different genes.
Assuming each tree correctly represents the history of the corresponding gene, any
incompatibility between the trees must be due to other processes such as reticulate evolution or lineage
sorting. One can then ask for the fewest reticulate events required to explain the incompatibility.
This question can be phrased more precisely as follows: Given rooted phylogenetic X-trees
T1, T2, …, Tk, find a hybrid phylogeny H that minimises h(H) and displays T1, …, Tk. For example,
consider Figure 7.1. This hybrid phylogeny displays both of the trees on the right with h(H) = 2, and this is
the minimum value possible.
It was recently shown that, even for two rooted binary trees, this optimisation problem is
computationally intractable38; nevertheless there are useful mathematical theorems that allow for
lower bounds on h(H) to be established, and these are often strong enough to pin down its exact
value if the degree of reticulation is not too extreme36,39. A different, information-based approach
to quantify reticulation, based on a notion of phylogenetic ‘compression’, has also been described
recently by Ané and Sanderson30.
FIGURE 7.1 A hybrid phylogeny (left) that displays the two trees on the right.
Methods of the first type include Splits Graphs45, NeighborNet46, Median Networks47, Consensus
Networks48 and Z-closure networks49, all of which were developed by mathematicians. They provide
useful representations of data. Methods of the second type include the supernetwork approach of
Huson et al.37 based on modifying split decomposition, and several other approaches34,50,51. A
particularly simple and general approach to network construction is to construct the ‘cover digraph’
of a set of clusters (subsets of X); in some cases this can be an effective strategy for reconstructing
a reticulate network when sufficient phylogenetic signals ‘accumulate’ in an evolutionary process50.
A mathematically elegant technique to construct a network that displays a tree with multiple labels
(arising from polyploidy) has also recently been developed52.
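As an illustration of the cover digraph idea, the following Python sketch joins each cluster to the clusters it immediately contains. We assume here that 'cover digraph' has its standard order-theoretic meaning (an arc from A to B when B is a proper subset of A and no other cluster in the collection lies strictly between them); the function name and the example clusters are hypothetical.

```python
def cover_digraph(clusters):
    """Return the arcs (A, B) of the cover relation of set inclusion:
    B is a proper subset of A with no cluster strictly in between."""
    sets = {frozenset(c) for c in clusters}
    arcs = set()
    for big in sets:
        for small in sets:
            if small < big and not any(small < mid < big for mid in sets):
                arcs.add((big, small))
    return arcs


# Hypothetical clusters; {a, b} and {b, c} overlap, so b gets two parents.
example = ["abcd", "ab", "bc", "a", "b", "c", "d"]
arcs = cover_digraph(example)

parents_of_b = {"".join(sorted(big)) for big, small in arcs if small == frozenset("b")}
print(len(arcs))
print(sorted(parents_of_b))
```

When the clusters form a hierarchy the result is simply the rooted tree; a vertex with more than one incoming arc, such as {b} in this example, marks a place where the clusters are not nested.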
FIGURE 7.2 Left: For an unrooted tree, PD(W) for W = {b, c, f, g, i} is the sum of the lengths of the dashed
edges. Right: For a rooted tree, PD(W) for W = {b, c} is the sum of the lengths of the dashed edges.
Given a subset W of X, consider the induced phylogenetic W-tree, denoted T/W, that connects just
those species in W, and its associated edge weighting λW, which assigns to each edge e of T/W the
sum of the λ(e) values over those edges of T in the path that corresponds to e. The PD value of W,
denoted PD(W), is defined as
$$PD(W) := \sum_{e} \lambda_W(e)$$
where the summation is over all edges e in the tree T/W. An example is illustrated in Figure 7.2
for W = {b, c, f, g, i}. Note that PD(W) also depends on (T, λ), but we will think of these as fixed.
Also, when |W| = 1 we set PD(W) = 0. In the case of a rooted phylogenetic tree, with root vertex
ρ, we can regard the root as a leaf of an unrooted tree (with associated edge length 0), and then it
is usual to define the phylogenetic diversity of a set W as PD(W ∪ {ρ}) as illustrated in Figure 7.2
(this quantity for rooted trees has also been referred to as ‘evolutionary history’62).
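The definition can be turned into a short computation: an edge of T contributes to PD(W) exactly when members of W lie on both sides of it. The Python sketch below uses that observation; the tree, its branch lengths and the function name `pd` are hypothetical, and the root is modelled, as described above, by a leaf `rho` attached with an edge of length 0.

```python
def pd(edges, W):
    """Phylogenetic diversity of leaf set W on a tree given as
    {(u, v): length}. An edge counts if W has members on both sides of it."""
    W = set(W)
    if len(W) <= 1:
        return 0
    adjacent = {}
    for u, v in edges:
        adjacent.setdefault(u, []).append(v)
        adjacent.setdefault(v, []).append(u)
    total = 0
    for (u, v), length in edges.items():
        # Collect the vertices on u's side once the edge (u, v) is cut.
        side, stack = {u}, [u]
        while stack:
            x = stack.pop()
            for y in adjacent[x]:
                if y not in side and not (x == u and y == v):
                    side.add(y)
                    stack.append(y)
        if 0 < len(W & side) < len(W):
            total += length
    return total


# Hypothetical tree: cherries (a, b) and (c, d) joined by the edge u-v,
# with the root modelled as a leaf 'rho' attached to u by a zero-length edge.
TREE = {
    ("a", "u"): 1, ("b", "u"): 2, ("u", "v"): 3,
    ("c", "v"): 4, ("d", "v"): 5, ("rho", "u"): 0,
}

print(pd(TREE, {"a", "b"}))            # 3
print(pd(TREE, {"a", "c"}))            # 8
print(pd(TREE, {"a", "b", "c", "d"}))  # 15
print(pd(TREE, {"c", "rho"}))          # 7, the rooted PD of {c}
```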
The PD score provides some indication of how much genetic variation each possible subset W
contains in relation to the entire variation in the tree (by comparing PD(W) to the total length of
the tree, $PD(X) = \sum_e \lambda(e)$). The PD score also turns out to have some interesting mathematical
properties. In particular, it is possible to quickly find subsets of X of a given size that maximise
PD by using a simple greedy approach. This was established for trees whose branch lengths satisfy
a molecular clock62 and extended to arbitrary trees63. The latter extension also allows for a subset
Y of X of given size to be found that maximises $PD(Y) + a \cdot \sum_{y \in Y} f(y)$, where f(y) is some value (or
cost) of species y, and a is any (scale conversion) constant. In particular, one can ensure that certain
species (including the root of a rooted tree) are always in the set Y (by giving them a large enough
f value), and one can also find the taxa that are in all (or in none) of the maximal PD sets of given
size. The fact that a fast (in this case greedy) approach works is vital for applications to large trees,
since if one has a tree with (say) 1,000 taxa, and one wishes to find a subset of (say) 100 taxa that
maximises the PD, then it is impossible for any computer to search all subsets of size 100 from
the 1,000.
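The text does not spell out the greedy procedure, so the Python sketch below makes an assumption: it starts from the pair of taxa with the largest PD (the most distant pair) and then repeatedly adds whichever taxon increases PD the most. The tree, branch lengths and function names are hypothetical. On this small example the greedy result is compared with an exhaustive search, and the last line counts the subsets that an exhaustive search would face in the 1,000-taxon case mentioned above.

```python
from itertools import combinations
from math import comb

# Hypothetical unrooted tree: leaves a..f, internal vertices p, q, r, s.
EDGES = {
    ("a", "p"): 1, ("b", "p"): 2, ("p", "s"): 7,
    ("c", "q"): 3, ("d", "q"): 4, ("q", "s"): 8,
    ("e", "r"): 5, ("f", "r"): 6, ("r", "s"): 9,
}
LEAVES = ["a", "b", "c", "d", "e", "f"]

ADJACENT = {}
for u, v in EDGES:
    ADJACENT.setdefault(u, []).append(v)
    ADJACENT.setdefault(v, []).append(u)


def pd(W):
    """Sum the lengths of edges that have members of W on both sides."""
    W = set(W)
    total = 0
    for (u, v), length in EDGES.items():
        side, stack = {u}, [u]
        while stack:
            x = stack.pop()
            for y in ADJACENT[x]:
                if y not in side and not (x == u and y == v):
                    side.add(y)
                    stack.append(y)
        if 0 < len(W & side) < len(W):
            total += length
    return total


def greedy_max_pd(k):
    """Assumed greedy rule (k >= 2): best pair first, then best single addition."""
    chosen = set(max(combinations(LEAVES, 2), key=pd))
    while len(chosen) < k:
        candidates = [x for x in LEAVES if x not in chosen]
        chosen.add(max(candidates, key=lambda x: pd(chosen | {x})))
    return chosen


def brute_force_max_pd(k):
    """Exhaustive search, feasible only for very small trees."""
    return max(pd(W) for W in combinations(LEAVES, k))


for k in range(2, len(LEAVES) + 1):
    print(k, sorted(greedy_max_pd(k)), pd(greedy_max_pd(k)), brute_force_max_pd(k))

# Number of 100-taxon subsets of 1,000 taxa that exhaustive search would visit.
n_subsets = comb(1000, 100)
print(len(str(n_subsets)), "digits")
```

The exhaustive search is included only as a check on the small example; the point of the greedy rule is that the number of PD evaluations grows polynomially rather than with the number of subsets.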
The combinatorial properties of PD have also been investigated64, although for a different
purpose, namely to show that the PD values of subsets of given size m suffice to uniquely determine
the underlying tree (provided m is less than half the number of leaves of the tree). This approach
has been developed further65 to extend the popular neighbor joining tree reconstruction method so
that it uses the PD values of taxa of given size (estimated, for example, by maximum likelihood)
rather than just pairwise distance data.
[Figure 7.3: schematic plot; vertical axis: Expected PD]
FIGURE 7.3 Concave relationships between PD gain/loss in a tree with addition/deletion of taxa.
We turn now to the statistical properties of PD as species go extinct. Nee and May62 investigated
the loss of PD as taxa are randomly deleted from random trees under a simple model in which
each taxon is equally likely to be the next to go extinct (the ‘field of bullets’ model). The trees
were generated by a random birth model with branch lengths that satisfy a molecular clock. They
found a characteristic concave shape in the relationship between expected PD and the proportion
of taxa deleted. This relationship was further investigated recently66 on random deletion of taxa
from certain biological trees. Once again the relationship between taxa deleted and PD was concave,
as illustrated schematically in Figure 7.3. Recall that a sequence x = (x1, x2, …, xn) of real numbers
is concave if, when we let $\Delta x_r = x_r - x_{r-1}$, the following inequality holds for all r:
$$\Delta x_r - \Delta x_{r+1} \geq 0$$
and the sequence is strictly concave if the inequality is strict for all r; geometrically this means
that the slope of the line joining adjacent points in the graph of xr versus r is decreasing. Note that
xr is concave precisely if the complementary (reverse) sequence $y_r = x_{n-r}$ is. The significance of
(strict) concavity for PD is that it says (informally) that most of the loss of PD comes near the end
of an extinction process, as illustrated in Figure 7.3.
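The definition of a concave sequence translates directly into a check on successive differences. A minimal Python sketch follows; the function name and the sample sequences are ours.

```python
def is_concave(x, strict=False):
    """Check that the successive differences of x never increase
    (or strictly decrease, when strict=True)."""
    deltas = [x[r] - x[r - 1] for r in range(1, len(x))]
    pairs = zip(deltas, deltas[1:])
    return all((a > b) if strict else (a >= b) for a, b in pairs)


print(is_concave([0, 5, 9, 12, 14], strict=True))  # differences 5, 4, 3, 2
print(is_concave([0, 1, 2, 3]))                    # linear: concave, not strictly
print(is_concave([0, 1, 2, 3], strict=True))
print(is_concave([0, 1, 3]))                       # differences 1, 2 increase
```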
In this section we investigate the following question: is the concave relationship observed between
the average PD and the number of taxa deleted particular to the trees (and the data or processes that
generated them), or is it a generic property that applies to any tree with any set of branch lengths?
We will see that the latter is true for any fully resolved tree with positive branch lengths. This makes
intuitive sense because each interior branch survives until the point where there is no taxon that lies
below that branch (which is likely to occur towards the end of a random extinction process). However,
one could suspect that some trees with a certain assemblage of branch lengths might still lead to a
violation of the concavity relationship, but the argument below rules this out. Perhaps the most
satisfying aspect of the argument, however, is that we obtain exact expressions to describe the degree
of concavity, in terms of the topology and branch lengths of the trees.
W = X − S). For r ∈ {1, …, n} let $\mu_r = E[PD(W)]$, the expected value of PD(W) over all such choices
of W. Equivalently,
$$\mu_r = \binom{n}{r}^{-1} \sum_{W \subseteq X : |W| = r} PD(W),$$
where $\binom{n}{r}$ is the binomial coefficient ($= \frac{n!}{r!(n-r)!}$), the number of ways of selecting r elements from
a set of size n. Clearly $\mu_n = PD(X)$. For an edge e of T and a positive integer r let
$$\theta(e, r) = \frac{\binom{n - n_e}{r}}{\binom{n}{r}},$$
where $n_e$ denotes the number of leaves of T that lie ‘below’ (i.e., separated from the root by) e.
Proposition 7.3.1 Consider a rooted phylogenetic tree T with an assignment λ of positive branch
lengths. Then, for all r ∈ {0, …, n},
$$\mu_r = PD(X) - \sum_{e \in E(T)} \lambda(e)\,\theta(e, r).$$
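Proof. For each e ∈ E(T), and W selected uniformly at random from all subsets of X of size
r, consider the random variable $X_W(e)$ defined by setting
$$X_W(e) = \begin{cases} 1, & \text{if at least one element of } W \text{ lies below } e; \\ 0, & \text{otherwise.} \end{cases}$$
Then $PD(W) = \sum_{e \in E(T)} \lambda(e) X_W(e)$, and so, by linearity of expectation,
$$\mu_r = E[PD(W)] = \sum_{e \in E(T)} \lambda(e) E[X_W(e)]. \qquad (7.2)$$
Now, $E[X_W(e)] = 1 - P[X_W(e) = 0]$, and the event $X_W(e) = 0$ occurs precisely if all the r
elements of W are selected from amongst the leaves that are not below e. The probability of this
occurring, when these r leaves are chosen randomly without replacement, is $\binom{n - n_e}{r} / \binom{n}{r}$, which
is θ(e, r). Thus, $E[X_W(e)] = 1 - \theta(e, r)$, which, combined with (7.2) and the identity
$PD(X) = \sum_{e \in E(T)} \lambda(e)$, establishes the Proposition.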
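Proposition 7.3.1 can be checked numerically on a small example by comparing the formula with a direct average over all subsets of each size. The Python sketch below does this with exact rational arithmetic; the rooted tree ((a, b), (c, d)), its branch lengths and the function names are hypothetical, and each edge is stored simply as its length together with the set of leaves below it.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Hypothetical rooted tree ((a, b), (c, d)): (branch length, leaves below the edge).
EDGES = [
    (Fraction(1), {"a"}), (Fraction(2), {"b"}),
    (Fraction(3), {"c"}), (Fraction(4), {"d"}),
    (Fraction(5), {"a", "b"}), (Fraction(6), {"c", "d"}),
]
LEAVES = ["a", "b", "c", "d"]
N = len(LEAVES)


def pd_rooted(W):
    """Rooted PD: total length of edges with at least one member of W below."""
    W = set(W)
    return sum((length for length, below in EDGES if below & W), Fraction(0))


def mu_brute(r):
    """Average of PD(W) over every subset W of size r."""
    subsets = list(combinations(LEAVES, r))
    return sum((pd_rooted(W) for W in subsets), Fraction(0)) / len(subsets)


def theta(n_e, r):
    """Probability that none of r randomly chosen leaves lies below the edge."""
    return Fraction(comb(N - n_e, r), comb(N, r))


def mu_formula(r):
    """Right-hand side of Proposition 7.3.1."""
    return pd_rooted(LEAVES) - sum(
        (length * theta(len(below), r) for length, below in EDGES), Fraction(0)
    )


for r in range(N + 1):
    print(r, mu_brute(r), mu_formula(r))
```

Exact fractions are used so that the two columns can be compared for equality rather than to within rounding error.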
$$\mu_{n-1} = PD(X) - \frac{1}{n} \sum_{e \in E_{ext}(T)} \lambda(e),$$
where $E_{ext}(T)$ denotes the set of n (exterior) edges of T (edges incident with a leaf).
For r ∈ {1, …, n}, let $\Delta\mu_r = \mu_r - \mu_{r-1}$. Note that, since $\mu_0 = 0$, we have $\Delta\mu_1 = \mu_1$. For an edge e
of T, and r ∈ {1, …, n − 1} let
$$\psi(e, r) := \frac{n_e(n_e - 1)}{r(r + 1)} \cdot \frac{\binom{n - n_e}{r - 1}}{\binom{n}{r + 1}}.$$
We now describe the main consequence of Proposition 7.3.1. It shows that for any fully resolved
tree PD decays in a strictly concave fashion as taxa are randomly deleted, and the only trees for
which the decay of PD is linear are fully unresolved ‘star’ trees.
Corollary 7.3.2 Consider a rooted phylogenetic tree T with an assignment λ of positive branch lengths. Then,
\[ \Delta\mu_r - \Delta\mu_{r+1} = \sum_{e \in E(T)} \lambda(e)\,\psi(e, r). \]
Proof. For part (i), Proposition 7.3.1 gives
\[ \Delta\mu_r - \Delta\mu_{r+1} = 2\mu_r - \mu_{r-1} - \mu_{r+1} = -\sum_{e \in E(T)} \lambda(e)\,[2\theta(e, r) - \theta(e, r - 1) - \theta(e, r + 1)], \]
and a straightforward though tedious manipulation of (ratios of) binomial coefficients leads to the formula in the corollary.
For part (ii), if T has a cherry, let e be an edge with two leaves below it. Then $\psi(e, r) > 0$ for all r ∈ {1, …, n − 1}. Conversely, if $\Delta\mu_{n-1} - \Delta\mu_n > 0$, then there exists an edge e for which $\psi(e, n - 1) > 0$, in which case $n_e = 2$, and so T has a cherry.
For part (iii), note that T is a star tree if and only if $(n_e - 1) = 0$ for all edges e of T, and this holds precisely if $\psi(e, r) = 0$ for all edges e of T and all values of r.
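The binomial manipulation behind the corollary can be spot-checked by machine. The following sketch (function names are hypothetical, chosen for this illustration) verifies, with exact fractions, that ψ(e, r) equals θ(e, r − 1) + θ(e, r + 1) − 2θ(e, r) for every admissible combination of n, n_e and r up to a small bound. Since ψ depends on the edge only through n_e, the functions take n_e directly.

```python
from fractions import Fraction
from math import comb


def theta(n, n_e, r):
    """theta(e, r) = C(n - n_e, r) / C(n, r) for an edge with n_e leaves below it."""
    return Fraction(comb(n - n_e, r), comb(n, r))


def psi(n, n_e, r):
    """psi(e, r) = [n_e (n_e - 1) / (r (r + 1))] * C(n - n_e, r - 1) / C(n, r + 1)."""
    return Fraction(n_e * (n_e - 1), r * (r + 1)) * Fraction(
        comb(n - n_e, r - 1), comb(n, r + 1)
    )


def second_difference(n, n_e, r):
    """theta(e, r - 1) + theta(e, r + 1) - 2 theta(e, r)."""
    return theta(n, n_e, r - 1) + theta(n, n_e, r + 1) - 2 * theta(n, n_e, r)


def identity_holds(n_max=12):
    """Check psi against the second difference for all n <= n_max."""
    return all(
        psi(n, n_e, r) == second_difference(n, n_e, r)
        for n in range(2, n_max + 1)
        for n_e in range(1, n + 1)
        for r in range(1, n)
    )


if __name__ == "__main__":
    print(identity_holds())
```

The check also makes the star-tree case visible: whenever n_e = 1 the value of ψ is zero, so exterior edges never contribute to the concavity of the decay.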
ACKNOWLEDGEMENTS
We thank the New Zealand Marsden Fund and the Allan Wilson Centre for Molecular Ecology and
Evolution for supporting this research. I also thank Arne Mooers, Peter Lockhart, Klaas Hartmann,
an anonymous referee and the editors for some helpful comments on an earlier version of this chapter.
ABSTRACT
The problems of nucleotide homology determination and tree search are intertwined and complex
issues for phylogenetic reconstruction. Both present NP-hard optimisations. One step and two step
heuristic procedures are reviewed and compared through the analysis of example data sets using
multiple sequence alignment plus tree search and direct optimisation techniques. The examples
here show that extraordinary effort on the tree search side cannot overcome the shortcomings of
poor sequence homology heuristics. Direct optimisation using the simplest heuristics can offer
solutions with 30% better optimality scores in larger data sets.
8.1.2 NP-COMPLETENESS
The complexities of the problem of converting observations into cladograms come from the
difficulty in optimising its two components. The joint problem is composed of two NP-complete
(nondeterministic polynomial complete) problems. Both cladogram search and the assignment of
optimal ancestral sequences, such that any overall tree is optimal, are NP-hard optimisations10.
Hence, heuristics are required to find usable solutions to both of these problems. Their
joint nature makes the challenge even greater. Two-step multiple alignment analysis seeks to simplify
the problem by splitting homology and cladogram search into separate, tractable operations.
One-step optimisation methods attack the issue as a nested problem, dealing directly with the complexity
of both operations.
FIGURE 8.1 Basic Wagner build procedure of Farris12 showing the addition of each taxon in turn to each
possible edge (branch) on the tree. (a) The fourth taxon D is added to each of three places on the rooted tree.
(b) The fifth taxon E is added at each of five positions.
globally satisfying solution may require traveling through suboptimal, intermediate states (Figure 8.3).
This is the situation presented by the annealing of metals and applied to computational problems
by Metropolis et al.16.
Ratcheting. Nixon17 brought simulated annealing to phylogenetic analysis. In Nixon’s
approach, called the ‘ratchet’, characters are randomly reweighted and searches performed on the
newly weighted data. The weights are then set back to their initial values, and a search is performed
with the tree found under the perturbed weights as a starting point. The method has been extremely
effective in finding lower-cost solutions in data sets thought to be refractory to further analysis,
such as a large angiosperm matrix18.
Drifting. A phylogenetic method much closer to the original description of simulated annealing
was proposed by Goloboff11 and termed ‘drifting’. Unlike the ratchet, where suboptimal solutions
are reached via reweighting, drifting explicitly assigns each candidate topology an acceptance
probability based on the extent of its suboptimality. This is implemented in TNT3.
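The idea of accepting a worse topology with some probability can be sketched with the classic Metropolis criterion. This is a generic illustration, not the acceptance function actually used by TNT's drifting, which the text does not specify. The three states and their costs are borrowed from Figure 8.3 and stand in for whole trees; the neighbour relation, the temperature parameter and all function names are assumptions made for this example.

```python
import math
import random

# Toy landscape modelled on Figure 8.3: costs of three topologies and which
# of them are one rearrangement apart. The letters stand in for whole trees.
COST = {"a": 10, "b": 11, "c": 9}
NEIGHBOURS = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}


def acceptance_probability(current_cost, candidate_cost, temperature):
    """Metropolis rule: always accept improvements, sometimes accept worse."""
    if candidate_cost <= current_cost:
        return 1.0
    return math.exp(-(candidate_cost - current_cost) / temperature)


def search(start, steps, temperature, rng):
    """Random walk over topologies; temperature 0 means strict hill climbing."""
    current = best = start
    for _ in range(steps):
        candidate = rng.choice(NEIGHBOURS[current])
        if temperature > 0:
            p = acceptance_probability(COST[current], COST[candidate], temperature)
        else:
            # Strict hill climbing: only strictly better candidates are accepted.
            p = 1.0 if COST[candidate] < COST[current] else 0.0
        if rng.random() < p:
            current = candidate
            if COST[current] < COST[best]:
                best = current
    return best


if __name__ == "__main__":
    print(search("a", 1000, 0, random.Random(0)))    # hill climbing
    print(search("a", 1000, 1.0, random.Random(0)))  # annealing-style walk
```

With temperature 0 the walk can never leave the local optimum (a), because its only neighbour (b) is worse. With a positive temperature the walk occasionally steps to (b) and from there can reach the cheaper topology (c).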
Monte Carlo Markov Chain. A probabilistic form of local search uses the relative probability
of successive tree rearrangements as a criterion for the acceptance of a rearrangement. As
with drifting and other simulated annealing techniques, suboptimal solutions can be accepted as
intermediate solutions on the way to more globally optimal scenarios. As a search strategy, Monte
Carlo Markov Chains have had their greatest impact on Bayesian estimates of clade
probabilities19.
FIGURE 8.2 Simple tree rearrangement showing SPR branch swapping. A clade of tree (a) is pruned off, leaving
two subtrees (b). The pruned subtree is then added back to each possible place on the remaining subtree (c),
avoiding its original position and yielding new trees closely related topologically to the first.
FIGURE 8.3 A simulated annealing trajectory. Globally optimal topology (c) can only be reached from locally
optimal (a) by passing through the suboptimal topology (b). Panels: (a) tip order A B C D E F, cost = 10;
(b) tip order A B C E D F, cost = 11; (c) tip order C E A B D F, cost = 9. Rearrangements annotated in the
figure: (ABC → E) and (AB → D).
FIGURE 8.4 Tree fusing (Goloboff11) component of the genetic algorithm (Moilanen20). Cladograms a (tip order
A B C D E F) and b (A C B E D F) exchange the (ABC) groups, yielding two new arrangements a′ (A C B D E F)
and b′ (A B C E D F).
sectors of 35–50 taxa that are then treated as a single terminal (Figure 8.5). Branch swapping is then
performed on this reduced data set. Sectors and searches are dynamically defined and alternated as
the topology evolves until a stable solution is found. This approach has been further explored as ‘disk
covering methods’22,23 (see also Wilkinson and Cotton, Chapter 5; Bininda-Emonds and Stamatakis,
Chapter 6), yielding improvements in many areas of phylogenetic tree searching.
FIGURE 8.5 Definition of cladogram sectors for use in a sectorial search (Goloboff11); (a–g) usually 35–50 taxa.
Multiple alignment methods seek, in general, a single set of homologies, upon which all cladograms
are evaluated, whereas optimisation methods create potentially unique schemes for each cladogram.
The fundamental methods are briefly reviewed below.
FIGURE 8.6 Guide tree-based multiple alignment. Sequences are accreted in turn as the procedure moves
from the tips of the tree (A–F) to the root. Intermediate vertices ($C_i$; $C_0$ to $C_4$ in the figure) may be
consensus sequences as in CLUSTALW (Thompson et al.26) or partial alignments as in MALIGN32.
FIGURE 8.7 $O(n^2)$ and $O(n^3)$ sequence optimisation medians. $O(n^2)$ (closed arrows) and $O(n^3)$ (closed and
open arrows). The median sequences ($C_i$) are calculated based on either their two descendants, or their
descendants and immediate ancestor. $C_5$ would not be calculated in the $O(n^3)$ case. Multiple passes may be
performed on the cladograms to update the vertex sequences and improve median quality.
$O(n \cdot (n + k)^2)$, whereas a simple median approach would require $O(n \cdot m^2)$. As long as n + k < m, search-based
methods should win out as far as time is concerned. The set size k, however, will determine the
quality (cost) of the result, with k varying from 0 in fixed state optimisation40 to the set of all
sequences resulting in an exact solution through explicit enumeration36. Little work has been
done to examine how large k should be, or how best to choose the sequences to be included in
the heuristic set.
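The trade-off between the two bounds can be made concrete with a little arithmetic. The sketch below treats the two big-O expressions as literal operation counts, ignoring constant factors, which is a simplification. It also assumes that n is the number of observed sequences, k the number of additional candidate sequences, and m the sequence length; the surviving text does not define these symbols, and the function names are hypothetical.

```python
def search_based_cost(n, k):
    """n * (n + k)^2: the search-based bound with constants ignored."""
    return n * (n + k) ** 2


def simple_median_cost(n, m):
    """n * m^2: the simple median bound with constants ignored."""
    return n * m ** 2


def largest_winning_k(n, m):
    """Largest k >= 0 satisfying n + k < m, or None if no such k exists."""
    k = m - n - 1
    return k if k >= 0 else None


if __name__ == "__main__":
    n, m = 6, 1000  # e.g. six observed sequences of length 1000 (made-up values)
    k = largest_winning_k(n, m)
    print(k, search_based_cost(n, k), simple_median_cost(n, m))
```

Under these assumptions the heuristic set can grow quite large before the search-based count exceeds the median-based one, which is why the choice of k is mainly a question of result quality rather than time.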
FIGURE 8.8 Search-based optimisation of a set of observed sequences ($S_i$) to determine vertex sequences
($C_j$) using a set of candidate sequences ($S_0, \ldots, S_{5+k}$).
TABLE 8.1
Summary of CLUSTALW Multiple Alignment and POY Optimisation Analyses
Data Set | Parameters | Taxa | Aligned Length | Variable Positions | POY Cost | TNT Simple | TNT Aggressive
Note: Results from CLUSTALW are shown above the line (upper part of table) and those of POY below
the line (lower part of table). ‘CLUS Default’ denotes CLUSTALW default parameters, ‘Equal’ all events = 1.
The POY costs for CLUS Default runs are high due to the parameter setting (‘-gap 50 -extensiongap 1 -change 5’)
in those runs. ‘TNT Simple’ denotes the TNT command ‘mult’, whilst ‘TNT Aggressive’ signifies
‘xmult=replications 10 ratchet 50 drift 20 fuse 5’. The rightmost three column values are cladogram costs.
Source: Data from CLUSTALW (Thompson et al.26) and POY (Wheeler et al.43).
FIGURE 8.9 Cladogram optimality as a function of data set size based on the 1:1:1 values of Table 8.1.
Axes: cladogram cost (0 to 140,000) against data set size (0 to 1,200 taxa). Dashed line represents POY,
POY-TNT ‘Simple’ and POY-TNT ‘Aggressive’. Solid line represents CLUSTALW-TNT ‘Simple’ and
CLUSTALW-TNT ‘Aggressive’ (full values can be seen in Table 8.1).
without further refinement. Implied alignments9 were created to allow for direct comparison with
CLUSTALW results. These were subjected to the same TNT analysis conditions as the CLUSTALW
alignments to find the tree lengths.
8.4.4 RESULTS
The results of the CLUSTALW multiple alignment and POY optimisation analyses are shown in
Table 8.1 and Figure 8.9.
8.5 COMPARISONS
Several patterns are immediately apparent. The CLUSTALW alignments do not differ much in
optimality value (tree length) from the default to equal weighting scenarios. Given that the relative
indel costs differ by a factor of 50, this is striking. POY analyses contrast sharply with this.
The Mantodea data show a difference in equally weighted TNT cost (tree length) of 1.4%, whereas
the POY runs are 32% different; Metazoa data show 6.5% (CLUSTALW) and 81% (POY), and for
Archaea data the difference was 1.1% (CLUSTALW) and 44% (POY). The mitochondrial data set
has a 22% difference for CLUSTALW, the highest of the alignment-based runs, but still lower than
all the optimisation comparisons. This difference is so great that the CLUSTALW alignments were
superior to the POY optimisations in every case where the homology and cladogram cost parameters
differed (CLUSTALW default settings). Whilst the POY optimisation analyses are very responsive
to cost parameters, the CLUSTALW runs are not. Every case in which equally weighted optimisation
was used yielded superior (that is, lower cost) optimality values for POY optimisation, whereas only
half of the alignment cases showed this pattern.
The two methods also differed in their response to increased severity of cladogram search
heuristics. Neither CLUSTALW nor POY implied alignments displayed any better solutions under
more exhaustive cladogram searching for the smaller Mantodea or Metazoa data sets. The 585-taxon
archaeal and 1,040-taxon mitochondrial data sets did, with the CLUSTALW alignments showing an average
improvement factor of 4.93 × 10⁻⁴ and the POY implied alignments a factor of 1.09 × 10⁻⁴. The results based
on these simple POY homology searches were from 4.7% (Metazoa) to 45% (mitochondrial) less costly
than those based on the CLUSTALW alignments. This is especially striking given that the improvement
from more severe searching was only about 20% as great for the POY runs; cladogram search effort is
much more productive for the CLUSTALW alignments. Even with the very aggressive phylogenetic
search options of TNT, in no case where the homology and search parameters were the same did
the CLUSTALW alignment match the cost of the rudimentary Wagner build procedure used in the
POY optimisation.
ACKNOWLEDGEMENTS
Thanks are due to the US National Science Foundation and NASA for research support; to Louise Crowley,
Gonzalo Giribet, Megan Harrison, Camilo Mattoni, Kurt Pickett and Andrés Varón for data sets, discussion
and commentary on this manuscript; and to Trevor Hodkinson and John Parnell for their temporal
forbearance while reviewing this paper.
REFERENCES
1. DePinna, M.C.C., Concepts and tests of homology in the cladistic paradigm, Cladistics, 7, 367, 1991.
2. Swofford, D.L., PAUP*: Phylogenetic Analysis Using Parsimony (* and Other Methods), version 4.0b10,
Sinauer Associates, Sunderland, MA, 2002.
3. Goloboff, P.A., Farris, J.S., and Nixon, K., TNT (Tree analysis using New Technology) version 1.0
ver. beta test v. 0.2., 2003, Tucumán, Argentina (http://www.zmuc.dk/public/phylogeny/tnt).
4. Giribet, G., Generating implied alignments under direct optimization using POY, Cladistics, 21, 396, 2005.
5. Wheeler, W.C. et al., Dynamic Homology and Phylogenetic Systematics: A Unified Approach Using
POY, American Museum of Natural History, 2005.
6. Wheeler, W.C., Optimization alignment: the end of multiple sequence alignment in phylogenetics?
Cladistics, 12, 1, 1996.
7. Hein, J. et al., Statistical alignment: computational properties, homology testing, and goodness-of-fit,
J. Mol. Biol., 302, 265, 2000.
8. Hein, J., Jensen, C.J.L., and Pedersen, C.N.S., Recursions for statistical multiple alignment, Proc.
Natl. Acad. Sci. USA, 100, 14960, 2003.
9. Wheeler, W.C., Implied alignment, Cladistics, 19, 261, 2003.
10. Wang, L. and Jiang, T., On the complexity of multiple sequence alignment, J. Comput. Biol., 1, 337,
1994.
11. Goloboff, P.A., Analyzing large data sets in reasonable times: solutions for composite optima, Cladistics,
15, 415, 1999.
12. Farris, J.S., A method for computing Wagner trees. Syst. Zool., 19, 83, 1970.
13. Goloboff, P.A., Techniques for analysing large data sets, in Techniques in Molecular Systematics and
Evolution, DeSalle, R., Giribet, G. and Wheeler, W., Eds., Birkhäuser Verlag, Basel, 2002, 7.
14. Felsenstein, J., PHYLIP, 1980 (http://evolution.genetics.washington.edu/phylip.html)
15. Mickevich, M.F. and Farris, J.S., PHYSYS: Phylogenetic Analysis System, 1980.
16. Metropolis, N.A. et al., Equation of state calculations by fast computing machines, J. Chem. Phys.,
21, 1087, 1953.
17. Nixon, K.C., The parsimony ratchet, a new method for rapid parsimony analysis, Cladistics, 15, 407,
1999.
18. Rice, K.A., Donoghue, M.J., and Olmstead, R.G., Analyzing large data sets: rbcL 500 revisited, Syst.
Biol., 46, 554, 1997.
19. Huelsenbeck, J.P. and Ronquist, F., MrBayes: Bayesian inference of phylogeny, 3.0 edition, 2003.
(http://mrbayes.csit.fsu.edu).
20. Moilanen, A., Searching for most parsimonious trees with simulated evolutionary optimization,
Cladistics, 15, 39, 1999.
21. Chase, M.W. et al., Phylogenetics of seed plants: an analysis of nucleotide sequences from the plastid
gene rbcL, Ann. Missouri Bot. Gard., 80, 528, 1993.
22. Huson, D., Nettles, S., and Warnow, T., Disk-covering, a fast converging method for phylogenetic
tree reconstruction, J. Comput. Biol., 6, 368, 1999.
23. Roshan, U., et al., Rec-i-dcm3: A fast algorithmic technique for reconstructing large phylogenetic
tree, in Proc. IEEE Computer Society Bioinformatics Conference CSB 2004, Stanford University,
2004.
24. Simmons, M.P., Independence of alignment and tree search, Mol. Phylogenet. Evol., 31, 874, 2004.
25. Sankoff, D.M. and Cedergren, R.J., Simultaneous comparison of three or more sequences related
by a tree, in Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence
Comparison, Sankoff, D.M. and Kruskal, J.B., Eds., Addison Wesley, Reading, MA, 1983,
chap. 9.
26. Thompson, J.D., Higgins, D.G., and Gibson, T.J., CLUSTAL W: improving the sensitivity of progressive
multiple sequence alignment through sequence weighting, position specific gap penalties and
weight matrix choice, Nucleic Acids Res., 22, 4673, 1994.
27. Phillips, A., Janies, D., and Wheeler, W., Multiple sequence alignment in phylogenetic analysis, Mol.
Phylogenet. Evol., 16, 317, 2000.
28. Saitou, N. and Nei, M., The neighbor-joining method: a new method for reconstructing phylogenetic
trees, Mol. Biol. Evol., 4, 406, 1987.
29. Hein, J., A new method that simultaneously aligns and reconstructs ancestral sequences for any number
of homologous sequences, when the phylogeny is given, Mol. Biol. Evol., 6, 649, 1989.
30. Hein, J., A tree reconstruction method that is economical in the number of pairwise comparisons used,
Mol. Biol. Evol., 6, 669, 1989.
31. Wheeler, W.C., Sources of ambiguity in nucleic acid sequence alignment, in Molecular Ecology and
Evolution: Approaches and Applications, Schierwater, G.W.B., Streit, B., and DeSalle, R., Eds.,
Birkhäuser Verlag, Basel Switzerland, 1994, 323.
32. Wheeler, W.C. and Gladstein, D.S., (documentation by Janies, D. and Wheeler, W.C.), MALIGN,
New York, NY, 1991–1998 (http://research.amnh.org/scicomp/projects/malign.php).
33. Sankoff, D.M., Minimal mutation trees of sequences, SIAM J. Appl. Math., 28, 35, 1975.
34. Wheeler, W.C., Fixed character states and the optimization of molecular sequence data, Cladistics,
15, 379, 1999.
35. Wheeler, W.C., Iterative pass optimization, Cladistics, 19, 254, 2003.
36. Wheeler, W.C., Search-based character optimization, Cladistics, 19, 348, 2003.
37. Wheeler, W.C., Dynamic homology and the likelihood criterion, Cladistics, 2005.
38. Sankoff, D.M. and Blanchette, M., The median problem for breakpoints in comparative genomics,
Computing and Combinatorics 3rd Annual Int. Conf. COCOON 97, 1276, 251, 1997.
39. Gladstein, D.S., Efficient incremental character optimization, Cladistics, 13, 21, 1997.
40. Wheeler, W.C., Measuring topological congruence by extending character techniques, Cladistics, 15,
131, 1999.
41. Sankoff, D.M. and Rousseau, P., Locating the vertices of a Steiner tree in arbitrary space, Math.
Program., 9, 240, 1975.
42. Svenson, G.J., and Whiting, M.F., Phylogeny of Mantodea based on molecular data: evolution of a
charismatic predator, Syst. Ent., 29, 359, 2004.
43. Wheeler, W.C., Gladstein, D.S., and De Laet, J.D., (documentation by Janies, D. and Wheeler, W.C.—
commandline documentation by De Laet, J.D. and Wheeler W.C.), POY version 3.0.11, American
Museum of Natural History, New York, 1996–2005 (http://research.amnh.org/scicomp/projects/poy.php).
9 Species-Level Phylogenetics of Large Genera: Prospects of Studying Coevolution and Polyploidy
N. Rønsted, E. Yektaei-Karin, K. Turk, J. J. Clarkson and M. W. Chase
Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, UK
ABSTRACT
The problems facing workers trying to produce phylogenetic hypotheses of large genera are usually
surmountable if the group in question is well studied previously. The major problems faced in plants
are caused by hybridisation between species and low levels of variability in the standard phylogenetic
markers. Many researchers use plastid DNA and the internal transcribed spacers of nuclear ribosomal
DNA (nrDNA), neither of which alone is suitable for the detection of hybrids, the former because
it is inherited through the maternal lineage and the latter because it is subject to concerted evolution
via gene conversion. Sequencing low-copy, protein-coding regions is a good alternative, but these
are often difficult to amplify or unsuitable for other reasons. Low levels of variability in the
standard markers can alternatively be dealt with by using markers such as amplified fragment length
polymorphisms (AFLPs). Some prospects, problems and solutions will be discussed and exemplified
with work on figs (Ficus, Moraceae) and tobacco (Nicotiana, Solanaceae).
FIGURE 9.1 Coevolution in the fig-wasp system. Figs can only be pollinated by female agaonid wasps. The
wasps can only lay their eggs inside fig inflorescences, where their larvae feed on some of the developing
seeds. (Reproduced with permission from Weiblen and Bush81.)
species and is a widespread phenomenon20. In addition to the fig–wasp model system, a range of
other examples of coevolution between plants and their pollinators have been reported, such as the
well studied interaction between Yucca (Asparagaceae) and yucca moths (Lepidoptera)21. A diversity
of insect pollination mutualisms have been described for palms22, for instance an obligate weevil
pollination mutualism of the dwarf palm, Chamaerops humilis23. Another recent case is provided
by Epicephala moths (Gracillariidae) pollinating trees of the genus Glochidion (Phyllanthaceae)24.
Coevolution of plants and their pollinators is widespread, and the need for improving our understanding
of the principles of coevolution continues to grow. Persistence of mutualisms over long
periods of time is noteworthy, considering that they are essentially unstable systems due to the
underlying conflict between partners25.
However, our knowledge of the nature and extent of coevolution in systems such as the fig–wasp
mutualism, with over 750 pairs of interacting species, has been limited because an accurate
evaluation of patterns and processes of species diversification in a coevolution system can only be
performed if phylogenetic trees of both partners are analysed and can therefore be compared26.
Classifications of both Ficus and Agaonidae have previously been based on morphology and
reproductive traits closely linked to their interaction, but these are potentially misleading due to
convergence/parallelism. We therefore risk circularity if we base our study of coevolution on such
classifications, and evolutionary relationships are perhaps more appropriately revealed by DNA
sequence analyses27.
Previous DNA sequence–based phylogenetic studies of Ficus have shown that taxonomic
categories are not natural and revealed several parallel transitions in growth habit and breeding
system14,27,28. Correlation of inflorescence characters with head shapes of pollinating wasps
and pollination behaviour indicates that reciprocal adaptations of morphological characters in
these mutualistic partners have occurred28,29. However, previous studies have only included
limited sampling (fewer than 50 species, or about 6%) of this large genus and have detected
insufficient genetic variation to allow a detailed estimation of relationships of fig species,
especially at species level.
Ongoing work by Rønsted and collaborators (for example, Rønsted et al.18) aims first to use
molecular techniques to produce comprehensive phylogenetic trees for Ficus and second to combine
fig and wasp phylogenetic data to explore the causes for the extraordinary diversification at this
plant–animal interface. The rest of this chapter will present some problems and prospects of
phylogenetic work with Ficus and other similarly large genera.
[Figure 9.2: tree not reproduced. Terminals are Ficus species plus an outgroup of Artocarpus sp., Dorstenia psilurus, Maclura pomifera, Brosimum alicastrum, Castilla elastica and Poulsenia armata.]
FIGURE 9.2 One of over 2,000 most parsimonious trees obtained from the combined analysis of sequences
from four plastid regions (rpl16 intron, trnL intron, trnL-F spacer, and psbB-psbF spacer) of Ficus. Tree length
511 steps, consistency index (CI) = 0.88, and retention index (RI) = 0.72. Branch lengths and bootstrap
percentages (>50%) are shown above and below the branches, respectively.
although palms are resolved on a long branch relative to other major clades of monocots33–35.
Such patterns (high bootstrap support for a genus and little variation within) are common among
plants36,37.
In addition to using plastid regions, the nuclear ribosomal (nrDNA) internal transcribed spacer
regions (ITS)38 have proven useful in many species-level phylogenetic studies of a wide range
of taxa39–41. ITS occurs in high copy number, which makes it easy to amplify, and the region has
been widely employed for systematic studies. Furthermore, the whole ribosomal complex under-
goes rapid concerted evolution, meaning that sequence similarity between individual copies is
extremely high in most taxa. Divergent copies are nevertheless detected in some cases37,42–46. Paralogs often
require cloning of individual copies to separate them from orthologs (see section on hybrids and
polyploids later).
[Figure 9.3: tree not reproduced. Clades are labelled by section: Urostigma, Oreosycea, Sycomorus, Sycocarpus, Ficus and Adenosperma, Kissosycea and Rhizocladus, Eriosycea, Sycidium, Paleomorphe, Ficus, Conosycea, Malvanthera, Americana, Galoglychia and Pharmacosycea, plus the outgroup. Scale bar = 5 changes.]
FIGURE 9.3 One of 74 most parsimonious trees obtained from combined analysis of ITS and ETS rDNA
sequences of Ficus. Tree length 2,010 steps, CI = 0.52, and RI = 0.83. Bootstrap percentages (>50%) are
shown above the branches.
The ITS region has been used in a phylogenetic analysis of 46 dioecious Ficus species14.
Amplification gave single bands, and cloning of four species showed no problems with
heterogeneity among ITS copies. However, interspecific variability among ITS sequences
within Ficus was limited, and the matrix was combined with a morphological character set.
Later on, Jousselin and coworkers28 combined ITS sequences with sequences of the external
transcribed spacer (ETS)47 in a study including 41 fig species representing most of the sections.
ETS sequences evolve more rapidly than ITS sequences and can be a useful complement to
ITS47–50. However, in comparison with plastid regions and ITS, the ETS region is notoriously
difficult to amplify and necessitates high template quality. The combined analyses of ITS and
ETS sequences of Ficus produced six trees, which were better resolved and supported down
to sectional level than trees obtained with either of the separate ITS and ETS datasets28. Four
genera of Moraceae (Artocarpus, Brosimum, Broussonetia and Morus) were included in the
study as outgroups, but Ficus ITS and ETS sequences were too divergent to be aligned with
these other genera, and the trees were rooted internally based on concepts of morphological
change in the genus.
Sequencing of ITS and ETS was continued by Rønsted and coworkers18 on a much larger
dataset including 146 fig taxa. Based on other studies27,51 including a recent phylogenetic analysis
of Moraceae17, sequences of four putatively closely related Moraceae genera (Antiaropsis,
Castilla, Poulsenia and Sparattosyce) were successfully aligned and included as an outgroup.
With this large dataset, sectional relationships of monoecious figs were clarified, but still only
limited support was obtained for the dioecious groups and other groups within sections in general
(Figure 9.3).
However, such less specific PCR conditions may allow for nonspecific annealing of primers
and increase the number of PCR products obtained. Designing specific primers can often
improve amplification and reduce the number of bands obtained during amplification as well
as improve specificity in general, but in most cases cloning will still be required for a portion
of the samples, often due to heterozygosity in some individuals.
Gene duplication is also common and can result in erroneous phylogenetic trees if undetected
paralogs are included in the analysis. For example, gene duplication of rpb2 has been reported in
two major groups of asterid plants67 and Hibiscus and related Malvaceae68. Duplication of adh
genes has occurred in grasses, palms and other monocot clades and may reflect multiple duplication
events rather than a single ancestral duplication63. In one tribe of legumes, LEAFY has been
duplicated69.
The problems discussed do not occur consistently for all regions and taxa, and often the best
approach for finding a new useful region for a specific phylogenetic study is to try out an array
of regions on a subset of the taxa (five to eight species spanning the range of expected variation,
that is, two closely related species, plus one to two more distant species plus one to two
outgroups).
Looking for alternative regions for phylogenetic analyses of figs, Rønsted and coworkers32
screened various nuclear regions. The nuclear gene for plastid-expressed glutamine synthetase (ncpGS)62
yielded strong amplification of one and occasionally two bands in a small set of taxa and was
chosen as an additional region; ncpGS is a nuclear gene responsible for assimilation of ammonia
from photorespiration. It is a member of a multigene family but diverged long ago from the cytosolic
expressed members; it contains several introns and is expected to diverge at a higher rate than ITS.
The primers designed by Emshwiller and Doyle62 for a range of dicotyledonous plants amplify a
region with four introns, and the size of the amplified product varies between 500 and 1,600 base
pairs (bp). The region varies between 1,050 and 1,350 bp in figs with one to two indels missing
in some taxa32. To obtain specific and strong amplification, a new set of primers was designed
based on fig sequences.
In a provisional dataset containing 60 aligned ncpGS sequences of Ficus and 1,695 characters,
211 characters were potentially parsimony informative (12%) and 492 (29%) were variable32. This
is somewhat less variation than in Sinningieae (Gesneriaceae), where ncpGS provided 24% poten-
tially parsimony informative characters3. A preliminary analysis of a combined dataset of ITS, ETS
and ncpGS sequences of 33 figs and two outgroup taxa was performed using 500 replicates of
random stepwise addition with tree bisection-reconnection (TBR), equal weights, and the maximum
parsimony criterion as implemented in PAUP*31 version 4.0b10. This analysis produced seven trees32. One
of the trees is shown in Figure 9.4. A total of 500 bootstrap replicates with simple stepwise addition
and TBR swapping was performed.
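The proportions of variable and potentially parsimony-informative characters quoted above come from classifying each aligned column. The following is a minimal sketch in Python of that classification, assuming a toy alignment and treating gaps and ambiguity codes as missing data (a simplification; other gap treatments are possible). The function name and the toy sequences are illustrative only and are not taken from the study.

```python
from collections import Counter


def classify_columns(alignment, missing="-?N"):
    """Count variable and parsimony-informative columns in an alignment.

    alignment: list of equal-length sequence strings, one per taxon.
    A column is variable if it shows at least two states. It is
    parsimony informative if at least two states each occur in at
    least two sequences. Characters listed in `missing` are ignored.
    """
    n_chars = len(alignment[0])
    if any(len(seq) != n_chars for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    variable = 0
    informative = 0
    for i in range(n_chars):
        counts = Counter(
            seq[i].upper() for seq in alignment if seq[i].upper() not in missing
        )
        if len(counts) >= 2:
            variable += 1
            if sum(1 for c in counts.values() if c >= 2) >= 2:
                informative += 1
    return {"characters": n_chars, "variable": variable, "informative": informative}


# Toy alignment (hypothetical data): four taxa, eight aligned positions.
toy = [
    "ACGTAC-T",
    "ACGAACGT",
    "ATGAACGA",
    "ATGTTCGA",
]
summary = classify_columns(toy)
print(summary)

# Percentages for the counts reported in the text (211 and 492 of 1,695).
print(round(100 * 211 / 1695), round(100 * 492 / 1695))
```

Applied to the counts given in the text, the same arithmetic gives 12% potentially informative and 29% variable characters.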
Compared with the previous combined analysis of ITS and ETS sequences (Figure 9.3)18,
both resolution and support are improved in the three-region set. For instance, in the ITS/ETS
analysis the relationship of the African F. section Galoglychia and the New World F. section
Americana was uncertain, with some trees showing Galoglychia and Americana as sisters and
other trees showing Galoglychia paraphyletic to Americana. In the three-region analysis,
section Galoglychia (63 bootstrap percentage, BP) is sister (100 BP) to section Americana
(100 BP).
This dataset is limited and includes more monoecious than dioecious figs, but when many more
taxa are sequenced for ncpGS and included, the three regions combined are expected to provide a
well supported phylogenetic hypothesis for sectional relationships within figs. However, to obtain
a well resolved and supported species-level tree, additional nuclear regions will be needed. Even then,
direct sequencing of currently known nuclear regions may not provide sufficient resolution, and an
alternative could be to use other molecular techniques, such as amplified fragment length polymorphism
(AFLP), which is discussed in the next section.
[Figure 9.4: tree not reproduced. Clades are labelled as sections Malvanthera, Americana, Galoglychia and Pharmacosycea, with Castilla elastica and Poulsenia armata as the outgroup. Scale bar = 10 changes.]
FIGURE 9.4 One of seven most parsimonious trees obtained from combined analysis of ITS, ETS and ncpGS
sequences of Ficus. Tree length 3,061 steps, CI = 0.75, and RI = 0.80. Bootstrap percentages (>50%) are
shown above the branches. Arrowheads indicate nodes that collapse in the strict consensus tree.
(Solanum) and rice (Oryza), but more experience is needed if the technique is to become more
widely used.
Many researchers also feel that AFLP data are not suitable for parsimony and other types of
phylogenetic analyses. According to Koopman and coworkers72, critics raise two main points of
concern. First, because AFLP fragments are identified by their length and not by their base composition,
nonidentical fragments of equal length will mistakenly be scored as homologous. Second, AFLPs are
usually scored as dominant characters, that is, with only the character states present (1), and absent
(0), whereas in reality, at least some of the bands may represent codominant markers. Both sources
of error introduce homoplasies and redundancies into the data and could lead to erroneous tree
topologies in phylogenetic analyses. However, Koopman and coworkers72 concluded that the impact
of these homoplasies on conclusions regarding species relationships will be minor.
Several other empirical studies have likewise demonstrated that AFLP data are suitable for
analyses with parsimony. In a study by Hodkinson et al.73 AFLPs were used to investigate phylo-
genetic relationships of Phyllostachys, a large economically important genus of woody bamboos.
DNA bands ranging from 50 to 500 bp in size were scored as presence/absence characters, and weak
bands were removed from the matrix, which was then subjected to parsimony analyses. Hodkinson
and coworkers also used AFLP on Miscanthus, a close relative of sugarcane (Saccharum), and
found that major groupings were consistent with those determined from DNA sequences74. Other
examples include cultivated lettuce (Lactuca)72, the orchid genus Dactylorhiza75, Phylica76 and wild
potato (Solanum)77.
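The scoring step described for Phyllostachys (bands of 50 to 500 bp scored as present or absent, weak bands removed) can be sketched as follows. This is a minimal Python illustration, not the procedure used in the cited studies: the peak-height threshold, the 1-bp size bins, the function name and the sample data are all assumptions made for the example.

```python
def score_aflp(profiles, min_bp=50, max_bp=500, min_height=100, bin_bp=1):
    """Turn per-sample AFLP peak lists into a presence/absence matrix.

    profiles: dict mapping sample name -> list of (size_bp, peak_height).
    Peaks outside [min_bp, max_bp] or weaker than min_height (an assumed,
    arbitrary threshold) are discarded. Remaining peaks are rounded to
    bins of width bin_bp. Returns (sorted bins, dict sample -> 0/1 list).
    """
    kept = {}
    for sample, peaks in profiles.items():
        kept[sample] = {
            round(size / bin_bp) * bin_bp
            for size, height in peaks
            if min_bp <= size <= max_bp and height >= min_height
        }
    bins = sorted(set().union(*kept.values()))
    matrix = {
        sample: [1 if b in present else 0 for b in bins]
        for sample, present in kept.items()
    }
    return bins, matrix


# Hypothetical peak tables: (fragment size in bp, peak height).
profiles = {
    "sp_A": [(48.0, 900), (120.2, 850), (233.9, 400), (410.1, 60)],
    "sp_B": [(119.8, 700), (305.0, 500), (410.3, 650)],
    "sp_C": [(234.1, 300), (305.2, 450), (520.0, 800)],
}
bins, matrix = score_aflp(profiles)
print(bins)
for sample, row in matrix.items():
    print(sample, row)
```

Because bands are matched by size alone, two different fragments that fall in the same bin are scored as the same character, which is exactly where the size homoplasy raised by critics enters the matrix.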
Summarising, AFLP is an effective technique for systematic studies in groups for which DNA
sequence analyses have provided insufficient variation. The biggest problem with AFLP markers is
that nonhomologous but similarly sized fragments may be scored as homologous, but this problem
is overcome by not working with distantly related taxa75. Another problem with these markers is that
they are not amenable to use with models of molecular evolution, which are needed in many studies,
such as in molecular clock approaches. A preliminary study of the utility of AFLPs for figs indicated
that, for a subset of species, the results were comparable to those from DNA sequences but showed
greater levels of variation, and that band homology could be easily assigned78,79.
FIGURE 9.5 Temporal congruence of fig lineages and their associated pollinator wasp lineages based on
independent, fossil-calibrated molecular phylogenetic trees. Horizontal and vertical bars indicate standard
errors of ages inferred from fig and wasp phylogenetic trees, respectively. (Reproduced with permission from
Rønsted et al.18)
significantly different from r = 1. To evaluate whether the relationship could be due to chance alone,
the sum of squares of perpendicular offsets from a perfect linear regression (slope = 1) was compared
to 10,000 randomised sets of 10 pairs of ages drawn from both phylogenetic trees. This analysis
showed that the pattern observed in Figure 9.5 was highly significant, and the correlation between
interacting fig and wasp lineages could therefore not be due to chance alone.
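The randomisation test described above can be sketched in a few lines of Python. This is only an illustration of the general idea: how the original authors drew their random sets is not stated here, so the sketch assumes that node ages are sampled without replacement from each tree and paired at random. All ages below are invented for the example and are not data from the study.

```python
import random


def perpendicular_ss(pairs):
    """Sum of squared perpendicular distances of (x, y) points from y = x."""
    # The perpendicular distance of (x, y) from the line y = x is |y - x| / sqrt(2).
    return sum((y - x) ** 2 / 2.0 for x, y in pairs)


def randomisation_p(observed_pairs, fig_ages, wasp_ages, n_sets=10_000, seed=1):
    """Estimate how often random age pairs fit y = x as well as the observed ones."""
    rng = random.Random(seed)
    k = len(observed_pairs)
    observed = perpendicular_ss(observed_pairs)
    as_good = 0
    for _ in range(n_sets):
        xs = rng.sample(fig_ages, k)   # k fig node ages, random order
        ys = rng.sample(wasp_ages, k)  # k wasp node ages, random order
        if perpendicular_ss(zip(xs, ys)) <= observed:
            as_good += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (as_good + 1) / (n_sets + 1)


# Hypothetical node ages (million years) for each tree.
fig_ages = [5, 12, 18, 23, 27, 31, 36, 40, 44, 49, 55, 60, 66, 72, 80]
wasp_ages = [4, 10, 15, 21, 28, 33, 35, 42, 47, 52, 58, 63, 70, 75, 85]
# Hypothetical ages of 10 associated fig and wasp lineages.
observed_pairs = [(5, 4), (12, 10), (18, 21), (27, 28), (31, 33),
                  (40, 42), (49, 47), (60, 58), (72, 70), (80, 85)]

observed, p_value = randomisation_p(observed_pairs, fig_ages, wasp_ages)
print(observed, p_value)
```

A small estimated probability means that randomly paired ages almost never lie as close to the slope = 1 line as the ages of the associated lineages do.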
The strength of relationship between independently inferred ages of closely associated fig and
wasp lineages provided the most compelling published evidence for long-term codivergence in this
mutualism during at least the last 60 million years.
This is the situation with recently formed allotetraploids in Nicotiana, such as N. tabacum86,89,
which is less than 200,000 years old90. However, if the polyploid is older, then the match with
parental loci will be less exact (due to subsequent divergence in both parent and hybrid progeny),
and it is possible that parental species may have become extinct, both of which may obscure the
origins of allopolyploids. Parentage of older allotetraploid groups in Nicotiana, such as N. sect.
Suaveolentes (25 species found in Australia and southwestern Africa, about 8 million years old91),
was much more difficult for Goodspeed92 to sort out on the basis of chromosome studies and has
been more difficult to determine on the basis of plastid DNA, nuclear ITS DNA and low-copy
nuclear genes86,89,91. DNA sequences in such taxa are so divergent from all extant species that
assessments of phylogenetic relationships are problematic.
ITS sequences of nuclear ribosomal DNA in allopolyploids are subject to concerted evolution
and/or gene conversion and, given enough time, are usually converted to one of the parental types,
most often that of the maternal parent86,89,93,94. However, these loci can be unpredictable within some
genera, and conversion can occur in both directions in closely related species89. For example, in
N. rustica conversion favoured the maternal parent, so both plastid loci and ITS data indicated the
same phylogenetic placement; thus if it had not been known that N. rustica was an allotetraploid,
sequencing a plastid and nuclear locus like ITS would not have revealed its hybrid origin. In
N. tabacum, ITS conversion to the paternal copy occurred, so a comparison of the plastid and ITS
trees clearly revealed discordant relationships for this species89. Therefore, ITS and plastid data are
difficult to correctly interpret without other sources of independent information.
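The comparison of plastid and ITS trees described above amounts to asking whether a species has the same closest relatives in both. A minimal Python sketch of that check follows, using toy trees written as nested tuples. The taxon names (hybrid_1, parent_A and so on) and both helper functions are hypothetical and stand in for real trees and a proper phylogenetics library.

```python
def leaves(node):
    """Return the set of leaf names under a nested-tuple tree node."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= leaves(child)
    return found


def sister_group(tree, taxon):
    """Return the leaves of the sister group of `taxon`, or None if absent."""
    if isinstance(tree, str):
        return None
    for i, child in enumerate(tree):
        if child == taxon:
            sisters = set()
            for j, other in enumerate(tree):
                if j != i:
                    sisters |= leaves(other)
            return sisters
    for child in tree:
        result = sister_group(child, taxon)
        if result is not None:
            return result
    return None


def discordant(plastid_tree, nuclear_tree, taxon):
    """True if the taxon has different sister groups in the two trees."""
    return sister_group(plastid_tree, taxon) != sister_group(nuclear_tree, taxon)


# Toy trees: hybrid_1 groups with a different parent in each tree,
# whereas hybrid_2 groups with the same parent in both.
plastid = ((("hybrid_1", "parent_A"), "parent_B"),
           (("hybrid_2", "parent_C"), "parent_D"), "outgroup")
its = ((("hybrid_1", "parent_B"), "parent_A"),
       (("hybrid_2", "parent_C"), "parent_D"), "outgroup")

for name in ("hybrid_1", "hybrid_2"):
    print(name, discordant(plastid, its, name))
```

Only hybrid_1 is flagged. The second toy hybrid mirrors the N. rustica situation, in which ITS has converted to the maternal type: the two trees agree, so a check of this kind cannot reveal the hybrid origin.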
ITS and ETS loci occupy the same 35S rDNA array and have been combined in phylogenetic
analyses for diploid species18,28. However, few comparisons between these two loci
have been made in allopolyploids, and it cannot be assumed that they will trace the same evolutionary
history in all cases. Conversion of both loci to the same parental copies cannot be assumed,
although we know of no documented case in the angiosperms in which they were converted to different parental copies. The 5S ribosomal gene is
typically located on a different chromosome from that of the 35S array, and thus it is more likely
that different conversion patterns will occur, which would generate incongruent results in studies
that use both loci. A striking example of this is the allopolyploid N. section Repandae (Figure 9.6),
in which the 5S copy is inherited solely from the paternal parent, N. obtusifolia, whereas the ITS
type is inherited from only the maternal parent, N. sylvestris90. Single-copy nuclear genes do not
generally undergo concerted evolution, and copies of each progenitor type can usually be sequenced
from allopolyploids. In Nicotiana, single-copy nuclear genes (for example, plastid-expressed
glutamine synthetase62) have provided important information about parentage for some allopolyploid
species91. These data have proven to be particularly useful in allopolyploids in which the ITS
region is converted to the maternal copy, making plastid- and ITS-based trees agree.
Allotetraploids (polyploids with two distinct diploid genomes) obviously contain twice as many
copies of the genes a plant needs to function. There are four main ways in which homeologous
nuclear genes can interact, and most produce a specific signature in phylogenetic studies:
• Both homeologous copies are expressed and therefore have intact reading frames.
Here both types have usually diverged from their respective progenitor types at an
approximately equal rate. This appears to be the most common fate of duplicated genes,
and the literature is filled with examples, including studies of floral gene sequences in
Hawaiian silverswords (Asteraceae)95 and 16 loci surveyed in cotton (Gossypium)96. In
fact, polyploid cotton is probably a special case, as both copies have been shown to be
expressed but not always in the same tissue97. This form of specialisation could result
in each copy being under different selection pressures; hence each type may evolve at a
different rate.
• Only one of the duplicated genes is expressed, and therefore the other becomes
redundant. A relaxation in the constraints on the redundant gene results in a build up
of deleterious mutations, eventually making it a pseudogene (reviewed in Wendel98).
FIGURE 9.6 The single most parsimonious NTS tree (left) versus the strict consensus of all most parsimonious
ITS trees (right) for 18 species of Nicotiana. Both resulted from analysis using identical parameters90. The
striped clade indicates the allopolyploid section Repandae, and all other species are diploid. (Reproduced with
permission from Clarkson et al.91)
• Both homeologous copies of the gene are expressed, but there is relaxed selection on
one, which, given the right selection pressure, can evolve another function. This results
in accelerated nonsynonymous rates of substitution. This situation appears to be rare, although
it has been shown to be a possible fate of duplicated genes in Petunia and Ipomoea99,100.
• There is ‘cross talk’ between the two copies, which results in chimeric ‘hybrid’
sequences. This situation appears to be rare and, as far as we are aware, has only been
reported once in homeologous gene copies, in Nicotiana tabacum101. Here members of
the glucan endo-1,3 β-glucosidase gene family were reported to be hybrid sequences
between the two progenitor types. Crucially, however, the two progenitors of this allotet-
raploid were not sequenced for the study.
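The first two fates listed above differ in whether a copy retains an intact reading frame. A first-pass screen for this can be sketched in Python as below. It assumes that each sequence is a coding region with introns removed, in frame from its first base, and that the standard genetic code applies. The function name and the three toy sequences are invented for illustration, and a real analysis would also compare substitution rates between copies.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def reading_frame_report(cds):
    """Report frameshifts and premature stop codons in a coding sequence.

    cds: DNA string, assumed to start in frame at its first base.
    Alignment gaps ("-") are removed before checking.
    """
    seq = cds.upper().replace("-", "")
    frameshift = len(seq) % 3 != 0
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # A stop is expected only as the final codon of an intact frame.
    internal = codons if frameshift else codons[:-1]
    premature = [i for i, codon in enumerate(internal) if codon in STOP_CODONS]
    return {
        "frameshift": frameshift,
        "premature_stops": premature,
        "intact": not frameshift and not premature,
    }


# Hypothetical homeologous copies of a short gene.
copy_a = "ATGGCTGAAGTTCTGTAA"  # intact frame
copy_b = "ATGGCTTGAGTTCTGTAA"  # in-frame stop at the third codon
copy_c = "ATGGCTGAGTTCTGTAA"   # one-base deletion, frame disrupted

for name, seq in (("copy_a", copy_a), ("copy_b", copy_b), ("copy_c", copy_c)):
    print(name, reading_frame_report(seq))
```

Copies that fail the check are candidates for the pseudogene fate. Copies that pass may still differ in expression or selection pressure, which this screen cannot detect.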
9.8 CONCLUSIONS
Large genera have often been neglected for taxonomic studies, and many suffer from superficial
or outdated classifications as well as uncertain species numbers and delimitations. However, large
genera offer unique opportunities for studies of comparative biology. The precondition is a com-
prehensive and robust phylogenetic hypothesis, but in most cases phylogenetic work on large genera
is still in its early days, with limited sampling and an array of potential problems. Phylogenetic
hypotheses are currently typically based on molecular data, which surpass morphological and
other types of data in various ways, particularly the ability to obtain sufficient amounts of infor-
mation for species-level comparisons and the need for phylogenetic hypotheses that are independent
of biological traits that one may wish to evaluate.
The first decades of molecular phylogenetic work have focused on plastid regions, which are easily
amplified and have provided sufficient resolution in many studies. However, in an increasing number
of cases, a limited number of relatively small plastid regions provide too little variation to be of use for
infrageneric studies, and this problem is especially pronounced when dealing with large genera. Several
low-copy nuclear genes have proven useful for examining divergence among closely related species in
which nuclear ribosomal spacers and plastid spacers do not provide sufficient variation for phylogenetic
reconstruction. However, low-copy nuclear regions are often difficult to amplify, and generation of
phylogenetic trees may be confounded by dynamic evolutionary processes of the gene families, such as
gene duplication/deletion, gene conversion and recombination. Although these regions often provide
more informative variation than commonly used standard regions, there will typically still be a need for
combining several regions to obtain a substantial increase in resolution. In addition, the sheer size of the
genera poses problems with phylogenetic analysis time, which can be reduced by various strategies.
In cases for which direct sequencing of nuclear regions may not provide sufficient resolution,
an alternative could be using other molecular marker techniques, such as AFLP. The biggest
problem with AFLP markers is that nonhomologous but similarly sized fragments may be scored
as homologous, but this problem is overcome by not working with distantly related taxa. Another
problem with these markers is that they are not amenable to use with models of molecular evolution,
which are needed in many studies (for example, molecular clock approaches).
In general, there is no easy way to detect hybrids using DNA sequences in phylogenetic studies
unless one has at hand multiple trees from independent loci. Sequencing of just plastid and nuclear
ribosomal DNA will in many cases lead to the conclusion that hybrids are not present due to
conversion to the maternal copy type in the ribosomal DNA. Inclusion of several single or low-
copy regions is the most likely route to discovery of hybrids, particularly allopolyploids, but
detection of a homoploid hybrid requires a much greater number of loci and some luck with choice
of regions such that different linkage groups have been selected.
ACKNOWLEDGEMENTS
This work was supported by the Danish Carlsberg Foundation (Nina Rønsted), a Marie Curie
Outgoing International Fellowship within the Sixth European Community Framework Program
(Nina Rønsted), the Natural Environment Research Council (NERC) (James Clarkson) and the Jeff
Metcalf Fellows Programme of the University of Chicago (Kathrine Turk).
REFERENCES
1. Frodin, D.G., History and concepts of big plant genera, Taxon, 53, 753, 2004.
2. Bronstein, J.L. and McKey, D., The fig-pollinator mutualism: a model system for comparative biology,
Experientia, 45, 601, 1989.
3. Perret, M. et al., Systematics and evolution of tribe Sinningieae (Gesneriaceae): evidence from
phylogenetic analyses of six plastid DNA regions and nuclear ncpGS, Amer. J. Bot., 90, 445, 2003.
4. Albach, D.C., Martinez-Ortega, M.M., and Chase, M.W., Veronica: parallel morphological evolution
and phylogeography in the Mediterranean, Pl. Syst. Evol., 246, 177, 2004.
5. Pennington, R.T., Cronk, Q.C.B. and Richardson, J.A., Introduction and synthesis: plant phylogeny
and the origin of major biomes, Phil. Trans. R. Soc. Lond. B, 359, 1455, 2004.
6. Rutschmann, F. et al., Did Crypteroniaceae really disperse out of India? Molecular dating evidence
from rbcL, ndhF, and rpl16 intron sequences, Int. J. Plant Sci., 165, S69, 2004.
7. Davies, T.J. et al., Darwin's abominable mystery: insights from a supertree of the angiosperms, Proc.
Natl. Acad. Sci. USA, 101, 1904, 2004.
8. Davies, T.J. et al., Environmental causes for plant biodiversity gradients, Phil. Trans. R. Soc. Lond.
B, 359, 1645, 2004.
9. Davies, T.J. et al., Environmental energy and evolutionary rates in flowering plants, Proc. R.
Soc. Lond. B, 271, 2195, 2004.
10. Linder, H.P., Hardy, C.R., and Rutschmann, F., Taxon sampling effects in molecular clock dating: an
example from the African Restionaceae, Mol. Phyl. Evol., 35, 569, 2005.
11. Bremer, K., Ancestral areas: a cladistic reinterpretation of the center of origin concept, Syst. Biol.,
41, 436, 1992.
12. Chase, M.W. et al., When in doubt, put it in Flacourtiaceae: a molecular phylogenetic analysis based
on plastid rbcL DNA sequences, Kew Bull., 57, 141, 2002.
13. Rønsted, N. et al., Molecular phylogenetic evidence for the monophyly of Fritillaria and Lilium (Liliaceae;
Liliales) and the infrageneric classification of Fritillaria, Mol. Phyl. Evol., 35, 509, 2005.
14. Weiblen, G.D., Phylogenetic relationships of functionally dioecious Ficus (Moraceae) based on
ribosomal DNA sequences and morphology, Amer. J. Bot., 87, 1342, 2000.
15. Cook, J.M. and Rasplus, J.-Y., Mutualists with attitude: coevolving fig wasps and figs, Trends Ecol.
Evol., 18, 241, 2003.
16. Machado, C.A. et al., Phylogenetic relationships, historical biogeography and character evolution of
fig-pollinating wasps, Proc. R. Soc. Lond. B, 268, 685, 2001.
17. Datwyler, S.L. and Weiblen, G.D., On the origin of the fig: phylogenetic relationships of Moraceae
from ndhF sequences, Amer. J. Bot., 91, 767, 2004.
18. Rønsted, N. et al., 60 million years of co-divergence in the fig-wasp symbiosis, Proc. R. Soc. Lond.
B, 272, 2593, 2005.
19. Zerega, N.J.C. et al., Biogeography and divergence times in the mulberry family (Moraceae), Mol.
Phyl. Evol., 37, 402, 2005.
20. Page, R.D.M., Introduction, in Tangled Trees: Phylogeny, Cospeciation and Coevolution, Page,
R.D.M., Ed., The University of Chicago Press, Chicago, 2003, 1.
21. Pellmyr, O., Yuccas, yucca moths, and coevolution: a review, Ann. Missouri Bot. Gard., 90, 35, 2003.
22. Henderson, A., A review of pollination studies in the Palmae, Bot. Rev., 52, 221, 1986.
23. Meekijjaroenroj, A. and Anstett, M.C., A weevil pollinating the Canary Islands date palm: between
parasitism and mutualism, Naturwissenschaften, 90, 452, 2003.
24. Kato, M., Takimura, A., and Kawakita, A., An obligate pollination mutualism and reciprocal
diversification in the tree genus Glochidion (Euphorbiaceae), Proc. Natl. Acad. Sci. USA, 100, 5264,
2003.
25. Bronstein, J.L., The costs of mutualism, Amer. Zool., 41, 825, 2001.
26. Page, R.D.M., Clayton, D.H., and Patterson, A.M., Lice and cospeciation: a response to Barker, Int.
J. Parasit., 26, 213, 1996.
27. Herre, E.A. et al., Molecular phylogenies of figs and their pollinator wasps, J. Biogeogr., 23, 521, 1996.
28. Jousselin, E., Rasplus, J.-Y., and Kjellberg, F., Convergence and coevolution in a mutualism: evidence
from a molecular phylogeny of Ficus, Evolution, 57, 1255, 2003.
29. Weiblen, G.D., Correlated evolution in fig pollination, Syst. Biol., 53, 1, 2004.
30. Shaw, J. et al., The tortoise and the hare II: relative utility of 21 noncoding chloroplast DNA sequences
for phylogenetic analyses, Amer. J. Bot., 92, 142, 2005.
31. Swofford, D.L., PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods), version
4, Sinauer, Sunderland, MA, 2002.
32. Rønsted, N. et al., unpublished data, 2005.
33. Chase, M.W., Fay, M.F., and Savolainen, V., Higher-level classification in the angiosperms: new
insights from the perspective of DNA sequence data, Taxon, 49, 685, 2000.
34. Asmussen, C.B. and Chase, M.W., Coding and noncoding plastid DNA in palm systematics, Amer.
J. Bot., 88, 1103, 2001.
35. Chase, M.W. et al., Multi-gene analyses of monocot relationships: a summary, in Monocots: Com-
parative Biology and Evolution, Columbus, J.T. et al., Eds., Rancho Santa Ana Botanic Garden,
Claremont, CA, USA, in press.
36. Reeves, G. et al., Molecular systematics of Iridaceae: evidence from four plastid DNA regions, Amer.
J. Bot., 88, 2074, 2001.
37. Goldblatt, P. et al., Radiation in the Cape flora and the phylogeny of peacock irises Moraea (Iridaceae)
based on four plastid DNA regions, Mol. Phyl. Evol., 25, 341, 2002.
38. Baldwin, B.G. et al., The ITS region of nuclear ribosomal DNA: a valuable source of evidence on
angiosperm phylogeny, Ann. Missouri Bot. Gard., 82, 247, 1995.
39. Rønsted, N. et al., Phylogenetic relationships within Plantago (Plantaginaceae): evidence from nuclear
ribosomal ITS and plastid trnL-F sequence data, Bot. J. Linn. Soc., 139, 323, 2002.
40. Park, S.J. and Kim, K.J., Molecular phylogeny of the genus Hypericum (Hypericaceae) from Korea
and Japan: evidence from nuclear rDNA ITS sequence data, J. Plant Biol., 47, 366, 2004.
41. Huang, J.L., Giannasi, D.E., and Huang, J., Phylogenetic relationships in Ephedra (Ephedraceae)
inferred from chloroplast and nuclear DNA sequences, Mol. Phyl. Evol., 35, 48, 2005.
42. Buckler, E.S., Ippolito, A., and Holtsford, T.P., The evolution of ribosomal DNA: divergent paralogues
and phylogenetic implications, Genetics, 145, 821, 1997.
43. Campbell, C.S. et al., Persistent nuclear ribosomal DNA sequence polymorphism in the Amelanchier
agamic complex (Rosaceae), Mol. Biol. Evol., 14, 81, 1997.
44. Denduangboripant, J. and Cronk, Q.C.B., High intraindividual variation in internal transcibed spacer
sequences in Aeschynanthus (Gesneriaceae): implications for phylogenetics, Proc. R. Soc. Lond. B,
267, 1407, 2000.
45. Kita, Y. and Ito, M., Nuclear ribosomal ITS sequences and phylogeny in East Asian Aconitum subgenus
Aconitum (Ranunculaceae), with special reference to extensive polymorphism in individual plants,
Pl. Syst. Evol., 225, 1, 2000.
46. Rapini, A., Chase, M.W., and Konno, T.U.P., Phylogenetics of the South American Asclepia-
doideae (Apocynaceae), Taxon, 55, 119, 2006.
9579_C009.fm Page 145 Saturday, November 11, 2006 11:38 AM
47. Baldwin, B.G. and Markos, S., Phylogenetic utility of the external transcribed spacer (ETS) of 18S-
26S rDNA: congruence of ETS and ITS trees of Calycadenia (Compositae), Mol. Phyl. Evol., 10,
449, 1998.
48. Béna, G. et al., Ribosomal external and internal transcribed spacers: combined use in the phylogenetic
analyses of Medicago (Leguminosae), J. Mol. Evol., 46, 299, 1998.
49. Roalson, E.H. and Friar, E.A., Phylogenetic analysis of the nuclear alcohol dehydrogenase (Adh) gene
family in Carex section Acrocystis (Cyperaceae) and combined analyses of Adh and nuclear ribosomal
ITS and ETS sequences for inferring species relationships, Mol. Phyl. Evol., 33, 671, 2004.
50. Oh, S.-H. and Potter, D., Molecular phylogenetic systematics and biogeography of tribe Neillieae
(Rosaceae) using DNA sequences of cpDNA, rDNA and LEAFY, Amer. J. Bot., 92, 179, 2005.
51. Sytsma, K.J., et al., Urticalean rosids: circumscription, rosid ancestry, and phylogenetics based on
rbcL, trnL-F and ndhF sequences, Amer. J. Bot., 89, 1531, 2002.
52. Soltis, D.E. and Soltis, P.S., Choosing an approach and an appropriate gene for phylogenetic analysis,
in Molecular Systematics of Plants II: DNA Sequencing, Soltis, D.E., Soltis, P.S., and Doyle, J.J.,
Eds., Kluwer Academic, Dordrecht, 1998, 1.
53. Clegg, M.T., Cummings, M.P., and Durbin, M.L., The evolution of plant nuclear genes, Proc. Natl.
Acad. Sci. USA, 94, 7791, 1997.
54. Mason-Gamer, R.J. and Kellogg, E.A., Potential utility of the nuclear gene waxy for plant phylogenetic
analyses, Amer. J. Bot. (suppl.), 83, 178, 1996.
55. Mason-Gamer, R.J., Weil, C.F., and Kellogg, E.A., Granule-bound starch synthase: structure, function,
and phylogenetic utility, Mol. Biol. Evol., 15, 1658, 1998.
56. Bailey, C.D. and Doyle, J.J., Potential phylogenetic utility of the low-copy nuclear gene pistillata in
dicotyledonous plants: comparison to nrDNA ITS and trnL intron in Sphaerocardamum and other
Brassicaceae, Mol. Phyl. Evol., 13, 20, 1999.
57. Wang, C.N., Moller, M., and Cronk, Q.C.B., Phylogenetic position of Titanotrichum oldhamii
(Gesneriaceae) inferred from four different gene regions, Syst. Bot., 29, 407, 2004.
58. Grob, G.B.J., Gravendeel, B., and Eurlings, M.C.M., Potential phylogenetic utility of the nuclear
FLORICAULA/LEAFY second intron: comparison with three chloroplast DNA regions in
Amorphophallus (Araceae), Mol. Phyl. Evol., 30, 13, 2004.
59. Mathews, S., Lavin, M., and Sharrock, R.A., Evolution of the phytochrome gene family and its utility
for phylogenetic analyses of angiosperms, Ann. Missouri Bot. Gard., 82, 296, 1995.
60. Davis, C.C. et al., Laurasian migration explains Gondwanan disjunctions: evidence from Malpighiaceae,
Proc. Natl. Acad. Sci. USA, 99, 6833, 2002.
61. Samuel, R. et al., Molecular phylogenetics of Phyllanthaceae: evidence from plastid matK and nuclear
PHYC sequences, Amer. J. Bot., 92, 132, 2005.
62. Emshwiller, E. and Doyle, J.J., Chloroplast-expressed glutamine synthetase (ncpGS): potential utility
for phylogenetic studies with an example from Oxalis (Oxalidaceae), Mol. Phyl. Evol., 12, 310, 1999.
63. Morton, B.R., Gaut, B., and Clegg, M.T., Evolution of alcohol dehydrogenase gene in the palm and
grass families, Proc. Natl. Acad. Sci. USA, 93, 11735, 1996.
64. Small, R.L. et al., The tortoise and the hare: choosing between noncoding plastome and nuclear ADH
sequences for phylogeny reconstruction in a recently diverged plant group, Amer. J. Bot., 85, 1301,
1998.
65. Lewis, C.E. and Doyle, J.J., Phylogenetic utility of the nuclear gene malate synthase in the palm
family (Arecaceae), Mol. Phyl. Evol., 19, 409, 2001.
66. Denton, A.L., Hall, B.D., and McConaughy, B.L., RPB2, a nuclear gene for tracing angiosperm
phylogeny, Amer. J. Bot., 83 (suppl.), 150, 1996.
67. Oxelman, B. et al., RPB2 gene phylogeny in flowering plants, with particular emphasis on asterids,
Mol. Phyl. Evol., 32, 462, 2004.
68. Pfeil, B.E. et al., Paralogy and orthology in the Malvaceae RPB2 gene family: investigation of gene
duplication in Hibiscus, Mol. Biol. Evol., 21, 1428, 2004.
69. Archambault, A. and Bruneau, A., Phylogenetic utility of the LEAFY/FLORICAULA gene in the
Caesalpinioideae (Leguminosae): gene duplication and a novel insertion, Syst. Bot., 29, 609, 2004.
70. Vos, P. et al., AFLP: a new technique for DNA fingerprinting, Nucleic Acids Res., 23, 4407, 1995.
71. Li, G. and Quiros, C.F., Sequence-related amplified polymorphism (SRAP), a new marker system
based on a simple PCR reaction: its application to mapping and gene tagging in Brassica, Theor.
Appl. Genet., 103, 455, 2001.
9579_C009.fm Page 146 Saturday, November 11, 2006 11:38 AM
72. Koopman, W.J.M., Zevenbergen, M.J., and van den Berg, R.G., Species relationships in Lactuca s.l.
(Lactuceae, Asteraceae) inferred from AFLP fingerprints, Amer. J. Bot., 88, 1881, 2001.
73. Hodkinson, T.R. et al., A comparison of ITS nuclear rDNA sequence data and AFLP markers for
phylogenetic studies in Phyllostachys (Bambusoideae, Poaceae), J. Pl. Res., 113, 259, 2000.
74. Hodkinson, T.R. et al., Phylogenetics of Miscanthus, Saccharum and related genera (Saccharinae,
Andropogoneae, Poaceae) based on DNA sequences from ITS nuclear ribosomal DNA and plastid
trnL intron and trnL-F intergenic spacers, J. Pl. Res., 115, 381, 2002.
75. Hedrén, M., Fay, M.F., and Chase, M.W., Amplified fragment length polymorphisms (AFLP) reveal
details of polyploid evolution in Dactylorhiza (Orchidaceae), Amer. J. Bot., 88, 1868, 2001.
76. Richardson, J.E. et al., Species delimitation and the origin of populations in island representatives of
Phylica (Rhamnaceae), Evolution, 57, 816, 2003.
77. Lara-Cabrera, S.I. and Spooner, D.M., Taxonomy of North and Central American diploid wild potato
(Solanum sect. Petota) species: AFLP data, Pl. Syst. Evol., 248, 129, 2004.
78. Parrish, T., Krakatau: genetic consequences of island colonization, Ph.D. thesis, University of Utrecht,
Utrecht, 2002.
79. Parrish, T., personal communication, 2002.
80. Percy, D.M., Page, R.D.M., and Cronk, Q.C.B., Plant-insect interactions: double-dating associated
insect and plant lineages reveals asynchronous radiations, Syst. Biol., 53, 120, 2004.
81. Weiblen, G.D. and Bush, G.L., Speciation in fig pollinators and parasites, Mol. Ecol., 11, 1573, 2002.
82. Hulsenbeck, J.P. and Ronquist, F., Mr Bayes: Bayesian inference of phylogeny, Bioinformatics, 17,
754, 2001.
83. Collinson, M.E., The fossil history of the Moraceae, Urticaceae (including Cecropiaceae), and
Cannabaceae, in Evolution, Systematics, and Fossil History of the Hamamelidae Vol. 2: Higher
Hamamelidae, Crane, P. and Blackmore, S., Eds., Clarendon Press, Oxford, The Systematics
Association, 1989, 319.
84. Sanderson, M.J., A nonparametric approach to estimating divergence times in the absence of rate
constancy, Mol. Biol. Evol., 14, 1218, 1997.
85. Sanderson, M.J., Estimating absolute rates of molecular evolution and divergence times: a penalized
likelihood approach, Mol. Biol. Evol., 19, 101, 2002.
86. Clarkson, J. et al., Phylogenetic relationships in Nicotiana based on multiple plastid loci, Mol. Phyl.
Evol., 33, 75, 2004.
87. Doyle, J.J., Doyle, J.L., and Brown, A.H.D., Incongruence in the diploid B-genome species complex
of Glycine (Leguminosae) revisited: histome H3-D alleles vs. chloroplast haplotypes, Mol. Biol. Evol.,
16, 354, 1999.
88. Wendel, J.F., New world tetraploid cottons contain old world cytoplasms, Proc. Natl. Acad. Sci. USA,
86, 4132, 1989.
89. Chase, M.W. et al., Molecular systematics, GISH and the origin of hybrid taxa in Nicotiana
(Solanaceae), Ann. Bot., 92, 107, 2003.
90. Clarkson, J. et al., Long-term genome diploidization in allopolyploid Nicotiana section Repandae
(Solanaceae), New Phytol, 168, 241, 2005.
91. Clarkson, J. et al., unpublished data, 2005.
92. Goodspeed, T.H., The genus Nicotiana, Chronica Botanica Company, MA, USA, 1954.
93. Aoki, S, and Ito, M., Molecular phylogeny of Nicotiana (Solanaceae) based on the nucleotide sequence
of the matK gene, Pl. Biol., 2, 316, 2000.
94. Pillon, Y. et al., Insights into the evolution and biogeography of western European species complexes
in Dactylorhiza (Orchidaceae), Taxon, in press.
95. Barrier, M. et al., Interspecific hybrid ancestry of a plant adaptive radiation: allopolyploidy of the
Hawaiian silversword alliance (Asteraceae) inferred from floral homeotic gene duplications, Mol.
Biol. Evol., 16, 1105, 1999.
96. Cronn, R.C., Small, R.L., and Wendel, J.F., Duplicated genes evolve independently after polyploid
formation in cotton, Proc. Natl. Acad. Sci. USA, 96, 14406, 1999.
97. Adams, K.L. et al., Genes duplicated by polyploidy show unequal contributions to the transcriptome
and organ-specific reciprocal silencing, Proc. Natl. Acad. Sci. USA, 100, 4649, 2003.
98. Wendel, J.F., Genome evolution in polyploids, Pl. Mol. Biol., 42, 225, 2000.
99. Durbin, M.L. et al., Evolution of the chalcone synthetase gene family in the genus Ipomoea, Proc.
Natl. Acad. Sci. USA, 92, 3338, 1995.
9579_C009.fm Page 147 Saturday, November 11, 2006 11:38 AM
100. Huttley, G.A. et al., Nucleotide polmorphism in the chalcone synthase: a locus and evolution of the
chalcone synthase multigene family of common morning glory, Ipomoea purpurea, Mol. Ecol., 6,
549, 1997.
101. Sperisen, C., Ryals, J., and Meins, F., Comparison of cloned genes provides evidence for intergenomic
exchange of DNA in the evolution of a tobacco glucan endo-1,3-β-glucosidase gene family, Proc.
Natl. Acad. Sci. USA, 88, 1820, 1991.
102. Linder, C.R. and Rieseberg, L.H., Reconstructing patterns of reticulate evolution in plants, Amer. J.
Bot. 91, 1700, 2004.
103. Müntzing, A., Outlines to a genetic monograph of the genus Galeopsis, Hereditas, 13, 185, 1930.
104. Timberlake, J., unpublished data, 2005.
105. Barkman, T.J. and Simpson, B.B., Hybrid origin and parentage of Dendrochilum acuiferum
(Orchidaceae) inferred in a phylogenetic context using nuclear and plastid DNA sequence data, Syst.
Bot., 27, 209, 2002.
9579_C009.fm Page 148 Saturday, November 11, 2006 11:38 AM
9579_C010.fm Page 149 Saturday, November 11, 2006 11:54 AM
10 The Diversification of Flowering Plants through Time and Space: Key Innovations, Climate and Chance
T. J. Davies
Department of Biology, University of Virginia,
Charlottesville, USA
T. G. Barraclough
Division of Biology and NERC Centre for Population Biology,
Imperial College London, Silwood Park Campus, Ascot, Berkshire
ABSTRACT
The flowering plants represent one of the largest terrestrial evolutionary radiations within recent
geological times. Current estimates indicate there may be as many as half a million extant species,
yet within the angiosperms species richness can vary over several orders of magnitude between
closely related clades and between geographical regions. Understanding why some regions and
some lineages contain more species than others has been a major challenge in biology. To date,
approaches for studying these two patterns have been mostly separate. Traditional explanations for
taxonomic imbalance have focused upon key biological traits, whilst regional variation in species
richness has been ascribed largely to environmental factors. Using a tree of life for flowering plants,
we demonstrate that environment can explain much of the taxonomic imbalance evident within
phylogenetic trees not explained by key traits, and unequal rates of diversification, a product of the
interaction between traits and environment, may contribute to regional patterns in species richness.
10.1 INTRODUCTION
One of the principal goals of ecology and evolutionary biology is to understand the diversity and
distribution of life on Earth. The expansion of molecular approaches to phylogenetics has provided a
wealth of data for reconstructing the evolutionary events behind why some groups have flourished
whilst others have floundered. Flowering plants (angiosperms) have been one focus for such studies.
Flowering plants represent a highly species rich group with an estimated 500,000 extant species1–3
and were the subject of early coordinated efforts to reconstruct a complete family level phylogenetic
tree of a higher taxonomic group4,5. Flowering plant species richness varies greatly among taxo-
nomic groups and geographic regions; traditionally such patterns have been treated as largely
separate phenomena. Here we outline recent efforts to explore patterns and processes of diversifi-
cation in flowering plants using large-scale phylogenetic trees.
The phylogenetic distribution of species richness can vary over several orders of magnitude, even
between closely related families, indicating considerable variation in net diversification rates between
clades. Fossil evidence suggests a first appearance of angiosperms in the Cretaceous6; however, early
diverging lineages, such as Amborellaceae and Nymphaeaceae, tend to be relatively species poor.
Furthermore, a number of more recently derived families are unexpectedly species rich, notably within
Euasterids I7 sensu APG II8. There are two broad explanations for low species richness in older clades:
either extinction rates have been higher, or speciation rates lower. Although the fossil record is
insufficient to provide accurate estimates of extinction rates, there is little evidence that species poor
clades were previously more diverse. Significant shifts in speciation rate therefore most likely arose
only after the initial branching events of the clade9. Hence, high species
richness is a result of elevated diversification rates within a subset of angiosperm lineages and therefore
is of uneven distribution within flowering plants. However, the frequency, magnitude and location of
shifts in diversification rates across the group have been poorly documented.
The geographical distribution of flowering plant species richness varies at a magnitude similar
to that observed between lineages. Pollen records indicate ecological dominance was first attained
at low latitudes between 20°N and 20°S10. Subsequent latitudinal expansion of the clade coincided
with significant changes in the diversity of other plant groups, for example a decline in bryophytes
and pteridophytes10. By the late Cretaceous flowering plants were the dominant flora of low latitudes,
but comprised only 30–50% of diversity at higher latitudes. Over the past 65 million years before
present (mybp) flowering plants have become the predominant vegetation type across all latitudes,
but perhaps the most striking, and certainly the most frequently cited, spatial pattern in species
richness remains the latitudinal gradient in diversity11. Tropical regions, for example Brazil’s
Atlantic forest, the Eastern Arc and coastal forests of Tanzania/Kenya and Sundaland, are recognised
hotspots of flowering plant species richness12. Species richness tends to decrease at higher latitudes,
although there are a number of exceptions, notably within Mediterranean climates such as the Cape
of South Africa, the Mediterranean basin and the Californian chaparral.
Numerous studies have sought to explain why some lineages are more diverse than others,
concentrating on the role of key biological traits, such as pollination syndrome, but in flowering
plants (as in other groups) such traits apparently explain relatively little of the variation in species
numbers. At the same time, ecological studies have explored the effects of environment on floristic
richness within regions, but have not traditionally addressed evolutionary explanations as to why
some lineages or regions have more species. Phylogenetics provides a means to combine these
approaches. Here we review how information on diversification rates inferred from phylogenetic
trees can offer insights into the processes shaping both taxonomic and geographic patterns of species
richness. Specifically, we consider whether differences in environment experienced by lineages can
explain the extreme imbalance in species richness among clades.
TABLE 10.1
Taxonomic Distribution of Imbalanced Nodes Using the Imbalance Measure of Slowinski and Guyer20 on the Phylogenetic Tree of Flowering Plant Families from Davies et al.29

Higher Clade    Order             Number of Nodes   % Imbalanced Nodes
N/A             Austrobaileyales  2                 50
Magnoliids      Canellales        2                 0
Magnoliids      Laurales          6                 33
Magnoliids      Magnoliales       5                 40
Magnoliids      Piperales         2                 50
Monocots        Alismatales       13                23
Monocots        Asparagales       20                30
Monocots        Dioscoreales      2                 50
Monocots        Liliales          6                 17
Monocots        Pandanales        4                 0
Commelinids     Commelinales      4                 25
Commelinids     Poales            14                36
Commelinids     Zingiberales      7                 14
Eudicots        Proteales         1                 100
Eudicots        Ranunculales      6                 33
Core Eudicots   Caryophyllales    19                21
Core Eudicots   Saxifragales      11                36
Asterids        Cornales          4                 25
Asterids        Ericales          24                21
Euasterids I    Gentianales       5                 20
Euasterids I    Lamiales          20                35
Euasterids I    Solanales         6                 33
Euasterids II   Apiales           9                 22
Euasterids II   Aquifoliales      4                 25
Euasterids II   Asterales         11                27
Euasterids II   Dipsacales        1                 0
Rosids          Crossosomatales   2                 0
Rosids          Geraniales        2                 0
Rosids          Myrtales          10                10
Eurosids I      Celastrales       2                 50
Eurosids I      Cucurbitales      6                 17
Eurosids I      Fabales           3                 100
Eurosids I      Fagales           6                 17
Eurosids I      Malpighiales      27                22
Eurosids I      Oxalidales        4                 0
Eurosids I      Rosales           7                 29
Eurosids II     Brassicales       15                20
Eurosids II     Malvales          6                 50
Eurosids II     Sapindales        9                 22
under the assumption that the diversification rate during time t has been approximately exponential40,41,
where N is the number of species in the clade, and t is the time since the clade diverged from its
sister clade on the dated tree. Shifts in net diversification rates were therefore calculated as:
where des is the descendent clade and anc is the ancestral clade. A positive shift in net diversi-
fication rate indicates an increase in rates from the ancestral to the descendent clade. Mapping
the magnitude of rate shifts on the topology of the tree confirms the impression of frequent large
shifts in diversification rate, indicating that the propensity to diversify is a highly labile trait.
However, the direction and magnitude of shifts in net rates appeared to vary nonrandomly across
the phylogenetic tree.
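Under the exponential assumption stated above, the net diversification rate of a clade and the shift between an ancestral and a descendent clade can be written as follows. This is a sketch consistent with the definitions of N, t, des and anc given in the text; the symbols r and Δr are introduced here for illustration, and the form assumes that each clade grew from a single lineage.

```latex
r = \frac{\ln N}{t},
\qquad
\Delta r = r_{\mathrm{des}} - r_{\mathrm{anc}}
         = \frac{\ln N_{\mathrm{des}}}{t_{\mathrm{des}}}
         - \frac{\ln N_{\mathrm{anc}}}{t_{\mathrm{anc}}}
```

With this sign convention, Δr > 0 corresponds to an increase in rate from the ancestral to the descendent clade, matching the interpretation given in the text.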
The 10 greatest shifts in net diversification rates were negative, from high ancestral rates to
low descendent rates, for example, the nodes subtending Ecdeiocoleaceae (tussocky cord rush; one
species) sister to Poaceae (grasses; c. 12,000 species), Stegnospermaceae (Cuban tangle; three
species), sister to a number of families within Caryophyllales (for example, cacti, carpetweeds and
fig-marigolds; c. 4,300 species), and Calyceraceae (calycera family; 40 species) sister to Asteraceae
(daisies; c. 13,000 species). The two clades with the greatest positive shift in rates were identified
as the sister family pair Moraceae (figs and mulberries; 1,675 species) and Urticaceae (nettles;
825 species). The mean age for the top 10 greatest positive shifts was significantly younger than
that for the negative shifts (mean age 38.5 mybp versus 53.4 mybp for the positive and negative
shifts respectively; P < 0.05, Mann-Whitney test).
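The degree of imbalance in the sister pairs named above can be illustrated with the equal-rates null model that underlies the imbalance measure of Slowinski and Guyer. The sketch below uses its commonly quoted two-tailed form, in which every split of n species between two sister clades is equally likely; the function name is hypothetical, the species counts are the approximate figures quoted in the text, and the chapter's own analysis may have been implemented differently.

```python
def slowinski_guyer_p(n_a: int, n_b: int) -> float:
    """Two-tailed probability of a split at least this uneven under equal rates."""
    if n_a < 1 or n_b < 1:
        raise ValueError("each sister clade needs at least one species")
    small = min(n_a, n_b)
    total = n_a + n_b
    # Under the equal-rates model each split (i, total - i), i = 1..total-1,
    # has probability 1 / (total - 1). Counting both tails gives
    # 2 * small / (total - 1), capped at 1 for (near-)even splits.
    return min(1.0, 2.0 * small / (total - 1))


# Approximate species counts taken from the text above.
SISTER_PAIRS = {
    "Ecdeiocoleaceae / Poaceae": (1, 12000),
    "Calyceraceae / Asteraceae": (40, 13000),
    "Moraceae / Urticaceae": (1675, 825),
}

for name, (a, b) in SISTER_PAIRS.items():
    print(f"{name}: P = {slowinski_guyer_p(a, b):.4g}")
```

The first two pairs are far more uneven than the null model predicts, whereas the Moraceae and Urticaceae split is unremarkable, which is consistent with that pair being described as sharing a positive shift rather than differing from each other.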
In general, older nodes tended to exhibit greater taxonomic imbalance, associated with a
negative shift in net diversification rates, and more recent nodes tended to be more balanced than
expected, with several sister family pairs displaying correlated positive shifts in rates. One possible
explanation for this would be a general increase in diversification rates within recent time periods,
and the imbalance of older nodes might reflect the accumulated effect of past shifts in diversification
rate. However, an alternative explanation is that this pattern reflects a bias due to the use of families
as terminal taxa; shifts occurring within families can only be reconstructed as occurring in the
entire family in the analyses. Furthermore, extinction will have had less time to operate within
more recently derived clades, thereby inflating diversification rate estimates7. The overriding impres-
sion is of a history littered with tales of evolutionary successes and failures. We explore what might
explain this chequered past in the following sections.
variables, for example additional traits also affecting speciation rates, are minimised54. However,
Salamin and Davies43 found no significant association between the traits studied and species richness
among higher clades.
There are several possible explanations as to why no support was found for the key innovation
hypothesis:
• Poor phylogenetic data; phylogenetic error will tend to reduce signal and thereby
increase the probability of type II errors when independent contrasts are employed24.
• Poor trait data; if clades were miscoded in terms of trait value, we would also predict
an associated increase in type II error rates.
• The wrong traits were examined; Gorelick55 lists 20 hypotheses that have variously
been proposed to explain the evolutionary success of the flowering plants, including
many key traits. Doubtlessly a comprehensive survey of the literature would reveal many
more putative key innovations, and a number of significant results have been reported
within a subset of clades43,56–60. It is possible that, if sufficient data were to become
available to test these hypotheses in the future, significant associations may be found
across the flowering plants.
• Contingency upon other traits and the environment; whether a certain trait influences
diversification rates is likely to depend on a number of factors, including the abiotic
environment, other biological traits, and other taxa61.
• Contrasts at higher taxonomic levels may be too insensitive; the majority of flowering
plant diversity is encompassed within, rather than between, families, hence a stronger asso-
ciation between traits and species richness might be observed for more fine-scale analyses.
Although there were a possible 378 contrasts (nodes in the supertree), the sample size of
unambiguous state changes was small; life form and mode of pollination were both limited to two
comparisons, the maximum being 15 comparisons (mode of dispersal). The limited number of
contrasts was partly a product of lack of variation in the traits under examination; for example,
abiotic pollination characterises both clades subtending the most imbalanced node identified above,
the grasses and their sister group. However, the predominant limiting factor was within family
variation, resulting in many clades being classified as polymorphic for the majority of traits,
indicating that, for the traits examined, the taxonomic scale of this analysis was inappropriate.
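The cost of so few unambiguous comparisons can be made concrete with a simple sign test. This is an illustrative stand-in, not necessarily the test used in the study cited above: it asks only whether the clade possessing the trait is the more species rich member of each sister pair. The function name is hypothetical.

```python
from math import comb


def two_tailed_sign_p(n_positive: int, n_contrasts: int) -> float:
    """Two-tailed binomial sign test with a null success probability of 0.5."""
    if not 0 <= n_positive <= n_contrasts or n_contrasts < 1:
        raise ValueError("need 0 <= n_positive <= n_contrasts and n_contrasts >= 1")
    # Use the more extreme of the two directions, then double the tail.
    k = max(n_positive, n_contrasts - n_positive)
    tail = sum(comb(n_contrasts, i) for i in range(k, n_contrasts + 1)) / 2 ** n_contrasts
    return min(1.0, 2.0 * tail)


# Best case with two contrasts (both agree) versus a strong result with fifteen.
print(two_tailed_sign_p(2, 2))
print(two_tailed_sign_p(12, 15))
```

With only two contrasts, even perfect agreement cannot give a P value below 0.5, so traits such as life form and mode of pollination could not have produced a significant result at this taxonomic scale regardless of their true effect.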
Where strong associations between species richness and biological traits have been found, they
are often environment or clade specific, for example, annual life form in grasses43, floral nectar
spurs in columbines57, climbing habit in predominantly tropical taxa58 and fleshy fruit in the tropical
understorey59. As we look further back in time, at nodes deeper in the phylogenetic tree, we would
expect a proportional increase in the impact of other factors, such as mass extinctions, biogeography
and other traits on diversification rates61,62. The difference between the findings of Smith59 and
those of Salamin and Davies43 on the importance of biotic dispersal, for which fleshy fruit is an
indicator, is likely a result of the former study restricting comparisons to taxa found only within a
narrow environmental niche, the tropical understorey. The significant association between annual
life form and species richness in grasses and the absence of significance in contrasts between
families of flowering plants is also a likely product of scale, but taxonomic rather than environ-
mental. It is therefore unsurprising that key traits do not always generalise across disparate taxa.
For example, biogeography appears to have left a greater imprint on patterns of current species
richness than presence or absence of nectar spurs in the genus Halenia63, yet nectar spurs may
remain important when only young, geographically restricted, clades are considered.
In summary, there is a growing appreciation that explanations based upon one or a few traits
are too simplistic to explain patterns of flowering plant species richness7,8,64. Where significant
correlations between biological traits and species richness have been found, they tend to be in
comparisons between recently radiated taxa sharing similar ecological conditions. Whether a bio-
logical trait influences net diversification rates is therefore likely to depend on a number of other
factors, including abiotic environment. If the efficacy of a trait in influencing speciation rates were
environment dependent we might also predict that different traits would have been advantageous
at different geological times, with those taxa that happened to be pre-adapted to changes in
environmental conditions radiating rapidly. Such a scenario has been suggested as explaining the
rapid radiation of the grasses (previously restricted to marginal habitats) coinciding with the late
Tertiary change towards a drier climate, which enabled the exploitation of new niches and a dramatic
increase in their ecological dominance65,66, and might explain the apparent lag between the origin
of particular traits and the increase in the proportion of taxa possessing them in the fossil record67.
Environment clearly has the potential to greatly enhance our understanding of the evolutionary
history of flowering plant diversity; in the following section we explore the effects of one aspect
of environment, namely latitudinal gradients.
FIGURE 10.1 Alternate pathways depicting the potential relationships among species richness, mutation rates
and environmental energy. (a) Higher temperature of tropical regions may increase evolutionary rates, via
shortening generation times and elevating mutation rates, thereby driving net diversification rates; the faster
evolution hypothesis. (b) Alternatively, environmental energy may influence both evolutionary rates and
diversification rates independently.
Cardillo90 failed to find any association between molecular rates and latitude; the second step in
the theory. Therefore, despite widespread interest, support for the faster evolution hypothesis has
been equivocal, and the direction of causality unclear (compare Figure 10.1a and Figure 10.1b).
Sister family comparisons of Davies et al.91 supported the broad predictions of the spe-
cies–energy theory, revealing a strong correlation between environmental energy and species rich-
ness. Temperature was the best predictor of the alternate energy measures, explaining 19% of the
variation in species numbers between families, once area had been accounted for. Energy was also
found to be a good predictor of molecular evolutionary rates, with faster rates in high-energy
environments, confirming the second step in the faster evolution theory. This relationship between
molecular evolutionary rates and environmental energy had not previously been reported90. However,
there was no evidence that energy increased diversification rates by this pathway. Instead, the effects
of energy on both molecular rates and species richness were direct (Figure 10.1b), leading Davies
et al.91 to reject the faster evolution hypothesis.
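The logic separating the two pathways in Figure 10.1 can be sketched with a first-order partial correlation. This is an illustrative device and not the analysis reported by Davies et al.; the function name and all three correlation values are hypothetical, chosen only to show what the independent-effects pathway (Figure 10.1b) would look like.

```python
from math import sqrt


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    denom = sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical values for illustration only (not taken from the source).
r_rate_richness = 0.30    # molecular rate vs species richness
r_rate_energy = 0.60      # molecular rate vs environmental energy
r_richness_energy = 0.50  # species richness vs environmental energy

residual = partial_corr(r_rate_richness, r_rate_energy, r_richness_energy)
print(f"rate-richness correlation after controlling for energy: {residual:.3f}")
```

If energy influences molecular rates and species richness independently, the association between rate and richness should vanish once energy is controlled for, as in this constructed case. Under the faster evolution pathway (Figure 10.1a), the expectation is instead that the link between energy and richness disappears once molecular rate is controlled for.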
Diversification rates co-vary with environment; lineages occupying higher energy regions tend
to have higher net speciation rates. The direct link between energy and species richness is compatible
with the biomass hypothesis. More productive environments might reduce extinction through
supporting higher population densities, thereby elevating net diversification rates, although the
precise relationship between productivity and density remains controversial70,98–100. Bonn et al.101
recently proposed an alternative explanation, in which more productive environments were likely
to contain a greater sample of taxon-specific critical resources. Therefore, high energy environments
could sustain a greater number of viable populations, not by increasing population density, but
rather increasing the probability of the occurrence of a limiting resource, which may vary between
taxa. This hypothesis remains to be evaluated in plants.
Both environment and biological traits may explain a proportion of the variation in net diver-
sification rates among lineages within flowering plants. Key traits are difficult to evaluate for older
nodes; similarly, we might expect causal relationships between environment and species richness
to be more difficult to detect for more ancient splits due to post speciation range movement and
historical climate change. In the final section of this chapter, we explore how combining information
on biology and environment can be mutually informative, using the iris family (Iridaceae) as a test
case. By using younger and more narrowly distributed taxa, it may be possible to more accurately
discriminate environment and the effect of species-specific traits on geographical and taxonomic
patterns of species richness.
FIGURE 10.2 Geographical distribution of iris family species richness. Map legend: low = 1, high = 1155.
(Generated by summing species counts from overlaid generic distribution maps from Davies et al.110,111)
within dense vegetation typical of tropical regions, species numbers tend to be highest in seasonally
dry environments (Figure 10.2). Irises conform to typical patterns of Cape diversity: the family
contains 677 species in the region, of which 80% are endemic105, and many genera including
Gladiolus (260 species) and the peacock irises (Moraea) (196 species) have radiated extensively
within the Cape108. Although a few genera such as Neomarica (eight species) and Eleutherine (two
species) are found in the Neotropics, they are relatively species poor. The family is characterised
by an isobilateral leaf held vertically, perhaps the single most important morphological innovation
of the family, and underground storage organs, such as corms (for example, Crocus and Gladiolus),
rhizomes (for example, Isophysis and Sisyrinchium) or bulbs (for example, Cypella and Tigridia).
Floral morphology is highly variable and matches several different pollination syndromes109. These
traits have been thought important in the group’s evolution within the Cape105.
Phylogenetically independent contrasts22 of species richness from a generic level phylogenetic
tree of irises revealed several significantly imbalanced nodes110,111. Although the node subtending
Isophysis (one species) and the remainder of Iridaceae is the most imbalanced, the next most basal
nodes are also highly imbalanced, suggesting that the ancestral state may have been a low net
diversification rate. The early diverging lineages, Diplarrhena (two species), Patersonia (21 species)
and Geosiris (one species), tend to be species poor, rhizomatous, of limited geographical distribution
and suggest an Australasian origin for the family. Repeating the family-level analysis of environment
and species richness at the generic level showed that abiotic environment plus area could explain
up to 85% of the variation in species richness between sister clades, including the highly imbalanced
nodes identified above.
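The sister-clade comparison described above can be sketched as follows. Because sister clades are the same age, the log ratio of their species numbers is a contrast in net diversification that does not depend on time, and it can be regressed through the origin on the corresponding difference in an environmental variable. This is a minimal sketch under that assumption and is not the analysis of Davies et al.; the function names and all numbers are hypothetical and for illustration only.

```python
import math

def richness_contrast(n_a, n_b):
    """Log ratio of species numbers for two sister clades (same age)."""
    return math.log(n_a / n_b)

def fit_through_origin(x, y):
    """Least-squares slope for y = b*x, plus the uncentred R^2 of that fit."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    syy = sum(yi * yi for yi in y)
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, r_squared

# Hypothetical sister pairs: (species in clade A, species in clade B,
# difference in an environmental score, A minus B). Not real data.
pairs = [(120, 6, 2.0), (40, 10, 1.1), (3, 1, 0.4), (15, 25, -0.3)]
y = [richness_contrast(a, b) for a, b, _ in pairs]
x = [d for _, _, d in pairs]
slope, r2 = fit_through_origin(x, y)
print(f"slope = {slope:.2f}, R^2 = {r2:.2f}")
```

The regression is forced through the origin because the direction of each contrast is arbitrary: swapping clades A and B flips the sign of both variables, so a zero environmental difference should predict a zero richness contrast.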
Environmental factors associated with warm, dry and topographically diverse habitats were the
best predictors of species richness, reflecting the specific preferences of the family and confirming
its departure from the more general trend towards higher species richness in tropical environments.
However, environment alone was insufficient to explain the high diversity of Cape clades. By using
contrasts between sister clades, lineages within the Cape were revealed to have speciated at faster
rates than those found elsewhere, even than in regions with similar Mediterranean type climates.
Molecular evidence suggests that the Cape may have undergone a period of rapid diversification
coinciding with a change in oceanic currents leading to the aridification of the region 8–7 mybp104,112.
While the majority of branching events in the generic phylogenetic tree predate this shift in climate,
there is evidence in at least one genus, Moraea, that this geological background provided the setting
for the diversification leading to high species richness of irises in the Cape108.
It is possible that the high net diversification rates observed in Cape clades are a product of
abiotic factors not included in the model parameters or that the spatial scale of the analysis was
too insensitive to accurately characterise the great physical diversity of the Cape Region. For
example, the Cape may have been more climatically stable during the Pleistocene radiations,
allowing greater time for gradual speciation, elevating net diversification rates95. However, of the
genera with the greatest deviation from the model, all those that contain more species than predicted
by environment, Geissorhiza, Hesperantha, Ixia and Thereianthus, fall within a single clade,
Crocoideae. Although no biological traits changed state frequently enough on the tree to allow tests
for a general correlation with diversity, a number of traits are characteristic of this clade and may
have been instrumental in its diversification in the Cape. The evolution of the perianth tube and
zygomorphic flowers was likely important in allowing floral plasticity in Crocoideae, and subse-
quent pollinator specialisation105,113,114. A cormous rootstock, again typical of Crocoideae, may have
enabled rapid regrowth in fire-dominated landscapes and also promoted establishment by vegetative
reproduction following rare long-distance dispersal events.
Even if sufficient data were available to evaluate these putative key traits more rigorously, it
seems unlikely that they would be identified as key innovations from simple analyses of taxonomic
imbalance, as several species poor genera share many of these traits with their species rich
counterparts. Ancestral state reconstructions for both flower symmetry and rootstock type reveal
that shifts in character states are correlated with neither significantly imbalanced nodes nor large
deviance from the expected values derived from the climate variables110,111. Biological traits asso-
ciated with high species richness in genera of irises may therefore have only had those effects in
particular places, most notably the highly heterogeneous environment of the Cape. This is consistent
with our second prediction: clades outside the Cape have not diversified, irrespective of their
intrinsic biological attributes. The limited number of comparisons meant that it was not possible
to examine our first prediction: whether the absence of particular traits results in lower diversification
rates in the region.
10.6 CONCLUSIONS
Phylogenetic trees of the flowering plants are too imbalanced to be a product of an equal rates
Markov process, in which all lineages have an equal probability of diversifying, but shifts in
diversification rate appear to be too frequent for it to be explained by the inheritance of a few key
traits. The interaction between traits and the environment may offer a resolution to this apparent
paradox. If the influence of heritable biological traits upon the likelihood of diversifying was
dependent on environmental conditions and environmental change was frequent, repeated shifts in
diversification rate would be expected. Within any set of conditions some lineages would be favoured
over others. However, the identity of these lineages would fluctuate with a changing environment,
as conditions favourable to speciation within one lineage may not be so in another. A strong
relationship between environment and species richness would, however, be evident at any single
point in the evolutionary history of flowering plants.
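As a minimal sketch of why imbalance can reject the equal rates model: under an equal rates Markov process, a node with N descendant species is equally likely to split into any of the N - 1 ordered partitions (1, N - 1), (2, N - 2), ..., (N - 1, 1). The probability of a split at least as uneven as the one observed is then a simple count, in the spirit of the null model of Slowinski and Guyer16. The function name and the example numbers below are illustrative assumptions and are not taken from the analyses described in this chapter.

```python
def erm_imbalance_p(n_small, n_large):
    """Two-tailed probability, under an equal rates Markov null, of a sister
    split at least as uneven as (n_small, n_large)."""
    if n_small < 1 or n_large < 1:
        raise ValueError("each sister clade needs at least one species")
    small, large = sorted((n_small, n_large))
    total = small + large
    if small == large:
        return 1.0  # perfectly even split: every outcome is at least this uneven
    # Each of the total - 1 ordered splits has probability 1 / (total - 1);
    # 2 * small of them leave the smaller clade with <= small species.
    return 2 * small / (total - 1)

# Hypothetical examples: a monotypic lineage sister to 99 species,
# and a fairly even 40 versus 60 split.
print(erm_imbalance_p(1, 99))
print(erm_imbalance_p(40, 60))
```

For the first example the probability is 2/99, or about 0.02, so a single split this lopsided is already unusual under equal rates; many such nodes across a tree are what make the equal rates model untenable.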
Extant species richness may be best explained with reference to the contemporary environment.
However, lineages characterised by different suites of traits might be expected to display different
functional responses to their physical surroundings. Irises represent a family that departs from
global trends in species richness, yet the environment can explain a large proportion of the variation
in species richness among lineages. Even within this clade, it is likely that particular biological
traits have favoured rapid diversification within the Cape of South Africa. A wider sample of Cape
clades may provide sufficient data to evaluate the interaction between traits and environment in
this unique and species rich region.
ACKNOWLEDGEMENTS
We thank our collaborators on the published work described here, Vincent Savolainen, Nicolas
Salamin, Mark Chase, Justin Moat, Peter Goldblatt, Pam Soltis and Doug Soltis. We also thank
Camille Barr for helpful comments on an earlier draft of this manuscript. The work was supported
by a NERC Ph.D. studentship and by the Royal Society.
REFERENCES
1. Govaerts, R., How many species of seed plants are there? Taxon, 50, 1085, 2001.
2. Bramwell, D., How many plant species are there? Plant Talk, 28, 32, 2002.
3. Willis, K.J. and McElwain, J.G., The Evolution of Plants, Oxford University Press, Oxford, 2002.
4. APG, An ordinal classification for the families of flowering plants, Ann. Mo. Bot. Gard., 85, 531, 1998.
5. Chase, M.W. and Albert, V.A., A perspective on the contribution of plastid rbcL DNA sequences to
angiosperm phylogenetics, in Molecular Systematics of Plants II: DNA Sequences, Soltis, D.E., Soltis,
P.S. and Doyle, J.A., Eds., Kluwer Academic Publishers, Boston, 1998, 488.
6. Niklas, K.J., Tiffney, B.H., and Knoll, A.H., Patterns in vascular land plant diversification, Nature,
303, 614, 1983.
7. Magallón, S. and Sanderson, M.J., Absolute diversification rates in angiosperm clades, Evolution, 55,
1762, 2001.
8. APG II, An update of the Angiosperm Phylogeny Group classification for the orders and families of
flowering plants: APG II, Bot. J. Linn. Soc., 141, 399, 2003.
9. Sanderson, M.J. and Donoghue, M.J., Shifts in diversification rate with the origin of angiosperms,
Science, 264, 1590, 1994.
10. Crane, P.R. and Lidgard, S., Angiosperm diversification and paleolatitudinal gradients in Cretaceous
floristic diversity, Science, 246, 675, 1989.
11. Gaston, K.J. and Williams, P.H., Spatial patterns in taxonomic diversity, in Biodiversity: a Biology of
Numbers and Differences, Gaston, K.J., Ed., Blackwell Science, Oxford, 1996, 202.
12. Myers, N. et al., Biodiversity hotspots for conservation priorities, Nature, 403, 853, 2000.
13. MacArthur, R., On the relative abundance of bird species, Proc. Natl. Acad. Sci. USA, 43, 293, 1957.
14. Dial, K.P. and Marzluff, J.M., Nonrandom diversification within taxonomic assemblages, Syst. Zool.,
38, 26, 1989.
15. Scotland, R.A. and Sanderson, M.J., The significance of few versus many in the tree of life, Science,
303, 643, 2004.
16. Slowinski, J.B. and Guyer, C., Testing the stochasticity of patterns of organismal diversity: an improved
null model, Am. Nat., 134, 907, 1989.
17. Purvis, A., Using interspecies phylogenies to test macroevolutionary hypotheses, in New Uses for
New Phylogenies, Harvey, P.H. et al., Eds., Oxford University Press, Oxford, 1996, 153.
18. Mooers, A.Ø. and Heard, S.B., Inferring evolutionary process from phylogenetic tree shape, Q. Rev.
Biol., 72, 31, 1997.
19. Mooers, A.Ø. and Heard, S.B., Using tree shape, Syst. Biol., 51, 833, 2002.
20. Slowinski, J.B. and Guyer, C., Testing whether certain traits have caused amplified diversification: an
improved method based on a model of random speciation and extinction, Am. Nat., 142, 1019, 1993.
21. Nee, S., Mooers, A.Ø., and Harvey, P.H., Tempo and mode of evolution revealed from molecular
phylogenies, Proc. Natl. Acad. Sci. USA, 89, 8322, 1992.
22. Harvey, P.H. and Pagel, M.D., The Comparative Method in Evolutionary Biology, Oxford University
Press, Oxford, 1991.
23. Symonds, M.R.E., Life histories of the Insectivora: the role of phylogeny, metabolism and sex
differences, J. Zool. Soc. Lond., 249, 315, 1999.
24. Symonds, M.R.E., The effect of topological inaccuracy in evolutionary trees on the phylogenetic
comparative method of independent contrasts, Syst. Biol., 51, 541, 2002.
25. Doyle, J.A. and Donoghue, M.J., Phylogenies and angiosperm diversification, Paleobiology, 19, 141, 1993.
26. Friedman, W.E. and Floyd, S.K., Perspective: the origin of flowering plants and their reproductive
biology — a tale of two phylogenies, Evolution, 55, 217, 2001.
27. Soltis, D.E. et al., Inferring complex phylogenies using parsimony: an empirical approach using three
large DNA data sets for angiosperms, Syst. Biol., 47, 32, 1998.
28. Savolainen, V. and Chase, M.W., A decade of progress in plant molecular phylogenetics, Trends Genet.,
19, 717, 2003.
29. Davies, T.J. et al., Darwin’s abominable mystery: insights from a supertree of the angiosperms, Proc.
Natl. Acad. Sci. USA, 101, 1904, 2004.
30. Pisani, D. and Wilkinson, M., Matrix representation with parsimony, taxonomic congruence, and total
evidence, Syst. Biol., 51, 151, 2002.
31. Gatesy, J. et al., Resolution of a supertree/supermatrix paradox, Syst. Biol., 51, 652, 2002.
32. Bininda-Emonds, O.R.P. et al., Supertrees are a necessary not-so-evil: a response to Gatesy et al.,
Syst. Biol., 52, 724, 2003.
33. Bininda-Emonds, O.R.P. and Bryant, H.N., Properties of matrix representation with parsimony analyses,
Syst. Biol., 47, 497, 1998.
34. Bininda-Emonds, O.R.P. and Sanderson, M.J., Assessment of the accuracy of matrix representation
with parsimony analysis supertree construction, Syst. Biol., 50, 565, 2001.
35. Salamin, N., Hodkinson, T.R., and Savolainen, V., Building supertrees: an empirical assessment using
the grass family (Poaceae), Syst. Biol., 51, 136, 2002.
36. Fusco, G. and Cronk, Q.C.B., A new method for evaluating the shape of large phylogenies, J. Theor.
Biol., 175, 235, 1995.
37. Purvis, A., Katzourakis, A., and Agapow, P.M., Evaluating phylogenetic tree shape: two modifications
to Fusco & Cronk’s method, J. Theor. Biol., 214, 99, 2001.
38. Wilkinson, M. et al., The shape of supertrees to come: tree shape related properties of fourteen
supertree methods, Syst. Biol., 54, 419, 2005.
39. Guyer, C. and Slowinski, J.B., Comparisons of observed phylogenetic topologies with null expectations
among three monophyletic lineages, Evolution, 45, 340, 1991.
40. Eriksson, O. and Bremer, B., Pollination systems, dispersal modes, life forms, and diversification rates
in angiosperm families, Evolution, 46, 258, 1992.
41. Stanley, S.M., Macroevolution: Pattern and Process, Freeman, San Francisco, 1979.
42. van Valen, L., Adaptive zones and the orders of mammals, Evolution, 25, 420, 1971.
43. Salamin, N. and Davies, T.J., Using supertrees to investigate species richness in grasses and flowering
plants, in Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, Computational
Biology Vol. 4, Bininda-Emonds, O.R.P., Ed., Kluwer Academic, Dordrecht, 2004, 461.
44. Eriksson, O. and Bremer, B., Pollination systems, dispersal modes, life forms, and diversification rates
in angiosperm families, Evolution, 46, 258, 1992.
45. Gaut, B.S. et al., Substitution rate comparisons between grasses and palms: synonymous rate
differences at the nuclear gene Adh parallel rate differences at the plastid gene rbcL, Proc. Natl. Acad.
Sci. USA, 93, 10274, 1996.
46. Gaut, B.S. et al., Relative rates of nucleotide substitution at the rbcL locus of monocotyledonous
plants, J. Mol. Evol., 35, 292, 1992.
47. Dodd, M.E., Silvertown, J., and Chase, M.W., Phylogenetic analysis of trait evolution and species
diversity variation among angiosperm families, Evolution, 53, 732, 1999.
48. Ricklefs, R.E. and Renner, S.S., Species richness within families of flowering plants, Evolution, 48,
1619, 1994.
49. Bawa, K.S. and Opler, P.A., Dioecism in tropical forest trees, Evolution, 29, 167, 1975.
50. Bawa, K.S., Pollinators of tropical dioecious angiosperms: a reassessment? No, not yet, Am. J. Bot.,
81, 456, 1994.
51. Rieseberg, L.H., Hybrid origins of plant species, Annu. Rev. Ecol. Syst., 28, 359, 1997.
52. Baker, H.G., Self-compatibility and establishment after ‘long-distance’ dispersal, Evolution, 9, 347,
1955.
53. Felsenstein, J., Phylogenies and the comparative method, Am. Nat., 125, 1, 1985.
54. Barraclough, T.G., Nee, S., and Harvey, P.H., Sister-group analysis in identifying correlates of
diversification: comment, Evol. Ecol., 12, 751, 1998.
55. Gorelick, R., Did insect pollination cause increased seed plant diversity? Biol. J. Linn. Soc., 74, 407, 2001.
56. Farrell, B.D., Dussourd, D.E., and Mitter, C., Escalation of plant defence — do latex and resin canals
spur plant diversification? Am. Nat., 138, 881, 1991.
57. Hodges, S.A. and Arnold, M.L., Spurring plant diversification: are floral nectar spurs a key innovation?
Proc. R. Soc. Lond. B, 262, 343, 1995.
58. Gianoli, E., Evolution of a climbing habit promotes diversification in flowering plants, Proc. R. Soc.
Lond. B, 271, 2011, 2004.
59. Smith, J.F., High species diversity in fleshy-fruited tropical understory plants, Am. Nat., 157, 646, 2001.
60. Sargent, R.D., Floral symmetry affects speciation rates in angiosperms, Proc. R. Soc. Lond. B, 271,
603, 2004.
61. de Queiroz, A., Contingent predictability in evolution: key traits and diversification, Syst. Biol., 51,
917, 2002.
62. Ree, R.H., Detecting the historical signature of key innovations using stochastic models of character
evolution and cladogenesis, Evolution, 59, 257, 2005.
63. Von Hagen, K.B. and Kadereit, J.W., The diversification of Halenia (Gentianaceae): ecological
opportunity versus key innovation, Evolution, 57, 2507, 2003.
64. Sims, H.J. and McConway, K.J., Nonstochastic variation of species-level diversification rates within
angiosperms, Evolution, 57, 460, 2003.
65. Axelrod, D.I., A theory of angiosperm evolution, Evolution, 6, 29, 1952.
66. Chapman, G.P., The Biology of Grasses, CAB International, Oxon, 1996.
67. Crane, P.R., Friis, E.M., and Pedersen, K.J., The origin and early diversification of angiosperms,
Nature, 374, 27, 1995.
68. Hillebrand, H., On the generality of the latitudinal diversity gradient, Am. Nat., 163, 192, 2004.
69. Willig, M.R., Kaufman, D.M., and Stevens, R.D., Latitudinal gradients of biodiversity: pattern,
process, scale, and synthesis, Annu. Rev. Ecol. Evol. Syst., 34, 273, 2003.
70. Currie, D.J. et al., Predictions and tests of climate-based hypotheses of broad-scale variation in
taxonomic richness, Ecol. Lett., 7, 1121, 2004.
71. Wright, D.H., Species-energy theory: an extension of species-area theory, Oikos, 41, 496, 1983.
72. Rohde, K., Latitudinal gradients in species-diversity: the search for the primary cause, Oikos, 65, 514, 1992.
73. Allen, A.P., Brown, J.H., and Gillooly, J.F., Global biodiversity, biochemical kinetics, and the energetic-
equivalence rule, Science, 297, 1545, 2002.
74. Turner, J.R.G., Lennon, J.J., and Lawrenson, J.A., British bird species distributions and the energy
theory, Nature, 335, 539, 1988.
75. Currie, D.J., Energy and large-scale patterns of animal-species and plant-species richness, Am. Nat.,
137, 27, 1991.
76. Wright, D.H., Currie, D.J., and Maurer, B.A., Energy supply and patterns of species richness on local
and regional scales, in Species Diversity in Ecological Communities, Ricklefs, R.E. and Schluter, D.S.,
Eds., Chicago Press, Chicago, 1993, 66.
77. Roy, K. et al., Marine latitudinal diversity gradients: tests of causal hypotheses, Proc. Natl. Acad. Sci.
USA, 95, 3699, 1998.
78. Francis, A.P. and Currie, D.J., A globally consistent richness-climate relationship for angiosperms,
Am. Nat., 161, 523, 2003.
79. Wylie, J.L. and Currie, D.J., Species-energy theory and patterns of species richness: I. Patterns of
bird, angiosperm, and mammal species richness on islands, Biol. Cons., 63, 137, 1993.
80. Hutchinson, G.E., Homage to Santa Rosalia or why are there so many kinds of animals? Am. Nat.,
93, 145, 1959.
81. Pianka, E.R., Latitudinal gradients in species diversity: a review of the concepts, Am. Nat., 100, 33,
1966.
82. Gaston, K.J., Global patterns in biodiversity, Nature, 405, 220, 2000.
83. Willis, K.J. and Whittaker, R.J., Species diversity: scale matters, Science, 295, 1245, 2002.
84. Stehli, F.G., Douglas, R.D., and Newell, N.D., Generation and maintenance of gradients in taxonomic
diversity, Science, 164, 947, 1969.
85. Jablonski, D., The tropics as a source of evolutionary novelty through geological time, Nature, 364,
142, 1993.
86. Cardillo, M., Latitude and rates of diversification in birds and butterflies, Proc. R. Soc. Lond. B, 266,
1221, 1999.
87. Buzas, M.A., Collins, L.S., and Culver, S.J., Latitudinal difference in biodiversity caused by higher
tropical rate of increase, Proc. Natl. Acad. Sci. USA, 99, 7841, 2002.
88. Barraclough, T.G. and Savolainen, V., Evolutionary rates and species diversity in flowering plants,
Evolution, 55, 677, 2001.
89. Webster, A.J., Payne, R.J.H., and Pagel, M., Molecular phylogenies link rates of evolution and
speciation, Science, 301, 478, 2003.
90. Bromham, L. and Cardillo, M., Testing the link between the latitudinal gradient in species richness
and rates of molecular evolution, J. Evol. Biol., 16, 200, 2003.
91. Davies, T.J. et al., Environmental energy and evolutionary rates in flowering plants, Proc. R. Soc.
Lond. B, 271, 2195, 2004.
92. Rothschild, L.J., The influence of UV radiation on protistan evolution, J. Eukaryotic Microbiol., 46,
548, 1999.
93. Currie, D.J. and Paquin, V., Large-scale biogeographical patterns of species richness of trees, Nature,
329, 326, 1987.
94. O’Brien, E.M., Climatic gradients in woody plant species richness: towards an explanation based on
an analysis of southern Africa’s woody flora, J. Biogeog., 20, 181, 1993.
95. Dynesius, M. and Jansson, R., Evolutionary consequences of changes in species’ geographical
distributions driven by Milankovitch climate oscillations, Proc. Natl. Acad. Sci. USA, 97, 9115, 2000.
96. Qian, H. and Ricklefs, R.E., Geographical distribution and ecological conservatism of disjunct genera
of vascular plants in eastern Asia and eastern North America, J. Ecol., 92, 253, 2004.
97. Huntley, B. and Webb, T.I., Migration: species’ response to climate variations caused by changes in
the Earth’s orbit, J. Biogeog., 16, 5, 1989.
98. Currie, D.J. and Fritz, J.T., Global patterns of animal abundance and species energy use, Oikos, 67,
56, 1993.
99. Kaspari, M., O’Donnell, S., and Kercher, J.R., Energy, density, and constraints to species richness:
ant assemblages along a productivity gradient, Am. Nat., 155, 280, 2000.
100. Enquist, B.J. and Niklas, K.J., Invariant scaling relations across tree-dominated communities, Nature,
410, 655, 2001.
101. Bonn, A., Storch, D., and Gaston, K., Structure of the species-energy relationship, Proc. R. Soc. Lond.
B, 271, 1685, 2004.
102. Cowling, R.M., Holmes, P.M., and Rebelo, A.G., Plant diversity and endemism, in The Ecology of
Fynbos, Cowling, R.M., Ed., Oxford University Press, Cape Town, 1992, 62.
103. Simmons, M.T. and Cowling, R.M., Why is the Cape Peninsula so rich in plant species? An analysis
of the independent diversity components, Biodiv. Cons., 5, 551, 1996.
104. Richardson, J.E. et al., Rapid and recent origin of species richness in the Cape flora of South Africa,
Nature, 412, 181, 2001.
105. Goldblatt, P. and Manning, J.C., Plant diversity of the Cape region of southern Africa, Ann. Mo. Bot.
Gard., 89, 281, 2002.
106. Linder, H.P., Radiation of the Cape flora, southern Africa, Biol. Rev., 78, 597, 2003.
107. Goldblatt, P., Phylogeny and classification of the Iridaceae and the relationship of Iris, Annali di
Botanica, Nuova Serie, 1, 13, 2001.
108. Goldblatt, P. et al., Radiation in the Cape flora and the phylogeny of peacock irises Moraea (Iridaceae)
based on four plastid DNA regions, Mol. Phylog. Evol., 25, 341, 2002.
109. Goldblatt, P., Phylogeny and classification of Iridaceae, Ann. Mo. Bot. Gard., 77, 607, 1990.
110. Davies, T.J. et al., Environmental causes for plant biodiversity gradients, Phil. Trans. R. Soc. Lond.
B, 359, 1645, 2004.
111. Davies, T.J. et al., Environment, area and diversification in the species-rich flowering plant family
Iridaceae, Am. Nat., 166, 1537, 2005.
112. Klak, C., Reeves, G., and Hedderson, T., Unmatched tempo of evolution in Southern African
semi-desert ice plants, Nature, 427, 63, 2004.
113. Goldblatt, P., An overview of the systematics, phylogeny and biology of the African Iridaceae,
Contributions from the Bolus Herbarium, 13, 1, 1991.
114. Bernhardt, P. and Goldblatt, P., The diversity of pollination mechanisms in the Iridaceae of southern
Africa, in Monocots: Systematics and Evolution, Wilson, K.L. and Morrison, D.A., Eds., CSIRO,
Melbourne, 2000, 301.
ABSTRACT
The grass family (Poaceae) comprises about 10,000 species distributed in some 785 genera, seven large
subfamilies and a few small ones. The distribution of species in genera appears skewed toward mono-
typic genera and those with few species. This pattern follows the hollow curve distribution documented
by Willis1. Explanations of the pattern have been attributed to statistical, biological and taxonomic
factors. This study explores potential biological and statistical explanations for species distribution in
Poaceae. Patterns of species distribution in the family and its major subfamilies were investigated, and
the influence of age, habit and habitat on these patterns was assessed. Results showed that species
distribution is not only skewed for the number of small genera but also for the total number of species
in larger genera. Phylogenetic position does not appear to explain species distribution in the family and
in fact refutes the age and area theory proposed by Willis. Genus size appears to be correlated with
habit where larger genera are predominantly perennial. Genera with mixed annual and perennial species
do not reflect the hollow curve pattern. These patterns of species distribution may be explained by
polyploidy and hybridisation, two prominent features in the evolution of the family.
11.1 INTRODUCTION
A striking pattern for distribution of taxa in their respective higher categories points to a skewed
distribution towards monotypic and small groups. This phenomenon was first documented by Willis1
and Willis and Yule2 in a study of the flora of Ceylon (Sri Lanka). They dubbed this pattern the
hollow curve distribution (HCD) and indicated that such a pattern exists at all taxonomic levels.
Willis and Yule2 asserted that the longer the group has existed, the more area it will occupy. They
further stated that monotypic genera are in general ‘beginners’ and are descendants of larger ones.
The HCD was later demonstrated in other organisms, such as arthropods, birds and mammals3–7.
Although this skewed pattern is evident across a broad range of biological diversity and at all
taxonomic levels, explanations of its causes vary, and different hypotheses and models have been
proposed (see Hodkinson and Parnell, Chapter 1; Davies and Barraclough, Chapter 10; Parnell
et al., Chapter 16).
Willis1 cited biological, historical, mathematical, psychological and statistical elements as
potential causes of the HCD. Willis and Yule2 stressed age and area as the principal factors behind
the biological patterns of diversification and the emergence of the HCD. Dial and Marzluff5 indicated
that early authors favoured deterministic explanations, whereas more recent authors incline toward
stochastic models. In a study aimed at determining patterns and causes of species diversification,
Dial and Marzluff5 compared species distributions in 85 taxonomic units from six groups of animals
and one group of plants to those predicted by five null models. Their sample comprised 53 taxonomic
assemblages based on traditional classifications and 32 based on phylogenetic schemes. They found
that real assemblages were dominated to a significantly higher extent by one unit than predicted
by all five models. They also noted that the pattern is evident in both traditional and phylogenetic
schemes and concluded that such skewed distributions reflect real differences in the evolutionary
successes of the groups. Dial and Marzluff concluded that overdominance of an assemblage by
one unit is a common and nonrandom feature of taxonomic diversity distribution and proposed that
such a pattern might be the consequence of differences in life history traits such as fecundity, age
of first reproduction, longevity and mobility. Cardillo et al.6 studied the pattern of diversification
in 76 genera (210 species) of Australian mammals and contrasted the observed distribution with
the Poisson and geometric models. They observed that species distribution based on real data is
significantly different from those predicted by Poisson and geometric null distributions, with the
observed distribution having more species poor and species rich genera than predicted by the models.
Scotland and Sanderson7 tested the HCD in birds and the three angiosperm families Fabaceae
(legumes), Orchidaceae (orchids) and Asteraceae (asters or daisies). They compared their new model,
simultaneous broken tree (SBT), with distributions based on real data, the simultaneous broken stick
(SBS) and the geometric distribution. Their study showed that the SBT model overestimated the
monotypes and dominance (large genera), whereas the SBS underestimated them. Consequently, they
suggested that lack of fit between real data and the SBT model is taxonomic and not evolutionary,
contending that taxonomists are averse to studying genera that are too large or too small.
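A comparison between observed genus sizes and a null distribution can be sketched for the geometric model. Assuming genus size k follows P(k) = p(1 - p)^(k - 1) for k = 1, 2, ..., the maximum-likelihood estimate of p is the reciprocal of the mean genus size, and the expected number of monotypic genera is the number of genera multiplied by p. The sketch below uses hypothetical genus sizes and an illustrative function name; it is not the procedure of Cardillo et al. or of Scotland and Sanderson.

```python
def geometric_expected_counts(sizes, max_size):
    """Expected number of genera of each size 1..max_size under a geometric
    model fitted by maximum likelihood (p = 1 / mean genus size)."""
    n_genera = len(sizes)
    p = n_genera / sum(sizes)
    return [n_genera * p * (1 - p) ** (k - 1) for k in range(1, max_size + 1)]

# Hypothetical genus sizes with a hollow curve: many monotypes, one giant.
sizes = [1] * 12 + [2] * 4 + [3] * 2 + [10, 200]
observed_monotypes = sizes.count(1)
expected_monotypes = geometric_expected_counts(sizes, 1)[0]
print(observed_monotypes, round(expected_monotypes, 2))
```

With these made-up sizes the single very large genus inflates the mean, so the fitted geometric model expects far fewer monotypic genera than are observed. This is the kind of excess at both ends of the distribution that the studies cited above report.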
In this study, the grass family (Poaceae) is chosen to assess the potential influence of some
biological traits on species distribution in genera. Biological traits examined here are habit, eco-
geographic preferences, habitat and polyploidy. Genus size is also considered in a phylogenetic
context using a consensus tree for the grass phylogeny. The grass family is chosen because of its
large size (approximately 10,000 species and 785 genera), wide distribution over diverse habitats,
variation in habit that provides a sizeable sample and wealth of data on chromosome number and
polyploidy. This study is based on real (observed) data and does not include null model assessments.
FIGURE 11.1 The distribution of genus size in the grass family. (A) Number of genera plotted against number of species per genus; the pattern follows the HCD with
predominance of monotypic and small genera. (B) Total number of species plotted against number of species per genus; species concentration (dominance) occurs at the opposite
end of the curve. (Data from Clayton and Renvoize11 and Watson and Dallwitz19.)
the distribution curve is strongly skewed toward small genera, the overwhelming majority of species
(dominance) is found at the other end of the curve in large genera (Figure 11.1B). Excluding the
two ends of the spectrum in terms of genus size and species distribution leaves genera with 11–99
species that make up 19% of the genera (123) with 35% (3,418) of the species, and an average of
TABLE 11.1
Distribution of Species in Grass Genera
Number of Genera    Percentage of Genera    Number of Species    Percentage of Species
Note: Statistics calculated for the whole family, annuals, perennials and genera with mixed
annual and perennial species (noted as mixed genera). Also noted is the distribution of
species in genera as grouped into three arbitrary categories: 1–10 species, 11–99 species
and 100 and more species. Data obtained from Hilu et al.18 and Watson and Dallwitz20.
28 species per genus. The species distribution in this group is in sharp contrast with the two
extremes, where an average of 2.8 species per genus for genera containing 10 or fewer species and
219 species per genus for genera with 100 or more species is found (Table 11.1).
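The averages quoted above are species totals divided by genus counts within each size class; for the middle class, 3,418 species in 123 genera gives about 28 species per genus. A minimal sketch of that bookkeeping follows, using the three size classes of Table 11.1. The function name and the short list of genus sizes are hypothetical; only the figures 3,418 and 123 come from the text.

```python
def summarise_size_classes(sizes):
    """Group genus sizes into 1-10, 11-99 and 100+ species and report, per
    class, the genus count, species total and mean species per genus."""
    classes = {"1-10": (1, 10), "11-99": (11, 99), "100+": (100, float("inf"))}
    summary = {}
    for label, (low, high) in classes.items():
        members = [s for s in sizes if low <= s <= high]
        total = sum(members)
        mean = total / len(members) if members else None
        summary[label] = (len(members), total, mean)
    return summary

# Check of the average quoted in the text for genera with 11-99 species.
print(round(3418 / 123, 1))

# Hypothetical genus sizes, for illustration only.
print(summarise_size_classes([1, 1, 2, 5, 12, 40, 150]))
```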
FIGURE 11.2 A consensus tree for the grass family on which genus size is mapped for basal lineages of the
grass family and its major subfamilies. Group labels shown on the tree: Pooideae, Bambusoideae, Oryzoideae, Aristidoideae, Arundinoideae, Danthonioideae, Centothecoideae, Chloridoideae, Panicoideae, Puelioideae, Streptogyneae and PACCAD. (The consensus tree is based on trees obtained from GPWG17 and Hilu
et al.18; information on genus size is from Clayton and Renvoize11 and Watson and Dallwitz19.)
the genus comprises eight species confined to Australia (not shown). Considering the base of
individual subfamilies for instance (Figure 11.2), in Pooideae, both Brachyelytrum (shady places
of woodlands in North America, Japan and Korea) and Nardus (Europe and Western Asia) are
monotypic; in Chloridoideae, Triraphis includes seven species found in Africa and Arabia and one
species in Australia; and in Ehrhartoideae, Streptogyna contains only two species distributed in the
forest shade from West Africa to Southern India and Sri Lanka, and Mexico to Brazil. Therefore,
genus size at the base of individual subfamilies is small, but geographic distribution varies. In contrast,
a wide range of genus size exists at the terminal branches that include monotypics and large genera.
Thus, the size of early diverging genera in the family and the subfamilies is small rather than large.
The small size and mostly restricted distribution may either represent the prehistorical geographic
pattern or could be the outcome of species extinction and endemism. Cardillo et al.6 indicated that
diversification rate is an outcome of a differential rate of speciation and extinction. In conclusion,
age alone cannot be used to explain genus size or the predominance of small-sized genera.
FIGURE 11.3 Genus size distribution in Bambusoideae (A) and Chloridoideae (B). Genus size distribution
follows the HCD. The two subfamilies differ in their ecophysiological preferences and photosynthetic systems.
(Axes: species number per genus against number of genera.)
FIGURE 11.4 The distribution of genera and species in annuals, perennials and mixed genera (genera with
both types of life histories). (A) The distribution of genera. (B) The distribution of species. (Information on
genus size is from Clayton and Renvoize11 and Watson and Dallwitz19.)
here as, due to infrequent flowering, taxonomic decisions are in some cases based on vegetative
characters, a situation that may favour a trend toward splitting but not lumping of species in a
genus. Consequently, one would expect species distribution to be skewed away from monotypic or
very small genera and to favour relatively larger ones; this assumption is based on potentially higher
variation amongst populations in vegetative characters relative to reproductive traits. In this case,
taxonomic factors would become more pronounced in assessments of species distribution. This
does not appear to be the case (Figure 11.3A).
Thus, these contrasting subfamilies all show the HCD despite having differing geographic
distribution, photosynthetic pathways and other adaptive traits correlating to ecological factors.
FIGURE 11.5 The distribution of genus size in perennial (A) and annual (B) genera. Genus size distribution
follows the HCD. (Axes: number of species per genus against number of genera. Information on genus size
is from Clayton and Renvoize11 and Watson and Dallwitz19.)
FIGURE 11.6 Deviations from the HCD. Genera containing both annual and perennial species deviate from
the HCD at the whole grass family level (A) and in individual tribes such as Andropogoneae (B). A similar
pattern is found in other grass tribes such as the Paniceae and Poeae/Aveneae. (Information on genus size is
from Clayton and Renvoize11 and Watson and Dallwitz19.)
of distribution was also apparent in mixed genera of the tribes Andropogoneae (Figure 11.6B),
Paniceae and Poeae/Aveneae (not shown).
The small number of annual genera and their small average size and the substantially larger
number of perennial genera and their larger size in grasses are unexpected findings at first glance,
as an annual but not a perennial habit is generally considered to favour higher diversification (see
Hodkinson et al., Chapter 17). Annuals reach flowering stage in the same season, and variation in
length of vegetative (juvenile) period is a matter of days. In contrast, perennials go through juvenile
vegetative stages which may extend from months to years21. Furthermore, Harper21 noted that in
perennial herbs, such as grasses, years of flowering and seed production are often interrupted by
years of purely vegetative growth. Thus regeneration of offspring in perennials, or at least herbaceous
ones, is at a lower rate than with annuals. This gives annuals a definite advantage over
perennials in terms of rapid generation turnover, increased variability due to more frequent sexual
recombination, and potential for higher rate of fixation of adaptive mutations. Considering this
scenario, annuals are expected to have better options for enhanced speciation compared with
perennials. It has been shown that fast life history is conducive to higher diversification and may
increase probability of speciation6,22,23. Evidently this is not the case in grasses, as
annual genera are far less species rich than perennial ones. The causes of this pattern of
diversification require explanation.
Considering these observed patterns of species distribution in relation to habit, three questions
can be posed:
• Why do annual genera tend to be smaller in size, whereas perennial genera are more
species rich?
• Why does the HCD theory break down in mixed annual/perennial genera?
• What causes genera with mixed species to be larger in size on the average than either
annual or perennial genera?
To assess the potential impact of polyploidy in grass biodiversity, samples of annual, perennial
and mixed genera of various sizes were examined for the presence and degree of polyploidy. Large
perennial genera examined are highly polyploid, displaying a series of aneuploid and/or euploid
gametic chromosome numbers. Standing out among these genera are Festuca (450 species),
Calamagrostis (270 species), Bambusa (120 species) and Rytidosperma (90 species). In Festuca,
somatic chromosome numbers reported include 2n = 14, 28, 35, 42, 56 and 70; in Calamagrostis
2n = 28, 42 and 56, or 56-91 (apomicts); Bambusa 2n = 24, 46, 48, 70 and 72; and Rytidosperma
2n = 24, 48, 72, 96 and 120. With the exception of Festuca, diploid types are not found in these
genera. In contrast, perennial genera of small size tend to occupy the other end of the spectrum in
terms of polyploidy. For instance, the monotypic perennial Calderonella is a diploid with x = 12,
Thysanolaena is a diploid based on x = 11, and Asthenochloa and Cleistachne are tetraploids based
on x = 9. The perennial small genus Sartidia (4) is also a diploid based on x = 11. This genus is
of special interest, as other members of its tribe, Aristideae, are much larger, with Stipagrostis
containing 50 species and Aristida 250. However, Stipagrostis contains primarily perennial species
with a few annuals, and species are diploid or tetraploid based on x = 11. Aristida on the other
hand includes a mixture of annuals and perennials but displays a more extensive polyploid series
based on x = 11 or 12 (2n = 22, 24, 36, 44, 48 and 66). These three Aristideae genera display a
grade of genus size that parallels their polyploidy levels and habit, suggesting a correlation between
species number and polyploidy levels in perennials. It is to be noted that Bambusa belongs to the
woody bamboo subfamily (Bambusoideae), a group containing long-lived perennials. Polyploidy
is the norm in these, and diploid genotypes/species have for the most part gone extinct26.
Looking at genera with different habits, it appears from examining representative genera that
there is a tendency towards lower frequency of polyploidy in annuals than perennials. Asthenochloa
and Cleistachne are monotypic annuals; both are tetraploids based on x = 9, most likely rediploidised
tetraploids that lack other chromosomal variation. The annual genera Gastridium, Mibora and
Elytrophorus each contain two species; they are all diploids. Gaudinia contains four annual species
that sometimes behave like biennials; the species are diploids based on x = 7, but occasionally
possess a somatic number of 2n = 15. Larger-size annual genera seem to have been able to accommodate
some degree of polyploidy. Examples of those are Sacciolepis (30) with x = 9 and 2n =
16, 18, 36 and 45, and Avena (25) with x = 7 and 2n = 14, 28, 42, 48 and 63. The presence of 5x
ploidy level and aneuploid numbers in species of these genera may suggest the presence of some
degree of apomictic reproduction, again a system that can accommodate and promote polyploidy
by producing seeds without sexual reproduction. In this case, the apparent large genus size may
be an artefact of taxonomic splitting caused by apomictic perpetuation of morphologically distinct
biotypes.
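The reading of these chromosome counts can be made explicit with a small calculation. The sketch below, which is only an illustration and not a method used in this chapter, divides each somatic number (2n) by the basic number (x) and reports either the ploidy level or, where 2n is not a whole multiple of x, labels the count as aneuploid. The function name `classify_counts` is hypothetical, and treating every non-multiple as aneuploid relative to a single x is a simplifying assumption; genera with several basic numbers would need each x tested in turn.

```python
def classify_counts(base_number, somatic_numbers):
    """Map each 2n to its ploidy level (as a multiple of x) or 'aneuploid'."""
    result = {}
    for two_n in somatic_numbers:
        level, remainder = divmod(two_n, base_number)
        # A whole multiple of x is euploid; anything else is flagged as aneuploid.
        result[two_n] = f"{level}x" if remainder == 0 else "aneuploid"
    return result


# Counts quoted in the text for the two larger annual genera.
sacciolepis = classify_counts(9, [16, 18, 36, 45])
avena = classify_counts(7, [14, 28, 42, 48, 63])
print(sacciolepis)  # {16: 'aneuploid', 18: '2x', 36: '4x', 45: '5x'}
print(avena)        # {14: '2x', 28: '4x', 42: '6x', 48: 'aneuploid', 63: '9x'}
```

Under these assumptions, the 5x level appears in Sacciolepis (2n = 45), and both genera include one count that is not a multiple of the stated basic number.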
The second question to be addressed is why mixed genera display a pattern that does not fit
the HCD. The answer to this question may relate to the proportion of each life history type in any
given genus. If polyploid perennials have an accelerated rate of speciation and the converse is true
for the mostly diploid or low polyploid annuals, then the size of a genus with mixed annuals and
perennials would depend on the proportion of the two types of species. A mixed genus with
proportionally more perennial species would tend to be larger in size than that with higher proportion
of annuals. This scenario fits well the large mixed genera Sporobolus (160), Muhlenbergia (160),
Isachne (100) and Pennisetum (80), where both annual and perennial habits are present but the
annual species are rare. Polyploidy is extensive in all four genera19. In Sporobolus, chromosome
numbers are based on x = 9 and 10, and somatic numbers are 2n = 18, 24, 36, 38, 54, 72, 80, 88,
90, 198 and 124. Similarly, Pennisetum species display basic chromosome numbers of x = 9, and
somatic numbers are 2n = 14, 18, 22, 34, 35, 36, 45, 52 and 54.
The large genus Paspalum (330) is described as ‘usually’ perennial, indicating more annuals
than in the above discussed genera. However, extensive series of polyploidy based on x = 10 and
12 augmented by aneuploidy have been established in this genus. Digitaria contains 230 annual
and perennial species. Although it is not described as primarily perennial, it has evolved a wide
range of polyploidy numbers based on x = 9, 15 and 17; such basic chromosome series appear to
have originated via aneuploidy, most likely coupled with hybridisation. Panicum (470) comprises
both annuals and perennials with no predominance of either habit, but diploidy (2n = 18) is
seemingly rare, while polyploidy is extensive and based on x = 7, 9 and 10, and 2n = 36, 37, 54
and 72. In these cases, it appears that species diversification is augmented by extensive patterns of
polyploidisation.
At the other end of the spectrum of mixed genera, namely small-sized ones, one would expect
either annuals to be found in higher proportion or polyploidy to be rare, or both. Amphicarpum (2)
is described as an annual and perennial genus; only diploid chromosome numbers have been
reported (x = 9, 2n = 18). Tricholaena contains four species, mostly perennial, that are all tetraploids
based on x = 9. Echinolaena is slightly larger with eight annual and perennial species, but only
tetraploidy is found (2n = 60). Only diploid genotypes of 2n = 20 have been reported for Chionachne
(7), although it is composed primarily of perennial species. Lack of polyploidy may account for
the small size of these genera. Loudetia (26) is rarely annual, but the larger size is associated with
a more elaborate polyploid series based on x = 6, 12, and 2n = 20, 24, 40 and 60. Moving up in
genus size, Echinochloa contains 35 annual and perennial species. Polyploidy has progressed
considerably in this genus where, based on x = 9, odd and even euploidy as well as aneuploidy
have been established (2n = 27, 36, 42, 48, 54, 72 and 108). Therefore, it appears that small genus
size is correlated with lack of polyploidy, and an increase in genus size tends to be associated with
progressive emergence of polyploid genotypes.
The third question is why, on average, genera with mixed annual and perennial species tend
to be more species rich. Part of the answer may be found in the combined accelerated rates of
speciation of the two life forms through different biological routes. Although perennials take
advantage of polyploidy, annuals are also well suited for speciation at the diploid level. In these
cases, a combination of rapid generation turnover inherent to annuals and extensive polyploidy
in perennials, possibly with apomictic mode of reproduction superimposed, could play a role in
determining genus size. This question, however, remains intriguing and in need of further
investigation.
11.5 CONCLUSION
It is evident from this survey that small genera either have more annual species or less polyploidy,
or a combination of both. With increase in genus size, the proportion of perennial species and the
incidence of polyploidy increase. This phenomenon could have contributed to the punctuated genus
size pattern observed for the mixed genera. It also may explain the correlation between annuals
and small genus size and perennials and explosive speciation. Although perenniality combined with
polyploidy sets the stage for increased diversity at the species level in Poaceae, other factors such
as breeding systems (outbreeding, inbreeding, cleistogamy and apomixis) impose yet additional
parameters that could influence speciation. All these factors together should be considered to act
in promoting adaptive radiation to exploit a diverse variety of habitats. As such, ecological parameters
represent major factors that work in accordance with the genetic factors when degree and
patterns of speciation are to be explained. Thus, the nonrandom distribution of species appears to
have contributed both to the skewed distribution favouring monotypic and small genera and to the
skewed dominance of a few genera in terms of species richness. Biological factors, such as habit
and polyploidy, appear to have had some impact on these patterns in the grass family. These
biological factors may be group specific and should not be overly generalised.
ACKNOWLEDGEMENTS
I thank Scott Parker for discussion and assistance with statistical tests and Kieran Hilu for valuable
comments on a draft of the manuscript.
REFERENCES
1. Willis, J.C., Age and Area, Cambridge University Press, Cambridge, England, 1922.
2. Willis, J.C. and Yule, G.U., Some statistics of evolution and geographic distribution in plants and
animals and their significance, Nature, 109, 177, 1922.
3. Williams, C.B., Patterns in the Balance of Nature and Related Problems in Quantitative Ecology,
Academic Press, New York, 1964.
4. Anderson, S., Patterns of faunal evolution, Rev. Biol., 49, 1, 1974.
5. Dial, K.P. and Marzluff, J.M., Nonrandom diversification within taxonomic assemblages, Syst. Zool.,
38, 26, 1989.
6. Cardillo, M., Huxtable, J.S., and Bromham, L., Geographic range size, life history and diversification
of Australian mammals, J. Evolution. Biol., 16, 282, 2003.
7. Scotland, R.W. and Sanderson, M.J., The significance of few versus many in the tree of life, Science,
303, 643, 2004.
8. Clayton, W.D., Evolution and distribution of grasses, Ann. Missouri Bot. Gard., 68, 5, 1981.
9. Hilu, K.W. and Soderstrom, T.R., Biological basis of adaptation in grasses: an introduction, Ann.
Missouri Bot. Gard., 72, 823, 1985.
10. Redman, R.E., Adaptation of grasses to water stress-leaf rolling and stomata distribution, Ann. Missouri
Bot. Gard., 72, 833, 1985.
11. Clayton, W.D. and Renvoize, S.A., Genera Graminum, HMSO Publications, London, 1986.
12. Connor, H.E., Evolution of reproductive systems in Gramineae, Ann. Missouri Bot. Gard., 72, 48, 1981.
13. de Wet, J.M.J., Hybridization and Polyploidy in the Poaceae, Smithsonian Institution Press,
Washington, DC, 1987.
14. Hunziker, J.H. and Stebbins, G.L., Chromosomal Evolution in the Gramineae, Smithsonian Institution
Press, Washington, DC, 1987.
15. Hilu, K.W., Phylogenetics and chromosomal evolution in the Poaceae (grasses), Aust. J. Bot., 52, 13,
2005.
16. Bennett, M.D. and Leitch, I.J., Angiosperm DNA C-value Database,
http://www.rbgkew.org.uk/cval/homepage/html, 2001.
17. GPWG, Phylogeny and subfamilial classification of the grasses (Poaceae), Ann. Missouri Bot. Gard.,
88, 373, 2001.
18. Hilu, K.W., Alice, L.A., and Liang, H., Phylogeny of Poaceae inferred from matK sequences, Ann.
Missouri Bot. Gard., 86, 835, 1999.
19. Watson, L. and Dallwitz, M.J., The Grass Genera of the World, CAB International, Wallingford,
UK, 1992.
20. JMP Version 3, SAS Institute Inc., Cary, NC, 1996.
21. Harper, J.L., Population Biology of Plants, Academic Press, London, 1977.
22. Marzluff, J.M. and Dial, K.P., Life-history correlates of taxonomic diversity, Ecology, 72, 428, 1991.
23. Rosenheim, J.A. and Tabashnik, B.E., Generation time and evolution, Nature, 365, 791, 1993.
24. Stebbins, G.L., Polyploidy, hybridization and the invasion of new habitats, Ann. Missouri Bot. Gard.,
72, 824, 1985.
25. Hilu, K.W., Identification of the ‘A’ genome of finger millet using chloroplast DNA, Genetics, 118,
163, 1988.
26. Pohl, R.W. and Clark, L.G., New chromosome counts for Chusquea and Aulonemia (Poaceae:
Bambusoideae), Am. J. Bot., 79, 478, 1992.
12 Reconstructing Animal Phylogeny in the Light of Evolutionary Developmental Biology
A. Minelli
Department of Biology, University of Padova, Italy
E. Negrisolo
Department of Public Health, Comparative Pathology and Veterinary
Hygiene, University of Padova, Italy
G. Fusco
Department of Biology, University of Padova, Italy
ABSTRACT
The relevance of evolutionary developmental biology (evo-devo) to our effort to reconstruct the tree
of life has until recently been very poorly explored. However, the contribution of an evo-devo approach
to the main steps of phylogenetic analysis, such as evaluation of homology, selection of characters
and assessment of character polarity can be critically important, especially in species rich groups.
As independence of traits is a prerequisite for the use of coded information in the reconstruction of
phylogeny, the identification of developmentally independent units is one of the areas where evo-devo
may offer an especially important contribution. The way in which characters originate and change in
evolution has fundamental consequences on the patterns of evolutionary change we can reconstruct
from character distribution. The remoulding of pre-existing features, genetic networks or developmental
trajectories, can operate at any level of biological organisation. Comparative developmental
biology supports a view that homology cannot be a relationship of the all-or-nothing kind.
of control cascades and spatiotemporal patterns of expression. In this way, as we will show through
some examples, we are helped in developing comparisons among very divergent and thus hardly
comparable body plans, without the need to totally ignore morphology and rely on molecular evidence
only. This is critically important at high taxonomic levels, where the heuristic power of comparative
morphology has been exploited since Cuvier’s pioneering articulation of the animal kingdom into
four ‘embranchements’ (the vertebrates, the articulates, the molluscs and the radiates)6, in setting up
a classification where species are ultimately grouped into phyla, that is, higher rank taxa separated
by such deep differences in overall organisation as to cause major problems of comparison at
the morphological level7. On the other hand, molecular phylogenetics has provided tools for assessing
phylogenetic relationships among remotely related taxa, thus overcoming the limitation of morphology,
and often suggesting unconventional affinities, such as between arthropods and nematodes (in a
group named Ecdysozoa8) or between hermit crabs and king crabs9, affinities that are less than obvious
at the level of morphology. However, this distance between morphological and molecular evidence
may be shortened if we get comparative information about the identity, sequence and expression of
genes which are more or less specifically involved in the generation of animal (or plant) form.
Indeed, this approach has been the source of unexpected discoveries, such as the fundamental
identity of the genetic control of dorsoventral (DV) patterning in animals that are as different from
each other as arthropods and vertebrates. Zoologists are familiar with a fundamental difference
between them, as representatives of gastroneuralians (animals with a ventral nerve cord) and
notoneuralians (animals with a dorsal nerve cord) respectively. In his idiosyncratic belief in a unity
of plan of all animals, Geoffroy Saint-Hilaire proposed that arthropods might be equated to
vertebrates, provided that we regard the dorsal aspect of the former as equivalent to the ventral
aspect of the latter, and vice versa. At the time no factual argument could be advanced to support
such an unconventional thesis, and the comparison was universally disregarded as a fanciful
speculative exercise. However, comparative developmental genetics was eventually to rescue his
idea from oblivion, by demonstrating that in Drosophila DV patterning is controlled by two genes
(short gastrulation and decapentaplegic) that are homologous to two genes (chordin and bone
morphogenetic protein-4) which perform the same job in vertebrates, but at opposite DV sides10,11.
In the meantime, a group of genes involved in the AP patterning of the main body axis of all
bilaterian metazoans had been discovered12, thus nourishing hopes that the main traits of the hypothetical
ancestor of all bilaterians, the so called Urbilateria, could eventually be inferred from comparative
developmental genetic evidence. The list of traits that have been progressively added to this idealised
ancestor includes AP polarity and DV patterning13, heart13, cephalisation14, brain and brain areas15, a
primitive photoreceptor16, a skeleton17, a ‘humble appendage or antenna-like outgrowth’13, hemocoel18
and even segmentation19. Unfortunately, no element in this reconstruction has been evaluated using the
tools of phylogenetic systematics. Indeed, doubts have often been raised as to the uniquely derived
nature of most of these traits20,21. But the main message we want to bring forward here is not about
the cavalier methods by which these phylogenetic inferences have been produced, but about the
increasing availability of comparative data that can be applied to the evaluation of phylogenetic
relationships among higher groups. These data are not limited to pure sequence information about
genes and their products, but open vistas into an understanding of the origin and evolution of form22–24.
The crudest level at which we can exploit comparative developmental genetics in reconstructing phylogeny
is by focusing molecular phylogenetics on sequence data of selected classes of genes which play
a key role in establishing the main features of animal body plans during early embryonic development.
Let us consider, for example, those classes of molecules, such as cell adhesion molecules with
intracellular signal transduction pathways and gradient-forming morphogens or growth factors25,
that probably played a critical role in the very origin of metazoans from their unicellular ancestors.
These molecules are today inextricably involved in building and patterning the supracellular architecture
of the animal body. But homologues of the corresponding genes were probably also present
along the stem lineage of living metazoans; their occurrence is likely in the living representatives
of the metazoans’ sister group, even if the function of these genes is different from the function
that evolved in the metazoan branch of the phylogenetic tree. This is indeed a pathway of inquiry
where a detailed knowledge of developmental genetics suggests a class of genes on which to focus,
as a source of information for assessing a charismatic node of the tree of life, namely, the splitting
of metazoans from their sister group. Indeed, King and colleagues26 have found that choanoflagellates,
the most likely candidates to be the closest relatives of metazoans among the living unicells,
express representatives of many cell signalling and adhesion protein families. These include cadherins,
C-type lectins, tyrosine kinases and components of the tyrosine kinase signalling pathway.
Other phylogenetic studies focusing on developmentally important genes have used the Hox
family. The reconstructed duplication event by which a protoHox gene cluster would have given
rise to the Hox genes sensu stricto and to the paralogous set of the ParaHox genes has been
suggested as relevant to the Cambrian explosion of animal body plans27. Another major event in
animal evolution, the origin of the vertebrate lineage, is often thought to have been accompanied
by a quadruplication of the Hox gene cluster28. Other phylogenetic analyses of Hox genes give
support to the hypothesis that insects and crustaceans form a clade to the exclusion of myriapods29
and demonstrate the lophotrochozoan affinities of bryozoans30.
The role of these genes in controlling basic features of the animal body has probably been
overemphasised. Indeed, criticisms of the mainstream gene-centred view of development are
increasingly frequent20,31–34, but in addition to these theoretical perspectives there is also experimental
evidence pointing in the same direction. For example, the deletion of an entire vertebrate
Hox cluster may have little effect on the development of the animal’s main body axis35–37, much
less indeed than the deletion of any individual Hox gene in the same model animal. This calls for
a serious rethink of our understanding of the role of these genes in patterning the animal body.
At any rate, comparing sequences of developmentally important genes does not represent a
major improvement with respect to current practice in molecular phylogenetics. We argue instead that
much more substantial progress is obtained if the basic steps in phylogenetic reconstruction are
approached from an explicit evo-devo perspective.
independence. The latter is often easier to ascertain, in so far as we are able to single out individual
features or structural complexes performing largely distinct functions, thus behaving as minimally
overlapping units from the point of view of the selective pressure acting upon the organism. Much
harder, but no less important, is the identification of developmentally independent units, due
especially to the pervasive pleiotropic effects of the underlying genetic networks. This is certainly
one of the areas where evo-devo may offer an especially important contribution.
Evo-devo explicitly addresses the generative mechanisms underlying the evolution of organismal
form. An extreme reductionist view of the evolutionary process would argue that at the basis
of evolutionary change there is nothing more than a change in the underlying regulatory networks
of developmental genes45. Alternative views of the evolution of organismal form would maintain
that generative mechanisms are not restricted to the genetic circuitry involved in individual development.
These mechanisms also arise from the physical properties of biological materials, the self-organisational
capacities of cells and tissues, and the dynamics of epigenetic interactions among
developmental modules20,46,47. However, although evo-devo does not coincide with developmental
genetics, it must be said that our current understanding of development and the mechanisms of its
evolutionary change is much more advanced at the level of the genes.
Due to their pervasive occurrence, the ‘nongenetic’ components of the developmental processes
(physicochemical properties of living matter and epigenetic interactions) are likely to be relevant
in the evaluation of character independence. However, even a narrow view of evo-devo that limits
its scope to developmental genetics has much to say about the assessment of homology,
character independence and character polarity.
of gene expression, both at the transcriptional and post-transcriptional levels. These ‘alternative
regulative levels’, as they call them, have many of the features, like flexibility, modularity and a
combinatorial nature, which make enhancers critical contributors of genetic source materials to the
evolution of development. They are sites of ‘initial’ genetic change, that may be either strengthened
or replaced by subsequent ‘secondary’ genetic changes at other regulatory levels, including the
level of the enhancers.
Evolutionary changes at the level of the genome’s regulatory elements make it possible that
different parts of the same animal are built by exploiting the same gene network or the same
developmental module. ‘Tinkering’, ‘multi-functionality’, ‘redundancy’ and ‘modularity’ are common
at the roots of phenotypic variation, but their impact on phenotypic evolution is far from being
generally acknowledged. Character independence cannot be inferred from comparative anatomy or
descriptive embryology alone and is just one working hypothesis among others in phylogenetic
reconstruction. By expanding on the scope of current methods for estimating robustness of phylogenetic
trees, it would probably be profitable to develop methods able to cope with an unknown
level of character covariation.
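As a purely illustrative sketch of what screening a data matrix for character covariation might look like, the code below flags pairs of binary characters whose states are identical, or exactly complementary, across all taxa. It is a crude first check under stated assumptions, not a method proposed in this chapter: the function name `covarying_pairs`, the taxa and the 0/1 codings are all hypothetical, and perfect covariation can reflect shared ancestry just as well as developmental non-independence.

```python
from itertools import combinations


def covarying_pairs(matrix):
    """Return index pairs of binary characters whose states are identical or
    exactly complementary across all taxa (a crude flag for non-independence)."""
    rows = list(matrix.values())
    n_chars = len(rows[0])
    # Turn the taxon-by-character matrix into one tuple of states per character.
    columns = [tuple(row[i] for row in rows) for i in range(n_chars)]
    flagged = []
    for i, j in combinations(range(n_chars), 2):
        same = columns[i] == columns[j]
        mirrored = all(a != b for a, b in zip(columns[i], columns[j]))
        if same or mirrored:
            flagged.append((i, j))
    return flagged


# Hypothetical taxa and 0/1 character states, for illustration only.
toy_matrix = {
    "taxon_A": [0, 0, 1, 0],
    "taxon_B": [1, 1, 0, 0],
    "taxon_C": [1, 1, 0, 1],
    "taxon_D": [0, 0, 1, 1],
}
print(covarying_pairs(toy_matrix))  # [(0, 1), (0, 2), (1, 2)]
```

Flagged pairs would then be candidates for closer developmental scrutiny before being treated as independent evidence in a phylogenetic analysis.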
cascade of gene activity, leading from the early gap genes to the later expressed pair-rule and
segment-polarity genes. These genes encode proteins that are eventually localised in the embryo
according to segmental periodicity. In other arthropods, segments originate sequentially in an AP
progression from a subterminal region. Simultaneous and sequential segmentation can both occur
within the same animal. In many insects with embryos intermediate between the short and long germ-band
types, the most anterior segments originate synchronously, whereas the remaining segments
are sequentially specified from a posterior subterminal zone59. At least for a significant posterior
portion of the main body axis, sequential segmentation is generally considered the primitive
condition in arthropods, and mechanisms for the evolutionary change from sequential to simultaneous
segmentation have been proposed. These are based on a gradual cellular-to-syncytial
transition in the blastoderm where the same segment-forming gene network operates60, or on a
progressive increase (from the anterior) of the number of segmental units falling under the control
of gap genes61.
In this case history of segmentation processes, the downstream developmental processes are
conserved, whereas the earliest phase of the segmentation process has changed. This is just one of
the many examples that comparative developmental biology can offer in support of a view that
homology cannot be a relationship of the all-or-nothing kind. Because evolutionary change is a
continuous process, based on the remoulding of pre-existing features, along with the underlying
genetic networks that control their development, homology can only be partial20,62–66. The view of
a character remaining the same (homologue) throughout a number of possible states, defining as
many steps in an evolutionary sequence that can be linearly polarised and coded to fill in a
phylogenetic data matrix, probably rests on a misrepresentation of how organisms evolve.
the present article’s authors has remarked elsewhere20, this still means reducing organ homology
to gene homology, something conceptually and methodologically equivalent to reducing species
phylogeny to gene phylogeny. As the latter reduction is conceptually unwarranted and must be
rejected70,71, so process homology is not to be reduced to the shared involvement of homologous
genes in two developmental sequences, but must be firmly rooted in the shared origin of the
developmental pattern itself. This opens several interesting questions, two of which will be dealt
with here.
The first question is whether we can formulate hypotheses of homology between developmental
stages, rather than between specific developmental events. We think we can, but very cautiously.
Nobody would dispute that entering in a data matrix a character ‘larva’ with states such as ‘trochophora’,
‘caterpillar’ and ‘tadpole’ would be plain nonsense, but what about suggesting
the grasshopper prelarva as homologous to the larva of beetles or flies? Evidence in favour is
admittedly tenuous72, but the hypothesis cannot be discounted hastily73,74, and only a thorough
exploration of the developmental sequences, both upstream of and during these putatively homologous
stages, is likely to clarify the issue.
Problems with the comparison of developmental stages are sometimes even more subtle. When
comparing stages of two closely related insects, with a similar postembryonic developmental
schedule but with different numbers of instars, we may consider whether there is necessarily
homology between equally numbered stages of the two species. That is, we may ask whether the
fifth and last larval stage of butterfly species A is homologous to the fifth but penultimate larval
stage of butterfly species B. In our understanding, there is no universally valid answer to this
question, but as a basic rule, we believe that individual instars in a developmentally ‘smooth’
sequence (that is, one along which moults are only punctuations of the animal’s basically continuous
growth) cannot be individually treated as homologues. The only meaningful comparison, in the
example, would be one between the character state ‘larval development through four instars’ and
‘larval development through five instars’, but excluding a direct stage-to-stage comparison between
the two species.
The second question is whether we can still rely on the traditional all-or-nothing notion of
homology. Our answer is firmly ‘no’. All developmental sequences investigated in some detail, and
especially those for which a detailed analysis has been performed in terms of genetic control of
developmental events, have shown that characters, morphological and developmental alike, are not
produced by unique and perfectly well integrated complexes, or networks, of genes. Locally acting
dynamics allow the recognition of more or less individualised developmental modules75–78, but
overlaps and cross links are such as to oppose a simply hierarchical dissection of development and,
hence, a strictly hierarchical view of homology. A combinatorial approach to homology has been
suggested as a viable alternative63.
12.4.1 SEGMENTATION
One of the grand traits of animal organisation on which evolutionary developmental biology offers
a renewed perspective is segmentation, a key trait in the taxonomy of some species rich groups,
such as the mecistocephalid centipedes39. Body segmentation has long been regarded as a character
useful in recognising affinities at very high taxonomic levels, for example, as an argument through
which zoologists have supported, until recently, Cuvier’s6 pre-evolutionary concept of a taxon
Articulata that should include the two major groups of segmented invertebrates, annelids and
arthropods. Modern insights into segmentation mechanisms have cast increasing doubt as to the
equivalence of segmentation mechanisms in the two groups79 and the origin of segmentation is now
generally regarded as either very deep in animal phylogeny (via a segmented Urbilateria13,19,80,81)
or, as we prefer to believe, as having evolved in annelids and arthropods convergently82. Adopting
a broader evo-devo perspective helps with the interpretation of segmented features of animal body
architecture in a much more articulated way83,84. In this way, we realise that phylogenetically closely
related species may differ in their segmentation mechanism to a considerable extent, whereas
unexpected similarities in these mechanisms may occur between much more distant lineages.
Comparative evidence suggests that the concept of segmentation applies to organs rather than to
whole organisms, overall segmentation resulting when independently segmented structures even-
tually develop to share period and phase of their repetitive patterns. One may even argue that
segmentation is a ‘generic’ property of bilaterians85, that is, that it depends on basic physicochemical
properties of living matter more than on the specific expression patterns of a restricted set of genes.
In this perspective, it seems that the traditional distinction between the ‘true’ segmentation of
annelids, arthropods and vertebrates, and the ‘pseudosegmentation’ of animals such as tapeworms
and kinorhynchs should be abandoned. This is obviously important, given the use we may want to
make of segmentation in phylogenetic reconstruction.
in one step39, such as from 41 to 45 to 49. These evo-devo perspectives on the topology of the
ontogenetically accessible morphospace are clearly useful in defining character states.
dependence due to the fact that scoring of event pairs is not independent. This double nonindependence
may lead to highly inconsistent, or even logically absurd, results.
To circumvent this intrinsic flaw in the event pair approach, Schulmeister and Wheeler92
developed a new method in which every developmental sequence is considered as a single multistate
character. A search based optimisation is used to investigate changes within developmental
sequences, and step matrices are used to account for changes within each sequence. The new method
does not suffer from the nonindependence that characterises the event pair approach and thus
appears to be a promising strategy for the use of developmental data in phylogenetic reconstruction
when data are affected by heterochrony.
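The nonindependence of event pairs can be illustrated with a minimal sketch. It assumes the common coding convention in which each pair of events is scored 0 (first event earlier), 1 (simultaneous) or 2 (first event later); the function names and the events 'A', 'B' and 'C' are hypothetical and are not taken from the methods cited above. The sketch shows that pair states treated as independent characters can be combined into a set that no real developmental sequence could produce.

```python
from itertools import combinations, product


def event_pairs(ranks):
    """Code a developmental sequence as event pairs.

    ranks maps each event name to its position in the sequence
    (equal positions mean simultaneous events).
    Assumed coding: 0 = first event earlier, 1 = simultaneous,
    2 = first event later.
    """
    events = sorted(ranks)
    coded = {}
    for a, b in combinations(events, 2):
        if ranks[a] < ranks[b]:
            coded[(a, b)] = 0
        elif ranks[a] == ranks[b]:
            coded[(a, b)] = 1
        else:
            coded[(a, b)] = 2
    return coded


def is_realisable(coded):
    """Return True if some sequence of events yields exactly these pair states.

    Brute force over all rank assignments; adequate only for a few events.
    """
    events = sorted({event for pair in coded for event in pair})
    for ranks in product(range(len(events)), repeat=len(events)):
        if event_pairs(dict(zip(events, ranks))) == coded:
            return True
    return False


# A real sequence: A first, then B and C together.
observed = event_pairs({"A": 1, "B": 2, "C": 2})

# Pair states chosen independently of one another:
# A before B, B before C, yet A after C.
impossible = {("A", "B"): 0, ("B", "C"): 0, ("A", "C"): 2}

print(is_realisable(observed))    # the observed coding is consistent
print(is_realisable(impossible))  # the independent combination is not
```

Treating the whole sequence as one multistate character, as in the method described above, avoids this problem because only realisable sequences are ever compared.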
12.5 CONCLUSION
In the context of the current dialogue between evolutionary and developmental biology, the value
of reliable phylogenetic reconstructions in comparative evaluation of developmental processes has
been adequately demonstrated. In contrast, the relevance of evolutionary developmental biology in
our effort to reconstruct the tree of life has been very poorly explored until recently. However, as
shown in this article, the contribution of an evo-devo approach to the main steps of phylogenetic
analysis, such as evaluation of homology, selection of characters and assessment of character
polarity, can be critically important.
ACKNOWLEDGEMENTS
We thank Ronald Jenner for insightful comments on a previous version of this work. The idiosyncratic
views we present here are definitely ours. Alessandro Minelli was supported by a grant from
the Italian Ministry of Education, University and Research.
REFERENCES
1. Geoffroy Saint-Hilaire, E., Philosophie anatomique, J.B. Ballière, Paris, 2 vols, 1818–1822.
2. von Baer, K.E., Über Entwicklungsgeschichte der Thiere: Beobachtung und Reflexion, 1, Bornträger,
Königsberg, 1828.
3. de Beer, G.R., Embryos and Ancestors, 3rd ed., Clarendon Press, Oxford, 1958.
4. Hall, B.K., Evolutionary Developmental Biology, 2nd ed., Chapman and Hall, London, 1998.
5. Arthur, W., The emerging conceptual framework of evolutionary developmental biology, Nature, 415,
757, 2002.
6. Cuvier, G., Sur un nouveau rapprochement à établir entre les classes qui composent le règne animal,
Ann. Mus. Natn. Hist. Nat. Paris, 19, 73, 1812.
7. Minelli, A., Biological Systematics: The State of the Art, Chapman and Hall, London, 1993.
8. Aguinaldo, A.M.A. et al., Evidence for a clade of nematodes, arthropods and other moulting animals,
Nature, 387, 489, 1997.
9. Cunningham, C.W., Blackstone, N.W., and Buss, L.W., Evolution of king crabs from hermit crab
ancestors, Nature, 355, 539, 1992.
10. Arendt, D. and Nübler-Jung, K., Inversion of dorsoventral axis? Nature, 371, 26, 1994.
11. De Robertis, E.M. and Sasai, Y., A common plan for dorsoventral patterning in Bilateria, Nature, 380,
37, 1996.
12. Slack, J.M.W., Holland, P.W.H., and Graham, C.F., The zootype and the phylotypic stage, Nature,
361, 490, 1993.
13. De Robertis, E.M., The ancestry of segmentation, Nature, 387, 25, 1997.
14. Finkelstein, R. and Boncinelli, E., From fly head to mammalian forebrain: the story of otd and Otx,
Trends Genet., 10, 310, 1994.
15. Arendt, D. and Nübler-Jung, K., Common ground plans in early brain development in mice and flies,
BioEssays, 18, 255, 1996.
16. Bolker, J. and Raff, R.A., Developmental genetics and traditional homology, BioEssays, 18, 489, 1996.
17. Jacobs, D.K. et al., Molluscan engrailed expression, serial organization, and shell evolution, Evol.
Dev., 2, 340, 2000.
18. Valentine, J.W., Erwin, D.H., and Jablonski, D., Developmental evolution of metazoan bodyplans: the
fossil evidence, Dev. Biol., 173, 373, 1996.
19. Kimmel, C.B., Was Urbilateria segmented? Trends Genet., 12, 329, 1996.
20. Minelli, A., The Development of Animal Form: Ontogeny, Morphology, and Evolution, Cambridge
University Press, Cambridge, 2003.
21. Jenner, R., Towards a phylogeny of the Metazoa: evaluating alternative phylogenetic positions of
Platyhelminthes, Nemertea, and Gnathostomulida, with a critical reappraisal of cladistic characters,
Contrib. Zool., 73, 3, 2004.
22. Raff, R.A., The Shape of Life: Genes, Development, and the Evolution of Animal Form, The University
of Chicago Press, Chicago-London, 1996.
23. Arthur, W., The Origin of Animal Body Plans. A Study in Evolutionary Developmental Biology,
Cambridge University Press, Cambridge, 1997.
24. Valentine, J.W., On the Origin of Phyla, The University of Chicago Press, Chicago-London, 2004.
25. Müller, W.E.G., How was metazoan threshold crossed? The hypothetical Urmetazoa, Comp. Biochem.
Physiol. A, 129, 433, 2001.
26. King, N., Hittinger, C.T., and Carroll, S.B., Evolution of key cell signaling and adhesion protein
families predates animal origins, Science, 301, 361, 2003.
27. Brooke, N.M., García-Fernàndez, J., and Holland, P.W.H., The ParaHox gene cluster is an evolutionary
sister of the Hox gene cluster, Nature, 392, 920, 1998.
28. Bailey, W.J. et al., Phylogenetic reconstruction of vertebrate Hox cluster duplication, Mol. Biol. Evol.,
14, 843, 1997.
29. Cook, C.E. et al., Hox genes and the phylogeny of the arthropods, Curr. Biol., 11, 759, 2001.
30. Passamaneck, Y.J. and Halanych, K.M., Evidence from Hox genes that bryozoans are lophotrochozoans,
Evol. Dev., 6, 275, 2004.
31. Nijhout, H.F., Metaphors and the role of genes in development, BioEssays, 12, 441, 1990.
32. Keller, E.F., The Century of the Gene, Harvard University Press, Cambridge, Mass., 2000.
33. Oyama, S., The Ontogeny of Information, 2nd ed., Duke University Press, Durham, N.C., 2000.
34. Müller, G.B. and Newman, S.A., Origination of Organismal Form: Beyond the Gene in Developmental
and Evolutionary Biology, MIT Press, Cambridge, MA, and London, 2003.
35. Suemori, H. and Noguchi, S., Hox C cluster genes are dispensable for overall body plan of mouse
embryonic development, Dev. Biol., 220, 333, 2000.
36. Medina-Martinez, O., Bradley, A., and Ramirez-Solis, R., A large targeted deletion of Hoxb1-Hoxb9
produces a series of single-segment anterior homeotic transformation, Dev. Biol., 222, 71, 2000.
37. Spitz, F. et al., Large scale transgenic and cluster deletion analysis of the HoxD complex separate an
ancestral regulatory module from evolutionary innovations, Genes Dev., 15, 2209, 2001.
38. Hall, B.K., Evo-devo or devo-evo: does it matter? Evol. Dev., 2, 177, 2000.
39. Bonato, L., Foddai, D., and Minelli, A., A cladistic analysis of mecistocephalid centipedes reveals an
evolutionary trend in segment number (Chilopoda: Geophilomorpha: Mecistocephalidae), Syst.
Entomol., 28, 539, 2003.
40. Smith, M.M., Vertebrate dentitions at the origin of jaws: when and how pattern evolved, Evol. Dev.,
5, 394, 2003.
41. Lindberg, D.R. and Guralnick, R.P., Phyletic patterns of early development in gastropod molluscs,
Evol. Dev., 5, 494, 2003.
42. Maslakova, S.A., Martindale, M.Q., and Norenburg, J.L., Vestigial prototroch in a basal nemertean,
Carinoma tremaphoros (Nemertea; Palaeonemertea), Evol. Dev., 6, 219, 2004.
43. Cheverud, J.M., The genetic architecture of pleiotropic relations and differential epistasis, in The
Character Concept in Evolutionary Biology, Wagner, G.P., Ed., Academic Press, San Diego, 2001, 411.
44. Klingenberg, C.P. et al., Inferring developmental modularity from morphological integration: analysis
of individual variation and asymmetry in bumblebee wings, Am. Nat., 157, 11, 2001.
45. Davidson, E.H., Genomic Regulatory Systems: Development and Evolution, Academic Press, San
Diego, 2001.
46. Newman, S.A., Generic physical mechanisms of tissue morphogenesis: a common basis for develop-
ment and evolution, J. Evol. Biol., 7, 467, 1994.
47. Müller, G.B., Six memos for EvoDevo, in From Embryology to EvoDevo: A History of Embryology
in the 20th Century, Laubichler, M.D., and Maienschein, J., Eds., MIT Press, Cambridge, MA, and
London, in press.
48. Pennisi, E., Searching for the genome’s second code, Science, 306, 632, 2004.
49. Wang, W.C.H. et al., Comparative cis-regulatory analyses identify elements of the mouse Hoxc8 early
enhancer, J. Exp. Zool. (Mol. Dev. Evol.), 302B, 436, 2004.
50. Richardson, M.K. et al., Somite number and vertebrate evolution, Development, 125, 151, 1998.
51. Alonso, C.R. and Wilkins, A.S., Developmental evolution: are enhancers the primary source of
novelty? Nat. Rev. Genet., 6, 709, 2005.
52. Minelli, A., Limbs and tail as evolutionarily diverging duplicates of the main body axis, Evol. Dev.,
2, 157, 2000.
53. Minelli, A. and Fusco, G., Conserved vs. innovative features in animal body organization, J. Exp.
Zool. (Mol. Dev. Evol.), 304B, 520, 2005.
54. Dong, P.D.S., Chu, J., and Panganiban, G., Proximodistal domain specification and interactions in
developing Drosophila appendages, Development, 128, 2365, 2001.
55. Casares, F. and Mann, R.S., Control of antennal versus leg development in Drosophila, Nature, 392,
723, 1998.
56. Casares, F. and Mann, R.S., The ground state of the ventral appendage in Drosophila, Science, 293,
1477, 2001.
57. Minelli, A., The origin and evolution of the appendages, Int. J. Dev. Biol., 47, 573, 2003.
58. Schlosser, G. and Wagner, G.P., Introduction: the modularity concept in developmental and
evolutionary biology, in Modularity in Development and Evolution, Schlosser, G. and Wagner, G.P.,
Eds., University of Chicago Press, Chicago-London, 2004, 1.
59. Davis, G.K. and Patel, N.H., Short, long and beyond: molecular and embryological approaches to
insect segmentation, Annu. Rev. Entomol., 47, 669, 2002.
60. Salazar-Ciudad, I., Solé, R.V., and Newman, S.A., Phenotypic and dynamical transitions in model
genetic networks II. Application to the evolution of segmentation mechanisms, Evol. Dev., 3, 95, 2001.
61. Peel, A., The evolution of arthropod segmentation mechanisms, BioEssays, 26, 1108, 2004.
62. Shubin, N. and Wake, D., Phylogeny, variation, and morphological integration, Am. Zool., 36, 51, 1996.
63. Minelli, A., Molecules, developmental modules and phenotypes: a combinatorial approach to homology,
Mol. Phylogenet. Evol., 9, 340, 1998.
64. Abouheif, E., Establishing homology criteria for regulatory gene networks: prospects and challenges,
in Homology, Bock, G.R. and Cardew, G., Eds., Wiley, Chichester, 1999, 207.
65. Wake, D.B., Homoplasy, homology and the problem of ‘sameness’ in biology, in Homology, Bock,
G.R. and Cardew, G., Eds., Wiley, Chichester, 1999, 24.
66. Pigliucci, M., Characters and environments, in The Character Concept in Evolutionary Biology,
Wagner, G.P., Ed., Academic Press, San Diego, 2001, 363.
67. Nelson, G., Homology and systematics, in Homology: The Hierarchical Basis of Comparative
Biology, Hall, B.K., Ed., Academic Press, San Diego, 1994, 17.
68. Roth, V.L., Character replication, in The Character Concept in Evolutionary Biology, Wagner, G.P.,
Ed., Academic Press, San Diego, 2001, 81.
69. Gilbert, S.F. and Bolker, J.A., Homologies of process and modular elements of embryonic construction,
J. Exp. Zool. (Mol. Dev. Evol.), 291B, 1, 2001.
70. Maddison, W.P., Gene trees in species trees, Syst. Biol., 46, 523, 1997.
71. Nichols, R., Gene trees and species trees are not the same, Trends Ecol. Evol., 16, 358, 2001.
72. Heming, B.S., Insect Development and Evolution, Comstock Publishing Associates, Ithaca, New York,
2003.
73. Berlese, A., Intorno alle metamorfosi degli insetti, Redia, 9, 121, 1913.
74. Truman, J.W., and Riddiford, L.M., The origins of insect metamorphosis, Nature, 401, 447, 1999.
75. Wagner, G.P., The biological homology concept, Annu. Rev. Ecol. Syst., 20, 51, 1989.
76. Schlosser, G., The role of modules in development and evolution, in Modularity in Development and
Evolution, Schlosser, G. and Wagner, G.P., Eds., The University of Chicago Press, Chicago-London,
2004, 519.
77. Nelson, C., Selector genes and the genetic control of developmental genes, in Modularity in Development
and Evolution, Schlosser, G. and Wagner, G.P., Eds., The University of Chicago Press, Chicago-
London, 2004, 17.
78. Cheverud, J.M., Modular pleiotropic effects of quantitative trait loci on morphological traits, in
Modularity in Development and Evolution, Schlosser, G. and Wagner, G.P., Eds., The University of
Chicago Press, Chicago-London, 2004, 132.
79. Minelli, A. and Bortoletto, S., Myriapod metamerism and arthropod segmentation, Biol. J. Linn. Soc.,
33, 323, 1988.
80. Holland, L.Z. et al., Sequence and embryonic expression of the amphioxus engrailed gene (AmphiEn):
the metameric pattern of transcription resembles that of its segment-polarity homolog in Drosophila,
Development, 124, 1723, 1997.
81. Christ, B. et al., Segmentation of the vertebrate body, Anat. Embryol., 197, 1, 1998.
82. Arthur, W., Jowett, T., and Panchen, A., Segments, limbs, homology, and co-option, Evol. Dev., 1, 74,
1999.
83. Budd, G.E., Why are arthropods segmented? Evol. Dev., 3, 332, 2001.
84. Minelli, A. and Fusco, G., Evo-devo perspectives on segmentation: model organisms, and beyond,
Trends Ecol. Evol., 19, 423, 2004.
85. Newman, S.A., Is segmentation generic? BioEssays, 15, 277, 1993.
86. Panganiban, G., Nagy, L., and Carroll, S.B., The role of the Distal-less gene in the development and
evolution of insect limbs, Curr. Biol., 4, 671, 1994.
87. Schileyko, A.A., Redescription of Scolopendropsis bahiensis (Brandt, 1841), the relations between
Scolopendropsis and Rhoda, and notes on some characters used in scolopendromorph taxonomy
(Chilopoda: Scolopendromorpha), Arthropoda Selecta, in press.
88. Arthur, W. and Farrow, M., The pattern of variation in centipede segment number as an example of
developmental constraint in evolution, J. Theor. Biol., 200, 183, 1999.
89. Gould, S.J., Ontogeny and Phylogeny, The Belknap Press of Harvard University Press, Cambridge,
MA, 1977.
90. McNamara, K.J., A guide to the nomenclature of heterochrony, J. Paleontol., 60, 4, 1986.
91. Wiens, J.J., Bonett, R.M., and Chippindale, P.T., Ontogeny discombobulates phylogeny: paedomorphosis
and higher-level salamander relationships, Syst. Biol., 54, 91, 2005.
92. Schulmeister, S. and Wheeler, W., Comparative and phylogenetic analysis of developmental sequences,
Evol. Dev., 6, 50, 2004.
93. Bininda-Emonds, O.R.P. et al., From Haeckel to event-pairing: the evolution of developmental
sequences, Theor. Biosci., 121, 297, 2002.
94. Smith, K.K., Comparative patterns of craniofacial development in eutherian and metatherian mammals,
Evolution, 51, 1663, 1997.
95. Velhagen, W.A., Analyzing developmental sequences using sequence units, Syst. Biol., 46, 204, 1997.
96. Jeffery, J.E. et al., Analyzing developmental sequences within a phylogenetic framework, Syst. Biol.,
51, 478, 2002.
Section C
Taxonomy and Systematics of Species
Rich Groups (Case Studies)
M. A. Wall
Department of Entomology, San Diego Natural History Museum,
San Diego, California, USA
R. T. Schuh
American Museum of Natural History, Division of Invertebrate Zoology,
New York City, USA
CONTENTS
13.1 Introduction
13.2 Estimates and Drivers of Insect Diversity
13.2.1 Insect Diversity and Classification
13.2.2 Drivers of Diversity
13.2.3 Estimates of Insect Species Richness
13.3 Dealing with Diversity: From the Cottage to the Factory
13.4 Plant Bug Diversity, Biology and Classification
13.5 Plant Bugs as a Cottage Industry
13.6 Taxonomic, Collections and Classification Impediments
13.7 Plant Bugs in the Twenty-First Century: Industrial Cyber-Taxonomy
13.7.1 Plant Bug Planetary Biodiversity Inventory
13.7.2 Human Resources
13.7.3 Specimen Resources and Field Work
13.7.4 Producing Descriptions
13.7.5 Technical Resources
13.8 Conclusions
Acknowledgements
References
ABSTRACT
Insects are the most diverse higher taxon of organisms, comprising more than half of all described
species. The rate and scale of species extinction and ecosystem degradation, the so-called biodiversity
crisis, demand an urgent response by the taxonomic community to comprehensively
document global organismal diversity. For ‘megadiverse families’ within insects, the establishment
of predictive classifications that are global in scope and the description of ‘all species’ are hampered
by the scale of the task. To answer this challenge, we support previous calls for industrialising the
taxonomic process, involving astronomy-like international collaboration, infrastructural investment,
capacity building and taking full advantage of information technology developments. We strongly
argue that this unitary approach can be implemented without compromising the hypothesis-driven
nature of taxonomic science. The plant bug family Miridae is presented as a case study of this
approach.
13.1 INTRODUCTION
If we could visualise a tree of life, insects would form the canopy, overshadowing the rest of life
(Figure 13.1; insects are the major component of Hexapoda). Nearly a million of the 1.7 million
species of organisms so far described are insects. Despite their omnipresence, insects
are not a recent explosive radiation, nor mere variations on a theme. Insects have a minimum history
of 400 million years, and most modern insect orders have been in existence for around 250 million
years1. Insects are the most dominant and diverse group of terrestrial metazoans by almost all
possible measures. Aside from submerged marine habitats, there are few ecological niches that
insects have not exploited. In terms of abundance and biomass, insects dominate most terrestrial
ecosystems. For instance, arthropods (primarily insects) reach extraordinary biomasses (23.6 kilograms
per hectare) and abundances (23.9 million individuals per hectare) in Borneo2. Termites alone can
reach abundances of up to 10,000 individuals per square metre3 and biomasses of 100 g per square
metre4. Insects are crucial to terrestrial ecosystem processes such as nutrient cycling5, seed
dispersal6 and pollination7.
By virtue of scale alone, no other group epitomises the challenges that species rich taxa
present to taxonomy and systematics quite like the insects do. How many of the 350,000
described beetles does one include when reconstructing the phylogeny of Holometabola? How
do we reduce duplication of effort so that we do not repeat historic levels of up to 80%8
synonymy within the Insecta? How do we describe the four to nine million undescribed insects
(Table 13.2) within a time frame that meets the demands of scientifically informing the
biodiversity crisis?
It is with the biodiversity crisis in mind that taxonomists and the taxonomic method are
increasingly faced with questions about relevance. In this chapter we outline the issues that face
entomologists in documenting this remarkable diversity of insects. We present the case study of
plant bugs (Insecta: Heteroptera: Miridae) as a model group for preserving the taxonomic method
while incorporating advances in technology and global cooperation as a means of expediting the
documentation process.
FIGURE 13.1 Tree of life with terminal branches expanded to represent the number of described species in each
taxon (scale: 100,000 species). Hexapods include insects, springtails, diplurans and proturans. (Tree structure
is based on Pennisi64 and taxon species richness estimates are from Brusca and Brusca65, Margulis and
Schwartz66 and DSMZ67.)
contentious, but has mostly come down to arguments about where the root of the phylogenetic
tree lies. Most contemporary analyses that include DNA sequence data suggest that insects have
arisen from within a paraphyletic Crustacea9–11, although some authors have suggested that
Hexapoda and Crustacea are mutually paraphyletic12,13. Within Insecta, interordinal relationships
(Figure 13.2) are in some ways poorly resolved, although a great deal of progress has been made
in the last quarter of a century. While some clades, such as Holometabola (Figure 13.2), have
been well supported since before the time of Hennig, other problematic taxa such as Plecoptera
have caused considerable instability in the deep level branching of insect phylogenetic reconstructions.
FIGURE 13.2 Interordinal phylogenetic tree of Hexapoda. Geological time scale is indicated on the left side.
Wide branches indicate robust support in contemporary analyses.
The insect phylogenetic tree presented here (Figure 13.2) is a summary tree of recent work14–16
in the field.
Approximately 925,000 described species1 are represented in just 32 orders of insects, a
tractable higher classification when compared to the approximately 100 orders of vertebrates.
Yet, species diversity in the insects is dramatically uneven in distribution. The majority of
species are found in just five orders (Table 13.1). Four of these orders (Coleoptera, Diptera,
Hymenoptera and Lepidoptera) belong to Holometabola, a clade characterised by development
via complete metamorphosis (Figure 13.2). Together with Hemiptera, these orders represent
close to 90% (c. 825,000 species) of the described insect diversity. This uneven distribution
of described species diversity extends to every taxonomic level. In fact, just 20 families
(Table 13.1) of insects contain a little over 45% of all described insect diversity. Most of these
hyperdiverse families are primarily herbivorous, such as Chrysomelidae, Miridae and Noctuidae,
with some notable predaceous (for example, Staphylinidae) and parasitic (for example,
Ichneumonidae and Tachinidae) exceptions.
TABLE 13.1
Described Species Diversity within the Hexapods
Order Family Species Order Species
Note: Described species diversity within the hexapods. All orders and the 20 largest
families are shown.
Source: Species richness estimates from Grimaldi and Engel1 unless otherwise indicated.
FIGURE 13.3 Host plant affinities of families of land bugs belonging to the infraorder Pentatomomorpha (Heteroptera) with ordinal
taxa of land plants. Dashed line separates asterids and rosids from remainder of land plants. Legend classes for
number of cases: 1, 2-11, 12-21, 22-32, 33-42, 43-52, >52.
TABLE 13.2
Estimates of Insect Species Diversity in the Scientific Literature and the Year in Which They Were Published

Estimated Number of Species (in millions)    Reference                   Year
30                                           Erwin26                     1982
7–80                                         Stork28                     1988
5                                            Gaston75                    1991
1.8–2.6                                      Hodkinson and Casson33      1991
12.5                                         Hammond76                   1992
5                                            Ødegaard et al.27           2000
2.0–3.4                                      Dolphin and Quicke77        2001
4–6                                          Novotny et al.29            2002
10                                           Ødegaard et al.32           2005
Chapter 3). For example, Agapow et al.38 found a 48% increase in species names after phylogenetic
revision and application of the phylogenetic species concept. In considering these adjustments, there is
the potential for 7.4–14.8 million phylogenetically defined species of insects in the world.
Leaving all of these hypothetical assumptions, abstractions and extrapolations behind, there is
a clear message in all of these estimates: insect taxonomists have a considerable job to meet the
challenge of developing an encyclopedia of life39. In a survey of the Zoological Record from 2000
through 2004, we found an average of 8,500 new insect species described per year. At this rate, it
will take 480 to 1,070 years to describe the world insect fauna (based on estimates of 5–10 million
insect species). Clearly this rate of species description is not adequate in meeting the contemporary
needs of society, including ameliorating the alarming decline in biodiversity. This rate must be
multiplied by a factor of 10–24 in order to document scientifically the world’s undescribed insect
fauna in the next 100 years, and 100 times that if the fauna had to be described in the next 25 years,
as some people have suggested.
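The time-to-completion figures above can be checked with a short sketch. The number of insect species already described is not stated in this passage; the value of 920,000 used below is an assumption, back-calculated so that the 5 million estimate reproduces the 480-year figure.

```python
# Years needed to describe the remaining insect fauna at a constant rate.
DESCRIBED = 920_000      # ASSUMED count of described species (not given in the text)
RATE_PER_YEAR = 8_500    # average new insect species per year, 2000-2004 (from the text)


def years_to_describe(total_estimate, described=DESCRIBED, rate=RATE_PER_YEAR):
    """Return the years required to describe all remaining species."""
    remaining = total_estimate - described
    return remaining / rate


years_low = years_to_describe(5_000_000)    # lower global estimate
years_high = years_to_describe(10_000_000)  # upper global estimate
print(f"{years_low:.0f} to {years_high:.0f} years")
```

Under that assumption the two estimates give roughly 480 and 1,070 years, in line with the range quoted above.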
or even orders. This has resulted in a concomitant change from general to specialised collecting.
These new survey efforts have led to the discovery of large numbers of species that were not
represented in existing collections. For example, in a recent revision49 of Australian barkbugs
(Aradidae: Mezirinae), 45 of the 93 species represented in Australia were described as new, based
on material collected primarily by the revision’s author.
In summary, if the intent is to describe all insects in nature, and not just those in existing
collections, then the taxonomic impediment is not merely a shortfall in species description.
Codependent classification and collection impediments require parallel attention. In the following sections
we provide a case study that documents the methodological transition from the single investigator to
an industrial model of taxonomy that strives to overcome taxonomic classification and collection
impediments within the plant bug family Miridae.
FIGURE 13.4 Two undescribed species of Miridae from Australia: Mymecorides sp., an ant mimic (A), and
Peritropis sp. (B). Scale bars = 1 mm.
[Figure 13.5 image not reproduced. Terminal labels are subfamilies of Miridae; scale = 1,000 species.]
FIGURE 13.5 Phylogeny of Miridae with terminal branches expanded to represent number of described
species in each taxon.
myrmecomorphy being found in hundreds of species and many genera. The majority of Orthotylinae
and Phylinae are highly host specific, occurring primarily on meristematic growth of developing
flowers and/or shoots.
[Figure 13.6 plot not reproduced. Series: Global, North America, Australia. Y-axis: No. of species (0 to 5000); x-axis: Year (1830 to 1980).]
FIGURE 13.6 Global and regional chart of species description accumulation for the Orthotylinae and
Phylinae from 1830 to present.
published the multipart Catalogue of the Miridae of the World. An updated version of the catalogue
was published in 1995 by Schuh58, who now maintains an up-to-date online version50. Since
Carvalho’s catalogue, species have been described at a rate of approximately 145 species per year,
double the rate (73 species per year) of the 50 years preceding the publication
of the catalogue.
Plant bug taxonomy has historically been a cottage industry in which single investigators have
worked on regional collections, producing a modest list of species names over a lifetime. In total,
some 340 authors (excluding junior authors) have published 13,048 species group names in Miridae.
Although synthetic ‘global’ taxonomists have emerged throughout history, the vast majority of
plant bug taxonomists (73%) have described fewer than 15 species. Until recently, most plant bug
taxonomists worked alone. Only 13% of plant bug names are described in multi-author papers,
suggesting that the image of taxonomists as lone investigators working in isolation is an apt
description of past behaviour. However, when these data are partitioned by decade of description,
a different image emerges. Since the 1970s, the proportion of plant bug species described in multi-
authored papers has steadily increased by an average of 5% a decade. In fact, in the last ten years,
40% of plant bug species names are the product of a collaborative effort.
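Partitioning the authorship data by decade, as described above, can be sketched as follows. The records are invented for illustration and are not the catalogue data; the function name is likewise hypothetical.

```python
from collections import defaultdict

# HYPOTHETICAL records: (year of description, number of authors on the paper).
records = [(1965, 1), (1968, 1), (1974, 1), (1979, 2),
           (1985, 1), (1988, 3), (1996, 2), (1999, 1), (2003, 2)]


def multi_author_share_by_decade(rows):
    """Return {decade: fraction of names published in multi-author papers}."""
    totals = defaultdict(int)
    multi = defaultdict(int)
    for year, n_authors in rows:
        decade = (year // 10) * 10   # 1974 -> 1970
        totals[decade] += 1
        if n_authors > 1:
            multi[decade] += 1
    return {d: multi[d] / totals[d] for d in sorted(totals)}


shares = multi_author_share_by_decade(records)
for decade, share in shares.items():
    print(f"{decade}s: {share:.0%}")
```

A pooled figure such as the 13% quoted above can hide a trend that only appears once the same counts are grouped by decade.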
What are the benefits of collaborative efforts? Collaboration almost invariably increases taxonomic
and/or geographic breadth. In particular, the collaboration of global authorities with regional
experts reduces redescription of geographically widespread species. For example, Carvalho, the
most prolific global plant bug worker in history, occasionally collaborated with regional experts to
produce works on geographically restricted faunas. Broadening taxonomic and geographic breadth
through collaboration has the potential to increase the stability, universality, and predictive value
of classifications. On the other hand, reclusive approaches may produce deleterious results, as for
example the near simultaneous but independent work of Knight60 and Kelton61 on the genus
Reuteroscopus. Within Miridae, levels of synonymy are approximately 23%, suggesting that there
is room for improvement, with collaboration offering an obvious possibility.
Although many taxonomists have contributed to the plant bug taxonomic literature, just 22
taxonomists have described 75% of plant bug species. Do these ‘über-taxonomists’ represent the
ideal for which we should strive? Obviously, the introduction of species names should not be the
only measure by which we judge the output of taxonomists. For instance, the value of Stichel’s
316 species names of Miridae is markedly decreased by the subsequent treatment of 285 (90%) of
those names as junior synonyms. In contrast, the American entomologist Lattin is not the primary
author for any plant bug name; however, just five of his students have produced approximately 750
species group names, with very low rates of synonymy. Nonetheless, if we are to attain the rates
of taxonomic output necessary to chronicle the diversity of life on Earth, then creating the
infrastructure and resources for efficient networks of collaborating taxonomists has the greatest potential
for advancing the cause.
[Figure 13.7 map not reproduced. Legend: Described Species, binned as 1-210, 211-1013 and 1014-2300; no data.]
FIGURE 13.7 Map of plant bug species richness. The patterns suggest a high degree of sampling bias and correlation with distribution of plant bug
taxonomists.
been roughly sorted into 2,000 species, which equates to an order of magnitude increase on
published knowledge.
Based on these figures alone, the Australian plant bug fauna would be categorised as one of
the most species rich in the world. However, the sampling of the Australian flora is far from
adequate. In recent surveys of the Australian Miridae, we have sampled just over 1,200 species of
flowering plants and found that 75% of Australian plant bugs are known from only one or two
hosts. Although we do not keep records for host plant species sampled without plant bugs, our
sampling to date covers only a fraction of the 18,000 known species of plants comprising
the Australian flora. In addition, most localities have been visited only once, and temporal turnover
patterns for plant bugs at these localities are largely unknown. In the few cases where there has been
repeat sampling, a highly significant temporal turnover in plant bug species has been found62.
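A host-specificity figure of the kind quoted above (the share of species known from one or two hosts) can be derived from label data along the following lines. This is a minimal sketch; the species and host names are placeholders, not survey results.

```python
from collections import defaultdict

# HYPOTHETICAL (plant bug species, host plant species) pairs from specimen labels.
label_records = [
    ("bug_a", "Acacia sp. 1"), ("bug_a", "Acacia sp. 1"),
    ("bug_b", "Eucalyptus sp. 1"), ("bug_b", "Eucalyptus sp. 2"),
    ("bug_c", "Acacia sp. 1"), ("bug_c", "Grevillea sp. 1"),
    ("bug_c", "Hakea sp. 1"),
    ("bug_d", "Melaleuca sp. 1"),
]


def host_counts(pairs):
    """Return {bug species: number of distinct host plant species}."""
    hosts = defaultdict(set)
    for bug, plant in pairs:
        hosts[bug].add(plant)   # a set ignores repeat collections on one host
    return {bug: len(plants) for bug, plants in hosts.items()}


def share_with_at_most(counts, max_hosts=2):
    """Return the fraction of bug species recorded from at most max_hosts hosts."""
    narrow = sum(1 for n in counts.values() if n <= max_hosts)
    return narrow / len(counts)


counts = host_counts(label_records)
share = share_with_at_most(counts)
print(counts, f"{share:.0%}")
```

Because unsampled hosts and single-visit localities both depress the counts, a figure computed this way is better read as an upper bound on apparent specificity than as a settled estimate.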
Other factors also contribute greatly to the low rate of species description of insect faunas in
the Southern hemisphere, and for plant bugs the lack of adequate generic classifications is a
fundamental issue. A historical overview of the description of the Australian Miridae indicates that
Northern hemisphere generic concepts were often applied to what we are finding to be a highly
endemic Australian plant bug fauna. For instance, Melanotrichus australianus Carvalho (Orthotylinae)
is the only representative of this genus in the Southern hemisphere. Cursory examination of the
species indicates that it is clearly misplaced, and in fact belongs to an undescribed genus in Phylinae.
Such determinations can often be made only in hindsight; this emphasises the importance
of quickly building classificatorial frameworks for poorly described faunas.
[Figure 13.8 diagram not reproduced. Labelled components include: Internet interface, public homepage, interface, interactive digital identification keys and species pages.]
FIGURE 13.8 Information technology infrastructure for the plant bug PBI.
In Figure 13.8 we outline the information technology framework for the Plant Bug PBI project, which
is divided into the cyber-based taxonomic tools, and the overarching Internet interface that is designed
to provide universal and immediate access to the generated taxonomic outputs. The key cyber-
taxonomic tools that are implemented or in development are described below:
Web-based systematic catalogue. Within the confines of available funding and technological
understanding, the Plant Bug PBI team has chosen to place as many research tools and as much
research information as possible on the Internet. At the core of this approach is a systematic
catalogue of Miridae. This source, in the form of a relational database, provides an up-to-date
bibliographic history for all taxa in the group under study. It provides a powerful tool for organising
and retrieving information on nomenclature, classification, host associations and geographical
distributions. The relational database allows for potentially continuous updating and the rapid
delivery of identical results to users anywhere in the world, thus maintaining a contemporary species
list for Miridae. Beyond its capacity to serve catalogue data, the systematic catalogue serves as a
platform for the retrieval of pages from the digital library and other key information from the
specimen and image databases.
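To make the idea of a relational catalogue concrete, the following is a minimal sketch using an in-memory SQLite database. The table layout, column names and taxon names below species-group level are assumptions made for illustration; they do not describe the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon (
    taxon_id   INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    taxon_rank TEXT NOT NULL,
    parent_id  INTEGER REFERENCES taxon(taxon_id),
    valid_id   INTEGER REFERENCES taxon(taxon_id)  -- NULL if valid, else the senior synonym
);
CREATE TABLE host_record (
    taxon_id   INTEGER NOT NULL REFERENCES taxon(taxon_id),
    host_plant TEXT NOT NULL,
    citation   TEXT NOT NULL
);
""")

# HYPOTHETICAL species names; only Miridae and Phylinae are real taxa.
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Miridae", "family", None, None),
        (2, "Phylinae", "subfamily", 1, None),
        (3, "Exemplar primus", "species", 2, None),
        (4, "Exemplar secundus", "species", 2, 3),  # junior synonym of taxon 3
    ],
)
conn.execute(
    "INSERT INTO host_record VALUES (3, 'Acacia sp.', 'hypothetical citation')"
)


def valid_species(connection):
    """Return the names of all species not treated as junior synonyms."""
    rows = connection.execute(
        "SELECT name FROM taxon "
        "WHERE taxon_rank = 'species' AND valid_id IS NULL ORDER BY name"
    )
    return [name for (name,) in rows]


def synonyms_of(connection, valid_name):
    """Return the junior synonyms recorded for a valid name."""
    rows = connection.execute(
        "SELECT j.name FROM taxon AS j JOIN taxon AS s ON j.valid_id = s.taxon_id "
        "WHERE s.name = ? ORDER BY j.name",
        (valid_name,),
    )
    return [name for (name,) in rows]


print(valid_species(conn), synonyms_of(conn, "Exemplar primus"))
```

Keeping synonymy as a self-reference on the taxon table means a current species list is a single query, which is one way the continuous updating described above could be supported.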
Digital library. Taking advantage of the relational structure of the systematic catalogue, a
digital library of relevant literature, comprising some 30,000 pages, has been uploaded to the web
in searchable PDF format (http://research.amnh.org/pbi/catalog). These pages relate to the taxonomy,
morphology and natural history of Orthotylinae and Phylinae. The most obvious limitation of this
approach is that permission for copyrighted material published during the last 70 years could not
always be secured, and in such cases this literature is not incorporated into the digital library.
Publications that are already available on the web, especially those published very recently, can be
included through the use of linking Uniform Resource Locators (URLs). The rewards produced by
this digital archive go well beyond its relatively modest production costs. It provides access to a
near comprehensive body of primary literature, including access to the older literature, which often
has restricted availability, particularly to scientists in developing countries.
Web-based specimen database. Although the structural attributes of specimen data have been
widely agreed upon for some time (for example, Darwin Core schema), the approaches to acquiring
and retrieving those data are less well settled. In an effort to accommodate the international partners
on the Plant Bug PBI team, the project implemented a web-based approach to the acquisition of
specimen data. This approach takes advantage of high speed Internet connections and has the desirable
property of allowing for centralised geo-referencing, real-time data entry, and the security of using a
centralised enterprise level computer server with regular backups and institutional support.
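As an illustration of structured specimen data, the sketch below models a record whose field names follow Darwin Core terms and applies a few basic checks at data entry. The choice of fields, the validation rules and the sample values are assumptions, not the project's implementation.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SpecimenRecord:
    # Field names follow Darwin Core terms; the selection is illustrative.
    catalogNumber: str
    scientificName: str
    decimalLatitude: float
    decimalLongitude: float
    eventDate: str       # ISO 8601 date, e.g. "2004-11-03"
    recordedBy: str


def validate(record):
    """Return a list of problems found in a record (empty if none)."""
    problems = []
    if not record.catalogNumber:
        problems.append("catalogNumber is empty")
    if not -90.0 <= record.decimalLatitude <= 90.0:
        problems.append("decimalLatitude out of range")
    if not -180.0 <= record.decimalLongitude <= 180.0:
        problems.append("decimalLongitude out of range")
    return problems


# HYPOTHETICAL sample records.
good = SpecimenRecord("DEMO 00000001", "Miridae sp.", -33.87, 151.21,
                      "2004-11-03", "A. Collector")
bad = SpecimenRecord("", "Miridae sp.", -133.87, 151.21,
                     "2004-11-03", "A. Collector")

print(asdict(good))
print(validate(bad))
```

Running such checks on a central server at entry time is one way a web-based approach could catch errors before they reach the shared database.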
Matrix code unique specimen identification. Whilst unique specimen identification has long
been used in vertebrate collections, the ‘barcoding’ (not to be confused with DNA barcoding) of
insect specimens has become a relatively common practice only in the last few years. Unique specimen
identification allows for the tracking of information otherwise not possible, and particularly for the
rapid retrieval of database records. Yet the codes may require handling of the specimens in order
to be read or might inordinately increase the amount of space required to house collections. The
Plant Bug PBI team has adopted the use of ‘matrix code’ labels, which provide the benefits of
unique specimen identification. These labels are relatively small and increase the total amount
of space occupied by the collection by only one third. Nor do they increase
specimen handling, as the labels are machine readable without removing the specimens from the
collection.
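The text encoded on a unique specimen label can be as simple as a collection prefix plus a zero-padded serial number. The sketch below assumes that format; the prefix, the width and both function names are hypothetical and are not taken from the project.

```python
import re

PREFIX = "DEMO_PBI"   # HYPOTHETICAL collection prefix
WIDTH = 8             # digits in the zero-padded serial number (assumed)


def make_usi(serial, prefix=PREFIX, width=WIDTH):
    """Build a unique specimen identifier such as 'DEMO_PBI 00000042'."""
    if serial < 0 or serial >= 10 ** width:
        raise ValueError("serial number out of range")
    return f"{prefix} {serial:0{width}d}"


_USI_PATTERN = re.compile(r"^(?P<prefix>[A-Z_]+) (?P<serial>\d+)$")


def parse_usi(text):
    """Split a scanned identifier into (prefix, serial number)."""
    match = _USI_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid identifier: {text!r}")
    return match.group("prefix"), int(match.group("serial"))


label = make_usi(42)
print(label, parse_usi(label))
```

A fixed-width identifier keeps labels uniform in size and lets a scanned code be used directly as a database key.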
Real-time mapping/host data from labels. The integration of the specimen database with the
systematic catalogue allows for the real-time mapping of species distributions and the assessment
of host specificity from actual specimen data. Geo-referenced specimen data can also be easily
exported to GIS software for generation of distribution maps for taxonomic manuscripts. Also,
voucher material of plants is collected in the field, determined by botanists, digitally scanned and
deposited in various herbaria. The host plant data are then linked to specimens through herbaria
accession numbers and the unique identifiers associated with plant bug specimens.
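Exporting geo-referenced specimen data for GIS software could look like the following sketch, which writes point records as a GeoJSON FeatureCollection. GeoJSON is chosen here only as a widely supported interchange format; the source does not say which export format the project uses, and the records are invented.

```python
import json

# HYPOTHETICAL specimen records keyed by Darwin Core-style field names.
specimens = [
    {"catalogNumber": "DEMO 00000001", "scientificName": "Miridae sp. 1",
     "decimalLatitude": -33.87, "decimalLongitude": 151.21},
    {"catalogNumber": "DEMO 00000002", "scientificName": "Miridae sp. 1",
     "decimalLatitude": None, "decimalLongitude": None},  # not yet geo-referenced
]


def to_geojson(records):
    """Convert geo-referenced records to a GeoJSON FeatureCollection."""
    features = []
    for rec in records:
        lat, lon = rec["decimalLatitude"], rec["decimalLongitude"]
        if lat is None or lon is None:
            continue  # skip specimens without coordinates
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {
                "catalogNumber": rec["catalogNumber"],
                "scientificName": rec["scientificName"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


collection = to_geojson(specimens)
print(json.dumps(collection, indent=2))
```

Carrying the catalogue number in each feature's properties keeps every mapped point traceable to a physical specimen.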
High-resolution digital imaging. The description and documentation of taxa can be greatly
enhanced through the use of effective imaging and illustration techniques. The Plant Bug PBI team
has adopted the use of digital imaging systems that allow for the rapid capture of high-resolution
images. These images are supplemented with scanning electron micrographs of specialised
morphology. All of these images are databased and linked with specimens, resulting in a
morphological image databank for Miridae.
Species pages and integration of information on the web. Using the digitised information
sources described above, taxon information is combined into ‘species pages’ on the web. These
displays incorporate real-time nomenclatural, descriptive, host plant, distributional and
bibliographic information, as well as morphological imagery. This allows for a comprehensive perspective
on the attributes of individual plant bug species. End users may arrive at these pages through
Internet search engines, the plant bug online catalogue, or multi-entry web-based identification
keys that are being developed by the Plant Bug PBI team.
13.8 CONCLUSIONS
To complete the tree of life we must assemble all of the pieces. The possibility of an endpoint in
this task may appear remote, particularly within the lifespan of existing taxonomists. To achieve
this goal, the comprehensive documentation of species is a necessary objective, worthy of strategic
planning and investment. The insects are a dauntingly diverse taxon, whose complete description
and cataloguing in a short time span will take a Herculean effort. In reaching for this outcome, it
is important to overcome misconceptions that taxonomy is nothing more than a cataloguing process.
Wheeler43 has made the necessary defence of taxonomy, that it is a hypothesis-driven science. The
outputs of taxonomy (such as character homology, taxa and classifications) are the foundation upon
which most of biological science rests, and cannot be tossed out for expediency. Recognising and
diagnosing new taxa will, without doubt, take individual investigators
some minimum period of time. This follows from the analytic process and from the fact that species
as we understand them are concepts, not self-identifying entities in nature. So the question remains:
how do we maintain the cornerstones of traditional taxonomy and ramp up the effort? In this chapter
we have argued that the tools of industrial taxonomy must derive from the proper mix of human
power, collaboration and technology.
Almost everyone agrees that the World Wide Web and digital technology have the potential to
accelerate taxonomy in the twenty-first century. However, the acceleration of taxonomy is not
simply increasing the rate of species descriptions, but also greatly enhancing the availability of
data to end users. In line with the original PBI objectives, the Plant Bug team has developed and
implemented technology that simultaneously assists plant bug taxonomists and enhances broader
accessibility of taxonomic data (Figure 13.8). Through the use of relational databases and custom
web applications, users can generate ‘material examined’ lists for taxonomic manuscripts, track
collection data for rare taxa, query morphological and molecular data for phylogenetic analyses,
examine image libraries for studies in comparative morphology, or generate the data necessary to
build a species richness map for a biological preserve. Furthermore, the integration of these tools
with a team-based approach allows for the division of taxonomic effort while keeping the focus
on the larger problem of producing an up-to-date classification for monophyletic
groups on a worldwide basis.
By combining traditional taxonomic approaches with global collaboration, centralised web-accessible
plant bug data and targeted biological survey work, the Plant Bug PBI offers a model for high
taxonomic output in Miridae. The success of this project will be judged by those within the
taxonomic community, but also by other stakeholders, from the biologically curious to
environmental decision makers. Whilst no single investigator can possibly master any megadiverse
group, applying suitable web-based technology can effectively couple the skills of multiple
investigators working towards a common goal.
ACKNOWLEDGEMENTS
Sincere appreciation is extended to Hannah Finlay for illustrations in Figure 13.4. Additional thanks are
given to Lance Wilkie, Michael Elliot and Gareth Carter for assistance with Figures 13.3, 13.6 and 13.7.
This paper was supported in part by the NSF Planetary Biodiversity Inventory grant DEB-0316495.
REFERENCES
1. Grimaldi, D. and Engel, M.S., Evolution of the Insects, Cambridge University Press, New York, 2005.
2. Dial, R.J. et al., Arthropod abundance, canopy structure, and microclimate in a Bornean lowland
tropical rainforest, Biotropica, 38, 643, 2006.
3. Watt, A.D. et al., Impact of forest loss and regeneration on insect abundance and diversity, in Forests
and Insects, Watt, A.D., Stork, N.E., and Hunter, M.D., Eds., Chapman and Hall, London, 1997, 273.
4. Eggleton, P. et al., The diversity, abundance, and biomass of termites under differing levels of
disturbance in the Mbalmayo Forest Reserve, southern Cameroon, Philos. Trans. R. Soc. Lond. B,
351, 51, 1996.
5. Bignell, D.E. et al., Termites as mediators of carbon fluxes in tropical forest: budgets for carbon
dioxide and methane emissions, in Forests and Insects, Watt, A.D., Stork, N.E., and Hunter, M.D.,
Eds., Chapman and Hall, London, 1997, 109.
6. Berg, R.Y., Myrmecochorous plants in Australia and their dispersal by ants, Aust. J. Bot., 23, 475, 1975.
7. Buchmann, S.L. and Nabhan, G.P., The Forgotten Pollinators, Island Press, Washington, DC, 1996.
8. Gaston, K.J. and Mound, L.A., Taxonomy, hypothesis testing and the biodiversity crisis, Proc. R. Soc.
Biol. Sci. B, 251, 139, 1993.
9. Regier, J.C., Shultz, J.W., and Kambic, R.E., Pancrustacean phylogeny: hexapods are terrestrial
crustaceans and maxillopods are not monophyletic, Proc. R. Soc. Biol. Sci. B, 272, 395, 2005.
10. Babbitt, C.C. and Patel, N.H., Relationships within the Pancrustacea: examining the influence of
additional Malacostracan 18S and 28S rDNA, in Crustacea and Arthropod Relationships, Koenemann,
S. and Jenner, R.A., Eds., CRC Press, Boca Raton, 2005, 275.
11. Giribet, G. et al., The position of crustaceans within Arthropoda: evidence from nine molecular loci
and morphology, in Crustacea and Arthropod Relationships, Koenemann, S. and Jenner, R.A., Eds.,
CRC Press, Boca Raton, 2005, 307.
12. Cook, C.E., Yue, Q.Y. and Akam, M., Mitochondrial genomes suggest that hexapods and crustaceans
are mutually paraphyletic, Proc. R. Soc. Biol. Sci. B, 272, 1295, 2005.
13. Carapelli, A. et al., Relationships between hexapods and crustaceans based on four mitochondrial
genes, in Crustacea and Arthropod Relationships, Koenemann, S. and Jenner, R.A., Eds., CRC Press,
Boca Raton, 2005, 295.
14. Willmann, R., Phylogenetic relationships and evolution of insects, in Assembling the Tree of Life,
Cracraft, J. and Donoghue, M.J., Eds., Oxford University Press, New York, 2004, 330.
15. Whiting, M.F., Phylogeny of the holometabolous insects: the most successful group of terrestrial
organisms, in Assembling the Tree of Life, Cracraft, J. and Donoghue, M.J., Eds., Oxford University
Press, New York, 2004, 345.
16. Terry, M.D. and Whiting, M.F., Mantophasmatodea and phylogeny of the lower neopterous insects,
Cladistics, 21, 240, 2005.
17. Mayhew, P.J., Shifts in hexapod diversification and what Haldane could have said, Proc. R. Soc. Lond.
B, 269, 969, 2002.
18. Pellmyr, O. and Krenn, H.W., Origin of a complex key innovation in an obligate insect-plant
mutualism, Proc. Natl. Acad. Sci. USA, 99, 5498, 2002.
19. Anderson, R.S., Weevils and plants: phylogenetic versus ecological mediation of evolution of host plant
association in Curculioninae (Coleoptera: Curculionidae), Mem. Entomol. Soc. Can., 165, 197, 1993.
20. Farrell, B.D., ‘Inordinate fondness’ explained: why are there so many beetles? Science, 281, 555, 1998.
21. Marvaldi, A.E. et al., Molecular and morphological phylogenetics of weevils (Coleoptera, Curculionoidea):
do niche shifts accompany diversification? Syst. Biol., 51, 761, 2002.
22. Cassis, G. and Gross, G.F., Hemiptera: Heteroptera (Pentatomomorpha), CSIRO Publishing,
Melbourne, 2002.
23. Shcherbakov, D.E. and Popov, Y.A., Superorder Cimicidea Laicharting, 1781 Order Hemiptera Linné,
1758. The bugs, cicadas, plantlice, scale insects, etc., in History of Insects, Rasnitsyn, A.P. and Quicke,
D.L.J., Eds., Kluwer Academic Publishers, Dordrecht, 2002, 143.
24. Niklas, K.J., Tiffney, B.H., and Knoll, A.H., Patterns in vascular plant diversification, Nature, 303,
614, 1983.
25. Davies, T.J. et al., Darwin’s abominable mystery: insights from a supertree of the angiosperms, Proc.
Natl. Acad. Sci. USA, 101, 1904, 2004.
26. Erwin, T.L., Tropical forests: their richness in Coleoptera and other arthropod species, Coleopterists’
Bulletin, 36, 74, 1982.
27. Ødegaard, F., How many species of arthropods? Erwin’s estimate revised, Biol. J. Linn. Soc., 71, 583, 2000.
28. Stork, N.E., Insect diversity: facts, fiction and speculation, Biol. J. Linn. Soc., 35, 321, 1988.
29. Novotny, V. et al., Low host specificity of herbivorous insects in a tropical forest, Nature, 416, 841,
2002.
30. Novotny, V. and Basset, Y., Review: host specificity of insect herbivores in tropical forests, Proc. R.
Soc. Biol. Sci. B, 272, 1083, 2005.
31. Ødegaard, F. et al., The magnitude of local host specificity for phytophagous insects and its implica-
tions for estimates of global species richness, Conserv. Biol., 14, 1182, 2000.
32. Ødegaard, F., Diserud, O.H., and Ostbye, K., The importance of plant relatedness for host utilization
among phytophagous insects, Ecol. Lett., 8, 612, 2005.
33. Hodkinson, I.D. and Casson, D., A lesser predilection for bugs — Hemiptera (Insecta) diversity in
tropical rainforests, Biol. J. Linn. Soc., 43, 101, 1991.
34. Alroy, J., How many named species are valid? Proc. Natl. Acad. Sci. USA, 99, 3706, 2002.
35. Solow, A.R., Mound, L.A., and Gaston, K.J., Estimating the rate of synonymy, Syst. Biol., 44, 93, 1995.
36. Tautz, D. et al., A plea for DNA taxonomy, Trends Ecol. Evol., 18, 70, 2003.
37. Nixon, K.C. and Wheeler, Q.D., An amplification of the phylogenetic species concept, Cladistics, 6,
211, 1990.
38. Agapow, P.M. et al., The impact of species concept on biodiversity studies, Q. Rev. Biol., 79, 161, 2004.
39. Wilson, E.O., The encyclopedia of life, Trends Ecol. Evol., 18, 77, 2003.
40. Thomas, J.A. et al., Comparative losses of British butterflies, birds, and plants and the global extinction
crisis, Science, 303, 1879, 2004.
41. May, R.M., Lawton, J.H., and Stork, N.E., Assessing extinction rates, in Extinction Rates, Lawton,
J.H. and May, R.M., Eds., Oxford University Press, New York, 1995, 1.
42. Godfray, H.C.J., Challenges for taxonomy — the discipline will have to reinvent itself if it is to survive
and flourish, Nature, 417, 17, 2002.
43. Wheeler, Q.D., Taxonomic triage and the poverty of phylogeny, Philos. Trans. R. Soc. Lond. B, 359,
571, 2004.
44. Reid, C.A.M., Spilopyrinae Chapuis: a new subfamily in the Chrysomelidae and its systematic
placement, Invertebr. Taxon., 14, 837, 2000.
45. Gressitt, J.L. and Kimoto, S., Chrysomelidae (Coleopt.) of China and Korea, Part 1, Pacific Insects
Monographs, 1A, 1, 1961.
46. Schuh, R.T., Pretarsal structure in the Miridae (Hemiptera) with a cladistic analysis of relationships
within the family, Am. Mus. Novit., 2585, 1, 1976.
47. Carvalho, J.C.M., On the major classification of the Miridae (Hemiptera): with keys to subfamilies
and tribes and a catalogue of the world genera, An. Acad. Bras. Cienc., 24, 31, 1952.
48. Suarez, A.V. and Tsutsui, N.D., The value of museum collections for research and society, Bioscience,
54, 66, 2004.
49. Monteith, G.B., Revision of the Australian flat bugs of the subfamily Mezirinae (Insecta: Hemiptera:
Aradidae), Mem. Queensl. Mus., 41, 1, 1997.
50. Schuh, R.T., Plant Bug Inventory Systematic Catalog, http://research.amnh.org/pbi/catalog/, 2005.
51. Henry, T.J. and Wheeler, A.G., Family Miridae Hahn, 1833 (=Capsidae Burmeister, 1835), in Catalog
of the Heteroptera, or the True Bugs, of Canada and the Continental United States, Henry, T.J. and
Froeschner, R.C., Eds., Brill, New York, 1988, 251.
52. Cassis, G., Schwartz, M.D., and Moulds, T., Systematics and new taxa of the Vannius complex
(Hemiptera: Miridae: Cylapinae) from the Australian region, Mem. Queensl. Mus., 49, 123, 2003.
53. Cassis, G., A reclassification and phylogeny of the Termatophylini (Heteroptera: Miridae: Deraeo-
corinae), with a taxonomic revision of the Australian species, and a review of the tribal classification
of the Deraeocorinae, Proc. Entomol. Soc. Wash., 97, 258, 1995.
54. Sanchez, J.A., Gillespie, D.R., and McGregor, R.R., Plant preference in relation to life history traits
in the zoophytophagous predator Dicyphus hesperus, Entomol. Exp. Appl., 112, 7, 2004.
55. Wheeler, Q.D. and Wheeler, A.G., Mycophagous Miridae? Associations of Cylapinae (Heteroptera)
with pyrenomycete fungi (Euascomycetes: Xylariaceae), J. N. Y. Entomol. Soc., 102, 114, 1994.
56. Wheeler, A.G., Biology of the Plant Bugs (Hemiptera: Miridae), Cornell University Press, Ithaca, NY,
2001.
57. Schuh, R.T. and Slater, J.A., True Bugs of the World (Hemiptera: Heteroptera): Classification and
Natural History, Cornell University Press, Ithaca, NY, 1995.
58. Schuh, R.T., Plant Bugs of the World (Insecta: Heteroptera: Miridae), New York Entomological
Society, New York, 1995.
59. Linnaeus, C., Systema Naturae per Regna tria Naturae, secundum Classes, Ordines, Genera, Species, cum
Characteribus, Differentiis, Synonymis, Locis. Editio decima, reformata, Laurentii Salvii, Holmiae, 1758.
60. Knight, H.H., A new key to species of Reuteroscopus Kirk. with descriptions of new species
(Hemiptera, Miridae), Iowa State J. Sci., 40, 101, 1965.
61. Kelton, L.A., Revision of the genus Reuteroscopus Kirkaldy 1905 with descriptions of eleven new
species (Hemiptera: Miridae), Can. Entomol., 96, 1421, 1964.
62. Major, R.E. et al., The effect of habitat configuration on arboreal insects in fragmented woodlands of
south-eastern Australia, Biol. Conserv., 113, 35, 2003.
63. Myers, N. et al., Biodiversity hotspots for conservation priorities, Nature, 403, 853, 2000.
64. Pennisi, E., Modernizing the tree of life, Science, 300, 1692, 2003.
65. Brusca, R.C. and Brusca, G.J., Invertebrates, 2nd ed., Sinauer Associates, Sunderland, MA, 2002.
66. Margulis, L. and Schwartz, K.V., Five Kingdoms: An Illustrated Guide to the Phyla of Life on Earth,
3rd ed., W.H. Freeman and Company, New York, 1998.
67. DSMZ, Bacterial Nomenclature Up-to-Date, Deutsche Sammlung von Mikroorganismen und Zellkulturen
GmbH, Braunschweig, Germany, 2005.
68. Minelli, A., Biological Systematics: The State of the Art, Chapman and Hall, London, 1993.
69. Thayer, M.K. and Newton, A.F., What is a Staphylinid? http://www.fieldmuseum.org/peet_staph/
whatisstaph.html, 2003.
70. Wagner, D.L., Moths, in Encyclopedia of Biodiversity, Levin, S., Ed., Academic Press, San Diego,
2001, 249.
71. Thompson, F.C., Biosystematic Database of World Diptera, http://www.sel.barc.usda.gov.diptera/
biosystem.htm, 2000.
72. Goulet, H. and Huber, J.T., Hymenoptera: An Identification Guide to Families, Research Branch
Agriculture Canada, Ottawa, 1993.
73. Agosti, D. and Johnson, N.F., Antbase, http://www.antbase.org, version, 2005.
74. Dietrich, C.H., Phylogeny of the leafhopper subfamily Evacanthinae with a review of Neotropical
species and notes on related groups (Hemiptera: Membracoidea: Cicadellidae), Syst. Entomol., 29,
455, 2004.
75. Gaston, K.J., The magnitude of global insect species richness, Conserv. Biol., 5, 283, 1991.
76. Hammond, P.M., Species inventory, in Global Biodiversity. Status of the Earth’s Living Resources,
Groombridge, B., Ed., Chapman and Hall, London, 1992, 17.
77. Dolphin, K. and Quicke, D.L.J., Estimating the global species richness of an incompletely described
taxon: an example using parasitoid wasps (Hymenoptera: Braconidae), Biol. J. Linn. Soc., 73, 279,
2001.
M. Geerts
Swalmen, The Netherlands
A. F. Konings
Cichlid Press, El Paso, Texas, USA
K. R. McKaye
Appalachian Laboratory, University of Maryland System,
Frostburg, Maryland, USA
ABSTRACT
Cichlids are one of the most species rich families of vertebrates, with conservative estimates citing
more than 2,000 extant species. Although native to tropical areas of the world, with the exception
of Australia, some 70–80% of cichlids are found in Africa, with the greatest diversity found in the
Great Lakes (lakes Victoria, Tanganyika and Malawi). Their highly integrated pharyngeal jaw
apparatus permits cichlids to transport and process food, thus enabling the oral jaws to develop
specialisations for acquiring a variety of food items. This distinct feature has allowed cichlids to achieve
great trophic diversity, which in turn has led to great species diversity. The high species diversity of
this vertebrate family is not accompanied by an appropriately high genetic diversity. The combination
of rapid radiation of the group and relatively low genetic diversity has confounded attempts to
diagnose species and discern phylogenetic relationships. Behavioural traits appear to be important
characters for diagnosing many cichlid species.
14.1 INTRODUCTION
Cichlids (Cichlidae) are a species rich group of fish from the lowland tropics1 and are indigenous
primarily to the fresh waters of Africa, South America and Central America, with one species
extending its range to the Rio Grande in southern North America. In addition, cichlids are
found in Madagascar, the Levant and India and have also been introduced into nearly all tropical
and subtropical regions of the world, either through escapes from aquaculture or ornamental fish
operations or intentionally to provide sport fishing opportunities or to control exotic plants2. They
have established breeding populations in warm waters of industrial effluents in temperate areas3
and have been introduced into some marine environments4. Numerous investigators5–14 have focused
on cichlid fishes for their ecological, evolutionary and behavioural research.
Without doubt, the cichlids’ explosive speciation, unique feeding specialisations, diverse mating
systems and great importance as a protein source in tropical countries have been factors stimulating
research interest in this group15–18. In fact, Greenwood19 referred to the cichlid species flocks as
“evolutionary microcosms repeating on a small and appreciable scale the patterns and mechanisms
of vertebrate evolution”. Many of these research efforts, however, have been slowed and results
often confused because of the uncertain systematic status of some of the cichlids being studied20,21.
The conservative bauplan of cichlids1 and relatively low genetic divergence are coupled with a great
morphological diversity that makes it difficult to diagnose species using morphological criteria
alone (Figure 14.1). Systematic confusion exists within Cichlidae and also within and between its
higher taxonomic ranks such as suborders. Such relationships are currently being debated. The
reasons why cichlids have managed to speciate so successfully, often within a restricted geographic
range such as the Great Lakes of Africa, have also been under investigation14,15,22.
FIGURE 14.1 (A colour version of this figure follows page 240) Cichlid fishes. Cichlids have a conservative
bauplan, and specialised attributes, such as hypertrophied lips, are the result of parallel evolution, thus
making species and higher level diagnoses difficult. (a) Amphilophus sp. ‘fatlip’ in Lake Xiloa, Nicaragua;
(b) Placidochromis milomo at Nkhomo Reef, Lake Malawi, Malawi; (c) Lobochilotes labiatus at Nkondwe
Island, Lake Tanganyika, Tanzania. (Photos reproduced with permission from A.F. Konings.)
presumably not limited to the other ‘labroid’ lineages, sparids, anabantids-nandids, haemulids,
percids, moronids, and kyphosids”. Westneat and Alfaro27 reported maximum RAG2 DNA sequence
divergence between wrasses and outgroups ranging as high as 23% between parrotfishes and some
of the cichlids they examined. Nevertheless, they supported the inclusion of the parrotfish as a
subgroup of Labridae; thus, it is doubtful if these families belong to the same suborder.
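A figure such as the 23% divergence quoted above is usually a pairwise distance between aligned sequences. The sketch below shows the simplest such measure, the uncorrected proportion of differing sites (p-distance). It is an illustration only: the source does not say whether the reported values were uncorrected or model-corrected, and the two short sequences are made up, not RAG2 data.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip gaps and ambiguous bases so they are not counted as differences.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Hypothetical 20-site alignment (illustrative values, not real data).
taxon_1 = "ACGTACGTACGTACGTACGT"
taxon_2 = "ACGTTCGTACGAACGTACCT"

# Three of the 20 sites differ, so the p-distance is 0.15.
print(f"{p_distance(taxon_1, taxon_2):.0%}")
```

Because multiple substitutions at the same site are not counted, the p-distance tends to underestimate the true divergence between distantly related taxa.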
Boulenger30 first speculated that the cichlids form a natural group within the perciform Acanthopterygians.
Stiassny1 recognised Cichlidae as a monophyletic group based on five apomorphic characters:
• Loss of a structural association between parts A2 and Aω of the adductor mandibulae
muscle and the attachment of a large ventral section of A2 onto the posterior border of
the ascending process of the anguloarticular
• An extensive cartilaginous cap on the front margin of the second epibranchial bones
• An expanded head of the fourth epibranchial bones
• Presence of characteristically shaped and distributed microbranchiospines on the gill
arches
• The subdivision of the transversus dorsalis anterior muscle into three distinct parts as
described by Liem and Greenwood26
The monophyly of Cichlidae is further supported by the morphology of otoliths and configu-
ration of the digestive tracts. Gaemers31, based on the structural configuration of the sagitta, also
hypothesised that the cichlids are monophyletic. The sagitta is usually the largest otolith in most
teleosts, including cichlids. The sagitta of cichlids is strong and thick, with a more or less oval, short
elliptical to rounded pentagonal shape31. If the pseudocolliculum in the sagitta of cichlids is a
synapomorphic character, it supports other evidence of monophyly of the family26,32. Finally, three
structural attributes of a cichlid’s digestive tract support cichlid monophyly: the stomach’s extendible
blind pouch; the left hand exit to the anterior intestine; and the position of the first intestinal
loop on the left side33.
• Short anterior arm of epibranchial, which is a reduction that occurs in several lineages,
including the Etroplinae and Astatotilapia, and which is reversed in groups such as
Cichlasomatinae
• Interdigitating suture connecting the vomerine shaft and the parasphenoid bar (requires
independent development in Ptychochromis and reversal in Biotoecus, Dicrossus and
Nannacara)
• Presence of an anterior palatoethmoid ligament, which occurs in all Neotropical cichlids
and Heterochromis, but in no other Old World cichlids
Retroculus is regarded as the earliest diverging lineage of Neotropical cichlids and the sister group
of Cichla-Crenicichla, which is placed in a new subfamily, Cichlinae34. Kullander34 recognises the
following Neotropical subfamilies: Astronotinae (Astronotus, Chaetobranchus), Geophaginae (for
example, Geophagus, Apistogramma) and Cichlasomatinae (Cichlasomines, Heroines, Acaronia).
Mitochondrial DNA supports the monophyly of Cichlidae, but differs in its interpretation of
intrafamilial relationships39. In particular, Heterochromis is sister to the remaining African cichlids,
and Retroculus is the most basal taxon of the Neotropical cichlids. Farias et al.40 believed that
Astronotus, Cichla and Retroculus formed three independent basal lineages, even though one of
their trees favoured Astronotus as the sister group to Cichla. Therefore, Astronotinae, sensu
Kullander, is not accepted by molecular biologists, whereas Chaetobranchus and Chaetobranchopsis
are grouped together as Chaetobranchines. Farias et al.40 remove Crenicichla and Teleocichla from
Cichlinae (sensu Kullander) and transfer that group to Geophaginae. In the molecular phylogeny,
Acaronia is no longer regarded as the sister group of the Heroini/Cichlasomatini group40.
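The terms used above (monophyly, sister group, basal lineage) all describe properties of a rooted tree. As a minimal sketch, a tree can be written as nested tuples and a group tested for monophyly by asking whether some subtree contains exactly the members of that group. The topology below is a toy arrangement: only the placement of Heterochromis and Retroculus follows the text, the other tip names are placeholders, and treating the African and Neotropical groups as sisters is an assumption made for illustration.

```python
def leaves(node):
    """Return the set of tip names below a node (a tip is a plain string)."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def is_monophyletic(tree, group):
    """True if some subtree of the rooted tree contains exactly the tips in group."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Toy rooted tree; tip names ending in _1 or _2 are hypothetical placeholders.
toy_tree = (
    ("Heterochromis", ("African_1", "African_2")),
    ("Retroculus", ("Neotropical_1", "Neotropical_2")),
)

print(is_monophyletic(toy_tree, {"Heterochromis", "African_1", "African_2"}))  # True
print(is_monophyletic(toy_tree, {"Heterochromis", "Retroculus"}))              # False
```

In this toy tree the two earliest diverging genera do not form a clade together, which is why grouping them would produce a non-monophyletic taxon.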
By far the greatest radiation of cichlids is found in the Great Lakes of Africa, with Lake Malawi
alone having as many as 850 species49. The phylogenetic diversity ranges from the single invasion
of Lake Malawi, which resulted in the endemism of all but a few species, to multiple invasions in
Lake Tanganyika, which resulted in the presence of 12 different tribes50. The rich fauna of these
lakes is primarily attributable to the explosive adaptive radiation and speciation51–53 of the haplochromines
sensu lato (see Schliewen and Stiassny35). The driving mechanism for these speciation
events is unknown. The two most widely proposed mechanisms are allopatric speciation5,15,54–57 and
intrinsic isolating mechanisms14,22,58–64. Furthermore, biologists generally agree that female mate
choice can act as a strong driving force in runaway speciation where the average female preference for
a specific male trait differs between two allopatric populations65–69. Thus, behavioural traits are
important tools for the diagnosis of these African cichlids, primarily because behavioural traits
played a very important role in and facilitated the rapid radiation of these fishes, which may not
always be accompanied by discernible morphological changes70.
Certainly, there are fewer species of cichlids in South and Central America than in the Old
World. Greenwood19 noted that the focus on African cichlids has distracted attention from the South
American Cichlidae, a fact which he regarded as ‘sad but understandable’. Nevertheless, the
neotropical cichlid fauna is varied and diverse1, comprising some 50 genera and 450 species34, with
new species still being discovered71. In the last decade, on average more than 20 species of cichlids
have been formally described each year. The existence of literally hundreds of species awaiting
description is likely to continue this trend.
food items. It is this distinct feature of cichlids that has permitted them to dominate colonisation
of many habitats and adopt feeding opportunities available in lacustrine environments72,76.
Phenotypic plasticity is defined as the possible environmental modification of the phenotype88.
The degree of phenotypic plasticity that cichlids exhibit is congruent with the ability of cichlids
to take advantage of many habitats and feeding opportunities. Despite the morphological plasticity
observed in other fishes89–91, the morphology of African cichlids initially was thought to be rigid92;
however, much morphological variability has been observed93–98. In particular, the observed
phenotypic plasticity in some instances has involved the pharyngeal jaw apparatus. For example,
the phenotypic diversity of the pharyngeal jaws of the New World Herichthys minckleyi99,100 has
been documented extensively. Furthermore, differences in bone structure of the lower pharyngeal
jaw of Astatoreochromis alluaudi resulted from different diets101–103. In addition to these observations,
Meyer94 and Wimberger97,98 have experimentally demonstrated the effects of diet on
plasticity of head morphology in New World cichlids, but Meyer94 hypothesised that the plasticity
of mouth brooding Old World cichlids may not be as pronounced due to constraints on jaw
morphology for mouth brooding. Stauffer and Van Snik Gray104 effected significant differences
in head morphology of Lake Malawi rock-dwelling cichlids by experimentally manipulating diets.
The magnitude of plasticity in these mouth brooding Lake Malawi cichlids, however, was not
as pronounced as that observed for the New World substrate brooder Herichthys cyanoguttatum.
They did postulate, however, that phenotypic plasticity might have contributed to the extensive
trophic radiation and subsequent explosive speciation observed in Old World haplochromine
cichlids. Lewontin105 postulated that colonising species, possessing a high degree of phenotypic
plasticity, may have a selective advantage because of their ability to exploit additional resources
in differing environments.
eggs in her mouth. Genetic studies of paternity for several Lake Malawi species show that females
mate with as many as six males per brood126.
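One simple way to put a lower bound on the number of males contributing to a brood is allele counting at a single locus: any allele in the offspring that the mother does not carry must be paternal, and a diploid father can contribute at most two different alleles. The sketch below illustrates that reasoning only. It is not presented as the analysis used in the cited study, and the allele sizes are invented.

```python
import math


def min_sires(mother, offspring):
    """Lower bound on the number of fathers of a brood at one diploid locus.

    mother: tuple of the mother's two alleles.
    offspring: list of (allele, allele) genotypes, one per offspring.
    """
    if not offspring:
        return 0
    maternal = set(mother)
    paternal = set()
    for genotype in offspring:
        # Alleles the mother does not carry must have come from a father.
        paternal |= set(genotype) - maternal
    # Each diploid father supplies at most two distinct alleles; a brood
    # always has at least one father even if no paternal allele is detectable.
    return max(1, math.ceil(len(paternal) / 2))


# Hypothetical microsatellite allele sizes (illustrative only).
mother = (100, 104)
brood = [(100, 108), (104, 112), (100, 116), (104, 120), (100, 108)]

# Four distinct paternal alleles imply at least two fathers.
print(min_sires(mother, brood))
```

The estimate is conservative: paternal alleles that happen to match a maternal allele go undetected, so combining several loci generally gives a tighter bound.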
The process of female choice is complex. Female Otopharynx cf. argyrosoma selectively choose
males that occupy bowers in the centre of the arena121, while female Mchenga conophoros and
Lethrinops cf. parvidens choose males that build the largest bowers122,125,127. If the rapid radiation
of the Lake Malawi cichlid flock was accelerated by sexual selection, the observed differences
in behaviour might be the best way to distinguish between sibling species that differ little in
morphology71.
REFERENCES
1. Stiassny, M.L.J., Phylogenetic intrarelationships of the family Cichlidae: an overview, in Cichlid
Fishes: Behaviour, Ecology and Evolution, Keenleyside, M.H.A., Ed., Chapman and Hall, New York,
1991, chap. 1.
2. Courtenay Jr., W.R. and Stauffer Jr., J.R., Biology, Distribution and Management of Exotic Fishes,
Johns Hopkins University Press, Baltimore, 1, 1984.
3. Stauffer Jr., J.R., Boltz, S.E., and Boltz, J.M., Thermal tolerance of the blue tilapia, Oreochromis
aureus, in the Susquehanna River, N. Am. J. Fisheries Management, 8(3), 329, 1988.
4. Lobel, P.S., Invasion by the Mozambique tilapia (Sarotherodon mossambicus; Pisces: Cichlidae) of
a Pacific atoll marine ecosystem, Micronesia, 16, 349, 1980.
5. Fryer, G., The trophic interrelationships and ecology of some littoral communities of Lake Nyasa
with special reference to the fishes, and a discussion of the evolution of a group of rock-frequenting
Cichlidae, Proc. Zool. Soc. Lond., 132, 153, 1959.
6. Jackson, P.B.N. et al., Report on the Survey of Northern Lake Nyasa 1954–1955, Government Printing
Office, Zomba, 1, 1963.
7. Holzberg, S., A field and laboratory study of the behaviour and ecology of Pseudotropheus zebra
(Boulenger) an endemic cichlid of Lake Malawi (Pisces: Cichlidae), Z. Zool. Syst. Evol., 16, 171, 1978.
8. Marsh, A.C., Ribbink, A.J., and Marsh, B.A., Sibling species complexes in sympatric populations of
Petrotilapia Trewavas (Cichlidae, Lake Malawi), Zool. J. Linn. Soc., 71, 253, 1981.
9. Kocher, T.D. et al., Similar morphologies of cichlid fish in lakes Tanganyika and Malawi are due to
convergence, Mol. Phylogenet. Evol., 2, 158, 1993.
10. Sato, T. and Gashagaza, M.M., Shell-brooding cichlid fishes of Lake Tanganyika: their habitats and
mating system, in Fish Communities in Lake Tanganyika, Kawanabe, H., Hori, M., and Nagoshi, M.,
Eds., Kyoto University Press, Kyoto, 1997, 219.
11. Kawanabe, H.M., Hori, M., and Nagoshi, M., Fish Communities in Lake Tanganyika, Kyoto University
Press, Kyoto, 1997.
12. Lande, R., Seehausen, O., and van Alphen, J.J.M., Mechanisms of rapid sympatric speciation by sex
reversal and sexual selection in cichlid fish, Genetica, 112, 435, 2001.
13. Genner, M.J. et al., How does the taxonomic status of allopatric populations influence species richness
within African cichlid fish assemblages? J. Biogeography, 31, 93, 2004.
14. Genner, M.J. and Turner, G.F., The mbuna cichlids of Lake Malawi: a model for rapid speciation and
adaptive radiation, Fish and Fisheries, 6, 522, 2005.
15. Fryer, G. and Iles, T.D., The cichlid fishes of the Great Lakes of Africa, Oliver and Boyd, London, 1972.
16. Keenleyside, M.H.A., Ed., Cichlid Fishes: Behaviour, Ecology and Evolution, Chapman and Hall,
London, 1991.
17. Keenleyside, M.H.A., Parental care, in Cichlid Fishes: Behaviour, Ecology and Evolution, Keenleyside,
M.H.A., Ed., Chapman and Hall, New York, 1991, 191.
18. Kornfield, I., African cichlid fishes: model systems for evolutionary biology, Ann. Rev. Ecol. Syst.,
31, 163, 2000.
19. Greenwood, P.H., The cichlid fishes of Lake Victoria, East Africa: the biology and evolution of a
species flock, Bull. Br. Mus. Nat. Hist. (Zool.), 6, 1, 1974.
20. Stauffer Jr., J.R. and McKaye, K.R., The naming of cichlids, J. Aquaricult. Aquatic Sci., 9, 1, 2001.
21. Turner, G.F. et al., Identification and biology of Diplotaxodon, Rhamphochromis, and Pallidochromis,
in The Cichlid Diversity of Lake Malawi/Nyasa/Niassa: Identification, Distribution, and Taxonomy,
Snoeks, J., Ed., Cichlid Press, El Paso, 2004, 198.
22. McKaye, K.R. and Stauffer Jr., J.R., Description of a gold cichlid (Teleostei: Cichlidae) from Lake
Malawi, Africa, Copeia, 1986, 870, 1986.
23. Kaufman, L. and Liem, K.F., Fishes of the suborder Labroidei (Pisces: Perciformes): phylogeny,
ecology and evolutionary significance, Breviora, 472, 1, 1982.
24. Stiassny, M.L.J. and Jensen, J.S., Labroid intrarelationships revisited: morphological complexity, key
innovations, and the study of comparative diversity, Bull. Mus. Comp. Zool., 151, 269, 1987.
25. Müller, J., Beiträge zur Kenntniss der natürlichen Familien der Fische, Arch. Naturgesch., 9, 381,
1844. (published 1846)
26. Liem, K.F. and Greenwood, P.H., A functional approach to the phylogeny of the pharyngognath
teleosts, Amer. Zool., 21, 83, 1981.
27. Westneat, M.W. and Alfaro, M.E., Phylogenetic relationships and evolutionary history of the reef fish
family Labridae, Mol. Phylogen. Evol., 36, 370, 2005.
28. Streelman, T.J. and Karl, S.A., Reconstructing labroid evolution with single-copy nuclear DNA, Proc.
R. Soc. Lond. B, 264, 1011, 1997.
29. Sparks, J. and Smith, W., Phylogeny and biogeography of cichlid fishes (Teleostei: Perciformes:
Cichlidae), Cladistics, 20, 501, 2004.
30. Boulenger, G.A., A revision of the African and Syrian fishes of the family Cichlidae, Part 1, Proc.
Zool. Soc. Lond., 132, 1898.
31. Gaemers, P.A.M., Taxonomic position of the Cichlidae (Pisces, Perciformes) as demonstrated by the
morphology of their otoliths, Neth. J. Zool., 34, 566, 1984.
32. Stiassny, M.L.J., The phyletic status of the family Cichlidae (Pisces, Perciformes): a comparative
anatomical investigation, Neth. J. Zool., 31, 275, 1981.
33. Zihler, F., Gross morphology and configuration of digestive tracts of Cichlidae (Teleostei, Perciformes):
phylogenetic and functional significance, Neth. J. Zool., 32, 544, 1982.
34. Kullander, S.O., A phylogeny and classification of the South American Cichlidae (Teleostei: Perciformes),
in Phylogeny and Classification of Neotropical Fishes, Malabarba, L.R. et al., Eds., EDIPUCRS, Porto
Alegre, 1998, 461.
35. Schliewen, U. and Stiassny, M.L.J., Etia nguti, a new genus and species of cichlid fish from the River
Mamfue, Upper Cross River Basin in Cameroon, West Central Africa, Ichthyol. Explor., Freshwaters,
14, 61, 2003.
36. Oliver, M.K., Systematics of African cichlid fishes: determination of the most primitive taxon, and
studies of the Haplochromines of Lake Malawi (Teleostei:Cichlidae), Ph.D. thesis, Yale University,
New Haven, CT, 1984.
37. Stiassny, M.L.J., Cichlid familial intrarelationships and the placement of the neotropical genus Cichla
(Perciformes, Labroidei), J. Nat. Hist., 21, 1311, 1987.
38. Cichocki, F.P., Cladistic history of the cichlid fishes and reproductive strategies of the American genera
Acarichthys, Biotodoma, and Geophagus, Ph.D. thesis, University of Michigan, Ann Arbor, 1976.
39. Farias, I.P. et al., Mitochondrial DNA phylogeny of the family Cichlidae: monophyly and fast
molecular evolution of the neotropical assemblage, J. Mol. Evol., 48, 703, 1999.
40. Farias, I.P., Orti, G., and Meyer, A., Total evidence molecules, morphology, and the phylogenetics of
cichlid fishes, J. Exp. Zool., 288, 76, 2000.
41. Chakrabarty, P., Cichlid biogeography: comment and review, Fish and Fisheries, 5, 97, 2004.
42. Sparks, J. and Smith, W., Freshwater fishes, dispersal ability, and non-evidence: ‘Gondwana life rafts’
to the rescue, Syst. Biol., 54, 158, 2005.
43. Lundberg, J., African-South American freshwater clades and continental drift; problems with a paradigm,
in Biological Relationships between Africa and South America, Goldblatt, P., Ed., Yale University
Press, New Haven, 1993, 156.
44. Van Couvering, J.A.H., Fossil cichlid fish of Africa, Spec. Pap. Palaeont. Lond., 29, 1982.
45. Vences, M. et al., Reconciling fossils and molecules: Cenozoic divergence of cichlid fishes and the
biogeography of Madagascar, J. Biog., 28, 1091, 2001.
46. Murray, A., The oldest fossil cichlids (Teleostei: Perciformes): indication of a 45 million year old
species flock, Proc. R. Soc. Lond. B, 268, 679, 2001.
47. Arratia, G.A. et al., Late Cretaceous-Paleocene percomorphs (Teleostei) from India — early radiation
of Perciformes, in Recent Advances in the Origin and Early Radiation of Vertebrates, Arratia, G.,
Wilson, M.V.H., and Cloutier, R., Eds., Pfeil Verlag, Munich, 2004, 635.
48. Greenwood, P.H., Speciation, in Cichlid Fishes: Behaviour, Ecology and Evolution, Keenleyside,
M.H.A., Ed., Chapman and Hall, New York, 1991, 86.
49. Konings, A., Malawi Cichlids in Their Natural Habitat, 3rd edition, Cichlid Press, El Paso, TX, 2001.
50. Poll, M., Deuxième serie de Cichlidae nouveaux recueillis par la mission hydrobiologique belge au
lac Tanganyika (1946–1947), Bull. Inst. Sci. Nat. Belg., 25, 1, 1949.
51. Regan, C.T., The cichlid fishes of Lake Nyassa, Zool. Soc. Lond., 21, 675, 1922.
52. Trewavas, E., A synopsis of the cichlid fishes of Lake Nyassa, Ann. Mag. Nat. Hist., 16, 65, 1935.
53. Greenwood, P.H., Towards a phyletic classification of the ‘genus’ Haplochromis (Pisces: Cichlidae)
and related taxa, Part I, Bull. Br. Mus. Nat. Hist. (Zool.), 35, 265, 1979.
54. Greenwood, P.H., African cichlids and evolutionary theories, in Evolution of Fish Species Flocks,
Echelle, A.A. and Kornfield, I., Eds., University of Maine at Orono Press, Orono, 1984, 141.
55. Greenwood, P.H., The haplochromine fishes of the east African lakes, Kraus International Publications,
Munchen, and British Museum Natural History, London, 1981.
56. Marlier, G., Observations sur la biologie littorale du lac Tanganyika, Rev. Zool. Bot. Afr., 59, 16, 1959.
57. Matthes, H., Note sur la reproduction des poissons au lac Tanganyika, C.S.A. 3rd Symp. Hydrobiol.
Major Lakes, 107, 1962.
58. Bush, G.L., Modes of animal speciation, Ann. Rev. Ecol. Syst., 6, 339, 1975.
59. Kosswig, C., Selective mating as a factor for speciation in cichlid fish of east African lakes, Nature,
159, 604, 1947.
60. Kosswig, C., Ways of speciation in fishes, Copeia, 1963, 238, 1963.
61. McKaye, K.R., Explosive speciation: the cichlids of Lake Malawi, Discovery, 13, 24, 1978.
62. McKaye, K.R., Seasonality in habitat selection by the gold color morph of Cichlasoma citrinellum
and its relevance to sympatric speciation in the family Cichlidae, Environ. Biol. Fish., 5, 75, 1980.
63. Takahashi, K. et al., A novel family of short interspersed repetitive elements (SINEs) from cichlids:
the patterns of insertion of SINEs at orthologous loci support the proposed monophyly of four major
groups of cichlid fishes in Lake Tanganyika, Mol. Biol. Evol., 15, 391, 1998.
64. Takahashi, K. et al., Phylogenetic relationships and ancient incomplete lineage sorting among cichlid
fishes in Lake Tanganyika as revealed by analysis of the insertion of retroposons, Mol. Biol. Evol.,
18, 2057, 2001.
65. Barlow, G.W., Mating systems among cichlid fishes, in Cichlid Fishes: Behaviour, Ecology, and
Evolution, Keenleyside, M.H.A., Ed., Chapman and Hall, New York, 1991, 173.
66. Barlow, G.W., The Cichlid Fishes: Nature’s Grand Experiment in Evolution, Perseus Publishing,
Cambridge, 2000.
67. Clutton-Brock, T.H., The Evolution of Parental Care, Princeton University Press, Princeton, 1991.
68. Johnsgard, P.A., Arena Birds: Sexual Selection and Behavior, Smithsonian Institution Press, Wash-
ington, 1994.
69. Höglund, J. and Alatalo, R.V., Leks, Princeton University Press, Princeton, 1995.
70. Stauffer Jr., J.R., McKaye, K.R., and Konings, A.F., Behaviour: an important diagnostic tool for Lake
Malawi cichlids, Fish and Fisheries, 3, 213, 2002.
71. Stauffer Jr., J.R. and McKaye, K.R., Descriptions of three new species of cichlid fishes (Teleostei:
Cichlidae) from Lake Xiloá, Nicaragua, Cuadernos Invest., UCA, 12, 1, 2002.
72. Liem, K.F., Evolutionary strategies and morphological innovations: cichlid pharyngeal jaws, Syst.
Zool., 22, 425, 1974.
73. Echelle, A.A. and Kornfield, I., Eds., Evolution of Fish Species Flocks, University of Maine Press,
Orono, 1984.
74. Meyer, A. et al., Monophyletic origin of Lake Victoria cichlid fishes suggested by mitochondrial
DNA sequences, Nature, 347, 550, 1990.
75. Zardoya, R. et al., Evolutionary conservation of microsatellite flanking regions and their use in
resolving the phylogeny of cichlid fishes (Pisces: Perciformes), Proc. R. Soc. Lond. B, 263, 1589,
1996.
76. Liem, K.F. and Osse, J.W.M., Biological versatility, evolution, and food resource exploitation in
African cichlid fishes, Am. Zool., 15, 427, 1975.
77. Liem, K.F., Adaptive significance of intra- and interspecific differences in the feeding repertoires of
cichlid fishes, Am. Zool., 20, 295, 1980.
78. Axelrod, H.R. and Burgess, W., African Cichlids of Lakes Malawi and Tanganyika, 8th ed., T.F.H.
Publications, Neptune City, 1979.
79. Stauffer Jr., J.R. et al., Evolutionarily significant units among cichlid fishes: the role of behavioral
studies, Am. Fisheries Soc. Symp., 17, 227, 1995.
80. Kocher, T.D., Adaptive evolution and explosive speciation: the cichlid fish model, Nature Rev. Genetics,
5, 288, 2004.
81. Fryer, G., Biological notes on some cichlid fishes of Lake Nyasa, Rev. Zool. Bot. Afr., 54, 1, 1956.
82. McKaye, K.R. and Kocher, T.D., Head ramming behaviour by three paedophagous cichlids in Lake
Malawi, Africa, Anim. Behav., 31, 206, 1983.
83. Stauffer Jr., J.R. and McKaye, K.R., Description of a paedophagous deep-water cichlid (Teleostei:
Cichlidae) from Lake Malawi, Africa, Proc. Biol. Soc. Wash., 99, 29, 1986.
84. Ribbink, A.J., The feeding behavior of a cleaner, scale and skin and fin eater of Lake Malawi
(Docimodus evelynae, Pisces: Cichlidae), Neth. J. Zool., 34, 182, 1984.
85. Stauffer Jr., J.R., Description of a facultative cleanerfish (Teleostei: Cichlidae) from Lake Malawi,
Africa, Copeia, 1991, 141, 1991.
86. Stauffer Jr., J.R., Posner, I., and Seltzer, R., Hunting strategies of a Lake Malawi cichlid with reverse
countershading, Copeia, 1999, 1108, 1999.
87. Liem, K.F., Evolutionary strategies and morphological innovations: cichlid pharyngeal jaws, Syst.
Zool., 22, 425, 1973.
88. Bradshaw, A.D., Evolutionary significance of phenotypic plasticity in plants, Adv. Genet., 13, 115, 1965.
89. Barlow, G.W., Causes and significance of morphological variation in fishes, Syst. Zool., 10, 105, 1961.
90. Behnke, R.J., Systematics of salmonid fishes of recently glaciated lakes, J. Fisheries Res. Board
Canada, 29, 639, 1972.
91. Chernoff, B., Character variation among populations and the analysis of biogeography, Am. Zool., 22,
425, 1982.
92. van Oijen, M.J.P., Ecological differentiation among the piscivorous haplochromine cichlids of Lake
Victoria (East Africa), Neth. J. Zool., 32, 336, 1982.
93. Hoogerhoud, R.J.C., A taxonomic reconsideration of the Haplochromine genera Gaurochromis
Greenwood, 1980 and Labrochromis Regan, 1920 (Pisces, Cichlidae), Neth. J. Zool., 34, 539, 1984.
94. Meyer, A., Phenotypic plasticity and heterochrony in Cichlasoma managuense (Pisces: Cichlidae)
and their implications for speciation in cichlid fishes, Evolution, 41, 1357, 1987.
95. Meyer, A., Ecological and evolutionary consequences of the trophic polymorphism in Cichlasoma
citrinellum (Pisces: Cichlidae), Biol. J. Linn. Soc., 39, 279, 1990.
96. Witte, F., Barel, C.D.N., and Hoogerhoud, R.J.C., Phenotypic plasticity of anatomical structures and
its ecomorphological significance, Neth. J. Zool., 40, 278, 1990.
97. Wimberger, P.H., Plasticity of jaw and skull morphology in the neotropical cichlids Geophagus
brasiliensis and G. steindachneri, Evolution, 45, 1545, 1991.
98. Wimberger, P.H., Plasticity of body shape: the effects of diet, development, family and age in two
species of Geophagus (Pisces: Cichlidae), Biol. J. Linn. Soc., 45, 197, 1992.
99. Kornfield, I. and Taylor, J.N., A new species of polymorphic fish, Cichlasoma minckleyi from
Cuatro Cienegas, Mexico, (Teleostei: Cichlidae), Proc. Biol. Soc. Wash., 96, 253, 1983.
100. Trapani, J., Morphological variability in the Cuatro Cienegas cichlid, Cichlasoma minckleyi, J. Fish
Biol., 62, 276, 2003.
101. Huysseune, A., Sire, J.-Y., and Meunier, F.J., Comparative study of lower pharyngeal jaw structure
in two phenotypes of Astatoreochromis alluaudi (Teleostei: Cichlidae), J. Morph., 221, 25, 1994.
102. Smits, J.D., Witte, F., and Povel, D., Differences between inter- and intraspecific architectonic
adaptations to pharyngeal mollusk crushing in cichlid fishes, Biol. J. Linn. Soc., 59, 367, 1996.
103. Smits, J.D., Witte, F., and van Veen, F.G., Functional changes in the anatomy of the pharyngeal jaw
apparatus of Astatoreochromis alluaudi (Pisces, Cichlidae), and their effects on adjacent structures,
Biol. J. Linn. Soc., 59, 389, 1996.
104. Stauffer Jr., J.R. and van Snik Gray, E., Phenotypic plasticity: its role in trophic radiation and explosive
speciation in cichlids (Teleostei: Cichlidae), Anim. Biol., 54, 137, 2004.
105. Lewontin, R.C., Selection for colonizing ability, in The Genetics of Colonizing Species, Baker, H.G.
and Stebbins, G.L., Eds., Academic Press, New York, 1965, 77.
106. Baerends, G.P. and Baerends-van Roon, J.M., An introduction to the study of the ethology of cichlid
fishes, Behaviour, 1, 1, 1950.
107. Kuwamura, T., Parental care and mating systems of cichlid fishes in Lake Tanganyika: a preliminary
field survey, J. Ethol., 4, 146, 1986.
108. Lowe-McConnell, R.H., The breeding behaviour of Tilapia species (Pisces; Cichlidae) in natural
waters: observations on T. karomo Poll, and T. variabilis Boulenger, Behaviour, 9, 140, 1956.
109. Trewavas, E., Tilapiine Fishes of the genera Sarotherodon, Oreochromis, and Danakilia, British
Museum (Natural History), London, 1983.
110. van den Berghe, E. and McKaye, K.R., Reproductive success of maternal and biparental care in a
Nicaraguan cichlid fish, Parachromis dovii, J. Aquaricult. Aquatic Sci., 9, 49, 2001.
111. Barlow, G.W., A test of appeasement and arousal hypotheses of courtship behaviour in a cichlid fish,
Etroplus maculatus, Z. Tierpsychol., 27, 779, 1970.
112. Ward, J.A. and Wyman, R.L., Ethology and ecology of cichlid fishes of the genus Etroplus in Sri
Lanka: preliminary findings, Environ. Biol. Fishes, 2, 137, 1977.
113. Konings, A., Tanganyika cichlids, Verduijn Cichlids, Zevenhuizen, 1988.
114. Apfelbach, R., Vergleichende quantitative Untersuchungen des Fortpflanzungs- und Brutpflegeverhalten
von mono- und dimorpher Tilapien (Pisces, Cichlidae), Z. Tierpsychol., 26, 692, 1969.
115. Loiselle, P.V., The Cichlid Aquarium, Tetra-Press, Melle, 1985.
116. Timms, A.M. and Keenleyside, M.H.A., The reproductive behaviour of Aequidens paraguayensis,
Z. Tierpsych., 39, 8, 1975.
117. Linke, H. and Staeck, W., African Cichlids I: Cichlids of West Africa, Tetra Press, Melle, 1981.
118. Myrberg Jr., A.A., A descriptive analysis of the behaviour of the African cichlid fish, Pelmatochromis
guentheri (Sauvage), Anim. Behav., 13, 312, 1965.
119. McKaye, K.R., Ecology and breeding behavior of a cichlid fish, Cyrtocara eucinostomus, on a large
lek in Lake Malawi, Africa, Environ. Biol. Fish., 8, 81, 1983.
120. McKaye, K.R., Behavioural aspects of cichlid reproductive strategies: patterns of territoriality and
brood defense in Central American substratum spawners versus African mouth brooders, in Fish
Reproduction: Strategies and Tactics, Wooton, R.J. and Potts, G.W., Eds., Academic Press, New York,
1984, 245.
121. McKaye, K.R., Sexual selection and the evolution of the cichlid fishes of Lake Malawi, Africa, in
Cichlid Fishes: Behavior, Ecology and Evolution, Keenleyside, M.H.A., Ed., Chapman and Hall,
London, 1991, 241.
122. McKaye, K.R., Louda, S.M., and Stauffer Jr., J.R., Bower size and male reproductive success in a
cichlid fish lek, Am. Naturalist, 135, 597, 1990.
123. Stauffer Jr., J.R., LoVullo, T.J., and McKaye, K.R., Three new sand-dwelling cichlids from Lake
Malawi, Africa, with a discussion of the status of the genus Copadichromis (Teleostei: Cichlidae),
Copeia, 1993, 1017, 1993.
124. McKaye, K.R. and Stauffer Jr., J.R., Seasonality, depth, and habitat distribution of breeding males,
Oreochromis spp., ‘Chambo’, in Lake Malawi National Park, J. Fish Biol., 33, 825, 1988.
125. Kellogg, K.A., Stauffer Jr., J.R., and McKaye, K.R., Characteristics that influence male reproductive
success on a cichlid lek, Behav. Ecol. Sociobiol., 47, 164, 2000.
126. Kellogg, K.A. et al., Microsatellite variation demonstrates multiple paternity in lekking cichlid fishes
from Lake Malawi, Africa, Proc. R. Soc. Lond. B, 260, 79, 1995.
127. Stauffer Jr., J.R., Kellogg, K.A., and McKaye, K.R., Experimental evidence of female choice in Lake
Malawi cichlids, Copeia, 2005, 656, 2005.
128. Witte, F. et al., The destruction of an endemic species flock: quantitative data on the decline of the
haplochromine cichlids of Lake Victoria, Environ. Biol. Fishes, 34, 1, 1992.
129. Goldschmidt, T. and Witte, F., Explosive speciation and adaptive radiation of haplochromine cichlids
from Lake Victoria: an illustration of the scientific value of a lost species flock, Mitt. Internat. Verein.
Limnol., 23, 101, 1992.
130. McKaye, K.R. et al., African tilapia in Lake Nicaragua: ecosystem in transition, BioScience, 45, 406,
1995.
131. Kornfield, I., McKaye, K.R., and Kocher, T.D., Evidence for the immigration hypothesis in the endemic
cichlid fauna of Lake Tanganyika, Isozyme Bull., 15, 76, 1985.
132. Kocher, T.D. et al., A genetic linkage map of a cichlid fish, the tilapia (Oreochromis niloticus),
Genetics, 148, 1225, 1998.
133. Arnegard, M.E. et al., Population structure and colour variation of the cichlid fish Labeotropheus
fuelleborni Ahl along a recently formed archipelago of rocky habitat patches in southern Lake Malawi,
Proc. R. Soc. Lond. B, 266, 1, 1999.
134. Markert, J.A. et al., Biogeography and population genetics of the Lake Malawi cichlid Melanochromis
auratus: habitat transience, philopatry, and speciation, Mol. Ecol., 8, 1013, 1999.
135. Albertson, R.C. et al., Phylogeny of a rapidly evolving clade: the cichlid fishes of Lake Malawi,
East Africa, Proc. Natl. Acad. Sci., 96, 5107, 1999.
15 Fungal Diversity
A. M. C. Tang, B. D. Shenoy and K. D. Hyde
Centre for Research in Fungal Diversity, Department of Ecology and
Biodiversity, The University of Hong Kong, P. R. China
ABSTRACT
Fungi are ubiquitous, beneficial, harmful and mutualistic. They perform some of the most important
basic roles in life and have some of the greatest potential for biotechnology, yet as few as 7% of
the total estimated fungal species on Earth are described. There are thought to be 1.5 million fungal
species, but there are huge problems in obtaining estimates of fungal diversity. These include:
species recognition, as there are usually few useful characters to distinguish species; separate
taxonomic binomials for asexual and sexual states of the same species; lack of specialist mycologists;
and the unfortunate downward trend for mycological biodiversity funding. Estimates of fungal
diversity are discussed for selected plant groups, insects and species rich genera with more than
1,000 species. We conclude that it is important to identify habitats and substrates where a greater
fungal diversity may occur in order to offer maximum protection to fungal resources. The large
variation in estimates of fungal diversity means that considerable data are required before we can
produce a reliable estimate of the number of species of fungi.
by Phytophthora infestans, which was responsible for the epidemics that contributed to the 1845
Irish famine17. In contrast to fungal diseases in plants, fungal diseases in animals are more specific,
and probably every species of animal has some specific fungal parasites; Beauveria and Metarhizium
are examples of well studied insect parasites. Fungi have been tested and formulated for application
in insect pest management systems as important biocontrol agents18. For example, Cordyceps is a
pathogenic fungus which produces fruiting bodies from caterpillars after killing the host. It is well
known for its ability to produce numerous bioactive metabolites, including cyclosporins and
efrapeptins, which have been used in medicine for their immunosuppressive capabilities19,20.
Fungi form one of the six kingdoms of life (Animalia, Bacteria, Chromista, Fungi, Plantae and
Protozoa; but see Hodkinson and Parnell, Chapter 1 for a more recent phylogenetic interpretation
of the major groups of life)21. Perhaps surprisingly, ribosomal DNA and protein coding gene sequences
indicate that fungi are more closely related to animals than to plants, although this conclusion is still
controversial22–24. Fungi are subdivided into four phyla, namely Ascomycota, Basidiomycota,
Chytridiomycota and Zygomycota25. Molecular data suggest that some groups that were once
considered fungi, such as the plasmodial and cellular slime moulds (Myxomycota and
Dictyosteliomycota) and the water moulds (Oomycota), should now be excluded from the kingdom23. The
following is a summary of the characteristic features of the four phyla within the Fungi.
15.1.2 ASCOMYCOTA
The phylum Ascomycota, or sac fungi (Greek (hereafter Gr.) ascus, sac; mycetos, fungi) is a group
in which the sexual process involves the production of eight (or multiples of eight) haploid
ascospores through the meiosis of a diploid nucleus in an ascus (Figure 15.1a–d)26. It is the largest
phylum of fungi, with approximately 45,000 described species, and it represents 65% of the known
species of fungi27. It includes many notable members, such as Claviceps purpurea, the natural
hallucinogen producer which grows on the grains of grasses; Penicillium notatum and P. chrysogenum,
used in the production of the antibiotic penicillin; Saccharomyces cerevisiae, responsible for
fermentation in the production of alcohol; and Neurospora, the model organism for genetic studies; as
well as morels (Morchella esculenta) and truffles, such as Tuber melanosporum, used in Western
cuisines. As well as reproducing sexually, Ascomycetes also sporulate asexually, with the formation
of conidia (spores) on conidiophores (Hyphomycetes) or inside conidiomata (Coelomycetes). The
sexual stage of an ascomycete is termed the teleomorph, and the asexual stage is the anamorph.
Three subphyla are designated in Ascomycota according to our recent classification28. They are the
subphylum Pezizomycotina (Euascomycetes), Saccharomycotina (Hemiascomycetes) and
Taphrinomycotina (Archiascomycetes).
Pezizomycotina (Euascomycetes) comprise more than 90% of Ascomycota and include about
98% of lichenised fungi. Members of Pezizomycotina are divided into two groups: ascohymenial and
ascolocular. Ascohymenial relates to an ascocarp that forms after nuclear pairing. The ascohymenial
type ascomata may be closed (cleistothecium) (Figure 15.1a), provided with an opening (perithecium)
(Figure 15.1b), or open as a cup (apothecium) (Figure 15.1c). Ascolocular relates to a mode of ascocarp
growth in which a perithecium (flask-shaped fruiting body) develops within a cushioning hollow of
cells (stroma) in a depression of the hymenium (locule). Notable examples of ascohymenial
ascomycetes include Aspergillus and Penicillium (class Eurotiomycetes), Ascobolus and Morchella
(Pezizomycetes), and Claviceps, Cordyceps and Neurospora (Sordariomycetes). Examples of
ascolocular ascomycetes include Pleospora, Pyrenophora and Venturia (Dothideomycetes).
Saccharomycotina (Hemiascomycetes) are a small subphylum, but one of tremendous importance. They
are characterised by the absence of an ascoma, so that the asci are naked. They include the ‘true yeast’
Saccharomyces cerevisiae (Figure 15.1d), which is important in the processing of bread and alcoholic
beverages. Saccharomyces was also the first eukaryote to have its genome completely sequenced29,30.
Taphrinomycotina (Archiascomycetes) are a diverse group including saprobic and parasitic
forms that have been grouped primarily on the basis of rDNA sequence analysis31,32. In some
FIGURE 15.1 Fungi representing different fungal classes. (a) Cleistothecium fruiting body (ascomycete);
(b) Perithecium fruiting body (ascomycete); (c) Apothecium fruiting body (ascomycete); (d) Saccharomyces
cerevisiae (ascomycete); (e) Amanita species (basidiomycete); (f) Zoosporangium stage of Chytriomyces
species (chytridiomycete); (g) Zoospore stage of Chytriomyces species (chytridiomycete); (h) Sporangium
(asexual) stage of Rhizopus stolonifer (zygomycete); (i) Zygosporangium (sexual) stage of Rhizopus stolonifer
(zygomycete). (Drawings by Alvin M.C. Tang.)
cases the result was unexpected: the fission yeast Schizosaccharomyces pombe, for example, was separated from
Saccharomyces cerevisiae (budding yeast, subphylum Saccharomycotina) based on molecular data33.
Pneumocystis carinii, an extracellular biotroph of alveoli in infected lungs of mammals, was once
thought to be a protozoan, but is now classified in this subphylum based on DNA sequences34.
15.1.3 BASIDIOMYCOTA
The phylum Basidiomycota (Gr. basidion, small base or pedestal; mykes, fungi) is a group in which
the sexual process involves the production of haploid basidiospores borne on a basidium in which
a diploid nucleus undergoes meiosis (Figure 15.1e; Figure 15.2)26. It contains approximately 30,000
described species, which accounts for about 35% of the known species of fungi27. Molecular
analyses have defined three lineages within Basidiomycota: two without fruiting bodies,
Urediniomycetes and Ustilaginomycetes, and one with fruiting bodies, Hymenomycetes35.
Urediniomycetes contain approximately 7,400 (34%) of the described species of Basidiomycota27,36.
They include the rusts (Uredinales), which are plant pathogens, and the yeasts (Sporidiales), which
are saprotrophs and pathogens of plants, animals and fungi. Ustilaginomycetes contain approximately
1,300 (6%) of the described species of Basidiomycota27,37. They include the smut fungi (Ustilaginales),
which form black, dusty masses of teliospores in diseased plants. Smuts are notorious because
they cause millions of dollars of damage to important food crops and ornamentals. Hymenomycetes
consist of about 13,500 (60%) of the described species of Basidiomycota27,35. There are
two basal evolutionary branches (sister groups to the rest of Hymenomycetes), one leading to
Tremellales (jelly fungi) and the other to Dacrymycetales, Auriculariales (tree ear fungi), Agaricales
(mushrooms) and Aphyllophorales (shelf fungi). Agaricales contains many names that have been
known since humans started to collect mushrooms. Amanita (Figure 15.1e) is a genus commonly
associated with mushroom poisoning, whilst Agaricus bitorquis (button mushroom), Flammulina
velutipes (enokitake), Lentinula edodes (shiitake) and Pleurotus ostreatus (oyster mushroom) are
widely known as food.
FIGURE 15.2 (A colour version of this figure follows page 240) Basidiomycete fungi. (a) Dacryopinax
spathularia, (b) Pseudocoprinus disseminatus. (Photos reproduced with permission from Edward Grand,
Chiang Mai, Thailand.)
15.1.4 CHYTRIDIOMYCOTA
The Chytridiomycota is the only fungal phylum that produces motile zoospores and requires water
for dispersal (Figure 15.1f–g). They have been classified in the Protista and Protoctista38,39, but based
on SSU rDNA sequence they were recently included in the kingdom Fungi40. Chytridiomycota are
probably a very ancient group, with extant forms possibly having changed little since the early periods
of eukaryotic evolution41. They are commonly found in lakes, streams, ponds, roadside ditches and
coastal marine environments, as well as in soil. As members of terrestrial and aquatic microbial
communities, chytrids play an important ecological role in decomposition of chitin, cellulose, keratin
and hemicellulose42. Notable plant pathogenic species include Synchytrium endobioticum (potato black
wart), Physoderma maydis (corn brown spot) and Urophlyctis alfalfae (alfalfa crown wart). As a
representative of lower fungi, Allomyces macrogynus has become a popular subject in molecular biology
for comparing primitive genetic features with those of higher fungi43–45.
15.1.5 ZYGOMYCOTA
The phylum Zygomycota (Gr. zygos, yoke of marriage; mykes, fungi) is principally characterised
by the presence of nonseptate (coenocytic) mycelium and the production of dark, thick-walled,
ornamented sexual spores, called zygospores (Figure 15.1h–i). Members of the phylum are generally
morphologically and ecologically diverse, with some species not possessing zygospores46. Zygomycota
has been subdivided into two classes: Trichomycetes and Zygomycetes47,48. Trichomycetes are
symbionts in the gut of arthropods, while Zygomycetes are saprobic, haustorial or nonhaustorial
parasites of animals, plants or fungi48. The position of Trichomycetes within the kingdom Fungi
remains controversial, since members of Amoebidiales in this class have recently grouped with
protists by molecular data, and phylogenetic relationships of Eccrinales and Asellariales are still
unresolved49,50. Members of Mucorales in the class Zygomycetes, on the other hand, are the
best known group. Rhizopus and Mucor are the most notable, because they cause fruit rots
and bread moulds. Clinically, members of this order, such as Cokeromyces, Cunninghamella,
Rhizomucor, Rhizopus and Saksenaea, are potential human or animal pathogens, especially in
immunosuppressed patients during organ transplants and patients with immunological disorders48,51.
Glomales is the most important order of Zygomycetes ecologically, as its members form
mycorrhizae with the majority of plants worldwide.
cohesion and evolutionary principles. In practice, fungal species concepts are combinations
of the morphological, biological and phylogenetic species concepts55–58. There are often very few useful
characters to separate fungal species, and therefore visual observation may not be definitive.
Cultures are usually required to ascertain species status using morphological and biological species
concepts, yet only 16% of the approximately 100,000 known fungal species are held in culture
collections worldwide59. This slows progress in species determination.
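The figures quoted here and in the abstract can be combined into a rough back-of-envelope check. The sketch below is illustrative only: the function and constant names are hypothetical, and the inputs are the rounded values given in the text (about 100,000 known species, 16% of them in culture, and an estimated total of 1.5 million species).

```python
# Back-of-envelope check using the rounded figures quoted in the chapter.
# All names are illustrative; the inputs are approximations taken from the text.

KNOWN_SPECIES = 100_000        # approximately 100,000 known fungal species
ESTIMATED_SPECIES = 1_500_000  # the 1.5 million estimate quoted in the abstract
CULTURED_PERCENT = 16          # share of known species held in culture collections


def percent_of(part: float, whole: float) -> float:
    """Return part expressed as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


def cultured_species(known: int, cultured_percent: int) -> int:
    """Number of known species held in culture (integer arithmetic)."""
    return known * cultured_percent // 100


if __name__ == "__main__":
    in_culture = cultured_species(KNOWN_SPECIES, CULTURED_PERCENT)
    described_share = percent_of(KNOWN_SPECIES, ESTIMATED_SPECIES)
    cultured_share = percent_of(in_culture, ESTIMATED_SPECIES)
    print(f"Species in culture: about {in_culture:,}")
    print(f"Described share of the estimated total: {described_share:.1f}%")
    print(f"Cultured share of the estimated total: {cultured_share:.1f}%")
```

On these rounded inputs, about 16,000 species are in culture, which is roughly 1% of the estimated total, and the described share of about 7% agrees with the figure given in the abstract.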
Further complications are caused by the presence of asexual forms (anamorphs) of Ascomycota
and Basidiomycota. A single fungal genome may have separate taxonomic binomials for the
teleomorph (sexual stage), multiple anamorphs (asexual stages), the chlamydospores, the sclerotia
and even the vegetative mycelium60. Some Ascomycota may have two or more anamorphs, whilst
others seem to be strictly asexual, as sexual reproduction has not clearly been observed60. Article
59 of the International Code of Botanical Nomenclature was specifically written for fungi to allow
dual or multiple names for a single fungal genome. Whether a single name or multiple names
should be used has been controversial; however, multiple naming remains an unavoidable complication
for many fungal groups, owing to the rarity with which multiple morphs are encountered and technical
difficulties in linking stages of life cycles60,61.
TABLE 15.1
Published Fungal Diversity Estimates Since 1990
Author Year Estimated Species (Millions)
With such large variations in estimates of fungal diversity, it is important that work is carried
out in selected research topics to provide data for poorly understood diversity questions. To achieve
that, we need more detailed information on particular sites, fungus to plant and fungus to insect
ratios and sustained increased attention on the fungi associated with particular plants or groups of
insects, especially in the tropics79. Several researchers have been carrying out such research, and
their data provide more insights into fungal diversity, and some of these data are discussed below.
Selected groups and hosts have also been proposed for rapid biodiversity assessments, such as
macromycetes, Xylariaceae, lichen-forming fungi, endophytes, palms, bamboos, Pandanus species,
freshwater fungi and pathogens99.
We select grasses, as they are the world’s most important agricultural plants and because fungal
diversity on grasses has rarely been reviewed101. Poaceae (Gramineae) includes cereals, sugar cane,
forage grasses for farm animals, ornamental grasses and bamboos and comprises about 10,000 species
in 650 genera101,102 (see Hilu, Chapter 11; Hodkinson et al., Chapter 17). Grasses (especially cereal
grasses) provide favourable substrates for fungal colonisation, as evident from fungal records on
various grasses available from the fungal databases of the Systematic Botany and Mycology
Laboratory (SBML), Agricultural Research Service, United States Department of Agriculture
(http://nt.ars-grin.gov/fungaldatabases/index.cfm)103. As many as 14 well studied grass genera
support more than 800 fungal taxa, and these are listed in Table 15.2.
TABLE 15.2
Fungal Records on Selected Grass Genera
Genus No. of Fungal Records Genus No. of Fungal Records
There are at least 30,000 records of fungi in the SBML database. These records cover terrestrial
habitats104–109, freshwater habitats107, estuarine regions110–112 and marine regions113,114. However, they
are not exhaustive. Previous studies were biased towards economically important plants, and
estimates used small sample sizes and a limited number of sampling sites. Increased sampling,
longer study periods, new habitats and unexplored sites tend to yield new data. For example,
more than 300 different saprobic fungi have been recorded, across seven studies, on one well studied
cosmopolitan reed, Phragmites australis107,109,112,115,116. An intensive survey of smut fungi
(microscopic Basidiomycota) also yielded surprising results: more than 350 species of smut fungi were
isolated from nine grasses (including Bothriochloa, Capillipedium, Chrysopogon, Cynodon,
Dichanthium, Hyparrhenia, Muhlenbergia, Saccharum and Sorghum) in New Zealand117–135. If we
accept 10% of the fungi associated with P. australis above to be host specific, a high ratio of
30:1 results. This fungi to host ratio will increase when endophytes, mycorrhizal fungi, pathogens,
rusts and smut fungi are included in the estimation, as these groups are more host specific136.
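The 30:1 ratio follows directly from the two numbers in the text: roughly 300 recorded fungi and an assumed 10% host specificity. The sketch below reproduces that arithmetic and shows how sensitive the ratio is to the assumed percentage. The function name is hypothetical, and the 5% and 20% values are not from the chapter; they are included only to illustrate the sensitivity.

```python
# Fungi to host ratio for a single host species, as reasoned in the text:
# (fungi recorded on the host) x (share assumed to be host specific).
# The function name and the 5% / 20% scenarios are illustrative assumptions.


def fungi_to_host_ratio(recorded_fungi: int, host_specific_percent: float) -> float:
    """Host specific fungi per host species."""
    if recorded_fungi < 0:
        raise ValueError("recorded_fungi must be non-negative")
    if not 0 <= host_specific_percent <= 100:
        raise ValueError("host_specific_percent must lie between 0 and 100")
    return recorded_fungi * host_specific_percent / 100


if __name__ == "__main__":
    phragmites_fungi = 300  # "more than 300" saprobic fungi on Phragmites australis
    for percent in (5, 10, 20):  # only 10% is the value used in the chapter
        ratio = fungi_to_host_ratio(phragmites_fungi, percent)
        print(f"{percent:>2}% host specific -> ratio of about {ratio:.0f}:1")
```

Because the ratio scales linearly with the assumed host specificity, halving or doubling that assumption halves or doubles the ratio, which is one reason published estimates vary so widely.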
Our knowledge of bamboo fungi is still at the cataloguing stage, and new species are often
described after field sampling137–143. Eriksson and Yue144 provided an annotated checklist of
bambusicolous fungi, and Hyde et al.143 reviewed the bambusicolous fungi recorded worldwide.
There have been some taxonomic or ecological studies on bamboo fungi, but these are limited to
France145, Hong Kong137–139,143,146 and Japan147–155. In June 2005, there were in total 3,222 records
of fungi associated with 11 of the most common bamboo genera (Arundinaria, Bambusa, Chusquea,
Dendrocalamus, Gigantochloa, Guadua, Phyllostachys, Pleioblastus, Pseudosasa, Schizostachyum
and Sinobambusa) in the SBML database103. After correction (allowing about 30–40% for duplicated
names and multiple records of single species), there are at least 1,933 fungal species known for
bamboo. This figure is much higher than the 1,100 species reviewed in 2002 by Hyde and coworkers143,
and it will no doubt continue to increase as more field studies are conducted.
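The correction described above can be sketched as a simple percentage reduction of the raw record count. This is an assumption about how the correction was applied, not a documented procedure; the function name is hypothetical. Under that assumption, the 40% end of the range reproduces the figure of 1,933 species.

```python
# Sketch of the duplicate correction, assuming it is a straight percentage
# reduction of the raw record count (an assumption, not a documented method).


def corrected_species_range(
    records: int, min_duplicate_percent: int, max_duplicate_percent: int
) -> tuple[int, int]:
    """Return (lower, upper) species counts after removing duplicates.

    Integer arithmetic is used so that results are rounded down to whole species.
    """
    if not 0 <= min_duplicate_percent <= max_duplicate_percent <= 100:
        raise ValueError("percentages must satisfy 0 <= min <= max <= 100")
    lower = records * (100 - max_duplicate_percent) // 100
    upper = records * (100 - min_duplicate_percent) // 100
    return lower, upper


if __name__ == "__main__":
    bamboo_records = 3222  # SBML records for 11 bamboo genera, June 2005
    low, high = corrected_species_range(bamboo_records, 30, 40)
    print(f"Estimated bamboo fungi: {low:,} to {high:,} species")
```

The lower bound corresponds to the "at least 1,933" quoted in the text; the upper bound is simply what the 30% end of the stated range would imply.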
Fungal diversity on sedges has not been well studied in comparison with fungi reported on
grass hosts. The obvious reason may be that sedges have less economic value156,157. There are 9,585
records of fungi associated with Cyperaceae in the SBML database103. Most of the records were
contributed from studies of the genera Carex and Cyperus (Table 15.3), while the records on
other genera are below 800. This disparity between Poaceae and Cyperaceae may also be attributed
to the more diverse morphology and anatomy of the former family than the latter156.
TABLE 15.3
Fungal Records on Selected Sedge Genera
Genus No. of Fungal Records Genus No. of Fungal Records
Juncaceae (rushes) are the sister group of Cyperaceae and are a family of eight genera and
about 400 species. They are distributed mainly in temperate climates or the montane regions of the
tropics. Juncus is one of the dominant genera in estuarine marshes of the American East coast.
This genus has been studied fairly intensively in relation to fungi. One endemic species, Juncus
roemerianus (needle rush)158,159, was reported to harbour 117 fungal species160. If we assume that 10%
of these fungi are host specific, the fungi to host ratio will be 11:1, which is much higher than the
5.7 to 8.5 estimated as an average73. The 117 species (66 Ascomycota, one Basidiomycota and 50 anamorphic
taxa) include 48 novel species, 14 novel genera and one novel family160.
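The counts reported for Juncus roemerianus can be tallied and compared with the average range quoted above. The sketch below is illustrative only; the names are hypothetical and the 10% host specificity is the assumption stated in the text.

```python
# Tally of the fungi reported on Juncus roemerianus and comparison of the
# resulting fungi to host ratio with the quoted average range.
# Names are illustrative; the 10% host specificity is the text's assumption.

JUNCUS_FUNGI = {"Ascomycota": 66, "Basidiomycota": 1, "anamorphic taxa": 50}
NOVEL_SPECIES = 48
AVERAGE_RATIO_RANGE = (5.7, 8.5)  # estimated average range quoted in the text


def total_species(counts: dict[str, int]) -> int:
    """Sum the species counts over all groups."""
    return sum(counts.values())


def host_specific_ratio(total: int, host_specific_percent: float) -> float:
    """Host specific fungi per host species."""
    return total * host_specific_percent / 100


def exceeds_average(ratio: float, average_range: tuple[float, float]) -> bool:
    """True if the ratio lies above the upper end of the average range."""
    return ratio > average_range[1]


if __name__ == "__main__":
    total = total_species(JUNCUS_FUNGI)
    ratio = host_specific_ratio(total, 10)
    print(f"Total species: {total}")
    print(f"Ratio at 10% host specificity: {ratio:.1f}:1")
    print(f"Above quoted average range: {exceeds_average(ratio, AVERAGE_RATIO_RANGE)}")
    print(f"Novel species share: {100 * NOVEL_SPECIES / total:.0f}%")
```

The three groups sum to the 117 species reported, and 10% of 117 is 11.7, which the text gives as 11:1.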
Few studies have addressed fungal numbers on invertebrates. In fact, since the important paper of
Weir and Hammond175 there have been relatively few data on the biodiversity of invertebrate fungi.
If invertebrate fungi were host specific and occurred in most insects, this would have major
implications for fungal numbers. Weir and Hammond suggested that between 5 and 7% of beetle
species may act as hosts for Laboulbeniales (ascomycetous obligate ectoparasites of Arthropoda)
and speculated that at least 20,000 and possibly 50,000 species of Laboulbeniales await
description175. Trichomycete (symbiotic gut fungi) numbers were also shown to depend on
host diversity, and host specificity was shown to be a crucial factor in trichomycete diversity176.
Much work is still needed to address fungal numbers in this area.
15.5.1 COLLETOTRICHUM
The species rich genus Colletotrichum causes various plant diseases often known as anthracnose
and is worldwide in distribution177. Colletotrichum species cause major damage to crops in
tropical, subtropical and temperate regions. Cereal, vegetables, legumes, ornamentals and fruit
trees may be seriously affected by this pathogen178. Colletotrichum species are also commonly
isolated as endophytes, and latent and quiescent infections by these species on several hosts have
been reported16. Their ability to cause latent infection, that is, infection without visible symptoms,
makes them one of the most successful pathogens causing postharvest disease in a wide range
of crop species177.
Colletotrichum is the anamorphic stage of several species of Glomerella and has a taxonomic
history of about 200 years179. There are 17 acknowledged generic synonyms for Colletotrichum,
with two further names tentatively included, and about 900 species names have been assigned
to this genus177,180. The identification and characterisation of Colletotrichum species are mainly
based on morphological criteria, cultural criteria or a combination of both. It has become apparent
that the classification system presently used has limited scope, since some species names assigned
to collections and isolates lack the precision required by users. The number of morphological
characters derived from growth in culture is limited, and growth conditions have rarely been
standardised. Moreover, the inherent phenotypic plasticity of individual isolates creates confusion
in identification. There are group species or species complexes, such as C. dematium,
C. gloeosporioides and C. lindemuthianum, which are known to be represented by at least nine distinct
subtaxa177.
At least nine different Colletotrichum species (C. capsici, C. coccodes, C. crassipes, C. dematium,
C. destructivum, C. gloeosporioides, C. lindemuthianum, C. trifolii and C. truncatum) have been
reported on economically important legumes in tropical and temperate regions181. All of these
species are reported to infect at least two hosts, and C. capsici, C. gloeosporioides and C.
lindemuthianum are reported to have the widest host ranges amongst these nine. C. gloeosporioides
is a particularly large complex comprising taxa that cause diseases of a wide range of crops. The
taxa have been isolated as pathogens, endophytes and saprobes, and it is not clear whether these
different lifestyles are associated with specific lineages or have evolved many times. It is therefore
particularly important that we gain an understanding of the diversity of organisms within this
complex.
Under these circumstances the species name has limited practical significance to the plant
pathologist involved in disease management and quarantine and the breeder involved in resistance
breeding. The development of different systems for identification of species over time has largely
been the result of subtle changes in species concept involving different aspects of morphology
combined with ideas about host range and host–pathogen relationships for particular taxa. Despite
these amendments the current species concept used in Colletotrichum systematics is still very broad,
unreliable and unpredictable, being based on the combination of classical criteria such as conidial
shape and size, presence, absence and morphology of setae, presence of sclerotia and appressoria
and symptom expression on host. Moreover, the current classification system for Colletotrichum
in general is unsatisfactory because the constituent species are inadequately defined61. With further
research we may expect to uncover significant levels of synonymy but also discover new species
in complexes such as C. gloeosporioides.
15.5.2 PESTALOTIOPSIS
Pestalotiopsis species commonly cause diseases on a variety of plants and are commonly isolated
as endophytes or occur as saprobes182. The genus contains about 205 named species with many
named after their hosts in much the same way as Colletotrichum. The understanding of species
relationships within this weakly parasitic genus is complicated by the lack of morphological
characters to differentiate species, and in many cases host association has provided a convenient
means to separate species. Jeewon et al.182 used DNA data from a number of Pestalotiopsis
isolates to test whether isolates from the same host are phylogenetically related. They also
investigated the validity of naming species based on host association. Their results indicated that
there was a close phylogenetic relationship between isolates possessing similar morphological
characteristics, but isolates from the same host were not necessarily closely related. They advised
that, when describing new Pestalotiopsis species, morphological characteristics should be taken
into account rather than host association. They considered that the high number of Pestalotiopsis
species named in the literature was an overestimate, given that naming species based on host is
not valid.
15.5.3 MYCOSPHAERELLA
Species of Mycosphaerella (and their anamorphs) are commonly associated with leaf spots or stem
cankers183. As in the previous two genera, many species have been described based on host
association. However, unlike taxa in those genera, most of these taxa are not only host specific but
also highly tissue specific, to the degree that some cercosporoids will sporulate only on the upper
or only on the lower leaf surface. Since 1993, Crous and coworkers have described nearly 40 new species of
Mycosphaerella and associated anamorphs from Eucalyptus183 which appear to be highly specific
to this host. An exception to the rule is the Mycosphaerella tassiana complex, as well as other
species with Cladosporium anamorphs.
TABLE 15.4
Synopsis of Potential Applications of Molecular Biology
Areas of Study Application of Molecular Techniques
on rDNA sequence analysis. In a study of Colletotrichum from almond, avocado and strawberry,
Freeman et al.211 found that, although morphological criteria indicated that the Israeli isolates of
almond are unique, the population was grouped within the C. acutatum species according to molecular
analyses. Further studies of other species rich genera at the molecular level are clearly
necessary before we can draw conclusions about the effects of these genera on fungal numbers.
For example, palms have been shown to be hyperdiverse substrates for fungi84,98,214,227–229, and recent
studies in palm swamps in Thailand have yielded numerous new taxa230,231. Palms and other unstudied
substrates in other areas should be investigated to establish if new species discovery will continue
unabated. To advance our knowledge, we must prioritise funding for inventory and monographic
studies simultaneously with funding for molecular biology. This will provide an invaluable legacy of
data for conservation evaluation and biotechnological and pharmaceutical utilisation.
REFERENCES
1. Reid et al., Ed., Biodiversity Prospecting: Using Genetic Resources for Sustainable Development,
World Resources Institute, Washington, DC, 1993.
2. Rossman, A.Y., A strategy for an all-taxa inventory of fungal diversity, in Biodiversity and Terrestrial
Ecosystems, Monograph Series No. 14, Peng, C.I. and Chen, C.H., Eds., Institute of Botany, Academia
Sinica, Taipei, 1994, 169.
3. Hyde, K.D., Increasing the likelihood of novel compound discovery from filamentous fungi, in
Bio-Exploitation of Filamentous Fungi, Fungal Diversity Research Series, Pointing, S.B. and Hyde, K.D.,
Eds., Fungal Diversity Press, Hong Kong, 6, 77, 2001.
4. Chapela, I.H., Bioprospecting: myths, realities and potential impact on sustainable development, in
Mycology in Sustainable Development: Expanding Concepts, Vanishing Borders, Palm, M.E. and
Chapela, I.H., Eds., Parkway Publisher, Boone, NC, 1997, 238.
5. Concepcion, G.P., Lazaro, J.E., and Hyde, K.D., Screening for bioactive novel compounds, in
Bio-Exploitation of Filamentous Fungi, Fungal Diversity Research Series, Pointing, S.B. and Hyde,
K.D., Eds., 6, 93, 2001.
6. Strobel, G.A., Endophytic fungi: new sources for old and new pharmaceuticals, Pharm. News, 3, 7, 1996.
7. Madigan, M.T., Martinko, J.M., and Parker, J., Brock Biology of Microorganisms, 10th ed., Prentice
Hall and Pearson Education, Upper Saddle River, NJ, 2003.
8. Deacon, J.W., Fungal Biology, Blackwell, UK, 2005.
9. Swift, M.J., Heal, O.W., and Anderson, J.M., Decomposition in Terrestrial Ecosystems, Blackwell,
Oxford, UK, 1979.
10. Cotrufo, M.F., Miller, M., and Zeller, B., Litter decomposition, in Carbon and Nitrogen Cycling,
Schulze, E.D., Ed., Springer-Verlag, Heidelberg, 2000.
11. Risna, R.A. and Suhirman, Lignolytic enzyme production by Polyporaceae from Lombok, Indonesia,
Fung. Divers., 9, 123, 2002.
12. Urairuj, C., Khanongnuch, C., and Lumyong, S., Ligninolytic enzymes from tropical endophytic
Xylariaceae, Fung. Divers., 13, 209, 2003.
13. Lutzoni, F., Pagel, M., and Reeb, V., Major fungal lineages are derived from lichen symbiotic ancestors,
Nature, 411, 937, 2001.
14. Heckman, D.S. et al., Molecular evidence for the early colonization of land by fungi and plants,
Science, 293, 1129, 2001.
15. Rodrigues, K.F., Fungal endophytes of palms, in Endophytic Fungi in Grasses and Woody Plants,
Redlin, S.C. and Carris, L.M., Eds., APS Press, St. Paul, MN, 1996.
16. Bills, G.F., Isolation and analysis of endophytic fungal communities from woody plants, in Endophytic
Fungi in Grasses and Woody Plants, Redlin, S.C. and Carris, L.M., Eds., APS Press, St. Paul, MN, 1996, 31.
17. Agrios, G.N., Plant Pathology, 4th ed., Academic Press, San Diego, CA, 1997.
18. Faria, M. and Wraight, S.P., Biological control of Bemisia tabaci with fungi, Crop Protection, 20, 767, 2001.
19. Hodge, K.T., Krasnoff, S.B., and Humber, R.A., Tolypocladium inflatum is the anamorph of Cordyceps
subsessilis, Mycologia, 88, 715, 1996.
20. Bandani, A.R. et al., Production of efrapeptins by Tolypocladium species and evaluation of their
insecticidal and antimicrobial properties, Mycol. Res., 104, 537, 2000.
21. Cavalier-Smith, T., A revised six-kingdom system of life, Biol. Rev., 73, 203, 1998.
22. Baldauf, S.L. and Palmer, J.D., Animals and fungi are each other’s closest relatives: congruent evidence
from multiple proteins, Proc. Natl. Acad. Sci. USA, 90, 11558, 1993.
23. Wainright, P.O. et al., Monophyletic origins of the Metazoa: ‘an evolutionary link with fungi’, Science,
260, 340, 1993.
24. Wang, D.Y.C., Kumar, S., and Hedges, S.B., Divergence time estimates for the early history of animal
phyla and the origin of plants, animals and fungi, Proc. R. Soc. Lond. B, 266, 163, 1999.
25. Bruns, T.D. et al., Evolutionary relationships within the fungi: analyses of nuclear small subunit RNA
sequences, Mol. Phylogenet. Evol., 1, 231, 1992.
26. Carlile, M.J. and Watkinson, S.C., The Fungi, Academic Press, London, 1994.
27. Kirk, P.M. et al., Ainsworth and Bisby’s Dictionary of the Fungi, 9th ed., CAB International, Oxon,
UK, 2001.
28. Eriksson, O.E. et al., Eds., Outline of Ascomycota 2003, Myconet,
http://www.umu.se/myconet/M9.html, 2003.
29. Kim, J.M. et al., Transposable elements and genome organization: a comprehensive survey of ret-
rotransposons revealed by the complete Saccharomyces cerevisiae genome sequence, Genome Res.,
8, 464, 1998.
30. Sánchez, R. and Sali, A., Large-scale protein structure modeling of the Saccharomyces cerevisiae
genome, Proc. Natl. Acad. Sci. USA, 95, 13597, 1998.
31. Nishida, H. and Sugiyama, J., Phylogenetic relationships among Taphrina, Saitoella, and other higher
fungi, Mol. Biol. Evol., 10, 431, 1993.
32. Nishida, H. and Sugiyama, J., Archiascomycetes: detection of a major new linage within the
Ascomycota, Mycoscience, 35, 361, 1994.
33. Taylor, J.W. et al., Fungal model organisms: phylogenetics of Saccharomyces, Aspergillus, and
Neurospora, Syst. Biol., 42, 440, 1993.
34. Taylor, J.W., Swann, E., and Berbee, M.L., Molecular evolution of ascomycete fungi: phylogeny and
conflict, in First International Workshop on Ascomycete Systematics, Hawksworth, D.L., Ed., NATO
Advanced Science Institutes Series, Plenum Press, New York, 1994, 201.
35. Swann, E.C. and Taylor, J.W., Higher taxa of basidiomycetes: an 18S rRNA gene perspective,
Mycologia, 85, 923, 1993.
36. Swann, E.C., Frieder, E.M., and McLaughlin, D.J., Urediniomycetes, in The Mycota VII: Systematics
and Evolution Part B, McLaughlin, D.J., McLaughlin, E.G., and Lemke, P.A., Eds., Springer-Verlag,
Berlin, 2001, 37.
37. Bauer, R. et al., Ustilaginomycetes, in The Mycota VII Systematics and Evolution Part B, McLaughlin,
D.J., McLaughlin, E.G., and Lemke, P.A., Eds., Springer-Verlag, Berlin, 2001, 57.
38. Whittaker, R.H., New concepts of kingdoms of organisms, Science, 163, 150, 1969.
39. Margulis, L. et al., Handbook of Protoctista, Jones and Bartlett, Boston, 1990.
40. Bowman, B.H. et al., Molecular evolution of the fungi: relationship of the Basidiomycetes,
Ascomycetes, and Chytridiomycetes, Mol. Biol. Evol., 9, 285, 1992.
41. Barr, D.J.S., Chytridiomycota, in The Mycota VII Systematics and Evolution Part B, McLaughlin,
D.J., McLaughlin, E.G., and Lemke, P.A., Eds., Springer-Verlag, Berlin, 2001, 93.
42. Barr, D.J.S., Phylum Chytridiomycota, in Handbook of Protoctista, Margulis L. et al., Ed., Jones and
Bartlett, Sudbury, MA, 1990, 454.
43. Paquin, B. and Lang, F., The mitochondrial DNA of Allomyces macrogynus: the complete genomic
sequence from an ancestral fungus, J. Mol. Biol., 255, 688, 1996.
44. Ribichich, K.F. et al., Gene discovery and expression profile analysis through sequencing of expressed
sequence tags from different developmental stages of the chytridiomycete Blastocladiella emersonii,
Eukaryot. Cell, 4, 455, 2005.
45. Rocha, C.R.C. and Gomes, S.L., Characterization and submitochondrial localization of the alpha
subunit of the mitochondrial processing peptidase from the aquatic fungus Blastocladiella emersonii,
J. Bacteriol., 181, 4257, 1999.
46. Alexopoulos, C., Mims, C., and Blackwell, M., Introductory Mycology, Wiley and Sons, New York, 1996.
47. Benny, G.L., Zygomycota: Trichomycetes, in The Mycota VII Systematics and Evolution Part B,
McLaughlin, D.J., McLaughlin, E.G., and Lemke, P.A., Eds., Springer-Verlag, Berlin, 2001, 147.
48. Benny, G.L., Humber, R.A., and Morton, J.B., Zygomycota: Zygomycetes, in The Mycota VII
Systematics and Evolution Part B, McLaughlin, D.J., McLaughlin, E.G., and Lemke, P.A., Eds.,
Springer-Verlag, Berlin, 2001, 113.
49. Benny, G.L. and O’Donnell, K.O., Amoebidium parasiticum is a protozoan, not a Trichomycete,
Mycologia, 92, 1133, 2000.
50. Ustinova, I., Krienitz, L., and Huss, V.A.R., Hyaloraphidium curvatum is not a green alga, but a lower
fungus: Amoebidium parasiticum is not a fungus, but a member of the DRIPs, Protist, 151, 253, 2000.
9579_C015.fm Page 244 Saturday, November 11, 2006 12:25 PM
51. Guarro, J., Gene, J., and Stchigel, A.M., Developments in fungal taxonomy, Clin. Microbiol. Rev.,
12, 454, 1999.
52. Harrington, T.C. and Rizzo, D.M., Defining species in the fungi, in Structure and Dynamics of Fungal
Populations, Worrall, J.J., Ed., Kluwer Press, Dordrecht, Netherlands, 1999, 43.
53. Mayden, R.L., A hierarchy of species concepts: the dénouement in the saga of the species problem,
in Species: The Units of Biodiversity, Claridge, M.F., Dawah, H.A., and Wilson, M.R., Eds., Chapman
and Hall Ltd., London, UK, 1997, 381.
54. Taylor, J.W. et al., Phylogenetic species recognition and species concepts in fungi, Fun. Genet. Biol.,
31, 21, 2000.
55. Blackwell, M., Phylogenetic systematics of ascomycetes, in The Fungal Holomorph, Reynolds, D.R.
and Taylor, J.W., Eds., CAB International, Wallingford, UK, 1993, 93.
56. Hibbett, D.S., et al., Phylogenetic diversity in shiitake inferred from nuclear ribosomal DNA
sequences, Mycologia, 87, 618, 1995.
57. Vilgalys, R., Speciation and species concepts in the Collybia dryophila complex, Mycologia, 83, 758, 1991.
58. Vilgalys, R. and Sun, B.L., Ancient and recent patterns of geographic speciation in the oyster
mushroom Pleurotus ostreatus revealed by phylogenetic analysis of ribosomal DNA, Proc. Nat. Acad.
Sci. USA, 91, 4599, 1994.
59. Hawksworth, D.L., Fungal diversity and its implications for genetic resource collections, Stud. Mycol.,
50, 9, 2004.
60. Seifert, K.A. and Samuels, G.J., How should we look at anamorphs? Stud. Mycol., 45, 5, 2000.
61. Cannon, P.F. and Kirk, P.M., The philosophy and practicalities of amalgamating anamorph and
teleomorph concepts, Stud. Mycol., 45, 19, 2000.
62. Magurran, A.E., Measuring Biological Diversity, Blackwell Science, Oxford, 2004.
63. Clarke, K.R. and Warwick, R.M., A further biodiversity index applicable to species lists: variation in
taxonomic distinctness, Mar. Ecol. Prog. Ser., 216, 265, 2001.
64. Clifford, H.T. and Stephenson, W., An Introduction to Numerical Classification, Academic Press,
London, 1975.
65. Gaston, K.J., Species richness: measure and measurement, in Biodiversity: A Biology of Numbers and
Difference, Gaston, K.J., Ed., Oxford University Press, Oxford, UK, 1996, 77.
66. Ito, A. and Imai, S., Ciliates from the cecum of capybara (Hydrocheorus hydrochaeris) in Bolivia 2:
the family Cycloposthiidae., Eur. J. Protist., 2000, 36, 169.
67. Pielou, E.C., An Introduction to Mathematical Ecology, Wiley, New York, 1969.
68. Pielou, E.C., Ecological Diversity, Wiley InterScience, New York, 1975.
69. Southwood, R. and Henderson, P.A., Ecological Methods, Blackwell Science, Oxford, UK, 2000.
70. Cannon, P.F., Diversity of Phyllachoraceae with special reference to the tropics, in Biodiversity of
Tropical Microfungi, Hyde, K.D., Ed., Hong Kong University Press, Hong Kong, 1997, 255.
71. Cannon, P.F., Strategies for rapid assessment of fungal diversity, Biodiver. Conserv., 6, 669, 1997.
72. Colwell, R.R. et al., The microbial species concept and biodiversity, in Microbial Diversity and
Ecosystem Function, Allsopp, D., Colwell, R.R., and Hawksworth, D.L., Eds., Cambridge University
Press, Cambridge, UK, 1995, 3.
73. Hawksworth, D.L., The fungal dimension of biodiversity: magnitude, significance, and conservation,
Mycol. Res., 95, 641, 1991.
74. May, R.M., A fondness for fungi, Nature, 352, 475, 1991.
75. Hyde, K.D., Where are the missing fungi? Does Hong Kong have any answers? Mycol. Res., 105,
1514, 2001.
76. Korf, R.P., Reinventing taxonomy: a curmudgeon’s view of 250 years of fungal taxonomy, the crisis
in biodiversity, and the pitfalls of the phylogenetic age, Mycotaxon, 93, 407 2005.
77. Wheeler, Q.D., Taxonomic triage and the poverty of phylogeny, Phil. Trans. R. Soc. Lond. B, 359,
571, 2004.
78. Ainsworth, G.C., The number of fungi, in The Fungi: An Advanced Treatise, Vol. 3, Ainsworth, G.C.
and Sussman, A.S., Eds., Academic Press, New York, 1968, 505.
79. Hawksworth, D.L., The magnitude of fungal diversity: the 1.5 million species revisited, Mycol. Res.,
105, 1422, 2001.
80. Kirk, P.M., World catalogue of 340K fungal names on-line, Mycol. Res., 104, 516, 2000.
81. Hawksworth, D.L., The need for a more effective biological nomenclature for the 21st century, Bot.
J. Linn. Soc., 109, 543, 1992.
9579_C015.fm Page 245 Saturday, November 11, 2006 12:25 PM
82. Hawksworth, D.L. et al., Ainsworth and Bisby’s Dictionary of the Fungi, 8th ed., CAB International,
Wallingford, 1995.
83. Hammond, P.M., Described and estimated species numbers: an objective assessment of current
knowledge, in Microbial Diversity and Ecosystem Function, Allsopp. D., Colwell, R.R., and Hawksworth,
D.L., Eds., Cambridge University Press, Cambridge, UK, 1995, 29.
84. Fröhlich, J. and Hyde, K.D., Biodiversity of palm fungi in the tropics: are global fungal diversity
estimates realistic? Biodiver. Conserv., 8, 977, 1999.
85. Sipman, H.J.M. and Aptroot, A., Where are the missing lichens? Mycol. Res., 105, 1433, 2001.
86. Hammond, P.M., The current magnitude of biodiversity, in Global Biodiversity Assessment, Heywood,
V.H., Ed., Cambridge University Press, Cambridge, UK, 1995, 113.
87. Hammond, P.M., Species inventory, in Global Biodiversity: Status of the Earth’s Living Resources,
Groombridge, B., Ed., Cambridge University Press, Cambridge, UK, 1992, 113.
88. May, R.M., The dimensions of life on earth, in Nature and Human Society: The Quest for a Sustainable
World, Raven, P.H. and Williams, T., Eds., National Academy Press, Washington, DC, 2000, 30.
89. Pascoe, I.G., History of systematic mycology in Australia, in History of Systematic Mycology in
Australia, Short, P.S., Ed., Australian Systematic Botany Society, South Yarra, 1990, 259.
90. Smith, D. and Waller, J.M., Culture collections of microorganisms: their importance in tropical plant
pathology, Fitopatol. Brasil., 17, 1, 1992.
91. Hywel-Jones, N.L., A systematic survey of insect fungi from natural, tropical forest in Thailand, in
Aspects of Tropical Mycology, Issac, S. et al., Eds., Cambridge University Press, Cambridge, UK,
1993, 300.
92. Dreyfuss, M.M. and Chapela, I.H., Potential of fungi in the discovery of novel, low-molecular weight
pharmaceuticals, in The Discovery of Natural Products with Therapeutic Potential, Gullo, V.P., Ed.,
Butterworth-Heinemann, London, UK, 1994, 49.
93. Cifuentes Blanco, J. et al., Diversity of macromycetes in pine-oak forests in the neovolcanic axis,
Mexico, in Mycology in Sustainable Development: expanding concepts, vanishing borders, Palm,
M.E. and Chapela, I.H., Eds., Parkway Publishers, Boone, NC, 1997, 111.
94. Shivas, R.G. and Hyde, K.D., Biodiversity of plant pathogenic fungi in the tropics, in Biodiversity of
Tropical Microfungi, Hyde, K.D., Ed., Hong Kong University Press, Hong Kong, 47, 1997.
95. Arnold, A.E. et al., Are tropical fungal endophytes hyperdiverse? Ecol. Lett., 3, 267, 2000.
96. Hawksworth, D.L., Rossman, A.Y., Where are all the undescribed fungi? Phytopathol., 87, 888, 1997.
97. Lodge, D.J., Ed., A survey of patterns of diversity in non-lichenized fungi, Mitteilungen der Eidgenös-
sischen Forschungsanstlalt für Wald, Schnee und Landschaft, 70, 157, 1995.
98. Hyde, K.D., Fröhlich, J., and Taylor, J., Diversity of ascomycetes on palms in the tropics, in Biodiversity
of Tropical Microfungi, Hyde, K.D., Ed., Hong Kong University Press, Hong Kong, 1997, 141.
99. Hyde, K.D. et al., Estimating the extent of fungal diversity in the tropics, in Nature and Human
Society: The Quest for a Sustainable World, Raven, P.H. and Williams, T., Eds., National Academy
Press, Washington, DC, 2000, 156.
100. Aptroot, A., Species diversity in tropical rainforest ascomycetes: lichenized versus non-lichenized;
foliicolus verus corticolous, Abstracta Botanica, 21, 37, 1997.
101. Chapman, G.P. and Peat, W.E., An Introduction to Grasses (Including Bamboos and Cereals),
Redwood Press Ltd., UK, 1992.
102. Younger, V.B. and McKell, C.M., The Biology and Utilization of Grasses, Academic Press, New York
and London, 1972, 426.
103. Farr, D.F. et al., Fungal Databases, Systematic Botany and Mycology Laboratory, ARS, USDA,
http://nt.ars-grin.gov/fungaldatabases/index.cfm, 2005.
104. Apinis, A.E., Chester, C.G.C., and Taligoola, H.K., Colonization of Phragmites communis leaves by
fungi, Nova Hedwigia, 23, 113, 1972.
105. Barr, M.E., Huhndorf, S.M., and Rogerson, C.T., The Pyrenomycetes described by J.B. Ellis, Memoirs
of the New York Botanical Garden, 79, 1, 1996.
106. Piepenbring, M., Ecology, and seasonal variation, and altitudinal distribution of Costa Rican smut
fungi, Basidiomycetes: Ustilaginales and Tilletiales, Rev. Biol. Trop., 44, 115, 1996.
107. Poon, M.O.K. and Hyde, K.D., Evidence for the vertical distribution of saprophytic fungi on senescent
Phragmites australis culms at Mai Po Marshes, Hong Kong, Bot. Mar., 41, 285, 1998.
108. Sivanesan, A., Graminicolous species of Bipolaris, Curvularia, Drechslera, Exserohilum and their
teleomorphs, Mycol. Pap., 158, 1, 1987.
9579_C015.fm Page 246 Saturday, November 11, 2006 12:25 PM
109. Wong, M.K.M. and Hyde, K.D. Diversity of fungi on six species of Gramineae and one species of
Cyperaceae in Hong Kong, Mycol. Res., 105, 1485, 2001.
110. Poon, M.O.K. and Hyde, K.D., Biodiversity of Intertidal estuarine fungi on Phragmites at Mai Po
Marshes, Hong Kong, Bot. Mar., 41, 141, 1998.
111. Sabada, R.B. et al., Observations on vertical distribution of fungi associated with standing senescent
Acanthus ilicifolius stems at Mai Po Mangrove, Hong Kong, Hydrobiol., 295, 119, 1995.
112. Shearer, C.A., The freshwater ascomycetes, Nova Hedwigia, 56, 1, 1993.
113. Lee, S.Y., Net aerial productivity, litter production and decomposition of the Phragmites australis in a
nature reserve in Hong Kong: management implications, Marine Ecology Progress Series, 66, 161, 1990.
114. Newell, S.Y., Decomposition of shoots of a salt-marsh grass, in Advances in Microbial Ecology, 13,
Jones, J.G., Ed., Plenum Press, New York, 1993, 1.
115. Farr, D.F. et al., Fungi on Plants and Plant Products in the United States, APS Press, St. Paul, MN, 1989, 1.
116. van Ryckegem, G. and Verbeken, A., Fungal diversity and community structure on common reed
(Phragmites australis) along a salinity gradient in the Scheldt-estuary, Nova Hedwigia, 80, 173, 2005.
117. Shivas, R.G. and Vánky, K., The smut fungi on Cynodon, including Sporosorium normanensis sp.
nov. from Australia, Fung. Divers., 8, 149, 2001.
118. Vánky, K., Ten new species of Ustilaginales, Mycotaxon, 18, 319, 1983.
119. Vánky, K., Taxonomic studies on Ustilaginales XII, Mycotaxon, 54, 215, 1995.
120. Vánky, K., Taxonomic studies on Ustilaginales XIII, Mycotaxon, 56, 197, 1995.
121. Vánky, K., Taxonomic studies on Ustilaginales XV, Mycotaxon, 62, 127, 1997.
122. Vánky, K., Taxonomic studies on Ustilaginales XVI, Mycotaxon, 65, 133, 1997.
123. Vánky, K., Taxonomic studies on Ustilaginales XX, Mycotaxon, 74, 161, 2000.
124. Vánky, K., The smut fungi on Sacchraum and related grasses, Aust. Pl. Pathol., 29, 155, 2000.
125. Vánky, K., Taxonomic studies on Ustilaginales XXI, Mycotaxon, 78, 265, 2001.
126. Vánky, K., Taxonomic studies on Ustilaginales XXII, Mycotaxon, 81, 367, 2002.
127. Vánky, K., The smut fungi (Ustilaginomycetes) of Hyparrhenia (Poaceae), Fung. Divers., 12, 179, 2003.
128. Vánky, K., Taxonomic studies on Ustilaginales XXIII, Mycotaxon, 85, 1, 2003.
129. Vánky, K., The smut fungi (Ustilaginomycetes) of Sporobolus (Poaceae), Fung. Divers., 14, 205, 2003.
130. Vánky, K., The smut fungi (Ustilaginomycetes) of Bothriochloa, Capillipedium and Dichanthium
(Poaceae), Fung. Divers., 15, 219, 2004.
131. Vánky, K., Taxonomic studies on Ustilaginales 24, Mycotaxon, 89, 55, 2004.
132. Vánky, K., The smut fungi (Ustilaginomycetes) of Boutelouinae (Poaceae), Fung. Divers., 16, 167, 2004.
133. Vánky, K., The smut fungi (Ustilaginomycetes) of Muhlenbergia (Poaceae), Fung. Divers., 16, 199,
2004.
134. Vánky, K., The smut fungi (Ustilaginomycetes) of Chrysopogon (Poaceae), Fung. Divers., 18, 177,
2005.
135. Vánky, K. and Shivas, R.G., Smut fungi (Ustilaginomycetes) of Sorghum (Gramineae) with special
regard to Australia, Mycotaxon, 80, 339, 2001.
136. Zhou, D.Q. and Hyde, K.D., Host-specificity, host-exclusivity and host-recurrence in saprobic fungi,
Mycol. Res., 105, 1449, 2001.
137. Hyde, K.D. et al., Saprobic fungi on bamboo culms, Fung. Divers., 7, 35, 2001.
138. Zhou, D.Q., Biodiversity of saprobic microfungi associated with bamboo in Hong Kong and Kunming,
China, Ph.D. thesis, The University of Hong Kong, 2000.
139. Zhou, D.Q. and Hyde, K.D., Fungal succession on bamboo in Hong Kong, Fung. Divers., 10, 213,
2002.
140. Zhou, D., Cai, L., and Hyde, K.D., Astrosphaeriella and Roussoella species on bamboo from Hong
Kong and Yunnan, China, including a new species of Roussoella, Cryptogam. Mycol., 24, 191, 2003.
141. Tanaka, K., and Harada, Y., Bambusicolous fungi in Japan (1): four Phaeosphaeria species, Myco-
systema, 45, 377, 2004.
142. Shenoy, B.D., Jeewon, R., and Hyde, K.D., Oxydothis bambusicola, a new ascomycete with a huge
subapical ascal ring found on bamboo in Hong Kong, Nova Hedwigia, 80, 511, 2005.
143. Hyde, K.D. et al., Vertical distribution of saprobic fungi on bamboo culms, Fung. Divers., 11, 109, 2002.
144. Eriksson, O.E. and Yue, J.Z., Bambusicolous pyrenomycetes, an annotated check-list, Myconet, 1, 25,
1998.
145. Petrini, O., Candoussau, F., and Petrini, L.E., Bambusicolous fungi collected in southern western
France 1982-1989, Mycologica Helvetica, 3, 263, 1989.
9579_C015.fm Page 247 Saturday, November 11, 2006 12:25 PM
146. Goh, T.K. and Hyde, K.D., Fungi on submerged wood and bamboo in the Plover Cove Reservoir,
Hong Kong, Fung. Divers., 3, 57, 1999.
147. Hino, I., Icones Fungorum Bambusicolorum Japonicolorum, The Fuji Bamboo Garden, Japan, 1961.
148. Rehm, H., Ascomycetes philippinenses collecti a clar. D.B. Baker, Philippine Journal of Science, 8,
181, 1913.
149. Rehm, H., Ascomycetes philippinenses IV, Leaflets of Philippine Botany, 6, 1935, 1913.
150. Rehm, H., Ascomycetes philippinenses V, Leaflets of Philippine Botany, 6, 2191, 1914.
151. Rehm, H., Ascomycetes philippinenses VI, Leaflets of Philippine Botany, 6, 2257, 1914.
152. Rehm, H., Ascomycetes philippinenses VIII, Leaflets of Philippine Botany, 8, 2935, 1916.
153. Sydow, H. and Sydow, P., Enumeration of Philippine fungi with notes and descriptions of new species,
I: Micromycetes, Philippine Journal of Science, Section C, Botany, 8, 265, 1914.
154. Sydow, H. and Sydow, P., Enumeration of Philippine fungi with notes and descriptions of new species,
II, Philippine Journal of Science, Section C, Botany, 8, 475, 1914.
155. Cai, L. et al., Freshwater fungi from bamboo and wood submerged in the Liput River in the Philippines,
Fung. Divers., 13, 1, 2003.
156. Cannon, P.F. and Hawksworth, D.L., The diversity of fungi associated with vascular plants, Adv.Plant
Pathol., 11, 277, 1995.
157. Mabberley, D.J., The Plant Book: A Portable Dictionary of the Higher Plants, Cambridge University
Press, Cambridge, 1987.
158. Eleuterius, L.N., The distribution of Juncus roemerianus in the salt marshes of North America,
Chesapeake Science, 17, 289, 1976.
159. Snogerup, S., A revision of Juncus subgen. Juncus (Juncaceae), Willdenowia, 23, 23, 1993.
160. Kohlmeyer, J. and Volkmann-Kohlmeyer, B., The biodiversity of fungi on Juncus roemerianus, Mycol.
Res., 105, 1411, 2001.
161. Gessner, M.O., Aquatische Hyphomyceten, in Methoden der Biologischen Wasseruntersuchung —
Biologische Gewässeruntersuchung, von Tümpling, W. and Friedrich, G., Eds., Gustav Fischer Verlag,
Jena, 1999, 185.
162. Gessner, M.O., Bärlocher, F., and Chauvet, E., Qualitative and quantitative analyses of aquatic
hyphomycetes in streams, in Freshwater Mycology, Fungal Diversity Research Series, Tsui, C.K.M.
and Hyde, K.D., Eds., 2003, 10, 127.
163. Ingold, C.T., Aquatic hyphomycetes of decaying alder leaves, Trans. Br. Mycol. Soc., 25, 339, 1942.
164. Webster, J. and Descals, E., Morphology, distribution and ecology of conidial fungi in freshwater
habitat, in Biology of Conidial Fungi, Vol. 1, Cole, G.T. and Kendrick, B., Eds., Academic Press,
New York, 1981, 295.
165. Goh, T.K., Tropical freshwater hyphomycetes, in Biodiversity of Tropical Microfungi, Hyde, K.D.,
Ed., Hong Kong University Press, 1997, 189.
166. Chan, S.Y., Goh, T.K., and Hyde, K.D., Ingoldian fungi in Hong Kong, Fung. Divers., 5, 89, 2000.
167. Bills, G.F. and Polishook, J.D., Abundance and diversity of microfungi in leaf litter of a lowland rain
forest in Costa Rica, Mycologia, 86, 187, 1994.
168. Polishook, J.D., Bills, G.F., and Lodge, D.J., Microfungi from decaying leaves of two rainforest trees
in Puerto Rico, J. Ind. Microbiol., 17, 284, 1996.
169. Parungao, M.M., Fryar, S.C., and Hyde, K.D., Diversity of fungi on rainforest litter in North Queensland,
Australia, Biodiver. Conserv., 11, 1185, 2002.
170. Paulus, B., Gadek, P., and Hyde, K.D., Estimation of microfungal diversity in tropical rain forest leaf
litter using particle filtration: the effects of leaf storage and surface treatment, Mycol. Res., 107, 748, 2003.
171. Promputtha, I. et al., Fungal succession on senescent leaves of Manglietia garrettii in Doi Suthep-Pui
National Park, northern Thailand, in Fungal Succession, Fungal Diversity, Hyde, K.D. and Jones,
E.B.G., Eds., Fungal Diversity Press, Hong Kong, 10, 89, 2002.
172. Promputtha, I. et al., Dokmaia monthadangii gen. et sp. nov. a synnematous anamorphic fungus on
Manglietia garettii., Sydowia, 55, 99, 2003.
173. Promputtha, I. et al., Fungi on Manglietia garettii: Cheiromyces manglietiae sp. nov. from dead
branches. Nova Hedwigia, 80, 527, 2005.
174. Photita, W. et al., Fungi on Musa acuminata in Hong Kong, Fung. Divers., 6, 99, 2001.
175. Weir, A. and Hammond, P.M., A preliminary assessment of species-richness patterns of tropical,
beetle-associated Labuolbeniales (Ascomycetes), in Biodiversity of Tropical Microfungi, Hyde, K.D.,
Ed., Hong Kong University Press, Hong Kong, 1997, 121.
9579_C015.fm Page 248 Saturday, November 11, 2006 12:25 PM
176. Cafaro, M.J., Species richness patterns in symbiotic gut fungi (Trichomycetes), Fung. Divers., 9, 47,
2002.
177. Sutton, B.C., The genus Glomerella and its anamorph Colletotrichum, in Colletotrichum: Biology,
Pathology and Control, Bailey, J.A. and Jeger, M.J., Eds., CAB International, Wallingford, 1992,
Chap. 1.
178. Freeman, S. et al., Molecular analyses of Colletotrichum species from almond and other fruits,
Phytopathol., 90, 608, 2000.
179. Corda, A.C.I., Die Pilze Deutschlands, in Deutschlands Flora in Abbildungen nach der Natur mit
Beschreibungen, Sturm, J., Ed., Nürnberg, Sturm, 1837, 3, 1.
180. Sutton, B.C., The Coelomycetes: Fungi Imperfecti with Pycnidia, Acervula and Stromata, Common-
wealth Mycological Institute, Kew, Surrey, England, 1980.
181. Lenne, J.M., Colletotrichum diseases of legumes, in Colletotrichum: Biology, Pathology and Control,
Bailey, J.A. and Jeger, M.J., Eds., CAB International, Wallingford, 1992, 134.
182. Jeewon, R., Liew, E.C.Y., and Hyde, K.D., Phylogenetic evaluation of species nomenclature of
Pestalotiopsis in relation to host association, Fung. Divers., 17, 39, 2004.
183. Crous, P.W., Mycosphaerella spp. and their anamorphs associated with leaf spot diseases of Eucalyptus,
Mycol. Mem., 21, 1, 1998.
184. Platnick, N.I., Philosophy and the transformation of cladistics, Syst. Zool., 28, 537, 1979.
185. Lutzoni, F. and Vilgalys, R., Integration of morphological and molecular data sets in estimating fungal
phylogenies, Can. J. Bot., 73, S649, 1995.
186. Baker, R.H. and Gatesy, J., Is morphology still relevant? in Molecular Systematics and Evolution:
Theory and Practice, DeSalle, R., Giribet, G. and Wheeler, W., Eds., Birkhauser Verlag, Basel, 2002.
187. Liu, Y., Whelen, S., and Hall, B.D., Phylogenetic relationships among ascomycetes: evidence from
an RNA polymerase II subunit, Mol. Biol. Evol., 16, 1799, 1999.
188. Barr, M.E., The ascomycetes connection, Mycologia, 75, 1, 1983.
189. Barr, M.E., Prodomus to Class Loculoascomycetes, Newell, Amherst, Mass, 1987.
190. Barr, M.E., Prodomus to nonlichenized, pyrenomycetous members of Class Hymenoascomycetes,
Mycotaxon, 39, 43, 1990.
191. Eriksson, O. and Hawksworth, D.L., Outline of the Ascomycetes-1993, Systema Ascomycetum, 12,
1, 1993.
192. van Elsas, J.D. et al., Analysis of the dynamics of fungal communities in soil via fungal-specific PCR
of soil DNA followed by denaturing gradient gel electrophoresis, J. Microbiol. Meth., 43, 133, 2000.
193. May, L.A., Smiley, B., and Schmidt, M.G., Comparative denaturing gradient gel electrophoresis analysis
of fungal communities associated with whole plant corn silage, Can. J. Microbiol., 47, 829, 2001.
194. Berbee, M.L. and Taylor, J.W., Fungal molecular evolution: gene, trees and geologic time, in The
Mycota VII: Systematics and Evolution Part B, McLaughlin, D.J., McLaughlin, E.G., and Lemke,
P.A., Eds., Springer-Verlag, New York, 2001, 229.
195. Lang, B.E. et al., The closest unicellular relative to animals, Curr. Biol., 12, 1773, 2002.
196. Bridge, P., The history and application of molecular biology, Mycologist, 16, 90, 2002.
197. Tautz, D. et al., A plea for DNA Taxonomy, Trends Ecol. Evol., 18, 70, 2003.
198. Doran, J.W., Sarrantonio, M., and Liebig, M.A., Soil health and sustainability, Adv. Agron., 56, 2, 1996.
199. Garbeva, P., van Veen, J.A., and van Elsas, J.D., Microbial diversity in soil: selection of microbial
populations by plant and soil type and implications for disease suppressiveness, Annu. Rev. Phytopathol.,
42, 243, 2004.
200. Bridge, P. and Spooner, B., Soil fungi: diversity and detection, Pl. Soil, 232, 147, 2001.
201. Crespo, A., Bridge, P.D., and Hawksworth, D.L., Amplification of fungal rDNA-ITS regions from
non-fertile specimens of the lichen-forming genus Parmelia, Lichenol., 29, 275, 1997.
202. Heuer, H. et al., Analysis of actinomycete communities by specific amplification of gene encoding
16S rDNA and gel-electrophoretic separation in denaturing gradient, Appl. Environ. Microbol., 63,
3233, 1997.
203. Muyzer, G., de Waal, E.C., and Uitterlinden, A.G., Profiling of complex microbial populations by
denaturing gradient gel electrophoresis analysis of polymerase chain reaction-amplified genes coding
for 16S rRNA, Appl. Environ. Microbiol., 59, 695, 1993.
204. Muyzer, G. and Smalla, K., Application of denaturing gradient gel electrophoresis (DGGE) and
temperature gradient gel electrophoresis (TGGE) in microbial ecology, Antonie Van Leeuwenhoek,
73, 127, 1998.
9579_C015.fm Page 249 Saturday, November 11, 2006 12:25 PM
205. Liu, W.T. et al., Characterization of microbial diversity by terminal restriction fragment length
polymorphisms of genes encoding 16S rRNA. Appl. Environ. Microbiol., 63, 4516, 1997.
206. Ranjard, L. et al., Characterization of bacterial and fungal soil communities by automated ribosomal
intergenic spacer analysis fingerprints: biological and methodological variability, Appl. Environ.
Microbiol., 67, 4479, 2001.
207. Bailey, J.A. et al., Molecular taxonomy of Colletotrichum species causing anthracnose of Malvaceae,
Phytopathol., 86, 1076, 1996.
208. O’Neill, N.R., Application of amplified restriction fragment polymorphism for genetic characterization
of Colletotrichum pathogens of alfalfa, Phytopathol., 87, 745, 1997.
209. Pain, N.A. et al., Monoclonal antibodies which show restricted binding of four Colletotrichum species:
C. lindermuthianum, C. malvarum, C. orbiculare, and C. trifolii, Physiol. Mol. Pl. Pathol., 40, 111,
1992.
210. Sheriff, C. et al., Ribosomal DNA sequence analysis reveals new species groupings in the genus
Colletotrichum, Exp. Mycol., 18, 121, 1994.
211. Freeman, S., et al., Molecular analyses of Colletotrichum species from almond and other fruits,
Phytopathol., 90, 608, 2000.
212. Brown, K.B., Hyde, K.D., and Guest, D.I., Preliminary studies on endophytic fungal communities of
Musa acuminata species complex in Hong Kong and Australia, Fung. Divers., 1, 27, 1998.
213. Fisher, P.J. et al., A study of fungal endophytes in leaves, stem and roots of Gynoxis oliefolia Muchler
(Compositae) from Ecuador, Nova Hedwigia, 60, 589, 1995.
214. Fröhlich, J. et al., Endophytic fungi associated with palms, Mycol. Res., 104, 1202, 2000.
215. Kumar, D.S.S., and Hyde, K.D., Biodiversity and tissue-recurrence of endophytic fungi in Tripterygium
wilfordii, Fung. Divers., 17, 69, 2004.
216. Lodge, D.J., Fisher, P.J., and Sutton, B.C., Endophytic fungi of Manilkara bidentata leaves in Puerto
Rico, Mycologia, 85, 733, 1996.
217. Rodrigues, K.F., The foliar endophytes of the Amazonian palm Euterpe oleracea, Mycologia, 86, 376,
1994.
218. Taylor, J.E., Hyde, K.D., and Jones, E.B.G., Endophytic fungi associated with the temperate palm
Trachycarpus fortunei both within and outside of its natural geographic range, New Phytol., 142, 335,
1999.
219. Toofanee, S.B. and Dulymamode, R., Fungal endophytes associated with Cordemoya integrifolia,
Fung. Divers., 11, 169, 2002.
220. Umali, T., Quimio, T., and Hyde, K.D., Endophytic fungi in leaves of Bambusa tultoides, Fung. Sci.,
14, 11, 1999.
221. Dreyfuss, M.M. and Petrini, O., Further investigations on the occurrence and distribution of endophytic
fungi in the tropic plants, Botanica Helvetica, 94, 33, 1984.
222. Fisher, P.J. et al., Fungal endophytes from the leaves and twigs of Quercus ilex L. from England,
Majorca and Switzerland, New Phytol., 127, 133, 1994.
223. Pereira, J.O., Azevedo, J.L., and Petrini, O., Endophytic fungi of Stylosanthes: a first report, Mycologia,
85, 362, 1993.
224. Lacap, D.C., Hyde, K.D., and Liew, E.C.Y., An evaluation of the fungal ‘morphotype’ concept based
on ribosomal DNA sequences, Fung. Divers., 12, 53, 2003.
225. Guo, L.D., Hyde, K.D., and Liew, E.C.Y., Identification of endophytic fungi from Livistona chinensis
based on morphology and rDNA sequences, New Phytol., 147, 617, 2000.
226. Nikolcheva, L. et al., Determining diversity of freshwater fungi on decaying leaves: comparison of
traditional and molecular approaches, Appl. Env. Microbiol., 69, 2548, 2003.
227. Hyde, K.D., Taylor, J.E., and Fröhlich, J., Genera of Palm Ascomycetes, Fungal Diversity Research
Series 3, Fungal Diversity Press, Hong Kong, 2000.
228. Fröhlich, J. and Hyde, K.D., Palm Microfungi, Fungal Diversity Research Series 3, Fungal Diversity
Press, Hong Kong, 2000.
229. Taylor, J.E. and Hyde, K.D., Microfungi of Tropical and Temperate Palms, Fungal Diversity Research
Series 12, Fungal Diversity Press, Hong Kong, 2003.
230. Pinnoi, A. et al., Submersisphaeria palmae sp. nov. with a key to species and notes on Helicoubisia,
Sydowia, 56, 72, 2004.
231. Pinruan, U. et al., Three new species of Craspedodidymum from palm in Thailand, Mycoscience, 45,
177, 2004.
9579_C015.fm Page 250 Saturday, November 11, 2006 12:25 PM
9579_C016.fm Page 251 Saturday, November 11, 2006 3:43 PM
L. A. Craven
Australian National Herbarium, Centre for Plant Biodiversity Research,
CSIRO Plant Industry, Canberra, Australia
E. Biffin
Division of Botany and Zoology, Australian National University,
Canberra, Australia
ABSTRACT
Syzygium, with about 1,200 species, is one of the largest generic groupings of Myrtaceae. Conventionally,
it is considered to be taxonomically difficult due to its previous confusion with another
large genus of the family (Eugenia), the seeming lack of ‘good’ diagnostic characters, and the
uncertainty as to the delimitation of genera within the Syzygium complex per se. Current divergent
taxonomic approaches are discussed, and the taxonomic history of Syzygium is summarised.
Present research includes floristic and reproductive biological studies, and active studies into
morphological, anatomical and molecular aspects are in progress. The structural, ecological and
biological diversity of the group, together with its economic and biodiversity significance, point to
Syzygium being a challenging but rewarding subject for future research.
16.1 INTRODUCTION
This chapter aims to bring together past and present taxonomic and systematic research on the very
large and taxonomically perplexing angiosperm genus Syzygium Gaertner (Myrtaceae) and to outline
and stimulate further work; therefore, it is both retrospective and prospective. We show that Syzygium
poses many problems, including its delimitation as a genus, documentation of its species and under-
standing of many aspects of its biology. Nevertheless, we suggest that ongoing floristic and phyloge-
netic studies have the potential to significantly improve our current understanding of the genus.
Matters of Scale: Dealing with One of the Largest Genera of Angiosperms 253
Myrtoideae2, now defined by Wilson et al.20 to encompass more or less the entire Myrtaceae as
previously recognised.
Syzygium and Eugenia are two of the most taxonomically confused genera in the Myrtaceae,
and there are many other genera that have, at one time or another, been cleaved off from them or
been reunited with them. Schmid13 pointed out that there were about 35 generic names which have
been or could be reduced to Syzygium s.l. and at least another 30 assignable to Eugenia s.l. Since
Schmid’s publication the number of segregates has increased with, for example, the description of
Waterhousea B. Hyland and Monimiastrum A.J. Scott. In addition, several species have been placed
within Syzygium on the basis that accurate subdivision or description of segregate genera is currently
not possible (such as Craven9). Together, these genera form a ‘vast array of more or less closely
allied species’ (Ashton21). This ‘array’, dominated by Eugenia and Syzygium, is very large. The
standard printed work, Index Kewensis, has over 3,000 species listed under Eugenia and over 1,000
under Syzygium. Undoubtedly, this does not reflect the true balance in numbers of species between
these genera when they are considered in the strict sense, as even now many authors prefer, because
of historical precedent and because of the enormous number of consequent nomenclatural changes,
to ignore the differences between them.
Schmid13 provides a review of the status of Syzygium s.l. and makes clear why Eugenia and
Syzygium were confused. Schmid’s work summarises many of the relevant references and arguments,
and is therefore not repeated here in detail. Essentially, Schmid showed that Eugenia and Syzygium
were not closely related, differing most evidently in respect of the substitution of the transeptal
vascular supply to the ovule of Eugenia with an axile one in Syzygium. Kochummen3, Kostermans22
and others have criticised this work on the basis that very few species were studied; however, as
contrary data have not been forthcoming, we accept Schmid’s conclusions.
FIGURE 16.1 Histogram and hand-fitted trend line showing the number of genera (y-axis) plotted against the
number of species per genus (x-axis) for the Myrtaceae.
balanced by the combination of the number of new species awaiting description and transfers into
Syzygium of valid species wrongly placed in other genera. Conservatively, therefore, we estimate
that the total number of species of Syzygium is likely to be more than 1,000 but less than 1,500.
On this basis, Syzygium may be positioned higher up Frodin’s27 table than currently, possibly even
within the top 10 largest plant genera in the world.
In summary of this section, Syzygium is an extremely large genus wherein many species await
description.
FIGURE 16.2 Frequency diagram of the total number of genera containing a certain number of species for
the Myrtaceae (y-axis: total number of species; Syzygium is labelled on the plot).
species, Syzygium is not regarded as a major timber resource. Lemmens4 indicates that only small
amounts of the locally important timber Kelat (a South East Asian trade name that covers timber
produced by a number of species of Syzygium) are exported. Eddowes41 classed water gum (the
Papua New Guinea trade name for Syzygium timber) as a major exportable hardwood, although it
does not comprise a large proportion of the timber exported4.
To summarise this section, Syzygium is a widespread Old World genus of considerable ecological
and economic importance and therefore, pragmatically, a predictive taxonomic classification will
be of wide utility.
FIGURE 16.3 Illustration of Syzygium pergamentaceum (King) P.Chantaranothai and J.Parn. Drawing shows
(A) opposite leaves each with two intramarginal veins and inflorescence, (B) hypanthial cup, petals and gland
dots on the petals and (C) stamens. Scale bars of 1 cm and 0.5 mm accompany the panels. (Reproduced from
Chantaranothai and Parnell28 with permission.)
Australasia. For Myrtaceae, the work of De Candolle47 stands out as the most comprehensive
classification of the early nineteenth century. De Candolle47 placed the fleshy fruited, one to few
large seeded species in one of five genera: Acmena, Caryophyllus, Eugenia, Jambosa Adans. and
Syzygium. The New World species were accommodated by De Candolle47 in Eugenia and the Old
World species classified in one of the remaining four genera.
Wight48 proposed the merging of the five genera recognised by De Candolle47 into one on the
basis that the floral features demonstrated continuous variation, the internal structure of the flowers
FIGURE 16.4 (A colour version of this figure follows page 240) Flowers and fruit of species of the Syzygium
group. (A) Flowers of Syzygium malaccense (L.) Merr. and L.M. Perry; (B–D) flowers, inflorescence and
fruit, respectively, of Acmena cf. divaricata Merr. and L.M. Perry; (E) fruit of Piliocalyx bullatus Brongn.
and Gris; (F) buds and flowers of Syzygium longifolium (Brongn. and Gris) J.W. Dawson; (G–H) fruit and
flowers, respectively, of Syzygium aqueum (Burm. f.) Alston; (I) flowers of Syzygium jambos (L.) Alston;
(J) buds (note calyptras) of Syzygium kuebiniense J.W. Dawson; (K) fruit of Syzygium rubrimolle B. Hyland;
(L–M) flowers and fruit, respectively, of Syzygium glenum Craven. (Reproduced with permission from
G. Sankowsky (A–D, G, H, K–M), L. Craven (F, I) and E. Biffin (J).)
FIGURE 16.5 (A colour version of this figure follows page 240) Flowers, fruit and foliage of species of the
Syzygium group. (A–B) buds and flowers, and fruit, respectively, of Acmenosperma pringlei B. Hyland;
(C) Syzygium wilsonii subsp. cryptophlebium (F. Muell.) B. Hyland; (D) fruit of Syzygium elegans (Brongn.
and Gris) J.W. Dawson; (E–G) habit, young leaves, and buds and flowers, respectively, of Syzygium acre
(Pancher ex Guillaumin) J.W. Dawson; (H) fruit of Syzygium cormiflorum (F. Muell.) B. Hyland; (I) flowers
of Syzygium boonjee B. Hyland; (J) flower of Syzygium sp.; (K) flowers of Syzygium balansae (Guillaumin)
J.W. Dawson; (L) fruit of Syzygium maraca Craven and Biffin; (M) young fruit of Syzygium sp. (Reproduced
with permission from A. Ford (A–B), G. Sankowsky (C, H, I, L), E. Biffin (D), L. Craven (E–G, K, M) and
J. Dowe (J).)
and structure of the fruit were very uniform, and the habit of the plants themselves was generally
uniform. Wight did acknowledge the practical difficulty of not having some taxonomic substructure
for such a species rich genus as Eugenia had now become. Therefore, he recognised each of the
five genera he had merged into Eugenia as subgenera using the same epithets. Wight’s solution for
classifying all the large seeded species also had the effect of stabilising nomenclature, and from
the comment made by Bentham49, this appeared to be one consideration for adoption of the same
taxonomy by Bentham and Hooker50 in their influential Genera Plantarum. Meanwhile researchers,
mainly Dutch botanists working in the Malesian region, were continuing to describe new species
in Jambosa and/or Syzygium (for example, Blume51 and Miquel52). As more novel morphological
variation was encountered, new genera were also described from Malesia and the South West Pacific
(for example, Acicalyptus A. Gray, Aphanomyrtus Miq., Clavimyrtus Blume, Cleistocalyx Blume,
Cupheanthus Seem., Pareugenia Turrill and Piliocalyx Brongn. and Gris.). Where the various
segregate genera were known to Bentham and Hooker50, they were all reduced to Eugenia in Genera
Plantarum and therein assigned to one of the three sections they recognised, sect. Jambosa, sect.
Syzygium or sect. Eugenia. Although Bentham and Hooker’s circumscription of Eugenia was
followed by many taxonomists for over 100 years, it was not universally accepted.
Late in the century, Niedenzu’s account of Myrtaceae was published in Die Natürlichen
Pflanzenfamilien15. This work is similar to that of De Candolle47 in that Eugenia is retained for the
New World species (with a very few Old World species that clearly were part of this grouping)
and the very great majority of the Old World species were assigned to four other genera. The Old
World genera he recognised were Acicalyptus, Jambosa (including Cleistocalyx among others),
Piliocalyx and Syzygium (including Acmena among others). The narrower generic concepts for the
Eugenia group were welcomed by those taxonomists of the following century who believed
Bentham and Hooker’s broad circumscription50 to be unsatisfactory.
In summary of this section, Bentham and Hooker’s ‘all in one’ concept of genus50 provided
nomenclatural stability but was, by the end of the nineteenth century, often deemed unsatisfactory.
Investigations of Syzygium and Eugenia by other researchers in the second half of the century
have contributed significantly to the debate. Ingle and Dadswell61 studied the wood anatomy of
Myrtaceae in the South West Pacific and concluded that the Eugenia s.l. species sampled fell into
two distinct groups. A few species agreed anatomically with the New World species of Eugenia s.s.
but the majority were distinct from these and comprised species of Acmena, Cleistocalyx and
Syzygium. Pike62 found that pollen morphology supported the conclusions of Ingle and Dadswell61.
Pike further noted that the pollen of the Eugenia s.s. species examined resembled that of the
subtribes Myrtinae and Myrciinae, a finding of significance in the light of the recent work of Wilson
et al.20, in which Eugenia and Syzygium are placed in different tribes, that is, Myrteae and Syzygieae,
respectively.
Floral anatomical investigations by Schmid13 provided strong evidence that Eugenia s.s. and
the Syzygium group were not as closely related as believed by many earlier workers. Schmid13
considered that neither Eugenia nor Syzygium were directly ancestral to the other and that their
divergence occurred long ago. This view was supported by the phylogenetic analysis of mor-
phological and anatomical data by Johnson and Briggs19. This study indicated that Eugenia
formed a clade with other Myrtoideae genera (for example, Austromyrtus (Nied.) Burret, Myrcia
DC. ex Guill., Myrtus and Psidium L.), whereas Syzygium was in a clade with Acmena and other
Old World species remote from the Myrtoideae s.s. clade. Leaf anatomy has been studied in
Malay Peninsula species of Eugenia sects. Acmena, Cleistocalyx, Fissicalyx and Syzygium by
Khatijah et al.63. The results supported the recognition of sect. Acmena but not of sects. Cleistocalyx
and Fissicalyx, which were found to be similar to sect. Syzygium. Haron and Moore64 in a study
of leaf micromorphology of Old and New World Eugenia s.l. species, that is, species referable
to Syzygium and Eugenia s.s., found that there were differences in foliar features between the
two groups.
In summary of this section, twentieth-century authors have either adopted Bentham and
Hooker’s ‘all in one’ concept of genus50 or accepted Eugenia for the New World centred species
and varying numbers of genera for the Old World syzygioid species.
Khatijah et al.63, on the basis of 25 Malesian species surveyed, showed that Henderson’s53 groups
2 and 3 within Syzygium have paracytic stomata, whilst those in his group 4 are anisocytic; further
investigation of this promising line of research has yet to be undertaken. The intrusive material present
in the seeds of Acmena, Acmenosperma and Piliocalyx is an intriguing phenomenon. Hartley and
Craven68, in studies on Acmena, reported that the intrusive tissue was of placental origin. Work in
progress by Biffin indicates that the tissues in Acmena may be derived from the chalaza and be
homologous with tissues that surround the seed in several species of Syzygium s.s. The embryology
of the Syzygium group is another area of research that may be of systematic significance. Both
unitegmic and bitegmic ovules have been observed69,70, and this promising work is being continued.
In summary of this section, new morphological and anatomical analysis has brought forward
promising characters that may be of considerable taxonomic importance. However, in the majority
of cases, further analysis is needed before their significance can be adequately assessed.
FIGURE 16.6 Strict consensus tree of 10,000 trees derived from a combined ndhF, matK and rpl16 dataset
from a representative sampling of Syzygium s.l. Terminal taxa are bracketed as Group I, Group II, Group III,
Group IV and outgroups. Bold branches have BS ≥ 90%; length 877; CI = 0.831; RI = 0.829. Names in bold
indicate species referable to Cleistocalyx sensu Merrill and Perry54.
FIGURE 16.7 50% bootstrap consensus tree for the ITS data set from a representative sampling of Syzygium
s.l. Data analysed under parsimony with transversions receiving four times the weight of transition
substitutions. Terminal taxa are bracketed as Group I, Group II, Group III, Group IV and outgroups. Bold
branches have BS ≥ 90%. Names in bold indicate species referable to Cleistocalyx sensu Merrill and Perry54.
The ITS data set shows a strong bias towards transition substitutions (C↔T, A↔G), and the tree
(Figure 16.7) was derived with transversion substitutions receiving four times the weight of
transitions. The ITS data provide moderate to strong support for clades consistent with Groups I, III
and IV in the chloroplast data, although Group II is not resolved as monophyletic, and the
relationships of Anetholea and S. wesa are also unresolved. It is important to note, however, that
areas of disagreement between the ITS and chloroplast data are only weakly supported, or are not
statistically supported, in the ITS data. This is consistent with the hypothesis that these data are
uninformative regarding some relationships within Syzygium s.l., rather than suggesting an alternative,
conflicting resolution.
For instance, we note that, at moderate to high levels of sequence divergence, transition substitutions
are saturated (that is, there is a high probability of unobserved substitutions occurring at some
nucleotide positions), and as such, the historical signal may be obscured by ‘noise’. On the other
hand, congruence between our data sets increases our confidence in the recognition of Groups I,
III and IV. Additional lines of evidence, including further sequences of nuclear DNA and from
morphology, will be required to confidently resolve relationships of the ‘acmenoid’ taxa (Group II),
S. wesa and Anetholea.
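The differential weighting described above can be made concrete with a short sketch. The Python below is illustrative only and is not the software used for the analysis; the function names, and the decision simply to skip gaps and ambiguity codes, are assumptions. It scores aligned positions so that a transversion costs four steps and a transition one, which is the cost scheme a weighted parsimony search would sum over the branches of a tree. It does not perform a tree search.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_cost(a, b, transversion_weight=4, transition_weight=1):
    """Return the step cost of changing base a into base b.

    A change within the purines (A<->G) or within the pyrimidines (C<->T)
    is a transition; any change between the two classes is a transversion.
    """
    a, b = a.upper(), b.upper()
    if a == b:
        return 0
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return transition_weight if same_class else transversion_weight


def weighted_difference(seq1, seq2, **weights):
    """Sum the weighted costs over the positions of two aligned sequences.

    Positions where either sequence has a gap or an ambiguity code are
    skipped (an assumption made to keep the sketch short).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    valid = PURINES | PYRIMIDINES
    total = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in valid and b in valid:
            total += substitution_cost(a, b, **weights)
    return total


if __name__ == "__main__":
    # One transition (A->G) plus one transversion (G->T): 1 + 4 = 5
    print(weighted_difference("ACGT", "GCTT"))
```

Under this scheme a single transversion contributes as much as four transitions, so the class of change that the text identifies as saturated carries less influence on the result.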
In summary of this section, molecular data have suggested that current, largely morphologically
derived, generic characterisations are flawed.
TABLE 16.1
Current and Recently Published Floristic Research in Syzygium s.l.
Region Project Status Author(s)
There are significant impediments to floristic and systematic research on Syzygium. The group
is badly undercollected in many parts of its geographic range. Parnell77 showed that the distribution
of even the most common species of Thai Syzygium showed significant false gaps, which could be
filled in by subsequent collecting, a process that has not yet been even closely approached in
Thailand (and therefore most of South East Asia). Furthermore, Parnell et al.78 showed that Thailand
was severely undercollected, with a low collecting density and low rate of collecting activity. Such
undercollecting is typical of most countries where Syzygium is native. In addition to this lack of
floristic survey, Syzygium species are infrequent flowerers, and nonflowering material is generally
abhorred by tropical collectors, as usually it cannot be named to species. Therefore, even when
areas are thoroughly sampled over a one- to two-year period, species are passed over. Another
limitation is due to the inadequate representation in herbaria of the reproductive stages necessary
for complete descriptions and key preparation, for rarely does a species carry both flowers and fruit
at the same time. Other limitations equally applicable to all plant groups but especially critical for
groups of exceptional size are that the world’s herbaria are understaffed and that the largest are
located, through historical accident, in Europe79–82. The major collections are therefore removed
from the centres of diversity of Syzygium, and this does not aid field study. Current projects that
will result in the easier exchange of data through imaging will mitigate this problem.
Despite the research activity described in Table 16.1, there are a number of countries or areas
where no adequate floristic account exists or is realistically projected, including the Philippines,
Sulawesi and Sumatra. Areas where much further detailed work is needed include Peninsular
Malaysia, Kalimantan and the Andaman and Nicobar Islands, and these therefore could form the
focus for involvement of new workers on the genus. The presence of novel taxonomic data
retrievable only from the above areas cannot be ruled out. Such data, if they exist, could have a
dramatic impact on the structure of any phylogenetic hypothesis.
In summary of this section, current floristic activity is high, but nevertheless there are significant
gaps, which may be of evolutionary significance.
Free92 indicated that S. aromaticum may be a nonobligate apomict. The data in Boulter et al.88
indicate that S. sayeri may be able to act as an agamosperm, but that agamospermy is very much
less successful than outcrossing when measured by the number of successfully pollinated buds. A
further complication is the inconsistent exhibition of adventitious polyembryony in certain species,
for example Syzygium cumini (L.) Skeels, wherein it is combined with reportedly varying levels of
polyploidy16.
There are no data on the frequency of occurrence of apomixis, nor varying ploidy levels, nor
the frequency of adventitious embryony in Syzygium. Nevertheless, it is clear that the description
of species based on few collections, when combined with a very low collecting rate and density
over much of the range of Syzygium, and the potential for apomixis and ploidy variation might
result in the false delimitation of many taxa as new species which are, at best, microspecies. As
far as we are aware, the suggestion that a significant number of microspecies may exist in Syzygium
is novel.
The anthers in many Syzygium species have conspicuous, although small, glands associated
with the connective which appear secretory. In addition, the petals of Syzygium are also often
glandular. Discussion of the function, if any, of these glands in Syzygium is almost nonexistent,
and details of the chemical composition of the glands’ secretions are unknown. However, the
secretions are clearly variable in quantity and, probably, in composition and function; and gland
density has been used as a taxonomic characteristic (Chantaranothai and Parnell28). Their further
study may offer novel taxonomic data and insights on breeding biology as has been suggested for
another Myrtaceous genus, Verticordia DC. (Ladd et al.93,94).
In summary of this section, the breeding biology of Syzygium is underinvestigated. Various
systems ranging from inbreeding to outbreeding occur, and the lack of information on their
frequency of occurrence and distribution is a considerable impediment to understanding the
delimitation and evolution of species in the genus.
ambiguous and impossible to recognise without both flower and fruit. In addition, Craven implies
that novel data, especially molecular data, are likely to suggest splitting of Syzygium in unforeseen
ways which will then allow the erection of robust, phylogenetically defined genera, and that it is
unwise to set up new genera in the interim. Based upon a synthesis of the presently available
molecular and morphological evidence, however, Craven and Biffin consider that the species under
discussion constitute a single, natural group, and that all should be classified in Syzygium with an
infrageneric classification that reflects the evolutionary relationships of the constituent clades. By
contrast, Parnell believes that the inclusion of the majority of new species within an expanded
concept of Syzygium might make it more polyphyletic and overstretch the genus boundaries. He
argues that knowledge of phylogeny will always be imperfect and favours the erection of separate
genera to accommodate such new species based on a sufficiency of current evidence. He believes
that any degree of predictability which could be derived from the current classification will be
diluted by cramming all the currently split off genera (for example, Acmena, Acmenosperma,
Cleistocalyx, Piliocalyx and Waterhousea) along with new, probably generically distinct taxa, into
an ever-expanding Syzygium. Neither Craven’s nor Parnell’s methodology eliminates the necessity
for future species transfer between genera; rather, both admit that it will be necessary. However,
they have not agreed which procedure will be minimally disruptive, producing the smallest number
of intergeneric transfers. It does not appear that Article 34 of the International Code of Botanical
Nomenclature95 can be stretched to resolve this problem, as neither Craven nor Parnell suggests that
the species described are invalid, nor do they suggest that the new species might not belong to
Syzygium.
This debate raises the question as to whether strict monophyly should be the overwhelming
consideration for classification of such a species rich group, and if it is, how (that is, on what basis)
it is to be established. Clearly, it is unlikely that sufficient numbers of strictly monophyletic lineages
can be established in the short to medium term in such a species rich, widespread and poorly known
genus as Syzygium, where molecular data are limited and phylogenetically promising morphological
data are still being discovered. However, if major clades can be identified, then the task of classifying
the genus will be facilitated because researchers will be able to narrow down the number of species
included in their studies. Despite attempts to utilise morphological data cladistically by Parnell12
and Craven96, the lack of resolution suggests that, as in many other genera, morphology by itself
will be inadequate for the task. It may be, as Olmsted and Scotland97 argue, that molecular data
offer ‘more and better data’ to reconstruct phylogeny. Results of analysis of the chloroplast ndhF,
matK and rpl16 data are summarised in Figure 16.6 and are generally congruent with the ITS data
(Figure 16.7). The reliability of ITS data for phylogenetic reconstruction has been questioned by
Álvarez and Wendel98, and Biffin99 is presently investigating the utility of the nuclear encoded large
subunit of RNA polymerase (rpb2) as a source of data for a second nuclear region. Whether general
congruence is a sufficient measure indicating accurate reconstruction of phylogeny requires further
debate.
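One simple way to put a number on "general congruence" is to count the clades that two rooted trees share. The Python below is a minimal sketch under stated assumptions: trees are written as nested tuples of taxon names, the function names are hypothetical, and the two toy trees use placeholder taxa rather than the topologies of Figures 16.6 and 16.7. Because the two data sets sample different species, each tree is first restricted to the taxa common to both.

```python
def leaf_sets(tree):
    """Return (leaves, clades) for a rooted tree given as nested tuples.

    `leaves` is the frozenset of taxon names below the node; `clades` is the
    set of leaf sets of every internal node at or below it.
    """
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    clades = set()
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades |= child_clades
    clades.add(leaves)
    return leaves, clades


def informative_clades(tree, keep):
    """Clades of `tree` restricted to the taxa in `keep`.

    Restricted clades with fewer than two taxa, or containing every kept
    taxon, say nothing about relationships and are dropped.
    """
    _, clades = leaf_sets(tree)
    restricted = {clade & keep for clade in clades}
    return {clade for clade in restricted if 1 < len(clade) < len(keep)}


def clade_congruence(tree_a, tree_b):
    """Return (shared, only_in_a, only_in_b) clades over the common taxa."""
    taxa_a, _ = leaf_sets(tree_a)
    taxa_b, _ = leaf_sets(tree_b)
    common = taxa_a & taxa_b
    a = informative_clades(tree_a, common)
    b = informative_clades(tree_b, common)
    return a & b, a - b, b - a


if __name__ == "__main__":
    # Hypothetical toy trees with placeholder taxon names.
    tree_a = ((("A", "B"), "C"), ("D", "E"))
    tree_b = ((("A", "B"), "D"), ("C", "E"))
    shared, only_a, only_b = clade_congruence(tree_a, tree_b)
    print(len(shared), len(only_a), len(only_b))
```

A clade found in only one tree is not necessarily in conflict with the other: in a consensus tree it may simply be unresolved. A count of this kind therefore measures agreement, not accuracy, which is the distinction the paragraph above leaves open.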
Strict adherence to the concept of monophyly may also be operationally infeasible. Our work
suggests that the variation patterns of Syzygium s.s. species in South East Asia and Australasia are
different, that there are many species awaiting description and naming, and that a uniform species
concept may, in part due to different breeding systems, be inapplicable. We believe that such basic
descriptive work is best undertaken in a phylogenetic framework. However, it is clear that in Syzygium,
although the overall framework is being developed as a result of the studies by Harrington and
Gadek71 and Biffin et al.72,73, the detail will take longer and requires much work. It is important
that this work be undertaken for the extremely large Jambosa-Syzygium clade (Group I), not only
because of its extreme size (c. 1,000 species are involved) but also because this group contains the
economically important fruit and spice species. Although morphology may lack the strength of
molecular sequence data for phylogenetics in Syzygium, it still has much to offer the Syzygium
systematist. Apart from its obvious significance for identification, morphology will be important
in characterising the clades recovered from analysis of sequence data.
If monophyly is not given prime place, then this raises the issue of what drives classification
and nomenclature. We do not believe that historical precedent and convenience should be the sole
pilots of classification. So, for example, we welcome the transfer of inappropriately placed species
from Eugenia to Syzygium, as no rational systematist now argues that they are closely related.
Unsatisfactory as it might seem, we believe that what will drive classification and nomenclature in
Syzygium, and the segregation of related genera etc., will be similar arguments, essentially based
on an ‘unassailable mass’ of evidence. We differ, however, in our consideration of what is sufficient
mass. Guidance may well be provided by molecular data. Where phylogenetic trees of large genera
constructed on the basis of a few exemplar species, or only a single gene, challenge currently
accepted patterns, we certainly suggest that those patterns and their underlying causes be re-examined
and the testing expanded. Taxonomic change should not be undertaken rashly. We have shown that
DNA sequence data do offer new insights into Syzygium, especially at the higher levels; however,
it is unclear at present how robust those insights are. Advances in phylogenetic reconstruction
methods may be needed for the large datasets that might derive from large groups such as Syzygium.
One promising development is continuous jackknife function analysis100, which appears not yet to
have been used for datasets of this size. It may prove
an important tool allowing assessment of the stability of large group phylogenies and impartial
assessment of the achievement of ‘unassailable mass’.
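The general idea of tracking how stable a result remains as characters are removed can be sketched briefly. The Python below is a toy stand-in and is not the published procedure cited as reference 100: instead of rebuilding a tree for each replicate, it asks only whether a taxon keeps the same nearest neighbour (by a simple count of differing sites) when a given proportion of alignment columns is deleted at random. The function names, the alignment format (a dictionary of equal-length sequences) and the choice of statistic are all assumptions made for illustration.

```python
import random


def hamming(s1, s2, columns):
    """Count the positions in `columns` at which two sequences differ."""
    return sum(1 for i in columns if s1[i] != s2[i])


def nearest_neighbour(alignment, taxon, columns):
    """Name of the taxon closest to `taxon`; ties are broken alphabetically."""
    others = sorted(name for name in alignment if name != taxon)
    return min(
        others,
        key=lambda name: hamming(alignment[taxon], alignment[name], columns),
    )


def jackknife_curve(alignment, taxon, proportions, replicates=200, seed=0):
    """Map each deletion proportion to the recovery rate of the reference result.

    The reference result is the nearest neighbour found with all columns.
    For each proportion p, `replicates` random subsets retaining a fraction
    (1 - p) of the columns are drawn without replacement.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    rng = random.Random(seed)
    all_columns = range(n_sites)
    reference = nearest_neighbour(alignment, taxon, all_columns)
    curve = {}
    for p in proportions:
        if not 0 <= p < 1:
            raise ValueError("deletion proportions must lie in [0, 1)")
        n_keep = max(1, round(n_sites * (1 - p)))
        hits = 0
        for _ in range(replicates):
            columns = rng.sample(all_columns, n_keep)
            if nearest_neighbour(alignment, taxon, columns) == reference:
                hits += 1
        curve[p] = hits / replicates
    return curve
```

A result that is recovered at a high rate even when most characters are deleted is, in this limited sense, stable; one whose recovery rate falls away quickly is being held up by few characters. A real application would replace the nearest-neighbour statistic with a full tree search for every replicate.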
Another difficulty is the question of whether locally distinctive species groups (for example,
the Fijian species assigned to Cleistocalyx, the Papuasian species of the Syzygium furfuraceum
Merr. and L.M. Perry group and the trimerous New Caledonian species) should be given recognition
at some level. If this is done, it is likely to result in a paraphyletic classification with a very large
number of comparably ranked taxa that had to be established merely to ‘balance’ the classification.
In part, we are here concerned with a conflict between the operability and utility of classifications
and their predictability and monophyly. In general, we accept the thrust of the letter coordinated
by Nordal and Stedje101 which advocates the acceptance of paraphyletic taxa (at least for Floras)
and on this basis, there is no reason not to allow for the recognition of locally distinctive species
groups.
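Whether recognising a locally distinctive group would leave the remainder non-monophyletic can be checked mechanically once a rooted tree is available. The Python below is a minimal sketch; the nested-tuple tree format, the function names and the placeholder taxa are assumptions, and the toy tree is not taken from the analyses reported here. A group is treated as monophyletic when the smallest clade containing all of its members contains nothing else.

```python
def leaves_of(tree):
    """Return the frozenset of taxon names below a node (nested tuples)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves_of(child) for child in tree))


def smallest_clade_containing(tree, group):
    """Leaf set of the smallest clade that includes every member of `group`."""
    group = frozenset(group)
    if not group:
        raise ValueError("group must contain at least one taxon")
    if not group <= leaves_of(tree):
        raise ValueError("group contains taxa that are not in the tree")
    node = tree
    while not isinstance(node, str):
        for child in node:
            if group <= leaves_of(child):
                node = child  # the whole group sits inside this child
                break
        else:
            break  # the group is split between children: stop here
    return leaves_of(node)


def is_monophyletic(tree, group):
    """True if `group` is exactly the leaf set of some clade in `tree`."""
    return smallest_clade_containing(tree, group) == frozenset(group)


if __name__ == "__main__":
    # Hypothetical tree: a distinctive pair (a, b) nested within a larger group.
    tree = ((("a", "b"), ("c", "d")), "e")
    segregate = {"a", "b"}
    remainder = leaves_of(tree) - segregate
    print(is_monophyletic(tree, segregate))  # the segregate itself
    print(is_monophyletic(tree, remainder))  # what is left behind
```

In the toy tree the segregate is a clade, but the species left behind are not, because their smallest containing clade also includes the segregate. This is the situation the paragraph above describes: recognising the distinctive group at a given rank renders the residue paraphyletic unless further taxa are erected.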
Clearly, Syzygium is unusual in size and an obvious question is ‘why is it so big?’. For example,
we may want to know if there are any key innovations that can correlate with, or explain, diversification
patterns (see Davies and Barraclough, Chapter 10; Hodkinson et al., Chapter 17). In some other
large genera, there appears to be a uniting apomorphy of great importance in driving speciation;
in Solanum it may be buzz pollination, in Euphorbia it may be the cyathium, in Ficus it may be
the fig (i.e., the syconium), in the Compositae it may be either a specialised incompatibility
mechanism linked to specialised pollination mechanisms or the development of chemical poisons.
In Syzygium it may be invidious to single out only one key innovation; perhaps, it is better to
consider Syzygium’s combination of features as innovative.
Research areas that we believe will be personally rewarding to study, and which are important
to pursue from the biodiversity perspective include the following:
• Resolving interrelationships of the 80–90% of the genus that comprises Group I (as
defined on the basis of molecular analysis), that is the Jambosa-Syzygium s.s. clade. This
will be a major task, given the sampling issues posed by the geographical distribution
of the group, let alone the identification of suitable DNA sequence regions for analysis.
• Completing floristic surveys of the major regions not yet investigated adequately, especially
Myanmar, Peninsular Malaysia, Kalimantan, Sulawesi and the Philippines.
• Developing an understanding of the biogeography of the major clades, especially of their
prehistorical biogeography.
• Investigating the breeding systems to establish to what extent, if any, there are implications
for taxonomy from factors such as apomixis, hybridisation and introgression.
• Studying evolutionary phenomena, such as r and K adaptive strategies.
In conclusion, we regard the size of Syzygium as a positive, even though we acknowledge there
are caveats on logistical grounds, as it offers opportunities for the initiation of major and stimulating
research projects well into the twenty-first century and beyond. The enormous structural diversity
embodied in the plants themselves, their habit, foliage, their often highly attractive flowers and
fruit, their manifestly diverse ecology and wide geography, and their interactions with the abiotic
environment and with animals, including man, all ensure that exciting and meaningful research is limited only by
money and imagination.
ACKNOWLEDGEMENTS
We wish to thank various agencies and individuals whose data and support have contributed to this
chapter. John Parnell thanks the EU for support under the Marie Curie Scheme for various post-
doctoral fellows and under the Human Capital and Mobility Scheme, the Trinity Trust and Trinity
College Dublin (TCD) for sponsorship of various postgraduate students, especially Professor
Pranom Chantaranothai, and all of the herbaria, especially TCD and all others listed in their
publications, without whose collections and support this chapter would have been unconstructible.
Lyn Craven and Ed Biffin acknowledge support from the Pacific Biological Foundation, CSIRO
and ANU, and the many individuals and institutions who generously have provided material,
information and field and other assistance. Ed Biffin holds an ABRS Postgraduate Scholarship from
the Australian Biological Resources Study and a Scholarship from the Australian National University.
REFERENCES
1. George, A.S., Myrtaceae, Family description, in Fl. Australia 19, George, A.S., Ed., Australian
Government Publishing Service, Canberra, 1988, 1.
2. Johnson, L.A.S. et al., Myrtaceae, in Flowering Plants in Australia, Morley, B.D. and Toelken, H.R.,
Eds., Rigby, Willoughby, 1988, 175.
3. Kochummen, K.M., Eugenia, in Tree Flora of Malaya 3, Ng, F.S.P., Ed., Longman, London, 1995, 172.
4. Lemmens, R.H.M.J., Syzygium, in PROSEA (Plant Resources of South East Asia) 5 Timber Trees:
Minor Commercial Timbers, Eds. Lemmens, R.H.M.J., Soerianegara I., and Wong, W.C., Backhuys,
Leiden, 1995, 441.
5. Mabberley, D.J., The Plant Book, 2nd ed., Cambridge University Press, Cambridge, 1997.
6. Schmid, R., Comparative anatomy and morphology of Psiloxylon and Heteropyxis, and the subfamilial
and tribal classification of Myrtaceae, Taxon, 29, 559, 1980.
7. Craven, L.A., Myrtaceae of New Guinea, in Ecology of Papua, Conservation International, in press.
8. Chippendale, G.M., Eucalyptus, in Fl. Australia 19, George, A.S., Ed., Australian Government
Publishing Service, Canberra, 1988, 1.
9. Craven, L.A., Unravelling knots or plaiting rope: what are the major taxonomic strands in Syzygium
sens. lat. (Myrtaceae) and what should be done with them? in Taxonomy: The Cornerstone of
Biodiversity Proc. Fourth Fl. Males. Symp., Saw, L.G., Chua, L.S.L., and Khoo, K.C., Eds., Forest
Research Institute, Malaysia, Kuala Lumpur, 2001, 75.
10. Craven, L.A., Four new species of Syzygium (Myrtaceae) from Australia, Blumea, 48, 479, 2003.
11. Craven, L.A. and Biffin, E., Anetholea anisata transferred to, and two new Australian taxa of, Syzygium
(Myrtaceae), Blumea, 50, 157, 2005.
12. Parnell, J., Numerical analysis of Thai members of the Eugenia-Syzygium group (Myrtaceae), Blumea,
44, 351, 1999.
13. Schmid, R., A resolution of the Eugenia-Syzygium controversy Myrtaceae, Amer. J. Bot., 59, 423,
1972.
14. McVaugh, R., The genera of American Myrtaceae: an interim report, Taxon, 17, 354, 1968.
15. Niedenzu, F., Myrtaceae, in Die Natürlichen Pflanzenfamilien 3, Abteilung 7, Engler, A. and Prantl,
K., Eds., Engelmann, Leipzig, 1893, 57.
16. Nic Lughadha, E. and Proença, C., A survey of the reproductive biology of the Myrtoideae Myrtaceae,
Ann. Missouri Bot. Gard., 83, 480, 1996.
17. Hora, F.B., Myrtaceae, in Flowering Plants of the World, Heywood, V.H., Ed., Oxford University
Press, Oxford, 1978, 161.
18. Briggs, B.G. and Johnson, L.A.S., Evolution of the Myrtaceae: evidence from inflorescence structure,
Proc. Linn. Soc. New South Wales, 102, 157, 1979.
19. Johnson, L.A.S. and Briggs, B.G., Myrtales and Myrtaceae: a phylogenetic analysis, Ann. Missouri
Bot. Gard., 71, 700, 1984/5.
20. Wilson, P.G. et al., Relationships within Myrtaceae sensu lato based on a matK phylogeny, Pl. Syst.
Evol., 251, 3, 2005.
21. Ashton, P.S., Myrtaceae, in A Rev. Handbook Fl. Ceylon 2, Dassanayake, M.D., Ed., Balkema,
Rotterdam, 1981, 403.
22. Kostermans, A.J.G.H., Eugenia, Syzygium and Cleistocalyx (Myrtaceae) in Ceylon, Quart. J. Taiwan
Mus., 34, 117, 1981.
23. Simpson, S.C. and Weiner, E.S.C., The Oxford English Dictionary, Book Club Associates for Oxford
University Press, London, 1989.
24. Willis, J.C., Age and Area, Cambridge University Press, Cambridge, 1922.
25. Minelli, A., Biological Systematics: The State of the Art, Chapman and Hall, London, 1993.
26. Minelli, A., Fusco, G., and Sartori, S., Self-similarity in biological classifications, Biosystems, 26, 89, 1991.
27. Frodin, D.G., History and concepts of big plant genera, Taxon, 53, 753, 2004.
28. Chantaranothai, P. and Parnell, J., Syzygium, in Fl. Thailand 7, Santisuk, T. et al., Eds., Forest
Herbarium, Royal Forest Department, Bangkok, 2002, 811.
29. Turner, I.M., Myrtaceae, in A Catalogue of the Vascular Plants of Malaya: Gardens’ Bull. Singapore,
47, 370, 1995.
30. Chantaranothai, P. and Parnell, J., A revision of Acmena, Cleistocalyx, Eugenia s.s. and Syzygium
(Myrtaceae) in Thailand, Thai Forest Bull., 21, 1, 1994.
31. Gamage, H.K., Ashton, M.S. and Singhakumara, B.M.P., Leaf structure of Syzygium spp. (Myrtaceae)
in relation to site affinity within a tropical rain forest, Bot. J. Linn. Soc., 141, 365, 2003.
32. Poonswad, P., Nest site characteristics of four sympatric species of hornbills in Khao Yai National
Park, Thailand, Ibis, 137, 183, 1995.
33. FAOSTAT, Food and Agricultural Organisation Statistical Data, http://faostat.fao.org/faostat, 2005.
34. Oyen, L.P.A. and Xuan, Dung, N., Introduction, in PROSEA (Plant Resources of South East Asia)
19: Essential Oils, Oyen, L.P.A. and Xuan Dung, N., Eds., Backhuys, Leiden, 1999, 15.
35. Sardjono, S., Syzygium polyanthum (Wight) Walpers, in PROSEA (Plant Resources of South East
Asia) 13: Spices, de Guzman, C.C. and Siemonsma, J.S., Eds., Backhuys, Leiden, 1999, 218.
36. Panggabean, G., Syzygium, in PROSEA (Plant Resources of South East Asia) 2: Edible Fruits and
Nuts, Oyen, L.P.A. and Xuan Dung, N., Eds., Backhuys, Leiden, 1992, 292.
37. Vinning, G. and Moody, T., Wax apple, in A Market Compendium of Tropical Fruit, RIRDC, Barton,
ACT, 1997, 267.
38. Hyland, B.P.M., A revision of Syzygium and allied genera (Myrtaceae) in Australia, Austral. J. Bot.
Suppl. Ser., 9, 1, 1983.
39. Djadjo Djipaa, C., Delmée, M., and Quetin-Leclercq, J., Antimicrobial activity of bark extracts of
Syzygium jambos (L.) Alston (Myrtaceae), J. Ethnopharmacol., 71, 307, 2000.
40. Shafi, P.M., et al., Antibacterial activity of Syzygium cumini and Syzygium travancoricum leaf essential
oils, Fitoterapia, 73, 414, 2002.
41. Eddowes, P.J., Water gum, in Commercial Timbers Papua New Guinea: Their Properties and Uses,
Office of Forests, Port Moresby, 1977, 20.
42. Hartley, T.G. and Perry, L.M., A provisional enumeration of species of Syzygium Myrtaceae from
Papuasia, J. Arnold Arb., 54, 160, 1973.
43. Craven, L.A. and Matarczyk, C.A., Acmena, Acmenosperma, Eugenia, Syzygium, Waterhousea, in Fl.
Australia, Wilson, A., Ed., in press.
44. Chen, J. and Craven, L.A., Myrtaceae, in Fl. China, Zhengyi, W. Raven, P.H. and Deyuan, H., Eds.,
in press.
45. Stevens, P.F., On characters and character states: do overlapping and non-overlapping variation,
morphology and molecules all yield data of the same value, in Homology and Systematics, Scotland,
R. and Pennington, T., Eds., Taylor and Francis, London, 2000, 80.
46. Linnaeus, C., Species Plantarum, Impensis Laurentii Salvii, Stockholm, 1753.
47. De Candolle, A.P., Myrtaceae, in Prodr. Syst. Nat. Reg. Veg. 3, Treuttel and Würz, Paris, 1828, 207.
48. Wight, R., Myrtaceae, in Illustrations of Indian Botany 2, American Mission Press, Madras, 1841, 6.
49. Bentham, G., Notes on Myrtaceae, J. Linn. Soc., Bot. 10, 101, 1869.
50. Bentham, G. and Hooker, J.D., Myrtaceae, in Genera Plantarum 1, Reeve and Co., London, 1865, 690.
51. Blume, C.L., Myrtaceae, in Mus. Bot. Lugduno-Batavum, Brill, Leiden, 1850, 66.
52. Miquel, F.A.W., Myrteae, in Fl. Ned. Indië 1, Post, Amsterdam, Post, Utrecht, Fleischer, Leipzig,
1855, 407.
53. Henderson, M.R., The genus Eugenia (Myrtaceae) in Malaya, Gardens’ Bull. Singapore, 12, 1, 1949.
54. Merrill, E.D. and Perry, L.M., Reinstatement and revision of Cleistocalyx Blume (including Acicalyptus
A. Gray), a valid genus of the Myrtaceae, J. Arnold Arb., 18, 322, 1937.
55. Merrill, E.D. and Perry, L.M., A synopsis of Acmena DC., a valid genus of the Myrtaceae, J. Arnold
Arb., 19, 1, 1938.
56. Merrill, E.D. and Perry, L.M., On the Indo-Chinese species of Syzygium Gaertner, J. Arnold Arb., 19,
99, 1938.
57. Merrill, E.D. and Perry, L.M., The Myrtaceae of China, J. Arnold Arb., 19, 191, 1938.
58. Merrill, E.D. and Perry, L.M., The myrtaceous genus Syzygium Gaertner in Borneo, Mem. Amer. Acad.
Arts Sci., 18, 135, 1939.
59. Merrill, E.D. and Perry, L.M., Plantae Papuanae Archboldianae, IX, J. Arnold Arb., 23, 233, 1942.
60. Merrill, E.D., Readjustments in the nomenclature of Philippine Eugenia species, Phil. J. Sci., 79, 351,
1951.
61. Ingle, H.D. and Dadswell, H.E., The anatomy of the timbers of the South-west Pacific area, Austral.
J. Bot., 1, 353, 1953.
62. Pike, K.M., Pollen morphology of Myrtaceae from the South-west Pacific area, Austral. J. Bot., 4, 3, 1956.
63. Khatijah, H.H., Cutler, D.F., and Moore, D.M., Leaf anatomical studies of Eugenia L. (Myrtaceae)
species from the Malay Peninsula, Bot. J. Linn. Soc., 110, 137, 1992.
64. Haron, N.W. and Moore, D.M., The taxonomic significance of leaf micromorphology in the genus
Eugenia L. (Myrtaceae), Bot. J. Linn. Soc., 120, 265, 1996.
65. Parnell, J., Pollen of Syzygium (Myrtaceae) from S.E. Asia, especially Thailand, Blumea, 48, 303, 2003.
66. Belsham, S.R. and Orlovich, D.A., Development of the hypanthium in Acmena smithii and Syzygium
australe (Acmena alliance, Myrtaceae), Austral. Syst. Bot., 16, 621, 2003.
67. Porter, E.A., Nic Lughadha, E., and Simmonds, M.S.J., Taxonomic significance of polyhydroxyalkaloids
in the Myrtaceae, Kew Bull., 55, 615, 2000.
68. Hartley, T.G. and Craven, L.A., A revision of the Papuasian species of Acmena (Myrtaceae), J. Arnold
Arb., 58, 325, 1977.
69. Biffin, E., unpublished data, 2005.
70. Tobe, H., unpublished data, 2005.
71. Harrington, M.G. and Gadek, P.A., Molecular systematics of the Acmena alliance (Myrtaceae):
phylogenetic analyses and evolutionary implications with reference to Australian taxa, Austral. Syst.
Bot., 17, 63, 2004.
72. Biffin, E., Craven, L.A., Crisp, M.D., and Gadek, P.A., Molecular systematics of Syzygium and allied
genera (Myrtaceae): evidence from the chloroplast genome, Taxon, 55, 79, 2006.
73. Biffin, E. et al., Evolutionary relationships within Syzygium s.l. (Myrtaceae): molecular phylogeny
and new insights on morphology, Proc. Sixth Fl. Males. Symp., in press.
74. Smith, A.C., Myrtaceae, in Fl. Vitiensis Nova 3, Pacific Tropical Botanical Garden, Lawai, Hawaii,
1985, 289.
75. Dawson, J.W., Myrtaceae, Myrtoideae I: Syzygium, Fl. Nouvelle-Calédonie, 23, 1, 1999.
76. Utteridge, T., Review of checklist of woody plants of Sulawesi, Indonesia, Blumea Supplement 14,
Kew Bull., 59, 174, 2004.
77. Parnell, J., The conservation of biodiversity: aspects of Ireland’s role in the study of tropical plant
diversity with particular reference to the study of the flora of Thailand and Syzygium, in Biodiversity:
The Irish Dimension, Rushton, B.S., Ed., Royal Irish Academy, Dublin, 2000, 205.
78. Parnell, J.A.N. et al., Plant collecting spread and densities; their potential impact on biogeographical
studies in Thailand, J. Biogeogr., 30, 1, 2003.
79. Parnell, J., European plant systematics and the European Flora, in Systematics Agenda 2000: The
Challenge for Europe, Blackmore, S. and Cutler, D., Eds., Samara Publishing for the Linnean Society
of London, London, 1996, 31.
80. Roos, M.C., Charting tropical plant diversity: Europe’s contribution and potential, in Systematics
Agenda 2000: The Challenge for Europe, Blackmore, S., and Cutler, D., Eds., Samara Publishing for
the Linnean Society of London, London, 1996, 54.
81. Schram, F.R. and Los, W., Training systematists for the 21st century, in Systematics Agenda 2000:
The Challenge for Europe, Blackmore, S., and Cutler, D., Eds., Samara Publishing for the Linnean
Society of London, London, 1996, 89.
82. Walmsley, Baroness et al., What on Earth? House of Lords Select Committee on Science and
Technology Third Report, London, http://www.publications.parliament.uk/pa/ld200102/ldselect/ldsctech/
118/11802.htm, 2002.
83. Scott, A.J., Syzygium, in Fl. Mascareignes 92, Bosser, J. et al., Eds., MSIRI, Mauritius; ORSTOM,
Paris; RBG, Kew, 1990, 28.
84. Verdcourt, B., Syzygium, in Fl. Trop. East Africa, Beentje, H.J., Ed., Balkema, Rotterdam, Brookfield,
2001, 67.
85. Parnell, J. and Chantaranothai, P., Myrtaceae, in Fl. Laos, Cambodg. Viet., Vidal, J. and Hull, S., Eds.,
Muséum National d’Histoire Naturelle et Association Botanique Tropicale, Paris, in prep.
86. Ashton, P.S., Myrtaceae, in Tree Fl. Sabah Sarawak, Forest Research Institute, Malaysia, in prep.
87. Craven, L.A., Syzygium (Myrtaceae) in Papuasia, in prep.
88. Boulter, S.L. et al., Any which way will do: the pollination biology of a northern Australian rainforest
canopy tree (Syzygium sayeri; Myrtaceae), Bot. J. Linn. Soc., 149, 69, 2005.
89. Hopper, S.D., Pollination of the rainforest tree Syzygium tierneyanum (Myrtaceae) at Kuranda, Northern
Queensland, Aust. J. Bot., 28, 223, 1980.
90. Lack, A.J. and Kevan, P.G., On the reproductive biology of a canopy tree, Syzygium syzygioides
(Myrtaceae), in a rain forest in Sulawesi, Indonesia, Biotropica, 16, 31, 1984.
91. Chantaranothai, P. and Parnell, J., The breeding biology of some Thai Syzygium species, Trop. Ecol.
35, 199, 1994.
92. Free, J.B., Myrtaceae, in Insect Pollination of Crops, 2nd ed., Academic Press, London, 1993, 383.
93. Ladd, P.G., Parnell, J.A.N., and Thomson, G., Anther diversity and function in Verticordia DC.
(Myrtaceae), Pl. Syst. Evol., 219, 79, 1999.
94. Ladd, P.G., Parnell, J., and Thompson, G., The morphology of pollen and anthers in an unusual
myrtaceous genus (Verticordia), in Pollen and Spores: Morphology and Biology, Harley M.M.,
Morton C.M. and Blackmore S., Eds., Royal Botanic Gardens Kew, London, 2000, 325.
95. Greuter, W. et al., Int. Code Bot. Nomenclature (Saint Louis Code), Koeltz Scientific Books,
Königstein, 2000.
96. Craven, L.A., unpublished data, 2002.
97. Olmstead, R.G. and Scotland, R.W., Molecular and morphological datasets, Taxon, 54, 7, 2005.
98. Álvarez, I. and Wendel, J.F., Ribosomal ITS sequences and plant phylogenetic inference, Mol. Phyl.
Evol., 29, 417, 2003.
99. Biffin, E., unpublished data, 2005.
100. Miller, J.A., Assessing progress in systematics with continuous jackknife function analysis, Syst. Biol.,
52, 55, 2003.
101. Nordal, I. and Stedje, B., Paraphyletic taxa should be accepted, Taxon, 54, 5, 2005.
17 Supersizing: Progress in Documenting and Understanding Grass Species Richness
T. R. Hodkinson
Department of Botany, School of Natural Sciences, Trinity College Dublin,
Ireland
V. Savolainen
Molecular Systematics Section, Jodrell Laboratory, Royal Botanic Gardens,
Kew, Richmond, Surrey, England
S. W. L. Jacobs
National Herbarium, Royal Botanic Gardens Sydney, NSW, Australia
N. Salamin
Department of Ecology and Evolution, University of Lausanne, Switzerland
ABSTRACT
This paper reviews the progress in documenting and understanding species richness for one of the
most diverse and economically important groups of plants (the grasses; Poaceae). It discusses the
value of modern taxonomic resources and large phylogenetic trees for macro-evolutionary studies.
More specifically, it discusses the use of phylogenetic trees for detecting and dating major lineages,
investigating biogeographical origins, identifying patterns of diversification and investigating factors
leading to species richness. Theoretical and practical issues regarding the production of large phylogenetic
trees and supertrees of the grass family (c. 650 genera and 10,000 species) are also discussed.
It asks how far we are from complete tribal, generic and species phylogenetic trees of the grasses.
17.1 INTRODUCTION
If you can confidently say you know the grass family (Poaceae) you are misled. You are almost
certainly talking about a geographical area or one of its taxonomic groups (genera or tribes) because,
in terms of species richness, the family is vast. Even a specialist with lifelong devotion will only
just have begun to understand the diversity that exists within this family. It is the fifth most species
rich angiosperm family, ranking only behind Asteraceae (daisies), Fabaceae (beans), Orchidaceae
(orchids) and Rubiaceae (coffee family)1. Despite its size (651 genera and 10,000 species sensu
Clayton and Renvoize1; 635 genera and 9,000 species sensu Mabberley2), advances in grass taxonomy
and systematics have occurred faster than in most groups of plants because of the socioeconomic
and ecological importance of grasses. They cover, chiefly as grasslands or bamboo forests, more than one third
of the world’s land surface3 and provide staple cereals, sugar crops and reeds (such as Arundo, Avena,
Hordeum, Oryza, Phragmites, Saccharum, Secale, Sorghum, Triticum and Zea). They also include
many noncommercial and commercially bred forage and lawn species (such as the temperate species
in Alopecurus, Cynosurus, Dactylis, Festuca, Lolium, Phleum and Poa, or the tropical species in
Cynodon, Digitaria, Panicum, Paspalum, Pennisetum, Stenotaphrum, Urochloa and Zoysia).
Despite recent advances in grass systematics, few large phylogenetic trees of the family have
been produced. Grass phylogenetics is, in many ways, still in its infancy and lags behind classical
taxonomy in its coverage of species and genera. Phylogenetic studies such as those by the Grass
Phylogeny Working Group (GPWG)4 are helping to shape taxonomic treatments and better define
genera and species5,6 but are often based on limited sampling. Large phylogenetic trees are required
also for accurate inferences of macro-evolutionary processes7–9. It is desirable to sample most of
the diversity of taxa within a study group to reduce the risk of incorrect phylogenetic tree
reconstruction10–12 and to include most of the relevant information to make optimal use of the
evolutionary trees obtained7,13–15.
This paper focuses on the problems and prospects of documentation and furthering systematic
understanding of species rich groups using the grasses as a case study. The first part of the chapter
outlines progress that has been made in classification, in monographic/floristic studies, and in the
dissemination of taxonomic information via electronic resources and informatics. The second part
reviews the current state of grass phylogenetics for the study of patterns and processes in grass evolution.
The final part examines future prospects in documenting and understanding grass species richness and
discusses some of the theoretical and practical issues regarding the production of large phylogenetic trees.
[Bar chart. Horizontal axis: the 12 subfamilies Anomochlooideae, Pharoideae, Puelioideae, Bambusoideae, Ehrhartoideae, Pooideae, Aristidoideae, Danthonioideae, Arundinoideae, Chloridoideae, Centothecoideae and Panicoideae; axis order not recoverable.]
FIGURE 17.1 Distribution of species and genera in subfamilies of grasses. Light grey bars represent number
of genera and dark grey bars represent number of species. (Source: GPWG4.)
[Bar chart. Horizontal axis: the six subfamilies Bambusoideae, Pooideae, Centothecoideae, Arundinoideae, Chloridoideae and Panicoideae; axis order not recoverable.]
FIGURE 17.2 Distribution of species and genera in subfamilies of grasses. Light grey bars represent number
of genera and dark grey bars represent number of species. (Source: Clayton and Renvoize1.)
taxonomic journals are not yet available in this format. The main stumbling block is that, as yet,
new names are not recognised as legitimate if only published electronically. Digitally available
books are still a rarity. However, the trend over the next few decades will undoubtedly be towards
increased electronic publishing. A study commissioned by the British Library predicts that by 2020,
approximately 80% of UK book output will be available in electronic form, and approximately
40% (including research monographs) will only be available in this form21. There is also the potential
to digitise material that was not ‘born digital’ such as highly valuable existing taxonomic literature.
For example, the British Library has scanned various such works, including Shakespeare and
nineteenth-century newspapers21.
Taxonomists have also undertaken a number of major initiatives to provide web-based taxonomic
resources for all organisms, but this task is not small. Indeed, Wheeler et al.22 used the term
‘terascale taxonomy’ to describe the mountainous task of compiling information on trillions of
observations for 10 million or more species in museum collections. These include observations
on classification, nomenclature, phylogeny, morphology, physiology, ontogeny, ecology, behaviour,
geography and genome. Global initiatives include: the Catalogue of Life (a consortium involving
Species 2000; http://www.sp2000.org); the Integrated Taxonomic Information System (ITIS;
http://www.itis.usda.gov); the Global Taxonomy Initiative (GTI; http://www.biodiv.org); and the
Global Biodiversity Information Facility (GBIF; http://www.gbif.org). The Catalogue of Life consortium
aims to catalogue all known organisms and construct a web-based, freely accessible synonymic
index of species and associated data23. The GTI of the Convention on Biological Diversity
is involved in building taxonomic capacity and making taxonomic information available. The GBIF
is developing a network that links together dispersed but electronically available taxonomic
information24. The meshing together of informatics and taxonomy has therefore begun, and much
progress is being made in making taxonomic information available over the web23,25.
Electronic resources available to grass systematists include general plant bibliographic or
nomenclatural databases such as the Kew Bibliographic Databases (including the Kew Record of
Taxonomic Literature, the Plant Micromorphological Bibliographic Database and the Economic
Botany Bibliographic Database; http://www.kew.org/kbd/searchpage.do), Index Kewensis (available
on CD-ROM), the International Plant Names Index (IPNI; http://www.ipni.org) and W3
Tropicos (http://mobot.mobot.org/W3T/search/vast.html). Useful sources of grass taxonomic literature
can also be obtained electronically from J.F. Veldkamp, National Herbarium of the Netherlands, Leiden
(veldkamp@nhn.leidenuniv.nl), and a list of links to many web-based electronic resources for grass
systematists can be found at http://mobot.mobot.org/W3T/search/nwgc.html. Web-based databases
are also available for the family. Notable among these are the near complete descriptive treatments
of all grass genera, species and their synonymy such as the World Grass Species database and the
World Grass Species Synonymy database (http://www.rbgkew.org.uk/data/grasses), the Grass Genera
of the World database (http://delta-intkey.com/grass17) and the Catalogue of New World Grasses
(CNWG; http://mobot.mobot.org/W3T/search/nwgc.html).
There is also a move towards digitising herbarium specimens in the form of scanned images.
A photograph can never replace a specimen, but virtual herbaria have many uses and can be
particularly useful for type specimens. For example, the US National Herbarium’s Botanical Type
Specimen Register (http://ravenel.si.edu/botany/types), W3 Tropicos and the CNWG have specimen
information and digital images of many grass species including type specimens. Digital Floras are
also being produced. Examples include AusGrass26, an interactive key with maps, species descriptions
and line illustrations of Australian grasses, and the Grasses for North America project
(http://herbarium.usu.edu/webmanual) that has descriptions, keys, maps and illustrations.
One challenge to the global grass taxonomy community will be to produce a list of accepted
names (and synonyms) and to standardise taxonomic treatments. For example, W3 Tropicos and
the World Grass Species database are not fully congruent. There is clearly great potential to further
develop web-based resources, and these will help alleviate some of the problems associated with
access to herbarium collections and type specimens in particular. There is also a need for monographic
work at lower taxonomic rank such as genus. Monographs are steadily produced for the
grasses, but progress is slow, especially for large genera. The task of producing monographs is far
from equal for all grass genera (and angiosperms in general) because species are not distributed
randomly among genera. The frequency distribution of genera approximates a logarithmic curve
(the hollow curve) and is typical of angiosperm families (Hilu, Chapter 11)1,27. The distribution is
skewed toward monotypic genera and those with few species28–30. Furthermore there is a disproportionate
number of species in certain genera. A high percentage (c. 30%) of all Poaceae species
are in a few genera such as Agrostis, Bambusa, Digitaria, Eragrostis, Festuca, Poa, Panicum,
Paspalum and Stipa17, although these genera are slowly diminishing as new taxonomic treatments
become available and species are moved elsewhere. Furthermore, c. 3% of the genera contain 50%
of the grass species (Hilu, Chapter 11). Therefore, the concept of average generic size is almost
meaningless1. The biological and evolutionary reasons for these patterns are discussed in Frodin30
and Hilu (Chapter 11). These large genera clearly need to be revised if diversity within the family
is to be fully understood. The monophyly of many genera has also been questioned, including
Cortaderia31, Eragrostis32, Miscanthus5,6, Panicum33–35, Pennisetum36, Setaria36,37 and Sorghum38,39
(but see Dillon et al.40 for contrary evidence). Apomicts, such as many species in the large genus
Poa, also have their own taxonomic challenges (see Frodin30), as do polyploid complexes, which
often include apomicts41,42.
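The skew described above can be quantified by asking what share of genera is needed to account for half of the species. The following is a minimal sketch only: the genus sizes are invented for illustration and are not taken from any grass checklist, and the function name is hypothetical.

```python
def genera_share_for_species_fraction(genus_sizes, fraction=0.5):
    """Return the smallest share of genera (largest first) that together
    hold at least `fraction` of all species."""
    if not genus_sizes:
        raise ValueError("genus_sizes must not be empty")
    total = sum(genus_sizes)
    target = fraction * total
    running = 0
    # Walk through genera from largest to smallest, accumulating species.
    for count, size in enumerate(sorted(genus_sizes, reverse=True), start=1):
        running += size
        if running >= target:
            return count / len(genus_sizes)
    return 1.0


# Invented example: 2 large genera, 8 small ones and 10 monotypic ones.
toy_sizes = [500, 300] + [20] * 8 + [1] * 10
share = genera_share_for_species_fraction(toy_sizes)
print(f"{share:.0%} of genera hold half of the species")
```

Applied to a real list of species counts per genus, the same calculation would give a figure comparable to the "c. 3% of the genera contain 50% of the grass species" statistic, which is why a mean genus size says so little about such a distribution.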
The rate of progress in production of taxonomic monographs is slow. For example, Stuessy43
estimates that a typical Ph.D. revision project of three to four years might include 10 to 40 species,
depending on the amount of macromolecular work. He also suggested that in professional life a
reasonable pace is only two species per year. Some believe that DNA taxonomy (or an alternative,
DNA barcoding) offers a full or partial solution to taxon recognition (discovery) and identification.
It has been suggested that DNA sequences be used to identify species; the sequenced DNA is
placed in a web-based database such as DDBJ/EMBL/GenBank and linked to a verified herbarium
specimen from which it was taken. This may have some potential for grass identification and taxon
recognition, but many reservations apply. These have been discussed at length (Seberg and Petersen,
Chapter 3; Wheeler20,47; Tautz et al.44,46; Lipscomb et al.45; Chase48; Kristiansen49) and include
concerns about sequence quality, insufficient sampling within and among species, pseudogenes,
herbarium specimen quality and availability, type specimen use and the common occurrence of hybridisation
and introgression and associated DNA exchange (capture) between closely related species.
However, it is premature to dismiss DNA taxonomy, as it undoubtedly has high potential. DNA
barcoding is seen by many as a better alternative (see Seberg and Petersen, Chapter 3): it uses
DNA sequences to aid identification but is not intended to be the sole basis for it. DNA
sequences also have huge potential for phylogenetics, classification and for providing a phylogenetic
framework for developing meaningful monographic studies.
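Sequence-based identification of the kind discussed above can be pictured as a nearest-match lookup against reference sequences tied to vouchered specimens. The sketch below rests on strong assumptions: the sequences are already aligned and of equal length, the short sequences and voucher labels are invented, distance is the simple proportion of mismatched sites, and the 2% cut-off is arbitrary. It is not how any particular barcoding database works, and the function names are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


def closest_reference(query, references, max_distance=0.02):
    """Return (name, distance) of the nearest reference, or (None, distance)
    when even the best match exceeds `max_distance`."""
    name, ref_seq = min(references.items(),
                        key=lambda item: p_distance(query, item[1]))
    distance = p_distance(query, ref_seq)
    return (name if distance <= max_distance else None), distance


# Invented reference sequences keyed by hypothetical voucher-linked names.
references = {
    "Species A (voucher 001)": "ACGTACGTACGTACGTACGT",
    "Species B (voucher 002)": "ACGTTCGTACGAACGTACCT",
}
match, dist = closest_reference("ACGTACGTACGTACGTACGT", references)
print(match, dist)
```

A lookup like this returns an answer only as good as the reference set. None of the reservations listed above (pseudogenes, sparse sampling within species, hybridisation and introgression) is addressed by the matching step itself.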
relative to the rest of the grasses, hereafter ‘earliest diverging lineage’). These are distributed in
the neotropics and are broad leaved forest genera. The next earliest diverging lineage was Pharus
(Pharoideae).
In contrast to single gene analyses there have been relatively few reports of combined analyses
or multi-gene studies of the entire family. The most significant combined data analysis included
62 grasses sampling approximately 8% of the genera4. Data sets of DNA sequences (nuclear PHYB,
ITS2 and gbssI; plastid ndhF, rbcL, rpoC2), plastid restriction site data and morphological data
were analysed alone and in combination. Relatively well resolved and well supported trees were
obtained and allowed for major reclassification of the family. Other studies include the combined
ndhF, rbcL and PHYB dataset of Clark et al.8 and the morphological, chromosomal, biochemical
and plastid DNA character set of Soreng and Davis64,82.
[FIGURE 17.3: tree graphic not reproduced. Tip labels give tribe abbreviations grouped by subfamily (Anomochlooideae, Pharoideae, Puelioideae, Ehrhartoideae, Bambusoideae, Pooideae, Centothecoideae, Panicoideae, Arundinoideae, Aristidoideae, Danthonioideae and Chloridoideae, plus incertae sedis tribes), with the spikelet clade, PACCAD-P and PACCAD marked.]
amount of genomic information exists for a number of model species belonging to the largest
subfamilies. For example, the fully sequenced plastid genomes of maize (Z. mays ssp. mays;
Panicoideae), rice (O. sativa; Ehrhartoideae) and wheat (T. aestivum; Pooideae) are available88,89
and have been compared by Matsuoka et al.90 with tobacco (Nicotiana) as an outgroup. The three
cereals contained 106 genes, and eight of these were invariant among the species. Analyses using
neighbour joining (NJ) (parameters not given) and 84 to 98 genes, depending on the inclusion or
exclusion of genes with significant rate heterogeneity, always resolved wheat as sister to rice
(consistent with a BEP clade hypothesis). However, bootstrap values varied in the range of 52–87%,
and the branch connecting maize to the wheat-rice grouping was very short in comparison to the
terminal branches. Ogihara et al.91 have examined structural alterations (caused by replication
slippage and intra-molecular recombination facilitated by microsatellite regions) in the plastid
genomes of wheat, maize and rice. They found that the structure of the grass plastid chromosome
was highly similar among the three species, but that some hot spots for structural mutation were present. By examining the
deletion patterns of open reading frames in the inverted repeat (IR) regions and the junctions
between the IR and the small single copy region (SSC) they concluded that wheat is much more
similar to rice than to maize.
It seems, therefore, that results of whole genome analysis, including structural rearrangements91
and sequence nucleotide variation90, of three cereal species would be more consistent with the BEP
hypothesis than the PACCAD-P hypothesis. However, three taxon comparisons like those of Matsuoka
et al.90 and Ogihara et al.91 tell us little about broad phylogenetic relationships, and this may
not be the final word in the BEP, PACCAD-P (or alternatives) debate. More genomes will need to
be added to represent all major clades. To test the monophyly of the BEP clade, it is necessary to
include a Bambusoideae s.s. taxon (for example, a woody or herbaceous bamboo). It would also
be fruitful to establish what patterns are found in the early diverging lineages of grasses to help
determine when the alterations described above took place. Further advances in phylogenomic
methods are expected in the near future92, and the grasses are well positioned as a model group
for studies in this direction.
FIGURE 17.3 Supertree of grass genera indicating positions of significant shifts in diversification rate. A
semistrict consensus of 1,000 trees is shown. Black shaded circles represent nodes where significant or marginally
significant shifts in diversification rate have occurred (P-values < 0.06). Subfamilies and tribes follow GPWG1
and the tribes are labelled as follows: Amp, Ampelodesmeae; And, Andropogoneae; Ano, Anomochloeae; Ari,
Aristideae; Arundinel, Arundinelleae; Aru, Arundineae; Ave, Aveneae; Bam, Bambuseae; Bra, Brachyelytreae;
Bro, Bromeae; Cen, Centotheceae; Cyn, Cynodonteae; Dan, Danthonieae; Dia, Diarrheneae; Era, Eragrostideae;
Ehr, Ehrharteae; Eri, Eriachneae; Gua, Guaduelleae; Hai, Hainardeae; Isa, Isachneae; Lep, Leptureae; Lyg,
Lygeeae; Mel, Meliceae; Mic, Micraireae; Nar, Nardeae; Oly, Olyreae; Orc, Orcuttieae; Ory, Oryzeae; Pan,
Paniceae; Pap, Pappophoreae; Par, Parianeae; Phae, Phaenospermatideae; Pha, Phareae; Phy, Phyllorachideae; Poe,
Poeae; Pue, Puelieae; Sti, Stipeae; Str, Streptogynaeae; Strept, Streptochaeteae; Thy, Thysanolaeneae; Tri, Triticeae.
anthers and pollen. However, further identification is problematic with this material, and even
confident assignment to subfamily level has not been possible. They have raceme-like panicles and
two florets per spikelet. Morphological characters could place them in Aveneae (Pooideae) or
Arundineae (Arundinoideae). Recently, Prasad et al.98 reported that they had detected phytoliths
from at least five taxa from extant grass subclades on the Indian continent dating from c. 70–65 mybp
(Late Cretaceous). They postulate a Gondwanan origin for the grasses (before 125 mybp). The
earliest clearly identifiable macrofossil is of Pharus preserved in amber and trapped in mammalian
hair. It dates to 45–30 mybp (Late Eocene/Early Oligocene99).
It is possible to conclude therefore that, on the basis of fossil evidence, grasses can be dated
back to 55 mybp with some confidence and to 70 mybp if grass-like pollen and phytoliths are
considered. In contrast to these dates, paleobotanical (grass macrofossils and grass pollen, known
generically as Monoporites annulatus) and indirect paleofaunal evidence (from herbivore morphology)
indicate that widespread grass-dominated ecosystems did not evolve until the early to middle
Miocene (20–15 mybp100), by which time all major lineages of grass were likely to have evolved
(that is, all currently recognised subfamilies were present). It seems, therefore, that the major
divergences detected within Poaceae (PACCAD, B, E and P, but excluding Pharoideae, Anomochlooideae
and Puelioideae, which were probably already present) occurred considerably earlier than
the 25–15 mybp suggested by the GPWG4 but still a long time following the origin of the family
(a conservative 70–50 mybp). All grasses known to have C4 metabolism belong to the PACCAD
clade. The first fossil C4 grass has been dated at 12.5 mybp, and isotopic evidence for C4 dates
from approximately 15 mybp (see Kellogg101 for a discussion of the parallel evolution of this
important trait within the grasses). This suggests that the PACCAD clade is at least 15 million
years old; however, it is likely to be considerably older, and molecular clock estimations of dates
support this. For example, Bremer57 indicates that the major diversifications including the PACC
clade date to about 55–35 mybp, and the divergence of maize and Pennisetum has been dated at
25 mybp102.
Molecular phylogenetic dating is complicated by a range of issues103–105 including the paucity
of the fossil record, the uncertainty in calibrating reference nodes with fossils, the uncertainty of
the proposed phylogeny and deviations of its branches from the molecular clock. Numerous
molecular phylogenetic dating methods exist, and recent advances have removed the assumption
of a molecular clock103,106–109. However, relatively few attempts have been made to date major clades
in Poales or Poaceae using molecular phylogenetic trees. Molecular phylogenetic dating of the
monocots has estimated the origin of Poales to the mid-Cretaceous (115 ± 11 mybp110). Bremer110
using the rbcL gene and NPRS dates the origin of Poaceae at between 80 and 70 mybp, depending
on whether the root or crown node is considered for calibration. Gaut111 used a phylogenetic tree
based on rbcL and ndhF gene sequences to date major radiations in the grasses and calibrated the
tree on a single point that assumed rice and maize diverged 50 mybp. This showed the grasses to
originate at 77 mybp and Ehrhartoideae (rice) and Pooideae (oats, barley and wheat) to diverge at 46
mybp. The divergence of Panicoideae from Ehrhartoideae and Pooideae occurred at 50 mybp. The
studies110,111 were based on small sample numbers (both included only nine grass species). It is clear
that there is a need for large, accurately dated trees of the grass family for macroevolutionary studies.
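The logic of a single-point calibration such as the one described above can be sketched with a strict molecular clock. This is not Gaut's analysis: only the 50 mybp rice-maize calibration comes from the text, and the pairwise distances are hypothetical placeholders chosen so the arithmetic is easy to follow.

```python
def clock_rate(calib_distance, calib_age_my):
    """Substitutions per site per million years along one lineage.

    Two lineages that split calib_age_my ago have each accumulated
    change for that long, so the pairwise distance spans 2 * age.
    """
    return calib_distance / (2.0 * calib_age_my)


def divergence_age(distance, rate):
    """Age (million years) of a split, given a pairwise distance and a clock rate."""
    return distance / (2.0 * rate)


# Calibration from the text: rice and maize assumed to have diverged 50 mybp.
# HYPOTHETICAL pairwise distances (substitutions per site), not published data.
d_rice_maize = 0.100
d_rice_wheat = 0.092

rate = clock_rate(d_rice_maize, 50.0)
age_rice_wheat = divergence_age(d_rice_wheat, rate)
print(f"rate = {rate:.4f} subs/site/My; rice-wheat split = {age_rice_wheat:.0f} mybp")
```

Every date obtained this way scales linearly with the calibration age, which is why the uncertainty in calibrating reference nodes noted above propagates directly into all other node ages.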
Inferences can sometimes be made directly about the prehistorical biogeography of species by
examining the geographical distribution of fossils and the present day distribution of species.
However, the GPWG4 believes that few conclusions regarding the biogeographical origin of grasses
can be made using such an approach. This is because the earliest diverging lineage of the grasses
(Anomochlooideae) is in Central and South America, the next earliest diverging lineage (Pharoi-
deae) is pantropical, and the next (Puelioideae) is restricted to tropical Africa. The paucity of the
fossil record cannot help interpret this complex multicontinental distribution pattern of early
diverging lineages. The sister groups to the grasses also vary considerably in biogeography. For
example, Joinvilleaceae is in Borneo, New Caledonia and throughout the Pacific, but Ecdeio-
coleaceae is restricted to Australia.
However, if the time scale is pushed back with recent fossil discoveries98,112 and the interpre-
tation of the Australian tectonic plate first proposed by Audley-Charles113 is considered, then perhaps
these distributions can provide us with information. The sister group is concentrated on or near the
Australian plate, the earliest diverging grass lineage (Anomochlooideae) is in Central and South
America, the next earliest diverging lineage (Pharoideae) is pantropical, the next (Puelioideae) is
restricted to tropical Africa, and the earliest phytolith fossils from central India at c. 70 mybp were
diverse in their subfamily composition. This distribution certainly seems to suggest a Gondwanan
origin for the whole clade.
An alternative method of prehistorical biogeography is to use phylogenetic trees that encompass
adequate taxon sampling (including representatives of clades with broadly differing geographical
origins). In these studies attempts are made to optimise the geography of taxa on trees (that is, on
nodes throughout the tree). Various types of parsimony reconstruction and related methods such as
vicariance dispersal analysis114 are commonly used for this purpose. Bremer57 used vicariance dispersal
analysis (using DIVA115) on phylogenetic trees of Poales to explore prehistorical biogeographic
patterns. His reconstructions suggest that Poaceae originated in South America, but the graminoid
clade and Poales originated in Australia. Ecdeiocoleaceae were sister to the grasses in his reconstructions,
and these two families shared a common ancestor about 76 mybp. The connection between
Australia and South America did not break up until about 35 mybp, so land migration between
these continents was still possible when the two families diverged. Such a hypothesis would also help explain why reconstructions show a South
American origin for Poaceae, but an Australian origin for Ecdeiocoleaceae and the graminoid clade.
It should be noted, however, that some of his reconstructions yielded alternative optimisations for the
origin of Poaceae, but the origin of the graminoid clade was always Australia. Despite this, the origin
of the grasses looks most likely to be Gondwanan and probably South American, a hypothesis that needs
further exploration using large phylogenetic trees and other methods of biogeographic reconstruction.
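Optimising geography on a tree can be illustrated with the simplest parsimony approach, a Fitch down-pass. The sketch below is not DIVA and does not reproduce Bremer's reconstructions. It assumes one area per tip, takes the areas of three lineages from the text, and leaves out the pantropical Pharoideae because single-state parsimony cannot code a widespread taxon. The three-tip tree is a deliberate simplification.

```python
def fitch(tree, tip_states):
    """Fitch down-pass on a binary tree written as nested tuples.

    Returns (set of equally parsimonious states at the root,
             minimum number of area changes on the tree).
    """
    if isinstance(tree, str):                 # a tip: its observed area
        return {tip_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, tip_states)
    right_set, right_cost = fitch(right, tip_states)
    shared = left_set & right_set
    if shared:                                # children agree: no change needed
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


# Simplified, illustrative coding of areas mentioned in the text.
areas = {
    "Ecdeiocoleaceae": "Australia",
    "Anomochlooideae": "South America",
    "Puelioideae": "Africa",
}
tree = ("Ecdeiocoleaceae", ("Anomochlooideae", "Puelioideae"))

root_states, n_changes = fitch(tree, areas)
print(sorted(root_states), n_changes)
```

With so few tips, each in a different area, every area is equally parsimonious at the root. This is one way to see why dense taxon sampling, and methods that allow widespread ancestors, are needed before an area of origin can be favoured.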
clade in the panicoids (Paniceae) and a clade in the pooids (an Aveneae-Poeae subgroup). Shifts
in diversification rate have therefore occurred some time after the origin of these tribes.
Factors influencing diversification rate variation include prolific cladogenesis, adaptive radia-
tions, mass extinctions, key innovations, global change and coevolution. The relative importance
of each of these factors as causal agents of the diversification patterns we describe above is unknown.
Further study is required: diversification rate studies will need to incorporate branch length information,
and dates will need to be found for the diversification rate shifts. This will also facilitate
the examination of correlations between diversification rate variation and past environmental and global
change factors such as temperature, aridity and CO2 levels. The factors correlating to these shifts are
currently being investigated (Bouchenak-Khelladi et al., in prep.). The study by Hodkinson et al.86
used genera and not species because of practical limitations in data and tree availability. However, we
envisage future studies trying to investigate species-level variation in this and other ways. For
example, we would like to know why relatively few grass genera, such as Agrostis, Bambusa,
Digitaria, Eragrostis, Festuca, Poa, Panicum, Paspalum and Stipa, account for a considerable
percentage of all Poaceae species (possibly over 30% 12; see also Hilu, Chapter 11).
Identifying certain traits, key innovations or other factors that might influence the rate of evolution
and production of new species is a challenge to evolutionary biologists119,120. The observed differences
in species richness between certain clades can be correlated with the presence of particular factors.
However, to identify correlates of species richness, the hierarchical nature of evolutionary history has
to be taken into account in order to avoid erroneous inferences13. Comprehensive phylogenetic trees
that contain, when possible, estimates of divergence dates are therefore required to not only assess the
patterns of diversification but also to assess the processes leading to that diversification121. Kellogg101
identified C4 metabolism as a possible key innovation influencing species richness in the grasses by
simply comparing lineages with known species numbers. However, a more powerful approach to assess
the potential causes (correlations) of diversification and species richness is to use sister clade comparison
tests7,122. In these, comparisons are made between sister taxa, one possessing a trait (or factor) and the
other not. For example, Salamin and Davies123 mapped traits from Watson and Dallwitz17 onto a grass
supertree and identified all sister clades with contrasting traits (for example, bisexual versus monoecious
breeding system). They then made a comparison between the number of species in each sister clade
against the null hypothesis of equal speciation rates (following the methods of Slowinski and Guyer122
and Goudet124). Their results indicated that herbaceous habit and an annual life cycle have a significant
correlation with species richness. The results were also consistent with the hypothesis that annuals
might be better able to fit new niches and become more species rich125 and that generation time is a
factor influencing species richness in the grasses. Recombination and genetic change will be facilitated
by the short generation time of annuals. Woodiness is also linked to generation time in grasses, with
many of the woody bamboos having long generation times in comparison to other grasses1. A link
between speciation rates and nucleotide substitution rates has been established in other taxonomic
groups126–128, but evidence is inconclusive in the grasses129. However, the analysis by Gaut et al.129 was
limited to a small proportion of grass diversity, and extending the sampling could change its outcome.
The study of Salamin and Davies123 failed to show any link between a number of other characters and
speciation rate including the ability to resist drought, ability to tolerate saline environments, open versus
forest habitat and bisexual versus monoecious breeding system. Other factors require investigation.
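The sister clade comparison described above can be sketched as follows. The sketch assumes the equal-rates result usually attributed to Slowinski and Guyer, namely that every split of N species between two sister clades is equally likely under the null hypothesis, and it combines independent comparisons with Fisher's method. The species counts are hypothetical, and the function names are illustrative rather than taken from any published software.

```python
import math


def sister_clade_p(n_with_trait, n_without_trait):
    """One-tailed probability, under equal speciation rates, that the clade
    with the trait holds at least n_with_trait of the combined species.

    Assumes all splits (1, N-1) ... (N-1, 1) of N species are equally likely.
    """
    total = n_with_trait + n_without_trait
    return (total - n_with_trait) / (total - 1)


def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p) follows chi-square with 2k d.f."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)   # statistic / 2
    # Closed-form chi-square survival function for even degrees of freedom.
    term, series = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        series += term
    return math.exp(-half_stat) * series


# HYPOTHETICAL sister pairs: (species in annual clade, species in perennial clade).
pairs = [(90, 10), (40, 8), (25, 5)]
p_values = [sister_clade_p(a, b) for a, b in pairs]
print([round(p, 3) for p in p_values], round(fisher_combined_p(p_values), 3))
```

No single comparison needs to be significant on its own; the combined test asks whether the trait-bearing clade is consistently the richer one across independent sister pairs.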
this task has become relatively straightforward with modern automated sequencing. Finally, these
data have to be analysed and visualised in an appropriate way. The first task is the most challenging
and will require the longest time and major international collaboration. Good phylogenetic practice
therefore depends on good taxonomy and vice versa; the two are inextricably linked. There is a
paucity of suitably trained grass taxonomists, and they have limited time and resources. However,
assuming that the plants can be collected, and the data generated, we can ask whether there are
other theoretical impediments to the process. One major concern is whether methods of phylogenetic
reconstruction can accommodate large datasets.
Increasing the number of taxa sampled130–134 and the number of characters135–137 in a dataset
generally results in more reliable phylogenetic trees. This is partly because such datasets tend to
reduce random error and sampling bias (both properties of finite datasets) and sometimes incon-
sistency (an asymptotic property138). Furthermore, Källersjö et al.133 and Savolainen and Chase50
describe examples where homoplasy, present in large datasets, improves local phylogenetic signal.
In a parsimony framework, multiple successive substitutions along a branch cannot be observed,
and saturated characters can rapidly confound the tree search. Although large datasets contain
potentially more homoplastic changes, the large number of branches will spread these multiple
changes throughout the tree, making them locally informative, and therefore increasing their
inferential qualities50,84. Model based methods are less prone to such problems even with smaller
trees, because a sufficient model will take into account the hidden multiple changes of each
character.
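How a model accounts for hidden multiple changes can be shown with the simplest substitution model, Jukes-Cantor (JC69). The model is an assumption introduced here for illustration and is not named in the text. The sketch converts an observed proportion of differing sites into an estimated number of substitutions per site.

```python
import math


def jc69_distance(p_observed):
    """Estimated substitutions per site under JC69.

    p_observed is the proportion of sites that differ between two sequences.
    The correction is undefined at or beyond 0.75, the saturation point.
    """
    if not 0.0 <= p_observed < 0.75:
        raise ValueError("p_observed must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p_observed / 3.0)


for p in (0.05, 0.20, 0.40, 0.60):
    d = jc69_distance(p)
    # d - p is the share of change that direct counting would miss.
    print(f"observed {p:.2f} -> corrected {d:.3f} (hidden {d - p:.3f})")
```

The gap between observed and corrected values widens as sequences diverge, which is the saturation effect that confounds direct counting of changes.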
Most phylogenetic reconstructions of the grasses have included a relatively small number of
species. For example, the combined analyses of the GPWG4 contained 61 genera and 62 species
(respectively, c. 8% and c. 0.6% of the total). If we accept that large phylogenetic trees are desirable,
it is also important to ask whether existing methods of phylogenetic reconstruction can accommo-
date matrix sizes that approximate to the size of the grass family and include multiple gene regions.
Many multi-gene analyses95,139, including the GPWG4 analyses, have incorporated missing data
because complete large datasets are often not practically producible. The influence of missing data
on phylogenetic inference has been discussed84,140. On theoretical grounds it would be expected
that increasing missing data could reduce accuracy of phylogenetic inference by increasing the
number of optimal solutions found and creating uncertainty in the placement of some taxa relative
to others84. It is therefore not clear how much missing data can be accommodated in phylogenetic
reconstruction, and the most conservative option is to produce complete maximum combined
matrices84,141.
Perhaps the largest phylogenetic reconstructions have been made within the higher plants. The
first ‘large tree’ was by Chase et al.142 who included 499 seed plants in a parsimony analysis of
rbcL sequences. The number of taxa included in analyses of plant groups has steadily grown, and
one study in particular included 2,538 rbcL sequences in a parsimony jackknifing analysis143. Other
large multi-gene analyses have been made95,144. The results of these empirical studies are encour-
aging and could satisfy us that matrix sizes up to approximately 2,500 taxa can be analysed using
existing methods such as maximum parsimony (MP) and NJ. Advances in phylogenetic reconstruc-
tion methods should also improve the situation. Methods that can search the tree space more
efficiently have been proposed to overcome the computationally intensive task of finding an optimal
tree. For example, parsimony jackknifing145 estimates well supported groups that will guide the
search through the tree space. Bayesian methods146,147, which are powerful and versatile approaches,
use a random walk through the tree space guided by the posterior probability of each tree to infer
credibility intervals on the topology, branch lengths and model parameters used. In comparison,
the more traditional way of estimating support by calculating bootstrap percentages is problematic
with matrices containing thousands of taxa. Even reducing the complexity of the search during the
bootstrap, which was shown to give results comparable to those of more extensive tree searches148, will
require enormous computer power. Parallelisation of the process on larger computer clusters will
reduce the computation time, but the process will remain tedious.
We can conclude, therefore, that it should be feasible to use existing methods of phylogenetic
reconstruction to analyse a large multi-gene matrix of Poaceae reliably, and that these analyses
should provide better estimates of phylogenetic pattern than analyses of smaller matrices. For
studies of tens of thousands of taxa we move into uncharted territory. We have to rely on results
of simulation studies to test the utility of methods such as parsimony to accurately reconstruct
phylogeny from such data sets. Salamin et al.137 used Monte Carlo simulations to assess the accuracy
of MP and NJ methods to retrieve model trees using 13,000 taxa (the number of angiosperm genera
and close to the number of grass species). The results were encouraging because even with relatively
inefficient heuristic search options a high percentage of nodes on the model tree were correctly
inferred (80% with parsimony). NJ was more problematic because computing the distance matrix
for such large matrices is more computationally demanding than an MP search with more than 5,000
characters. The number of characters that will be needed to construct complete and robust generic-
or species-level trees of the grasses is unknown. However, the simulation studies of Salamin et al.137
with a 13,000-taxon tree found a sharp improvement in phylogenetic accuracy when character data
sets were increased in size between 5,000 and 10,000 informative characters. Increasing the number
of characters beyond this had a slower impact on phylogenetic reconstruction accuracy.
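The accuracy measure used in such simulations, the percentage of model-tree nodes recovered, can be sketched as a comparison of clade sets. This is a minimal illustration on rooted trees written as nested tuples, with placeholder taxon names; it is not the code used by Salamin et al.

```python
def clades(tree):
    """Set of non-trivial clades (frozensets of tip names) in a nested-tuple tree."""
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    all_tips = tips(tree)
    found.discard(all_tips)      # the root clade is shared by every tree
    return found


def fraction_recovered(model_tree, inferred_tree):
    """Proportion of model-tree clades that also appear in the inferred tree."""
    model = clades(model_tree)
    return len(model & clades(inferred_tree)) / len(model)


# Placeholder five-taxon trees: the inferred tree misplaces taxon B.
model = ((("A", "B"), "C"), ("D", "E"))
inferred = ((("A", "C"), "B"), ("D", "E"))
print(f"{fraction_recovered(model, inferred):.0%} of model nodes recovered")
```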
Disregarding the theoretical considerations listed above, it is worth noting how far we still are from
a complete generic-level tree of the grasses from a practical perspective. Sequence data on the
grasses is accumulating at a rapid rate. A preliminary examination, in December 2004, of sequence
availability in DDBJ/EMBL/GenBank looked promising, as there were 68,153 sequences deposited
for the grasses. However, if we examine the subfamily distribution of these sequences (data not
shown) we find that most (over 98%) have been produced for Ehrhartoideae (68.6%), Pooideae
(15.6%) and Panicoideae (13.9%). These contain the most important cereal crops that have been
subject to intense genomic study. The sequencing of the c. 400 Mbp of the Oryza genome (Ehrhartoideae)
has been completed88,89,149. Furthermore, these figures represent the number of entries in
DDBJ/EMBL/GenBank (including replicated information and pseudogenes) and will therefore be
an overestimation of what could be used in a phylogenetic analysis.
For maximum sized multi-gene analyses it is desirable to combine data from some combination
of the most frequently sequenced gene regions (with good taxonomic sampling across the grass
family). The prime candidate regions for combination are therefore from the top 10 most sequenced
gene regions. The number of grass genera and species sequenced for each of these regions is given
in Figure 17.4. The maximum number of species and genera sequenced for any particular DNA
region is 577 and 162, respectively, for a nuclear ribosomal DNA region (ITS). The plastid gene
ndhF is the next best represented gene region (354 species, 162 genera). Even assuming complete
overlap of taxa, the maximum sized data set for two regions would be limited to 162 genera and
354 species. The reality is far worse than this ideal scenario, as there is often poor overlap of taxa
between genes. Combined datasets will therefore have to accommodate missing data, and targeted
sequencing needs to be conducted to maximise future dataset size.
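The effect of taxon overlap on combined dataset size can be sketched with simple set operations. The genus lists below are hypothetical sampling patterns, not the GenBank holdings summarised in Figure 17.4, and the function name is illustrative.

```python
def supermatrix_summary(taxa_by_region):
    """Summarise a combined matrix built from per-region taxon sets.

    complete:         taxa sequenced for every region (no missing data)
    total:            taxa sequenced for at least one region
    missing_fraction: share of taxon-by-region cells with no sequence
    """
    regions = list(taxa_by_region.values())
    shared = set.intersection(*regions)
    union = set.union(*regions)
    cells = len(union) * len(regions)
    filled = sum(len(taxa) for taxa in regions)
    return {
        "complete": len(shared),
        "total": len(union),
        "missing_fraction": 1 - filled / cells,
    }


# HYPOTHETICAL sampling of genera for two gene regions.
sampling = {
    "ITS": {"Oryza", "Zea", "Triticum", "Poa", "Festuca"},
    "ndhF": {"Oryza", "Zea", "Triticum", "Bambusa", "Pharus"},
}
summary = supermatrix_summary(sampling)
print(summary)
```

A complete matrix is limited to the intersection of the taxon sets, whereas a matrix that tolerates missing data can grow to their union at the cost of empty cells. Targeted sequencing aims to shrink that gap.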
17.4.3 DIVIDE-AND-CONQUER
An alternative approach to the direct analyses of large multi-gene region datasets for phylogenetic
reconstruction is to use some sort of ‘divide-and-conquer’ strategy to build trees from individual
data matrices and later assemble them on the basis of taxonomic overlap with other such trees
(Wilkinson and Cotton, Chapter 5 9,83,84,150). From a theoretical stance, these meta-analysis tech-
niques may be less favourable than direct large multi-region analyses for phylogenetic reconstruc-
tion, but they may offer an adequate solution to the problem of constructing large trees when we
FIGURE 17.4 Data availability for single and multi-gene analyses. Number of grass species and genera
sequenced for each of the top 10 most frequent gene regions in DDBJ/EMBL/GenBank (December 2004):
ITS, ndhF, trnL, matK, rpoC2, Gbss1, phyB, rbcL, rpoA and rps16. Light grey bars represent number of
genera and dark grey bars represent number of species.
consider the data availability problems of the grasses9,86. However, the analysis of truly large multi-
region datasets (supermatrices) may also ultimately depend on some sort of divide-and-conquer
strategy (the decomposition of the tree into subproblems and the recombining of these into an
overall solution; Wilkinson and Cotton, Chapter 5). The reliability of supertree methods has been
questioned151 and debated at length83,152. However, many of these reliability issues, such as poor
quality data, data duplication and data accountability, have been addressed (Wilkinson and Cotton,
Chapter 5 85,152). Several empirical studies are also adding support to the validity of the supertree
approach9,153. For example, Salamin et al.9 used the results of single gene region analyses from the
GPWG4 to generate supertrees that were congruent with trees generated by combining these data
into multi-region analyses. They also constructed a supertree of the grass family (containing 395
genera) that was broadly congruent with previous studies.
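One widely used way of assembling source trees by taxonomic overlap is matrix representation with parsimony (MRP). The method is an assumption introduced here for illustration, since this passage does not name it. The sketch shows only the coding step: each clade of each source tree becomes one binary character, and taxa absent from that source tree are scored as missing. The two source trees are hypothetical, and the all-zero outgroup row that MRP analyses normally add is omitted.

```python
def tips_and_clades(tree):
    """Return (all tip names, list of non-root clades) for a nested-tuple tree."""
    found = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.append(members)
        return members

    all_tips = walk(tree)
    return all_tips, [clade for clade in found if clade != all_tips]


def mrp_matrix(source_trees):
    """Code source trees as a taxon-by-character matrix.

    '1' = taxon is in the clade, '0' = taxon is in the source tree but
    outside the clade, '?' = taxon was not sampled in that source tree.
    """
    parsed = [tips_and_clades(tree) for tree in source_trees]
    taxa = sorted(set().union(*(tips for tips, _ in parsed)))
    rows = {taxon: [] for taxon in taxa}
    for tips, clade_list in parsed:
        for clade in clade_list:
            for taxon in taxa:
                if taxon not in tips:
                    rows[taxon].append("?")
                else:
                    rows[taxon].append("1" if taxon in clade else "0")
    return {taxon: "".join(codes) for taxon, codes in rows.items()}


# HYPOTHETICAL source trees sharing only some genera.
tree_1 = (("Oryza", "Triticum"), "Zea")
tree_2 = (("Triticum", "Poa"), "Oryza")
matrix = mrp_matrix([tree_1, tree_2])
for taxon, codes in matrix.items():
    print(f"{taxon:<10}{codes}")
```

The resulting matrix would then be analysed with parsimony to yield the supertree. The question marks make explicit how much of the combined problem rests on taxa shared between source trees.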
Disregarding practical considerations, some sort of divide-and-conquer strategy will be required
to piece together large phylogenetic trees of the grasses, at least in the short term. Appropriate data
are simply not available for large multi-gene phylogenetic trees. If data become available, supertree
phylogenetic divide-and-conquer methods are still likely to be required to handle the scale of the
computational problem presented by the supermatrices. The source trees in the supertrees of Salamin
et al.9 and Hodkinson et al.86 do contain some duplicated data (the source data are not totally
independent). We are currently generating new supertrees that remove this problem by reconstruct-
ing optimal single gene region trees from all available sequence data for the top ten genes shown
in Figure 17.4 (primary data analyses of single regions) and then combining these trees using a
range of meta-analysis methods into supertrees (Kinney et al., in prep.) that also incorporate support
statistics such as bootstrapping. This approach removes the problem of duplication and non-independence
of data that can occur when taking topologies directly from published literature. We
are also assessing how these methods compare with supermatrix methods using the same data
(Kinney et al., in prep.).
17.5 CONCLUSIONS
The grasses represent a small but highly significant piece of the tree of life154–156 and lessons learnt
by grass systematists and taxonomists should help guide those researchers studying less well known
groups of organisms. The future of many aspects of grass systematics will depend critically on how
well classical taxonomy can be meshed with phylogenetics and informatics. The collection of grass
species, correct identification of specimens, maintenance of high-quality collections and associated
literature are likely to be the major limiting factors in the production of large phylogenetic trees (or
the use of these sequences for DNA barcoding or taxonomy). Technical advances in molecular biology
and digital archiving of sequence information have already occurred. There is a need to reach an
international consensus over taxonomic nomenclature and accepted names, but this will need major
coordinated action at an international level. We can also conclude that, despite the great accumulation
of sequence data for the grasses and the advances that have been made in phylogenetic theory and
phylogenetic reconstruction in the grasses, we are still a long way from complete species-level or
even generic-level trees of the family. Targeted sequencing should focus on filling in gaps in existing
data and improving taxonomic sampling across the family. These trees and associated supertrees will
have, amongst other things, great utility for macroevolutionary studies of grasses.
ACKNOWLEDGEMENTS
This work was supported by the Irish Higher Education Authority, Enterprise Ireland (SC2003/437)
and the Swiss National Science Foundation (81AN-068367).
REFERENCES
1. Clayton, W.D. and Renvoize, S.A., Genera Graminum: Grass Genera of the World, Her Majesty’s
Stationery Office, London, 1986.
2. Mabberley, D.J., The Plant Book, 2nd ed., Cambridge University Press, England, 1997.
3. Archibold, O.W., Ecology of World Vegetation, Chapman and Hall, London, 1995, chap. 1.
4. GPWG. Phylogeny and subfamilial classification of the grasses (Poaceae), Ann. Missouri Bot. Gard.,
88, 373, 2001.
5. Hodkinson, T.R. et al., Phylogenetics of Miscanthus, Saccharum and related genera (Saccharinae,
Andropogoneae, Poaceae) based on DNA sequences from ITS nuclear ribosomal DNA and plastid
trnL intron and trnL-F intergenic spacers, J. Plant Res., 115, 381, 2002.
6. Hodkinson, T.R. et al., The use of DNA sequencing (ITS and trnL-F), AFLP and fluorescent in situ
hybridization to study allopolyploid Miscanthus (Poaceae), Amer. J. Bot., 89, 279, 2002.
7. Purvis, A., Using interspecies phylogenies to test macroevolutionary hypotheses, in New Uses for
New Phylogenies, Harvey, P.H. et al., Eds., Oxford University Press, Oxford, 1996, 153.
8. Barraclough, T.G. and Nee, S., Phylogenetics and speciation, Trends Ecol. Evol., 16, 391, 2001.
9. Salamin, N., Hodkinson, T.R., and Savolainen, V., Building supertrees: an empirical assessment using
the grass family (Poaceae), Syst. Biol., 51, 136, 2002.
10. Graybeal, A., Is it better to add taxa or characters to a difficult phylogenetic problem? Syst. Biol., 47,
9, 1998.
11. Rannala, B. et al., Taxon sampling and the accuracy of large phylogenies, Syst. Biol., 47, 702, 1998.
12. Hillis, D.M. et al., Is sparse taxon sampling a problem for phylogenetic inference? Syst. Biol., 52,
124, 2003.
13. Felsenstein, J., Phylogenies and the comparative method, Am. Nat., 125, 1, 1985.
14. Harvey, P.H. and Pagel, M.D., The Comparative Method in Evolutionary Biology, Oxford University
Press, London, 1991, chap. 1.
15. Pagel, M., Inferring the historical patterns of biological evolution, Nature, 401, 877, 1999.
16. Renvoize, S. and Clayton, W.D., Classification and evolution of the grasses, in Grass Evolution and
Domestication, Chapman, G.P., Ed., Cambridge University Press, Cambridge, 1992, 3.
17. Watson, L. and Dallwitz, M.J., The Grass Genera of the World, CAB International, Wallingford, 1992.
18. Clark, L.G., Zhang, W., and Wendel, J.F., A phylogeny of the grass family (Poaceae) based on ndhF
sequence data, Syst. Bot., 20, 436, 1995.
19. Godfray, H.C.J., Challenges for taxonomy, Nature, 417, 17, 2002.
20. Wheeler, Q.D., Transforming taxonomy, The Systematist, 22, 3, 2003.
21. Powell, D.J., Publishing Output to 2020, The British Library, 2004.
22. Wheeler, Q.D., Lipscomb, D., and Platnick, N., Terascale taxonomy: cyber-infrastructure and the
Linnaean legacy, in Proceedings of the Fourth Biennial Conference of the Systematics Association,
Trinity College Dublin, Ireland, 14, 2003.
23. Bisby, F.A. et al., Taxonomy, at the click of a mouse, Nature, 418, 367, 2002.
24. Knapp, S. et al., Taxonomy needs evolution, not revolution, Nature, 419, 559, 2002.
25. Blackmore, S., Biodiversity update: progress in taxonomy, Science, 298, 365, 2002.
26. Sharp, D. and Simon, B.K., AusGrass: Grasses of Australia, ABRS Identification Series, interactive
CD ROM, ABRS and EPA, Queensland, 2002.
27. Clayton, W.D., The logarithmic distribution of angiosperm families, Kew Bull., 29, 271, 1974.
28. Clayton, W.D., Chorology of the genera of Gramineae, Kew Bull., 30, 111, 1975.
29. Clayton, W.D., The genus concept in practice, Kew Bull., 38, 149, 1983.
30. Frodin, D.G., History and concepts of big plant genera, Taxon, 53, 753, 2004.
31. Barker, N.P. et al., The paraphyly of Cortaderia (Danthonioideae; Poaceae): evidence from morphol-
ogy, chloroplast and nuclear DNA sequence data, Ann. Missouri Bot. Gard., 90, 1, 2003.
32. Ingram, A.L. and Doyle, J.J., Is Eragrostis (Poaceae) monophyletic? Insights from nuclear and plastid
sequence data, Syst. Bot., 29, 545, 2004.
33. Gómez-Martínez, R. and Culham, A., Phylogeny of the subfamily Panicoideae with emphasis on the
tribe Paniceae: evidence from the chloroplast trnL-F cpDNA region, in Grasses: Systematics and
Evolution, Jacobs, S.W.L. and Everett, J.E., Eds., CSIRO, Collingwood, Victoria, 2000, 136.
34. Duvall, M.R., Noll, J.D., and Minn, A.H., Phylogenetics of Paniceae (Poaceae), Amer. J. Bot., 88,
1988, 2001.
35. Giussani, L.M. et al., A molecular phylogeny of the grass subfamily Panicoideae (Poaceae) shows
multiple origins of C4 photosynthesis, Amer. J. Bot., 88, 1993, 2001.
36. Doust, A.N. and Kellogg, E.A., Inflorescence diversification in the panicoid ‘bristle grass’ clade
(Paniceae, Poaceae): evidence from molecular phylogenies and developmental morphology, Amer. J.
Bot., 89, 1203, 2002.
37. Kellogg, E.A. et al., Taxonomy, phylogeny and inflorescence development of the genus Ixophorus
(Panicoideae: Poaceae), Intern. J. Pl. Sci., 165, 1089, 2004.
38. Spangler, R.E. et al., Andropogoneae evolution and generic limits in Sorghum (Poaceae) using ndhF
sequences, Syst. Bot., 24, 267, 1999.
39. Spangler, R.E., Taxonomy, Sarga, Sorghum and Vacoparis (Poaceae: Andropogoneae), Aust. Syst.
Bot., 16, 279, 2003.
40. Dillon, S.L., Lawrence, P.K., and Henry, R.J., The use of ribosomal ITS to determine phylogenetic
relationships within Sorghum, Pl. Syst. Evol., 230, 97, 2001.
41. Yu, P., Comparative Reproductive Biology of Two Vulnerable and Two Common Grasses in Bothri-
ochloa Kuntze and Dicanthium Willem, Ph.D. thesis, University of New England, Armidale, Australia,
1999.
42. Yu, P., Prakash, N. and Whalley, R.D.B., Sexual and apomictic seed development in the vulnerable
grass Bothriochloa biloba S.T. Blake, Aust. J. Bot., 51, 75, 2003.
43. Stuessy, T., The role of creative monography in the biodiversity crisis, Taxon, 42, 313, 1993.
44. Tautz, D. et al., DNA points the way ahead in taxonomy, Nature, 418, 479, 2002.
45. Lipscomb, D., Platnick, N., and Wheeler, Q., The intellectual content of taxonomy: a comment on
DNA taxonomy, Trends Ecol. Evol., 18, 65, 2003.
46. Tautz, D. et al., A plea for DNA taxonomy, Trends Ecol. Evol., 18, 70, 2003.
47. Wheeler, Q.D., Taxonomic triage and the poverty of phylogeny, Phil. Trans. Royal Soc. Lond. B, 359,
571, 2004.
48. Chase, M.W. et al., Land plants and DNA barcodes: short-term and long-term goals, Phil. Trans. R.
Soc. B., 360, 1889, 2005.
49. Kristiansen, K.A. et al., DNA taxonomy: the riddle of Oxychloë (Juncaceae), Syst. Bot., 30, 284, 2005.
50. Savolainen, V. and Chase, M.W., A decade of progress in plant molecular phylogenetics, Trends Genet.,
19, 717, 2003.
51. Palmer, J.D., Soltis, D.E., and Chase, M.W., The plant tree of life: an overview and some points of
view, Amer. J. Bot., 91, 1437, 2004.
52. APG, An ordinal classification for the families of flowering plants, Ann. Missouri Bot. Gard., 85, 531, 1998.
53. Chase, M.W., Fay, M.F., and Savolainen, V., Higher-level classification in the angiosperms: new
insights from the perspective of DNA sequence data, Taxon, 49, 685, 2000.
54. APG, An update of the angiosperm phylogeny group classification for the orders and families of
flowering plants: APG II, Bot. J. Linn. Soc., 141, 399, 2003.
55. Davis, J.I. et al., A phylogeny of the monocots, as inferred from rbcL and atpA sequence variation,
and a comparison of methods for calculating jackknife and bootstrap values, Syst. Bot., 29, 467, 2004.
56. Simpson, D.A. et al., Phylogenetic relationships in Cyperaceae subfamily Mapanioideae inferred from
pollen and plastid DNA sequence data, Amer. J. Bot., 90, 1071, 2003.
57. Bremer, K., Gondwanan evolution of the grass alliance of families (Poales), Evolution, 56, 1374, 2002.
58. Cronquist, A., An Integrated System of Classification of Flowering Plants, Columbia University Press,
New York, 1981.
59. Linder, H.P. and Ferguson, I.K., On the pollen morphology and phylogeny of the Restionales and
Poales, Grana, 24, 65, 1985.
60. Campbell, C.S. and Kellogg, E.A., Sister group relationships of the Poaceae, in Grass Systematics
and Evolution, Soderstrom, T.R. et al., Eds., Smithsonian Institution Press, Washington, D.C., 1987,
217.
61. Briggs, B.G. and Johnson, L.A.S., Hopkinsiaceae and Lyginiaceae, two new families of Poales in
Western Australia, with revisions of Hopkinsia and Lyginia, Telopea, 8, 477, 2000.
62. Rudall, P.A. et al., Evolution of reproductive structures in grasses (Poaceae) inferred by sister-group
comparison with their putative closest living relatives, Ecdeiocoleaceae, Amer. J. Bot., 92, 1432, 2005.
63. Doyle, J.J. et al., Chloroplast DNA inversions and the origin of the grass family (Poaceae), Proc.
Natl. Acad. Sci. USA, 89, 7722, 1992.
64. Soreng, R.J. and Davis, J.I., Phylogenetics and character evolution in the grass family (Poaceae):
simultaneous analysis of morphological and chloroplast DNA restriction site character sets, Bot. Rev.,
64, 1, 1998.
65. Hilu, K.W., Alice, L.A., and Liang, H.P., Phylogeny of Poaceae inferred from matK sequences, Ann.
Missouri Bot. Gard., 86, 835, 1999.
66. Briggs, B.G., et al., A molecular phylogeny of Restionaceae and allies, in Grasses: Systematics and
Evolution, Jacobs, S.W.L. and Everett, J.E., Eds., CSIRO Collingwood, Victoria, 2000, 661.
67. Michelangeli, F.A., Davis J.I., and Stevenson, D.W., Phylogenetic relationships among Poaceae and
related families as inferred from morphology, inversions in the plastid genome, and sequence data
from the mitochondrial and plastid genomes, Amer. J. Bot., 90, 93, 2003.
68. Hamby, R.K. and Zimmer, E.A., Ribosomal RNA sequences for inferring phylogeny within the grass
family (Poaceae), Pl. Syst. Evol., 160, 29, 1988.
69. Doebley, J. et al., Evolutionary analysis of the large subunit of carboxylase (rbcL) nucleotide sequence
among the grasses (Gramineae), Evolution, 44, 1097, 1990.
70. Davis, J.I. and Soreng, R.J., Phylogenetic structure in the grass family (Poaceae) as inferred from
chloroplast DNA restriction site variation, Amer. J. Bot., 80, 1444, 1993.
71. Mason-Gamer, R.J., Weil, C.F., and Kellogg, E.A., Granule-bound starch synthase: structure, function,
and phylogenetic utility, Mol. Biol. Evol., 15, 1658, 1998.
72. Hsiao, C. et al., A molecular phylogeny of the grass family (Poaceae) based on the sequences of
nuclear ribosomal DNA (ITS), Aus. Syst. Bot., 11, 667, 1999.
73. Mathews, S. and Sharrock, R.A., The phytochrome gene family in grasses (Poaceae): a phylogeny and
evidence that grasses have a subset of the loci found in dicot Angiosperms, Mol. Biol. Evol., 13, 1141, 1996.
74. Mathews, S., Tsai, R.C., and Kellogg, E.A., Phylogenetic structure in the grass family (Poaceae):
evidence from the nuclear gene phytochrome B, Amer. J. Bot., 87, 96, 2000.
75. Liang, H. and Hilu, K.W., Application of the matK gene sequences to grass systematics, Canad. J.
Bot., 74, 125, 1995.
76. Barker, N.P., Linder, H.P., and Harley, E.H., Polyphyly of Arundinoideae (Poaceae): evidence from
rbcL sequence data, Syst. Bot., 20, 423, 1995.
77. Duvall, M.R. and Morton, B.R., Molecular phylogenetics of Poaceae: an expanded analysis of rbcL
sequence data, Molec. Phylogenet. Evol., 5, 353, 1996.
78. Zhang, W.P., Phylogeny of the grass family (Poaceae) from rpl16 intron sequence data, Molec.
Phylogenet. Evol., 15, 135, 2000.
79. Barker, N.P., Linder, H.P., and Harley, E.H., Sequences of the grass-specific insert in the chloroplast
rpoC2 gene elucidate generic relationships of the Arundinoideae (Poaceae), Syst. Bot., 23, 327, 1999.
80. Nadot, S., Bajon, R., and Lejeune B., The chloroplast gene rps4 as a tool for the study of Poaceae
phylogeny, Pl. Syst. Evol., 191, 27, 1994.
81. Clark, L.G. et al., The Puelioideae, a new subfamily of Poaceae, Syst. Bot., 25, 181, 2000.
82. Soreng, R.J. and Davis, J.I., Phylogenetic structure in Poaceae subfamily Pooideae as inferred from
molecular and morphological characters: misclassification vs. reticulation, in Grasses: Systematics
and Evolution, Jacobs, S.W.L. and Everett, J.E., Eds., CSIRO, Collingwood, Victoria, 2000, 61.
83. Bininda-Emonds, O.R.P., Gittleman, J.L., and Steel, M.A., The (super)tree of life: procedures, prob-
lems, and prospects, Ann. Rev. Ecol. Syst., 33, 265, 2002.
84. Sanderson, M.J., and Driskell, A.C., The challenge of constructing large phylogenetic trees, Trends
Plant Sci., 8, 374, 2003.
85. Wilkinson, M. et al., The shape of supertrees to come: tree shape related properties of fourteen
supertree methods, Syst. Biol., 54, 419, 2005.
86. Hodkinson, T.R. et al., Large trees, supertrees and diversification of the grass family, in Aliso 23
(Grasses: Systematics and Evolution Vol. 3), Columbus, J.T. et al., Eds., Allen Press, KS, USA, in
press.
87. Salamin, N., Large trees, supertrees and the grass phylogeny, Ph.D. thesis, University of Dublin,
Trinity College Dublin, 2001.
88. Goff, S.A. et al., A draft sequence of the rice genome (Oryza sativa L. ssp. japonica), Science, 296, 92, 2002.
89. Yu, J., et al., A draft sequence of the rice genome (Oryza sativa ssp. indica), Science, 296, 79, 2002.
90. Matsuoka, Y. et al., Whole chloroplast genome comparison of rice, maize and wheat: implications for
chloroplast gene diversification and phylogeny of cereals, Mol. Biol. Evol., 19, 2084, 2002.
91. Ogihara, Y. et al., Structural features of a wheat plastome as revealed by complete sequencing of
chloroplast DNA, Molec. Genet. Genomics, 266, 740, 2002.
92. Eisen, J.A. and Fraser, C.M., Phylogenomics: intersection of evolution and genomics, Science, 300,
1706, 2003.
93. Friis, E.M., Pedersen K.R., and Crane, P.R., Fossil evidence of water lilies (Nymphaeales) in the Early
Cretaceous, Nature, 410, 357, 2001.
94. Sun, G. et al., Archaefructaceae, a new basal angiosperm family, Science, 296, 899, 2002.
95. Soltis, P.S., Soltis, D.E., and Chase, M.W., Angiosperm phylogeny inferred from multiple genes as a
tool for comparative biology, Nature, 402, 402, 1999.
96. Linder, H.P., The evolutionary history of the Poales/Restionales: a hypothesis, Kew Bull., 42, 297, 1987.
97. Crepet, W.L. and Feldmann, G.D., The earliest remains of grasses in the fossil record, Amer. J. Bot.,
78, 1010, 1991.
98. Prasad, V. et al., Dinosaur coprolites and early evolution of grasses and grazers, Science, 310, 1177, 2005.
99. Poinar, G.O. and Columbus, J.T., Adhesive grass spikelet with mammalian hair in Dominican amber:
first fossil evidence of epizoochory, Experientia, 48, 906, 1992.
100. Jacobs, B.F., Kingston, J.D., and Jacobs, L.L., The origin of grass-dominated ecosystems, Ann.
Missouri Bot. Gard., 86, 590, 1999.
101. Kellogg, E.A., The grasses: a case study in macroevolution, Ann. Rev. Ecol. Syst., 31, 217, 2000.
102. Gaut, B.S. and Doebley, J.F., DNA sequence evidence for the segmental allotetraploid origin of maize,
Proc. Natl. Acad. Sci. USA, 94, 6809, 1997.
103. Sanderson, M.J., Estimating absolute rates of molecular evolution and divergence times: a penalized
likelihood approach, Mol. Biol. Evol., 19, 101, 2002.
104. Donoghue, P.C.J. and Smith, M.P., Telling the Evolutionary Time: Molecular Clocks and the Fossil
Record, Systematics Association Series 66, CRC Press, Florida, 2003.
105. Magallón, S.A. and Sanderson, M.J., Angiosperm divergence times: the effect of genes, codon
positions, and time constraints, Evolution, 59, 1653, 2005.
106. Sanderson, M.J., A nonparametric approach to estimating divergence times in the absence of rate
constancy, Mol. Biol. Evol., 14, 1218, 1997.
107. Thorne, J.L., Kishino, H., and Painter, I.S., Estimating the rate of evolution of the rate of molecular
evolution, Mol. Biol. Evol., 15, 1647, 1998.
108. Cutler, D.J., Understanding the overdispersed molecular clock, Genetics, 154, 1403, 2000.
109. Cutler, D.J., Estimating divergence times in the presence of an overdispersed molecular clock, Mol.
Biol. Evol., 17, 1647, 2000.
110. Bremer, K., Early Cretaceous lineages of monocot flowering plants, Proc. Natl. Acad. Sci. USA, 97,
4704, 2000.
111. Gaut, B.S., Evolutionary dynamics of grass genomes, New Phytol., 154, 15, 2002.
112. Piperno, D.R. and Sues, H-D., Dinosaurs dined on grass, Science, 310, 1126, 2005.
113. Audley-Charles, M.G., Dispersal of Gondwanaland: relevance to evolution of the angiosperms, in
Biogeographical Evolution of the Malay Archipelago, Whitmore, T.C., Ed., Clarendon Press, Oxford,
1987, 5.
114. Ronquist, F., Dispersal-vicariance analysis: a new approach to the quantification of historical bioge-
ography, Syst. Biol., 46, 195, 1997.
115. Ronquist, F., DIVA ver. 1.1, http://www.ebc.uu.se/systzoo/research/diva/diva.html, 1996.
116. Chan, K.M.A. and Moore, B.R., SYMMETREE: whole-tree analysis of differential diversification
rates, Bioinformatics, 21, 1709, 2004.
117. Moore, B.R., Chan, K.M.A., and Donoghue, M.J., Detecting diversification rate variation in supertrees,
in Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, Bininda-Emonds,
O.R.P., Ed., Kluwer Academic Publishers, Dordrecht, 2004, 487.
118. Chan, K.M.A. and Moore, B.R., Whole-tree methods for detecting differential diversification rates,
Syst. Biol., 51, 855, 2002.
119. Burger, W.C., Why are there so many kinds of flowering plants? Bioscience, 31, 577, 1981.
120. Maynard Smith, J. and Szathmáry, E., The Major Transitions in Evolution, Freeman, Oxford, UK,
1995, 346.
121. Weiblen, G.D., Oyama, R.K., and Donoghue, M.J., Phylogenetic analysis of dioecy in monocotyle-
dons, Am. Nat., 155, 46, 2000.
122. Slowinski, J.B. and Guyer, C.G., Testing whether certain traits have caused amplified diversification:
an improved method based on a model of random speciation and extinction, Am. Nat., 142, 1019, 1993.
123. Salamin, N., and Davies, T.J., Using supertrees to investigate species richness in grasses and flowering
plants, in Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, Bininda-
Emonds, O.R.P. Ed., Kluwer Academic Publishers, Dordrecht, 2004, 461.
124. Goudet, J., An improved procedure for testing the effects of key innovations on rate of speciation,
Am. Nat., 153, 549, 1999.
125. Bousquet, J., et al., Extensive variation in evolutionary rate of rbcL gene-sequences among seed plants,
Proc. Natl. Acad. Sci. USA, 89, 7844, 1992.
126. Barraclough, T.G., Harvey, P.H., and Nee, S., Rate of rbcL gene sequence evolution and species
diversification in flowering plants (angiosperms), Proc. R. Soc. Lond. B., 263, 589, 1996.
127. Savolainen, V. and Goudet, J., Rate of gene sequence evolution and species diversification in flowering
plants: a re-evaluation, Proc. R. Soc. Lond. B., 265, 603, 1998.
128. Barraclough, T.G. and Savolainen, V., Evolutionary rates and species diversity in flowering plants,
Evolution, 55, 677, 2001.
129. Gaut, B.S. et al., Comparisons of the molecular evolutionary process at rbcL and ndhF in the grass
family (Poaceae), Mol. Biol. Evol., 14, 769, 1997.
130. Hillis, D.M., Inferring complex phylogenies, Nature, 383, 130, 1996.
131. Hillis, D.M., Taxonomic sampling, phylogenetic accuracy, and investigator bias, Syst. Biol., 47, 3,
1998.
132. Soltis, D.E. et al., Inferring complex phylogenies using parsimony: an empirical approach using three
large DNA data sets for angiosperms, Syst. Biol., 47, 32, 1998.
133. Källersjö, M., Albert, V.A., and Farris, J.S., Homoplasy increases phylogenetic structure, Cladistics-
Int. J. Willi Hennig Soc., 15, 91, 1999.
134. Savolainen, V. et al., Phylogenetics of flowering plants based upon a combined analysis of plastid
atpB and rbcL gene sequences, Syst. Biol., 49, 306, 2000.
135. Erdos, P.L. et al., A few logs suffice to build (almost) all trees: part II, Theor. Comp. Sci., 221, 77, 1999.
136. Bininda-Emonds, O.R.P. et al., Scaling of accuracy in extremely large phylogenetic trees, in Pacific
Symposium on Biocomputing 6, Altman, R.B. et al., Eds., World Scientific Publishing Company, River
Edge, New Jersey, 2001, 547.
137. Salamin, N., Hodkinson, T.R., and Savolainen, V., Towards building the tree of life: a simulation study
for all angiosperm genera, Syst. Biol., 54, 183, 2005.
138. Sanderson, M.J., et al., Error, bias, and long-branch attraction in data for two chloroplast photosystem
genes in seed plants, Mol. Biol. Evol., 17, 782, 2000.
139. Qiu, Y.-L. et al., Phylogeny of basal angiosperms: analysis of five genes from three genomes, Int. J.
Plant Sci., 161, S3, 2000.
140. Wiens, J.J., Does adding characters with missing data increase or decrease phylogenetic accuracy?
Syst. Biol., 47, 625, 1998.
141. Kearney, M., Fragmentary taxa, missing data, and ambiguity: mistaken assumptions and conclusions,
Syst. Biol., 51, 369, 2002.
142. Chase, M.W. et al., Phylogenetics of seed plants: an analysis of nucleotide-sequence from the plastid
gene rbcL, Ann. Missouri Bot. Gard., 80, 528, 1993.
143. Källersjö, M. et al., Simultaneous parsimony jackknife analysis of 2538 rbcL DNA sequences reveals
support for major clades of green plants, land plants, seed plants and flowering plants, Pl. Syst. Evol.,
213, 259, 1998.
144. Savolainen, V. et al., Phylogeny reconstruction and functional constraints in organellar genomes:
plastid versus animal mitochondrion, Syst. Biol., 51, 638, 2002.
145. Farris, J.S. et al., Parsimony jackknifing outperforms neighbor-joining, Cladistics, 12, 1996.
146. Rannala, B. and Yang, Z.H., Probability distribution of molecular evolutionary trees: a new method
of phylogenetic inference, J. Mol. Evol., 43, 304, 1996.
147. Larget, B. and Simon, D.L., Markov Chain Monte Carlo algorithms for the Bayesian analysis of
phylogenetic trees, Mol. Biol. Evol., 16, 750, 1999.
148. Salamin, N. et al., Assessing internal support with large phylogenetic DNA matrices, Molec.
Phylogenet. Evol., 27, 528, 2003.
149. Adam, D., Now for the hard ones, Nature, 408, 792, 2000.
150. Bininda-Emonds, O.R.P., Novel versus unsupported clades: assessing the qualitative support for clades
in MRP supertrees, Syst. Biol., 52, 839, 2003.
151. Gatesy, J. et al., Resolution of a supertree/supermatrix paradox, Syst. Biol., 51, 652, 2002.
152. Bininda-Emonds, O.R.P. et al., Supertrees are a necessary not-so-evil: a comment on Gatesy et al.,
Syst. Biol., 52, 724, 2003.
153. Purvis, A., A composite estimate of primate phylogeny, Philos. Trans. R. Soc. Lond. B, 348, 405, 1995.
154. Baldauf, S.L., The deep roots of eukaryotes, Science, 300, 1703, 2003.
155. Mace, G.M., Gittleman, J.L., and Purvis, A., Preserving the tree of life, Science, 300, 1707, 2003.
156. Pennisi, E., Modernizing the tree of life, Science, 300, 1692, 2003.
ABSTRACT
The major task of systematists is to document the planet’s biodiversity. This has traditionally been
done using morphological characters, especially for inventory work involving the description of
new taxa (species or genera) and the production of checklists and Floras. We present an analysis
of collections from a well collected area on the highly biodiverse tropical island of New Guinea.
The species accumulation curve for this area reveals different collection patterns between the type
of collector and the type of habitat visited. Future surveys should be based on databases of large
collections (or large samples of collections, such as geographic or systematic subsets) and should
use a combination of generalist and specialist collectors in the field to produce a comprehensive
and rigorous sampling strategy.
18.1 INTRODUCTION
An enduring challenge in systematics is how to accurately estimate the number of species on the
planet1. It is a question that is pertinent at practically all taxonomic scales of investigation and
particularly for large species rich taxa, the topic of this book. This chapter assesses the best
collecting strategies for sampling a large and taxonomically difficult taxon in one region on the
tropical island of New Guinea.
The lack of taxonomic capacity is well documented, with various solutions proposed including,
amongst others, DNA taxonomy and barcoding (for example, Blaxter2; Seberg and Petersen,
Chapter 3) and the use of parataxonomists in inventory studies (Krell3). These solutions often
assume that all life on Earth has already been sufficiently sampled and is just waiting in museums
to be catalogued. However, this is not the case, and the systematic community is far from efficiently
sampling the planet4, especially as it undergoes drastic changes in the extent and composition of
natural areas. This is a particular problem in the tropics, which have exceptionally high diversity
but too little taxonomic capacity to sample and describe it adequately. Sampling
strategies can broadly be grouped into two categories: unstructured and structured surveys, explored
below.
We wanted to examine a real-life unstructured taxonomic inventory to reveal any sampling patterns
and, where such patterns exist, to investigate how they could affect future sampling strategies.
We studied a plant collecting programme in Mt. Jaya, New Guinea, organised by systematic
botanists at the Royal Botanic Gardens (RBG), Kew, U.K., from 1997 to 2005. The questions
addressed in this chapter are:
We do not seek to examine classical species area patterns; we assume that habitats have different
diversities and acknowledge that systematists rarely follow any particular experimental design or
ecological methodology to produce comparative data during taxonomic collecting trips. We also
wish to investigate collecting effort in regard to patterns of collections made during taxonomic
trips, that is, a species accumulation curve independent of time and area, rather than examining
species diversity (z values) between different habitats in New Guinea. We have only counted the
species collected, not their relative abundances, and we do not attempt to estimate the total number
of species in the study area (for example, Rosenzweig et al.9).
18.2 METHODS
We analysed historical and contemporary herbarium collections from the Mt. Jaya area of New
Guinea. Mt. Jaya at 4,884 m is the highest peak in South East Asia and has been the focus for
scientific expeditions since 1913.
Recently, the company operating a copper mine on the mountain facilitated access to the area,
and in 1997 a plant checklist project was instigated at RBG Kew involving several expeditions to
the area and a programme of databasing all historical collections from Mt. Jaya held in herbaria
around the world. Our current analysis used information from the RBG Kew database that holds
9,600 records and includes nearly all, if not all, historical collections and all contemporary RBG
Kew collections from the study area.
Collections were assigned to 18 trips from 1913 onwards. Prior to the RBG Kew project, there
had been 13 collecting trips to the region with five RBG Kew expeditions carried out during the
period 1998–2000. Collections were assigned to the following habitats: lowland (0–1200 m);
montane (>1200 m and <3000 m); and alpine (3000 m and over), following Hope’s vegetation
classification for the mountain10,11.
Both historical and contemporary expeditions made collections in all habitats from mangrove
to alpine. Collections from montane to alpine areas are currently being used as a basis for a checklist
of the alpine flora of the area12. Utteridge has worked extensively on the project making collections
in the area, processing specimens and writing taxonomic accounts13–15, and has become well
acquainted with the Mt. Jaya flora. We are confident, therefore, that identifications in the database
to family and generic level are correct, as are species-level identifications for the alpine and montane
zones.
The data were analysed at species level: the first occurrence of each species was recorded, and
subsequent recollections of that species were deleted from the database. If a set of collections was
identified to genus only, it was treated as a single species; this will significantly underestimate
diversity for the lowland collections, which have yet to be worked on. Plots were made using
Microsoft Excel, as XY (scatter) plots; logarithmic trend lines were added using the options
provided in Excel. The vertical lines on Figures 18.1 and 18.2 indicate the first time an RBG Kew
expedition visited the area. Splitting the data into habitat type, we plotted species collection
accumulation curves for the alpine and montane habitats. To investigate how different collectors
contributed to the project, we excluded all collections that specialists made of their own specialist
groups (specialists often made general collections as well). Plots of the accumulation of genera and species
were made for each collection event against the cumulative number of all collections up to and
including that event. Finally, we examined collecting efficiency by calculating the number of
collections it took before a new species was added to the inventory.
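The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure (they worked in a database and Microsoft Excel): the function names and the sample records are hypothetical, and the sketch steps through individual collections rather than whole collecting events.

```python
from typing import Iterable, List, Tuple


def accumulation_curve(species_per_collection: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (cumulative collections, cumulative species) after each record.

    Records are assumed to be in collecting order. Only the first
    occurrence of a species adds to the species count.
    """
    seen = set()
    curve = []
    for n_collections, species in enumerate(species_per_collection, start=1):
        seen.add(species)
        curve.append((n_collections, len(seen)))
    return curve


def collections_per_new_species(species_per_collection: Iterable[str]) -> List[int]:
    """For each newly recorded species, count the collections made since the
    previous new species, including the collection that added it."""
    seen = set()
    gaps = []
    since_last = 0
    for species in species_per_collection:
        since_last += 1
        if species not in seen:
            seen.add(species)
            gaps.append(since_last)
            since_last = 0
    return gaps


# Hypothetical records, invented for illustration only.
records = ["Poa sp. A", "Carex sp. A", "Poa sp. A", "Poa sp. A", "Rhododendron sp. A"]
curve = accumulation_curve(records)
gaps = collections_per_new_species(records)
```

To reproduce the per-event plots, the same curve would be sampled only at the last record of each collecting trip.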
18.3 RESULTS
Only 7,945 of the 9,600 records could be used due to factors such as inadequate data, duplicated
records and misidentifications. For all habitats in the study area, a total of 698 genera and 1,935
species were recorded. Collections from the lowlands were not analysed at habitat level because
of the mosaic of habitats (such as mangrove, swamp forest, lowland rainforest and heath forest),
several of which had been visited many times and others hardly at all16. In addition we found that
many of the lowland collections had incomplete locality and collection data and were not named
to the same accuracy as the montane and alpine collections. Two different patterns can be seen in
the analysis: the effect of ‘collector type’ and collecting ‘efficiency’ through time.
FIGURE 18.1 Plot of all collections from the Mt. Jaya area. Cumulative number of collections (x-axis) and
cumulative number of species (y-axis). Generalist collector curve shown in solid squares; specialist collector
curve in open diamonds. The vertical line indicates the first RBG Kew expedition to the area.
of specialist collectors visited the area were significantly different for the number of species
collected (P = 0.05 using a t-test), but not the number of genera (P = 0.06).
FIGURE 18.2 Plot of collections from alpine and montane habitats of the Mt. Jaya area. Cumulative number
of collections (x-axis) and cumulative number of species (y-axis). Alpine collections shown in solid diamonds;
montane collections in solid squares. The vertical line indicates the first RBG Kew expedition to the area.
Logarithmic trend lines printed on the plot: y = 337.65 ln(x) - 1671.3 (R² = 0.9686) and
y = 153.2 ln(x) - 654.87 (R² = 0.9836).
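The trend lines on Figure 18.2 have the form y = a ln(x) + b. A fit of that form can be sketched as an ordinary least-squares regression of species count on the natural log of the collection count. The sketch below assumes this matches the Excel trend line option; the function name is hypothetical, and the test points are synthetic values generated from one of the printed equations, not the Mt. Jaya data.

```python
import math
from typing import Sequence, Tuple


def fit_log_trend(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = a * ln(x) + b; returns (a, b, r_squared)."""
    u = [math.log(v) for v in x]          # regress y on ln(x)
    n = len(u)
    mean_u = sum(u) / n
    mean_y = sum(y) / n
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    s_uy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y))
    a = s_uy / s_uu
    b = mean_y - a * mean_u
    ss_res = sum((yi - (a * ui + b)) ** 2 for ui, yi in zip(u, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Synthetic points generated from a trend line printed on Figure 18.2.
xs = [100, 500, 1000, 2000, 3000]
ys = [153.2 * math.log(v) - 654.87 for v in xs]
a, b, r2 = fit_log_trend(xs, ys)
```

Because the synthetic points lie exactly on the curve, the fit recovers the printed coefficients; real accumulation data would scatter around the line and give R² below 1.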
FIGURE 18.3 Plot of collections from alpine and montane habitats of the Mt. Jaya area. Log10 cumulative
number of collections (x-axis) and log10 cumulative number of species (y-axis). Alpine collections shown in
open diamonds (slope = 0.56); montane collections in solid squares (slope = 0.70); null model of efficient
collecting (see text) shown as the y = x line. Fitted lines printed on the plot: y = x (R² = 1) and
y = 0.7017x + 0.6608 (R² = 0.9912).
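The slopes quoted for Figure 18.3 are slopes of a straight line fitted on log10 axes. The following sketch shows one way to compute such a slope. It assumes ordinary least squares, reads the y = x null model as every collection adding a new species (our interpretation), uses a hypothetical function name, and tests against synthetic points generated from the printed line y = 0.7017x + 0.6608 rather than the original data.

```python
import math
from typing import Sequence, Tuple


def loglog_fit(collections: Sequence[float], species: Sequence[float]) -> Tuple[float, float]:
    """Fit log10(species) = slope * log10(collections) + intercept."""
    lx = [math.log10(c) for c in collections]
    ly = [math.log10(s) for s in species]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    s_xx = sum((a - mx) ** 2 for a in lx)
    s_xy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = s_xy / s_xx
    return slope, my - slope * mx


# Null model: every collection is a new species, so the two counts are equal.
null_slope, null_intercept = loglog_fit([10, 100, 1000], [10, 100, 1000])

# Synthetic points generated from the fitted line printed on Figure 18.3.
xs = [100.0, 1000.0, 5000.0]
ys = [10 ** (0.7017 * math.log10(x) + 0.6608) for x in xs]
slope, intercept = loglog_fit(xs, ys)
```

A slope below 1 means species accumulate more slowly than collections; the further below 1, the more recollection of already recorded species.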
18.4 DISCUSSION
Collecting trips to the Mt. Jaya area have been unstructured surveys, with each trip collecting
independently of the others. When the collections are databased as a whole,
however, the cumulative species versus cumulative collections data plot out as classic species
accumulation curves (Figure 18.1). The number of new species added to the project initially rises
sharply compared to the number of collections made and then starts to level out towards an
asymptote. For the Mt. Jaya project this was shown most notably in the collecting plots from the
alpine regions (Figure 18.2).
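The levelling-off can be made explicit from the logarithmic form of the trend lines used in Figure 18.2. The short derivation below is a sketch: the symbols a and b stand for the fitted coefficients, and the result is only as good as the logarithmic fit itself. Note that a logarithmic curve keeps rising slowly and has no true asymptote, so it describes the flattening only over the sampled range.

```latex
% Species count y as a function of cumulative collections x
y = a \ln(x) + b
% Expected new species per additional collection
\frac{dy}{dx} = \frac{a}{x}
% Approximate number of further collections needed to add one species
\Delta x \approx \frac{x}{a}
```

Under this model the effort needed to add one more species grows in direct proportion to the number of collections already made.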
General collecting has taken place during all stages of this project, whilst specialist collecting
was mainly limited to the RBG Kew trips. Specialist collectors collected many more species from
their taxon of expertise. This pattern was mentioned by Prance and Campbell17 in their analysis of
incoming collections of Chrysobalanaceae, who concluded that “recent collecting is baling in a lot
of herbarium material of common species, but it is not adequately covering rare species. This
indicates the need for more informed and knowledgeable collectors, especially specialists in various
plant families and less general collecting”. This is shown in our analysis, which looked at 14 groups
collected by 22 specialists (Figure 18.1). We interpret our data to mean that specialists recognise
and collect species that generalist collectors overlook. In a well organised large-scale survey, general
collecting has its place in the early stages. However, a phase where collecting efficiency sharply
drops can soon be reached, such as in the alpine collecting pattern post RBG Kew involvement
(Figure 18.2). This is either due to all species in an area being collected or the area only being
visited by general collectors who are not getting rare species, especially in taxonomically difficult
and species rich taxa. RBG Kew is a large institute which can call on both generalist and specialist
botanists. At least two members of the RBG Kew team took part in every Mt. Jaya expedition and
became knowledgeable of the flora; that is, they became good generalist collectors. Even so, they
were unable to collect the full range of species in several difficult and species rich taxa such
as Araliaceae, Arecaceae, Cyperaceae, Ericaceae, Myrtaceae, Poaceae and pteridophytes. Specialists
on the field trips, by contrast, did recognise and collect these extra taxa.
It is important for inventory projects to begin with generalist collectors and then use
specialist collectors in the field. This process will include drawing on databases of taxonomists and
inviting specialists from outside institutes to join their project where the expertise is not present in
their own institute. We argue that it is not sufficient to use specialists only to name collections once
the fieldwork has been done, because their expertise may include spotting taxa in the field,
something a generalist may not do, with the result that any additional collections of specialist
groups are an ‘ad hoc bonus’.
Databasing specimens is a cost effective way of estimating the stage of an inventory. Expeditions
to the tropics can be extremely expensive, and we have estimated that a two-month expedition
involving a team of four people equates to a year’s databasing. It has been difficult to estimate the
cost of collecting each specimen in New Guinea, as so much manpower was contributed by the
mining company without cost to the expedition. However, costs for a specimen have recently been
calculated at the RBG Kew as approximately STG£5.50 to enter the herbarium (including acces-
sioning and curation) and an additional STG£3 to identify a specimen18. During Kew’s last expe-
dition to the alpine region, 45 specimens were collected before a new species was added to the
project’s list, equating to herbarium costs of STG£247.50 for each new species. These costs are
higher, by a considerable amount, than those calculated by Parnell19 and Mann20, suggesting that
costs may vary considerably depending on factors such as the area of study undertaken and the
number of duplicates collected. Effort would have been better directed to the montane region, where
the log-log slope of 0.70 shows that new species were still being added faster than in the alpine
region (slope 0.56); further collecting there would bring the montane slope down towards the alpine
value (Figure 18.3).
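The herbarium cost per new species quoted above follows from simple arithmetic, sketched here in Python. The unit costs and the figure of 45 specimens come from the text; the constant and function names are hypothetical, and the second total, which adds the identification cost, is our extension rather than a figure reported by the authors.

```python
ACCESSION_COST_GBP = 5.50        # per specimen: accessioning and curation (from the text)
IDENTIFY_COST_GBP = 3.00         # per specimen: identification (from the text)
SPECIMENS_PER_NEW_SPECIES = 45   # last alpine expedition (from the text)


def cost_per_new_species(specimens: int, unit_cost: float) -> float:
    """Herbarium cost incurred for each species newly added to the list."""
    return specimens * unit_cost


# Accessioning and curation only, as quoted in the text.
accession_only = cost_per_new_species(SPECIMENS_PER_NEW_SPECIES, ACCESSION_COST_GBP)

# Assumption: also identifying every one of the 45 specimens.
with_identification = cost_per_new_species(
    SPECIMENS_PER_NEW_SPECIES, ACCESSION_COST_GBP + IDENTIFY_COST_GBP
)
```

The first value reproduces the STG£247.50 given in the text; neither figure includes the cost of fieldwork, which the authors could not estimate.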
We appreciate that much of what we present here is intuitive and already ‘known’, in people’s
minds at least. However, economic analyses of flora, checklist and similar projects carried out for
systematic purposes are sparse (but see Parnell19; Mann20; Funk and Richardson21). As such, this
simple analysis will allow predictions to be made as to the efficacy of a collecting programme in
a particular locality, and the number of visits that should be made to an area.
We recommend that an inventory strategy should be put in place before large projects are
undertaken. This should start with some understanding of the collecting history of the area,
databasing as many (if possible, all) known herbarium collections from the area, regardless of age.
These data should be analysed at the habitat level, to estimate the phase the inventory has reached.
The collector history should be examined to see if specialist collectors had previously visited the
inventory site. This will dictate future strategy: for example, if general collecting is no longer
necessary for an area, specialist collectors should be used. Decisions on taxa not fully
represented in the collections can be made from the database and will influence which specialists
are required to complete the inventory. Institutes will also have to consider external collaboration.
For the Mt. Jaya project many expeditions were conducted in the alpine zone, partly because
of the intrinsic desire of humans to get to the summits of large mountains, and partly to fulfil the
needs of the project to produce a checklist of the area impacted by mining. However, it is important
to know the collection history of areas with high peaks, especially as many of these areas are
conserved without due regard for the lowland forests surrounding them. For example, in the
Malaysian state of Sabah, many of the protected areas are on mountains with very few lowland
areas protected.
18.5 CONCLUSIONS
• Generalists do not collect all species in an area, even if they know the flora well. A
parataxonomist or generalist collector based in one area may not collect all species if
undertaking unstructured surveys, even after a long time.
• Species rich and taxonomically difficult groups are systematically undercollected by
generalists.
• Specialist collectors have a high value in fieldwork, as they are able to pick out rare
species in the field.
• Databasing is a cost-effective way to guide the sampling strategy of an area.
ACKNOWLEDGEMENTS
We would like to thank our colleagues at RBG Kew for discussion during the writing of this paper,
especially Neil Brummitt, Stuart Cable, Aaron Davis, Helen Hopkins, Bob Johns, Justin Moat and
Maria Vibe Norup.
REFERENCES
1. Systematics Agenda, Systematics Agenda 2000: Charting the Biosphere, Systematics Agenda 2000,
New York, 1994.
2. Blaxter, M., Counting angels with DNA, Nature, 421, 122, 2003.
3. Krell, F.-T., Parataxonomy vs. taxonomy in biodiversity studies: pitfalls and applicability of ‘morphospecies’
sorting, Biodivers. Conserv., 13, 795, 2004.
4. Raven, P.H. and Wilson, E.O., A fifty-year plan for biodiversity surveys, Science, 258, 1099, 1992.
5. Phillips, O. and Miller, J.S., Global Patterns of Plant Diversity: Alwyn H. Gentry’s Forest Transect
Data Set, Missouri Botanical Garden Press, St. Louis, MO, 2002.
6. Newbery, D.M. et al., Primary forest dynamics in lowland dipterocarp forest at Danum Valley, Sabah,
Malaysia, Phil. Trans. R. Soc. Lond. B, 354, 1763, 1999.
7. Mittermeier, R.A. et al., Wilderness and biodiversity conservation, Proc. Nat. Acad. Sci. USA, 100,
10309, 2003.
8. Campbell, D.G., The importance of floristic inventory in the tropics, in Floristic Inventory of Tropical
Countries: The Status of Plant Systematics, Collections, and Vegetation, plus Recommendations for
the Future, Campbell, D.G. and Hammond, H.D., Eds., New York Botanic Garden, New York, 1989, 5.
9. Rosenzweig, M.L. et al., Estimating diversity in unsampled habitats of a biogeographical province,
Conserv. Biol., 17, 864, 2003.
10. Hope, G.S., Vegetation, in The Equatorial Glaciers of New Guinea, Hope, G.S. et al., Eds., Balkema,
Rotterdam, 1976, 112.
11. Hope, G.S., New Guinea Mountain Vegetation Communities, in The Alpine Flora of New Guinea,
van Royen, P., Ed., J. Cramer, Vaduz, Liechtenstein, 1980, 153.
12. Johns, R.J., A Guide to the Alpine and Subalpine Flora of Mt Jaya, Royal Botanic Gardens, Kew, 2006.
13. Utteridge, T.M.A., Two new species of Maesa (Myrsinaceae) from Puncak Jaya, New Guinea:
contributions to the Flora of Mt Jaya I, Kew Bull., 55, 443, 2000.
14. Utteridge, T.M.A., The subalpine members of Pittosporum (Pittosporaceae) from Mt Jaya, New Guinea:
contributions to the Flora of Mt Jaya II, Kew Bull., 55, 699, 2000.
15. Utteridge, T.M.A., A new species of Medusanthera Seem. (Icacinaceae) from New Guinea: Medusanthera
inaequalis Utteridge: contributions to the Flora of Mt Jaya IV, Kew Bull., 56, 233, 2001.
16. Utteridge, T.M.A., personal observations, 1999–2000.
17. Prance, G.T. and Campbell, D.G., The present state of tropical floristics, Taxon, 37, 519, 1988.
18. Harvey, T., personal communication, 2005.
19. Parnell, J.A.N., The monetary value of herbarium collections, in Biological Collections and Biodiversity,
Rushton, B.S., Hackney, P., and Tyrie, C.R., Eds., Linnean Society of London Special Publication 3,
Samora Publishing, UK, 2001, 271.
20. Mann, D.G., The economics of botanical collections, in The Value and Valuation of Natural Science
Collections, Nudds, J.R. and Pettitt, W., Eds., Geological Society, London, 1997, 68.
21. Funk, V.A. and Richardson, K.S., Systematic data in biodiversity studies: use it or lose it, Syst. Biol.,
51, 303, 2002.
CONTENTS
19.1 Introduction
19.2 There Are Taxa, and Then There Are Taxa
19.3 There Are Numbers, and Then There Are Numbers
19.4 There Are Names, and Then There Are Names
19.5 There Is Biogeography, and Then There Is Biogeography
19.6 Summary
References
ABSTRACT
Diatoms are one of the largest, if not the largest, group of cryptogamic photosynthetic organisms,
in terms of species diversity. This chapter addresses some of the basic issues concerning the history
of alpha taxonomy and the study of diatom diversity, exploring future possibilities from the
perspective of their geography.
19.1 INTRODUCTION
The thrust of this book is to address issues behind the systematics of large and species rich taxa.
Whilst that topic is of much significance, it begs a number of questions, those that pertain to
definitions and those that pertain to solutions. How large is large? What kinds of numbers allow a
taxon to legitimately be described as ‘species rich’? These are not trivial questions. Whatever
yardsticks are applied, we would probably find no disagreement if we were to describe the group
of organisms that are the focus of our concern as both extremely large and species rich, or at least
potentially so in both cases, regardless of the precise definition of each term.
Our organisms of study are diatoms (Bacillariophyta), a large and diverse group of photosynthetic,
single-celled eukaryotes, with their cells encased in a silica shell1. They are members of the
heterokont (stramenochrome) algae2; their sister taxon has been identified as a group of recently
described tiny flagellates, Bolidophyceae3, which has no more than three to five currently recognised
species4,5. The heterokonts belong within stramenopiles, themselves a remarkably large and diverse
group of eukaryotic organisms2. Stramenopiles and stramenochromes are both names that have
been applied to heterokont algae and their relatives6,7. The term ‘strameno’, meaning straw, refers
to the characteristic tripartite flagellar hairs, a synapomorphy uniting these taxa. The stramenochromes8–10
are equivalent to the heterokont algae, whereas the stramenopiles include, as well as the heterokont
algae, oomycetes, labyrinthulids, thraustochytrids and certain other flagellate protozoa2. Surprisingly,
the stramenopiles merit only a short two-page account in the recent Assembling the Tree of Life
compendium11, buried in the chapter on the relationships of green plants (Delwiche et al.12), though
an illustration of the diatom Cymbella cistula (Hemprich et Ehrenb.) Kirchner does appear on the
dust jacket.
FIGURE 19.1 Light micrograph of the Miocene fossil Amphorotia americana (Kain et Schultz) D.M.
Williams & G. Reid. (From lectotype slide, BM-Adams D. 846, Atlantic City, NJ. For further details see
Williams and Reid65.)
Silica is inert, so when the organism dies the siliceous parts sink into the sediment and
are often preserved, creating a splendid diatom fossil record (Figure 19.1), stretching from the
present to the Cretaceous13. Reasonable estimates place the total number of diatom species
(fossil and Recent) somewhere in the region of 200,000¹⁴, roughly the same number of species
known for all higher plants15. It is estimated that some 15,000 living species have so far been
described, with another 5,000–8,000 species that are now extinct13. Should the estimates of
diatom diversity at the species level be even approximately accurate, it is evident that there
is a considerable amount of work that remains to be undertaken. Accounting for species
diversity is often referred to as alpha taxonomy, the discovery and description of the basic
units in classification16, yet species diversity is not the only problem facing diatom taxonomists
and systematists.
At present, diatom species are assigned to around 350 genera17,18, with another approximately
150 genera for fossil groups13; only a handful (no more than 30) of higher taxa, classes, subclasses,
orders and families, are recognised1, or at least in use. Rarely do diatomists refer to a specimen’s
family, for example, as characters for higher taxa have rarely been documented. In short, diatoms
are poorly known and poorly accounted for in their phylogenetic relationships (higher-level taxonomy),
their current classification (numbers of higher taxa) and biodiversity estimates, regardless
of the agreed vast number of species that remain to be described.
So much for definitions; solutions are of more immediate concern, particularly solutions
accounting for diatom diversity, both in terms of documenting and understanding it. Whilst it is
clear that many of the problems are simply practical, devising a strategy, should one wish to
approach the problem in that way, requires further consideration.
Below we deal with the understanding of diatom diversity by examining three aspects. First,
we briefly examine the history and understanding of alpha taxonomy as it has developed and how
algal taxonomy progressed within these changing paradigms. Second, we present some more
numbers for direct comparison within both the stramenopile clade as well as other plant and animal
groups. Finally, we propose some options for dealing with large genera within diatoms, focusing
studies on the geographical dimension, a poorly examined aspect of diatom studies, which, in some
circles, is considered a pointless endeavour19,20.
Large and Species Rich Taxa: Diatoms, Geography and Taxonomy 307
the species level. He tirelessly urged botanists to move away from just the morphological herbarium
approach and advocated the investigation of ecological, cytological, genetical and chemical factors
to enhance and supplement the taxonomy of organisms in general. In 1935 he published a short
paper on ‘The investigation of plant species’. He wrote: “Those who, having been trained to an
appreciation of modern discoveries in ecology, cytology, genetics, and chemical factors, are trying
to widen the basis of taxonomy, have undertaken a long, slow and perhaps thankless task.”29. Well,
maybe. But that was 1935, not 1952. Even earlier, Turrill published an essay on ‘Species’, which
dealt with these very same issues30. So in 1952, 27 years on, the ideas were hardly modern, yet
Turrill persisted with the viewpoint that morphology was limited. His 1935 paper gained a certain
amount of significance for taxonomy as a whole.
For the first time Turrill mentioned the idea of alpha taxonomy: “It is suggested … that the
time has come when the student of floras whose taxonomy on the old lines is relatively well known
should attempt to investigate species by much more complete analyses of a wider range of characters
than is the rule”29. ‘On the old lines’ was to be read as morphology, herbarium taxonomy. Turrill
then delivered his coup de grace: “There is thus distinguished an alpha taxonomy and an omega
taxonomy, the latter being an ideal which will probably never be completely realized”29. Turrill
also noted: “The alpha taxonomist, however, really studies the constitution of the genus, not of the
species; he ‘gets inside’ the genus but does not ‘get inside’ the species”29.
Turrill was also a member of the group of biologists that in 1937 eventually became the
Systematics Association31. Through the efforts of Julian Huxley, the Association was largely responsible
for giving birth to what was called the ‘New Systematics’, another grand title given to this
all-encompassing approach to taxonomy32. Of course, at the time it seemed the right thing to do,
as if all information was of some equal merit. Turrill acknowledged that his omega classification
was probably impossible to obtain, but what was it? No one really quite knew. Yet what did happen
was that all the different facets of the new experimental approaches to taxonomy created a whole
series of special classifications or, more accurately, artificial classifications, designed to represent
each property being considered. Under this experimental umbrella were: Turesson and his
genecology33; Gilmour and his demes, along with their complicated terminology34,35; Danser and his
classificatory system36; and so on.
Gilmour and Turrill37 outlined one possibility, where all the special classifications, those based
on ecology, those based on genetics, those based on demes, could be combined into one general
classification (Figure 19.2). At this point it appeared that alpha taxonomy (and to a certain extent,
omega taxonomy) had vanished from consideration. Nevertheless, it is not difficult to see, with all
these special classifications and the desire to produce from them a general classification, how
phenetics (classification using overall similarity38,39) became popular not only among those who
embraced the ‘New Systematics’ but also those that needed to embrace a ‘modern’ approach to
taxonomy but were unwilling to engage with the earlier literature40.
FIGURE 19.2 The relationship between special and general classifications. (Reproduction of illustration taken
from Gilmour and Turrill37.)
Turrill’s views stem from an even earlier preoccupation. Around the turn of the nineteenth
century there was a general revolt from reconstructing phylogenies, at that time largely understood
as the apparent unifying principle of morphology. The history of this episode has been captured
by Boney’s account of the ‘Tansley Manifesto’ affair41, a general revolt against the more and more
extravagant phylogenetic speculations of a previous generation, ending up in a revolt against
morphology.
Phylogenetic speculations, in the form of ancestor-descendant sequences, were largely the
inspiration of one man, Ernst Haeckel, who coined the word phylogeny among many others42. The
revolt, set in motion for phycologists by Tansley and his associates, and pursued relentlessly by
people like Turrill and Gilmour, added to the decline and drift away from morphology and its
relation to classification, the effects of which are still with us today16.
For Turrill, alpha taxonomy was concerned with morphology but as a starting point, so with
the inclusion of all other forms of data it would finally lead to the omega taxonomy, or something
close. Whilst this history betrays the effects of a certain kind of thinking in British botany, zoologists
probably relied more on Mayr for their understanding43. Mayr, like Turrill, saw the issue as one of
progress, from alpha taxonomy onwards: “In the first stage, often called alpha taxonomy, emphasis
is on the description of new species and their preliminary arrangement in comprehensive genera.
In beta taxonomy relationships are worked out more carefully on the species level and on that of
the higher categories; emphasis is placed on the development of a sound classification. At the level
of gamma taxonomy much attention is paid to intraspecific variation, to various sorts of evolutionary
studies, and to a causal interpretation of organic diversity”43.
Thus both zoologists and botanists understood progress to be from the simple exercise of
description to the ‘various sorts of evolutionary studies’ that might seem of some greater interest.
This prescription, although rendered into modern vocabulary, is still repeated today, as if complexity
and more data offer serious solutions to the problem of classification44.
Of course, in the 1970s and 1980s the theory of classification known as ‘cladistics’ became
the dominant force for understanding the interrelationships of organisms, regardless of taxonomic
rank. Cladistics was viewed by some as a critique45 allowing a greater understanding of past
endeavours, while others saw in it nothing of substance, and much of phycology has remained
bathed in the ideas of Turrill and Mayr. So what is the significance of this for the study of ‘large
and species rich taxa’, even in the face of a taxonomic renaissance16?
TABLE 19.1
Estimated Numbers of Species in Subgroups of the Stramenochromes
Class Species Genera
Note: For Bacillariophyta (diatoms), the number outside the brackets is the number of species so
far described. Those marked with a * have been considered members of Chrysophyta but have
recently been removed48. Thus figures for taxa marked * relate to the last entry in the table for
Chrysophyta, enclosed in square brackets. In Dictyochophyceae approximately 2–7 species are
extant, belonging to the same genus; the majority of species in this taxon are known only from
fossils, arranged in 10–15 genera.
Source: See Andersen2 for summary; see also Bailey et al.91; Andersen et al.92; Honda and
Inouye93; Kawachi et al.94–96; O’Kelly97.
TABLE 19.2
Estimated Numbers of Species in a Selection
of Other ‘Algal’ and Plant Groups
Group Species
Bacillariophyta 200,000
Rhodophyta 20,000
‘Chlorophyta’ 120,000
Charophyta 20,000
Cryptophyta 1,200
Dinophyta 11,000
Haptophyta 2,000
Xanthophyta 2,000
‘Bryophytes’ 15,000
‘Ferns’ 13,025
‘Gymnosperms’ 980
‘Dicotyledons’ 199,350
Monocotyledons 59,300
TABLE 19.3
Estimated Numbers of Species in a Selection of Animal Groups
Group Species
Bacillariophyta 200,000
Insects 950,000
Molluscs 70,000
Mammals 4,842
Birds 9,932
‘Reptiles’ 8,134
‘Amphibians’ 5,578
‘Fishes’ 26,018
at the class level (those marked with an asterisk in Table 19.1, see Kristiansen48). Chrysophyta sensu lato48
illustrate another problem when attempting to compare taxa, as many new genera, classes and even
families are separated away when they include only one (or very few) species, indicating a rather
idiosyncratic approach to ranking, if not classification. For example, Kawai et al.49 erected the new class
Schizocladophyceae for the new species Schizocladia ischiensis. Comparison between the number of
species and number of genera indicates that, again, idiosyncratic criteria are used in assigning rank.
Amongst a wider more varied choice of photosynthetic organisms (Table 19.2), potential
numbers of diatom species compare best with dicotyledons, again a large number in the context
of the plant kingdom. For example, there are between 11,000 and 13,000 known species of
Charophytes but an estimated total of only 20,000; for dinoflagellates there are some 2,000–4,000
known species with an estimated 11,000 total; and for Rhodophytes (red algae) there are some
4,000–6,000 known species with an estimated total of 5,500–20,000. With the exception of the
insects as a whole, expected diatom species are far in excess of most other groups (Table 19.3).
We offer two further comparisons of the numbers of diatom taxa to allow some idea of the
resolution in terms of higher categories. Our first comparison is with Lepidoptera and angiosperms,
as both have similar numbers of known species when compared to expected number of diatom
species (Table 19.4). Under the assumption that diatom species will reach the 200,000 mark, then
expected genera will be around 9,000 (at 450 times 20 for an even relative increase in numbers).
Should this figure be anywhere near accurate, there are at least 8,000 genera needed to achieve
some greater resolution.
TABLE 19.4
Comparison between Some Higher Taxa of Lepidoptera and Angiosperms Relative to the
Number of Species in Each Group (Number of Diatoms Described)
Superfamilies Families Genera Species
TABLE 19.5
Comparison between Described Species of Diatoms and Described Species
of Mammals, Both Fossil and Recent
Genera Species Genera Species
Source: For mammals, O’Leary et al.50; for Bacillariophyta (diatoms), Round et al.1.
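The projection of around 9,000 genera can be written out as a short calculation. This is a minimal sketch: the 450 genera and the factor of 20 are taken from the text, while the function and variable names are hypothetical and the even scaling is the simplifying assumption the text itself makes.

```python
def projected_genera(current_genera, scale_factor):
    """Scale the number of genera evenly with the expected increase in species."""
    return current_genera * scale_factor


current_genera = 450   # genera figure used in the text
scale_factor = 20      # relative increase in species assumed in the text

expected = projected_genera(current_genera, scale_factor)
still_needed = expected - current_genera

print(f"Expected genera: {expected}")
print(f"Genera still to be recognised: {still_needed}")
```

On these figures the shortfall is 8,550 genera, consistent with the statement that at least 8,000 more genera are needed.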
The numbers of diatom species described so far can be compared with the number of species
described in mammals. This comparison, seemingly strange, allows direct comparison between
diatoms and a group of organisms with similar numbers of species but considerably better known
(Table 19.5; O’Leary et al.50). Mammals have fewer species placed in many more genera, implying
that the hierarchical structure in mammalian classification is better resolved (understood) and that
resolution and structure in diatom classification is rather poor (not well understood).
FIGURE 19.3 Numbers of diatom genera described against time (x-axis: 1791 to 2003; y-axis: 0 to 50 genera;
labelled peaks: 1894 Cleve, 1928 Karsten, 1990 Round, 1997). (Data from Fourtanier and Kociolek17,18.)
to it. Most ‘dustbin’ taxa turn out to be paraphyletic nongroups defined by the lack of characters.
Nevertheless, since the publication of VanLandingham, Navicula has been more precisely defined56,
and some 30 new (or resurrected) genera are now commonly used for subgroups within the older genus.
Yet it remains unknown whether any of these new subgroups are monophyletic in any demonstrable
way, that is, have unique sets of synapomorphies (but see Cox and Williams57,58, for a beginning).
Additionally, many of the ‘unique’ groups of species have been placed not only in a new genus but
also in new families, thereby obscuring rather than clarifying relationships (for example, Cavinula
D.G. Mann and A. Stickle, six species previously in Navicula, in Cavinulaceae D.G. Mann).
TABLE 19.6
List of Diatom Genera with the Largest Number of Species
Genera Species Freshwater Marine Fossil Cosmopolitan
Navicula* 9,000+ + + + +
Eunotia 2,170 + − + +
Pinnularia 1,820 + + + +
Nitzschia 1,745 + + + +
Coscinodiscus* 1,395 − + + +
Synedra* 1,200 + + + +
Cymbella* 1,180 + − + +
Amphora 1,050 + + + +
Cocconeis 1,000 + + + +
Gomphonema 1,000 + − + +
Achnanthes* 970 + + + +
Fragilaria* 925 + + + +
Surirella* 910 + + + +
Melosira* 900 + + + +
Triceratium* 850 + + + +
Note: Genera marked with an * have been revised in the last 10 years and are now better
circumscribed as a series of smaller genera. However, in most cases a large residue remains after
taxa have been separated.
Source: Data based on the numbers of names (rather than taxa) in VanLandingham’s Catalogue
of Diatoms51; figures for Synedra and Fragilaria are supplemented with database records kept at
the Natural History Museum, London as part of ongoing research projects, and the figure for
Navicula species is from California Academy of Science databases (Kociolek, personal communication).
Nevertheless, the number of new species added to Navicula continues to increase regardless
of attempts to discover monophyletic subgroups within this large genus57. Somewhat ironically,
Lange-Bertalot and Moser created the genus Naviculadicta Lange-Bertalot59 to accommodate
species that would have previously been described as part of Navicula but can now no longer be
so because of its more precise definition. In other words, they deliberately created a paraphyletic
group from another paraphyletic group, a taxonomic decision that is perhaps unique in systematics
(see Kociolek60 for further commentary).
Yet progress can be made. Between the two genera Fragilaria and Synedra there are over 2,000
names, covering freshwater, marine, fossil and Recent taxa (Table 19.6). Revision of both genera,
beginning in 198661, has allowed the discovery and recognition of many better-defined genera from
both a morphological perspective as well as their general ecological requirements (freshwater or
marine) (Table 19.7; further details can be found in Williams62). Many of the remaining names can
be examined (on a piecemeal basis if necessary) to see if they ‘fit’ the described genera. By fit, we
mean searching for and identifying synapomorphic characters, rather than some ‘estimate’ of similarity,
overall or otherwise63. Such a search should apply to genera listed in Table 19.7 and, as such, would
allow both detailed revisionary work to continue along with floristic work and species recognition.
TABLE 19.7
Araphid Diatom Genera Described since 1986
Genus Date Approximate Species Numbers Freshwater (F)/Marine (M)
Note: Includes taxa that would have been previously placed in either Fragilaria or Synedra or else with species that
would have been placed in either of those two genera when the more general circumscription applied.
In other words, proper attention to the processes of classification is required, regardless of taxonomic
rank, that is the identification of synapomorphic characters (see also Ruck and Kociolek64 and Williams
and Reid65). As a rule, once ‘large’ genera are broken up into smaller more precisely circumscribed
units (monophyletic units), they become more sharply defined with respect to other parameters, such
as their ecology. One aspect that remains contentious is whether the subdivisions correspond to
geographical regions or areas. Put more bluntly: does biogeography matter with respect to diatoms19?
19.5 THERE IS BIOGEOGRAPHY, AND THEN THERE IS BIOGEOGRAPHY
But there are no such things as water-babies. How do you know that? Have you been there to see? And
if you had been there to see, and had seen none, that would not prove that there was none … . And no
one has a right to say no water-babies exist till they have seen no water-babies existing; which is quite
a different thing, mind, from not seeing water-babies; and a thing which nobody ever did, or perhaps
ever will do.
Kingsley, C.
The Water-Babies, a Fairy Tale for a Land Baby, 1863⁶⁶
The term biogeography can be defined in different ways67. To some, biogeography is simply an
extension of ecology68; to others it deals with the subject of migration and dispersal69; and to others
it concerns itself with taxa, the areas they occupy and their relationships70; and still symposia are
organised to address the question ‘what is biogeography?’71. That such diversity of focus exists
makes it somewhat problematic to discuss the issue relative to diatoms and their distribution. Often,
one might read of diatoms, or of other ‘protists’, as having ‘no biogeography’, or lacking sufficient
patterns of distribution to indicate any regional separation of significance19.
Given the relevance and potential of biogeography for tackling the subject of ‘large genera’,
as well as the more all-encompassing world of evolutionary studies72, we try to tackle the subject
here from the viewpoint that biogeographic studies deal with the relationships of taxa and the areas
they occupy, a viewpoint which requires thorough taxonomic work on a scale fit to address a
particular problem. That is, if the focus is the distribution of freshwater diatoms around the Pacific
rim (areas bordering the Pacific Ocean, Eastern Russia, Eastern China, Japan, Western United
States, Central America, Western South America, Eastern South East Asia), then ideally one requires
knowledge of endemic diatom species from at least three different areas within that region73. This
raises the issue of endemism and what it might mean.
A paper of major significance in the development of biogeography was that of De Candolle74,
who, amongst other things, was the first to introduce the term endemism to biogeography. He related
the idea to genera, defining endemic genera as those with many species confined to one region (De
Candolle74; see Nelson75 for translation). Since that time, the term endemism does not refer to any
particular taxon or any particular area: any taxon (species, genus) might be endemic to Lake Baikal,
East Russia or the Southern hemisphere. Finlay et al.76 puzzle over the term ‘endemic protist’. They
offer the argument that because of their small size, many rare species may not be sampled, thus
allowing them to be interpreted as endemics when, in fact, they are simply rare. Whilst the possibility
of undersampling is a constant issue, it is hard to see how their argument could in any way be either
tested or falsified. It is phrased in such a way as to qualify for what one might call a ‘water-baby’
theory, that not finding something will never be sufficient reason for believing it to not exist. Thus,
one might always invoke undersampling for never having found specimens of rare species.
TABLE 19.8
Numbers of Species in Total against Numbers Described for the First Time
Area Total Species New (%) References
Africa
Madagascar 628 249 (40%) Metzeltin and Lange-Bertalot102, Spaulding and Kociolek103
South America
Tropical South America (including Brazil, Guyana, Venezuela) c. 700 202a (29%) Metzeltin and Lange-Bertalot104
The Andes (including Ecuador, Venezuela, Chile, Tierra del Fuego) 888 184b (21%) Rumrich, Lange-Bertalot and Rumrich80
Uruguay c. 850 102c (12%) Metzeltin, Lange-Bertalot and Garcia-Rodriguez105
North America
Cape Cod c. 250 42d (17%) Siver et al.106
Europe (Boreal)
Siberia (Vaigach, Mestnyi and Matveev Islands–North West Siberia) 490 48e (10%) Lange-Bertalot and Genkal107
Australasia
New Caledonia (South West Pacific) 643 257 (40%) Moser et al.108,109, Moser110
Subantarctica
Ile de la Possession 220 57f (26%) Van de Vijver et al.111
Note: Data from a selection of Floras primarily published in the Iconographia Diatomologica series. Few of the published
Floras deal with noted or recognised biogeographical regions, other than those that coincide with an island (for example
Madagascar). Rather, most describe collections made within political regions. Thus, while these data are of limited use
in providing a general interpretation, they do provide an indication of the potential number of species yet to be described.
a According to Metzeltin and Lange-Bertalot104, they described 202 new taxa and identified 131 as ‘supposedly’ new
species. If these 131 are confirmed as new species the percentage of endemic taxa rises to 47.6%, nearly half.
b According to Rumrich et al.80, they described 84 new taxa, identified 59 as ‘probably’ new and a further 271
d This number includes taxa that could not be identified; they may not all be new.
e According to Lange-Bertalot and Genkal107, they described 42 new taxa from 159 ‘taxonomically undefined’.
f The total number includes the 20 new species of Stauroneis described in a later monograph112.
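The percentages in Table 19.8 can be recomputed from the totals and the numbers of newly described species. This is a minimal sketch, with the area labels shortened. It assumes that totals marked ‘c.’ in the table can be treated as exact, and the variable and function names are hypothetical.

```python
# (area, total species, species described for the first time), from Table 19.8.
# Assumption: totals given as "c." in the table are treated as exact values.
FLORAS = [
    ("Madagascar", 628, 249),
    ("Tropical South America", 700, 202),
    ("The Andes", 888, 184),
    ("Uruguay", 850, 102),
    ("Cape Cod", 250, 42),
    ("Siberia", 490, 48),
    ("New Caledonia", 643, 257),
    ("Ile de la Possession", 220, 57),
]


def percent_new(total, new):
    """Newly described species as a percentage of the total flora."""
    return 100.0 * new / total


percentages = {area: round(percent_new(total, new)) for area, total, new in FLORAS}

# Footnote a: adding the 131 'supposedly' new species for Tropical South America.
tropical_if_confirmed = round(percent_new(700, 202 + 131), 1)

for area, pct in percentages.items():
    print(f"{area}: {pct}%")
print(f"Tropical South America, if confirmed: {tropical_if_confirmed}%")
```

The rounded values agree with the percentages printed in the table, and the footnote a figure of 47.6% is recovered from (202 + 131)/700.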
There may be much to criticise in the sampling regimes and taxonomic procedures adopted for
each of these floras. For example, the Andes is not a region in the biological sense, as either side
of this mountain range has quite distinct floras, and to include samples from Ecuador, Venezuela,
Chile and Tierra del Fuego may obscure rather than illuminate regional differences80. However,
nearly all report levels of endemism from 10% of the total up to the remarkable figure of 40%
(Table 19.8). Perhaps even for diatoms, cosmopolitanism may be the exception. Nevertheless,
endemism needs to be understood as a hierarchical concept, with cosmopolitanism describing global
distributions, if indeed anything can be truly global81.
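The percentages quoted from Table 19.8 are simply the taxa reported as new in each flora expressed as a share of all taxa recorded. A minimal Python sketch, using only the four table rows reproduced above, recomputes them; the function name and the dictionary layout are illustrative assumptions, and the Cape Cod total is approximate ('c. 250') in the source:

```python
# Rows from Table 19.8: (total taxa recorded, taxa reported as new).
# Labels are shortened here; the Cape Cod total is given as "c. 250".
FLORAS = {
    "Cape Cod": (250, 42),
    "Siberia (Arctic islands)": (490, 48),
    "New Caledonia": (643, 257),
    "Ile de la Possession": (220, 57),
}


def percent_new(total, new):
    """Return the new taxa as a percentage of all taxa recorded."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * new / total


for region, (total, new) in FLORAS.items():
    print(f"{region}: {new}/{total} = {percent_new(total, new):.0f}%")
```

Rounded to whole numbers, the results agree with the tabulated 17%, 10%, 40% and 26%.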
Rather than attempt to discover if there are any patterns of distribution to explain, Finlay
et al.19,76,82 proposed the term ‘ubiquitous dispersal’, a process, one imagines, intended to describe
all ‘protist’ distribution. ‘Ubiquitous dispersal’ is a process meaning “random dispersal across all
spatial scales, all the way up to the global scale”76, and it “is essentially a ‘neutral’ process, driven
by the absolute abundance of organisms”76. To bolster their argument that the sheer quantity
of organisms is the driving force, they offer an analogy: “Millions of people indulge every
week [in the lottery], but the grim truth is that each individual has a vanishingly low probability
of winning the big prize. Almost every week, however, one or a few randomly self-selecting
individuals do win because of the vast number of individuals taking part raises the probability that
someone has to win.”76.
Analogies do not usually bear close examination. Yet the form of argument — probabilities
that something will happen somewhere — is not new, extending back to Darwin’s notion of dispersal
as an explanation for disjunct distributions, a process that Finlay and his colleagues take for granted
as the driving force. George Gaylord Simpson, for example, made the same kind of argument in
1952 with respect to geological time:
If the probability that some member of a population will cross a barrier is .000001 in any one year, in
a large population this means that the probability for any one designated individual is almost infinites-
imally small, so much that it would seem absolutely impossible to even the best qualified observer in
the field. Yet during the course of a million years the event would be probable, p = 0.63, again. In the
course of 10 million years the event would become so extremely probable as to be, for most practical
purposes, certain, p = 0.99995 (Simpson83).
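Simpson's figures can be checked with a short calculation. The sketch below assumes, as his argument implicitly does, that each year is an independent trial with the same probability of a crossing, so that the chance of at least one crossing in n years is 1 - (1 - p)^n; the function name is hypothetical:

```python
import math


def prob_at_least_once(p_per_year, years):
    """Probability of at least one crossing in `years` independent years."""
    # 1 - (1 - p)**n, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(years * math.log1p(-p_per_year))


p = 1e-6  # annual probability that some member of the population crosses
print(round(prob_at_least_once(p, 1_000_000), 2))   # 0.63
print(round(prob_at_least_once(p, 10_000_000), 5))  # 0.99995
```

The two results reproduce the quoted values of about 0.63 after a million years and about 0.99995 after ten million. They show what the argument rests on: an assumed annual probability and an assumed span of time, not an observed distribution.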
Both Finlay and Simpson argue not for ‘ubiquitous dispersal’ but a version of ‘improbable
dispersal’75, effectively suggesting that given enough specimens (Finlay and Simpson) or enough
time (Simpson), a water-baby will indeed be found (see also Lund84 and Wilkinson85). Such
arguments appeal to probabilities rather than facts, the latter being the number of endemic diatoms
recognised and their nonrandom distribution65,73,79,86. Such arguments also ignore the fact that, as
life evolves on Earth, so does the Earth itself; the two, in fact, evolve together87. Alexander du Toit (1878–1948)
was one of the first people to defend (and promote) continental drift; Simpson, initially, saw no
use for continental movements to explain organism distributions. Du Toit wrote in response to one
of Simpson’s papers: “The notion of random, and sometimes two-way ‘rafting’ across the wide
oceans … evinces, however, a weakening of the scientific outlook, if not a confession of doubt
from the viewpoint of organic evolution”88 (also see McCarthy89).
From an earlier diatom perspective Ehrenberg had considered the distribution of organisms to
be a problem worthy of consideration, especially to explain the disjunct distribution of fossil
‘Infusoria’ around Pacific coastal regions: “… the Rocky Mountains are a more powerful barrier
between the two sides of America, than the Pacific Ocean between America and China; the infusorial
forms of Oregon and California being wholly different from those of the east side of the mountains,
while they are partly identical with Siberian species”90.
For example, our work on Lake Baikal has focused on the genus Eunotia Ehrenb. Among the
many species present in the Lake was E. clevei, a rather large and unusual species for the genus.
Intensive examination of the Lake Baikal flora and surrounding areas yielded a number of species
endemic to the lake, all most closely related to each other (by virtue of synapomorphies) and best
separated into a new genus65. Further examination of more specimens yielded not wider distributions
but more species, some occurring in other large, deep water lakes (for example, Lake Hovsgol),
others extending into South East Asia and (often) into marine waters65. Furthermore, a number of
extinct fossil specimens added yet more species, with a West Coast North American–Chinese South
East Asian distribution suggesting a trans-Pacific relationship, one already found for Tetracyclus
and offering a test for Ehrenberg’s earlier proposals73.
19.6 SUMMARY
Although estimates for numbers of species differ, diatoms are a large group, most likely exceeding
200,000 species (fossil and Recent). Their taxonomic hierarchy is poorly resolved, many genera having
more than 1,000 available names, with few families circumscribed in such a way as to assist identification.
The adoption of cladistic approaches to classification has been slow; even now, few taxa are demonstrably
monophyletic: that is, form groups supported by synapomorphies (homologies). Because the relationship
between taxon and synapomorphy (homology) is so well established, cladistic approaches seem more
than reasonable, in spite of the perceived difficulty in finding appropriate synapomorphic resemblances.
Since the 1950s, geographical data have rarely been considered useful information, apart from
the largely nomenclatural tradition of noting a specimen’s location when found. Thus, we suggest
the most reasonable way forward for diatom taxonomy when considering ‘large’ genera (taxa) is
to adopt a regional approach and attack the problem from both a taxonomic (cladistic) and geo-
graphical perspective, allowing testable theories of relationships among the organisms investigated
and the areas they occupy.
REFERENCES
1. Round, F.E., Crawford, R.M., and Mann, D.G., The Diatoms: Biology and Morphology of the Genera,
Cambridge University Press, Cambridge, 1990.
2. Andersen, R.A., Biology and systematics of Heterokont and Haptophyte algae, Amer. J. Bot., 91,
1508, 2004.
3. Guillou, L. et al., Bolidomonas: a new genus with two species belonging to a new algal class, the
Bolidophyceae (Heterokonta), J. Phycol., 35, 368, 1999.
4. Guillou, L. et al., Diversity and abundance of Bolidophyceae (Heterokonta) in two Oceanic regions,
App. Environ. Microb., 65, 4528, 1999.
5. Kühn, S., Medin, M., and Eller, G., Phylogenetic position of the parasitoid nanoflagellate Pirsonia
inferred from nuclear-encoded small subunit ribosomal DNA and a description of Pseudopirsonia n.
gen. and Pseudopirsonia mucosa (Drebes) comb. nov., Protist, 155, 143, 2004.
6. Patterson, D.J., Stramenopiles: chromophyte from a protistan perspective, in The Chromophyte Algae:
Problems and Perspectives, Green, J.C., Leadbeater, B.S.C., and Diver, W.L., Eds., Clarendon Press,
Oxford, 1989, 357.
7. Leipe, D.D. et al., 16S-like rRNA sequences from Developayella elegans, Labyrinthuloides haliotidis,
and Proteromonas lacertae confirm that the stramenopiles are a primarily heterotrophic group, Eur.
J. Protistol., 32, 449, 1996.
8. Leipe, D.D. et al., The stramenopiles from a molecular perspective: 16S-like rRNA sequences from
Labyrinthuloides minuta and Cafeteria roenbergensis, Phycologia, 33, 369, 1994.
9. Patterson, D.J., The diversity of eukaryotes, Am. Nat., 154, 96, 1999.
10. Ben Ali, A. et al., Phylogenetic relationships among algae based on complete large-subunit rRNA
sequences, Int. J. Syst. Evol. Micr., 51, 737, 2001.
11. Cracraft, J., and Donoghue, M.J., Eds., Assembling the Tree of Life, Oxford University Press, Oxford,
2004.
12. Delwiche, C.F. et al., Algal evolution and the early relation of green plants, in Assembling the Tree
of Life, Cracraft, J., and Donoghue, M.J., Eds., Oxford University Press, Oxford, 2004, 121.
13. Nikolaev, V.A. and Harwood, D.M., Morphology and taxonomic position of the Late Cretaceous
diatom genus Pomphodiscus Barker and Meakin, Micropaleontology, 46, 167, 2000.
14. Mann, D.G. and Droop, S.J.M., Biodiversity, biogeography and conservation of diatoms,
Hydrobiologia, 336, 19, 1996.
15. Frodin, D.G., History and concepts of big plant genera, Taxon, 53, 753, 2004.
16. Wheeler, Q.D., Taxonomic triage and the poverty of phylogeny, Phil. Trans. R. Soc. Lond. B, 359,
571, 2004.
17. Fourtanier, E. and Kociolek, J.P., Catalogue of the diatom genera, Diatom Res., 14, 1, 1999.
18. Fourtanier, E. and Kociolek, J.P., Addendum to ‘Catalogue of the diatom genera’, Diatom Res., 18,
245, 2003.
19. Finlay, B.J., Monaghan, E.B., and Maberly, S.C., Hypothesis: the rate and scale of dispersal of
freshwater diatom species is a function of their global abundance, Protist, 153, 261, 2002.
20. Finlay, B.J. and Fenchel, T., Cosmopolitan metapopulations of free-living microbial eukaryotes,
Protist, 155, 237, 2004.
21. Desmond, A., Huxley: The Devil’s Disciple, Michael Joseph, London, 1994.
22. Desmond, A., Huxley: Evolution’s High Priest, Michael Joseph, London, 1997.
23. Huxley, T.H., The Gentians: notes and queries, J. Linn. Soc., 24, 101, SM 4, 612, 1888.
24. Moss, W., Taxa, taxonomists, and taxonomy, in Numerical Taxonomy, J. Felsenstein, Ed., NATO ASI
series G, Ecological Sciences 1, 72, 1983.
25. Williams, D.M. and Ebach, M.C., The Foundations of Comparative Biology, Kluwer Academic-Plenum
Publishers, submitted.
26. Williams, D.M., Classification, Collections, Diatoms and Biogeography, in The British Phycological
Society 50th Jubilee Meeting, Programme and Abstracts, 6, 2002.
27. Hubbard, C.E., William Turrill, Biographical Memoirs of Fellows of the Royal Society, 17, 689, 1971.
28. Turrill, W.B., Some taxonomic aims, methods and principles: their possible application to the algae,
Nature, 169, 388, 1952.
29. Turrill, W.B., The investigation of plant species, Proc. Linn. Soc. Lond., 147, 104, 1935.
30. Turrill, W.B., Species, J. Bot., 63, 359, 1925.
31. Winsor, M.P., The English debate on taxonomy and phylogeny, 1937–1940, Hist. Phil. Life Sci., 17,
227, 1995.
32. Huxley, J.S., Evolution: The Modern Synthesis, G. Allen and Unwin, London, 1942.
33. Turesson, G., The species and the variety as ecological units, Hereditas, 3, 100, 1922.
34. Gilmour, J.S.L. and Heslop-Harrison, J., The deme terminology and the units of micro-evolutionary
change, Genetica, 27, 147, 1954.
35. Walters, S.M., Experimental and orthodox taxonomic categories and the deme terminology, Plant Syst.
Evol., 167, 35, 1989.
36. Danser, H.B., Über die begriffe komparium, kommiskuum und konvivien und über die Entstehungs-
weise der konvivien, Genetica, 11, 399, 1929.
37. Gilmour, J.S.L. and Turrill, W.B., The aim and scope of taxonomy, Chronica Botanica, 6, 217, 1941.
38. Sokal, R.R. and Sneath, P.H.A., Principles of Numerical Taxonomy, W.H. Freeman and Co., San
Francisco, 1963.
39. Sneath, P.H.A. and Sokal, R.R., Numerical Taxonomy. The Principles and Practices of Numerical
Classification, W.H. Freeman and Co., San Francisco, 1973.
40. Winsor, M.P., Species, demes, and the omega taxonomy: Gilmour and The New Systematics, Biol.
Philos, 15, 349, 2000.
41. Boney, A.D., The ‘Tansley Manifesto’ affair, New Phytol., 118, 3, 1991.
42. Williams, D.M., Haeckel, E., and Agassiz, L., Trees that bite and their geographical dimension, in
What is Biogeography? Ebach, M.C. and Tangney, R., Eds., CRC Press, 2006.
43. Mayr, E., Principles of Systematic Zoology, McGraw Hill, New York, 1969.
44. Dayrat, B., Towards integrative taxonomy, Biol. J. Linn. Soc., 85, 407, 2005.
45. Nelson, G., Species and taxa: systematics and evolution, in Speciation and Its Consequences, Otte,
D. and Endler, J.A., Eds., Sinauer Associates, Sunderland, MA, 1989, 60.
46. Poulin, M. and Williams, D.M., Conservation of diatom biodiversity: a perspective, Proceedings of
the 15th International Diatom Symposium, 161, 2002.
47. Ebach, M.C. and Williams, D.M., Classification, Taxon, 53, 791, 2004.
48. Kristiansen, J. and Preisig, H.R., Encyclopedia of Chrysophyte genera, Bibliotheca Phycologica, 110,
1, 2001.
49. Kawai, H. et al., Schizocladia ischiensis: a new filamentous marine Chromophyte belonging to a new
class, Schizocladiophyceae, Protist, 154, 211, 2003.
50. O’Leary, M.A. et al., Building the mammalian sector of the tree of life: combining different data and
a discussion of divergence times for placental mammals, in Assembling the Tree of Life, Cracraft, J.
and Donoghue, M.J., Eds., Oxford University Press, Oxford, 2004, 490.
51. VanLandingham, S.L., Catalogue of the Fossil and Recent Genera and Species of Diatoms and Their
Synonyms, Cramer, Lehre, 1967–1979.
52. Ehrenberg, C.G., Verbreitung und Einfluss des mikroskopischen Lebens in Süd- und Nord-Amerika,
Abhandlungen der Königliche Akademie der Wissenschaften zu Berlin (1841), 1844, 291.
53. Kützing, F.T., Die kieselschaligen Bacillarien oder Diatomeen, Nordhausen, 152 S., 30 Taf. 1844,
Auflage 2, 1865.
54. Cleve, P.T., Synopsis of naviculoid diatoms, Kongliga Svenska Vetenskaps-Akademiens Handlingar,
26, 1, 1894.
55. Karsten, G., Abteilung Bacillariophyta (Diatomeae), in Die natürlichen Pflanzenfamilien, Peridineae
(Dinoflagellatae), Diatomeae (Bacillariophyta), Myxomycetes, Engler, A. and Prantl, K., Eds., Wilhelm
Engelmann, Leipzig, 2, 105, 1928.
56. Cox, E.J., Studies on the diatom genus Navicula Bory: the typification of the genus, Bacillaria, 2,
137, 1979.
57. Cox, E.J., and Williams, D.M., Systematics of naviculoid diatoms: the interrelationships of some taxa
with a stauros, Eur. J. Phycol., 35, 273, 2003.
58. Cox, E.J., and Williams, D.M., Systematics of Naviculoid diatoms (Bacillariophyta): a preliminary
analysis of protoplast and frustule characters for family and order level classification, Syst. Biodivers.,
4, 2006.
59. Lange-Bertalot, H. and Moser, G., Brachysira: Monographie der Gattung, Bibliotheca Diatomologica,
32, 1, 1994.
60. Kociolek, J.P., Comment: taxonomic instability and the creation of Naviculadicta Lange-Bertalot in
Lange-Bertalot and Moser, a new catch-all genus of diatoms, Diatom Res., 11, 219, 1996.
61. Williams, D.M., Comparative morphology of some species of Synedra with a new definition of the
genus, Diatom Res., 1, 131, 1986.
62. Williams, D.M., Some notes on the classification of Fragilaria, Synedra and their sub-groups, in
Microalgal Biology, Evolution and Ecology, Crawford, R.M., Moss, B., Mann, D.G., and Preisig,
H.R., Eds., Nova Hedwigia Beihefte 130, Gebrüder Borntraeger Verlagsbuchhandlung, Science Pub-
lishers, Stuttgart, 2006.
63. Kociolek, J.P., Historical constraints, species concepts and the search for a natural classification,
Diatom, 13, 3, 1997.
64. Ruck, E.C. and Kociolek, J.P., Preliminary phylogeny of the family Surirellaceae (Bacillariophyta),
Bibliotheca Diatomologica, 50, 1, 2004.
65. Williams, D.M. and Reid, G., Amphorotia nov. gen., a new genus in the family Eunotiaceae (Bacil-
lariophyceae), based on Eunotia clevei Grunow in Cleve et Grunow, Diatom Monographs, 6, 1, 2005.
66. Kingsley, C., The Water-Babies, A Fairy Tale for a Land Baby, Macmillan, London, 1863.
67. Ebach, M.C., Forum on biogeography: introduction, Taxon, 53, 889, 2004.
68. Walter, H.S., Understanding places and organisms in a changing world, Taxon, 53, 905, 2004.
69. Avise, J.C., What is the field of biogeography, and where is it going? Taxon, 53, 893, 2004.
70. Parenti, L.R. and Humphries, C.J., Historical biogeography, the natural science, Taxon, 53, 899, 2004.
71. Ebach, M.C. and Tangney, R., What is Biogeography? CRC Press, 2006.
72. Williams, D.M. and Ebach, M.C., The reform of palaeontology and the rise of biogeography: 25 years
after ‘Ontogeny, Phylogeny, Paleontology and the Biogenetic law’ (Nelson 1978), J. Biogeogr., 31,
685, 2004.
73. Williams, D.M., Fossil species of the diatom genus Tetracyclus (Bacillariophyta, ‘ellipticus’ species
group): morphology, interrelationships and the relevance of ontogeny, Phil. Trans. R. Soc., Lond. B,
351, 1759, 1996.
74. Candolle, A.P., Géographie botanique, in Dictionnaire des Sciences Naturelles, XVIII, Strasbourg and
Paris, 1820.
75. Nelson, G., From Candolle to Croizat: comments on the history of biogeography, J. Hist. Biol., 11,
269, 1978.
76. Finlay, B.J., Esteban, G.F., and Fenchel, T., Protist diversity is different? Protist, 155, 15, 2004.
77. Hustedt, F., Bacillariophyta (Diatomeae), in Die Süsswasser-Flora Mitteleuropas, 2nd ed., Pascher,
A., Ed., G. Fischer, Jena, 1930, 10.
78. Patrick, R. and Reimer, C.W., The diatoms of the United States exclusive of Alaska and Hawaii,
Monographs of the Natural Sciences of Philadelphia, 13, 1, 1966.
79. Kociolek, J.P. and Spaulding, S.A., Freshwater diatom biogeography, Nova Hedwigia, 71, 223, 2000.
80. Rumrich, U., Lange-Bertalot, H., and Rumrich, M., Diatoms of the Andes (from Venezuela to Patagonia/
Tierra del Fuego), Annotated Diatom Micrographs, Iconographia Diatomologica, 9, 1, 2000.
81. Williams, D.M., On diatom endemism and biogeography: Tetracyclus and Lake Baikal endemic
species, Proceedings of the 17th International Diatom Symposium, 433, 2004.
82. Finlay, B.J. and Clarke, K.J., Ubiquitous dispersal of microbial species, Nature, 400, 828, 1999.
83. Simpson, G.G., Probabilities of dispersal in geological time, Bull. Am. Mus. Nat. Hist., 99, 163, 1952.
84. Lund, J.W.G., Annual Report of the Freshwater Biological Association, 70, 43, 2002.
85. Wilkinson, D.M., Dispersal, cladistics and the nature of biogeography, J. Biogeogr., 30, 1779,
2003.
86. Williams, D.M., Diatom biogeography: some preliminary considerations, Proceedings of the 13th
International Diatom Symposium, BioPress Ltd, 311, 1994.
87. Nelson, G. and Platnick, N., Systematics and Biogeography: Cladistics and Vicariance, Columbia
University Press, New York, 1981, 567.
88. Du Toit, A., Tertiary mammals and continental drift: a rejoinder to George G. Simpson, Am. J. Sci.,
242, 145, 1944.
89. McCarthy, D., Biogeography and scientific revolutions, The Systematist, 25, 3, 2005.
90. Ehrenberg, C.G., On infusorial deposits on the River Chutes in Oregon, Am. J. Sci., 2nd ser., 9, 140,
1850.
91. Bailey, J.C. et al., Phaeothamniophyceae classis nova.: a new lineage of chromophytes based upon
photosynthetic pigments, rbcL sequence analysis and ultrastructure, Protist, 149, 245, 1998.
92. Andersen, R.A., Potter, D., and Bailey, C.J., Pinguiococcus pyrenoidosus gen. et sp. nov. (Pinguio-
phyceae), a new marine coccoid alga, Phycol. Res., 50, 57, 2002.
93. Honda, D. and Inouye, I., Ultrastructure and taxonomy of a marine photosynthetic stramenopile
Phaeomonas parva gen. et sp. nov. (Pinguiophyceae) with emphasis on the flagellar apparatus archi-
tecture, Phycol. Res., 50, 75, 2002.
94. Kawachi, M., Noël, M.H., and Andersen, R.A., Re-examination of the marine ‘chrysophyte’ Polypodochrysis
teissieri (Pinguiophyceae), Phycol. Res., 50, 91, 2002.
95. Kawachi, M. et al., Pinguiochrysis pyriformis gen. et sp. nov. (Pinguiophyceae), a new picoplanktonic
alga isolated from the Pacific Ocean, Phycol. Res., 50, 49, 2002.
96. Kawachi, M. et al., The Pinguiophyceae classis nova, a new class of photosynthetic stramenopiles
whose members produce large amounts of omega-3 fatty acids, Phycol. Res., 50, 31, 2002.
97. O’Kelly, C.J., Glossomastix chrysoplasta n. gen., n. sp. (Pinguiophyceae), a new coccoidal, colony-
forming golden alga from southern Australia, Phycol. Res., 50, 67, 2002.
98. Kristensen, N.P., Ed., Lepidoptera, Moths and Butterflies: Vol. 1 Evolution, Systematics, and Bioge-
ography, W. de Gruyter, Berlin, New York, 1999, 1.
99. Brummitt, R.K., Vascular Plant Families and Genera, Royal Botanic Gardens, Kew, 1992, 804.
100. Williams, D.M. and Round, F.E., Revision of the genus Fragilaria, Diatom Res., 2, 267, 1987.
101. Williams, D.M. and Round, F.E., Revision of the genus Synedra Ehrenb., Diatom Res., 1, 313, 1986.
102. Metzeltin, D. and Lange-Bertalot, H., Diatoms from the ‘Island Continent’ Madagascar, Annotated
Diatom Micrographs, Iconographia Diatomologica, 11, 1, 2002.
103. Spaulding, S.A., and Kociolek, J.P., Freshwater diatoms (Bacillariophyceae), in Natural History of
Madagascar, Goodman, S. and Benstead, J., Eds., University of Chicago Press, 276, 2003.
104. Metzeltin, D. and Lange-Bertalot, H., Diversity–taxonomy–geobotany: tropical diatoms of South
America I: about 700 predominantly rarely known or new taxa representative of the neotropical flora,
Annotated Diatom Micrographs, Iconographia Diatomologica, 5, 1, 1998.
105. Metzeltin, D., Lange-Bertalot, H., and Garcia-Rodriguez, F., Diatoms of Uruguay, Annotated Diatom
Micrographs, Iconographia Diatomologica, 15, 1, 2004.
106. Siver, P.A. et al., Diatoms of North America. The freshwater flora of Cape Cod, Annotated Diatom
Micrographs, Iconographia Diatomologica, 14, 1, 2005.
107. Lange-Bertalot, H., and Genkal, S.I., Phytogeography–diversity–taxonomy: diatoms from Siberia I:
islands in the Arctic Ocean (Yugorsky–Shar Strait), 2nd corrected printing, Annotated Diatom Micro-
graphs, Iconographia Diatomologica, 6, 1, 1999.
108. Moser, G., Steindorf, A., and Lange-Bertalot, H., Neukaledonien. Diatomeenflora einer Tropeninsel,
revision der collection Maillard und untersuchung neuen materials, Bibliotheca Diatomologica, 32,
1, 1995.
109. Moser, G., Lange-Bertalot, H., and Metzeltin, D., Island of endemics, New Caledonia: a geobotanical
phenomenon, Bibliotheca Diatomologica, 38, 1, 1998.
110. Moser, G., Die Diatomeenflora von Neukaledonien–Systematik–Geobotanik — Ökologie: Ein Fazit,
Bibliotheca Diatomologica, 43, 1, 1999.
111. van de Vijver, B., Frenot, Y., and Beyens, L., Freshwater Diatoms from Ile de la Possession (Crozet
Archipelago, Subantarctica), Bibliotheca Diatomologica, 46, 1, 2002.
112. Van de Vijver, B., Beyens, L., and Lange-Bertalot, H., The genus Stauroneis in the Arctic and (sub-)
Antarctic regions, Bibliotheca Diatomologica, 51, 1, 2004.
G. C. Zuccarello
School of Biological Sciences, Victoria University of Wellington,
New Zealand
CONTENTS
20.1 Introduction
20.2 The Red Algae
    20.2.1 Higher Classification of Red Algae
    20.2.2 Orders of Red Algae
    20.2.3 Speciation in Red Algae
    20.2.4 Population Structure of Red Algae
    20.2.5 Reproductive Isolating Mechanisms
    20.2.6 Examples of Speciation Studies in Red Algae
20.3 Conclusions
References
ABSTRACT
The algae are a non-monophyletic group of highly numerous organisms which exhibit extremely
diverse morphologies. It is estimated that there are >350,000 species of algae globally, although
only a fraction of this number have been described. Here we use the red algae (Rhodophyta) to
demonstrate the approaches that have been taken in their identification and classification. The red
algae are an ancient and morphologically highly diverse group with about 5,800 described species.
Different approaches have been employed in their classification, including anatomical, biochemical
and physiological studies, but molecular studies have also had a profound impact on our under-
standing of their evolution. Ordinal classification has increased from four orders recognised in the
nineteenth century to 30 currently recognised. Based primarily on morphological observations,
some orders have considerably more species than others; for example, the largest is Ceramiales
with c. 2,300 species, with some of its genera being very large (>200 species). We explore in the
red algae what factors, many of them unique to this group, may have led to high levels of speciation
(reproductive isolation). In red algae, levels of genetic uniqueness are shown to be correlated with
reproductive isolation and not always with morphological distinctness. There is also evidence that
red algal populations are highly differentiated over small distances. In addition, red algae have
unique reproductive systems that may lead to the easy acquisition of reproductive isolation. In order
to obtain a greater understanding of the causes and mechanisms of reproductive isolation in red
algae, we propose that a new line of research targeted at reproductive incompatibility be explored.
We conclude that the continuation of a multifaceted approach, including molecular techniques,
population studies and cell biology remains necessary to illuminate evolution in the red algae.
20.1 INTRODUCTION
The algae are an artificial, non-monophyletic grouping of highly numerous organisms exhibiting
extremely diverse morphologies, which have been grouped together on the basis of their ability to
photosynthesise and the fact that they are not higher plants (see Williams and Reid, Chapter 19). They
have been variously classified from Linnaeus’1 concept in 1753 of a few genera within a subdivision
of the class Cryptogamia, which also included the ferns, mosses and fungi, through some 15 genera2
to the latest classifications in which we find the eukaryotic algae placed among several of the major
lineages of the organisms on Earth3,5. Keeling4 has tentatively identified five supergroups of eukary-
otes where the distribution of the plants reflects a history of endosymbiosis. Algae are found in
four of these five groups. Green and red algae and glaucophytes are in Primoplantae, which arose
from primary cyanobacterial symbiosis. Dinoflagellates, phaeophytes, chrysophytes, bacillario-
phytes and haptophytes are placed in Chromalveolates and possibly have plastids as a result of red
algal secondary symbiosis. Chlorarachniophytes are in Rhizaria and euglenophytes in Excavates,
and both these algal groups are thought to have evolved ‘plant like’ attributes through secondary
symbiosis via green algal endosymbionts (Palmer et al.5, and see Hodkinson and Parnell, Chapter 1,
for a phylogenetic tree of these groups). If we are to continue to refine the tree of life and resolve
its structure at different levels of classification, then it is necessary to continue to identify and
describe algae, to understand the interrelationships of algal groups and to understand some of the
mechanisms that lead to this great diversity.
It has been estimated that there could be over 350,000 species of algae6 (Table 20.1). To put
this into context, despite this figure being subject to large margins of error, there are c. 300,000
higher plant species (bryophytes, pteridophytes, gymnosperms and angiosperms). However, the
number of species estimated for each algal group varies enormously. While some groups of algae,
such as the brown algae, appear to be relatively well documented taxonomically, the number of
described species for others, such as the diatoms (silicated unicells) falls well below the estimated
total (Table 20.1). Whatever the ultimate reality, evidence suggests that there is still an enormous
number of species to be identified, particularly in some groups. For example, diatoms require much
further study to determine their diversity (see Williams and Reid, Chapter 19), and the same is true
for algae that occur in terrestrial habitats such as on bark, leaves and rocks6, and for minute
planktonic organisms that have only been recognised in the last quarter century7. Whatever the true
figure of algal diversity, we are faced with a bewildering number of species and therefore need to
consider the ways in which species identification of large taxa can be studied. We also need to
consider the implications of the additional knowledge of all new species at all levels of classification
in relation to the topology of the tree of life.
The red algae (phylum Rhodophyta) have approximately 5,800 described species8, making them
a species rich group within the algae. In this chapter, we examine how this group of algae has been
studied in order to identify and classify species. The first part of the chapter defines the red algae
and how phycologists have tried to identify and classify species using both traditional and modern
techniques. The second part examines the higher classification and describes how this has developed
over the last 250 years. The third part explores work in defining red algal species and factors that
may have led to these high levels of diversity. In this chapter we concentrate on the subphylum
Eurhodophytina, which reflects the new red algal classification system of Saunders and
Hommersand9 and is the most species-rich red algal group.
Systematics of the Species Rich Algae: Red Algal Classification, Phylogeny and Speciation 325
TABLE 20.1
The Eukaryote Algal Groups and Estimated Numbers of Species in Each Division
Division    Subdivision    Algal Groups    Estimated No. of Described Species    Estimated No. of Species
the origins of the current classification started by Linnaeus in the eighteenth century to the time
that they were writing. The difficulty of identifying and classifying red algae as a consequence of
morphological variation is well illustrated from the beginning of such studies. Linnaeus’1 class
Cryptogamia, in which all algae were placed, was created for the groups of species which did not
display phanerogamic reproductive structures and was just one of 24 classes (the other
23 being devoted to the phanerogams). Linnaeus’ classification accepted 14 genera of algae, but
essentially only Chara, Conferva, Fucus and Ulva contained organisms now accepted as algae17.
Species of red algae were assigned to these genera on the basis of their morphological form (Conferva, slender
and filamentous; Fucus, fleshy or cartilaginous thalli; Ulva, flat, membranous thalli).
In the nineteenth century, colour was used by both Lamouroux18 and Harvey19 to split up the
algae. In 1813 Lamouroux18 established the Floridée for what we now know as species belonging
to the red algae, and from which Florideophyceae is derived. In 1836 Harvey19 divided the algae
into Rhodospermae (red algae), Melanospermae (brown algae), Chlorospermae (green algae) and
Diatomaceae (diatoms). The difficulty of classifying the red algae based on morphology and colour
is well illustrated when considering what were originally thought of as red algae. Harvey’s19
Rhodospermae was largely accurate, with the exception of Porphyra and Bangia, which were placed
in the green algae until Berthold20 in 1882 aligned them with the red algae. The ability to distinguish
the red seaweeds from the browns and greens can still be a problem for the beginner today.
By the start of the twentieth century the two subclasses of the red algae, Bangiophycidae (as
Bangioideae) and Florideophyceae (as Florideae) were established and have remained a subject for
debate since that time. At the supraordinal level, relationships had remained virtually unchanged
since 1900 until Magne21 proposed a new scheme in 1989 based on morphological characters and
reproductive systems. He proposed three subclasses for Rhodophyceae: Archaeorhodophycidae
(without sporangia), Metarhodophycidae (having only a portion of the parent cell converted to a
sporangium) and Eurhodophycidae (having sporangia), which includes Bangiales and Florideophyceae.
Magne’s hypothesis was an attempt to infer monophyletic lineages but failed to take into
account the diversity within and between lineages.
The next major revision was by Saunders and Hommersand9, who proposed a classification
based on recent and traditional evidence. In this classification, summarised in Figure 20.1, all the
species encompassed traditionally in Rhodophyta have been placed in the subkingdom of
Rhodoplantae, with two phyla: Cyanidiophyta and Rhodophyta. Rhodophyta are split into three
subphyla: Rhodellophytina, Metarhodophytina and Eurhodophytina. Bangiophyceae and Florideophyceae
are raised to class status within Eurhodophytina. Florideophyceae are further subdivided into four subclasses:
Hildenbrandiophycidae, Nemaliophycidae, Ahnfeltiophycidae and Rhodymeniophycidae. Much work
remains to be done to refine this classification, as acknowledged by Saunders and
Hommersand9, but it should reflect evolutionary history and diversity better than any
previous red algal classification.
Systematics of the Species Rich Algae: Red Algal Classification, Phylogeny and Speciation 327
KINGDOM PLANTAE
SUBKINGDOM Rhodoplantae
FIGURE 20.1 Summary of higher level classification of red algae (Rhodoplantae). (From Saunders and
Hommersand9. With permission.)
Hommersand26, whose meticulous study of Gracilaria verrucosa (Hudson) Papenfuss, using aceto-
iron-hematoxylin-chloral hydrate27 to stain nuclei, enabled them to erect the order Gracilariales.
However, the next major breakthrough was the work of Pueschel and Cole28, who examined
the fine structure of pit plugs (occlusions in the small pore between cells following cell division)
in red algae. The structure of these pit plugs was found to be stable and a useful systematic character
in resolving monophyletic groups (although not further phylogenetic relationships; see Garbary
and Gabrielson29). As a consequence of their observations, Pueschel and Cole28 were able to provide
further support for orders already in existence and to propose some other orders (Table 20.2).
Garbary and Gabrielson29 used cladistic analysis to attempt to resolve phylogenetic relationships
amongst the red algae. However, a major advance has been the use of molecular DNA sequence
data, and this has enabled a much clearer understanding of phylogenetic relationships. For the red
algae in particular, after nearly 250 years, our understanding of all levels of classification has been
revolutionised, particularly where molecular data has been used in conjunction with traditional
techniques. It is this that has enabled Saunders and Hommersand9 to almost double the number of
orders from those presented by Garbary and Gabrielson29. No doubt the new classification will
change with time, but the changes proposed because of molecular studies are arguably the most
fundamental that we have seen since Linnaeus created Cryptogamia.
TABLE 20.2
Classifications of Red Algae to Show the Increase in the Number of Recognised Orders over the Last Half Century
Fritsch (1945)66 | Kylin (1956)67 | Dixon (1973)17 | Dixon and Irvine (1977)16 | Pueschel and Cole (1982)28 | Garbary and Gabrielson (1990)29 | Saunders and Hommersand (2004)9
Plocamiales
Cyanidiales
a Names in capital letters are classes.
b See Christensen68 regarding change from Nemalionales to Nemaliales.
Source: Data from Garbary and Gabrielson29 and Saunders and Hommersand9.
TABLE 20.3
Number of Species of Red Algae for Each Order
Orders of Red Algae Numbers of Species
BANGIOPHYCEAE
Porphyridiales 19-23(?)
Compsopogonales 12
Rhodochaetales 1
Erythropeltidales 45
FLORIDEOPHYCEAE
Acrochaetiales 275
Nemaliales 201
Palmariales 43
Corallinales 564
Gelidiales 166
Hildenbrandiales 17
Batrachospermales 147
Bonnemaisoniales 34
Gigartinales 920
Rhodymeniales 315
Ceramiales 2,300
Balbianiales 3
Balliales 6
Colaconematales 19
Rhodogorgonales 2
Thoreales 15
Gracilariales 222
Ahnfeltiales 9
Pihiellales 1
Bangiales 124
Halymeniales 241
Nemastomatales*
Plocamiales 40
Cyanidiales 5
* In Gigartinales in Guiry et al.8
Source: Data from Guiry et al.8
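As a rough illustration of how unevenly species are spread across the orders, the counts in Table 20.3 can be tallied directly. The sketch below is only an arithmetic aid and not part of the original analysis: the names `SPECIES_PER_ORDER`, `total_range`, `largest_order` and `share_of_total` are illustrative, the uncertain Porphyridiales figure is carried as a (low, high) range, and Nemastomatales is omitted because the table gives no separate count for it.

```python
# Species counts per order, transcribed from Table 20.3.
# Each value is a (low, high) pair so that the uncertain Porphyridiales
# range (19-23) can be carried through the totals. Nemastomatales is left
# out because the table lists no separate number for it.
SPECIES_PER_ORDER = {
    "Porphyridiales": (19, 23),
    "Compsopogonales": (12, 12),
    "Rhodochaetales": (1, 1),
    "Erythropeltidales": (45, 45),
    "Acrochaetiales": (275, 275),
    "Nemaliales": (201, 201),
    "Palmariales": (43, 43),
    "Corallinales": (564, 564),
    "Gelidiales": (166, 166),
    "Hildenbrandiales": (17, 17),
    "Batrachospermales": (147, 147),
    "Bonnemaisoniales": (34, 34),
    "Gigartinales": (920, 920),
    "Rhodymeniales": (315, 315),
    "Ceramiales": (2300, 2300),
    "Balbianiales": (3, 3),
    "Balliales": (6, 6),
    "Colaconematales": (19, 19),
    "Rhodogorgonales": (2, 2),
    "Thoreales": (15, 15),
    "Gracilariales": (222, 222),
    "Ahnfeltiales": (9, 9),
    "Pihiellales": (1, 1),
    "Bangiales": (124, 124),
    "Halymeniales": (241, 241),
    "Plocamiales": (40, 40),
    "Cyanidiales": (5, 5),
}


def total_range(counts):
    """Return the (low, high) total number of species over all orders."""
    low = sum(lo for lo, _ in counts.values())
    high = sum(hi for _, hi in counts.values())
    return low, high


def largest_order(counts):
    """Return the name of the order with the highest (lower-bound) count."""
    return max(counts, key=lambda name: counts[name][0])


def share_of_total(counts, order):
    """Fraction of the lower-bound total contributed by one order."""
    low_total, _ = total_range(counts)
    return counts[order][0] / low_total


if __name__ == "__main__":
    low, high = total_range(SPECIES_PER_ORDER)
    top = largest_order(SPECIES_PER_ORDER)
    print(f"Total species listed: {low}-{high}")
    print(f"Largest order: {top} "
          f"({share_of_total(SPECIES_PER_ORDER, top):.0%} of the lower total)")
```

On these figures a single order, Ceramiales, accounts for roughly two-fifths of the species listed in the table.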
Batrachospermum, they are all marine genera. These genera are all characteristically common and
widely distributed. They also present taxonomic problems because of their variable morphology
and because generic circumscriptions have been in various states of flux.
Many of the large genera have been the subject of DNA sequence studies in recent years and
have been split into smaller genera as new phylogenetic hypotheses have revealed monophyletic
groups that share previously unappreciated synapomorphies (Laurencia30,31 and Polysiphonia32).
Furthermore, many genera that were thought to consist of few species, such as Bangia33,34, have been
shown to contain many highly divergent lineages.
What has driven this great diversity and high numbers of species? The question has to be
qualified with the observation that most of these recognised ‘species’ are still based on morphological,
and often typological, concepts of species. Very few studies have taken the concept of
species beyond this alpha taxonomic description, but what has been done reveals that diversity at
the species level is probably even greater than previously recognised.
Let us propose that we consider an entity as a species in the genetic and evolutionary sense
when certain criteria either have been demonstrated or are at least strongly suspected. The main
criterion is some form of reproductive isolation35. It is important that this is tested experimentally
or that distinct entities are found in sympatry that remain distinct. The strength of the reproductive
isolation cannot easily be predicted a priori and needs experimental, ecological and demographic
studies to be performed. Another criterion is that they should show some level of genetic uniqueness.
However, the level of distinctness needed is not easy to determine globally. The genetic markers
used in these analyses need to be critically explored. Nearly all phylogenetic studies in red algae
involve sets of genetic markers which, though useful at higher levels, may not be appropriate for
providing information on reproductive processes in speciating and newly speciated algae36. Yet
genetic uniqueness, percentage of genetic divergence and phylogenetic topology have been used
to indicate species status in some algae (see example in Saunders and Lehmkuhl37). More importantly,
in the few algal groups studied, levels of genetic divergence have been shown to be correlated
with reproductive isolation38,39 and not always with morphological distinctness.
Reproductive isolation is an important criterion in many species definitions, particularly because
it produces populations that have unique evolutionary trajectories. As a criterion it has been
criticised mainly because it is often not feasible to test and because, if used in the strict sense (100%
reproductive isolation for species status), it would exclude many well recognised species that can
form viable hybrids with other species35. Reproductive isolation is often a consequence of population
isolation, with divergent populations accumulating mutations that cause gamete incompatibility or
hybrid sterility/inviability. An understanding of red algal population genetics is important if we are
to gain insights into how populations differentiate and hence speciate. Also important for our
understanding of speciation in red algae is knowledge of the factors that cause prezygotic isolation
(inability of gametes to fuse, inability of gametic nuclei to fuse) or postzygotic isolation,
such as improper zygote (carposporophyte) division and failure of meiosis.
Once sperm-to-egg contact has been made, enzymes are needed for the digestion of the two cell
walls before the respective membranes and cytoplasms can fuse. In both these interactions, genetic
changes in isolated populations can lead to incompatibility when populations come into contact
again. The extended egg receptive area means that once a sperm nucleus has entered the egg
cytoplasm it must travel down the egg to the egg nucleus. This interaction involves actin filaments
within the extended egg structure and motor proteins (myosin) on the sperm nuclei50,51, another
area where incompatibilities can arise.
Also unique in red algae is the fate of the zygote nucleus. In contrast to most organisms in
which the zygote is released to produce the diploid individual, the red algal zygote is amplified
while still attached to the female gametophyte before it releases diploid spores (zygote copies).
The process by which this occurs differs among groups of red algae and is used in higher-level
classification; it is complex enough that the dividing zygote is usually considered an alternate
stage of the life cycle (the carposporophyte, in a sexual cycle with three life history stages). This unique
development is also an area in which incompatibility is manifested, even before a free living diploid
is formed. This has been shown in some experiments in which red algae from different locations,
and/or genetic types, were artificially hybridised. Carposporophytes began to form and then aborted,
leaving ‘pseudocystocarps’ (partially formed carposporophytes plus surrounding female tissue)
(Brodie et al.52; Zuccarello and West53; Kamiya et al.54 and references therein), suggesting that the
diploid hybrid nucleus was not able to properly coordinate carposporophyte development. Many
of these processes are only now starting to be investigated using molecular methods, but all these
unique red algal attributes indicate that there is greater scope for red algae to develop incompatible
reproductive interactions and hence to lead to reproductively isolated species.
northern refugium around the English Channel62. This scenario would mean that reproductive
isolation occurred between the interglacial cycles before the population could have reconnected
(several tens of thousands of years). We have no knowledge of whether this is considered a fast or
slow rate of reproductive isolation acquisition in algae.
These data indicate that even low levels of genetic divergence in a commonly used nonfunctional
intraspecies marker, the rubisco spacer, are correlated with reproductively isolated individuals. How
far this can be projected is difficult to say, but in the few studies that have used rubisco spacer data
and hybridisation experiments a similar pattern is seen. In Caloglossa postiae M. Kamiya and R. J.
King, samples collected from Japan and Australia do not differ in rubisco spacer sequence and
only produce pseudocystocarps when artificially hybridised63. In a study of Spyridia filamentosa
(Wulfen) Harvey, in which only a limited number of hybridisations were possible, a difference of
only 7 bp in the rubisco spacer correlated with reproductive isolation between two samples38. In
Caloglossa vieillardii (Kützing) Setchell, samples differing by 13 bp are also reproductively
isolated (data not shown).
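The base-pair differences quoted above are counts of mismatched sites between two aligned spacer sequences. A minimal sketch of such a count follows, assuming the sequences have already been aligned to equal length. The function names, the choice to skip gapped sites and the short toy sequences are all hypothetical; the sequences are not real rubisco spacer data.

```python
def count_bp_differences(seq_a: str, seq_b: str, gap: str = "-") -> int:
    """Count aligned sites at which two sequences carry different bases.

    Sites where either sequence has a gap character are skipped; this is
    an assumption of the sketch, not a rule taken from the studies cited.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != gap and b != gap and a != b
    )


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of compared (ungapped) sites that differ."""
    differences = count_bp_differences(seq_a, seq_b, gap)  # also checks lengths
    compared = sum(1 for a, b in zip(seq_a, seq_b) if a != gap and b != gap)
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differences / compared


if __name__ == "__main__":
    # Toy aligned fragments, invented purely for illustration.
    sample_x = "ACGTTGCAAT-GC"
    sample_y = "ACGATGCTAT-GG"
    print(count_bp_differences(sample_x, sample_y), "bp differences")
    print(f"p-distance: {p_distance(sample_x, sample_y):.3f}")
```

A raw count such as this says nothing by itself about reproductive compatibility; in the studies discussed here it is the pairing of such counts with hybridisation experiments that gives them meaning.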
The only other well documented example within red algae in which molecular studies and
hybridisation data have been used to understand speciation and the prevalence of cryptic species is
within the Bostrychia radicans/B. moritziana complex39,53,64,65. This cosmopolitan tropical species
group is composed of seven distinct genetic lineages that are highly divergent and morphologically
indistinguishable39. These lineages are also reproductively isolated, representing cryptic species with
either narrow or wide-ranging distributions, with lineages being sympatric in certain populations39,65.
Along the East coast of the United States, two lineages are found. Within one of these lineages
different chloroplast and mitochondrial haplotypes are found along the East coast, with one haplotype
found in Northern populations (haplotype B; New York, Virginia, North Carolina) and another in
more Southern populations (haplotype C; Georgia, Florida). Both haplotypes are found in intermediate
areas (South Carolina)39. These two haplotypes differ by 4 bp in the rubisco spacer and
are reproductively isolated. Haplotype C is 1 bp different from samples of the same species complex
found in Pacific Mexico (haplotype A), and yet these samples show intermediate levels of reproductive
isolation (diploids are formed but they do not go through meiosis)39. These data indicate that even
within phylogenetically well supported lineages, reproductive isolation can be present, increasing the
number of evolutionary lineages within this red algal group.
The increasing number of molecular studies at the species/population level in red algae will
increase our knowledge of genetic variation, evolution and genetic structure within species. This
work must go beyond the standard molecular markers to incorporate more unlinked nuclear markers
that will give clues to hybridisation within and between populations. These studies must still be
combined with time-consuming algal culturing and hybridisation studies. Only with knowledge of
reproductive isolation within algal groups can we couple our ever-increasing molecular data with
an important reproductive parameter. A new line of research should be targeted at reproductive
incompatibility in red algae. Combining our knowledge of sister species or semi-isolated populations
with an understanding of the fertilisation process and cellular/physiological areas of incompatibility
will lead to a greater understanding of the causes and mechanisms of reproductive isolation
in red algae.
20.3 CONCLUSIONS
In conclusion, molecular data will continue to illuminate the evolutionary relationships within the
red algae at multiple levels. The use of multiple molecular markers and better taxon sampling will
lead to more natural higher-level taxonomies. A population focus in species studies (greater sampling
within and between populations) will lead to a better understanding of the tempo and history
of speciation in these organisms. Concentration on cell biological techniques may unravel the
complex physiological events that occur during the unique prezygotic and postzygotic processes that
are crucial in establishing and maintaining distinct species in this diverse and important algal group.
REFERENCES
1. Linnaeus, C., Species plantarum, exhibentes plantas rite cognitas, ad genera relatas, cum differentiis
specificis, nominibus trivialibus, synonymis selectis, locis natalibus digestas, Tomus I, Cryptogamia,
Salvi, Stockholm, Sweden, 1753, 1061.
2. Hoek, C. van den, Mann, D.G. and Jahns, H.M., Algae: An Introduction to Phycology, Cambridge
University Press, Cambridge, 1995, 623.
3. Baldauf, S.L., The deep roots of eukaryotes, Science, 300, 1703, 2003.
4. Keeling, P.J., Diversity and evolutionary history of plastids and their hosts, Am. J. Bot., 91, 1481, 2004.
5. Palmer, J.D., Soltis, D.E., and Chase, M.W., The plant tree of life: an overview and some points of
view, Am. J. Bot., 91, 1437, 2004.
6. World Conservation Monitoring Centre, Global Biodiversity: Status of the Earth’s Living Resources,
Chapman and Hall, 1992.
7. Moon-van der Staay, S.Y., de Wachter, R., and Vaulot, D., Oceanic 18S rDNA sequences from
picoplankton reveal unsuspected eukaryotic diversity, Nature, 409, 607, 2001.
8. Guiry, M.D., Rindi, F., and Guiry, G.M., AlgaeBase version 4.0, National University of Ireland,
Galway, http://www.algaebase.org, 2005.
9. Saunders, G.W. and Hommersand, M.H., Assessing red algal supraordinal diversity and taxonomy in
the context of contemporary systematic data, Am. J. Bot., 91, 1494, 2004.
10. Woelkerling, W.J., An introduction, in Biology of the Red Algae, Cole, K.M. and Sheath, R.G., Eds.,
Cambridge University Press, Cambridge, 1990, chap. 1.
11. Butterfield, N.J., Bangiomorpha pubescens n. gen., n. sp.: implications for the evolution of sex,
multicellularity, and the Mesoproterozoic/Neoproterozoic radiation of eukaryotes, Paleobiology, 26,
386, 2000.
12. Brodie, J. and Irvine, L.M., Seaweeds of the British Isles Volume 1 Part 3B Bangiophycidae, Intercept,
Hampshire, 2003, 167.
13. Littler, D.S., Marine Plants of the Caribbean: A Field Guide from Florida to Brazil, Smithsonian
Institution Press, Washington, DC, 1989, 263.
14. Womersley, H.B.S., The Marine Benthic Flora of Southern Australia Part IIIC, State Herbarium of
South Australia, South Australia, 1998, 535.
15. Silva, P.C., Basson, P.W., and Moe, R.L., Catalogue of the benthic marine algae of the Indian Ocean,
Univ. Calif. Publs. Bot., 79, 1.
16. Dixon, P.S. and Irvine, L.M., Seaweeds of the British Isles Volume 1 Rhodophyta Part 1: Introduction,
Nemaliales, Gigartinales, British Museum (Natural History) London, 1977, 252.
17. Dixon, P.S., Biology of Rhodophyta, Oliver and Boyd, Edinburgh, 1973, 285.
18. Lamouroux, J.V.F., Essai sur les genres de la famille de Thalassiophytes non articulées, Annales du
Muséum (National) d’Histoire Naturelle (Paris), 20, 21, 1813.
19. Harvey, W.H., Algae, in Flora Hibernica Vol. 2, Mackay, J.T., Ed., Curry, Dublin, Ireland, 1836, 157.
20. Berthold, G., Fauna und Flora des Golfes von Neapel VIII Die Bangiaceen des Golfes von Neapel,
Leipzig, 1882, 1.
21. Magne, F., Classification et Phylogénie des Rhodophycées, Cryptog., Algol., 10, 101, 1989.
22. Schmitz, F. and Hauptfleisch, P., Rhodophyceae, in Die Natürlichen Pflanzenfamilien, vol. 1(2), Engler,
A. and Prantl, K., Eds., Engelmann, Leipzig, 1896/7, 298.
23. Oltmanns, F., Morphologie und Biologie der Algen 1, Fischer, Jena, 1904, 733.
24. Guiry, M.D., A preliminary consideration of the taxonomic position of Palmaria palmata (Linnaeus)
Stackhouse = Rhodymenia palmata (Linnaeus) Greville, J. Mar. Biol. Assoc. UK, 54, 509, 1974.
25. Guiry, M.D., The importance of sporangia in the classification of the Florideophyceae, in Modern
Approaches to the Taxonomy of Red and Brown Algae, Systematics Association Special Volume 10,
Irvine, D.E.G. and Price, J.H., Eds., Academic Press, London, 1978, 111.
26. Fredericq, S. and Hommersand, M.H., Proposal of the Gracilariales ord. nov. (Rhodophyta) based on
an analysis of the reproductive development of Gracilaria verrucosa, J. Phycol., 25, 213, 1989.
27. Wittmann, W., Aceto-iron-haematoxylin-chloral hydrate for chromosome staining, Stain Technol., 40,
161, 1965.
28. Pueschel, C.M. and Cole, K.M., Rhodophycean pit plugs: an ultrastructural survey with taxonomic
implications, Am. J. Bot., 69, 703, 1982.
29. Garbary, D.J. and Gabrielson, P.W., Taxonomy and evolution, in Biology of the Red Algae, Cole, K.M.
and Sheath, R.G., Eds., Cambridge University Press, Cambridge, 1990, chap. 18.
30. Nam, K.W., Maggs, C.A., and Garbary, D.J., Resurrection of the genus Osmundea with an emendation
of the generic delineation of Laurencia (Ceramiales, Rhodophyta), Phycologia, 33, 384, 1994.
31. Nam, K.W. et al., Taxonomy and phylogeny of Osmundea (Rhodomelaceae, Rhodophyta) in Atlantic
Europe, J. Phycol., 36, 759, 2000.
32. Choi, H.G. et al., Phylogenetic relationships of Polysiphonia (Rhodomelaceae, Rhodophyta) and its
relatives based on anatomical and nuclear small-subunit rDNA sequence data, Can. J. Bot., 79, 1465,
2001.
33. Broom, J.E.S., Farr, T.J., and Nelson, W.A., Phylogeny of the Bangia flora of New Zealand suggests
a southern origin for Porphyra and Bangia (Bangiales, Rhodophyta), Mol. Phylogenet. Evol., 31,
1197, 2004.
34. Müller, K.M., Cannone, J.J., and Sheath, R.G., A molecular phylogenetic analysis of the Bangiales
(Rhodophyta) and description of a new genus and species, Pseudobangia kaycoleia, Phycologia, 44,
146, 2005.
35. Coyne, J.A. and Orr, H.A., Speciation, Sinauer Associates, Sunderland, MA, 2004, 545.
36. Small, R.L., Cronn, R.C., and Wendel, J.F., Use of nuclear genes for phylogeny reconstruction in
plants, Aus. Syst. Bot., 17, 145, 2004.
37. Saunders, G.W. and Lehmkuhl, K.V., Molecular divergence and morphological diversity among four
cryptic species of Plocamium (Plocamiales, Florideophyceae) in northern Europe, Eur. J. Phycol., 40,
293, 2005.
38. Zuccarello, G.C., Sandercock, B., and West, J.A., Diversity within red algal species: variation in
world-wide samples of Spyridia filamentosa (Ceramiaceae) and Murrayella periclados
(Rhodomelaceae) using DNA markers and breeding studies, Eur. J. Phycol., 37, 403, 2002.
39. Zuccarello, G.C. and West, J.A., Multiple cryptic species: molecular diversity and reproductive
isolation in the Bostrychia radicans/B. moritziana complex (Rhodomelaceae, Rhodophyta) with focus
on North American isolates, J. Phycol., 39, 948, 2003.
40. Hoffmann, A.J., The arrival of seaweed propagules at the shore: a review, Bot. Mar., 30, 151, 1987.
41. Sosa, P.A. and Lindstrom, S.C., Isozymes in macroalgae (seaweeds): genetic differentiation, genetic
variability and applications in systematics (Review), Eur. J. Phycol., 34, 427, 1999.
42. Valero, M. et al., Concepts and issues of population genetics in seaweeds, Cah. Biol. Mar., 42, 53, 2001.
43. Engel, C.R., Destombe, C., and Valero, M., Mating system and gene flow in the red seaweed Gracilaria
gracilis: effect of haploid-diploid life history and intertidal rocky shore landscape on fine scale genetic
structure, Heredity, 92, 289, 2004.
44. Faugeron, S. et al., Hierarchical spatial structure and discriminant analysis of genetic diversity in the
red alga Mazzaella laminarioides (Gigartinales, Rhodophyta), J. Phycol., 37, 705, 2001.
45. Wright, J.T., Zuccarello, G.C., and Steinberg, P.D., Genetic structure in the subtidal red alga Delisea
pulchra, Mar. Biol., 136, 439, 2000.
46. Zuccarello, G.C. et al., Population structure and physiological differentiation of haplotypes of Caloglossa
leprieurii (Rhodophyta) in a mangrove intertidal zone, J. Phycol., 37, 235, 2001.
47. Kim, G.H. and Fritz, L., Gamete recognition during fertilization in a red alga, Antithamnion
nipponicum, Protoplasma, 174, 69, 1993.
48. Kim, G.H., Lee, I.K., and Fritz, L., Cell-cell recognition during fertilization in a red alga, Antithamnion
sparsum (Ceramiaceae, Rhodophyta), Plant Cell Physiol., 37, 621, 1996.
49. Kim, G.H. and Kim, S.H., The role of F-actin during fertilization in the red alga Aglaothamnion
oosumiense (Rhodophyta), J. Phycol., 35, 806, 1999.
50. Wilson, S.M., Pickett-Heaps, J.D., and West, J.A., Fertilization and the cytoskeleton in the red alga
Bostrychia moritziana (Rhodomelaceae, Rhodophyta), Eur. J. Phycol., 37, 509, 2002.
51. Wilson, S.M., West, J.A., and Pickett-Heaps, J.D., Time-lapse videomicroscopy of fertilization and the
actin cytoskeleton in Murrayella periclados (Rhodomelaceae, Rhodophyta), Phycologia, 42, 638, 2003.
52. Brodie, J., Guiry, M.D., and Masuda, M., Life history, morphology and crossability of Chondrus
ocellatus forma ocellatus and C. ocellatus forma crispoides (Gigartinales, Rhodophyta) from the
north-western Pacific, Eur. J. Phycol., 28, 183, 1993.
53. Zuccarello, G.C. and West, J.A., Hybridization studies in Bostrychia: 1. B. radicans (Rhodomelaceae,
Rhodophyta) from Pacific and Atlantic North America, Phycol. Res., 43, 233, 1995.
54. Kamiya, M., Zuccarello, G.C., and West, J.A., Evolutionary relationships of the genus Caloglossa
(Delesseriaceae, Rhodophyta) inferred from large-subunit ribosomal RNA gene sequences, morphological
evidence and reproductive compatibility, with description of a new species from Guatemala,
Phycologia, 42, 478, 2003.
55. Guiry, M.D., Species concepts in marine red algae, in Progress in Phycological Research 8, Round,
F.E. and Chapman, D.J., Eds., Biopress Ltd, Bristol, 1992, chap. 5.
56. Guiry, M.D. and West, J.A., Life history and hybridization studies on Gigartina stellata and Petrocelis
cruenta (Rhodophyta) in the north Atlantic, J. Phycol., 19, 474, 1983.
57. Zuccarello, G.C. et al., A molecular re-examination of speciation in the intertidal red alga Mastocarpus
stellatus (Gigartinales, Rhodophyta) in Europe, Eur. J. Phycol., 40, 337, 2005.
58. Destombe, C. and Douglas, S.E., Rubisco spacer sequence divergence in the rhodophyte alga
Gracilaria verrucosa and closely related species, Curr. Genet., 19, 395, 1991.
59. Destombe, C., Correction, Curr. Genet., 22, 173, 1992.
60. Wattier, R. and Maggs, C.A., Intraspecific variation in seaweeds: the application of new tools and
approaches, Adv. Bot. Res., 35, 171, 2001.
61. Zuccarello, G.C. et al., A mitochondrial marker for red algal intraspecific relationships, Mol. Ecol.,
8, 1443, 1999.
62. Provan, J., Wattier, R.A., and Maggs, C.A., Phylogeographic analysis of the red seaweed Palmaria
palmata reveals a Pleistocene marine glacial refugium in the English Channel, Mol. Ecol., 14, 793,
2005.
63. Kamiya, M. et al., Reproductive and genetic distinction between broad and narrow entities of
Caloglossa continua (Delesseriaceae, Rhodophyta), Phycologia, 38, 356, 1999.
64. Zuccarello, G.C. and West, J.A., Hybridization studies in Bostrychia 2: correlation of crossing data
and plastid DNA sequence data within B. radicans and B. moritziana (Ceramiales, Rhodophyta),
Phycologia, 36, 293, 1997.
65. Zuccarello, G.C., West, J.A., and King, R.J., Evolutionary divergence in the Bostrychia moritziana/B.
radicans complex (Rhodomelaceae, Rhodophyta): molecular and hybridization data, Phycologia, 38,
34, 1999.
66. Fritsch, F.E., Structure and Reproduction of the Algae 2, Cambridge University Press, Cambridge,
1945, 939.
67. Kylin, H., Die Gattungen der Rhodophyceen, Gleerup, Lund, 1956, 673.
68. Christensen, T., Two new families and some names and combinations in the algae, Blumea, 15, 91,
1967.
9579_Index.fm Page 337 Friday, November 17, 2006 1:55 PM
Index
f = figure; t = table
A Apistogramma, 217
Archaeorhodophycidae, 326
Acanthopterygii, 214–216 Aristida, 174
Acaronia, 217 Arthropoda, 194. See also Insects
Actual evapotranspiration (AET), 156 Ascomycota, 229–230
Adaptations Assembling Fungal Tree of Life Project (AFTOL), 239
breeding, 219–220 Astatoreochromis, 219
feeding, 218–219 Astatotilapia, 216
reproductive, 331–332 Asteraceae, 9, 140, 276
Adaptive radiation, 218–220 Asthenochloa, 174
Adaptive zones, 153 Astronotinae, 217
Adhesion protein family, 180 Astronotus, 217
Aequidens, 219 Axis paramorphism, 182
Age and diversification, 169–170
Ahnfeltiophycidae, 326
Alexfloydia, 15 B
Algae. See also appropriate genus or family
characteristics and species, 324–325 Bacillariophyta. See Diatoms
defined, 323–324 Bamboo, 236, 276
red Bambusa, 174, 279, 286
higher classification, 325–326, 328–329t Bangia, 326, 330
orders, 326–327 Bangiales, 326
population structure, 331 Bangioideae, 326
reproductive isolating mechanisms, 331–332 Bangiomorpha, 325
speciation, 327–331, 332–333 Bangiophycidae, 326
Algorithms, genetical, 117 Barcode Of Life Database (BOLD), 38–40, 42
All Birds Barcoding Initiative (ABBI), 38 Barcoding, DNA, 24, 36–40
Allopatric speciation, 153–218 Basidiomycota, 231–232
Allopolyploid hybrids, 139–141 Batrachospermum, 327, 330
Alpha taxonomy, 307–309 Beauveria, 229
Amphilophus, 215 Behavioural traits, 214, 218, 220
Amplified Fragment Length Polymorphism (AFLP), 98, Bias, taxonomic, 40–41
129, 136–142, 240 Bilaterian, 179, 185
Anabantids-nandids, 215 Biodiversity crisis, 3, 15, 27–28
Anamorph, 229, 233, 239 Biodiversity estimates, 4, 37, 199, 227, 233, 306, 309
Angiosperms, 8–10, 130, 129–148, 150, 165–176, Biogeography
251–256, 276–290, 311–312. See also appropriate diatoms, 315–318
genus or family grass family, 283–285
Ficus, 129–148 Biological Surveys and Inventories (BSI), 29
Poaceae (grasses), 165–176, 275–296 Biomass theory, 155
Syzygium, 251–276 Biotoecus, 216
Animals. See also appropriate genus or family Body patterning, 178
developmental genes, 180–183 Bone morphogenetic protein-4 gene, 179
discontinuous variation, 185–186 Bostrychia, 333
larval legs versus true legs, 185 Bower, 219
molecular phylogenetics in, 178–180 Breeding
roots of evolutionary developmental biology in, 178 biology of Syzygium, 266–267
segmentation of bodies, 184–185 cichlid adaptations, 219–220
Annealing, simulated, 115–116 British Phycological Society, 307
Anomochloa, 169, 280 Brown algae, 324–325
Antenna, 182 Bulbs, 158
337
9579_Index.fm Page 338 Friday, November 17, 2006 1:55 PM
Index 339
Index 341
Heuristics J
cladogram search, 115–119
homology determination, 119–122 Joining, quartet, 72–73
Hexapods (Hexapoda), 4, 8–9, 195 Juncaceae, 235–237, 280
Hibiscus, 136
Hildenbrandiophycidae, 326–327
Hollow Curve Distribution (HCD), 14–15, 253, 279 K
skewed distribution and, 165–176
Kew Bibliographic Databases, 278
Homology, 181
Key innovation hypothesis, 153–155, 270
assessment, 14
Kingdoms, 229
determination heuristics, 119–122
King Tut Exhibit, 26
multiple alignment methods, 120
Kyoto Protocols, 29
of process, 183
Kyphosids, 215
Homopholis, 15
Homoploid hybrids, 142
Horizontal gene transfer, 51
Hotspots, 150 L
Hox family, 180 Labridae, 214–215
Human capital in taxonomic/systematic research, 27, 203, Labroidei (labroids), 214–215, 218
206 Lactuca, 137–138
Hybrid phylogeny, 103 Lacustrine radiations, 217, 219
Hybrids Lankester, Sir E. Ray, 22–23
allopolyploid, 139–141 Large genera, 6–7, 237, 251, 279. See also appropriate
homoploid, 142 genus
in red algae, 332 coevolution of figs and their pollinating wasps,
131–132
diatoms, 312–315
I incongruence in phylogenetic trees of, 139–142
low copy nuclear markers in, 135–136
Inadequacy of taxonomic data and standards, 41–42 low levels of variation in standard markers, 132–135
Incongruence in phylogenetic trees, 52, 139–143 prospects of studying, 130
Independent contrasts, 153–154, 158 Larval legs, 185
Informatics, 30–31 Lateral gene transfer, 51, 52–54, 56
Insects, 4, 8–9, 194 Latitudinal gradient, 155
diversity Laurencia, 330
and classification, 4, 194–196 Leaf litter fungi, 237
drivers of, 197–199 Legs, larval versus true, 185
plant bug, 200–209 Leks, 219
numbers of species of, 194 Lepidiophagy, 218
plant coradiation, 197–199 Lepidoptera, 311–312
species richness, 199–200 Lethrinops, 220
taxonomy Lettuce, 137–138
impediments, 203–205 Lichens, 228
industrial cyber, 205–208 Life history impact on genus size, 171–173
Instars, 184 Linnaean naming system, 40, 42
Institutional cooperation, 30 Lobochilotes, 215
Institutional issues in taxonomic/systematic research, 25–26 Local search refinement, 115
Integrated Taxonomic Information System (ITIS), 278 Lophotrochozoan, 180
Internal Transcribed Spacer Regions (ITS), 133–138, 140, Low copy nuclear markers, 135–136
262–265, 280–281
International Commission of Botanical Nomenclature, 31
International Commission of Zoological Nomenclature, 31
M
International Plant Names Index (IPNI), 253, 278
Internet. See Electronic resources Macroalgae, 331
Iranocichla, 217 Magnolia, 237
Iridaceae, 157–159 Mahengochromis, 217
Irises, 157–159 Markers, DNA sequence
Isachne, 174 low copy, 135–136
Isolating mechanisms, 218, 323, 341 low levels of variation in standard, 132–135
Isophysis, 158 Mastocarpus, 332
Ixia, 159 Mathanosarcina, 51
Tuber, 229
Turbellarian Taxonomic Database, 25
Turrill, William Bertram, 307–309
Two step versus one step analysis, 114
U
Ultraviolet radiation (UV), 156
Ulva, 326
Unaligned sequence data, 114–115
Unique specimen identification, 207
Urbilateria, 179, 184
Urediniomycetes, 231–232
Ustilaginomycetes, 231–232
V
Variables, data set, 81–82
Vibrio, 52, 53f
Voting systems, quartet-based supertree construction, 68–71
W
Wagner tree, 115
Wasps, 131–132, 138–139
Web-based resources, 207, 253, 278
Wrasses, 215
World Grass Species Synonymy, 279
X
Xylella, 52, 53f
Y
Yucca, 132
Z
Zoological Museum (Amsterdam), 27
Zygomorphy, 159
Zygomycota, 232
Systematics Association
Publications
1. Bibliography of Key Works for the Identification of the British Fauna and Flora, 3rd
edition (1967)†
Edited by G.J. Kerrich, R.D. Meikle and N. Tebble
2. Function and Taxonomic Importance (1959)†
Edited by A.J. Cain
3. The Species Concept in Palaeontology (1956)†
Edited by P.C. Sylvester-Bradley
4. Taxonomy and Geography (1962)†
Edited by D. Nichols
5. Speciation in the Sea (1963)†
Edited by J.P. Harding and N. Tebble
6. Phenetic and Phylogenetic Classification (1964)†
Edited by V.H. Heywood and J. McNeill
7. Aspects of Tethyan Biogeography (1967)†
Edited by C.G. Adams and D.V. Ager
8. The Soil Ecosystem (1969)†
Edited by H. Sheals
9. Organisms and Continents through Time (1973)†
Edited by N.F. Hughes
10. Cladistics: A Practical Course in Systematics (1992)*
P.L. Forey, C.J. Humphries, I.J. Kitching, R.W. Scotland, D.J. Siebert and D.M. Williams
11. Cladistics: The Theory and Practice of Parsimony Analysis (2nd edition) (1998)*
I.J. Kitching, P.L. Forey, C.J. Humphries and D.M. Williams
69. Neotropical Savannas and Seasonally Dry Forests: Plant Diversity, Biogeography and
Conservation (2006)
Edited by R.T. Pennington, G.P. Lewis and J.A. Ratter
70. Biogeography in a Changing World (2006)
Edited by M.C. Ebach and R.S. Tangney
71. Pleurocarpous Mosses: Systematics & Evolution (2006)
Edited by A.E. Newton and R.S. Tangney
COLOUR FIGURE 14.1 Cichlid fishes. Cichlids have a conservative bauplan, and specialised attributes,
such as hypertrophied lips, are the result of parallel evolution, thus making species and higher level diagnoses
difficult. (a) Amphilophus sp. ‘fatlip’ in Lake Xiloa, Nicaragua; (b) Placidochromis milomo at Nkhomo
Reef, Lake Malawi, Malawi; (c) Lobochilotes labiatus at Nkondwe Island, Lake Tanganyika, Tanzania.
(Photos reproduced with permission from A.F. Konings.)
COLOUR FIGURE 15.2 Basidiomycete fungi. (a) Dacryopinax spathularia, (b) Pseudocoprinus disseminatus.
(Photos reproduced with permission from Edward Grand, Chiang Mai, Thailand.)
COLOUR FIGURE 16.4 Flowers and fruit of species of the Syzygium group. (A) Flowers of Syzygium malaccense
(L.) Merr. and L.M. Perry; (B–D) flowers, inflorescence and fruit, respectively, of Acmena cf.
divaricata Merr. and L.M. Perry; (E) fruit of Piliocalyx bullatus Brongn. and Gris; (F) buds and flowers of
Syzygium longifolium (Brongn. and Gris) J.W. Dawson; (G–H) fruit and flowers, respectively, of Syzygium
aqueum (Burm. f.) Alston; (I) flowers of Syzygium jambos (L.) Alston; (J) buds (note calyptras) of Syzygium
kuebiniense J.W. Dawson; (K) fruit of Syzygium rubrimolle B. Hyland; (L–M) flowers and fruit, respectively,
of Syzygium glenum Craven. (Reproduced with permission from G. Sankowsky (A–D, G, H, K–M), L.
Craven (F, I) and E. Biffin (J).)
COLOUR FIGURE 16.5 Flowers, fruit and foliage of species of the Syzygium group. (A–B) buds and flowers,
and fruit, respectively, of Acmenosperma pringlei B. Hyland; (C) Syzygium wilsonii subsp. cryptophlebium
(F. Muell.) B. Hyland; (D) fruit of Syzygium elegans (Brongn. and Gris) J.W. Dawson; (E–G) habit, young
leaves, and buds and flowers, respectively, of Syzygium acre (Pancher ex Guillaumin) J.W. Dawson; (H) fruit
of Syzygium cormiflorum (F. Muell.) B. Hyland; (I) flowers of Syzygium boonjee B. Hyland; (J) flower of
Syzygium sp.; (K) flowers of Syzygium balansae (Guillaumin) J.W. Dawson; (L) fruit of Syzygium maraca
Craven and Biffin; (M) young fruit of Syzygium sp. (Reproduced with permission from A. Ford (A–B), G.
Sankowsky (C, H, I, L), E. Biffin (D), L. Craven (E–G, K, M) and J. Dowe (J).)