Taxonomic Guide to Infectious Diseases: Understanding the Biologic Classes of Pathogenic Organisms [2 ed.] 0128175761, 9780128175767

Taxonomic Guide to Infectious Diseases: Understanding the Biologic Classes of Pathogenic Organisms, Second Edition tackl


English Pages 399 [387] Year 2019



Table of contents :
Cover
Taxonomic Guide to Infectious Diseases: Understanding the Biologic Classes of Pathogenic Organisms
Copyright
Other books by the author
About the author
Preface to second edition
Preface to first edition
1
Principles of taxonomy
Section 1.1 The consequence of evolution is diversity
Section 1.2 What is a classification?
Section 1.3 The tree of life
Glossary
References
2
Species and speciation
Section 2.1 A species is a biological entity
Section 2.2 The biological process of speciation
Section 2.3 Diverse forms of diversity
Section 2.4 The species paradox
Glossary
References
3
Bacteria
Section 3.1 Overview of Class Bacteria
Section 3.2 Alpha Proteobacteria
Section 3.3 Beta Proteobacteria
Section 3.4 Gamma Proteobacteria
Section 3.5 Epsilon Proteobacteria
Section 3.6 Spirochaetes
Section 3.7 Bacteroidetes and Fusobacteria
Section 3.8 Mollicutes
Section 3.9 Class Bacilli plus Class Clostridia
Section 3.10 Chlamydiae
Section 3.11 Actinobacteria
Glossary
References
4
Eukaryotes
Section 4.1 Overview of Class Eukaryota
Section 4.2 Metamonada
Section 4.3 Euglenozoa
Section 4.4 Percolozoa
Section 4.5 Apicomplexa
Section 4.6 Ciliophora (ciliates)
Section 4.7 Heterokonta
Section 4.8 Amoebozoa
Section 4.9 Choanozoa
Section 4.10 Archaeplastida
Glossary
References
5
Animals
Section 5.1 Overview of Class Animalia
Section 5.2 Opisthokonts to Class ParaHoxozoa
Section 5.3 Bilaterians to Protostomes
Section 5.4 Platyhelminthes (flatworms)
Section 5.5 Nematoda
Section 5.6 Acanthocephala
Section 5.7 Chelicerata
Section 5.8 Hexapoda
Section 5.9 Crustacea
Glossary
References
6
Fungi
Section 6.1 Overview of Class Fungi
Section 6.2 Zoopagomycota and Mucoromycota (formerly Zygomycota)
Section 6.3 Basidiomycota
Section 6.4 Ascomycota
Section 6.5 Microsporidia
Glossary
References
7
Viruses
Section 7.1 Viruses and the meaning of life
Section 7.2 Viral phylogeny
Section 7.3 Group I viruses: Double-stranded DNA
Section 7.4 Group II viruses: Single-stranded (+) sense DNA
Section 7.5 Group III viruses: Double-stranded RNA
Section 7.6 Group IV viruses: Single-stranded (+) sense RNA
Section 7.7 Group V viruses: Single-stranded (−) sense RNA
Section 7.8 Group VI viruses: Single-stranded RNA reverse transcriptase viruses with a DNA intermediate in life cycle
Section 7.9 Group VII viruses: Double-stranded DNA reverse transcriptase viruses
Glossary
References
8
Changing how we think about infectious diseases
Section 8.1 Abandoning Koch's postulates
Section 8.2 Prion diseases: Fulfilling Koch's postulates, but without an organism
Section 8.3 Diagnostic challenges
Section 8.4 Discovering new infections among the diseases of unknown origin
Section 8.5 Unstable taxonomies
Section 8.6 Taxonomic stupidity
Section 8.7 Recurring sources of error
Glossary
References
Index
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z


Taxonomic Guide to Infectious Diseases

Taxonomic Guide to Infectious Diseases Understanding the Biologic Classes of Pathogenic Organisms

Second Edition Jules J. Berman

Academic Press is an imprint of Elsevier 125 London Wall, London EC2Y 5AS, United Kingdom 525 B Street, Suite 1650, San Diego, CA 92101, United States 50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom © 2019 Elsevier Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions. This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein). Notices Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility. To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein. 
Library of Congress Cataloging-in-Publication Data A catalog record for this book is available from the Library of Congress British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library ISBN: 978-0-12-817576-7 For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Andre Gerhard Wolff Acquisition Editor: Linda Versteeg-buschman Editorial Project Manager: Sandra Harron Production Project Manager: Punithavathy G Cover Designer: Matthew Limbert Typeset by SPi Global, India

Other books by the author

About the author

Jules J. Berman received two baccalaureate degrees from MIT, in Mathematics and in Earth and Planetary Sciences. He holds a PhD from Temple University, and an MD from the University of Miami. He was a graduate student researcher in the Fels Cancer Research Institute, at Temple University, and at the American Health Foundation in Valhalla, New York. His postdoctoral studies were completed at the US National Institutes of Health, and his residency was completed at the George Washington University Medical Center in Washington, DC. Dr. Berman served as Chief of Anatomic Pathology, Surgical Pathology, and Cytopathology at the Veterans Administration Medical Center in Baltimore, Maryland, where he held joint appointments at the University of Maryland Medical Center and at the Johns Hopkins Medical Institutions. In 1998, he transferred to the US National Institutes of Health, as a Medical Officer, and as the Program Director for Pathology Informatics in the Cancer Diagnosis Program at the National Cancer Institute. Dr. Berman is a past president of the Association for Pathology Informatics, and the 2011 recipient of the Association's Lifetime Achievement Award. He has first-authored more than 100 journal articles and has written 19 science books.
His most recent titles, published by Elsevier, are:
Taxonomic Guide to Infectious Diseases: Understanding the Biologic Classes of Pathogenic Organisms (2012)
Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information (2013)
Rare Diseases and Orphan Drugs: Keys to Understanding and Treating the Common Diseases (2014)
Repurposing Legacy Data: Innovative Case Studies (2015)
Data Simplification: Taming Information with Open Source Tools (2016)
Precision Medicine and the Reinvention of Human Disease (2018)
Principles and Practice of Big Data: Preparing, Sharing, and Analyzing Complex Information, Second Edition (2018)
Evolution's Clinical Guidebook: Translating Ancient Genes into Precision Medicine (2019)

Preface to second edition

Everything has been said before, but since nobody listens we have to keep going back and beginning all over again.
Andre Gide

This second edition of the Taxonomic Guide to Infectious Diseases, like the first edition, confronts the impossibility of mastering all the human infections. There are just too many of them. Instead, we take the easy way out by learning the basic biology of the 40 or so different classes of organisms that contain infectious species. Within each class of infectious organisms, the member species have traits in common with one another. If we understand the characteristic biological properties and identifying features of one prototypical species from each class of organisms, we can pretty well guess how the other species from the same class will behave. We will learn that if we can confidently assign a suspected pathogen to a well-described genus (i.e., a class of related species), we can often determine how to treat the infection and prevent the occurrence of additional infections in the at-risk community.

As in the first edition, we abandon the ranking system employed by classic taxonomists (e.g., Kingdom, Phylum, Order, Family, Genus, Species) and their subcategories (e.g., Superphylum, Phylum, Subphylum, Infraphylum, and Microphylum). In this book, all ranks will simply be referred to as “Class.” The direct father class is the superclass, and the direct child class is the subclass. The use of “Class,” “Superclass,” and “Subclass” conforms to nomenclature standards developed by the metadata community (i.e., uses a standard terminology employed by the computational field that deals with the description of data). The terms “genus” (plural “genera”) and “species” will preserve the binomial assignment of organism names.

In the prior edition, viruses were considered to be nonliving biological agents, little more than nucleic acid wrapped in a capsule. The present edition argues that viruses are living organisms, with their own phylogenetic histories. The role that viruses play in the evolution of the organisms they infect will be discussed.
Also, in the prior edition, various classes of living organisms were described, but there was scant discussion of the evolutionary developments that account for the different classes of living organisms. This edition rectifies the oversight, and provides a plausible explanation as to how ancient classes of organisms arose, and how new species of organisms arise.


The first edition included an appendix listing the class lineages for most of the known infectious organisms in humans; an inclusion that added a great deal to the mass of data contained in the book, without adding much light on the subject. The second edition dispenses with the listing and replaces it with a collection of images, inserted into the chapters, intended to highlight the physical traits of the classes of organisms that infect humans. The images serve as visual reminders of the prototypical features that characterize taxonomic classes and their subclasses. Finally, the first edition of the Taxonomic Guide to Infectious Diseases was published in 2012, and this edition provides an opportunity to catch up with new developments in the field. Today, health-care workers, medical researchers, students, and curious laypersons have ample access to a wealth of detailed information concerning the thousands of organisms that are potential human pathogens. None of us lack data, but all of us lack a resource that makes sense of the data at hand. This book organizes, simplifies, and provides meaning to the rapidly growing field of medical microbiology.

Preface to first edition

Order and simplification are the first steps toward the mastery of a subject.
Thomas Mann

This book explains the biological properties of infectious organisms in terms of the properties they inherit from their ancestral classes. For example, the class of organisms known as Apicomplexa contains the organisms responsible for malaria, babesiosis, cryptosporidiosis, cyclosporan gastroenteritis, isosporiasis, sarcocystosis, and toxoplasmosis. When you learn the class properties of the apicomplexans, you'll gain a basic understanding of the biological features that characterize every infectious organism in the class. If you are a student of microbiology, or a health-care professional, you need to be familiar with hundreds of infectious organisms. There are many resources, web-based and paper-based, that describe all of these diseases in great detail, but how can you be expected to integrate volumes of information when you are confronted by a sick patient? It is not humanly possible. A much better strategy is to learn the basic biology of the 40 classes of organisms that account for all of the infectious diseases that occur in humans. After reading this book, you will be able to fit newly acquired facts, pertaining to individual infectious species, onto an intellectual scaffold that provides a simple way of understanding their clinically relevant properties. Biological taxonomy is the scientific field dealing with the classification of living organisms. Nonbiologists, who give any thought to taxonomy, may think that the field is the dullest of the sciences. To the uninitiated, there is little difference between the life of a taxonomist and the life of a stamp collector. Nothing could be further from the truth. Taxonomy has become the grand unifying theory of the biological sciences. Efforts to sequence the genomes of prokaryotic, eukaryotic, and viral species, thereby comparing the genomes of different classes of organisms, have revitalized the field of evolutionary taxonomy (phylogenetics). 
The analysis of normal and abnormal homologous genes in related classes of organisms has inspired new disease treatments targeted against specific molecules and pathways characteristic of species, classes, or organisms. Students who do not understand the principles of modern taxonomy have little chance of perceiving the connections between medicine, genetics, pharmacology, and pathology, to say nothing of clinical microbiology. Here are some of the specific advantages of learning the taxonomy of infectious diseases.


1. As a method to drive down the complexity of medical microbiology

Learning all the infectious diseases of humans is an impossible task. As the number of chronically ill and immune-compromised patients has increased, so has the number of opportunistic pathogens. As global transportation has become commonplace, the number of exotic infections spread worldwide has also increased. A few decades ago, infectious disease experts were expected to learn a few hundred infectious diseases. Today, there are over 1400 organisms that can cause diseases in humans, and the number is climbing rapidly, while the techniques to diagnose and treat these organisms are constantly improving. Textbooks cannot cover all these organisms in sufficient detail to provide health-care workers with the expertise to provide adequate care to their patients. How can any clinician learn all that is needed to provide competent care to patients?

The first step in understanding infectious diseases is to understand the classification of pathogenic organisms. Every known disease-causing organism has been assigned to one of 40 well-defined classes of organisms, and each class fits within a simple ancestral lineage. This means that every known pathogenic organism inherits certain properties from its ancestral classes and shares these properties with the other members of its own class. When you learn the class properties, along with some basic information about the infectious members of the classes, you gain a comprehensive understanding of medical microbiology.

2. Taxonomy as web companion

Getting information off the Internet is like taking a drink from a fire hydrant.
Mitchell Kapor

The web is a great resource. You can find a lot of facts, and if you encounter an unfamiliar word or a term, the web will provide a concise definition, in a jiffy. The web cannot, however, provide an understanding of the related concepts that form the framework of a scientific discipline. The web supplies facts, but books tell you what the facts mean. Before the web, scientific texts needed to contain narrative material as well as the detailed, raw information pertaining to the field. For example, a microbiology text would be expected to contain long descriptions of each infectious organism, the laboratory procedures required to identify the organism, its clinical presentation, and its treatment. As a result, authors either wrote enormous texts that contained much more information than any student could possibly absorb, wrote short works covering a narrow topic in microbiology, or wrote review books that hinted at many different topics. Today, authors have the opportunity to create in-depth and comprehensive works that are quite short, without sacrificing conceptual clarity. The informational details can be deferred to the web! This book concentrates on its primary goal: describing all pathogenic organisms in relation to their taxonomic assignments. All of the ancestral classes and every genus are explained in some detail, with every species listed, but the details are left to the web. You will notice that for a relatively short text, the Taxonomic Guide to Infectious Diseases has a large index. The index was designed as a way to connect terms and concepts that appear in multiple places within the text, and as a key to information on the web. Most of the index terms have excellent discussions in Wikipedia. You will find that the material retrieved from Wikipedia will make much more sense to you, and will have much more relevance to your own professional activities, after you have read this book.

3. As protection against professional obsolescence

There seems to be so much occurring in the biological sciences, it is just impossible to keep on top of things. With each passing day, you feel less in tune with modern science, and you wish you could return to a time when a few fundamental principles grounded your chosen discipline. You will be happy to learn that science is all about finding generalizations among data or among connected systems (i.e., reducing the complexity of data or finding simple explanations for systems of irreducible complexity). Much, if not all, of the perceived complexity of the biological sciences derives from the growing interconnectedness of once-separate disciplines: cell biology, ecology, evolution, climatology, molecular biology, pharmacology, genetics, computer sciences, paleontology, pathology, statistics, and so on. Scientists today must understand many different fields, and must be willing and able to absorb additional disciplines, throughout their careers. As each field of science becomes entangled with others, the seemingly arcane field of biological taxonomy has gained prominence because it occupies the intellectual core of virtually every biological field. Modern biology seems to be data-driven. A deluge of organism-based genomic, proteomic, metabolomic, and other “omic” data is flooding our data banks and drowning our scientists.
This data will have limited scientific value if we cannot find a way to generalize the data collected for each organism to the data collected on other organisms. Taxonomy is the scientific method that reveals how different organisms are related. Without taxonomy, data has no biological meaning. The discoveries that scientists make in the future will come from questions that arise during the construction and refinement of biological taxonomy. In the case of infectious diseases, when we find a trait that informs us that what we thought was a single species is actually two species, it permits us to develop treatments optimized for each species, and to develop new methods to monitor and control the spread of both organisms. When we correctly group organisms within a common class, we can test and develop new drugs that are effective against all of the organisms within the class, particularly if those organisms are characterized by a molecule, pathway, or trait that is specifically targeted by a drug. Terms used in diverse sciences, such as homology, metabolic pathway, target molecule, acquired resistance, developmental stage, cladistics, monophyly, model organism, class property, phylogeny, all derive their meaning and their utility from biological taxonomy. When you grasp the general organization of living organisms, you will understand how different scientific fields relate to each other, thus avoiding professional obsolescence.

How the text is organized

If you are reading Taxonomic Guide to Infectious Diseases to gain a general understanding of taxonomy, as it applies to human diseases, you may choose to read the introductory chapters, followed by reading the front sections of each subsequent chapter. You can defer reading the genera and disease lists until you need to relate general knowledge of a class of organisms to specific information on pathogenic species. If you are a health-care professional, you will find that when you use the index to find the chapter that lists a particular organism or infectious disease, you can quickly grasp the fundamental biological properties of the disease. This deep knowledge will help you when you use other resources to collect detailed pathologic, clinical, and pharmacologic information.

Though about 334 living organisms account for virtually all of the infectious diseases occurring in humans, about 1000 additional organisms account for “case report” incidents, involving one or several people, an isolated geographic region, or otherwise-harmless organisms that cause disease under special circumstances. The book Appendix lists just about every infectious organism (about 1400 species), and the taxonomic hierarchy for each genus. When you encounter the name of an organism, and you just can't remember anything about its taxonomic lineage (i.e., the class of the organism and the ancestral classes), you can find it quickly in the appendix. With this information, you can open the chapter that describes the class properties that apply to the species.

Some clinical concepts are taxonomically promiscuous.
For example, the hepatitis viruses (A through G) are dispersed under several different classes of viruses. Moreover, the A through G list of hepatitis viruses excludes some of the most important viruses that target the liver (e.g., yellow fever virus, dengue virus, Epstein-Barr virus). Topics that cross class boundaries, such as hepatitis viruses, long-branch attraction, virulence factors, vectors, zoonoses, and many others, are included in the Glossary.

Nota Bene

Biological nomenclature has changed a great deal in the past few decades. If you learned medical microbiology in the preceding millennium, you may be surprised to learn that kingdoms have fallen (the once mighty kingdom of the protozoans has been largely abandoned), phyla have moved from one kingdom to another (the microsporidians, formerly protozoans, are now fungi), and numerous species have changed their names (Pneumocystis carinii is now Pneumocystis jirovecii). Most striking is the expansion of the existing ranks.


Formerly, it was sufficient to divide the classification into a neat handful of divisions: Kingdom, Phylum, Class, Order, Family, Genus, and Species. Today, the list of divisions has nearly quadrupled. For example, Phylum has been split into the following divisions: Superphylum, Phylum, Subphylum, Infraphylum, and Microphylum. The other divisions are likewise split. The subdivisions often have a legitimate scientific purpose. Nonetheless, current taxonomic order is simply too detailed for readers to memorize. Taxonomists referring to a class of any rank will sometimes use the word “taxon.” I find this term somewhat lacking because it cannot be modified to refer to a direct parent or child taxon. In this book, all ranks will simply be referred to as “Class.” The direct father class is the superclass, and the direct child class is the subclass. The terms “genus” (plural “genera”) and “species” will preserve the binomial assignment of organism names.

In the case of viruses, the Baltimore Classification is used, which places every virus into one of seven Groups. Since “Group” is applied universally and consistently by virologists who employ the Baltimore Classification, its use is preserved here. Subdivisions of the Baltimore Group viruses are referred to herein as classes. The use of “Class,” “Superclass,” and “Subclass” conforms to nomenclature standards developed by the metadata community (i.e., uses a standard terminology employed by the computational field dealing with the description of data). This simplified terminology avoids the complexities endured by traditional taxonomists.

Regarding the use of upper and lower case terminology, when referring to a formal taxonomic class, positioned within the hierarchy, the uppercase letters and Latin plural forms are used (e.g., Class Eukaryota). When referring to the noun and adjectival forms, lowercase characters and the English pluralized form are used (e.g., a eukaryote, the eukaryotes, or eukaryotic organisms).
Each chapter contains a hierarchical listing of organisms, roughly indicating the ordered rank of the infectious genera covered in each chapter. Classes that do not contain infectious organisms are omitted from the schema. Traditionally, the class rank would be listed in the hierarchy (e.g., Order, Suborder, Infraorder). In this book, the relative descent through the hierarchy is indicated by indentation. The lowest subclass in each taxonomic list is “genus,” which is marked throughout with an asterisk. This visual method of ranking the classification produces an uncluttered, disease-only taxonomy and provides an approximate hierarchical rank for each class and species.
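The indentation convention described above can be read mechanically. The following is a minimal sketch, not the book's own tooling: it assumes two spaces per level of descent and a trailing asterisk on each genus, and the short listing is an abbreviated illustration (intermediate classes omitted), not a schema copied from any chapter. The names `parse_listing`, `lineage`, and `LISTING` are hypothetical.

```python
def parse_listing(text, indent=2):
    """Map each class name to its direct superclass (None for the root).

    Assumes `indent` spaces per level and a trailing '*' marking a genus.
    Returns (parent_of, genera).
    """
    parent_of = {}
    genera = set()
    stack = []  # stack[d] holds the most recent class seen at depth d
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        stripped = line.lstrip(" ")
        depth = (len(line) - len(stripped)) // indent
        name = stripped.strip()
        if name.endswith("*"):  # asterisk marks the lowest subclass, a genus
            name = name[:-1].strip()
            genera.add(name)
        del stack[depth:]  # discard classes at this depth or deeper
        parent_of[name] = stack[-1] if stack else None
        stack.append(name)
    return parent_of, genera


def lineage(name, parent_of):
    """Walk from a class up through its superclasses to the root."""
    chain = [name]
    while parent_of[chain[-1]] is not None:
        chain.append(parent_of[chain[-1]])
    return chain


# Abbreviated, illustrative listing (an assumption for this sketch).
LISTING = """\
Eukaryota
  Apicomplexa
    Plasmodium*
    Toxoplasma*
  Euglenozoa
    Trypanosoma*
"""

parent_of, genera = parse_listing(LISTING)
print(lineage("Plasmodium", parent_of))
```

Because each class records only its direct superclass, any property attached to an ancestral class can be looked up for a genus by walking its lineage, which mirrors the inheritance of class properties that the text relies on.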

Chapter 1

Principles of taxonomy

Section 1.1 The consequence of evolution is diversity

There can be only one
Motto of the immortals in the fictional Highlander epic

Most readers are familiar with the premise of the “Highlander” movies and television shows, which depict a “survival-of-the-fittest” struggle among a population of immortal humans. In the end, there must be only one surviving immortal. Of course, the most casual glance at our surroundings informs us that we live in an “Anti-Highlander” world wherein evolution pushes us to ever-increasing species diversity [Glossary Survival of the fittest].

Introductory courses in evolution stress the notion that evolution leads to improved species, through natural selection [1]. If evolution served the single purpose of improving species, then we would live in a Highlander world, where a small number of the most successful species would prevail, and the others would perish. One of the recurring themes discussed in this book is that the primary consequence of evolution is speciation, the biological process that accounts for the enormous diversity of species that inhabit our planet. When we understand speciation, we can fully grasp the phylogenetic classification of organisms (i.e., the classification of species by their ancestral lineages). When we understand classification, we can simplify the task of understanding the biological properties of the thousands of species that are potential pathogens in humans. Furthermore, we can discover general methods of prevention or treatment that apply to whole classes of related organisms [Glossary Human ancestral lineage, Organism].

How many species live on earth today? A large number of species comes from the prokaryotes (i.e., cells with no nuclei, consisting of Class Bacteria plus Class Archaea), which are estimated to have between 100 thousand and 10 million species. These numbers almost certainly underestimate the true number of prokaryotic species, as they are based on molecular techniques that would exclude valid species that happen to have sequence similarities with other species [2].
As an example of how methodology impacts numbers, samples of soil yield a few hundred different species per gram, based on culturing. If the species are counted on the basis of 16S rRNA gene sequencing, we find a few thousand different species of bacteria in each gram of soil. If we base the count on DNA-DNA reassociation kinetics, the number of different bacterial species, per gram of soil, rises to several million [3].

Taxonomic Guide to Infectious Diseases. https://doi.org/10.1016/B978-0-12-817576-7.00001-8 © 2019 Elsevier Inc. All rights reserved.


The eukaryotes (i.e., organisms whose cells contain a nucleus) are estimated to have about 9 million species [4]. As for the viruses, we really don't have any good estimate for the number of their species, although it is claimed that viruses account for the greatest number of organisms, species, and classes of species on the planet [5–7]. If we confine ourselves to counting just those viruses that infect mammals, we have an estimate of about 320,000 [8]. Adding up the estimates for prokaryotes, eukaryotes, and viruses, we get a rough and conservative 10–20 million living species.

In addition to the individual species of organisms that live on earth, there are numerous combinations of organisms whose lives are entangled with one another. Perhaps the best-known examples are the lichens. Formerly known as the Mycophycophyta, lichens are now recognized to be aggregate organisms wherein each component has its own phylogenetic lineage. Lichens independently emerged from fungi associating with algae and cyanobacteria multiple times throughout history [9].

It is worth noting that species counts, even among the most closely scrutinized classes of organisms, are always subject to revision. In the past, the rational basis for splitting a group of organisms into differently named species required, at the very least, heritable functional or morphologic differences among the members of the group. Gene sequencing has changed the rules for assigning new species. For example, various organisms with subtle differences from Bacteroides fragilis have been elevated to the level of species based on DNA homology studies. These include Bacteroides distasonis, Bacteroides ovatus, Bacteroides thetaiotaomicron, and Bacteroides vulgatus [10]. Accounting for underestimation, it should come as no surprise that one study has suggested that there are at least a trillion species of organisms on earth [11].
Of course, the number of living species is a tiny fraction of all the species that have lived and died through the course of earth's history. It is estimated that 5–50 billion species have lived on earth, and more than 99% of them have met with extinction, leaving a relatively scant 10–100 million living species [12]. If the purpose of every species were to ensure its own survival, then they are all doing a very bad job of it, insofar as nearly all species become extinct. In Section  2.2, “The Biological Process of Speciation,” we shall see that the determinant of biological success, for any species, is to produce new species. It is the production of descendant classes of species that confers inherited cellular properties that we observe in all living organisms, and that we now use to construct classifications of organisms. Although there are millions of species on this planet, we should be grateful that only a tiny fraction is infectious to humans. Nobody knows the exact number of living species, but for the sake of discussion, let us accept that there are 50 million species of organisms on earth (a gross underestimate by some accounts). There have been about 1400 pathogenic organisms reported

in the medical literature. This means that if you should stumble randomly upon a member of one of the species of life on earth, the probability that it is an infectious pathogen is about 0.000028, that is, 1400 divided by 50 million [Glossary Burden of infectious diseases, Incidence, Infectious disease].
With all the different species of organisms on earth today, numbering perhaps in the hundreds of millions, how can we hope to understand the biosphere? It's all done with classification. Infectious agents fall into a scant 40 biological classes (32 classes of living organisms plus 7 classes of viruses plus 1 current class of prions). When we have learned the basic biology of the major taxonomic divisions that contain the infectious organisms, we will understand the fundamental biological features that characterize every clinically relevant organism.
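The arithmetic behind these figures is easy to check. The short Python sketch below simply recomputes the probability from the two round numbers used in the text (about 1400 reported pathogens and an assumed 50 million living species) and tallies the classes of infectious agents. The variable names are illustrative only, and the inputs are rough estimates rather than measurements.

```python
# Round figures taken from the text; both are rough estimates.
reported_pathogens = 1400
assumed_living_species = 50_000_000

# Chance that a randomly chosen species is a reported human pathogen.
probability = reported_pathogens / assumed_living_species
print(f"{probability:.6f}")

# Classes of infectious agents: living organisms + viruses + prions.
infectious_classes = 32 + 7 + 1
print(infectious_classes)
```

If a larger species count is assumed (for example, hundreds of millions), the same division yields an even smaller probability.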

Section 1.2 What is a classification?

Deus creavit, Linnaeus disposuit, Latin for “God created, Linnaeus organized.”
Carolus Linnaeus

The human brain is constantly processing visual and other sensory information collected from the environment. When we walk down the street, we see images of concrete, asphalt, grass, other persons, birds, and so on. Every step we take conveys another world of sensory input. How can we process it all? The mathematician and philosopher Karl Pearson (1857–1936) has likened the human mind to a “sorting machine” [13]. We take a stream of sensory information and sort it into objects, and then we collectively put the individual objects into general classes. The green stuff on the ground is classified as “grass,” and the grass is subclassified under some larger groups such as “plants.” Flat stretches of asphalt and concrete may be classified under “road” and the road might be subclassified under “man-made constructions.” If we did not have a culturally determined classification of objects in the world, we would have no languages, no ability to communicate ideas, no way to remember what we see, and no way to draw general inferences about anything at all. Simply put, without classification, we would not be human. Every culture has some particular way to impose a uniform perception of the environment. In English-speaking cultures, the term “hat” denotes a universally recognized object. Hats may be composed of many different types of materials, and they may vary greatly in size, weight, and shape. Nonetheless, we can almost always identify a hat when we see one, and we have no trouble distinguishing a hat from all other types of objects. An object is not classified as a hat simply because it shares a few structural similarities with other hats. A hat is classified as a hat because it has a relationship with every other hat, as an item of clothing that fits over the head. Taxonomists search for relationships, not similarities, among different species and classes of organisms [14]. But isn't a similarity a type of

relationship? Actually, no. To better understand the difference, imagine the following scenario. You look up at the clouds, and you begin to see the shape of a lion. The cloud has a tail, like a lion's tail, and a fluffy head, like a lion's mane. With a little imagination, the mouth of the lion seems to roar down from the sky. You have succeeded in finding similarities between the cloud and a lion. When you look at a cloud and you imagine a tea kettle producing a head of steam, you may recognize that the physical forces that create a cloud from the ocean's water vapor and the physical forces that produce steam from the water in a heated kettle are the same. At this moment, you have found a relationship. The act of searching for and finding relationships lies at the heart of science; it's how we make sense of reality. Finding similarities is an aesthetic joy, but it is not science.

General principles of classification

Oddly enough, despite the importance of classification in our lives, few humans have a firm understanding of the process of classification; it's all done for us on a subconscious level. Consequently, when we need to build and explain a formal classification, it can be difficult to know where to begin. As an example, how might we go about creating a classification of toys? Would we arrange the toys by color (red toys, blue toys, etc.), by size (big toys and medium-sized toys), or by composition (metal toys, plastic toys, cotton toys)? How could we be certain that when other people create a classification for toys, their classification will be equivalent to ours?
For modern biologists, the key to the classification of living organisms is evolutionary descent (i.e., phylogeny). The hierarchy of classes corresponds to the succession of organisms that evolved from the earliest living organism to the current set of extant species.
Historically, pre-Darwinian biologists, who knew nothing about evolution, somehow produced a classification that looked much like the classification we use today. Before the discovery of the Burgess shale (discovered in 1909 by Charles Walcott), taxonomists could not conduct systematic reviews of organisms in rock strata; hence, they could not determine the epoch in which classes of organisms first came into existence, nor could they determine which fossil species preceded other species. Until late in the 20th century, taxonomists could not sequence nucleic acids; hence, they could not follow the divergence of shared genes in different organisms. Yet they managed to produce a fairly accurate and modern taxonomy. A 19th-century taxonomist would have no trouble in adjusting to the classification used in this book [Glossary Taxonomy, Clade, Cladistics, Class, Monophyletic class, Synapomorphy]. How did the early taxonomists arrive so close to our modern taxonomy, without the benefit of the principles of evolution, geobiology, modern paleontological discoveries, or molecular biology? For example, how was it possible for Aristotle to know, about 2000 years ago, that a dolphin is a mammal, not a fish? Aristotle studied the anatomy and the developmental biology of many different types of animals. One large group of animals was distinguished by a

gestational period in which a developing embryo is nourished by a placenta, and the offspring are delivered into the world as formed, but small versions of the adult animals (i.e., not as eggs or larvae), and the newborn animals feed from milk secreted from nipples, overlying specialized glandular organs (mammae). Aristotle knew that these were features that specifically characterized one group of animals and distinguished this group from all the other groups of animals. He also knew that dolphins had all these features; fish did not. He correctly reasoned that dolphins were a type of mammal, not a type of fish. Aristotle was ridiculed by his contemporaries for whom it was obvious that dolphins were a type of fish. Unlike Aristotle, they based their classification on similarities, not on relationships. They saw that dolphins looked like fish and dolphins swam in the ocean like fish, and this was all the proof they needed. For about 2000 years following the death of Aristotle, biologists persisted in their belief that dolphins were a type of fish. For the past several hundred years, biologists have acknowledged that Aristotle was correct after all; dolphins are mammals. Aristotle, and legions of taxonomists who followed him, understood that taxonomy is all about finding the key properties that characterize entire classes and subclasses of organisms. Selecting the defining properties from a large number of morphologic, developmental and physiologic features in many different species requires attention to detail, and occasional moments of intellectual brilliance. To build a classification, the taxonomist must perform the following: (1) define classes (i.e., find the properties that define a class and extend to the subclasses of the class); (2) assign species to classes; (3) position classes within the hierarchy; and (4) test and validate all the above. These tasks require enormous patience and humility. 
A classification is a hierarchy of objects that conforms to the following principles:
–1. The classes (groups with members) of the hierarchy have a set of properties or rules that extend to every member of the class and to all of the subclasses of the class, to the exclusion of all other classes. A subclass is itself a type of class wherein the members have the defining class properties of the parent class plus some additional property(ies) specific for the subclass [Glossary Parent class].
–2. In a hierarchical classification, each subclass may have no more than one parent class. The root (top) class has no parent class. The biological classification of living organisms is a hierarchical classification.
–3. In the classification of living organisms, the species is the collection of all the organisms of the same type (e.g., every squirrel belongs to a species of “squirrel”).
–4. Classes and species are intransitive. For example, a horse never becomes a sheep, and Class Bikonta never transforms into Class Unikonta.
–5. The members of classes may be highly similar to each other, but their similarities result from their membership in the same class (i.e., conforming to class properties), and not the other way around (i.e., similarity alone cannot define class inclusion).
When we look at a schematic that represents a classification, we are typically shown a tree of nodes, with each class occupying a node, and the branches to lower nodes represent the connections of a class to its subclasses. A taxonomy is a classification that has all of its members assigned to their respective classes. In the case of the classification of living organisms, the classes are assigned according to their ancestry (i.e., by their phylogenetic relationships).
It is essential to distinguish a classification system from an identification system. An identification system matches an individual organism with its assigned species name. Identification is based on finding several features that, taken together, can help determine the name of an organism. For example, if you have a list of identifiers (large, hairy, strong, African, jungle-dwelling, knuckle-walking), you might correctly identify the organism as a gorilla. These identifiers are different from the phylogenetic features that were used to classify gorillas within the hierarchy of organisms (Animalia: Chordata: Mammalia: Primates: Hominidae: Homininae: Gorillini: Gorilla). Specifically, you can identify an animal as a gorilla without knowing that a gorilla is a type of mammal. You can classify a gorilla as a member of Class Gorillini without knowing that a gorilla happens to be large. One of the most common mistakes in biology is to confuse an identification system with a classification system. The former simply provides a handy way to associate an object with a name; the latter is a system of relationships among organisms [Glossary Classification versus ontology].
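The difference between classifying and identifying can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the names PARENT, IDENTIFIERS, lineage, is_a, and identify are hypothetical, the hierarchy contains nothing but the gorilla lineage quoted in the text, and the identifier list is the one given in the gorilla example. Storing one parent per class in a dictionary enforces, by construction, the rule that a class has no more than one parent.

```python
# Classification: each class maps to its single parent class.
# The root (Animalia, in this tiny excerpt) has no entry, hence no parent.
PARENT = {
    "Chordata": "Animalia",
    "Mammalia": "Chordata",
    "Primates": "Mammalia",
    "Hominidae": "Primates",
    "Homininae": "Hominidae",
    "Gorillini": "Homininae",
    "Gorilla": "Gorillini",
}

# Identification: a separate lookup from observable features to a name.
# It carries no information about ancestry.
IDENTIFIERS = {
    "Gorilla": {"large", "hairy", "strong", "African",
                "jungle-dwelling", "knuckle-walking"},
}


def lineage(name):
    """Return the class followed by its ancestors, ending at the root."""
    chain = [name]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def is_a(name, ancestor):
    """True if 'ancestor' is the class itself or one of its ancestors."""
    return ancestor in lineage(name)


def identify(observed):
    """Return every name whose identifying features are all observed."""
    return [n for n, feats in IDENTIFIERS.items() if feats <= observed]


print(is_a("Gorilla", "Mammalia"))   # classification question
print(identify({"large", "hairy", "strong", "African",
                "jungle-dwelling", "knuckle-walking"}))  # identification
```

Note that is_a never consults the feature list, and identify never consults the hierarchy; the two systems answer different questions.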

Section 1.3 The tree of life

Individuals do not belong in the same taxon because they are similar, but they are similar because they belong to the same taxon.
George Gaylord Simpson (1902–84) [15]

Taxonomy is the science of classifying the elements of a knowledge domain. In the case of terrestrial life forms, taxonomy involves assigning a name and a class to every species of life. Biologists presume that there are at least 50 million living species on earth, so the task of building a biological taxonomy is likely to continue for as long as science persists. Not all scientists are suited, intellectually or emotionally, to be taxonomists. Nonetheless, every thoughtful scientist understands that taxonomy is essential to the advancement of science and to the preservation of life on earth. Most biologists would agree to the following:
– Statement 1. Every organism on earth belongs to a class of organisms with a set of shared biological features that was inherited through an ancestral lineage.
– Statement 2. All organisms on earth have a genome consisting of DNA or RNA. DNA is a highly stable nucleic acid that is transcribed into RNA, and RNA is translated into proteins.
– Statement 3. Every organism on earth belongs to one of the four classes (to be described in later chapters):
Class Archaea
Class Bacteria
Class Eukaryota
Class Viridae

Questions of precedence (i.e., “Which class arose first?”) and parentage (i.e., “Which class served as the progenote for which other class?”) are matters for lively debate [16, 17]. It is generally accepted that the prokaryotes (i.e., Class Archaea plus Class Bacteria) preceded the emergence of Class Eukaryota, insofar as the root eukaryote seems to have been constructed from biological components extracted from prokaryotes and possibly viruses [18–21]. In addition, the rightful inclusion of Class Viridae (viruses) is disputed, insofar as many biologists consider viruses to be little more than nonliving packets of infective genetic material. In Chapter 7, “Viruses,” we will examine the controversial status of viruses as living organisms.
– Statement 4. Every eukaryotic organism that lives today is a descendant of a single eukaryotic ancestor [18].
– Statement 5. Every organism belongs to a species that has a set of features that characterizes every member of the species and that distinguishes the members of the species from organisms belonging to any other species.
Of course, it is difficult to garner unanimous agreement by scientists, and every fundamental principle of taxonomy has been challenged at one time or another. For those who would include prions among the living organisms, statements 1–3 are debatable (as will be discussed in Section 8.2, “Prion Diseases: Fulfilling Koch's Postulates, but Without an Organism”).
Because each class of organisms has exactly one parent class, we can use the classification of living organisms to construct a simple, unbranched ancestral lineage, for each and every included class or species. For example, here is the ancestral lineage for mosquitoes (scientific name, Culicidae), a subclass of Class Diptera (flies):
Culicidae (mosquitoes)
Culicoidea
Culicomorpha
Nematocera
Diptera (class of flies)
Holometabola
Neoptera
Pterygota
Dicondylia
Insecta
Hexapoda
Pancrustacea
Mandibulata
Arthropoda
Panarthropoda
Ecdysozoa
Protostomia
Bilateria
Eumetazoa
Metazoa
Opisthokonta
Eukaryota
cellular organisms
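The claim that a single-parent hierarchy yields exactly one unbranched lineage per class can be sketched in a few lines of Python. This is a minimal illustration, not a tool used by taxonomists: the function names build_parent_map and ancestral_lineage are hypothetical, and the only data are the class names in the mosquito lineage listed above. The builder rejects any attempt to give a class a second parent, which is the point at which a classification would turn into something else.

```python
# Class names copied from the mosquito lineage in the text, child first.
MOSQUITO_LINEAGE = [
    "Culicidae", "Culicoidea", "Culicomorpha", "Nematocera", "Diptera",
    "Holometabola", "Neoptera", "Pterygota", "Dicondylia", "Insecta",
    "Hexapoda", "Pancrustacea", "Mandibulata", "Arthropoda",
    "Panarthropoda", "Ecdysozoa", "Protostomia", "Bilateria",
    "Eumetazoa", "Metazoa", "Opisthokonta", "Eukaryota",
    "cellular organisms",
]


def build_parent_map(pairs):
    """Build a child -> parent map, refusing a second parent for any class."""
    parent = {}
    for child, par in pairs:
        if child in parent and parent[child] != par:
            raise ValueError(
                f"{child} already has parent {parent[child]}; cannot add {par}"
            )
        parent[child] = par
    return parent


def ancestral_lineage(name, parent):
    """Follow parent links from a class up to the root."""
    chain = [name]
    seen = {name}
    while chain[-1] in parent:
        nxt = parent[chain[-1]]
        if nxt in seen:  # guard against a malformed, circular hierarchy
            raise ValueError(f"cycle detected at {nxt}")
        seen.add(nxt)
        chain.append(nxt)
    return chain


# (child, parent) pairs derived from consecutive entries in the list.
pairs = list(zip(MOSQUITO_LINEAGE, MOSQUITO_LINEAGE[1:]))
parent = build_parent_map(pairs)
print(ancestral_lineage("Diptera", parent))
```

Starting from any class in the map, the walk upward never branches, because each lookup returns a single parent.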

Statement 5 introduces the concept of “species,” which has a long and disputatious history. It has been argued that nature produces individuals, not species; the concept of species being a mere figment of the human imagination, created for the convenience of taxonomists who need to group similar organisms. There are those who would use computational methods to group organisms into various species. If you start with a set of feature data on a collection of organisms, you can write a computer program that will cluster the organisms into species, according to their similarities. In theory, one computer program, executing over a large dataset containing measurements for every earthly organism, could create a complete biological classification. The status of a species is thereby reduced from a fundamental biological entity to a mathematical construction. This view is anathema to classic taxonomists, who have long held that a species is a natural unit of biological life, and that the nature of a species is revealed through the intellectual process of building a consistent taxonomy [22].
There are a host of problems consequent to computational methods for classification. First, there are many different mathematical algorithms that cluster objects by similarity. Depending on the chosen algorithm, the assignment of organisms to one species or another would change. Second, mathematical algorithms do not cope well with species convergence. Convergence occurs when two species independently acquire identical or similar traits through adaptation, not through inheritance from a shared ancestor. Examples are: the wing of a bat and the wing of a bird; the opposable thumb of opossums and primates; and the beak of a platypus and the beak of a bird. Unrelated species frequently converge upon similar morphologic solutions to common environmental conditions or shared physiological imperatives.
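A toy example shows how convergence can mislead a similarity measure. The Python sketch below is an assumption-laden illustration: the five yes/no features and their values are invented for the purpose, the Hamming distance stands in for whatever similarity measure a clustering program might use, and the function names are hypothetical. It revisits the dolphin discussed earlier in the chapter.

```python
# Hypothetical yes/no features (1 = present, 0 = absent), chosen by hand.
FEATURES = ["lives_in_water", "has_fins", "streamlined_body",
            "has_legs", "nurses_young"]

ORGANISMS = {
    "dolphin": (1, 1, 1, 0, 1),
    "shark":   (1, 1, 1, 0, 0),
    "horse":   (0, 0, 0, 1, 1),
}


def hamming(a, b):
    """Count the features on which two organisms differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest(name):
    """Return the other organism with the fewest differing features."""
    others = (o for o in ORGANISMS if o != name)
    return min(others, key=lambda o: hamming(ORGANISMS[name], ORGANISMS[o]))


def share_defining_property(a, b, feature="nurses_young"):
    """True if both organisms have the class-defining feature."""
    i = FEATURES.index(feature)
    return ORGANISMS[a][i] == 1 and ORGANISMS[b][i] == 1


print(nearest("dolphin"))                           # grouping by overall similarity
print(share_defining_property("dolphin", "horse"))  # grouping by a defining property
```

Under these made-up features, overall similarity pairs the dolphin with the shark, whereas the single class-defining property pairs it with the horse. A different feature list or distance measure could change the first answer, which is the other weakness of similarity-based grouping noted above.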
Algorithms that cluster organisms based on similarity may group divergent organisms under one class. It is often assumed that computational classification, based on morphologic feature similarities, will improve when we acquire whole-genome sequence data for many different species. Imagine an experiment wherein you take DNA samples from every organism you encounter: bacterial colonies cultured from a river, unicellular nonbacterial organisms found in a pond, small multicellular organisms found in soil, crawling creatures dwelling under rocks, and so on. You own a powerful sequencing machine, which produces the full-length sequence

for each sampled organism, and you have a powerful computer that sorts and clusters every sequence. At the end, the computer prints out a huge graph, wherein all the samples are ordered. Groups with the greatest sequence ­similarities are clustered together. You may think you've created a useful classification, but you haven't really, because you don't know anything about the organisms that are clustered together. You don't know whether each cluster represents a species, or a class (a collection of related species), or whether a cluster may be contaminated by organisms that share some of the same gene sequences, but are phylogenetically unrelated (i.e., the sequence similarities result from chance or from convergence, but not by descent from a common ancestor). The sequences do not tell you very much about the biological properties of specific organisms, and you cannot infer which biological properties characterize the classes of clustered organisms. You have no certain knowledge whether the members of any given cluster of organisms can be characterized by any particular gene sequence (i.e., you do not know the characterizing gene sequences for classes of organisms). You do not know the genus or species names of the organisms included in the clusters, because you began your experiment without a presumptive taxonomy. Basically, you simply know what you knew before you started; that individual organisms have unique gene sequences that can be grouped by sequence similarity. A strictly molecular approach to classification has its limitations, but we shall see that thoughtful biologists can use molecular data to draw profound conclusions about the classification of living organisms [23, 24] [Glossary Convergence, LUCA]. Taxonomists are constantly engaged in an intellectual battle over the principles of biological classification. They all know that the stakes are high. 
When unrelated organisms are mixed together in the same class, and when related organisms are separated into unrelated classes, the value of the classification is lost, perhaps forever. To understand why this is true, we need to understand that a classification is a hypothesis-generating machine. Species within a class share genes, metabolic pathways, and structural anatomy. Shared properties allow scientists to form general hypotheses that may apply to all the members of a class. Without an accurate classification of living organisms, it would be impossible to make significant progress in the diagnosis, prevention, or treatment of classes of infectious organisms [Glossary Blended class, Pathway, Pathway-driven disease]. James Joyce is credited with saying that “there are two sides to every argument; unfortunately, I can only occupy one of them.” Students of the life sciences simply cannot hope to understand terrestrial organisms without accepting, at least tentatively, statements 1–5. After they have mastered the principles and practice of modern taxonomy, as described herein, they can reassess the value of contrarian arguments.

Glossary

Blended class  Class blending refers to a mistake in proper classification, in which members are assigned to the wrong classes, or in which a class is created whose members are unrelated. These kinds of mistakes often arise when taxonomists are unaware of the differences among the members assigned to a class, or when taxonomists create an untenable class. For example, if you were to make a Mouse class, and you included Mickey Mouse as one of the instances of the class, you would be blending a cartoon with an animal, and this would be a mistake. If you were to create a Flying Animal class, you would be blending birds and houseflies and flying squirrels, none of which are closely related.
After reading the preceding paragraph, you might be thinking that class blending is the kind of careless mistake that you will be smart enough to avoid. Not so. Class blending is a pervasive and costly sin that is committed by virtually every biological scientist at some point in his or her career. One error can easily set your research back a decade, if you're not mentally focused on this subtle issue.
When you read old texts, written before we knew anything about microorganisms, it's clear that the causes of historical plagues are largely unknown. We recognize today that one of the plague bacteria is Yersinia pestis. But, in fact, we do not know with certainty the specific causes of any of the major plagues in ancient Greece and medieval Europe. Typhus may have been involved. Measles and smallpox are the likely causes of past plagues. Malarial outbreaks should not be overlooked. Now suppose you are a statistician who is magically transported to Southern Italy, in 1640, where people are dying in great numbers of the plague, and you find yourself working as a doctor, trying to cope with the situation. You're not a microbiologist, but you know something about designing clinical trials, and one of the local cognoscenti has just given you an herb that he insists is a cure for the plague.
“Take this drug today, and your fever will be gone by the next morning,” he tells you. As it happens, the herb is an extract of bark from the Cinchona tree, recently imported from Peru. It is a sure-fire cure for malaria, a disease endemic to the region. But you don't know any of this. Before you start treating your patients, you'll want to conduct a clinical trial. At this time, physicians knew nothing about the pathogenesis of malaria. Current thinking was that it was a disease caused by breathing insalubrious swamp vapors; hence the word roots “mal” meaning bad, and “aria” meaning air. You have just been handed a substance derived from the Cinchona tree, but you do not trust the herbalist. Insisting on a rational approach to the practice of medicine, you design a clinical trial, using 100 patients, all of whom have the same symptoms (delirium and fever) and all of whom carry the diagnosis of plague. You administer the cinchona powder, whose active ingredient is quinine, to all the patients. A few improve, but most don't. You call the trial a washout. You decide not to administer quinine to your patients. What happened? We know that quinine arrived as a miracle cure for malaria. It should have been effective in a population of 100 patients. The problem with this hypothetical clinical trial is that the patients under study were assembled based on their mutual symptoms: fever and delirium. These same symptoms could have been accounted for by any of hundreds of other diseases that were prevalent at the time. The criteria employed to render a diagnosis of plague were imprecise, and the trial population was diluted with nonmalarial patients who were guaranteed to be nonresponders. Consequently, the trial

failed, and you missed a golden opportunity to treat your malaria patients with quinine, a new, highly effective, miracle drug.
It isn't hard to imagine present-day dilemmas not unlike our fictitious quinine trial. If you are testing the effectiveness of an antibiotic on a class of people with bacterial pneumonia, the accuracy of your results will be jeopardized if your study population includes subjects with viral pneumonia, or smoking-related lung damage. The consequences of class blending are forever with us. It is impossible to conduct rational trials for appropriate targeted therapies when the trial groups are composed of blended classes of individuals [25]. The medical literature is rife with research of dubious quality, based on poorly designed classifications and blended classes.
One caveat: efforts to reduce class blending can be counterproductive if undertaken with excess zeal. For example, in an effort to reduce class blending, a researcher may choose groups of subjects who are uniform with respect to every known observable property. Suppose, for instance, that you want to actually compare apples with oranges. To avoid class blending, you might want to make very sure that your apples do not include any kumquats or persimmons. You should be certain that your oranges do not include any limes or grapefruits. Imagine that you go even further, choosing only apples and oranges of one variety (e.g., McIntosh apples and Navel oranges), size (e.g., 10 cm), and origin (e.g., California). How will your comparisons apply to the varieties of apples and oranges that you have excluded from your study? You may actually reach conclusions that are invalid and irreproducible for more generalized populations within each class. In this case, you have succeeded in eliminating class blending, at the expense of losing representative populations of the classes.
Burden of infectious diseases  Each year, 50–60 million people die worldwide. How many of these deaths can be attributed to infectious diseases? According to the World Health Organization, in 1996, “Infectious diseases remain the world's leading cause of death, accounting for at least 17 million (about 33%) of the 52 million people who die each year” [26]. Of course, only a small fraction of infections result in death, and it is impossible to determine the total incidence of infectious diseases that occur each year, for all organisms combined. Still, it is useful to consider some of the damage inflicted by just a few of the organisms that infect humans. Malaria infects 500 million people. About 2 million people die each year from malaria [26]. About 2 billion people have been infected with Mycobacterium tuberculosis. Tuberculosis kills about 3 million people each year [26]. Each year, about 4 million children die from lung infections, and about 3 million children die from infectious diarrheal diseases [26]. Rotaviruses (Group III viruses) are one of many causes of diarrheal disease. In 2004, rotaviruses were responsible for about half a million deaths, mostly in developing countries [27]. Worldwide, about 350 million people are chronic carriers of Hepatitis B, and about 100 million people are chronic carriers of Hepatitis C. In aggregate, about one quarter (25 million) of these chronic carriers will eventually die from ensuing liver diseases [26]. Infectious organisms can kill individuals through mechanisms other than the direct pathologic effects of growth, invasion, and inflammation. Infectious organisms have been implicated in vascular disease. Organisms implicated in coronary artery disease and stroke include Chlamydia pneumoniae and Cytomegalovirus [28]. Infections caused by a wide variety of infectious organisms can result in cancer. About 7.2 million deaths occur each year from cancer, worldwide. About one-fifth of these cancer deaths are caused by infectious organisms [29]. Hepatitis B alone accounts

for about 700,000 cancer deaths each year, from hepatocellular carcinoma [30]. Organisms contributing to cancer deaths include bacteria (Helicobacter pylori), animal parasites (schistosomes and liver flukes), and viruses (Herpesviruses, Papillomaviruses, Hepadnaviruses, Flaviviruses, Retroviruses, Polyomaviruses). Though fungal and plant organisms do not seem to cause cancer through human infection, they produce a multitude of biologically active secondary metabolites (i.e., synthesized molecules that are not directly involved in the growth of the organism), some of which are potent carcinogens. For example, aflatoxin produced by Aspergillus flavus is possibly the most powerful carcinogen ever studied [31]. In aggregate, infectious diseases are the number one killer of humans worldwide, and contribute to vascular disease and cancer, the two leading causes of death in the most developed countries. These observations clearly indicate that every health-care professional, not just infectious disease specialists, must understand the biology of infectious organisms.
Clade  A clade consists of a monophyletic class and all of its descendant monophyletic classes. A clade should be distinguished from a lineage, the latter being the list of a class's ascendant superclasses. Because a class can have more than one child class, a pictogram of a clade will often look like a branching tree. In a classification, where each class is restricted to one parent class, ascending lineages are represented as a nonbranching line of ancestors, leading to the root (i.e., top class) of the classification.
Cladistics  The technique of producing a hierarchy of clades, wherein each clade is a monophyletic class. In this book, we define a classification so that it conforms to the rules of cladistics. Hence, a classification is cladistic and clades are equivalent to classes and subclasses. The terms “cladistics” and “clade,” enjoyed by taxonomists, are omitted from the text. Instead, we employ the “class” terminology (e.g., class, subclass, child class, parent class) that is preferred by bioinformaticians and computer scientists who rely upon object-oriented programming languages [32].
Class  A defined group within a taxonomy. The most familiar classes in biological taxonomy are the classes that form the ranked hierarchy of living organisms: Kingdom, Phylum, Class, Order, Family, Genus, and Species. It is somewhat confusing that one of the classes of organisms is “Class,” and another of the classes is named “Order.” This means that when the terms “Class” or “Order” appear in a sentence, the reader must somehow distinguish between the general term and the specific term. In this book, classes are unranked. The word “class,” lowercase, is used as a general term. The word “Class,” capitalized and followed by a capitalized group name (e.g., Class Animalia), represents a group within the taxonomy. In the biological hierarchy, each class has exactly one direct ancestor class (also called parent class or superclass), though an ancestor class can have more than one direct descendant class (also called child class, or subclass).
Classification versus ontology  A classification is a system in which every object in a knowledge domain is assigned to a class within a hierarchy of classes. The properties of superclasses are inherited by the subclasses. Every class has one immediate superclass (i.e., parent class), although a parent class may have more than one immediate subclass (i.e., child class). Objects do not change their class assignment in a classification, unless there was a mistake in the assignment. For example, a rabbit is always a rabbit, and does not change into a tiger. A classification should be distinguished from an ontology. In an ontology, a class may have more than one parent class and an object may be a member of more than one class.
A classification can be considered a restrictive and simplified form of ontology wherein each class is limited to a single parent class and each object has membership in one and only one class [33].

Convergence  When two species independently acquire an identical or similar trait through adaptation; not through inheritance from a shared ancestor. Examples are: the wing of a bat and the wing of a bird; the opposable thumb of opossums and primates; and the beak of a platypus and the beak of a bird.

Human ancestral lineage  Here is the ancestral lineage of human beings, beginning with the earliest indication of life on earth. Wherever possible, the major classes of organisms are annotated with a very approximate chronology. It is useful to have some notion of the time interval between classes of ancestral organisms, even if it is somewhat inaccurate.

Earliest indication of life (4100 mya)
Prokaryota (3900 mya)
Eukaryota (2100–1000 mya)
Podiata
Unikonta (Amorphea)
Obazoa
Opisthokonta
Holozoa (1300 mya)
Apoikozoa (950 mya)
Metazoa (760 mya)
Eumetazoa (Diploblasts, Histozoa, Epitheliozoa) (635 mya)
ParaHoxozoa
Planulozoa
Bilateria (Triploblasts) (555 mya)
Nephrozoa (555 mya)
Deuterostomia (Enterocoelomates) (540 mya)
Chordata (530 mya)
Craniata (480 mya)
Vertebrata (500 mya)
Gnathostomata (419 mya)
Euteleostomi
Sarcopterygii (419 mya)
Dipnotetrapodomorpha
Tetrapodomorpha (390 mya)
Tetrapoda (367 mya)
Amniota (340 mya)
Synapsida (308 mya)
Mammalia (220 mya)
Theriiformis
Theria (160 mya)
Eutheria (160–125 mya)
Boreoeutheria (124–101 mya)
Euarchontoglires (100 mya)
Euarchonta (99–80 mya)
Primatomorpha (79.6 mya)
Primates (75 mya)
Haplorrhini (63 mya)
Simiiformes (40 mya)
Catarrhini (30 mya)
Hominoidea (28 mya)
Hominidae (15 mya)
Homininae (8 mya)
Hominini (5.8 mya)
Hominina (4 mya)
Homo (2.5 mya)
Homo sapiens (0.3 mya)
Homo sapiens (modern) (0.07 mya)
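Because every class in a classification has exactly one parent class, a lineage such as the one above can be held in a simple child-to-parent mapping and recovered by walking upward from any class. The following Python sketch illustrates the idea for the last few classes of the human lineage. The names PARENT and lineage are illustrative choices made here, not anything defined in the text.

```python
# Minimal sketch: a classification stored as a child -> parent mapping.
# PARENT and lineage() are hypothetical names chosen for illustration.
# Only the last few classes of the human lineage listed above are included.
PARENT = {
    "Homo sapiens": "Homo",
    "Homo": "Hominina",
    "Hominina": "Hominini",
    "Hominini": "Homininae",
    "Homininae": "Hominidae",
    "Hominidae": "Hominoidea",
}

def lineage(name, parent=PARENT):
    """Return the chain of classes from `name` up to the topmost recorded ancestor."""
    chain = [name]
    while chain[-1] in parent:      # stop at a class with no recorded parent
        chain.append(parent[chain[-1]])
    return chain

print(" > ".join(reversed(lineage("Homo sapiens"))))
```

A dictionary key can hold only one value, so the single-parent rule of a classification is built into the data structure. Representing an ontology, in which a class may have several parents, would require each key to map to a collection of parents instead.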

Incidence  The number of new cases of a disease occurring in a time interval (e.g., 1 year), expressed as a fraction of a predetermined population size (e.g., 100,000 people). For example, if there were 11 new cases of a rare disease occurring in a period of 1 year, in a population of 50,000 people, then the incidence would be 22 cases per 100,000 persons per year (11/50,000 × 100,000 = 22).

Infectious disease  A disease caused by an organism that enters the human body. The term “infectious disease” is sometimes used in a way that excludes diseases caused by parasites. In this book, the parasitic diseases of humans are included among the infectious diseases. The term “infectious disease” is often used interchangeably with “infection,” but the two terms are quite different. It is quite possible to be infected with an organism, even a pathogenic organism, without developing a disease. In point of fact, the typical human carries many, perhaps dozens, of endogenous pathogenic organisms that lie dormant under most circumstances. Examples are: Pneumocystis jiroveci (the fungus that causes pneumonia in immunodeficient individuals), Varicella (the virus that may, when the opportunity arises, erupt as shingles), Aspergillus species (which not uncommonly colonize the respiratory tract, producing pneumonia in a minority of infected individuals), and Candida (a ubiquitous fungus that lives on skin and in mucosal linings and produces diseases of varying severity in a minority of infected individuals).

LUCA  Abbreviation for Last Universal Common Ancestor, also known as the cenancestor. Assuming that all organisms on earth descend from a common ancestor, LUCA is the most recent population of organisms from which all organisms now living on Earth have a common descent. LUCA is thought to have lived 3.5–3.8 billion years ago [34].

Monophyletic class  A class of organisms that includes a parent organism and all its descendants, while excluding any organisms that did not descend from the parent.
If a class omits any of the descendants of the parent, then the class is paraphyletic. If a class includes organisms that did not descend from the parent, then the class is polyphyletic. A class can be both paraphyletic and polyphyletic, if it excludes organisms that were descendants of the parent and if it includes organisms that did not descend from the parent. The goal of cladistics is to create a hierarchical classification that consists exclusively of monophyletic classes (i.e., no paraphyly, no polyphyly).

Organism  A living entity that is composed of identifiable parts that act in concert to perform some measurable action(s). This definition permits us to think of organisms as biological systems confined to a defined structure. This definition runs into some trouble when we observe organisms that are composed of other organisms. For example, all animals are composed of cells. Both the animal and its component cells satisfy the definition of an organism. By convention, we allow the composite organism (e.g., a human) to subsume the component organisms (e.g., the liver cell). It's worth remembering that we can preserve the life of individual cells of an organism long after the composite organism has died (e.g., tissue culture or freezing). It is currently well within the realm of our imaginations that we can reconstruct a composite organism (more precisely, its genetic equivalent) from stored individual cells that have been induced to become totipotent stem cells.

Parent class  The immediate ancestor or the next-higher class (i.e., the direct superclass) of a class. For example, in the classification of living organisms, Class Vertebrata is the parent class of Class Gnathostomata. Class Gnathostomata is the parent class of Class Teleostomi, and so on.

Pathway  According to traditional thinking, a pathway is a sequence of biochemical reactions, involving a specific set of enzymes and substrates, that produces a chemical product or that fulfills a particular function. The classic pathway is the Krebs cycle. It was common for students to be required to calculate the output of the cycle (in moles of ATP) based on stoichiometric equations employing known amounts of substrate. As we learn more and more about cellular biology, the term “pathway” acquires a broader meaning. One pathway may intersect or subsume other pathways. Furthermore, a pathway may not be constrained to an anatomically sequestered area of the cell, and the activity of a pathway may change from cell type to cell type or may change within one cell depending on the cell's physiologic status. The individual enzymes that participate in a pathway may have different functions in alternate pathways. New pathways evolve by recruiting enzymes from various preexisting pathways [35]. The many ways in which the component parts of a pathway can be assigned have led to an inflation of pathway networks. When we assume that a published pathway represents a specific and uniform cellular process, we may easily draw false inferences, leading to unverifiable claims [36].
In general, the term “pathway” is best used as a convenient conceptual device to organize classes of molecules that interact with a generally defined set of partner molecules to produce a somewhat consistent range of biological actions.

Pathway-driven disease  Refers to disorders whose clinical phenotype is largely the result of a single, identifiable pathway. Diseases with similar clinical phenotypes can often be grouped together if they share a common, disease-driving pathway. Examples include the channelopathies (driven by malfunctions of pathways involving the transport of ions through membrane channels), ciliopathies (driven by malfunctions of cilia), and lipid receptor mutations (driven by any of the mutations involving lipid receptors). Certain types of conditions do not fall easily into the “pathway-driven” paradigm. For example, it is difficult to speak of a class of diseases all driven by errors in transcription factor pathways. A single transcription factor may regulate pathways in a variety of cell types with differing functions and embryologic origins. Hence, the syndromes resulting from a mutation in a transcription factor may involve multiple pathways and multiple tissues and will not have any single, identifiable pathway that drives the clinical phenotype. At this point, our ability to sensibly assign diseases to pathways is limited because a mutation in a single gene may indirectly affect many different pathways, and those pathways may vary from cell type to cell type. There is some hope that as more cell-based data becomes available, modern data analysis techniques will reliably match specific diseases with specific pathways [37].

Survival of the fittest  The phrase was first used by Herbert Spencer, a contemporary of Darwin's, who in his Principles of Biology (1864) referred to natural selection as a process that favored the survival of the fittest “races” (Spencer's terminology).
The term was not intended to refer to the survival of the fittest individuals of a species. Moreover, fitness, as it applies to species, refers to the ability of the species to speciate, to produce a diverse class of descendant species over time. There is nothing in the theory of evolution through natural selection that specifically addresses the issue of the survival of individuals in the species.

Synapomorphy  A trait found in the members of a class and its subclasses (i.e., shared by the species descending from the ancestral species in which the trait first appeared).

Taxonomy  When we write of “taxonomy” as an area of study, we refer to the methods and concepts related to the science of classification, derived from the ancient Greek taxis, “arrangement,” and nomia, “method.” When we write of “a taxonomy,” as a construction within a classification, we are referring to the collection of named instances (class members) in the classification. To appreciate the difference between a taxonomy and a classification, it helps to think of taxonomy as the scientific field that determines how the different members within the classification are named. Classification is the scientific field that determines how related named members are assigned to classes, and how the different classes are related to one another. A taxonomy is similar to a nomenclature; the difference is that in a taxonomy, every named instance must have an assigned class.
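The distinction drawn above between a nomenclature and a taxonomy can be stated as a simple check: a nomenclature is a collection of names, and it qualifies as a taxonomy only if every name has an assigned class. The following Python sketch is one way to express that check. The function name and the sample data are illustrative assumptions, not part of the text.

```python
# Minimal sketch; unassigned_names() and the sample data are hypothetical.
def unassigned_names(names, class_of):
    """Return the names that lack an assigned class, in sorted order."""
    return sorted(n for n in names if n not in class_of)

# A nomenclature: just a set of names.
names = {"Homo sapiens", "Limulus polyphemus", "Amborella trichopoda"}

# Class assignments for some of those names.
class_of = {
    "Homo sapiens": "Homo",
    "Limulus polyphemus": "Limulus",
}

missing = unassigned_names(names, class_of)
print(missing)  # a taxonomy requires this list to be empty
```

In this example the nomenclature is not yet a taxonomy, because one name has no class. Adding an assignment for that name would make the check return an empty list.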

References

[1] Darwin C. On the origin of species by means of natural selection, or the preservation of favoured races in the struggle for life. London: John Murray; 1859.
[2] Whitman WB, Coleman DC, Wiebe WJ. Prokaryotes: the unseen majority. Proc Natl Acad Sci 1998;95:6578–83.
[3] Schloss PD, Handelsman J. Toward a census of bacteria in soil. PLoS Comput Biol 2006;2:e92.
[4] Mora C, Tittensor DP, Adl S, Simpson AGB, Worm B. How many species are there on earth and in the ocean? PLoS Biol 2011;9:e1001127.
[5] Suttle CA. Environmental microbiology: viral diversity on the global stage. Nat Microbiol 2016;1:16205.
[6] Suttle CA. Marine viruses: major players in the global ecosystem. Nat Rev Microbiol 2007;5:801–12.
[7] Mihara T, Koyano H, Hingamp P, Grimsley N, Goto S, Ogata H. Taxon richness of “Megaviridae” exceeds those of bacteria and archaea in the ocean. Microbes Environ 2018;33:162–71.
[8] Anthony SJ, Epstein JH, Murray KA, Navarrete-Macias I, Zambrana-Torrelio CM, Solovyov A, et al. A strategy to estimate unknown viral diversity in mammals. mBio 2013;4(5):e00598-13.
[9] Lutzoni F, Pagel M, Reeb V. Major fungal lineages are derived from lichen symbiotic ancestors. Nature 2001;411:937–40.
[10] Baron EJ, Allen SD. Should clinical laboratories adopt new taxonomic changes? If so, when? Clin Infect Dis 1993;16:S449–50.
[11] Locey KJ, Lennon JT. Scaling laws predict global microbial diversity. Proc Natl Acad Sci U S A 2016;113:5970–5.
[12] Raup DM. A kill curve for Phanerozoic marine species. Paleobiology 1991;17:37–48.
[13] Pearson K. The grammar of science. London: Adam and Black; 1900.
[14] Scamardella JM. Not plants or animals: a brief history of the origin of Kingdoms Protozoa, Protista and Protoctista. Int Microbiol 1999;2:207–16.
[15] Simpson GG. Principles of animal taxonomy. New York: Columbia University Press; 1961.
[16] Koga Y, Kyuragi T, Nishihara M, Sone N. Archaeal and bacterial cells arise independently from noncellular precursors? A hypothesis stating that the advent of membrane phospholipid with enantiomeric glycerophosphate backbones caused the separation of the two lines of descent. J Mol Evol 1998;46:54–63.
[17] Forterre P, Gaia M. Giant viruses and the origin of modern eukaryotes. Curr Opin Microbiol 2016;31:44–9.
[18] Lane N. Life ascending: the ten great inventions of evolution. London: Profile Books; 2009.
[19] Forterre P. Three RNA cells for ribosomal lineages and three DNA viruses to replicate their genomes: a hypothesis for the origin of cellular domain. PNAS 2006;106:3669–74.
[20] Filee J. Multiple occurrences of giant virus core genes acquired by eukaryotic genomes: the visible part of the iceberg? Virology 2014;466–467:53–9.
[21] Forterre P. The two ages of the RNA world, and the transition to the DNA world: a story of viruses and cells. Biochimie 2005;87:793–803.
[22] de Queiroz K. Ernst Mayr and the modern concept of species. PNAS 2005;102(suppl 1):6600–7.
[23] Berman J. Precision medicine, and the reinvention of human disease. Cambridge, MA: Academic Press; 2018.
[24] Berman JJ. Evolution's clinical guidebook: translating ancient genes into precision medicine. Cambridge, MA: Academic Press; 2019.
[25] Committee on A Framework for Developing a New Taxonomy of Disease, Board on Life Sciences, Division on Earth and Life Studies, National Research Council of the National Academies. Toward precision medicine: building a knowledge network for biomedical research and a new taxonomy of disease. Washington, DC: The National Academies Press; 2011.
[26] The state of world health. Chapter 1 in World Health Report 1996. World Health Organization. Available from: http://www./whr/1996/en/index.html; 1996.
[27] Weekly epidemiological record. World Health Org 2007;32:285–96.
[28] Muhlestein JB, Anderson JL. Chronic infection and coronary artery disease. Cardiol Clin 2003;21:333–62.
[29] zur Hausen H. Infections causing human cancer. Hoboken: John Wiley and Sons; 2006.
[30] DNA transforming viruses. Microbiology bytes; 2004.
[31] Wales JH, Sinnhuber RO, Hendricks JD, Nixon JE, Eisele TA. Aflatoxin B1 induction of hepatocellular carcinoma in the embryos of rainbow trout (Salmo gairdneri). J Natl Cancer Inst 1978;60:1133–9.
[32] Berman JJ. Ruby programming for medicine and biology. Sudbury, MA: Jones and Bartlett; 2008.
[33] Patil N, Berno AJ, Hinds DA, Barrett WA, Doshi JM, Hacker CR, et al. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 2001;294:1719–23.
[34] Glansdorff N, Xu Y, Labedan B. The last universal common ancestor: emergence, constitution and genetic legacy of an elusive forerunner. Biol Direct 2008;3:29.
[35] Copley RR, Bork P. Homology among (beta-alpha)8 barrels: implications for the evolution of metabolic pathways. J Mol Biol 2000;303:627–40.
[36] Rog CJ, Chekuri SC, Edgerton ME. Challenges of the information age: the impact of false discovery on pathway identification. BMC Res Notes 2012;5:647.
[37] Greene CS, Troyanskaya OG. Chapter 2: data-driven view of disease biology. PLoS Comput Biol 2012;8:e1002816.

Chapter 2

Species and speciation

Section 2.1 A species is a biological entity

The purpose of narrative is to present us with complexity and ambiguity.
Scott Turow

Taxonomic Guide to Infectious Diseases. https://doi.org/10.1016/B978-0-12-817576-7.00002-X © 2019 Elsevier Inc. All rights reserved.

It has been argued that nature produces individuals, not species; the concept of species being a mere figment of the human imagination, created for the convenience of taxonomists who need to group together similar organisms. In point of fact, there are many excellent reasons to believe that species are biological entities, on equal or better scientific footing than individual organisms. The justification for the species concept follows.

1. Species are well defined, and membership within a species is immutable. Early definitions of species were fashioned to exclude most organisms, including all bacteria, all unicellular eukaryotes, and all fungi. One long-held definition for a species was that it was a class of animals that shared main characteristics and that could breed with one another. Aside from excluding the vast majority of living organisms, this early definition didn't do much to help explain how species came into existence, and did not inform us how to choose the main characteristics that determined membership in a species. The modern definition of species can be expressed in three words: “evolving gene pool” [1]. This elegant definition is a simple concept to comprehend and to defend and serves to explain how new species come into existence (i.e., by collecting a new gene pool) [1–3]. Because each member of a species has a genome constructed from its unique species gene pool, it is clear that membership within a species is immutable (e.g., a fish cannot become a cat and a cat cannot become a goat because their genomes come from the unique gene pools of their respective species).

2. Membership within a species is biologically determined for every living organism. It is interesting to note that we humans have no trouble grasping the idea that individual organisms (such as ourselves) are distinct biological entities, while we tend to think of our species as belonging to some sort of life continuum, with no sharp boundaries between one species and another species of the same class (e.g., two species of similar-looking frogs). In point of fact, each of us is a chimeric organism, with a diploid part (what we see as our functioning bodies), and a haploid part (our germ cells that can potentially recombine with other germ cells to produce endless generations of diploid/haploid chimeric organisms). Furthermore, each of us is composed of trillions of cells that can live and replicate independently (e.g., in tissue culture, or as a transplant). We are individuals only in the sense that we choose to think of ourselves as such. We can assert that a species has a real biological domain, inasmuch as if we eliminate a species, then every member of the species must perish.

3. Species respond biologically to natural selection. Natural selection operates on the gene pool of a species, changing the balance of available genes. Hence, species, not individuals, are influenced biologically by natural selection.

4. Species live and die and have a primary biological purpose: speciation. Millions of species occupy the biosphere [4]. The observed diversity of species suggests that the purpose and destiny of a species is to speciate; to produce new offspring species. Hence, the success of a species is not determined by whether it has produced lots of individuals of the species, or whether it has persisted for a long time, but whether it has produced a descendant lineage of new species.

In summary, species have the properties associated with every living entity: uniqueness, life, death, and the issuance of progeny.
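The statement that natural selection operates on the gene pool, changing the balance of available genes, can be illustrated with a deliberately simple model. The Python sketch below assumes a single gene with two variants, A and B, in a haploid population with fixed relative fitness values. These assumptions, the function name, and the numbers are illustrative choices and are not taken from the text.

```python
# Minimal sketch of selection shifting the balance of two gene variants.
# Assumptions (not from the text): haploid population, one gene, two
# variants A and B, constant relative fitness, no mutation and no drift.
def next_frequency(p_a, fitness_a, fitness_b):
    """Frequency of variant A in the gene pool after one generation."""
    mean_fitness = p_a * fitness_a + (1.0 - p_a) * fitness_b
    return p_a * fitness_a / mean_fitness

p = 0.10                      # variant A starts as 10% of the gene pool
for generation in range(50):
    p = next_frequency(p, fitness_a=1.05, fitness_b=1.00)
print(round(p, 3))
```

Nothing in this model changes any individual. Only the proportions of the variants in the pool shift from one generation to the next, which is the sense in which the species, rather than the individual, responds to selection.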

Section 2.2 The biological process of speciation

One of the most fundamental goals of modern biological research is comprehension of the way in which species arise.
George Gaylord Simpson (1902–84), in 1945 [5]

George Gaylord Simpson, a mid-20th century evolutionary biologist and taxonomist, had a gift for posing some of the best and most enduring questions in his field. It is intriguing that one year after Simpson explained the importance of solving the mystery of how species arise, the science fiction author Ray Bradbury inadvertently found the answer. In “The Million-Year Picnic,” one of the stories in “The Martian Chronicles,” Bradbury relates how a man takes his family from our planet to live permanently on Mars. Once there, they burn the rocket that transported them, so that they can never return to earth. One evening, the father tells his children that he is going to take them to see the Martians. They walk together to a canal, and the father shows them their own reflections in the water.

The family had become, by virtue of leaving the earth behind, a new species; a species of Martians. The family depicted by Ray Bradbury had separated itself from the gene pool of earthlings, and had established, along with the other Martian colonists, a gene pool that would evolve in its own manner, to produce a species that would, over time, become obviously different from its parent species: Homo sapiens. The Martians might develop a horny coat to protect against cosmic radiation, or hypertrophied tear glands to protect against dust storms, or any number of modifications to cope with the toxic Martian atmosphere. Regardless, we can be certain that the gene pool available to the colonizing Martians would evolve differently than the gene pool of earthlings.

1. Species speciate. Individuals do not. On the subject of science fiction, you doubtless recall the 25th episode of the third season of Star Trek: The Next Generation, titled, “Transfigurations.” In this episode, a Zalkonian named “John” takes refuge aboard the Enterprise to escape pursuit by Captain Sunad, a zealous Zalkonian determined to capture John and return him to Zalkon, where John will be punished for crimes unspecified. During the ensuing drama, and between commercial breaks, John acquires strange powers, including the power to heal. In the last moment, John evolves into a being of pure energy, and flies under his own power through space. Presumably, he is headed back to Zalkon, where he will be the father of a new species of energetic Zalkonians, like himself. Of course, this book on taxonomy is focused on terrestrial entities, but it should be apparent that evolution on the planet Zalkon is very different from evolution here on earth. For starters, earthling organisms never speciate, the reason being that a species is an evolving gene pool. Individuals contribute only one set of genes to a pool composed of the genes of many individuals.
Hence, a single individual cannot evolve to become a new species. Can a single organism account for a diverse collection of subspecies? Yes, and we suspect that it happens all the time on islands or in geographic areas isolated by mountains, rivers, or habitat barriers. The key point to remember is that the founder of a new species is not itself a new species; it is simply the first contributor to a gene pool that will grow and evolve, to produce new species.

To understand how gene pools speciate, imagine a female primate of Class Strepsirrhini, the parent class of Class Lemuriformes, desperately clinging to a tree limb and washed out to sea, only to turn up on the shore of Madagascar. After a few days, she assesses her situation and determines that she is the only bona fide member of Class Strepsirrhini on the island. Adding to the drama, she discovers that she is pregnant. She hopes it's a boy, maybe twins. If she and her brood survive, and if the children successfully breed, they will establish a new gene pool from which species will evolve. In fact, a scenario something like the one described has been suggested for the origin of all the species of lemur on Madagascar. The reasoning is simple: lemurs are found only on the island of Madagascar and on the Comoros Islands, northwest of Madagascar. Hence, they must have arisen on the island from an ancestral species that lived only on the island. The nonlemur members of Class Lemuriformes, the galagos and lorisids, live in Africa and Asia. Hence, the founding parent of the lemurs of Madagascar most likely came from Africa. The important point here is that the member of Class Lemuriformes who landed in Madagascar was nothing special. Unlike John the Zalkonian, she didn't transform into another species. Her only claim to fame was that she contributed to a gene pool. The gene pool evolved, and the process of evolution required nothing more than natural selection of pooled genes. Eventually, all the different species of lemur now living in Madagascar came to be. Should we give credit to the original Madagascar primate for all the species of lemur living on earth today? Technically, no. We owe credit to the gene pool that she helped fill. Had there been a different pregnant primate, arriving under the same set of circumstances, we would have no lemurs, but we would have a different collection of species in their place. Evolutionary biologists suspect that just as one strepsirrhine primate may account for all the different types of lemur on the island of Madagascar, one caviate rodent may have been responsible for all the extant rodents in South America.

2. Species can only speciate into something that they already are. In this book, we stress the point that we, as humans, are members of every class of organisms in our ancestral lineage. Hence, we are eukaryotes, we are deuterostomes, we are craniates, and so on. Furthermore, the assertion that humans are eukaryotes must not be interpreted as a didactic device, intended to remind us of our early roots. In point of fact, we actually are eukaryotes, the reason being that the gametes in every organism represent the true progeny of the gametes in the organism's ancestry. Skeptics are counseled to follow the history of their own gametes backwards through evolution. The gametes in our bodies are the progeny of the fusion of two parental gametes.
This process can be followed iteratively up through the chain of ancestors within a species, and up through the gametes of the parent species, and on and on, until reaching the root eukaryote, a single-celled organism that was, for all practical purposes, “all gamete.” This explains why we have hundreds of the same core genes found in all eukaryotic organisms, and why we are dues-paying members of Class Eukaryota, Class Deuterostomia, and Class Craniata, down the line to Class Homo sapiens.

3. Speciation is inevitable. All species that survive will speciate, given time. Eventually, something will happen that isolates a subpopulation (e.g., migration to an island, change in the habitat in a geographic region, even a new highway that splits a population). New species begin at the moment when a population is split, even if there are no outward signs of change. The horseshoe crab is often touted as a species that never speciates, insofar as the horseshoe crabs living today look much like the horseshoe crabs that lived 100 million years ago. Nevertheless, the living horseshoe crabs belong to at least four modern species, with their own phylogenies determined by molecular analysis [6] (Fig. 2.1). Over time, and after a population has split off and isolated itself, small variations in the separate gene pools will produce subtle differences in the respective members of each population [7]. We can guess that several million years ago, some horseshoe crabs split from the herd, and started their own species.

FIG. 2.1  Horseshoe crabs (Limulus polyphemus), an example of an animal that has evolved, but slowly. (Source, Wikipedia, and released into the public domain by its author, Breese Greg, of the US Fish and Wildlife Service.)

4. We can learn about class properties by studying divergent sister classes. There are many instances in which we have lost the living representatives of ancestral sister classes through extinction. In some cases, we risk losing sister classes through inattention and negligence. Currently, there is one living species that represents the sister class to all other angiosperms (i.e., flowering plants). This species, Amborella trichopoda, is an unassuming shrub found only on the small Pacific island of New Caledonia. It is feared that Amborella trichopoda is on the brink of extinction. A botanist wrote, in a 2008 PhD thesis (translated from the French), “The disappearance of Amborella trichopoda would imply the disappearance of a genus, a family and an entire order, as well as the only witness to at least 140 million years of evolutionary history” [8] [Glossary Sister class and cousin class].

5. Speciation does not require new genetic mutations. As remarked previously, when a group of individuals, each representing a sampling from the species' gene pool, wander off somewhere and mate exclusively with one another, they are creating their own evolving gene pool; hence their own species. The initial gene pool of the new species is a subset of the gene pool of the parent species, and the members of the new species are indistinguishable, as a group, from the members of the parent species. Hence, the creation of the new species did not require the acquisition of any new genetic mutations. It's reasonable to ask, “If the new species is indistinguishable from the parent species, and has a gene pool that was present in the parent species, then how can we possibly assert that a new species was created?” Gradually the gene pool of the new species will evolve, accumulating new variants of genes, and serving as the genetic material for individuals who will look less and less like the parent species, over time. The greatest threat to the survival of a new species is crossmating between members of the parent species and the child species; this would result in a mixing of the genes, and we would no longer have two separately evolving gene pools.

Just for fun, let's assume that we are wrong, and that each new species arises from a new gene that evolved from the parent species, rendering the offspring sufficiently different to earn themselves a place in the list of terrestrial species. If this were the case, then we should be able to identify the “squirrel” gene, the “tulip” gene, and the “mosquito” gene that distinguish each of these species from its parent species and from every other species on earth. Of course, this is an impossibility. We cannot identify the defining “species gene” or the set of genes that is characteristic for any species on earth. We cannot even find the gene that separates human from gorilla. If our species were defined by the acquisition of one new gene, then a loss-of-function mutation in our “species gene” would cause affected individuals to regress back to our ancestral species. We would have Neanderthals walking among us. Of course, this is nonsensical. There simply is no such thing as the “human gene.” What, then, distinguishes one species from another? The answer is “the gene pool.” Humans have their own gene pool. Squirrels have their gene pool. Tulips have their gene pool. It is as simple as that.
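The argument above, that a new species begins with a subset of the parent gene pool and then diverges without requiring new mutations, can be illustrated with a toy simulation. The Python sketch below makes strong simplifying assumptions that are not in the text: one gene, variants labeled by letters, fixed population sizes, random mating modeled as sampling with replacement, and no selection or mutation. All names and numbers are illustrative.

```python
import random

# Toy model; every name and number here is an illustrative assumption.
def drift(pool, generations, rng):
    """Resample the gene pool each generation; no new variants are created."""
    for _ in range(generations):
        pool = [rng.choice(pool) for _ in pool]
    return pool

rng = random.Random(42)                      # fixed seed for repeatability
parent_pool = list("AAAABBBBCCCCDDDD" * 25)  # 400 gene copies, 4 variants
founder_pool = rng.sample(parent_pool, 20)   # small group that wanders off

island_pool = drift(founder_pool, 100, rng)
mainland_pool = drift(parent_pool, 100, rng)

# Every variant on the island was already present in the parent pool.
print(set(island_pool) <= set(parent_pool))
print(sorted(set(island_pool)), sorted(set(mainland_pool)))
```

The isolated pool can only lose or reshuffle variants it inherited from the parent pool, yet after many generations its composition may differ markedly from that of the mainland pool. In this sketch the two pools are separate simply because they never exchange gene copies, which mirrors the point about crossmating made above.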

Section 2.3 Diverse forms of diversity

I always wanted to be somebody, but now I realize I should have been more specific.
Lily Tomlin

When we think about biological diversity, there is a tendency for each of us to contemplate the issue in relation to our own chosen field of interest. A zoologist is likely to think of diversity in the number and behavior of living animal species. A geneticist will consider the totality of different functional genes available to the biosphere. A chemist might think in terms of all the different molecular species synthesized by living organisms. Of course, the different modes of diversity are biologically related. For example, a species is basically an evolving gene pool, and each species will eventually contribute its gene pool to the number of genes cataloged in a large genomic databases. Due to the overwhelming ­complexity


of trying to think in terms of all the types of biological diversity, as they relate to one another, it is probably best to focus on the most familiar concepts, one by one: species diversity, genetic and proteomic diversity, regulatory diversity, structural (i.e., anatomic and cytologic) diversity, and chemical diversity [9].

Diversity of Species

As discussed in Section 1.1, “The Consequence of Evolution is Diversity,” there are many millions of species of organisms that live on our planet. The estimates vary from a low of 10 million to a high of 1 trillion [10]. From this staggeringly large number of terrestrial species, we can infer that speciation is a relatively easy, almost inevitable, process. We can see several reasons why species tend to diversify over time:

–1. Species need other species. Aside from the many carnivorous organisms that prey on other living organisms, there are many organisms that live off dead, decaying, or fully decayed organisms. Even plants that live off sunlight, water, and carbon dioxide rely on soil nutrients containing the decomposed detritus of formerly living species.

–2. The term “fittest” has no absolute meaning. An organism that is more fit for survival under one set of conditions may be totally unfit under another set of circumstances (e.g., extremes of weather, susceptibility to infection, diminished food supplies). The best way of providing a fit organism, for any environmental condition, is to produce lots and lots of species, expecting that a few of them will be suited to the new environment [Glossary Susceptibility].

–3. Third, and most important, species speciate. A species will always produce another species if the genetic and environmental conditions permit (as discussed in the earlier sections of this chapter).

Speciation is something that some classes of organisms seem to do much better than others.
For example, there are over 350,000 species of beetle, exceeding the combined number of plant species (250,000) plus roundworm species (12,000) plus mammals (4000). Vertebrates are biological underachievers, compared with beetles, when it comes to speciation. It would seem that the ability to produce other species is itself a trait held by species, and this trait is referred to by the term “evolvability.” One of the most evolvable vertebrates is the cichlid (Class Cichlidae), many species of which are found in home aquaria. There are about 2000 known species of cichlid, and hundreds of different species can be found swimming together in African lakes. The hundreds of cichlid species in Lake Victoria took an estimated 15,000–100,000 years to radiate (i.e., to diversify from a single founder species), a very short time span by evolutionary standards [11]. It is impossible to determine any specific factor that renders a species capable of diversifying. In the case of cichlids, nearly every variable examined in the geographic


and ecological history of this fish would seem to encourage diversity (populations that expand and contract, lakes that swell up and dry out, changes in flora and fauna cohabiting the lake, etc.). On a molecular level, cichlids are endowed with a genome containing a large number of gene duplications, an abundance of noncoding elements, and many novel microRNAs [11]. Any of these factors may have played an important role in the adaptive radiation of cichlid species. This serves as an example of the relationship between genetic diversity and species diversity.

Gene diversity

The earth's proteome consists of all the different protein-coding genes on the planet. The estimates of the planetary proteome vary widely. The lowest number seems to be 5 million [12]. Elsewhere, we read that the human intestine contains about 40,000 species of bacteria producing a whopping 9 million unique bacterial genes [13–15]. It seems plausible that before all the counts come in, we'll find that there are billions of genes in the total collection. The human genome contributes a meager 20–25 thousand genes to the proteome. Other, seemingly less complex, animals have a genetic repertoire larger than that of humans. For example, the nearly microscopic crustacean Daphnia pulex (the water flea) has 31,000 genes. Plants tend to have far more genes than animals. For example, rice contributes an estimated 46–56 thousand genes to Earth's proteome [16].

Chemical diversity

We must begin by confessing that the metazoans (animals), as a class, are metabolic underachievers in terms of chemical and metabolic diversity [17]. The prokaryotes are far more advanced [18]. Among the eukaryotes, only Class Archaeplastida (i.e., plants) and Class Fungi seem to be making any effort to impress [19, 20]. The eukaryotes largely rely on endosymbiotic relationships with current or former prokaryotes to perform complex biosyntheses (e.g., mitochondria and chloroplasts captured from former bacteria).
Otherwise, eukaryotes are saddled with the rather hum-drum tasks of synthesizing organelles and membranes and manipulating the various ingredients for cellular life (carbohydrates, structural and enzymatic proteins, lipids, and nucleic acids). These fundamental chemical constituents of living organisms were established about 3 or 4 billion years ago, and haven't changed much since. Some estimates suggest that the first fungi appeared as early as 1.3 billion years ago, while the first land plants may have evolved 700 million years ago. The first fossils of vascular land plants are dated to about 480 million years ago, just after the end of the Cambrian explosion (about 500 million years ago). Regardless of the timing, we can surmise that following the Cambrian explosion, plants, fungi, and metazoans were obliged to evolve their own coping mechanisms for cohabitation [Glossary Cambrian explosion].


Whereas animals rely on their body structure for both aggressive and defensive activities (e.g., running after prey and running away from predators), plants and fungi rely on their ability to synthesize bioactive chemicals that act as respiratory poisons (e.g., cyanide), neuromuscular agents (e.g., nicotine), irritants (e.g., capsaicin), and a host of other chemical warfare agents. When we eat plants and mushrooms, we can expect to ingest some of the chemicals that were created with the intention of killing us. For example, cycasin is a toxin and carcinogen found in the seeds and the pollen of every class of cycad tree [21]. Among the fungi, Aspergillus flavus, a ubiquitous fungus found growing on peanuts and other crop plants, synthesizes aflatoxin, one of the most powerful liver carcinogens known [22]. Peanut butter manufacturers take great pains to ensure that peanuts are harvested under conditions that minimize their contamination with aflatoxin, and they carefully monitor the amounts of aflatoxin in manufactured peanut butter so that batches exceeding an allowed level never reach the market [Glossary Carcinogen, Carcinogenesis]. Another fungal toxin is alpha-amanitin, a strong, often fatal poison produced by the mushroom Amanita phalloides. Yet another fungal product is gyromitrin, a hydrazine compound present in most members of the common False Morel genus. Nobody is quite sure what effects gyromitrin and other related hydrazine molecules may have on human consumers. Weaponized molecules play no role in the primary functions of plant and fungal cells (i.e., they do not participate in cellular physiologic processes), and are referred to as secondary metabolites or as idiolites [19]. The terminology conveys the idea that if all the secondary metabolites in a plant cell or a fungal cell were to disappear, then the cells would survive happily, provided that no predatory organisms spoiled the fun.
Secondary metabolites account for a large portion of the chemical diversity in bacteria, plants, and fungi [20]. Plants devote 15%–25% of their genes to producing enzymes involved in the synthesis of secondary metabolites, of which several hundred thousand have been reported [23]. We presume that every secondary metabolite is bioactive under some set of circumstances; otherwise, the synthetic method for creating the chemical would not have evolved. In point of fact, a good bit of medicinal chemistry, as it was pursued in the 20th century, consisted of finding appropriate secondary metabolites from bacteria, fungi, or plants that would have some utility in the prevention or treatment of human diseases.

Structural diversity

The one area of diversity wherein animals seem to take the lead is structural diversity. Structural diversity really took off in the Cambrian explosion, when nearly all the extant metazoan body plans were established. Although there were multicellular animals that lived prior to the Cambrian, it would seem that the dominant eukaryotes were standard issue unicellular organisms. These organisms varied in terms of size, shape, and external structures (e.g., wavy


membranes, pseudopods, undulipodia, and cilia), but couldn't compete with the diversity of structures that arose in the Cambrian explosion. The key event that propelled the attainment of structural diversity in metazoans was almost certainly the evolution of specialized junctions, particularly the desmosome, that uniquely characterize animal cells. The desmosomes act like rivets, and soft animal cells serve as somewhat modular building blocks that can be assembled into almost any shape and size imaginable (e.g., ducts, glands, acini, cavities, membranes). The application of cuticles and other hardened tissue, made from collagen, keratin, or chitin, and the synthesis of bone from hydroxyapatite deposited into proteinaceous matrices, allowed the animal kingdom to produce a multitude of species with variably shaped outer and inner structures. The field of paleontology has, until recently, been devoted to understanding the characteristic design of hard structures that distinguish one animal from another. Among the animals, the holometabolous insects probably take the prize when it comes to structural diversity, insofar as a single organism may pass through multiple stages of life, each having its own structural morphology [Glossary Holometabolism]. Fungi and plants display a fair amount of structural diversity, usually simple variations on common themes (e.g., stalk, leaves, flowers). With few exceptions, the plant kingdom does a much better job with colors than does the animal kingdom. Plants rely on flavonoids (particularly anthocyanins) and carotenoids to produce their wonderful and vivid colors. For the most part, animals have a single pigment molecule, melanin, as their primary source of coloration. An assortment of colors can be coaxed from the melanin molecule by controlling the concentration and spatial distribution of melanin within cells, and by making small modifications to the base molecule.
Other colors, such as the red of hemoglobin, are produced with iron and other metal cofactors bound to proteins [Glossary Cofactor].

Regulatory diversity

Relatively early in eukaryotic phylogeny, cells evolved a diverse methodology for regulating their genomes. This would include the evolution of the epigenome, wherein the DNA of genetic material is modified by base methylations, and these modifications are themselves modified at every step of cell-type development. Chromatin, the structural backbone of the genome, is modified by the attachment of proteins (histones and nonhistone varieties), and by the wrapping of units of DNA into tight nucleosomes. There are numerous ways in which chromatin is modified, including remodeling factors, histone deacetylases, heterochromatin-binding proteins, and topoisomerases [24]. Aside from the complexities of the epigenome, there are a host of genome modifiers that micromanage every aspect of gene expression, including transcription (e.g., transcription factors, promoters, enhancers, silencers, pseudogenes, siRNA, miRNA, competitive endogenous RNAs), posttranscription


(splicing, RNA silencing, RNA polyadenylation, mRNA stabilizers), translation (e.g., translation initiation factors, ribosomal processing), and posttranslational protein modifications (e.g., chaperones in mammals, protein trafficking). Disruptions of any of these regulatory processes may produce disease in humans and other metazoans [25–35]. These genomic regulators will not be discussed further here except to remark that some level of gene regulation is found in every class of organisms (i.e., prokaryotes, single-celled eukaryotes, animals, plants, fungi, and viruses). Many of these regulatory systems are common to all eukaryotes, while others seem to be specific for particular subclasses. For example, imprinting among animals seems to be confined to Class Eutheria (i.e., placental mammals). Similarly, chaperone proteins seem to be something exclusive to Class Mammalia.

The value of diversification

Diversity affords us the opportunity to find new antibiotics, new anticancer agents, and new methods to control and modify just about any metabolic pathway or regulatory mechanism we choose to study. It is due to genetic diversity among diverse species that we have found the thermophilic Taq polymerase (from Thermus aquaticus bacteria) used in PCR (polymerase chain reaction), the gene-editing enzymes used in CRISPR/Cas9 (prokaryotic species) and Cre-LoxP (bacteriophage P1), and the vectors used in CAR-T (lentiviral and gammaretroviral vectors) [36–39] [Glossary CAR T-cell therapy].

Section 2.4 The species paradox

A horse is a horse of course of course
Mr. Ed television show, theme music, 1961–66

In the introductory section of this chapter, we defined species as an evolving gene pool. Does this simple definition account for what we see when we try to find a species? Not really. Whenever we want to study a species, we always end up collecting a bunch of individual organisms that are members of the species. From these organisms, we try to find the features that characterize all the members of the species and that distinguish one species from every other species. When doing so, we always make the same observation: that every individual member of a species is a unique organism and that every offspring of every organism is uniquely different from either of its parents. Here is the apparent paradox. We observe that every species is a collection of organisms that are all different from one another; and the differences among the organisms are constantly reassorted into new and unique individuals, with every generation. With its members constantly changing, how can we ever have a species that has stable characteristics that distinguish one species from other species? How can we uphold the intransitive property of species that forbids


individuals to change their membership from one species to another, when new species are constantly evolving from the existing species? It doesn't seem to make any sense! [Glossary Intransitive property]. The solution to the paradox is very simple. Every species is defined by its ancestral lineage, not by the collective diversity of the individuals within the species. Two individuals belong to the same species if they share the same ancestry, regardless of differences in their genomes. If we were to sequence the genome of an apple tree in an orchard, we would not find an “apple” gene that establishes its species. There is no such thing as an “apple” gene, but there is such a thing as an ancestral lineage for the species known as apple (i.e., Malus domestica). We would find that any particular apple tree has a unique genome that is different from the genome of every other apple tree on the planet (with the exception of the trees that were cloned from the same founder). If we looked a bit harder, we would probably find a set of sequences that, considered together, does a good job of identifying a sample as a member of the apple species. We can think of each species as having a collective gene pool, with the genome of each unique individual representing a sampling from the pool. Changes in a single gene introduced to the species gene pool may account for startling phenotypic variants within the species, but they never account for the birth of a new species.
For example, anyone who visits the produce department of a well-stocked grocery store will encounter varieties of Brassica oleracea, each producing a different menu item:

– Brassica oleracea Acephala Group: kale and collard greens
– Brassica oleracea Alboglabra Group: kai-lan (Chinese broccoli)
– Brassica oleracea Botrytis Group: cauliflower, Romanesco broccoli, and broccoflower
– Brassica oleracea Capitata Group: cabbage
– Brassica oleracea Gemmifera Group: brussels sprouts
– Brassica oleracea Gongylodes Group: kohlrabi
– Brassica oleracea Italica Group: broccoli

The Acephala group (from the root meaning without a head), represented by kale and collard greens, is phenotypically most like the wild cabbage. Cauliflower differs from wild cabbage because of a mutation in a single gene (the CAL gene), which produces an inflorescence. This means that the stem cells (of the meristem) grow into a mass of undifferentiated cells; basically a Brassica hamartoma [Glossary Hamartoma, Wild type]. One of the many cultivars of Brassica oleracea, known as Jersey cabbage, grows up to 3 m tall. These giant-sized cabbages have woody stalks that look like tree limbs. Hence, a single species of plant can provide nearly every common green vegetable that appears on American dinner plates, as well as stalks suitable as walking canes (Fig. 2.2).


FIG.  2.2  Jersey cabbage walking sticks are another member of the Brassica oleracea species. (Source, Wikipedia, from a photograph by Man Vyi and entered into the public domain.)
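The idea that species membership follows ancestry rather than appearance can be sketched in a few lines of code. This is a toy model under assumptions that are not in the text: each population is given a single hypothetical "descends from" link, each lineage root stands in for one species, and the phenotype notes are for contrast only. Real lineages are nested and far richer than this table.

```python
# Hypothetical ancestry table: each population maps to the population it descends from.
# None marks the root of a species lineage in this toy model.
ANCESTOR = {
    "wild cabbage": None,
    "kale": "wild cabbage",
    "cauliflower": "wild cabbage",
    "kohlrabi": "wild cabbage",
    "Jersey cabbage": "wild cabbage",
    "apple founder": None,
    "orchard apple tree": "apple founder",
}

# Illustrative phenotype notes; they play no part in the membership test below.
PHENOTYPE = {
    "kale": "leafy, most like wild cabbage",
    "cauliflower": "mass of undifferentiated meristem cells",
    "Jersey cabbage": "woody stalk up to 3 m tall",
}


def lineage_root(name):
    """Follow ancestor links until the root of the lineage is reached."""
    while ANCESTOR[name] is not None:
        name = ANCESTOR[name]
    return name


def same_species(a, b):
    """In this toy model, two populations are conspecific if their lineages share a root."""
    return lineage_root(a) == lineage_root(b)


print(same_species("kale", "cauliflower"))
print(same_species("Jersey cabbage", "orchard apple tree"))
```

The membership test never consults the phenotype table: kale and cauliflower look nothing alike, yet they trace back to the same root, while the woody Jersey cabbage and the woody apple tree do not.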

The different cultivars of Brassica oleracea are analogous to serotypes of bacteria or breeds of animals. They all represent variants of the same species. Although the cultivars may be distinguished from one another by simple genetic variations, sometimes involving a single gene, they all belong to the same species because they all share the same ancestry and the same gene pool. We can glimpse some of the enormous genetic variation within the members of a species by focusing our attention on SNPs (single-nucleotide polymorphisms), which are easy to detect, and for which much data has been collected. A SNP is a variation between members of a species that occurs in at least 1% of the population. To get an accurate determination of all the SNPs in the human population, we would need to sequence everyone's genome. Sequencing everyone's genome is impossible, at present, but we can do our best to get a fair sampling of the human population. The current rough estimate of the number of SNPs in the human population is 10–30 million. This number may increase substantially, as we improve the gene sampling process, and it is fair to assume that the number would be much higher if we counted rare polymorphisms occurring in