
Cladistics

Cladistics is the hierarchical classification of species based on evolutionary ancestry. Cladistics is distinguished from other taxonomic classification systems because it focuses on evolution (rather than focusing on similarities between species), and because it places heavy emphasis on objective, quantitative analysis. Cladistics generates diagrams called cladograms that represent the evolutionary tree of life. DNA and RNA sequencing data are used in many important cladistic efforts. Computer programs are widely used in cladistics, due to the highly complex nature of cladogram-generation procedures. A major contributor to cladistics was the German entomologist Willi Hennig, who referred to it as phylogenetic systematics.^[1] The term phylogenetics is often used synonymously with cladistics. Cladistics originated in the field of biology but in recent years has found application in other disciplines. The word cladistics is derived from the ancient Greek κλάδος, klados, or "branch."

Cladograms

The starting point of cladistic analysis is a group of species and molecular, morphological, or other data characterizing those species. The end result is a tree-like relationship-diagram called a cladogram.^[2] The cladogram graphically represents a hypothetical evolutionary process. Cladograms are subject to revision as additional data becomes available.

Synonyms — The term evolutionary tree is often used synonymously with cladogram. The term phylogenetic tree is sometimes used synonymously with cladogram,^[3] but others treat phylogenetic tree as a broader term that includes trees generated with a non-evolutionary emphasis.

Subtrees are Clades — In a cladogram, all organisms lie at the leaves.^[4] The two taxa on either side of a split are called sister taxa or sister groups. Each subtree, whether it contains one item or a hundred thousand items, is called a clade.

2-Way versus 3-Way Forks — Many cladists require that all forks in a cladogram be 2-way forks. Some cladograms include 3-way or 4-way forks when the data is insufficient to resolve the forking to a higher level of detail, but nodes with more than two branches are discouraged by many cladists. See phylogenetic tree for more information about forking choices in trees.

Depth of a Cladogram — If a cladogram represents N species, the number of levels (the "depth") in the cladogram is on the order of log₂(N).^[5] For example, if there are 32 species of deer, a cladogram representing deer will be around 5 levels deep (because 2⁵=32). A cladogram representing the complete tree of life, with about 10 million species, would be about 23 levels deep. This formula gives a lower limit: in most cases the actual depth will be a larger value because the various branches of the cladogram will not be uniformly deep. Conversely, the depth may be shallower if forks larger than 2-way forks are permitted.
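Here is a quick check of the depth arithmetic, assuming strictly 2-way forks. A cladogram with N leaves needs at least ⌈log₂ N⌉ levels. The sketch below computes this bound with integer arithmetic, and its function name is illustrative only. For 10 million species, log₂ N is about 23.25, so the rounded-up bound is 24 levels. That is slightly above the rough figure of 23 quoted above.

```python
def min_cladogram_depth(n_species: int) -> int:
    """Return ceil(log2(n_species)): the fewest levels a strictly
    bifurcating (2-way fork) cladogram with n_species leaves can have."""
    if n_species < 1:
        raise ValueError("need at least one species")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1 and avoids
    # floating-point rounding issues with math.log2.
    return (n_species - 1).bit_length()


print(min_cladogram_depth(32))          # → 5   (32 deer species)
print(min_cladogram_depth(10_000_000))  # → 24  (rough size of the tree of life)
```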

Number of Distinct Cladograms — For a given set of species, the number of distinct rooted, fully bifurcating (2-way fork) cladograms that can be drawn (ignoring which cladogram best matches the species characteristics) is:^[6]

Number of Species	2	3	4	5	6	7	8	9	10	N
Number of Cladograms	1	3	15	105	945	10,395	135,135	2,027,025	34,459,425	1 × 3 × 5 × ... × (2N-3)

This exponential growth of the number of possible cladograms explains why manual creation of cladograms becomes very difficult when the number of species is large.
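The counts in the table follow from a simple argument, sketched below in Python as an illustration (the function name is hypothetical). A rooted, fully bifurcating tree with k-1 species has 2k-3 branches, counting the branch above the root. The k-th species can be attached to any one of them, so the total is 1 × 3 × 5 × ... × (2N-3).

```python
def count_rooted_cladograms(n_species: int) -> int:
    """Number of distinct rooted, fully bifurcating cladograms for
    n_species labelled species: 1 * 3 * 5 * ... * (2n - 3)."""
    if n_species < 2:
        raise ValueError("need at least two species")
    total = 1
    # Adding the k-th species to a rooted tree of k - 1 species can be done
    # on any of its 2k - 3 branches (including the root branch), so the
    # count is the product of the odd numbers 3, 5, ..., 2n - 3.
    for branches in range(3, 2 * n_species - 2, 2):
        total *= branches
    return total


for n in range(2, 11):
    print(n, count_rooted_cladograms(n))
```

Running the loop reproduces the table. At 20 species the count already exceeds 10^21.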

Extinct Species in Cladograms — Cladistics makes no distinction between extinct and non-extinct species,^[7] and it is appropriate to include extinct species in the group of organisms being analyzed. Cladograms that are based on DNA/RNA generally do not include extinct species because DNA/RNA samples from extinct species are rare. Cladograms based on morphology, especially morphological characteristics that are preserved in fossils, are more likely to include extinct species.

Time Scale of a Cladogram — A cladogram tree has an implicit time axis,^[8] with time running forward from the base of the tree to the leaves of the tree. If the approximate date (for example, expressed as millions of years ago) of all the evolutionary forks were known, those dates could be captured in the cladogram. Thus, the time axis of the cladogram could be assigned a time scale (e.g. 1 cm = 1 million years), and the forks of the tree could be graphically located along the time axis. Such cladograms are called scaled cladograms. Many cladograms are not scaled along the time axis, for a variety of reasons:

Many cladograms are built from species characteristics that cannot be readily dated (e.g. morphological data in the absence of fossils or other dating information)
When the characteristic data is DNA/RNA sequences, it is feasible to use sequence differences to establish the relative ages of the forks, but converting those ages into actual years requires a significant approximation of the rate of change^[9]
Even when the dating information is available, positioning the cladogram's forks along the time axis in proportion to their dates may cause the cladogram to become difficult to understand or hard to fit within a human-readable format
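The second point can be made concrete with a strict molecular clock. This is an assumption, since real rates vary among lineages and genes. Suppose two lineages split T years ago and each has since accumulated changes at rate r per site per year. Their pairwise distance is then about d = 2rT, so T = d / (2r). The rate used below is a hypothetical placeholder, not a measured value.

```python
def divergence_time_years(distance: float, rate_per_site_per_year: float) -> float:
    """Estimate the age of a fork from a pairwise sequence distance,
    assuming a strict molecular clock: distance = 2 * rate * time."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    # Both descendant lineages accumulate changes after the split,
    # hence the factor of 2.
    return distance / (2.0 * rate_per_site_per_year)


# Hypothetical numbers: 2% sequence divergence, 1e-9 substitutions/site/year.
print(divergence_time_years(0.02, 1e-9))
```

Here the estimate is roughly 10 million years. Halving the assumed rate doubles the estimate, which is why the rate approximation dominates the uncertainty.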

Cladistics compared with Linnaean taxonomy

Prior to the advent of cladistics, most taxonomists used Linnaean taxonomy to organize lifeforms. That traditional approach, still used by some researchers (especially in works intended for a general audience^[11]), uses several fixed levels of a hierarchy, such as Kingdom, Phylum, Class, Order, and Family. Cladistics does not use those terms, because one of its fundamental premises is that the evolutionary tree is very deep and very complex, so a fixed number of levels is not meaningful.

Linnaean taxonomy insists that groups reflect phylogenies, but in contrast to cladistics allows both monophyletic and paraphyletic groups as taxa. Since the early 20th century, Linnaean taxonomists have generally attempted to make genus- and lower-level taxa monophyletic.

Cladistics originated in the work of Willi Hennig, and since that time there has been a spirited debate^[12] about the relative merits of cladistics versus Linnaean classification.^[13] Some of the debates that the cladists engaged in had been running since the 19th century, but they entered these debates with a new fervor,^[14] as is evident from the Foreword to Hennig (1979), in which Rosen, Nelson, and Patterson wrote the following:

Encumbered with vague and slippery ideas about adaptation, fitness, biological species and natural selection, neo-Darwinism (summed up in the "evolutionary" systematics of Mayr and Simpson) not only lacked a definable investigatory method, but came to depend, both for evolutionary interpretation and classification, on consensus or authority. (Foreword, page ix)

Proponents of cladistics enumerate key distinctions between cladistics and Linnaean taxonomy as follows:^[15]

Cladistics	Linnaean Taxonomy
Treats all levels of the tree as equivalent.	Treats each tree level uniquely. Uses special names (such as Family, Class, Order) for each level.
Handles arbitrarily deep trees.	Often must invent new level-names (such as superorder, suborder, infraorder, parvorder, magnorder) to accommodate new discoveries. Biased towards trees about 4 to 12 levels deep.
Discourages naming or use of groups that are not monophyletic	Acceptable to name and use paraphyletic groups
Primary goal is to reflect actual process of evolution	Primary goal is to group species based on morphological similarities
Assumes that the shape of the tree will change frequently, with new discoveries	New discoveries often require re-naming or re-levelling of Classes, Orders, and Kingdoms
Definitions of taxa are objective, hence free from personal interpretation	Definitions of taxa require individuals to make subjective decisions. For example, various taxonomists suggest that the number of Kingdoms is two, three, four, five, or six (see Kingdom).
Taxa, once defined, are permanent (e.g. "taxon X comprises the most recent common ancestor of species A and B along with its descendants")	Taxa can be renamed and eliminated (e.g. Insectivora is one of many taxa in the Linnaean system that have been eliminated).

Proponents of Linnaean taxonomy contend that it has some advantages over cladistics, such as:^[16]

Cladistics	Linnaean Taxonomy
Limited to entities related by evolution or ancestry	Supports groupings without reference to evolution or ancestry
Does not include a process for naming species	Includes a process for giving unique names to species
Difficult to understand the essence of a clade, because clade definitions emphasize ancestry at the expense of meaningful characteristics	Taxa definitions based on tangible characteristics
Ignores sensible, clearly defined paraphyletic groups such as reptiles	Permits clearly defined groups such as reptiles
Difficult to determine if a given species is in a clade or not (e.g. if clade X is defined as "most recent common ancestor of A and B along with its descendants", then the only way to determine if species Y is in the clade is to perform a complex evolutionary analysis)	Straightforward process to determine if a given species is in a taxon or not
Limited to organisms that evolved by inherited traits; not applicable to organisms that evolved via complex gene-sharing or lateral transfer	Applicable to all organisms, regardless of evolutionary mechanism

Cladistics compared to phenetics

For some decades in the mid to late 20th century, a commonly used methodology was phenetics ("numerical taxonomy"). It can be seen as a predecessor^[17] to some methods of today's cladistics (namely distance-matrix methods such as neighbor-joining), but it made no attempt to resolve phylogeny, only similarities. Phenetic methods were considered cutting-edge in their time, being among the first bioinformatics applications. Today they have been superseded by cladistic analyses^[citation needed] because phenetics cannot provide an evolutionary hypothesis, except by chance.
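UPGMA (unweighted pair group method with arithmetic mean) is a classic phenetic clustering method. It is named here as an example and is not taken from the text above. UPGMA repeatedly merges the two most similar clusters and averages their distances, so it summarizes overall similarity. The resulting tree matches the true branching order only under restrictive assumptions, such as a constant rate of change across lineages. The minimal sketch below assumes trees are written as nested tuples.

```python
from typing import Dict, List, Sequence, Tuple, Union

Tree = Union[str, tuple]


def upgma(labels: Sequence[str], dist: List[List[float]]) -> Tree:
    """Cluster taxa by average pairwise distance (UPGMA) and return a
    nested-tuple tree. dist must be a symmetric matrix indexed like labels."""
    n = len(labels)
    if n == 0:
        raise ValueError("need at least one taxon")
    # cluster id -> (subtree, number of taxa in it)
    clusters: Dict[int, Tuple[Tree, int]] = {i: (labels[i], 1) for i in range(n)}
    # distances between clusters, keyed by (smaller id, larger id)
    d: Dict[Tuple[int, int], float] = {
        (i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)
    }
    next_id = n
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # the most similar pair of clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        del d[(a, b)]
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            # Size-weighted average = mean distance over all taxon pairs.
            d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


labels = ["A", "B", "C", "D"]
dist = [[0, 2, 6, 10],
        [2, 0, 6, 10],
        [6, 6, 0, 10],
        [10, 10, 10, 0]]
print(upgma(labels, dist))  # → ('D', ('C', ('A', 'B')))
```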

Monophyletic groups encouraged

Many cladists discourage the use of paraphyletic groups because they detract from cladistics' emphasis on clades (monophyletic groups). In contrast, proponents of the use of paraphyletic groups argue that any dividing line in a cladogram creates both a monophyletic section above and a paraphyletic section below. They also contend that paraphyletic taxa are necessary for classifying earlier sections of the tree – for instance, the early vertebrates that would someday evolve into the family Hominidae cannot be placed in any other monophyletic family. They also argue that paraphyletic taxa provide information about significant changes in organisms' morphology, ecology, or life history – in short, that both paraphyletic groups and clades are valuable notions with separate purposes.

Simplified step by step procedure

A simplified procedure for generating a cladogram is:^[19]

Gather and organize data
Consider possible cladograms
Select best cladogram

Step 1: Gather and organize data

A cladistic analysis begins with the following data:

a list of species to be organized
a list of characteristics to be compared
for each species, the value of each of the listed characteristics or character states

For example, if analyzing 20 species of birds, the data might be:

the list of 20 species
characteristics such as genome sequence, skeletal anatomy, biochemical processes, and feather coloration
for each of the 20 species, its particular genome sequence, skeletal anatomy, biochemical processes, and feather coloration
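The three inputs can be held in a simple character matrix, with one row per species and one column per character. Each cell holds a character state. The sketch below stores rows in a dictionary and checks that every species is scored for every character. The helper names, species, characters, and states are all hypothetical placeholders, not real data.

```python
from typing import Dict, Sequence

State = str  # e.g. "0", "1", "A", "C", "G", "T", or "?" for missing data


def validate_matrix(characters: Sequence[str],
                    matrix: Dict[str, Sequence[State]]) -> None:
    """Raise ValueError unless every species has one state per character."""
    for species, states in matrix.items():
        if len(states) != len(characters):
            raise ValueError(
                f"{species}: {len(states)} states for {len(characters)} characters")


def column(characters: Sequence[str], matrix: Dict[str, Sequence[State]],
           character: str) -> Dict[str, State]:
    """Return the state of one character for every species."""
    idx = list(characters).index(character)
    return {species: states[idx] for species, states in matrix.items()}


# Hypothetical data for illustration only.
characters = ["feather_coloration", "skeletal_trait", "enzyme_variant"]
matrix = {
    "species_1": ["red", "1", "A"],
    "species_2": ["red", "0", "A"],
    "species_3": ["blue", "0", "?"],
}
validate_matrix(characters, matrix)
print(column(characters, matrix, "skeletal_trait"))
```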

Molecular versus morphological data

The characteristics used to create a cladogram can be roughly categorized as either morphological (synapsid skull, warm-blooded, notochord, unicellular, etc.) or molecular (DNA, RNA, or other genetic information).^[19] Prior to the advent of DNA sequencing, all cladistic analysis used morphological data.

As DNA sequencing has become cheaper and easier, molecular systematics has become an increasingly popular way to reconstruct phylogenies.^[20] Using a parsimony criterion is only one of several methods to infer a phylogeny from molecular data; maximum likelihood and Bayesian inference, which incorporate explicit models of sequence evolution, are non-Hennigian ways to evaluate sequence data. Another powerful method of reconstructing phylogenies is the use of genomic retrotransposon markers, which are thought to be less prone to the problem of reversion that plagues sequence data. They are also generally assumed to have a low incidence of homoplasies, because their integration into the genome was once thought to be entirely random; however, this appears not always to be the case.
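The Jukes-Cantor (1969) model is one example of an explicit model of sequence evolution. It is not named in the text and is chosen here only because it is the simplest case. The model assumes all nucleotide substitutions are equally likely. It corrects the observed proportion of differing sites, p, into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The correction is needed because p underestimates the true amount of change when sites have changed more than once or reverted. The sketch below assumes gap-free, pre-aligned sequences, and the example sequences are made up.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites that differ (gaps are not handled)."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned and non-empty")
    diffs = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return diffs / len(seq1)


def jukes_cantor_distance(seq1: str, seq2: str) -> float:
    """Jukes-Cantor corrected distance in substitutions per site."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        # The correction is undefined once sequences are as different
        # as random sequences would be (p >= 3/4).
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))
print(jukes_cantor_distance("ACGTACGTAC", "ACGTTCGTAA"))
```

For these two sequences, p is 0.2 and the corrected distance is about 0.23. The correction grows rapidly as p approaches 3/4.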

Ideally, morphological, molecular, and possibly other phylogenies should be combined into an analysis of total evidence, because each type of data has different intrinsic sources of error. For example, character convergence (homoplasy) is much more common in morphological data than in molecular sequence data, but character reversions that cannot be recognized as such are more common in the latter (see long branch attraction). Morphological homoplasies can usually be recognized if character states are defined with enough attention to detail.

Plesiomorphies and synapomorphies

The researcher decides which character states were present before the last common ancestor of the species group (plesiomorphies) and which were present in the last common ancestor (synapomorphies) by considering one or more outgroups. This makes the choice of an outgroup an important task, since this choice can profoundly change the topology of a tree. Note that only synapomorphies are of use in characterising clades.
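The following is a minimal sketch of outgroup comparison. It assumes a single outgroup whose state is taken as plesiomorphic (ancestral) for the ingroup. Real analyses often use several outgroups and must resolve conflicts among them. A derived state shared by two or more ingroup species is a candidate synapomorphy for those species. The function name and all data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def polarize(characters: Sequence[str],
             ingroup: Dict[str, Sequence[str]],
             outgroup_states: Sequence[str]) -> List[Tuple[str, str, FrozenSet[str]]]:
    """Return (character, derived state, species sharing it) for every
    derived state found in at least two ingroup species.

    Assumes the outgroup's state is the plesiomorphic (ancestral) one."""
    shared = []
    for idx, character in enumerate(characters):
        ancestral = outgroup_states[idx]
        by_state: Dict[str, Set[str]] = {}
        for species, states in ingroup.items():
            state = states[idx]
            if state != ancestral and state != "?":  # derived, not missing
                by_state.setdefault(state, set()).add(species)
        for state, members in sorted(by_state.items()):
            if len(members) >= 2:  # shared derived state
                shared.append((character, state, frozenset(members)))
    return shared


characters = ["char_1", "char_2", "char_3"]
ingroup = {
    "sp_A": ["1", "1", "0"],
    "sp_B": ["1", "1", "1"],
    "sp_C": ["1", "0", "0"],
}
outgroup = ["0", "0", "0"]
for row in polarize(characters, ingroup, outgroup):
    print(row)
```

In this example, char_1 supports the ingroup as a whole and char_2 groups sp_A with sp_B. The derived state of char_3 occurs only in sp_B. Such a unique derived state (an autapomorphy) says nothing about grouping, so it is not reported.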

Avoid homoplasies

A homoplasy is a character that is shared by multiple species due to some cause other than common ancestry.^[21] Typically, homoplasies occur due to convergent evolution. Use of homoplasies when building a cladogram is sometimes unavoidable but is to be avoided when possible.

A well-known example of homoplasy due to convergent evolution would be a character "presence of wings". Though the wings of birds, bats, and insects serve the same function, each evolved independently, as their anatomy shows. If a bird, a bat, and a winged insect were scored for the character "presence of wings", a homoplasy would be introduced into the dataset, which confounds the analysis and may result in a false evolutionary scenario.

Homoplasies can often be avoided outright in morphological datasets by defining characters more precisely and increasing their number. In the example above, using "wings supported by bony endoskeleton" and "wings supported by chitinous exoskeleton" as characters would avoid the homoplasy between insect and vertebrate wings. Bird and bat wings would still need to be distinguished, for example by whether the flight surface is formed by feathers or by a skin membrane. When analyzing "supertrees" (datasets incorporating as many taxa of a suspected clade as possible), it may become unavoidable to introduce imprecise character definitions, as otherwise the characters might not apply at all to a large number of taxa. The "wings" example would hardly be useful for a phylogeny of all Metazoa, since most metazoans have no wings at all. Careful choice and definition of characters is thus another important element of cladistic analysis. With a faulty outgroup or character set, no method of evaluation is likely to produce a phylogeny representing the evolutionary reality.
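The pairwise compatibility test (sometimes called the four-gamete test) is one standard way to flag a likely homoplasy between two-state characters. It is named here as an illustration and is not taken from the text. If all four combinations 0/0, 0/1, 1/0, and 1/1 occur among the species, no tree can explain both characters with a single change each, so at least one of them must be homoplastic. The scoring below is a simplified, hypothetical illustration of the wings example.

```python
from typing import Dict, Tuple


def compatible(states: Dict[str, Tuple[int, int]]) -> bool:
    """Four-gamete test for two binary characters.

    states maps each species to its (character_1, character_2) scores.
    Returns False when all four combinations occur, meaning at least one
    character must change more than once on any tree (a homoplasy)."""
    combos = set(states.values())
    return not {(0, 0), (0, 1), (1, 0), (1, 1)} <= combos


# (presence of wings, vertebral column): simplified, hypothetical scoring
crude = {"bird": (1, 1), "insect": (1, 0), "lizard": (0, 1), "earthworm": (0, 0)}
# (wings supported by bony endoskeleton, vertebral column)
refined = {"bird": (1, 1), "insect": (0, 0), "lizard": (0, 1), "earthworm": (0, 0)}

print(compatible(crude))    # → False: "wings" conflicts with the backbone
print(compatible(refined))  # → True
```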

Step 2: Consider possible cladograms

Main article: Computational phylogenetics

When there are just a few species being organized, it is possible to do this step manually, but most cases require a computer program. There are scores of computer programs available to support cladistics.^[22] See phylogenetic tree for more information about tree-generating computer programs.

Because the total number of possible cladograms grows exponentially with the number of species, it is impractical for a computer program to evaluate every individual cladogram. A typical cladistic program begins by using heuristic techniques to identify a small number of candidate cladograms. Many cladistic programs then continue the search with the following repetitive steps:

Evaluate the candidate cladograms by comparing them to the characteristic data
Identify the best candidates that are most consistent with the characteristic data
Create additional candidates by creating several variants of each of the best candidates from the prior step
Use heuristics to create several new candidate cladograms unrelated to the prior candidates
Repeat these steps until the cladograms stop getting better
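The loop above can be sketched as a simplified, single-candidate version: hill climbing with random restarts. The sketch makes several assumptions that are not in the text:

- Trees are nested tuples.
- Candidates are scored by Fitch parsimony, the number of state changes a tree requires (lower is better).
- Variants are made by pruning one species and regrafting it on another branch.
- Restarts use randomly built trees as the "unrelated" candidates.

Real programs use more efficient rearrangements (such as NNI, SPR, and TBR) and much more bookkeeping. All names here are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple, Union

Tree = Union[str, tuple]  # a species name or a (left, right) pair


def leaves(tree: Tree) -> Set[str]:
    if isinstance(tree, str):
        return {tree}
    return leaves(tree[0]) | leaves(tree[1])


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch parsimony for one character: (possible states, changes needed)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_states, left_cost = fitch(tree[0], states)
    right_states, right_cost = fitch(tree[1], states)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1


def tree_length(tree: Tree, matrix: Dict[str, Sequence[str]]) -> int:
    """Evaluate a candidate: total changes over all characters."""
    n_chars = len(next(iter(matrix.values())))
    return sum(fitch(tree, {sp: row[i] for sp, row in matrix.items()})[1]
               for i in range(n_chars))


def insertions(tree: Tree, leaf: str) -> List[Tree]:
    """Every tree made by attaching leaf to one branch of tree."""
    result: List[Tree] = [(tree, leaf)]  # attach above the current root
    if not isinstance(tree, str):
        left, right = tree
        result += [(t, right) for t in insertions(left, leaf)]
        result += [(left, t) for t in insertions(right, leaf)]
    return result


def prune(tree: Tree, leaf: str) -> Tree:
    """Remove leaf from tree (which must have at least two leaves)."""
    left, right = tree
    if left == leaf:
        return right
    if right == leaf:
        return left
    if leaf in leaves(left):
        return (prune(left, leaf), right)
    return (left, prune(right, leaf))


def variants(tree: Tree) -> List[Tree]:
    """Create variants: prune each species and regraft it elsewhere."""
    return [t for leaf in sorted(leaves(tree))
            for t in insertions(prune(tree, leaf), leaf) if t != tree]


def random_tree(species: Sequence[str], rng: random.Random) -> Tree:
    """Heuristic source of new candidates unrelated to earlier ones."""
    order = list(species)
    rng.shuffle(order)
    tree: Tree = (order[0], order[1])
    for sp in order[2:]:
        tree = rng.choice(insertions(tree, sp))
    return tree


def search(matrix: Dict[str, Sequence[str]], restarts: int = 5,
           seed: int = 0) -> Tuple[Tree, int]:
    rng = random.Random(seed)
    best_tree, best_score = None, None
    for _ in range(restarts):
        tree = random_tree(sorted(matrix), rng)
        score = tree_length(tree, matrix)
        improved = True
        while improved:  # repeat until the tree stops getting better
            improved = False
            for candidate in variants(tree):
                candidate_score = tree_length(candidate, matrix)
                if candidate_score < score:
                    tree, score, improved = candidate, candidate_score, True
                    break
        if best_score is None or score < best_score:
            best_tree, best_score = tree, score
    return best_tree, best_score


# Hypothetical matrix: one string of character states per species.
data = {"A": "1100", "B": "1100", "C": "0011", "D": "0011", "E": "0000"}
print(search(data))
```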

Computer programs that generate cladograms use very computationally intensive algorithms,^[23] because finding the optimal cladogram under common criteria such as maximum parsimony is an NP-hard problem.

Step 3: Select the best cladogram

There are several algorithms available to identify the "best" cladogram.^[24] Most algorithms use a metric to measure how consistent a candidate cladogram is with the data and treat the search as an optimization problem, for example minimizing the number of character-state changes the cladogram requires.

In general, cladogram-generation algorithms must be implemented as computer programs, although some algorithms can be performed manually when the data sets are trivial (for example, just a few species and a couple of characteristics).

Some algorithms are useful only when the characteristic data is molecular (DNA, RNA) data. Other algorithms are useful only when the characteristic data is morphological data. Still others can be used when the characteristic data includes both molecular and morphological data.

Algorithms for cladograms include least squares, neighbor-joining, parsimony, maximum likelihood, and Bayesian inference.

Biologists sometimes use the term parsimony for a specific kind of cladogram-generation algorithm and sometimes as an umbrella term for all cladogram algorithms.^[25]

Algorithms that perform optimization tasks (such as building cladograms) can be sensitive to the order in which the input data (the list of species and their characteristics) is presented. Inputting the data in various orders can cause the same algorithm to produce different "best" cladograms. In these situations, the user should input the data in various orders and compare the results.

Using different algorithms on a single data set can sometimes yield different "best" cladograms, because each algorithm may have a unique definition of what is "best".

Because of the astronomical number of possible cladograms, algorithms cannot guarantee that the solution is the overall best solution. A non-optimal cladogram will be selected if the program settles on a local minimum rather than the desired global minimum.^[26] To help solve this problem, many cladogram algorithms use a simulated annealing approach to increase the likelihood that the selected cladogram is the optimal one.^[27]
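The annealing idea can be sketched generically. A worse candidate is still accepted with probability exp(-Δ/T), where Δ is how much worse it scores. The "temperature" T is lowered gradually, which lets the search escape local minima early on and behave greedily at the end. In a cladistics setting, the hypothetical `neighbour` argument would be a tree rearrangement and `cost` a tree score such as parsimony length. The geometric cooling schedule and all parameter values are illustrative choices, not those of any particular program.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def anneal(initial: S,
           neighbour: Callable[[S, random.Random], S],
           cost: Callable[[S], float],
           t_start: float = 2.0,
           t_end: float = 0.01,
           cooling: float = 0.95,
           steps_per_t: int = 50,
           seed: int = 0) -> Tuple[S, float]:
    """Simulated annealing that minimizes cost; returns the best state seen."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            candidate = neighbour(current, rng)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / t) (the Metropolis rule).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= cooling  # geometric cooling schedule
    return best, best_cost


# Toy usage: a stand-in cost with its minimum at 7, not a tree problem.
best, best_cost = anneal(0, lambda x, rng: x + rng.choice((-1, 1)),
                         lambda x: (x - 7) ** 2)
print(best, best_cost)
```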

How complex is the Tree of Life?

One of the arguments in favor of cladistics is that it supports arbitrarily complex, arbitrarily deep trees. Especially when extinct species are considered (both known and unknown), the complexity and depth of the tree can be very large. Every single speciation event, including all the species that are now extinct, represents an additional fork on the hypothetical, complete cladogram representing the full tree of life. Fractals can be used to represent this notion of increasing detail: as a viewpoint zooms into the tree of life, the complexity remains virtually constant^[28]. This great complexity of the tree, and the uncertainty associated with the complexity, is one of the reasons that cladists cite for the attractiveness of cladistics over traditional taxonomy.

Proponents of non-cladistic approaches to taxonomy point to punctuated equilibrium to bolster the case that the tree of life has a finite depth and finite complexity. If the number of species currently alive is finite, and the number of extinct species that we will ever know about is finite, then the depth and complexity of the tree of life are bounded, and there is no need to handle arbitrarily deep trees.

Phylocode approach to naming species

A formal code of phylogenetic nomenclature, the PhyloCode,^[29] is currently under development for cladistic taxonomy. It is intended for use both by those who would like to abandon Linnaean taxonomy and by those who would like to use taxa and clades side by side. In several instances (see for example Hesperornithes), it has been used to clarify uncertainties in Linnaean systematics. In combination, the two yield a taxonomy that unambiguously places the group in the evolutionary tree in a way consistent with current knowledge.

Terminology

Main article: Phylogenetic nomenclature

A clade is an ancestor species and all of its descendants
A monophyletic group is a clade
A paraphyletic group consists of an ancestor and some, but not all, of its descendants, i.e. a monophyletic group with some descendants excluded (e.g. reptiles are sauropsids excluding birds). Most cladists discourage the use of paraphyletic groups.
A polyphyletic group is a group consisting of members from two non-overlapping monophyletic groups (e.g. flying animals). Most cladists discourage the use of polyphyletic groups.
An outgroup is an organism that is considered not to be part of the group in question, but is closely related to the group.
A characteristic that is present in both the outgroups and in the ancestors is called a plesiomorphy (meaning "close form", also called an ancestral state).
A characteristic that occurs only in later descendants is called an apomorphy (meaning "separate form", also called a "derived" state) for that group. Note: The adjectives plesiomorphic and apomorphic are used instead of "primitive" and "advanced" to avoid placing value-judgments on the evolution of the character states, since both may be advantageous in different circumstances. It is not uncommon to refer informally to a collective set of plesiomorphies as a ground plan for the clade or clades they refer to.
A species or clade is basal to another clade if it holds more plesiomorphic characters than that other clade. Usually a basal group is very species-poor compared to a more derived group. A basal group need not be extant. For example, palaeodicots are basal to the other flowering plants.
A clade or species located within another clade is said to be nested within that clade.

Origin of the term "cladistics"

Hennig's major book, even the 1979 version, does not contain the term cladistics in its index. He referred to his own approach as phylogenetic systematics, as the book's title indicates. A review paper by Dupuis observes that the term clade was introduced in 1958 by Julian Huxley, cladistic by Cain and Harrison in 1960, and cladist (for an adherent of Hennig's school) by Mayr in 1965.^[30]

Three definitions of clade

There are three ways to define a clade for use in a cladistic taxonomy.^[31]

Node-based: the most recent common ancestor of A and B along with all of its descendants.

Stem-based: all descendants of the oldest common ancestor of A and B that is not also an ancestor of Z.

Apomorphy-based: the most recent common ancestor of A and B, along with all of its descendants, possessing a certain derived character. This definition is generally discouraged by most cladists.
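Once a tree is given, the node-based and stem-based definitions can be applied mechanically. The sketch below assumes trees are written as nested tuples with species at the leaves, a convention chosen here for illustration. It treats the tree's internal nodes as the ancestors and returns the species each definition includes. A, B, C, D, and Z are placeholder names, and the apomorphy-based definition is not covered because it needs character data as well as a tree.

```python
from typing import Optional, Set, Tuple, Union

Tree = Union[str, tuple]
Path = Tuple[int, ...]  # child indices from the root down to a node


def path_to(tree: Tree, leaf: str) -> Path:
    """Child indices from the root down to leaf; KeyError if absent."""
    def walk(node: Tree, prefix: Path) -> Optional[Path]:
        if isinstance(node, str):
            return prefix if node == leaf else None
        for i, child in enumerate(node):
            found = walk(child, prefix + (i,))
            if found is not None:
                return found
        return None
    found = walk(tree, ())
    if found is None:
        raise KeyError(leaf)
    return found


def subtree(tree: Tree, path: Path) -> Tree:
    for i in path:
        tree = tree[i]
    return tree


def leaves(tree: Tree) -> Set[str]:
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def common_prefix(p: Path, q: Path) -> Path:
    n = 0
    while n < min(len(p), len(q)) and p[n] == q[n]:
        n += 1
    return p[:n]


def node_based(tree: Tree, a: str, b: str) -> Set[str]:
    """Most recent common ancestor of a and b plus all its descendants."""
    mrca = common_prefix(path_to(tree, a), path_to(tree, b))
    return leaves(subtree(tree, mrca))


def stem_based(tree: Tree, a: str, b: str, z: str) -> Set[str]:
    """All descendants of the oldest common ancestor of a and b that is
    not also an ancestor of z."""
    ab = common_prefix(path_to(tree, a), path_to(tree, b))
    path_z = path_to(tree, z)
    # Common ancestors of a and b are the prefixes of ab; take the shortest
    # (oldest) one that does not also lie on the path to z.
    for depth in range(len(ab) + 1):
        node = ab[:depth]
        if path_z[:depth] != node:
            return leaves(subtree(tree, node))
    raise ValueError("z descends from every common ancestor of a and b")


tree = (((("A", "B"), "C"), "D"), "Z")
print(sorted(node_based(tree, "A", "B")))       # → ['A', 'B']
print(sorted(stem_based(tree, "A", "B", "Z")))  # → ['A', 'B', 'C', 'D']
```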

Applying Cladistics to other disciplines

The processes used to generate cladograms are not limited to the field of biology.^[32] The generic nature of cladistics means that it can be used to organize groups of items in many different realms. The only requirement is that the items have characteristics that can be identified and measured.

For example, one could take a group of 200 spoken languages, measure various characteristics of each language (vocabulary, phonemes, rhythms, accents, dynamics, etc.), and then apply a cladogram algorithm to the data. The result would be a tree that may shed light on how, and in what order, the languages came into existence.

Thus, cladistic methods have recently been usefully applied to non-biological systems, including determining language families in historical linguistics, culture, history^[33], and filiation of manuscripts in textual criticism.
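Here is a hypothetical sketch of the first step in the language example above. Each language is scored for a set of two-state characters, for example whether it has a particular feature. The pairwise proportion of differing characters could then be fed to a distance-based method such as neighbor-joining. The language names and scores below are placeholders, not linguistic data. Distance methods are only one option, since parsimony or likelihood methods can use the character matrix directly.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple


def mismatch_distances(matrix: Dict[str, Sequence[int]]) -> Dict[Tuple[str, str], float]:
    """Proportion of characters on which each pair of items differs."""
    result = {}
    for x, y in combinations(sorted(matrix), 2):
        a, b = matrix[x], matrix[y]
        if len(a) != len(b):
            raise ValueError(f"{x} and {y} are scored for different characters")
        result[(x, y)] = sum(i != j for i, j in zip(a, b)) / len(a)
    return result


# Placeholder scores: 1 = has the feature, 0 = lacks it.
languages = {
    "lang_A": [1, 1, 0, 1, 0, 0],
    "lang_B": [1, 1, 0, 1, 1, 0],
    "lang_C": [0, 0, 1, 0, 1, 1],
}
for pair, d in mismatch_distances(languages).items():
    print(pair, round(d, 3))
```

Continuous traits such as rhythm or accent would first need to be coded as discrete states before they fit this kind of matrix.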

References

Ashlock, Peter D. (1974). "The uses of cladistics". Annual Review of Ecology and Systematics 5: 81-99. ISSN 0066-4162.
Cuénot, Lucien (1940). "Remarques sur un essai d'arbre généalogique du règne animal". Comptes Rendus de l'Académie des Sciences de Paris 210: 23-27. Available free online at http://gallica.bnf.fr (No direct URL). This is the paper credited by Hennig (1979) for the first use of the term 'clade'.
Cavalli-Sforza, L.L. and A.W.F. Edwards (1967). "Phylogenetic analysis: Models and estimation procedures". Evolution 21 (3): 550-570.
de Queiroz, Kevin (1992). "Phylogenetic taxonomy". Annual Review of Ecology and Systematics 23: 449–480. ISSN 0066-4162.
Dupuis, Claude (1984). "Willi Hennig's impact on taxonomic thought". Annual Review of Ecology and Systematics 15: 1-24. ISSN 0066-4162.
Felsenstein, Joseph (2004). Inferring phylogenies. Sunderland, MA: Sinauer Associates. ISBN 0-87893-177-5.
Hamdi, Hamdi; Hitomi Nishio, Rita Zielinski and Achilles Dugaiczyk (1999). "Origin and phylogenetic distribution of Alu DNA repeats: irreversible events in the evolution of primates". Journal of Molecular Biology 289: 861–871. PMID 10369767.
Hennig, Willi (1950). Grundzüge einer Theorie der Phylogenetischen Systematik. Berlin: Deutscher Zentralverlag.
Hennig, Willi (1982). Phylogenetische Systematik (ed. Wolfgang Hennig). Berlin: Blackwell Wissenschaft. ISBN 3-8263-2841-8.
Hennig, Willi (1975). "'Cladistic analysis or cladistic classification': a reply to Ernst Mayr". Systematic Zoology 24: 244-256. The paper he was responding to is reprinted in Mayr (1976).
Hennig, Willi (1966). Phylogenetic systematics (tr. D. Dwight Davis and Rainer Zangerl). Urbana, IL: Univ. of Illinois Press (reprinted 1979 and 1999). ISBN 0-252-06814-9.
Hennig, Willi (1979). Phylogenetic systematics (3rd edition of 1966 book). ISBN 0-252-06814-9. Translated from manuscript and so never published in German.
Hull, David L. (1979). "The limits of cladism". Systematic Zoology 28: 416-440.
Kitching, Ian J.; Peter L. Forey, Christopher J. Humphries and David M. Williams (1998). Cladistics: Theory and practice of parsimony analysis, 2nd ed., Oxford University Press. ISBN 0-19-850138-2.
Luria, Salvador; Stephen Jay Gould and Sam Singer (1981). A view of life. Menlo Park, CA: Benjamin/Cummings. ISBN 0-8053-6648-2.
Mayr, Ernst (1982). The growth of biological thought: diversity, evolution and inheritance. Cambridge, MA: Harvard Univ. Press. ISBN 0-674-36446-5.
Mayr, Ernst (1976). Evolution and the diversity of life (Selected essays). Cambridge, MA: Harvard Univ. Press. ISBN 0-674-27105-X. Reissued 1997 in paperback. Includes a reprint of Mayr's 1974 anti-cladistics paper at pp. 433-476, "Cladistic analysis or cladistic classification." This is the paper to which Hennig (1975) is a response.
Patterson, Colin (1982). "Morphological characters and homology". Joysey, Kenneth A; A. E. Friday (editors) Problems in Phylogenetic Reconstruction, London: Academic Press.
Rosen, Donn; Gareth Nelson and Colin Patterson (1979), Foreword provided for Hennig (1979)
Shedlock, Andrew M; Norihiro Okada (2000). "SINE insertions: Powerful tools for molecular systematics". Bioessays 22: 148–160. ISSN 0039-7989. PMID 10655034.
Sokal, Robert R. (1975). "Mayr on cladism -- and his critics". Systematic Zoology 24: 257-262.
Swofford, David L.; G. J. Olsen, P. J. Waddell and David M. Hillis (1996). "Phylogenetic inference", in Hillis, David M; C. Moritz and B. K. Mable (editors): Molecular Systematics, 2. ed., Sunderland, MA: Sinauer Associates. ISBN 0-87893-282-8.
Wiley, Edward O. (1981). Phylogenetics: The Theory and Practice of Phylogenetic Systematics. New York: Wiley Interscience. ISBN 0-471-05975-7.
Zwickl DJ, Hillis DM (2002). "Increased taxon sampling greatly reduces phylogenetic error". Systematic Biology 51: 588-598.


This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Cladistics". A list of authors is available in Wikipedia.