Evolutionary channel capacity

From NECSIWiki


What is the Shannon Channel Capacity of Evolution by Natural Selection?

Jonathan Vos Post

Computer Futures, Inc.



Introduction to Problem, Terminology, and Literature

In this introductory section, we must define key terms from three different literatures: Biology, Shannon’s Communications Theory, and Digitally Simulated Evolution by Natural Selection. In each case, we have an executive summary (short) definition in the body of the paper, and a more complete discussion in an associated appendix, plus citations specific to that definition.

Biology

The central theory of Biology is Evolution. There is a vast literature on Evolution, most far beyond the scope of this paper. Instead, we limit our discussion to key terms essential to understanding the mathematical theory, and its relationship to actual real-world Biology.

Evolution by Natural Selection

Charles Darwin outlined his theory of evolution by natural selection during the period between 1842 and 1844, as an explanation for adaptation and speciation. He defined natural selection as the “principle by which each slight variation [of a trait], if useful, is preserved” [Darwin, Origin of Species, Chapter 3, page 61].

The concept of evolution by natural selection seems simple, even obvious, to us today, and is stated in ordinary language as “individuals best adapted to their environments are more likely to survive and reproduce.” [Darwin, Origin of Species, Chapter 3, page 62]. So long as there is some variation between individuals, there will inevitably be a selection of individuals with the most advantageous variations. If the variations are inherited (by mechanisms which Darwin did not know, and Mendel later discovered), then differential reproductive success will lead to a progressive evolution of particular populations of a species, and populations that evolve to be sufficiently different sometimes, over time and at rates differing from place to place, become different species.

Darwin considered that “natural selection” was a synonym for evolution by natural selection. It is true, and we address it in section 1.1.9, that other mechanisms of evolution such as evolution by genetic drift were not explicitly formulated in the 1800s. Darwin realized: “I am convinced that [natural selection] has been the main, but not exclusive means of modification.” [Origin of Species, Introduction, page 6]. Scientists today use “natural selection” more often to describe the mechanism than the theory. Consequently, “natural selection” includes any selection by a natural agent, including “sexual selection” and “kin selection.” Sexual selection is often distinguished from natural selection, but a more useful distinction to many biologists is between sexual selection and ecological selection. Ecology as such is outside the scope of this paper, but is mentioned in Section 4 as necessary to extend this theory.

Short definition: {to be done}

Citations specific to that definition: many citations in the References; {to be done}

Organism

Short definition: In this paper, an organism is the entity that has a chromosome made of genes, and this chromosome indirectly interacts with a virtual environment, so as to have an assigned fitness.

The Merriam Webster Dictionary gives the ordinary usage of the noun

  1. a complex structure of interdependent and subordinate elements whose relations and properties are largely determined by their function in the whole;
  2. an individual constituted to carry on the activities of life by means of organs separate in function but mutually dependent : a living being.

We immediately see the crux of the problem, in this Janus-faced definition in terms of both individual and structure of interrelationships. That’s why we use a carefully constructed but somewhat technical and artificial definition in this paper.

As pointed out [Wikipedia, “Life”]:

“Life, in its most generic definition, is a quality of matter. Matter that is ‘alive’ forms organisms of vast variety. Properties common to the known organisms found on Earth (plants, animals, fungi, protists, archaea, and bacteria) are that they are carbon-and-water based, are cellular with complex organization, undergo metabolism, possess a capacity to grow, respond to stimuli, reproduce and, through natural selection, adapt in succeeding generations. An entity with the above properties is considered to be an organism. However, not every definition of life considers all of these properties to be essential. For example, the capacity for descent with modification is often taken as the only essential property of life. This definition notably includes viruses, which do not qualify under narrower definitions as they are acellular and do not metabolise. Broader definitions of life may also include theoretical non-carbon-based life and other alternative biology.”

It is also worth noting that non-reproducing individuals may still help the spread of their genes through such mechanisms as kin selection.

That is a good first cut at a definition, except that in this paper we abstract away the specifics of life on Earth, and indeed cellular carbon-based life altogether. Although my doctoral dissertation research [Post 1973-2006] emphasizes mathematical and computational models of the dynamics of metabolisms, this paper is concerned only with modeling natural selection itself. Hence we consider an organism to be the entity that has a chromosome made of genes, and this chromosome is somehow abstractly turned into a phenotype that we need not consider, except in so far as it is considered to interact with a virtual environment, and thus have an assigned fitness. At this level of abstraction, our organism might be living on the surface of a neutron star, as in [Forward, 1980], or be made of a dilute ambiplasma of electrons and positrons [Post, 1991-92, "Human Destiny and the End of Time"; Benford, 1977-1995], or be entirely a digital simulation, as discussed in Section 1.3.

The systemic definition is that living things are self-organizing and autopoietic (self-producing). These objects are not to be confused with dissipative structures (e.g. fire). Variations of this systemic definition include Stuart Kauffman’s definition of life as an autonomous agent or a multi-agent system capable of reproducing itself or themselves, and of completing at least one thermodynamic work cycle.

Citations specific to that definition: {to be done}

Population

Short definition: For the purpose of the paper, a population is a finite set of organisms reproducing and undergoing natural selection according to specified probabilistic operations of a genetic algorithm. At any given unit of time, there are an integer number N of organisms in the population, some of them identical, some of them different in varying degrees. A population is the smallest biological unit that can evolve. The set of all the genes in the population is, loosely, called the genome. For our purposes, evolution by natural selection is a process that results in heritable changes in a population spread over many generations.

There are three related definitions in scientific and popular usage:

  1. (Science: genetics) a stable group of randomly interbreeding individuals.
  2. (Science: statistics) The set of objects or individuals from which a random sample is drawn.
  3. (popular) group of individuals of the same species living in the same area at the same time.

Population is sometimes, by abuse, used to mean “the total number of organisms in a species or any other measurement of numbers considering biological taxon.”

The historical origin of the word is the Latin “Populatio”, from “populus” = people.

Citations specific to that definition: {to be done}

Gene, Allele, and Chromosome

Short definition: In this paper, we abstract away a great deal of real biology, and say merely that “each organism in the population has a chromosome consisting of a linear sequence of genes, which we may model as bytes or characters in a finite alphabet. There is a finite set of probabilistic operations in a genetic algorithm that kill off some organisms and the genes in them, and allow sexual reproduction between pairs of organisms in the population, which mutate and reshuffle the genes in various ways, and randomly create some new genes which may not have existed before. Each string of genes on a chromosome is assigned a fitness, which determines its reproductive success in competition with other organisms in the population.”
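The operations named in this short definition can be sketched in a few lines. This is only an illustration under stated assumptions: the four-letter alphabet, the per-gene mutation rate, the one-point crossover scheme, and the function names are all hypothetical choices, not the specific genetic algorithm analyzed in this paper.

```python
import random

ALPHABET = "ACGT"  # hypothetical finite gene alphabet; any finite symbol set would do


def random_chromosome(length, rng):
    """Return a chromosome: a linear sequence (string) of genes drawn from ALPHABET."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def mutate(chromosome, rate, rng):
    """Replace each gene, independently with probability `rate`, by a random symbol."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else gene
        for gene in chromosome
    )


def crossover(mother, father, rng):
    """One-point crossover: a prefix of one parent joined to the suffix of the other."""
    if len(mother) != len(father):
        raise ValueError("parents must have equal-length chromosomes")
    point = rng.randint(0, len(mother))  # cut position, inclusive at both ends
    return mother[:point] + father[point:]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
mom = random_chromosome(12, rng)
dad = random_chromosome(12, rng)
child = mutate(crossover(mom, dad, rng), rate=0.05, rng=rng)
print(mom, dad, child)
```

Representing the chromosome as an immutable string keeps each operation a pure function from parents to offspring, which makes the probabilistic steps easy to reason about separately.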

According to [Griffiths, 2002]: “A gene is an operational region of the chromosomal DNA, part of which can be transcribed into a functional RNA at the correct time and place during development. Thus, the gene is composed of the transcribed region and adjacent regulatory regions.”

As commented upon by P. Z. Myers, and do see the illustrations in and comments appended to the blog linked to from [Myers, 2007]:

“So we [humans] have long strings of DNA organized into chromosomes in each of our cells, and certain portions of that DNA will be copied or transcribed into RNA strands by various proteins in the nucleus. Which parts will be transcribed will depend in part on what proteins are present in a particular cell; the proteins have to bind to specific regions in the DNA to initiate the protein machinery to do the work of copying, and that machinery also recognizes certain regions of the DNA as places to stop copying. We have approximately 25,000 genes; the emphasis is on the ‘approximately’ because one of the ways we identify genes is by looking for the punctuation marks of the start and stop regions, and there's a lot of random punctuation scattered throughout the genome. The hypothetical designer must be a very poor copy editor.”

[The gene] “has a few general parts. It's on a strand of DNA, which you'll have to imagine going off the screen to the left and right for a few miles in either direction. There is a regulatory region for transcription initiation (more about that in a little bit) which, if we include various enhancers and repressors, may stretch for many thousands of base pairs, with important short areas for regulation scattered throughout; one serious flaw with this diagram is that the regulatory regions comprise roughly twice as much DNA as the coding regions.

The part of the gene that is actually transcribed is broken up into regions called introns and exons. Introns aren't going to be part of the final gene product, usually; enzymes are going to cut them out of the RNA and splice together those dark green exons to make the final functional RNA.”

“One last thing: I also took a look at the other common web source for definitions of basic concepts, Wikipedia. Here's the first line of the Wikipedia entry for "gene":

‘A gene is the unit of heredity, with each gene determining one inherited feature of an organism.’

“That is completely wrong. ‘One gene, one character’ is a false idea of the relationship of genes to inheritance, since many genes contribute to the appearance of a single feature, and one gene will play a role in many different features.”

The related key word is Allele: “Alternative form of a gene. One of the different forms of a gene that can exist at a single locus” where locus means the location of the gene on the chromosome of the organism.

Citations specific to that definition: {to be done}

(1) To clarify what I said about "genes" being either coding (of a protein) or non-coding: “Non-coding DNA is, and always has been, the DNA that doesn’t encode polypeptides. Scientists have always known that many essential sequences are present in non-coding DNA; regulatory sequences, centromeres, telomeres, SARs, origins of replication, ribosome binding sites, polyadenylation sites, transcription termination sites, etc. etc.” [1]

[note that my model, as originally blogged in pieces, is being discussed/linked to from this other anti-Creationist evolution blog]

(2) Adam: "I thought codons were in threes."

Clarification: a codon consists of 3 nucleotides (3 base pairs in double-stranded DNA, 3 bases in single-stranded RNA). The 64 possible codons map to 20 amino acids, plus punctuation (stop signals that code for no amino acid).
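The count of 64 follows from four bases taken three at a time. A small sketch makes the arithmetic explicit; the three stop codons listed are those of the standard genetic code, and the variable names are illustrative only.

```python
from itertools import product

BASES = "UCAG"  # RNA bases; DNA uses T in place of U
STOP_CODONS = {"UAA", "UAG", "UGA"}  # stop signals in the standard genetic code

codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
sense_codons = [c for c in codons if c not in STOP_CODONS]

print(len(codons))        # 64 possible triplets (4 ** 3)
print(len(sense_codons))  # 61 triplets that encode one of the 20 amino acids
```

Since 61 codons encode only 20 amino acids, the code is redundant: most amino acids are specified by more than one codon.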


For a more detailed discussion on the definition of “gene” see Appendix 1.1.4 Gene, Allele, and Chromosome.

Genotype and Phenotype

Short definition: [Lewontin, 2004]: “The distinction between phenotype and genotype is fundamental to the understanding of heredity and development of organisms. The genotype of an organism is the class to which that organism belongs as determined by the description of the actual physical material made up of DNA that was passed to the organism by its parents at the organism’s conception. For sexually reproducing organisms that physical material consists of the DNA contributed to the fertilized egg by the sperm and egg of its two parents. For asexually reproducing organisms, for example bacteria, the inherited material is a direct copy of the DNA of its parent. The phenotype of an organism is the class to which that organism belongs as determined by the description of the physical and behavioral characteristics of the organism, for example its size and shape, its metabolic activities and its pattern of movement.”

“It is essential to distinguish the descriptors of the organism, its genotype and phenotype, from the material objects that are being described. The genotype is the descriptor of the genome which is the set of physical DNA molecules inherited from the organism's parents. The phenotype is the descriptor of the phenome, the manifest physical properties of the organism, its physiology, morphology and behavior.”

In this paper’s model, we need only look at the genotypes. We assume that each chromosome, as a genotype, is imagined to be interpreted into a phenotype in some simulation, which interacts with other organisms and an external environment in an arbitrarily complicated way, resulting in a single scalar value between 0 and 1 being assigned as a fitness, and that fitness value incorporates everything the model needs to know to accomplish evolution by natural selection. The artificial, biologically absurd appearance is that we seem to go directly from genotype to fitness, without ever having an actual phenotype, physical body, life, life activity, ecosystem, or anything. But that is all we need for the mathematical model.
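As a hedged illustration of going "directly from genotype to fitness", the sketch below assigns each chromosome a scalar in [0, 1] and uses it to pick a parent. The target-matching fitness function and fitness-proportional selection are assumptions made for the example; the model only requires that some such scalar exists.

```python
import random

TARGET = "ACGTACGT"  # hypothetical optimum, used only to make the example concrete


def fitness(chromosome):
    """Map a genotype directly to a scalar in [0, 1]: fraction of genes matching TARGET.

    Assumes the chromosome has the same length as TARGET.
    """
    matches = sum(a == b for a, b in zip(chromosome, TARGET))
    return matches / len(TARGET)


def select_parent(population, rng):
    """Fitness-proportional choice; falls back to uniform if every fitness is zero."""
    weights = [fitness(c) for c in population]
    if sum(weights) == 0:
        return rng.choice(population)
    return rng.choices(population, weights=weights, k=1)[0]


population = ["ACGTACGT", "ACGTTTTT", "TTTTTTTT"]
rng = random.Random(1)
print([fitness(c) for c in population])
print(select_parent(population, rng))
```

No phenotype appears anywhere in the code: whatever development, behavior, and ecology would contribute is folded into the single number returned by `fitness`.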

Citations specific to that definition: {to be done}

Fitness

Short definition from [Wilkins]: “Fitness is a property of a competing variant in a population. It means that X, whatever it might be biologically, is increasing in its frequency in a population faster than its competing variants. X can be a gene, or a trait, or even an entire organism's form and functionality.”

Citations specific to that definition: In this paper, see the definitions of population, gene, organism, fitness landscape.

Fitness Landscape

Short definition: In evolutionary biology, fitness landscapes (also called adaptive landscapes) are used to visualize the relationship between genotypes (or phenotypes) and reproductive success (Darwinian fitness). It is assumed that every genotype has a well defined replication rate relative to the population in which it is embedded (this relative rate often referred to as fitness). This fitness is the “height” of the landscape. Genotypes which are very similar are said to be “close” to each other, while those that are very different are “far” from each other (where distance defines metric space). The two concepts of height and distance are sufficient to form the concept of a “landscape”. The set of all possible genotypes, their degree of similarity, and their related fitness values is then called a fitness landscape.

In evolutionary optimization problems, fitness landscapes are evaluations of a fitness function for all candidate solutions.
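The two ingredients of a landscape, distance and height, can be made concrete on a tiny genotype space. The sketch below assumes binary genotypes of length 3, Hamming distance as the metric, and an invented fitness function chosen so that the landscape has more than one local peak.

```python
from itertools import product


def hamming(a, b):
    """Distance between two equal-length genotypes: number of differing positions."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(genotype, alphabet="01"):
    """All genotypes at Hamming distance exactly 1."""
    for i, gene in enumerate(genotype):
        for symbol in alphabet:
            if symbol != gene:
                yield genotype[:i] + symbol + genotype[i + 1:]


def local_peaks(landscape):
    """Genotypes whose fitness is strictly higher than that of every neighbor."""
    return sorted(
        g for g, height in landscape.items()
        if all(height > landscape[n] for n in neighbors(g))
    )


def toy_fitness(g):
    """Invented heights: the count of 1s, plus a bonus for the all-zero genotype."""
    return 2.5 if set(g) == {"0"} else g.count("1")


genotypes = ["".join(bits) for bits in product("01", repeat=3)]
landscape = {g: toy_fitness(g) for g in genotypes}
print(local_peaks(landscape))  # two peaks: the all-zero and all-one genotypes
```

A population sitting at the all-zero genotype cannot reach the higher all-one peak by single mutations without first losing fitness, which is the sense in which a multi-peaked landscape traps a hill-climbing population.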

The idea of studying evolution by visualizing the distribution of fitness values as a kind of landscape was first introduced by Sewall Wright in 1932.

As summarized in [Wikipedia, “Fitness Landscape”]:

“Fitness landscapes are often conceived of as ranges of mountains. There exist local peaks (points from which all paths are downhill, i.e. to lower fitness) and valleys (regions from which most paths lead uphill). A fitness landscape with many local peaks surrounded by deep valleys is called rugged. If all genotypes have the same replication rate, on the other hand, a fitness landscape is said to be flat. The shapes of fitness landscapes are also closely related to epistasis. This connection was made precise by Niko Beerenwinkel, Lior Pachter and Bernd Sturmfels in 2006.”

“An evolving population typically climbs uphill in the fitness landscape, by a series of small genetic changes, until a local optimum is reached. There it remains, unless a rare mutation opens a path to a new, higher fitness peak. Note, however, that at high mutation rates this picture is somewhat simplistic. A population may not be able to climb a very sharp peak if the mutation rate is too high, or it may drift away from a peak it had already found; consequently, reducing the fitness of the system. The process of drifting away from a peak is often referred to as Muller's ratchet.”

“In general, the higher the connectivity the more rugged the system becomes. Thus, a simply connected system only has one peak and if part of the system is changed then there will be little, if any, effect on any other part of the system. A high connectivity implies that the variables or sub-systems interact far more and the system may have to settle for a level of ‘fitness’ lower than it might be able to attain. The system would then have to change its approach to overcoming whatever problems that confront it, thus, changing the ‘terrain’ and enabling it to continue.”

To expand on the epistasis citation above:

JVP follow-up at: http://scienceblogs.com/pharyngula/2007/01/the_more_ignorant_you_are_the

Peter M. Huggins, Lior Pachter, and Bernd Sturmfels, "Towards the Human Genotope", received 26 Sep 2006, Abstracts of the AMS, Vol. 28, No. 1, Issue 147, 2007, 1023-11-1803, p.211.

Given a collection of genotypes, their genotope is the polytope defined as the convex hull of all allele frequency vectors that can arise from populations over the collection of genotypes. On the theoretical front, Beerenwinkel et al. have shown that regular subdivisions of genotopes encode shapes of fitness landscapes and generalize the concept of epistasis to arbitrary numbers of genes. Now on the practical side we aim to show that it is computationally feasible to compute certain projections of subpolytopes of the human genotope. We report on three classes of low-dimensional projections: projections specified by principal component analysis, by restriction to few SNPs, and by archetypal analysis.

Posted by: Jonathan Vos Post

Citations specific to that definition: [Beerenwinkel, 2006], [Dawkins, 1996], [Fellman, 2004], [Kauffman, 1993], [Kauffman, 1995], [Mitchell, 1996], [Wright, 1932].

Hardy-Weinberg Equilibrium

Short definition: Named for Godfrey Hardy (1877-1947) and Wilhelm Weinberg (1862-1937), this is a set of ten simplifying assumptions under which, as a default, evolution does not occur in a population. Based on this, they derived statistical and algebraic formulae for the rate at which evolution can occur.

More technically: under certain conditions, after one generation of random mating, the genotype frequencies at a single gene locus will become fixed at a particular equilibrium value. Further, those equilibrium frequencies can be represented as a simple function of the allele frequencies at that locus.

If one defines evolution as the total of the genetically inherited changes in the individuals who are the members of a population’s gene pool, then it is clear that the effects of evolution are felt by individuals, but it is the population as a whole that actually evolves. Evolution, as considered in traditional mathematical analysis, is just a change in frequencies of alleles in the gene pool of a population.

For example, following the treatment of [O’Neil, 2006] assume that there is a trait that is determined by the inheritance of a gene with two alleles--B and b. If the parent generation has 92% B and 8% b and their offspring collectively have 90% B and 10% b, we may conclude that evolution has occurred between the generations. The entire population’s gene pool has evolved in the direction of a higher frequency of the b allele. This is not, we note, the same as saying that just those individuals who inherited the b allele have evolved.

This definition of evolution was developed largely as a result of independent work in the early 20th century by Godfrey Hardy, an English mathematician, and Wilhelm Weinberg, a German physician. Through mathematical modeling based on probability, they concluded in 1908 that gene pool frequencies are inherently stable, but that evolution should be expected in all populations, virtually all of the time. They resolved this apparent paradox by analyzing the net effects of potential evolutionary mechanisms.

Hardy, Weinberg, and the population geneticists who followed them came to understand that evolution will not occur in a population if ten conditions are met:

  1. Each organism is sexually reproducing, either monoecious or dioecious;
  2. The organisms are diploid, and the trait under consideration is not on a chromosome that has different copy numbers for different sexes, such as the X chromosome in humans (i.e., the trait is autosomal);
  3. The population has discrete generations;
  4. Mutation is not occurring;
  5. Natural selection is not occurring;
  6. The population is infinitely large (or sufficiently large so as to minimize the effect of genetic drift);
  7. All members of the population breed;
  8. All mating is totally random within the population (panmixia);
  9. Everyone produces the same number of offspring;
  10. There is no migration in or out of the population.

These conditions constitute a default: the absence of those things that can cause evolution. If no mechanisms of evolution are acting on a population, evolution will not occur, because the gene pool frequencies will remain unchanged. Realistically, it is highly unlikely that any of these ten conditions, let alone all of them, will happen in the real world, hence evolution inevitably results.

Hardy and Weinberg went on to develop a simple equation that can be used to discover the probable genotype frequencies in a population and to track their changes from one generation to another. This has become known as the Hardy-Weinberg equilibrium equation. In this equation (p² + 2pq + q² = 1), p is defined as the frequency of the dominant allele and q as the frequency of the recessive allele for a trait controlled by a pair of alleles (A and a). In other words, p equals all of the alleles in individuals who are homozygous dominant (AA) and half of the alleles in people who are heterozygous (Aa) for this trait in a population.

In mathematical terms, this is

\[ p = AA + \frac{1}{2}Aa \]


Similarly, q equals all of the alleles in individuals who are homozygous recessive (aa) and the other half of the alleles in people who are heterozygous (Aa).

\[ q = aa + \frac{1}{2}Aa \]


Since there are only two alleles in this toy example, the frequency of one plus the frequency of the other must equal 100% (i.e. the probabilities sum to certainty), or:

\[ p + q = 1 \]


Since this is logically true, then it must also be correct that:

\[ p = 1 - q \]


Hardy and Weinberg derived from this the theorem that: “the chances of all possible combinations of alleles occurring randomly” is

\[ (p + q)^2 = 1 \]


Or, expanding the square:

\[ p^2 + 2pq + q^2 = 1 \]


Where p² is the predicted frequency of homozygous dominant (AA) people in a population, 2pq is the predicted frequency of heterozygous (Aa) people, and q² is the predicted frequency of homozygous recessive (aa) ones.
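The relations between genotype counts, allele frequencies, and predicted genotype frequencies can be checked numerically. The following is a minimal sketch; the function names and the genotype counts are made up for illustration.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q): frequencies of alleles A and a from genotype counts."""
    total = n_AA + n_Aa + n_aa
    p = (n_AA + 0.5 * n_Aa) / total
    q = (n_aa + 0.5 * n_Aa) / total
    return p, q


def hardy_weinberg(p):
    """Predicted genotype frequencies (AA, Aa, aa) for allele frequency p."""
    q = 1.0 - p
    return p * p, 2 * p * q, q * q


p, q = allele_frequencies(n_AA=360, n_Aa=480, n_aa=160)  # made-up counts
print(p, q)
print(hardy_weinberg(p))
```

With these counts the observed genotype proportions already match the predicted ones, so this hypothetical population is in Hardy-Weinberg proportions.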

From observations of phenotypes, it is usually possible only to know the frequency of homozygous recessive people, or q² in the equation, since they will not have the dominant trait. Those who express the trait in their phenotype could be either homozygous dominant (p²) or heterozygous (2pq). The Hardy-Weinberg equation allows us to predict what proportion of them falls into each class. Taking the square root of the observed q² gives q, and since p = 1 - q, it is possible to calculate p as well. Knowing p and q, it is a simple matter to plug these values into the Hardy-Weinberg equation (p² + 2pq + q² = 1). This then provides the predicted frequencies of all three genotypes for the selected trait within the population.
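The calculation just described can be sketched as follows. The 4% figure is an invented input, and the result is only meaningful under the assumption that the population is in Hardy-Weinberg proportions.

```python
import math


def genotype_frequencies_from_recessive(freq_aa):
    """Given the observed frequency of the recessive phenotype (q squared),
    return predicted (AA, Aa, aa) frequencies, assuming Hardy-Weinberg proportions."""
    if not 0.0 <= freq_aa <= 1.0:
        raise ValueError("frequency must lie in [0, 1]")
    q = math.sqrt(freq_aa)
    p = 1.0 - q
    return p * p, 2 * p * q, freq_aa


# Illustrative input: 4% of the population shows the recessive trait
aa_freq = 0.04
AA, Aa, aa = genotype_frequencies_from_recessive(aa_freq)
carrier_share = Aa / (AA + Aa)  # fraction of dominant-phenotype individuals who are Aa
print(AA, Aa, aa, carrier_share)
```

In this example q = 0.2 and p = 0.8, so about a third of the individuals showing the dominant trait are predicted to be heterozygous carriers, even though only 4% show the recessive trait.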

Through comparison of genotype frequencies from the next generation with those of the current generation in a population, one also learns whether or not evolution has occurred. Further, the comparison tells us in what direction, and with what rate, evolution has happened, for the selected trait. This mathematical model, however, cannot determine which of the various possible causes of evolution (including natural selection) were the actual causes of the changes in gene pool frequencies.

Note that gene pool frequencies are inherently stable. They do not change by themselves. This is as important to evolutionary biology as Newton’s law that “a body at rest tends to remain at rest until acted upon by a force.” Although evolution is a common occurrence in natural populations, allele frequencies will remain unaltered indefinitely unless evolutionary mechanisms such as mutation (of any of several types analyzed in the paper) and natural selection cause them to change. Prior to Hardy and Weinberg, scientists assumed incorrectly that dominant alleles must, over time, inevitably drive recessive alleles out of existence. This invalid theory was called “genophagy” (literally “gene eating”). It held that dominant alleles always increase in frequency from generation to generation. Hardy and Weinberg proved, algebraically, that dominant alleles can just as easily decrease in frequency.

As stated, these are unrealistic simplifying assumptions. Violations of the Hardy-Weinberg assumptions may cause deviations from expectation. How this affects the population depends on the assumptions that are violated. To use the characterization of [Wikipedia, “Hardy-Weinberg Equilibrium”]:

  • Random mating. We had assumed that the population will have the given genotypic frequencies (called Hardy-Weinberg proportions) after a single generation of random mating within the population. When violations of this provision occur, the population will not have Hardy-Weinberg proportions. Three such violations are:
      • Inbreeding, which causes an increase in homozygosity for all genes.
      • Assortative mating, which causes an increase in homozygosity only for those genes involved in the trait that is assortatively mated (and genes in linkage disequilibrium with them).
      • Small population size, which causes a random change in genotypic frequencies, particularly if the population is very small. This is due to a sampling effect, and is called genetic drift.

The remaining assumptions affect the allele frequencies, but do not, in themselves, affect random mating. If a population violates one of these, the population will continue to have Hardy-Weinberg proportions each generation, but the allele frequencies will change with that force.

  • Selection, in general, causes allele frequencies to change, often quite rapidly. While directional selection eventually leads to the loss of all alleles except the favored one, some forms of selection, such as balancing selection, lead to equilibrium without loss of alleles.
  • Mutation will have a very subtle effect on allele frequencies. Mutation rates are of the order 10^-4 to 10^-8 (we will examine actual genomic data more carefully later), and the change in allele frequency will be, at most, the same order. Recurrent mutation will maintain alleles in the population, even if there is strong selection against them.
  • Migration genetically links two or more populations together. In general, allele frequencies will become more homogeneous among the populations. Some models for migration inherently include nonrandom mating (Wahlund effect, for example). For those models, the Hardy-Weinberg proportions will normally not be valid.
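The contrast between directional and balancing selection can be illustrated with the standard textbook recursion for one locus with two alleles under random mating, p' = p(p·w_AA + q·w_Aa) / w̄. This recursion and the relative fitness values below are assumptions brought in for the example; they are not taken from the model developed in this paper.

```python
def next_p_under_selection(p, w_AA, w_Aa, w_aa):
    """One generation of viability selection at a two-allele locus with random mating."""
    q = 1.0 - p
    mean_fitness = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return p * (p * w_AA + q * w_Aa) / mean_fitness


def iterate(p, generations, **fitnesses):
    """Apply the selection recursion repeatedly and return the final frequency of A."""
    for _ in range(generations):
        p = next_p_under_selection(p, **fitnesses)
    return p


# Directional selection favoring A: p climbs toward 1
print(iterate(0.1, 200, w_AA=1.0, w_Aa=0.95, w_aa=0.9))
# Balancing selection (heterozygote advantage): p settles at an interior equilibrium
print(iterate(0.1, 200, w_AA=0.8, w_Aa=1.0, w_aa=0.6))
```

With heterozygote advantage the recursion converges to p = t / (s + t), where s and t are the fitness deficits of the two homozygotes (here 0.2 and 0.4, giving p = 2/3), so both alleles are retained.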

Unfortunately, a violation of the assumptions in the Hardy-Weinberg principle does not necessarily mean that the population will violate the equilibrium equations. For example, balancing selection leads to an equilibrium population with Hardy-Weinberg proportions. This property with selection vs. mutation is the basis for many estimates of mutation rate (called mutation-selection balance).

Our model uses Hardy-Weinberg equilibrium as no more than an idealized baseline against which (non-equilibrium) evolution may be compared. We used a multivariable version that applies to many alleles and many genes simultaneously.
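The multivariable version itself is not spelled out here. As a hedged pointer to what such a generalization looks like, the sketch below implements the standard extension to any number of alleles at a single locus, where homozygotes have frequency p_i² and heterozygotes 2·p_i·p_j. It does not cover many genes at once, and the allele names and frequencies are made up.

```python
from itertools import combinations_with_replacement


def multiallele_hardy_weinberg(freqs):
    """Expected genotype frequencies at one locus with any number of alleles.

    `freqs` maps allele name -> frequency; the result maps unordered allele
    pairs -> frequency (p_i ** 2 for homozygotes, 2 * p_i * p_j for heterozygotes).
    """
    if abs(sum(freqs.values()) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    expected = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        expected[(a, b)] = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
    return expected


# Illustrative three-allele locus (frequencies are made up)
table = multiallele_hardy_weinberg({"A": 0.5, "B": 0.3, "O": 0.2})
for pair, freq in table.items():
    print(pair, round(freq, 4))
```

The terms are exactly those of the expansion of (p_1 + ... + p_k)², which is why they sum to 1 just as in the two-allele case.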

Historically, this theory was established, in part, by one of the great mathematicians of the twentieth century, who was known for rejecting anything other than pure mathematics and for abhorring applied mathematics. Mendelian genetics was rediscovered in 1900, after languishing in an obscure journal. However, it remained somewhat controversial for several years, since nobody knew how it could produce continuous characters. Udny Yule [Yule, 1902] argued against Mendelism because he thought that dominant alleles would increase in the population. The American William E. Castle [Castle, 1903] showed that without selection, the genotype frequencies would remain stable. Karl Pearson [Pearson, 1903] found one equilibrium position with values of p = q = 0.5. Reginald Punnett, unable to counter Yule's point, introduced the problem to G. H. Hardy, a British mathematician, with whom he played cricket. Hardy was, as I say, a pure mathematician who held applied mathematics in utter contempt; his view of biologists’ use of mathematics comes across in his 1908 paper, where he describes the point as “very simple”:

“To the Editor of Science: I am reluctant to intrude in a discussion concerning matters of which I have no expert knowledge, and I should have expected the very simple point which I wish to make to have been familiar to biologists. However, some remarks of Mr. Udny Yule, to which Mr. R. C. Punnett has called my attention, suggest that it may still be worth making... Suppose that Aa is a pair of Mendelian characters, A being dominant, and that in any given generation the number of pure dominants (AA), heterozygotes (Aa), and pure recessives (aa) are as p:2q:r. Finally, suppose that the numbers are fairly large, so that mating may be regarded as random, that the sexes are evenly distributed among the three varieties, and that all are equally fertile. A little mathematics of the multiplication-table type is enough to show that in the next generation the numbers will be as (p+q)²:2(p+q)(q+r):(q+r)², or as p₁:2q₁:r₁, say. The interesting question is — in what circumstances will this distribution be the same as that in the generation before? It is easy to see that the condition for this is q² = pr. And since q₁² = p₁r₁, whatever the values of p, q, and r may be, the distribution will in any case continue unchanged after the second generation.”
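Hardy's "multiplication-table" argument is easy to verify with exact arithmetic. The sketch below uses his notation p : 2q : r; the starting ratio is arbitrary and the function names are ours.

```python
from fractions import Fraction


def next_generation(p, q, r):
    """Hardy's random-mating step on the ratio p : 2q : r of AA : Aa : aa."""
    return (p + q) ** 2, (p + q) * (q + r), (q + r) ** 2


def is_stable(p, q, r):
    """Hardy's condition q^2 = p*r for the distribution to repeat itself."""
    return q * q == p * r


def proportions(p, q, r):
    """Normalize the ratio p : 2q : r to genotype frequencies summing to 1."""
    total = p + 2 * q + r
    return p / total, 2 * q / total, r / total


start = (Fraction(5), Fraction(1), Fraction(2))  # arbitrary ratio 5 : 2 : 2
first = next_generation(*start)
second = next_generation(*first)

print(is_stable(*start))   # False for this starting ratio
print(is_stable(*first))   # True after one round of random mating
print(proportions(*first) == proportions(*second))  # True: the distribution repeats
```

Using `Fraction` rather than floating point means the equality tests are exact, so the check reproduces Hardy's algebraic claim rather than approximating it.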

The principle was thus known as Hardy's law in the English-speaking world until Curt Stern [Stern, 1943] pointed out that it had first been formulated independently in 1908 by the German physician Wilhelm Weinberg [Crow, 1999]. Others have tried to associate Castle’s name with the Law because of his work in 1903, but it is only rarely referred to as the “Hardy-Weinberg-Castle Law.”

[edit] Genetic Drift and the Neutral Gene

Short definition: “genetic drift is the fundamental tendency of any allele to vary randomly in frequency over time due to statistical variation alone” [in populations too small for Hardy-Weinberg equilibrium to be achieved, see 1.1.8]. As summarized in [Wikipedia, “Genetic Drift”] :

“In population genetics, genetic drift is the statistical effect that results from the influence that chance has on the success of alleles (variants of a gene). The effect may cause an allele and the biological trait that it confers to become more common or more rare over successive generations. Ultimately, the drift may either remove the allele from the gene pool or remove all other alleles. Whereas natural selection is the tendency of beneficial alleles to become more common over time (and detrimental ones less common), genetic drift is the fundamental tendency of any allele to vary randomly in frequency over time due to statistical variation alone, so long as it does not comprise all or none of the distribution.”

“Chance affects the commonality or rarity of an allele, because no trait guarantees survival or a given number of offspring. This is because survival depends on non-genetic factors (such as the possibility of being in the wrong place at the wrong time). In other words, even when individuals face the same odds, they will differ in their success. A rare succession of chance events — rather than natural selection — can thus bring a trait to predominance, causing a population or species to evolve.”

“An important aspect of genetic drift is that its rate is expected to depend strongly on population size. This is a consequence of the law of large numbers. When many individuals carry a particular allele, and all face equal odds, the number of offspring they collectively produce will rarely differ from the expected value, which is the expected average per individual times the number of individuals. But with a small number of individuals, a lucky break for one or two causes a disproportionately greater deviation from the expected result. Therefore small populations drift more rapidly than large ones. This is the basis for the founder's effect, a proposed mechanism of speciation.”

“By definition, genetic drift has no preferred direction. A neutral allele may be expected to increase or decrease in any given generation with equal probability. Given sufficiently long time, however, the mathematics of genetic drift (cf. random walk) predict the allele will either die out or be present in 100% of the population, after which time there is no random variation in the associated gene. Thus genetic drift tends to sweep gene variants out of a population over time, such that all members of a species would eventually be homozygous for this gene. In this regard, genetic drift opposes genetic mutation which introduces novel variants into the population according to its own random processes.”

Genetic drift and natural selection usually occur simultaneously in a population. The extent to which alleles are affected by drift, versus selection, varies according to circumstances. In a large population, genetic drift occurs very slowly, so even weak selection on an allele will push its frequency upwards or downwards (depending on whether the allele is beneficial or harmful). Conversely, if the population is very small, drift will predominate over natural selection, so that weak selective effects may not be seen at all. The small changes in frequency that they would produce are overshadowed by drift; the signal cannot be observed in the “noise.” Hence we identify genetic drift as a type of noise in our Shannon channel model.
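The dependence of drift on population size can be illustrated with a small simulation. This is only a sketch under an assumption not made in the text above: it uses the standard Wright-Fisher idealization (a neutral allele, discrete generations, and each of the 2N gene copies in the next generation drawn at random from the current one). The function names, population sizes, and seeds are hypothetical.

```python
import random

def wright_fisher(pop_size, start_freq, generations, seed):
    """Frequency path of a neutral allele under random resampling of 2N gene copies."""
    rng = random.Random(seed)
    copies = 2 * pop_size
    count = round(start_freq * copies)
    path = [count / copies]
    for _ in range(generations):
        freq = count / copies
        # Each gene copy in the next generation carries the allele with probability freq
        count = sum(1 for _ in range(copies) if rng.random() < freq)
        path.append(count / copies)
    return path

def mean_first_step(pop_size, replicates=200, seed=0):
    """Average absolute frequency change after one generation, starting from 0.5."""
    total = 0.0
    for r in range(replicates):
        path = wright_fisher(pop_size, 0.5, 1, seed + r)
        total += abs(path[1] - path[0])
    return total / replicates

small = mean_first_step(10)    # 20 gene copies
large = mean_first_step(1000)  # 2000 gene copies
print("mean one-generation change, N=10:  ", small)
print("mean one-generation change, N=1000:", large)
```

In this sketch the small population moves much further per generation than the large one, which is the sense in which drift acts as stronger "noise" when N is small. A frequency of 0 or 1 is absorbing: once the allele is lost or fixed, resampling cannot bring variation back.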

Genetic drift may have extreme effects on the evolutionary history of a population, sometimes very much at odds with the survival of the population. For example, in a “population bottleneck”, where the population suddenly contracts to a small size (believed to have occurred at least once in the history of human evolution), genetic drift can result in sudden, dramatic changes in allele frequency, occurring independently of natural selection. When this happens, many beneficial adaptations may be eliminated even if the population later grows large again.

Similarly, migration of populations may induce the founder’s effect, where a few individuals, with a rare allele in the originating generation, can yield in their progeny a population that has allele frequencies that seem to be at odds with natural selection. Founder’s effects are thought to be responsible for high frequencies of some genetic diseases.

Neutral Gene hypothesis {to be done}


Citations specific to that definition: {to be done}


[edit] Shannon’s Communications Theory

Claude Shannon was one of the great polymath geniuses of the 20th century, whose contributions are as varied as a general theory of juggling, the rocket-powered Frisbee, the portfolio management theory that made him a multimillionaire, and – what gives him immortality: Communications Theory.

Claude Shannon developed what is now referred to as “information theory” in the 1940s to study communication systems. His approach to information and measures of information are special cases of a more general functional definition of informativeness and information. Information theory is often considered to have begun with work by Harry Nyquist [Nyquist, 1924].

Shannon intended to provide exactly what his title says: a theory of communication, useful in understanding telecommunications systems. He and I discussed this at length [Shannon, personal communication with Post]. In another private conversation in 1961, Shannon suggested that applications of his work to areas outside of communication theory were “suspect” [Rit86].

Shannon considered the fundamental problem of communication to be the reproduction at one place, either exactly or approximately, of a message selected at another place. Of course, the messages can have (semantic) meaning; that is, messages typically refer to, point to, or are correlated, according to some system, with certain physical or biological or social or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem, as Shannon carefully modeled it. The significant aspect is that the actual message is one selected from a set (ensemble) of possible messages, each with an associated probability [SW49].

Using this engineering perspective, the communication process may be understood as a “source” communicating to a destination. The source provides its message to a “transmitter” through a perfect connection. The transmitter communicates through a “channel” to the “receiver”, which receives the message and gives it in a lossless manner to the destination.

Communications Theory is therefore closely related to Information Theory, defined as:

“The branch of mathematics dealing with the efficient and accurate storage, transmission, and representation of information.” For Information Theory, see also:

[Goldman, 1953], [Hankerson, 1998], [Lee, 1960], [Lossee, 1999], [Pierce, 1980], [Reza, 1994], [Singh, 1966], [Zayed, 1993], and Weisstein, E. W. "Books about Information Theory." http://www.ericweisstein.com/encyclopedias/books/InformationTheory.html.

“RBH”, posting on the thread of the “Good Math, Bad Math” blog where this paper was born, commented on the analogy between Shannon’s model and the biological systems being investigated:

“Regarding the model sketch of Jonathan Vos Post: If one wants to characterize evolution in information theoretic (Shannon) terms, there are two quite different approaches one might take. One is that you have sketched, where the transmitter is the parent population, the receiver the child population, and the transmission channel the various operators that govern/affect reproduction.
A second approach (they're not mutually exclusive) is to make these assignments:
Transmitter = environmental variation, where “environment” includes the physical and biological variables ‘encasing’ a population of replicators;
Receiver = the genome of the population, where “genome” means the distribution of alleles in the population of replicators;
Channel = the set of evolutionary operators that alter the genome of the population through time as a function of differential replication due to heritable differences among lineages.
Then one characterizes changes in the mutual information of environment and genome through time to assess the dynamics of evolution. Easier said than done, of course....”

Posted by: RBH | January 19, 2007 12:36 AM
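The mutual information that RBH refers to can be computed from a joint probability table. The sketch below is illustrative only: the two-state "environment" and "genome" variables and the probability tables are hypothetical stand-ins, not measurements and not part of RBH's comment.

```python
from math import log2

def mutual_information(joint):
    """I(E;G) in bits, from a joint probability table joint[e][g]."""
    p_env = [sum(row) for row in joint]         # marginal over genome states
    p_gen = [sum(col) for col in zip(*joint)]   # marginal over environment states
    total = 0.0
    for e, row in enumerate(joint):
        for g, p in enumerate(row):
            if p > 0:  # terms with p = 0 contribute nothing
                total += p * log2(p / (p_env[e] * p_gen[g]))
    return total

# Hypothetical tables: rows = environment state, columns = genome state
independent = [[0.25, 0.25], [0.25, 0.25]]  # genome carries no trace of environment
matched = [[0.5, 0.0], [0.0, 0.5]]          # genome state fully determined by environment

print(mutual_information(independent))
print(mutual_information(matched))
```

Tracking this quantity generation by generation would be one way to express "information passing from environment to genome", although estimating the joint distribution for a real population is the hard part.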

[edit] Source

Short definition: The “source” is what provides the message in a communications process.

According to Shannon, the communication process has a “source” communicating to a destination. The source provides its message to a “transmitter” through a perfect connection. The transmitter, in turn, communicates through a “channel” to the “receiver”, which receives the message and gives it in a lossless manner to the destination.

Sources may be discrete or non-discrete. A discrete source generates “the message, symbol by symbol. It will choose successive symbols according to certain probabilities depending, in general, on preceding choices as well as the particular symbols in question” [SW49]. Coding takes place at the transmitter. The “source” of the message, importantly in Shannon’s model, does not itself transmit the message; the coded form of the message is what leaves the transmitting process and moves to the receiving process. The representation of the original message moves to the next process that transforms it, with the process continuing.
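A discrete source whose choices depend on the preceding symbol can be sketched as a first-order Markov chain. This is a minimal illustration under assumptions: the two-symbol alphabet, the transition probabilities, and the function name are all hypothetical, and a real source may depend on more than one preceding symbol.

```python
import random

# Hypothetical transition table: P(next symbol | previous symbol)
STICKY = {
    "0": {"0": 0.9, "1": 0.1},
    "1": {"0": 0.2, "1": 0.8},
}

def emit(transitions, first, length, seed):
    """Generate a message symbol by symbol from a first-order Markov source."""
    rng = random.Random(seed)
    message = [first]
    while len(message) < length:
        options = transitions[message[-1]]  # probabilities depend on the last symbol
        symbols = list(options)
        weights = [options[s] for s in symbols]
        message.append(rng.choices(symbols, weights=weights)[0])
    return "".join(message)

print(emit(STICKY, "0", 40, seed=1))
```

Because each symbol tends to repeat its predecessor in this table, the output shows long runs, which is the kind of statistical structure a coder can exploit.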

The distinction between discrete and non-discrete sources is important both mathematically and philosophically. Within the scope of this paper, we assume that the source of evolutionary information is discrete, and use mathematics to take the limit as the “grain size” of the discrete system approaches zero, to thereby get a different set of equations for a continuous system. Continua as such are outside the scope of this paper.

In theology and its political battle, the source of biological information is taken by Scientists to be evolution by natural selection, by Creationists to be God, and by Creationists disguised as proponents of Intelligent Design to be an unspecified designer which might just as well be God, super-intelligent extraterrestrials, or the Flying Spaghetti Monster.

[edit] Transmitter

Short definition: The “transmitter” takes the message from the “source” through a perfect connection and communicates it through a “channel” to the “receiver”. From an abstract mathematical perspective, we don’t care if the transmitter is a telephone cable, a television broadcast, an optical fiber, or telepathy. Exactly what the transmitter is in evolution by natural selection will be discussed later, as needed.

Citations specific to that definition: {to be done}

[edit] Receiver

Short definition: The “receiver” takes the message from the “channel” and gives it in a lossless manner to the destination. The receiver can be a person, a recording device, or (in our model) a biological system or population.

Citations specific to that definition: {to be done}

[edit] Channel

Short definition: The “channel” is what the “transmitter” communicates through in order to reach the “receiver”, which then gives the message in a lossless manner to the destination.

The channel, to be nontrivial, requires spatial or temporal distance between the sender and the receiver. To get from one place or time to another, energy must be expended to transmit the message from the sender to the receiver. Shannon considered that the channel is entirely (in the abstract) defined by a set of conditional probabilities that a certain message is received given what was transmitted. In cases where there is no noise, the conditional probability that a message is received given that it was transmitted is 1, and the probability of receiving any other message is 0. In noisy environments, what is transmitted is not always what is received.
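A channel defined by conditional probabilities can be written as a table with one row per transmitted message. The sketch below is illustrative: the two-message alphabet, the 10% error rate, and the source probabilities are hypothetical values chosen for the example.

```python
def output_distribution(p_sent, channel):
    """P(received = y) = sum over x of P(sent = x) * P(received = y | sent = x)."""
    n_out = len(channel[0])
    return [
        sum(p_sent[x] * channel[x][y] for x in range(len(p_sent)))
        for y in range(n_out)
    ]

# channel[x][y] = P(received = y | sent = x); each row sums to 1
noiseless = [[1.0, 0.0],
             [0.0, 1.0]]
noisy = [[0.9, 0.1],   # hypothetical 10% chance that a message is changed
         [0.1, 0.9]]

p_sent = [0.8, 0.2]    # hypothetical source probabilities
print(output_distribution(p_sent, noiseless))
print(output_distribution(p_sent, noisy))
```

With the noiseless table the received distribution equals the sent distribution; with the noisy table some probability leaks from each message to the other.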

Historically, the engineering study of noise began with telegraph cables, and became a more complicated study with the advent of telephone cables, particularly long underwater cables, which were first correctly analyzed by Heaviside. Oliver Heaviside (May 18, 1850 – February 3, 1925) was a self-taught English electrical engineer, mathematician and physicist who adapted complex numbers to the study of electrical circuits, developed techniques for applying Laplace transforms to the solution of differential equations, reformulated Maxwell's field equations in terms of electric and magnetic forces and energy flux, defined the Heaviside step function, discovered the radio properties of the ionosphere (Heaviside layer), and independently co-formulated vector analysis (creating the operator method). He invented “the telegrapher’s equations,” thus developing transmission line theory. Heaviside showed mathematically that uniformly distributed inductance in a telegraph line would diminish both attenuation and distortion, and that, if the inductance were great enough and the insulation resistance not too high, the circuit would be distortionless while currents of all frequencies would be equally attenuated. Heaviside’s equations helped further the implementation of the telegraph [Nahin, 1998].

Nyquist, in the Bell System Technical Journal, wrote that two factors determine the “maximum speed of transmission of intelligence.” Each telephone cable is considered to have some physical limit imposed on it such that there is a finite, maximum speed for transmitting “intelligence.” This limit was widely, pragmatically, but imperfectly understood by electrical engineers of that era to be related to such factors as power, noise, and the frequency of the intelligent signal. Nyquist, by accepting such a limit as a given and working backwards, arrived at his analysis of what was transmitted, which he came to call “information.”

Citations specific to that definition: {to be done}

[edit] Noise

Short definition: “Noise acts to change messages so that what is received differs from what is transmitted.”

One of the crucial additions that Claude Shannon made to the earlier work of [Nyquist, 1924] and [Hartley, 1928] was that he formally integrated “noise” into the communication model. Noise is introduced into the channel between the transmitter and the receiver. Noise acts to change messages so that what is received differs from what is transmitted. In noisy environments, what is transmitted is not always what is received. The noisier the environment, the greater the chance that the message received is different, and the more different it is likely to be. In the limit, no signal at all can be detected in the noise. Noise is assigned a statistical structure, either depending on its physical origin, or simply by mathematical definition. Examples are “Gaussian noise” and “shot noise.”
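The claim that, in the limit, no signal can be detected in the noise has a simple quantitative form for the textbook binary symmetric channel, whose capacity is C = 1 - H(p) bits per symbol, where p is the probability that a bit is flipped. The sketch below assumes that idealized channel; it is not the noise model proposed for evolution in this paper, and the function names are hypothetical.

```python
from math import log2

def binary_entropy(p):
    """H(p) = -p lg p - (1-p) lg (1-p), with 0 lg 0 taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(flip_prob):
    """Capacity in bits per symbol of a binary symmetric channel."""
    return 1.0 - binary_entropy(flip_prob)

for flip_prob in (0.0, 0.01, 0.1, 0.25, 0.5):
    print(flip_prob, bsc_capacity(flip_prob))
```

Capacity falls from one bit per symbol with no noise to zero when each bit is flipped half the time, at which point the output is statistically independent of the input.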

Here's the tentative definition of Noise in my Shannon model of evolution.

NOISE in the channel is everything EXCEPT mutation and Natural Selection, as factors excluded by the assumptions of Hardy-Weinberg equilibrium.

Specifically, noise includes:

1. presence of a subpopulation of organisms that are NOT sexually reproducing;
2. the organisms are diploid, and the trait under consideration IS on a chromosome that has different copy numbers for different sexes, such as the X chromosome in humans (i.e., the trait is NOT autosomal);
3. statistical artefacts stemming from the population NOT having discrete generations;
[4. mutation IS occurring, but that is NOT noise; rather, there are various types of mutation, each of which has well-defined entropy in my analysis];
[5. natural selection IS occurring, but has well-defined entropy in my analysis];
6. artefacts stemming from the fact that the population is NOT infinitely large (or is NOT sufficiently large so as to minimize the effect of genetic drift, which is thus a type of noise);
7. NOT all members of the population breed (i.e. chastity, homosexuality, and the like, but those organisms contribute to evolution through kin selection);
8. deviations from the assumption that all mating is totally random within the population (panmixia):
   8a. inbreeding
   8b. assortative mating
   8c. genetic drift (see 6), other forms of sampling error?
9. NOT everyone produces the same number of offspring;
10. there IS migration in or out of the population.

Positive question: do the above each make sense?

Negative question: is there anything I've left out?

Probably each of the above has some mathematical model(s) buried somewhere in the enormous literature; my challenge is to define them well enough that we can combine them with my other entropy evaluations to arrive at a channel capacity for Evolution by Natural Selection in the face of the weird noise distribution from so many different causes.

Citations specific to that definition: {to be done}

[edit] Encoding

Short definition: {to be done}

According to Shannon, between the source and the channel, the data being transmitted must be encoded. He meant that the data is represented in some form that can be transmitted by the medium supporting the channel. Transmitting data unavoidably requires that a change of medium take place, as the information moves from the source to the transmitter, to the channel. When a signal moves from one medium to another, it must be physically represented somewhat differently, making an encoder necessary.

Much of the advance in communications technology has come from encodings of ever more sophisticated form, from speech, to written language, to Morse code, to analog telephony, to digital telephony, to the error-detecting and error-correcting codes in cables and phone lines and internet systems. The mathematical study of this is known as Coding Theory.

Coding theory, sometimes called algebraic coding theory, deals with the design of error-correcting codes for the reliable transmission of information across noisy channels. It makes use of classical and modern algebraic techniques involving finite fields, group theory, and polynomial algebra. It has connections with other areas of discrete mathematics, especially number theory and the theory of experimental designs.

Nyquist wrote that the two fundamental factors governing the maximum speed of data transmission are the shape of a signal and the choice of “code” used to represent the intelligence. Following earlier research of Squier and others, Nyquist insisted that telegraph signals are most efficiently transmitted when the intelligence-carrying waves are rectangular. Given a particular “code,” the use of square waves pragmatically allows for intelligence to be transmitted faster than with sine waves in many realistic environments with specific types of noise.

Citations specific to that definition: [Weisstein, “Coding Theory”].

See Appendix 1.2.6 Historical Discussion of Encoding.

[edit] Entropy

Short definition:

“In physics, the word entropy has important physical implications as the amount of "disorder" of a system. In mathematics, a more abstract definition is used. The (Shannon) entropy of a variable X is defined as

H(X) = - SUM[over x] P(x) lg [P(x)]

bits, where P(x) is the probability that X is in the state x, “lg” means logarithm to the base 2, and P lg P is defined as 0 if P = 0.” See [Weisstein] for definition of the joint entropy of several variables.
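The definition translates almost directly into code. The following is a minimal sketch; the function name and the example distributions are hypothetical choices for illustration.

```python
from math import log2

def shannon_entropy(probs):
    """H(X) = -SUM P(x) lg P(x) in bits; terms with P(x) = 0 are defined as 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

fair_coin = [0.5, 0.5]
four_equal = [0.25, 0.25, 0.25, 0.25]
certain = [1.0, 0.0]
skewed = [0.9, 0.1]  # hypothetical biased two-state variable

print(shannon_entropy(fair_coin))
print(shannon_entropy(four_equal))
print(shannon_entropy(certain))
print(shannon_entropy(skewed))
```

The entropy is largest when all states are equally probable and drops to zero when one state is certain, which is why it is a property of the whole ensemble and not of any single outcome.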

Note that this definition of Shannon entropy applies, not to a single object (cf. Kolmogorov, Chaitin) but to the statistical ensemble of multiple objects, each with an associated probability, where all the probabilities of the ensemble add up to 1. Kolmogorov complexity may be summarized as “The complexity of a pattern parameterized as the shortest algorithm required to reproduce it. Also known as algorithmic complexity.”

What is tricky here is that there are three different uses of the term “entropy”: one in pure Mathematics, one in mathematical physics (specifically, Thermodynamics) and one in Information Theory and Communications Theory (specifically as defined by Claude Shannon).

At a deeper level, the last two have historically been equated, but that is a discussion beyond the scope of this paper.

Citations specific to that definition: {to be done}

It is worth pointing out that the modern mathematical use of the word “information” was initiated by a geneticist. The first of these modern usages appears in the work of the British statistician and geneticist R. A. Fisher. In his 1925 article “Theory of Statistical Estimation”, published in Proceedings of the Cambridge Philosophical Society, he described "the amount of information in a single observation" in the context of statistical analysis. In doing so, he appears to have introduced two crucial aspects to "information": firstly, that it is abstract yet measurable, and secondly, that it is an aspect or byproduct of an event or process.

"Fisher information" has had ramifications across the physical sciences, but its most famous elaboration has been in the applied context of electronic communications. These, and related definitions differ from Fisher's work, but they remain much closer to his conception than to any earlier meanings.

Three years after Fisher's paper appeared, the American-born electronics researcher Ralph V.L. Hartley - who had studied at Oxford University almost exactly the same years that Fisher studied at Cambridge (1909-1913) before returning to the United States - published a seminal article in Bell System Technical Journal. In it, he built upon the work of the Swedish-American engineer Harry Nyquist (who was working mainly at AT&T and Bell Laboratories), specifically on Nyquist’s paper Certain Factors Affecting Telegraph Speed [Nyquist, 1924], which sought in part to quantify what he called "intelligence" in the context of a communication system's limiting factors.

However, Hartley’s 1928 article [Hartley, 1928], titled Transmission of Information, seems to have fused aspects of Fisher's conception of information with Nyquist's technical context - albeit without citing either of them, or any other source. Hartley specifically proposed to "set up a quantitative measure whereby the capacities of various systems to transmit information may be compared." He also added another crucial aspect by explicitly distinguishing between "physical as contrasted with psychological considerations" - meaning more or less, by the latter, "meaning." According to Hartley, information is something that can be transmitted but has no specific meaning. It was on this basis that, decades later, the American mathematician and geneticist-turned-electrical engineer Claude Shannon made the most famous of all modern contributions to the development of the idea of information.

“It will be noticed that the fundamental theorem [of natural selection] ... bears some remarkable resemblances to the second law of thermodynamics. Both are properties of populations, or aggregates, true irrespective of the units which compose them; both are statistical laws; each requires the constant increase of a measurable quantity, in the one case the entropy of a physical system and in the other the fitness, measured by m, of a biological population.” [Fisher, 1958 edition, p39]

[Ellis, 1985], [Havil, 2003], [Khinchin, 1957], [Lasota, 1994], [Ott, 1993], [Rothstein, 1951], [Schnakenberg, 1976], [Shannon, 1948], [Shannon, 1963].

See Also [Weisstein]: Differential Entropy, Information Theory, Kolmogorov Entropy, Kolmogorov-Sinai Entropy, Maximum Entropy Method, Metric Entropy, Mutual Information, Nat, Ornstein's Theorem, Redundancy, Relative Entropy, Shannon Entropy, Topological Entropy.

The entropy measure of information follows from Shannon. As summarized [Lossee, 1999]:

“Given a source producing symbols at a rate consistent with a set of probabilities governing their frequency of occurrence, Shannon asks ‘how much information is “produced” by such a process, or better, at what rate information is produced?’ For Shannon, the amount of self-information that is contained in or associated with a message being transmitted, when the probability of its transmission is p, is the logarithm of the inverse of the probability, or I = log (1/p) [Los90,TS95]. The choice of a logarithmic base corresponds to the choice of a unit for measuring information. If the base 2 is used the resulting units may be called binary digits, or more briefly bits, a word suggested by J. W. Tukey. ‘A device with two stable positions . . . can store one bit of information.’ N such devices can store N bits, since the total number of possible states is 2^N and log_2 (2^N) = N [SW49]. The amount of information in the output of a process is proportional to the number of different values that the function might return. Given n different output values, the amount of information (I) may be computed as

I = log_2 (n). The amount of information in the output of a process is related to the amount of information that is available about the input to the process combined with the information provided by the process itself. It is not just the amount of information about the input, although if the process always reproduces the input exactly at the output, there would be no difference in the amount of information present at the input to the process and at the output of the process. The information that is input to the function has measurable information, in its capacity as being the output of some other process, about which it provides information, the amount being measurable in terms of this earlier process.” The Shannon entropy takes a more subtle look at information, by factoring in the probability of any particular message out of a set (ensemble) of possible messages.
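The two measures quoted above, I = log (1/p) for a single message and I = log_2 (n) for n equally likely outputs, can be compared numerically. This is a small sketch with hypothetical function names; base 2 is assumed throughout so that the unit is the bit.

```python
from math import log2

def self_information(p):
    """Information in bits of a message whose probability is p: I = lg(1/p)."""
    return log2(1 / p)

def information_of_outcomes(n):
    """Information in bits when one of n equally likely values is returned."""
    return log2(n)

print(self_information(0.5))            # a one-in-two message
print(self_information(0.25))           # a one-in-four message
print(information_of_outcomes(2 ** 8))  # eight two-state devices
```

When all n outputs are equally likely, each has probability 1/n and the two measures agree. The Shannon entropy generalizes this by averaging the self-information over messages of unequal probability.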

[edit] Digitally Simulated Evolution by Natural Selection

Short definition: {to be done}

Citations specific to that definition: {to be done}

See also Richard Dawkins, "An Agony in Five Fits", in his The Extended Phenotype: The Gene As the Unit of Selection [Dawkins, 1982].

Richard Dawkins, The Extended Phenotype: The Long Reach of the Gene, Oxford University Press, 1982. Paperback reprint (October 1989) ISBN: 0192860887. New edition with an afterword by Daniel Dennett (April 1999) ISBN: 0192880519.

Before digital simulation, there were abstract mathematical models of changes in population. We summarize as follows. {to be done}

See Appendix 1.3 Mathematically and Digitally Simulated Evolution by Natural Selection.

[edit] Evolutionary channel capacity: Towards a Model

To: Prof. Christopher Lee

UCLA Department of Biochemistry and Molecular Biology

Dear Prof. Christopher Lee,

Our mutual acquaintance Prof. John Baez writes:

"Right now Chris is trying to understand natural selection from an information-theoretic standpoint. At what rate is information passed from the environment to the genome by the process of natural selection? How do we define the concepts here precisely enough so we can actually measure this information flow?" [2]

In an often annoying blog thread of over 200 comments, with an Intelligent Design idiot who took Claude Shannon's name in vain, I posted as follows [some typos corrected]: [3]

I also would love to see a definition of evolutionary information transfer in terms of Shannon theory.

I've already admitted that my PhD research does not easily extend to this.

Here are a few ideas that might hint at a plausible definition.

(1) Consider the set of different kinds of mutation available: point mutation, inversion, crossover, frame shift, chromosomal duplication, and so forth. It has been pointed out in this thread that mutations other than point mutations are important to this calculation.

(2) Consider the statistics of how often each type of mutation occurs (complication -- this differs from one clade to another, and between organisms; it also differs greatly between different genes within some organisms, i.e. for hypervariable genes).

(3) The basic idea struggling to break free from Sal's verbiage is that we are treating a channel that leads in one generation from the genome of all the organisms in that generation to the genome of all the organisms a generation later, which is awkward given the difference in lifespan, reproductive ages, and reproduction rate from one organism to another.

(4) The Shannon approach is to consider the statistical ensemble of possible organisms a generation later, and to calculate a mutual information between source (generation x) and receiver (generation x+1). Of the possible organisms, given the probability of each within the total ensemble, how much information is needed to describe which ones from the ensemble of possibles actually were conceived (or born, which introduces the problem of dealing with embryos of various stages).

(5) One must refine the definition of "possible" organisms in next generation. Not any organism is possible. Only those that can be reached in one generation of the set of available mutations from the previous generation. Point mutations, if they don't create or destroy regulatory genes, can change one codon to another, which sometimes changes one amino acid to another at that place in the expressed protein. Stuart Kauffman's writing and mathematical models deal in depth with what he calls "The Adjacent Possible" -- which is, loosely speaking, the set of proteins "next to" a starting set of proteins, namely close to it according to a metric. It is not just a Hamming distance metric, but more related to an edit distance. It takes some effort to go from a definition of adjacent possible gene to adjacent possible phenotype.

(6) "Adjacent" is different when looking at mutations other than point mutations. As I pointed out in my doctoral work, enhancing partial descriptions by John Holland, a crossover is the intersection of two hyperplanes in genetic space, one for the initial string from one parent, the other for the terminal string of the other, since we are looking at the concatenation of the two. For inversion, there is a substring which is reversed in direction. That amounts to a reflection in a hyperplane corresponding to the inverted part, but fixed points on the non-inverted genes.

(7) Again, there are probabilities every step of the way. Edit distance is more like a Feynman integral -- the number of mutations times the probability of those mutations, summed over all possible combinations, properly weighted.

(8) "Noise" in the channel is itself hard to define. This is not the same as the mutations themselves.

(9) The set of organisms that exist and are possible is usually described in terms of the fitness landscape. It's important to realize that the genetic algorithm in one generation starts with a population of points on the fitness landscape, kills off some with probability inversely proportional to their fitness, selects others with probability proportional to their fitnesses, and pairwise does crossover on the pair, throws in the statistically proper inversions and point mutations, and puts the child in the place where a previous one was killed off.

(10) So we need to constrain the ensemble of fitness landscapes on which the process occurs, and see how the boundary between real and possible itself changes with each generation.

The above is an off-the-top-of-my-head crude handwave towards an approach to model evolutionary channel capacity. It looks like a PhD dissertation worth of work to get right, and several technical papers worth to even fill in major details to the sketch.
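Point (5) contrasts Hamming distance with edit distance. As a minimal sketch of that difference, assuming chromosomes are represented as plain strings of nucleotide letters (the function names and the example sequences are illustrative, not from any cited model):

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Unweighted edit distance: insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


# A one-letter frame shift looks far away by Hamming distance
# but is only two edits away (one deletion, one insertion).
shifted_hamming = hamming("AUGGCC", "UGGCCA")
shifted_edit = levenshtein("AUGGCC", "UGGCCA")
```

This is the unweighted edit distance only; the probability-weighted sum over mutation paths suggested in point (7) would need per-operation probabilities that this sketch does not model.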

Over on Pharyngula is an explanation of the definition of gene: "A gene is an operational region of the chromosomal DNA, part of which can be transcribed into a functional RNA at the correct time and place during development. Thus, the gene is composed of the transcribed region and adjacent regulatory regions."

Is this a useful hint of what I think might need to be done, to get from genes in one generation to those of the next, and the much more complicated problem of what that means in terms of transitions of phenotype? I'm prepared to take potshots on this from all sides, Mark [Chu-Carroll, blogmaster], Sal [Cordova], and readers of this great blog. I have learned that only through mutation is there variation, so only by mutating this sketch of a theory can there be an ensemble of next-generation theories, and the scientific method is supposed to describe how the theory interacts with the environment of theorists and experimentalists and "nature" to evolve to a higher fitness theory. Hence I must and do take criticism seriously.


[edit] “Scythe” Death Inversely Proportional to Fitness: Subproblem and Calculation

Here's the easy part in calculating the entropy of Natural Selection. This is the death calculation, simpler than the birth calculation.

Suppose that at time t=0 (generation X) there is a population of organisms O(i) of size N of a sexually reproducing species. One organism is selected (scythed) from the N for immediate death with probability inversely proportional to its fitness (as normalized by the population).

That is, there is a function f : O(i) -> (0,1) which maps each of the N to a scalar value which is normalized to a probability in the range (0,1). The scythe operation selects a specific organism A from the set of N with probability (f(A)^(-1))/C where C is the normalization constant

C = SUM[from i = 1 to i=N](f(i)^(-1)).

Example: suppose N = 2, A has fitness f(A) = (1/4), B has fitness f(B) =(1/2), so

C = (4/1)+(2/1) = 6.

Then the probability of scything A is ((1/4)^(-1))/6 = 2/3, and the probability of scything B is ((1/2)^(-1))/6 = 1/3. A is exactly twice as likely to die as B, since f(A) is a half of f(B).
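As a check on the arithmetic, here is a minimal sketch of the scythe probabilities in Python, using exact fractions. The function name `scythe_probabilities` and the dictionary layout are my own illustrative choices.

```python
from fractions import Fraction


def scythe_probabilities(fitness):
    """Map each organism to its probability of being scythed.

    The probability is inversely proportional to fitness:
    P(i) = f(i)^(-1) / C, with C the sum of all f(i)^(-1).
    """
    inverse = {name: 1 / f for name, f in fitness.items()}
    c = sum(inverse.values())  # normalization constant C
    return {name: inv / c for name, inv in inverse.items()}


fitness = {"A": Fraction(1, 4), "B": Fraction(1, 2)}
probs = scythe_probabilities(fitness)
for name, p in probs.items():
    print(name, p)
```

Exact fractions avoid rounding here, so the 2-to-1 ratio of death probabilities can be compared directly with the 1-to-2 ratio of fitnesses.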

Now, the Shannon information in scything A depends on the fitness f(A) as well as on the distribution of fitnesses of the other organisms in the population. We are not surprised (little information) when an organism of tiny fitness is killed "by the environment." We are surprised (more information) when an organism of high fitness is killed "by the environment."

By Shannon's definition, this entropy is

H = - SUM[from i=1 to i=N] P(i) lg P(i)

where P(i) is the probability of scything organism number i, and lg(x) = log(base 2)x.

Again, there is a normalization constant (which secularly varies as the population evolves):

C = SUM[from i=1 to i=N] f(i)^(-1).

Substituting:

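H = - SUM[from i=1 to i=N] (f(i)^(-1)/C) lg(f(i)^(-1)/C)

= - (1/C) SUM[from i=1 to i=N] f(i)^(-1) (lg(f(i)^(-1)) - lg C)

= (1/C) SUM[from i=1 to i=N] f(i)^(-1) (lg f(i) + lg C)

= lg C + (1/C) SUM[from i=1 to i=N] f(i)^(-1) lg f(i),

where the last step uses SUM[from i=1 to i=N] f(i)^(-1) = C.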

Please correct me if I made an error in elementary algebra, or parenthesization.

For our N=2 example with scything probabilities P(A) = 2/3, P(B) = 1/3:

H = - (((2/3) * lg(2/3)) + ((1/3) * lg(1/3))) = 0.918295834

which is less than a bit. If the two organisms had equal fitness, then the scything would be exactly 1 bit.
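A minimal sketch of the scything entropy in Python, for the N = 2 example with fitnesses 1/4 and 1/2 (scything probabilities 2/3 and 1/3). The first function applies Shannon's definition directly. The second uses the equivalent closed form H = lg C + (1/C) SUM f(i)^(-1) lg f(i), which follows from P(i) = f(i)^(-1)/C because the P(i) sum to 1. Both function names are illustrative.

```python
from math import log2


def scythe_entropy(fitnesses):
    """Shannon entropy (bits) of choosing which organism dies."""
    inverse = [1.0 / f for f in fitnesses]
    c = sum(inverse)  # normalization constant C
    probs = [inv / c for inv in inverse]
    return -sum(p * log2(p) for p in probs)


def scythe_entropy_closed_form(fitnesses):
    """Same quantity via H = lg C + (1/C) * SUM f(i)^(-1) * lg f(i)."""
    inverse = [1.0 / f for f in fitnesses]
    c = sum(inverse)
    weighted = sum(inv * log2(f) for inv, f in zip(inverse, fitnesses))
    return log2(c) + weighted / c


h_example = scythe_entropy([0.25, 0.5])
h_closed = scythe_entropy_closed_form([0.25, 0.5])
h_equal = scythe_entropy([0.5, 0.5])
print(h_example, h_closed, h_equal)
```

The equal-fitness case is included because it is the maximum-entropy case for two organisms.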

That's the easy part of the calculation. The hard part is the next step. Pick 2 organisms D and E from the remaining population of N-1 organisms (or two copies of the same organism, as when only one remains). Probabilistically make one offspring by applying some random combination of point mutations and inversions to a random crossover of the genes of D and E. Place the child in the population, which now at time t=1 (generation X+1) has N organisms. That's harder to calculate, as it has several random variables, or coefficients or probability distributions associated with each mutation and crossover operation.
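The whole scythe-then-breed step can be sketched in Python as follows. This is a toy illustration under several assumptions that are mine, not part of the model above: chromosomes are equal-length strings over the four-letter alphabet "ACGU" rather than strings of codons, there is one single-point crossover per child, and the rates `p_inversion` and `p_mutation` are arbitrary placeholders.

```python
import random


def weighted_index(weights, rng):
    """Pick an index with probability proportional to its weight."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def one_generation(population, fitness, rng,
                   p_inversion=0.1, p_mutation=0.01, alphabet="ACGU"):
    """Scythe one organism, breed one child, return the new population.

    `fitness` must return a strictly positive number for every organism,
    and every chromosome must have the same length (at least 2).
    """
    # Scythe: death probability inversely proportional to fitness.
    inverse = [1.0 / fitness(org) for org in population]
    dead = weighted_index(inverse, rng)
    survivors = population[:dead] + population[dead + 1:]

    # Ordered pair of parents, each picked proportionally to fitness
    # (the same organism may be picked twice).
    weights = [fitness(org) for org in survivors]
    first = survivors[weighted_index(weights, rng)]
    second = survivors[weighted_index(weights, rng)]

    # Single-point crossover: initial string of one, terminal of the other.
    length = len(first)
    cut = rng.randrange(1, length)
    child = list(first[:cut] + second[cut:])

    # Optional inversion of the segment between two distinct points.
    if rng.random() < p_inversion:
        x, y = sorted(rng.sample(range(length + 1), 2))
        child[x:y] = reversed(child[x:y])

    # Independent point mutations (substitutions only in this sketch).
    for i in range(length):
        if rng.random() < p_mutation:
            child[i] = rng.choice(alphabet)

    return survivors + ["".join(child)]


def toy_fitness(chromosome):
    """Hypothetical fitness: rewards the letter A, never zero."""
    return (chromosome.count("A") + 1) / (len(chromosome) + 1)


rng = random.Random(0)
population = ["AAAA", "CCCC", "GGGG", "UUUU"]
next_population = one_generation(population, toy_fitness, rng)
```

The population size is conserved because exactly one organism dies and one child is added, which matches the description of the child taking the place of the scythed organism.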

But that seems to be a start to calculate what we want, namely a Shannon entropy, eventually a channel capacity (modulo a model of noise) in the model of evolution by natural selection.

Again using Google as my calculator, suppose the population is of size N=2 and the ratio of fitnesses is 9 to 1, or 99 to 1, or 999 to 1, so that the scything probabilities are (9/10, 1/10), (99/100, 1/100), or (999/1000, 1/1000). Then the entropy H is

- ((9/10)*lg(9/10) + (1/10)*lg(1/10)) = 0.468995594

- ((99/100)*lg(99/100) + (1/100)*lg(1/100)) = 0.0807931359

- ((999/1000)*lg(999/1000) + (1/1000)*lg(1/1000)) = 0.0114077577

or very roughly half a bit, 8% of a bit, and 1.1% of a bit respectively.
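The three two-organism cases above use scything probabilities (9/10, 1/10), (99/100, 1/100) and (999/1000, 1/1000). A minimal Python sketch that reproduces them, with `binary_entropy` as an illustrative helper name:

```python
from math import log2


def binary_entropy(p):
    """Entropy in bits of a two-outcome choice with probabilities p and 1 - p."""
    return -(p * log2(p) + (1 - p) * log2(1 - p))


results = {}
for denominator in (10, 100, 1000):
    p = 1 / denominator  # probability of scything the fitter organism
    results[denominator] = binary_entropy(p)
    print(f"1/{denominator}: {results[denominator]:.9f} bits")
```

The entropy falls quickly as the fitness gap widens, since the outcome of the scything becomes nearly certain.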

The arithmetic of this is fairly simple, even if the underlying equations are long enough to scare some artists.

[edit] “Sexual Reproduction”: Subproblem and Calculation

Next step in calculating the entropy of evolution by natural selection is to calculate the entropy of sexual reproduction. The following is a mathematical simplification, of course, but not so simplified as to pretend that every possible pair of organisms is equally likely to mate (an assumption called "panmixia").

Assume that we have a population of N organisms O(i) for 1 <= i <= N.

Later, we will create a child from the two parents by crossover, and later still, mutate it with inversions, and point mutations.

As in the previous computation (scything out an organism to kill with probability inversely proportional to fitness) we normalize the selection probabilities by the distribution of all fitnesses in the population.

C = SUM[from i=1 to i=N] f(i).

Then P(i) = the probability of selecting a specific organism O(i) = f(i)/C.

Example: with our prior example of two organisms A and B with fitnesses

f(1) = f(A) = 1/4 and f(2) = f(B) = 1/2 we have

C = (1/4) + (1/2) = 3/4. Hence

P(A) = (1/4)/(3/4) = 1/3 and P(B) = (1/2)/(3/4) = 2/3.

We are picking an ordered pair of parents. In our toy example with N=2 this can be done in 4 ways:

P(A,A) = (1/3)(1/3) = 1/9

P(A,B) = (1/3)(2/3) = 2/9

P(B,A) = (2/3)(1/3) = 2/9

P(B,B) = (2/3)(2/3) = 4/9

The entropy of selecting a pair is based on the probability of selection of each such ordered pair. In the toy example:

H = - ((1/9)*lg(1/9) + (2/9)*lg(2/9) + (2/9)*lg(2/9) + (4/9)*lg(4/9))

= 1.83659167.

In general, the entropy of selecting an ordered pair of parents to make a child is

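H = - SUM[from i=1 to i=N, from j = 1 to j = N] P(i)P(j) lg(P(i)P(j)) =

- SUM[from i=1 to i=N, from j = 1 to j = N] P(i)P(j)(lg P(i) + lg P(j)).

Because the two parents are picked independently, this is exactly twice the entropy of picking a single parent: H = - 2 SUM[from i=1 to i=N] P(i) lg P(i).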
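A minimal Python sketch of the parent-pair entropy for the toy example (fitnesses 1/4 and 1/2). It enumerates all ordered pairs and also checks the property that, for independent picks, the pair entropy is twice the single-parent entropy. Function names are illustrative.

```python
from itertools import product
from math import log2


def selection_probabilities(fitnesses):
    """P(i) = f(i) / C, with C the sum of all fitnesses."""
    c = sum(fitnesses)
    return [f / c for f in fitnesses]


def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)


p = selection_probabilities([0.25, 0.5])            # [1/3, 2/3]
pair_probs = [pi * pj for pi, pj in product(p, repeat=2)]
h_single = entropy(p)
h_pair = entropy(pair_probs)
print(h_single, h_pair)
```

Dropping the independence assumption (for example, forbidding self-mating) would break the factor-of-two relation, so that relation is specific to this simplified model.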

[edit] Crossover of Two Chromosomes: Subproblem and Calculation

Next, what is the entropy of a particular crossover of the chromosomes of two selected parents?

I set up this model to have probabilistic selection of an ordered pair of parents. Otherwise, if we'd selected an unordered pair of parents, we'd have an artificial 1 bit of choice of which parent provides the initial substring of the child's chromosome. Again, "lg" means logarithm to the base 2.

So, given a selected ordered pair of parents, we further assume that a crossover is equally likely to be at any given point. Again, a simplified model. Further, we are simplifying to say that the crossover is an initial string (of codons) from one parent concatenated to a terminal string (of codons) from the other parent.

Visually symbolizing the crossover, at random crossover point "x":

Before:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

During:

AAAAAAxAAAAAAAAAAAAAAAAAAAAAAAA

BBBBBBxBBBBBBBBBBBBBBBBBBBBBBBB

After:

AAAAAABBBBBBBBBBBBBBBBBBBBBBBB.

For in vivo, the crossover might be more complicated.

If there were two random crossover points "x" and "y":

Before:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

During:

AAAAAAxAAAAAAAAAAAAAAAAAAAAAyAAA

BBBBBBxBBBBBBBBBBBBBBBBBBBBByBBB

After:

AAAAAABBBBBBBBBBBBBBBBBBBBBAAA.

Anyway, assuming a single equiprobable crossover on a chromosome of length L, the entropy is simply lg L.

The redundant derivation is:

Assume the probability of a crossover at point i, among L equally likely points, is P(i) = 1/L. Then

H = - SUM[from i=1 to i=L] (1/L) lg (1/L)

= - (L)(1/L) lg (1/L) = (-1) lg (1/L)

= - (- lg L) = lg L.
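A minimal Python sketch of the crossover operators drawn above, plus a numerical check that L equiprobable crossover points give an entropy of lg L. Chromosomes are plain strings here, and the cut positions are 0-based slice indices; both are my assumptions for illustration.

```python
from math import log2


def single_point_crossover(first, second, cut):
    """Initial string of `first` joined to the terminal string of `second`."""
    if len(first) != len(second):
        raise ValueError("parents must have equal length")
    return first[:cut] + second[cut:]


def two_point_crossover(first, second, x, y):
    """Middle segment [x:y) comes from `second`, the rest from `first`."""
    if len(first) != len(second):
        raise ValueError("parents must have equal length")
    return first[:x] + second[x:y] + first[y:]


def uniform_entropy(n_choices):
    """Entropy in bits of n equally likely choices, summed term by term."""
    p = 1.0 / n_choices
    return -sum(p * log2(p) for _ in range(n_choices))


parent_a = "A" * 30
parent_b = "B" * 30
child_one = single_point_crossover(parent_a, parent_b, 6)
child_two = two_point_crossover(parent_a, parent_b, 6, 27)
h_crossover = uniform_entropy(30)
```

Whether a chromosome of length L really has L or L - 1 usable cut points depends on whether a cut at the very end counts; the sketch follows the text and takes L.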


[edit] Inversion of Chromosome: Subproblem and Calculation

Next, the entropy of the inversion operation. Again, we simplify by pretending that a single inversion takes place with probability P(I) and with a loop equally likely to be of any length up to L. We visually symbolize the inversion by randomly selecting two inversion points, between which the string is reversed in direction.

Before:

123456789

During:

12x3456y789

After:

126543789.

These are a subset of permutations, or endofunctions, but that’s irrelevant to this model. So what is the probability of a specific inversion? It is the probability of picking an ordered pair (x,y) where x is equal or less than y, the lesser is no less than 1, the greater is no greater than L, times the probability P(I) of invoking the inversion operator at all.

Strictly, the constraint that x is no greater than y leaves L(L+1)/2 such pairs. By our oversimplification, we ignore that constraint and the factor P(I), and treat x and y as independent, giving L^2 equiprobable choices, so the entropy at this step is

lg (L^2) = 2 lg L,

which overstates lg(L(L+1)/2) by about 1 bit for large L.
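A minimal Python sketch of the inversion operator, reproducing the 123456789 example, together with a brute-force count of the inversion-point pairs (x, y) with 1 <= x <= y <= L. The count comes out to L(L+1)/2 rather than L^2, so the 2 lg L figure is best read as an upper estimate. Indices in `invert` are 0-based slice positions, which is my convention and not the text's.

```python
from math import log2


def invert(chromosome, x, y):
    """Reverse the segment between slice positions x and y."""
    return chromosome[:x] + chromosome[x:y][::-1] + chromosome[y:]


def count_inversion_pairs(length):
    """Count pairs (x, y) with 1 <= x <= y <= length by enumeration."""
    return sum(1 for x in range(1, length + 1)
               for y in range(x, length + 1))


inverted = invert("123456789", 2, 6)      # 12x3456y789 -> 126543789
pairs = count_inversion_pairs(9)
h_exact = log2(pairs)                     # entropy if pairs are equiprobable
h_simplified = 2 * log2(9)                # the lg(L^2) simplification
print(inverted, pairs, h_exact, h_simplified)
```

Neither figure includes the probability P(I) of invoking the operator at all, which would add a further term to the entropy of this step.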

Next we look at point mutations.

[edit] Point Mutation: Subproblem and Calculation

A point mutation is a mutation altering a single nucleotide (includes deletion, insertion, base exchange). “Knowledge of the rate of point mutation is of fundamental importance, because mutations are a vital source of genetic novelty and a significant cause of human diseases.” [Kumar, 2002]. Despite naïve intuition, point mutations are NOT equally likely to occur at any codon, or any of the three nucleotide pairs within a given codon. Observationally, there is a substantial number of databases from which one may calculate entropy of codon mutations, and correlation with amino acid frequencies, for specific species, and comparisons between species. “We find evidence that the identity of the neighboring nucleotide within a codon influences the probability of a point substitution” [Parra, 2005].

In general, the alphabet of 64 = 2^6 codons maps to the 20 amino acids according to the well-known Crick-Watson table. Hence there is a 64 x 64 matrix of transition probabilities between any possible pair of codons. This has 4096 = 2^12 entries. As a matrix of transition probabilities, each row adds up to 1. If each column also adds up to 1 (that is, if the matrix is doubly stochastic), then by the Birkhoff-von Neumann theorem it is a convex combination of permutation matrices.

We may formally define the Shannon entropy of such a 64 x 64 matrix, Q, as:

H = - SUM[from i = 1 to 64] SUM[from j = 1 to 64] Q[i,j] lg Q[i,j]

Such a matrix has in fact been measured. See [Schneider, 2005] below. We need to calculate its entropy.

There are both coding and noncoding codons in the 2^6 = 64 codons (nucleotide base triples). These are the same in DNA and RNA, except that the T in DNA is equivalent to a U in RNA.

From the RNA perspective, the 5 start and stop codons (of which only the 3 stop codons are strictly noncoding) are:

START: AUG, GUG

STOP: UAG, UGA, UAA

Note: The codon AUG both codes for methionine and serves as an initiation site: the first AUG in an mRNA's coding region is where translation into protein begins.

Some of these (for historical reasons) have names: UAA is Ochre (Stop), UAG is Amber (Stop), and UGA is Opal (Stop).

Using data from: Adrian Schneider, Gina M Cannarozzi and Gaston H Gonnet, Empirical codon substitution matrix. http://www.biomedcentral.com/1471-2105/6/134

we compute entropies of the 2x2 matrix of start-to-start point mutations from genomic data, and the 3x3 matrix of stop-to-stop point mutations from genomic data. Schneider's matrix does not consider substitutions between sense codons and stop codons, so it has no start-to-stop or stop-to-start entries.

Raw count of start-to-start point mutations from genomic data (Schneider)

xxx\yyy    ATG        GTG
ATG        107424     7013.5
GTG        7013.5     81343

The four counts sum to 202794 (the row sums, which equal the column sums because the matrix is symmetric, are 114437.5 and 88356.5). So we normalize by dividing each of the 4 values by 202794 to get probabilities

xxx\yyy    ATG          GTG
ATG        0.5297198    0.034584
GTG        0.034584     0.40111147

Calculating the entropy of start-to-start point mutations:

H = - (((107424 / 202794) * lg(107424 / 202794)) + ((7013.5 / 202794) * lg(7013.5 / 202794)) + ((7013.5 / 202794) * lg(7013.5 / 202794)) + ((81343 / 202794) * lg(81343 / 202794))) = 1.34995492
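A minimal Python sketch of this calculation from the raw start-to-start counts. It assumes, as the normalization above does, that every count is divided by the grand total of the matrix, so the result is the entropy of a joint distribution over codon pairs. The name `joint_entropy` is illustrative.

```python
from math import log2


def joint_entropy(counts):
    """Entropy in bits of a count matrix normalized by its grand total."""
    total = sum(sum(row) for row in counts)
    return -sum((c / total) * log2(c / total)
                for row in counts for c in row if c > 0)


# Rows and columns are ordered ATG, GTG (counts as quoted from Schneider).
start_counts = [
    [107424, 7013.5],
    [7013.5, 81343],
]
start_total = sum(sum(row) for row in start_counts)
h_start = joint_entropy(start_counts)
print(start_total, h_start)
```

The same function applies unchanged to a 64 x 64 count matrix, provided zero counts are skipped as they are here.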

3x3 matrix of stop-to-stop point mutations from genomic data

Raw count of stop-to-stop point mutations from genomic data (Schneider)

from\to:    TAA     TAG     TGA     row sum
TAA         1282    366     582     2230
TAG         366     566     267     1199
TGA         582     267     1790    2639
col sums    2230    1199    2639    6068

The grand total of the 9 counts is 6068 (each row sum equals the matching column sum because the matrix is symmetric). So we normalize by dividing each of the 9 values by 6068 to get the probabilities below, which sum to 1 over the whole matrix rather than along each row.

transition probabilities of stop-to-stop point mutations from genomic data (Schneider)

from\to:    TAA             TAG             TGA
TAA         0.211272248     0.060316414     0.0959129862
TAG         0.060316414     0.093276203     0.0440013184
TGA         0.0959129862    0.0440013184    0.294990112

Hence, summing M(i,j) lg M(i,j) for all 9 probabilities:

Entropy of stop-to-stop point mutations

0.211272248 * lg(0.211272248) = -0.473846642
0.060316414 * lg(0.060316414) = -0.244360222
0.0959129862 * lg(0.0959129862) = -0.324390191
0.060316414 * lg(0.060316414) = -0.244360222
0.093276203 * lg(0.093276203) = -0.319223545
0.0440013184 * lg(0.0440013184) = -0.198283556
0.0959129862 * lg(0.0959129862) = -0.324390191
0.0440013184 * lg(0.0440013184) = -0.198283556
0.294990112 * lg(0.294990112) = -0.519554727

(-0.473846642) + (-0.244360222) + (-0.324390191) + (-0.244360222) + (-0.319223545) + (-0.198283556) + (-0.324390191) + (-0.198283556) + (-0.519554727) = -2.84669285

so the entropy is H = 2.84669285 bits.
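A minimal Python sketch of the stop-to-stop calculation from the raw counts. Dividing by the grand total gives joint probabilities over codon pairs, which is what the sum above uses. Dividing each row by its own row sum instead gives row-stochastic transition probabilities; which of the two the final 64 x 64 calculation should use is left open here, so the second function is only shown for comparison. All names are illustrative.

```python
from math import log2

CODONS = ["TAA", "TAG", "TGA"]
STOP_COUNTS = [
    [1282, 366, 582],
    [366, 566, 267],
    [582, 267, 1790],
]


def joint_probabilities(counts):
    """Divide every count by the grand total of the matrix."""
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]


def row_conditional_probabilities(counts):
    """Divide every count by its row sum, so each row adds up to 1."""
    return [[c / sum(row) for c in row] for row in counts]


def entropy_bits(matrix):
    """Minus the sum of p * lg p over all positive entries."""
    return -sum(p * log2(p) for row in matrix for p in row if p > 0)


joint = joint_probabilities(STOP_COUNTS)
conditional = row_conditional_probabilities(STOP_COUNTS)
h_joint = entropy_bits(joint)
for codon, row in zip(CODONS, conditional):
    print(codon, [round(p, 4) for p in row])
print(h_joint)
```

Applying `entropy_bits` to the row-conditional matrix would give the sum of the three row entropies, which is a different quantity from the joint entropy computed in the text.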



Now, need to do this for full 64x64 matrix...

[edit] Useful Papers

Adrian Schneider, Gina M Cannarozzi and Gaston H Gonnet, Empirical codon substitution matrix.

Codon substitution probabilities are used in many types of molecular evolution studies such as determining Ka/Ks ratios, creating ancestral DNA sequences or aligning coding DNA. Until the recent dramatic increase in genomic data enabled construction of empirical matrices, researchers relied on parameterized models of codon evolution. Here we present the first empirical codon substitution matrix entirely built from alignments of coding sequences from vertebrate DNA and thus provide an alternative to parameterized models of codon evolution.
Results
A set of 17,502 alignments of orthologous sequences from five vertebrate genomes yielded 8.3 million aligned codons from which the number of substitutions between codons were counted. From this data, both a probability matrix and a matrix of similarity scores were computed. They are 64 × 64 matrices describing the substitutions between all codons. Substitutions from sense codons to stop codons are not considered, resulting in block diagonal matrices consisting of 61 × 61 entries for the sense codons and 3 × 3 entries for the stop codons.
Conclusion

The amount of genomic data currently available allowed for the construction of an empirical codon substitution matrix. However, more sequence data is still needed to construct matrices from different subsets of DNA, specific to kingdoms, evolutionary distance or different amount of synonymous change.

Codon mutation matrices have advantages for alignments up to medium evolutionary distances and for usages that require DNA such as ancestral reconstruction of DNA sequences and the calculation of Ka/Ks ratios.


Katherine S. Pollard, Sofie R. Salama, Bryan King, Andrew D. Kern, Tim Dreszer, Sol Katzman, Adam Siepel, Jakob S. Pedersen, Gill Bejerano, Robert Baertsch, Kate R. Rosenbloom, Jim Kent, David Haussler. Forces Shaping the Fastest Evolving Regions in the Human Genome.

Comparative genomics allow us to search the human genome for segments that were extensively changed in the last ~5 million years since divergence from our common ancestor with chimpanzee, but are highly conserved in other species and thus are likely to be functional. We found 202 genomic elements that are highly conserved in vertebrates but show evidence of significantly accelerated substitution rates in human.
These are mostly in non-coding DNA, often near genes associated with transcription and DNA binding. Resequencing confirmed that the five most accelerated elements are dramatically changed in human but not in other primates, with seven times more substitutions in human than in chimp. The accelerated elements, and in particular the top five, show a strong bias for adenine and thymine to guanine and cytosine nucleotide changes and are disproportionately located in high recombination and high guanine and cytosine content environments near telomeres, suggesting either biased gene conversion or isochore selection. In addition, there is some evidence of directional selection in the regions containing the two most accelerated regions. A combination of evolutionary forces has contributed to accelerated evolution of the fastest evolving elements in the human genome.


[edit] Some references on Nucleotide, Codon, mutation, entropy

Molecular Biology and Evolution 18:982-986 (2001)

"Synonymous Codon Bias Is Not Caused by Mutation Bias in G+C-Rich Genes in Humans"

Nick G. C. Smith and Adam Eyre-Walker

It has been suggested that synonymous codon bias is a consequence of mutation bias in mammals. We tested this hypothesis in humans using single-nucleotide polymorphism data. We found a pattern of polymorphism which was inconsistent with the mutation bias hypothesis in G+C-rich genes. However, the data were consistent with the action of natural selection or biased gene conversion. Similar patterns of polymorphism were also observed in noncoding DNA, suggesting that natural selection or biased gene conversion may affect large tracts of the human genome.

Genetics, October 1, 2006; 174(2): 1029 - 1040.

"Isochores Exhibit Evidence of Genes Interacting With the Large-Scale Genomic Environment"

W. H. Press and H. Robins

[edit] Chromosome Duplication: Subproblem and Calculation

{to be done}

[edit] Other Types of Mutation

{to be done}

[edit] Kauffman’s Adjacent Possible

{to be done}

[edit] Note on Coding and Non-coding DNA

{to be done}


[edit] Blog and E-mail Material

Un-numbered material from blog and email, to be rewritten and incorporated in following drafts.

Posted by: Jonathan Vos Post | January 17, 2007 11:47 AM

Dear George,
Your comments make sense. I shall digest them and reply to all. Meanwhile, the work shifts away from the blog thread where it was first aired.
My first impression is to reject: "population around the peak." The population does not start around a peak. You are leaving out the sampling of superspace as John Holland explained, and a lot more. Inversions and crossovers change alleles, if they happen within genes. That is, one gets an initial substring of codons concatenated with a terminal substring of another gene. This can make a completely new protein when expressed, with an amino acid string from one protein and a different polypeptide from another protein, which can do who knows what when forming tertiary structure. It can also change the regulatory network. This would seem to lead to, not equilibrium, but very complex exploration of genome space. Were you assuming that I was working with a chromosome as a string of genes? I was very carefully NOT doing so.
Right or wrong?
Best,
Jonathan Vos Post

Posted by: Scarlet Seraph | January 17, 2007 11:51 AM

Dr. Vos Post - I'm curious about how we get to 'channel' in this context. Shannon appears to be positing 'receivers' independent of channel, yes? But in the biological case, this isn't really happening.
Is it valid to extend the definitions into this arena?
I'm not an information theory person, and my grasp of the higher maths is questionable (brains are more my thing), so I apologize if this comes off as too simplistic.

Posted by: Jonathan Vos Post | January 17, 2007 12:07 PM

The receiver (in Shannon's theory) does NOT structurally change based on the message from the transmitter, as it comes through the channel. In real life, receivers CAN be changed. Ever heard someone say: "This book changed my life?" Or consider the effect on the audience of the speech (no transcript exists) by the pope, whose audience launched the First Crusade?
Shannon admitted that he knew this, but left it out, in order that the model be mathematically tractable.
In the evolutionary channel situation, the receiver is changed over the generation. The fitness of generation x+1 depends in part on the interaction between its organisms and those still around from generation x.
That there are nonlinear effects of populations of organisms on each other goes way back. See, for instance, Lotka-Volterra equations, also known as the predator-prey equations. The more general interaction of n species of organisms on each other, through an nxn matrix of coefficients of n differential equations in n variables is the basis of fascinating work that applies to competition between n corporations in a market equally well, but I won't go off on that tangent.
Is my sketch of a theory "valid" to the biological case, as asked? Not yet. Much needs to be filled in and maybe corrected. But the general approach is not, on the face of it, invalid. John Holland's 1975 "Adaptation in Natural and Artificial Systems" book launched a thousand Genetic Algorithm ships. In my opinion it was the first that had enough in its biological model to be applicable to biology, and good enough math and computer science to be a breakthrough there.
There is the field of "mathematical biology" – which includes information theory, so there are certainly many people who assume enough validity to practice their craft.
Now I need to go back to lurking until enough feedback ensues and enough time in my overcommitted schedule.

Posted by: Torbjörn Larsson | January 17, 2007 07:34 PM

Sal:
"How many bits of information can be infused (fixed) into a population is analogous to how many DNA molecules can be fixed into a population via natural selection. ... I pointed out real life has a problem with purifying selection."
It seems you misunderstand fixation. Fixation is the theoretical state when an allele becomes present in all members of the population. Even under the neutral conditions of genetic drift, a mutation is replenished by it happening again (since it has some probability to occur), and eventually becomes fixed. It is only a matter of time.
"From what we know of DNA computing, the channel capacity of DNA string from an SNR standpoint is respectable, but that was not the sense of channel capacity that I was discussing...."
Nitpicking that optical fiber systems are SNR limited on length aside, experiments with DNA computing have no bearing on gene expression and evolution. You must present numbers.
Jonathan:
"Is this a useful hint of what I think might need to be done, to get from genes in one generation to those of the next, and the much more complicated problem of what that means in terms of transitions of phenotype?"
I think you have a really good proposal for a model. As I noted earlier, some biologists are looking at natural selection from an information-theoretic standpoint by studying generations as you suggest.
"Right now Chris is trying to understand natural selection from an information-theoretic standpoint. At what rate is information passed from the environment to the genome by the process of natural selection? How do we define the concepts here precisely enough so we can actually measure this information flow?" (http://golem.ph.utexas.edu/category/2006/12/back_from_nips_2006.html#c006690)
"One possible way is that population models of asexual organisms looks exactly like Bayesian inference models used in machine learning. Each individual allele is a "hypothesis" which after selection improves the populations "theory" of the environment." (http://scienceblogs.com/goodmath/2007/01/stupidity_from_our_old_friend.php#comment-306296)
So there is a possibility for synergies here.
But since you are discussing channel capacity, not selection, there are caveats. PZ Myers on Pharyngula notes that in the evo-devo model, the phenotype variation can become fixed by later genotype variations. I.e. the genome behind a randomly expressed but beneficial phenotype can change until fixation. This would be another source for capacity. He has some posts on this.
Btw, on the post where he defines a gene, he and others note that there are several definitions, at least 7 of them, which are picked to fit the model it is used in.


Posted by: deadman_932 | January 17, 2007 11:43 PM

Sal wants to discuss channel capacity, per Shannon, but in some UNSPECIFIED relation to "evolutionary mechanisms."
Sal cannot define what he means by this except to say "channel capacities of evolutionary processes... with reference to population resources like numbers of organisms, numbers of mutations, etc."
No models given other than what is mentioned above, no calculation of capacity or noise, no nothing. This is bizarre, but utterly typical of the vacuous ID "research" programme. No specifics, no math, no testable hypotheses...nada, zip, zero.
Hmmm. speaking of zip...wasn't it about 10-12 years ago that some people were all worked up about using Zipf analysis to look at "patterns" in non-coding DNA stretches? As I recall, that didn't pan out either -- another example of incorrectly applying an analytical tool...kinda like abusing Shannon, eh, Sal?
Torbjorn: thanks for the link on Bayesian inference and alleles as hypotheses. That's a novel approach!

There was interest expressed in my vague proposal by a professor at Southern New Hampshire University with whom I've coauthored a dozen or so papers at venues such as the International Conference on Complex Systems (where I chaired 3 sessions in 2006).

I wonder if you can suggest whether it is worth pursuing, perhaps for ICCS-2007?

Thank you for your attention and consideration,

Prof. Jonathan Vos Post

[snip]

[edit] ===

I think it is inherently tricky when you try to model cross-domain. I don't think anyone quite has a handle on the classes of non-linearities involved although I suspect that the same sort of phenomena discussed below (particularly the intergenerational interaction) is one of the reasons Kauffman can only draw approximate links between the number of possible random grammars and the actual expression of coded biological syntax (e.g. the number is around 2000, which is close to what we actually find, etc.)

I had some talk over the net with Stan Salthe before the 2004 conference on our technology evolution paper and he pointed out a similar weakness arising from the fact that the technologies change more and communicate less than mutating generations of biological organisms would. All of this seems to me to be further complicated by the poorly understood and theoretically over-determined role of selection in the evolutionary process. If one were to take a page out of Farmer's book, looking at his solution for clustered volatility in financial markets, I think that even if we knew for a certainty that selection was not neutral we'd have to develop an integrated-moving average, selection neutral model as a first order approximation of the evolutionary mechanism. Just a thought.

[edit] ===

It's a damned hard problem. Or maybe it's only a moderately hard problem which has been damned hard to pose well. I think that I CC'd you on an email to the young bioinformatics prof at UCLA who is allegedly attacking the same (or similar) problem, according to John Baez.

It probably IS worth our writing a paper on this, regardless of what the UCLA dude says. Wonder if Kauffman would want to be coauthor? Or maybe if we do a decent paper, Kauffman would want in?

[snip]

What are we to do for ICCS-2007? Yaneer seems to like phonecalls, which I find vague and meandering and without paper trail.

Any other conferences in 2007 that we should work on together? Kathleen? Would be nice to have some $$$ and go to Japan for NipponCon2007, the science fiction worldcon.

[snip]

[edit] ===

Submitted to Good Math Bad Math, same thread and hotlink as before

http://scienceblogs.com/goodmath/2007/01/stupidity_from_our_old_friend.php

(1) My frequent coauthor Dr. Philip Vos Fellman believes that the issue of channel capacity of evolution by natural selection, raised however imperfectly by Sal Cordova, is a worthy topic for a paper by myself and Prof. Fellman;

(2) We would likely present it at ICCS-2007.

Abstract Submission Deadline: June 30, 2007

Early Registration Deadline: August 15, 2007

Paper Submission Deadline: August 31, 2007

I've presented many papers at ICCS-2004 and ICCS-2006, many with Dr. Fellman; and I chaired 3 sessions at ICCS-2006. The conference always draws several Nobel Laureates, is delightfully interdisciplinary, and hopefully will have Blake Stacey again and others of Mark CC's readers. ICCS-2007 will have a track on Systems biology:

"High throughput data and theoretical modeling are combining to create new opportunities for systems understanding in biology. In addition to the comprehensiveness of genome-scale analysis of molecular pathways and networks, we are particularly interested in building toward an understanding of living systems at all scales and levels of organization. This will include aspects such as: emergence of higher-order (system-level) features, pattern formation, multiscale representation, etc. You are invited to submit abstracts/papers in experimental and theoretical areas of systems biology. Topics include but are not limited to studies on:

  • System levels

    o DNA/Protein sequence analysis: genome-scale comparative analysis, motifs, evolution

    o Regulatory pathways/circuits: stochastic simulation; deterministic, non-linear dynamics, in situ pathway visualization

    o Molecular networks: topology (global structure, local motifs) and dynamics

    o Cell and organismal physiology: Cell migration, Multi-cell behavior, Systems control, Homeostasis and disease, Scaling laws

    o Development: Spatiotemporal patterns, developmental constraints, robustness

    o Behavior: brain and behavior, group dynamics

    o Population and evolutionary dynamics

  • Concepts

    o Robustness and Control

    o Noise, Oscillations, Chaos

    o Fractals, power laws, Time series

    o Multiscale modeling

  • Tools

    o Genomics and Proteomics techniques

    o Databases, data mining, analysis and visualization tools

    o In situ imaging techniques (microscopic and macroscopic)"

(3) Thanks to the reminder from Torbjörn Larsson, I went back to John Baez's blog (allegedly the oldest in

the world), and followed the link to "Chris Lee" -- namely Prof. Christopher Lee at UCLA. I emailed my

10-point crude sketch of a theory and asked for his evaluation.

(4) I am more forgiving of Sal than some here, because, whether he offers an answer or not, he does

raise an interesting question. Also, he liked my excerpt from paper about my dissertation research

enough to say he wished I was on his side. I am on the side of truth, wherever it leads me. By the way, Chris

Lee also seems to phrase the channel as one where information is transmitted by the environment and

received by the genome.

(5) I am not sure about Sal's use of global human population. I did do Mathematical Population Biology in grad school, right at the time when the math got harder, because the field was invaded by astrophysicists who recognized some biology equations and plucked the low-lying fruit. Population "bottlenecks" are relevant. When the population crashes, information is lost. This seems to have happened for humans at least once. Might be catastrophe for short time (I am NOT saying Noah's Flood!) or slight diminution for long time (climate change in Africa, or tough times for early humans in Europe and Middle East).

There is a good question before us. I am going to commit some time. Thus I'm grateful to Sal on the one hand for raising the question, however oddly, and to Mark and blog commenters for pruning the conversation and adding useful advice. What fun!

Posted by: Jonathan Vos Post | January 18, 2007 09:33 PM
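
The bottleneck remark in point (5) above can be illustrated with a toy calculation. The sketch below is not part of the model discussed in the thread; it assumes a neutral population with no mutation or selection, and it uses the Shannon entropy of the allele frequency distribution as a crude stand-in for the "information" held in the population. The function names (allele_entropy, bottleneck, regrow) and the sizes (1000 individuals, 50 alleles, 10 survivors) are hypothetical choices made only for illustration.

```python
import math
import random
from collections import Counter


def allele_entropy(population):
    """Shannon entropy, in bits, of the allele frequency distribution."""
    counts = Counter(population)
    n = len(population)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def bottleneck(population, survivors, rng):
    """Population crash: keep a uniformly random subset of individuals."""
    return rng.sample(population, survivors)


def regrow(population, size, rng):
    """Recovery: resample with replacement (no mutation, no selection)."""
    return [rng.choice(population) for _ in range(size)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# 1000 individuals carrying 50 equally common alleles at one locus.
population = [i % 50 for i in range(1000)]
before = allele_entropy(population)

# Crash to 10 survivors, then grow back to the original size.
crashed = bottleneck(population, 10, rng)
recovered = regrow(crashed, 1000, rng)
after = allele_entropy(recovered)

print(f"entropy before bottleneck: {before:.3f} bits")
print(f"entropy after recovery:    {after:.3f} bits")
```

Whatever the random draw, ten survivors can carry at most ten distinct alleles, so the recovered population cannot exceed log2(10) bits, against log2(50) bits before the crash. Regrowth restores the head count but not the lost variation, which in this toy setting could only come back through mutation.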

===

Useful comment on Good Math, Bad Math thread:

===

JVP on n-Category Cafe blog, which has some good hotlinks that might be relevant to cite in ICCS-2007. By the way, ICCS-2007 has a full-blown web domain, which doesn't mention Science Fiction. I need to get through to Yaneer pronto!

http://golem.ph.utexas.edu/category/2007/01/duality_between_probability_an.html

Entropy; Re: Duality between Probability and Optimization

Given that “many machine learning algorithms are expressible in thermodynamic form” – what is known about the entropy of machine learning algorithms? John Baez cited Chris Lee (UCLA) on his diary addressing the question of entropy of evolution through natural selection. We have the genetic algorithm as a kind of machine learning algorithm. Hence the question is not trivial. I have emailed Chris Lee an outline of my approach, which I have undertaken to coauthor with Prof. Fellman at Southern New Hampshire University for the (Oct/Nov 2007) International Conference on Complex Systems. Do others who write or read n-Category Café know about other references to entropy in machine learning, and more particularly, to the Genetic Algorithm?

There is also discussion of this on Good Math, Bad Math, but unfortunately embedded in demolition of an irritating advocate of Intelligent Design. Even a crackpot can once in a while ask a good question, I think.

Submitted by Jonathan Vos Post at January 19, 2007 9:48 PM

=======

John [sic] et al:

I don't mean to drive you away from commenting here; it's interesting, and I hope to find some time to contribute at some point. But I think that the comments thread of this entry is an extremely awkward place to try to do this. It might be worth setting up a dedicated blog at Blogger (which is free), and then making each of the longer comments with new progress into full posts, which each have their own comment threads.

This comment thread is already so long that it's hard to find anything; it's giving me a headache trying to follow the development of an actual real, original scientific model mixed in the middle of the flamage and other discussions.

Posted by: Mark C. Chu-Carroll | January 21, 2007 05:20 PM


Mark: Of course, you're right again. Thank you for being so patient so long. Thank you for calling my rather elementary fumblings "the development of an actual real, original scientific model." It speaks well for your blogmaster abilities that you were able to steer this thread from windy seas and fog into views of land.

I am already wondering how to incorporate more reasonable assumptions, and how to find the appropriate parameters of inversion probabilities, point mutation rates, and the like for some well-known organisms such as E. coli or D. melanogaster, or even H. sapiens.

I shall probably put this on my livejournal blog, restarting with a typo-removed version. If and when I do so, I shall email you, and you may decide (threadwise) how best to point your readers in that direction.


[edit] Future Extensions of Model

{to be done}

[edit] Possible Experiments: in vivo, in vitro, in silico

{to be done}

[edit] Possible Experiments: in vivo

{to be done}

[edit] Possible Experiments: in vitro

{to be done}

[edit] Possible Experiments: in silico

{to be done}



[edit] References

Adami, Christoph: Evolution of Biological Complexity

http://www.citebase.org/cgi-bin/citations?identifier=oai:arXiv.org:physics/0005074

In order to make a case for or against a trend in the evolution of complexity in biological evolution, complexity needs to be both rigorously defined and measurable. A recent information-theoretic (but intuitively evident) definition identifies genomic complexity with the amount of information a sequence stores about its environment. We investigate the evolution of genomic complexity in populations of digital organisms and monitor in detail the evolutionary transitions that increase complexity. We show that because natural selection forces genomes to behave as a natural “Maxwell Demon”, within a fixed environment genomic complexity is forced to increase.

Adami, Christoph, Ofria, Charles, Collier, Travis C.

2006-12-01

Andersson, M., Sexual Selection. Princeton, New Jersey: Princeton University Press, 1995. ISBN 0-691-00057-3

Niko Beerenwinkel, Lior Pachter and Bernd Sturmfels, Epistasis and shapes of fitness landscapes, arXiv, 2006.

Bejerano, G., Pheasant, M., Makunin, I., Stephen, S., Kent, W.J., Mattick, J.S. & Haussler, D., Ultraconserved elements in the human genome. Science 304(2004):1321-5.

Benford, Gregory, Galactic Center Series of novels, #1 - In the Ocean of Night (1977), #2 - Across the Sea of Suns (1984), #3 - Great Sky River (1987), #4 - Tides of Light (1989), #5 - Furious Gulf (1994), #6 - Sailing Bright Eternity (1995).

Gregory Benford is an eminent physicist, multiple award-winning author, and recipient of the United Nations Prize for Literature. He agrees [personal communication] that his portrayal of electron-positron intelligences was based on, and extensively quotes from, Jonathan V. Post, "Human Destiny and the End of Time", Quantum, No.39, Winter 1991/1992?, Thrust Publications, 8217 Langport Terrace, Gaithersburg, MD 20877; ISSN 0198-6686

Castle, W. E., “The laws of Galton and Mendel and some laws governing race improvement by selection.” Proc. Amer. Acad. Arts Sci. 35(1903) 233–242.

Christiansen, F.B., The definition and measurement of fitness. In: Evolutionary ecology (ed. Shorrocks B) pp.65-79. Blackwell Scientific, Oxford, 1984 (our definition adds survival selection in the reproductive phase)

Crow, J.F., “Hardy, Weinberg and language impediments.” Genetics 152(1999)821-825.

[Darwin, 1859]: Charles Darwin, On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life, John Murray, London, 1859; modern reprint: Charles Darwin, Julian Huxley (2003). The Origin of Species. Signet Classics. ISBN 0-451-52906-5

Richard Dawkins, “An Agony in Five Fits”, in Richard Dawkins, The Extended Phenotype: The Long Reach of the Gene, Oxford University Press, 1982, Paperback reprint (October 1989) ISBN: 0192860887. New edition with an afterword by Daniel Dennett (April 1999) ISBN: 0192880519

Richard Dawkins, Climbing Mount Improbable, New York: Norton, 1996.

Dobzhansky, Th., Genetics and the Origin of Species, Columbia University Press, New York, 1937; 2nd ed., 1941; 3rd ed., 1951.

Ellis, R. S. Entropy, Large Deviations, and Statistical Mechanics. New York: Springer-Verlag, 1985.

Endler, J.A., Natural Selection in the Wild. Princeton, New Jersey: Princeton University Press, 1986. ISBN 0-691-00057-3

Philip V. Fellman, Jonathan Vos Post, Roxana Wright, and Usha Dasarari, “Adaptation and Coevolution on an Emergent Global Competitive Landscape”, Proc. 5th International Conference on Complex Systems, Boston, Massachusetts, 16-21 May 2004. Abstract: Notions of Darwinian selection have been implicit in economic theory for at least sixty years. Richard Nelson and Sidney Winter have argued that while evolutionary thinking was prevalent in prewar economics, the postwar Neoclassical school became almost entirely preoccupied with equilibrium conditions and their mathematical conditions. One of the problems with the economic interpretation of firm selection through competition has been a weak grasp on an incomplete scientific paradigm. As I. F. Price notes: “The biological metaphor has long lurked in the background of management theory largely because the message of “survival of the fittest” (usually wrongly attributed to Charles Darwin rather than Herbert Spencer) provides a seemingly natural model for market competition (e.g. Alchian 1950, Merrell 1984, Henderson 1989, Moore 1993), without seriously challenging the underlying paradigms of what an organisation is.” In this paper we examine the application of dynamic fitness landscape models to economic theory, particularly the theory of technology substitution, drawing on recent work by Kauffman, Arthur, McKelvey, Nelson and Winter, and Windrum and Birchenhall. In particular we use Professor Post’s early work with John Holland on the genetic algorithm to explain some of the key differences between static and dynamic approaches to economic modeling.

Falconer, D.S. & Mackay, T.F.C. (1996) Introduction to Quantitative Genetics Addison Wesley Longman, Harlow, Essex, UK ISBN 0-582-24302-5

R. A. Fisher, “Theory of Statistical Estimation”, Proceedings of the Cambridge Philosophical Society, 1925.

R. A. Fisher, The Genetical Theory of Natural Selection, Clarendon Press, Oxford, 1930; 1958 edition, p.39.

Forward, Robert, Dragon’s Egg, science fiction novel, Tor, 1980. ISBN 0-345-28646-4

see also: http://en.wikipedia.org/wiki/Dragon's_Egg

Futuyma, D.J., Evolution. Sinauer Associates, Inc., Sunderland, Massachusetts, 2005. ISBN 0-87893-187-2

Goldman, S. Information Theory. New York: Dover, 1953.

Anthony J.F. Griffiths, Richard C. Lewontin, Jeffrey H. Miller, and William M. Gelbart, Modern Genetic Analysis: Integrating Genes and Genomes, 2nd hardcover edition, W. H. Freeman Company, Feb 2002, ISBN: 0716743825

Haldane, J.B.S. (1932) The Causes of Evolution.

Haldane, J.B.S., The measurement of natural selection. Proceedings of the 9th International Congress of Genetics. 1(1953): 480-487.

Haldane J.B.S. (1957) The cost of natural selection. J Genet 55:511-24.

Hankerson, D.; Harris, G. A.; and Johnson, P. D. Jr. Introduction to Information Theory and Data Compression. Boca Raton, FL: CRC Press, 1998.

Hartley, Ralph V.L., “Transmission of Information”, Bell System Technical Journal, 1928.

Havil, J. "A Measure of Uncertainty." §14.1 in Gamma: Exploring Euler's Constant. Princeton, NJ: Princeton University Press, pp. 139-145, 2003.

Haygood, Ralph, “Mutation Rate and the Cost of Complexity”, Molecular Biology and Evolution, Volume 23, Number 5, 15 May 2006, pp. 957-963(7)

Abstract:

Two recent theoretical studies of adaptation suggest that more complex organisms tend to adapt more slowly. Specifically, in Fisher's “geometric” model of a finite population where multiple traits are under optimizing selection, the average progress ensuing from a single mutation decreases as the number of traits increases—the “cost of complexity.”

Kauffman, Stuart A., The Origins of Order: Self-Organization and Selection in Evolution. New York: Oxford University Press, 1993. ISBN 0-19-507951-5.

Stuart Kauffman. At Home in the Universe: The Search for Laws of Self-Organization and Complexity. New York: Oxford University Press, 1995.

Khinchin, A. I. Mathematical Foundations of Information Theory. New York: Dover, 1957.

Kryukov G.V., Schmidt, S. & Sunyaev, S., Small fitness effect of mutations in highly conserved non-coding regions. Human Molecular Genetics 14(2005):2221-9

Kumar S, and Subramanian S., “Mutation rates in mammalian genomes.” Proc Natl Acad Sci USA. 2002 Jan 22;99(2):803-8. Epub 2002 Jan 15.

Lande, R. & Arnold, S.J., The measurement of selection on correlated characters. Evolution 37(1983):1210-26.

Lasota, A. and Mackey, M. C. Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics, 2nd ed. New York: Springer-Verlag, 1994.

Lee, Y. W. Statistical Theory of Communication. New York: Wiley, 1960.

Lenski, Richard E: Balancing Robustness and Evolvability

http://www.citebase.org/cgi-bin/citations?identifier=oai:pubmedcentral.nih.gov:1750925

Can a single unifying mathematical framework help to explain robustness - the ability of organisms to persist in the face of changing conditions - at all biological scales, from biochemical to ecological?

Lenski, Richard E, Barrick, Jeffrey E, Ofria, Charles

2006-12-29

Lewontin, Richard, “The Genotype/Phenotype Distinction”, Stanford Encyclopedia of Philosophy, 23 January 2004, http://plato.stanford.edu/entries/genotype-phenotype/

Losee, Bob, The Beginnings of “Information Theory”, online, 1999-03-10, http://www.ils.unc.edu/~losee/b5/node7.html

Lotka, A.J. (1922a) Contribution to the energetics of evolution. Proc Natl Acad Sci USA 8:147–51

Lotka, A.J. (1922b) Natural selection as a physical principle. Proc Natl Acad Sci USA 8:151–4

Mayr, E., Systematics and the Origin of Species, Columbia University Press, New York, 1942. ISBN 0-674-86250-3

Melanie Mitchell. An Introduction to Genetic Algorithms. Cambridge, MA: MIT Press, 1996.

[Myers, 2007]: P. Z. Myers, “Basics: What is a gene?”, http://scienceblogs.com/pharyngula/2007/01/basics_what_is_a_gene.php

Nahin, P. J., Oliver Heaviside, Sage in Solitude. IEEE Press, New York, 1988. ISBN 0-87942-238-6

Nyquist, Harry, “Certain Factors Affecting Telegraph Speed”, Bell System Technical Journal, 1924.

Dennis O'Neil, Hardy-Weinberg Equilibrium Model, http://anthro.palomar.edu/synthetic/synth_2.htm

Ofria, Charles: Evolution of differentiated expression patterns in digital organisms

http://www.citebase.org/cgi-bin/citations?identifier=oai:arXiv.org:physics/0002054

We investigate the evolutionary processes behind the development and optimization of multiple threads of execution in digital organisms using the avida platform, a software package that implements Darwinian evolution on populations of self-replicating computer programs. The system is seeded with a linearly executed ancestor capable only of reproducing its own genome, whereas its underlying language has the capacity for multiple threads of execution (i.e., simultaneous expression of sections of the genome.) We witness the evolution to multi-threaded organisms and track the development of distinct expression patterns. Additionally, we examine both the evolvability of multi-threaded organisms and the level of thread differentiation as a function of environmental complexity, and find that differentiation is more pronounced in complex environments.

Appeared in "Advances in Artificial Life", 5th European Conference ECAL'99, D. Floreano et al, eds. (1999) Ofria, Charles, Adami, Christoph, Collier, Travis C., Hsu, Grace K. 2006-12-01

Ofria, Charles: Selective pressures on genomes in molecular evolution

http://www.citebase.org/cgi-bin/citations?identifier=oai:arXiv.org:quant-ph/0301075

We describe the evolution of macromolecules as an information transmission process and apply tools from Shannon information theory to it. This allows us to isolate three independent, competing selective pressures that we term compression, transmission, and neutrality selection. The first two affect genome length: the pressure to conserve resources by compressing the code, and the pressure to acquire additional information that improves the channel, increasing the rate of information transmission into each offspring. Noisy transmission channels (replication with mutations) gives rise to a third pressure that acts on the actual encoding of information; it maximizes the fraction of mutations that are neutral with respect to the phenotype. This neutrality selection has important implications for the evolution of evolvability. We demonstrate each selective pressure in experiments with digital organisms.

Comment: 16 pages, 3 figures, to be published in J. Theor. Biology

Ofria, Charles, Adami, Christoph, Collier, Travis C.

2006-12-01

Ofria, Charles: Evolution of genetic organization in digital organisms

http://www.citebase.org/cgi-bin/citations?identifier=oai:arXiv.org:adap-org/9903003

We examine the evolution of expression patterns and the organization of genetic information in populations of self-replicating digital organisms. Seeding the experiments with a linearly expressed ancestor, we witness the development of complex, parallel secondary expression patterns. Using principles from information theory, we demonstrate an evolutionary pressure towards overlapping expressions causing variation (and hence further evolution) to sharply drop. Finally, we compare the overlapping sections of dominant genomes to those portions which are singly expressed and observe a significant difference in the entropy of their encoding.

Comment: 18 pages with 5 embedded figures. Proc. of DIMACS workshop on "Evolution as Computation", Jan. 11-12, Princeton, NJ. L. Landweber and E. Winfree, eds. (Springer, 1999)

Ofria, Charles, Adami, Christoph

2006-12-01

Ott, E. "Entropies." §4.5 in Chaos in Dynamical Systems. New York: Cambridge University Press, pp. 138-144, 1993.

Pearson, K., “Mathematical contributions to the theory of evolution. XI. On the influence of natural selection on the variability and correlation of organs.” Philosophical Transactions of the Royal Society of London, Ser. A 200(1903): 1–66.

Pierce, J. R. An Introduction to Information Theory. New York: Dover, 1980.

Pollard KS, Salama SR, King B, Kern AD, Dreszer T, et al. (2006) Forces Shaping the Fastest Evolving Regions in the Human Genome. PLoS Genet 2(10): e168 doi:10.1371/journal.pgen.0020168

Jonathan V. Post, “Analysis of Enzyme Waves: Success through Simulation”, Proceedings of the Summer Computer Simulation Conference, Seattle, WA, 25-27 August 1980, pp.691-695, AFIPS Press, 1815 North Lynn Street, Suite 800, Arlington, VA 22209

Jonathan V. Post, "Simulation of Metabolic Dynamics", Proceedings of the Fourth Annual Symposium on Computer Applications in Medical Care, Washington, DC, 2-5 November 1980

Jonathan V. Post, "Enzyme System Cybernetics", Proceedings of the International Conference on Applied Systems Research and Cybernetics, Acapulco, Mexico, 12-15 December 1980

Jonathan V. Post, "Enzyme System Cybernetics", Applied Systems Research and Cybernetics, ed. G.E. Lasker, Pergamon Press, 1981, Vol.IV, pp.1883-1888, ISBN: 0-08-027196-0 (set), ISBN: 0-08-0271201 (Vol.IV)

Jonathan V. Post, "Alternating Current Chemistry, Enzyme Waves, and Metabolic Chaos", NATO Workshop on Coherent and Emergent Phenomena in Biomolecular Systems, Tucson, AZ 15-19 January 1991

Jonathan V. Post, "Nonlinear Enzyme Waves, Simulated Metabolism Dynamics, and Protein Nanotechnology", poster session, 2nd Artificial Life Workshop, 5-9 Feb 1990, Santa Fe, NM

Jonathan V. Post, "Continuous Semigroups, Nonlinear Enzyme Waves, and Simulated Metabolism Dynamics", accepted for Semigroup Forum (Mathematics journal), 15 May 1990 [not published, as employer accidentally erased the only digital file of the paper]

Jonathan V. Post, "Is Functional Identity of Products a Necessary Condition for the Selective Neutrality of Structural Gene Allele?", Population Biologists of New England (PBONE), Brown University, Providence, RI, June 1976

Jonathan V. Post, "Enzyme Kinetics and Selection of Structural Gene Products -- A Theoretical Consideration", Society for the Study of Evolution, Ithaca, NY, June 1977

Jonathan V. Post, "Birth of the Biocomputer", color-videotaped lecture to audience of 200, at opening of A.P.P.L.E.'s new world headquarters, Kent, WA, 15 Mar 1983

Jonathan V. Post et al., "Part Human, Part Machine", panel discussion on cyborgs, prosthesis, robots, nanotechnology, Westercon 37, Portland Marriott, Portland, OR, 30 Jun 1984

Jonathan V. Post (moderator), Prof. Vernor Vinge, Paul Preuss, Greg Bear, F. Eugene Yates (Director, Crump Institute for Medical Engineering, UCLA), "New Machines, New Life Forms", UCLA Extension's Symposium on Science and Science Fiction, Westwood, CA, 9 Nov 1986

Jonathan V. Post, Dean R. Lambe, Laura Mixon, Walter John Williams, "Nanotechnology", panel discussion, Nolacon: 46th World Science Fiction Convention, Sheraton Grand B, New Orleans, LA, 4 Sep 1988

Jonathan Vos Post, “The Evolution of Controllability in Enzyme System Dynamics”, Proc. 5th International Conference on Complex Systems, Boston, Massachusetts, 16-21 May 2004. Abstract: A building block of all living organisms' metabolism is the "enzyme chain." A chemical "substrate" diffuses into the (open) system. A first enzyme transforms it into a first intermediate metabolite. A second enzyme transforms the first intermediate into a second intermediate metabolite. Eventually, an Nth intermediate, the "product" diffuses out of the open system. What we most often see in nature is that the behavior of the first enzyme is regulated by a feedback loop sensitive to the concentration of product. This is accomplished by the first enzyme in the chain being "allosteric", with one active site for binding with the substrate, and a second active site for binding with the product. Normally, as the concentration of product increases, the catalytic efficiency of the first enzyme is decreased (inhibited). To anthropomorphize, when the enzyme chain is making too much product for the organism’s good, the first enzyme in the chain is told: "whoa, slow down there." Such feedback can lead to oscillation, or, as this author first pointed out, "nonperiodic oscillation" (for which, at the time, the term "chaos" had not yet been introduced). But why that single feedback loop, known as "endproduct inhibition" [Umbarger, 1956], and not other possible control systems? What exactly is evolution doing, in adapting systems to do complex things with control of flux (flux meaning the mass of chemicals flowing through the open system in unit time)? This publication emphasizes the results of Kacser and the results of Savageau, in the context of this author’s theory.
Other publications by this author [Post, 9 refs] explain the context and literature on the dynamic behavior of enzyme system kinetics in living metabolisms; the use of interactive computer simulations to analyze such behavior; the emergent behaviors "at the edge of chaos"; the mathematical solution in the neighborhood of steady state of previously unsolved systems of nonlinear Michaelis-Menten equations [Michaelis-Menten, 1913]; and a deep reason for those solutions in terms of Krohn-Rhodes Decomposition of the Semigroup of Differential Operators of the systems of nonlinear Michaelis-Menten equations. Living organisms are not test tubes in which chemical reactions have reached equilibrium. They are made of cells, each cell of which is an "open system" in which energy, entropy, and certain molecules can pass through cell membranes. Due to conservation of mass, the rate of stuff going in (averaged over time) equals the rate of stuff going out. That rate is called "flux." If what comes into the open system varies as a function of time, what is inside the system varies as a function of time, and what leaves the system varies as a function of time. Post's related publications provide a general solution to the relationship between the input function of time and the output function of time, in the neighborhood of steady state. But the behavior of the open system, in its complexity, can also be analyzed in terms of mathematical Control Theory. This leads immediately to questions of "Control of Flux."

Reza, F. M. An Introduction to Information Theory. New York: Dover, 1994

Rothstein, J. "Information, Measurement, and Quantum Mechanics." Science 114, 171-175, 1951.

Schnakenberg, J. "Network Theory of Microscopic and Macroscopic Behavior of Master Equation Systems." Rev. Mod. Phys. 48, 571-585, 1976.

Schneider, T.D., Stormo, G.D., Gold, L., and Ehrenfeucht, A., "Information Content of Binding Sites on Nucleotide Sequences", J. Mol. Biol. 188(1986)415-431.

Adami says: "The ability to analyze the entropy of each site in the genome quantifies the loss of variability... This entropy analysis has been carried out in a biological context by [Schneider et al]..."

Adrian Schneider, Gina M. Cannarozzi and Gaston H. Gonnet, “Empirical codon substitution matrix”, BMC Bioinformatics 2005, 6:134, doi:10.1186/1471-2105-6-134, http://www.biomedcentral.com/1471-2105/6/134

Shannon, C. E. "A Mathematical Theory of Communication." The Bell System Technical J. 27, 379-423 and 623-656, July and Oct. 1948. http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf.

Shannon, C. E. and Weaver, W. Mathematical Theory of Communication. Urbana, IL: University of Illinois Press, 1963.

Singh, J. Great Ideas in Information Theory, Language and Cybernetics. New York: Dover, 1966.

Sober, E., The Nature of Selection: Evolutionary Theory in Philosophical Focus, University of Chicago Press, 1984; 1993; ISBN 0-226-76748-5

Stern, C., “The Hardy–Weinberg law”. Science 97(1943): 137–138.

Weisstein, Eric W., "Coding Theory." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/CodingTheory.html

[Weisstein, Entropy]:

Weisstein, Eric W. "Entropy." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/Entropy.html

See also (same web domain and author):

Differential Entropy, Information Theory, Kolmogorov Entropy, Kolmogorov-Sinai Entropy, Maximum Entropy Method, Metric Entropy, Mutual Information, Nat, Ornstein's Theorem, Redundancy, Relative Entropy, Shannon Entropy, Topological Entropy.

[Weisstein, Information Theory]:

Weisstein, E. W. "Books about Information Theory." http://www.ericweisstein.com/encyclopedias/books/InformationTheory.html.

John S. Wilkins, Evolving Thoughts, Science blog definition of Fitness

(see also subsequent discussion on that blog)

http://scienceblogs.com/evolvingthoughts/2007/01/fitness.php#more

Williams, G.C., Adaptation and Natural Selection, Oxford University Press, 1966.

Wright, Sewall, "The roles of mutation, inbreeding, crossbreeding, and selection in evolution". Proceedings of the Sixth International Congress on Genetics, 1(1932): 356–366.

Yule, G. U., Mendel's laws and their probable relation to intra-racial heredity. New Phytol. 1(1902) 193–207, 222–238.

Zayed, A. I. Advances in Shannon's Sampling Theory. Boca Raton, FL: CRC Press, 1993.
