Tuesday, February 26, 2008

Is Dr. Dembski's Work "Written in Jell-o?"

From here.

CA: interviewer

WD: Dr. Dembski

CA: Your critics (such as Wein, Perakh, Shallit, Elsberry, Wolpert and others) seem unsatisfied with your work. They charge your work as being somewhat esoteric and lacking intellectual rigor. What do you say to that charge?

WD: Most of these critics are responding to my book No Free Lunch. As I explained in the preface of that book, its aim was to provide enough technical details so that experts could fill in details, but enough exposition so that the general reader could grasp the essence of my project. The book seems to have succeeded with the general reader and with some experts, though mainly with those who were already well-disposed toward ID. In any case, it became clear after the publication of that book that I would need to fill in the mathematical details myself, something I have been doing right along (see my articles described under “mathematical foundations of intelligent design” at www.designinference.com) and which has now been taken up in earnest in a collaboration with my friend and Baylor colleague Robert Marks at his Evolutionary Informatics Lab (www.EvoInfo.org).

CA: Are you evading the tough questions?

WD: Of course not. But tough questions take time to answer, and I have been patiently answering them. I find it interesting now that I have started answering the critics’ questions with full mathematical rigor (see the publications page at www.EvoInfo.org) that they are largely silent. Jeff Shallit, for instance, when I informed him of some work of mine on the conservation of information, told me that he refused to address it because I had not adequately addressed his previous objections to my work, though the work on conservation of information about which I was informing him was precisely in response to his concerns. Likewise, I’ve interacted with Wolpert. Once I started filling in the mathematical details of my work, however, he fell silent.

Saturday, February 23, 2008

Published Method of Measuring Specificity (Function)

It looks like a PNAS article has finally caught up with, and refined, some of Dr. Dembski's work. Here is the PNAS article that discusses measuring functional information. On a first read-through, it seems to measure functional information in much the same way that Dr. Dembski measures specificity as it relates to function in "Specification: the Pattern that Signifies Intelligence."

It seems that the only significant difference is that the PNAS article uses a measure of functionality (specificity) that doesn't rely on a human linguistic description of the pattern. The equation seems to be the same as far as I can tell (-log2 [number of specified patterns related to the function * probability of the pattern in question]), but the gauge for the number of specified patterns seems to be taken directly from the "independent" description as formulated by the system in question, i.e., the relation between biological function and its independent description in a specified RNA chain, as opposed to an independent linguistic description of the biological function. IMO, this provides a more concrete and accurate measure of specificity and still does not detract from Dembski's work on CSI in any way, since I had already incorporated essentially the same method as the recently published paper when I discussed specifications here on this blog. As I have explained: "Now, let’s take a look at proteins. When it comes to measuring specificity, this is exactly like measuring specificity in a meaningful sentence, as I will soon show. Functional specificity merely separates functional pattern “islands” from the sea of random possible patterns. When specific proteins are brought together, you can have a pattern which creates function. That functional pattern itself is formulated by information contained in DNA which is encoded into RNA and decoded into the specific system of functional proteins. The functional pattern as the event in question is defined independently as a pattern of nucleic acids ... When measuring for a functional specification (within a set of functional "islands"), you apply the same equation, however, when measuring the specificity you take into account all other FUNCTIONAL patterns (able to be processed into function *by the system in question*) that have the same probability of appearance as the pattern in question."

As far as I can tell, the PNAS paper doesn't take into account any probabilistic resources, so it is not measuring for CSI; it only measures for SI, that is, specified or functional information (presented as a measure of complexity).

From the PNAS article:
"Functional information provides a measure of complexity by quantifying the probability that an arbitrary configuration of a system of numerous interacting agents (and hence a combinatorially large number of different configurations) will achieve a specified degree of function."


"Letter sequences, Avida genomes and biopolymers all display degrees of functions that are not attainable with individual agents (a single letter, machine instruction, or RNA nucleotide, respectively). In all three cases, highly functional configurations comprise only a small fraction of all possible sequences."

Of course, Dembski's definition of specificity does take specificity beyond mere function. In his discussion, however, specificity most definitely includes function, and the measurement seems to be in agreement with this recent PNAS article. According to Dembski's definition, specificity includes algorithmic compressibility, semantic meaning, and function. The PNAS article uses specificity in a stricter functional sense, which includes meaning and other "usable" function, and, unlike Dembski, it doesn't really attempt to provide a rigorous definition of a specified pattern. Dr. Dembski has defined a specified pattern as an event which can be formulated as a conditionally independent pattern. As I've already explained and shown, this includes algorithmically compressible patterns as well as semantically meaningful events and functional events.
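To make the PNAS measure quoted above concrete, here is a minimal Python sketch. It assumes that functional information is the negative base-2 log of the fraction of all configurations that reach at least a given degree of function, which is how I read the quoted definition. The two-letter alphabet, the sequence length, and the `degree_of_function` scoring rule are hypothetical toy choices, not taken from the paper.

```python
import math
from itertools import product

ALPHABET = "ab"   # hypothetical two-letter alphabet
LENGTH = 4        # hypothetical sequence length

def degree_of_function(sequence):
    """Toy stand-in for a measured degree of function: count of 'a' letters."""
    return sequence.count("a")

def functional_information(threshold):
    """Bits: log2(all sequences / sequences whose function is >= threshold)."""
    sequences = ["".join(s) for s in product(ALPHABET, repeat=LENGTH)]
    hits = sum(1 for s in sequences if degree_of_function(s) >= threshold)
    return math.log2(len(sequences) / hits)

# The rarer the configurations that reach the threshold, the more bits.
for threshold in range(LENGTH + 1):
    print(threshold, functional_information(threshold))
```

In this toy setup every sequence meets a threshold of 0, so the measure is 0 bits, while only one of the 16 sequences meets a threshold of 4, giving 4 bits. Note that no probabilistic resources enter the calculation, which matches the point above that this is a measure of SI rather than CSI.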

Compare the above PNAS article with Dembski's treatment of specificity and check it out for yourself.

Wednesday, February 20, 2008

Thoughts on God (Part 1)

As a Christian, Naturalistic Intelligent Design Advocate, and Panentheist (the most logical conclusion of course, IMO), I see God as perfectly personal, even though He "merely" engineered the universe to such fine-tuned precision that it unfolded according to His plan. It seems rather obvious to me that God operates through the creation of laws -- both spiritual and natural. Those laws, once created, operate "of their own accord." Does this mean God is impersonal? Of course not. As a panentheist, I see God as actually *being* those laws and also existing far beyond those laws at the same time. If God truly is God, how can anything exist apart from Him (unless free will is given, which is a different topic)?

How does this fit into me still being an ID advocate? Well, it is obvious, IMO, that we can scientifically determine the effects of intelligence. IOW, a mere random set of laws and variables (absent intelligence) acted upon by chance will not produce information processing systems or CSI. Thus an intelligence is necessary to cause the production of life within an overarching program.

What Does a Specification Tell Us?

A specification technically only measures what chance and law on their own will not do. We *infer* intelligence for four reasons (in fact, some are very similar to how past evolution is inferred):

1. Intelligence is another causal factor aside from chance and law, because intelligence can control law and chance to produce a future target; law and chance, however, are blind.

2. Intelligence has been observed creating specifications.

3. To date, no specification whose cause is known has been generated absent intelligence.

4. According to recent information theorems and experiments with information processing systems and EAs, intelligence is necessary for consistently better than chance results (consistently better than chance results without it are equated with perpetual motion machines). The better than chance results of evolution are balanced with knowledge of the problem/target incorporated into the behavior of the algorithm, thus guiding it to the solution.

Since CSI measures what chance and law (absent intelligence) will not produce, it errs on the side of caution. I.e., if an intelligent agent writes down a random string of 1s and 0s (ignoring the fact that it is written on a lined 8" X 11" piece of paper, which itself may measure as a specification), then there will be no CSI measured. This only tells us that the information content represented by the string itself carries no signs of intelligence.

Therefore, a specification may not catch every single case of intelligent action; however, everything that it *does* catch is *necessarily* a result of intelligence. So far, no one has shown otherwise.

Tuesday, February 12, 2008

New Moderation Policy

This is just a heads up to let anyone with a comment know that I have removed comment moderation from this blog.

I kindly request that you first read through my Case for a Naturalistic Hypothesis of Intelligent Design (at the top of the left sidebar) before posting any comments, constructive or negative. It's just that I don't have all the time in the world to be responding to comments which I may have already answered. If I have to re-quote myself in order to respond to a comment, this merely provides evidence that you have not actually fully read through my case and done your due diligence. Of course, I do understand that a person can honestly miss something I said or misunderstand a certain aspect. Obviously I will keep this in mind. Thank you in advance for your cooperation.

Furthermore, I am responding to misunderstandings of ID directly below my Case for ID in the left sidebar. If you attempt to equate ID with ignorance, continually misrepresent ID, or use any other unscientific rhetorical ploy such as "you must know the design method before you can reliably detect design," you may discover that your comment has been added to my compilation of blog posts, at the aforementioned location, responding to obfuscations of Intelligent Design Theory.

Monday, February 11, 2008

Design Detection before Method Detection

Blipey (a contributor to this debate on JoeG's blog):
"If they find something that they can't explain using the lexicon of known methods of design, they don't assume that it was designed."
You are partially on to something here, blipey. However, Stonehenge was known to be designed long before it was even remotely discovered how it might have possibly been designed. First the design detection, then the design method detection. The same holds true for ancient tools. "Oh look we have a designed tool" -- design detection based on context, analogy, and function (functional specificity). "Now let's discover a reasonable hypothesis as to how it was designed" -- design method detection. It is quite elementary, actually.

As an aside, specification as an indicator of design is also based on context, analogy, and specificity. It is based within a probabilistic context, draws from the fact that intelligence routinely creates specifications, and it incorporates specificity (which includes, but is not limited to, function). Furthermore, there is to date no counter-example of properly calculated specified complexity that is observed to have been caused by a random set of laws (merely chance and law, absent intelligence).

Now, let's just assume you actually knew what you were talking about. Does the reverse of what you assert hold true? If they find something that they *can* explain using the lexicon of known methods of design, do they assume that it *was* designed?

I.e., life is based on an information processing system that follows an evolutionary algorithm.

There is much hardware and software design and goal oriented engineering and programming that goes into the creation of information processing systems that can run an evolutionary algorithm.

These engineering and programming principles, as applied, are KNOWN METHODS OF INTELLIGENT, GOAL ORIENTED DESIGN that are essential in the generation of information processing systems and evolutionary algorithms.

What's more, systems that run off of information and engineering principles harness and control natural law and chance; however, these very principles themselves are not defined by chance and natural law. Yet, 100% of the time that we are aware of the causal history of these systems, we know that they are the products of previous KNOWN METHODS OF INTELLIGENT, GOAL ORIENTED DESIGN.

Sum It All Up

To sum up, in light of my case for naturalistic Intelligent Design, there are three options from which we can choose an overall hypothesis (overarching paradigm) for the cause of life in our universe:

1. It is the result of an infinite regress of problem specific information. This suffers from philosophical problems associated with infinite regress.

2. It is the result of a Fortuitous Accident -- pure dumb luck -- that just happened to generate problem specific information (consistently better than chance performance), information processors, CSI, and convergent evolution as a result of a truly random assortment of laws. IOW, it is the result of only chance and laws with no previous intelligent input or cause. This suffers from the problems associated with chance of the gaps non-explanations and has never been shown to be scientifically plausible (because such accidents are so highly improbable that they are, for all practical purposes, impossible) and thus belongs in the same category as claims of perpetual motion free energy machines. Furthermore, this is so far not based on any testing or observations and is unfalsifiable. This is the predominant hypothesis being peddled under scientific status today.

3. It is the necessary result of Intelligent Programming (fine tuning) of the laws of physics to converge upon specific targets/functions as potential solutions to problems, thus incorporating problem specific information into the foundation of our universe (as an information processing system). This is based on observation of the Intelligent foresight necessary (so far) to create specific information targeted at future problems (problem specific information/active information) and for the generation and programming of the types of highly improbable systems in question, including CSI. Furthermore, this option is continually testable and able to be refined as a result of continued work on information processing systems, evolutionary algorithms, and information theory. This hypothesis is even falsifiable by demonstrating choice number 2.

So, take your pick. I predict, based on the responses generated here (if there be any), that ID Theory and the naturalistic hypothesis will stand strong as being scientific and the better explanation, and that people will not accept it primarily based on their personal wishes and philosophies, even though the philosophy of ID is itself logical. So, where to go from here? How about admitting that it is a scientific hypothesis and, even if you don’t agree with it, doing what you can to allow it the process of getting published, such as happens with competing scientific hypotheses?

NFL Theorems (Part III) ... So, What's the Point?

Now, apply all that to an evolutionary optimization program operating on a chemical information processing system [within life -- as explained and published by Hubert Yockey] searching a space within a quantum computer system [of our universe -- as hypothesized by Seth Lloyd].

Here [link: http://www.aics-research.com/research/notes.html] also, are some published notes [from IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 5, NO. 1, JANUARY, 1994. pp. 130-148.] explaining how computational simulations of evolution are basically representations of biological evolution. In fact, the author states that “Genetic algorithms are basically a proper simulation of Darwinian evolution. A population of trials is mutated and the best N are retained at each generation,” and, “Darwinian evolution, as a process, is an optimization algorithm. It is not a predictive theory, nor is it a tautology ([5] p. 519, [6] p. 112), as has often been claimed (e.g., [7],[8]). As in most optimization processes, the point(s) of solution wait to be discovered by trial-and-error search.”

(As an aside, which is interesting but not necessary to our discussion:

When discussing philosophical issues, the author states: “Most troubling has been the elucidation of purpose. In distinct contrast to engineering, where purpose within a design is taken for granted -- and where the author of a design may perhaps still be available for questioning as to his reasons and motivations for specific details -- no such recourse is possible in naturally evolved systems. Indeed, the degree to which to even recognize the nature and extent of purpose within naturally evolved biota has proven to be one of biology's longest and most fundamental internal debates. Haldane once quipped, "Teleology [the study of purpose] is like a mistress to a biologist; he cannot live without her but he's unwilling to be seen with her in public" ([50] p. 392).

But purpose clearly exists in the designs produced by evolution and the reintroduction of purpose into the biological discussion has been championed by biologists such as Pittendridge, Lorenz and Mayr. Pittendridge [52] renamed and redefined the study of purpose in evolved structures to be teleonomy in order to draw as sharp a distinction between it and the mysticism of an older teleology as currently now exists between astronomy and astrology ([49] p. 29).”

... and ...

“When the philosophical perspective is constrained to the clear chain of causation resident in P, "...the designs developed by evolution are so similar in principle to those that would be reached by a conscious designer, ...it seems reasonable to suggest as a general approach to biological problems that the investigator should ask himself what are the essential functions involved and how might a designer provide for them" ([57] p. 4).”)

Now, with the understanding that computational evolutionary simulations are understood to be “proper simulations of Darwinian evolution” which operates off an actual information processing system within a larger quantum computer (as hypothesized), please provide any evidence that this biological evolution is a blind process with no problem specific information and no knowledge of optimization problem matched to search algorithm beforehand. In fact, according to what the authors of the NFL Theorems state above, you won’t be able to guarantee anything of the sort using any computational simulation (which then poses the problem as to why biochemical information processing would be any different). This seems to leave only the highly improbable, indefensible, impractical, “chance of the gaps” non-explanation that follows:

- accidental chance events somehow created a lawful program and search space with an exploitable “hill climbing” structure, accidentally generated a replicating information processor (life) and the correct search algorithm to match the exploitable search space structure, and blindly and accidentally caused the necessary problem specific, active information to generate (at a rate far exceeding random chance results by many orders of magnitude) the following features which are known to be causally related to intelligent foresight and intelligent programming/design ...

1. An information processing system which operates off of many layers of algorithmically complex and specified code while evolving further high information content codes (far exceeding 500 bits -- UPB).
2. Repair systems.
3. Logic gates.
4. Complex machinery.
5. Complex assembly instructions and pathways.
6. Redundant systems.
8. Intelligent systems.
9. Consistently better than chance performance over separate trials. [Convergent evolutionary structures and functions] (link to Simon Conway Morris’ book and the other webpage)

However, not to worry, there is a better explanation:

Given the NFL Theorems, COI Theorems, and understanding of CSI, biological evolution is most probably the necessary result of a teleological (end-goal, solution oriented) law, being guided by problem specific information at the foundation of our universe. IOW, evolution is the result of universal laws which have been intelligently fine tuned (programmed) by matching algorithm to problem by incorporating future knowledge of the optimization problem into the evolutionary algorithm to arrive at solutions to problems/targets (some of which are shown by convergent evolution and the other 8 phenomena/effects listed above).

The basis for this naturalistic teleological hypothesis is scientific since ...

1. It is based on observations (data) of cause and effect for the types of systems in question (information processing systems performing evolutionary optimization, arriving at above listed effects at better than chance performance),

2. It has begun to be tested and can continue to be tested using evolutionary algorithms, evidence being provided with our knowledge of the programming and problem specific information necessary (according to NFL Theorems) for evolutionary algorithms,

3. It is even falsifiable by showing how stochastic, blind, non-teleologically generated processes can cause information processing systems to self organize, generate layers of further “evolvable” coded information, and account for the problem specific information shown to be necessary by the NFL Theorems.

In fact, I don’t see how anyone could get away with not realizing that a naturalistic teleological hypothesis as the cause of life and its subsequent evolution is the best scientific explanation, consisting of greatest explanatory power and scope and based on observation of the types of systems in question.

Do you have any better competing hypothesis? If so, please lay it out. Merely critiquing the teleological hypothesis -- that evolution is the necessary cause of a goal oriented procedure which is necessarily shaped by intelligence -- doesn’t automatically make your position (whatever it may be) the correct position.

Again, if you wish to end this discussion, and prove ID Theory wrong and your side (whatever that may be) right, merely show how a random set of laws will generate an information processing system, problem specific information (thus an evolutionary algorithm), and finally convergent examples of CSI. Based on my understanding and explanation of the NFL Theorems and the recently developed Conservation of Information Theorems, the random generation of information processing systems and evolutionary algorithms is to information theory what perpetual motion free energy machines are to physics, and is thus so highly improbable that it is for all practical purposes impossible. Merely show me some data (observation) that a random set of laws will produce the above mentioned set of effects. Or at least show me the theory underpinning such a hypothesis. Data Trumps ... every time. Intelligent Design Theory has played its first card ... now it’s the alternative hypothesis’ turn.

NFL Theorems (Part II)

Now, I will show what Drs. Dembski and Marks state. If you don’t agree with what they write, you’ll have to show me that their conclusions are inconsistent with and/or not built upon the NFL Theorems. Here’s what they say in a nutshell:

“active information ... measures the contribution of problem-specific information for successfully finding a target. This paper develops a methodology based on these information measures to gauge the effectiveness with which problem-specific information facilitates successful search.”

“Active information captures numerically the problem-specific information that assists a search in reaching a target. We therefore find it convenient to use the term “active information” in two equivalent ways:
1) as the specific information about target location and search-space structure incorporated into a search algorithm that guides a search to a solution.
2) as the numerical measure for this information and defined as the difference between endogenous and exogenous information.”

From ActiveInfo “Conclusions:”

“If any search algorithm is to perform better than random search, active information must be resident in it. If the active information is inaccurate, the search can perform worse than random (which, numerically, comes out as negative active information) ... Accordingly, attempts to characterize evolutionary algorithms as “creators of novel information” are inappropriate. To have integrity, all search algorithms, especially computer simulations of evolutionary search, should make explicit (1) a numerical measure of the difficulty of the problem to be solved, i.e., the endogenous information, and (2) a numerical measure of the amount of problem-specific information resident in the search algorithm, i.e., the active information.”
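The bookkeeping in the quotes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: I am taking endogenous information to be -log2 of the success probability of blind (random) search, and exogenous information to be -log2 of the success probability of the assisted search, which is how I understand the Dembski and Marks definitions. The function names, the symbols `p` and `q`, and the probabilities used are hypothetical.

```python
import math

def endogenous_information(p):
    """Difficulty of the problem in bits: -log2 of blind-search success probability p."""
    return -math.log2(p)

def exogenous_information(q):
    """Difficulty that remains in bits: -log2 of assisted-search success probability q."""
    return -math.log2(q)

def active_information(p, q):
    """Difference between endogenous and exogenous information."""
    return endogenous_information(p) - exogenous_information(q)

p = 1 / 1024  # hypothetical: blind search hits the target once in 1024 tries

print(active_information(p, 1 / 4))     # assisted search does better than blind search
print(active_information(p, p))         # assisted search does no better than blind search
print(active_information(p, 1 / 4096))  # inaccurate information: worse than blind search
```

With these made-up numbers the three cases come out positive, zero, and negative, which mirrors the statement in the quote that inaccurate active information shows up numerically as negative active information.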

Conservation of Information in Search: Measuring the Cost of Success:

“Search algorithms, including evolutionary searches, do not generate free information. Instead, they consume information, incurring it as a cost. Over 50 years ago, Leon Brillouin, a pioneer in information theory, made this very point: “The [computing] machine does not create any new information, but it performs a very valuable transformation of known information” [3] When Brillouin’s insight is applied to search algorithms that do not employ specific information about the problem being addressed, one finds that no search performs consistently better than any other. Accordingly, there is no magic-bullet search algorithm that successfully resolves all problems [7], [32].”

Thus ...

A. Problem specific information about search space and target must be incorporated into the behavior of evolutionary algorithms in order for them to produce better than chance results.

B. Active information gives a numerical measurement in information theoretic terms, of the amount of problem specific information incorporated into an algorithm to cause it to perform efficiently.

C. Evolutionary algorithms do not create new information. They operate off of previously inputted, correct information about target location and search structure which is then used to find and transform the previously existing information.

No Free Lunch Theorems (Part I)

When discussing the NFL Theorems I do not have the expertise to evaluate the math and its actual relevance to the real world. At the moment, I must take for granted that the authors of the NFL Theorems know what they are talking about. So, taking the NFL Theorems and the authors' description of them as granted, I am merely re-stating what the authors already stated about the NFLT.

The quotes from the NFL Theorems paper follow:

“Given our decision to only measure distinct function evaluations even if an algorithm revisits previously searched points, our definition of an algorithm includes all common black-box optimization techniques like simulated annealing and evolutionary algorithms.”
Thus, the algorithms discussed include evolutionary algorithms.

“In this paper we present a formal analysis that contributes towards such an understanding by addressing questions like the following. Given the plethora of black box optimization algorithms and of optimization problems, how can we best match algorithms to problems (i. e. how best can we relax the black box nature of the algorithms and have them exploit some knowledge concerning the optimization problem). In particular, while serious optimization practitioners almost always perform such matching, it is usually on an ad hoc basis; how can such matching be formally analyzed? More generally, what is the underlying mathematical “skeleton” of optimization theory before the flesh of the probability distributions of a particular context and set of optimization problems are imposed.”

Thus, one of the purposes of this paper is to formally analyse how prior knowledge of the optimization problem is exploited and needs to be utilized in order to match it with the proper evolutionary algorithm, which is a feat performed by optimization practitioners (intelligent programmers).

“As emphasized above, the NFL theorems mean that if an algorithm does particularly well on average for one class of problems then it must do worse on average over the remaining problems. In particular, if an algorithm performs better than random search on some class of problems then it must perform worse than random search on the remaining problems. Thus comparisons reporting the performance of a particular algorithm with a particular parameter setting on a few sample problems are of limited utility.”

“In particular, if for some favorite algorithms a certain well behaved f results in better performance than does the random f then that well behaved f gives worse than random behavior on the set of all remaining algorithms. In this sense just as there are no universally efficacious search algorithms, there are no universally benign f (optimization problems) which can be assured of resulting in better than random performance regardless of one's algorithm.”
[parenthesis added]

There is no universal evolutionary algorithm that will solve all optimization problems at better than random performance. Likewise, there is no optimization problem that just any evolutionary algorithm will be able to “solve” at better than random performance. Thus, if an evolutionary algorithm performs better than random search on one class of optimization problems, it will perform worse than random search on average over all remaining classes of optimization problems. Dembski and Marks have reiterated this point in Conservation of Information in Search: Measuring the Cost of Success: “... there is no magic-bullet search algorithm that successfully resolves all problems [7], [32].”
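The averaging claim can be checked by brute force on a tiny example. The Python sketch below is a toy illustration, not the construction used in the NFL paper: the four-point search space, the three objective values, the two search strategies, and the "rising" problem class are all hypothetical choices of mine. It compares two algorithms that never revisit a point, first over every possible objective function and then over one structured class.

```python
from itertools import product

POINTS = range(4)   # hypothetical search space of four points
VALUES = range(3)   # hypothetical objective values 0, 1, 2

def left_to_right(history):
    """Visit unvisited points in ascending order."""
    visited = {x for x, _ in history}
    return min(x for x in POINTS if x not in visited)

def value_dependent(history):
    """Start at the right end; jump to the left end after seeing the top value."""
    visited = {x for x, _ in history}
    unvisited = [x for x in POINTS if x not in visited]
    if history and history[-1][1] == max(VALUES):
        return unvisited[0]
    return unvisited[-1]

def best_found(algorithm, f, evaluations):
    """Best objective value seen after a fixed number of distinct evaluations."""
    history = []
    for _ in range(evaluations):
        x = algorithm(history)
        history.append((x, f[x]))
    return max(y for _, y in history)

def average_best(algorithm, functions, evaluations):
    """Mean of best_found over a collection of objective functions."""
    total = sum(best_found(algorithm, f, evaluations) for f in functions)
    return total / len(functions)

ALL_FUNCTIONS = list(product(VALUES, repeat=len(POINTS)))    # every possible problem
RISING = [f for f in ALL_FUNCTIONS if list(f) == sorted(f)]  # one structured class

for name, algorithm in [("left_to_right", left_to_right),
                        ("value_dependent", value_dependent)]:
    print(name,
          average_best(algorithm, ALL_FUNCTIONS, 2),
          average_best(algorithm, RISING, 1))
```

Averaged over all problems the two strategies score the same, but on the rising class the strategy that starts at the right end does better, because its behavior happens to match that structure. On the problems outside the rising class it does correspondingly worse after one evaluation, which is the trade-off the quotes describe.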

So, what causes the evolutionary algorithm to perform better than average on a class of problems? The next quotes begin to discuss this ...

“In particular, we show that an algorithm’s average performance is determined by how aligned it is with the underlying probability distribution over optimization problems on which it is run.”

“In any of these cases, P(f) or “p” must match or be aligned with “a” to get desired behavior. This need for matching provides a new perspective on how certain algorithms can perform well in practice on specific kinds of problems.”

“First, if the practitioner has knowledge of problem characteristics but does not incorporate them into the optimization algorithm, then P(f) is effectively uniform. (Recall that P(f) can be viewed as a statement concerning the practitioner’s choice of optimization algorithms.) In such a case, the NFL theorems establish that there are no formal assurances that the algorithm chosen will be at all effective. Second, while most classes of problems will certainly have some structure which, if known, might be exploitable, the simple existence of that structure does not justify choice of a particular algorithm; that structure must be known and reflected directly in the choice of algorithm to serve as such a justification. In other words, the simple existence of structure per se, absent a specification of that structure, cannot provide a basis for preferring one algorithm over another ... The simple fact that the P(f) at hand is non-uniform cannot serve to determine one’s choice of optimization algorithm.”

“Intuitively, the NFL theorem illustrates that even if knowledge of “f” perhaps specified through P(f) is not incorporated into “a” then there are no formal assurances that “a” will be effective. Rather effective optimization relies on a fortuitous matching between “f” [optimization problem] and “a” [evolutionary algorithm].” [brackets added]

The quotes above state that for the algorithm to perform better than random search on a class of problems, the evolutionary algorithm must be correctly matched beforehand, based on knowledge of the problem, to the specific optimization problem. Knowledge of characteristics of the problem must be incorporated into the algorithm. This is similar to how you won’t be able to find a card at consistently better than chance performance unless you are aware of an organization of the deck that would help you find the card and you incorporate this knowledge into the algorithm that you use to find the card. Likewise, the “exploitable structure,” such as hill climbing structures, must be known and incorporated into the choice of algorithm to take any advantage of these useable search structures. As Dr. Dembski has stated in “Active Information in Evolutionary Search,” merely assuming that there does exist an exploitable search structure, without incorporating that knowledge into the choice of algorithm, doesn’t get us anywhere:

“Such assumptions, [of non-uniform search space “links,” “hill climbing optimizations,” etc.] however, are useless when searching to find a sequence of, say, 7 letters from a 26-letter alphabet to form a word that will pass successfully through a spell checker, or choosing a sequence of commands from 26 available commands in an Avida type program to generate a logic operation such as XNOR [16]. With no metric to determine nearness, the search landscape for such searches is binary -- either success or failure. There are no sloped hills to climb. We note that such search problems on a binary landscape are in a different class from those useful for, say, adaptive filters. But, in the context of the NFLT, knowledge of the class in which an optimization problem lies provides active information about the problem.” (Brackets added)

He is basically stating the same thing as the “find a card” example. If you assume, without actually knowing, that there is a specific organization that will help you find the card, you still have nothing but chance to go on. There may be no useful organization at all, or, out of all possible organizations, the one that you are banking on may be so improbable that for all practical purposes you still have no better than chance assurance of finding that card.

Dr. Dembski’s quote directly above states the same concept as found within the NFL Theorem as I just quoted a few paras above: “... while most classes of problems will certainly have some structure which, if known, might be exploitable ...; that structure must be known and reflected directly in the choice of algorithm to serve as such a justification [for choice of a particular algorithm]... The simple fact that the P(f) at hand is non-uniform cannot serve to determine one’s choice of optimization algorithm ... Rather effective optimization relies on a fortuitous matching between “f” (optimization problem) and “a” (evolutionary algorithm).” [brackets added]

The above quotes from both the NFL Theorem paper and Dr. Dembski and Marks’ paper state what I have already referred to before – the necessity of using prior knowledge of the search space structure to match search space structure to the correct adaptive algorithm before the search ever begins. The next quote, taken from the conclusion of the paper, states that based on the theorems proved within the paper, it is important that problem-specific knowledge is incorporated into the behavior of the algorithm.
“A framework has been presented in which to compare general purpose optimization algorithms. A number of NFL theorems were derived that demonstrate the danger of comparing algorithms by their performance on a small sample of problems. These same results also indicate the importance of incorporating problem specific knowledge into the behavior of the algorithm.”

Now, let’s tie this all together:

1. The algorithms discussed include evolutionary algorithms.

2. The NFL Theorems formally analyse how prior knowledge of the optimization problem needs to be utilized in order to match it with the proper algorithm.

3. In order for evolutionary algorithms to perform better than average over one class of problems, the characteristics of the problem must be known and incorporated into the search algorithm. The exploitable structure of the search space must be known and incorporated into the choice of algorithm. Furthermore, the search algorithm must be matched correctly to the optimization problem by utilizing prior knowledge of the problem.

4. Thus, it is important that problem specific information is incorporated into the behavior of the evolutionary algorithm. Therefore, evolutionary search is not blind search and is guided by incorporating prior knowledge of the problem and knowledge of any exploitable structure [within the search space] into the evolutionary algorithm.

IOW, here is an example of what the NFL Theorem is telling us:

I have a shuffled deck of cards. I lay them face down in a row on a table. Before you turn over any cards can you generate a search algorithm (method of search) that will help you find any card at consistently better than chance performance over many trials with the same shuffled deck?

Also, if by chance there just happens to be some type of exploitable search structure, such as after every heart comes a spade, within the arrangement of cards after random shuffling then, before flipping over any cards can you blindly (without knowledge of that specific arrangement) generate a search algorithm that will help you locate any card at consistently better than chance performance over many trials with the same shuffled deck?

IOW, will a blind set of laws (that is, only law and chance) generate, over many trials, consistently better than chance performance?

According to common sense, testing and observation, and NFL Theorems, the answer to the questions above is simply “no.”
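
The card example can be checked by brute force on a very small deck. The sketch below is only an illustration under stated assumptions: the 4-card deck, the function names, and the restriction to fixed (non-adaptive) flip orders are all hypothetical choices made here, and the average is taken uniformly over every possible arrangement. It is not the NFL Theorem itself, just a toy instance of the same idea.

```python
from itertools import permutations

def flips_needed(deck, order, target):
    """Count how many cards are turned over, following `order`, until `target` appears."""
    for flips, position in enumerate(order, start=1):
        if deck[position] == target:
            return flips
    raise ValueError("target not in deck")

def average_flips(order, cards, target):
    """Average number of flips for one fixed search order over every arrangement."""
    decks = list(permutations(cards))
    return sum(flips_needed(deck, order, target) for deck in decks) / len(decks)

cards = ("A", "K", "Q", "J")  # hypothetical 4-card deck
positions = range(len(cards))

# Every knowledge-free search order, scored against all 24 arrangements.
averages = {order: average_flips(order, cards, "A")
            for order in permutations(positions)}
print(set(averages.values()))

# A searcher who knows the arrangement looks in the right place first.
known_deck = ("Q", "A", "J", "K")
informed_order = (known_deck.index("A"), 0, 2, 3)
print(flips_needed(known_deck, informed_order, "A"))
```

Every blind order ends up with the same average, while the informed search finds the card on the first flip only because knowledge of the arrangement was built into the order.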

Furthermore, the NFL Theorem applies to all searches which result in better than chance performance over many trials. And now I will leave you with a little gem to think about. Convergent evolution is the observation of many separate biological evolutionary trials converging upon the same extremely highly improbable forms and functions and biochemical systems [link].
Seriously, think about it. Drop any prejudice and use some reason here.

Specifications (Part II) ... Problems with CSI

Now, I would like to address some critical remarks pertaining to CSI. Some people have suggested that the calculation doesn’t really work for the following reasons:

Let’s say that you have a door that opens and closes with random gusts of wind. The open door is marked “state 1” and the closed door is marked “state 0.” After a very long while of marking these states down, you realize that you have a highly complex (in the sense of being highly improbable) and very compressible (specified) pattern of “101010101010101010 ...”. This would obviously generate a very high specification, and according to the critic would mean that since we have CSI here, then there must be an intelligent ghost opening and closing the door. Thus, since we know that natural, random gusts of wind can cause a door to open and close, the measurement for specification actually doesn’t work. The critic insists that we have a counter example and thus using a specification as an indication of intelligence is flawed.

Now, before I start, I must remind you that an understanding of the pattern is necessary to calculate its probability, which is essential to calculating a specification. For example, if I see written somewhere the letters “GATTACA,” I could be looking at a short snippet of an actual nucleotide sequence or I could merely be looking at the title of a movie. The Shannon information content would be either 14 bits or approx. 32.9 bits respectively. In the first possibility I would be looking at states from a set of 4 potential states and in the latter possibility I would be looking at states from a set of 26 potential states. Thus, based on what the written pattern actually represents, we can have many different probabilities associated with that representational pattern.
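
The two bit counts for “GATTACA” can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes every symbol is equally likely and independent, so each symbol carries log2(alphabet size) bits; the function name is made up for this illustration.

```python
import math

def uniform_information_bits(sequence, alphabet_size):
    """Bits needed for `sequence` if each symbol is drawn uniformly from the alphabet."""
    return len(sequence) * math.log2(alphabet_size)

dna_bits = uniform_information_bits("GATTACA", 4)     # 4 possible states per symbol
title_bits = uniform_information_bits("GATTACA", 26)  # 26 possible letters per symbol

print(dna_bits)              # 14.0
print(round(title_bits, 1))  # 32.9
```

The same seven symbols give different totals purely because the assumed set of possible states is different.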

So, if we are measuring a representation of a pattern, such as how ACTG can represent adenine, cytosine, thymine, and guanine on a string of DNA, we need to understand the original pattern that we are representing in order to calculate any probabilities.

Now, let’s look at the swinging door example. We have a pattern of 101010101010101010 which represents a swinging door, “1” being open and “0” being closed. Well, what is the probability that we will have a pattern that represents “open door,” “closed door,” “open door,” “closed door,” etc.? Well, is there any other possibility? Can we go from “open door” to “open door”? Not unless we have a timer which captures a state every “x” seconds, and in the above scenario that was not included. So no matter how the door is made to swing open and closed we really have a pattern which has a 100% probability, since one state must as a necessity be followed by the other state.

Now, we can measure for specificity and then for a specification. With a probability of 1, we will get a greater than 1 specificity and thus a negative amount of CSI – no specification.

Thus, a proper understanding of the probability of the pattern in question is essential when calculating for a specification.

Specifications (Part I) ... What Exactly are They?

There seems to be much confusion as to what is a specification -- that is, what constitutes CSI? I am no expert on the subject, however, I would like to add my two “sense” based on my understanding of the math and concepts involved. I have started discussing complex specified information here [link]. This is merely an extended discussion, clarification, and a bit deeper of a probe.

First, we must begin by defining a specified pattern. What is the basic idea behind a specified pattern? Why must we define a specified pattern? It is because within patterns, there can be something (potentially non-random characteristics such as function) which separates one set of patterns from all other possible patterns.

Dembski begins to discuss a specified pattern by stating: “The crucial difference between (R) [a random pattern] and (ψR) [a pseudo-random pattern -- the Champernowne sequence] is that (ψR) exhibits a simple, easily described pattern whereas (R) does not. To describe (ψR), it is enough to note that this sequence lists binary numbers in increasing order. By contrast, (R) cannot, so far as we can tell, be described any more simply than by repeating the sequence.” I will continue to discuss the difference between random and pseudo random patterns (as it relates to specificity) further on.

Dr. Dembski describes specified patterns as those patterns which can be described and formulated independent of the event (pattern) in question.

Stephen Meyer expounds upon this subject in his paper, “DNA and the Origin of Life: ...”:

“Moreover, given their [proteins’] irregularity, it seemed unlikely that a general chemical law or regularity could explain their assembly. Instead, as Jacques Monod has recalled, molecular biologists began to look for some source of information or “specificity” within the cell that could direct the construction of such highly specific and complex structures. To explain the presence of the specificity and complexity in the protein, as Monod would later insist, “you absolutely needed a code.”21”

... and ...

“In essence, therefore, Shannon’s theory remains silent on the important question of whether a sequence of symbols is functionally specific or meaningful. Nevertheless, in its application to molecular biology, Shannon information theory did succeed in rendering rough quantitative measures of the information-carrying capacity or syntactic information (where those terms correspond to measures of brute complexity).33”

... and ...

“Since the late 1950s, biologists have equated the “precise determination of sequence” with the extra-information-theoretic property of specificity or specification. Biologists have defined specificity tacitly as “necessary to achieve or maintain function.” They have determined that DNA base sequences, for example, are specified not by applying information theory but by making experimental assessments of the function of those sequences within the overall apparatus of gene expression.36 Similar experimental considerations established the functional specificity of proteins. Further, developments in complexity theory have now made possible a fully general theoretical account of specification, one that applies readily to biological systems. In particular, recent work by mathematician William Dembski has employed the notion of a rejection region from statistics to provide a formal complexity-theoretic account of specification. According to Dembski, a specification occurs when an event or object (a) falls within an independently given pattern or domain, (b) “matches” or exemplifies a conditionally independent pattern, or (c) meets a conditionally independent set of functional requirements.37”

This ability to formulate a pattern independent of the event in question can be understood with a couple of examples. But, we must first start with a communication system. Why? So that we can exchange a pattern across that communication system and see if it can indeed be formulated independent of the pattern in question.

Let’s say that we wanted to send the pattern: 111111111111111111111111111111 across our communication system. First, can we compress it? Yes we can. How? By defining or formulating it differently. One way to do that is by compressing the pattern to “print 1 X 30.” Thus we can send this information across the communication channel and the receiving party will receive the exact same information as if we had sent the original pattern of 1s. Therefore, the original pattern can be defined independently of the pattern itself. So, compressibility is one way to define a specified pattern.
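
The “print 1 X 30” idea corresponds to what is usually called run-length encoding. The following is a rough sketch of that one technique, chosen here only for illustration (the function names are hypothetical, and run-length encoding is just one of many ways to compress a pattern): the sender transmits the short description and the receiver rebuilds the original string from it.

```python
from itertools import groupby

def run_length_encode(bits):
    """Describe a string as a list of (symbol, repeat count) pairs."""
    return [(symbol, len(list(run))) for symbol, run in groupby(bits)]

def run_length_decode(runs):
    """Rebuild the original string from its (symbol, repeat count) pairs."""
    return "".join(symbol * count for symbol, count in runs)

pattern = "1" * 30
encoded = run_length_encode(pattern)

print(encoded)                                # [('1', 30)]
print(run_length_decode(encoded) == pattern)  # True
```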

Now, let’s look at another pattern: 1 2 3 5 7 11 13 17 19 23 ... . Is this a specified pattern? In order to evaluate this and the next examples of patterns, we need to understand the difference between what Dembski calls a pseudo-random and a random pattern. A random pattern is algorithmically complex and cannot be defined or formulated according to a system of rules, whereas a pseudo-random pattern merely looks random, since it is algorithmically complex (not regular as the last example of 1s); however, it is not random, since it can be defined and formulated as a separate pattern according to a system of rules. This usually entails meaning or function.

So, let’s return to our example of the pattern: 1 2 3 5 7 11 13 17 19 23 ... . Can we define or formulate it in a different way yet retain the same information? Sure we can. This pattern conforms to a system of mathematical rules and can be defined or formulated differently than by merely repeating the pattern. We can send: “all numbers, beginning with one, which are divisible only by themselves and one” or its equivalent mathematical notation across the communication channel and the other end would be able to reconstruct the original pattern -- a sequence of prime numbers.
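
To see that the rule really does carry the same information as the listed numbers, here is a small sketch of what the receiving end might do with it. The function names are invented for this example, and the rule is taken literally as worded above, which is why the output begins with 1.

```python
def divisible_only_by_self_and_one(n):
    """True when no whole number strictly between 1 and n divides n."""
    return all(n % d != 0 for d in range(2, n))

def first_terms(count):
    """Rebuild the first `count` terms of the pattern from the rule alone."""
    terms, n = [], 1
    while len(terms) < count:
        if divisible_only_by_self_and_one(n):
            terms.append(n)
        n += 1
    return terms

print(first_terms(10))  # [1, 2, 3, 5, 7, 11, 13, 17, 19, 23]
```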

Now, let’s take a look at proteins. When it comes to measuring specificity, this is exactly like measuring specificity in a meaningful sentence, as I will soon show. Functional specificity merely separates functional pattern “islands” from the sea of random possible patterns. When specific proteins are brought together, you can have a pattern which creates function [http://en.wikipedia.org/wiki/Function_%28biology%29]. [link] That functional pattern itself is formulated by information contained in DNA which is encoded into RNA and decoded into the specific system of functional proteins. The functional pattern as the event in question is defined independently as a pattern of nucleic acids. Thus, the independent formulation for the system of protein function is sent across a communication channel as RNA which is the independent formulation of the function (I have provided a definition of “function” [here “philosophical foundations”] [link]). The RNA is independent of the function itself for which it codes (that is, the information for formulating the function doesn’t come from the protein pattern itself as per Central Dogma of Biology [http://en.wikipedia.org/wiki/Central_dogma_of_molecular_biology] [link]). You don’t send the function itself across the communication channel, you send an informational pattern. Again, that is what is referred to as functional specificity.

What about the pattern: “Can you understand this?” This pattern is most definitely specified, since it can be defined according to a pre-set English dictionary and produce a function through meaning/ideas. How do you pass an idea/meaning through a communication channel? You do so, by using informational units derived from a pre-set system which are separate from the actual pattern of ideas/meaning and which can be processed according to the rules of that pre-set system at the other end of the communication channel. The fact that you can send the same idea using different patterns in the same language or even different patterns by using another language shows that the ideas themselves are independent from the pattern which is sent across the communication channel. That is how we know that the idea “contained” in the pattern is defined independent of the pattern itself. We could even state the same meaning in a different way – “Do you have the ability to comprehend what these symbols mean?” Either way, the idea contained in the above pattern (question) can be transferred across a communication channel as an independent pattern of letters. This is referred to as functional semantic specificity – where specific groups of patterns which produce semantic/meaningful function are “islands” of specified patterns within a set of all possible patterns.

What about the event: ajdjf9385786gngspaoi-whwtht0wuetghskmnvs-12? Is that pattern specified? Well, is it compressible? Hardly, if at all. Can it be stated any other way, in terms of definition, function, or formulation? Well, this question can only be answered through cryptographic methods. If there is no function, or formulation (description or formulation which is independent of the pattern) then the only way to deliver it across a communication channel is to actually send the pattern itself. Of course, you could send a phonetic spelling of each unit across the communication channel and this would show that the pattern of each unit is specified, but that doesn’t specify the pattern as a whole – the pattern that emerges from the string of specified units. Therefore, until non-arbitrary cryptographic evidence states otherwise, the above pattern is not specified.

To sum up, as has been shown above, a specified pattern is described, independent of the event in question, by the rules of a system. As such, explanations other than chance are to be posited which can create informational patterns that are described by the rules of a system. However, specificity is still not quite good enough by itself to determine previous intelligent cause.

What is needed is a specification, which is a highly improbable specified pattern. But, how do we determine what is highly improbable? We take a look at the available probabilistic resources – that is, how many bit operations were needed or used in order to arrive at the specified pattern. Measuring the specified pattern against how long it took to arrive at that pattern and how many different trials were associated with that pattern will tell us if the specified pattern is also highly improbable – beyond all probabilistic resources necessary to generate the specified pattern by chance.

This measure of complexity, which is added onto specificity to create CSI or a specification, is akin to the complexity (improbability) of drawing a royal flush 5 times in a row. Hmmm ... should we begin looking into causes other than chance?

Now, let’s take a look at measuring for a specification. First, it must be understood that this only applies to patterns which can be measured probabilistically. Since a specification includes, but is not limited to function, I will use an example of specification based on compressibility, since compressibility is a way of independently formulating a certain pattern as I have shown above.

Let’s return to our first example – the long pattern of 1s. Dembski has stated that the higher the compressibility of the pattern, the higher the specificity. Why is that the case? That is the case, since the less compressible the pattern is, the more it becomes algorithmically complex and the more random it becomes. These algorithmically complex patterns are the types of patterns that will be generated by random processes.

For example: It is way more likely for a pattern with the same compressibility as “100101111101100001010001011100” to be generated by a random flip of the coin than for a pattern with the same compressibility as “111111111111111111111111111111” to be generated. Why? Because the only other pattern that can be formulated with the same algorithmic compressibility as the pattern of 30 1s is “000000000000000000000000000000,” which can be compressed to “print 0 X 30.” However, there are many more other patterns with the same compressibility as the first, more random pattern (assortment of 50% 1s and 50% 0s which are sorted in a truly random fashion as per the rules of statistics*). So, it is easier for chance to “find” one of the less compressible patterns, because there are more patterns with the same lower compressibility than there are patterns with the same higher compressibility -- in the above case there are only 2 out of 1,073,741,824 patterns which have the highest compressibility and those two patterns are shown above in the repeating 1s and the repeating 0s.
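
One rough way to see how lopsided these counts are is to group all 30-bit patterns by how many runs of repeated symbols they contain. This is only a sketch: using the number of runs as a stand-in for compressibility is an assumption made for this illustration (it is a much cruder measure than algorithmic compressibility), and the function name is hypothetical. The count itself is standard combinatorics: pick the first symbol (2 ways), then choose where the runs change among the 29 gaps between bits.

```python
from math import comb

def strings_with_runs(length, runs):
    """Number of binary strings of `length` bits made of exactly `runs` runs."""
    return 2 * comb(length - 1, runs - 1)

n = 30
counts = {runs: strings_with_runs(n, runs) for runs in range(1, n + 1)}

print(counts[1])                     # single-run patterns: all 1s and all 0s
print(counts[15])                    # patterns that change symbol 14 times
print(sum(counts.values()) == 2**n)  # every 30-bit pattern is counted once
```

Only two patterns sit in the single-run group, while the groups with many runs hold the overwhelming majority of the 1,073,741,824 possibilities.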

Now, let’s calculate the pattern: 111111111111111111111111111111 and see if it is a specification.
First we need to calculate its specificity. That is done by multiplying its probability (as 1 in 1,073,741,824) with how many other patterns have that same compressibility (in this case only one other pattern as shown above).

So, the specificity of this pattern = 2 * 1/1073741824

Now, in order to move on to finding out if we have a specification here, we must first understand the context in which we actually found this pattern. Let’s say that we are running a search on a string of characters 30 bits long – the size of the pattern above. Let’s say we start at a random point such as: 111100001010011110000110111101. Now, let’s also say that in 30 bit flips (30 operations), we arrive at the above pattern of thirty 1s which we are calculating. Is it reasonable to presume that the pattern was arrived at by pure chance? Let’s make the calculation and find out.

Specification: ?>1 = -log2 [number of bit operations * specificity] = ?
Specification: ?>1 = -log2 [30 * 2/1073741824] = -log2 [approx. 5.59 * 10^-8]
Specification: ?>1 = approx. 24 bits of CSI = a specification
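
The arithmetic above is easy to check directly. This is a minimal sketch of the calculation exactly as it is written in this post, with hypothetical function and parameter names; it is not an implementation taken from Dembski’s paper.

```python
import math

def specification_bits(operations, matching_patterns, total_patterns):
    """-log2 of (number of bit operations * specificity), as used in this post."""
    specificity = matching_patterns / total_patterns
    return -math.log2(operations * specificity)

# 30 bit flips, 2 equally compressible patterns, 2**30 possible 30-bit patterns.
bits = specification_bits(30, 2, 2**30)

print(round(bits, 2))  # 24.09
print(bits > 1)        # True, so it counts as a specification here
```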

It is obvious that the discovery of the pattern was not random, but was somehow guided. It can be rejected as being the result of strictly random processes for 2 reasons:

1. it is not in the nature of random processes to generate specificity -- in this case regularities (high compressibility) -- and
2. because the number of random bit operations falls extremely short of the probability of arriving at the end pattern, taking into account the number of trials (probabilistic resources). To say that this pattern was the result of chance would be to resort to a “chance of the gaps” type of argument. In fact, it has been shown that evolutionary algorithms (which are non-random) are a necessity to arrive at specifications such as the one above. In fact, Dembski and Marks have discussed such algorithms on their evolutionary informatics site.

Now, let’s compare that example to a pattern that has the same probability as a higher percentage of all possible combinations, such as the pattern: 110101000001101001110001111010, which is highly incompressible (thus more algorithmically complex and more random). If you wanted to “compress” this pattern you may end up with: “print 1 X 2, 0101, 0 X 5, 1 X 2, 01, 0 X 2, 1 X 3, 0 X 3, 1 X 4, 010”

Now, let’s get to the calculation.
Specification: ?>1 = -log2 [number of bit flips * number of specified patterns * probability]
Specification: ?>1 = -log2 [30 * X * 1/1,073,741,824]

Now, I haven’t included the number of specified patterns which are as compressible as the above random number, since I am not sure exactly how to calculate that number. However, you could theoretically calculate all other possible compressed patterns which contain the same amount of information and are just as random as the compressed pattern above, and as far as I am aware, that is precisely with what algorithmic information theory deals. Dembski has shown the math involved with algorithmic compressibility in “Specification: The Pattern that Signifies Intelligence” and has shown and concluded: “To sum up, the collection of algorithmically compressible (and therefore nonrandom) sequences has small probability among the totality of sequences, so that observing such a sequence is reason to look for explanations other than chance.”

The corollary to what Dembski has summed up is that algorithmically incompressible sequences make up the rest of the sequence space and thus have a large probability among the totality of sequences. So, we do know that the “X” in the above equation will be a very large number and will produce a less than one amount of CSI and there will be no specification.

For example, even if only 1/32 of all possible patterns are algorithmically random, then the equation would play out as follows:

Specification: ?>1 = -log2 [30 * 33,554,432 * 1/1,073,741,824]
Specification: ?>1 = -log2 [.9375]
Specification: ?>1 = approx. .093 = not > 1 = not a specification
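
To show how the result moves as the assumed share of algorithmically random patterns grows, the same formula can be run over a few fractions. This is a sketch only: the 1/32 figure is the one used above, while the other fractions and the function name are assumptions picked for comparison.

```python
import math

def specification_bits_for_fraction(operations, fraction):
    """-log2 of (bit operations * fraction of all patterns that are equally random)."""
    return -math.log2(operations * fraction)

# Denominators of the assumed fractions: 1/32 (from the text), 1/4 and 1/2.
results = {denominator: specification_bits_for_fraction(30, 1 / denominator)
           for denominator in (32, 4, 2)}

for denominator, bits in results.items():
    print(f"1/{denominator}: {bits:.3f} bits")
```

With 1/32 the result is about 0.093, and with the larger fractions it drops below zero, so none of these cases comes out greater than 1.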

So far, I’ve only shown an example of a specification that was not algorithmically complex. But now, let’s briefly discuss a specification that is algorithmically complex (non-repetitive) and also pseudo-random.

In this case, pseudo-random patterns are those patterns which are algorithmically complex and thus they appear to be random, however, they are specified because of function or they match some independent pattern as set by a system of rules ie: mathematical, linguistic, rules of an information processing system, etc. Basically, as I have stated above, these are the types of patterns which form “islands” of function/pseudo-randomness within a sea of all possible patterns.

When measuring for a functional specification (within a set of functional "islands"), you apply the same equation, however, when measuring the specificity you take into account all other FUNCTIONAL patterns (able to be processed into function *by the system in question*) that have the same probability of appearance as the pattern in question. You do that instead of taking into account all equally probable compressible patterns, since you are now measuring for functional specificity as opposed to compressible specificity. Therefore, you can only measure for functional specificity and then a specification based upon a high understanding of the system and pattern in question.

Furthermore, according to the NFL Theorem, an evolutionary algorithm based on problem specific information is necessary in order to arrive at better than chance performance, which is exactly what a specification is calculating.

The next question: will a random set of laws cause an information processing system and evolutionary algorithm to randomly materialize? According to recent work on Conservation of Information Theorems ID theorists state that the answer is "NO!" In fact, getting consistently better than chance results without previous guiding, problem specific information is to information theory what perpetual motion free energy machines are to physics. To continue to say life was a result of chance would be to appeal to a “chance of the gaps” non-explanation. Physicist Brian Greene states (I found this on God3's Blog [link]):

‘If true, the idea of a multiverse would be a Copernican Revolution realized on a cosmic scale. It would be a rich and astounding upheaval, but one with potentially hazardous consequences. Beyond the inherent difficulty in assessing its validity, when should we allow the multiverse framework to be invoked in lieu of a more traditional scientific explanation? Had this idea surfaced a hundred years ago, might researchers have chalked up various mysteries to how things just happen to be in our corner of the multiverse and not pressed on to discover all the wondrous science of the last century? …The danger, if the multiverse idea takes root, is that researchers may too quickly give up the search for underlying explanations. When faced with seemingly inexplicable observations, researchers may invoke the framework of the multiverse prematurely – proclaiming some phenomenon or other to merely reflect conditions in our own bubble universe and thereby failing to discover the deeper understanding that awaits us.’

To invoke multiple universes to explain phenomena within our universe is merely inflating one’s probabilistic resources beyond reason, thus causing a halt on further investigation since a chance of the gaps “explanation” has already been given.

As Professor Hassofer put it:

“The problem [of falsifiability of a probabilistic statement] has been dealt with in a recent book by G. Matheron, entitled Estimating and Choosing: An Essay on Probability in Practice (Springer-Verlag, 1989). He proposes that a probabilistic model be considered falsifiable if some of its consequences have zero (or in practice very low) probability. If one of these consequences is observed, the model is then rejected.
‘The fatal weakness of the monkey argument, which calculates probabilities of events “somewhere, sometime”, is that all events, no matter how unlikely they are, have probability one as long as they are logically possible, so that the suggested model can never be falsified. Accepting the validity of Huxley’s reasoning puts the whole probability theory outside the realm of verifiable science. In particular, it vitiates the whole of quantum theory and statistical mechanics, including thermodynamics, and therefore destroys the foundations of all modern science. For example, as Bertrand Russell once pointed out, if we put a kettle on a fire and the water in the kettle froze, we should argue, following Huxley, that a very unlikely event of statistical mechanics occurred, as it should “somewhere, sometime”, rather than trying to find out what went wrong with the experiment!’”

So, merely observe an information processing system and evolutionary algorithm self-generate from a truly random set of laws and the foundation of ID Theory is falsified. Or, show how that is even theoretically possible. Science is based on observation and testing hypotheses, and data trumps every time. See the discussion on the Conservation of Information Theorem [link] for why Evolutionary Algorithms will not generate themselves out of a random set of laws.

Sunday, February 10, 2008

Intelligence Law and Chance Working Together

This post is part II of the Philosophical Foundation for ID Theory.

Someone posed the following questions to me:

“Dembski’s characterization of design as a third mode of explanation apart from chance and/or law is one of the most fundamental problems with his approach. How do you formally (i.e. mathematically) describe chance and law such that their disjunction doesn’t characterize all conceivable events? How do you show that design, or intelligent agency as you say, isn’t an instance of chance and/or law? More importantly, how do you show that it could possibly not be an instance of chance and/or law?”

Those questions bear the same weight as the following question: “Will a random set of laws (that is, only law and chance) generate information processing systems, CSI, evolutionary algorithms, systems based on engineering principles (control nature without being defined by law), and intelligence?” Some theorists state that based on the NFL Theorems and COI Theorems, the answer is a resounding “NO!” Thus, we need another known causal phenomenon which has been observed creating those types of systems and our investigation into ID Theory continues.

You don’t need to formally describe law and chance, unless by formally describe you mean formally analyse how to measure law and chance. Either way, all you need to do is formally describe CSI and show that neither chance (measured and described as randomness) nor law (measured and described as regularities) can characterize this event (which can be made as a potentially falsifiable hypothesis). Once it is shown that intelligence does produce algorithmically complex CSI, and it is shown that intelligence itself is necessarily founded upon this information, then you have a closed loop from intelligence to information to intelligence, with no room for *only* law and chance. Thus, you must tentatively include intelligence as a third mode of explanation until the above hypothesis -- that a random set of laws (merely law and chance) will not generate information processing systems, etc. and intelligence is a necessary cause of those systems -- is falsified. In fact, we see many systems every day which we know are not and most probably could not have been created by mere chance and law not being “filtered” through intelligence as previously defined. Thus, such systems do exist.

Furthermore, you don’t have to completely negate law and chance operating within a system in order to arrive at the conclusion that the system itself is intelligently designed. In fact, that is not how reality works. Even a car, although it is intelligently designed, can show the effects of law and chance. Look at a rusty old beater car. You can’t explain the “rust” feature by design, but that doesn’t negate the fact that the car itself was intelligently designed. Furthermore, you can’t point to the rust as a “faulty feature” and thus arrive at the conclusion that the car could have or must have been created by only chance and law.

Here is another example. If random processes create a new function, or strengthen an older function, by damaging or clogging a system, does that prove that the original system was the result of only chance and law? Let’s look at a lock and key mechanism. Just because natural processes can increase the security function of an ancient castle door by “gunking” up the lock, making it unopenable even with the original key, does that mean that unguided law and chance can create the castle, the door, the lock, and the key? Of course not!

I have a further question for you. How can you show that chance and law are possible apart from an intelligently designed system? It seems that you need an information processing system to generate law and you must have law first, to generate chance. Are information processing systems possible without intelligent programming? Just some food for thought. I personally don’t think that you can define any one causal phenomenon at the pure exclusion of the others.

In fact, we see intelligence, law, and chance working together best within evolutionary algorithms, since intelligence inputs the information necessary to solve optimization problems efficiently by programming the laws to make use of controlled chance and do all the dirty work for him/her.

Of course, intelligence needs to be defined, and that may be the hardest part, yet not intractable. I personally define intelligence as “a system which can plan into the future and then sufficiently engineer a solution to accomplish a goal.”

Intelligence can basically be summed up in the ability to produce a goal oriented procedure by utilizing (sufficiently organizing/programming) stochastic (law and chance) processes. We know that we as humans can do this and so intelligence does exist as a causal entity alongside chance and law.

Furthermore, if it is true that the NFLT and Conservation of Information Theorem both show the necessity of prior problem specific information (attained by neither law nor chance) in order to produce CSI, then it can be argued that active information is the formal measurement of that (problem specific information) which is neither law nor chance. In this case, active information is a measurement of one aspect of intelligence in a specific situation (efficient optimization algorithms).

The above stands until anyone else can show that problem specific information (and thus evolutionary algorithms) necessary to generate intelligent systems can be generated by a random assortment of laws – that is laws which are not organized by previous intelligence -- pure law and chance. Until you show that intelligence is not necessarily the result of intelligence, it is quite obvious based on observation, that intelligence is indeed a causal phenomenon alongside chance and law.

Sum up:

1. Intelligence as a foresight using system does exist. The ability to organize law and chance for a future determined goal does exist. Just to be clear, this has nothing to do with “free will.” This only concerns the ability to plan into the future. Whether this ability is a free choice or not is a completely different issue than the fact that this ability exists.

2. There do exist systems which are only generated if “run through” intelligence. Intelligence does generate systems where forethought is a necessity.

3. CSI has been formally described such that, in accordance with NFL Theorems, it is for all practical purposes not attainable through only chance and law (random set of laws), just as perpetual motion free energy machines or the past hour of the universe running in reverse are practically not attainable based on our understanding of cause and effect and the flow of energy and information. This has been put forward as a hypothesis that intelligence is a necessary cause of CSI, since CSI has been observed being created by intelligence. This is obviously potentially falsifiable.

4. Therefore, Intelligence must be tentatively included alongside chance and law until point 3 is falsified.

Science Deals with Evidence

I’ve heard some people make the outrageous claim that there isn’t any evidence of intelligent cause for life. Let’s examine this, and see if these people are merely being selective hyper-sceptics or if they really do have a point that there is no evidence of intelligent cause within life.

First let’s briefly look at intelligence. What is intelligence? There are many ways to discuss intelligence and many different properties that can be attributed to intelligence. However, as far as I have seen, the two most common “tests” of intelligence are the ability to learn and the ability to use functional information and knowledge.

What is learning? Well, one thing is true about learning ... that is, when you learn something, the useful information and knowledge that you possess increases. If an AI system learns, then it increases its useable information as a functional response to its environment. One thing that distinguishes intelligence from other systems is its ability to increase its functional information and knowledge content as it interacts with its environment.

Intelligence can then use this information to manipulate its environment. That seems to be what an IQ test actually looks for ... the ability to apply information to situations and problems. If intelligence can apply information to environmental situations, it will then be able to utilize stochastic processes (natural law and chance) to produce a goal oriented procedure to solve a particular problem or cause a pre-planned result which is not definable by natural theoretical law and not reasonably attributed to pure chance.

Therefore, ID Theory postulates that certain effects of intelligence can be reliably separated from theoretical law and chance – intelligent cause is detectable. Since intelligence can basically be summed up in the ability to produce a goal oriented procedure by utilizing (sufficiently organizing/programming) stochastic processes, intelligence can be equated with any type of teleological process – teleology being defined as: “the fact or character attributed to nature or natural processes of being directed toward an end or shaped by a purpose.” Neither chance nor a random assortment of law is teleological. However, intelligence is marked by the ability to plan into the future and then sufficiently engineer a solution to accomplish a goal. So the debate concerning Intelligent Design Theory is fundamentally one of accident versus teleology – that is, chance assemblage of law versus intelligent assemblage and fine tuning of law (guidance toward a pre-planned goal). And both of these positions are testable by experimenting with information processing systems.

ID Theory hypothesizes that the systems below are reliable indicators of previous teleological/intelligent cause because they are not theoretically definable by law, neither reasonably attributed nor observed to have arisen by pure chance, yet are (as a positive) observed to have been caused by intelligent systems using foresight to accomplish future goals:

-CSI (Specifications) [link]

-Communication system, Information processor, Coded Information, Coding/Decoding System. Definition of a code: Given a source with probability space [Omega, A, p(A)] and a receiver with probability space [Omega, B, p(B)], a unique mapping of the letters of alphabet A onto letters of alphabet B is called a code. Here p(A) is the probability vector of the elements of alphabet A and p(B) is the probability vector of the elements of alphabet B (Perlwitz, Burks and Waterman, 1988). According to this definition, which is very simple, DNA is a code.

-Evolutionary algorithms [link], which must be guided by problem-specific active information (knowledge of the problem/targets programmed into the behaviour of the algorithm).
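The definition of a code quoted in the list above can be sketched in Python. This is a toy illustration only: the four codon assignments are entries of the standard genetic code, but the uniform source probabilities and the function name are my own assumptions, chosen just to show how a mapping from alphabet A to alphabet B carries p(A) over to p(B).

```python
from collections import defaultdict
from fractions import Fraction

def induced_distribution(code, p_source):
    """Given a code (each letter of alphabet A maps to exactly one letter of
    alphabet B) and the source probabilities p(A), return the receiver
    probabilities p(B) obtained by pushing p(A) through the mapping."""
    if set(code) != set(p_source):
        raise ValueError("code and p_source must cover the same alphabet A")
    p_receiver = defaultdict(Fraction)
    for letter, prob in p_source.items():
        p_receiver[code[letter]] += prob
    return dict(p_receiver)

# Alphabet A: four mRNA codons; alphabet B: amino acids.
# A dict cannot map one key to two values, so the mapping is unique.
CODE = {"AUG": "Met", "UUU": "Phe", "UUC": "Phe", "UGG": "Trp"}

# Hypothetical source statistics: every codon equally likely.
P_SOURCE = {codon: Fraction(1, 4) for codon in CODE}

print(induced_distribution(CODE, P_SOURCE))
```

Note that two codons map to the same amino acid, so the receiver alphabet ends up with a different probability vector than the source alphabet even though the mapping itself is fixed.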

... as well ...

Convergent evolution [link][link] is the observation of many separate biological evolutionary trials converging upon the same forms and functions. Convergent evolution provides evidence that the evolutionary process is focussed upon non-accidental end functions (targets) and is consistent with the hypothesis that there is an ultimate evolutionary end point (the Omega Point) -- the ultimate target of evolution. Convergent evolution shows the constraint of evolution, after multiple separate runs, to the same targets of highly improbable form and function. This provides evidence that biological evolution follows what has already been discovered with the NFL Theorems – that evolutionary algorithms only work if the search procedure is guided to pre-set problems by problem specific information. It’s one thing for evolution to find a highly improbable form or function once; however, when it finds it on multiple separate trials we have evidence that it is indeed a process which is guided toward a palette of pre-determined functions which are then sorted by the environment (natural selection).

As an extra, here is Hameroff’s blog [link] discussing his testable and falsifiable model/hypothesis re: a potential foundation for a quantum based intelligent design – that is: consciousness may actually fundamentally be a property of the operation of space-time’s quantum effects. Here’s a good video lecture [google video] [link] by Hameroff giving the basic explanation of his hypothesis/model.

Philosophical Foundations of ID to Conservation of Information

My understanding of the foundation of ID Theory. This could also be looked at as the philosophy of ID (upon which the SCIENCE OF ID [link] is founded).

Disclaimer: This post is not philosophical in its entirety, as it does borrow from information theorems and scientific experiments with information and programming (evolutionary algorithms). However, this post and Part II [link] do argue for the basic philosophical foundation necessary to be able to point to intelligence, being independent of the physical laws which arise from or govern matter, as a cause when investigating certain phenomena.

1. Highly improbable (far beyond UPB), algorithmically complex, specified, coded information (e.g. the sequential arrangement of nucleotides in RNA) is not caused by the physical properties of the materials (the nucleotides) which are merely used to *transfer* information from DNA to Proteins. IOW, it is not the physical attractive properties of the nucleotides in DNA or RNA or the letters on a page which cause their sequential arrangement. Therefore, at the least, this information is not caused by and transcends the physical properties of matter. This information is caused by a non-physical-chemical law.

Furthermore, certain systems exist which follow engineering control principles which are not defined by laws of physics or chemistry, yet control or bound laws and chance to produce function:

“A shaping of boundaries may be said to go beyond a mere fixing of boundaries and establishes a ‘controlling principle.’ It achieves control of the boundaries by imprinting a significant pattern on the boundaries of the system. Or, to use information language, we may say that it puts the system under the control of a non-physical-chemical principle by a profoundly informative intervention.”

--Michael Polanyi, “Life Transcending Physics and Chemistry,” Chemical & Engineering News (21 August 1967): 64.

“In the face of the universal tendency for order to be lost, the complex organization of the living organism can be maintained only if work – involving the expenditure of energy – is performed to conserve the order. The organism is constantly adjusting, repairing, replacing, and this requires energy. But the preservation of the complex, improbable organization of the living creature needs more than energy for the work. It calls for information or instructions on how the energy should be expended to maintain the improbable organization. The idea of information necessary for the maintenance and, as we shall see, creation of living systems is of great utility in approaching the biological problems of reproduction.”

--George Gaylord Simpson and William S. Beck, Life: An Introduction to Biology, 2nd ed. (London: Routledge and Kegan, 1965), 145.

At the top of this post is Hubert Yockey’s diagram published in the “Journal of Theoretical Biology” showing how life actually follows engineering principles and makes use of an actual communication channel and information processor.

From here on, I am using the term “information” to describe low thermodynamic entropy/high improbability systems, which are not defined by laws of physics or chemistry, that cause functional or semantic specificity.

Function can simply be referred to as the transfer of energy to perform work as a by-product of organized units as per the above quote by George Gaylord Simpson. In man-made structures, function defines the purpose for which that structure was created and in biology function defines that which aids in reproductive and survival success as a result of natural selection.

And, no, this is not a circular definition, since organized units are not necessarily information. For example, snowflakes are highly organized units; however, they are defined by laws of physics and chemistry and do not cause any functional or semantic specificity. [link]

2. If information is preceded by intelligence (as has been observed), and if intelligence is necessarily founded upon CSI and these systems which follow engineering principles (so far this is the case), then …

3. We have a closed loop traveling from information (not caused by properties of material as per #1) to intelligence (which is then subsequently also not caused *only* by material properties) and back to information, with no room for strictly material causation. Material is only used as a conduit to transfer information.

4. Thus, both information and intelligence may be non-material or at the very least must contain or be caused by a non-material property or law (goodbye philosophical materialism). After all, what we observe is that information and intelligence flow through material yet are fundamentally not caused by the properties (physical or chemical laws) of that material.

“But,” you may ask, “when the material is removed, doesn’t the information cease to exist? Isn’t information necessarily dependent on the material which ‘contains’ it? What happens to the information held by the 99% of now extinct species? Does the information still exist?”

Well, it does seem at first glance that information is necessarily linked to material, i.e. destroy the material conduit and the information is destroyed. However, this point does not negate the observation that physical properties of material do not cause information (e.g. in biology), which is the main point. Since intelligence does cause information and since intelligence is founded upon information, there must exist some aspect of at least one of these two properties which is not material.

And now a new thought jumps across my mind. Does the destruction of the conduit actually destroy the information? If material only acts as a conduit, and intelligence is actually the source/cause of the information, this means that the information existed within some intelligence or non-material information transferring law before it was sent through the conduit. Destruction of the conduit (material) would only seem to destroy that specific *transfer* of the information.

For example, suppose I write a coded note which reads “meet at bridge at 0900” and have it delivered to an undercover agent, and upon reading it the agent burns the note. Has the information been destroyed? No, it most definitely has not been destroyed. The information has actually been transferred and stored within another intelligence. However, let’s say that something went wrong and upon arriving at the bridge, the undercover agent is gunned down. Now, is the information destroyed? No, since I (as the intelligent source) still possess all the necessary information to create the exact same message. The information existed within intelligence before it was transmitted and it continues to exist in the source intelligence after transmission and destruction of the pen and paper note – the transferring medium (material).

In the case of biology, if life and evolution are a necessary result of and are guided by the laws of nature, then wouldn’t the information at the foundation of our universe and its natural laws contain the information necessary for the creation of life and evolution?

IOW, if life is a guided search program, the space it is searching as it evolves is a space set up by the laws of nature. Thus any information that life discovers and the guiding, active, problem specific information needed to arrive at those targets (according to the NFL Theorems [link]) already existed within the program of our universe and the search space upon which our universe’s natural laws operate.

“Unless you can make prior assumptions about the ... [problems] you are working on, then no search strategy, no matter how sophisticated, can be expected to perform better than any other”

--Yu-Chi Ho and D.L. Pepyne, "Simple explanation of the No Free Lunch Theorem", Proc. 40th IEEE Conf. on Decision and Control, Orlando, Florida, 2001.

“This indicate[s] the importance of incorporating problem-specific knowledge into the behavior of the [search] algorithm.”

--David Wolpert and William G. Macready, "No free lunch theorems for optimization", IEEE Trans. Evol. Comp. 1(1) (1997): 67-82.
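The statement in the two quotes above can be checked by brute force on a toy problem. The sketch below is my own illustration, not code from either paper: the four-point domain, the three possible values, and both search strategies are hypothetical choices. It averages the best value found over every possible objective function and compares a fixed-order search with one that adapts to what it has already seen. Neither strategy revisits a point, which is the setting the theorem addresses.

```python
from fractions import Fraction
from itertools import product

DOMAIN = range(4)    # candidate solutions
VALUES = (0, 1, 2)   # possible objective values

def ascending(history):
    """Fixed strategy: always try the smallest point not yet visited."""
    visited = {x for x, _ in history}
    return min(x for x in DOMAIN if x not in visited)

def adaptive(history):
    """Adaptive strategy: after a good value jump to the largest unvisited
    point, otherwise take the smallest unvisited point."""
    visited = {x for x, _ in history}
    unvisited = [x for x in DOMAIN if x not in visited]
    if history and history[-1][1] >= 1:
        return max(unvisited)
    return min(unvisited)

def best_found(strategy, f, steps):
    """Run `strategy` on objective f (a tuple indexed by point) for `steps`
    evaluations and return the best value seen."""
    history = []
    for _ in range(steps):
        x = strategy(history)
        history.append((x, f[x]))
    return max(y for _, y in history)

def average_best(strategy, steps):
    """Average of best_found over every function from DOMAIN to VALUES."""
    functions = list(product(VALUES, repeat=len(DOMAIN)))
    total = sum(best_found(strategy, f, steps) for f in functions)
    return Fraction(total, len(functions))

for steps in range(1, len(DOMAIN) + 1):
    print(steps, average_best(ascending, steps), average_best(adaptive, steps))
```

Restricting the average to a subset of objective functions, which is one way of encoding a prior assumption about the problem, is what allows one strategy to pull ahead of another.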

“The [computing] machine does not create any new information, but it performs a very valuable transformation of known information.”

--Leon Brillouin, Science and Information Theory (Academic Press, New York, 1956).

The understanding that information is never destroyed or created by a computing machine, only transferred, is consistent with Conservation of Information theorems, one of which is explained here by Dr. Dembski. The theorem basically shows that, if problem specific information must be added to a search in order to find a target (solve an optimization problem) efficiently, that is, better than chance, then the information necessary to find that problem specific information is just as improbable to discover as the target, or more so. Thus, the same or a larger amount of problem specific information is needed to locate the problem specific information that finds the target, ad infinitum.

This is analogous to a library cataloguing system. The catalogue has to be organized in such a non-random way that you can use its organization (problem specific information) to find the information you were looking for at better than random search. Furthermore, that organization (cataloguing system) itself most likely would not have come about through random processes, since finding that specific cataloguing organization would require at least as much problem specific information as the cataloguing system is able to output (measured as an information theoretic probability in bits). COI Theorems show that in order to get better than chance results you must input better than chance results. How do you guarantee better than chance results? According to NFL Theorems, with problem specific information. How do you get problem specific information? Back to COI Theorems: with previous problem specific information ad infinitum, or as observed, from intelligence.
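The “measured in bits” part of the library analogy can be made concrete with a small calculation. This is a rough sketch under my own assumptions: I take active information to be log2(q/p), where p is the chance that a blind search succeeds and q is the chance that the assisted search succeeds, which is how I understand the measure to be defined in the Evolutionary Informatics Lab papers. The library size and the function name are hypothetical.

```python
import math

def active_information_bits(p_blind, q_assisted):
    """Active information in bits, assuming the definition log2(q / p):
    how much the assistance raises the success probability over blind search."""
    if not (0 < p_blind <= 1 and 0 < q_assisted <= 1):
        raise ValueError("probabilities must lie in (0, 1]")
    return math.log2(q_assisted / p_blind)

n_books = 1024           # hypothetical library size
p_blind = 1 / n_books    # pick one book at random and hope it is the right one
q_catalogue = 1.0        # a correct catalogue sends you straight to the shelf

# Bits of problem specific information the catalogue contributes to one lookup.
print(active_information_bits(p_blind, q_catalogue))
```

On this reading, a catalogue that is no better than guessing (q equal to p) contributes zero bits, and doubling the size of the library adds one bit to what a perfect catalogue supplies.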

Appeals to a non-explanatory “chance of the gaps” in this case are mere hand waving and grasping at straws, and IMO intellectually unfulfilling, especially when the original targets in question are highly improbable, algorithmically complex, specified, not defined by law, yet exert a control aspect on a lawful system, and when their operation is highly analogous to observed intelligently designed systems.

Now, we just need to discover how intelligence could have possibly tuned (programmed) and used the universe’s laws of nature to evolve life. In discovering this, we will be discovering how evolution does what it does and the actual amount of fine tuning necessary for different aspects of life. Or is there merely some sort of teleological law inherent in our universe which “just is,” similar to the intellectually unfulfilling infinite regress of problem specific information? These are some points which can be philosophically debated, while the scientists discover how this intelligence or teleological law designed life.