Tuesday, November 20, 2007

Zachriel's EA and Dembski's quote

Continuation from here

Hello again Zachriel,

Dr.s Dembski and Marks:

“Such assumptions, however, are useless when searching to find a sequence of say, 7 letters from a 26-letter alphabet to form a word that will pass successfully through a spell checker, or choosing a sequence of commands from 26 available commands in an Avida type program to generate a logic operation such as XNOR [16]. With no metric to determine nearness, the search landscape for such searches is binary -- either success or failure. There are no sloped hills to climb. We note that such search problems on a binary landscape are in a different class from those useful for, say, adaptive filters. But, in the context of the NFLT, knowledge of the class in which an optimization problem lies provides active information about the problem.”

Zachriel:

“Dembski has defined the same problem I did above. He claims that without some information built into the algorithm about how words are distributed through sequence space, evolutionary algorithms will be no better than random search. Is this what Dembski is saying? And is he right?”

Dembski is saying precisely what he is stating: “Such assumptions are useless when searching to find a sequence of say, 7 letters from a 26-letter alphabet to form a word that will pass successfully through a spell checker ... with no metric to determine nearness, the search landscape for such searches is binary – either success or failure. There are no slopped hills to climb.”

First, what assumptions is he talking about? In the previous paragraph, he refers to the assumptions:

“Within HÄaggstrÄom's familiarity zone, search structures have “links" in the optimization space and smoothness constraints allowing for use of “hill-climbing" optimization. Making such assumptions about underlying search structures is not only common but also vital to the success of optimizing searchers (e.g., adaptive filters and the training of layered perceptron neural networks | [22]).”

So, basically, Dembski and Marks refer to assumptions of links and hill climbing structures in the search space. The point is that merely assuming that there are these link structures within a non-uniform search space , without actual knowledge of these structures incorporated into the search algorithm, does nothing to affect the probability of a *random* search. What is needed is an algorithm that is matched to and programmed to take advantage of a *known* (so that it can provide accurate guidance) non-uniform search structure which actually contains hill climbing optimizations, etc. If the proper active information in the form of the proper type of adaptive search is not applied, there are no slopped hills to climb by *random* search no matter the underlying search structure.

We know that Dembski and Marks are referring to *random* search because in the first quoted paragraph above, he states that “We note that such search problems on a binary landscape are in a different class from those useful for, say, adaptive filters.”

Your algorithm does make use of at least one adaptive filter such as a “stepping stone search” which is defined as “building on more probable search results to achieve a more difficult search” and one instance of stepping stone search is discussed in “Conservation of Information in Search: Measuring the Cost of Success” and a formula for measuring the active information of the discussed stepping stone search function is provided.

Your algorithm is far from the random search of a 7 letter word that Dembski refers to. First, it builds upon the success of previous results (stepping stone search), sorts fit from unfit (also discussed and measurement of active information given in the aforementioned paper) before lesser probable words (any 7 letter words) are achieved, and may actually include other instances of active information. Thus, since it builds off of previous filtered successes, it fits into the category of adaptive searches which Dembski and Marks note are in a separate class than “success or failure” random searches.

Thus, your EA provides an example of Dembski's point ... that *adaptive filters* (which are in a different category than purely random search) must be *properly* applied to take advantage of *known* (as opposed to merely assumed) search space structure -- the non-random dictionary -- in order to provide better than chance performance and arrive at any targeted 7 letter word that will pass the spell checker test. Try the EA on a dictionary that only contains the 7 letter words.

Dembski also states that “Following HÄaggstrÄom [18], let the search space  be partitioned into a set of acceptable solutions T and a set of unacceptable solutions T. The search problem is to find a point in T as expediently as possible. With no active information, the distribution of the space is assumed uniform. This assumption dates to the 18th century and Bernoulli's principle of insufficient reason [2] which states, \in the absence of any prior knowledge, we must assume that the events [in ] ... have equal probability." This is equivalent to an assumption of maximum (information theoretic) entropy on the search space [21]. Many criticisms of the NFLT are aimed at the uniformity assumption [28]. Knowledge of a deviation from uniformity, however, is a type of active information that is consistent with the premise of the NFLT.

So, you have separated acceptable solutions from unacceptable solutions by creating a dictionary, but as already mentioned, your search algorithm is designed to take advantage of the non-random aspects of your dictionary. IOW, the algorithm is matched to the solutions. Knowledge of the solutions and the type of algorithm which will work is used to create the correct algorithm which builds upon previous results and also sorts these results in order to achieve lesser probable results in the future at better than chance performance. All of this complementary knowledge is extraneous to the evolutionary search itself and is essential to the success of the search and is thus active, problem specific, guiding information.

You do realize that the most that your EA shows is that the targets for life, and search algorithm to necessarily travel those pathways and to reach the targets are set before the search ever begins. Is that the hallmark of a system built upon a random set of variables?

Now, let’s discuss NFL Theorem a bit:

NFL Theorem states that no method of search (algorithm) will, on average, perform better than any other algorithm [to produce consistently better than chance results] without problem specific information guiding it. The example of finding the Ace of Spades in "Active Information in Evolutionary Search" excellently illustrates the point of the NFL Theorem.

IOW, from what I understand, if you have no knowledge of the search space structure, and you separate acceptable from unacceptable targets, any method (search algorithm) of finding an acceptable target will perform as well as any other method and will on average discover the acceptable target at no better than chance probability. That is, unless active information regarding the search space structure and/or information helping to determine the location of any of the acceptable targets is programmed into the search algorithm in the form of "warmer/colder hints," “stepping stone searches,” “partitioned searches,” “fitness selectors” etc. (Many types of adaptive algorithms are discussed in "Conservation of Information in Search: Measuring the Cost of Success.")

So, what separates an EA from random search? According to NFL Theorem, if the EA is to find a target better on average than random search there must be some problem specific information (knowledge about the targeted problem) guiding it. “Concerning the NFLT, Ho and Pepyne write “unless you can make prior assumptions about the ... [problems] you are working on, then no search strategy, no matter how sophisticated, can be expected to perform better than any other" [15]. According to Wolpert and Macready search can be improved only by “incorporating problem-specific knowledge into the behavior of the [optimization or search] algorithm" [30].

Anticipating the NFLT, Schaefer [24] notes "a learner [without active information] ... that achieves at least mildly better-than-chance performance ... is like a perpetual motion machine." The "prior assumptions" and "problem specific knowledge" required for "better-than-chance performance" in evolutionary search derives from active information that, when properly fitted to the search algorithm, favorably guides the search.” Have you discovered, within your program, the information theoretic equivalent to perpetual motion machines? (Of course, understanding this point, you will see why I am so sceptical re: processes allegedly creating information at averages exceeding probability without guiding, problem specific, active information.)

Now, back to the “knowledge of a deviation from uniformity of the search space” that was briefly mentioned earlier.

The conclusion of “Active Information in Evolutionary Search:”

“Active information, when properly prescribed, successfully guides an evolutionary search to a solution by incorporating knowledge about the target and the underlying structure of the search space. That structure determines how the search deviates from the uniformity assumption of the NFLT. HÄaggstrÄom's "geographical structure[s]," "link structure[s]," search space "clustering," and smooth surfaces conducive to "hill climbing" are examples that reinforce rather that refute the conclusion that intelligent design proponents draw from the NFLT, namely, that the success of evolutionary search depends on the front-loading of active information. Thus, in biology as in computing, there is no free lunch [8].”

Since the NFL Theorem assumes a uniform search space (maximum information entropy), we can conclude that any random search spaces being acted upon by any chosen method of search (search algorithm) will not, on average, arrive at targets at better than random chance, nor will one random set of search space and search algorithm consistently perform better than chance over time. This is consistent with what I understand to be part of the basis for Conservation of Information:

A "learner... that achieves at least mildly than better-than-chance performance, on average, ... is like a perpetual motion machine - conservation of generalization performance precludes it.”
--Cullen Schaffer on the Law of Conservation of Generalization Performance. Cullen Schaffer, "A conservation law for generalization performance," in Proc. Eleventh International Conference on Machine Learning, H. Willian and W. Cohen. San Francisco: Morgan Kaufmann, 1994, pp.295-265.

That can be tested by generating a random set of variables (laws), which causes a random set of functional targets within a search space, and applying any method of search to discover those targets.

According to my understand of the NFL Theorem and Conservation of Generalization Performance, there are two types of results that will not occur:

1. The search algorithm performing consistently better than chance over a lengthy run time.

2. After many shorter runs, of different random searches on random targets, the averages of finding the targets being better than chance.

“Although commonly used evolutionary algorithms such as particle swarm optimization [9] and genetic algorithms [11] perform well on a wide spectrum of problems, there is no discrepancy between the successful experience of practitioners with such versatile algorithms and the NFLT imposed inability of the search algorithms themselves to create information [4, 10]. The additional information often lies in the experience of the programmer who prescribes how the external information is to be folded into the search algorithm. The NFLT takes issue with claims that one search procedure invariably performs better than another or that remarkable results are due to the search procedure alone [1, 3, 13, 14, 17, 19, 23, 25, 26].”

"The NFLT puts to rest the inflated claims for the information-generating power of evolutionary simulations such as Avida [16] and ev [25]. The NFLT gauges the amount of active information that needs to be introduced to render an evolutionary search successful [18]. Like an athlete on steroids, many such programs are doctored, intentionally or not, to succeed [17]."

"Christensen and Oppacher note the NFLT is \very useful, especially in light of some of the sometimes outrageous claims that had been made of specific optimization algorithms" [4].

"Search algorithms do not contribute information to the search, and the NFLT exposes the inappropriateness of such claims."

"The NFLT shows that claims about one algorithm outperforming another can only be made in regard to benchmarks set by particular targets and particular search structures. Performance attributes and empirical performance comparisons cannot be extrapolated beyond such particulars. There is no all-purpose \magic bullet" search algorithm for all problems [5, 27]."

IOW, from what I understand, a non-random, non-uniform search space structure must be coupled to the correct search algorithm, utilizing the proper filter in accordance with prior knowledge of the target, in order to produce better than chance performance when searching for a target.

Now, let’s apply this to real life. Within life, a target is that informational sequence which functions within the living organism. Similarly, if you were to put the dictionary of targets within your EA together using random variables, in order to produce anything worth comparing to the evolution of life, the targets would need to conform to a system of rules and interact with each other to create some type of function. In other words, the targets would need to be algorithmically complex and specified, yet generated by a random set of laws and attained through a random method of search at better than chance. That is what you’d have to demonstrate to even begin to show that life has evolved from non-planned, non- intelligently designed laws and information.

As Dembski and Marks write in “Active Information in Evolutionary Search,” In the field of evolutionary computing, to which the weasel example belongs, targets are given extrinsically by programmers who attempt to solve problems of their choice and preference. But in biology, not only has life come about without our choice or preference, but there are only so many ways that matter can be configured to be alive and, once alive, only so many ways it can be configured to serve biologically significant functions. Most of the ways open to biological evolution, however, are dead ends. It follows that survival and reproduction sets the intrinsic targets for biological evolution.

Evolutionary convergence, the independent evolution of similar features (such as the camera eye of humans and squids), provides a striking example of intrinsic biological targets. Cambridge paleobiologist Simon Conway Morris [20] finds that evolution converges to only a few endpoints. He therefore theorizes that if the evolutionary process were restarted from the beginning, the life forms of today, including humans, would re-evolve. From the perspective of the NFLT, these limited number of endpoints on which evolution converges constitute intrinsic targets, crafted in part by the environment and by laws of physics and chemistry.”

This provides evidence that our universe will necessarily arrive at pre-set living targets which are guided by problem specific, active information at the foundation of its natural laws. This is consistent with any natural teleological hypothesis referencing the origin of our universe and the subsequent evolution of life. Evolutionary algorithms, NFL Theorem, and the Law of Generalization Performance also provide evidence against any assertions that just any random set of information and laws will cause evolution (extremely basic: the, on average, better than chance generation of algorithmically complex and specified information) to occur.

So, your example operates off of some active information and it seems to show the lower capability of evolutionary algorithms. If your EA was to even simplistically model the evolution of life, you need to show that the pathway from useable protein target to the next less probable useable proteins has a high enough probability to be achieved in the amount of time available and this would have to be consistent with known mutation rates (‘x’ mutations per generation, ‘x’ generations per year, 4 x 10^9 years available and largest protein = 26,926 amino acids). Of course, the probabilities between targets would also have to apply to the generation of instructions for assembling machines and systems from those proteins.

Furthermore, your program would also need to model the further evolution of information processing systems, the evolution of the different functions of RNA, the evolution of logic gates, the evolution of complex machinery, the evolution of IC systems (attained through indirect pathways), redundant systems, repair systems, and convergent evolution (the independent evolution of similar features), to name a few systems and results which have evolved within life, not to mention intelligent beings and conscious systems. I wonder how much and the type(s) of fine tuned, problem specific, active information would be necessary in order to evolve all of those results. And "answers" about how our EA’s won’t create such systems just because there isn’t enough time don't impress me. I’m perfectly fed up with anti-scientific, progression crushing, “chance of the gaps” non-explanations. I want knowledge of how something is accomplished. I don’t want “pat” answers about how things “just happen” to self- assemble given enough random variation and time. I want to know *how* in a manner consistent with our knowledge of the flow of information. This truly sounds like some excellent ID research.

The conclusion. Evolution does not create any new information, it only converts it from one program to another -- from the problem specific, active informational structure of our natural laws at the foundation of our universe to the information processing system of life. Enter the Law of Conservation of Information. As well, since evolution generates information at greater than random chance, it must make use of the problem specific information to find the informational targets. Furthermore, evolution by natural selection provides evidence of teleological foresight and programming of the active, problem specific information necessary for the consistently better than chance performance of evolution.

"Search algorithms, including evolutionary searches, do not generate free information. Instead, they consume information, incurring it as a cost. Over 50 years ago, Leon Brillouin, a pioneer in information theory, made this very point: .The [computing] machine does not create any new information, but it performs a very valuable transformation of known information. [3] When Brillouin’s insight is applied to search algorithms that do not employ specific information about the problem being addressed, one finds that no search performs consistently better than any other. Accordingly, there is no magic-bullet search algorithm that successfully resolves all problems [7], [32]."

Next question: How does one information processing system create another information processing system within it (ie: universe creates life)? I predict: not by unguided, random accident. Sounds like some more ID research.

36 comments:

Zachriel said...

CJYman: "Your algorithm is far from the random search of a 7 letter word that Dembski refers to."

Of course it's not a random search. It's a standard evolutionary algorithm.

CJYman: "First, it builds upon the success of previous results (stepping stone search)..."

That's what an evolutionary algorithm does.

CJYman: "...sorts fit from unfit..."

That's what an evolutionary algorithm does. The selection criteria is a spellchecker as posited by Dembski.

CJYman: "Knowledge of the solutions and the type of algorithm which will work is used to create the correct algorithm which builds upon previous results and also sorts these results in order to achieve lesser probable results in the future at better than chance performance."

Your own statements indicate that you haven't a clue about what constitutes an evolutionary algorithm. I would be happy to discuss this in more detail, but you don't seem to be open to learning.

{snip unfocused balance of essay}

Zachriel said...

You never answered this question:

Start with a single-letter word, then evolve a population of words by random mutation and recombination. If the resulting sequence is not a valid word, then it is instantly deselected from the population. This is what it might look like (non-random and illustrative only):

a
an
can
cam
came
...

How many mutations would it take to evolve a ten-letter word by this process?

CJYman said...

Zachriel:
"Of course it's not a random search. It's a standard evolutionary algorithm."


Exactly the point! Dembski stated that random search is in a different class than adaptive algorithms. You obviously didn't even read the full paragraph before you implied that Dembski didn't know what he was talking about.

Furthermore, I already discussed this. Do you make it a habit to only read half of what you're responding to before you "debunk" your opponent.

Zachriel:
"That's what an evolutionary algorithm does. The selection criteria is a spellchecker as posited by Dembski."


The criteria was a spellchecker that defines a 7 letter word as a target. The criteria did not allow for a spell checker to sort results before the 7 letter word was reached, since that would be an adaptive filter which utilizes the spell checker before the targeted 7 letter word is reached, as opposed to a random search which merely uses the spell checker as a method of showing which 7 letter word is a target. IMO, Dembski was being crystal clear on this point and I explained it all, but you still missed it.

Zachriel:
"Your own statements indicate that you haven't a clue about what constitutes an evolutionary algorithm. I would be happy to discuss this in more detail, but you don't seem to be open to learning."


Give me some statement of mine which shows that I don't know what constitutes an evolutionary algorithm. Let's discuss it.

... as shown by yourself, YOU obviously don't care to learn about the necessity of active information in evolutionary search. If you disagree with the linked papers and/or my understanding of them, please proceed to take them apart.

P.S.
I should have just {snipped} your whole "response" since you obviously didn't even read the whole paragraph (probably much less the whole paper) which contains the quote that you supposedly refuted. This caused you to miss the whole point and merely create an evolutionary algorithm which re-instates what Dembski wrote.

CJYman said...

Zachriel:
"How many mutations would it take to evolve a ten-letter word by this process?"


Not sure ... why? What does this have to do with anything we are discussing.

I am sure, though, that the answer to your question would depend on the size of the intelligently designed, front-loaded dictionary that you are using.

I have a question for you:
"What would happen if the dictionary did not contain any two or three letter words or if it were put together by completely random processes?"

Zachriel said...

CJYman: "Exactly the point! Dembski stated that random search is in a different class than adaptive algorithms."

The issue is evolutionary algorithms. From the section at issue specific reference is made in the first sentence of the first paragraph, followed by Haggstrom's claim, then the first sentence of the second paragraph, and in the third paragraph with the reference to Avida type programs.

CJYman: "The criteria did not allow for a spell checker to sort results before the 7 letter word was reached, ..."

All a spell checker does is tell us if a given sequence is a valid word. That's it.

CJYman: "I am sure, though, that the answer to your question would depend on the size of the intelligently designed, front-loaded dictionary that you are using."

The dictionary is the landscape to be searched. It's the given as specified by Dembski.

CJYman: "What would happen if the dictionary did not contain any two or three letter words or if it were put together by completely random processes?"

You're starting to think! Yet Dembski did not propose a 'dictionary' of random sequences, but a standard dictionary of English words as determined by a spell checker. Now continue the thought.

Why is Dembski wrong about his claim? You have the answer.

CJYman said...

Zachriel:
"The issue is evolutionary algorithms. From the section at issue specific reference is made in the first sentence of the first paragraph, followed by Haggstrom's claim, then the first sentence of the second paragraph, and in the third paragraph with the reference to Avida type programs."


Wow, I can hardly believe that you're still not getting it! It's laid out plain and simple.

This specific issue is NOT evolutionary algorithms. This specific issue is the "uniformity assumption" and how it effects *random* search. Read it again. Right after the "Avida" comment, he states: "With no metric to determine nearness, the search landscape for such searches is binary|either success or failure" and again, "We note that such search problems on a binary landscape are in a different class from those useful for, say, adaptive filters. But, in the context of the NFLT, knowledge of the class in which an optimization problem lies provides active information about the problem."

Dembski does NOT state that the Avida program doesn't have a metric to determine nearness. He states that WITHOUT a metric to determine nearness, the search is true/false and thus not adaptive and therefore random. He is stating that the Avida program works because it is NOT a random search -- it is in a different class (as an adaptive search). Read the rest of the paper and his other paper co-authored with Marks. They NEVER state that Avida type programs do not work. They continually state that it DOES work because it is NOT *random,* but is an adaptive search, where solution is matched to the problem in some aspect before the search.

Seriously ... in order for me to take you seriously, you should attempt to comprehend what Dembski and Marks are stating. I understand that there may be misunderstandings, but these can usually be clarified by reading through the whole paper and possibly another related paper by the author and discovering what the author is attempting to explain.

Zachriel:
"All a spell checker does is tell us if a given sequence is a valid word. That's it."


Yes, and Dembski stated that we can use the dictionary to determine if the targeted 7 letter word is indeed valid. That is ALL that he said. Nothing more ... nothing less -- other than to find that word using random search on a non-uniform search space. But, you have added to Dembski's scenario, in order to obfuscate and misrepresent what he stated so that you can "prove him wrong." You have gone further and incorporated the DICTIONARY AND FILTER to determine valid words BEFORE the 7 letter word is obtained and thus constrict the search space. Thus, your search for a 7 letter word that will pass the spell checker test is NOT RANDOM before it gets to the 7 letter word that passes the spell checker test. Dembski made no allowance for the spell checker test to be used before the 7 letter word and in order to arrive at the 7 letter word. How can you seriously NOT get this?

Zachriel:
"The dictionary is the landscape to be searched. It's the given as specified by Dembski."


Incorrect. Now you are really beginning to show your ignorance of the subject at hand. The landscape contains ALL POSSIBLE COMBINATIONS OF OUR 26 LETTER ALPHABET. Placing the sorted dictionary anywhere within this landscape, thus providing a non-uniform search space structure, will still not help when applying *random* search algorithm to the search space in attempt to discover any seven letter word that will pass through a spell checker. That is all that Dembski said. The whole point is that the correct adaptive algorithm needs to be matched to the non-uniform search space structure with knowledge of our intended goal.

Zachriel:
"You're starting to think! Yet Dembski did not propose a 'dictionary' of random sequences, but a standard dictionary of English words as determined by a spell checker. Now continue the thought."


NOWHERE did he state that the landscape is the English dictionary. The only thing he stated about the landscape is that we are trying to find a 7 letter word from a 26 letter alphabet. Thus, all possible 26 letter combinations are considered.

The English dictionary only provides the 7 letter target. Read it again.

Now its your turn to start thinking:

"What would happen if the dictionary did not contain any two or three letter words or if it were put together by completely random processes?"

So, what causes evolutionary algorithm?

Zachriel:
"Why is Dembski wrong about his claim? You have the answer."


Which claim? The claim that a non-uniform search space provides no help in discovering a 7 letter target (as defined by the spell checker test) when random search is applied to the search space? Well, that is perfectly consistent with the NFL Theorem which shows us that the correct adaptive algorithm must be matched to the non-uniform search space, thus somehow matching solution to problem before the search ever begins.

So, I really have no idea what you're talking about. But, alas, I'm sure you are confident in your obviously incorrect assertions and obfuscations.

Zachriel said...

Zachriel: "The dictionary is the landscape to be searched. It's the given as specified by Dembski."

CJYman: "Incorrect. Now you are really beginning to show your ignorance of the subject at hand. The landscape contains ALL POSSIBLE COMBINATIONS OF OUR 26 LETTER ALPHABET."

The fitness landscape is the fitness function of each point in space. In this case, the fitness landscape is *defined* by the dictionary. All other points in space have zero fitness and are never explored.

Dembski just doesn't understand the implication of the No Free Lunch Theorem. Even Wolpert, the co-discoverer of the theorem, says that Dembski is just handwaving. Generalized evolutionary algorithms (populations undergoing random mutation and selection) are more than capable of exploring complex landscapes, including words or biological structures. The only thing the No Free Lunch Theorem tells us is that these generalized algorithms may be less than optimal—not that they don't work. They do. Dembski's handwaving is irrelevant.

Zachriel said...

Zachriel: "Why is Dembski wrong about his claim? You have the answer."

CJYman: "So, I really have no idea what you're talking about."

You still have the answer. It is implicit in your questions about modifications to the fitness function. What if the dictionary was composed of random sequences rather than words?

By the way, I have read Dembski's paper several times. It's all about evolutionary algorithms, from start to finish. He is claiming that evolutionary algorithms have to be manipulated to work, that information has been added to any successful algorithm customized to the problem at hand.

This is false. The evolutionary algorithm in Word Mutagenation is highly generalized and was devised *before* its application to word evolution. It was *applied* to the problem. I didn't know if it would work or not, but I strongly suspected it would.

Word Mutagenation is open source. It's just a limited population of strings subject to random mutation and ruthless selection by spell-checker. And the results are irrespective of implementation.

You don't have to believe it. You can try your own implementation. Others have.

Data trumps. Every time.

Zachriel said...

Trying a different tact.

The basic result of Wolpert and Macready's No Free Lunch Theorems is that the average performance of algorithms across all possible problems is identical. But "all possible problems" includes a vast majority of incompressible or nearly incompressible landscapes with extreme irregularity and unpredictability.

This does not mean that a particular algorithm will not work well for a particular problem. And this can't be answered by making vague arguments about "active information". If we take a generalized evolutionary algorithm, will it successfully navigate the wordscape? The answer is yes. And it requires no such thing as "active information" or customization for purpose.

That Dembski thinks he can disprove whether such an algorithm would be effective for a specific problem by referencing Wolpert and Macready is absurd.

"All one can do is squint, furrow one's brows, and then shrug."

Zachriel said...

Some more definitions:

* Evolutionary Algorithm: A limited population of replicators subject to random mutation and selection via a fitness function.

* Word Mutagenation: An Evolutionary Algorithm with replicators composed of letters and a fitness function being a standard word dictionary.

* Effective search strategy: A search algorithm that is better than a random search.


Dembski is claiming that finding words in letter space—because being a word is a binary property, either a letter sequence is a word or it is not—implies that an Evolutionary Algorithm will not be an Effective search strategy. This is false.

We don't have to add "Active Information" to our simple Evolutionary Algorithm. We just test it and find out if it works Effectively. We don't have to make any "non-uniformity assumption". We just test it. It's an empirical question.

The reason why this is relevant to the overall discussion of evolution is because organisms are genetic replicators that are, in some ways, searching a non-random fitness landscape. Whether the biological fitness landscape can be Effectively searched in this manner is also an empirical question. But we *know* that this simple type of Evolutionary Algorithm can Effectively search many complex spaces. Any "proof" claiming otherwise is false prima facie.

One other point. The No Free Lunch Theorems do not properly apply to evolutionary algorithms, as is discussed in Wolpert and Macready's original paper. Nor does being an Effective search strategy mean it is the best possible search strategy.

-----------
Typical Word Mutagenation results for 10-letter word

Random Trial
14,000,000,000 mutants

Word Mutagenation with no selection or population limits
250,000 mutants
3000 words in population
mincemeats, staggering, coastlines

Word Mutagenation with population = 100 selecting longest words
100,000 mutants
clamberers, inversions, plainsongs

Longest word in recorded Mutagenation history? Denominationalists, length 18, scrabble score 21.

Zachriel said...

In the previous comment where I reported "With no selection or population limits", it should read instead,

"With binary selection only (no selection for length) and with no population limit."

Zachriel said...

I decided it might be helpful to again review your initial post.

CJYman: "So, basically, Dembski and Marks refer to assumptions of links and hill climbing structures in the search space. The point is that merely assuming that there are these link structures within a non-uniform search space , without actual knowledge of these structures incorporated into the search algorithm, does nothing to affect the probability of a *random* search. "

This makes no sense. You might want to indicate how you're using the term "random search". It would normally be defined in this context as a random sequence, though perhaps you mean it as a random walk.

CJYman: "If the proper active information in the form of the proper type of adaptive search is not applied, there are no slopped hills to climb by *random* search no matter the underlying search structure."

Again, your use of the term "random search" doesn't make sense in context. A "random search" doesn't climb.

CJYman: "IOW, from what I understand, if you have no knowledge of the search space structure, and you separate acceptable from unacceptable targets, any method (search algorithm) of finding an acceptable target will perform as well as any other method and will on average discover the acceptable target at no better than chance probability."

This is absolutely wrong. Some methods will certainly work better than others for any specific landscape. However, a given method will work no better than any other method when averaged over all possible landscapes.

CJYman: "That is, unless active information regarding the search space structure and/or information helping to determine the location of any of the acceptable targets is programmed into the search algorithm in the form of "warmer/colder hints," “stepping stone searches,” “partitioned searches,” “fitness selectors” etc. "

But we know this isn't true, because we know that Word Mutagenation uses a very generalized evolutionary algorithm to successfully find rare sequences of letters. You can duplicate the results yourself, results which are independent of implementation. A replicating population, random mutation/recombination, binary selection. That's it.

(Word Mutagenation defaults to selection by word length, but just set Size=0 and it will perform a binary search with results similar to those posted earlier.)

CJYman: "Since the NFL Theorem assumes a uniform search space..."

No. No Free Lunch is averaged over all possible landscapes (most of which are highly chaotic and bear little relationship to the highly ordered structures found in our wordscape or the natural environment). In any case, No Free Lunch does not assume a uniform search space.

CJYman: "... we can conclude that any random search spaces being acted upon by any chosen method of search (search algorithm) will not, on average, arrive at targets at better than random chance, nor will one random set of search space and search algorithm consistently perform better than chance over time."

Surprisingly, you have that right. A random search space and a given method; or a given search space and a random method; or a random search space and a random method; all have equivalent associated costs on average.

CJYman: "The NFLT takes issue with claims that one search procedure invariably performs better than another or that remarkable results are due to the search procedure alone."

No it doesn't. A specific search space and a specific method are not covered by No Free Lunch.

A more interesting question is what sorts of landscapes can an evolutionary algorithm effectively search. Indeed, most landscapes cannot be effectively searched by our standard evolutionary algorithm. But we're not talking about most landscapes. We're talking about non-uniform search spaces, such as our wordscape.

Dembski is just wrong that we have to someone manipulate the algorithm to account for the structure of the landscape before a search algorithm will be effective. And if he is wrong about wordscape, then his "proof" is false prima facie.

Rich Hughes said...

Good thread, gents!

CJYman said...

Actually, not so good, Rich.

I'm about to show Zachriel that he has no idea what he's talking about.

In fact, he seems to be purposefully spreading misinformation. But, then again, he may only be mistaken because of certain misunderstandings.

CJYman said...

Zachriel:
"I decided it might be helpful to again review your initial post.

Again, your use of the term "random search" doesn't make sense in context. A "random search" doesn't climb."


That’s the whole point, and that’s the point being made by the authors of the NFL Theorem, as I will show below.

CJYman:
"IOW, from what I understand, if you have no knowledge of the search space structure, and you separate acceptable from unacceptable targets, any method (search algorithm) of finding an acceptable target will perform as well as any other method and will on average discover the acceptable target at no better than chance probability."

Zachriel:
"This is absolutely wrong. Some methods will certainly work better than others for any specific landscape. However, a given method will work no better than any other method when averaged over all possible landscapes."


I have clarified, below, how knowledge of an exploitable search space structure must be known and incorporated into the choice of algorithm for that algorithm to actually use the “hill climbing,” “link,” etc. structure. It’s like trying to find a certain card within a shuffled deck. No matter the arrangement, you are not going to find that specific card at “better than random performance” on average unless you are actually aware of a specific orderly arrangement that the deck is in that will aid in locating your card. That is an example of using prior knowledge of the search space structure (orderly, non-random arrangement of cards) to create an algorithm for better than chance performance on average. Thus, having knowledge of the search space structure which must then be incorporated into the search algorithm is key.

CJYman:
"Since the NFL Theorem assumes a uniform search space..."

Zachriel:
"No. No Free Lunch is averaged over all possible landscapes (most of which are highly chaotic and bear little relationship to the highly ordered structures found in our wordscape or the natural environment). In any case, No Free Lunch does not assume a uniform search space."


Actually, when you average all possible landscapes, you have the equivalent of a uniform search space. That is why the NFLT does assume a uniform search space. Then it goes on to explain why certain search algorithms are efficient in practise.

CJYman:
"The NFLT takes issue with claims that one search procedure invariably performs better than another or that remarkable results are due to the search procedure alone."

Zachriel:
"No it doesn't. A specific search space and a specific method are not covered by No Free Lunch."


What does that have to do with what I just said?

If you actually read the paper, you will see that my quote is correct. Please join me here
to discuss the NFL Theorems.

CJYman said...

Zachriel:
"Dembski is claiming that finding words in letter space—because being a word is a binary property, either a letter sequence is a word or it is not—implies that an Evolutionary Algorithm will not be an Effective search strategy. This is false."


No where does he say that an EA won’t be an effective search strategy. Please show me where he says anything of the sort. In fact, he says the exact opposite. Did you miss the parts where he discussed why EAs work?

All he does is reiterate that merely assuming that there exist exploitable structures in the search space does nothing to affect the probability of search. According to NFLT, what is needed is “problem specific information” – that is, a *knowledge* of these *actually existing* structures and an *accurate incorporation* of these structures into the matching of algorithm with optimization problem. This comes directly from Wolpert and Mcready’s paper on the NFLT.

Zachriel:
"We don't have to add "Active Information" to our simple Evolutionary Algorithm. We just test it and find out if it works Effectively. We don't have to make any "non-uniformity assumption". We just test it. It's an empirical question."


Of course you’re gonna test your EA, just as you’d test a car, to make sure it is designed correctly – ie: to make sure you’ve incorporated active information correctly to guide the search.

Active (or “problem specific” as the authors of the NFLT call it) Information is not “added” to EAs in the sense that an EA can exist without active information. It is added to an algorithm to cause it to perform better than random search, thus creating an EA. And, actually, problem specific information *can* be added to an EA. The more that you align your algorithm with your problem by adding even more problem specific information, making it more efficient, the more active information you are adding. An EA works precisely because it contains active information. The greater the efficiency, the greater the active information needed.

When it doesn’t work effectively because the algorithm is not properly matched to the problem, then what? You fine tune your choice of algorithm, using knowledge of the optimization problem. If the algorithm *was* effective in the first place, it was probably because you were smart enough to align algorithm to problem in the first place rather than randomly matching algorithms to optimization problems. That concept is actually right out of Wolpert and Mcready’s paper.

And yes, we don’t have to make any “non-uniformity assumption.” In fact, both Dr. Dembski and the authors of the NFLT state that merely making that assumption can not guarantee any efficient results. IOW, with just an assumption, you’ve got nothing better than random chance working on your side. Actual knowledge in the form of information of the exploitable, non-uniform, structure must be incorporated correctly into the choice of algorithm.

Zachriel:
"One other point. The No Free Lunch Theorems do not properly apply to evolutionary algorithms, as is discussed in Wolpert and Macready's original paper. Nor does being an Effective search strategy mean it is the best possible search strategy."


Actually, you’re dead wrong on this one. Wolpert and Macready explicitly state that their NFLT applies to evolutionary algorithms. Please join me here to discuss this.

CJYman said...

Zachriel:
"That Dembski thinks he can disprove whether such an algorithm would be effective for a specific problem by referencing Wolpert and Macready is absurd."


Actually you can analyze the search structure of a program, define it probabilistically, as Dembski and Marks have done with EAs such as Schneider's EV program and show whether such a search will be efficient for a specific problem. The NFLT states the necessity of problem specific information for efficient search and Dembski and Marks merely provide an information theoretic measurement of that problem specific information. Then you test it and find that the probabilistic measurement (defined information theoretically as active information) after analyzing the search program is the same as the average of all actual runs of that program. It's all based on testing and data no matter how "absurd" you think it is. Oh, and BTW, you seem to really not understand the NFLT in the first place, since many of your comments on it are completely opposite to what the authors actually state in their paper. You may want to join me on all 3 Parts of my NFLT discussion.

CJYman said...

Zachriel:
"Why is Dembski wrong about his claim? You have the answer."


CJYman: "So, I really have no idea what you're talking about."

Zachriel:
"You still have the answer. It is implicit in your questions about modifications to the fitness function. What if the dictionary was composed of random sequences rather than words?"


That’s what I asked *you* earlier. What would happen to the results if the dictionary of targets was completely random and unknown to you and the method of search algorithm was also completely randomly chosen? Ie: let’s say I had a deck of cards that was well shuffled. Now, I place them side by side face down on a table. Can you create a method of search (search algorithm), before you turn any cards over, that will help you find any card at consistently better than chance performance over many trials with the same shuffled deck? P.S. the correct answer according to both common sense, testing and observation, and the NFLT, is “no.”

Zachriel:
"By the way, I have read Dembski's paper several times. It's all about evolutionary algorithms, from start to finish. He is claiming that evolutionary algorithms have to be manipulated to work, that information has been added to any successful algorithm customized to the problem at hand.


Yes, the paper is about evolutionary algorithms, and the para. in question is about the affect of non-uniformity assumptions.

Yes, the NFL Theorems do hold that in order for the algorithm to be efficient (better than chance performance), and thus qualify as an EA, problem specific information has to be added to the search algorithm by matching the algorithm to the optimization problem at hand and that if there are any exploitable structures (ie: hill climbing structures) within the search space, this knowledge must be known and incorporated into the algorithm for it to take advantage of these “links” or “hill climbing structures.” Go here for my new post discussing NFL Theorem.

Zachriel:
"This is false. The evolutionary algorithm in Word Mutagenation is highly generalized and was devised *before* its application to word evolution. It was *applied* to the problem. I didn't know if it would work or not, but I strongly suspected it would.


It may be highly generalized to a specific class of problems, but to the extent that it performs better than chance on one class of problems it will perform worse than chance on the remaining class of problems. I guarantee that you didn’t just blindly create an algorithm and then just randomly apply it to just any optimization problem. You strongly suspected it would work because you understood the problem you wished to solve enough to know which properly designed algorithm to design in order to solve this particular optimization problem. Even though you may have done it on somewhat of an ad hoc basis (because of your experience and familiarity with programming -- which are the “comfort zones” discussed by Dembski and Marks), it worked because of your understanding of the class of problems you were attempting to solve when you chose your algorithm, and you knew some of the principles that would be needed within the algorithm to solve optimization problems of that type. That is why you strongly suspected it would work. In fact, you imply on your website that you built this algorithm for a specific reason (to prove that there is such thing as better than chance performance) and it just so happens that you designed it correctly to accomplish that goal. You tested it and it worked. This means you knew what you were doing.
Congratulations! Now, will random processes do the same?

In fact, the NFL Theorems discuss this very same issue, and the authors state the exact opposite of what you are trying to sell ... er .... tell me here. They say that your optimization algorithm performs efficiently because of problem specific information. What exactly is and what provides problem specific information? I’m discussing the NFL Theorems at the new blog post linked to above and the Conservation of Information here and the available options re: problem specific information here.

Zachriel:
"Word Mutagenation is open source. It's just a limited population of strings subject to random mutation and ruthless selection by spell-checker. And the results are irrespective of implementation.

You don't have to believe it. You can try your own implementation. Others have.

Data trumps. Every time."


Sure, and the implementation must be within the bounds of a specific class of optimization problems. However, I’m honestly not sure what you mean by “irrespective of implementation” in this case.

If you’re trying to tell me that you have created a universal, general purpose evolutionary algorithm, then you better hurry to get your results published, cause according to the NFL Theorems there is no such animal. The authors of the NFL Theorem explicitly state that “In particular, if for some favorite algorithms a certain well behaved f results in better performance than does the random f then that well behaved f gives worse than random behavior on the set all remaining algorithms. In this sense just as there are no universally efficacious search algorithms, there are no universally benign f [optimizations problems] which can be assured of resulting in better than random performance regardless of ones algorithm.”
[brackets added]

Oh, and back to the topic, Dembski’s quote contains the same concepts that the authors of the NFL Theorem already stated:
“... while most classes of problems will certainly have some structure which, if known, might be exploitable ...; that structure must be known and reflected directly in the choice of algorithm to serve as such a justification [for choice of a particular algorithm]... The simple fact that the P(f) at hand is non-uniform cannot serve to determine one’s choice of optimization algorithm ... Rather effective optimization relies on a fortuitous matching between “f” [optimization problem] and “a” [evolutionary algorithm].” [brackets added]

Thus, Dembski was merely re-stating parts of the NFL Theorem in the context of the question: “what effect do exploitable “hill climbing” and “link” structures within the search space have on the overall performance of the optimization algorithm?” He merely stated that “such assumptions [of exploitable structures] are useless ... [without] a metric to determine nearness [to target] (matching of problem to algorithm).” In that case, the search is “success or failure” ... it is not “adaptive” – the are no “hills to climb.”

However, in order for the search to be adaptive by taking advantage of exploitable structures, that exploitable structure must be known and related directly in the choice of algorithm. Thus, as Dembski concludes in the paragraph, “We note that such search problems on a binary landscape are in a different class from those useful for, say, adaptive filters. But, in the context of the NFLT, knowledge of the class in which an optimization problem lies provides active information about the problem.” [brackets and parenthesis added]

So, how does your evolutionary algorithm refute the NFL Theorem? Or, how does Dembski’s quote not line up with the NFL Theorem? I’ve already shown that it merely reiterates the NFL Theorem. Your turn.

To end this discussion, and prove ID Theory wrong and your side (whatever that may be) as right, merely show how a random set of laws will generate an information processing system, an evolutionary algorithm, and thus CSI. I predict, based on my understanding and explanation of NFL Theorems and recently developed Conservation of Information Theorems that random generation of evolutionary algorithms is to information theory what perpetual motion free energy machines are to physics and are thus so highly improbable that they are for all practical purposes impossible. Merely show me some data (observation) that a random set of laws will produce the above mentioned effects. Or at least show me the theory underpinning such a hypothesis. Data Trumps ... every time.

Zachriel said...

Let's define a generalized evolutionary algorithm. A finite population of discrete sequences. Replication with random point mutation and random recombination. A selection criteria. That's the entire definition.

What can we say about this search algorithm?

1. We can say that it will only perform *on average* as well as random search.

2. We can say that it will perform better on some search spaces than others.

Okay so far?

Zachriel said...

EXTRA CREDIT:

Consider a landscape of a hundred billion trillion (10^23) elements with each element of the set having a random value between 0 and 1. How many samples must we take to be confident that we have found an element with a value in the 99th percentile?

Can we optimize our search strategy beyond random sampling?

CJYman said...

Zachriel: "The dictionary is the landscape to be searched. It's the given as specified by Dembski."

All that Dembski specified was “finding a seven letter word that can pass the spell checker test.” Show me where Dembski said that the dictionary is the landscape to be searched.

CJYman:
"Incorrect. Now you are really beginning to show your ignorance of the subject at hand. The landscape contains ALL POSSIBLE COMBINATIONS OF OUR 26 LETTER ALPHABET."

Zachriel:
"The fitness landscape is the fitness function of each point in space. In this case, the fitness landscape is *defined* by the dictionary. All other points in space have zero fitness and are never explored."


If these points were never explored, then every single trial would produce a valid word. In this case, your algorithm is merely a search of a dictionary. In that case, Dembski never said finding a seven letter word “as you search through a dictionary.” You can put words in Dembski’s mouth all you want, but you can’t use those words that you put in his mouth in order to “prove him wrong.” Clarify, first, before you jump.

He merely said, “finding a seven letter word that can pass the spell checker test.” That is: find a seven letter word that can *also* be found in a dictionary. IOW, find a seven letter word out of all possible combinations (probably starting from either one letter or even from seven random letters). And, as I will show in the comment below, nothing Dembski stated in the paragraph in question contradicts the NFL Theorems. In fact, it is merely restating what the authors of the NFLT already said.

BTW: zero fitness doesn’t mean never explored. It just means that all other fitness points above zero will out compete it.

Zachriel:
"Dembski just doesn't understand the implication of the No Free Lunch Theorem. Even Wolpert, the co-discoverer of the theorem, says that Dembski is just handwaving. Generalized evolutionary algorithms (populations undergoing random mutation and selection) are more than capable of exploring complex landscapes, including words or biological structures. The only thing the No Free Lunch Theorem tells us is that these generalized algorithms may be less than optimal—not that they don't work. They do. Dembski's handwaving is irrelevant."


I do know that Wolpert, in referencing Dembski’s book, said that it was “written in jello.”
This was a reference to how non-technical the book was. However, Dembski even stated that he wrote the book in question for the general public, and so it was meant to give a not-to-technical overview of the subject and how it fit in with ID Theory. His more recent technical work on specification and with Dr. Marks clarify and extend his earlier work.

Some of his more technical papers of the subject of NFL Theorem and Conservation of Information Thoerem can be found at his website and at the website for his evolutionary informatics lab.

BTW: The NFLT explicitly states that an evolutionary algorithm will not work over all optimization problems. Dembski never states that evolutionary algorithms don’t work with their specific optimization problems, he merely explains why they do work – because of problem specific information, according to NFLT.

CJYman said...

Zachriel:
"How many samples must we take to be confident that we have found an element with a value in the 99th percentile?"


First, there is no need to find a value in the 99th percentile. All we need to do in relation to the subject at hand is find a value which is specified or pre-specified and greater than all probabilistic resources.

Thus, you merely measure for a specification which takes into account all probabilistic resources. Thus, Intelligent Design hypothesizes that CSI will *not* be generated by random processes and that an EA is necessary to unfold deeper levels of CSI. Simple as that.

To end this discussion, and prove ID Theory wrong and your side (whatever that may be) as right, merely show how a random set of laws will generate an information processing system, an evolutionary algorithm, and thus CSI. I predict, based on my understanding and explanation of NFL Theorems and recently developed Conservation of Information Theorems that random generation of evolutionary algorithms is to information theory what perpetual motion free energy machines are to physics and are thus so highly improbable that they are for all practical purposes impossible. Merely show me some data (observation) that a random set of laws will produce the above mentioned effects and probabilistic measurements. Or at least show me the theory underpinning such a hypothesis. Data Trumps ... every time.

Now that you have changed the subject (from how you were trying to sell the accusation that your EA somehow contradicted Dembksi’s quote) please discuss specifications at the highlighted link above.

If you can’t back up your accusation, as I have shown in our discussion above and as you have shown in your continual blatant contradiction of what the authors of the NFLT state and in your continual insistence on putting words in Demsbki’s mouth in order to “prove him wrong,” I would expect you to be the honest person and remove the subject matter from your own blog so as not to spread misinformation and misunderstandings.

Zachriel said...

CJYman: "First, there is no need to find a value in the 99th percentile."

Um, that's not an answer. As we're comparing search algorithms to random searches, it would behoove us to understand how random searches work.

CJYman said...

Zachriel:
"How many samples must we take to be confident that we have found an element with a value in the 99th percentile?"


Since the NFL Theorem deals with consistently better than chance performance, we don't have to worry about finding an element in the 99th percentile (if you are looking for a percentile you are dealing with a specification as per my last comment above). You merely measure the fact that the search is performing *consistently* better than chance on one problem/target.

If you want to discuss this further please do so at my blog post discussing NFLT. I'd like to keep this blog open for you initial accusations as I have made clear in my previous comment above.

CJYman said...

Here, let me clarify what I meant:

If a pattern is in the 99th percentile, then you are merely measuring for a very high specification which takes into account all probabilistic resources and other patterns of the same probability -- which is the only way that you can know that your pattern is in the 99th percentile. Thus, Intelligent Design hypothesizes that CSI will *not* be generated by random processes and that an EA is necessary to unfold deeper levels of CSI. Simple as that.

To end this discussion, and prove ID Theory wrong and your side (whatever that may be) as right, merely show how a random set of laws will generate an information processing system, an evolutionary algorithm, and thus CSI. I predict, based on my understanding and explanation of NFL Theorems and recently developed Conservation of Information Theorems that random generation of evolutionary algorithms is to information theory what perpetual motion free energy machines are to physics and are thus so highly improbable that they are for all practical purposes impossible. Merely show me some data (observation) that a random set of laws will produce the above mentioned effects and probabilistic measurements. Or at least show me the theory underpinning such a hypothesis. Data Trumps ... every time.

Now that you have changed the subject (from how you were trying to sell the accusation that your EA somehow contradicted Dembksi’s quote) please discuss specifications at the highlighted link above.

If you can’t back up your accusation, as I have shown in our discussion above and as you have shown in your continual blatant contradiction of what the authors of the NFLT state and in your continual insistence on putting words in Demsbki’s mouth in order to “prove him wrong,” I would expect you to be the honest person and remove the subject matter from your own blog so as not to spread misinformation and misunderstandings.

Zachriel said...

CJYman: "If these points were never explored, then every single trial would produce a valid word. In this case, your algorithm is merely a search of a dictionary."

Selection is very simple. Any mutant not forming a valid word has zero fitness and is therefore stillborn. But just to clarify. Lots of mutants are stillborn, but the algorithm doesn't meander in areas of non-fitness because stillborns don't make babies.

You really just don't seem to understand how evolutionary algorithms work.

CJYman: "I do know that Wolpert, in referencing Dembski’s book, said that it was 'written in jello.'"

Yes, he said it in his paper titled William Dembski's treatment of the No Free Lunch theorems is written in jello. That might be a clue to his views. But what does he know? He only proved the No Free Lunch Theorems.

Zachriel said...

Consider every finite length sequence of letters. If the sequence forms a word, we'll consider it fit; otherwise, we'll consider it unfit.

Thinking of this as a landscape, we might see a few scattered islands of dry land amidst a vast sea of meaningless gibberish. (It's actually a multidimensional space due to recombination, so this is just illustrative.)

Consider now a word in that landscape, "can". Among its offspring might be "cxn" which is stillborn. Or "cane" which is liveborn. (Or "cancan", a type of dance, by recombination.) Indeed, most offspring will be stillborn never entering the population.

Consider now the sequence "xxx". This sequence will never be considered by our evolutionary algorithm. Why? Because it has no possible precursors. Even if "xxx" was the best word in the world (it means kisses), and we gave it infinite fitness, it will never be found.

Indeed, our search will never leave the safety of the known word and venture forth into the vast oceans of gibberish. But that's the whole point! If we were to search those vast oceans, then the evolutionary algorithm surely would be only as powerful as a random search. Instead, the evolutionary algorithm will find words all day long, including seven-letter words, and longer, far, far faster than random search.

And this very general type of evolutionary algorithm can solve many such problems, including in biology. The question isn't whether evolutionary algorithms can efficiently search arbitrarily chaotic spaces, but whether biology happens to be one of those landscapes amenable to evolutionary search. And it is.

Biological problems concern proximity and spatial relationships, both microscopic and macroscopic — exactly the sorts of problems that evolutionary algorithms are adept at solving. While evolution is the natural consequence of imperfect replicators replicating imperfectly, the biological landscape is arranged by the very geometry of existence.

CJYman said...

CJYman: "If these points were never explored, then every single trial would produce a valid word. In this case, your algorithm is merely a search of a dictionary."

Zachriel:
"Selection is very simple. Any mutant not forming a valid word has zero fitness and is therefore stillborn. But just to clarify. Lots of mutants are stillborn, but the algorithm doesn't meander in areas of non-fitness because stillborns don't make babies.

You really just don't seem to understand how evolutionary algorithms work."



Hhehehehe ... good one Zach!

The word is "still born" but it is still explored. IOW, the algorithm can still discover it (it is not blocked from discovery), but it is discarded at the end of that round of selection pressure. The algorithm only keeps those with highest fitness which, according to a step-by-step selectable scenario, are nearest other higher fitness areas, thus causing the algorithm to not meander in the discoverable but also "still-born-able."

Thus, I reiterate, ""If these points were never explored, then every single trial would produce a valid word. In this case, your algorithm is merely a search of a dictionary."

CJYman: "I do know that Wolpert, in referencing Dembski’s book, said that it was 'written in jello.'"

Zachriel:
"Yes, he said it in his paper titled William Dembski's treatment of the No Free Lunch theorems is written in jello. That might be a clue to his views. But what does he know? He only proved the No Free Lunch Theorems."


Yah ... and you musta' missed the part where Wolpert was actually referencing the actual book "NFL" and Dembski had stated that the book was meant to give a non-technical overview of the subject and its application to ID Theory. It was a pop-sci book after all. Dembski's technical papers are found elsewhere. There are some on his site and then there is the evolutionary informatics site.

Furthermore, the quote in question that we are discussing here is written after the "jell-o" accusation and it is the result of further and more recent work and I have included this work in my case.

Can you back up your position, whatever that may be, or not? I've already explained and backed up mine. This is a debate after all, correct?

BTW: it doesn't really seem that you care what Wolpert wrote since most of the things you've said about there paper has been completely incorrect and at times exactly opposite of he stated.

ie:
-Zachriel says "NFLT doesn't apply to EAs."
-NFLT states "our definition of an algorithm includes all common black-box optimization techniques like simulated annealing and evolutionary algorithms."

Were you seriously hoping that I wouldn't do some research for myself? It's obvious that you haven't.

Zachriel:
"Biological problems concern proximity and spatial relationships, both microscopic and macroscopic — exactly the sorts of problems that evolutionary algorithms are adept at solving. While evolution is the natural consequence of imperfect replicators replicating imperfectly, the biological landscape is arranged by the very geometry of existence.


Sure, agreed and understood, and this geometry shows the product of problem specific information as per the NFLT and COI Theorems.

Now, what causes evolutionary algorithms?

Zachriel said...

CJYman: The word is "still born""

Stillborn

Stillbirth

CJYman: IOW, the algorithm can still discover it (it is not blocked from discovery), but it is discarded at the end of that round of selection pressure.

I want to make clear that you understand that the algorithm will never explore the vast majority of possible letter sequences. And that means it is much faster than random sampling.

Zachriel said...

CJYman: Now, what causes evolutionary algorithms?

Imperfect replicators replicating imperfectly.

CJYman said...

Agreed.

CJYman said...

Hello Zachriel,

Now we can get back on topic. I will re-post the part of my comment which you did not respond to and which is directly related to the subject of this post -- your assertion that your EA somehow proved Dembski wrong.

Zachriel:
"The dictionary is the landscape to be searched. It's the given as specified by Dembski."


All that Dembski specified was “finding a seven letter word that can pass the spell checker test.” Show me where Dembski said that the dictionary is the landscape to be searched.

Zachriel:
"The fitness landscape is the fitness function of each point in space. In this case, the fitness landscape is *defined* by the dictionary. All other points in space have zero fitness and are never explored."


If these points were never explored, then every single trial would produce a valid word. In this case, your algorithm is merely a search of a dictionary. In that case, Dembski never said finding a seven letter word “as you search through a dictionary.” You can put words in Dembski’s mouth all you want, but you can’t use those words that you put in his mouth in order to “prove him wrong.” Clarify, first, before you jump.

He merely said, “finding a seven letter word that can pass the spell checker test.” That is: find a seven letter word that can *also* be found in a dictionary. IOW, find a seven letter word out of all possible combinations (probably starting from either one letter or even from seven random letters). And ... [as I have shown] nothing Dembski stated in the paragraph in question contradicts the NFL Theorems. In fact, it is merely restating what the authors of the NFLT already said.

Zachriel:
"Dembski just doesn't understand the implication of the No Free Lunch Theorem. Even Wolpert, the co-discoverer of the theorem, says that Dembski is just handwaving. Generalized evolutionary algorithms (populations undergoing random mutation and selection) are more than capable of exploring complex landscapes, including words or biological structures. The only thing the No Free Lunch Theorem tells us is that these generalized algorithms may be less than optimal—not that they don't work. They do. Dembski's handwaving is irrelevant."


I do know that Wolpert, in referencing Dembski’s book, said that it was "written in jell-o." See link for response.

BTW: The NFLT explicitly states that an evolutionary algorithm will not work over all optimization problems. Dembski never states that evolutionary algorithms don’t work with their specific optimization problems, he merely explains why they do work – because of problem specific information, according to NFLT.

BTW: it doesn't really seem that you care what Wolpert wrote since most of the things you've said about there paper has been completely incorrect and at times exactly opposite of he stated.

ie:
-Zachriel says "NFLT doesn't apply to EAs."
-NFLT states "our definition of an algorithm includes all common black-box optimization techniques like simulated annealing and evolutionary algorithms."

Were you seriously hoping that I wouldn't do some research for myself? It's obvious that you haven't.

Here is Dembski's quote again:

"Such assumptions, however, are useless when searching to find a sequence of say, 7 letters from a 26-letter alphabet to form a word that will pass successfully through a spell checker, or choosing a sequence of commands from 26 available commands in an Avida type program to generate a logic operation such as XNOR [16]. With no metric to determine nearness, the search landscape for such searches is binary -- either success or failure. There are no sloped hills to climb. We note that such search problems on a binary landscape are in a different class from those useful for, say, adaptive filters. But, in the context of the NFLT, knowledge of the class in which an optimization problem lies provides active information about the problem."

As I have already explained, as an example, Dembski is stating that if you have a shuffled deck of cards and you lay them out facedown on a table and try to flip over a full house in ascending order by flipping over one card at a time, having "such assumptions" of any organized "hill climbing" structure is not going to help you. "With no metric to determine nearness" to every card in your ending hand, then every card that you flip over gives you no more information than "yes" or "non" -- the binary search. Thus your chances are no better than random search. Any "non-uniformity" assumptions do not help you at all. However, if the landscape of cards *is* already organized in a "non-uniform" and useful way and you also know this information and apply it properly into a search procedure, then active information (applied information which is not generated by the search itself) has been applied to the problem and you can possibly have an efficient search.

If you can’t back up your accusation, as I have shown in our discussion above and as you have shown in your continual blatant contradiction of what the authors of the NFLT state and in your continual insistence on putting words in Demsbki’s mouth in order to “prove him wrong,” I would expect you to be the honest person and remove the subject matter from your own blog so as not to spread misinformation and misunderstandings.

Zachriel said...

Zachriel: The fitness landscape is the fitness function of each point in space. In this case, the fitness landscape is *defined* by the dictionary. All other points in space have zero fitness and are never explored.

CJYman: If these points were never explored, then every single trial would produce a valid word. In this case, your algorithm is merely a search of a dictionary.

I clarified my comments above. I have even provided the source code for such an algorithm. There should be no ambiguity. Any mutant not comprising a word has zero fitness and is stillborn. The vast majority of sequence space is never explored.

Indeed, many valid words will never be found. Consider the sequence "xxx". This sequence will never be considered by our evolutionary algorithm. Why? Because it has no possible precursors. Even if "xxx" was the best word in the world (it means kisses), and we gave it infinite fitness, it will never be found.

Dembksi posed an empirical question. Can a standard evolutionary algorithm that seeks local maximums find long words? The answer is yes.

CJYman: In that case, Dembski never said finding a seven letter word “as you search through a dictionary."

The algorithm is not constructed to search through the dictionary. It mutates and recombines existing sequences and then determines their fitness. The *effect* is to search only through the dictionary. It is conceivable that the algorithm would never progress, that words would be separated on unreachable islands. But that turns out not to be the case.

It's Dembski's example, not mine. He says that 'hill-climbing' won't work. It does.

CJYman: The NFLT explicitly states that an evolutionary algorithm will not work over all optimization problems.

You are confused on this point. They state a given algorithm may or may not be better than a random search. Not that it won't work at all. You still haven't answered our questions about the effectiveness of random search.

CJYman: NFLT states "our definition of an algorithm includes all common black-box optimization techniques like simulated annealing and evolutionary algorithms."

They modify the definition of an evolutionary algorithm somewhat in their optimization paper so that the algorithm is assumed to never revisit the same point twice, making their theorem relevant. But this assumption does not apply to all evolutionary algorithms on all landscapes, or as they dicuss, for coevolutionary or competitive processes which are typical in biology. It's still an important result, though.

Dembski: Such assumptions, however, are useless when searching to find a sequence of say, 7 letters from a 26-letter alphabet to form a word that will pass successfully through a spell checker, or choosing a sequence of commands from 26 available commands in an Avida type program to generate a logic operation such as XNOR [16]. With no metric to determine nearness, the search landscape for such searches is binary -- either success or failure. There are no sloped hills to climb. We note that such search problems on a binary landscape are in a different class from those useful for, say, adaptive filters. But, in the context of the NFLT, knowledge of the class in which an optimization problem lies provides active information about the problem."

And that's precisely where Dembski is wrong. There *are* slopes to climb. If we take letter sequences, randomly mutate and recombine them, throw out all those that do not form valid words, keep all that do, then repeat this process, we will evolve longer and longer words—much faster than random search.

CJYman: However, if the landscape of cards *is* already organized in a "non-uniform" and useful way ...

But Dembski didn't choose a shuffled deck of cards. He chose words! Words which are organized by their own evolutionary history of development.

CJYman: ... and you also know this information and apply it properly into a search procedure, then active information (applied information which is not generated by the search itself) has been applied to the problem and you can possibly have an efficient search.

Dembski chose a specific landscape (the space of all letter sequences, words having fitness, non-words having no fitness) and a specific algorithm ('hill-climbing'). He said, "There are no sloped hills to climb." This is not correct.

We have a standard evolutionary algorithm that seeks local maximums. It works for word evolution, and it works for biology. The former works because words are not random sequences, but are organized by their own divergence and evolution; the latter because biological success depends on proximate relationships to resources and competitors.

Zachriel said...

Interestingly, the link you provided no longer works. A Google for "Active Information in Evolutionary Search" reveals that your blog and mine are among the top referrers. Apparently, I'm a leading critic.

I did find the relevant paper on Evolutionary Informatics under the title "The Information Cost of No Free Lunch".

A Google of "Information Cost of No Free Lunch" reveals the extent of interest in this paper.

Zachriel said...

CJYman: I would expect you to be the honest person and remove the subject matter from your own blog so as not to spread misinformation and misunderstandings.

I keep rereading what Dembski wrote on the subject, and I still find it to be written in jello. However, I have added a note in comments so that people can read your defense of Dembski themselves.

Zachriel said...

Zachriel: And that's precisely where Dembski is wrong. There *are* slopes to climb. If we take letter sequences, randomly mutate and recombine them, throw out all those that do not form valid words, keep all that do, then repeat this process, we will evolve longer and longer words—much faster than random search.

Just to clarify on the term 'hill climbing'. As pointed out above, the algorithm works even if the wordscape is flat, i.e. all words have the same fitness.

-----------
Typical Word Mutagenation results for 10-letter word

Random Trial
14,000,000,000 mutants

Word Mutagenation with no selection for word length (i.e. flat wordscape)
250,000 mutants
mincemeats, staggering, coastlines

-----------

Notice that the evolutionary algorithm across the flat wordscape is fifty thousand times faster than random trial. Assuming ten thousand trials per second, that's the difference between half a minute and two weeks.

(An analogy would be to water being poured on the floor. We expect the water to spread in all directions. But if we pour water next to a wall, it will tend to spread away from the wall.

The 'wall' in this case is that words must have a postive length, and there are only so many short words. If it can spread stepwise, and it apparently can, it must do so into the space of longer words. As the algorithm 'pours out', it fills the wordscape starting with the shortest words and moving out.)