The quotes below are taken from the NFL Theorems paper:
“Given our decision to only measure distinct function evaluations even if an algorithm revisits previously searched points, our definition of an algorithm includes all common black-box optimization techniques like simulated annealing and evolutionary algorithms.”

Thus, the algorithms discussed include evolutionary algorithms.
“In this paper we present a formal analysis that contributes towards such an understanding by addressing questions like the following. Given the plethora of black box optimization algorithms and of optimization problems, how can we best match algorithms to problems (i. e. how best can we relax the black box nature of the algorithms and have them exploit some knowledge concerning the optimization problem). In particular, while serious optimization practitioners almost always perform such matching, it is usually on an ad hoc basis; how can such matching be formally analyzed? More generally, what is the underlying mathematical “skeleton” of optimization theory before the flesh of the probability distributions of a particular context and set of optimization problems are imposed.”
Thus, one of the purposes of this paper is to formally analyse how prior knowledge of the optimization problem is exploited, and how it must be utilized in order to match the problem with the proper evolutionary algorithm, a feat performed by optimization practitioners (intelligent programmers).
“As emphasized above, the NFL theorems mean that if an algorithm does particularly well on average for one class of problems then it must do worse on average over the remaining problems. In particular, if an algorithm performs better than random search on some class of problems then it must perform worse than random search on the remaining problems. Thus comparisons reporting the performance of a particular algorithm with a particular parameter setting on a few sample problems are of limited utility.”
"In particular, if for some favorite algorithms a certain well behaved f results in better performance than does the random f then that well behaved f gives worse than random behavior on the set all remaining algorithms. In this sense just as there are no universally efficacious search algorithms, there are no universally benign f (optimizations problems) which can be assured of resulting in better than random performance regardless of ones algorithm.”
There is no universal evolutionary algorithm that will solve all optimization problems at better than random performance. Likewise, there is no optimization problem that just any evolutionary algorithm will be able to “solve” at better than random performance. Thus, if an evolutionary algorithm performs better than random search on one class of optimization problems, it will perform worse than random search on average over all remaining classes of optimization problems. Dembski and Marks have reiterated this point in “Conservation of Information in Search: Measuring the Cost of Success”: “... there is no magic-bullet search algorithm that successfully resolves all problems.”
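The averaging claim in these quotes can be made concrete with a toy enumeration (my own sketch, not taken from either paper): if we list every possible fitness function on a tiny search space, any two fixed search orders turn out to have exactly the same average performance.

```python
# Toy illustration of the NFL averaging claim: over ALL possible fitness
# functions on a small space, every fixed search order performs identically
# on average. The space size (4 points) and values (0/1) are my choices.
from itertools import product

X = range(4)                 # a tiny search space of 4 points
VALUES = (0, 1)              # binary fitness values
ALL_F = list(product(VALUES, repeat=len(X)))   # all 2**4 = 16 fitness functions

def avg_best_after(order, m):
    """Average best fitness seen in the first m evaluations, over ALL f."""
    return sum(max(f[x] for x in order[:m]) for f in ALL_F) / len(ALL_F)

order_a = [0, 1, 2, 3]       # "algorithm" A: scan left to right
order_b = [3, 1, 0, 2]       # "algorithm" B: a different fixed order
for m in (1, 2, 3):
    assert avg_best_after(order_a, m) == avg_best_after(order_b, m)
print(avg_best_after(order_a, 2))   # 0.75, identical for every search order
```

No matter which distinct points an algorithm visits first, the average over all sixteen functions is the same, which is the "no free lunch" averaging result in miniature.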
So, what causes an evolutionary algorithm to perform better than random search on a class of problems? The next quotes begin to discuss this ...
“In particular, we show that an algorithm’s average performance is determined by how aligned it is with the underlying probability distribution over optimization problems on which it is run.”
“In any of these cases, P(f) or “p” must match or be aligned with “a” to get desired behavior. This need for matching provides a new perspective on how certain algorithms can perform well in practice on specific kinds of problems.”
“First, if the practitioner has knowledge of problem characteristics but does not incorporate them into the optimization algorithm, then P(f) is effectively uniform. (Recall that P(f) can be viewed as a statement concerning the practitioner’s choice of optimization algorithms.) In such a case, the NFL theorems establish that there are no formal assurances that the algorithm chosen will be at all effective. Second, while most classes of problems will certainly have some structure which, if known, might be exploitable, the simple existence of that structure does not justify choice of a particular algorithm; that structure must be known and reflected directly in the choice of algorithm to serve as such a justification. In other words, the simple existence of structure per se, absent a specification of that structure, cannot provide a basis for preferring one algorithm over another ... The simple fact that the P(f) at hand is non-uniform cannot serve to determine one’s choice of optimization algorithm.”
“Intuitively, the NFL theorem illustrates that even if knowledge of “f” perhaps specified through P(f) is not incorporated into “a” then there are no formal assurances that “a” will be effective. Rather effective optimization relies on a fortuitous matching between “f” [optimization problem] and “a” [evolutionary algorithm].” [brackets added]
The quotes above state that for the algorithm to perform better than random search on a class of problems, the evolutionary algorithm must be correctly matched beforehand, based on knowledge of the problem, to the specific optimization problem. Knowledge of the characteristics of the problem must be incorporated into the algorithm. This is similar to how you won’t be able to find a card at consistently better than chance performance unless you are aware of an organization of the deck that would help you find the card and you incorporate this knowledge into the algorithm that you use to find the card. Likewise, the “exploitable structure,” such as hill climbing structures, must be known and incorporated into the choice of algorithm to take any advantage of these usable search structures. As Dr. Dembski has stated in “Active Information in Evolutionary Search,” merely assuming that there does exist an exploitable search structure, without incorporating that knowledge into the choice of algorithm, doesn’t get us anywhere:
“Such assumptions, [of non-uniform search space “links,” “hill climbing optimizations,” etc.] however, are useless when searching to find a sequence of, say, 7 letters from a 26-letter alphabet to form a word that will pass successfully through a spell checker, or choosing a sequence of commands from 26 available commands in an Avida type program to generate a logic operation such as XNOR. With no metric to determine nearness, the search landscape for such searches is binary -- either success or failure. There are no sloped hills to climb. We note that such search problems on a binary landscape are in a different class from those useful for, say, adaptive filters. But, in the context of the NFLT, knowledge of the class in which an optimization problem lies provides active information about the problem.” [brackets added]
He is basically stating the same thing as the “find a card” example. If you assume that there is a specific organization that will help you find the card, without actually knowing it for sure, you still have nothing but chance to go on. There may be no useful organization at all, or, out of all possible organizations, the one you are banking on may be such a small possibility that for all practical purposes you still have no better than chance assurance of finding the card.
Dr. Dembski’s quote directly above states the same concept as found within the NFL Theorem paper, as quoted a few paragraphs above: “... while most classes of problems will certainly have some structure which, if known, might be exploitable ...; that structure must be known and reflected directly in the choice of algorithm to serve as such a justification [for choice of a particular algorithm] ... The simple fact that the P(f) at hand is non-uniform cannot serve to determine one’s choice of optimization algorithm ... Rather effective optimization relies on a fortuitous matching between “f” [optimization problem] and “a” [evolutionary algorithm].” [brackets added]
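The contrast Dembski draws between sloped and binary landscapes can be sketched as follows (a toy illustration of my own; the target word and the single-letter mutation scheme are hypothetical, not taken from either paper). A hill climber given a graded, exploitable fitness finds a 7-letter target quickly, while under a pass/fail fitness it would have no gradient and would expect on the order of 26^7, roughly 8 billion, blind guesses.

```python
# Sketch of "exploitable structure": a graded fitness ("how many letters
# match") gives a hill climber slopes to climb; a binary pass/fail fitness
# gives it nothing, reducing it to blind search.
import random
import string

random.seed(0)
TARGET = "descent"                    # hypothetical 7-letter target word
ALPHABET = string.ascii_lowercase     # 26 letters, so 26**7 possible strings

def graded_fitness(s):
    """Exploitable structure: number of positions matching the target."""
    return sum(a == b for a, b in zip(s, TARGET))

def hill_climb():
    """Single-letter-mutation hill climber using the graded fitness."""
    current = "".join(random.choice(ALPHABET) for _ in range(len(TARGET)))
    evals = 0
    while current != TARGET:
        i = random.randrange(len(TARGET))
        candidate = current[:i] + random.choice(ALPHABET) + current[i + 1:]
        evals += 1
        if graded_fitness(candidate) >= graded_fitness(current):
            current = candidate
    return evals

print("hill climber evaluations:", hill_climb())
# Under a binary fitness (1 if s == TARGET, else 0) every non-target string
# scores 0, so there is no slope to climb and a searcher expects roughly
# 26**7 = 8,031,810,176 blind evaluations.
print("blind-search space size:", len(ALPHABET) ** len(TARGET))
```

The hill climber succeeds in the order of hundreds of evaluations only because the graded fitness, knowledge about the problem, was built into it; remove that structure and nothing distinguishes it from chance.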
The above quotes from both the NFL Theorem paper and Dr. Dembski and Marks’ paper state what I have already referred to before – the necessity of using prior knowledge of the search space structure to match search space structure to the correct adaptive algorithm before the search ever begins. The next quote, taken from the conclusion of the paper, states that based on the theorems proved within the paper, it is important that problem-specific knowledge is incorporated into the behavior of the algorithm.
“A framework has been presented in which to compare general purpose optimization algorithms. A number of NFL theorems were derived that demonstrate the danger of comparing algorithms by their performance on a small sample of problems. These same results also indicate the importance of incorporating problem specific knowledge into the behavior of the algorithm.”
Now, let’s tie this all together:
1. The algorithms discussed include evolutionary algorithms.
2. The NFL Theorems formally analyse how prior knowledge of the optimization problem needs to be utilized in order to match it with the proper algorithm.
3. In order for evolutionary algorithms to perform better than random search over one class of problems, the characteristics of the problem must be known and incorporated into the search algorithm. The exploitable structure of the search space must be known and incorporated into the choice of algorithm. Furthermore, the search algorithm must be matched correctly to the optimization problem by utilizing prior knowledge of the problem.
4. Thus, it is important that problem specific information is incorporated into the behavior of the evolutionary algorithm. Therefore, evolutionary search is not blind search and is guided by incorporating prior knowledge of the problem and knowledge of any exploitable structure [within the search space] into the evolutionary algorithm.
IOW, here is an example of what the NFL Theorem is telling us:
I have a shuffled deck of cards. I lay them face down in a row on a table. Before you turn over any cards can you generate a search algorithm (method of search) that will help you find any card at consistently better than chance performance over many trials with the same shuffled deck?
Also, suppose that by chance there happens to be some type of exploitable search structure within the arrangement of cards after the random shuffling, such as a spade following every heart. Before flipping over any cards, can you blindly (without knowledge of that specific arrangement) generate a search algorithm that will help you locate any card at consistently better than chance performance over many trials with the same shuffled deck?
IOW, will a blind set of laws (that is, only law and chance) generate, over many trials, consistently better than chance performance?
According to common sense, testing and observation, and NFL Theorems, the answer to the questions above is simply “no.”
Furthermore, the NFL Theorem applies to all searches which result in better than chance performance over many trials. And now I will leave you with a little gem to think about. Convergent evolution is the observation of many separate biological evolutionary trials converging upon the same extremely highly improbable forms and functions and biochemical systems [link].
Seriously, think about it. Drop any prejudice and use some reason here.