Saturday, September 15, 2007

Concept of CSI (part 2)

Continuation from here.

Zachriel:
“You keep talking about CSI and complexity, but the only issue at this point is the definition of “specificity”. Your meandering answer is evidence of this extreme overloading of even basic terminology.”

My example of defining “complexity” was to show that even in information theory, some concepts can and must be defined and quantified in different ways.

... and I have given definitions of specificity in different words (pertaining to the definition which aids in ruling out chance occurrences), hoping that you would understand them. However, you continually ignore them. Or did you just miss these?

Meandering, nope. Trying to explain it in terms you will understand (also borrowing from Dembski’s terminology), yep.

Zachriel:
“This is Dembski’s definition of specificity:

Thus, for a pattern T, a chance hypothesis H, and a semiotic agent S for whom ϕS measures specificational resources, the specificity σ is given as follows:

σ = –log2[ϕS(T)·P(T|H)].”

First, you do realize that in order to measure something’s specificity, the event must first qualify as specific, just as, in order to measure an event using shannon information (using the equation which defines and quantifies shannon information), the event must first meet certain qualifiers. I’ve already discussed this.

Now, yes, you are correct. However, you stopped halfway through the article and you seem to have arbitrarily pulled out one of the equations. Do you even understand what Dembski is saying here? You do also realize that specificity and a specification are different, correct?

Dembski was almost done building his equation, but not quite. You obviously haven’t read through the whole paper. Read through it, then get back to me. You will notice that Dembski later states, in regard to your referenced equation:

“Is this [equation] enough to show that E did not happen by chance? No.” (Italics added)

Why not? Because he is not done building the equation yet. He hasn’t factored in the probabilistic resources. I’ll get back to this right away, but first ...

The other thing that you must have missed, regarding the symbols used in the equation, directly follows your quote of Dembski’s equation. Here it is:

“Note that T in ϕ S(T) is treated as a pattern and that T in P(T|H) is treated as an event (i.e., the event identified by the pattern).”

It seems that the above referenced equation compares the event in question (the event identified by the pattern), together with its independently given pattern, against its chance hypothesis. This actually shows that we were both wrong in thinking that just any pattern (event) could be shoved into the above equation. The equation only works on those events which have an independently given pattern (thus already qualifying as specific), and it gives a measurement of specificity, but not a specification. You will notice, if you continue to read the paper, that a complex specificity greater than 1 = a specification and thus CSI.

Dembski does point out that in the completed equation, when a complex specificity produces a greater than 1 result, you have CSI. As far as I understand, this is a result of inputting all available probabilistic resources, which is something normal probability theory does not take into consideration. Normally, probability calculations give you a number between 0 and 1, showing a probability, but they do so without consideration of probabilistic resources and the qualifier of the event conforming to an independently given pattern. Once this is all calculated and its measurement is greater than one (greater than the UPB), then you have CSI.
Moreover, the specification is a measurement in bits of information and as such cannot be less than 1 anyway, since 1 bit is the smallest amount of measurable information (this has to do with the fact that measurable information must have at least two states -- thus the base unit of the binary digit (bit), which is one of those two states).

You must have seriously missed where Dembski, referencing pure probabilistic methods in “teasing” out non-chance explanations, said (and I already stated a part of this earlier):

“In eliminating H, what is so special about basing the extremal sets Tγ and Tδ on the probability density function f associated with the chance hypothesis H (that is, H induces the probability measure P(.|H) that can be represented as f.dU)? Answer: THERE IS NOTHING SPECIAL ABOUT f BEING THE PROBABILITY DENSITY FUNCTION ASSOCIATED WITH H; INSTEAD, WHAT IS IMPORTANT IS THAT f BE CAPABLE OF BEING DEFINED INDEPENDENTLY OF E, THE EVENT OR SAMPLE THAT IS OBSERVED. And indeed, Fisher’s approach to eliminating chance hypotheses has already been extended in this way, though the extension, thus far, has mainly been tacit rather than explicit.” [caps lock added]

Furthermore ...

Dr. Dembski: “Note that putting the logarithm to the base 2 in front of the product ϕ S(T)P(T|H) has the effect of changing scale and directionality, turning probabilities into number of bits and thereby making the specificity a measure of information. This logarithmic transformation therefore ensures that the simpler the patterns and the smaller the probability of the targets they constrain, the larger specificity.”

Thus, the full equation that you haven’t even referenced yet gives us a measurement (quantity), in bits, of the specified information as a result of the log base 2.
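As a rough sketch of what this logarithmic transformation does, here is the specificity measure written out in Python. The input values are made-up placeholders for illustration, not numbers taken from Dembski’s paper:

```python
import math

def specificity(phi_s, p_t_given_h):
    """Dembski-style specificity: sigma = -log2(phi_S(T) * P(T|H)).

    phi_s        -- specificational resources for the pattern T (placeholder)
    p_t_given_h  -- probability of the event under the chance hypothesis H
    """
    return -math.log2(phi_s * p_t_given_h)

# Toy numbers only: a pattern with modest descriptive resources and a
# very small chance probability yields a high specificity in bits.
print(specificity(100, 2 ** -40))  # ≈ 33.36 bits
print(specificity(1, 0.5))         # 1.0 bit
```

Note how the simpler the pattern (small ϕS) and the smaller its chance probability, the larger the result comes out, which is exactly the directionality Dembski describes.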

In fact, here is the full equation and Dembski’s note:

“The fundamental claim of this paper is that for a chance hypothesis H, if the specified complexity χ = –log2[10^120 · ϕS(T)·P(T|H)] is greater than 1, then T is a specification and the semiotic agent S is entitled to eliminate H as the explanation for the occurrence of any event E that conforms to the pattern T”

In order to understand where this 10^120 comes from, let’s look at a sequence of prime numbers:

“12357” – this sequence is algorithmically complex and yet specific (as per the qualitative definition) to the independently given pattern of prime numbers (stated in the language of mathematics as the sequence of whole numbers divisible only by themselves and one). However, there is not enough specified complexity to make this pattern a specification greater than 1. It does conform to an independently given pattern; however, it is relatively small and could actually be produced randomly. So, we need to calculate probabilistic resources, and this is where the probability bound and the above equation come into play.

According to probability theory, the first digit has a one in 10 chance of matching up with the sequence of prime numbers (or a pre-specification), the second digit a one in 100 chance, the third a one in 1000 chance, and so on. So, how far up the pattern of prime numbers will chance take us before making a mistake? The further you go, the more likely chance processes will deviate from the specific (or pre-specified) pattern. It’s bound to happen eventually, as the odds increase dramatically and quickly. But how do we know where the cutoff is?

Dembski has introduced a very generous, “benefit of the doubt to chance” type of calculation based on the age of the known universe and other known factors, actually borrowing from Seth Lloyd’s calculations. Now, it must be noted that as long as the universe is understood to be finite (having a beginning), there will be a probability bound. This number may increase or decrease based on future knowledge of the age of the universe; however, a UPB will exist, and a scientific understanding can only be based on present knowledge. This number, as far as I understand, actually allows chance to produce less than 500 bits of specific information before cutting chance off and saying that everything that is already specified and above that bound of 500 bits is definitely beyond the scope of chance operating anywhere within the universe, and is thus complex specified information and the result of intelligence. I dare anyone to produce even 100 bits of specified information completely randomly, much less anything on the order of complex specified information.
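The completed measure, with the 10^120 probabilistic-resources factor included, can be sketched the same way. Again, the ϕS and probability inputs are illustrative placeholders, not values computed from a real pattern:

```python
import math

UPB_RESOURCES = 10 ** 120  # probabilistic-resources bound, borrowed from Seth Lloyd's estimate

def specified_complexity(phi_s, p_t_given_h):
    """chi = -log2(10^120 * phi_S(T) * P(T|H)); chi > 1 is claimed as a specification."""
    return -math.log2(UPB_RESOURCES * phi_s * p_t_given_h)

# A short pattern: even with a tiny chance probability of ~2^-40,
# the 10^120 factor (~2^398.6) swamps it, so chi comes out negative.
print(specified_complexity(100, 2 ** -40))   # well below 1: not a specification
# Only an extremely improbable pattern (below roughly 2^-500,
# i.e. beyond ~500 bits) clears the bound.
print(specified_complexity(100, 2 ** -500))  # greater than 1: a specification
```

This makes the ~500-bit figure visible in the arithmetic: log2(10^120) ≈ 398.6 bits, plus the specificational resources, is the head start the equation grants to chance.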

Clarification: I have no problem with an evolutionary process creating CSI as it makes use of a replicating information processing system. As I have said earlier, it is the “how” that is the real question and the present problem. IMO, the present observations actually seem to support a mechanism which produces CSI in sudden leaps rather than gradually. In fact, Dr. Robert Marks is presently working on evolutionary algorithms to test their abilities and discover experimentally what is necessary to create CSI and how CSI guides evolutionary algorithms towards a goal.

So, do you understand, yet, how to separate the concept of specificity (of which I have provided ample definitions and examples previously) from the measurement of complex specified information as a specification greater than 1 after factoring in the UPB in the completed equation?

Zachriel:
“Dembski’s definition has a multitude of problems in application,”

So, you are now appealing to “fallacy by assertion?” (yes, I think I just made that up)

fallacy by assertion: “the fallacy which is embodied within the idea that simply asserting something as true will thus make it to be true.”

Come on, Zachriel, this is a debate/discussion; not an assertion marathon.

I have already shown you how to apply it earlier and you just conveniently chose not to respond. Remember matching the three different patterns with the three different causal choices? If there are a multitude of problems in the definition of specificity, please do bring them forward. You’ve already brought some up, but after I answered these objections, you haven’t referred to them again.

Actually, to be honest with you, since this is quite a young and recently developed concept, there may be a few problems with the concept of specificity and I welcome the chance to hear from another viewpoint and discuss these problems and see if they are indeed intractable.

Zachriel:

“but grappling with those problems isn’t necessary to show that it is inconsistent with other uses of the word within his argument. This equivocation is at the heart of Dembski’s fallacy.”

Have you ever “dumbed something down” for someone into wording and examples that they would understand in order to explain the concept to them, because they couldn’t comprehend the full detailed explanation? Furthermore, have you ever approached a concept from more than one angle, in order to help someone fully comprehend it? This is indeed a cornerstone principle in teaching. I have employed this countless times, as I’ve worked with kids for ten years.

You have yet to show me where any of Dembski’s definitions of specificity are equivocations rather than saying the same thing in different wording to different audiences of differing aptitudes or rewording as a method of clarification.

Zachriel:
“Dembski has provided a specific equation. This definition should be consistent with other definitions of specificity, as in “This is how specification is used in ID theory..."
Do you accept this definition or not?”

I agree with the concept of specificity and its qualitative definition. As for the equation, I do not understand all of the math involved, but from what I do understand it does seem to make sense. You do understand the difference between an equation that defines something, such as *force* in f=ma, and a qualitative definition of what force is?

But, then again, I’ve already been over this with you in discussing shannon information and you chose to completely ignore me. Why should it be any different now?

This definitional equation which provides a quantity of complex specified information with all available probabilistic resources factored in is consistent with all other qualifying definitions of specificity as it contains them within its equation.

You have yet to show anything to the contrary.

As far as application goes, I do think that the equation may be somewhat ambiguous to use on an event which is not based on measurable information. But, then again, I’d have to completely understand the math involved in order to pass my full judgement on the equation.

Furthermore, do you understand the difference between a pre-specification, a specification, specified information, specified complexity, and complex specified information? I ask, because you don’t seem to understand these concepts. If information is specified/specific (which I’ve already explained) and complex (which I’ve already explained), then you can measure for specificity (which is the equation that you have referenced). However, this doesn’t give us a specification, since the probabilistic resources (UPB) are not yet factored in. Once the UPB is factored in, then you can measure the specified complexity for a specification. If the specified complexity is greater than one, then you have a measure of specification and you are dealing with complex specified information.

It is a little confusing, and it has taken me a while to process it all, but how can YOU honestly go around with obfuscating arguments and false accusations (which you haven’t even backed up yet) of equivocations when it is obvious that you don’t even understand the concepts?

Do you do that with articles re: quantum mechanics just because you can’t understand the probabilities and math involved or the concept of wave-particle duality or some other esoteric concept?

I will soon be posting another blog post re: CSI (simplified) and the easy to use filter for determining if something is CSI. Here it is.

P.S. If you want to discuss the theory of ID and my hypothesis go to “Science of Intelligent Design” ...

11 comments:

secondclass said...

CJYman, you said:

So far him and Dembski have shown that in order for evolution (the generation of CSI) to occur according to evolutionary algorithms, there must be previous CSI guiding it toward a solution to a specific known problem.

and later:

Dr. Robert Marks is already examining these types of claims, and along with Dr. Dembski is refuting the claims that evolution creates CSI for free by critically examining the evolutionary algorithms which purportedly show how to get functional information for free. Dr. Marks is using the concept of CSI and Conservation of Information and experimenting in his field of expertise – “computational intelligence” and “evolutionary computing” – to discover how previously existing information guides the evolution of further information.

Just to clear up a misconception, none of Marks's work makes any reference to CSI. One of the current affiliates of the lab, Tom English, said:

As I see it, complex specified information is dead, and active information is its replacement.

secondclass said...

since 1 bit is the smallest amount of measurable information

Nitpick: This is not true, at least not in Shannon's probabilistic info theory, on which Dembski's definitions are based.

CJYman said...

Hey secondclass,

Your points are understood. Active information is merely the functional type of CSI. CSI is still what it is: a reliable indication of intelligence. I have recently posted a 6 step guide to distinguishing CSI from random and naturally created patterns.

If anyone says CSI is "dead" as a concept, then they'll have to explain to me why, and we can then discuss it from there.

Or if by "dead" they merely mean replaced by a more full concept, then by all means the discovery of CSI is what has guided researchers to the "discovery" of active information.

However, until it is explained to me as otherwise, I still see and will argue that a program of active information over 500 bits is CSI, but not all CSI is active information.

And yes, Dr. Marks's work does [at least] implicitly result from the Law of Conservation of Information which is derived from Dr. Dembski's earlier work on CSI.

As to 1 bit being the smallest amount of measurable information, that has somewhat confused me, since you can measure information as a decimal, but you can't represent it as such. IOW, what is the binary digit (fundamental unit of information in information theory) representation of less than 1 bit of information?

Another way to look at this is that 5 bits are necessary to encode an alphabet of between 17 and 32 units; however, the actual information content will vary anywhere between 4 and 5 bits.
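That relationship can be checked directly: for an equiprobable alphabet of N symbols the information per symbol is log2(N), while a fixed-width binary code needs ceil(log2(N)) digits. This is just an illustration of the arithmetic, assuming equal probabilities:

```python
import math

# For alphabets of 17 to 32 symbols, information content per symbol
# runs from log2(17) ≈ 4.087 up to log2(32) = 5.0 bits, yet a
# fixed-width binary code needs 5 binary digits in every case.
for n in (17, 24, 32):
    info_bits = math.log2(n)          # information per symbol (equiprobable)
    code_bits = math.ceil(info_bits)  # binary digits for a fixed-width code
    print(n, round(info_bits, 3), code_bits)
# 17 -> 4.087 bits, 5 digits
# 24 -> 4.585 bits, 5 digits
# 32 -> 5.0 bits, 5 digits
```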

It still seems that the bit (as in 1 bit) is the fundamental "quantum," if you will, of information theory.

And yes, that last part re: Dr. Dembski's work and the relation between the measurement of >1 and the bit was only an observation on my part. Albeit one that makes sense as I've explained above.

secondclass said...

However, until it is explained to me as otherwise, I still see and will argue that a program of active information over 500 bits is CSI, but not all CSI is active information.

I don't see how. Active information is defined strictly in terms of searches, and it doesn't require a detachable specification.

And yes, Dr. Marks's work does [at least] implicitly result from the Law of Conservation of Information which is derived from Dr. Dembski's earlier work on CSI.

Again, I don't see how. Dembski has proposed two different laws of conservation of information. One deals with the conservation of active information across meta-searches, and the other deals with the conservation of CSI. I don't see how one is derived from the other, even implicitly. I would be interested to hear your reasoning.

As to 1 bit being the smallest amount of measurable information, that has somewhat confused me, since you can measure information as a decimal, but you can't represent it as such. IOW, what is the binary digit (fundamental unit of information in information theory) representation of less than 1 bit of information?

Shannon self-information, which is the basis for Dembski's and Marks's information definitions, is defined as -log2(P(M)). If M has a probability of greater than 1/2, then it has less than a bit of information. The string "1001110" may contain less than a bit of self-information, or it may contain millions of bits, depending on its probability.
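(A numeric sketch of this point, with arbitrary example probabilities:)

```python
import math

def self_information(p):
    """Shannon self-information (surprisal) of a message with probability p."""
    return -math.log2(p)

# A highly probable message carries less than one bit of self-information;
# the same string from a rarer source carries many bits.
print(self_information(0.9))       # ≈ 0.152 bits -- less than one bit
print(self_information(0.5))       # exactly 1 bit
print(self_information(2 ** -20))  # 20 bits
```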

CJYman said...

secondclass:
Active information is defined strictly in terms of searches, and it doesn't require a detachable specification.


Whether a search program is utilized directly by an intelligence or programmed into a computational search algorithm, active information is the functional information that guides that program to its end goal.

Active information needs to be independently defined according to a system of rules or else the program (the system of rules) won't be able to "follow/read" it and incorporate it into its search algorithm. It is indeed functional information and functional information over 500 bits is Complex Specified Information, as I have summarized in "CSI ... simplified."

secondclass:
Dembski has proposed two different laws of conservation of information. One deals with the conservation of active information across meta-searches, and the other deals with the conservation of CSI. I don't see how one is derived from the other, even implicitly. I would be interested to hear your reasoning.


Actually, whenever it is the goal of the program to generate CSI, the connection becomes quite explicit. Can a program of less than 500 bits generate CSI without being guided by active information? What is the actual amount of information produced once the active, endogenous, and exogenous information is calculated? Is the connection becoming clear?

IOW, is intelligent (teleological -- that is "goal oriented") input/information necessary in order to search a space and arrive at CSI?

secondclass:
Shannon self-information, which is the basis for Dembski's and Marks's information definitions, is defined as -log2(P(M)). If M has a probability of greater than 1/2, then it has less than a bit of information. The string "1001110" may contain less than a bit of self-information, or it may contain millions of bits, depending on its probability.


But, since information theory deals with a measurement of information entropy -- a measurement of a decrease of uncertainty -- it measures a string of realized probabilities and then represents them in binary digits, such that the smallest amount of representable information is 1 bit, either a 0 or a 1.

How do you represent less than 1 bit of information in binary digits?

ie: let's take a look at that string in your example. Depending on its probability against all other strings derived from the same "alphabet" it will be represented as a bit string.

Let's just say that there are only two possibilities:

-1001110 (your bit string above)

... and ...

-0111001

Furthermore, let's say that they both have equal probability of "appearing." Then they will be represented as:

0: "1001110"

and

1: "0111001"

But, if there were three options, such as "a" "b" and "c" then each letter would have a measurement of shannon information between 1 and 2 bits of information (of course assuming equal probability). However, we would still need no less than two binary digits to represent this information:

00: "a"
01: "b"
10: "c"
11: -nonsense-

Do you see what I'm saying? Although you can calculate any decimal number as an amount of information, you can only represent information in discrete units -- binary digits, 0 and 1. You can represent information with no less than 1 binary digit, unless I'm missing something.
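The gap between the calculated amount and the representable amount can be shown directly (equiprobable symbols assumed):

```python
import math

symbols = ["a", "b", "c"]
info_per_symbol = math.log2(len(symbols))   # ≈ 1.585 bits of information
digits_needed = math.ceil(info_per_symbol)  # but 2 binary digits per symbol

# Assign fixed-width binary codes; one code point goes unused ("nonsense").
codes = {s: format(i, f"0{digits_needed}b") for i, s in enumerate(symbols)}
print(info_per_symbol, digits_needed, codes)
# 1.584... 2 {'a': '00', 'b': '01', 'c': '10'}  -- '11' is left over
```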

But, then again, this really doesn't have anything to do with CSI. When I referred to the bit as the smallest amount of information, I was referring to the representation as a "quantum," and I was just making an observation.

secondclass said...

Active information needs to be independently defined

Independently of what?

Actually whenever it is the goal of the program to generate CSI,

But there is nothing in the definition of active information that says that the search must lead to CSI.

CSI is a property of an event paired with the background knowledge of a specifying agent. Active information, on the other hand, is a property of a search process paired with a target and search space. Active information (according to the definition given) would still exist even if no specifying agents existed; CSI would not.

How do you represent less than 1 bit of information in binary digits?

When the amount of information is measured according to Shannon surprisal, any string of symbols may represent less than one bit of information. The amount of information in a message is measured not by its length, but by its probability.

Zachriel said...

CJYman: "First, you do realize that in order to measure something’s specificity, the event must first qualify as specific, just like how in order to measure an event using shannon information (using the equation which defines and quantifies shannon information) the event must first reach certain qualifiers."

We're not up to Dembski's definition of "specified", yet. I wanted to have a clear understanding of "specificity". I provided a direct quote and yet for some reason it elicited thousands of words of handwaving. Focus, CJYman. Focus!

CJYman: "Dembski was almost done building his equation, but not quite."

The definition of specificity *is* an equation.

CJYman: "You obviously haven’t read through the whole paper. Read through it, then get back to me."

I have read the paper. As I explained in a previous comment, in a mathematical or logical argument, we build from fundamentals, then examine each step carefully. We do this so that all reasonable readers can reach the same conclusion.

I think we need to back up even further. Let's start with Dembski's Semiotic Agent. Can we reasonably define Semiotic Agent in such a way as to provide a quantitative value suitable for plugging into his equation?

CJYman said...

Hey secondclass, sorry I've taken so long to get back to you. Unfortunately, such is the nature of my time constraints, and all commenters (the few that be) get the same treatment.

CJYman:
“Active information needs to be independently defined”

secondclass:
“Independently of what?”


My apologies. I should have phrased that better. I should have stated, “that which provides active information needs to be independently defined.” IOW, active information does not exist without a specified pattern. The answer to your question is coming soon but first, here’s an example of active information. Let’s say that I have a random assortment of cards strewn about face down on top of a table. Let’s also say that I know the position of the ace of spades, but you do not. If I ask you to locate the ace of spades, any random method of elimination that you choose will on average not perform any better than any other random process of elimination to discover the ace of spades (No Free Lunch theorem). In fact, violation of the NFL theorem in information theory is akin to perpetual motion machines in physics. However, if I introduce active information by saying “warmer” or “colder” as your hand moves around above the cards, it will on average guide you to the location of the ace of spades better than any other method which does not include active information.

Extremely simplified, in a computer program, active information is provided by the part of the program which says “warmer” or “colder” or provides any other type of guidance. All parts of a program are written in a code and this would also include any part of the program which provides any type of guidance or active information, as far as I understand. Thus the actual code which provides the active information, if over 500 bits, is complex specified information.
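The "warmer/colder" card example can be sketched as a toy search. The code below is only an illustration of the general idea (a guide that narrows the remaining positions versus blind guessing), not Marks and Dembski's actual algorithms:

```python
import random

def blind_search(target, n, rng):
    """Guess positions uniformly at random until the target card is found."""
    guesses = 0
    while True:
        guesses += 1
        if rng.randrange(n) == target:
            return guesses

def guided_search(target, n):
    """'Warmer/colder' guidance: each comparison halves the remaining interval."""
    lo, hi, guesses = 0, n - 1, 0
    while True:
        guesses += 1
        mid = (lo + hi) // 2
        if mid == target:
            return guesses
        if mid < target:   # "warmer" toward higher positions
            lo = mid + 1
        else:              # "colder": move toward lower positions
            hi = mid - 1

rng = random.Random(0)
n = 52  # one deck of face-down cards
blind = sum(blind_search(rng.randrange(n), n, rng) for _ in range(1000)) / 1000
guided = sum(guided_search(t, n) for t in range(n)) / n
print(blind, guided)  # blind averages ~n guesses; guided needs at most ~log2(n)
```

On average the blind search takes on the order of 52 guesses, while the guided search never needs more than about six: the difference is the work done by the guidance.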

Now here is the answer to your question ...

If you know anything about specified patterns, you would know that the part of the program which provides active information needs to be defined independently of the pattern of 1s and 0s which create the active information by guiding the program to a solution. If it could not be defined independently, then the information processing system would not be able to understand it and incorporate it as active information. IE: if “warmer” and “colder” in the “cards on the table” example could not be specified or defined independent of the mere pattern of symbolic vocal inflections, then those words could not be used between you and me (an information processing system) as active information.

Important: when discussing this, it is important to notice the difference between the measurement of active information compared to what actually provides the active information. When you are measuring active information, you are measuring the part of the program which provides guidance to a solution.

From “Conservation of Information in Search: Measuring the Cost of Success.”

“Combinatorics shows that even a moderately sized search requires problem-specific information to be successful. Three measures to characterize the information required for successful search are (1) endogenous information, which measures the difficulty of finding a target using random search; (2) exogenous information, which measures the difficulty that remains in finding a target once a search takes advantage of problem-specific information; and (3) active information, which, as the difference between endogenous and exogenous information, measures the contribution of problem-specific information for successfully finding a target.” (italics added)
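The three measures quoted above can be written out for a simple case, using the single-query form in which endogenous information is −log2 of the blind success probability and exogenous information is −log2 of the assisted success probability. The probabilities below are illustrative assumptions only:

```python
import math

def active_information(p_blind, p_assisted):
    """Active info = endogenous - exogenous: the guidance's contribution in bits."""
    endogenous = -math.log2(p_blind)     # difficulty of finding the target by random search
    exogenous = -math.log2(p_assisted)   # difficulty remaining once guidance is used
    return endogenous - exogenous

# Toy case: blind search hits the target with probability 2^-30;
# problem-specific guidance raises success to 2^-5.
print(active_information(2 ** -30, 2 ** -5))  # 25.0 bits of active information
```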


CJYman:
“Actually whenever it is the goal of the program to generate CSI,”

secondclass:
“But there is nothing in the definition of active information that says that the search must lead to CSI.”


That’s not the point. Did you miss where I said “whenever?” The point is that *when* CSI *is* generated it is always generated by active information thus validating the NFL theorem and the Law of Conservation of Information by showing that there is no such thing as generating CSI by an unguided method and for free. There is always an informational cost incurred, and there is a guiding factor. That is the point of the “me*thinks*it*is*like*a*weasel” example in the article “Active Information in Evolutionary Search.”


secondclass:
“Active information (according to the definition given) would still exist even if no specifying agents existed; CSI would not.”


It is incorrect that active information would still exist apart from specified patterns. There would be no way of delivering active information to a program if it were not a specified pattern. The computer program would not be able to deliver guidance to the creation of information if there were no specified pattern which the program could read. Refer to above example of “warmer” and “colder.” Remember that active information measures the contribution of "problem specific information" which is a specified part of the 1s and 0s in a program which guides the program to a specific target. Thus, without specified information, there is no active information.

Active information takes the concept of CSI and refines and narrows it, measuring the algorithmically complex and specified information in a program against endogenous and exogenous information, so it can be used as a measurement of front loaded problem specific guidance within evolutionary algorithms. Thus, CSI is still what it is but, because of its somewhat broad use in detecting intelligent (teleological) action, the concept may be replaced when used in the context of guiding information as measured against exogenous and endogenous information. Of course the only difference is that, although it is still specified and algorithmically complex, the part of the program which provides active information is not necessarily more than 500 bits of shannon information (above the UPB).


CJYman:
“How do you represent less than 1 bit of information in binary digits?”

secondclass:
“When the amount of information is measured according to Shannon surprisal, any string of symbols may represent less than one bit of information..”


It seems that you seriously do not understand the practical application of the binary digit. In order to transmit information digitally, it needs to be translated into binary digits. As such, you can only transmit a whole number of bits. If you had less than 1 bit of information to transmit, you would still need to transmit 1 whole binary digit (either a 0 or a 1), since you can’t represent less than 1 bit of information with anything less than 1 binary digit. IOW, you can’t transmit half of a 0 or a 1. This is why there is such a thing as “nonsense” (leftover bytes) in information theory. Do you understand what I am saying now? The bit is the discrete quantum, so to speak, of information theory.


secondclass:
“ The amount of information in a message is measured not by its length, but by its probability.”


Actually, you are only partially correct. You are correct that the amount of information is measured by its probability; however, the information in a message may also be measured by its length when there is a string of realized probabilities. To understand this, you need to understand that information is a measurement of a decrease of uncertainty. This deals with both length and probability. It is much more than just probabilities. It is a measurement, in binary digits, of realized probabilities.

If we said that with ASCII each unit has an equal probability of appearing, and that each unit contains (is represented by) 7 bits of information, that still doesn’t tell us the amount of information in a string of units unless we know the length of the string – that is, the realized probabilities.

Another example of measuring information: If a guy in jail is only allowed to send his wife one letter, and that letter must read “I am doing fine” (regardless of his condition), then there is 100% certainty that if his wife receives a letter from him, the letter will read “I am doing fine.” Thus, in this context, “I am doing fine” is only one unit (let’s call it digit “0”) out of one choice (again digit “0”), and therefore the letter contains 0 bits of information. There is 100% certainty that if she receives a letter from him, she will receive the aforementioned letter. There is no uncertainty; thus the measure of decrease of uncertainty (information entropy) is 0 bits.

However, if there is more than one possible “unit” and more than one possible combination of probabilities within the letters, then the amount of uncertainty (information entropy) is higher before the letter is sent than after it is received. Once the letter is received and read, the uncertainty decreases based on a combination of the probability (bit size) of each unit and the length (amount) of the string of realized (utilized) units. Thus, the amount of information in a message is a measurement of a string of realized (not just potential) probabilities -- it takes into account both probability and length.
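The jailhouse-letter example can be checked with a quick entropy calculation (my own sketch):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: -sum p*log2(p) over the possible messages."""
    return 0.0 - sum(p * math.log2(p) for p in probs if p > 0)

# Only one possible letter ("I am doing fine"): no uncertainty, 0 bits.
print(entropy([1.0]))       # 0.0
# Four equally likely possible letters: 2 bits of uncertainty resolved.
print(entropy([0.25] * 4))  # 2.0
```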

CJYman said...

Zachriel:
“We're not up to Dembski's definition of "specified", yet. I wanted to have a clear understanding of "specificity". I provided a direct quote and yet for some reason it elicited thousands of words of handwaving. Focus, CJYman. Focus!”


You provided a direct quote, and I provided a direct and somewhat detailed explanation. Do you disagree with the explanation of your quote and cited equation?

OK, so first you pretend that you don’t know the difference between a definition and an equation for measurement, and now you pretend that you don’t know the difference between hand-waving and explanation. I can see why some other people get so frustrated with you sometimes. Focus, Zachriel, focus! You need to read *and* attempt to comprehend. If you are unsure of something, just ask. There is no need, and I have no time, for unfounded rhetorical accusations.

CJYman:
"Dembski was almost done building his equation, but not quite."

Zachriel:
“The definition of specificity *is* an equation.”


What does that have to do with the statement of mine that you just quoted before your response?

You’re still pulling this?!?! How is an equation a definition? Sure, you can have a definition of an equation. The equation that you mentioned tells us that: “the simpler the patterns [formulations] and the smaller the probability of the targets [specified patterns] they constrain, the larger specificity.” (Brackets added) Specificity is merely an information-theoretic measurement of an independent description (as formulated according to a system of rules) against the probability of its specified pattern. That is one way to define the equation. The quantity of specificity is a measurement of a specified pattern. You do know the definition of a specified pattern, correct?

Remember our one-sided discussion of the difference between the measurement (equation to determine quantity) of Shannon information and the definition of Shannon information? There is a difference between measurement and definition. How can you seriously disagree that quantity (measurement) and definition are two different aspects?

Zachriel:
“Let's start with the Dembski's Semiotic Agent. Can we reasonably define Semiotic Agent in such a way as to provide a quantitative value suitable for plugging into his equation?”


The semiotic agent, or information processor, is necessary to determine whether a pattern is specified since, when dealing with specified patterns, we are dealing with the independent formulation of a system of signs. Semiotic agents are, by definition, systems which employ (can formulate/use) a system of signs. The *qualitative* value derived from the semiotic agent is the specified pattern. Now, before I continue, don’t go off on a tangent about a specified pattern being, first of all, *qualitative*. That only means that only certain patterns qualify as being specified, just as only certain patterns qualify as being Shannon information (discrete units chosen from a finite set). This is similar to how you can’t quantify the mass of a massless particle, because it does not first qualify as “massive.” Likewise, the pattern needs to qualify first, before it can be quantified. Then, the *quantity* of information of the specified information is plugged into the equation.

Thus, a system (semiotic agent) which processes a system of signs (information) according to a system of rules creates specified patterns from those signs. These specified patterns are then measured according to information theory and then measured against the available probabilistic resources. If the specified pattern is already algorithmically complex (which I’ve already explained), then it is not defined by law. If it is also specified and contains more information than the probabilistic resources can randomly generate (in accordance with the UPB, NFL Theorem, and Conservation of Information), then pure chance is not the best explanation, either. (Refer to CSI ... Simplified). In the end, from everything we can scientifically determine through testing and observation, experiments with evolutionary algorithms show that a stochastic (chance + necessity; randomness + law) program needs front-loaded, guiding, problem-specific information, measured as active information, to generate CSI.

secondclass said...

IOW, active information does not exist without a specified pattern.

There is nothing in Marks & Dembski's definitions of active information that says that. If you think there is, please provide a quote.

All parts of a program are written in a code and this would also include any part of the program which provides any type of guidance or active information, as far as I understand. Thus the actual code which provides the active information, if over 500 bits, is complex specified information.

Compiled or uncompiled? If compiled, for what processor? If uncompiled, what source language? Comparing a program length to a constant is meaningless. That's a fundamental tenet of computing theory.

If you know anything about specified patterns, you would know that the part of the program which provides active information needs to be defined independently of the pattern of 1s and 0s which create the active information by guiding the program to a solution.

The part of the program which provides active information IS the pattern of 1s and 0s which creates the active information, so the two cannot be independent. And nowhere in Marks & Dembski's definitions do they say that active information necessarily entails a program or a pattern of 1s and 0s, and their definitions do not involve specification.

When you are measuring active information, you are measuring the part of the program which provides guidance to a solution.

Actually, you're measuring the performance of a search relative to that of another search.

(Quoting D&M): active information, which, as the difference between endogenous and exogenous information, measures the contribution of problem-specific information for successfully finding a target.

Exactly. Active information does not measure the amount of problem-specific information. It measures the contribution of the problem-specific information toward finding the target. It's a relative measure of performance, nothing more. The term "information" carries connotations that simply don't hold for D&M's concept of active information.

That’s not the point. Did you miss where I said “whenever?” The point is that *when* CSI *is* generated it is always generated by active information thus validating the NFL theorem and the Law of Conservation of Information by showing that there is no such thing as generating CSI by an unguided method and for free.

According to Dembski, search algorithms cannot generate CSI, not even if they're chock-full of active information. They can only shift CSI around.

It is incorrect that active information would still exist apart from specified patterns. There would be no way of delivering active information to a program if it were not a specified pattern. The computer program would not be able to deliver guidance to the creation of information if there were no specified pattern which the program could read. Refer to above example of “warmer” and “colder.” Remember that active information measures the contribution of "problem specific information" which is a specified part of the 1s and 0s in a program which guides the program to a specific target. Thus, without specified information, there is no active information.

The problem is that you're working from your own understanding of active information instead of D&M's definitions. Their definitions do not involve semiotics at all, but you seem to be reading semiotics into them.

Active information takes the concept of CSI and refines and narrows it, measuring the algorithmically complex and specified information in a program against endogenous and exogenous information, so it can be used as a measurement of front loaded problem specific guidance within evolutionary algorithms.

Who says that active information is necessarily algorithmically complex and specified? Marks and Dembski don't.

The problem with this conversation is that you're trying to explain your claims instead of trying to support them with cites or math. (Your one cite above does nothing to support your claims.)

To understand this, you need to understand that information is a measurement of a decrease of uncertainty. This deals both with length and probability. It is much more than just probabilities. It is a measurement, in binary digits, of realized probabilities.

You're still conflating two different definitions of "bit". There's Shannon's usage as a unit of self-information, or surprisal, which is simply a log transform of probability. Then there's the older and separate usage meaning "binary digit". The number of bits of surprisal is equal to the number of binary digits in a message only if it's encoded in binary such that 0s and 1s are equally probable in every position, which virtually never happens in the real world. "Number of binary digits" makes for a lousy information measure since it's dependent on the encoding, which is arbitrary. Shannon's measure is a property of the message itself, so it's a valid information measure of the message. So when we talk about "bits" as a unit of information in the context of information theory, the number of bits can easily be some fractional number less than one. We're probably agreed on all this and just talking past each other.
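A quick numerical illustration of the fractional-bit point (a sketch):

```python
import math

def surprisal(p: float) -> float:
    """Shannon self-information in bits: -log2(p)."""
    return -math.log2(p)

# A highly probable message carries a fraction of a bit of surprisal,
# even though transmitting it still costs at least one whole binary digit.
print(round(surprisal(0.9), 3))  # 0.152
print(surprisal(0.5))            # 1.0
```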

Another example of measuring information...

In your examples, all that's needed is the probability of the message to measure its surprisal. Whether calculating that probability involves counting symbols in a particular encoding or not, the fact remains that the probability of the message is all that's needed to determine its surprisal, by definition.

Just a heads up: I don't need examples. The problem is not that I don't understand info theory or active info or CSI. The problem is that you're making claims that you're not supporting. I'm saying that your claims are wrong and that they can't be supported. It seems that the appropriate response would be to support your claims. If I say that your understanding of active information is wrong, you might consider providing a cite to support your views instead of continuing to explain them or providing more examples.

CJYman said...

CJYman:
"IOW, active information does not exist without a specified pattern."

secondclass:
"There is nothing in Marks & Dembski's definitions of active information that says that. If you think there is, please provide a quote."


Dr.s Dembski and Marks:
“Active information captures numerically the problem-specific information that assists a search in reaching a target. We therefore find it convenient to use the term “active information” in two equivalent ways:
1) as the specific information about target location and search-space structure incorporated into a search algorithm that guides a search to a solution.
2) as the numerical measure for this information and defined as the difference between endogenous and exogenous information.”

The specific information about target location, etc., which is incorporated into a search algorithm is a functional pattern of 1s and 0s. If these patterns didn’t provide function, then they wouldn’t be able to be processed by the system and guide the search, now would they?

Specified patterns include functional patterns. I’ve explained this at my new post "Specifications ..." You can find it in the top category in the left hand side bar on my blog.

All parts of a program are written in a code, and this would also include any part of the program which provides any type of guidance or active information, as far as I understand. Thus the actual code which provides the active information, if its measure exceeds the probabilistic resources, is complex specified information.

secondclass:
"Compiled or uncompiled? If compiled, for what processor? If uncompiled, what source language? Comparing a program length to a constant is meaningless. That's a fundamental tenet of computing theory."


Exactly, and that is why the handler and processor of symbols is key. IOW, the specificity (as functionality or compressibility) would be measured based on the language and processing system used.

I.e.: the specificity of a function in a certain language would be measured based on how many other functions exist within that language which have the same probability, based on the bits of information per command that cause those functions.

If you know anything about specified patterns, you would know that the part of the program which provides active information needs to be defined independently of the pattern of 1s and 0s as a function which creates the active information by guiding the program to a solution.

secondclass:
"The part of the program which provides active information IS the pattern of 1s and 0s which creates the active information, so the two cannot be independent. And nowhere in Marks & Dembski's definitions do they say that active information necessarily entails a program or a pattern of 1s and 0s, and their definitions do not involve specification."


Independent formulation of a function = pattern which causes function (functional proteins caused by RNA pattern). Refer to above definition and explanation of active information. Also refer to blog post "Specifications .. what are they (Part 1)" in top left side bar of my blog.

CJYman:
"When you are measuring active information, you are measuring the part of the program which provides guidance to a solution."

secondclass:
"Actually, you're measuring the performance of a search relative to that of another search."


I’m not sure if this is what you’re saying, but it is measured relative to the same search on a different optimization problem, or a different search on that same optimization problem. And any better-than-chance performance is *caused* by the part of the program which provides guidance toward the target/problem. That comes straight from the NFLT papers. Do you have any evidence otherwise?

My discussion of the NFLT can also be found in the left side bar on my blog.

Refer to quoted definition of active information above. Are you beginning to see how it all fits together? It’s all quite logically coherent actually, and it really is common sense.

Cjyman:
"(Quoting D&M): active information, which, as the difference between endogenous and exogenous information, measures the contribution of problem-specific information for successfully finding a target."

secondclass:
"Exactly. Active information does not measure the amount of problem-specific information. It measures the contribution of the problem-specific information toward finding the target. It's a relative measure of performance, nothing more. The term "information" carries connotations that simply don't hold for D&M's concept of active information."


Refer to my last comment, especially the part that explains that problem-specific information is measured relative to other search algorithms finding the same target, or to other targets being searched for by the same search algorithm. It's the same as running a bunch of search algorithms and noticing if any of them consistently perform better than chance on any target. You then measure the probability of finding that target -- for example, "hello" has a probability of 1/26^5. Now, compare it to the average probability that any of those search algorithms found the target. That is your active information. Thus, problem-specific information (active information) is a measure of performance relative to pure-chance probabilities of finding any target.

NFLT states that you can't get better than chance performance without incorporating known characteristics of the problem/target into the search algorithm. The incorporation of the problem characteristics into the algorithm is the problem specific information. The better than chance performance that is observed as a result is measured as active information.
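Putting rough numbers on this (my own illustrative sketch of the measure, with made-up probabilities):

```python
import math

def active_information(p: float, q: float) -> float:
    """Active information per Dembski & Marks: the difference between
    endogenous information -log2(p) and exogenous information -log2(q),
    i.e. log2(q/p). Here p is the probability of blind search hitting
    the target and q is the assisted search's probability of success."""
    return math.log2(q / p)

# Hypothetical numbers: target "hello" has p = 1/26**5 under blind search;
# suppose the assisted search succeeds with probability 1/26.
print(round(active_information(1/26**5, 1/26), 2))  # 18.8 bits
```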

CJYman:
"That’s not the point. Did you miss where I said “whenever?” The point is that *when* CSI *is* generated it is always generated by active information thus validating the NFL theorem and the Law of Conservation of Information by showing that there is no such thing as generating CSI by an unguided method and for free."

secondclass:
"According to Dembski, search algorithms cannot generate CSI, not even if they're chock-full of active information. They can only shift CSI around.


True, that is what the Conservation of Information theorem indicates. In the case of an evolutionary algorithm, it is shifted from the problem-specific information programmed into the search algorithm into the discovered or “generated” CSI.

It’s all quite simple from an information-theoretic point of view if you actually understand what is meant by the flow of information as probabilistic measurements, and that CSI is a measurement of the product of a non-random search. Now, what causes a non-random search? Hint, hint: we are discussing that very topic.

You will actually note that the measurement of active information into a program is the same as the measurement of the CSI that is output from the program. I’ve done the math already for myself and it works. Let’s see if you can figure it out. I’ll give you some time, and then I will post this in the future.

CJYman:
"It is incorrect that active information would still exist apart from specified patterns. There would be no way of delivering active information to a program if it were not a specified pattern. The computer program would not be able to deliver guidance to the creation of information if there were no specified pattern which the program could read. Refer to above example of “warmer” and “colder.” Remember that active information measures the contribution of "problem specific information" which is a specified part of the 1s and 0s in a program which guides the program to a specific target. Thus, without specified information, there is no active information."

secondclass:
"The problem is that you're working from your own understanding of active information instead of D&M's definitions. Their definitions do not involve semiotics at all, but you seem to be reading semiotics into them.


I'm working on a logical (as in this logically follows from that) understanding of active information. Whether this is my own understanding or not is irrelevant. What is relevant is if it is logical, verified experimentally, and logically flows from NFL and COI Theorems.

I’m beginning to wonder if you’ve actually read D&M’s papers, since the “warmer/colder” scenario is an example of providing active or problem-specific information that *they* gave. Semiotics (as meaning or function) is essential to functional or semantic specificity and to active information, although neither specificity nor active information is a measurement of semiotics. Functional specificity is merely a measure of the probability of finding functional patterns among all available patterns. Specificity doesn’t actually measure any “amount of function” within a functional pattern.

Again, refer to definition of active information given above.

How else, but through a functional program of 1s and 0s, is a search algorithm guided to a solution? And don’t take “1s and 0s” too literally, since 1s and 0s are not actually flowing through the processor. I am merely pointing out that active information is incorporated into a program that guides a search to a solution.

CJYman:
"Active information takes the concept of CSI and refines and narrows it, measuring the algorithmically complex and specified information in a program against endogenous and exogenous information, so it can be used as a measurement of front loaded problem specific guidance within evolutionary algorithms."

secondclass:
"Who says that active information is necessarily algorithmically complex and specified? Marks and Dembski don't.


Give me one example where it isn’t. And yes, it is definitely specified as per the definition of functional specificity.

secondclass:
"You're still conflating two different definitions of "bit". There's Shannon's usage as a unit of self-information, or surprisal, which is simply a log transform of probability. Then there's the older and separate usage meaning "binary digit". The number of bits of surprisal is equal to the number of binary digits in a message only if it's encoded in binary such that 0s and 1s are equally probable in every position, which virtually never happens in the real world. "Number of binary digits" makes for a lousy information measure since it's dependent on the encoding, which is arbitrary. Shannon's measure is a property of the message itself, so it's a valid information measure of the message. So when we talk about "bits" as a unit of information in the context of information theory, the number of bits can easily be some fractional number less than one. We're probably agreed on all this and just talking past each other."


We may still be talking past each other, since the bits/probability is based on the context used -- probability is not inherent in a symbol or event. So it is incorrect that Shannon’s measure is a property of the message itself. Instead, it is a probabilistic measurement of that message compared to a “palette” of possible states -- the context. “A” does not have a probability without context. When this context is understood, such as 1 out of 26 possible symbols, then we can calculate the information content using the logarithmic transformation. And yes, when it comes to transferring information, this logarithmic transformation does (approximately) equal how many binary digits it takes to encode that probability. And yes, I do understand this depends on the language and program, which is a factor that creates the context.

For example, if we have a programming language which uses only 26 letters and two commands such as “print upper case” and “print lower case,” then we can measure the probability and the Shannon information necessary, actually measured in binary digits, to send this information through a communication channel. Shannon’s theory of information was, after all, a theory of communication (in fact, “A Mathematical Theory of Communication” was the original title).
I.e.: 26 letters = 5 binary digits each (with some “nonsense,” since log2 26 ≈ 4.7), and two print commands = 1 binary digit each. Thus, to send the message “print uppercase A” we would send a message of 6 binary digits, and the actual information-theoretic probabilistic measurement of that message, again utilizing the context of the program I’ve created, is log2 26 + log2 2 ≈ 5.7 bits -- close to, but slightly less than, the 6 binary digits, with the difference being the “nonsense” in the letter encoding.
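To double-check the arithmetic (a sketch of the toy language above; the fixed-length encoding is my own assumption):

```python
import math

letters, commands = 26, 2  # the toy language: 26 letters, 2 print commands

# Binary digits on the wire: each field's surprisal rounded up to whole digits.
digits = math.ceil(math.log2(letters)) + math.ceil(math.log2(commands))

# Shannon measure of one command+letter message, all choices equiprobable.
shannon_bits = math.log2(letters) + math.log2(commands)

print(digits)                  # 6 binary digits
print(round(shannon_bits, 2))  # 5.7 bits
```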

secondclass:
"In your examples, all that's needed is the probability of the message to measure its surprisal. Whether calculating that probability involves counting symbols in a particular encoding or not, the fact remains that the probability of the message is all that's needed to determine its surprisal, by definition.


Agreed, and all that I was pointing out is that in order to calculate the probability of a string, as a specific example, you must know its length and the probability of each unit compared to a set of units. We are merely attacking the same problem from different angles.

secondclass:
"Just a heads up: I don't need examples. The problem is not that I don't understand info theory or active info or CSI. The problem is that you're making claims that you're not supporting. I'm saying that your claims are wrong and that they can't be supported. It seems that the appropriate response would be to support your claims. If I say that your understanding of active information is wrong, you might consider providing a cite to support your views instead of continuing to explain them or providing more examples."


Of course, if *you* say it’s wrong ... then *I* need to support my claims?!?! How about *you* showing me that I’m wrong? I’ve already supported *my* claims with logical conclusions and cites. It’s now *your* turn.
Thank you for giving me the opportunity to expound upon these issues and logically back up the ID claims. I think it’s about time that you bring in your alternate hypothesis and understanding of the issues, since it is clear that your understanding of CSI, active info, and EAs is quite different from how they are actually formulated.

I suggest you go to the sidebar at the top of my blog, go to “Philosophical Foundations (Part 1 and 2),” “Specifications ...,” and “NFL Theorem,” and show how the effects discussed can be explained by a random set of laws (only law and chance, absent intelligence).