
Absence of Evidence III



A Response by Fritz Seiler to Several Posts Addressing
our Comments on "Absence of Evidence is Evidence... "

Joe Alvarez is on vacation and I will answer for the two
of us. Thus the responsibility for the response is mine.


Several commentators criticized the passage in which we talked about the
value V being significant at level alpha.  I apologize for the sloppy
phraseology of that passage; it crept in when I cut our letter down to a
manageable size.  The criticisms addressing this aspect are well
deserved; we were just a bit too cavalier in our effort at shortening.
Bill Huber went on to outline the usual statistical practice with
clarity and admirable brevity.  Before Joe left, he and I agreed that
we have no basic problem with Bill's description of statistical
practice.  We have just some minor quibbles, almost not enough to
bother.  Bill even outlined the conditions under which a hypothesis and
its alternate are complementary and lead to the situation which we
wanted to discuss.

So let me try again, although I am aware that I approach these problems
as a physicist and use the kind of reasoning and sometimes a terminology
that may be more appropriate in statistical mechanics, quantum
statistics, or quantum mechanics, my former field of work.  So my
physicist persona rebels when I read the statement: "States of nature
have no probability".  Well now, what was I doing when I both measured
and calculated the probabilities of different spin states in nuclear
reactions?  Dealing in nonexistent entities?  I do not think so!  I
think that this, like so many other arguments in our discussions, is due
to our different ways of approaching and thinking about the problem.

So please bear with me when I present our side of the argument.  It is
not something that you read in the standard statistical literature.  We
have found nothing that gives guidance for the case of restricted
hypotheses, in particular for complementary hypotheses.  That is why we
avoided the discussion of Type I and Type II errors and did not use the
concept of power.  If you have a "does it, or does it not?" situation,
these aspects are inextricably meshed and do not seem to have much
individual meaning.

The main point of the Scientific Method, after a number of preconditions
have been fulfilled, is the experimental verification of a prediction.
Let us just consider the two cases that occur regularly in applying the
Scientific Method: 'Under these conditions, there is no effect yet' and
'There should be an effect of the predicted size'.  Without any real
loss of generality, let us assume normal distributions.

In the first case we have a set of measurements done under certain
conditions which result in a mean and a standard error of the mean (mu;
sem).  There are only two basic possibilities: either the mean is
compatible with zero or it is not.  If the probability of obtaining a
value x or lower is p, then the probability of finding the value x or
higher is perforce p' = 1 - p.  It is this closure condition, p + p' = 1,
which is absent from the usual statistics of hypothesis testing and
demands special consideration.  I think that this case does fulfill the
two conditions given by Bill Huber.  The reason is that if you know the
distribution of p, you automatically know the distribution of p'; also,
if you know a particular value p*, you also know the corresponding value
of p*'.

Now, the Z-test, which looks at the difference of the mean from zero and
divides it by the standard error, gives a critical Z-value for a
significance level p and the confidence level 1 - p.  The S-shaped
cumulative normal distribution p(X) goes from zero to one.  Its
complementary  distribution p'(X) goes from one to zero, and the curves
intersect at p = p' = 0.5.  The value p = 0.5 is also the symmetry
axis of the two distributions for all values of X.  Thus the closure
condition, with its requirement that the sum of the two curves be one
everywhere, takes away one degree of freedom, so there is only one
degree of freedom left: the value of p.  Therefore, if I take the
probability p of the null hypothesis as experimental evidence of weight
p for the null hypothesis, I also know that 1 - p is the weight of
evidence against it.
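
To make the mechanics concrete, here is a minimal sketch in Python
(using numpy and scipy, my own choice of tools; the data values are
invented purely for illustration) of the one-sample Z-test read in this
complementary manner.  Whether the tail probability should be read as a
"weight of evidence" in this way is, of course, exactly the point under
discussion.

    # Minimal sketch: one-sample Z-test against zero, with the tail
    # probability read as the weight for the null hypothesis and its
    # complement as the weight against it (the reading argued for above;
    # the data are invented).
    import numpy as np
    from scipy.stats import norm

    x = np.array([0.12, -0.05, 0.30, 0.18, 0.07, 0.22, -0.02, 0.15])
    mean = x.mean()
    sem = x.std(ddof=1) / np.sqrt(len(x))   # standard error of the mean

    z = mean / sem                          # distance of the mean from zero, in units of sem
    p_for_null = 1.0 - norm.cdf(abs(z))     # one-sided tail probability
    p_against_null = 1.0 - p_for_null       # closure condition: the two weights sum to one

    print(f"Z = {z:.2f}, weight for null = {p_for_null:.3f}, "
          f"weight against null = {p_against_null:.3f}")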

I admired the care Bill gave to a general discussion of the selection
process for the null hypothesis, and his observations of the usage
among investigators.  I saw similar but far less formal practices at a
former place of employment, when I was for three years the statistician
of record while the real statistician got his Ph.D.  That is where I
learned just how uneasy you can get when the selection of the null
hypothesis is discussed.  It is quite a bit of an exaggeration, but I
often felt like the guy in the audience, singled out by the stage
magician: "Take a card, any card" followed by "O.K. that is your null
hypothesis!  Now take another card, any card!" followed by "O.K. that is
your alternate hypothesis."  Exaggerated or not, the point to be made
here is that no aspect of the Scientific Method usually enters into
these deliberations.  What about the simplicity of the model, mentioned
by several respondents as a selection criterion?  I do not think so!
That even sounds vaguely unscientific.  Actually, I think the Scientific
Method should exert a strong influence because with a confidence level
of more than 0.5, you bias the decision in favor of the null
hypothesis.  If you use the usual 90% or 95% confidence levels, that
bias is quite decisive (Would you not want to go to Las Vegas and gamble
at these odds?).  So if you start at or near the known facts, never mind
how complex the model, even a strong bias can be defended with an
argument of the type "O.K., now that would really make me seriously
consider the possibility of a change."

Personally, I think that too many people are just too ready to accept
any null hypothesis and work with confidence levels of 90% or 95%
without much further thought.  Also, I think the Scientific Method
should be given much more open consideration and result in a published
discussion.  Questions such as: "What do we really know?" and "What can
we really base our null hypothesis on?" should be asked and answered in
a detailed manner.  Then our Z-test has some meaning: failure of the
null hypothesis at, say, the 89.99% confidence level indicates that the
evidence for the null hypothesis carries a weight of a mere 10%, while
there is a 90% counter-indication for a nonzero value of the mean.  Yes,
I think of such hypotheses as state vectors.

In the second case, a comparison is usually made between two stochastic
variables, the model prediction and the measurement result.  It is
important to realize that most models contain stochastic model
parameters and produce a stochastic result.  The question is whether
the two values are compatible within their uncertainties or not.  For
normal distributions, this leads to Student's t-test, whose statistic is
formed by the difference of the two means divided by the standard error
of that difference.  (I am sorry, Bill, here we seem to disagree, because
the value of Student's t does depend on the sigmas, and those depend on
the sample size n.  Also, the shape must change with sample size n, or
rather with the degrees of freedom f = n - n*, because for large n it
must approach a normal distribution).  Again we have the same situation
as with the Z-test: either the model passes the test at a given confidence
level 1 - p, or it does not.  As adherents of the Scientific Method, we
are only interested in the question: "Does the model pass the test or
does it not?"  For p = 0.5, we would have to admit that it could go
either way; at some higher level of 1 - p, however, we will begin to
accept evidence for a nonzero difference.  When we find that there is
not enough evidence to reject the null hypothesis (no difference!) at a
confidence level 1 - p, this situation is tantamount to evidence for the
hypothesis that there is no difference with a weight (or probability) of
p, and the complementary nonzero value hypothesis has a weight of 1 - p.
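
Here is a corresponding sketch for the second case (again Python with
numpy and scipy; the data, and the use of the Welch form of the t-test
with its approximate degrees of freedom, are my own illustrative
choices, not anything prescribed in this discussion):

    # Sketch: compare a stochastic model prediction with a measurement via a
    # t statistic -- the difference of the two means over the standard error
    # of that difference -- and read the result in the same complementary way.
    import numpy as np
    from scipy import stats

    model = np.array([1.02, 0.95, 1.10, 0.99, 1.05, 0.97])   # invented model realizations
    data  = np.array([1.08, 1.15, 0.98, 1.20, 1.11, 1.04])   # invented measurements

    va = data.var(ddof=1) / len(data)
    vb = model.var(ddof=1) / len(model)
    t_value = (data.mean() - model.mean()) / np.sqrt(va + vb)

    # Welch-Satterthwaite approximation to the degrees of freedom f
    df = (va + vb)**2 / (va**2 / (len(data) - 1) + vb**2 / (len(model) - 1))

    p_no_difference = 2.0 * stats.t.sf(abs(t_value), df)  # two-sided tail probability,
                                                          # read as weight for "no difference"
    p_difference = 1.0 - p_no_difference                  # complementary weight

    print(f"t = {t_value:.2f}, df = {df:.1f}, weight for no difference = "
          f"{p_no_difference:.3f}, weight against = {p_difference:.3f}")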

Now, let us return to the initial statement of this discussion, in which
we took the usual epidemiological statement to task for saying: "Absence
of evidence does not constitute evidence of absence."  Of course it
does!  We think that this epidemiological statement is invalid, merely
on the basis of information theory.  'Absence of evidence' conveys
information, albeit negative information, about the state of the
evidence, and that will impact both hypotheses.  To a Bayesian, that is
evident, but this is much more general, because, for complementary
hypotheses, the evidence for the null hypothesis and the evidence for
the nonzero value hypothesis are inextricably linked.
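
For readers who want to see the Bayesian version of this point spelled
out, here is a toy calculation (the prior, power, and significance level
are my own assumptions, chosen only to make the arithmetic visible): a
negative test result from a test with nonzero power necessarily shifts
belief toward absence of the effect.

    # Toy Bayesian update: a negative result from a test with some power is
    # evidence of absence.  All numbers are assumed, for illustration only.
    prior_effect = 0.5                  # prior probability that an effect exists
    power = 0.8                         # assumed P(detection | effect exists)
    alpha = 0.05                        # assumed P(false detection | no effect)

    p_neg_given_effect = 1.0 - power    # probability of a negative result if the effect is real
    p_neg_given_none   = 1.0 - alpha    # probability of a negative result if there is no effect

    p_neg = p_neg_given_effect * prior_effect + p_neg_given_none * (1.0 - prior_effect)
    posterior_effect = p_neg_given_effect * prior_effect / p_neg

    print(f"P(effect) drops from {prior_effect:.2f} to {posterior_effect:.2f} "
          f"after a negative test")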

In this discussion, we must also be careful about what we mean by 'absence
of evidence'.  Without qualification, that term can mean a lot of different
things. We do not mean that there is no evidence at all, although even
then, a case could be made by contending that if an agent is present and
there is no direct evidence of toxicity, then it cannot be very toxic.
However, what we mean by 'absence of evidence' is that there are
experimental data but they do not show evidence of toxicity in a
statistical test.  Then, within the accuracy of the data, as taken into
account by the test, we have no scientific evidence of toxicity at the
confidence and significance levels given and, by the Scientific Method
or just plain common horse sense, there is no justification for
postulating a toxicity below the sensitivity of the test.  At this
point, too many risk assessors tend to throw considerations of risk
management into the pot, and that is just plain wrong.  First, you state
the conclusion in the form:
"Science cannot find any toxicity within the sensitivity of its test, so
anything we do now cannot have a scientific basis."  Then the decision
maker can go on and do whatever he or she wants to do, as long as it is
not called a scientifically based decision.

Finally, let us set all statistical shop talk aside and focus on the
science of the matter: If there is no scientific evidence for a nonzero
value, then there is no scientific justification to postulate a nonzero
value.  If there is evidence with a weight of 1 - p for a nonzero value
and a weight of p for a zero value, and if p is neither too small nor
too large (which would lead to obvious conclusions), then it is up to
the risk assessor to carefully evaluate the implications of that value p
and present his conclusion to the decision maker for insertion into the
decision process.  To further disassociate risk assessment and risk
management, the risk assessor can draw up a table of interpretations for
different values of p.
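
As one hypothetical form such a table could take (the cutoffs and the
wording of the interpretations below are entirely my own invention,
meant only to illustrate the idea, not a recommendation):

    # Hypothetical table of interpretations for the weight p of evidence for
    # the null (zero-effect) hypothesis; cutoffs and wording are illustrative.
    def interpret_weight(p_for_null: float) -> str:
        if p_for_null < 0.05:
            return "strong indication of a nonzero effect"
        elif p_for_null < 0.25:
            return "some indication of a nonzero effect"
        elif p_for_null < 0.75:
            return "inconclusive; could go either way"
        elif p_for_null < 0.95:
            return "some indication of no effect"
        else:
            return "strong indication of no effect"

    for p in (0.02, 0.10, 0.50, 0.90, 0.99):
        print(f"p = {p:.2f}: {interpret_weight(p)}")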


*************************

Fritz A. Seiler, Ph.D.
Principal
Sigma Five Associates
P.O. Box 14006
Albuquerque, NM 87191-4006
Tel.    505-323-7848
Fax.    505-293-3911
e-mail: faseiler@nmia.com

**************************

