Journal of Scientific Exploration, Vol. 13, No. 4, pp. 579–613, 1999    0892-3310/99
© 1999 Society for Scientific Exploration

Basic Elements and Problems of Probability Theory

HANS PRIMAS
Laboratory of Physical Chemistry, ETH-Zentrum
CH-8092 Zürich, Switzerland
primas@phys.chem.ethz.ch
Abstract — After a brief review of ontic and epistemic descriptions, and of subjective, logical and statistical interpretations of probability, we summarize the traditional axiomatization of the calculus of probability in terms of Boolean algebras and its set-theoretical realization in terms of Kolmogorov probability spaces. Since the axioms of mathematical probability theory say nothing about the conceptual meaning of "randomness," one considers probability as a property of the generating conditions of a process, so that randomness can be related to predictability (or retrodictability). In the measure-theoretical codification of stochastic processes, genuine chance processes can be defined rigorously as so-called regular processes, which do not allow a long-term prediction. We stress that stochastic processes are equivalence classes of individual point functions, so that they do not refer to individual processes but only to an ensemble of statistically equivalent individual processes.

Less popular but conceptually more important than statistical descriptions are individual descriptions, which refer to individual chaotic processes. First, we review the individual description based on the generalized harmonic analysis of Norbert Wiener. It allows the definition of individual purely chaotic processes which can be interpreted as trajectories of regular statistical stochastic processes. Another individual description refers to algorithmic procedures which connect the intrinsic randomness of a finite sequence with the complexity of the shortest program necessary to produce the sequence. Finally, we ask why there can be laws of chance. We argue that random events fulfill the laws of chance if and only if they can be reduced to (possibly hidden) deterministic events. This mathematical result may elucidate the fact that not all non-predictable events can be grasped by the methods of mathematical probability theory.

Keywords: probability — stochasticity — chaos — randomness — chance — determinism
Ontic and Epistemic Descriptions
Overview<br />
One of the most important results of contemporary classical dynamics is the proof that the deterministic differential equations of some smooth classical Hamiltonian systems have solutions exhibiting irregular behavior. The classical view of physical determinism was eloquently formulated by Pierre Simon Laplace. While Newton believed that the stability of the solar system could only be achieved with the help of God, Laplace "had no need of that hypothesis" [1] since he could explain the solar system by deterministic Newtonian mechanics alone. Laplace discussed his doctrine of determinism in the introduction to his Philosophical Essay on Probabilities, in which he imagined a superhuman intelligence capable of grasping, at any fixed time, the initial conditions of all bodies and atoms of the universe, and all the forces acting upon them. For such a superhuman intelligence "nothing would be uncertain and the future, as the past, would be present to its eyes." [2] Laplace's reference to the future and the past implies that he refers to a fundamental theory with an unbroken time-reversal symmetry. His reference to a "superhuman intelligence" suggests that he is not referring to our possible knowledge of the world, but to things "as they really are." The manifest impossibility of ascertaining experimentally the exact initial conditions necessary for a description of things "as they really are" is what led Laplace to introduce a statistical description of the initial conditions in terms of probability theory. Later, Josiah Willard Gibbs introduced the idea of an ensemble of a very large number of imaginary copies of mutually uncorrelated individual systems, all dynamically precisely defined but not necessarily starting from precisely the same individual states. [3] The fact that a statistical description in the sense of Gibbs presupposes the existence of a well-defined individual description demonstrates that a coherent statistical interpretation in terms of an ensemble of individual systems requires an individual interpretation as a backing.
The empirical inaccessibility of the precise initial states of most physical systems requires a distinction between epistemic and ontic interpretations. [4] Epistemic interpretations refer to our knowledge of the properties or modes of reaction of observed systems. Ontic interpretations, on the other hand, refer to intrinsic properties of hypothetical individual entities, regardless of whether we know them or not, and independently of observational arrangements. Although ontic interpretations do not refer to our knowledge, there is a meaningful sense in which it is natural to speak of theoretical entities "as they really are," since in good theories they supply the indispensable explanatory power.
States which refer to an epistemic interpretation are called epistemic states; they refer to our knowledge. If this knowledge of the properties or modes of reaction of systems is expressed by probabilities in the sense of relative frequencies in a statistical ensemble of independently repeated experiments, we speak of a statistical interpretation and of statistical states. States which refer to an ontic interpretation are called ontic states. Ontic states are assumed to give a description of a system "as it really is," that is, independently of any influences due to observations or measurements. They refer to individual systems and are assumed to give an exhaustive description of a system. Since an ontic description does not encompass any concept of observation, ontic states do not refer to predictions of what happens in experiments. At this stage it is left open to what extent ontic states are knowable. An adopted ontology of the intrinsic description induces an operationally meaningful epistemic interpretation for every epistemic description: an epistemic state refers to our knowledge of an ontic state.
Cryptodeterministic Systems
In modern mathematical physics, Laplacian determinism is rephrased as Hadamard's principle of scientific determinism, according to which every initial ontic state of a physical system determines all future ontic states. [5] An ontically deterministic dynamical system which even in principle does not allow a precise forecast of its observable behavior in the remote future will be called cryptodeterministic. [6] Antoine Augustin Cournot (1801–1877) and John Venn (1834–1923) already recognized clearly that the dynamics of complex classical dynamical systems may depend in an extremely sensitive way on the initial and boundary conditions. Even if we can determine these conditions with arbitrary but finite accuracy, the individual outcome cannot be predicted; the resulting chaotic dynamics allows only an epistemic description in terms of statistical frequencies. [7] The instability of such deterministic processes represents an objective feature of the corresponding probabilistic description. A typical experiment which demonstrates the objective probabilistic character of a cryptodeterministic mechanical system is Galton's desk. [8] The modern theory of deterministic chaos has shown how unpredictability can arise from the iteration of perfectly well-defined functions because of a sensitive dependence on initial conditions. [9] More precisely, the catchword "deterministic chaos" refers to ontically deterministic systems with a sensitive dependence on the ontic initial state, such that no measurement on the system allows a long-term prediction of its ontic state.
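This sensitive dependence can be made concrete with a minimal numerical sketch. The logistic map used here is a standard textbook example of deterministic chaos, chosen for illustration; it is not a system discussed in the text:

```python
# Two trajectories of the deterministic logistic map x -> 4x(1 - x),
# started a tiny distance apart, diverge until they are effectively
# uncorrelated: ontic determinism without epistemic predictability.
def logistic_trajectory(x0, steps):
    xs = [x0]
    for _ in range(steps):
        xs.append(4.0 * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.3, 60)
b = logistic_trajectory(0.3 + 1e-12, 60)

# The separation grows roughly like 2^n, so the initial error of 1e-12
# is amplified to order 1 within a few dozen iterations.
print(abs(a[5] - b[5]))    # still tiny after 5 steps
print(abs(a[60] - b[60]))  # typically of order 1 after 60 steps
```

No finite-precision measurement of the initial state, however accurate, suffices for a long-term forecast of such a system.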
Predictions refer to inferences of the observable future behavior of a system from empirically estimated initial states. While for some simple systems the ontic laws of motion may allow us to forecast their observable behavior in the near future with great accuracy, ontic determinism implies neither epistemic predictability nor epistemic retrodictability. Laplace knew quite well that a perfect measurement of initial conditions is impossible, and he never asserted that deterministic systems are empirically predictable. Nevertheless, many positivists tried to define determinism by predictability. For example, according to Herbert Feigl:

The clarified (purified) concept of causation is defined in terms of predictability according to a law (or, more adequately, according to a set of laws). [10]

Such attempts are based on a notorious category mistake. Determinism does not deal with predictions. Determinism refers to an ontic description. Predictability, on the other hand, is an epistemic concept. Yet, epistemic statements are often confused with ontic assertions. For example, Max Born claimed that classical point mechanics is not deterministic, since there are unstable mechanical systems which are epistemically not predictable. [11] Similarly, it has been claimed that human behavior is not deterministic, since it is not predictable. [12] A related mistaken claim is that "...an underlying deterministic mechanism would refute a probabilistic theory by contradicting the randomness which ...is demanded by such a theory." [13] As emphasized by John Earman:

The history of philosophy is littered with examples where ontology and epistemology have been stirred together into a confused and confusing brew. ...Producing an 'epistemological sense' of determinism is an abuse of language since we already have a perfectly adequate and more accurate term – prediction – and it also invites potentially misleading argumentation – e.g., in such-and-such a case prediction is not possible and, therefore, determinism fails. [14]
Kinds of Probability
Often, probability theory is considered the natural tool for an epistemic description of cryptodeterministic systems. However, this view is not as evident as is often thought. The virtue and the vice of modern probability theory lie in its split into a probability calculus and its conceptual foundation. Nowadays, mathematical probability theory is just a branch of pure mathematics, based on some axioms devoid of any interpretation. In this framework, the concepts "probability," "independence," etc. are conceptually unexplained notions; they have a purely mathematical meaning. While there is widespread agreement concerning the essential features of the calculus of probability, there are widely diverging opinions about what the referent of mathematical probability theory is. [15] While some authors claim that probability refers exclusively to ensembles, there are important problems which require a discussion of single random events or of individual chaotic functions. Furthermore, it is in no way evident that the calculus of axiomatic probability theory is appropriate for empirical science. In fact, "probability is one of the outstanding examples of the 'epistemological paradox' that we can successfully use our basic concepts without actually understanding them." [16]
Surprisingly often it is assumed that in a scientific context everybody means intuitively the same thing when speaking of "probability," and that the task of an interpretation consists only in capturing exactly this single intuitive idea. Even prominent thinkers could not free themselves from predilections which can be understood only in the light of the historical development. For example, Friedrich Waismann [17] categorically maintains that there is no other motive for the introduction of probabilities than the incompleteness of our knowledge. Just as dogmatically, Richard von Mises [18] holds that, without exception, probabilities are empirical and that there is no possibility of revealing the values of probabilities with the aid of another science, e.g. mechanics. On the other hand, Harold Jeffreys maintains that "no 'objective' definition of probability in terms of actual or possible observations, or possible properties of the world, is admissible." [19] Leonard J. Savage claims that "personal, or subjective, probability is the only kind that makes reasonably rigorous sense." [20] However, despite many such statements to the contrary, we may state with some confidence that there is not just one single "correct" interpretation. There are various valid possibilities for interpreting mathematical probability theory. Moreover, the various interpretations do not fall neatly into disjoint categories. As Bertrand Russell underlines:

in such circumstances, the simplest course is to enumerate the axioms from which the theory can be deduced, and to decide that any concept which satisfies these axioms has an equal right, from the mathematician's point of view, to be called 'probability.' ... It must be understood that there is here no question of truth or falsehood. Any concept which satisfies the axioms may be taken to be mathematical probability. In fact, it might be desirable to adopt one interpretation in one context, and another in another. [21]
Subjective Probability
A probability interpretation is called objective if the probabilities are assumed to be independent of any human considerations. Subjective interpretations consider probability as a rational measure of the personal belief that the event in question occurs. A more operationalistic view defines subjective probability as the betting rate on an event which is fair according to the opinion of a given subject. It is required that the assessments a rational person makes are logically coherent, so that no logical contradictions exist among them. The postulate of coherence should make it impossible to set up a series of bets against a person obeying these requirements in such a manner that the person is sure to lose, regardless of the outcome of the events being wagered upon. Subjective probabilities depend on the degree of personal knowledge and ignorance concerning the events, objects or conditions under discussion. If the personal knowledge changes, the subjective probabilities change too. Often it is claimed to be evident that subjective probabilities have no place in a physical theory. However, subjective probability cannot be disposed of quite that simply. It is astonishing how many scientists uncompromisingly defend an objective interpretation without knowing any of the important contributions on subjective probability published in recent decades. Nowadays, there is a very considerable rational basis behind the concept of subjective probability. [22]
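The coherence requirement can be illustrated with a small "Dutch book" arithmetic sketch. The rates and stakes below are illustrative numbers, not taken from the text:

```python
# Coherence as "no sure loss": if a bettor posts betting rates on an event A
# and on its complement that sum to more than 1, an opponent can sell the
# bettor both bets at those rates and guarantee the bettor a loss.
def bettor_outcomes(rate_A, rate_not_A, stake=1.0):
    """Bettor buys both bets at his own rates; return his net gain
    in the world where A occurs and in the world where A fails."""
    cost = (rate_A + rate_not_A) * stake  # price paid for both tickets
    # Exactly one bet pays off `stake` in each possible world.
    return stake - cost, stake - cost

# Incoherent rates: p(A) + p(not A) = 1.2 > 1, a sure loss either way.
print(bettor_outcomes(0.6, 0.6))
# Coherent rates: p(A) + p(not A) = 1, no sure loss can be extracted.
print(bettor_outcomes(0.6, 0.4))
```

With incoherent rates the bettor's net outcome is negative in both possible worlds, which is exactly what the postulate of coherence is meant to exclude.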
It is debatable how the pioneers would have interpreted probability, but their practice suggests that they dealt with some kind of "justified degree of belief." For example, in one of the first attempts to formulate mathematical "laws of chance," his Ars Conjectandi of 1713, Jakob Bernoulli characterized probability as a strength of expectation. [23] For Pierre Simon Laplace, probabilities represented a state of knowledge; he introduced a priori or geometric probabilities as the ratio of favorable to "equally possible" cases [24] — a definition of historical interest which, however, is both conceptually and mathematically inadequate.
The early subjective interpretations have long been out of date, but practicing statisticians have always recognized that subjective judgments are inevitable. In 1937, Bruno de Finetti made a fresh start in the theory of subjective probability by introducing the essential new notion of exchangeability. [25] De Finetti's subjective probability is a betting rate and refers to single events. A set of n distinct events E1, E2, ..., En is said to be exchangeable if any event depending on these events has the same subjective probability (de Finetti's betting rate) no matter how the Ej are chosen or labeled. Exchangeability is sufficient for the validity of the law of large numbers. The modern concept of subjective probability is not necessarily incompatible with that of objective probability. De Finetti's representation theorem gives a convincing explanation of how there can be wide inter-subjective agreement about the values of subjective probabilities. According to Savage, a rational man behaves as if he used subjective probabilities. [26]
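Exchangeability can be sketched numerically with a Beta-Bernoulli mixture, the standard textbook instance of a de Finetti-style representation (the model and its parameters are illustrative choices, not taken from the text):

```python
# Exchangeability via a mixture: draw a latent chance theta ~ Beta(a, b),
# then flip n coins independently with bias theta. The resulting joint
# distribution is exchangeable: the probability of a 0/1 pattern depends
# only on its number of ones, not on their order.
from math import lgamma, exp

def log_beta(a, b):
    # log of the Beta function B(a, b) = Gamma(a)Gamma(b)/Gamma(a+b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def pattern_probability(pattern, a=2.0, b=3.0):
    """Exact marginal probability of a 0/1 pattern under the mixture:
    integral of theta^k (1-theta)^(n-k) against the Beta(a, b) prior."""
    n, k = len(pattern), sum(pattern)
    return exp(log_beta(a + k, b + n - k) - log_beta(a, b))

# Two patterns with the same number of ones but in different order
# receive exactly the same probability.
print(pattern_probability([1, 1, 0, 0, 1]))
print(pattern_probability([0, 1, 0, 1, 1]))
```

The order-invariance exhibited here is the defining property of exchangeability; de Finetti's theorem states the converse, that every exchangeable sequence arises from such a mixture.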
Inductive Probability
Inductive probability belongs to the field of scientific inductive inference. Induction is the problem of how to make inferences from observed to unobserved (especially future) cases. It is an empirical fact that we can learn from experience, but the problem is that nothing concerning the future can be logically inferred from past experience. It is the merit of the modern approaches to have recognized that induction has to be some sort of probabilistic inference, and that the induction problem belongs to a generalized logic. Logical probability is related to, but not identical with, subjective probability. Subjective probability is taken to represent the extent to which a person believes a statement is true. The logical interpretation of probability theory is a generalization of classical implication, and it is based not on empirical facts but on their logical analysis. The inductive probability is the degree of confirmation of a hypothesis with reference to the available evidence in favor of this hypothesis.

The logic of probable inference and the logical probability concept go back to the work of John Maynard Keynes, who in 1921 defined probability as a "logical degree of belief." [27] This approach has been extended by Bernard Osgood Koopman [28] and especially by Rudolf Carnap to a comprehensive system of inductive logic. [29] Inductive probabilities occur in science mainly in connection with judgments of empirical results; they are always related to a single case and are never to be interpreted as frequencies. The inductive probability is also called "non-demonstrative inference," "intuitive probability" (Koopman), "logical probability" or "probability₁" (Carnap). A hard nut to crack in probabilistic logic is the proper choice of a probability measure — it cannot be estimated empirically. Given a certain measure, inductive logic works with a fixed set of rules, so that all inferences can be effected automatically by a general computer. In this sense inductive probabilities are objective quantities. [30]
Statistical Probability
Historically, statistical probabilities have been interpreted as limits of frequencies, that is, as empirical properties of the system (or process) considered. But statistical probabilities cannot be assigned to a single event. This is an old problem of the frequency interpretation, of which John Venn was already aware. In 1866, Venn tried to define a probability explicitly in terms of the relative frequencies of occurrence of events "in the long run." He added that "the run must be supposed to be very long, in fact never to stop." [31] Against this simple-minded frequency interpretation there is a grave objection: any empirical evidence concerning relative frequencies is necessarily restricted to a finite set of events. Yet, without additional assumptions, nothing about the value of the limiting frequency can be inferred from a finite segment, no matter how long it may be. Therefore, the statistical interpretation of the calculus of probability has to be supplemented by a decision technique that allows us to decide which probability statements we should accept. Satisfactory acceptance rules are notoriously difficult to formulate.
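The objection can be made concrete with a toy sequence of my own construction (not one from the text): its first thousand terms all come up heads, yet its limiting relative frequency is one half.

```python
# A finite initial segment puts no constraint on the limiting relative
# frequency. This sequence starts with a long run of heads (1), then
# alternates strictly, so its limiting frequency of heads is 1/2.
def term(n, run=1000):
    """1 (heads) for the first `run` terms, then strictly alternating."""
    return 1 if n < run else (n - run) % 2

def relative_frequency(n_terms):
    return sum(term(i) for i in range(n_terms)) / n_terms

print(relative_frequency(1000))     # 1.0 over the initial segment
print(relative_frequency(1000000))  # close to 0.5 in the long run
```

An observer who sees only the first thousand outcomes has no purely empirical grounds for any value of the limit; this is why the frequency interpretation needs a supplementary acceptance rule.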
The simplest technique is the old maxim of Antoine Augustin Cournot: if the probability of an event is sufficiently small, one should act as if this event will not occur at a solitary realization. [32] However, the theory gives no criterion for deciding what is "sufficiently small." A more elegant (but essentially equivalent) way out is the proposal of Carl Friedrich von Weizsäcker to consider probability as a prediction of a relative frequency, so that "the probability is only the expectation value of the relative frequency." [33] That is, in addition we need a judgment about a statement. This idea is in accordance with Carnap's view that two meanings of probability must be recognized: the inductive probability (his "probability₁") and the statistical probability (his "probability₂"). [34] The logical probability is supposed to express a logical relation between a given evidence and a hypothesis. Such statements "speak about statements of science; therefore, they do not belong to science proper but to the logic or methodology of science, formulated in the meta-language." On the other hand, "the statements on statistical probability, both singular and general statements, e.g., probability laws in physics or in economics, are synthetic and serve for the description of general features of facts. Therefore, these statements occur within science, for example, in the language of physics (taken as object language)." [35] That is, according to this view, inductive logic with its logical probabilities is a necessary completion of statistical probabilities: without inductive logic we cannot infer statistical probabilities from observed frequencies. The supplementation of the frequency interpretation by a subjective factor cannot be avoided by the introduction of a new topology. For example, if one introduces the topology associated with the field of p-adic numbers [36], one has to select subjectively a finite prime number p. As emphasized by Wolfgang Pauli, no frequency interpretation can avoid a subjective factor:
[It is] necessary somewhere or other to include a rule for the attitude in practice of the human observer, or in particular the scientist, which takes account of the subjective factor as well, namely that the realisation, even on a single occasion, of a very unlikely event is regarded from a certain point on as impossible in practice. ... At this point one finally reaches the limits which are set in principle to the possibility of carrying out the original programme of the rational objectivation of the unique subjective expectation. [37] (English translation from Enz and von Meyenn (1994), p. 45.)
Later, Richard von Mises [38] tried to overcome this difficulty by introducing the notion of "irregular collectives," consisting of one infinite sequence in which the limit of the relative frequency of each possible outcome exists and is insensitive to place selection. In this approach the value of this limit is called the probability of this outcome. The essential underlying idea was the "impossibility of a successful gambling system." While at first sight Mises' arguments seemed reasonable, he could not achieve a convincing success. [39] However, Mises' approach provided the crucial idea for the fruitful computational-complexity approach to random sequences, discussed in more detail below.
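Mises' insensitivity-to-place-selection requirement can be sketched numerically. The sequences and the particular selection rule below are illustrative choices of ours, not examples from the text:

```python
# Mises' "impossibility of a gambling system", sketched: in a (pseudo-)random
# 0/1 sequence the relative frequency is insensitive to a simple place
# selection, whereas a regular sequence betrays itself under the same rule.
import random

def freq(bits):
    return sum(bits) / len(bits)

def select(bits):
    """A simple place selection: keep only the even-indexed positions."""
    return bits[::2]

random.seed(0)  # fixed seed for reproducibility
random_bits = [random.randint(0, 1) for _ in range(100000)]
periodic_bits = [i % 2 for i in range(100000)]  # 0, 1, 0, 1, ...

print(freq(random_bits), freq(select(random_bits)))      # both near 0.5
print(freq(periodic_bits), freq(select(periodic_bits)))  # 0.5 versus 0.0
```

The periodic sequence has the "right" overall frequency of 0.5, but a gambler betting only on even positions would win every time; it therefore fails to be a collective in Mises' sense.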
Mathematical Probability

Mathematical Probability as a Measure on a Boolean Algebra
In the mathematical codification of probability theory, a chance event is defined only implicitly by axiomatically characterized relations between events. These relations have a logical character, so that one can assign to every event a proposition stating its occurrence. All codifications of classical mathematical probability theory are based on Boolean classifications or Boolean logic. That is, the algebraic structure of events is assumed to be a Boolean algebra, called the algebra of events. George Boole introduced these algebras in 1854 in order

to investigate the fundamental laws of those operations of the mind by which reasoning is performed; to give expression to them in the symbolic language of a Calculus, and upon this foundation to establish the science of Logic and to construct its method; to make that method itself the basis of a general method for the application of the mathematical doctrine of Probabilities... [40]
Mathematical probability is anything that satisfies the axioms of mathematical probability theory. As we will explain in the following in some more detail, mathematical probability theory is the study of a pair (B, p), where the algebra of events is a σ-complete Boolean algebra B, and the map p : B → [0,1] is a σ-additive probability measure. [41] [42]
An algebra of events is a Boolean algebra (B, ∧, ∨, ⊥). If an element A ∈ B is an event, then A⊥ is the event that A does not take place. The element A ∨ B is the event which occurs when at least one of the events A and B occurs, while A ∧ B is the event which occurs when both events A and B occur. The unit element 1 represents the sure event, while the zero element 0 represents the impossible event. If A and B are any two elements of the Boolean algebra B which satisfy the relation A ∨ B = B (or the equivalent relation A ∧ B = A), we say that "A is smaller than B" or that "A implies B" and write A ≤ B.

Probability is defined as a norm p : B → [0,1] on a Boolean algebra B of events. That is, to every event A ∈ B there is associated a probability p(A) for the occurrence of the event A. The following properties are required for p:

p is strictly positive, i.e. p(A) ≥ 0 for every A ∈ B, and p(A) = 0 if and only if A = 0, where 0 is the zero of B;

p is normed, i.e. p(1) = 1, where 1 is the unit of B;

p is additive, i.e. p(A ∨ B) = p(A) + p(B) if A and B are disjoint, that is, if A ∧ B = 0.

It follows that 0 ≤ p(A) ≤ 1 for every A ∈ B, and A ≤ B ⇒ p(A) ≤ p(B). In contrast to a Kolmogorov probability measure, the measure p is strictly positive. That is, p(B) = 0 implies that B is the unique smallest element of the Boolean algebra B of events.
In probability theory it is necessary to consider also countably infinitely many events, so that one needs in addition some continuity requirements. By a Boolean σ-algebra one understands a Boolean algebra in which the addition and multiplication operations are performable on each countable sequence of events. That is, in a Boolean σ-algebra B there is for every infinite sequence A₁, A₂, A₃, ... of elements of B a smallest element A₁ ∨ A₂ ∨ A₃ ∨ ··· ∈ B. The continuity required for the probability p is then the so-called σ-additivity: a measure p on a σ-algebra is σ-additive if

p(A₁ ∨ A₂ ∨ A₃ ∨ ···) = p(A₁) + p(A₂) + p(A₃) + ···

whenever {A_k} is a sequence of pairwise disjoint events, A_j ∧ A_k = 0 for all j ≠ k. Since not every Boolean algebra is a σ-algebra, the property of countable additivity is an essential restriction.
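As a finite toy illustration (my own, not from the original text), the three requirements on p, and the monotonicity that follows from them, can be checked mechanically for the power-set algebra of a four-point set with hypothetical dyadic point weights:

```python
from itertools import combinations

# Toy model: the Boolean algebra B of all subsets of a four-point set,
# with a strictly positive probability measure p given by point weights.
omega = [1, 2, 3, 4]
weights = {1: 0.125, 2: 0.125, 3: 0.25, 4: 0.5}   # hypothetical, sum = 1

# All subsets of omega -- the elements of the Boolean algebra B.
B = [frozenset(c) for r in range(len(omega) + 1)
     for c in combinations(omega, r)]
zero, one = frozenset(), frozenset(omega)

def p(a):
    """Probability of event a: the sum of the weights of its points."""
    return sum(weights[x] for x in a)

assert p(one) == 1.0                                  # p is normed
assert p(zero) == 0.0
assert all(p(a) > 0 for a in B if a != zero)          # strictly positive

for a in B:
    for b in B:
        if not (a & b):                    # disjoint: a AND b = 0
            assert p(a | b) == p(a) + p(b)             # additivity
        if (a | b) == b:                   # a OR b = b, i.e. a <= b
            assert p(a) <= p(b)                        # monotonicity
```

The dyadic weights keep all sums exact in binary floating point; in an uncountable sample space such a strictly positive measure no longer exists on the whole σ-algebra, which is what motivates the quotient construction discussed below.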
Set-Theoretical Probability Theory
It is almost universally accepted that mathematical probability theory consists of the study of Boolean σ-algebras. For reasons of mathematical convenience, one usually represents the Boolean algebra of events by a Boolean algebra of subsets of some set. Using this representation one can fall back on a well-established integration theory, on the theory of product measures, and on the Radon–Nikodým theorem for the definition of conditional probabilities. According to a fundamental representation theorem by Marshall Harvey Stone, every Boolean algebra, with no further condition, is isomorphic to an algebra of subsets of some point set Ω, that is, to a subalgebra of (P(Ω), ∩, ∪, ′), where P(Ω) is the power set of all subsets of Ω. [43] Here B corresponds to a system of subsets of the set Ω, the conjunction ∧ corresponds to the set-theoretical intersection ∩, the disjunction ∨ corresponds to the set-theoretical union ∪, and the negation ⊥ corresponds to the set-theoretical complementation ′. The multiplicative neutral element 1 corresponds to the set Ω, while the additive neutral element 0 corresponds to the empty set ∅. However, a σ-complete Boolean algebra is in general not σ-isomorphic to a σ-complete Boolean algebra of point sets. Yet every σ-complete Boolean algebra is σ-isomorphic to a σ-complete Boolean algebra of point sets modulo a σ-ideal in that algebra. [44]
Conceptually, this result is the starting point for the axiomatic foundation given by Andrei Nikolaevich Kolmogorov in 1933, which reduces mathematical probability theory to classical measure theory. [45] It is based on a so-called probability space (Ω, Σ, μ) consisting of a non-empty set Ω of points (called the sample space), a class Σ of subsets of Ω which is a σ-algebra (i.e., is closed with respect to the set-theoretical operations executed a countable number of times), and a probability measure μ on Σ. Sets that belong to Σ are called Σ-measurable (or just measurable if Σ is understood). The pair (Ω, Σ) is called a measurable space. A probability measure μ on (Ω, Σ) is a function μ : Σ → [0,1] satisfying μ(∅) = 0, μ(Ω) = 1, and the condition of countable additivity, that is,

μ(B₁ ∪ B₂ ∪ B₃ ∪ ···) = μ(B₁) + μ(B₂) + μ(B₃) + ···

whenever {B_n} is a sequence of members of Σ which are pairwise disjoint subsets of Ω. The points of Ω are called elementary events. The subsets of Ω belonging to Σ are referred to as events. The non-negative number μ(B) is called the probability of the event B ∈ Σ.
In most applications the sample space Ω contains an uncountable number of points. In this case there exist non-empty Borel sets in Σ of measure zero, so that there is no strictly positive σ-additive measure on Σ. But it is possible to eliminate the sets of measure zero by passing to the σ-complete Boolean algebra B = Σ/Δ, where Δ is the σ-ideal of Borel sets of μ-measure zero. With this, every Kolmogorov probability space (Ω, Σ, μ) generates a probability algebra with the σ-complete Boolean algebra B = Σ/Δ, and the restriction of μ to B is a strictly positive measure p. Conversely, every probability algebra (B, p) can be realized by some Kolmogorov probability space (Ω, Σ, μ) with B ≅ Σ/Δ, where Δ is the σ-ideal of Borel sets of μ-measure zero.
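A minimal finite sketch (my own illustration, with a hypothetical measure that gives one point mass zero) shows how the quotient Σ/Δ removes null events:

```python
from itertools import combinations

# Finite sketch of the quotient B = Sigma/Delta: the point 3 carries
# measure zero, so non-empty null events exist; the quotient identifies
# any two events that differ only by such a null set.
omega = [1, 2, 3]
mu_point = {1: 0.5, 2: 0.5, 3: 0.0}

sigma = [frozenset(c) for r in range(len(omega) + 1)
         for c in combinations(omega, r)]

def mu(a):
    return sum(mu_point[x] for x in a)

def equivalent(a, b):
    """A ~ B  iff  mu(A symmetric-difference B) = 0."""
    return mu(a ^ b) == 0.0

# Sigma has 8 events, but modulo the sigma-ideal of null sets only 4
# classes survive, and on the non-zero classes the induced measure is
# strictly positive.
reps = []
for a in sigma:
    if not any(equivalent(a, r) for r in reps):
        reps.append(a)

assert len(sigma) == 8 and len(reps) == 4
assert all(mu(r) > 0 for r in reps if not equivalent(r, frozenset()))
```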
One usually formulates the set-theoretical version of probability theory directly in terms of the conceptually less transparent triple (Ω, Σ, μ), and not in terms of the probabilistically relevant Boolean algebra B = Σ/Δ. Since there exist non-empty Borel sets in Σ (i.e., events different from the impossible event) of measure zero, one has to use the "almost everywhere" terminology. A statement is said to be true "almost everywhere" or "for almost all ω" if it is true for all ω ∈ Ω except, maybe, in a set N ∈ Σ of measure zero, μ(N) = 0. If the sample space Ω contains an uncountable number of points, elementary events do not exist in the operationally relevant version in terms of the atom-free Boolean algebra Σ/Δ. Johann von Neumann has argued convincingly that the finest events which are empirically accessible are given by Borel sets of non-vanishing Lebesgue measure, and not by the much larger class of all subsets of Ω. [46]
This setting is almost universally accepted, either explicitly or implicitly. However, some paradoxical situations do arise unless further restrictions are placed on the triple (Ω, Σ, μ). The requirement that a probability measure has to be a perfect measure avoids many difficulties. [47] Furthermore, in all physical applications there are natural additional regularity conditions. In most examples the sample space Ω is Polish (i.e., separable and completely metrizable), the σ-algebra Σ is taken as the σ-algebra of Borel sets, [48] and μ is a regular Radon measure. [49] Moreover, there are some practically important problems which require the use of unbounded measures, a feature which does not fit into Kolmogorov's theory. A modification based on conditional probability spaces (which contains Kolmogorov's theory as a special case) has been developed by Alfréd Rényi. [50]
Random Variables in the Sense of Kolmogorov
In probability theory, observable quantities of a statistical experiment are called statistical observables. In Kolmogorov's mathematical probability theory, statistical observables are represented by Σ-measurable functions on the sample space Ω. The more precise formulation goes as follows. The Borel σ-algebra Σ_ℝ of subsets of the set ℝ of real numbers is the σ-algebra generated by the open subsets of ℝ. In Kolmogorov's set-theoretical formulation, a statistical observable is a σ-homomorphism φ : Σ_ℝ → Σ/Δ. In this formulation, every observable φ can be induced by a real-valued Borel function x : Ω → ℝ via the inverse map [51]

φ(R) := x⁻¹(R) := {ω ∈ Ω | x(ω) ∈ R},  R ∈ Σ_ℝ.

In mathematical probability theory a real-valued Borel function x defined on Ω is said to be a real-valued random variable. [52] Every statistical observable is induced by a random variable, but an observable (that is, a σ-homomorphism) defines only an equivalence class of random variables which induce this homomorphism. Two random variables x and y are said to be equivalent if they are equal μ-almost everywhere, [53]

x(ω) ~ y(ω)  ⇔  μ{ω ∈ Ω | x(ω) ≠ y(ω)} = 0.
That is, for a statistical description it is not necessary to know the point function ω ↦ x(ω); it is sufficient to know the observable φ, or in other words, the equivalence class [x(ω)] of the point functions which induce the corresponding σ-homomorphism,

φ ⇔ [x(ω)] := {y(ω) | y(ω) ~ x(ω)}.

The description of a physical system in terms of an individual function ω ↦ f(ω) distinguishes between different points ω ∈ Ω and corresponds to an individual description (maybe in terms of hidden variables). In contrast, a description in terms of equivalence classes of random variables does not distinguish between different points and corresponds to a statistical ensemble description.
If ω ↦ x(ω) is a random variable on Ω, and if ω ↦ x(ω) is integrable over Ω with respect to μ, we say that the expectation of x with respect to μ exists, and we write

e(x) := ∫_Ω x(ω) μ(dω),

and call e(x) the expectation value of x. Every Borel-measurable complex-valued function of a random variable ω ↦ x(ω) on (Ω, Σ, μ) is also a complex-valued random variable on (Ω, Σ, μ). If the expectation of the random variable ω ↦ f{x(ω)} exists, then

e(f) = ∫_Ω f{x(ω)} μ(dω).

A real-valued random variable ω ↦ x(ω) on a probability space (Ω, Σ, μ) induces a probability measure μ_x : Σ_ℝ → [0,1] on the state space (ℝ, Σ_ℝ) by

μ_x(R) := μ{x⁻¹(R)} = μ{ω ∈ Ω | x(ω) ∈ R},  R ∈ Σ_ℝ,

so that

e(f) = ∫_ℝ f(x) μ_x(dx).
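These definitions can be made concrete on a finite toy space (my own hypothetical weights and functions): two random variables that differ only on a null set belong to the same equivalence class, and expectations can equivalently be computed on Ω or against the induced measure μ_x on the real line:

```python
# Toy probability space: four points, one of them ("d") of measure zero.
omega = ["a", "b", "c", "d"]
mu = {"a": 0.25, "b": 0.25, "c": 0.5, "d": 0.0}

def x(w): return {"a": 1.0, "b": 1.0, "c": 2.0, "d": 99.0}[w]
def y(w): return {"a": 1.0, "b": 1.0, "c": 2.0, "d": -7.0}[w]  # = x a.e.

def expectation(f):
    """e(f): sum of f(point) * mu({point}) over the sample space."""
    return sum(f(w) * mu[w] for w in omega)

def induced(f, value):
    """mu_f({value}) = mu{points where f = value}: induced point mass."""
    return sum(mu[w] for w in omega if f(w) == value)

# x and y differ only on the null set {"d"}: same equivalence class,
# hence the same expectation and the same induced distribution.
assert expectation(x) == expectation(y) == 1.5
assert induced(x, 2.0) == induced(y, 2.0) == 0.5

# e(f(x)) on the sample space agrees with integrating f against mu_x.
values = sorted({x(w) for w in omega})
assert expectation(lambda w: x(w) ** 2) == sum(v * v * induced(x, v)
                                               for v in values)
```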
Stochastic Processes
The success of Kolmogorov's axiomatization is largely due to the fact that it does not busy itself with chance. [54] Probability has become a branch of pure mathematics. Mathematical probability theory is supposed to provide a model for situations involving random phenomena, but we are never told what "random" means conceptually, besides the fact that random events cannot be predicted exactly. Even if we have only a rough idea of what we mean by "random," it is plain that Kolmogorov's axiomatization does not give sufficient conditions for characterizing random events. However, if we adopt the view proposed by Friedrich Waismann [55] and consider probability not as a property of a given sequence of events but as a property of the generating conditions of a sequence, then we can relate randomness to predictability and retrodictability.
A family {φ(t) | t ∈ ℝ} of statistical observables indexed by a time parameter t is called a stochastic process. In the framework of Kolmogorov's probability theory a stochastic process is represented by a family {[x(t|ω)] | t ∈ ℝ} of equivalence classes [x(t|ω)] of random variables x(t|ω) on a common probability space (Ω, Σ, μ),

[x(t|ω)] := {y(t|ω) | y(t|ω) ~ x(t|ω)}.

Two individual point functions (t, ω) ↦ x(t|ω) and (t, ω) ↦ y(t|ω) on a common probability space (Ω, Σ, μ) are said to be statistically equivalent (in the narrow sense) if and only if

μ{ω ∈ Ω | x(t|ω) ≠ y(t|ω)} = 0 for all t ∈ ℝ.
Some authors find it convenient to use the same symbol for functions and equivalence classes of functions. We avoid this identification, since it muddles individual and statistical descriptions. A stochastic process is not an individual function but an indexed family of σ-homomorphisms φ(t) : Σ_ℝ → Σ/Δ which can be represented by an indexed family of equivalence classes of random variables. For fixed t ∈ ℝ the function ω ↦ x(t|ω) is a random variable. The point function t ↦ x(t|ω) obtained by fixing ω is called a realization, or a sample path, or a trajectory of the stochastic process. The description of a physical system in terms of an individual trajectory t ↦ x(t|ω) (ω fixed) of a stochastic process {[x(t|ω)] | t ∈ ℝ} corresponds to a point dynamics, while a description in terms of equivalence classes of trajectories and an associated probability measure corresponds to an ensemble dynamics.

Kolmogorov's characterization of stochastic processes as collections of equivalence classes of random variables is much too general for science. Some additional regularity requirements like separability or continuity are necessary so that the process has "nice trajectories" and does not disintegrate into an uncountable number of events. We will only discuss stochastic processes with some regularity properties, so that we can ignore the mathematical existence of inseparable versions.
Furthermore, the traditional terminology is somewhat misleading, since according to Kolmogorov's definition precisely predictable processes are also stochastic processes. However, the theory of stochastic processes provides a conceptually sound and mathematically workable distinction between the so-called singular processes, which allow a perfect prediction of any future value from a knowledge of the past values of the process, and the so-called regular processes, for which long-term predictions are impossible. [56] For simplicity, we discuss here only the important special case of stationary processes.

A stochastic process is called strictly stationary if all its joint distribution functions are invariant under time translation, so that they depend only on time differences. For many applications this is too strict a definition; often it is enough to require that the mean and the covariance are time-translation invariant. A stochastic process {[x(t|ω)] | t ∈ ℝ} is said to be weakly stationary (or: stationary in the wide sense) if

e{x(t|·)²} < ∞ for every t ∈ ℝ,

e{x(t+τ|·)} = e{x(t|·)} for all t, τ ∈ ℝ,

e{x(t+τ|·) x(t′+τ|·)} = e{x(t|·) x(t′|·)} for all t, t′, τ ∈ ℝ.
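A Monte-Carlo sketch (my own illustration) of these three conditions for the simplest weakly stationary process, discrete-time Gaussian white noise: the ensemble mean is constant in t, and the covariance depends only on the lag t − t′:

```python
import random

# Ensemble of M independent realizations of discrete-time Gaussian
# white noise, observed at T time points; ensemble averages estimate
# the expectations e{...} in the stationarity conditions.
random.seed(42)
M, T = 20000, 8

ensemble = [[random.gauss(0.0, 1.0) for _ in range(T)] for _ in range(M)]

def e(f):
    """Ensemble average of a functional of one realization."""
    return sum(f(x) for x in ensemble) / M

mean_at_2 = e(lambda x: x[2])
cov_02 = e(lambda x: x[0] * x[2])     # lag 2, starting at t = 0
cov_35 = e(lambda x: x[3] * x[5])     # lag 2, starting at t = 3
cov_01 = e(lambda x: x[0] * x[1])     # lag 1

# Mean is (nearly) constant; covariances at equal lag agree; for white
# noise a non-zero lag gives covariance close to 0.
assert abs(mean_at_2) < 0.05
assert abs(cov_02 - cov_35) < 0.05
assert abs(cov_01) < 0.05
```

The tolerances are loose Monte-Carlo bounds, a few standard errors wide for this sample size.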
Since the covariance function of a weakly stationary stochastic process is positive definite, Bochner's theorem [57] implies Khintchin's spectral decomposition of the covariance [58]: A complex-valued function R : ℝ → ℂ which is continuous at the origin is the covariance function of a complex-valued, second-order, weakly stationary and continuous (in the quadratic mean) stochastic process if and only if it can be represented in the form

R(t) = ∫_{-∞}^{+∞} e^{iλt} dR̂(λ),

where R̂ : ℝ → ℝ is a real, never decreasing and bounded function, called the spectral distribution function of the stochastic process.

Lebesgue's decomposition theorem says that every spectral distribution function R̂ : ℝ → ℝ can be decomposed uniquely according to

R̂ = c_d R̂_d + c_s R̂_s + c_ac R̂_ac,  c_d ≥ 0, c_s ≥ 0, c_ac ≥ 0, c_d + c_s + c_ac = 1,

where R̂_d, R̂_s and R̂_ac are normalized spectral distribution functions. The function R̂_d is a step function. Both functions R̂_s and R̂_ac are continuous; R̂_s is singular and R̂_ac is absolutely continuous. The absolutely continuous part has a derivative almost everywhere, called the spectral density function λ ↦ dR̂_ac(λ)/dλ. The Lebesgue decomposition of the spectral distribution of a covariance function t ↦ R(t) induces an additive decomposition of the covariance function into a discrete part t ↦ R_d(t), a singular part t ↦ R_s(t), and an absolutely continuous part t ↦ R_ac(t). The discrete part R_d is almost periodic in the sense of Harald Bohr, so that its asymptotic behavior is characterized by lim sup_{|t|→∞} |R_d(t)| = 1. For the singular part the limit lim sup_{|t|→∞} |R_s(t)| may be any number between 0 and 1. The Riemann–Lebesgue lemma implies that for the absolutely continuous part R_ac we have lim_{|t|→∞} |R_ac(t)| = 0.
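The two extreme cases can be seen in closed form (my own illustrative pair of covariance functions, not from the original text): a pure point spectrum gives an almost periodic covariance whose lim sup is 1, while an absolutely continuous spectrum forces decay:

```python
import math

# Illustrative covariance functions for the two extreme spectral types:
# R_d has a purely discrete spectrum (point masses at lambda = +1, -1)
# and is almost periodic; R_ac has an absolutely continuous (Lorentzian)
# spectral density and decays, as the Riemann-Lebesgue lemma demands.
def R_d(t):
    return math.cos(t)

def R_ac(t):
    return math.exp(-abs(t))

late_times = [2 * math.pi * k for k in range(100, 110)]
assert all(abs(R_d(t)) > 0.999 for t in late_times)    # lim sup = 1
assert all(abs(R_ac(t)) < 1e-100 for t in late_times)  # decays to 0
```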
A strictly stationary stochastic process {[x(t|ω)] | t ∈ ℝ} is called singular if a knowledge of its past allows an error-free prediction. A stochastic process is called regular if it is not singular and if the conditional expectation is the best forecast. The remote past of a singular process already contains all the information necessary for the exact prediction of its future behavior, while a regular process contains no components that can be predicted exactly from an arbitrarily long past record. The optimal prediction of a stochastic process is in general non-linear. [59] Up to now there is no general workable algorithm for non-linear prediction. [60] Most results refer to linear prediction of weakly stationary second-order processes. The famous Wold decomposition says that every weakly stationary stochastic process is the sum of a uniquely determined linearly singular and a uniquely determined linearly regular process. [61] A weakly stationary stochastic process {[x(t|ω)] | t ∈ ℝ} is called linearly singular if the optimal linear predictor in terms of the past {[x(t|ω)] | t < 0} allows an error-free prediction. If a weakly stationary stochastic process does not contain a linearly singular part, it is called linearly regular.
There is an important analytic criterion for the dichotomy between linearly singular and linearly regular processes, the so-called Wiener–Krein criterion [62]: A weakly stationary stochastic process {[x(t|ω)] | t ∈ ℝ} with mean value e{x(t|·)} = 0 and spectral distribution function λ ↦ R̂(λ) is linearly regular if and only if its spectral distribution function is absolutely continuous and

∫_{-∞}^{+∞} [ln{dR̂(λ)/dλ} / (1 + λ²)] dλ > -∞.

Note that for a linearly regular process the spectral distribution function λ ↦ R̂(λ) is necessarily absolutely continuous, so that the covariance function t ↦ R(t) vanishes as t → ∞. However, there are exactly predictable stochastic processes with an asymptotically vanishing covariance function, so that an asymptotically vanishing covariance function is not sufficient for regular behavior.
There is a close relationship between regular stochastic processes and the irreversibility of physical systems. [63] A characterization of the genuine irreversibility of classical linear input–output systems can be based on entropy-free non-equilibrium thermodynamics, with the notion of lost energy as central concept. [64] Such a system is called irreversible if the lost energy is strictly positive. According to a theorem by König and Tobergte, [65] a linear input–output system behaves irreversibly if and only if the associated distribution function fulfills the Wiener–Krein criterion for the spectral density of a linearly regular stochastic process.
Birkhoff's Individual Ergodic Theorem
A stochastic process on the probability space (Ω, Σ, μ) is called ergodic if its associated measure-preserving transformation τ_t is ergodic for every t ≠ 0 (that is, if every σ-algebra of sets in Σ invariant under the measure-preserving semi-flow associated with the process is trivial). According to a theorem by Wiener and Akutowicz, [66] a strictly stationary stochastic process with an absolutely continuous spectral distribution function is weakly mixing, and hence ergodic. Therefore every regular process is ergodic, so that the so-called ergodic theorems apply. Ergodic theorems provide conditions for the equality of time averages and ensemble averages. Of crucial importance for the interpretation of probability theory is the individual (or pointwise) ergodic theorem by George David Birkhoff. [67] The discrete version of the pointwise ergodic theorem is a generalization of the strong law of large numbers. In terms of harmonic analysis of stationary stochastic processes, this theorem can be formulated as follows. [68] Consider a strictly stationary zero-mean stochastic process {[x(t|ω)] | t ∈ ℝ} over the probability space (Ω, Σ, μ), and let ω ↦ x(t|ω) be quadratically integrable with respect to the measure μ. Then for μ-almost all ω in Ω, that is, for almost every trajectory t ↦ x(t|ω), the individual auto-correlation function t ↦ C(t|ω),

C(t|ω) := lim_{T→∞} (1/2T) ∫_{-T}^{+T} x(τ|ω)* x(t+τ|ω) dτ,  t ∈ ℝ, ω fixed,

exists and is continuous on ℝ. Moreover, the auto-correlation function t ↦ C(t|ω) equals for μ-almost all ω ∈ Ω the covariance function t ↦ R(t),

C(t|ω) = R(t) for μ-almost all ω ∈ Ω,

R(t) := ∫_Ω x(t|ω) x(0|ω)* μ(dω).

The importance of this relation lies in the fact that in most applications we see only a single individual trajectory, that is, a particular realization of the stochastic process. Since Kolmogorov's theory of stochastic processes refers to equivalence classes of functions, Birkhoff's individual ergodic theorem provides a crucial link between the ensemble description and the individual description of chaotic phenomena. In the next chapter we will sketch two different direct approaches for the description of chaotic phenomena which avoid the use of ensembles.
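The content of the theorem can be checked in discrete time on a simple ergodic stationary process (my own illustrative choice, the random-phase cosine; being deterministic, it has an ensemble covariance available in closed form, R(k) = cos(ωk)/2):

```python
import math
import random

# Birkhoff's theorem in discrete time: the auto-correlation computed as
# a time average along ONE trajectory of the random-phase cosine process
# x_n = cos(freq * n + phi), phi uniform on [0, 2 pi), reproduces the
# ensemble covariance R(k) = cos(freq * k) / 2.
random.seed(7)
freq = 0.7                                # frequency incommensurate with 2 pi
phi = random.uniform(0.0, 2.0 * math.pi)  # one randomly drawn realization
N = 200000

xs = [math.cos(freq * n + phi) for n in range(N)]

def time_autocorr(k):
    """C(k): time average of x_n * x_{n+k} along the single sample path."""
    return sum(xs[n] * xs[n + k] for n in range(N - k)) / (N - k)

for k in (0, 1, 5):
    ensemble_R = 0.5 * math.cos(freq * k)
    assert abs(time_autocorr(k) - ensemble_R) < 0.01
```

This particular process is almost periodic, hence singular rather than regular; ergodicity, which is all Birkhoff's theorem needs, holds nonetheless.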
Individual Descriptions of Chaotic Processes

Deterministic Chaotic Processes in the Sense of Wiener
More than a decade before Kolmogorov's axiomatization of mathematical probability theory, Norbert Wiener invented a possibly deeper paradigm for chaotic phenomena: his mathematically rigorous analytic construction of an individual trajectory of Einstein's idealized Brownian motion, [69] nowadays called a Wiener process. [70] In Wiener's mathematical model chaotic changes in the direction of the Brownian path take place constantly. All trajectories of a Wiener process are almost certainly continuous but nowhere differentiable, just as conjectured by Jean Baptiste Perrin for Brownian motion. [71] Wiener's construction and proof are much closer to physics than Kolmogorov's abstract model, but also very intricate, so that for a long time Kolmogorov's approach has been favored. Nowadays, Wiener's result can be derived in a much simpler way. The generalized derivative of the Wiener process is called "white noise" since, according to the Einstein–Wiener theorem, its spectral measure equals the Lebesgue measure dλ/2π. It has turned out that white noise is the paradigm for an unpredictable regular process; it serves to construct other, more complicated stochastic structures.
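A discretized sketch (my own illustration, Euler increments on a finite grid) of a single Wiener trajectory: independent Gaussian "white noise" increments of variance dt, with a Monte-Carlo check of the variance law E{W(t)²} = t:

```python
import math
import random

# One trajectory of a discretized Wiener process on [0, 1]: cumulative
# sums of independent Gaussian increments with variance dt.
random.seed(1)
dt, n_steps = 0.001, 1000

def wiener_path():
    w, path = 0.0, [0.0]
    for _ in range(n_steps):
        w += random.gauss(0.0, math.sqrt(dt))   # independent increment
        path.append(w)
    return path

# Monte-Carlo check of the variance law E{W(t)^2} = t at t = 1.
n_paths = 2000
second_moment = sum(wiener_path()[-1] ** 2 for _ in range(n_paths)) / n_paths
assert abs(second_moment - 1.0) < 0.15
```

The nowhere-differentiability announced above shows up here in discretized form: the increments scale like √dt rather than dt, so difference quotients diverge as dt → 0.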
Wiener's characterization of individual chaotic processes is founded on his basic paper "Generalized harmonic analysis." [72] The purpose of Wiener's generalized harmonic analysis is to give an account of phenomena which can be described neither by Fourier analysis nor by almost periodic functions. Instead of equivalence classes of Lebesgue square-integrable functions, Wiener focused his harmonic analysis on individual Borel-measurable functions t ↦ x(t) for which the individual auto-correlation function

    C(t) := \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{+T} x(t')\, x(t+t')\, dt', \qquad t \in \mathbb{R},

exists and is continuous for all t. Wiener's generalized harmonic analysis of an individual trajectory t ↦ x(t) is in an essential way based on the spectral representation of the auto-correlation function. The Bochner–Cramér representation theorem implies that there exists a non-decreasing bounded function λ ↦ Ĉ(λ), called the spectral distribution function of the individual function t ↦ x(t), such that

    C(t) = \int_{-\infty}^{+\infty} e^{i\lambda t}\, d\hat{C}(\lambda).

This relation is usually known under the name "individual Wiener–Khintchin theorem." [73] However, this name is misleading. Khintchin's theorem [74] relates the covariance function and the spectral function in terms of ensemble averages. In contrast, Wiener's theorem [75] refers to individual functions. This result was already known to Albert Einstein long before. [76] The terminology "Wiener–Khintchin theorem" has caused much confusion [77] and should therefore be avoided. Here, we refer to the individual theorem as the Einstein–Wiener theorem. For many applications it is crucial to distinguish between the Einstein–Wiener theorem, which refers to individual functions, and the statistical Khintchin theorem, which refers to equivalence classes of functions as used in Kolmogorov's probability theory. The Einstein–Wiener theorem is in no way probabilistic. It refers to well-defined single functions rather than to an ensemble of functions.
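To make the individual (non-ensemble) character of the Einstein–Wiener auto-correlation concrete, the following sketch (an illustration of my own, not from the paper; the sampled trajectory is hypothetical) estimates C(t) for a single finite record by the time average above, truncating the limit T → ∞ to the record length:

```python
import numpy as np

def individual_autocorrelation(x, max_lag):
    """Time-average estimate of C(t) = lim (1/2T) ∫ x(t') x(t+t') dt'
    for one single sampled trajectory x -- no ensemble is involved."""
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / (n - k) for k in range(max_lag)])

# A single white-noise-like trajectory (hypothetical example data).
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)

C = individual_autocorrelation(x, max_lag=20)
# For white-noise-like data C(0) is near 1 and C(k) near 0 for k > 0.
print(C[0], C[1])
```

The estimate is computed from the one trajectory alone, in the spirit of Wiener's analysis; only its statistical interpretation would require a Kolmogorov ensemble.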
If an individual function t ↦ x(t) has a pure point spectrum, it is almost periodic in the sense of Besicovitch, x(t) ~ Σ_{j=1}^{∞} x̂_j exp(iλ_j t). In a physical context an almost-periodic time function x : ℝ → ℂ may be considered as predictable since its future {x(t) | t > 0} is completely determined by its past {x(t) | t ≤ 0}. If an individual function has an absolutely continuous spectral distribution, then the auto-correlation function vanishes in the limit t → ∞. The auto-correlation function t ↦ C(t) provides a measure of the memory: if the individual function t ↦ x(t) has a particular value at one moment, its auto-correlation tells us the degree to which we can guess that it will have about the same value some time later. In 1932, Koopman and von Neumann conjectured that an absolutely continuous spectral distribution function is the crucial property for the epistemically chaotic behavior of an ontically deterministic dynamical system. [78] In modern terminology, Koopman and von Neumann refer to the so-called "mixing property." However, a rapid decay of correlations is not sufficient as a criterion for the absence of any regularity. Genuine chaotic behavior requires stronger instability properties than just mixing. If we know the past {x(t) | t ≤ 0} of an individual function t ↦ x(t), then the future {x(t) | t > 0} is completely determined if and only if the following Szegő condition for perfect linear predictability is fulfilled: [79]

    \int_{-\infty}^{+\infty} \frac{\ln\{ d\hat{C}_{\mathrm{ac}}(\lambda)/d\lambda \}}{1+\lambda^2}\, d\lambda = -\infty,

where Ĉ_ac is the absolutely continuous part of the spectral distribution function of the auto-correlation function of the individual function t ↦ x(t). Every individual function t ↦ x(t) with an absolutely continuous spectral distribution Ĉ fulfilling the Paley–Wiener criterion

    \int_{-\infty}^{+\infty} \frac{|\ln\, d\hat{C}(\lambda)/d\lambda|}{1+\lambda^2}\, d\lambda < \infty

will be called a chaotic function in the sense of Wiener.
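As a rough numerical illustration (a sketch of my own, with an assumed Lorentzian spectral density, not an example from the paper), one can evaluate the Paley–Wiener integral for a strictly positive spectral density S(λ) = dĈ(λ)/dλ; the integral then comes out finite, so the corresponding individual function is chaotic in Wiener's sense rather than perfectly linearly predictable:

```python
import math

def paley_wiener_integral(S, lo=-50.0, hi=50.0, n=200_001):
    """Trapezoidal estimate of ∫ |ln S(λ)| / (1 + λ²) dλ over [lo, hi]."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        lam = lo + i * h
        f = abs(math.log(S(lam))) / (1.0 + lam * lam)
        total += f * (0.5 if i in (0, n - 1) else 1.0)
    return total * h

# Assumed Lorentzian spectral density (an Ornstein-Uhlenbeck-type spectrum).
S = lambda lam: 1.0 / (1.0 + lam * lam)

I = paley_wiener_integral(S)
print(I)  # finite, so the Paley-Wiener criterion is satisfied
```

For this density the full-line integral equals 2π ln 2 ≈ 4.36; the truncation to [−50, 50] loses only the slowly decaying tails.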
Wiener's work initiated the mathematical theory of stochastic processes and functional integration. It was a precursor of the general probability measures as defined by Kolmogorov. However, it would be mistaken to believe that the theory of stochastic processes in the sense of Kolmogorov has superseded Wiener's ideas. Wiener's approach has been criticized as unnecessarily cumbersome [80] since it was based on individual functions t ↦ x(t), and not on Kolmogorov's more effortless definition of measure-theoretical stochastic processes (that is, equivalence classes t ↦ [x(t|ω)]). It has to be emphasized that for many practical problems only Wiener's approach is conceptually sound. For example, for weather prediction or anti-aircraft fire control there is no ensemble of trajectories but just a single individual trajectory, from whose past behavior one would like to predict something about its future behavior.

The basic link between Wiener's individual and Kolmogorov's statistical approach is Birkhoff's individual ergodic theorem. Birkhoff's theorem implies that μ-almost every trajectory of an ergodic stochastic process on a Kolmogorov probability space (Ω, Σ, μ) spends an amount of time in the measurable set B ∈ Σ which is proportional to μ(B). For μ-almost all points ω ∈ Ω, the trajectory t ↦ x(t|ω) (with a precisely fixed ω ∈ Ω) of an ergodic regular stochastic process t ↦ [x(t|ω)] is an individual chaotic function in the sense of Wiener. This result implies that one can switch from an ensemble description in terms of a Kolmogorov probability space (Ω, Σ, μ) to an individual chaotic deterministic description in the sense of Wiener, and vice versa. Moreover, Birkhoff's individual ergodic theorem implies the equality

    \lim_{T\to\infty} \frac{1}{T} \int_{-T}^{0} x(\tau|\omega)\, x(t+\tau|\omega)\, d\tau \;=\; \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{+T} x(\tau|\omega)\, x(t+\tau|\omega)\, d\tau,

so that for ergodic processes the auto-correlation function can in principle be evaluated from observations of the past {x(t|ω) | t ≤ 0} of a single trajectory t ↦ x(t|ω), a result of crucial importance for the prediction theory of individual chaotic processes.
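The equality of time averages along one trajectory and ensemble averages can be checked numerically. The following sketch (my own illustration; the AR(1) process and its parameters are stand-ins for an ergodic regular process, not taken from the paper) compares the time-average auto-correlation of a single simulated trajectory with the exactly known ensemble auto-correlation:

```python
import numpy as np

# Ergodic AR(1) process x_{k+1} = a*x_k + noise, a simple stand-in for an
# ergodic stationary process (illustrative parameters).
a, n = 0.8, 400_000
rng = np.random.default_rng(1)
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = 0.0
for k in range(n - 1):
    x[k + 1] = a * x[k] + eps[k + 1]

# Time-average auto-correlation from the one trajectory (Birkhoff side) ...
def time_avg_corr(x, lag):
    return np.dot(x[:-lag], x[lag:]) / (len(x) - lag)

# ... versus the exact ensemble value a^lag / (1 - a^2) (Khintchin side).
for lag in (1, 2, 5):
    print(lag, time_avg_corr(x, lag), a**lag / (1 - a**2))
```

With a long enough record, the two columns agree to a few percent, as Birkhoff's theorem predicts for almost every trajectory.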
Algorithmic Characterization of Randomness
The roots of an algorithmic definition of a random sequence can be traced to the pioneering work of Richard von Mises, who proposed in 1919 his principle of the excluded gambling system. [81] The use of a precise concept of an algorithm has made it possible to overcome the inadequacies of von Mises' formulations. Von Mises wanted to exclude "all" gambling systems, but he did not properly specify what he meant by "all." Alonzo Church pointed out that a gambling system which is not effectively calculable is of no practical use. [82] Accordingly, a gambling system has to be represented mathematically not by an arbitrary function but as an effective algorithm for the calculation of the values of a function. In accordance with von Mises' intuitive ideas and Church's refinement, a sequence is called random if no player who calculates his pool by effective methods can raise his fortune indefinitely when playing on this sequence.
An adequate formalization of the notion of an effectively computable function was given in 1936 by Emil Leon Post and, independently, by Alan Mathison Turing, who introduced the concept of an ideal computer nowadays called a Turing machine. [83] A Turing machine is essentially a computer having an infinitely expandable memory; it is an abstract prototype of a universal digital computer and can be taken as a precise definition of the concept of an algorithm. The so-called Church–Turing thesis states that every function computable in any intuitive sense can be computed by a Turing machine. [84] No example of a function intuitively considered as computable but not Turing-computable is known. According to the Church–Turing thesis, a Turing machine represents the limit of computational power.
The idea that the computational complexity of a mathematical object reflects the difficulty of its computation allows one to give a simple, intuitively appealing, and mathematically rigorous definition of the notion of randomness of a sequence. Unlike most mathematicians, Kolmogorov himself never forgot that the conceptual foundation of probability theory is wanting. He was not completely satisfied with his measure-theoretical formulation. In particular, the exact relation between the probability measure μ in the basic probability space (Ω, Σ, μ) and real statistical experiments remained open. Kolmogorov emphasized that

the application of probability theory ... is always a matter of consequences of hypotheses about the impossibility of reducing in one way or another the complexity of the description of the objects in question. [85]

In 1963, Kolmogorov again took up the concept of randomness. He retracted his earlier view that "the frequency concept ... does not admit a rigorous formal exposition within the framework of pure mathematics," and stated that he came "to realize that the concept of random distribution of a property in a large finite population can have a strict formal mathematical exposition." [86] He proposed a measure of complexity based on the "size of a program" which, when processed by a suitable universal computing machine, yields the desired object. [87] In 1968, Kolmogorov sketched how information theory can be founded without recourse to probability theory, and in such a way that the concepts of entropy and mutual information are applicable to individual events (rather than to equivalence classes of random variables or ensembles). In this approach the "quantity of information" is defined in terms of storing and processing signals. It is sufficient to consider binary strings, that is, strings of bits, of zeros and ones.
The concept of algorithmic complexity allows one to rephrase the old idea that "randomness consists in a lack of regularity" in a mathematically acceptable way. Moreover, a complexity measure, and hence algorithmic probability, refers to an individual object. Loosely speaking, the complexity K(x) of a binary string x is the size in bits of the shortest program for calculating it. If the complexity of x is not smaller than its length l(x), then there is no simpler way to write a program for x than to write it out. In this case the string x shows no periodicity and no pattern. Kolmogorov, and independently Solomonoff and Chaitin, suggested that patternless finite sequences should be considered as random sequences. [88] That is, complexity is a measure of irregularity in the sense that maximal complexity means randomness. Therefore, it seems natural to call a binary string random if the shortest program for generating it is as long as the string itself. Since K(x) is not computable, it is not decidable whether a string is random.
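Although K(x) itself is uncomputable, any lossless compressor yields a computable upper bound on it. The following sketch (my own illustration, not part of the paper) uses zlib compression as such a proxy: a patterned string compresses far below its length, while a pseudo-random string stays close to the incompressible limit:

```python
import random
import zlib

def complexity_proxy(bits: str) -> int:
    """Length in bytes of the zlib-compressed string: a crude but
    computable upper bound on (a constant plus) the complexity K(x)."""
    return len(zlib.compress(bits.encode("ascii"), level=9))

patterned = "01" * 5000               # highly regular: a short rule generates it
rng = random.Random(42)
pseudo_random = "".join(rng.choice("01") for _ in range(10_000))

print(complexity_proxy(patterned))      # far below the string length
print(complexity_proxy(pseudo_random))  # near the 1-bit-per-symbol limit
```

The proxy only bounds K(x) from above; it can never certify randomness, in line with the undecidability noted in the text.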
This definition of random sequences turned out not to be quite satisfactory. Using ideas of Kolmogorov, Per Martin-Löf succeeded in giving an adequate precise definition of random sequences. [89] In particular, Martin-Löf proposed to define random sequences as those which withstand certain universal tests of randomness, defined as recursive sequential tests. Martin-Löf's random sequences fulfill all stochastic laws, such as the laws of large numbers and the law of the iterated logarithm. A weakness of this definition is that Martin-Löf also requires stochastic properties that cannot be considered as physically meaningful, in the sense that they cannot be tested by computable functions.

A slightly different but more powerful variant is due to Claus-Peter Schnorr. [90] He argues that a candidate for randomness must be rejected if there is an effective procedure to do so. A sequence such that no effective process can show its non-randomness must be considered as operationally random. He considers the null sets of Martin-Löf's sequential tests in the sense of Brouwer (i.e., null sets that are effectively computable) and defines a sequence to be random if it is not contained in any such null set. Schnorr requires the stochasticity tests to be computable instead of being merely constructive. While the Kolmogorov–Martin-Löf approach is non-constructive, the tests considered by Schnorr are constructive to such an extent that it is possible to approximate infinite random sequences to an arbitrary degree of accuracy by computable sequences of high complexity (pseudo-random sequences). The approximation is the better, the greater the effort required to reject the pseudo-random sequence as being truly random. The fact that the behavior of Schnorr's random sequences can be approximated by constructive methods is of outstanding conceptual and practical importance. Random sequences in the sense of Martin-Löf do not have this approximation property; such non-approximable random sequences exist only by virtue of the axiom of choice.
A useful characterization of random sequences can be given in terms of games of chance. According to von Mises' intuitive ideas and Church's refinement, a sequence is called random if and only if no player who calculates his pool by effective methods can raise his fortune indefinitely when playing on this sequence. For simplicity, we restrict our discussion to the practically important case of random sequences of the exponential type. A gambling rule implies a capital function C from the set I of all finite sequences to the set ℝ of all real numbers. In order that a gambler actually can use a rule, it is crucial that this rule be given algorithmically. That is, the capital function C cannot be an arbitrary function I → ℝ, but has to be a computable function. [91] If we assume that the gambler's pool is finite, and that debts are allowed, we get the following simple but rigorous characterization of a random sequence:

A sequence {x₁, x₂, x₃, ...} is a random sequence (of the exponential type) if and only if every computable capital function C : I → ℝ of bounded difference fulfills the relation lim_{n→∞} n⁻¹ C{x₁, ..., xₙ} = 0.

According to Schnorr, a universal test for randomness cannot exist. A sequence fails to be random if and only if there is an effective process in which this failure becomes evident. Therefore, one can refer to randomness only with respect to a well-specified particular test.

The algorithmic concept of random sequences can be used to derive a model for Kolmogorov's axioms (in their constructive version) of mathematical probability theory. [92] It turns out that the measurable sets form a σ-algebra (in the sense of constructive set theory). This result shows the amazing insight Kolmogorov had in creating his axiomatic system.
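The capital-function criterion above can be tried out on concrete sequences. The sketch below (my own toy example; the "majority" betting rule is one arbitrary computable strategy, not from the paper) plays a unit-stake game and shows n⁻¹C tending to 0 on a pseudo-random sequence but not on a blatantly patterned one:

```python
import random

def capital_after_play(bits):
    """Unit-stake game: before each bit a computable rule (here: bet on the
    bit seen most often so far) predicts the next outcome; the capital moves
    +1 on a hit and -1 on a miss, so C has bounded difference."""
    capital = ones = total = 0
    for b in bits:
        guess = 1 if 2 * ones > total else 0   # majority rule, computable
        capital += 1 if guess == b else -1
        ones += b
        total += 1
    return capital

n = 100_000
rng = random.Random(7)
pseudo_random = [rng.randint(0, 1) for _ in range(n)]
patterned = [1] * n   # a blatantly non-random sequence

# n^-1 * C stays near 0 on the (pseudo-)random sequence, but the same rule
# wins almost every round on the patterned one.
print(capital_after_play(pseudo_random) / n)
print(capital_after_play(patterned) / n)
```

One computable rule succeeding is enough to reject randomness; the definition quantifies over all computable capital functions, which no finite experiment can exhaust.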
Laws of Chance and Determinism

Why Are There "Laws of Chance"?

It would be a logical mistake to assume that arbitrary chance events can be grasped by the statistical methods of mathematical probability theory. Probability theory has a rich mathematical structure, so we have to ask under what conditions the usual "laws of chance" are valid. The modern concept of subjective probabilities presupposes coherent rational behavior based on Boolean logic. That is, it is postulated that a rational man acts as if he had a deterministic model compatible with his pre-knowledge. Since in many physical examples, too, the appropriateness of the laws of probability can be traced back to an underlying deterministic ontic description, it is tempting to presume that chance events which satisfy the axioms of classical mathematical probability theory always result from the deterministic behavior of an underlying physical system. Such a claim cannot be demonstrated.
What can be proven is the weaker statement that every probabilistic system which fulfills the axioms of classical mathematical probability theory can be embedded into a larger deterministic system. A classical system is said to be deterministic if there exists a complete set of dispersion-free states such that Hadamard's principle of scientific determinism is fulfilled. Here, a state is said to be dispersion-free if every observable has a definite dispersion-free value with respect to this state. For such a deterministic system, statistical states are given by mean values of dispersion-free states. A probabilistic system is said to allow hidden variables if it is possible to find a hypothetical larger system such that every statistical state of the probabilistic system is a mean value of dispersion-free states of the enlarged system. Since the logic of classical probability theory is a Boolean σ-algebra, we can use the well-known result that a classical dynamical system is deterministic if and only if the underlying Boolean algebra is atomic. [93] As proved by Franz Kamber, every classical system characterized by a Boolean algebra allows the introduction of hidden variables such that every statistical state is a mean value of dispersion-free states. [94] This theorem implies that random events fulfill the laws of chance if and only if they can formally be reduced to hidden deterministic events. Such a deterministic embedding is never unique, but often there is a unique minimal dilation of a probabilistic dynamical system to a deterministic one. [95] Note that the deterministic embedding is usually not constructive, and that nothing is claimed about a possible ontic interpretation of the hidden variables of the enlarged deterministic system.
Kolmogorov's probability theory can be viewed as a hidden-variable representation of the basic abstract point-free theory. Consider the usual case where the Boolean algebra B of mathematical probability theory contains no atoms. Every classical probability system (B, p) can be represented in terms of some (not uniquely given) Kolmogorov space (Ω, Σ, μ) as a σ-complete Boolean algebra B = Σ/Δ, where Δ is the σ-ideal of Borel sets of μ-measure zero. The points ω ∈ Ω of the set Ω correspond to two-valued individual states (the so-called atomic or pure states) of the fictitious embedding atomic Boolean algebra P(Ω) of all subsets of the point set Ω. If (as usual) the set Ω is not countable, the atomic states are epistemically inaccessible. Measure-theoretically, an atomic state corresponding to a point ω ∈ Ω is represented by the Dirac measure δ_ω at the point ω ∈ Ω, defined for every subset B of Ω by δ_ω(B) = 1 if ω ∈ B and δ_ω(B) = 0 if ω ∉ B. Every epistemically accessible state can be described by a probability density f ∈ L¹(Ω, Σ, μ), which can be represented as an average of epistemically inaccessible atomic states,

    f(\omega) = \int_{\Omega} f(\omega')\, \delta_{\omega}(d\omega').

The set-theoretical representation of the basic Boolean algebra B in terms of a Kolmogorov probability space (Ω, Σ, μ) is mathematically convenient since it allows one to relate an epistemic dynamics t ↦ f_t in terms of a probability density f_t ∈ L¹(Ω, Σ, μ) to a fictitious deterministic dynamics t ↦ ω_t for the points of Ω by f_t(ω) = f(ω_{−t}). [96] It is also physically interesting since all known context-independent physical laws are deterministic and formulated in terms of pure states. In contrast, every statistical dynamical law depends on some phenomenological constants (like the half-life constants [97] in the exponential decay law for the spontaneous decay of a radioactive nucleus). That is, we can formulate context-independent laws only if we introduce atomic states.
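The statement that every epistemically accessible state is a mean value of dispersion-free atomic states can be made concrete on a finite toy sample space (a sketch of my own, with arbitrary made-up observable and density, not from the paper): in each atomic state δ_ω an observable A has the sharp value A(ω), and the expectation under a density f is just the f-weighted average of those sharp values.

```python
import numpy as np

# Finite toy sample space Ω = {0, ..., N-1}; observables are functions on Ω.
N = 1000
omega = np.arange(N)

A = np.sin(omega / 50.0)        # an arbitrary observable A : Ω → ℝ
f = np.exp(-omega / 200.0)      # an assumed (unnormalized) epistemic density
f /= f.sum()                    # normalize: f is a probability density on Ω

# In the atomic state δ_ω the observable has the dispersion-free value A(ω);
# the statistical state f is the mixture Σ_ω f(ω) δ_ω, so its expectation is:
expectation_from_atoms = sum(f[w] * A[w] for w in omega)
print(expectation_from_atoms)   # coincides with the usual ensemble mean of A
```

On an uncountable Ω the atomic states become epistemically inaccessible, but the averaging structure is the same.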
Quantum Mechanics Does Not Imply an Ontological Indeterminism

Although it is in general impossible to predict an individual quantum event, in an ontic description the most fundamental law-statements of quantum theory are deterministic. Probability is an essential element in every epistemic description of quantum events, yet it does not indicate an incompleteness of our knowledge. The context-independent laws of quantum mechanics (which necessarily have to be formulated in an ontic interpretation) are strictly deterministic but refer to a non-Boolean logical structure of reality. On the other hand, every experiment ever performed in physics, chemistry, and biology has a Boolean operational description. This situation is enforced by the necessity of communicating about facts in an unequivocal language.

The epistemically irreducible probabilistic structure of quantum theory is induced by the interaction of the quantum object system with an external classical observing system. Quantum mechanical probabilities do not refer to the object system but to the state transition induced by the interaction of the object system with the measuring apparatus. The non-predictable outcome of a quantum experiment is related to the projection of the atomic non-Boolean lattice of the ontic description of the deterministic reality onto the atom-free Boolean algebra of the epistemic description of a particular experiment. The restriction of an ontic atomic state (which gives a complete description of the non-Boolean reality) to a Boolean context is no longer atomic but is given by a probability measure. The measure generated in this way is a conditional probability which refers to the state transition induced by the interaction. Such quantum-theoretical probabilities cannot be attributed to the object system alone; they are conditional probabilities where the condition is given by the experimental arrangement. The epistemic probabilities depend on the experimental arrangement but, for a fixed context, they are objective since the underlying ontic structure is deterministic. Since a quantum-theoretical probability refers to a singled-out classical experimental context, it corresponds exactly to the mathematical probabilities of Kolmogorov's set-theoretical probability theory. [98] Therefore, a non-Boolean generalization of probability theory is not necessary, since all these measures refer to a Boolean context. The various theorems which show that it is impossible to introduce hidden variables in quantum theory only say that it is impossible to embed quantum theory into a deterministic Boolean theory. [99]
Chance Events for Which the Traditional "Laws of Chance" Do Not Apply

Conceptually, quantum theory does not require a generalization of the traditional Boolean probability theory. Nevertheless, mathematicians have created a non-Boolean probability theory by introducing a measure on the orthomodular lattice of projection operators on the Hilbert space of quantum theory. [100] The various variants of a non-Boolean probability theory are of no conceptual importance for quantum theory, but they show that genuine and interesting generalizations of traditional probability theory are possible. [101] At present there are few applications. If we find empirical chance phenomena with a non-classical statistical behavior, the relevance of a non-Boolean theory should be considered. Worth mentioning are the non-Boolean pattern-recognition methods [102], the attempt to develop a non-Boolean information theory [103], and speculations on the mind–body relation in terms of non-Boolean logic. [104]
From a logical point of view the existence of irreproducible unique events cannot be excluded. For example, if we deny a strict determinism on the ontological level of a Boolean or non-Boolean reality, then there is no reason to expect that every chance event is governed by statistical laws of any kind. Wolfgang Pauli made the inspiring proposal to characterize unique events by the absence of any type of statistical regularity:

Die von [Jung] betrachteten Synchronizitätsphänomene ... entziehen sich der Einfangung in Natur-'Gesetze', da sie nicht reproduzierbar, d.h. einmalig sind und durch die Statistik grosser Zahlen verwischt werden. In der Physik dagegen sind die 'Akausalitäten' gerade durch statistische Gesetze (grosse Zahlen) erfassbar. [105]

English translation: The synchronicity phenomena considered by [Jung] ... elude capture in "laws" of nature, since they are not reproducible, i.e., unique, and are blurred by the statistics of large numbers. In physics, by contrast, the 'acausalities' become ascertainable precisely through statistical laws (large numbers).
Acknowledgment

I would like to thank Harald Atmanspacher and Werner Ehm for clarifying discussions and a careful reading of a draft of this paper.
Endnotes<br />
[1] Laplace’s famous reply to Napoleon’s remark that he did not mention God in his Exposition<br />
du Système du Monde.<br />
[2] Laplace (1814). Translation taken from the Dover edition, p.4.<br />
[3] Gibbs (1902). A lucid review <strong>of</strong> Gibbs’ statistical conception <strong>of</strong> physics can be found in<br />
Haas (1936), volume II, chapter R.<br />
[4] This distinction is due to Scheibe (1964), Scheibe (1973), pp.50–51.<br />
[5] Compare Hille <strong>and</strong> Phillips (1957), p.618.<br />
[6] In a slightly weaker <strong>for</strong>m, this concept has been introduced by Edmund Whittaker (1943).<br />
[7] Compare Cournot (1843), §40; Venn (1866).<br />
[8] Galton’s desk (after Francis Galton, 1822–1911) is an inclined plane provided with regularly<br />
arranged nails in n horizontal lines. A ball launched on the top will be diverted at every<br />
line either to left or to right. Under the last line <strong>of</strong> nails there are n +1 boxes (numbered from<br />
the left from k=0 to k=n) in which the balls are accumulated. In order to fall into the k-th box<br />
a ball has to be diverted k times to the right <strong>and</strong> n- k times to the left. If at each nail the probability<br />
<strong>for</strong> the ball to go to left or to right is 1/2, then the distribution <strong>of</strong> the balls is given by
604 H. Primas<br />
the binomial distribution (n choose k)·(1/2)^n, which for large n approaches a Gaussian distribution.<br />
Our ignorance of the precise initial and boundary conditions does not allow us to predict individual<br />
events. Nevertheless, the experimental Gaussian distribution in no way depends on our<br />
knowledge. In this sense, we may speak of objective chance events.<br />
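The statistical regularity described in this endnote is easy to reproduce numerically. The following Python sketch (the function name `galton` and all parameters are ours, purely for illustration) simulates the desk and compares the box counts with the binomial prediction (n choose k)(1/2)^n:

```python
import random
from math import comb

def galton(n_rows, n_balls, seed=0):
    """Simulate Galton's desk: at each of n_rows nail lines a ball is diverted
    to the right with probability 1/2; the number of right diversions is the
    index k of the box it falls into."""
    rng = random.Random(seed)
    boxes = [0] * (n_rows + 1)
    for _ in range(n_balls):
        k = sum(rng.random() < 0.5 for _ in range(n_rows))
        boxes[k] += 1
    return boxes

n, balls = 10, 100_000
observed = galton(n, balls)
# Binomial prediction for the expected number of balls in box k.
expected = [comb(n, k) * 0.5 ** n * balls for k in range(n + 1)]
for k in range(n + 1):
    print(k, observed[k], round(expected[k]))
```

Whatever seed is used, the histogram reproducibly approaches the binomial (and, for large n, Gaussian) shape, even though no individual trajectory is predictable.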
[9] For an introduction into the theory <strong>of</strong> deterministic chaos, compare <strong>for</strong> example Schuster<br />
(1984).<br />
[10] Feigl (1953), p.408.<br />
[11] Born (1955a), Born (1955b). For a critique <strong>of</strong> Born’s view compare Von Laue (1955).<br />
[12] Scriven (1965). For a critique <strong>of</strong> Scriven’s view compare Boyd (1972).<br />
[13] Gillies (1973), p.135.<br />
[14] Earman (1986), pp.6–7.<br />
[15] The definition and interpretation of probability have a long history. There exists an enormous<br />
literature on the conceptual problems <strong>of</strong> the classical probability calculus which cannot be<br />
summarized here. For a first orientation, compare the monographs by Fine (1973), Maistrov<br />
(1974), Von Plato (1994).<br />
[16] von Weizsäcker (1973), p.321.<br />
[17] Waismann (1930).<br />
[18] von Mises (1928).<br />
[19] Jeffreys (1939).<br />
[20] Savage (1962), p.102.<br />
[21] Russell (1948), pp.356–357.<br />
[22] Compare <strong>for</strong> example Savage (1954), Savage (1962), Good (1965), Jeffrey (1965). For a<br />
convenient collection <strong>of</strong> the most important papers on the modern subjective interpretation,<br />
compare Kyburg <strong>and</strong> Smokler (1964).<br />
[23] Bernoulli (1713).<br />
[24] Compare Laplace (1814).<br />
[25] de Finetti (1937). Compare also the collection <strong>of</strong> papers de Finetti (1972) <strong>and</strong> the monographs<br />
de Finetti (1974), de Finetti (1975).<br />
[26] Savage (1954), Savage (1962).<br />
[27] Keynes (1921).<br />
[28] Koopman (1940a), Koopman (1940b), Koopman (1941).<br />
[29] Carnap (1950), Carnap (1952), Carnap <strong>and</strong> Jeffrey (1971). Carnap’s concept <strong>of</strong> logical<br />
probabilities has been criticized sharply by Watanabe (1969a).<br />
[30] For a critical evaluation <strong>of</strong> the view that statements <strong>of</strong> probability can be logically true,<br />
compare Ayer (1957), <strong>and</strong> the ensuing discussion, pp.18–30.<br />
[31] Venn (1866), chapter VI, §35, §36.<br />
[32] Cournot (1843). This working rule was still adopted by Kolmogor<strong>of</strong>f (1933), p.4.<br />
[33] von Weizsäcker (1973), p.326. Compare also von Weizsäcker (1985), pp.100–118.<br />
[34] Carnap (1945), Carnap (1950).<br />
[35] Carnap (1963), p.73.<br />
[36] Compare <strong>for</strong> example Khrennikov (1994), chapters VI <strong>and</strong> VII.<br />
[37] Pauli (1954), p.114.<br />
[38] von Mises (1919), von Mises (1928), von Mises (1931). The English edition <strong>of</strong> von Mises
(1964) was edited <strong>and</strong> complemented by Hilda Geiringer; it is strongly influenced by the<br />
views <strong>of</strong> Erhard Tornier <strong>and</strong> does not necessarily reflect the views <strong>of</strong> Richard von Mises.<br />
[39] The same is true <strong>for</strong> the important modifications <strong>of</strong> von Mises’ approach by Tornier (1933)<br />
<strong>and</strong> by Reichenbach (1994). Compare also the review by Martin-Löf (1969a).<br />
[40] Boole (1854), p.1.<br />
[41] Compare Halmos (1944), Kolmogoroff (1948), Łoś (1955). A detailed study of the purely<br />
lattice-theoretical (“point-free”) approach to classical probability can be found in the<br />
monograph by Kappos (1969).<br />
[42] Pro memoria: Boolean Algebras. A Boolean algebra is a non-empty set B in which two binary<br />
operations ∨ (addition or disjunction) and ∧ (multiplication or conjunction), and a<br />
unary operation ⊥ (complementation or negation) with the following properties are defined:<br />
the operations ∨ and ∧ are commutative and associative,<br />
the operation ∨ is distributive with respect to ∧, and vice versa,<br />
for every A ∈ B and every B ∈ B we have A ∨ A⊥ = B ∨ B⊥ and A ∧ A⊥ = B ∧ B⊥,<br />
A ∨ (A ∧ A⊥) = A ∧ (A ∨ A⊥) = A.<br />
These axioms imply that in every Boolean algebra there are two distinguished elements 1<br />
(called the unit of B) and 0 (called the zero of B), defined by A ∨ A⊥ = 1 and A ∧ A⊥ = 0 for<br />
every A ∈ B. With this it follows that 0 is the neutral element of the addition, A ∨ 0 = A for<br />
every A ∈ B, and that 1 is the neutral element of the multiplication, A ∧ 1 = A for every<br />
A ∈ B. For more details, compare Sikorski (1969).<br />
[43] Stone (1936).<br />
[44] Loomis (1947).<br />
[45] Kolmogor<strong>of</strong>f (1933). There are many excellent texts on Kolmogorov’s mathematical probability<br />
theory. Compare <strong>for</strong> example: Breiman (1968), Prohorov <strong>and</strong> Rozanov (1969), Laha<br />
and Rohatgi (1979), Rényi (1970a), Rényi (1970b). Recommendable introductions to measure<br />
theory are, for example: Cohn (1980), Nielsen (1997).<br />
[46] von Neumann (1932a), pp.595–598. Compare also Birkhoff and von Neumann (1936), p.825.<br />
[47] Compare Gnedenko <strong>and</strong> Kolmogorov (1954), §3.<br />
[48] If V is a topological space, then the smallest σ-algebra with respect to which all continuous<br />
complex-valued functions on V are measurable is called the Baire σ-algebra of V. The<br />
smallest σ-algebra containing all open sets of V is called the Borel σ-algebra of V. In general,<br />
the Baire σ-algebra is contained in the Borel σ-algebra. If V is metrisable, then the<br />
Baire and Borel σ-algebras coincide. Compare Bauer (1974), theorem 40.4, p.198.<br />
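On a finite space the Baire/Borel distinction is vacuous, but the notion of the smallest σ-algebra containing a given family of sets can still be illustrated by closing the family under complement and union (on a finite universe, countable unions reduce to finite ones). A minimal Python sketch with hypothetical helper names:

```python
from itertools import combinations

def generated_sigma_algebra(universe, generators):
    """Smallest family of subsets containing the generators that is closed
    under complementation and union; on a finite universe this is exactly
    the sigma-algebra generated by the generators."""
    U = frozenset(universe)
    family = {frozenset(), U} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        for A in list(family):
            if U - A not in family:
                family.add(U - A)
                changed = True
        for A, C in combinations(list(family), 2):
            if A | C not in family:
                family.add(A | C)
                changed = True
    return family

sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {2}])
print(len(sigma))  # 8: all unions of the atoms {1}, {2}, {3, 4}
```

The result is the set algebra whose atoms are the cells of the partition induced by the generators, here {1}, {2}, and {3, 4}, giving 2³ = 8 measurable sets.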
[49] A Polish space is a separable topological space that can be metrized by means of a complete<br />
metric; compare Cohn (1980), chapter 8. For a review <strong>of</strong> probability theory on complete<br />
separable metric spaces, compare Parthasarathy (1967). For a discussion <strong>of</strong> Radon measures<br />
on arbitrary topological spaces, compare Schwartz (1973). For a critical review <strong>of</strong><br />
Kolmogorov’s axioms, compare Fortet (1958), Lorenzen (1978).<br />
[50] Rényi (1955). Compare also chapter 2 in the excellent textbook by Rényi (1970b).<br />
[51] Sikorski (1949).<br />
[52] A usual but rather ill-chosen name since a “r<strong>and</strong>om variable” is neither a variable nor r<strong>and</strong>om.
[53] While the equivalence <strong>of</strong> two continuous functions on a closed interval implies their equality,<br />
this is not true <strong>for</strong> arbitrary measurable (that is, in general, discontinuous) functions.<br />
Compare, <strong>for</strong> example Kolmogorov <strong>and</strong> Fomin (1961), p.41.<br />
[54] Compare Aristotle’s criticism in Metaphysica , 1064b 15: “Evidently, none <strong>of</strong> the traditional<br />
sciences busies itself about the accidental.” Quoted from Ross (1924).<br />
[55] Waismann (1930).<br />
[56] Compare <strong>for</strong> example Doob (1953), p.564; Pinsker (1964), section 5.2; Rozanov (1967),<br />
sections II.2 and III.2. Sometimes, singular processes are called deterministic, and regular<br />
processes are called purely non-deterministic. We will not use this terminology since determinism<br />
refers to an ontic description, while singularity or regularity refers to the epistemic<br />
predictability of the process.<br />
[57] Bochner (1932), §20. The representation theorem by Bochner (1932), §19 <strong>and</strong> §20, refers to<br />
continuous positive-definite functions. Later, Cramér (1939) showed that the continuity assumption<br />
is dispensable. Compare also Cramér <strong>and</strong> Leadbetter (1967), section 7.4.<br />
[58] Khintchine (1934). Often this result is called the Wiener–Khintchine theorem, but this terminology<br />
should be avoided since Khintchine's theorem relates the ensemble averages of the<br />
covariance <strong>and</strong> the spectral functions while the theorem by Wiener (1930), chapter II.3, relates<br />
the auto-correlation function <strong>of</strong> a single function with a spectral function <strong>of</strong> a single<br />
function.<br />
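The correlation/spectrum duality that both theorems express can be seen in its elementary discrete form: for a finite real sequence, the discrete Fourier transform of the circular autocorrelation equals the periodogram |X(f)|². A NumPy sketch (ours; it assumes NumPy is available):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)          # one finite sample "function"

X = np.fft.fft(x)
periodogram = np.abs(X) ** 2          # spectral side

# Circular autocorrelation computed directly in the time domain:
# r[k] = sum_t x[t] * x[(t + k) mod n].
n = len(x)
autocorr = np.array([np.dot(x, np.roll(x, -k)) for k in range(n)])

# Discrete correlation theorem: DFT(autocorr) = |X|^2 for real x.
spectrum_from_autocorr = np.fft.fft(autocorr).real
print(np.allclose(spectrum_from_autocorr, periodogram))
```

Note that the distinction stressed above survives in this setting: the computation uses a single realization, as in Wiener's theorem, whereas Khintchine's version concerns the ensemble covariance of a stationary process.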
[59] Compare <strong>for</strong> example Rosenblatt (1971), section VI.2.<br />
[60] Compare also the review by Kallianpur (1961).<br />
[61] This decomposition is due to Wold (1938) <strong>for</strong> the special case <strong>of</strong> discrete-time weakly stationary<br />
processes, <strong>and</strong> to Hanner (1950) <strong>for</strong> the case <strong>of</strong> continuous-time processes. The<br />
general decomposition theorem is due to Cramér (1939).<br />
[62] Wiener (1942), republished as Wiener (1949); Krein (1945), Krein (1945). Compare also<br />
Doob (1953), p.584.<br />
[63] Compare also Lindblad (1993).<br />
[64] Meixner (1961), Meixner (1965).<br />
[65] König <strong>and</strong> Tobergte (1963).<br />
[66] Wiener <strong>and</strong> Akutowicz (1957), theorem 4.<br />
[67] Using the Hilbert-space linearization of classical dynamical systems introduced<br />
by Bernard Osgood Koopman (1931), Johann von Neumann (1932b) (communicated<br />
December 10, 1931, published 1932) was the first to establish a theorem bearing on the quasi-ergodic<br />
hypothesis: the mean ergodic theorem, which refers to L²-convergence. Stimulated<br />
by these ideas, one month later George David Birkh<strong>of</strong>f (1931) (communicated December 1,<br />
1931, published 1931) obtained the even more fundamental individual (or pointwise) ergodic<br />
theorem which refers to pointwise convergence. As Birkh<strong>of</strong>f <strong>and</strong> Koopman (1932) explain,<br />
von Neumann communicated his results to them on October 22, 1931, <strong>and</strong> “raised at<br />
once the important question as to whether or not ordinary time means exist along the individual<br />
path-curves excepting <strong>for</strong> a possible set <strong>of</strong> Lebesgue measure zero.” Shortly thereafter<br />
Birkh<strong>of</strong>f proved his individual ergodic theorem.<br />
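The content of the individual ergodic theorem can be illustrated on the simplest ergodic system, an irrational rotation of the circle: along a single orbit, time means of an observable converge to its space mean. A small Python sketch (the choice of rotation number, observable, and starting point is ours):

```python
import math

# Irrational rotation T(x) = x + alpha (mod 1) on the unit interval.
# For irrational alpha the system is ergodic, so Birkhoff's theorem gives
# time average -> space average along (almost) every individual orbit.
alpha = math.sqrt(2) - 1                     # an irrational rotation number
f = lambda x: math.cos(2 * math.pi * x)      # observable with space average 0

x = 0.1                                      # one individual starting point
N = 200_000
total = 0.0
for _ in range(N):
    total += f(x)
    x = (x + alpha) % 1.0
time_avg = total / N

space_avg = 0.0   # integral of cos(2*pi*x) dx over [0, 1]
print(abs(time_avg - space_avg) < 1e-3)
```

This answers, for this toy system, exactly von Neumann's question quoted above: the ordinary time mean exists along the individual path and agrees with the phase-space mean.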
[68] This formulation has been taken from Masani (1990), pp.139–140.<br />
[69] Einstein (1905), Einstein (1906).<br />
[70] Wiener (1923), Wiener (1924).
[71] Perrin (1906). A rigorous pro<strong>of</strong> <strong>of</strong> Perrin’s conjecture is due to Paley, Wiener <strong>and</strong> Zygmund<br />
(1933).<br />
[72] Wiener (1930). In his valuable commentary, Pesi P. Masani (1979) stresses the important<br />
role of generalized harmonic analysis for the quest for randomness.<br />
[73] Compare <strong>for</strong> example Middleton (1960), p.151.<br />
[74] Khintchine (1934).<br />
[75] Wiener (1930), chapter II.3.<br />
[76] Einstein (1914a), Einstein (1914b).<br />
[77] Compare <strong>for</strong> example the controversy by Brennan (1957), Brennan (1958) <strong>and</strong> Beutler<br />
(1958a), Beutler (1958b), with a final remark by Norbert Wiener (1958).<br />
[78] Koopman <strong>and</strong> Neumann (1932), p.261.<br />
[79] Compare for example Dym and McKean (1976), p.84. Note that there are processes which<br />
are singular in the linear sense but allow a perfect nonlinear prediction. An example can be<br />
found in Scarpellini (1979), p.295.<br />
[80] For example by Kakutani (1950).<br />
[81] von Mises (1919). Compare also his later books von Mises (1928), von Mises (1931), von<br />
Mises (1964).<br />
[82] Church (1940).<br />
[83] Post (1936), Turing (1936).<br />
[84] Church (1936).<br />
[85] Kolmogorov (1983a), p.39.<br />
[86] Kolmogorov (1963), p.369.<br />
[87] Compare also Kolmogorov (1968a), Kolmogorov (1968b), Kolmogorov (1983a), Kolmogorov<br />
(1983b), Kolmogorov <strong>and</strong> Uspenskii (1988). For a review, compare Zvonkin <strong>and</strong><br />
Levin (1970).<br />
[88] Compare Solomon<strong>of</strong>f (1964), Chaitin (1966), Chaitin (1969), Chaitin (1970).<br />
[89] Martin-Löf (1966), Martin-Löf (1969b).<br />
[90] Schnorr (1969), Schnorr (1970a), Schnorr (1970b), Schnorr (1971a), Schnorr(1971b) ,<br />
Schnorr (1973).<br />
[91] A function C : I → ℝ is called computable if there is a recursive function R such that<br />
|R(n, w) − C(w)| < 2^(−n) for all w ∈ I and all n ∈ {1, 2, 3, ...}. Recursive functions are functions<br />
computable with the aid of a Turing machine.<br />
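This definition can be made concrete: C(w) = e^w on I = [0, 1] is computable in this sense, because a truncated Taylor series in exact rational arithmetic yields an R(n, w) accurate to within 2^(−n). A Python sketch (the tail bound used below is a standard, deliberately conservative estimate):

```python
from fractions import Fraction
import math

def R(n, w):
    """Rational approximation of exp(w) for rational w in [0, 1], accurate
    to within 2**(-n), via the Taylor series computed in exact arithmetic."""
    w = Fraction(w)
    eps = Fraction(1, 2 ** n)
    total = Fraction(1)   # degree-0 term
    term = Fraction(1)    # current term w**k / k!
    k = 1
    while True:
        term = term * w / k
        total += term
        # Tail after degree k is at most 2 * w**(k+1)/(k+1)! = 2*term*w/(k+1)
        # (valid for 0 <= w <= 1).
        if 2 * term * w <= eps * (k + 1):
            return total
        k += 1

approx = R(20, Fraction(1, 2))
print(abs(float(approx) - math.exp(0.5)) < 2 ** -20)
```

Each call terminates after finitely many exact rational operations, so R is recursive in the intended sense, and the returned fraction carries a proven error bound rather than a floating-point guess.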
[92] For a review <strong>of</strong> modern algorithmic probability theory, compare Schnorr (1971b).<br />
[93] Compare <strong>for</strong> example Kronfli (1971).<br />
[94] Kamber (1964), §7, <strong>and</strong> Kamber (1965), §14.<br />
[95] For the theory of such minimal dilations in Hilbert space, compare Sz.-Nagy<br />
<strong>and</strong> Foiaş (1970). More generally, Antoniou <strong>and</strong> Gustafson (1997) have shown that an arbitrary<br />
Markov chain can be dilated to a unique minimal deterministic dynamical system.<br />
[96] For example, every continuous regular Gaussian stochastic process can be generated by a<br />
deterministic conservative <strong>and</strong> reversible linear Hamiltonian system with an infinite-dimensional<br />
phase space. For an explicit construction, compare <strong>for</strong> instance Picci (1986), Picci<br />
(1988).<br />
[97] These decay constants are not “an invariable property <strong>of</strong> the nucleus, unchangeable by any
external influences” (as claimed by Max Born (1949), p.172), but depend <strong>for</strong> example on the<br />
degree <strong>of</strong> ionization <strong>of</strong> the atom.<br />
[98] In quantum theory, a Boolean context is described by a commutative W*-algebra which can<br />
be generated by a single selfadjoint operator, called the observable <strong>of</strong> the experiment. The<br />
expectation value <strong>of</strong> the operator-valued spectral measure <strong>of</strong> this observable is exactly the<br />
probability measure <strong>for</strong> the statistical description <strong>of</strong> the experiment in terms <strong>of</strong> a classical<br />
Kolmogorov probability space.<br />
[99] The claim by Hans Reichenbach (1949), p.15, "dass das Kausalprinzip in keiner Weise mit<br />
der Physik der Quanta verträglich ist" (that the causality principle is in no way compatible with<br />
quantum physics), is valid only if one arbitrarily restricts the domain of<br />
the causality principle to Boolean logic.<br />
[100] For an introduction, compare Jauch (1974) <strong>and</strong> Beltrametti <strong>and</strong> Cassinelli (1981), chapters<br />
11 <strong>and</strong> 26.<br />
[101] Compare <strong>for</strong> example Gudder <strong>and</strong> Hudson (1978).<br />
[102] Compare Watanabe (1967), Watanabe (1969b), Schadach (1973). For a concrete application<br />
<strong>of</strong> non-Boolean pattern recognition <strong>for</strong> medical diagnosis, compare Schadach (1973).<br />
[103] Compare <strong>for</strong> example Watanabe (1969a), chapter 9.<br />
[104] Watanabe (1961).<br />
[105] Letter <strong>of</strong> June 3, 1952, by Wolfgang Pauli to Markus Fierz, quoted from von Meyenn<br />
(1996), p.634.<br />
References<br />
Antoniou, I. <strong>and</strong> Gustafson, K. (1997). From irreversible Markov semigroups to chaotic dynamics.<br />
Physica A, 236, 296- 308.<br />
Ayer, A. J. (1957). The conception <strong>of</strong> probability as a logical relation. In: S. Körner (Ed.): Observation<br />
<strong>and</strong> Interpretation in the Philosophy <strong>of</strong> Physics. New York: Dover Publications. pp.<br />
12–17.<br />
Beltrametti, E. G. <strong>and</strong> Cassinelli, G. (1981). The Logic <strong>of</strong> Quantum Mechanics. London: Addison-<br />
Wesley.<br />
Bernoulli, J. (1713). Ars conject<strong>and</strong>i. Basel. German translation by R. Haussner under the title <strong>of</strong><br />
Wahrscheinlichkeitsrechnung. Leipzig: Engelmann, 1899.<br />
Beutler, F. J. (1958a). A further note on differentiability <strong>of</strong> auto-correlation functions. Proceedings<br />
<strong>of</strong> the Institute <strong>of</strong> Radio Engineers, 45, 1759- 1760.<br />
Beutler, F. J. (1958b). A further note on differentiability <strong>of</strong> auto-correlation functions. Author’s<br />
comments. Proceedings <strong>of</strong> the Institute <strong>of</strong> Radio Engineers, 46, 1759–1760.<br />
Birkhoff, G. and von Neumann, J. (1936). The logic of quantum mechanics. Annals of Mathematics,<br />
37, 823- 843.<br />
Birkh<strong>of</strong>f, G. D. (1931). Pro<strong>of</strong> <strong>of</strong> the ergodic theorem. Proceedings <strong>of</strong> the National Academy <strong>of</strong> Sciences<br />
<strong>of</strong> the United States <strong>of</strong> America, 17, 656–660.<br />
Birkh<strong>of</strong>f, G. D. <strong>and</strong> Koopman, B. O. (1932). Recent contributions to the ergodic theory. Proceedings<br />
<strong>of</strong> the National Academy <strong>of</strong> Sciences <strong>of</strong> the United States <strong>of</strong> America, 18, 279–282.<br />
Bochner, S. (1932). Vorlesungen über Fouriersche Integrale. Leipzig: Akademische Verlagsgesellschaft.<br />
Boole, G. (1854). An Investigation <strong>of</strong> the Laws <strong>of</strong> Thought. London: Macmillan. Reprint (1958).<br />
New York: Dover Publication.<br />
Born, M. (1949). Einstein’s statistical theories. In: P. A. Schilpp (Ed.): Albert Einstein: Philosopher-Scientist<br />
. Evanston, Illinois: Library <strong>of</strong> Living Philosophers. pp.163–177.<br />
Born, M. (1955a). Ist die klassische Mechanik wirklich deterministisch? Physikalische Blätter, 11,<br />
49- 54.<br />
Born, M. (1955b). Continuity, determinism <strong>and</strong> reality. Danske Videnskabernes Selskab Mathematisk<br />
Fysiske Meddelelser, 30, No.2, pp.1- 26.<br />
Boyd, R. (1972). Determinism, laws, <strong>and</strong> predictability in principle. Philosophy <strong>of</strong> Science, 39,<br />
431- 450.
Breiman, L. (1968). <strong>Probability</strong> . Reading, Massachusetts: Addison-Wesley.<br />
Brennan, D. G. (1957). Smooth r<strong>and</strong>om functions need not have smooth correlation functions.<br />
Proceedings <strong>of</strong> the Institute <strong>of</strong> Radio Engineers, 45, 1016- 1017.<br />
Brennan, D. G. (1958). A further note on differentiability <strong>of</strong> auto-correlation functions. Proceedings<br />
<strong>of</strong> the Institute <strong>of</strong> Radio Engineers, 46, 1758- 1759.<br />
Carnap, R. (1945). The two concepts <strong>of</strong> probability. Philosophy <strong>and</strong> Phenomenological Research,<br />
5, 513- 532.<br />
Carnap, R. (1950). Logical Foundations <strong>of</strong> <strong>Probability</strong>. Chicago: University <strong>of</strong> Chicago Press.<br />
2nd edition. 1962.<br />
Carnap, R. (1952). The Continuum <strong>of</strong> Inductive Methods. Chicago: University <strong>of</strong> Chicago Press.<br />
Carnap, R. (1963). Intellectual autobiography. In: P. A. Schilpp (Ed.): The Philosophy <strong>of</strong> Rudolf<br />
Carnap. La Salle, Illinois: Open Court. pp.1–84.<br />
Carnap, R. <strong>and</strong> Jeffrey, R. C. (1971). Studies in Inductive Logic <strong>and</strong> <strong>Probability</strong>. Volume I. Berkeley:<br />
University <strong>of</strong> Cali<strong>for</strong>nia Press.<br />
Chaitin, G. (1966). On the length <strong>of</strong> programs <strong>for</strong> computing finite binary sequences. Journal <strong>of</strong><br />
the Association <strong>for</strong> Computing Machinery, 13, 547- 569.<br />
Chaitin, G. (1969). On the length <strong>of</strong> programs <strong>for</strong> computing finite binary sequences: Statistical<br />
considerations. Journal <strong>of</strong> the Association <strong>for</strong> Computing Machinery, 16, 143- 159.<br />
Chaitin, G. (1970). On the difficulty <strong>of</strong> computations. IEEE Transactions on In<strong>for</strong>mation <strong>Theory</strong>,<br />
IT-16, 5- 9.<br />
Church, A. (1936). An unsolvable problem <strong>of</strong> elementary number theory. The American Journal<br />
<strong>of</strong> Mathematics, 58, 345- 363.<br />
Church, A. (1940). On the concept <strong>of</strong> a r<strong>and</strong>om sequence. Bulletin <strong>of</strong> the American Mathematical<br />
<strong>Society</strong>, 46, 130- 135.<br />
Cohn, D. L. (1980). Measure <strong>Theory</strong>. Boston: Birkhäuser.<br />
Cournot, A. A. (1843). Exposition de la théorie des chances et des probabilités. Paris.<br />
Cramér, H. (1939). On the representation <strong>of</strong> a function by certain Fourier integrals. Transactions<br />
<strong>of</strong> the American Mathematical <strong>Society</strong>, 46, 191- 201.<br />
de Finetti, B. (1937). La prévision: ses lois logiques, ses sources subjectives. Annales de l’Institut<br />
Henri Poincaré, 7, 1- 68.<br />
de Finetti, B. (1972). <strong>Probability</strong>, Induction <strong>and</strong> Statistics. The Art <strong>of</strong> Guessing. London: Wiley.<br />
de Finetti, B. (1974). <strong>Theory</strong> <strong>of</strong> <strong>Probability</strong>. A Critical Introductory Treatment. Volume 1. London:<br />
Wiley.<br />
de Finetti, B. (1975). <strong>Theory</strong> <strong>of</strong> <strong>Probability</strong>. A Critical Introductory Treatment. Volume 2. London:<br />
Wiley.<br />
Doob, J. L. (1953). Stochastic Processes. New York: Wiley.<br />
Dym, H. <strong>and</strong> McKean, H. P. (1976). Gaussian Processes, Function <strong>Theory</strong>, <strong>and</strong> the Inverse Spectral<br />
Problem. New York: Academic Press.<br />
Earman, J. (1986). A Primer on Determinism. Dordrecht: Reidel.<br />
Einstein, A. (1905). Über die von der molekularkinetischen Theorie der Wärme ge<strong>for</strong>derte Bewegung<br />
von in ruhenden Flüssigkeiten suspendierter Teilchen. Annalen der Physik, 17, 549- 560.<br />
Einstein, A. (1906). Zur Theorie der Brownschen Bewegung. Annalen der Physik, 19, 371- 381.<br />
Einstein, A. (1914a). Méthode pour la détermination de valeurs statistiques d’observations concernant<br />
des gr<strong>and</strong>eurs soumises à des fluctuations irrégulières. Archives des sciences physiques<br />
et naturelles, 37, 254- 256.<br />
Einstein, A. (1914b). Eine Methode zur statistischen Verwertung von Beobachtungen scheinbar<br />
unregelmässig quasiperiodisch verlaufender Vorgänge. Unpublished Manuscript. Reprinted<br />
in: M. J. Klein, A. J. Kox, J. Renn <strong>and</strong> R. Schulmann (Eds.). The Collected Papers <strong>of</strong> Albert<br />
Einstein. Volume 4. The Swiss Years, 1912–1914. Princeton: Princeton University Press. 1995.<br />
pp. 603–607.<br />
Enz, C. P. <strong>and</strong> von Meyenn, K. (1994). Wolfgang Pauli. Writings on Physics <strong>and</strong> Philosophy.<br />
Berlin: Springer.<br />
Feigl, H. (1953). Notes on causality. In: H. Feigl and M. Brodbeck (Eds.): Readings in the Philosophy<br />
of Science. New York: Appleton-Century-Crofts.<br />
Fine, T. L. (1973). Theories <strong>of</strong> <strong>Probability</strong>. An Examination <strong>of</strong> Foundations. New York: Academic<br />
Press.<br />
Fortet, R. (1958). Recent Advances in <strong>Probability</strong> <strong>Theory</strong>. Surveys in Applied Mathematics. IV.<br />
Some Aspects <strong>of</strong> Analysis <strong>and</strong> <strong>Probability</strong>. New York: Wiley, pp.169- 240.
Gibbs, J. W. (1902). Elementary Principles in Statistical Mechanics. New Haven: Yale University<br />
Press.<br />
Gillies, D. A. (1973). An Objective <strong>Theory</strong> <strong>of</strong> <strong>Probability</strong>. London: Methuen.<br />
Gnedenko, B. V. and Kolmogorov, A. N. (1954). Limit Distributions for Sums of Independent Random<br />
Variables. Reading, Massachusetts: Addison-Wesley.<br />
Good, I. J. (1965). The Estimation <strong>of</strong> Probabilities. An Essay on Modern Bayesian Methods. Cambridge,<br />
Massachusetts: MIT Press.<br />
Gudder, S. P. <strong>and</strong> Hudson, R. L. (1978). A noncommutative probability theory. Transactions <strong>of</strong> the<br />
American Mathematical <strong>Society</strong>, 245, 1- 41.<br />
Haas, A. (1936). Commentary <strong>of</strong> the Scientific Writings <strong>of</strong> J. Willard Gibbs. New Haven: Yale<br />
University Press.<br />
Halmos, P. R. (1944). The foundations <strong>of</strong> probability. American Mathematical Monthly, 51, 493-<br />
510.<br />
Hanner, O. (1950). Deterministic <strong>and</strong> non-deterministic stationary r<strong>and</strong>om processes. Arkiv för<br />
Matematik, 1, 161- 177.<br />
Hille, E. <strong>and</strong> Phillips, R. S. (1957). Functional Analysis <strong>and</strong> Semi-groups. Providence, Rhode Isl<strong>and</strong>:<br />
American Mathematical <strong>Society</strong>.<br />
Jauch, J. M. (1974). The quantum probability calculus. Synthese, 29, 131- 154.<br />
Jeffrey, R. C. (1965). The Logic <strong>of</strong> Decision. New York: McGraw-Hill.<br />
Jeffreys, H. (1939). <strong>Theory</strong> <strong>of</strong> <strong>Probability</strong>. Ox<strong>for</strong>d: Clarendon Press. 2nd edition, 1948; 3rd edition,<br />
1961.<br />
Kakutani, S. (1950). Review <strong>of</strong> “Extrapolation, interpolation <strong>and</strong> smoothing <strong>of</strong> stationary time series”<br />
by Norbert Wiener. Bulletin <strong>of</strong> the American Mathematical <strong>Society</strong>, 56, 378- 381.<br />
Kallianpur, G. (1961). Some ramifications <strong>of</strong> Wiener’s ideas on nonlinear prediction. In: P.<br />
Masani (Ed.), Norbert Wiener. Collected Works with Commentaries. Volume III. Cambridge,<br />
Massachusetts: MIT Press, pp.402–424.<br />
Kamber, F. (1964). Die Struktur des Aussagenkalküls in einer physikalischen Theorie. Nachrichten<br />
der Akademie der Wissenschaften, Göttingen. Mathematisch Physikalische Klasse, 10,<br />
103- 124.<br />
Kamber, F. (1965). Zweiwertige Wahrscheinlichkeitsfunktionen auf orthokomplementären Verbänden.<br />
Mathematische Annalen, 158, 158- 196.<br />
Kappos, D. A. (1969). <strong>Probability</strong> Algebras <strong>and</strong> Stochastic Spaces. New York: Academic Press.<br />
Keynes, J. M. (1921). A Treatise on the Principles <strong>of</strong> <strong>Probability</strong>. London: Macmillan.<br />
Khintchine, A. (1934). Korrelationstheorie der stationären stochastischen Prozesse. Mathematische<br />
Annalen, 109, 604- 615.<br />
Khrennikov, A. (1994). p-Adic Valued Distributions in Mathematical Physics. Dordrecht: Kluwer.<br />
Kolmogor<strong>of</strong>f, A. (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung. Berlin: Springer.<br />
Kolmogor<strong>of</strong>f, A. (1948). Algèbres de Boole métriques complètes. VI. Zjazd Matematyków Polskich.<br />
Annales de la Societe Polonaise de Mathematique, 20, 21–30.<br />
Kolmogorov, A. N. (1963). On tables <strong>of</strong> r<strong>and</strong>om numbers. Sankhyá. The Indian Journal <strong>of</strong> Statistics<br />
A, 25, 369- 376.<br />
Kolmogorov, A. N. (1968a). Three approaches to the quantitative definition <strong>of</strong> in<strong>for</strong>mation. International<br />
Journal <strong>of</strong> Computer Mathematics, 2, 157- 168. Russian original in: Problemy<br />
Peredachy In<strong>for</strong>matsii 1, 3–11 (1965).<br />
Kolmogorov, A. N. (1968b). Logical basis for information theory and probability theory. IEEE<br />
Transactions on In<strong>for</strong>mation <strong>Theory</strong>, IT-14, 662–664.<br />
Kolmogorov, A. N. (1983a). Combinatorial foundations <strong>of</strong> in<strong>for</strong>mation theory <strong>and</strong> the calculus <strong>of</strong><br />
probability. Russian Mathematical Surveys, 38: 4, 29–40.<br />
Kolmogorov, A. N. (1983b). On logical foundations <strong>of</strong> probability theory. <strong>Probability</strong> <strong>Theory</strong> <strong>and</strong><br />
Mathematical Statistics. Lecture Notes in Mathematics. Berlin: Springer, pp.1–5.<br />
Kolmogorov, A. N. <strong>and</strong> Fomin, S. V. (1961). <strong>Elements</strong> <strong>of</strong> the <strong>Theory</strong> <strong>of</strong> Functions <strong>and</strong> Functional<br />
Analysis. Volume 2. Measure. The Lebesgue Integral. Hilbert Space. Albany: Graylock Press.<br />
Kolmogorov, A. N. <strong>and</strong> Uspenskii, V. A. (1988). Algorithms <strong>and</strong> r<strong>and</strong>omness. <strong>Theory</strong> <strong>of</strong> <strong>Probability</strong><br />
<strong>and</strong> its Applications, 32, 389- 412.<br />
König, H. <strong>and</strong> Tobergte, J. (1963). Reversibilität und Irreversibilität von linearen dissipativen<br />
Systemen. Journal für die reine und angew<strong>and</strong>te Mathematik, 212, 104- 108.<br />
Koopman, B. O. (1931). Hamiltonian systems <strong>and</strong> trans<strong>for</strong>mations in Hilbert space. Proceedings<br />
<strong>of</strong> the National Academy <strong>of</strong> Sciences <strong>of</strong> the United States <strong>of</strong> America, 17, 315–318.
Koopman, B. O. (1940a). The bases of probability. Bulletin of the American Mathematical Society, 46, 763–774.
Koopman, B. O. (1940b). The axioms and algebra of intuitive probability. Annals of Mathematics, 41, 269–292.
Koopman, B. O. (1941). Intuitive probabilities and sequences. Annals of Mathematics, 42, 169–187.
Koopman, B. O. and von Neumann, J. (1932). Dynamical systems of continuous spectra. Proceedings of the National Academy of Sciences of the United States of America, 18, 255–263.
Krein, M. G. (1945). On a generalization of some investigations of G. Szegö, W. M. Smirnov, and A. N. Kolmogorov. Doklady Akademii Nauk SSSR, 46, 91–94 [in Russian].
Krein, M. G. (1945). On a problem of extrapolation of A. N. Kolmogorov. Doklady Akademii Nauk SSSR, 46, 306–309 [in Russian].
Kronfli, N. S. (1971). Atomicity and determinism in Boolean systems. International Journal of Theoretical Physics, 4, 141–143.
Kyburg, H. E. and Smokler, H. E. (1964). Studies in Subjective Probability. New York: Wiley.
Laha, R. G. and Rohatgi, V. K. (1979). Probability Theory. New York: Wiley.
Laplace, P. S. (1814). Essai Philosophique sur les Probabilités. English translation from the sixth French edition under the title: A Philosophical Essay on Probabilities. 1951. New York: Dover Publications.
Lindblad, G. (1993). Irreversibility and randomness in linear response theory. Journal of Statistical Physics, 72, 539–554.
Loomis, L. H. (1947). On the representation of σ-complete Boolean algebras. Bulletin of the American Mathematical Society, 53, 757–760.
Lorenzen, P. (1978). Eine konstruktive Deutung des Dualismus in der Wahrscheinlichkeitstheorie. Zeitschrift für allgemeine Wissenschaftstheorie, 2, 256–275.
Łoś, J. (1955). On the axiomatic treatment of probability. Colloquium Mathematicum (Wroclaw), 3, 125–137.
Maistrov, L. E. (1974). Probability Theory. A Historical Sketch. New York: Wiley.
Martin-Löf, P. (1966). The definition of random sequences. Information and Control, 9, 602–619.
Martin-Löf, P. (1969a). The literature on von Mises’ kollektivs revisited. Theoria. A Swedish Journal of Philosophy, 35, 12–37.
Martin-Löf, P. (1969b). Algorithms and randomness. Review of the International Statistical Institute, 37, 265–272.
Masani, P. (1979). Commentary on the memoire [30a] on generalized harmonic analysis. In: P. Masani (Ed.), Norbert Wiener. Collected Works with Commentaries. Volume II. Cambridge, Massachusetts: MIT Press. pp. 333–379.
Masani, R. R. (1990). Norbert Wiener, 1894–1964. Basel: Birkhäuser.
Meixner, J. (1961). Reversibilität und Irreversibilität in linearen passiven Systemen. Zeitschrift für Naturforschung, 16a, 721–726.
Meixner, J. (1965). Linear passive systems. In: J. Meixner (Ed.), Statistical Mechanics of Equilibrium and Non-equilibrium. Amsterdam: North-Holland.
Middleton, D. (1960). Statistical Communication Theory. New York: McGraw-Hill.
Nielsen, O. E. (1997). An Introduction to Integration and Measure Theory. New York: Wiley.
Paley, R. E. A. C., Wiener, N. and Zygmund, A. (1933). Notes on random functions. Mathematische Zeitschrift, 37, 647–668.
Parthasarathy, K. R. (1967). Probability Measures on Metric Spaces. New York: Academic Press.
Pauli, W. (1954). Wahrscheinlichkeit und Physik. Dialectica, 8, 112–124.
Perrin, J. (1906). La discontinuité de la matière. Revue du mois, 1, 323–343.
Picci, G. (1986). Application of stochastic realization theory to a fundamental problem of statistical physics. In: C. I. Byrnes and A. Lindquist (Eds.), Modelling, Identification and Robust Control. Amsterdam: North-Holland. pp. 211–258.
Picci, G. (1988). Hamiltonian representation of stationary processes. In: I. Gohberg, J. W. Helton and L. Rodman (Eds.), Operator Theory: Advances and Applications. Basel: Birkhäuser. pp. 193–215.
Pinsker, M. S. (1964). Information and Information Stability of Random Variables and Processes. San Francisco: Holden-Day.
Post, E. L. (1936). Finite combinatory processes — formulation 1. Journal of Symbolic Logic, 1, 103–105.
Prohorov, Yu. V. and Rozanov, Yu. A. (1969). Probability Theory. Berlin: Springer.
Reichenbach, H. (1949). Philosophische Probleme der Quantenmechanik. Basel: Birkhäuser.
Reichenbach, H. (1994). Wahrscheinlichkeitslehre. Braunschweig: Vieweg. 2nd edition, revised and edited by Godehard Link on the basis of the expanded American edition. Volume 7 of the Gesammelte Werke of Hans Reichenbach.
Rényi, A. (1955). A new axiomatic theory of probability. Acta Mathematica Academiae Scientiarum Hungaricae, 6, 285–335.
Rényi, A. (1970a). Probability Theory. Amsterdam: North-Holland.
Rényi, A. (1970b). Foundations of Probability. San Francisco: Holden-Day.
Rosenblatt, M. (1971). Markov Processes: Structure and Asymptotic Behavior. Berlin: Springer.
Ross, W. D. (1924). Aristotle’s Metaphysics. Text and Commentary. Oxford: Clarendon Press.
Rozanov, Yu. A. (1967). Stationary Random Processes. San Francisco: Holden-Day.
Russell, B. (1948). Human Knowledge. Its Scope and Limits. London: George Allen and Unwin.
Savage, L. J. (1954). The Foundations of Statistics. New York: Wiley.
Savage, L. J. (1962). The Foundations of Statistical Inference. A Discussion. London: Methuen.
Scarpellini, B. (1979). Predicting the future of functions on flows. Mathematical Systems Theory, 12, 281–296.
Schadach, D. J. (1973). Nicht-Boolesche Wahrscheinlichkeitsmasse für Teilraummethoden in der Zeichenerkennung. In: T. Einsele, W. Giloi and H.-H. Nagel (Eds.), Lecture Notes in Economics and Mathematical Systems. Vol. 83. Berlin: Springer. pp. 29–35.
Scheibe, E. (1964). Die kontingenten Aussagen in der Physik. Frankfurt: Athenäum Verlag.
Scheibe, E. (1973). The Logical Analysis of Quantum Mechanics. Oxford: Pergamon Press.
Schnorr, C. P. (1969). Eine Bemerkung zum Begriff der zufälligen Folge. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 14, 27–35.
Schnorr, C. P. (1970a). Über die Definition von effektiven Zufallstests. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 15, 297–312, 313–328.
Schnorr, C. P. (1970b). Klassifikation der Zufallsgesetze nach Komplexität und Ordnung. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 16, 1–26.
Schnorr, C. P. (1971a). A unified approach to the definition of random sequences. Mathematical Systems Theory, 5, 246–258.
Schnorr, C. P. (1971b). Zufälligkeit und Wahrscheinlichkeit. Eine algorithmische Begründung der Wahrscheinlichkeitstheorie. Lecture Notes in Mathematics, Volume 218. Berlin: Springer.
Schnorr, C. P. (1973). Process complexity and effective random tests. Journal of Computer and System Sciences, 7, 376–388.
Schuster, H. G. (1984). Deterministic Chaos. An Introduction. Weinheim: Physik-Verlag.
Schwartz, L. (1973). Radon Measures on Arbitrary Topological Spaces and Cylindrical Measures. London: Oxford University Press.
Scriven, M. (1965). On essential unpredictability in human behavior. In: B. B. Wolman and E. Nagel (Eds.), Scientific Psychology: Principles and Approaches. New York: Basic Books.
Sikorski, R. (1949). On the inducing of homomorphisms by mappings. Fundamenta Mathematicae, 36, 7–22.
Sikorski, R. (1969). Boolean Algebras. Berlin: Springer.
Solomonoff, R. J. (1964). A formal theory of inductive inference. Information and Control, 7, 1–22, 224–254.
Stone, M. H. (1936). The theory of representations for Boolean algebras. Transactions of the American Mathematical Society, 40, 37–111.
Sz.-Nagy, B. and Foiaş, C. (1970). Harmonic Analysis of Operators on Hilbert Space. Amsterdam: North-Holland.
Tornier, E. (1933). Grundlagen der Wahrscheinlichkeitsrechnung. Acta Mathematica, 60, 239–380.
Turing, A. M. (1936). On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 42, 230–256. Corrections: Ibid., 43 (1937), 544–546.
Venn, J. (1866). The Logic of Chance. London. An unaltered reprint of the third edition of 1888 was published by Chelsea, New York, 1962.
von Laue, M. (1955). Ist die klassische Physik wirklich deterministisch? Physikalische Blätter, 11, 269–270.
von Meyenn, K. (1996). Wolfgang Pauli. Wissenschaftlicher Briefwechsel, Band IV, Teil I: 1950–1952. Berlin: Springer-Verlag.
von Mises, R. (1919). Grundlagen der Wahrscheinlichkeitsrechnung. Mathematische Zeitschrift, 5, 52–99.
von Mises, R. (1928). Wahrscheinlichkeit, Statistik und Wahrheit. Wien: Springer.
von Mises, R. (1931). Wahrscheinlichkeitsrechnung und ihre Anwendung in der Statistik und theoretischen Physik. Leipzig: Deuticke.
von Mises, R. (1964). Mathematical Theory of Probability and Statistics. Edited and Complemented by Hilda Geiringer. New York: Academic Press.
von Neumann, J. (1932a). Zur Operatorenmethode in der klassischen Mechanik. Annals of Mathematics, 33, 587–642, 789–791.
von Neumann, J. (1932b). Proof of the quasiergodic hypothesis. Proceedings of the National Academy of Sciences of the United States of America, 18, 70–82.
von Plato, J. (1994). Creating Modern Probability: Its Mathematics, Physics, and Philosophy in Historical Perspective. Cambridge: Cambridge University Press.
von Weizsäcker, C. F. (1973). Probability and quantum mechanics. British Journal for the Philosophy of Science, 24, 321–337.
von Weizsäcker, C. F. (1985). Aufbau der Physik. München: Hanser Verlag.
Waismann, F. (1930). Logische Analyse des Wahrscheinlichkeitsbegriffs. Erkenntnis, 1, 228–248.
Watanabe, S. (1961). A model of mind-body relation in terms of modular logic. Synthese, 13, 261–302.
Watanabe, S. (1967). Karhunen–Loève expansion and factor analysis. Theoretical remarks and applications. Transactions of the Fourth Prague Conference on Information Theory, Statistical Decision Functions, Random Processes (Prague, 1965). Prague: Academia. pp. 635–660.
Watanabe, S. (1969a). Knowing and Guessing. A Quantitative Study of Inference and Information. New York: Wiley.
Watanabe, S. (1969b). Modified concepts of logic, probability, and information based on generalized continuous characteristic function. Information and Control, 15, 1–21.
Whittaker, E. T. (1943). Chance, freewill and necessity in the scientific conception of the universe. Proceedings of the Physical Society (London), 55, 459–471.
Wiener, N. (1930). Generalized harmonic analysis. Acta Mathematica, 55, 117–258.
Wiener, N. (1942). Response of a nonlinear device to noise. Cambridge, Massachusetts: M.I.T. Radiation Laboratory. Report No. V-186. April 6, 1942.
Wiener, N. (1949). Extrapolation, Interpolation, and Smoothing of Stationary Time Series. With Engineering Applications. New York: MIT Technology Press and Wiley.
Wiener, N. (1958). A further note on differentiability of auto-correlation functions. Proceedings of the Institute of Radio Engineers, 46, 1760.
Wiener, N. and Akutowicz, E. J. (1957). The definition and ergodic properties of the stochastic adjoint of a unitary transformation. Rendiconti del Circolo Matematico di Palermo, 6, 205–217, 349.
Wold, H. (1938). A Study in the Analysis of Stationary Time Series. Stockholm: Almqvist and Wiksell.
Zvonkin, A. K. and Levin, L. A. (1970). The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms. Russian Mathematical Surveys, 25, 83–124.