
Journal of Scientific Exploration, Vol. 13, No. 4, pp. 579–613, 1999 0892-3310/99
© 1999 Society for Scientific Exploration

Basic Elements and Problems of Probability Theory

HANS PRIMAS

Laboratory of Physical Chemistry, ETH-Zentrum
CH-8092 Zürich, Switzerland
primas@phys.chem.ethz.ch

Abstract — After a brief review of ontic and epistemic descriptions, and of subjective, logical and statistical interpretations of probability, we summarize the traditional axiomatization of the calculus of probability in terms of Boolean algebras and its set-theoretical realization in terms of Kolmogorov probability spaces. Since the axioms of mathematical probability theory say nothing about the conceptual meaning of "randomness," one considers probability as a property of the generating conditions of a process, so that randomness can be related to predictability (or retrodictability). In the measure-theoretical codification of stochastic processes, genuine chance processes can be defined rigorously as so-called regular processes, which do not allow a long-term prediction. We stress that stochastic processes are equivalence classes of individual point functions, so that they do not refer to individual processes but only to an ensemble of statistically equivalent individual processes.

Less popular but conceptually more important than statistical descriptions are individual descriptions, which refer to individual chaotic processes. First, we review the individual description based on the generalized harmonic analysis of Norbert Wiener. It allows the definition of individual purely chaotic processes, which can be interpreted as trajectories of regular statistical stochastic processes. Another individual description refers to algorithmic procedures which connect the intrinsic randomness of a finite sequence with the complexity of the shortest program necessary to produce the sequence. Finally, we ask why there can be laws of chance. We argue that random events fulfill the laws of chance if and only if they can be reduced to (possibly hidden) deterministic events. This mathematical result may elucidate the fact that not all non-predictable events can be grasped by the methods of mathematical probability theory.

Keywords: probability — stochasticity — chaos — randomness — chance — determinism

Ontic and Epistemic Descriptions

Overview

One of the most important results of contemporary classical dynamics is the proof that the deterministic differential equations of some smooth classical Hamiltonian systems have solutions exhibiting irregular behavior. The classical view of physical determinism was eloquently formulated by Pierre Simon Laplace. While Newton believed that the stability of the solar system could only be achieved with the help of God, Laplace "had no need of that hypothesis" [1] since he could explain the solar system by deterministic Newtonian mechanics alone. Laplace discussed his doctrine of determinism in the introduction to his Philosophical Essay on Probability, in which he imagined a superhuman intelligence capable of grasping the initial conditions, at any fixed time, of all bodies and atoms of the universe, and all the forces acting upon them. For such a superhuman intelligence "nothing would be uncertain and the future, as the past, would be present to its eyes." [2] Laplace's reference to the future and the past implies that he refers to a fundamental theory with an unbroken time-reversal symmetry. His reference to a "superhuman intelligence" suggests that he is not referring to our possible knowledge of the world, but to things "as they really are." The manifest impossibility of ascertaining experimentally the exact initial conditions necessary for a description of things "as they really are" is what led Laplace to introduce a statistical description of the initial conditions in terms of probability theory. Later, Josiah Willard Gibbs introduced the idea of an ensemble of a very large number of imaginary copies of mutually uncorrelated individual systems, all dynamically precisely defined but not necessarily starting from precisely the same individual states. [3] The fact that a statistical description in the sense of Gibbs presupposes the existence of a well-defined individual description demonstrates that a coherent statistical interpretation in terms of an ensemble of individual systems requires an individual interpretation as a backing.

The empirical inaccessibility of the precise initial states of most physical systems requires a distinction between epistemic and ontic interpretations. [4] Epistemic interpretations refer to our knowledge of the properties or modes of reactions of observed systems. On the other hand, ontic interpretations refer to intrinsic properties of hypothetical individual entities, regardless of whether we know them or not, and independently of observational arrangements. Although ontic interpretations do not refer to our knowledge, there is a meaningful sense in which it is natural to speak of theoretical entities "as they really are," since in good theories they supply the indispensable explanatory power.

States which refer to an epistemic interpretation are called epistemic states; they refer to our knowledge. If this knowledge of the properties or modes of reactions of systems is expressed by probabilities in the sense of relative frequencies in a statistical ensemble of independently repeated experiments, we speak of a statistical interpretation and of statistical states. States which refer to an ontic interpretation are called ontic states. Ontic states are assumed to give a description of a system "as it really is," that is, independently of any influences due to observations or measurements. They refer to individual systems and are assumed to give an exhaustive description of a system. Since an ontic description does not encompass any concept of observation, ontic states do not refer to predictions of what happens in experiments. At this stage it is left open to what extent ontic states are knowable. An adopted ontology of the intrinsic description induces an operationally meaningful epistemic interpretation for every epistemic description: an epistemic state refers to our knowledge of an ontic state.

Cryptodeterministic Systems


In modern mathematical physics, Laplacian determinism is rephrased as Hadamard's principle of scientific determinism, according to which every initial ontic state of a physical system determines all future ontic states. [5] An ontically deterministic dynamical system which even in principle does not allow a precise forecast of its observable behavior in the remote future will be called cryptodeterministic. [6] Already Antoine Augustin Cournot (1801–1877) and John Venn (1834–1923) recognized clearly that the dynamics of complex classical dynamical systems may depend in an extremely sensitive way on the initial and boundary conditions. Even if we can determine these conditions with arbitrary but finite accuracy, the individual outcome cannot be predicted; the resulting chaotic dynamics allows only an epistemic description in terms of statistical frequencies. [7] The instability of such deterministic processes represents an objective feature of the corresponding probabilistic description. A typical experiment which demonstrates the objective probabilistic character of a cryptodeterministic mechanical system is Galton's desk. [8] The modern theory of deterministic chaos has shown how unpredictability can arise from the iteration of perfectly well-defined functions because of a sensitive dependence on initial conditions. [9] More precisely, the catchword "deterministic chaos" refers to ontically deterministic systems with such a sensitive dependence on the ontic initial state that no measurement on the system allows a long-term prediction of its ontic state.
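This sensitivity is easy to exhibit numerically. The following Python sketch (an illustration of the general idea, not taken from the paper) iterates the logistic map x ↦ 4x(1 − x), a standard textbook example of deterministic chaos, and counts how many steps two trajectories started an experimentally undetectable 10⁻¹⁰ apart need to separate macroscopically:

```python
def logistic(x):
    """One step of the deterministic logistic map at parameter 4."""
    return 4.0 * x * (1.0 - x)

# Two ontic initial states differing by an unmeasurably small amount.
x, y = 0.3, 0.3 + 1e-10
steps = 0
while abs(x - y) < 0.1 and steps < 500:
    x, y = logistic(x), logistic(y)
    steps += 1

# The separation grows roughly exponentially, so after a few dozen
# iterations the two trajectories are macroscopically different.
print(steps)
```

Since one step can amplify the separation by at most a factor of 4, at least 15 iterations are needed before the trajectories differ by 0.1; in practice they decorrelate after a few dozen steps, after which knowing the initial state to ten decimal places no longer helps with prediction.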

Predictions refer to inferences of the observable future behavior of a system from empirically estimated initial states. While in some simple systems the ontic laws of motion may allow one to forecast the observable behavior in the near future with great accuracy, ontic determinism implies neither epistemic predictability nor epistemic retrodictability. Laplace knew quite well that a perfect measurement of initial conditions is impossible, and he never asserted that deterministic systems are empirically predictable. Nevertheless, many positivists tried to define determinism by predictability. For example, according to Herbert Feigl:

The clarified (purified) concept of causation is defined in terms of predictability according to a law (or, more adequately, according to a set of laws). [10]

Such attempts are based on a notorious category mistake. Determinism does not deal with predictions. Determinism refers to an ontic description. On the other hand, predictability is an epistemic concept. Yet, epistemic statements are often confused with ontic assertions. For example, Max Born has claimed that classical point mechanics is not deterministic since there are unstable mechanical systems which are epistemically not predictable. [11] Similarly, it has been claimed that human behavior is not deterministic since it is not predictable. [12] A related mistaken claim is that "...an underlying deterministic mechanism would refute a probabilistic theory by contradicting the randomness which ...is demanded by such a theory." [13] As emphasized by John Earman:

The history of philosophy is littered with examples where ontology and epistemology have been stirred together into a confused and confusing brew. ...Producing an 'epistemological sense' of determinism is an abuse of language since we already have a perfectly adequate and more accurate term – prediction – and it also invites potentially misleading argumentation – e.g., in such-and-such a case prediction is not possible and, therefore, determinism fails. [14]

Kinds of Probability

Often, probability theory is considered the natural tool for an epistemic description of cryptodeterministic systems. However, this view is not as evident as is often thought. Both the virtue and the vice of modern probability theory lie in its split into a probability calculus and its conceptual foundation. Nowadays, mathematical probability theory is just a branch of pure mathematics, based on some axioms devoid of any interpretation. In this framework, the concepts "probability," "independence," etc. are conceptually unexplained notions; they have a purely mathematical meaning. While there is widespread agreement concerning the essential features of the calculus of probability, there are widely diverging opinions about what the referent of mathematical probability theory is. [15] While some authors claim that probability refers exclusively to ensembles, there are important problems which require a discussion of single random events or of individual chaotic functions. Furthermore, it is in no way evident that the calculus of axiomatic probability theory is appropriate for empirical science. In fact, "probability is one of the outstanding examples of the 'epistemological paradox' that we can successfully use our basic concepts without actually understanding them." [16]

Surprisingly often it is assumed that in a scientific context everybody means intuitively the same thing when speaking of "probability," and that the task of an interpretation consists only in exactly capturing this single intuitive idea. Even prominent thinkers could not free themselves from predilections which can be understood only from the historical development. For example, Friedrich Waismann [17] categorically maintains that there is no other motive for the introduction of probabilities than the incompleteness of our knowledge. Just as dogmatically, Richard von Mises [18] holds that, without exception, probabilities are empirical and that there is no possibility of revealing the values of probabilities with the aid of another science, e.g. mechanics. On the other hand, Harold Jeffreys maintains that "no 'objective' definition of probability in terms of actual or possible observations, or possible properties of the world, is admissible." [19] Leonard J. Savage claims that "personal, or subjective, probability is the only kind that makes reasonably rigorous sense." [20] However, despite many such statements to the contrary, we may state with some confidence that there is not just one single "correct" interpretation. There are various valid possibilities for interpreting mathematical probability theory. Moreover, the various interpretations do not fall neatly into disjoint categories. As Bertrand Russell underlines,

in such circumstances, the simplest course is to enumerate the axioms from which the theory can be deduced, and to decide that any concept which satisfies these axioms has an equal right, from the mathematician's point of view, to be called 'probability.' ... It must be understood that there is here no question of truth or falsehood. Any concept which satisfies the axioms may be taken to be mathematical probability. In fact, it might be desirable to adopt one interpretation in one context, and another in another. [21]

Subjective Probability

A probability interpretation is called objective if the probabilities are assumed to be independent of any human considerations. Subjective interpretations consider probability as a rational measure of the personal belief that the event in question occurs. A more operationalistic view defines subjective probability as the betting rate on an event which is fair according to the opinion of a given subject. It is required that the assessments a rational person makes be logically coherent, so that no logical contradictions exist among them. The postulate of coherence should make it impossible to set up a series of bets against a person obeying these requirements in such a manner that the person is sure to lose, regardless of the outcome of the events being wagered upon. Subjective probabilities depend on the degree of personal knowledge and ignorance concerning the events, objects or conditions under discussion. If the personal knowledge changes, the subjective probabilities change too. Often it is claimed to be evident that subjective probabilities have no place in a physical theory. However, subjective probability cannot be disposed of quite that simply. It is astonishing how many scientists uncompromisingly defend an objective interpretation without knowing any of the important contributions on subjective probability published in the last decades. Nowadays, there is a very considerable rational basis behind the concept of subjective probability. [22]

It is debatable how the pioneers would have interpreted probability, but their practice suggests that they dealt with some kind of "justified degree of belief." For example, in one of the first attempts to formulate mathematical "laws of chance," his Ars Conjectandi of 1713, Jakob Bernoulli characterized probability as a strength of expectation. [23] For Pierre Simon Laplace, probabilities represented a state of knowledge; he introduced a priori or geometric probabilities as the ratio of favorable to "equally possible" cases [24] — a definition of historical interest which, however, is both conceptually and mathematically inadequate.

The early subjective interpretations have long been out of date, but practicing statisticians have always recognized that subjective judgments are inevitable. In 1937, Bruno de Finetti made a fresh start in the theory of subjective probability by introducing the essential new notion of exchangeability. [25] de Finetti's subjective probability is a betting rate and refers to single events. A set of n distinct events E1, E2, ..., En is said to be exchangeable if any event depending on these events has the same subjective probability (de Finetti's betting rate) no matter how the Ej are chosen or labeled. Exchangeability is sufficient for the validity of the law of large numbers. The modern concept of subjective probability is not necessarily incompatible with that of objective probability. de Finetti's representation theorem gives a convincing explanation of how there can be wide inter-subjective agreement about the values of subjective probabilities. According to Savage, a rational man behaves as if he used subjective probabilities. [26]
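De Finetti's representation theorem says that an infinitely exchangeable sequence of binary events behaves like independent Bernoulli trials whose unknown bias is itself drawn from some prior. The following Python sketch (an illustration of this direction of the theorem; the Beta(2, 3) prior is an arbitrary assumption, not from the paper) computes the probability of a binary sequence under such a mixture and checks that it depends only on the number of successes, not on their order:

```python
from math import gamma

def beta_fn(a, b):
    """Euler Beta function B(a, b), expressed via the Gamma function."""
    return gamma(a) * gamma(b) / gamma(a + b)

def seq_prob(seq, a=2.0, b=3.0):
    """Probability of a 0/1 sequence under a Beta(a, b) mixture of
    i.i.d. Bernoulli trials; the closed form B(a+k, b+n-k) / B(a, b)
    depends only on k = number of ones and n = length of seq."""
    k, n = sum(seq), len(seq)
    return beta_fn(a + k, b + n - k) / beta_fn(a, b)

# Exchangeability: reordering the outcomes leaves the probability unchanged.
p1 = seq_prob([1, 0, 1, 0])
p2 = seq_prob([0, 0, 1, 1])
```

Any such mixture is exchangeable; the representation theorem states the converse, that every infinitely exchangeable binary sequence arises in this way.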

Inductive Probability

Inductive probability belongs to the field of scientific inductive inference. Induction is the problem of how to make inferences from observed to unobserved (especially future) cases. It is an empirical fact that we can learn from experience, but the problem is that nothing concerning the future can be logically inferred from past experience. It is the merit of the modern approaches to have recognized that induction has to be some sort of probabilistic inference, and that the induction problem belongs to a generalized logic. Logical probability is related to, but not identical with, subjective probability. Subjective probability is taken to represent the extent to which a person believes a statement is true. The logical interpretation of probability theory is a generalization of classical implication, and it is based not on empirical facts but on the logical analysis of these. The inductive probability is the degree of confirmation of a hypothesis with reference to the available evidence in favor of this hypothesis.

The logic of probable inference and the logical probability concept go back to the work of John Maynard Keynes, who in 1921 defined probability as a "logical degree of belief." [27] This approach has been extended by Bernard Osgood Koopman [28] and especially by Rudolf Carnap to a comprehensive system of inductive logic. [29] Inductive probabilities occur in science mainly in connection with judgments of empirical results; they are always related to a single case and are never to be interpreted as frequencies. The inductive probability is also called "non-demonstrative inference," "intuitive probability" (Koopman), "logical probability" or "probability₁" (Carnap). A hard nut to crack in probabilistic logic is the proper choice of a probability measure — it cannot be estimated empirically. Given a certain measure, inductive logic works with a fixed set of rules, so that all inferences can be effected automatically by a general computer. In this sense inductive probabilities are objective quantities. [30]
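The mechanical character of such inferences can be sketched for a toy language (an illustration of the Carnap-style setup, not from the paper): two predicates F, G applied to two individuals a, b yield sixteen state descriptions, a measure m is placed on them, and the degree of confirmation is a conditional probability computed by fixed rules. Here the uniform measure over state descriptions is assumed:

```python
from itertools import product

# Two predicates applied to two individuals give 4 atomic sentences;
# a state description assigns each of them a truth value.
atoms = ['Fa', 'Fb', 'Ga', 'Gb']
states = [dict(zip(atoms, vals)) for vals in product([True, False], repeat=4)]

def m(prop):
    """Uniform measure over the state descriptions satisfying `prop`."""
    return sum(1 for s in states if prop(s)) / len(states)

def c(h, e):
    """Degree of confirmation of hypothesis h given evidence e."""
    return m(lambda s: h(s) and e(s)) / m(e)

# Evidence: Fa holds.  Hypothesis: Fb holds.
prior = m(lambda s: s['Fb'])
posterior = c(lambda s: s['Fb'], lambda s: s['Fa'])
```

With the uniform measure over state descriptions the evidence does not raise the confirmation of Fb at all, which is the well-known reason Carnap preferred measures that weight structure descriptions and so permit learning from experience.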

Statistical Probability


Historically, statistical probabilities have been interpreted as limits of frequencies, that is, as empirical properties of the system (or process) considered. But statistical probabilities cannot be assigned to a single event. This is an old problem of the frequency interpretation of which already John Venn was aware. In 1866 Venn tried to define a probability explicitly in terms of relative frequencies of occurrence of events "in the long run." He added that "the run must be supposed to be very long, in fact never to stop." [31] Against this simple-minded frequency interpretation there is a grave objection: any empirical evidence concerning relative frequencies is necessarily restricted to a finite set of events. Yet, without additional assumptions nothing can be inferred about the value of the limiting frequency from a finite segment, no matter how long it may be. Therefore, the statistical interpretation of the calculus of probability has to be supplemented by a decision technique that allows us to decide which probability statements we should accept. Satisfactory acceptance rules are notoriously difficult to formulate.
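The objection can be made vivid with a short simulation (a sketch of the general point, not from the paper): even for a fair coin, every finite initial segment of a run shows only a fluctuating relative frequency, and no finite segment logically constrains the limit:

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

# One simulated "long run" of a fair coin.
flips = [random.random() < 0.5 for _ in range(100_000)]

def rel_freq(n):
    """Relative frequency of heads in the first n flips."""
    return sum(flips[:n]) / n

# Finite segments of increasing length: the frequency wanders, and only
# the (unobservable) limit would define the probability in Venn's sense.
freqs = {n: rel_freq(n) for n in (10, 100, 10_000, 100_000)}
```

The simulation illustrates both sides: long segments tend to cluster near 1/2, yet nothing in any finite prefix rules out an arbitrary limiting value.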

The simplest technique is the old maxim of Antoine Augustin Cournot: if the probability of an event is sufficiently small, one should act as if this event will not occur at a solitary realization. [32] However, the theory gives no criterion for deciding what is "sufficiently small." A more elegant (but essentially equivalent) way out is the proposal by Carl Friedrich von Weizsäcker to consider probability as a prediction of a relative frequency, so that "the probability is only the expectation value of the relative frequency." [33] That is, we need in addition a judgment about a statement. This idea is in accordance with Carnap's view that two meanings of probability must be recognized: the inductive probability (his "probability₁") and the statistical probability (his "probability₂"). [34] The logical probability is supposed to express a logical relation between a given evidence and a hypothesis. Such statements "speak about statements of science; therefore, they do not belong to science proper but to the logic or methodology of science, formulated in the meta-language." On the other hand, "the statements on statistical probability, both singular and general statements, e.g., probability laws in physics or in economics, are synthetic and serve for the description of general features of facts. Therefore, these statements occur within science, for example, in the language of physics (taken as object language)." [35] That is, according to this view, inductive logic with its logical probabilities is a necessary completion of statistical probabilities: without inductive logic we cannot infer statistical probabilities from observed frequencies. The supplementation of the frequency interpretation by a subjective factor cannot be avoided by the introduction of a new topology. For example, if one introduces the topology associated with the field of p-adic numbers [36], one has to select subjectively a finite prime number p. As emphasized by Wolfgang Pauli, no frequency interpretation can avoid a subjective factor:

An irgend einer Stelle [muss] eine Regel für die praktische Verhaltungsweise des Menschen oder spezieller des Naturforschers hinzugenommen werden, die auch dem subjektiven Faktor Rechnung trägt, nämlich: auch die einmalige Realisierung eines sehr unwahrscheinlichen Ereignisses wird von einem gewissen Punkt an als praktisch unmöglich angesehen. ... An dieser Stelle stösst man schliesslich auf die prinzipielle Grenze der Durchführbarkeit des ursprünglichen Programmes der rationalen Objektivierung der einmaligen subjektiven Erwartung. [37]

English translation (taken from Enz and von Meyenn (1994), p. 45): "[It is] necessary somewhere or other to include a rule for the attitude in practice of the human observer, or in particular the scientist, which takes account of the subjective factor as well, namely that the realisation, even on a single occasion, of a very unlikely event is regarded from a certain point on as impossible in practice. ... At this point one finally reaches the limits which are set in principle to the possibility of carrying out the original programme of the rational objectivation of the unique subjective expectation."

Later, Richard von Mises [38] tried to overcome this difficulty by introducing the notion of "irregular collectives," consisting of one infinite sequence in which the limit of the relative frequency of each possible outcome exists and is indifferent to a place selection. In this approach the value of this limit is called the probability of this outcome. The essential underlying idea was the "impossibility of a successful gambling system." While at first sight von Mises' arguments seemed reasonable, he could not achieve a convincing success. [39] However, von Mises' approach provided the crucial idea for the fruitful computational-complexity approach to random sequences, discussed in more detail below.
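Von Mises' requirement can be sketched numerically (an illustration with a pseudo-random sequence standing in for a collective, not from the paper): a place selection picks out a subsequence using only outcomes already observed, and for a random sequence the selected subsequence exhibits the same relative frequency, so no betting rule of this kind gains an edge:

```python
import random

random.seed(42)  # fixed seed for reproducibility

# A long pseudo-random 0/1 sequence standing in for a collective.
bits = [random.randint(0, 1) for _ in range(100_000)]

def freq(seq):
    """Relative frequency of ones in the sequence."""
    return sum(seq) / len(seq)

# A place selection in von Mises' sense: bet on the term that follows
# every 1.  The rule uses only outcomes already observed.
selected = [bits[i] for i in range(1, len(bits)) if bits[i - 1] == 1]

f_all, f_sel = freq(bits), freq(selected)
```

The selected subsequence has roughly half the length of the original, yet its frequency of ones stays near 1/2; a "successful gambling system" would be precisely a place selection that changed this limit.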

Mathematical Probability

Mathematical Probability as a Measure on a Boolean Algebra

In the mathematical codification of probability theory a chance event is defined only implicitly, by axiomatically characterized relations between events. These relations have a logical character, so that one can assign to every event a proposition stating its occurrence. All codifications of classical mathematical probability theory are based on Boolean classifications or Boolean logic. That is, the algebraic structure of events is assumed to be a Boolean algebra, called the algebra of events. In 1854, George Boole introduced these algebras in order

to investigate the fundamental laws of those operations of the mind by which reasoning is performed; to give expression to them in the symbolic language of a Calculus, and upon this foundation to establish the science of Logic and to construct its method; to make that method itself the basis of a general method for the application of the mathematical doctrine of Probabilities... [40]

Mathematical probability is anything that satisfies the axioms of mathematical probability theory. As we will explain in the following in some more detail, mathematical probability theory is the study of a pair (B, p), where the algebra of events is a σ-complete Boolean algebra B, and the map p: B → [0,1] is a σ-additive probability measure. [41] [42]

An algebra of events is a Boolean algebra (B, ∧, ∨, ⊥). If an element A ∈ B is an event, then A⊥ is the event that A does not take place. The element A ∨ B is the event which occurs when at least one of the events A and B occurs, while A ∧ B is the event which occurs when both events A and B occur. The unit element 1 represents the sure event, while the zero element 0 represents the impossible event. If A and B are any two elements of the Boolean algebra B which satisfy the relation A ∨ B = B (or the equivalent relation A ∧ B = A), we say that "A is smaller than B" or that "A implies B" and write A ≤ B.

Probability is defined as a norm p: B → [0,1] on a Boolean algebra B of events. That is, to every event A ∈ B there is associated a probability p(A) for the occurrence of the event A. The following properties are required for p(A):

- p is strictly positive, i.e. p(A) ≥ 0 for every A ∈ B, and p(A) = 0 if and only if A = 0, where 0 is the zero of B;
- p is normed, i.e. p(1) = 1, where 1 is the unit of B;
- p is additive, i.e. p(A ∨ B) = p(A) + p(B) if A and B are disjoint, that is, if A ∧ B = 0.

It follows that 0 ≤ p(A) ≤ 1 for every A ∈ B, and that A ≤ B implies p(A) ≤ p(B).
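These requirements can be checked mechanically on a toy model. The following Python sketch (the three-point sample space and its weights are illustrative choices of ours, not taken from the text) realizes the norm p on the power-set Boolean algebra of a finite set, with join as union and meet as intersection, and verifies strict positivity, normalization, additivity on disjoint events, and the monotonicity property stated above:

```python
from itertools import chain, combinations

# Hypothetical finite example: the Boolean algebra B is the power set of a
# three-point set, with join = union, meet = intersection, 0 = empty set.
omega = frozenset({"a", "b", "c"})
weights = {"a": 0.5, "b": 0.3, "c": 0.2}  # strictly positive point weights

def p(event):
    """Probability norm p: B -> [0, 1], additive over disjoint events."""
    return sum(weights[x] for x in event)

events = [frozenset(s) for s in chain.from_iterable(
    combinations(omega, r) for r in range(len(omega) + 1))]

# Strict positivity: p(A) = 0 only for the zero element (the empty set).
assert all(p(A) > 0 for A in events if A)
assert p(frozenset()) == 0
# Normalization: p(1) = 1 for the unit element (the whole set).
assert abs(p(omega) - 1.0) < 1e-12
# Additivity on disjoint events, and monotonicity A <= B  =>  p(A) <= p(B).
for A in events:
    for B in events:
        if not (A & B):
            assert abs(p(A | B) - (p(A) + p(B))) < 1e-12
        if A <= B:
            assert p(A) <= p(B) + 1e-12
```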

In contrast to a Kolmogorov probability measure, the measure p is strictly positive. That is, p(A) = 0 implies that A is the zero element, the unique smallest element of the Boolean algebra B of events.

In probability theory it is necessary to consider also countably infinitely many events, so that one needs in addition some continuity requirements. By a Boolean σ-algebra one understands a Boolean algebra in which the addition and multiplication operations can be performed on each countable sequence of events. That is, in a Boolean σ-algebra B there is for every infinite sequence A₁, A₂, A₃, ... of elements of B a smallest element A₁ ∨ A₂ ∨ A₃ ∨ ··· ∈ B. The continuity required for the probability p is then the so-called σ-additivity:

a measure p on a σ-algebra is σ-additive if p(∨_{k=1}^{∞} A_k) = ∑_{k=1}^{∞} p(A_k) whenever {A_k} is a sequence of pairwise disjoint events, A_j ∧ A_k = 0 for all j ≠ k.



Since not every Boolean algebra is a σ-algebra, the property of countable additivity is an essential restriction.

Set-Theoretical Probability Theory

It is almost universally accepted that mathematical probability theory consists of the study of Boolean σ-algebras. For reasons of mathematical convenience, one usually represents the Boolean algebra of events by a Boolean algebra of subsets of some set. Using this representation one can go back to a well-established integration theory, to the theory of product measures, and to the Radon–Nikodym theorem for the definition of conditional probabilities. According to a fundamental representation theorem by Marshall Harvey Stone, every ordinary Boolean algebra with no further condition is isomorphic to a subalgebra of the algebra (P(Ω), ∩, ∪, ′) of all subsets of some point set Ω. [43] Here B corresponds to an algebra of subsets within the power set P(Ω) of the set Ω, the conjunction ∧ corresponds to the set-theoretical intersection ∩, the disjunction ∨ corresponds to the set-theoretical union ∪, and the negation ⊥ corresponds to the set-theoretical complementation ′. The multiplicative neutral element 1 corresponds to the set Ω, while the additive neutral element 0 corresponds to the empty set ∅. However, a σ-complete Boolean algebra is in general not σ-isomorphic to a σ-complete Boolean algebra of point sets. Yet, every σ-complete Boolean algebra is σ-isomorphic to a σ-complete Boolean algebra of point sets modulo a σ-ideal in that algebra. [44]

Conceptually, this result is the starting point for the axiomatic foundation by Andrei Nikolaevich Kolmogorov of 1933, which reduces mathematical probability theory to classical measure theory. [45] It is based on a so-called probability space (Ω, Σ, μ) consisting of a non-empty set Ω (called sample space) of points, a class Σ of subsets of Ω which is a σ-algebra (i.e. closed with respect to the set-theoretical operations executed a countable number of times), and a probability measure μ on Σ. Sets that belong to Σ are called Σ-measurable (or just measurable if Σ is understood). The pair (Ω, Σ) is called a measurable space. A probability measure μ on (Ω, Σ) is a function μ: Σ → [0,1] satisfying μ(∅) = 0, μ(Ω) = 1, and the condition of countable additivity, that is, μ(∪_{n=1}^{∞} B_n) = ∑_{n=1}^{∞} μ(B_n) whenever {B_n} is a sequence of members of Σ which are pairwise disjoint subsets of Ω. The points of Ω are called elementary events. The subsets of Ω belonging to Σ are referred to as events. The non-negative number μ(B) is called the probability of the event B ∈ Σ.
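On a finite sample space these definitions can be written out directly. In the following Python sketch (two fair coin tosses, an illustrative choice of ours) Σ is the full power set, which is trivially a σ-algebra, and μ is the uniform measure; the assertions check normalization, closure of Σ under complementation, and additivity on disjoint events:

```python
from itertools import product, chain, combinations

# Hypothetical finite probability space (Omega, Sigma, mu): two fair coin
# tosses. On a finite set the power set is trivially a sigma-algebra.
Omega = list(product("HT", repeat=2))
Sigma = [frozenset(s) for s in chain.from_iterable(
    combinations(Omega, r) for r in range(len(Omega) + 1))]

def mu(event):
    return len(event) / len(Omega)      # uniform probability measure

assert mu(frozenset()) == 0 and mu(frozenset(Omega)) == 1
# Sigma is closed under complementation:
assert all((frozenset(Omega) - A) in Sigma for A in Sigma)
# Additivity on pairwise disjoint events:
first_H = frozenset(w for w in Omega if w[0] == "H")
first_T = frozenset(w for w in Omega if w[0] == "T")
assert mu(first_H | first_T) == mu(first_H) + mu(first_T) == 1
# The probability of the event "at least one head":
at_least_one_H = frozenset(w for w in Omega if "H" in w)
assert mu(at_least_one_H) == 0.75
```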

In most applications the sample space Ω contains an uncountable number of points. In this case there exist non-empty Borel sets in Σ of measure zero, so that there is no strictly positive σ-additive measure on Σ. But it is possible to eliminate the sets of measure zero by using the σ-complete Boolean algebra B = Σ/Δ, where Δ is the σ-ideal of Borel sets of μ-measure zero. With this, every Kolmogorov probability space (Ω, Σ, μ) generates a probability algebra with the σ-complete Boolean algebra B = Σ/Δ, and the restriction of μ to B is a strictly positive measure p. Conversely, every probability algebra (B, p) can be realized by some Kolmogorov probability space (Ω, Σ, μ) with B ≅ Σ/Δ, where Δ is the σ-ideal of Borel sets of μ-measure zero.

One usually formulates the set-theoretical version of probability theory directly in terms of the conceptually less transparent triple (Ω, Σ, μ), and not in terms of the probabilistically relevant Boolean algebra B = Σ/Δ. Since there exist non-empty Borel sets in Σ (i.e. events different from the impossible event) of measure zero, one has to use the "almost everywhere" terminology. A statement is said to be true "almost everywhere" or "for almost all ω" if it is true for all ω ∈ Ω except, maybe, in a set N ∈ Σ of measure zero, μ(N) = 0. If the sample space Ω contains an uncountable number of points, elementary events do not exist in the operationally relevant version in terms of the atom-free Boolean algebra Σ/Δ. Johann von Neumann has argued convincingly that the finest events which are empirically accessible are given by Borel sets of non-vanishing Lebesgue measure, and not by the much larger class of all subsets of Ω. [46]

This setting is almost universally accepted, either explicitly or implicitly. However, some paradoxical situations do arise unless further restrictions are placed on the triple (Ω, Σ, μ). The requirement that a probability measure has to be a perfect measure avoids many difficulties. [47] Furthermore, in all physical applications there are natural additional regularity conditions. In most examples the sample space Ω is Polish (i.e. separable and completely metrizable), the σ-algebra Σ is taken as the σ-algebra of Borel sets, [48] and μ is a regular Radon measure. [49] Moreover, there are some practically important problems which require the use of unbounded measures, a feature which does not fit into Kolmogorov's theory. A modification, based on conditional probability spaces (which contains Kolmogorov's theory as a special case), has been developed by Alfréd Rényi. [50]

Random Variables in the Sense of Kolmogorov

In probability theory observable quantities of a statistical experiment are called statistical observables. In Kolmogorov's mathematical probability theory statistical observables are represented by Σ-measurable functions on the sample space Ω. The more precise formulation goes as follows. The Borel σ-algebra Σ_R of subsets of the set R of real numbers is the σ-algebra generated by the open subsets of R. In Kolmogorov's set-theoretical formulation, a statistical observable is a σ-homomorphism φ: Σ_R → Σ/Δ. In this formulation, every observable φ can be induced by a real-valued Borel function x: Ω → R via the inverse map [51]

φ(R) := x⁻¹(R) := {ω ∈ Ω | x(ω) ∈ R},  R ∈ Σ_R.

In mathematical probability theory a real-valued Borel function x defined on Ω is said to be a real-valued random variable. [52] Every statistical observable is induced by a random variable, but an observable (that is, a σ-homomorphism) defines only an equivalence class of random variables which induce this homomorphism. Two random variables x and y are said to be equivalent if they are equal μ-almost everywhere, [53]

x(ω) ~ y(ω)  ⇔  μ{ω ∈ Ω | x(ω) ≠ y(ω)} = 0.

That is, for a statistical description it is not necessary to know the point function ω ↦ x(ω); it is sufficient to know the observable φ, or in other words, the equivalence class [x(ω)] of the point functions which induce the corresponding σ-homomorphism,

φ ⇔ [x(ω)] := {y(ω) | y(ω) ~ x(ω)}.

The description of a physical system in terms of an individual function ω ↦ f(ω) distinguishes between different points ω ∈ Ω and corresponds to an individual description (maybe in terms of hidden variables). In contrast, a description in terms of equivalence classes of random variables does not distinguish between different points and corresponds to a statistical ensemble description.

If ω ↦ x(ω) is a random variable on Ω, and if ω ↦ x(ω) is integrable over Ω with respect to μ, we say that the expectation of x with respect to μ exists, and we write

E(x) := ∫_Ω x(ω) μ(dω),

and call E(x) the expectation value of x. Every Borel-measurable complex-valued function f of a random variable ω ↦ x(ω) on (Ω, Σ, μ) yields a complex-valued random variable ω ↦ f{x(ω)} on (Ω, Σ, μ). If the expectation of the random variable ω ↦ f{x(ω)} exists, then

E(f) = ∫_Ω f{x(ω)} μ(dω).

A real-valued random variable ω ↦ x(ω) on a probability space (Ω, Σ, μ) induces a probability measure μ_x: Σ_R → [0,1] on the state space (R, Σ_R) by

μ_x(R) := μ{x⁻¹(R)} = μ{ω ∈ Ω | x(ω) ∈ R},  R ∈ Σ_R,

so that

E(f) = ∫_R f(x) μ_x(dx).
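On a finite sample space the equality of the two integrals can be made concrete. In the following Python sketch the fair die, the random variable x, and the function f are hypothetical illustrative choices; it computes E(f) once over Ω with respect to μ and once over the state space with respect to the induced measure μ_x:

```python
# Hypothetical finite illustration: a fair six-sided die as probability space,
# the random variable x(omega) = omega, and the Borel function f(v) = v**2.
Omega = [1, 2, 3, 4, 5, 6]
mu = {w: 1 / 6 for w in Omega}   # probability measure on (Omega, power set)

def x(w):
    return w                     # random variable

def f(v):
    return v * v                 # Borel function of x

# Expectation as an integral over the sample space Omega.
e_f_omega = sum(f(x(w)) * mu[w] for w in Omega)

# The induced (image) measure mu_x on the state space of x.
mu_x = {}
for w in Omega:
    mu_x[x(w)] = mu_x.get(x(w), 0) + mu[w]

# Expectation as an integral over the state space with respect to mu_x.
e_f_state = sum(f(v) * q for v, q in mu_x.items())

assert abs(e_f_omega - e_f_state) < 1e-12   # both integrals agree: 91/6
```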

Stochastic Processes

The success of Kolmogorov's axiomatization is largely due to the fact that it does not busy itself with chance. [54] Probability has become a branch of pure mathematics. Mathematical probability theory is supposed to provide a model for situations involving random phenomena, but we are never told what exactly "random" conceptually means besides the fact that random events cannot be predicted exactly. Even if we have only a rough idea of what we mean by "random," it is plain that Kolmogorov's axiomatization does not give sufficient conditions for characterizing random events. However, if we adopt the view proposed by Friedrich Waismann [55] and consider probability not as a property of a given sequence of events but as a property of the generating conditions of a sequence, then we can relate randomness with predictability and retrodictability.

A family {φ(t) | t ∈ R} of statistical observables indexed by a time parameter t is called a stochastic process. In the framework of Kolmogorov's probability theory a stochastic process is represented by a family {[x(t|ω)] | t ∈ R} of equivalence classes [x(t|ω)] of random variables x(t|ω) on a common probability space (Ω, Σ, μ),

[x(t|ω)] := {y(t|ω) | y(t|ω) ~ x(t|ω)}.

Two individual point functions (t, ω) ↦ x(t|ω) and (t, ω) ↦ y(t|ω) on a common probability space (Ω, Σ, μ) are said to be statistically equivalent (in the narrow sense) if and only if

μ{ω ∈ Ω | x(t|ω) ≠ y(t|ω)} = 0  for all t ∈ R.

Some authors find it convenient to use the same symbol for functions and equivalence classes of functions. We avoid this identification, since it muddles individual and statistical descriptions. A stochastic process is not an individual function but an indexed family of σ-homomorphisms φ(t): Σ_R → Σ/Δ which can be represented by an indexed family of equivalence classes of random variables. For fixed t ∈ R the function ω ↦ x(t|ω) is a random variable. The point function t ↦ x(t|ω) obtained by fixing ω is called a realization, or a sample path, or a trajectory of the stochastic process. The description of a physical system in terms of an individual trajectory t ↦ x(t|ω) (ω fixed) of a stochastic process {[x(t|ω)] | t ∈ R} corresponds to a point dynamics, while a description in terms of equivalence classes of trajectories and an associated probability measure corresponds to an ensemble dynamics.

Kolmogorov's characterization of stochastic processes as collections of equivalence classes of random variables is much too general for science. Some additional regularity requirements like separability or continuity are necessary in order that the process has "nice trajectories" and does not disintegrate into an uncountable number of events. We will only discuss stochastic processes with some regularity properties, so that we can ignore the mathematical existence of inseparable versions.

Furthermore, the traditional terminology is somewhat misleading, since according to Kolmogorov's definition precisely predictable processes are also stochastic processes. However, the theory of stochastic processes provides a conceptually sound and mathematically workable distinction between the so-called singular processes that allow a perfect prediction of any future value from a knowledge of the past values of the process, and the so-called regular processes for which long-term predictions are impossible. [56] For simplicity, we discuss here only the important special case of stationary processes.

A stochastic process is called strictly stationary if all its joint distribution functions are invariant under time translation, so that they depend only on time differences. For many applications this is too strict a definition; often it is enough to require that the mean and the covariance are time-translation invariant. A stochastic process {[x(t|ω)] | t ∈ R} is said to be weakly stationary (or: stationary in the wide sense) if

E{x(t|·)²} < ∞  for every t ∈ R,
E{x(t+τ|·)} = E{x(t|·)}  for all t, τ ∈ R,
E{x(t+τ|·) x(t′+τ|·)} = E{x(t|·) x(t′|·)}  for all t, t′, τ ∈ R.

Since the covariance function of a weakly stationary stochastic process is positive definite, Bochner's theorem [57] implies Khintchin's spectral decomposition of the covariance: [58] A complex-valued function R: R → C which is continuous at the origin is the covariance function of a complex-valued, second-order, weakly stationary and continuous (in the quadratic mean) stochastic process if and only if it can be represented in the form

R(t) = ∫_{−∞}^{+∞} e^{iλt} dR̂(λ),

where R̂: R → R is a real, never decreasing and bounded function, called the spectral distribution function of the stochastic process.

Lebesgue's decomposition theorem says that every spectral distribution function R̂: R → R can be decomposed uniquely according to

R̂ = c_d R̂_d + c_s R̂_s + c_ac R̂_ac,  c_d ≥ 0, c_s ≥ 0, c_ac ≥ 0, c_d + c_s + c_ac = 1,

where R̂_d, R̂_s and R̂_ac are normalized spectral distribution functions. The function R̂_d is a step function. Both functions R̂_s and R̂_ac are continuous; R̂_s is singular, and R̂_ac is absolutely continuous. The absolutely continuous part has a derivative almost everywhere, called the spectral density function λ ↦ dR̂_ac(λ)/dλ. The Lebesgue decomposition of the spectral distribution of a covariance function t ↦ R(t) induces an additive decomposition of the covariance function into a discrete part t ↦ R_d(t), a singular part t ↦ R_s(t), and an absolutely continuous part t ↦ R_ac(t). The discrete part R_d is almost periodic in the sense of Harald Bohr, so that its asymptotic behavior is characterized by lim sup_{|t|→∞} |R_d(t)| = 1. For the singular part, the limit lim sup_{|t|→∞} |R_s(t)| may be any number between 0 and 1. The Riemann–Lebesgue lemma implies that for the absolutely continuous part R_ac we have lim_{|t|→∞} |R_ac(t)| = 0.
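The contrast between the discrete and the absolutely continuous part can be illustrated numerically. In the following Python sketch the two covariance functions are textbook choices of ours, not taken from the text: R_d(t) = cos t has a pure point spectrum and keeps returning close to 1, while R_ac(t) = exp(−|t|), whose spectral density is the Lorentzian (1/π)/(1 + λ²), decays in accordance with the Riemann–Lebesgue lemma:

```python
import math

# Two covariance functions with known spectral type (illustrative choices):
# R_d(t)  = cos(t):     pure point spectrum (weight 1/2 at lambda = +-1),
#                       almost periodic in the sense of Bohr.
# R_ac(t) = exp(-|t|):  absolutely continuous Lorentzian spectral density
#                       dR(lambda)/dlambda = (1/pi) / (1 + lambda**2).
def R_d(t):
    return math.cos(t)

def R_ac(t):
    return math.exp(-abs(t))

ts = [10.0 * k for k in range(1, 2001)]
# The discrete part keeps returning arbitrarily close to |R_d| = 1 ...
assert max(abs(R_d(t)) for t in ts) > 0.99
# ... while the absolutely continuous part decays to zero.
assert max(abs(R_ac(t)) for t in ts) < 1e-4
```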

A strictly stationary stochastic process {[x(t|ω)] | t ∈ R} is called singular if a knowledge of its past allows an error-free prediction. A stochastic process is called regular if it is not singular and if the conditional expectation is the best forecast. The remote past of a singular process already contains all information necessary for the exact prediction of its future behavior, while a regular process contains no components that can be predicted exactly from an arbitrarily long past record. The optimal prediction of a stochastic process is in general non-linear. [59] Up to now, there is no generally workable algorithm for non-linear prediction. [60] Most results refer to linear prediction of weakly stationary second-order processes. The famous Wold decomposition says that every weakly stationary stochastic process is the sum of a uniquely determined linearly singular and a uniquely determined linearly regular process. [61] A weakly stationary stochastic process {[x(t|ω)] | t ∈ R} is called linearly singular if the optimal linear predictor in terms of the past {[x(t|ω)] | t < 0} allows an error-free prediction. If a weakly stationary stochastic process does not contain a linearly singular part, it is called linearly regular.

There is an important analytic criterion for the dichotomy between linearly singular and linearly regular processes, the so-called Wiener–Krein criterion [62]: A weakly stationary stochastic process {[x(t|ω)] | t ∈ R} with mean value E{x(t|·)} = 0 and spectral distribution function λ ↦ R̂(λ) is linearly regular if and only if its spectral distribution function is absolutely continuous and if

∫_{−∞}^{+∞} ln{dR̂(λ)/dλ} / (1 + λ²) dλ > −∞.
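The criterion can be probed numerically. In the following Python sketch the two spectral densities and the crude trapezoidal quadrature are illustrative choices of ours: for a Lorentzian density the integral converges as the cutoff grows (linearly regular), while for a Gaussian density ln f(λ) decays like −λ², so the integral diverges to −∞ and the process is linearly singular:

```python
import math

def wk_integral(log_density, lam_max, n=100_000):
    """Trapezoidal estimate of the Wiener-Krein integral
    over the interval [-lam_max, lam_max]."""
    h = 2 * lam_max / n
    total = 0.0
    for k in range(n + 1):
        lam = -lam_max + k * h
        w = 0.5 if k in (0, n) else 1.0
        total += w * log_density(lam) / (1 + lam * lam) * h
    return total

# ln of a Lorentzian spectral density, f(lambda) = (1/pi)/(1 + lambda**2):
def log_lorentz(lam):
    return -math.log(math.pi) - math.log1p(lam * lam)

# ln of a Gaussian spectral density, f(lambda) = exp(-lambda**2)/sqrt(pi):
def log_gauss(lam):
    return -lam * lam - 0.5 * math.log(math.pi)

# Lorentzian: the integral stabilizes as the cutoff grows -> linearly regular.
a, b = wk_integral(log_lorentz, 50), wk_integral(log_lorentz, 500)
assert abs(a - b) < 1.0 and -9 < b < -6
# Gaussian: the integral keeps decreasing roughly like -2*lam_max, so the
# criterion fails -> the process is linearly singular.
c, d = wk_integral(log_gauss, 50), wk_integral(log_gauss, 500)
assert d < c - 500
```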

Note that for a linearly regular process the spectral distribution function λ ↦ R̂(λ) is necessarily absolutely continuous, so that the covariance function t ↦ R(t) vanishes for t → ∞. However, there are exactly predictable stochastic processes with an asymptotically vanishing covariance function, so that an asymptotically vanishing covariance function is not sufficient for regular behavior.

There is a close relationship between regular stochastic processes and the irreversibility of physical systems. [63] A characterization of genuine irreversibility of classical linear input–output systems can be based on entropy-free non-equilibrium thermodynamics, with the notion of lost energy as central concept. [64] Such a system is called irreversible if the lost energy is strictly positive. According to a theorem by König and Tobergte, [65] a linear input–output system behaves irreversibly if and only if the associated distribution function fulfills the Wiener–Krein criterion for the spectral density of a linearly regular stochastic process.

Birkhoff's Individual Ergodic Theorem

A stochastic process on the probability space (Ω, Σ, μ) is called ergodic if its associated measure-preserving transformation τ_t is ergodic for every t ≠ 0 (that is, if every σ-algebra of sets in Σ, invariant under the measure-preserving semi-flow associated with the process, is trivial). According to a theorem by Wiener and Akutowicz, [66] a strictly stationary stochastic process with an absolutely continuous spectral distribution function is weakly mixing, and hence ergodic. Therefore, every regular process is ergodic, so that the so-called ergodic theorems apply. Ergodic theorems provide conditions for the equality of time averages and ensemble averages. Of crucial importance for the interpretation of probability theory is the individual (or pointwise) ergodic theorem by George David Birkhoff. [67] The discrete version of the pointwise ergodic theorem is a generalization of the strong law of large numbers. In terms of harmonic analysis of stationary stochastic processes, this theorem can be formulated as follows. [68] Consider a strictly stationary zero-mean stochastic process {[x(t|ω)] | t ∈ R} over the probability space (Ω, Σ, μ), and let ω ↦ x(t|ω) be quadratically integrable with respect to the measure μ. Then for μ-almost all ω in Ω the individual auto-correlation function t ↦ C(t|ω) of the trajectory t ↦ x(t|ω),

C(t|ω) := lim_{T→∞} (1/2T) ∫_{−T}^{+T} x(τ|ω) x(t+τ|ω) dτ,  t ∈ R, ω fixed,

exists and is continuous on R. Moreover, the auto-correlation function t ↦ C(t|ω) equals for μ-almost all ω ∈ Ω the covariance function t ↦ R(t),

C(t|ω) = R(t) for μ-almost all ω ∈ Ω,  R(t) := ∫_Ω x(t|ω) x(0|ω) μ(dω).

The importance of this relation lies in the fact that in most applications we see only a single individual trajectory, that is, a particular realization of the stochastic process. Since Kolmogorov's theory of stochastic processes refers to equivalence classes of functions, Birkhoff's individual ergodic theorem provides a crucial link between the ensemble description and the individual description of chaotic phenomena. In the next chapter we will sketch two different direct approaches for the description of chaotic phenomena which avoid the use of ensembles.
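The content of the theorem can be illustrated with a discrete-time stand-in. In the following Python sketch the ergodic AR(1) model, the seed, and the tolerances are our own choices, not the continuous-time setting of the text; the time-average auto-correlation along one simulated trajectory is compared with the exactly known ensemble covariance R(k) = a^k/(1 − a²):

```python
import math
import random

# Illustrative discrete-time sketch: the stationary, ergodic AR(1) process
# x[n+1] = a*x[n] + xi[n] with xi i.i.d. N(0, 1) has ensemble covariance
# R(k) = a**k / (1 - a**2). Birkhoff's theorem lets us recover R(k) from a
# time average along a single trajectory.
random.seed(0)
a, N = 0.6, 400_000
x = [0.0] * N
x[0] = random.gauss(0, math.sqrt(1 / (1 - a * a)))  # start in the stationary law
for n in range(N - 1):
    x[n + 1] = a * x[n] + random.gauss(0, 1)

def time_avg_corr(k):
    """Time-average auto-correlation at lag k along the single trajectory."""
    return sum(x[n] * x[n + k] for n in range(N - k)) / (N - k)

for k in range(4):
    R_k = a ** k / (1 - a * a)      # ensemble covariance
    C_k = time_avg_corr(k)          # single-trajectory time average
    assert abs(C_k - R_k) < 0.05, (k, C_k, R_k)
```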


Individual Descriptions of Chaotic Processes

Deterministic Chaotic Processes in the Sense of Wiener

More than a decade before Kolmogorov's axiomatization of mathematical probability theory, Norbert Wiener invented a possibly deeper paradigm for chaotic phenomena: his mathematically rigorous analytic construction of an individual trajectory of Einstein's idealized Brownian motion, [69] nowadays called a Wiener process. [70] In Wiener's mathematical model chaotic changes in the direction of the Brownian path take place constantly. All trajectories of a Wiener process are almost certainly continuous but nowhere differentiable, just as conjectured by Jean Baptiste Perrin for the Brownian motion. [71] Wiener's construction and proof are much closer to physics than Kolmogorov's abstract model, but also very intricate, so that for a long time Kolmogorov's approach has been favored. Nowadays, Wiener's result can be derived in a much simpler way. The generalized derivative of the Wiener process is called "white noise" since, according to the Einstein–Wiener theorem, its spectral measure equals the Lebesgue measure dλ/2π. It turned out that white noise is the paradigm for an unpredictable regular process; it serves to construct other, more complicated stochastic structures.
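The non-differentiability of the trajectories can be seen quantitatively in the scaling of the increments. In the following Python sketch (step sizes and sample counts are our own choices) the increments W(t+h) − W(t) ~ N(0, h) make the mean absolute difference quotient grow like 1/√h, so the slope diverges as h → 0:

```python
import math
import random

# Minimal sketch: Brownian increments W(t+h) - W(t) are N(0, h), so the
# difference quotients |W(t+h) - W(t)| / h have mean sqrt(2 / (pi * h)),
# which blows up as h -> 0: the quantitative face of "continuous but
# nowhere differentiable".
random.seed(1)

def mean_abs_diff_quotient(h, n_steps=20_000):
    total = 0.0
    for _ in range(n_steps):
        dW = random.gauss(0, math.sqrt(h))   # increment over a step of size h
        total += abs(dW) / h
    return total / n_steps                   # approx sqrt(2 / (pi * h))

q1 = mean_abs_diff_quotient(1e-2)
q2 = mean_abs_diff_quotient(1e-4)
# Shrinking the time step by a factor 100 multiplies the mean slope by ~10:
assert 8 < q2 / q1 < 12
assert abs(q1 - math.sqrt(2 / (math.pi * 1e-2))) < 0.5
```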

Wiener's characterization of individual chaotic processes is founded on his basic paper "Generalized harmonic analysis." [72] The purpose of Wiener's generalized harmonic analysis is to give an account of phenomena which can be described neither by Fourier analysis nor by almost periodic functions. Instead of equivalence classes of Lebesgue square summable functions, Wiener focused his harmonic analysis on individual Borel measurable functions t ↦ x(t) for which the individual auto-correlation function

C(t) := lim_{T→∞} (1/2T) ∫_{−T}^{+T} x(t′) x(t+t′) dt′,  t ∈ R,

exists and is continuous for all t. Wiener's generalized harmonic analysis of an individual trajectory t ↦ x(t) is in an essential way based on the spectral representation of the auto-correlation function. The Bochner–Cramér representation theorem implies that there exists a non-decreasing bounded function λ ↦ Ĉ(λ), called the spectral distribution function of the individual function t ↦ x(t), such that

C(t) = ∫_{−∞}^{+∞} e^{iλt} dĈ(λ).

This relation is usually known under the name individual Wiener–Khintchin<br />

theorem. [73] However, this name is misleading. Khintchin’s theorem [74] relates<br />

the covariance function <strong>and</strong> the spectral function in terms <strong>of</strong> ensemble


¥<br />

¥<br />

596 H. Primas<br />

averages. In contrast, Wiener’s theorem [75] refers to individual functions.<br />

This result was already known to Albert Einstein long be<strong>for</strong>e. [76] The terminology<br />

“Wiener–Khintchin theorem” caused many confusions [77] <strong>and</strong> should<br />

there<strong>for</strong>e be avoided. Here, we refer to the individual theorem as the Einstein –<br />

Wiener theorem. For many applications it is crucial to distinguish between the<br />

Einstein–Wiener theorem which refer to individual functions, <strong>and</strong> the statistical<br />

Khintchin theorem which refers to equivalence classes <strong>of</strong> functions as used<br />

in Kolmogorov’s probability theory. The Einstein–Wiener theorem is in no<br />

way probabilistic. It refers to well-defined single functions rather than to an<br />

ensemble <strong>of</strong> functions.<br />
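The time-averaged auto-correlation C(τ) defined above has a straightforward discrete-time estimator for a single sampled function; the following sketch (the signal and sampling grid are illustrative assumptions, not taken from the paper) applies it to one deterministic trajectory:

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Discrete-time analogue of C(tau): average of x(t') x(t'+tau) over the samples."""
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / (n - k) for k in range(max_lag)])

# Example: a single cosine trajectory has a cosine auto-correlation.
t = np.arange(0, 10_000, 1.0)
x = np.cos(0.3 * t)
c = autocorrelation(x, 5)
print(c[0])  # ≈ 1/2, the mean square of the cosine
```

No ensemble enters anywhere: the average runs over time along one individual function, exactly in the spirit of Wiener's analysis.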

If an individual function t ↦ x(t) has a pure point spectrum, it is almost periodic in the sense of Besicovitch, x(t) ∼ Σ_{j=1}^{∞} x̂_j exp(iλ_j t). In a physical context an almost-periodic time function x : ℝ → ℂ may be considered as predictable since its future {x(t) | t > 0} is completely determined by its past {x(t) | t ≤ 0}. If an individual function has an absolutely continuous spectral distribution, then the auto-correlation function vanishes in the limit τ → ∞. The auto-correlation function τ ↦ C(τ) provides a measure of the memory: if the individual function t ↦ x(t) has a particular value at one moment, its auto-correlation tells us the degree to which we can guess that it will have about the same value some time later. In 1932, Koopman and von Neumann conjectured that an absolutely continuous spectral distribution function is the crucial property for the epistemically chaotic behavior of an ontically deterministic dynamical system. [78] In modern terminology, Koopman and von Neumann refer to the so-called "mixing property." However, a rapid decay of correlations is not sufficient as a criterion for the absence of any regularity.

Genuine chaotic behavior requires stronger instability properties than just mixing. If we know the past {x(t) | t ≤ 0} of an individual function t ↦ x(t), then the future {x(t) | t > 0} is completely determined if and only if the following Szegö condition for perfect linear predictability is fulfilled, [79]

∫_{−∞}^{+∞} ln{ dĈ_ac(λ)/dλ } / (1 + λ²) dλ = −∞ ,

where Ĉ_ac is the absolutely continuous part of the spectral distribution function of the auto-correlation function of the individual function t ↦ x(t). Every individual function t ↦ x(t) with an absolutely continuous spectral distribution Ĉ fulfilling the Paley–Wiener criterion

∫_{−∞}^{+∞} | ln dĈ(λ)/dλ | / (1 + λ²) dλ < ∞

will be called a chaotic function in the sense of Wiener.
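The behavior of the Paley–Wiener integral can be checked numerically (a sketch; the two spectral densities are illustrative choices, not taken from the paper). A Lorentzian density satisfies the criterion, while for a Gaussian density the truncated integral keeps growing with the cutoff:

```python
import numpy as np

def paley_wiener_integral(log_density, cutoff, n=200_000):
    """Truncated Paley-Wiener integral |ln dC/dλ| / (1 + λ²) over [-cutoff, cutoff]."""
    lam = np.linspace(-cutoff, cutoff, n)
    integrand = np.abs(log_density(lam)) / (1.0 + lam**2)
    return integrand.sum() * (lam[1] - lam[0])  # simple Riemann sum

# ln of a Lorentzian spectral density 1/(π(1+λ²)) -- satisfies the criterion.
lorentz = lambda lam: -np.log(np.pi) - np.log1p(lam**2)
# ln of a standard Gaussian density -- |ln| grows like λ²/2, integral diverges.
gauss = lambda lam: -0.5 * lam**2 - 0.5 * np.log(2.0 * np.pi)

for cutoff in (10.0, 100.0, 1000.0):
    print(cutoff, paley_wiener_integral(lorentz, cutoff),
          paley_wiener_integral(gauss, cutoff))
```

The Lorentzian values stabilize as the cutoff grows, whereas the Gaussian values increase roughly linearly with the cutoff, which is the numerical signature of a divergent integral.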

Wiener's work initiated the mathematical theory of stochastic processes and functional integration. It was a precursor of the general probability measures as defined by Kolmogorov. However, it would be mistaken to believe that the theory of stochastic processes in the sense of Kolmogorov has superseded Wiener's ideas. Wiener's approach has been criticized as unnecessarily cumbersome [80] since it was based on individual functions t ↦ x(t), and not on Kolmogorov's more effortless definition of measure-theoretical stochastic processes (that is, equivalence classes t ↦ [x(t|ω)]). It has to be emphasized that for many practical problems only Wiener's approach is conceptually sound. For example, for weather prediction or anti-aircraft fire control there is no ensemble of trajectories but just a single individual trajectory, from whose past behavior one would like to predict something about its future behavior.

The basic link between Wiener's individual and Kolmogorov's statistical approach is Birkhoff's individual ergodic theorem. Birkhoff's theorem implies that μ-almost every trajectory of an ergodic stochastic process on a Kolmogorov probability space (Ω, Σ, μ) spends an amount of time in the measurable set B ∈ Σ which is proportional to μ(B). For μ-almost all points ω ∈ Ω, the trajectory t ↦ x(t|ω) (with a precisely fixed ω ∈ Ω) of an ergodic regular stochastic process t ↦ [x(t|ω)] is an individual chaotic function in the sense of Wiener. This result implies that one can switch from an ensemble description in terms of a Kolmogorov probability space (Ω, Σ, μ) to an individual chaotic deterministic description in the sense of Wiener, and vice versa. Moreover, Birkhoff's individual ergodic theorem implies the equality

lim_{T→∞} (1/T) ∫_{−T}^{0} x(τ|ω) x(t + τ|ω) dτ = lim_{T→∞} (1/2T) ∫_{−T}^{+T} x(τ|ω) x(t + τ|ω) dτ ,

so that for ergodic processes the auto-correlation function can in principle be evaluated from observations of the past {x(t|ω) | t ≤ 0} of a single trajectory t ↦ x(t|ω), a result of crucial importance for the prediction theory of individual chaotic processes.
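This interchangeability of time averages and ensemble values can be illustrated with a simple ergodic process; the AR(1) model, its parameters and the lag below are illustrative choices, not taken from the paper:

```python
import numpy as np

# Ergodic AR(1) process x_{k+1} = a x_k + ξ_k with noise variance 1 - a²,
# so that the stationary variance is 1 and the ensemble auto-correlation
# at lag k equals a^k.
rng = np.random.default_rng(1)
a, n = 0.5, 200_000
noise = rng.normal(0.0, np.sqrt(1.0 - a**2), size=n)
x = np.empty(n)
x[0] = 0.0
for k in range(n - 1):
    x[k + 1] = a * x[k] + noise[k]

lag = 3
time_avg = np.dot(x[:-lag], x[lag:]) / (n - lag)  # single-trajectory average
print(time_avg, a**lag)  # time average vs ensemble value
```

A single long trajectory suffices: the time-averaged product approaches the ensemble auto-correlation a^lag, as Birkhoff's theorem asserts for ergodic processes.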

Algorithmic Characterization of Randomness


The roots of an algorithmic definition of a random sequence can be traced to the pioneering work of Richard von Mises, who proposed in 1919 his principle of the excluded gambling system. [81] The use of a precise concept of an algorithm has made it possible to overcome the inadequacies of von Mises' formulations. Von Mises wanted to exclude "all" gambling systems but he did not properly specify what he meant by "all." Alonzo Church pointed out that a gambling system which is not effectively calculable is of no practical use. [82] Accordingly, a gambling system has to be represented mathematically not by an arbitrary function but as an effective algorithm for the calculation of the values of a function. In accordance with von Mises' intuitive ideas and Church's refinement, a sequence is called random if no player who calculates his pool by effective methods can raise his fortune indefinitely when playing on this sequence.

An adequate formalization of the notion of an effectively computable function was given in 1936 by Emil Leon Post and, independently, by Alan Mathison Turing, who introduced the concept of an ideal computer nowadays called a Turing machine. [83] A Turing machine is essentially a computer having an infinitely expandable memory; it is an abstract prototype of a universal digital computer and can be taken as a precise definition of the concept of an algorithm. The so-called Church–Turing thesis states that every function computable in any intuitive sense can be computed by a Turing machine. [84] No example of a function intuitively considered as computable but not Turing-computable is known. According to the Church–Turing thesis, a Turing machine represents the limit of computational power.

The idea that the computational complexity of a mathematical object reflects the difficulty of its computation allows one to give a simple, intuitively appealing and mathematically rigorous definition of the notion of randomness of a sequence. Unlike most mathematicians, Kolmogorov himself never forgot that the conceptual foundation of probability theory is wanting. He was not completely satisfied with his measure-theoretical formulation. In particular, the exact relation between the probability measure μ in the basic probability space (Ω, Σ, μ) and real statistical experiments remained open. Kolmogorov emphasized that

the application of probability theory ... is always a matter of consequences of hypotheses about the impossibility of reducing in one way or another the complexity of the description of the objects in question. [85]

In 1963, Kolmogorov again took up the concept of randomness. He retracted his earlier view that "the frequency concept ... does not admit a rigorous formal exposition within the framework of pure mathematics," and stated that he came "to realize that the concept of random distribution of a property in a large finite population can have a strict formal mathematical exposition." [86] He proposed a measure of complexity based on the "size of a program" which, when processed by a suitable universal computing machine, yields the desired object. [87] In 1968, Kolmogorov sketched how information theory can be founded without recourse to probability theory, and in such a way that the concepts of entropy and mutual information are applicable to individual events (rather than to equivalence classes of random variables or ensembles). In this approach the "quantity of information" is defined in terms of storing and processing signals. It is sufficient to consider binary strings, that is, strings of bits, of zeros and ones.

The concept of algorithmic complexity allows one to rephrase the old idea that "randomness consists in a lack of regularity" in a mathematically acceptable way. Moreover, a complexity measure, and hence algorithmic probability, refers to an individual object. Loosely speaking, the complexity K(x) of a binary string x is the size in bits of the shortest program for calculating it. If the complexity of x is not smaller than its length l(x), then there is no simpler way to write a program for x than to write it out. In this case the string x shows no periodicity and no pattern. Kolmogorov, and independently Solomonoff and Chaitin, suggested that patternless finite sequences should be considered as random sequences. [88] That is, complexity is a measure of irregularity in the sense that maximal complexity means randomness. Therefore, it seems natural to call a binary string random if the shortest program for generating it is as long as the string itself. Since K(x) is not computable, it is not decidable whether a string is random.
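Although K(x) itself is uncomputable, any concrete experiment can use a computable upper bound. A rough sketch (compression via zlib is an illustrative stand-in for a short program, not Kolmogorov complexity itself):

```python
import random
import zlib

def compressed_size(bits: str) -> int:
    """Computable upper-bound proxy for the complexity of a bit string."""
    return len(zlib.compress(bits.encode(), 9))

random.seed(0)
patterned = "01" * 5000                                          # highly regular
irregular = "".join(random.choice("01") for _ in range(10_000))  # pseudorandom

print(compressed_size(patterned), compressed_size(irregular))
# the regular string compresses far better than the irregular one
```

The patterned string admits a description much shorter than its length, while the pseudorandom string resists compression, mirroring the idea that maximal complexity means randomness.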

This definition of random sequences turned out not to be quite satisfactory. Using ideas of Kolmogorov, Per Martin-Löf succeeded in giving an adequate precise definition of random sequences. [89] In particular, Martin-Löf proposed to define random sequences as those which withstand certain universal tests of randomness, defined as recursive sequential tests. Martin-Löf's random sequences fulfill all stochastic laws, such as the laws of large numbers and the law of the iterated logarithm. A weakness of this definition is that Martin-Löf also requires stochastic properties that cannot be considered as physically meaningful, in the sense that they cannot be tested by computable functions.

A slightly different but more powerful variant is due to Claus-Peter Schnorr. [90] He argues that a candidate for randomness must be rejected if there is an effective procedure to do so. A sequence such that no effective process can show its non-randomness must be considered as operationally random. He considers the null sets of Martin-Löf's sequential tests in the sense of Brouwer (i.e., null sets that are effectively computable) and defines a sequence to be random if it is not contained in any such null set. Schnorr requires the stochasticity tests to be computable instead of being merely constructive. While the Kolmogorov–Martin-Löf approach is non-constructive, the tests considered by Schnorr are constructive to such an extent that it is possible to approximate infinite random sequences to an arbitrary degree of accuracy by computable sequences of high complexity (pseudo-random sequences). The approximation will be the better, the greater the effort required to reject the pseudo-random sequence as being truly random. The fact that the behavior of Schnorr's random sequences can be approximated by constructive methods is of outstanding conceptual and practical importance. Random sequences in the sense of Martin-Löf do not have this approximation property; non-approximable random sequences exist only by virtue of the axiom of choice.

A useful characterization of random sequences can be given in terms of games of chance. According to von Mises' intuitive ideas and Church's refinement, a sequence is called random if and only if no player who calculates his pool by effective methods can raise his fortune indefinitely when playing on this sequence. For simplicity, we restrict our discussion to the practically important case of random sequences of the exponential type. A gambling rule implies a capital function C from the set I of all finite sequences to the set ℝ of all real numbers. In order that a gambler actually can use a rule, it is crucial that this rule is given algorithmically. That is, the capital function C cannot be an arbitrary function I → ℝ, but has to be a computable function. [91] If we assume that the gambler's pool is finite, and that debts are allowed, we get the following simple but rigorous characterization of a random sequence:

A sequence {x_1, x_2, x_3, ...} is a random sequence (of the exponential type) if and only if every computable capital function C : I → ℝ of bounded difference fulfills the relation lim_{n→∞} n^{-1} C{x_1, ..., x_n} = 0.

According to Schnorr, a universal test for randomness cannot exist. A sequence fails to be random if and only if there is an effective process in which this failure becomes evident. Therefore, one can refer to randomness only with respect to a well-specified particular test.
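The capital-function criterion can be made concrete with a toy betting strategy (a sketch; both the strategy and the seeded pseudorandom source are illustrative, and a single strategy is of course not a universal test):

```python
import random

random.seed(42)
bits = [random.randint(0, 1) for _ in range(100_000)]  # stand-in for a random sequence

# A computable capital function of bounded difference: bet one unit that the
# next bit repeats the previous one; win +1 on success, lose -1 otherwise.
capital = 0
for prev, nxt in zip(bits, bits[1:]):
    capital += 1 if nxt == prev else -1

print(capital / len(bits))  # n^{-1} C_n, close to 0 for a "random" sequence
```

On a sequence with a regularity this strategy can exploit (e.g. alternating bits), n^{-1} C_n would stay bounded away from zero, and the sequence would fail the criterion.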

The algorithmic concept of random sequences can be used to derive a model for Kolmogorov's axioms (in their constructive version) of mathematical probability theory. [92] It turns out that the measurable sets form a σ-algebra (in the sense of constructive set theory). This result shows the amazing insight Kolmogorov had in creating his axiomatic system.

Laws of Chance and Determinism

Why are There "Laws of Chance"?

It would be a logical mistake to assume that arbitrary chance events can be grasped by the statistical methods of mathematical probability theory. Probability theory has a rich mathematical structure, so we have to ask under what conditions the usual "laws of chance" are valid. The modern concept of subjective probabilities presupposes coherent rational behavior based on Boolean logic. That is, it is postulated that a rational man acts as if he had a deterministic model compatible with his pre-knowledge. Since in many physical examples, too, the appropriateness of the laws of probability can be traced back to an underlying deterministic ontic description, it is tempting to presume that chance events which satisfy the axioms of classical mathematical probability theory always result from the deterministic behavior of an underlying physical system. Such a claim cannot be demonstrated.

What can be proven is the weaker statement that every probabilistic system which fulfills the axioms of classical mathematical probability theory can be embedded into a larger deterministic system. A classical system is said to be deterministic if there exists a complete set of dispersion-free states such that Hadamard's principle of scientific determinism is fulfilled. Here, a state is said to be dispersion-free if every observable has a definite dispersion-free value with respect to this state. For such a deterministic system, statistical states are given by mean values of dispersion-free states. A probabilistic system is said to allow hidden variables if it is possible to find a hypothetical larger system such that every statistical state of the probabilistic system is a mean value of dispersion-free states of the enlarged system. Since the logic of classical probability theory is a Boolean σ-algebra, we can use the well-known result that a classical dynamical system is deterministic if and only if the underlying Boolean algebra is atomic. [93] As proved by Franz Kamber, every classical system characterized by a Boolean algebra allows the introduction of hidden variables such that every statistical state is a mean value of dispersion-free states. [94] This theorem implies that random events fulfill the laws of chance if and only if they can formally be reduced to hidden deterministic events. Such a deterministic embedding is never unique, but often there is a unique minimal dilation of a probabilistic dynamical system to a deterministic one. [95] Note that the deterministic embedding is usually not constructive, and that nothing is claimed about a possible ontic interpretation of the hidden variables of the enlarged deterministic system.

Kolmogorov's probability theory can be viewed as a hidden-variable representation of the basic abstract point-free theory. Consider the usual case where the Boolean algebra B of mathematical probability theory contains no atoms. Every classical probability system (B, p) can be represented in terms of some (not uniquely given) Kolmogorov space (Ω, Σ, μ) as a σ-complete Boolean algebra B = Σ/Δ, where Δ is the σ-ideal of Borel sets of μ-measure zero. The points ω ∈ Ω of the set Ω correspond to two-valued individual states (the so-called atomic or pure states) of the fictitious embedding atomic Boolean algebra P(Ω) of all subsets of the point set Ω. If (as usual) the set Ω is not countable, the atomic states are epistemically inaccessible. Measure-theoretically, an atomic state corresponding to a point ω ∈ Ω is represented by the Dirac measure δ_ω at the point ω ∈ Ω, defined for every subset B of Ω by δ_ω(B) = 1 if ω ∈ B and δ_ω(B) = 0 if ω ∉ B. Every epistemically accessible state can be described by a probability density f ∈ L¹(Ω, Σ, μ), which can be represented as an average of epistemically inaccessible atomic states,

f(ω) = ∫_Ω f(ω′) δ_ω(dω′).

The set-theoretical representation of the basic Boolean algebra B in terms of a Kolmogorov probability space (Ω, Σ, μ) is mathematically convenient since it allows one to relate an epistemic dynamics t ↦ f_t in terms of a probability density f_t ∈ L¹(Ω, Σ, μ) to a fictitious deterministic dynamics t ↦ ω_t for the points by f_t(ω) = f(ω_{−t}). [96] It is also physically interesting since all known context-independent physical laws are deterministic and formulated in terms of pure states. In contrast, every statistical dynamical law depends on some phenomenological constants (like the half-life constants [97] in the exponential decay law for the spontaneous decay of a radioactive nucleus). That is, we can formulate context-independent laws only if we introduce atomic states.



Quantum Mechanics Does Not Imply an Ontological Indeterminism

Although it is in general impossible to predict an individual quantum event, in an ontic description the most fundamental law-statements of quantum theory are deterministic. Probability is an essential element in every epistemic description of quantum events, but it does not indicate an incompleteness of our knowledge. The context-independent laws of quantum mechanics (which necessarily have to be formulated in an ontic interpretation) are strictly deterministic but refer to a non-Boolean logical structure of reality. On the other hand, every experiment ever performed in physics, chemistry and biology has a Boolean operational description. This situation is enforced by the necessity to communicate about facts in an unequivocal language.

The epistemically irreducible probabilistic structure of quantum theory is induced by the interaction of the quantum object system with an external classical observing system. Quantum mechanical probabilities do not refer to the object system but to the state transition induced by the interaction of the object system with the measuring apparatus. The non-predictable outcome of a quantum experiment is related to the projection of the atomic non-Boolean lattice of the ontic description of the deterministic reality onto the atom-free Boolean algebra of the epistemic description of a particular experiment. The restriction of an ontic atomic state (which gives a complete description of the non-Boolean reality) to a Boolean context is no longer atomic but is given by a probability measure. The measure generated in this way is a conditional probability which refers to the state transition induced by the interaction. Such quantum-theoretical probabilities cannot be attributed to the object system alone; they are conditional probabilities where the condition is given by the experimental arrangement. The epistemic probabilities depend on the experimental arrangement but, for a fixed context, they are objective since the underlying ontic structure is deterministic. Since a quantum-theoretical probability refers to a singled-out classical experimental context, it corresponds exactly to the mathematical probabilities of Kolmogorov's set-theoretical probability theory. [98] Therefore, a non-Boolean generalization of probability theory is not necessary, since all these measures refer to a Boolean context. The various theorems which show that it is impossible to introduce hidden variables in quantum theory only say that it is impossible to embed quantum theory into a deterministic Boolean theory. [99]

Chance Events for Which the Traditional "Laws of Chance" Do Not Apply

Conceptually, quantum theory does not require a generalization of the traditional Boolean probability theory. Nevertheless, mathematicians created a non-Boolean probability theory by introducing a measure on the orthomodular lattice of projection operators on the Hilbert space of quantum theory. [100] The various variants of a non-Boolean probability theory are of no conceptual importance for quantum theory, but they show that genuine and interesting generalizations of traditional probability theory are possible. [101] At present there are few applications. If we find empirical chance phenomena with a non-classical statistical behavior, the relevance of a non-Boolean theory should be considered. Worth mentioning are the non-Boolean pattern recognition methods [102], the attempt to develop a non-Boolean information theory [103], and speculations on the mind-body relation in terms of non-Boolean logic. [104]

From a logical point of view the existence of irreproducible unique events cannot be excluded. For example, if we deny a strict determinism on the ontological level of a Boolean or non-Boolean reality, then there is no reason to expect that every chance event is governed by statistical laws of any kind. Wolfgang Pauli made the inspiring proposal to characterize unique events by the absence of any type of statistical regularity:

Die von [Jung] betrachteten Synchronizitätsphänomene ... entziehen sich der Einfangung in Natur-'Gesetze', da sie nicht reproduzierbar, d.h. einmalig sind und durch die Statistik grosser Zahlen verwischt werden. In der Physik dagegen sind die 'Akausalitäten' gerade durch statistische Gesetze (grosse Zahlen) erfassbar. [105]

English translation: The synchronicity phenomena considered by [Jung] ... elude capture in natural "laws," since they are not reproducible, that is to say, unique, and are blurred by the statistics of large numbers. In physics, on the other hand, the "acausalities" are ascertainable precisely through statistical laws (large numbers).

Acknowledgment

I would like to thank Harald Atmanspacher and Werner Ehm for clarifying discussions and a careful reading of a draft of this paper.

Endnotes

[1] Laplace's famous reply to Napoleon's remark that he did not mention God in his Exposition du Système du Monde.
[2] Laplace (1814). Translation taken from the Dover edition, p.4.
[3] Gibbs (1902). A lucid review of Gibbs' statistical conception of physics can be found in Haas (1936), volume II, chapter R.
[4] This distinction is due to Scheibe (1964), Scheibe (1973), pp.50–51.
[5] Compare Hille and Phillips (1957), p.618.
[6] In a slightly weaker form, this concept has been introduced by Edmund Whittaker (1943).
[7] Compare Cournot (1843), §40; Venn (1866).
[8] Galton's desk (after Francis Galton, 1822–1911) is an inclined plane provided with regularly arranged nails in n horizontal lines. A ball launched at the top is deflected at every line either to the left or to the right. Under the last line of nails there are n+1 boxes (numbered from the left from k = 0 to k = n) in which the balls accumulate. In order to fall into the k-th box a ball has to be deflected k times to the right and n−k times to the left. If at each nail the probability for the ball to go left or right is 1/2, then the distribution of the balls is given by the binomial distribution (n choose k)(1/2)^n, which for large n approaches a Gaussian distribution. Our ignorance of the precise initial and boundary conditions does not allow us to predict individual events. Nevertheless, the experimental Gaussian distribution in no way depends on our knowledge. In this sense, we may speak of objective chance events.
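The mechanism described in this note is straightforward to simulate; the following sketch (an illustration, not part of the original paper) counts the rightward deflections of each ball and tallies the boxes, reproducing the binomial distribution described above:

```python
import random

def galton(n_rows: int, n_balls: int, seed: int = 0) -> list[int]:
    """Simulate Galton's desk: at each of n_rows lines of nails a ball
    is deflected left or right with probability 1/2; the number k of
    rightward deflections selects box k (0 <= k <= n_rows)."""
    rng = random.Random(seed)
    boxes = [0] * (n_rows + 1)
    for _ in range(n_balls):
        k = sum(rng.random() < 0.5 for _ in range(n_rows))
        boxes[k] += 1
    return boxes

# With many balls the box counts approximate n_balls * C(n,k) * (1/2)^n,
# which for large n is close to a Gaussian centred at n/2.
counts = galton(n_rows=10, n_balls=100_000)
```

Note that the simulated distribution is stable under the choice of seed: the statistical regularity does not depend on our knowledge of any individual trajectory, which is the sense of "objective chance" in the note.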

[9] For an introduction to the theory of deterministic chaos, compare for example Schuster (1984).
[10] Feigl (1953), p.408.
[11] Born (1955a), Born (1955b). For a critique of Born's view compare Von Laue (1955).
[12] Scriven (1965). For a critique of Scriven's view compare Boyd (1972).
[13] Gillies (1973), p.135.
[14] Earman (1986), pp.6–7.
[15] The definition and interpretation of probability has a long history. There exists an enormous literature on the conceptual problems of the classical probability calculus which cannot be summarized here. For a first orientation, compare the monographs by Fine (1973), Maistrov (1974), Von Plato (1994).
[16] von Weizsäcker (1973), p.321.
[17] Waismann (1930).
[18] von Mises (1928).
[19] Jeffreys (1939).
[20] Savage (1962), p.102.
[21] Russell (1948), pp.356–357.
[22] Compare for example Savage (1954), Savage (1962), Good (1965), Jeffrey (1965). For a convenient collection of the most important papers on the modern subjective interpretation, compare Kyburg and Smokler (1964).
[23] Bernoulli (1713).
[24] Compare Laplace (1814).
[25] de Finetti (1937). Compare also the collection of papers de Finetti (1972) and the monographs de Finetti (1974), de Finetti (1975).
[26] Savage (1954), Savage (1962).
[27] Keynes (1921).
[28] Koopman (1940a), Koopman (1940b), Koopman (1941).
[29] Carnap (1950), Carnap (1952), Carnap and Jeffrey (1971). Carnap's concept of logical probabilities has been criticized sharply by Watanabe (1969a).
[30] For a critical evaluation of the view that statements of probability can be logically true, compare Ayer (1957), and the ensuing discussion, pp.18–30.
[31] Venn (1866), chapter VI, §35, §36.
[32] Cournot (1843). This working rule was still adopted by Kolmogoroff (1933), p.4.
[33] von Weizsäcker (1973), p.326. Compare also von Weizsäcker (1985), pp.100–118.
[34] Carnap (1945), Carnap (1950).
[35] Carnap (1963), p.73.
[36] Compare for example Khrennikov (1994), chapters VI and VII.
[37] Pauli (1954), p.114.

[38] von Mises (1919), von Mises (1928), von Mises (1931). The English edition of von Mises (1964) was edited and complemented by Hilda Geiringer; it is strongly influenced by the views of Erhard Tornier and does not necessarily reflect the views of Richard von Mises.
[39] The same is true for the important modifications of von Mises' approach by Tornier (1933) and by Reichenbach (1994). Compare also the review by Martin-Löf (1969a).

[40] Boole (1854), p.1.
[41] Compare Halmos (1944), Kolmogoroff (1948), Łoś (1955). A detailed study of the purely lattice-theoretical ("point-free") approach to classical probability can be found in the monograph by Kappos (1969).

[42] Pro memoria: Boolean algebras. A Boolean algebra is a non-empty set B in which two binary operations ∨ (addition or disjunction) and ∧ (multiplication or conjunction), and a unary operation ⊥ (complementation or negation), are defined with the following properties: the operations ∨ and ∧ are commutative and associative; the operation ∨ is distributive with respect to ∧, and vice versa; for every A ∈ B and every B ∈ B we have A ∨ A⊥ = B ∨ B⊥ and A ∧ A⊥ = B ∧ B⊥; moreover A ∨ (A ∧ A⊥) = A ∧ (A ∨ A⊥) = A. These axioms imply that in every Boolean algebra there are two distinguished elements 1 (called the unit of B) and 0 (called the zero of B), defined by A ∨ A⊥ = 1 and A ∧ A⊥ = 0 for every A ∈ B. With this it follows that 0 is the neutral element of the addition ∨, A ∨ 0 = A for every A ∈ B, and that 1 is the neutral element of the multiplication ∧, A ∧ 1 = A for every A ∈ B. For more details, compare Sikorski (1969).
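The axioms listed in this note can be checked mechanically on the standard concrete example of a power set, where ∨ is union, ∧ is intersection, and ⊥ is set complement. A minimal sketch (an illustration of the axioms, not taken from the paper):

```python
from itertools import chain, combinations

U = frozenset({1, 2, 3})
# The power set of U forms a Boolean algebra with 0 = empty set, 1 = U.
B = [frozenset(s) for s in chain.from_iterable(
    combinations(U, r) for r in range(len(U) + 1))]

def comp(a: frozenset) -> frozenset:
    """Complementation (the unary operation ⊥)."""
    return U - a

for a in B:
    for b in B:
        # commutativity of ∨ (union) and ∧ (intersection)
        assert a | b == b | a and a & b == b & a
        # distributivity in both directions
        for c in B:
            assert a | (b & c) == (a | b) & (a | c)
            assert a & (b | c) == (a & b) | (a & c)
        # A ∨ A⊥ and A ∧ A⊥ are independent of A: the unit and the zero
        assert a | comp(a) == b | comp(b) == U
        assert a & comp(a) == b & comp(b) == frozenset()
    # absorption: A ∨ (A ∧ A⊥) = A ∧ (A ∨ A⊥) = A
    assert (a | (a & comp(a))) == (a & (a | comp(a))) == a
```

Running the loops without an assertion error verifies every axiom of the note on all 8 elements of this algebra, including the fact that 1 and 0 are uniquely determined.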

[43] Stone (1936).
[44] Loomis (1947).
[45] Kolmogoroff (1933). There are many excellent texts on Kolmogorov's mathematical probability theory. Compare for example: Breiman (1968), Prohorov and Rozanov (1969), Laha and Rohatgi (1979), Rényi (1970a), Rényi (1970b). Recommendable introductions to measure theory are, for example: Cohn (1980), Nielsen (1997).
[46] von Neumann (1932a), pp.595–598. Compare also Birkhoff and von Neumann (1936), p.825.
[47] Compare Gnedenko and Kolmogorov (1954), §3.
[48] If V is a topological space, then the smallest σ-algebra with respect to which all continuous complex-valued functions on V are measurable is called the Baire σ-algebra of V. The smallest σ-algebra containing all open sets of V is called the Borel σ-algebra of V. In general, the Baire σ-algebra is contained in the Borel σ-algebra. If V is metrisable, then the Baire and Borel σ-algebras coincide. Compare Bauer (1974), theorem 40.4, p.198.
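The notion of the "smallest σ-algebra containing given sets" can be made concrete on a finite space, where every σ-algebra is a finite algebra and the closure can be computed by brute force. A toy sketch (an illustration with names of my own choosing, not from the paper):

```python
def sigma_algebra(universe: frozenset, generators: list) -> set:
    """Smallest family of subsets containing the generators that is
    closed under complement and union; on a finite universe this is
    exactly the sigma-algebra generated by the given sets."""
    fam = {frozenset(), universe, *generators}
    changed = True
    while changed:
        changed = False
        for a in list(fam):
            for b in list(fam):
                # closure under complement and pairwise union
                # (intersections follow by De Morgan's laws)
                for s in (universe - a, a | b):
                    if s not in fam:
                        fam.add(s)
                        changed = True
    return fam

U = frozenset(range(4))
# Generated by the single set {0}: the four sets {}, {0}, {1,2,3}, U.
fam = sigma_algebra(U, [frozenset({0})])
```

On infinite spaces (as for the Baire and Borel σ-algebras of the note) no such finite closure exists; the construction proceeds instead by transfinite iteration or by intersecting all σ-algebras containing the generators.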

[49] A Polish space is a separable topological space that can be metrized by means of a complete metric; compare Cohn (1980), chapter 8. For a review of probability theory on complete separable metric spaces, compare Parthasarathy (1967). For a discussion of Radon measures on arbitrary topological spaces, compare Schwartz (1973). For a critical review of Kolmogorov's axioms, compare Fortet (1958), Lorenzen (1978).
[50] Rényi (1955). Compare also chapter 2 in the excellent textbook by Rényi (1970b).
[51] Sikorski (1949).
[52] A usual but rather ill-chosen name, since a "random variable" is neither a variable nor random.



[53] While the equivalence of two continuous functions on a closed interval implies their equality, this is not true for arbitrary measurable (that is, in general, discontinuous) functions. Compare, for example, Kolmogorov and Fomin (1961), p.41.
[54] Compare Aristotle's criticism in Metaphysica, 1064b 15: "Evidently, none of the traditional sciences busies itself about the accidental." Quoted from Ross (1924).
[55] Waismann (1930).
[56] Compare for example Doob (1953), p.564; Pinsker (1964), section 5.2; Rozanov (1967), sections II.2 and III.2. Sometimes singular processes are called deterministic, and regular processes are called purely non-deterministic. We will not use this terminology since determinism refers to an ontic description, while singularity or regularity refers to the epistemic predictability of the process.
[57] Bochner (1932), §20. The representation theorem by Bochner (1932), §19 and §20, refers to continuous positive-definite functions. Later, Cramér (1939) showed that the continuity assumption is dispensable. Compare also Cramér and Leadbetter (1967), section 7.4.
[58] Khintchine (1934). Often this result is called the Wiener-Khintchin theorem, but this terminology should be avoided since Khintchin's theorem relates the ensemble averages of the covariance and the spectral functions, while the theorem by Wiener (1930), chapter II.3, relates the auto-correlation function of a single function with a spectral function of a single function.
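The single-function side of this distinction can be illustrated numerically: for one finite sample path, the circular auto-correlation function and the periodogram are an exact discrete Fourier pair. A toy sketch of Wiener's single-function relation (the ensemble-average version of Khintchin's theorem would instead require averaging over many realizations):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(256)  # one single "sample path"
n = len(x)

# Circular auto-correlation of the single sequence x ...
acf = np.array([np.dot(x, np.roll(x, k)) for k in range(n)]) / n

# ... and its periodogram (squared modulus of the discrete Fourier transform)
periodogram = np.abs(np.fft.fft(x)) ** 2 / n

# Discrete analogue of Wiener's relation: the two are a Fourier pair.
recovered = np.fft.ifft(periodogram).real
assert np.allclose(recovered, acf)
```

No probability measure over an ensemble enters here: the relation holds for the individual sequence, which is exactly the point of the terminological caveat in the note.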

[59] Compare for example Rosenblatt (1971), section VI.2.
[60] Compare also the review by Kallianpur (1961).
[61] This decomposition is due to Wold (1938) for the special case of discrete-time weakly stationary processes, and to Hanner (1950) for the case of continuous-time processes. The general decomposition theorem is due to Cramér (1939).
[62] Wiener (1942), republished as Wiener (1949); Krein (1945), Krein (1945). Compare also Doob (1953), p.584.
[63] Compare also Lindblad (1993).
[64] Meixner (1961), Meixner (1965).
[65] König and Tobergte (1963).
[66] Wiener and Akutowicz (1957), theorem 4.
[67] Using the linearization of a classical dynamical system to a Hilbert-space description introduced by Bernard Osgood Koopman (1931), Johann von Neumann (1932b) (communicated December 10, 1931, published 1932) was the first to establish a theorem bearing on the quasi-ergodic hypothesis: the mean ergodic theorem, which refers to L²-convergence. Stimulated by these ideas, one month later George David Birkhoff (1931) (communicated December 1, 1931, published 1931) obtained the even more fundamental individual (or pointwise) ergodic theorem, which refers to pointwise convergence. As Birkhoff and Koopman (1932) explain, von Neumann communicated his results to them on October 22, 1931, and "raised at once the important question as to whether or not ordinary time means exist along the individual path-curves excepting for a possible set of Lebesgue measure zero." Shortly thereafter Birkhoff proved his individual ergodic theorem.
[68] This formulation has been taken from Masani (1990), pp.139–140.
[69] Einstein (1905), Einstein (1906).
[70] Wiener (1923), Wiener (1924).



[71] Perrin (1906). A rigorous proof of Perrin's conjecture is due to Paley, Wiener and Zygmund (1933).
[72] Wiener (1930). In his valuable commentary Pesi P. Masani (1979) stresses the importance of the role of generalized harmonic analysis for the quest for randomness.

[73] Compare for example Middleton (1960), p.151.
[74] Khintchine (1934).
[75] Wiener (1930), chapter II.3.
[76] Einstein (1914a), Einstein (1914b).
[77] Compare for example the controversy by Brennan (1957), Brennan (1958) and Beutler (1958a), Beutler (1958b), with a final remark by Norbert Wiener (1958).
[78] Koopman and von Neumann (1932), p.261.
[79] Compare for example Dym and McKean (1976), p.84. Note that there are processes which are singular in the linear sense but allow a perfect nonlinear prediction. An example can be found in Scarpellini (1979), p.295.
[80] For example by Kakutani (1950).
[81] von Mises (1919). Compare also his later books von Mises (1928), von Mises (1931), von Mises (1964).

[82] Church (1940).
[83] Post (1936), Turing (1936).
[84] Church (1936).
[85] Kolmogorov (1983a), p.39.
[86] Kolmogorov (1963), p.369.
[87] Compare also Kolmogorov (1968a), Kolmogorov (1968b), Kolmogorov (1983a), Kolmogorov (1983b), Kolmogorov and Uspenskii (1988). For a review, compare Zvonkin and Levin (1970).
[88] Compare Solomonoff (1964), Chaitin (1966), Chaitin (1969), Chaitin (1970).
[89] Martin-Löf (1966), Martin-Löf (1969b).
[90] Schnorr (1969), Schnorr (1970a), Schnorr (1970b), Schnorr (1971a), Schnorr (1971b), Schnorr (1973).
[91] A function C : I → ℝ is called computable if there is a recursive function R such that |R(n, w) − C(w)| < 2⁻ⁿ for all w ∈ I and all n ∈ {1, 2, 3, ...}. Recursive functions are functions computable with the aid of a Turing machine.
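As an illustration of this definition (an example of my own choosing, not from the paper), take I = [0, 1] and C(w) = exp(w). A recursive approximant R(n, w) can sum Taylor terms until a provable tail bound falls below 2⁻ⁿ:

```python
from fractions import Fraction

def R(n: int, w: Fraction) -> Fraction:
    """Recursive approximant with |R(n, w) - exp(w)| < 2**-n on [0, 1].
    For 0 <= w <= 1 the Taylor tail after the term w^m/m! is bounded
    by 2 * w^(m+1)/(m+1)!, so summing stops once that bound is small
    enough; everything is exact rational arithmetic."""
    total = term = Fraction(1)
    m = 0
    while 2 * term * w / (m + 1) >= Fraction(1, 2 ** n):
        m += 1
        term = term * w / m
        total += term
    return total
```

Exact rational arithmetic keeps the error analysis honest: for instance, R(20, Fraction(1, 2)) differs from e^(1/2) by less than 2⁻²⁰, uniformly certified by the tail bound rather than by floating-point luck.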

[92] For a review of modern algorithmic probability theory, compare Schnorr (1971b).
[93] Compare for example Kronfli (1971).
[94] Kamber (1964), §7, and Kamber (1965), §14.
[95] For the theory of such minimal dilations in Hilbert space, compare Sz.-Nagy and Foiaş (1970). More generally, Antoniou and Gustafson (1997) have shown that an arbitrary Markov chain can be dilated to a unique minimal deterministic dynamical system.
[96] For example, every continuous regular Gaussian stochastic process can be generated by a deterministic, conservative and reversible linear Hamiltonian system with an infinite-dimensional phase space. For an explicit construction, compare for instance Picci (1986), Picci (1988).
[97] These decay constants are not "an invariable property of the nucleus, unchangeable by any external influences" (as claimed by Max Born (1949), p.172), but depend for example on the degree of ionization of the atom.
[98] In quantum theory, a Boolean context is described by a commutative W*-algebra which can be generated by a single selfadjoint operator, called the observable of the experiment. The expectation value of the operator-valued spectral measure of this observable is exactly the probability measure for the statistical description of the experiment in terms of a classical Kolmogorov probability space.
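In a finite-dimensional toy case this construction is explicit: the expectation values of the spectral projections of a selfadjoint matrix in a fixed state form an ordinary probability distribution over the eigenvalues. A minimal numerical sketch (the observable and state below are arbitrary choices for illustration):

```python
import numpy as np

# A toy selfadjoint observable and a normalized state vector.
A = np.array([[1.0, 0.5],
              [0.5, -1.0]])
psi = np.array([0.6, 0.8])

eigvals, eigvecs = np.linalg.eigh(A)   # spectral decomposition of A

# Expectation of the spectral projections in the state psi:
# a classical probability measure on the spectrum of A.
probs = np.abs(eigvecs.T @ psi) ** 2

assert np.isclose(probs.sum(), 1.0)                # a probability measure
assert np.isclose(probs @ eigvals, psi @ A @ psi)  # <A> = sum of λ·p(λ)
```

Within the single Boolean context fixed by A, the pair (spectrum, probs) is exactly a classical Kolmogorov probability space; non-Boolean features only appear when incompatible observables are considered together.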

[99] The claim by Hans Reichenbach (1949), p.15, "dass das Kausalprinzip in keiner Weise mit der Physik der Quanta verträglich ist" ("that the causality principle is in no way compatible with the physics of quanta"), is valid only if one arbitrarily restricts the domain of the causality principle to Boolean logic.
[100] For an introduction, compare Jauch (1974) and Beltrametti and Cassinelli (1981), chapters 11 and 26.
[101] Compare for example Gudder and Hudson (1978).
[102] Compare Watanabe (1967), Watanabe (1969b), Schadach (1973). For a concrete application of non-Boolean pattern recognition to medical diagnosis, compare Schadach (1973).
[103] Compare for example Watanabe (1969a), chapter 9.
[104] Watanabe (1961).
[105] Letter of June 3, 1952, by Wolfgang Pauli to Markus Fierz, quoted from von Meyenn (1996), p.634.

References

Antoniou, I. and Gustafson, K. (1997). From irreversible Markov semigroups to chaotic dynamics. Physica A, 236, 296–308.
Ayer, A. J. (1957). The conception of probability as a logical relation. In: S. Körner (Ed.): Observation and Interpretation in the Philosophy of Physics. New York: Dover Publications. pp.12–17.
Beltrametti, E. G. and Cassinelli, G. (1981). The Logic of Quantum Mechanics. London: Addison-Wesley.
Bernoulli, J. (1713). Ars conjectandi. Basel. German translation by R. Haussner under the title of Wahrscheinlichkeitsrechnung. Leipzig: Engelmann, 1899.
Beutler, F. J. (1958a). A further note on differentiability of auto-correlation functions. Proceedings of the Institute of Radio Engineers, 45, 1759–1760.
Beutler, F. J. (1958b). A further note on differentiability of auto-correlation functions. Author's comments. Proceedings of the Institute of Radio Engineers, 46, 1759–1760.
Birkhoff, G. and von Neumann, J. (1936). The logic of quantum mechanics. Annals of Mathematics, 37, 823–843.
Birkhoff, G. D. (1931). Proof of the ergodic theorem. Proceedings of the National Academy of Sciences of the United States of America, 17, 656–660.
Birkhoff, G. D. and Koopman, B. O. (1932). Recent contributions to the ergodic theory. Proceedings of the National Academy of Sciences of the United States of America, 18, 279–282.
Bochner, S. (1932). Vorlesungen über Fouriersche Integrale. Leipzig: Akademische Verlagsgesellschaft.
Boole, G. (1854). An Investigation of the Laws of Thought. London: Macmillan. Reprint (1958). New York: Dover Publications.
Born, M. (1949). Einstein's statistical theories. In: P. A. Schilpp (Ed.): Albert Einstein: Philosopher-Scientist. Evanston, Illinois: Library of Living Philosophers. pp.163–177.
Born, M. (1955a). Ist die klassische Mechanik wirklich deterministisch? Physikalische Blätter, 11, 49–54.
Born, M. (1955b). Continuity, determinism and reality. Danske Videnskabernes Selskab Mathematisk Fysiske Meddelelser, 30, No. 2, pp.1–26.
Boyd, R. (1972). Determinism, laws, and predictability in principle. Philosophy of Science, 39, 431–450.



Breiman, L. (1968). Probability. Reading, Massachusetts: Addison-Wesley.
Brennan, D. G. (1957). Smooth random functions need not have smooth correlation functions. Proceedings of the Institute of Radio Engineers, 45, 1016–1017.
Brennan, D. G. (1958). A further note on differentiability of auto-correlation functions. Proceedings of the Institute of Radio Engineers, 46, 1758–1759.
Carnap, R. (1945). The two concepts of probability. Philosophy and Phenomenological Research, 5, 513–532.
Carnap, R. (1950). Logical Foundations of Probability. Chicago: University of Chicago Press. 2nd edition, 1962.
Carnap, R. (1952). The Continuum of Inductive Methods. Chicago: University of Chicago Press.
Carnap, R. (1963). Intellectual autobiography. In: P. A. Schilpp (Ed.): The Philosophy of Rudolf Carnap. La Salle, Illinois: Open Court. pp.1–84.
Carnap, R. and Jeffrey, R. C. (1971). Studies in Inductive Logic and Probability. Volume I. Berkeley: University of California Press.
Chaitin, G. (1966). On the length of programs for computing finite binary sequences. Journal of the Association for Computing Machinery, 13, 547–569.
Chaitin, G. (1969). On the length of programs for computing finite binary sequences: Statistical considerations. Journal of the Association for Computing Machinery, 16, 143–159.
Chaitin, G. (1970). On the difficulty of computations. IEEE Transactions on Information Theory, IT-16, 5–9.
Church, A. (1936). An unsolvable problem of elementary number theory. The American Journal of Mathematics, 58, 345–363.
Church, A. (1940). On the concept of a random sequence. Bulletin of the American Mathematical Society, 46, 130–135.
Cohn, D. L. (1980). Measure Theory. Boston: Birkhäuser.
Cournot, A. A. (1843). Exposition de la théorie des chances et des probabilités. Paris.
Cramér, H. (1939). On the representation of a function by certain Fourier integrals. Transactions of the American Mathematical Society, 46, 191–201.
de Finetti, B. (1937). La prévision: ses lois logiques, ses sources subjectives. Annales de l'Institut Henri Poincaré, 7, 1–68.
de Finetti, B. (1972). Probability, Induction and Statistics. The Art of Guessing. London: Wiley.
de Finetti, B. (1974). Theory of Probability. A Critical Introductory Treatment. Volume 1. London: Wiley.
de Finetti, B. (1975). Theory of Probability. A Critical Introductory Treatment. Volume 2. London: Wiley.
Doob, J. L. (1953). Stochastic Processes. New York: Wiley.
Dym, H. and McKean, H. P. (1976). Gaussian Processes, Function Theory, and the Inverse Spectral Problem. New York: Academic Press.
Earman, J. (1986). A Primer on Determinism. Dordrecht: Reidel.
Einstein, A. (1905). Über die von der molekularkinetischen Theorie der Wärme geforderte Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen. Annalen der Physik, 17, 549–560.
Einstein, A. (1906). Zur Theorie der Brownschen Bewegung. Annalen der Physik, 19, 371–381.
Einstein, A. (1914a). Méthode pour la détermination de valeurs statistiques d'observations concernant des grandeurs soumises à des fluctuations irrégulières. Archives des sciences physiques et naturelles, 37, 254–256.
Einstein, A. (1914b). Eine Methode zur statistischen Verwertung von Beobachtungen scheinbar unregelmässig quasiperiodisch verlaufender Vorgänge. Unpublished manuscript. Reprinted in: M. J. Klein, A. J. Kox, J. Renn and R. Schulmann (Eds.). The Collected Papers of Albert Einstein. Volume 4. The Swiss Years, 1912–1914. Princeton: Princeton University Press. 1995. pp.603–607.
Enz, C. P. and von Meyenn, K. (1994). Wolfgang Pauli. Writings on Physics and Philosophy. Berlin: Springer.
Feigl, H. (1953). Notes on causality. In: H. Feigl and M. Brodbeck (Eds.). Readings in the Philosophy of Science. New York: Appleton-Century-Crofts.
Fine, T. L. (1973). Theories of Probability. An Examination of Foundations. New York: Academic Press.
Fortet, R. (1958). Recent advances in probability theory. Surveys in Applied Mathematics. IV. Some Aspects of Analysis and Probability. New York: Wiley, pp.169–240.



Gibbs, J. W. (1902). Elementary Principles in Statistical Mechanics. New Haven: Yale University Press.
Gillies, D. A. (1973). An Objective Theory of Probability. London: Methuen.
Gnedenko, B. V. and Kolmogorov, A. N. (1954). Limit Distributions for Sums of Independent Random Variables. Reading, Massachusetts: Addison-Wesley.
Good, I. J. (1965). The Estimation of Probabilities. An Essay on Modern Bayesian Methods. Cambridge, Massachusetts: MIT Press.
Gudder, S. P. and Hudson, R. L. (1978). A noncommutative probability theory. Transactions of the American Mathematical Society, 245, 1–41.
Haas, A. (1936). Commentary of the Scientific Writings of J. Willard Gibbs. New Haven: Yale University Press.
Halmos, P. R. (1944). The foundations of probability. American Mathematical Monthly, 51, 493–510.
Hanner, O. (1950). Deterministic and non-deterministic stationary random processes. Arkiv för Matematik, 1, 161–177.
Hille, E. and Phillips, R. S. (1957). Functional Analysis and Semi-groups. Providence, Rhode Island: American Mathematical Society.
Jauch, J. M. (1974). The quantum probability calculus. Synthese, 29, 131–154.
Jeffrey, R. C. (1965). The Logic of Decision. New York: McGraw-Hill.
Jeffreys, H. (1939). Theory of Probability. Oxford: Clarendon Press. 2nd edition, 1948; 3rd edition, 1961.
Kakutani, S. (1950). Review of "Extrapolation, interpolation and smoothing of stationary time series" by Norbert Wiener. Bulletin of the American Mathematical Society, 56, 378–381.
Kallianpur, G. (1961). Some ramifications of Wiener's ideas on nonlinear prediction. In: P. Masani (Ed.), Norbert Wiener. Collected Works with Commentaries. Volume III. Cambridge, Massachusetts: MIT Press, pp.402–424.
Kamber, F. (1964). Die Struktur des Aussagenkalküls in einer physikalischen Theorie. Nachrichten der Akademie der Wissenschaften, Göttingen. Mathematisch-Physikalische Klasse, 10, 103–124.
Kamber, F. (1965). Zweiwertige Wahrscheinlichkeitsfunktionen auf orthokomplementären Verbänden. Mathematische Annalen, 158, 158–196.
Kappos, D. A. (1969). Probability Algebras and Stochastic Spaces. New York: Academic Press.
Keynes, J. M. (1921). A Treatise on Probability. London: Macmillan.
Khintchine, A. (1934). Korrelationstheorie der stationären stochastischen Prozesse. Mathematische Annalen, 109, 604–615.
Khrennikov, A. (1994). p-Adic Valued Distributions in Mathematical Physics. Dordrecht: Kluwer.
Kolmogoroff, A. (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung. Berlin: Springer.
Kolmogoroff, A. (1948). Algèbres de Boole métriques complètes. VI. Zjazd Matematyków Polskich. Annales de la Société Polonaise de Mathématique, 20, 21–30.
Kolmogorov, A. N. (1963). On tables of random numbers. Sankhyā. The Indian Journal of Statistics A, 25, 369–376.
Kolmogorov, A. N. (1968a). Three approaches to the quantitative definition of information. International Journal of Computer Mathematics, 2, 157–168. Russian original in: Problemy Peredachi Informatsii 1, 3–11 (1965).
Kolmogorov, A. N. (1968b). Logical basis for information theory and probability theory. IEEE Transactions on Information Theory, IT-14, 662–664.
Kolmogorov, A. N. (1983a). Combinatorial foundations of information theory and the calculus of probability. Russian Mathematical Surveys, 38:4, 29–40.
Kolmogorov, A. N. (1983b). On logical foundations of probability theory. Probability Theory and Mathematical Statistics. Lecture Notes in Mathematics. Berlin: Springer, pp.1–5.
Kolmogorov, A. N. and Fomin, S. V. (1961). Elements of the Theory of Functions and Functional Analysis. Volume 2. Measure. The Lebesgue Integral. Hilbert Space. Albany: Graylock Press.
Kolmogorov, A. N. and Uspenskii, V. A. (1988). Algorithms and randomness. Theory of Probability and its Applications, 32, 389–412.
König, H. and Tobergte, J. (1963). Reversibilität und Irreversibilität von linearen dissipativen Systemen. Journal für die reine und angewandte Mathematik, 212, 104–108.
Koopman, B. O. (1931). Hamiltonian systems and transformations in Hilbert space. Proceedings of the National Academy of Sciences of the United States of America, 17, 315–318.



Koopman, B. O. (1940a). The bases of probability. Bulletin of the American Mathematical Society, 46, 763–774.
Koopman, B. O. (1940b). The axioms and algebra of intuitive probability. Annals of Mathematics, 41, 269–292.
Koopman, B. O. (1941). Intuitive probabilities and sequences. Annals of Mathematics, 42, 169–187.
Koopman, B. O. and von Neumann, J. (1932). Dynamical systems of continuous spectra. Proceedings of the National Academy of Sciences of the United States of America, 18, 255–263.
Krein, M. G. (1945). On a generalization of some investigations of G. Szegö, W. M. Smirnov, and A. N. Kolmogorov. Doklady Akademii Nauk SSSR, 46, 91–94 [in Russian].
Krein, M. G. (1945). On a problem of extrapolation of A. N. Kolmogorov. Doklady Akademii Nauk SSSR, 46, 306–309 [in Russian].
Kronfli, N. S. (1971). Atomicity and determinism in Boolean systems. International Journal of Theoretical Physics, 4, 141–143.
Kyburg, H. E. and Smokler, H. E. (1964). Studies in Subjective Probability. New York: Wiley.
Laha, R. G. and Rohatgi, V. K. (1979). Probability Theory. New York: Wiley.
Laplace, P. S. (1814). Essai Philosophique sur les Probabilités. English translation from the sixth French edition under the title: A Philosophical Essay on Probabilities. 1951. New York: Dover Publications.
Lindblad, G. (1993). Irreversibility and randomness in linear response theory. Journal of Statistical Physics, 72, 539–554.
Loomis, L. H. (1947). On the representation of σ-complete Boolean algebras. Bulletin of the American Mathematical Society, 53, 757–760.
Lorenzen, P. (1978). Eine konstruktive Deutung des Dualismus in der Wahrscheinlichkeitstheorie. Zeitschrift für allgemeine Wissenschaftstheorie, 2, 256–275.
Łoś, J. (1955). On the axiomatic treatment of probability. Colloquium Mathematicum (Wrocław), 3, 125–137.
Maistrov, L. E. (1974). Probability Theory. A Historical Sketch. New York: Wiley.
Martin-Löf, P. (1966). The definition of random sequences. Information and Control, 9, 602–619.
Martin-Löf, P. (1969a). The literature on von Mises' kollektivs revisited. Theoria. A Swedish Journal of Philosophy, 35, 12–37.
Martin-Löf, P. (1969b). Algorithms and randomness. Review of the International Statistical Institute, 37, 265–272.
Masani, P. (1979). Commentary on the memoire [30a] on generalized harmonic analysis. In: P. Masani (Ed.), Norbert Wiener. Collected Works with Commentaries. Volume II. Cambridge, Massachusetts: MIT Press. pp. 333–379.
Masani, P. R. (1990). Norbert Wiener, 1894–1964. Basel: Birkhäuser.
Meixner, J. (1961). Reversibilität und Irreversibilität in linearen passiven Systemen. Zeitschrift für Naturforschung, 16a, 721–726.
Meixner, J. (1965). Linear passive systems. In: J. Meixner (Ed.), Statistical Mechanics of Equilibrium and Non-equilibrium. Amsterdam: North-Holland.
Middleton, D. (1960). Statistical Communication Theory. New York: McGraw-Hill.
Nielsen, O. E. (1997). An Introduction to Integration and Measure Theory. New York: Wiley.
Paley, R. E. A. C., Wiener, N. and Zygmund, A. (1933). Notes on random functions. Mathematische Zeitschrift, 37, 647–668.
Parthasarathy, K. R. (1967). Probability Measures on Metric Spaces. New York: Academic Press.
Pauli, W. (1954). Wahrscheinlichkeit und Physik. Dialectica, 8, 112–124.
Perrin, J. (1906). La discontinuité de la matière. Revue du mois, 1, 323–343.
Picci, G. (1986). Application of stochastic realization theory to a fundamental problem of statistical physics. In: C. I. Byrnes and A. Lindquist (Eds.), Modelling, Identification and Robust Control. Amsterdam: North-Holland. pp. 211–258.
Picci, G. (1988). Hamiltonian representation of stationary processes. In: I. Gohberg, J. W. Helton and L. Rodman (Eds.), Operator Theory: Advances and Applications. Basel: Birkhäuser. pp. 193–215.
Pinsker, M. S. (1964). Information and Information Stability of Random Variables and Processes. San Francisco: Holden-Day.
Post, E. L. (1936). Finite combinatory processes — formulation 1. Journal of Symbolic Logic, 1, 103–105.
Prohorov, Yu. V. and Rozanov, Yu. A. (1969). Probability Theory. Berlin: Springer.



Reichenbach, H. (1949). Philosophische Probleme der Quantenmechanik. Basel: Birkhäuser.
Reichenbach, H. (1994). Wahrscheinlichkeitslehre. Braunschweig: Vieweg. 2nd edition, revised and edited by Godehard Link on the basis of the expanded American edition. Volume 7 of the Collected Works of Hans Reichenbach.
Rényi, A. (1955). A new axiomatic theory of probability. Acta Mathematica Academiae Scientiarum Hungaricae, 6, 285–335.
Rényi, A. (1970a). Probability Theory. Amsterdam: North-Holland.
Rényi, A. (1970b). Foundations of Probability. San Francisco: Holden-Day.
Rosenblatt, M. (1971). Markov Processes: Structure and Asymptotic Behavior. Berlin: Springer.
Ross, W. D. (1924). Aristotle's Metaphysics. Text and Commentary. Oxford: Clarendon Press.
Rozanov, Yu. A. (1967). Stationary Random Processes. San Francisco: Holden-Day.
Russell, B. (1948). Human Knowledge. Its Scope and Limits. London: George Allen and Unwin.
Savage, L. J. (1954). The Foundations of Statistics. New York: Wiley.
Savage, L. J. (1962). The Foundations of Statistical Inference. A Discussion. London: Methuen.
Scarpellini, B. (1979). Predicting the future of functions on flows. Mathematical Systems Theory, 12, 281–296.
Schadach, D. J. (1973). Nicht-Boolesche Wahrscheinlichkeitsmasse für Teilraummethoden in der Zeichenerkennung. In: T. Einsele, W. Giloi and H.-H. Nagel (Eds.), Lecture Notes in Economics and Mathematical Systems. Vol. 83. Berlin: Springer. pp. 29–35.
Scheibe, E. (1964). Die kontingenten Aussagen in der Physik. Frankfurt: Athenäum Verlag.
Scheibe, E. (1973). The Logical Analysis of Quantum Mechanics. Oxford: Pergamon Press.
Schnorr, C. P. (1969). Eine Bemerkung zum Begriff der zufälligen Folge. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 14, 27–35.
Schnorr, C. P. (1970a). Über die Definition von effektiven Zufallstests. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 15, 297–312, 313–328.
Schnorr, C. P. (1970b). Klassifikation der Zufallsgesetze nach Komplexität und Ordnung. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 16, 1–26.
Schnorr, C. P. (1971a). A unified approach to the definition of random sequences. Mathematical Systems Theory, 5, 246–258.
Schnorr, C. P. (1971b). Zufälligkeit und Wahrscheinlichkeit. Eine algorithmische Begründung der Wahrscheinlichkeitstheorie. Lecture Notes in Mathematics, Volume 218. Berlin: Springer.
Schnorr, C. P. (1973). Process complexity and effective random tests. Journal of Computer and System Sciences, 7, 376–388.
Schuster, H. G. (1984). Deterministic Chaos. An Introduction. Weinheim: Physik-Verlag.
Schwartz, L. (1973). Radon Measures on Arbitrary Topological Spaces and Cylindrical Measures. London: Oxford University Press.
Scriven, M. (1965). On essential unpredictability in human behavior. In: B. B. Wolman and E. Nagel (Eds.), Scientific Psychology: Principles and Approaches. New York: Basic Books.
Sikorski, R. (1949). On the inducing of homomorphisms by mappings. Fundamenta Mathematicae, 36, 7–22.
Sikorski, R. (1969). Boolean Algebras. Berlin: Springer.
Solomonoff, R. J. (1964). A formal theory of inductive inference. Information and Control, 7, 1–22, 224–254.
Stone, M. H. (1936). The theory of representations for Boolean algebras. Transactions of the American Mathematical Society, 40, 37–111.
Sz.-Nagy, B. and Foiaş, C. (1970). Harmonic Analysis of Operators on Hilbert Space. Amsterdam: North-Holland.
Tornier, E. (1933). Grundlagen der Wahrscheinlichkeitsrechnung. Acta Mathematica, 60, 239–380.
Turing, A. M. (1936). On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 42, 230–265. Corrections: Ibid. 43 (1937), 544–546.
Venn, J. (1866). The Logic of Chance. London. An unaltered reprint of the third edition of 1888 was published by Chelsea, New York, 1962.
von Laue, M. (1955). Ist die klassische Physik wirklich deterministisch? Physikalische Blätter, 11, 269–270.
von Meyenn, K. (1996). Wolfgang Pauli. Wissenschaftlicher Briefwechsel, Band IV, Teil I: 1950–1952. Berlin: Springer-Verlag.



von Mises, R. (1919). Grundlagen der Wahrscheinlichkeitsrechnung. Mathematische Zeitschrift, 5, 52–99.
von Mises, R. (1928). Wahrscheinlichkeit, Statistik und Wahrheit. Wien: Springer.
von Mises, R. (1931). Wahrscheinlichkeitsrechnung und ihre Anwendung in der Statistik und theoretischen Physik. Leipzig: Deuticke.
von Mises, R. (1964). Mathematical Theory of Probability and Statistics. Edited and complemented by Hilda Geiringer. New York: Academic Press.
von Neumann, J. (1932a). Zur Operatorenmethode in der klassischen Mechanik. Annals of Mathematics, 33, 587–642, 789–791.
von Neumann, J. (1932b). Proof of the quasiergodic hypothesis. Proceedings of the National Academy of Sciences of the United States of America, 18, 70–82.
von Plato, J. (1994). Creating Modern Probability: Its Mathematics, Physics, and Philosophy in Historical Perspective. Cambridge: Cambridge University Press.
von Weizsäcker, C. F. (1973). Probability and quantum mechanics. British Journal for the Philosophy of Science, 24, 321–337.
von Weizsäcker, C. F. (1985). Aufbau der Physik. München: Hanser Verlag.
Waismann, F. (1930). Logische Analyse des Wahrscheinlichkeitsbegriffs. Erkenntnis, 1, 228–248.
Watanabe, S. (1961). A model of mind-body relation in terms of modular logic. Synthese, 13, 261–302.
Watanabe, S. (1967). Karhunen–Loève expansion and factor analysis. Theoretical remarks and applications. Transactions of the Fourth Prague Conference on Information Theory, Statistical Decision Functions, Random Processes (Prague, 1965). Prague: Academia. pp. 635–660.
Watanabe, S. (1969a). Knowing and Guessing. A Quantitative Study of Inference and Information. New York: Wiley.
Watanabe, S. (1969b). Modified concepts of logic, probability, and information based on generalized continuous characteristic function. Information and Control, 15, 1–21.
Whittaker, E. T. (1943). Chance, freewill and necessity in the scientific conception of the universe. Proceedings of the Physical Society (London), 55, 459–471.
Wiener, N. (1930). Generalized harmonic analysis. Acta Mathematica, 55, 117–258.
Wiener, N. (1942). Response of a nonlinear device to noise. Cambridge, Massachusetts: M.I.T. Radiation Laboratory. Report No. V-186. April 6, 1942.
Wiener, N. (1949). Extrapolation, Interpolation, and Smoothing of Stationary Time Series. With Engineering Applications. New York: MIT Technology Press and Wiley.
Wiener, N. (1958). A further note on differentiability of auto-correlation functions. Proceedings of the Institute of Radio Engineers, 46, 1760.
Wiener, N. and Akutowicz, E. J. (1957). The definition and ergodic properties of the stochastic adjoint of a unitary transformation. Rendiconti del Circolo Matematico di Palermo, 6, 205–217, 349.
Wold, H. (1938). A Study in the Analysis of Stationary Time Series. Stockholm: Almquist and Wiksell.
Zvonkin, A. K. and Levin, L. A. (1970). The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms. Russian Mathematical Surveys, 25, 83–124.
