On the evolution of the concept of probability as a mirror of the evolution of reason
Summary
This paper argues that probability theory is a historically evolving form of rationality, tracing its development from combinatorial games to Bayesian inference and contrasting it with fuzzy logic and deep learning.
# On the evolution of the concept of probability as a mirror of the evolution of reason
Source: [https://arxiv.org/html/2606.00102](https://arxiv.org/html/2606.00102)
Courtillot Vincent (Académie des Sciences, Institut de France, Paris, France); Gibert Dominique (DeepField Sensing, France); Vladimir Kossobokov (Institute of Earthquake Prediction Theory and Mathematical Geophysics, Russian Academy of Sciences, Moscow, Russia, and Accademia Nazionale delle Scienze detta dei XL, Roma, Italia); Boulé Jean-Baptiste (Muséum National d’Histoire Naturelle, CNRS UMR7196, INSERM U1154, Paris, France); Zuddas Pierpaolo (Sorbonne Université, CNRS, METIS, UMR7619, Paris, France); Lopes Fernando (Muséum National d’Histoire Naturelle, CNRS UMR7196, INSERM U1154, Paris, France); Marccagi Païkan (Muséum National d’Histoire Naturelle, CNRS UMR7196, INSERM U1154, Paris, France); Maineult Alexis (Laboratoire de Géologie de l’ENS, UMR 8538, Paris, France)
###### Abstract
Over the centuries, probability theory has evolved from a modest calculus of games of chance into a central framework for scientific reasoning under uncertainty. This article argues that probability should not be understood merely as a mathematical tool, but as a historically evolving form of rationality, whose successive transformations reflect deep shifts in the structure of scientific thought itself. From the combinatorial symmetry of Pascal and Fermat to the inductive logic of Bayes and Laplace, from Poisson’s temporalization of events to Kolmogorov’s axiomatic formalization, probability has progressively incorporated uncertainty, time, and coherence into rational judgment. This historical trajectory culminates in modern Bayesian interpretations, exemplified by Tarantola’s conception of probability as a logic of information, in which prior knowledge and observational data are combined through rational inference. While this framework represents a high point in the epistemological maturation of probability, it also reveals its internal limits. Probability theory presupposes well-defined propositions and measurable events; it quantifies uncertainty about facts, but it remains unable to formalize the imprecision of the concepts through which facts are described. The article therefore examines the extension of rationality beyond probability. Fuzzy logic is introduced as a formal response to the problem of vagueness, providing a rigorous language for graded meaning and qualitative judgment. In contrast, the recent rise of deep learning and neural networks is analyzed as a powerful but epistemologically distinct approach. By relying on geometric interpolation and optimization rather than on explicit logical structures, deep learning achieves remarkable predictive performance while bypassing uncertainty representation, conceptual qualification, and causal explanation.
By situating probability, fuzzy logic, and deep learning within a unified historical and epistemological perspective, this article clarifies their respective roles and limitations. It argues that contemporary scientific rationality cannot be reduced to data-driven performance alone, but requires the explicit articulation of uncertainty, vagueness, and inference. In this sense, the evolution of probability offers a mirror for the evolution of reason itself, illuminating both the achievements and the unresolved challenges of thinking under uncertainty.
Keywords: Probability theory; Rationality; Bayes; Laplace; Poisson; Kolmogorov; Tarantola; Zadeh; Epistemology of uncertainty; History of science; Fuzzy logic; Deep learning
In Memoriam: Jean-Louis Le Mouël passed away before the finalization of this manuscript. We dedicate this work to his memory.

## 1 Introduction
Over the centuries, the manner in which probability has been conceived, formalized, and employed has undergone a profound transformation, closely intertwined with the evolution of rational thought itself. For a long time, chance was attributed to fortune, providence, or hidden causes beyond human understanding. In pre-modern societies, randomness was often perceived as a manifestation of divine will or cosmic disorder rather than as an object susceptible to rational analysis (e.g. [[4](https://arxiv.org/html/2606.00102#bib.bib9), [7](https://arxiv.org/html/2606.00102#bib.bib15), [23](https://arxiv.org/html/2606.00102#bib.bib44)]). The gradual emergence of probability theory from the seventeenth century onward marks a decisive intellectual rupture: uncertainty ceased to be merely endured or interpreted symbolically and became instead something to be calculated, reasoned about, and eventually integrated into the very structure of scientific explanation.
The mathematical theory of probability emerged relatively late in the history of ideas, crystallizing in the context of early modern science. Its first systematic formulations arose not from physics or astronomy, but from problems posed by games of chance. The celebrated correspondence between Pascal and Fermat in 1654 concerning the “problem of points” constitutes a founding moment in this history (cf. [[16](https://arxiv.org/html/2606.00102#bib.bib33)], pp. 407-446, for the French epistolary exchange). What was at stake was not merely the fair division of stakes, but the possibility of subjecting chance itself to rational calculation. From this initial gesture, probability progressively expanded its scope, evolving from a combinatorial arithmetic of equipossible cases into a general framework for reasoning under uncertainty.
Throughout its historical development, probability theory has repeatedly been reformulated, each reformulation reflecting a deeper transformation in the way reason confronts uncertainty. The Bayesian inversion introduced by Bayes and generalized by Laplace (cf. [[11](https://arxiv.org/html/2606.00102#bib.bib26), [12](https://arxiv.org/html/2606.00102#bib.bib27)]) endowed probability with an explicitly inductive and temporal dimension, allowing causes to be inferred from effects and beliefs to be updated in light of new evidence. With Poisson, probability was no longer confined to abstract reasoning but became anchored in empirical reality, giving rise to a genuine dynamics of events and inaugurating modern statistical thinking. The nineteenth and early twentieth centuries further consolidated this trajectory through frequentist interpretations and the discovery of collective regularities such as the law of large numbers and the central limit theorem. Finally, Kolmogorov’s axiomatization endowed probability with full mathematical rigor, completing its formal closure as a measure-theoretic discipline.
Yet probability did not remain a purely mathematical object. Throughout the twentieth century, its epistemological interpretation continued to evolve. Thinkers such as de ?), ?) and ?) emphasized that probability should be understood as an extension of logic itself, a calculus of rational belief under uncertainty. This perspective finds a particularly clear and operational expression in the work of Tarantola, who interprets probability as a logic of information, especially in the context of inverse problems (e.g. [[6](https://arxiv.org/html/2606.00102#bib.bib14)]). In this view, probability is not merely a tool for quantifying randomness or frequencies, but a coherent language for combining prior knowledge and observational data through rational inference. Probability thus appears not simply as a technical apparatus, but as a historically evolving form of rationality.
This article defends the thesis that the evolution of probability theory mirrors the evolution of reason itself. Probability is not reducible to a single interpretation, whether frequentist, subjective, or algorithmic. Rather, it constitutes a conceptual framework that has progressively incorporated symmetry, temporality, induction, empirical confrontation, and logical coherence into scientific reasoning. At the same time, this evolution reveals internal limits. Probability presupposes well-defined propositions and measurable events; it quantifies uncertainty about facts, but it remains silent about the imprecision of the concepts through which facts are described. As contemporary science increasingly confronts complex systems, ill-defined categories, and qualitative judgments, this limitation becomes unavoidable.
Recognizing these limits motivates the exploration of complementary frameworks. In particular, fuzzy logic, introduced by ?) and ?), addresses a distinct dimension of uncertainty: the vagueness inherent in meaning itself. More recently, the rise of artificial intelligence and deep learning has introduced yet another mode of dealing with uncertainty, one that relies on geometric interpolation and optimization rather than on explicit logical structures. While extraordinarily powerful in practice, these methods raise new epistemological questions concerning explanation, causality, and understanding.
The aim of this paper is therefore not merely historical, but philosophical. By tracing the successive transformations of probability, and by situating contemporary computational approaches within this lineage, we seek to clarify what is gained, and what is lost, when reason confronts uncertainty in different ways. Probability, fuzzy logic, and deep learning are not competing tools addressing the same problem; they embody distinct conceptions of rationality, each with its own strengths and limitations.
The structure of the paper reflects this historical and conceptual trajectory. In Section [2](https://arxiv.org/html/2606.00102#S2), we examine the birth of probabilistic reasoning in the work of Pascal and Fermat, emphasizing the role of symmetry and combinatorics in the domestication of chance. Section [3](https://arxiv.org/html/2606.00102#S3) is devoted to Bayes and Laplace, who introduced induction and an explicit arrow of time into probabilistic reasoning, transforming probability into a tool for rational learning. Section [4](https://arxiv.org/html/2606.00102#S4) focuses on Poisson and the emergence of probabilistic dynamics, where probability becomes anchored in empirical observation and temporal processes. Section [5](https://arxiv.org/html/2606.00102#S5) discusses the frequentist turn and the discovery of statistical regularities emerging from disorder, culminating in the law of large numbers and the central limit theorem. Section [6](https://arxiv.org/html/2606.00102#S6) addresses the axiomatic closure of probability with Kolmogorov and the resulting epistemic silence of the formal theory. In Section [7](https://arxiv.org/html/2606.00102#S7), we present Tarantola’s interpretation of probability as a logic of information, which represents the most mature epistemological formulation of probabilistic reasoning. Section [8](https://arxiv.org/html/2606.00102#S8) identifies the internal limits of probabilistic expressivity, showing why uncertainty alone is not sufficient to capture all forms of scientific judgment. Section [9](https://arxiv.org/html/2606.00102#S9) introduces fuzzy logic as a formal response to the problem of vagueness and graded meaning. Section [10](https://arxiv.org/html/2606.00102#S10) critically examines deep learning and neural networks, arguing that they constitute a powerful geometric approach that nonetheless bypasses explicit logical reasoning.
The conclusion synthesizes these developments and reflects on the future of rationality in the age of uncertainty.
## 2 The domestication of chance: symmetry and combinatorics. Pascal, Fermat, and the birth of probabilistic reason
The mathematical theory of probability officially emerged in the mid-seventeenth century, when Pascal and Fermat exchanged letters in 1654 (e.g. [[16](https://arxiv.org/html/2606.00102#bib.bib33)]) concerning a problem of chance that would become famous as the “problem of points”. The question was to determine the fair division of stakes in a game interrupted before its conclusion, given the players’ respective scores at the moment of interruption. Through this exercise, Pascal and Fermat laid the foundations of probabilistic reasoning by introducing a systematic method to quantify each player’s advantage. Their solution rested on the principle of equipossibility, each future outcome of the game being assumed “equally likely” for the players, and on the concept of mathematical expectation, the mean value of possible gains. This concept of expectation, i.e. the sum of possible gains weighted by their respective probabilities, allowed them to define a just division of the stakes. In the simplest version of the problem of points, a game played to three wins and interrupted at a score of 2-1, Pascal explained that after one further round two scenarios remained possible, 2-2 or 3-1, each with equal likelihood. The first player was thus guaranteed to recover at least his own stake, $m$, and had a one-half chance of winning the opponent’s additional stake $m$; hence, he should receive a total of $\dfrac{3m}{2}$, while his opponent would receive $\dfrac{m}{2}$. This reasoning marks the first explicit application of combinatorial calculation to randomness, made possible by tools such as Pascal’s arithmetic triangle, now known as Pascal’s triangle, which provides a systematic way to enumerate favorable and unfavorable cases (cf. [[22](https://arxiv.org/html/2606.00102#bib.bib41), [21](https://arxiv.org/html/2606.00102#bib.bib40)]).
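The division of stakes described above can be checked by enumeration. The following is a minimal sketch, assuming fair 50/50 rounds and an imagined continuation of the game until it must be decided; the function name `fair_shares` and its parameters are illustrative and do not come from the correspondence.

```python
from fractions import Fraction
from math import comb

def fair_shares(wins_needed_a, wins_needed_b, pot):
    """Split `pot` in proportion to each player's chance of winning the match,
    assuming every remaining round is a fair 50/50 toss (hypothetical helper)."""
    # At most this many further rounds are needed to settle the match.
    rounds = wins_needed_a + wins_needed_b - 1
    # Player A wins the match iff A takes at least wins_needed_a of those rounds.
    favorable = sum(comb(rounds, k) for k in range(wins_needed_a, rounds + 1))
    p_a = Fraction(favorable, 2 ** rounds)
    return p_a * pot, (1 - p_a) * pot

# Game to three wins interrupted at 2-1: A needs 1 more win, B needs 2.
m = Fraction(1)                      # each player's stake, in arbitrary units
share_a, share_b = fair_shares(1, 2, 2 * m)
print(share_a, share_b)              # 3/2 1/2
```

With a unit stake, the leading player receives 3/2 and the opponent 1/2, in agreement with the shares $3m/2$ and $m/2$ quoted in the text.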
This birth of probability marks a major intellectual turning point. For the first time, uncertainty was domesticated through calculation. Whereas earlier civilizations saw in chance the whim of the gods or the workings of fate, Pascal and Fermat proposed to apply reason and logic to games of chance. They formalized, in particular, the rule according to which the probability of an event can be defined as the ratio between the number of favorable cases and the total number of equiprobable cases,
$$P(E)=\dfrac{\text{favorable cases}}{\text{possible cases}}. \tag{1}$$
This definition, later known as the classical definition of probability, rests on an assumed symmetry, i.e. equiprobability, among the conceivable outcomes, reflecting the idea that, in the absence of contrary information, reason postulates an indifference among possible results. Thus, for a fair die, each face has a $1/6$ chance of appearing, by a purely symmetry-based argument.
It should be noted that this formalization of chance did not occur overnight. The Dutch mathematician Huygens took up the torch, publishing the first treatise on probability, which helped to disseminate these emerging ideas. Yet it is Pascal and Fermat who are credited with initiating the “mathematization of chance”, by laying the first foundations of a mathematical theory of the probable. In doing so, they opened the way to a new conception of rationality, i.e. the integration of calculation into decision-making under uncertainty. This was a natural extension of the emerging scientific spirit of Galileo, Descartes, and others, which was already seeking rational laws behind natural phenomena; from this point onward, even random phenomena could become the object of rigorous laws and reasoning.
Moreover, Pascal’s approach already contained, in embryonic form, more advanced notions that deserve to be emphasized. In his correspondence, he implicitly employed the idea of conditional expectation to estimate the value of a future stake based on the partial information available, e.g. the fact that one player leads 2-1 in an unfinished game. This amounts to calculating the expected gain given the current state of the game, an anticipation of what would later become the notion of conditional probability. This conditional aspect foreshadows one of the fundamental principles of probabilistic reasoning, the updating of an event’s probability when partial information is available. In short, from its very beginnings, the theory of probability appeared as a mirror of reason in action, combining symmetry, i.e. equal treatment of possible cases, with conditioning by acquired information.
Figure 1: Pascal’s Triangle. This combinatorial diagram, which organizes the binomial coefficients, reflects one of the founding moments in the history of probability: the transition, initiated in the seventeenth century by Pascal and Fermat, from an arithmetic of games of chance to a genuine mathematical theory of the possible. By systematically enumerating the elementary configurations of a binary event, the triangle makes visible the discrete architecture of randomness and foreshadows the formalization of the calculus of chances, from binomial laws to modern continuous models. It thus plays a key role in the epistemological evolution leading from classical combinatorics to the frequentist interpretation, and ultimately to contemporary Bayesian reinterpretations of random phenomena.

Figure [1](https://arxiv.org/html/2606.00102#S2.F1) provides a visual representation of Pascal’s triangle, as originally employed by Pascal to address early problems in probability. Each cell in the figure corresponds to a binomial coefficient $\binom{n}{k}$, that is, the number of distinct ways to obtain exactly $k$ occurrences of a given outcome, here heads, which we designate as the event of interest, over $n$ successive tosses of a fair coin. The triangular structure directly displays the combinatorial progression: the first row contains a single ‘1’, the second two ‘1’s, the third ‘1-2-1’, and so forth. The color scale, ranging from pale green to lighter yellow in the central region, highlights the rapid increase in the number of configurations when $k$ lies near $n/2$, corresponding to the largest number of distinct sequences yielding an intermediate number of heads. Conversely, the edges of the triangle, in a deeper green, remind us that there is only one possible sequence that produces either zero heads or $n$ heads: always obtaining the same face.
This simple geometric construction encapsulates the core intuition of the early calculus of chances as it emerged in the seventeenth century. Suppose a fair coin is tossed $n$ times. The $2^{n}$ possible sequences of heads and tails are all assumed to be equally likely, in accordance with Pascal’s principle of indifference: in the absence of further information, reason must treat all possible outcomes as symmetrical. The $n^{\text{th}}$ row of Pascal’s triangle (counting the apex as row 0) shows how these $2^{n}$ sequences are distributed according to the number of heads observed. For $n=6$, illustrated in Figure [1](https://arxiv.org/html/2606.00102#S2.F1), one reads the sequence ‘1-6-15-20-15-6-1’. This means that there are 20 sequences containing exactly three heads, 15 containing two or four heads, only 6 containing one or five heads, and a single sequence containing zero or six heads. Dividing each of these counts by $2^{6}=64$ immediately yields the binomial distribution with parameter $p=1/2$, which describes the exact distribution of the number of heads obtained in six independent trials. This figure highlights two essential ideas for understanding how Pascal, followed by Fermat, transformed chance into an object of rational analysis. First, it shows that the apparent ‘disorder’ of successive coin tosses conceals an underlying combinatorial structure that is perfectly regular. Second, it reveals that this structure is symmetric: the number of sequences yielding $k$ heads is equal to the number yielding $n-k$ heads. This symmetry plays a fundamental role in the emergence of the modern notion of equiprobability. In the early games of chance studied by Pascal, nothing allowed one to favor one sequence over another; reason therefore compels us to treat all cases alike, to enumerate them, and to compute probabilities by comparing each count to the total.
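The counts read off the triangle, and the binomial distribution they induce, can be reproduced in a few lines. This is an illustrative sketch only; the helper name `pascal_row` is hypothetical, and rows are indexed from 0 so that row 6 corresponds to six tosses.

```python
from fractions import Fraction

def pascal_row(n):
    """Return row n of Pascal's triangle (row 0 is [1])."""
    row = [1]
    for _ in range(n):
        # Each inner entry is the sum of the two entries above it.
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row

row = pascal_row(6)
# Classical definition: favorable sequences divided by the 2**6 possible ones.
probs = [Fraction(count, 2 ** 6) for count in row]

print(row)         # [1, 6, 15, 20, 15, 6, 1]
print(sum(probs))  # 1
print(probs[3])    # 5/16
```

The probability of exactly three heads in six tosses is therefore 20/64 = 5/16, and the row reads the same in both directions, which is the symmetry between $k$ and $n-k$ heads noted above.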
Thus, Pascal’s triangle is not merely a mnemonic device for arranging binomial coefficients; it stands as one of the earliest visual matrices of probabilistic thought. It reveals how a random situation can be dissected into a finite set of possibilities, all equally plausible, whose examination leads to a quantitative measure of uncertainty. In this sense, Figure [1](https://arxiv.org/html/2606.00102#S2.F1) materializes the historical transition traced in the present article, the moment when seventeenth-century French rationality, heir to Descartes and contemporary with Pascal, begins to calculate chance, reducing uncertainty to a geometry of possibilities. Beneath the arithmetic of the triangle lies the outline of modern probability, understood as the ratio between the combinatorial structure of the world and the impartiality of reason toward all possible cases.
## 3 Induction and the arrow of time. Bayes and Laplace: probability as rational learning
After the combinatorial foundations of the seventeenth century, the eighteenth century witnessed a major conceptual expansion: probability became a tool for inductive inference. Two emblematic figures illustrate this intellectual revolution. First, Bayes proposed to infer the probability of a cause from observed effects, the so-called problem of inverse probability. Second, Laplace independently rediscovered Bayes’s result and made it the cornerstone of a general theory of scientific induction.
Bayes’s theorem states that for two events $A$ and $B$, with $P(B)>0$, one has,
$$P(A|B)=\dfrac{P(B|A)\,P(A)}{P(B)}. \tag{2}$$
In other words, the probability of $A$ given $B$ is obtained by multiplying the prior probability of $A$ by the likelihood of observing $B$ under the assumption $A$, and by normalizing this product by the marginal probability of $B$. Bayes presented this theorem in the context of an urn problem, where one seeks to estimate an unknown probability from observed trials; in modern terms, he was computing a *posterior* probability of a parameter by combining *prior* information with experimental data. Although Bayes himself could not fully develop all the implications of his formula, his 1763 paper, published posthumously by his friend Price, laid the foundations of Bayesian inference. From that moment onward, it became possible to “reason backward” from effects to causes in a rational way, by quantifying how new observations update our degrees of belief. In this sense, Bayes introduced a genuine arrow of time into probabilistic reasoning: one starts from an initial state of knowledge, i.e. the *prior* probability, and refines it progressively as new data arrive, yielding the *posterior* probability. This temporal orientation, from the past, or *prior* hypotheses, toward the future, or updated conclusions, constitutes a major conceptual innovation. It turns probability into a dynamic tool, no longer a mere static symmetry of idealized cases.
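To make the update from *prior* to *posterior* concrete, here is a small sketch of Bayes’s theorem over a finite set of hypotheses. The two urns, their compositions, and the function name `posterior` are invented for illustration and are not taken from Bayes’s paper.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Apply Bayes's theorem over a finite set of hypotheses.

    prior[h] is P(h); likelihood[h] is P(data | h).
    """
    joint = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(joint.values())  # marginal probability P(data)
    return {h: joint[h] / evidence for h in joint}

# Hypothetical urns: A holds 3 white balls out of 4, B holds 1 white out of 4.
prior = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
p_white = {"A": Fraction(3, 4), "B": Fraction(1, 4)}

belief = prior
for _ in range(2):  # two white balls drawn, with replacement
    belief = posterior(belief, p_white)  # yesterday's posterior is today's prior

print(belief["A"])  # 9/10
```

Each pass through the loop uses the previous posterior as the new prior, which is the sequential, time-oriented reading of the formula emphasized in the text.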
Laplace, often regarded as the true father of probabilistic statistics, went further by generalizing and systematically applying Bayes’s method. In his *Théorie analytique des probabilités* (cf. [[11](https://arxiv.org/html/2606.00102#bib.bib26)]) and *Essai philosophique sur les Probabilités* (cf. [[12](https://arxiv.org/html/2606.00102#bib.bib27)]), Laplace developed a unified vision of probability, at once a combinatorial calculus, inherited from Pascal, and a universal inductive method. He applied probabilistic reasoning to a wide range of fields: astronomy (measurement errors, determination of orbits, etc.), demography, insurance, and even jurisprudence (assessment of the reliability of testimonies), among others. It was Laplace who popularized Bayes’s famous formula in its modern form and interpreted it in terms of probable causes and effects. In a sense, he transformed Bayes’s formula into a principle of reasoning: “to infer causes from observed effects”.
Laplace is also known for his principle of indifference, also called the principle of insufficient reason, which generalizes the idea of equiprobability: in the absence of any information, all possible outcomes should be assumed equally likely. For example, we can read in Laplace (second book, chapter 1, p. 178, §2): “The theory of probability consists in reducing all events that may occur under given circumstances to a certain number of equally possible cases, that is, cases about whose occurrence we are equally undecided, and in determining, among these cases, the number of those that are favorable to the event whose probability is sought”. This principle allowed him to assign *a priori* objective probabilities even to unique or non-repeatable events. For instance, Laplace calculated the probability that a scientific discovery was due to chance, or that an astronomical phenomenon would recur. A famous example is Laplace’s rule of succession: if an event has occurred $n$ times in succession without exception, e.g. the sunrise observed each morning, then the probability that it will occur again next time is estimated as,
$$P_{\text{next}}=\dfrac{n+1}{n+2} \tag{3}$$
Using this formula, Laplace obtained a probability extremely close to 1 ($\approx 0.99999945$) that the Sun would rise again the following day, given that it had always risen on all known days of human history. This example, seemingly paradoxical at first glance, illustrates how Laplace’s probabilistic reasoning treats an uncertain future on the basis of observed past events; although absolute certainty remains unattainable, the computed probability provides a rational guide, almost a practical certainty in this case, for anticipating future outcomes. One clearly sees here the role of the arrow of time: probabilistic reasoning combines the past, i.e. the $n$ occurrences already observed, with a symmetry principle, i.e. no *prior* bias, hence the $\dfrac{1}{n+2}$ as an initial weight of ignorance, to project a quantitative measure of confidence into the future, i.e. the $(n+1)$-th occurrence.
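The order of magnitude quoted for the sunrise can be recovered numerically. The sketch below assumes a day count of $n = 1\,826\,213$, roughly five thousand years, which is the figure usually attributed to Laplace but is not stated in the text above; the function name is likewise illustrative. The general form $(s+1)/(N+2)$ is the posterior mean under a uniform prior after $s$ successes in $N$ trials, of which equation (3) is the case $s=N=n$.

```python
from fractions import Fraction

def rule_of_succession(successes, trials):
    """Posterior mean of the success probability under a uniform prior."""
    return Fraction(successes + 1, trials + 2)

# Assumed day count (about 5000 years); not given in the text itself.
n = 1_826_213
p_next = rule_of_succession(n, n)   # equation (3): (n + 1) / (n + 2)

print(float(p_next))       # very close to 1
print(1 - p_next)          # residual doubt, exactly 1 / (n + 2)
```

With no observations at all, the same rule returns 1/2, which is the state of indifference from which the reasoning starts.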
Laplace famously encapsulated this philosophy in a striking sentence: “From this Essay, one sees that the theory of probabilities is, at bottom, nothing more than common sense reduced to calculation.” (cf. [[12](https://arxiv.org/html/2606.00102#bib.bib27)], p. 190, §2). For him, the calculus of probabilities provided a quantitative formalization of what the prudent human mind already does intuitively, weighting its beliefs according to the evidence. This remark shows that, as early as the beginning of the nineteenth century, probability was perceived as a mirror of sound reasoning under uncertainty, that is, an extension of common sense capable of achieving a level of precision and coherence that intuition alone could not guarantee. Laplace further added, of the theory of probability: “It enables us to assess with accuracy what reasonable minds grasp by a kind of instinct, without often being able to account for it.” (cf. [[12](https://arxiv.org/html/2606.00102#bib.bib27)], p. 190, §2). In other words, it renders explicit and refines our intuitive judgments of reason.
It is worth noting, somewhat ironically, that Laplace remained a convinced determinist at the philosophical level. His famous “Laplace’s demon”, an imaginary intellect that could predict the future if it knew all positions and velocities at a given instant, stands as the emblem of this view: “An intellect which, at a given instant, knew all the forces that animate nature and the respective positions of the beings that compose it, and which were moreover vast enough to submit these data to analysis, would embrace in a single formula the movements of the greatest bodies in the universe and those of the lightest atom: nothing would be uncertain for it, and the future, as well as the past, would be present to its eyes” (cf. [[12](https://arxiv.org/html/2606.00102#bib.bib27)], pp. 4-5). For Laplace, chance was merely an appearance born of our ignorance. Yet it is precisely to compensate for this unavoidable ignorance that probability becomes indispensable; in the absence of perfect knowledge of the world, we employ probabilistic calculation as a substitute for complete understanding. This tension between determinism and probability is characteristic of the era: classical reason held that uncertainty could, in principle, be eliminated through full information, yet in practice it had to acknowledge the necessity of the probable in order to act without such complete information. The contribution of Bayes and Laplace was to provide the intellectual tools to rationalize these wagers on the unknown.
Figure 2: Bayesian inference process applied to estimating the bias of a coin whose true probability of ‘heads’ is $\theta=0.7$. The uniform prior (absence of observation) represents a state of complete ignorance. After 10 tosses, the likelihood begins to shape the belief, though uncertainty remains significant. With 100 and then 200 observations, the posterior distribution becomes increasingly narrow and centered around the correct value. This sequence illustrates the essential dynamic of the Bayesian approach: a cumulative, coherent, and quantified learning process in which knowledge is not imposed by the data but continuously revised by them. It also highlights the way in which the notion of probability evolves from a purely combinatorial framework (cf. Figure [1](https://arxiv.org/html/2606.00102#S2.F1)) toward an inferential framework based on the rational updating of beliefs.

Figure [2](https://arxiv.org/html/2606.00102#S3.F2) illustrates, using a simple coin-toss example, how a Bayesian line of reasoning allows one to move from an initial state of ignorance to an increasingly precise estimate of an unknown parameter, in this case, the bias of the coin. We assume that a coin is tossed repeatedly, with heads coded as the outcome of interest (value 1) and tails as 0. The parameter $\theta$ denotes the unknown probability of obtaining heads on each toss. To fix ideas, the data shown in the figure were generated using a ‘true’ value $\theta_{\text{true}}=0.7$, indicated by the red vertical dashed line. The observer, however, does not know this value: it must be inferred from the successive outcomes of the tosses.
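The narrowing of the posterior described for Figure 2 can be reproduced with a conjugate update. The sketch below is not the authors’ code: it assumes a Beta prior (the uniform prior is Beta(1, 1)), for which the posterior after observing some heads and tails is again a Beta distribution, and it uses an arbitrary random seed, so the simulated tosses differ from those in the figure.

```python
import random
from math import sqrt

def beta_posterior(heads, tosses, a0=1.0, b0=1.0):
    """Return (mean, std) of the Beta posterior for the heads probability.

    A Beta(a0, b0) prior is assumed; a0 = b0 = 1 is the uniform prior.
    """
    a = a0 + heads
    b = b0 + (tosses - heads)
    mean = a / (a + b)
    std = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, std

rng = random.Random(0)   # arbitrary seed, for reproducibility only
theta_true = 0.7         # 'true' bias used to simulate the data
tosses = [rng.random() < theta_true for _ in range(200)]

for n in (0, 10, 100, 200):
    mean, std = beta_posterior(sum(tosses[:n]), n)
    print(f"n={n:3d}  posterior mean={mean:.3f}  std={std:.3f}")
```

The posterior standard deviation shrinks as observations accumulate, which is the quantitative counterpart of the increasingly narrow curves in the figure.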
In summary, with Bayes and Laplace, probability acquired an inductive and temporal dimension. No longer was it merely a matter of enumerating *a priori* symmetric configurations; one now learns from *a posteriori* data. The concept of conditional probability and Bayes’s formula introduce a directionality into reasoning: we move from past correlations, i.e. observed data and initial hypotheses, toward a future updating of our beliefs, i.e. the *posterior*. The language of probability calculus thus becomes the language of experimental reason; with each new observation, reason revises its conclusions coherently. This idea of a continuous adaptation of beliefs under the influence of new information represents a genuine epistemological arrow of time, the hallmark of the Enlightenment’s recognition, in the eighteenth century, that uncertainty is not a failure of knowledge, but an intrinsic component of it.
## 4 From expectation to events. Poisson and the emergence of probabilistic dynamics
Within the great lineage of thinkers who shaped the theory of probability, Poisson occupies a singular place, at once the heir of Laplace and a precursor to modern developments in statistics and stochastic processes. Poisson was both a student and an admirer of Laplace, whom he readily described as his master, and he extended Laplace’s work by seeking to anchor probability in measurable reality, beyond its purely combinatorial foundations. His name remains attached to a fundamental law, the Poisson distribution, introduced in his 1837 treatise “Recherches sur la probabilité des jugements en matière criminelle et en matière civile”, yet his influence extends far beyond that formula. In a sense, Poisson transformed probability into a dynamics of chance; he gave substance to the idea that random events are not merely to be counted, but can be conceived as a continuous temporal process endowed with its own statistical regularities.
In his treatise,?\) set out to solve a concrete problem; how to evaluate, through calculation, the reliability of verdicts delivered by criminal juries\. Following in the footsteps of?\) and?\), he sought to apply probabilistic reasoning to human decisions, while for the first time incorporating empirical data drawn from official French criminal statistics \(1825\-1835\)\. The undertaking was audacious,?\) aimed to quantify the probability of judicial error as a function of the average competence of jurors and the observed rate of convictions, thereby producing a kind of moral and rational assessment of French justice\. He called this endeavor “measuring the moral state of the country”\. This intellectual gesture marks a turning point; probability ceased to be merely an art of hypothetical reasoning and became an instrument of social observation, grounded in data\. Through this attempt to synthesize theoretical calculation and empirical measurement, Poisson inaugurated what could be described as mathematical statistics\.
But posterity would remember above all from this treatise the unveiling of a new law of chance, the Poisson law, which formalizes the probability that a rare event occurs a certain number of times within a given interval. Poisson (1837) began with the binomial law of Bernoulli (1713) and Laplace (1812) and derived from it a remarkable limit: when the number of trials $n$ becomes very large and the probability of success $p$ becomes very small, so that $np = \lambda$ remains constant, the distribution of the number of occurrences $k$ tends toward

$$P(k) = \dfrac{e^{-\lambda}\lambda^{k}}{k!}. \tag{4}$$
This law, sometimes called the “law of small numbers”, elegantly captures the regularity of rare phenomena, whether the number of deaths from a specific cause, the number of accidents occurring in a single day, or, later, the arrival of telephone calls at a switchboard\. Beneath this simple formula lies a profound intuition: even chance itself, when observed across large ensembles, obeys stable regularities\. The apparent disorder of individual events resolves, through aggregation, into a lasting statistical structure\. This idea, that the multiplicity of contingencies gives rise to order, stands as one of the guiding threads of modern probabilistic thought and one of Poisson’s most fertile intellectual legacies\.
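The limit behind the Poisson law can be checked numerically. The sketch below compares the binomial probabilities with $p = \lambda/n$ to the Poisson probabilities for increasing $n$; the function names and the cut-off `k_max` are hypothetical choices for illustration, and $\lambda = 2.5$ simply reuses the value shown in Figure 3.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson probability P(k) = exp(-lam) * lam**k / k!."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def max_abs_gap(n, lam, k_max=15):
    """Largest pointwise gap between Binomial(n, lam/n) and Poisson(lam)
    over k = 0..k_max (requires k_max <= n)."""
    p = lam / n
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(k_max + 1))


if __name__ == "__main__":
    for n in (20, 100, 1000, 10000):
        print(f"n={n:6d}  max gap = {max_abs_gap(n, 2.5):.2e}")
```

The gap shrinks as $n$ grows with $np$ held fixed, which is the content of the limit stated above.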
Figure 3: Empirical manifestation of the Poisson law, the “law of small numbers” introduced by Poisson (1837). The left panel compares the empirical distribution of the number of occurrences observed within a unit interval $\Delta t = 1$ to the theoretical Poisson probability mass function $P(k) = \dfrac{e^{-\lambda}\lambda^{k}}{k!}$ with $\lambda = 2.5$. Although each event is rare and independent, their aggregated counts exhibit a remarkable stability, a property that Poisson derived as a limit of the binomial law of Bernoulli (1713) and Laplace (1812) when the number of trials becomes large and the success probability small. The right panel shows a sample path of the corresponding Poisson counting process: a staircase-like trajectory whose jumps occur at unpredictable times yet accumulate at an average rate that remains strikingly regular. This dual perspective, distributional and temporal, embodies Poisson’s central insight: that even phenomena governed by chance alone can form enduring statistical structures when considered over large ensembles. The law of small numbers thus illustrates the emergence of order from contingency, one of the foundational intuitions of modern probabilistic thought.

Figure [3](https://arxiv.org/html/2606.00102#S4.F3) offers a concrete and intuitive illustration of the statistical structure captured by Poisson’s “law of small numbers”. The left panel shows the empirical distribution of the number of events occurring in a fixed interval of length $\Delta t = 1$, obtained from repeated simulations of a rare-event mechanism with average rate $\lambda = 2.5$. Although each trial is governed by chance and nothing guarantees that the same number of events should appear from one realization to the next, the histogram rapidly stabilizes around the theoretical Poisson distribution $P(k) = \dfrac{e^{-\lambda}\lambda^{k}}{k!}$ represented by the black markers.
This agreement exemplifies the remarkable regularity that Poisson (1837) uncovered when he derived his law as the limiting form of the binomial distribution studied earlier by Bernoulli (1713) and Laplace (1812); when the number of trials becomes very large while the probability of success becomes very small, the distribution of the total number of occurrences settles into a universal and analytically tractable shape. The right panel complements this distributional perspective by showing a sample path of the associated Poisson counting process $N(t)$. The trajectory evolves in irregular jumps separated by random waiting times, capturing the unpredictability of each individual occurrence, yet the overall growth of the process reflects the stable average rate $\lambda$. The path thus reveals, in temporal form, the same epistemological message conveyed by the histogram; beneath the apparent disorder of isolated events lies a persistent statistical order, emerging from the aggregation of many independent contingencies. Whether we examine the counts over a fixed interval or the accumulation of events through time, the Poisson law demonstrates how the study of rare phenomena gave rise to one of the foundational insights of modern probabilistic reasoning: the idea that randomness, when viewed across large ensembles, obeys precise and reproducible laws.
Beyond the law itself, Poisson implicitly introduced a new notion, that of the Poisson process, a temporal flow of independent random events occurring at a constant average rate\. Without formulating it with modern rigor, Poisson was already conceiving of chance as a function of time, rather than as a mere distribution of states\. He assumed that events occur independently of one another, in disjoint intervals, and that their number over a given duration follows the law he had discovered\. This vision, remarkably modern, anticipated an entire branch of twentieth\-century stochastic theory, from the counting processes of?\) and?\) to the foundations of queueing theory and statistical physics\. By establishing a connection between probability, frequency, and time, Poisson ushered the theory of chance into a kinematics of events; the question was no longer only how many times an outcome occurs, but how and at what rate it unfolds over time\. In this lies, in embryonic form, the birth of the modern theory of stochastic processes\.
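The temporal picture described here can be made concrete with a short simulation. This is a sketch in modern terms, not Poisson's own construction: it assumes the standard characterization of a homogeneous Poisson process by independent, exponentially distributed waiting times, and the function names, rate, and horizon are hypothetical choices.

```python
import random
import statistics


def poisson_arrival_times(rate, horizon, rng):
    """Arrival times of a homogeneous Poisson process on [0, horizon).

    Successive waiting times are independent exponential draws with mean
    1/rate, so events in disjoint intervals are independent of one another.
    """
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)
        if t >= horizon:
            return times
        times.append(t)


def counts_per_unit_interval(times, horizon):
    """Number of events falling in each interval [i, i + 1), i < horizon."""
    counts = [0] * int(horizon)
    for t in times:
        counts[int(t)] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(1)
    times = poisson_arrival_times(rate=2.5, horizon=10_000, rng=rng)
    counts = counts_per_unit_interval(times, 10_000)
    # For a Poisson law, the mean and variance of the counts are both
    # close to the rate.
    print(statistics.fmean(counts), statistics.pvariance(counts))
```

Each jump time is unpredictable, yet the counts per unit interval have mean and variance both close to the rate, which is the regularity the section attributes to aggregation.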
Poisson’s work also exemplifies the nineteenth century’s characteristic tension between mathematical reason and social reality. In seeking to apply the calculus of probability to the administration of justice, he extended the French rationalist tradition, the lineage from Pascal to Laplace that sought to subject chance to reason, but at the same time revealed its practical limits. His contemporaries, in both the legal and scientific worlds, received his work with caution. Applying the calculus of the probable to moral or judicial decisions appeared to them both bold and misplaced; as with Condorcet before him, society was not yet ready to accept that mathematical reason could pass judgment on human affairs. Yet behind these resistances, Poisson raised an essential question: what is the true scope of probabilistic reasoning? Can the uncertainty of human judgment really be quantified? Is probability confined to the science of the external world, or can it be extended to the domains of decision, ethics, and society? These questions, which would reemerge a century later in the debates on applied statistics and artificial intelligence, show how deeply Poisson anticipated the epistemological challenges of modern modeling.
Poisson must finally be situated within the historical dialogue between ?\) and ?\). While Laplace had given the calculus of probability its philosophical and universal dimension, that of a general method of rational induction, Poisson ensured its descent into the concrete. Where Laplace established the principles of probabilistic reasoning and proposed ideal applications (e.g. celestial mechanics, the theory of errors, hypothetical jury votes), Poisson turned toward the actual measurement of phenomena, the use of available statistical data, and the construction of quantitative indicators of society. In this sense, he was a forerunner of the frequentist perspective; the goal was no longer merely to calculate a priori probabilities, but to estimate them from observed frequencies, following an empirical inductive logic. His approach, without breaking with that of Laplace, both complemented and extended it; he introduced into the theory a dimension of observation and confrontation with the real world. If Bayes and Laplace had made the calculus of probability an instrument of reasoning, Poisson made it an instrument of measurement as well.
## 5 Regularity from disorder. Frequencies, large numbers, and statistical determinism
Throughout the nineteenth and early twentieth centuries, the concept of probability continued to grow in scope and precision, in parallel with advances in science and in the philosophy of science. Two complementary trends characterized this period: on the one hand, the desire to relate probability to observed frequencies, i.e. the so-called frequentist or objective view; and on the other hand, the pursuit of axiomatic rigor aimed at establishing probability theory as an autonomous mathematical discipline.
Bernoulli (1713) had paved the way for frequentism with his law of large numbers. This theorem, published posthumously in Ars Conjectandi, established that the observed frequency of an event tends to approach its theoretical probability as the number of trials becomes very large. More precisely, Bernoulli showed that if an experiment is repeated a large number $N$ of times, the proportion of observed successes converges, in a probabilistic sense, toward $p$, the probability of success in each trial. In doing so, Bernoulli expressed for the first time a formal link between mathematical probability and its empirical representation in the real world. This result had a profound conceptual impact; it justified the estimation of probabilities through experimental statistics and reinforced the idea that the laws of chance can be verified through the accumulation of data, something far from evident a priori.
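The convergence of frequencies can be illustrated by simulation. The sketch below tracks the running proportion of successes in repeated trials; it also evaluates the Chebyshev bound $p(1-p)/(N\varepsilon^{2})$ on the probability that the frequency deviates from $p$ by at least $\varepsilon$. That bound is a later and cruder tool than Bernoulli's own argument, and is used here only as an assumption-light way to quantify convergence "in a probabilistic sense". Function names and parameter values are hypothetical.

```python
import random


def running_frequency(p, n_trials, seed=0):
    """Observed success frequency after each of n_trials Bernoulli(p) trials."""
    rng = random.Random(seed)
    successes = 0
    freqs = []
    for n in range(1, n_trials + 1):
        successes += rng.random() < p  # a bool counts as 0 or 1
        freqs.append(successes / n)
    return freqs


def chebyshev_bound(p, n_trials, eps):
    """Upper bound on P(|frequency - p| >= eps) after n_trials trials."""
    return min(1.0, p * (1 - p) / (n_trials * eps ** 2))


if __name__ == "__main__":
    freqs = running_frequency(p=0.3, n_trials=100_000)
    for n in (10, 100, 1_000, 10_000, 100_000):
        print(f"N={n:6d}  frequency={freqs[n - 1]:.4f}  "
              f"bound(eps=0.01)={chebyshev_bound(0.3, n, 0.01):.3f}")
```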
In the nineteenth century, this frequentist interpretation gradually took root. Philosophers and mathematicians such as ?\) and ?\) emphasized that “probability has meaning only as the long-run limit of frequency”. Probability was then defined as the value toward which the frequency of occurrence of an event tends when the number of trials approaches infinity. In this view, time, or at least the number of repetitions, plays an essential role; it is by letting $N$ grow that probability reveals itself.
Frequentist reasoning thus rests on an objectivist conception; probability is seen as a property of the real world, a stable frequency, that can be estimated with increasing accuracy through large series of observations. This perspective complements the preceding Bayesian one; whereas Bayes and Laplace regarded probability as the degree of belief of a rational observer, i.e. a rather subjective viewpoint, even though Laplace considered it universal, frequentism views it as a measurable property of repetitive phenomena. In practice, the two often converge (stable frequencies justify a priori intuitions, and vice versa), but philosophically, a shift occurs in probabilistic reasoning by the end of the nineteenth century, toward a more empirical outlook.
At the same time, major mathematical advances came to consolidate the theory. The development of error calculus and mathematical statistics, around figures such as ?\) and ?\), integrated probability into the analysis of scientific data. ?\) introduced the famous normal law, or Gaussian distribution, to describe the distribution of measurement errors. A few years earlier, ?\) and ?\) had discovered the central limit theorem, showing that the suitably normalized sum of many independent random effects tends toward a Gaussian distribution. This probabilistic explanation of the ubiquity of the “bell curve” in natural phenomena provided further evidence that randomness obeys regular laws when a large number of variables are considered. The idea that statistical order emerges from individual disorder reinforced confidence in probabilistic reasoning as an integral part of scientific thought: scientific reason came to acknowledge that, even without strict determinism, there exists a collective determinism, a regularity in the large numbers, that probability allows us to grasp.
Figure 4: Illustration of the Central Limit Theorem, as formulated by de Moivre (1718) and Laplace (1812). The panels display the empirical distribution of the normalized sum $S_{n}/\sqrt{n}$ of $n$ independent and strongly skewed random variables (green). For small $n$, the distribution is far from Gaussian, but as the number of components increases, from $n = 1$ to $n = 1000$, the shape stabilizes and converges toward the limiting Gaussian law $\mathcal{N}(0,1)$ (black curve). This visual demonstration reflects a foundational idea in the history of probabilistic reasoning; even when individual causes are irregular or asymmetric, their collective effect exhibits a universal and highly regular structure. In this sense, the “bell curve” emerges not from determinism, but from the statistical order produced by large numbers.

Figure [4](https://arxiv.org/html/2606.00102#S5.F4) offers a detailed and vivid illustration of the central idea formulated by de Moivre (1718) and later extended by Laplace (1812): that the combined effect of many independent random contributions tends toward a universal and highly regular form. In each panel, we observe the empirical distribution of the normalized sum $S_{n}/\sqrt{n}$ of $n$ independent random variables drawn from a markedly asymmetric exponential distribution. The choice of such a skewed base distribution is deliberate; it emphasizes that nothing in the microscopic behavior of the individual terms resembles the Gaussian curve, nor even hints at symmetry. For $n = 1$, the distribution is entirely dominated by this asymmetry, presenting the familiar long tail of the exponential. For $n = 2$ and $n = 5$, the asymmetry persists, though it becomes softened by the simple act of aggregation. As $n$ increases to 10, 100, and ultimately 1000, a striking transformation takes place.
The distribution becomes progressively smoother and more symmetric, its peak stabilizes, and its tails adjust in a way that draws the entire histogram closer to the superimposed Gaussian curve $\mathcal{N}(0,1)$. This visual convergence is not merely a numerical curiosity; it embodies a profound epistemological shift in the understanding of randomness. De Moivre and Laplace recognized that even when individual events are governed by irregular, unpredictable fluctuations, the aggregated behavior exhibits a remarkable regularity, one that is stable, universal, and mathematically quantifiable. Figure [4](https://arxiv.org/html/2606.00102#S5.F4) thus dramatizes one of the foundational insights of probabilistic thinking since the Enlightenment: order emerges from disorder when the number of contributing variables becomes large. This emergence is not deterministic in the classical sense, for the micro-level remains governed by chance; yet the collective produces a form of determinism at the level of ensembles, what Laplace himself would later describe as a “remarkable regularity” intrinsic to large numbers. The “bell curve”, apparently ubiquitous across biological, physical, and social phenomena, finds its explanation not in the structure of the underlying causes, but in the combinatorial laws that govern their accumulation. By showing a highly skewed distribution spontaneously organizing itself into a Gaussian profile through mere summation, Figure [4](https://arxiv.org/html/2606.00102#S5.F4) embodies this conceptual leap: the realization that probability does not merely quantify uncertainty but reveals a deep structural tendency of nature. It shows that randomness, when considered in isolation, is chaotic, but when viewed collectively, obeys regular laws that allow prediction, inference, and scientific explanation.
In this sense, the central limit theorem is not just a technical result; it is a cornerstone of the modern scientific worldview, reconciling the absence of strict determinism at the microscopic level with the emergence of robust statistical determinism at the macroscopic scale\.
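A small simulation makes the convergence measurable rather than only visual. The sketch below is not the code behind Figure 4; it assumes unit-rate exponential draws (mean $\mu = 1$, standard deviation $\sigma = 1$) and uses the fully standardized sum $(S_n - n\mu)/(\sigma\sqrt{n})$, since centering is needed for the limit to be $\mathcal{N}(0,1)$. Skewness is used as a simple asymmetry measure; for this setup its theoretical value is $2/\sqrt{n}$. All names and sample sizes are hypothetical.

```python
import random
import statistics


def standardized_sum(n, rng):
    """(S_n - n*mu) / (sigma * sqrt(n)) for n Exp(1) draws (mu = sigma = 1)."""
    s = sum(rng.expovariate(1.0) for _ in range(n))
    return (s - n) / n ** 0.5


def sample_skewness(xs):
    """Third standardized moment of the sample (0 for a symmetric law)."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - m) / sd) ** 3 for x in xs) / len(xs)


def skewness_by_n(ns=(1, 5, 100), reps=20_000, seed=0):
    """Empirical skewness of the standardized sum for each n in ns."""
    rng = random.Random(seed)
    return {n: sample_skewness([standardized_sum(n, rng) for _ in range(reps)])
            for n in ns}


if __name__ == "__main__":
    for n, skew in skewness_by_n().items():
        print(f"n={n:4d}  empirical skewness={skew:.3f}  "
              f"theory={2 / n ** 0.5:.3f}")
```

The asymmetry of the individual terms decays like $1/\sqrt{n}$, which is one quantitative sense in which the histogram approaches the symmetric Gaussian profile.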
Despite its many successes, by the end of the nineteenth century probability theory still suffered from an ambiguous epistemological status\. Many mathematicians regarded it as insufficiently rigorous, resting on poorly defined notions: what exactly is “equiprobability” or “randomness” outside the context of finite combinatorial cases? This is why the twentieth century marked the completion of a crucial stage: the axiomatic formalization of probability\.
## 6 Axiomatization and epistemic silence. Kolmogorov and the mathematical closure of probability
Kolmogorov published Foundations of the Theory of Probability, in which he proposed a rigorous axiomatic framework inspired by measure theory. Kolmogorov defined a probability $P$ as a measure, in the sense of Lebesgue measure theory, on a sample space $\Omega$ satisfying three simple axioms: 1) $P(A) \geq 0$ for every event $A$, 2) $P(\Omega) = 1$, and 3) for any countable family of pairwise disjoint events $A_{i}$, $P\left(\bigcup_{i} A_{i}\right) = \sum_{i} P(A_{i})$. Kolmogorov’s axioms gave probability the same degree of rigor as geometry or algebra. In particular, the entire theory follows logically from these postulates, with notions such as conditional probability and independence becoming derived definitions rather than primitive concepts.
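On a finite sample space the axioms and the derived notions can be written out directly. The sketch below uses two throws of a fair die as a hypothetical example; on a finite space countable additivity reduces to finite additivity, so this illustrates the structure without touching the measure-theoretic subtleties. Exact fractions are used so that the independence check is an equality, not an approximation.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two throws of a fair die.
OMEGA = frozenset(product(range(1, 7), repeat=2))
POINT_MASS = {w: Fraction(1, len(OMEGA)) for w in OMEGA}


def prob(event):
    """P(A): sum of point masses over the outcomes in A (additivity)."""
    return sum((POINT_MASS[w] for w in event), Fraction(0))


def conditional(a, b):
    """Derived definition: P(A | B) = P(A and B) / P(B), for P(B) > 0."""
    return prob(a & b) / prob(b)


def independent(a, b):
    """Derived definition: A and B are independent iff P(A and B) = P(A) P(B)."""
    return prob(a & b) == prob(a) * prob(b)


first_even = frozenset(w for w in OMEGA if w[0] % 2 == 0)
sum_is_7 = frozenset(w for w in OMEGA if sum(w) == 7)
sum_is_8 = frozenset(w for w in OMEGA if sum(w) == 8)

if __name__ == "__main__":
    print(prob(OMEGA))                        # axiom 2: total mass is 1
    print(conditional(sum_is_7, first_even))  # equals P(sum is 7)
    print(independent(first_even, sum_is_7))  # True
    print(independent(first_even, sum_is_8))  # False
```

Nothing in the code says what chance "is"; conditional probability and independence appear only as definitions built on `prob`, which is the point made in the paragraph above.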
Kolmogorov’s contribution was to establish probability as a fully fledged mathematical discipline, independent of the need for philosophical interpretation in order to be coherent. From that point onward, one could say, “no matter what chance is in itself, mathematical probability is a consistent and well-defined tool”. It was this stage that legitimized probability, in the eyes of mathematicians, as a discipline in its own right (e.g. \[[1](https://arxiv.org/html/2606.00102#bib.bib1)\]). It also responded to a program envisaged by Hilbert, his sixth problem, which aimed to axiomatize physics and all branches of applied mathematics. After Kolmogorov, probability theory could thus be taught in the same way as topology or algebra, without concern for earlier semantic paradoxes, such as Bertrand’s paradox of geometric probabilities, which the measure-theoretic framework defuses by requiring that the underlying measure be stated explicitly.
However, the philosophical interpretation of probability continued to evolve throughout the twentieth century. An intense debate persisted between advocates of an objective view of probability, i.e. as frequency or physical propensity, and those defending a subjective view, i.e. as degree of belief. The statistician Fisher and the frequentist school (e.g. \[[19](https://arxiv.org/html/2606.00102#bib.bib38),[20](https://arxiv.org/html/2606.00102#bib.bib39)\]) dominated the first half of the twentieth century in practice, rejecting the use of Bayesian a priori assumptions and favoring methods based solely on observed frequencies, such as hypothesis testing and confidence intervals. Conversely, thinkers such as de Finetti in Italy and ?\) in England defended, in the 1930s, a radically subjective interpretation; “Probability does not exist,” de Finetti claimed: it is merely the way a coherent individual bets on an event. De Finetti even provided a behavioral characterization of probability as the fair betting rate of a rational agent, leading to the idea that the coherence of bets enforces the usual laws of probability. This subjectivist interpretation thus rejoined the Bayesian conception by a different path; it is rational to apply Bayes’s theorem to update one’s beliefs after new observations, and although the measure of these beliefs is personal, it remains bound by universal constraints of coherence, those expressed by Kolmogorov’s axioms, or equivalently by ?\)’s axioms of inductive logic.
The mid-twentieth century also witnessed a reversal of trends, with a revival of the Bayesian approach led by figures such as ?\), who argued that “Bayes’s theorem stands to probability as Pythagoras’s theorem does to geometry” (e.g. \[[24](https://arxiv.org/html/2606.00102#bib.bib46),[5](https://arxiv.org/html/2606.00102#bib.bib10),[15](https://arxiv.org/html/2606.00102#bib.bib32)\]). In particular, ?\) formulated the idea that probability theory is an extension of classical logic to situations of uncertainty, in other words, a generalized logical calculus in which “true” and “false” are replaced by degrees of plausibility between 0 and 1. This perspective provides a mature synthesis of the idea that probability is a mirror of reason under uncertainty: just as deductive logic reflects the structure of certain reasoning, probability reflects the structure of uncertain reasoning.
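The claim that coherence of bets enforces the laws of probability can be illustrated with the simplest case of a "Dutch book". The sketch below is a toy construction, not de Finetti's general theorem: it assumes an agent who quotes a price for a ticket paying one stake if an event $A$ occurs and another for a ticket paying one stake if $A$ does not occur, and who is willing to buy or sell at those prices. The function name and the numbers are hypothetical.

```python
def agent_net_payoffs(price_a, price_not_a, stake=1.0):
    """Agent's net payoff in each outcome when an opponent exploits prices
    that do not sum to 1 (prices are expressed as fractions of the stake)."""
    total = price_a + price_not_a
    if total > 1:
        # The opponent sells both tickets to the agent: the agent pays both
        # prices and collects exactly one stake, whichever outcome occurs.
        net = stake - total * stake
    else:
        # The opponent buys both tickets from the agent: the agent collects
        # both prices and pays out exactly one stake, whichever outcome occurs.
        net = total * stake - stake
    return {"A occurs": net, "A does not occur": net}


if __name__ == "__main__":
    print(agent_net_payoffs(0.6, 0.6))  # incoherent: loses in both outcomes
    print(agent_net_payoffs(0.7, 0.3))  # coherent: no sure loss available
```

Only prices satisfying $q(A) + q(\bar{A}) = 1$ leave the agent immune to a guaranteed loss, which is the additivity rule recovered from betting behavior alone.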
From a scientific standpoint, the second half of the twentieth century and the beginning of the twenty\-first have seen probability theory permeate virtually every field: quantum physics \(where probability is intrinsic to the laws of nature, breaking Laplacian determinism\), Earth sciences, biology, social sciences, economics, and of course computer science and artificial intelligence\. The concept of the arrow of time has even become central in statistical physics, where the increase of entropy, defined in probabilistic terms, explains the irreversibility of macroscopic phenomena\.
Contemporary scientific reason thus accepts a form of fundamental indeterminism (e.g. in quantum mechanics, the outcome of a measurement is inherently probabilistic), while using probability to produce the most precise predictions possible. This natural incorporation of randomness into the very core of our understanding of the world may well represent the culmination of the evolution of reason itself; what was once perceived as a deficiency of knowledge, i.e. an ignorance to be overcome, is now recognized as an irreducible feature of reality, one that knowledge must integrate. In this sense, probability theory has become as indispensable to modern science as differential calculus once was to classical mechanics.
Thus, at the dawn of the twenty\-first century, probability stands simultaneously as a rigorous mathematical theory, a methodological foundation for empirical science, and a component of the philosophy of knowledge\. Its historical evolution, from games of chance to machine\-learning algorithms, mirrors the paradigm shifts in the way human reason approaches uncertainty\.
## 7 Probability as a logic of information. Tarantola and the epistemology of inference
As we have emphasized, a modern and unifying view considers probability theory as an extended logic for reasoning under uncertainty\. This view is exemplified in the work of Tarantola, who played a major role in introducing Bayesian methods into inverse problems, that is, the inference of causes from observed effects, for example determining the internal structure of the Earth from seismic recordings\. Tarantola adopts a remarkably pure epistemological perspective; in a scientific problem where one seeks to estimate unknown parameters from measured data, all unknown quantities must be modeled as random variables representing our uncertainty\. Rather than searching for a single deterministic solution to an inverse problem, often ill\-posed and non\-unique, he proposes to characterize the entire set of possible solutions through a posterior probability density \(eg\.\[[25](https://arxiv.org/html/2606.00102#bib.bib47)\]\)\.
Tarantola’s approach is fundamentally Bayesian. One starts from a prior density $P(p)$, representing the initial information about the unknown parameters $p$, e.g. a probability distribution expressing an initial estimate of the geological structure; then one models the measurement process through a conditional data density, or likelihood $P(d \mid p)$, giving the probability of obtaining the data $d$ for a given choice of parameters $p$; finally, one applies Bayes’s formula to obtain the posterior density $P(p \mid d) \propto P(d \mid p)\,P(p)$, which constitutes the complete solution to the inverse problem. Tarantola emphasizes that this combination of prior information, arising for instance from physical knowledge or an initial model, with the information contained in new data is analogous to the logical operation “AND”; both pieces of information must be satisfied simultaneously. Mathematically, the fusion of these two independent sources of knowledge is achieved by a product of densities, or more generally a convolution in certain continuous cases, just as in Boolean logic conjunction requires that two conditions be true at the same time. Tarantola formalized this analogy by requiring that the rule for combining information obey the same properties as logical conjunction, transposed into probabilistic terms. This perspective amounts to saying: “Reasoning with probabilities is reasoning almost as in logic, but allowing for intermediate degrees of truth”.
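The product rule can be sketched on a grid for a toy two-parameter problem. Everything specific here is an assumption made for illustration and not taken from Tarantola: the forward model $g(m) = m_1 + m_2$, the Gaussian prior centered at the origin, the observed value, the noise level, and the grid are all hypothetical. The example is chosen because a single observation of the sum constrains only one combination of the parameters, so the likelihood is a ridge and the prior is needed to localize the solution.

```python
import math


def gaussian(x, mean, sd):
    """Unnormalized Gaussian density (normalization cancels on the grid)."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2)


def grid_posterior(d_obs=2.0, noise_sd=0.1, prior_mean=(0.0, 0.0),
                   prior_sd=1.0, n=121, lo=-3.0, hi=3.0):
    """Posterior over (m1, m2) on a regular grid: prior times likelihood,
    normalized so that the grid values sum to 1."""
    axis = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
    post = {}
    for m1 in axis:
        for m2 in axis:
            prior = (gaussian(m1, prior_mean[0], prior_sd)
                     * gaussian(m2, prior_mean[1], prior_sd))
            # Hypothetical forward model: the datum is a noisy value of m1 + m2.
            likelihood = gaussian(m1 + m2, d_obs, noise_sd)
            post[(m1, m2)] = prior * likelihood  # the "AND" of both sources
    z = sum(post.values())
    return {m: v / z for m, v in post.items()}


if __name__ == "__main__":
    post = grid_posterior()
    m_map = max(post, key=post.get)
    print("most probable grid model:", m_map)
```

The likelihood alone cannot distinguish models along the line $m_1 + m_2 = d_{obs}$; multiplying by the prior selects the part of that ridge closest to the prior estimate, and the full dictionary of weights, not just its maximum, is the solution in the sense described above.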
Tarantola’s approach illustrates how contemporary probabilistic reasoning operates; all sources of uncertainty (measurement errors, natural variability, and model imprecision) are integrated into a single coherent framework, and the rules of probability (i.e. Bayes and related principles) are then used to deduce the final state of knowledge once the data have been taken into account. This approach is now common not only in geophysics but also across many fields of engineering and science. For example, in robotics and artificial intelligence, one speaks of a Bayes filter, of which the well-known Kalman filter is a linear-Gaussian case (e.g. \[[2](https://arxiv.org/html/2606.00102#bib.bib5)\]): an algorithm that updates, in real time, the probability distribution of a system’s state as new observations arrive. Once again, the idea is that the machine’s knowledge at any given time is represented by a probability distribution over possible states, which is combined with incoming data to yield the updated knowledge; this combination typically consists of a convolution with a transition model, followed by multiplication by an observation density and normalization. The parallel with Tarantola’s framework is clear. In this sense, Tarantola extends Laplace’s vision: “common sense reduced to calculation” becomes, in Tarantola’s formulation, “logic reduced to probabilistic calculation.”
It is interesting to note that Tarantola, like Jaynes and de Finetti before him, recognized that the rigor of probabilistic calculation is precisely what guarantees the global coherence of inductive reasoning. Any other ad hoc way of combining information would risk violating rational principles, e.g. by ignoring relevant evidence or by counting it twice. By using probability as a language, one automatically inherits the additive and multiplicative coherence enforced by Kolmogorov’s axioms. In a sense, one can say that reason has now integrated chance: what once belonged to intuition or instinct, such as giving greater weight to more precise information, is now codified by computation. An observation with low uncertainty yields a sharply peaked likelihood and therefore carries greater weight in Bayesian updating than a noisy observation with a diffuse likelihood.
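The predict-then-update cycle of a Bayes filter takes a particularly compact form in the scalar linear-Gaussian case. The sketch below assumes a hypothetical random-walk state model and a direct noisy observation of the state; the function name, the noise variances, and the measurements are illustrative, not drawn from any cited reference.

```python
def kalman_step(mean, var, z, process_var, obs_var):
    """One predict/update cycle of a scalar Kalman filter.

    Assumed model: x_t = x_{t-1} + w with w ~ N(0, process_var), and
    observation z = x_t + v with v ~ N(0, obs_var).
    """
    # Predict: convolving the Gaussian belief with the Gaussian transition
    # model keeps the mean and adds the variances.
    mean_pred = mean
    var_pred = var + process_var
    # Update: multiplying by the Gaussian observation density and
    # normalizing gives a Gaussian whose mean moves toward z by the gain.
    gain = var_pred / (var_pred + obs_var)
    new_mean = mean_pred + gain * (z - mean_pred)
    new_var = (1 - gain) * var_pred
    return new_mean, new_var


if __name__ == "__main__":
    mean, var = 0.0, 1.0            # initial belief about the state
    for z in (2.1, 1.9, 2.0, 2.2):  # hypothetical measurements
        mean, var = kalman_step(mean, var, z, process_var=0.01, obs_var=0.25)
        print(f"z={z:.1f}  mean={mean:.3f}  var={var:.3f}")
```

The belief at each time is a full distribution (here summarized by a mean and a variance), and each new observation reshapes it through exactly the convolution, multiplication, and normalization sequence described in the text.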
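The weighting by precision mentioned above can be written out explicitly in the simplest case. The following worked formula is a standard result, stated under assumptions that are not in the original text: a scalar parameter $p$ with Gaussian prior of mean $p_0$ and variance $\sigma_0^{2}$, and a single datum $d = p + \varepsilon$ with Gaussian noise of variance $\sigma^{2}$.

```latex
% Prior:       P(p)      \propto \exp\!\left(-\frac{(p - p_0)^2}{2\sigma_0^2}\right)
% Likelihood:  P(d | p)  \propto \exp\!\left(-\frac{(d - p)^2}{2\sigma^2}\right)
% Posterior:   P(p | d)  \propto P(d | p)\, P(p) is Gaussian with
\begin{aligned}
\frac{1}{\sigma_{\mathrm{post}}^{2}} &= \frac{1}{\sigma_0^{2}} + \frac{1}{\sigma^{2}}, \\
p_{\mathrm{post}} &= \sigma_{\mathrm{post}}^{2}
  \left( \frac{p_0}{\sigma_0^{2}} + \frac{d}{\sigma^{2}} \right)
  = p_0 + \frac{\sigma_0^{2}}{\sigma_0^{2} + \sigma^{2}}\,(d - p_0).
\end{aligned}
```

Precisions (inverse variances) add, and each source of information is weighted by its precision: as $\sigma \to 0$ the posterior mean tends to the datum $d$, while a very noisy datum leaves the prior estimate $p_0$ almost unchanged.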
Tarantola summarizes this by emphasizing the necessity of accounting for all possible states of information; the solution to a problem is no longer a single number or model, but the entire collection of possible models, each weighted by its posterior probability\. This epistemological dimension of Tarantola’s work was later emphasized by Mosegaard \(cf\.\[[18](https://arxiv.org/html/2606.00102#bib.bib37)\]\), who described Tarantola’s legacy as a quest for consistency, symmetry, and simplicity\. In this sense, the probabilistic formulation of inverse problems is not merely a computational technique, but the expression of a broader rational requirement: all pieces of information entering an inference problem must be combined according to rules that are coherent, invariant with respect to arbitrary choices of parametrization, and as simple as possible\.
In this way, he echoes the well\-known statement that “correlation is not causation”\. Indeed, in a purely frequentist or descriptive approach, finding a strong correlation between two phenomena is not sufficient to infer a causal link\. The Bayesian\-logical framework complements this insight by providing a structure for testing alternative causal hypotheses and determining which is most probable in light of the data, that is, by enabling probabilistic causal inference\.
Figure 5: Conceptual demonstration of Bayesian inference in a two-dimensional parameter space, following the epistemological interpretation introduced by Tarantola. In the left panel, the prior distribution $p(m)$ represents the initial state of knowledge about the model parameters $(m_{1}, m_{2})$; the density is centered on an a priori estimate $m_{prior}$ reflecting information available before any observation is made. The middle panel shows the likelihood function $p(d_{obs} \mid m)$, which quantifies the compatibility between the data and each possible model. Its elongated shape reflects the fact that many parameter combinations fit the observation equally well, forming a “manifold of acceptable models”. The right panel displays the posterior distribution $p(m \mid d_{obs})$ obtained through the Bayesian update. Mathematically, this update is the product of the prior and likelihood densities; conceptually, it corresponds to a logical conjunction (“AND”): the posterior contains only those models that satisfy both the initial knowledge and the observational evidence. The resulting density is narrower, more informative, and sharply peaked around the true parameter value $m_{true}$. This figure exemplifies the central Bayesian idea that scientific inference does not replace prior knowledge with data, but rather fuses the two into a coherent, quantitatively updated state of belief, a synthesis that reflects how scientific reasoning integrates experience with established understanding.

Figure [5](https://arxiv.org/html/2606.00102#S7.F5) illustrates, in a two-dimensional parameter space, the full logic of Bayesian inference as interpreted by Tarantola: not as a mechanical update rule, but as a fusion of independent sources of information. The left panel shows the prior distribution $p(m)$, which expresses what is known about the parameters $(m_{1}, m_{2})$ before any observation is taken into account.
The density is centered near an a priori estimate $m_{prior}$, shown by the open circle, and spreads broadly across the model space, reflecting initial uncertainty. The true model $m_{true}$, represented by a cross, lies outside the region of maximal prior probability, emphasizing that prior knowledge alone is incomplete or biased. The middle panel displays the likelihood function $p(d_{obs} \mid m)$, which evaluates how compatible each model is with the observed data. Its geometry is striking; rather than isolating a single parameter value, the likelihood forms a narrow, elongated ridge, a set of models that all reproduce the observation equally well. This ridge highlights an essential aspect of inverse problems; data typically constrain only certain combinations of parameters, leaving entire directions of uncertainty unconstrained. The dashed line represents the axis of this degeneracy, passing close to the true model $m_{true}$. The right panel shows the posterior distribution $p(m \mid d_{obs})$, which emerges from the Bayesian synthesis of these two informational components. Mathematically, the posterior is the product of prior and likelihood; epistemologically, it is the region of the model space where the two sources of knowledge overlap. As Tarantola emphasized, the Bayesian update is not a replacement of prior knowledge by data but a logical conjunction; the posterior contains only those models that satisfy both the prior information and the observational constraint. The resulting density is sharply focused near $m_{true}$, demonstrating that the fusion of incomplete but complementary information leads to a far more precise characterization of the parameters than either source alone. Thus, the figure makes visually explicit a central idea of modern probabilistic reasoning; knowledge in science does not arise from data alone, nor from theory alone, but from the coherent integration of the two.
Through this fusion, uncertainty is reduced in a principled way, and the true model becomes identifiable within a structured landscape of probabilities\.
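The product rule described above, $p(m|d_{obs}) \propto p(m)\,p(d_{obs}|m)$, can be sketched numerically on a grid. The following Python fragment is a minimal illustration under stated assumptions, not a reconstruction of the computation behind Figure 5: the isotropic Gaussian prior, the datum `d_obs = 2.5` that constrains only the sum $m_1 + m_2$, the noise level, and all function names are hypothetical choices made for the example.

```python
import math

def prior(m1, m2, center=(0.0, 0.0), sigma=1.0):
    """Unnormalised isotropic Gaussian prior (assumed form) centred on `center`."""
    d2 = (m1 - center[0]) ** 2 + (m2 - center[1]) ** 2
    return math.exp(-0.5 * d2 / sigma ** 2)

def likelihood(m1, m2, d_obs=2.5, sigma_d=0.1):
    """Assumed likelihood: the datum constrains only m1 + m2, giving a narrow ridge."""
    return math.exp(-0.5 * ((m1 + m2 - d_obs) / sigma_d) ** 2)

def normalise(weights):
    """Turn non-negative grid weights into a discrete probability distribution."""
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

def variance_of_sum(density):
    """Variance of m1 + m2 (the data-constrained combination) under a grid density."""
    mean = sum((m1 + m2) * p for (m1, m2), p in density.items())
    return sum(((m1 + m2) - mean) ** 2 * p for (m1, m2), p in density.items())

# Regular grid over the two-dimensional model space [-3, 3] x [-3, 3].
axis = [i * 0.05 for i in range(-60, 61)]
models = [(m1, m2) for m1 in axis for m2 in axis]

prior_pdf = normalise({m: prior(*m) for m in models})
# Bayesian update as a pointwise product (the logical "AND" of the two sources).
posterior_pdf = normalise({m: prior(*m) * likelihood(*m) for m in models})

m_map = max(posterior_pdf, key=posterior_pdf.get)
prior_var = variance_of_sum(prior_pdf)
posterior_var = variance_of_sum(posterior_pdf)
print("posterior mode:", m_map)
print("variance of m1 + m2, prior vs posterior:", prior_var, posterior_var)
```

In this toy setting the posterior mode falls on the likelihood ridge at the point closest to the prior center, and the variance of the constrained combination $m_1 + m_2$ shrinks by orders of magnitude. The direction along the ridge is narrowed only by the prior, which is the degeneracy discussed above.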
## 8 When uncertainty is not enough\. The limits of probabilistic expressivity
The historical trajectory traced so far suggests that probability theory has progressively extended the scope of rationality by providing ever more refined tools for reasoning under uncertainty\. From the combinatorial symmetry of Pascal \(cf\. section [2](https://arxiv.org/html/2606.00102#S2)\) to the inductive dynamics of Bayes and Laplace \(cf\. section [3](https://arxiv.org/html/2606.00102#S3)\), from Poisson’s temporalization of events \(cf\. section [4](https://arxiv.org/html/2606.00102#S4)\) to Kolmogorov’s axiomatic closure \(cf\. section [6](https://arxiv.org/html/2606.00102#S6)\), and finally to Tarantola’s interpretation of probability as a logic of information \(cf\. section [7](https://arxiv.org/html/2606.00102#S7)\), probability has gradually acquired the status of a universal language for scientific inference\. In its most mature form, probabilistic reasoning appears capable of integrating prior knowledge, observational data, model uncertainty, and measurement noise into a single coherent framework\. One might therefore be tempted to conclude that probability, properly understood, exhausts the rational treatment of uncertainty\.
Yet this conclusion proves premature when one examines more closely the conditions under which probabilistic reasoning operates\. Probability theory presupposes that the objects of reasoning are well defined\. Events must be identifiable, hypotheses must be clearly formulated, and the space of possible outcomes must be specified in advance, even if only implicitly\. Uncertainty, in the probabilistic sense, concerns the truth value of propositions whose meaning is already fixed\. One asks whether a given hypothesis is true or false, whether a parameter takes one value rather than another, or whether an event occurs or not, and probability assigns degrees of confidence to these alternatives\. In all these cases, the indeterminacy lies in our knowledge of the world, not in the definition of the concepts themselves\.
However, many scientific situations do not conform to this idealized structure\. In a wide range of domains, the difficulty is not merely to decide whether a well\-posed hypothesis is true, but to determine what the hypothesis actually means\. The problem is no longer confined to uncertainty about outcomes, but extends to imprecision in the very categories used to describe them\. Scientific practice abounds in statements such as “the signal is weak,” “the model is acceptable,” “the structure is compatible with the data,” “the anomaly is significant,” or “the system is close to equilibrium\.” These judgments are neither purely subjective nor strictly binary\. They rely on graded assessments, contextual interpretation, and qualitative distinctions that resist sharp boundaries\. In such cases, asking for the probability of an event presupposes that the event has already been crisply delineated, which is precisely what is at stake\.
This limitation becomes particularly evident in inverse problems and complex systems, where Tarantola’s probabilistic framework otherwise proves remarkably powerful\. Even when a posterior distribution has been rigorously constructed, the interpretation of its structure often requires additional layers of judgment\. Which regions of parameter space should be regarded as “plausible,” “acceptable,” or “physically meaningful”? At what point does a model cease to be “compatible” with the data? Such questions cannot be answered by probability values alone, because they involve thresholds and categories whose definition is not dictated by probability theory itself\. The posterior density quantifies degrees of belief, but it does not, by itself, specify how these degrees should be mapped onto qualitative decisions or linguistic descriptions\.
This observation reveals an internal boundary of probabilistic expressivity\. Probability excels at quantifying uncertainty about well\-defined propositions, but it remains silent about the vagueness of the propositions themselves\. It assumes that the language in which hypotheses are formulated is already precise, whereas in practice, scientific reasoning often operates at the interface between quantitative data and qualitative concepts\. The difficulty is not accidental; it reflects a structural feature of probability theory\. As a calculus of measures, probability requires measurable sets, that is, sharply defined subsets of a sample space\. Vagueness, by contrast, concerns situations where membership itself is a matter of degree, where an object can belong partially to a category without fully satisfying its defining criteria\.
Historically, this tension has been obscured by the success of probabilistic methods in domains where concepts could be idealized and boundaries drawn with sufficient clarity, such as games of chance, physical measurements, or repeated experiments under controlled conditions\. As scientific inquiry increasingly turns toward complex, heterogeneous, and poorly delimited phenomena, however, the limits of this idealization become apparent\. In such contexts, uncertainty is inseparable from imprecision, and reasoning requires tools capable of handling both simultaneously\. Probability alone cannot adjudicate questions whose formulation already involves graded notions and continuous transitions between categories\.
Recognizing this limitation does not amount to a rejection of probability theory\. On the contrary, it is precisely because probability has achieved such a high degree of coherence and universality that its boundaries can now be clearly identified\. The issue is not that probabilistic reasoning is flawed, but that it is incomplete when confronted with forms of uncertainty that arise from the indeterminacy of meaning rather than from the randomness of outcomes\. This realization prepares the ground for an extension of rationality beyond probability, one that seeks to formalize not only uncertainty about facts, but also the vagueness of the concepts through which facts are apprehended\.
In this sense, the question that now arises is no longer “what is the probability that a given event occurs?” but rather “to what extent does a given situation belong to a given conceptual category?” Addressing this question requires a different, though complementary, formal language\. The historical evolution of rationality thus appears to demand a new step, analogous to that which led from combinatorics to induction, and from induction to axiomatic probability\. It is at this juncture that fuzzy logic enters the scene, not as a competitor to probability, but as a response to a distinct and irreducible dimension of uncertainty: the imprecision inherent in meaning itself\.
## 9 Beyond probability: fuzzy logic and the formalization of vagueness
The limitation identified in the previous section calls for a formal extension of rationality capable of addressing not uncertainty about facts, but imprecision in the meaning of the concepts used to describe them\. This extension was proposed in the second half of the twentieth century by \[?\] and \[?\], through the introduction of fuzzy sets and fuzzy logic\. Whereas probability theory assigns degrees of belief to well\-defined propositions, fuzzy logic assigns degrees of membership to categories whose boundaries are intrinsically gradual\. The two frameworks thus address distinct, though complementary, dimensions of uncertainty\.
In fuzzy logic, a statement such as “$x$ belongs to the set $A$” is no longer treated as either true or false\. Instead, membership is quantified by a function $\mu_{A}(x)$ taking values between 0 and 1, which expresses to what extent $x$ satisfies the defining properties of $A$\. This formalism captures a mode of reasoning that is ubiquitous in scientific practice and natural language, where concepts such as “large,” “stable,” “near,” or “acceptable” do not admit sharp thresholds\. Importantly, this is not a matter of ignorance or incomplete information; it reflects the structure of the concepts themselves\. Vagueness is not noise to be eliminated, but a constitutive feature of many epistemic categories\.
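A membership function of this kind can be written down in a few lines. The Python sketch below contrasts a crisp threshold with a graded membership for the concept “large”. The piecewise\-linear shape, the breakpoints 10 and 20, and the function names are illustrative assumptions; the min, max, and complement connectives are the standard ones of fuzzy set theory.

```python
def crisp_large(x, threshold=15.0):
    """Classical set: x either is 'large' (1.0) or is not (0.0)."""
    return 1.0 if x >= threshold else 0.0

def membership_large(x, low=10.0, high=20.0):
    """Fuzzy set: degree in [0, 1] to which x is 'large'.

    Assumed shape: 0 below `low`, 1 above `high`, linear in between.
    """
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

# Standard fuzzy connectives (min, max, complement).
def fuzzy_and(a, b):
    return min(a, b)

def fuzzy_or(a, b):
    return max(a, b)

def fuzzy_not(a):
    return 1.0 - a

for x in (5.0, 12.5, 15.0, 25.0):
    print(x, crisp_large(x), membership_large(x), fuzzy_not(membership_large(x)))
```

Here the value 12.5 is “large” to degree 0.25 and “not large” to degree 0.75. These numbers describe how well the value fits the concept; they are not probabilities that a hidden crisp fact holds.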
From this perspective, fuzzy logic does not compete with probability theory, nor does it aim to replace it\. Probability expresses a degree of confidence in a hypothesis whose meaning is fixed; fuzzy logic expresses a degree of compatibility between a situation and a concept whose meaning is inherently graded\. The two logics operate at different levels\. Probability quantifies uncertainty about truth, while fuzzy logic formalizes imprecision of meaning\. Their domains overlap in practice, but their epistemological roles are distinct\. A statement may be highly probable while remaining conceptually vague, or conceptually precise while probabilistically uncertain\.
This distinction becomes particularly salient in complex systems and inverse problems, where probabilistic reasoning often reaches its expressive limit\. Even when a posterior distribution is rigorously constructed, the interpretation of its structure frequently relies on qualitative judgments: which models should be regarded as plausible, which regions of parameter space are acceptable, and which solutions are clearly inadmissible\. These judgments implicitly invoke graded categories that are not encoded in probability densities themselves\. Fuzzy logic makes this implicit layer explicit by providing a formal language for such qualitative assessments\.
An illustrative example is provided by the comparison between classical probabilistic simulated annealing and a fuzzy\-logic variant applied to an optimization problem\. In the probabilistic scheme, candidate solutions are accepted or rejected according to a stochastic criterion, allowing even clearly suboptimal configurations to be explored with non\-zero probability\. In the fuzzy scheme, candidate solutions are evaluated through graded linguistic categories such as “bad,” “medium,” or “good,” which act as qualitative filters\. The resulting behavior is not merely a matter of computational efficiency; it reflects a different epistemic stance\. Random exploration is no longer unconstrained, but modulated by explicit judgments of coherence and plausibility\. The algorithm thus embodies, in formal terms, a mode of reasoning closer to human evaluative judgment\.
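The contrast between the two acceptance schemes can be made concrete with a small Python sketch. The Metropolis rule is the classical one. The fuzzy rule is a hypothetical stand\-in, since the text does not specify the exact rule used: here the Metropolis probability of an uphill move is simply scaled by the candidate’s degree of membership in an assumed “good tour” set. The names `membership_good` and `fuzzy_acceptance`, the linear membership, and the bounds `best` and `worst` are assumptions made for illustration only.

```python
import math

def metropolis_acceptance(delta, temperature):
    """Classical Metropolis rule: probability of accepting a cost change `delta`."""
    if delta <= 0:
        return 1.0  # improvements are always accepted
    return math.exp(-delta / temperature)

def membership_good(length, best, worst):
    """Hypothetical membership of a tour in the 'good' set.

    1 at or below `best`, 0 at or above `worst`, linear in between.
    """
    if length <= best:
        return 1.0
    if length >= worst:
        return 0.0
    return (worst - length) / (worst - best)

def fuzzy_acceptance(current, candidate, temperature, best, worst):
    """Illustrative fuzzy-modulated rule (an assumption, not the paper's algorithm).

    Uphill moves keep their Metropolis probability only to the degree
    that the candidate tour still counts as 'good'.
    """
    delta = candidate - current
    if delta <= 0:
        return 1.0
    return metropolis_acceptance(delta, temperature) * membership_good(candidate, best, worst)

# A clearly poor candidate early in the cooling (high temperature).
p_classical = metropolis_acceptance(180.0 - 100.0, temperature=100.0)
p_fuzzy = fuzzy_acceptance(100.0, 180.0, temperature=100.0, best=90.0, worst=200.0)
print("classical:", p_classical, "fuzzy:", p_fuzzy)
```

Under these assumptions a poor candidate that the classical rule would still accept fairly often at high temperature is accepted much more rarely by the fuzzy rule, and a candidate beyond `worst` is never accepted. Other designs are possible, for instance using several linguistic categories with rule\-based aggregation.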
Figure 6: Comparison of the classical Kirkpatrick simulated annealing \(light green\) and a fuzzy\-logic variant \(dark green\) applied to the traveling salesman problem\. Panels \(a\) and \(b\) show the final tours found by the two algorithms: although both reach acceptable solutions, the fuzzy version produces a shorter and more regular path\. Panel \(c\) shows the evolution of tour length over the cooling schedule\. The classical method displays the characteristic volatility of probabilistic annealing: large fluctuations, irregular jumps, and repeated excursions into highly suboptimal regions\. By contrast, the fuzzy algorithm exhibits a smoother, more disciplined trajectory, in which decreases in tour length are more consistent and inefficient configurations are naturally suppressed by fuzzy membership rules\. This illustrates a conceptual difference between probability and fuzzy logic: whereas stochastic annealing treats every trial as potentially acceptable, fuzzy evaluation imposes qualitative judgments \(“bad”, “medium”, “good”\) that act as epistemic filters, reducing the acceptance of implausible models and guiding the search more rationally through the solution space\.

Figure [6](https://arxiv.org/html/2606.00102#S9.F6) offers a detailed comparison between the classical Kirkpatrick simulated annealing algorithm and a fuzzy\-logic analogue applied to the traveling salesman problem, revealing not only differences in performance but also deep contrasts in the way uncertainty is treated within each framework\. Panels \(a\) and \(b\) show the final tours obtained after cooling\. The classical probabilistic scheme \(light green\), which accepts or rejects trial moves according to the traditional Metropolis rule, converges to a path that is globally coherent but locally irregular, with detours that reflect the algorithm’s willingness to accept poor configurations in the hope of escaping local minima\. 
The fuzzy version \(dark green\), by contrast, produces a tour that is slightly shorter and structurally smoother\. This improvement stems not from a more aggressive optimization strategy but from a fundamentally different treatment of decision\-making: candidate solutions are not judged in a binary fashion \(“accept” or “reject”\), but according to graded membership functions that quantify how “bad”, “medium”, or “good” a tour is\. The presence of these qualitative layers biases the search away from implausible or incoherent models even before the algorithm evaluates their numerical merit\. Panel c\) makes this epistemological contrast explicit by plotting the evolution of the tour length throughout the cooling schedule\. The classical method displays a characteristic noise\-sawtooth pattern: large amplitude fluctuations, sudden drops followed by dramatic deterioration, and intermittent excursions toward highly suboptimal regions\. These oscillations reflect the stochastic heart of probabilistic annealing; random exploration grants freedom to the algorithm but also exposes it to noise and to what might be called “epistemic instability”\. At any given time, a poor solution may be accepted simply because the Metropolis criterion, governed by temperature, still authorizes such steps\. This behavior is mathematically sound and historically central to the method, yet from an epistemological standpoint it exemplifies a mode of reasoning in which uncertainty is embraced as an engine of exploration\. The fuzzy algorithm behaves differently\. Its trajectory descends more monotonically, with fluctuations that are both smaller and more structured\. The membership functions act as qualitative filters that temper the randomness inherent in the optimization procedure\. 
A candidate tour that is extremely long, corresponding to a model that is clearly unacceptable, receives a near\-zero degree of membership in the “good” set, and the algorithm is correspondingly disinclined to accept it, even early in the cooling\. Conversely, moderately suboptimal tours may still be considered “acceptable” with a certain degree, allowing the method to retain some of the explorative power of classical annealing while avoiding its most erratic excursions\. From this perspective, fuzzy logic does not merely modify the acceptance rule; it reshapes the epistemic landscape of the search by embedding qualitative judgment into the algorithmic process\. Seen through this lens, the comparison between probability and fuzzy logic mirrors a broader philosophical dichotomy between two ways of handling uncertainty\. Probabilistic annealing relies on randomness as a tool for discovering structure; it navigates by chance, accepting disorder in the short term for the promise of order in the long term\. Fuzzy annealing, on the other hand, introduces an intermediate evaluative layer that reflects a kind of “algorithmic common sense”; before randomness can act, candidate models are screened according to linguistic categories that encode coherence, plausibility, or desirability\. By doing so, fuzzy logic restricts the exploration to cognitively meaningful regions of the model space, leading to a more stable and directed search\. The resulting optimization is not only more efficient but also more interpretable, as each decision can be traced back to explicit qualitative criteria rather than to probabilistic fluctuations alone\. In this sense, Figure[6](https://arxiv.org/html/2606.00102#S9.F6)does more than compare two numerical methods; it stages a conceptual dialogue between two philosophies of inference\. 
The classical probabilistic approach asserts that global structure can emerge out of local randomness, while the fuzzy approach asserts that randomness itself benefits from being modulated by qualitative judgment\. Their juxtaposition makes clear that the logic of optimization, like the logic of scientific reasoning, can be framed either as the aggregation of stochastic trials or as the interaction between numerical evidence and qualitative constraints\. The “superiority” of the fuzzy method in this example is therefore not merely computational; it reflects a deeper epistemological insight into how models should be evaluated when uncertainty is present\.
Seen in this light, fuzzy logic represents a continuation of the historical movement traced throughout this article\. Just as probability extended classical logic by allowing intermediate degrees of belief, fuzzy logic extends rational reasoning by allowing intermediate degrees of meaning\. It responds directly to the limitation identified in probabilistic expressivity: the inability to represent vagueness as such\. By formalizing graded membership, fuzzy logic enables reason to operate coherently in domains where concepts do not admit sharp boundaries, without collapsing vagueness into randomness\.
The emergence of fuzzy logic therefore marks a new stage in the evolution of rationality\. After having learned to domesticate chance and to quantify uncertainty, reason now confronts the task of formalizing imprecision itself\. This task does not undermine the achievements of probability theory; it complements them\. Together, probability and fuzzy logic form a richer epistemic framework, capable of addressing both uncertainty about facts and vagueness of concepts, two dimensions that modern scientific inquiry can no longer afford to conflate\.
## 10 Geometry without logic\. Artificial neural networks and the illusion of model\-free knowledge
The historical trajectory traced in this article has progressively clarified what constitutes rational inference under uncertainty\. Probability theory provided a coherent logic for updating beliefs in time, allowing scientific reasoning to integrate new evidence without sacrificing consistency\. Fuzzy logic, in turn, addressed a distinct limitation of probabilistic reasoning by formalizing vagueness and graded meaning, making it possible to reason rigorously when concepts themselves lack sharp boundaries\. Together, these frameworks articulate a form of rationality that explicitly represents uncertainty, interpretation, and judgment\. Against this background, the recent rise of deep learning invites a careful epistemological reassessment\.
In what follows, the expression “neural networks” refers to artificial neural networks used as computational models, and not to biological neural systems\. The discussion below does not address the general philosophical question of whether an embodied artificial system could, in principle, develop forms of understanding, intentionality, or consciousness\. It concerns a narrower issue: the epistemic status of present deep\-learning architectures when they are used as scientific models for prediction, explanation, and inference\. In this restricted sense, the question is not whether artificial intelligence could one day understand the world, but whether current neural architectures make their assumptions, uncertainties, causal structures, and conceptual categories explicit\.
Over the past decade, deep artificial neural networks have achieved spectacular empirical success in domains ranging from image recognition to speech processing and time\-series prediction\. Architectures such as Long Short\-Term Memory networks \(LSTM; \[[8](https://arxiv.org/html/2606.00102#bib.bib16)\]\) and nonlinear autoregressive models with exogenous inputs \(NARX; \[[13](https://arxiv.org/html/2606.00102#bib.bib30), [14](https://arxiv.org/html/2606.00102#bib.bib31)\]\) are now routinely employed to model highly complex dynamical systems\. In many scientific and engineering contexts, these models appear capable of capturing long\-range dependencies, nonlinear interactions, and subtle regularities that were previously inaccessible\. This success has fueled the widespread claim that deep learning represents an epistemic rupture: a form of “model\-free” intelligence able to extract knowledge directly from data, without explicit hypotheses, prior structures, or interpretative frameworks\.
Such claims, however, conflate predictive performance with epistemic reasoning\. While artificial neural networks undoubtedly extend our capacity to approximate complex input/output relations, their mode of operation differs fundamentally from both probabilistic and fuzzy forms of inference\. Unlike Bayesian reasoning, they do not update explicit hypotheses in light of new evidence; unlike fuzzy logic, they do not manipulate graded concepts or qualitative categories\. Instead, in their standard use as scientific predictors, they implement a largely geometric form of computation, whose internal coherence is numerical rather than explicitly epistemic\.
Mathematically, a deep neural network is a parametric function defined on a high\-dimensional vector space\. In the now\-standard case of ReLU\-based architectures \(cf\. \[[10](https://arxiv.org/html/2606.00102#bib.bib18)\]\), each neuron defines a half\-space, and each layer composes these half\-spaces into a partition of the input space into polytopes of increasing combinatorial complexity \(cf\. Appendix [B](https://arxiv.org/html/2606.00102#A2)\)\. After training, the network realizes a piecewise\-affine mapping whose parameters have been adjusted to minimize a loss function over a dataset\. Universality theorems guarantee that, given sufficient depth and width, such networks can approximate any continuous function on a compact domain \(cf\. Appendix [C](https://arxiv.org/html/2606.00102#A3)\)\. This expressive power, however, should not be mistaken for understanding\. Approximation does not entail explanation, and interpolation does not constitute inference\.
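The piecewise\-affine character of a ReLU network can be checked directly on a toy example. The Python sketch below uses a hypothetical network with two inputs, three hidden ReLU units, and one output; the weights are arbitrary illustrative values, not taken from any trained model. Each hidden unit splits the plane into two half\-planes, the tuple of on/off states identifies the polytope containing a point, and inside one polytope the output of a midpoint equals the average of the outputs at the endpoints, as expected of an affine map.

```python
# Hypothetical weights for a 2 -> 3 -> 1 ReLU network (illustrative values only).
W1 = [[1.0, -1.0], [0.5, 1.0], [-1.0, 0.5]]
B1 = [0.0, -0.5, 0.25]
W2 = [1.0, -2.0, 0.5]
B2 = 0.1

def hidden_preactivations(x):
    """Affine part of each hidden unit; its sign tells on which side of a line x lies."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, B1)]

def activation_pattern(x):
    """Tuple of on/off states: identifies the polytope of the input partition."""
    return tuple(z > 0 for z in hidden_preactivations(x))

def network(x):
    """Forward pass: ReLU hidden layer followed by an affine read-out."""
    hidden = [max(0.0, z) for z in hidden_preactivations(x)]
    return sum(w * h for w, h in zip(W2, hidden)) + B2

def midpoint(p, q):
    return tuple(0.5 * (pi + qi) for pi, qi in zip(p, q))

a, b, c = (2.0, 0.5), (3.0, 1.0), (-1.0, 2.0)

# a and b share an activation pattern, so the map is affine along the segment a-b.
same_region_gap = abs(network(midpoint(a, b)) - 0.5 * (network(a) + network(b)))
# a and c lie in different polytopes, so the affine identity fails across them.
cross_region_gap = abs(network(midpoint(a, c)) - 0.5 * (network(a) + network(c)))

print(activation_pattern(a), activation_pattern(b), activation_pattern(c))
print("gap inside one polytope:", same_region_gap)
print("gap across polytopes:", cross_region_gap)
```

The first gap vanishes up to rounding, while the second does not: the function is affine within each cell and changes slope only at cell boundaries.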
Sequential architectures such as LSTM or NARX networks illustrate this distinction particularly clearly\. These models are often described as possessing “memory,” since their outputs depend on past inputs or internal states\. Yet this memory is not epistemic in nature\. It does not correspond to the updating of beliefs, the revision of hypotheses, or the accumulation of evidence in a logical sense\. Rather, it is a state\-dependent parameterization of a function that encodes correlations across time\. The network does not ask whether a hypothesis remains plausible in light of new data; it simply computes a new output given an updated internal state\. Temporal dependence, in this framework, is not equivalent to an arrow of epistemic time \(cf\. Appendices [A](https://arxiv.org/html/2606.00102#A1) and [E](https://arxiv.org/html/2606.00102#A5)\)\.
Figure 7: Interpolation versus extrapolation in an LSTM model\. A Long Short\-Term Memory \(LSTM\) network is trained on a stationary regime of a mechanistic dynamical system over the interval $t \in [0, 40]$, where $t$ denotes a dimensionless time variable\. Top panel: within the training domain, the network achieves an excellent fit \(in red\) and closely interpolates the observed data \(in green\)\. Bottom panel: when the same network is deployed in closed\-loop prediction beyond the training interval, under a regime change affecting the external forcing, the predicted signal progressively drifts in phase and amplitude\. Although the underlying physical dynamics remain unchanged, the LSTM fails to extrapolate coherently, illustrating the distinction between prediction by interpolation and explanation by invariant mechanisms\.

Figure [7](https://arxiv.org/html/2606.00102#S10.F7) provides a concrete numerical illustration of the epistemological distinction developed in this section between interpolation\-based prediction and mechanistic explanation\. The figure is based on a forced damped oscillator, in which the measured quantity is the scalar displacement $x(t)$\. The LSTM is trained only on the first regime, $t \in [0, 40]$, where the forcing is stationary\. At $t = 40$, the governing differential equation is unchanged, but the external forcing changes in amplitude and frequency\. The purpose of the experiment is therefore not to show that the physical system changes, but to show that the network has learned the statistical regime of the observations rather than the invariant mechanism generating them\.
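The governing equation is not written out here. As a hedged illustration of what “unchanged mechanism, changed forcing” means, one generic form of a forced damped oscillator is given below. The linear form, the sinusoidal forcing, and the symbols $\zeta$, $\omega_0$, $A_i$, and $\Omega_i$ are assumptions introduced for the example, not the authors’ stated model.

```latex
\ddot{x}(t) + 2\zeta\omega_0\,\dot{x}(t) + \omega_0^{2}\,x(t) = F(t),
\qquad
F(t) =
\begin{cases}
A_1 \cos(\Omega_1 t), & 0 \le t \le 40,\\[2pt]
A_2 \cos(\Omega_2 t), & t > 40,
\end{cases}
\qquad (A_2, \Omega_2) \neq (A_1, \Omega_1).
```

In this reading, the left\-hand side \(the mechanism, fixed by $\zeta$ and $\omega_0$\) is identical in both regimes, and only the right\-hand side changes at $t = 40$. A model that had identified $\zeta$ and $\omega_0$ could in principle respond correctly to the new forcing, whereas a model that only reproduced the observed waveform under $(A_1, \Omega_1)$ has no such guarantee.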
The upper panel displays the behavior of the LSTM within the training domain\. The network is trained exclusively on data \(in green\) spanning the interval $t \in [0, 40]$, during which the system is excited by a stationary forcing\. In this regime, the agreement between the observed signal and the LSTM prediction \(in red\) is remarkably good\. The network reproduces both the amplitude and the phase of the oscillations, even in the presence of noise \($\sim 5\%$\)\. At this stage, the model appears to have successfully captured the statistical regularities of the training regime\. This visual impression reflects a genuine computational achievement: the LSTM is able to interpolate with high accuracy within the regime represented in the training data\.
The lower panel reveals a fundamentally different behavior once the network is asked to operate outside the conditions under which it was trained\. Beyond the training limit, the external forcing undergoes a change in amplitude and frequency, while the governing physical law of the system remains unchanged\. The LSTM is then run in closed\-loop mode, meaning that its own predictions are recursively fed back as inputs for future predictions\. Under these conditions, a progressive divergence between the predicted signal and the observed data becomes apparent\. The network exhibits a growing phase shift and a loss of amplitude fidelity, ultimately producing oscillations that no longer correspond to the true system response\.
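Closed\-loop prediction can be illustrated without any neural network. The Python sketch below deliberately replaces the LSTM by a much simpler data\-driven predictor, a two\-term linear recurrence that reproduces a sinusoid of the training frequency exactly, and replaces the oscillator response by a plain cosine whose frequency switches at $t = 40$. Both substitutions, the frequencies 1.0 and 1.3, and all names are assumptions chosen to keep the example self\-contained; the sketch shows the feedback mechanism only and does not reproduce Figure 7.

```python
import math

DT = 0.1             # sampling step (dimensionless time)
N_TRAIN = 400        # index of t = 40, end of the training regime
N_TOTAL = 800        # index of t = 80, end of the test window
OMEGA_TRAIN = 1.0    # assumed signal frequency in the training regime
OMEGA_SHIFTED = 1.3  # assumed signal frequency after the regime change

def observed(k):
    """Stand-in 'observed' signal: a cosine whose frequency switches at t = 40."""
    t = k * DT
    t_switch = N_TRAIN * DT
    if k <= N_TRAIN:
        return math.cos(OMEGA_TRAIN * t)
    # Phase is kept continuous across the switch.
    return math.cos(OMEGA_TRAIN * t_switch + OMEGA_SHIFTED * (t - t_switch))

def make_predictor(omega, dt):
    """One-step predictor tuned to a single frequency.

    Uses the identity cos((k+1)a) = 2 cos(a) cos(ka) - cos((k-1)a).
    """
    c = 2.0 * math.cos(omega * dt)
    def predict(x_prev, x_prev2):
        return c * x_prev - x_prev2
    return predict

def closed_loop(predict, x0, x1, n_steps):
    """Roll the predictor forward, feeding its own outputs back as inputs."""
    xs = [x0, x1]
    for _ in range(n_steps):
        xs.append(predict(xs[-1], xs[-2]))
    return xs

predict = make_predictor(OMEGA_TRAIN, DT)
rollout = closed_loop(predict, observed(0), observed(1), N_TOTAL - 1)
errors = [abs(rollout[k] - observed(k)) for k in range(N_TOTAL + 1)]

error_inside = max(errors[: N_TRAIN + 1])
error_beyond = max(errors[N_TRAIN + 1:])
print("max error within the training regime:", error_inside)
print("max error beyond the regime change:", error_beyond)
```

Within the training regime the rollout tracks the signal to rounding error. After the switch it keeps oscillating at the old frequency, so the phase error grows steadily, because nothing in the recurrence represents the cause of the change.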
This degradation is not the consequence of noise amplification or numerical instability, but rather the manifestation of a deeper limitation\. The LSTM has not learned the invariant structure of the dynamical system; instead, it has learned an observational mapping that is valid within the statistical regime represented in the training data\. When the input distribution changes, even though the physical mechanism does not, the learned mapping no longer provides reliable guidance\. The network continues to generate predictions, but these predictions are no longer constrained by the underlying causal structure of the system\.
Figure[7](https://arxiv.org/html/2606.00102#S10.F7)thus makes visible a crucial epistemological point: excellent predictive performance within a training domain does not imply explanatory understanding\. In the training regime, interpolation suffices, and the distinction between correlation and causation remains latent\. Once the regime changes, however, the absence of an explicit representation of dynamical invariants becomes evident\. The LSTM’s internal memory, often invoked as a surrogate for temporal reasoning, does not encode the system’s governing law; it merely stores and propagates patterns extracted from past observations\.
By contrasting the apparent success of the network within the training interval with its failure beyond it, Figure[7](https://arxiv.org/html/2606.00102#S10.F7)illustrates the limits of model\-free learning in a particularly transparent way\. The network’s behavior underscores the difference between learning a mechanism, which generalizes by invariance, and learning a correlation, which generalizes only by interpolation\. This distinction lies at the heart of the argument developed in this section: deep learning methods can be extraordinarily effective predictors under stable conditions, yet remain epistemologically silent with respect to explanation when confronted with changes that require an understanding of underlying structure rather than statistical regularity\.
This absence of explicit reasoning has important consequences\. Probabilistic inference distinguishes between prior knowledge and observational evidence, and combines them through well\-defined logical operations\. Fuzzy logic distinguishes between degrees of truth and degrees of membership, allowing qualitative judgment to be formalized\. Artificial neural networks, by contrast, tend to collapse heterogeneous sources of information into a single optimization criterion\. The loss function does not discriminate, by itself, between uncertainty, vagueness, noise, or conceptual ambiguity; it merely quantifies numerical discrepancy\. As a result, the internal representations learned by the network often remain opaque with respect to meaning, causality, and interpretation\.
From an epistemological standpoint, this opacity is not accidental but structural\. In their standard use as scientific predictors, artificial neural networks are not theories about the world; they are geometric machines for approximating mappings between data spaces\. They do not, by themselves, formulate hypotheses, test alternatives, or assign degrees of plausibility to competing explanations\. They construct a partition of a vector space in which inputs that are close in a numerical sense tend to produce similar outputs\. This mode of operation is highly effective when the task at hand is interpolation within a stable domain of observations\. It becomes fragile, however, when confronted with distributional shifts, conceptual changes, or questions that require explicit justification rather than numerical accuracy\.
The frequent analogy between deep learning and Laplace’s demon is therefore misleading\. Where Laplace imagined a deterministic intelligence endowed with complete knowledge of causes, present deep\-learning architectures generally operate without explicit causal models\. Their apparent determinism can mask a deeper epistemic indifference: they can produce a prediction without representing why that prediction holds\. In this respect, deep learning revives, in technological form, an older temptation of scientific reason: to equate regularity with law, correlation with explanation, and performance with understanding\.
This analysis does not deny the practical utility of artificial neural networks, nor does it diminish their computational achievements\. Rather, it situates them within a broader epistemological landscape\. Deep learning excels at capturing correlations in high\-dimensional spaces, but, in its usual form, it often does so by bypassing the very structures that probability and fuzzy logic were designed to make explicit: uncertainty, interpretation, and judgment\. Its power lies primarily in geometry and optimization, not in explicit logic\. The resulting outputs may be accurate, sometimes remarkably so, but they remain epistemically incomplete when they are not accompanied by an explicit account of uncertainty, causality, and conceptual meaning\.
The illusion of model\-free knowledge thus arises from a category error\. Artificial neural networks are not free of models; they embody implicit geometric and statistical assumptions, even when these assumptions are not expressed in the language of hypotheses, priors, likelihoods, or qualitative concepts\. What they lack, in their standard use as scientific predictors, is therefore not structure, but an explicit rational structure accessible to interpretation\. This statement should not be understood as a claim about the theoretical impossibility of artificial understanding or artificial consciousness\. It concerns the epistemic status of present deep\-learning architectures when they are used as scientific models\. In contrast to probabilistic and fuzzy frameworks, which aim to render uncertainty, inference, and conceptual qualification explicit, deep learning often conceals its assumptions behind layers of numerical optimization\. This concealment explains both its empirical power and its epistemological limitations\.
In the context of the historical evolution examined in this article, deep learning appears not as the culmination of explicit rationality, but as a powerful yet incomplete detour\. It demonstrates how far computation can go when liberated from explicit logic, and at the same time, how much is lost when reasoning is reduced to geometry alone\. The challenge for contemporary science is therefore not to oppose artificial neural networks to probabilistic or fuzzy reasoning, but to articulate them within a broader epistemic framework in which uncertainty, vagueness, and causality are once again made explicit\.
## 11 Conclusion: reason after probability
By retracing the historical development of probability theory from its early combinatorial origins to its contemporary interpretations, this article has argued that probability is not merely a technical apparatus for managing randomness, but a historically evolving form of rationality\. Each major transformation of probability theory corresponds to a deep shift in how scientific reason conceives uncertainty, time, and inference\. From Pascal’s symmetry of equipossible cases to Bayes and Laplace’s introduction of inductive learning, from Poisson’s temporalization of events to Kolmogorov’s axiomatic closure, probability has progressively incorporated the arrow of time into rational judgment\. With Tarantola’s interpretation of probability as a logic of information, this evolution reaches a point of remarkable coherence, where uncertainty, data, and prior knowledge are unified within a single epistemic framework\.
Yet this very success makes the internal limits of probabilistic reasoning visible\. As shown in this article, probability theory presupposes that the propositions to which it assigns degrees of belief are already well defined\. It quantifies uncertainty about facts, but it does not address the imprecision of the concepts through which facts are apprehended\. In many contemporary scientific contexts, uncertainty is inseparable from vagueness: models are judged plausible or implausible, signals weak or strong, structures compatible or incompatible with data, without admitting sharp boundaries\. Recognizing this limitation does not undermine probability theory; rather, it reveals the need for a complementary extension of rationality\.
Fuzzy logic responds to this need by formalizing graded meaning and qualitative judgment\. By introducing degrees of membership rather than degrees of belief, it provides a rigorous language for reasoning in domains where concepts are intrinsically imprecise\. In this sense, fuzzy logic occupies a position analogous to that once occupied by probability itself: it extends rational reasoning into regions that classical logic and probabilistic calculus cannot fully reach\. Probability and fuzzy logic thus address distinct but complementary dimensions of epistemic uncertainty, one concerning the truth of propositions, the other concerning the meaning of the categories involved\.
Placed within this historical trajectory, the rise of deep learning acquires a clearer epistemological significance\. Neural networks do not represent a further extension of rationality in the sense traced throughout this article, but rather a powerful computational detour\. By operating through geometric interpolation in high\-dimensional spaces, they achieve impressive predictive performance while bypassing explicit representations of uncertainty, causality, and conceptual judgment\. Their success illustrates the effectiveness of correlation\-based computation, but also its epistemic opacity\. Deep learning excels where interpolation suffices, yet it remains silent with respect to explanation, justification, and meaning\.
The contrast is therefore not between old and new technologies, nor between human and artificial intelligence, but between different conceptions of rationality\. Probabilistic reasoning introduces coherence and temporal updating; fuzzy logic introduces qualification and graded interpretation; deep learning introduces geometric power without explicit logic\. Confusing these modes of reasoning leads to the illusion of model\-free knowledge, in which performance is mistaken for understanding and regularity for law\.
The broader lesson of this inquiry is that rationality does not progress by accumulation alone, but by differentiation\. Each historical extension of reason has clarified what previous frameworks could not express\. In this light, the future of scientific reasoning does not lie in the abandonment of probabilistic or fuzzy frameworks, nor in the uncritical celebration of data\-driven methods, but in their careful articulation\. A mature epistemology of uncertainty must integrate quantitative inference, qualitative judgment, and computational efficiency, without reducing one to the others\.
By viewing probability as an evolving form of rationality rather than a fixed mathematical tool, this article offers a perspective from which contemporary debates on artificial intelligence, inference, and explanation can be reassessed\. The evolution of probability mirrors the evolution of reason itself: a continuous effort to make uncertainty intelligible, without erasing the complexity of the world it seeks to understand\.
## References
- \[1\] J\. Barone and A\. Novikoff \(1978\) A history of the axiomatic formulation of probability from Borel to Kolmogorov: part I\. Archive for History of Exact Sciences 18\(2\), pp\. 123–190\.
- \[2\] Z\. Chen et al\. \(2003\) Bayesian filtering: from Kalman filters to particle filters, and beyond\. Statistics 182\(1\), pp\. 1–69\.
- \[3\] G\. Cybenko \(1989\) Approximation by superpositions of a sigmoidal function\. Mathematics of Control, Signals and Systems 2\(4\), pp\. 303–314\.
- \[4\] L\. J\. Daston \(1992\) The doctrine of chances without chance: determinism, mathematical probability, and quantification in the seventeenth century\. In The Invention of Physical Science: Intersections of Mathematics, Theology and Natural Philosophy Since the Seventeenth Century, Essays in Honor of Erwin N\. Hiebert, pp\. 27–50\.
- \[5\] W\. Edwards, H\. Lindman, and L\. J\. Savage \(1963\) Bayesian statistical inference for psychological research\. Psychological Review 70\(3\), pp\. 193\.
- \[6\] D\. Gibert, F\. Lopes, V\. Courtillot, J\. L\. Mouël, and J\. Boulé \(2024\) Information theory, signal analysis and inverse problem\. arXiv preprint arXiv:2408\.16361\.
- \[7\] A\. Hald \(2005\) A history of probability and statistics and their applications before 1750\. John Wiley & Sons\.
- \[8\] S\. Hochreiter and J\. Schmidhuber \(1997\) Long short\-term memory\. Neural Computation 9\(8\), pp\. 1735–1780\.
- \[9\] K\. Hornik \(1991\) Approximation capabilities of multilayer feedforward networks\. Neural Networks 4\(2\), pp\. 251–257\.
- \[10\] A\. S\. Householder \(1941\) A theory of steady\-state activity in nerve\-fiber networks: I\. Definitions and preliminary lemmas\. The Bulletin of Mathematical Biophysics 3\(2\), pp\. 63–69\.
- \[11\] P\. S\. Laplace \(1812\) Théorie analytique des probabilités\. Vol\. 7, Courcier\.
- \[12\] P\. S\. Laplace \(1814\) Essai philosophique sur les probabilités\. Courcier\.
- \[13\] I\. J\. Leontaritis and S\. A\. Billings \(1985\) Input\-output parametric models for non\-linear systems part I: deterministic non\-linear systems\. International Journal of Control 41\(2\), pp\. 303–328\.
- \[14\] I\. Leontaritis and S\. A\. Billings \(1985\) Input\-output parametric models for non\-linear systems part II: stochastic non\-linear systems\. International Journal of Control 41\(2\), pp\. 329–344\.
- \[15\] D\. V\. Lindley \(1972\) Bayesian statistics: a review\. SIAM\.
- \[16\] P\. Lyraud and L\. Plazenet \(Eds\.\) \(2003\) L’oeuvre de Blaise Pascal\. Collection Mollat, Éditions du 400e anniversaire, France \(in French\)\.
- \[17\] G\. Montúfar, R\. Pascanu, K\. Cho, and Y\. Bengio \(2014\) On the number of linear regions of deep neural networks\. Advances in Neural Information Processing Systems 27\.
- \[18\] K\. Mosegaard \(2011\) Quest for consistency, symmetry, and simplicity: the legacy of Albert Tarantola\. Geophysics 76\(5\), pp\. W51–W61\.
- \[19\] J\. Neyman and E\. Pearson \(1933\) On the problem of the most efficient tests of statistical\. London: Philosophical Transactions of the Royal Society of London\.
- \[20\] J\. Neyman \(1977\) Frequentist probability and frequentist statistics\. Synthese 36\(1\), pp\. 97–131\.
- \[21\] O\. Ore \(1960\) Pascal and the invention of probability theory\. The American Mathematical Monthly 67\(5\), pp\. 409–419\.
- \[22\] B\. Pascal \(1665\) Traité du triangle arithmetique: avec quelques autres petits traitez sur la mesme matiere\. Chez Guillaume Desprez\.
- \[23\] J\. Reeves \(2015\) The secularization of chance: toward understanding the impact of the probability revolution on Christian belief in divine providence\. Zygon® 50\(3\), pp\. 604–620\.
- \[24\] L\. J\. Savage \(1954\) The foundations of statistics\. Courier Corporation\.
- \[25\] A\. Tarantola, B\. Valette, et al\. \(1982\) Inverse problems = quest for information\. Journal of Geophysics 50\(1\), pp\. 159–170\.
## Appendix A Correlations and convolutions: symmetry of time vs\. arrow of time
To better understand some of the concepts discussed above, it is instructive to make a more technical digression on two mathematical operations closely related to probability: correlation and convolution\. These two notions offer a vivid analogy for the idea of temporal symmetry versus the arrow of time, which we have already mentioned in passing\. In probability theory as in signal processing, a correlation measures the degree of association or similarity between two random variables, or two signals, without implying any direction of causality\. Mathematically, the correlation between two variables $X$ and $Y$ can be quantified by the Pearson correlation coefficient,

$$\rho_{X,Y}=\dfrac{\mathrm{Cov}(X,Y)}{\sigma_{X}\sigma_{Y}}, \tag{A.01}$$

where $\mathrm{Cov}(X,Y)=E[(X-E[X])(Y-E[Y])]$ is the covariance\. This quantity is symmetric, i\.e\. $\rho_{X,Y}=\rho_{Y,X}$\. More generally, the cross\-correlation function between two signals $x(t)$ and $y(t)$ is defined, in signal processing, as,

$$(x\star y)(\tau)=\int_{-\infty}^{+\infty}x(t)\,y(t+\tau)\,dt \tag{A.02}$$
which is essentially a convolution product without time reversal of one of the signals\. In the particular case of the autocorrelation of a stationary signal $x(t)$, one obtains a function $R_{xx}(\tau)$ that depends on the time lag $\tau$ and satisfies $R_{xx}(-\tau)=R_{xx}(\tau)$\. The autocorrelation is an even \(i\.e\. symmetric\) function of $\tau$\. Intuitively, this means that the similarity of $x(t)$ with itself shifted by $+\tau$ is the same as when shifted by $-\tau$: only the degree of similarity matters, not the temporal order\. In terms of reasoning, correlation reveals a mutual relationship between two phenomena without indicating which is the cause and which the effect\. This is why one often says that “correlation does not imply causation”; correlations are symmetric and timeless\. They may reflect underlying symmetries of the system, e\.g\. two variables influenced by the same hidden factor, or mere statistical coincidence, but to remain at the level of correlation is not yet to introduce a causal arrow\.
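The two symmetries stated above can be checked numerically. The following is a minimal sketch in plain Python, assuming finite sampled signals that are treated as zero outside the samples; the data and the helper names `pearson` and `cross_correlation` are illustrative assumptions, not part of the original text.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient (A.01) of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

def cross_correlation(x, y, lag):
    """Discrete analogue of (A.02): sum over t of x[t] * y[t + lag].

    Samples outside the recorded range are treated as zero.
    """
    return sum(x[t] * y[t + lag] for t in range(len(x)) if 0 <= t + lag < len(y))

# Made-up sample data, for illustration only.
x = [0.0, 1.0, 3.0, 2.0, -1.0, 0.5]
y = [1.0, 0.5, 2.5, 2.0, 0.0, 1.5]

rho_xy = pearson(x, y)
rho_yx = pearson(y, x)

# Autocorrelation of x at lags -3..3; it should be an even function of the lag.
autocorr = {lag: cross_correlation(x, x, lag) for lag in range(-3, 4)}

print("rho(X,Y) == rho(Y,X):", rho_xy == rho_yx)
print("R(-tau) == R(tau):", all(autocorr[k] == autocorr[-k] for k in range(4)))
```

Swapping the arguments of `pearson` leaves the result unchanged, and the autocorrelation takes the same value at lags of opposite sign, so neither quantity carries any information about temporal order.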
By contrast, convolution is an operation that, when used to combine an input signal with the impulse response of a system, introduces an explicit temporal orientation\. The convolution of two functions $f$ and $g$ is defined as,

$$(f*g)(t)=\int_{-\infty}^{+\infty}f(u)\,g(t-u)\,du. \tag{A.03}$$

Mathematically, convolution is also commutative, $f*g=g*f$, but when $f$ is interpreted as a cause \(an input signal\) and $g$ as a response kernel \(the system\), one usually imposes $g(t)=0$ for all $t<0$, the condition of physical causality, meaning that the system cannot respond before the signal is applied\. In this case, for an input applied from $t=0$ onward \(so that $f(u)=0$ for $u<0$\), the formula reduces to,

$$(f*g)(t)=\int_{0}^{t}f(u)\,g(t-u)\,du\quad\forall t\geq 0, \tag{A.04}$$
which shows that the output at time $t$ depends on the past values of the input, i\.e\. from 0 to $t$, but not on its future values\. This expresses an arrow of time; the direction runs from cause to effect, from past to present\. Every causal convolution thus encodes a form of practical irreversibility\. In principle, given $g$ and the output, one can retrieve the input by deconvolution, but this operation is often nontrivial and highly sensitive to perturbations, a consequence of the information loss inherent in the mixing performed by convolution\.
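The causal form (A.04) can be illustrated with a discrete sketch. The impulse response `g` and input `f` below are hypothetical values chosen for readability, and `causal_convolution` is an illustrative helper rather than a reference implementation.

```python
def causal_convolution(f, g):
    """Discrete analogue of (A.04): out[t] = sum_{u=0}^{t} f[u] * g[t - u].

    g is a causal kernel (zero before t = 0); beyond its last sample it is taken as zero.
    """
    return [
        sum(f[u] * g[t - u] for u in range(t + 1) if t - u < len(g))
        for t in range(len(f))
    ]

g = [1.0, 0.5, 0.25, 0.125]          # hypothetical impulse response
f = [0.0, 1.0, 0.0, 0.0, 2.0, 0.0]   # hypothetical input signal

out = causal_convolution(f, g)

# Alter only the future of the input (samples at t >= 4).
f_modified = f[:4] + [-5.0, 7.0]
out_modified = causal_convolution(f_modified, g)

# The output before t = 4 is untouched: the past does not depend on the future.
print(out[:4] == out_modified[:4])
print(out[4:] == out_modified[4:])
```

Changing the input from time 4 onward leaves the first four output samples identical and alters only the later ones, which is the discrete counterpart of the arrow of time described above.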
In probability theory, convolution appears in several temporal contexts\. For example, the distribution of the sum of two independent random variables is the convolution of their individual distributions: if $X$ and $Y$ are independent, the density of $Z=X+Y$ is,

$$(f_{X}*f_{Y})(z)=\int f_{X}(x)\,f_{Y}(z-x)\,dx. \tag{A.05}$$

Although $X+Y=Y+X$, a symmetry, this calculation can be interpreted as the sequential aggregation of two random contributions\. Similarly, in Markov processes, the evolution of the probability distribution follows the Chapman\-Kolmogorov equation, which is an integral of convolution type; the distribution after two steps is obtained by composing the distribution after one step with the transition distribution of the next step \(a convolution in the strict sense when the transition kernel depends only on the increment\)\. This formalizes the idea that the future is constructed from the present through a probabilistic transition kernel\. This construction has an intrinsic irreversibility, except in special cases of reversible processes, because it is built upon a given temporal orientation, the composition of transitions in the direction of time\.
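Both constructions have simple discrete counterparts. The sketch below uses exact fractions; the dice example and the two-state transition kernel `P` are illustrative assumptions, not taken from the original text.

```python
from fractions import Fraction

def convolve_pmf(p, q):
    """PMF of X + Y for independent X and Y, given as {value: probability} dicts.

    This is the discrete form of (A.05).
    """
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0) + px * qy
    return out

die = {face: Fraction(1, 6) for face in range(1, 7)}
two_dice = convolve_pmf(die, die)

def step(dist, kernel):
    """One Chapman-Kolmogorov step: new[j] = sum_i dist[i] * kernel[i][j]."""
    n = len(kernel)
    return [sum(dist[i] * kernel[i][j] for i in range(n)) for j in range(n)]

# Hypothetical two-state transition kernel (each row sums to 1).
P = [[Fraction(9, 10), Fraction(1, 10)],
     [Fraction(1, 2), Fraction(1, 2)]]
start = [Fraction(1), Fraction(0)]

after_one = step(start, P)
after_two = step(after_one, P)   # the two-step law is built from the one-step law

print(two_dice[7], after_two)
```

The distribution of the sum of two dice is obtained by convolving the single-die distribution with itself, and the two-step distribution of the chain is obtained by applying the same transition kernel to the one-step distribution, in the direction of time.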
To fix ideas, consider the following analogy: correlation is like comparing two books to see whether they resemble each other \(same words, same chapters\), without worrying about who might have copied from whom, whereas convolution is like reading a book sequentially to see how the story unfolds chapter by chapter\. Correlation detects a global symmetry or similarity, regardless of order, while convolution represents an ordered accumulation, where each chapter builds upon the previous one\. In science, identifying a correlation is often the starting point \(we observe that two phenomena vary together\), but explaining that correlation requires the introduction of a causal model, which, in essence, is of the convolution type, e\.g\. phenomenon $A$ influencing $B$ over time\.
Thus, returning to probability theory and to reason, one might say that reasoning by correlation corresponds to the phase in which we gather evidence and observe symmetric relationships, e\.g\. two correlated symptoms of a disease, a form of static associative reasoning\. Reasoning by convolution, or through a dynamic model, goes a step further by introducing a sequential structure; one imagines a mechanism that produces these correlations in a given temporal direction, e\.g\. the progression of the disease first causing symptom $A$ and then symptom $B$\. The sequential Bayesian inference discussed earlier, the step\-by\-step updating of probability, is precisely convolutive in nature; prior knowledge is combined with new information to yield posterior knowledge, and this process unfolds recursively\. Each update acts as a small convolutional fold, i\.e\. a multiplication by the normalized likelihood, that integrates the effect of new data\.
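The recursive update described above can be sketched on a discrete grid. Everything specific here is an assumption made for illustration: the coin-bias setting, the grid `thetas`, the uniform prior, and the made-up observations.

```python
def bayes_update(prior, likelihood):
    """One update: posterior[i] is proportional to likelihood[i] * prior[i], then normalized."""
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Hypothetical setting: unknown coin bias theta on a coarse grid, uniform prior.
thetas = [i / 10 for i in range(1, 10)]
belief = [1 / len(thetas)] * len(thetas)

observations = [1, 1, 0, 1, 1, 1, 0, 1]   # 1 = heads, 0 = tails (made-up data)
history = [belief]
for obs in observations:
    likelihood = [t if obs == 1 else 1 - t for t in thetas]
    belief = bayes_update(belief, likelihood)   # the previous posterior becomes the new prior
    history.append(belief)

most_probable = thetas[belief.index(max(belief))]
print(most_probable)
```

Each pass through the loop multiplies the current state of knowledge by the likelihood of one new observation and renormalizes, so `history` records the successive states of belief in the order in which the data arrived.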
In terms of temporal symmetry, one can say that correlation preserves a past\-future symmetry \(it would remain the same if time were reversed in a stationary system\), whereas convolution breaks this symmetry by introducing a preferred direction\. This reflects a fundamental feature of modern reason: to understand the world, we no longer limit ourselves to identifying symmetries and static structures; we also seek directed laws of evolution\. We accept that to know something is also to know how it changes and how it influences other things over time\. Probability, too, has had to integrate this dimension: from an originally static theory, in which Laplace could calculate a priori probabilities of causes or events outside of time, it has become a dynamic theory\. Today, the Kalman filter or dynamic Bayesian networks are standard tools for tracking systems in evolution\.
In conclusion to this section, the opposition between correlation and convolution is nothing less than a mathematical metaphor for the evolution of probabilistic thought: from the static to the dynamic, from simple association to causality, from timeless symmetry to temporal orientation\. Probability theory provides the language to quantify and manipulate both correlations, measuring the joint entropy of variables and their symmetric interdependencies, and convolutions, combining distributions throughout a process\. It thus enables reason to address both problems of static understanding \(what is the structure of relationships between variables?\) and problems of evolution \(how do knowledge and systems change over time?\)\.
## Appendix B Viewing the deep network as a piecewise affine map
Let us consider a ReLU\-activated network of depth $L$, producing a mapping,

$$f:\mathbb{R}^{d}\rightarrow\mathbb{R}^{k}, \tag{B.01}$$

where the ReLU nonlinearity is defined by,

$$\mathrm{ReLU}(x)=\max(0,x). \tag{B.02}$$

A central property of modern deep learning is therefore \(e\.g\. \[[17](https://arxiv.org/html/2606.00102#bib.bib36)\]\): any ReLU network implements a piecewise\-affine mapping, that is,

$$f(x)=\mathcal{A}_{i}x+b_{i},\quad\forall x\in\mathcal{R}_{i} \tag{B.03}$$
where the $\mathcal{R}_{i}$ are convex polytopes forming a partition of the input space\. This structure is crucial for understanding the nature of neural networks: the network learns no law, it extracts no causality, it infers no probability; it approximates a function by juxtaposing a large number of affine planes\. Depth plays an exponential role here\. The maximal number of affine regions $\mathcal{R}_{i}$ realizable by a network of width $n$ and depth $L$ satisfies,

$$N_{\mathrm{regions}}(n,L)\geq\prod_{l=1}^{L}\sum_{j=0}^{d}\binom{n_{l}}{j} \tag{B.04}$$

where $n_{l}$ denotes the width of layer $l$, and which explodes as soon as $L$ exceeds 5 or 6 layers\. In other words, a deep network constructs a fractal geometry of polytopes, capable of representing virtually any shape, yet without ever reaching the very notion of a model\.
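The piecewise-affine structure (B.03) can be made concrete on a very small case. The sketch below assumes a hypothetical one-hidden-layer ReLU network with scalar input and hand-picked weights `w`, `c`, `v`; it uses exact fractions so that the affinity test is not blurred by rounding. The function `region_count_expression` simply evaluates the right-hand side of (B.04) as written in the text.

```python
from fractions import Fraction as F
from math import comb

def relu(z):
    return max(F(0), z)

# Hypothetical network: f(x) = sum_j v[j] * relu(w[j] * x + c[j]), scalar input.
w = [F(1), F(-2), F(1, 2)]
c = [F(0), F(3), F(-2)]
v = [F(1), F(1, 2), F(-3)]

def f(x):
    return sum(vj * relu(wj * x + cj) for wj, cj, vj in zip(w, c, v))

# Each hidden unit switches on or off at x = -c/w; these kinks delimit the regions R_i.
kinks = sorted({-cj / wj for wj, cj in zip(w, c)})
edges = [kinks[0] - 10] + kinks + [kinks[-1] + 10]
regions = list(zip(edges[:-1], edges[1:]))

def is_affine_on(a, b):
    """Midpoint checks: an affine function on [a, b] must satisfy both identities."""
    m = (a + b) / 2
    q = (a + m) / 2
    return f(m) == (f(a) + f(b)) / 2 and f(q) == (f(a) + f(m)) / 2

affine_flags = [is_affine_on(a, b) for a, b in regions]

def region_count_expression(widths, d):
    """Right-hand side of (B.04): product over layers of sum_{j=0}^{d} C(n_l, j)."""
    total = 1
    for n in widths:
        total *= sum(comb(n, j) for j in range(d + 1))
    return total

print(len(regions), affine_flags, region_count_expression([3], 1))
print(region_count_expression([8] * 6, 2))   # same expression for six layers of width 8, d = 2
```

In this toy case the three hidden units produce three kinks and therefore four affine pieces, which coincides with the value of the expression in (B.04) for a single layer of width 3 and input dimension 1. The last line shows how quickly the same expression grows once several layers are stacked.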
## Appendix C Strong interpolation, nonexistent extrapolation
By construction, the function learned by a network within the training domain behaves as a very high\-dimensional interpolation\. The universality of neural networks can be formulated precisely as follows \(e\.g\. \[[3](https://arxiv.org/html/2606.00102#bib.bib8),[9](https://arxiv.org/html/2606.00102#bib.bib17)\]\): for any continuous function $g$ defined on a compact set $K\subset\mathbb{R}^{d}$ and any $\varepsilon>0$, there exists a network $f$ such that,

$$\sup_{x\in K}|f(x)-g(x)|<\varepsilon \tag{C.01}$$

But this theorem applies only to a compact set $K$, with no temporal structure, no notion of cause, and no extrapolation beyond $K$\. Thus, its essential implication is negative: a network knows nothing outside the support of the data\. The function obtained outside the training domain is often arbitrary: linear, because no new affine region is activated, or on the contrary divergent, because the matrix stack amplifies directions that were never tested\. The machine excels at reproducing what is known, yet remains blind to what has never been seen\. This is the opposite of probabilistic rationality, where a law entails out\-of\-sample predictions\.
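The linear case mentioned above can be reproduced with a hand-built example. This is a sketch under stated assumptions: the target $g(x)=x^{2}$, the domain $K=[-1,1]$ and the knot spacing are hypothetical choices, and the network is constructed directly as the ReLU form of a piecewise-linear interpolant rather than obtained by training.

```python
def relu(z):
    return max(0.0, z)

def target(x):
    return x * x   # hypothetical "law" to be approximated

# Knots inside the training domain K = [-1, 1].
knots = [-1.0 + 0.25 * k for k in range(9)]
values = [target(x) for x in knots]
slopes = [(values[k + 1] - values[k]) / (knots[k + 1] - knots[k]) for k in range(8)]

def network(x):
    """ReLU form of the piecewise-linear interpolant of `target` on `knots`.

    The first segment is written as relu(x - x0) - relu(x0 - x), so every term is a ReLU unit.
    """
    out = values[0] + slopes[0] * (relu(x - knots[0]) - relu(knots[0] - x))
    for k in range(1, 8):
        out += (slopes[k] - slopes[k - 1]) * relu(x - knots[k])
    return out

# Worst error on a fine grid inside K, and error at a point far outside K.
inside_error = max(
    abs(network(-1.0 + 0.01 * i) - target(-1.0 + 0.01 * i)) for i in range(201)
)
outside_error = abs(network(3.0) - target(3.0))

# Beyond the last knot no new unit switches on: successive increments are constant.
increments = [network(x + 1.0) - network(x) for x in (2.0, 3.0, 4.0)]

print(inside_error, outside_error, increments)
```

Inside $K$ the approximation error stays small, while outside $K$ the function simply continues along its last affine piece and departs from the target. A trained network may behave differently in detail, but the absence of any new affine region beyond the data has the same effect.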
## Appendix D Absence of internal temporality: contrast with Bayes and Tarantola
Throughout the probabilistic tradition, from Bayes to Kolmogorov, and later Tarantola, the inference process is written,
$$p(\theta\mid d)\propto p(d\mid\theta)\,p(\theta), \tag{D.01}$$
and it is the observation that shapes the likelihood\. This dynamic is intrinsically temporal; it possesses an epistemological arrow of time\. Nothing comparable exists in deep learning\. Training a network simply amounts to minimizing a cost function,
$$\min_{W}\dfrac{1}{N}\sum_{i=1}^{N}l(f_{W}(x_{i}),y_{i}), \tag{D.02}$$

where $W$ denotes the collection of weights\. Gradient descent does nothing conceptually: it aggregates no information, and has no internal logic akin to the probabilistic “AND” introduced by Tarantola,

$$\text{posterior}\sim\underbrace{\text{prior}}_{\text{pre-existing information}}\ \text{AND}\ \underbrace{\text{data}}_{\text{new information}}$$
The network knows neither hypotheses, nor uncertainties, nor likelihoods\. It optimizes a score\.
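To make the form of (D.02) concrete, here is a minimal sketch of gradient descent on a one-parameter model. The data, the model $f_{W}(x)=wx$, the squared loss and the learning rate are all hypothetical choices made for illustration.

```python
# Made-up data, roughly following y = 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.9, 4.2, 5.8, 8.1]

def mean_loss(w):
    """Empirical cost of (D.02) with l(a, b) = (a - b)^2 and f_W(x) = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(w):
    """Derivative of mean_loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.0
learning_rate = 0.05
for _ in range(200):
    w -= learning_rate * gradient(w)   # plain gradient descent step

# Closed-form least-squares solution, used only as a check on the descent.
closed_form = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

print(w, closed_form, mean_loss(w))
```

The procedure returns a single number `w` and a score `mean_loss(w)`. Nothing in it represents a prior, a likelihood, or a distribution over `w`; obtaining those would require adding the probabilistic structure of (D.01) explicitly.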
## Appendix E Structural confusion between correlation and causation
Since the learned function is an interpolation in $\mathbb{R}^{d}$, any deep network in fact performs geometric correlations learned locally in the data\. No internal operation is capable of distinguishing among an accidental regularity, a structural law, and a causal mechanism\. Mathematically, a deep network implements a function of the form,

$$f(x)=\sum_{i}\alpha_{i}\,\mathbb{I}_{\mathcal{R}_{i}}(x)\,(\mathcal{A}_{i}x+b_{i}), \tag{E.01}$$

that is, a combination of planes with no internal syntax\. There is no hypothesis space, no competition between causes, no Bayesian weighting $P(H_{i}\mid d)$; no such thing exists\. In emblematic real\-world applications \(Google Flu Trends, trading systems, medical imaging\), this absence of internal causality leads to abrupt collapses as soon as the data distribution shifts \(concept drift\)\. The machine has not failed: it had never inferred a law in the first place\.