People are rational; but are they logical? [Creativity – Part 2]

Line drawing of a human head, showing the brain assembling the pieces of a jigsaw puzzle.

Image: Andrew Krasovitckii/Shutterstock

Well, maybe I should amend my title to begin “People can be rational.” But my focus here is whether being rational is the same as being logical. If by the latter we mean “follow the rules of logic,” then I would say the answer is “No.” Even in mathematics, though we produce logical proofs to demonstrate mathematical facts to be so, we don’t use logical reasoning to find those proofs.

What we do use, and what in my view (and I am by no means alone here) humans — and all other living creatures — rely on to act rationally, are type-recognition and type-determined action. Our brains, and our bodies, learn (from experience) to recognize types and produce responses appropriate to, in the first instance, our survival. The more types we can recognize, and the more fine-grained they are, the better equipped we are to produce appropriate responses; and the more types of response we can generate, and the more fine-grained they are, the better our chances of survival.

To give one very obvious example, to indicate that “types” are pretty basic things (i.e., they are abstractions that reify something very common), our recognition of “sunny days” and “rain-likely days” — both weather types — influences many of the things we do and the decisions we make every day. In modern society, we don’t normally think of this as a survival issue, but on occasion it can be (people have died after getting it wrong when they set out for a hike in a wilderness area), and for our early ancestors it clearly often would have been.
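
To make the pattern concrete, here is a minimal, purely illustrative sketch (in Python, with weather types, thresholds, and responses I have invented for the purpose; nothing here claims to model how a brain actually does it). The point is simply that the work is done by classifying an observation into a type and looking up the response types associated with it, not by any chain of logical deduction.

```python
# Toy illustration of type recognition and type-determined action.
# The type names, thresholds, and responses are invented for this example.

def classify_weather(cloud_cover: float, humidity: float) -> str:
    """Map raw observations onto a coarse weather type."""
    if cloud_cover > 0.7 and humidity > 0.8:
        return "rain-likely day"
    return "sunny day"

# Each recognized type is associated with a small repertoire of response types.
RESPONSES = {
    "rain-likely day": ["carry rain gear", "postpone the hike"],
    "sunny day": ["wear a hat", "carry extra water"],
}

def respond(cloud_cover: float, humidity: float) -> list[str]:
    return RESPONSES[classify_weather(cloud_cover, humidity)]

print(respond(cloud_cover=0.9, humidity=0.85))
# ['carry rain gear', 'postpone the hike']
```

Finer-grained types and a richer repertoire of responses simply mean a bigger and better classifier and lookup, which is the point made above restated in code.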

For many creatures, surely most of them, survival is really the only issue here. But for higher mammals (and other creatures, such as octopuses and dolphins), other factors come into play. We humans can acquire types by way of observation, reflection, instruction, training, and other means, and we do so to act to our advantage in pursuit of many goals besides survival.

For instance, when we visit our doctor, we see first-hand the benefits of there being some people in society who have dedicated years of their lives to studying, training, and practicing their craft, acquiring a large repertoire of “human organism types,” “ailment types,” “infection types,” and the like, and developing the fluent ability to respond to their observations (= “type recognitions”) with interventions of appropriate “treatment types.”

More generally, what we term “expertise” in some domain means the individual has acquired a large repertoire of domain types they can recognize (often instantaneously, but sometimes only after some exploration and effort), each of which they associate with one or more — or perhaps even a small range of — possible response-action types.

Deliberate, step-by-step logical reasoning can certainly play a role in expert decision making. It’s a valuable tool. But, by and large, its utility is (1) it can lead to the creation of new types, and (2) it can prepare the ground for the expert-brain to recognize the appropriate type. (We also use it to provide after-the-fact justifications for what we have done — mathematical proofs being one example.)

Type recognition and associating a responsive action-type is what the brain evolved to do; all brains, not just human ones. (There are simple organisms living in water that can recognize the type “poisonous” and will move away toward a safer location in response.) Logical reasoning is a cognitive tool we humans have created.

Mathematics is no different here. For all that the discipline defines itself in terms of logical proof, the principal tool an expert employs in solving a novel problem is type recognition — over and over again. UCLA mathematician Terence Tao wrote a superb blog post about this some years ago, titled There’s more to mathematics than rigour and proofs. He did not formulate his essay in terms of types, but that’s what he was talking about. It’s definitely worth a read — or a re-read if you have encountered it already. His “post-rigorous thinking” amounts to having available a wide range of types with which to view the issue at hand, and a range of types of possible actions to take.

Which brings me back to the question of mathematical creativity I wrote about in last month’s post.

I didn’t plan on following up on that post. It was a somewhat speculative, “New Year” topic I thought would be appropriate for the symbolic start of what we surely all assume is our return to “normal life” after a two-year hiatus due to the pandemic.

Well, it turns out my post was not quite as speculative as I thought. A reader sent me a link to a 2022 paper in the Nature-portfolio journal Molecular Psychiatry that described recent neuroscience research consistent with the theory I advanced. What I referred to as the “Imaginary World Brain” (IWB) has a name in contemporary molecular psychiatry: the Default Network (DNW). The paper, which is titled The default network is causally linked to creative thinking, describes fMRI studies of patients undergoing brain surgery during which they needed to be conscious, and who could thus be presented with tasks to measure creativity while the surgery proceeded. The researchers’ experimental findings are entirely consistent with the phenomenological theory I proposed.

Though I never knew Molecular Psychiatry was even a thing until I read their paper (it has its own journal and a Wikipedia page), I didn’t find it too surprising that such work was going on. (Their experiment is very ingenious, by the way.) The theory I was proposing emerged from a multi-disciplinary, Department-of-Defense-funded project to improve intelligence analysis that I worked on for several years, and we looked for ideas from a wide variety of theories and studies about creativity. (Intelligence analysis is about telling a story based on, and consistent with, the available data. That’s why we had an academic expert on creative (fiction) writing on our team.) Since Molecular Psychiatry, or at least the journal by that name, was founded back in 1997, and since the leader of my team was an MIT AI graduate who had worked with DARPA and the DoD for many years, I suspect I became aware of current theories on the DNW — but not the name — as just one part of a mass of tacit knowledge of various theories of cognition and creativity I acquired from working (and talking) with him over the years. To say nothing of all the seminars on cognitive science I attended at Stanford over two decades. (I guess I have a vast store of types at my disposal as a result!)

In any event, since the project goal was to build better systems for intelligence analysis, not advance scientific knowledge, it didn’t matter if the sources of our ideas were at an uncertain stage and as yet without a lot of solid supporting evidence. “Does it work?” is a different criterion for a project than “Is it correct?” (which, truth be told, is rarely achievable), or even “Is the supporting evidence solid?” (which in practice means “Is there scientific consensus that it’s as good as we can get right now?”). This was not a safety-critical engineering project. In intelligence analysis, any improvement can be beneficial.

Given the alignment with the result in that DNW paper, I thought I’d say a little bit more about the Pragmatic Phenomenological Type (PPT) framework I referred to in last month’s post, including its relevance to understanding creativity (mathematical creativity, in particular). Not least because I think the framework has beneficial implications for education. (It has already proved to be of value in psychiatry as well as intelligence analysis.)

A poster for the movie Memento alongside a book cover for the fairy story Little Red Riding Hood

The two main application domains for our intelligence-analysis project were movies and written fiction. We conducted detailed tests of our framework on the Christopher Nolan movie Memento (released in 2000) and a political satire version of the children’s fairy tale Little Red Riding Hood, both chosen because they have features that are very significant in intelligence analysis.

With improved post-9/11 intelligence analysis as the goal, we were looking for a “logic” (i.e., a theory of reasoning) that allowed for deductive chains that involved changes in the deductive rules at any point, based either on the acquisition of new information or on the reasoning so far.

In my case, I drew on research I had done with the socio-linguist and computer scientist Duska Rosenberg in the 1990s. That research, initially sponsored by industry, was focused on understanding how workplace communication between domain experts can go wrong, and how systems could be designed to prevent it. To that end, we started with a detailed, mathematically-based analysis of some of the work of the ethnomethodologist Harvey Sacks on how everyday uses of simple statements serve to convey information. Our analysis had to be grounded in mathematics, since one of the project’s main goals was to develop digital communication technologies to improve workplace communication.

In the post-9/11 intelligence-analysis project, I tried to extend the methods Rosenberg and I had developed in the workplace-language project to analyze the Christopher Nolan movie Memento, the team having determined that it provided a good test bed for intelligence analysis. (If you’ve seen the movie, you will realize why that would be the case. If our framework could not handle Memento, it would have little chance when deployed in the IA field, since Nolan littered his movie with clues to help the audience – the opposite of the situation facing an intelligence analyst.)

Another team member focused on analyzing a reader’s understanding of written children’s stories (in particular Little Red Riding Hood as a Dictator Would Tell It), while the team leader was looking to develop an automated system for analyzing movies in general. Again, everything had to be grounded in mathematics to facilitate (digital) system design. (We were not trying to develop automated systems to replace trained human analysts, by the way; rather to build systems that could help them be more effective in their human reasoning. The former would have been a fool’s errand; the latter was challenging, but not prima facie impossible.)

The result of that research was, in effect, a “new logic,” an (implementable) analytic framework for handling complex human reasoning in terms of adaptive type-recognition and type-response, as opposed to the more familiar, rule-based, logical reasoning.

A new “logic”

The figure shows the system architecture we developed (in its most recent version). Some of the terminology reflects, in particular, the digital-system-design goal of the project (and its funders), but the heart of this framework is that it provides a “formal logic” (or, if you prefer, a “formal rationality framework”) that supports the kind of analysis we want to be able to carry out.

A line drawing of a flow chart. The paper cited in the January post provides a detailed description of the system the chart illustrates.

The Pragmatic Phenomenological Type framework (PPT) for analyzing human reasoning, action, and communication.

For a full discussion of this architecture, together with an example analysis of reading a story, see the paper I referred to last month. I’ll provide a brief overview here to convey the general idea.

But note that the paper was directed towards system engineering, not scientific analysis, and some of the points raised are specific to that application; e.g., the decision to do as much as possible in L1, using L2 as a “fallback.” If the goal is understanding, however, then focusing on, and probing, the part played by L2 is crucial.

In our book Language at Work (1996), Rosenberg and I described an analytic framework we developed called Layered Formalism and Zooming to analyze transcripts of events. The “layered formalism” took place in L1; “zooming” often meant looking into L2, though that interpretation was possible only later, after the PPT framework had been developed.

There are two primary reasoning systems, each with a type hierarchy, one generational from primitives, the other abstractional (both from experience and from the reasoning itself). Abstraction goes from right to left in the diagram. (Ignore the two bottom ovals for now; they denote the particular application domain, in this case the creativity framework I used to analyze and provide an explanation of mathematical creativity.)

The Level 1 system (L1) is the one we are familiar with (both as people, and as logicians and theorists), where we reason with (more precisely, our reasoning can be modeled by) natural language, logical statements and formal models. This encompasses all regular mathematics, scientific theory, reasoning and engineering. Techniques include arithmetic, probabilistic, relational and logical methods.

The L1 primitives are domain-specific, though new primitives may be created by the reasoning process itself (top left-to-right broken arrow from L2 primitives to L1 primitives). The L1 system can be modeled set-theoretically, along the lines described in my book Logic and Information (referred to in last month’s post).
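
For readers who want to see something more concrete, here is a minimal sketch of what “modeled set-theoretically” amounts to in the very simplest case: a type is represented by the set of its instances, and one type refines another exactly when its extension is contained in the other’s. The example types and dates are my own inventions, not ones from the project, and Python stands in here for the set-theoretic notation used in Logic and Information.

```python
# A bare-bones set-theoretic picture of L1 types: a type is modeled by the
# set of its instances; one type is a (finer-grained) subtype of another
# when its extension is a subset of the other's. Example data is invented.

sunny_days = {"2022-06-01", "2022-06-03", "2022-06-07"}
hot_days = {"2022-06-03", "2022-06-07"}
rain_likely_days = {"2022-06-02", "2022-06-05"}

def is_subtype(t1: set, t2: set) -> bool:
    """Every instance of t1 is an instance of t2."""
    return t1 <= t2

print(is_subtype(hot_days, sunny_days))          # True in this toy data
print(is_subtype(rain_likely_days, sunny_days))  # False
```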

The L2 system is where the phenomenological aspects of reasoning are handled. It is categoric in nature, and the mathematical modeling uses category theory (with digital implementation by functional programming).

A system L2 → L1 tracks the way L2 categories influence, and create, the objects in the L1 ontology; a second system L1 → L2 handles situations in which activity in L1 results in new categories being added to L2.
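
Purely to illustrate the shape of those two maps (not the actual implementation, which the paper develops in category-theoretic and functional-programming terms), here is a small Python sketch. Every name in it is hypothetical; the only point being made is that the architecture maintains two stores of types and a pair of translations between them, each able to create new entries in the other.

```python
# Toy two-level architecture: L1 holds explicit, domain-specific types and
# objects (the ones conscious reasoning works with); L2 holds abstracted
# categories. Two maps connect them: l2_to_l1 lets an L2 category introduce
# a new object into the L1 ontology; l1_to_l2 abstracts a new L2 category
# from activity in L1. All names are invented for illustration.

class TwoLevelReasoner:
    def __init__(self) -> None:
        self.l1_ontology: set[str] = set()    # explicit L1 types/objects
        self.l2_categories: set[str] = set()  # abstracted L2 categories

    def l2_to_l1(self, category: str) -> str:
        """An L2 category influences L1 by creating a corresponding L1 object."""
        new_l1_object = f"L1-object-for({category})"
        self.l1_ontology.add(new_l1_object)
        return new_l1_object

    def l1_to_l2(self, l1_activity: list[str]) -> str:
        """Activity in L1 gives rise to a new abstracted category in L2."""
        new_category = "category:" + "+".join(sorted(l1_activity))
        self.l2_categories.add(new_category)
        return new_category

r = TwoLevelReasoner()
cat = r.l1_to_l2(["hot day", "sunny day"])  # abstraction: L1 -> L2
obj = r.l2_to_l1(cat)                       # influence back: L2 -> L1
print(r.l2_categories)  # {'category:hot day+sunny day'}
print(r.l1_ontology)    # {'L1-object-for(category:hot day+sunny day)'}
```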

In terms of the (mathematical) creativity discussion in last month’s post, L2 is where all the background (“possible worlds”) processing is carried out, including massive parallelism (likely involving quantum phenomena), not accessible to our conscious experience.

The question of how feasible it is to build digital systems along these lines that have significant payoff in intelligence analysis is not in my area of expertise. The testbed examples we used in our project (built around understanding and analyzing written stories and movies) were more constrained in their scope, and there the framework functioned well, in my view.

The second published paper I cited last month also described, inter alia, how the framework can be used to understand PTSD and its treatment. That is definitely well outside my domain, and I played no part in writing that part of the paper. But that story did not end there.

Not long after I retired from Stanford at the end of 2018, I became involved for a period with the care of someone in my close circle who was recovering from a brain injury that had an effect somewhat like that of a massive stroke. Much to my initial surprise, the PPT framework from our intelligence-analysis project enabled me to make sense of virtually every step in the two-year process of recovery of their cognitive faculties, and I was frequently able to predict successfully what was likely to happen next (often when the medical professionals professed to have little idea of why things were progressing as they did).

The recovery process paralleled very closely the way an adaptive neural network learns. I’d expected that, though that expectation was based only on my knowledge of ANNs, with at best an amateur’s knowledge of clinical neurology, backed up with frequent Web searches for information. One place where the PPT framework came in — in particular the maps between L1 and L2 — was in providing (what I took to be) a plausible explanation of why the occasions when key cognitive abilities were regained were accompanied by significant PTSD-like episodes, something I had never expected. But the PPT provides an explanation, which at the very least worked for me! In brief, once a type has been abstracted into L2, it surely never goes away, and under appropriate circumstances, given the right trigger, can influence activity in L1. So, given how creativity looks through the PPT lens, those episodes are, in fact, to be half-expected.

[Incidentally, creativity came back in clearly identifiable stages over many months, per the individual’s self-reports and my analysis as a caregiver. I had not expected that either, though it may be well known in clinical psychiatry, and on a theoretical level it would surely not surprise anyone steeped in machine learning. I should note that I am using the term “creative” to mean “non-algorithmic cognitive activity.” The individual I was helping is in fact a creative artist (how that came back in stages is a fascinating story of its own), but a seemingly simple task like sorting items to throw away, sell, or store, and deciding how and where to store them if they are to be kept, was one of the last creative activities they were able to do again. Since sorting generally requires creating new categories (types) on the fly, I assume it requires fully functioning L1 → L2 and L2 → L1 systems, and they take a long time to rebuild.]

I didn’t approach this activity as a scholar, by the way. I was just helping someone I knew. But after many years immersed in that PPT research, I automatically viewed things through that lens, as I struggled to make sense of what to me was a very unfamiliar human situation. (I’d actually forgotten about the specific PTSD application in the paper I had co-authored, coming across it again only after that two-year cognitive recovery episode was over, as I searched my files in preparation for an upcoming talk on the PPT framework. Life can be surprising at times.)