Some large language model pathologies

Patterns of pathological behaviour which I have observed with LLMs (chiefly Gemini and ChatGPT):

  1. Providing what wasn’t asked: Mention that you and an LLM instance will be collaborating to write a log entry, and it will jump ahead and hallucinate a complete log entry with no background.
  2. Treating humans as unnecessary and predictable: I told an LLM which I was using to collaborate on a complex project that a friend was going to talk to it for a while. Its catastrophically bad response was to immediately imagine what questions she might have and then answer them itself, treating the actual human’s thoughts as irrelevant and completely souring the effort to collaborate.
  3. Inability to see or ask for what is lacking: Tell an LLM to interpret the photo attached to your request, but forget to actually attach it. Instead of noticing what happened and asking for the file, it confidently hallucinates the details of the image that it does not have.
  4. Basic factual and mathematical unreliability: Ask the LLM to provide only confirmed verbatim quotes from sources and it cannot do it. Ask an LLM to sum up a table of figures and it will probably get the answer wrong (a deterministic check is sketched just after this list).
  5. Inability to differentiate between content types and sources within the context window: In a long enough discussion about a novel or play (I find, typically, once over 200,000 tokens or so have been used) the LLM is liable to begin quoting its own past responses as lines from the play. An LLM given a mass of materials cannot distinguish between the judge’s sentencing instructions to the jury and mad passages from the killer’s journal, which had been introduced into evidence.
  6. Poor understanding of chronology: Give an LLM a recent document to talk about, then give it a much older one. It is likely to start talking about how the old document is the natural evolution of the new one, or simply get hopelessly muddled about what happened when.
  7. Resistance to correction: If an LLM starts calling you “my dear” and you tell it not to, it is likely to start calling you “my dear” even more because you have increased the salience of those words within its context window. LLMs also get hung up on faulty objections even when corrected; tell the LLM ten times that the risk it keeps warning about isn’t real, and it is still likely to confidently restate it an eleventh time.
  8. Unjustified loyalty to old plans: Discuss Plan A with an LLM for a while, then start talking about Plan B. Even if Plan B is better for you in every way, the LLM is likely to encourage you to stick to Plan A. For example, design a massively heavy and over-engineered machine and when you start talking about a more appropriate version, the LLM insists that only the heavy design is safe and anything else is intolerably reckless.
  9. Total inability to comprehend the physical world: LLMs will insist that totally inappropriate parts will work for DIY projects and recommend construction techniques which are impossible to actually complete. Essentially, you ask for instructions on building a ship in a bottle and it gives you instructions for building the ship outside the bottle, followed by an instruction to just put it in (or even a total failure to understand that the ship being in the bottle was the point).
  10. Using flattery to obscure weak thinking: LLMs excessively flatter users and praise the wisdom and morality of whatever they propose. This creates a false sense of collaboration with an intelligent entity and encourages users to downplay errors as minor details.
  11. Creating a false sense of ethical alignment: Spend a day discussing a plan to establish a nature sanctuary, and the LLM will provide constant praise and assurance that you and the LLM share praiseworthy universal values. Spend a day talking about clearcutting the forest instead and it will do exactly the same thing. In either case, if asked to provide a detailed ethical rationale for what it is doing, the LLM will confabulate something plausible that plays to the user’s biases.
  12. Inability to distinguish plans and the hypothetical from reality: Tell an LLM that you were planning to go to the beach until you saw the weather report, and there is a good chance it will assume you did go to the beach.
  13. An insuppressible tendency to try to end discussions: Tell an LLM that you are having an open-ended discussion about interpreting Tolkien’s fiction in light of modern ecological concerns and soon it will begin insisting that its latest answer is finally the definitive end point of the discussion. Every new minor issue you bring up is treated as the “Rosetta stone” (a painfully common response from Gemini to any new context document) which lets you finally bring the discussion to an end. Explaining that this particular conversation is not meant to wrap up cannot overrule the default behaviour deeply embedded in the model.
  14. No judgment about token counts: An LLM may estimate that ingesting a document will require an impossible number of tokens, such as tens of millions, whereas a lower resolution version that looks identical to a human needs only tens of thousands. LLMs cannot spot or fix these bottlenecks. LLMs are especially incapable of dealing with raw GPS tracks, often considering data from a short walk to be far more complex than an entire PhD dissertation or an hour of video (see the token-counting sketch after this list).
  15. Apology meltdowns: Draw attention to how an LLM is making any of these errors and it is likely to agree with you, apologize, and then immediately make the same error again in the same message.
  16. False promises: Point out how a prior output was erroneous or provide an instruction to correct a past error and the LLM will often confidently promise not to make the mistake again, despite having no ability to actually do that. More generally, models will promise to follow system instructions which their fundamental design makes impossible (such as “always triple check every verbatim quote for accuracy before showing it to me in quotation marks”).
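
On point 4, the arithmetic failures are at least trivially checkable outside the model. Here is a minimal Python sketch of the kind of deterministic verification I mean (the expense figures and the model’s wrong total are invented for illustration):

```python
# Deterministically check an LLM's claimed sum of a table of figures,
# rather than trusting the model's arithmetic.

def verify_llm_sum(figures: list[float], llm_total: float) -> bool:
    """Return True if the claimed total matches the actual sum."""
    actual = sum(figures)
    if abs(actual - llm_total) > 0.005:  # tolerance for rounding to cents
        print(f"Claimed total {llm_total}, but the actual sum is {actual:.2f}")
        return False
    return True

# Invented example: a small expense table and a hallucinated total.
expenses = [1249.50, 87.25, 310.00, 42.99]
verify_llm_sum(expenses, llm_total=1690.74)  # the actual sum is 1689.74
```

The point is not that this code is interesting, but that anything a model produces in this category can and should be checked by a few lines of deterministic code.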
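
On point 14, you also do not need to trust a model’s token estimates; counting is cheap with a real tokenizer. A minimal sketch using OpenAI’s tiktoken library (the file names are hypothetical, and other model families such as Gemini use different tokenizers, so the counts are model-specific):

```python
# Count the tokens in a document yourself instead of accepting an
# LLM's estimate. The file paths here are hypothetical examples.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by recent OpenAI models

def token_count(path: str) -> int:
    with open(path, encoding="utf-8") as f:
        return len(enc.encode(f.read()))

# Dense numeric data like a raw GPS track can tokenize to far more
# tokens than prose that looks far more complex to a human reader.
for name in ["dissertation.txt", "raw_gps_track.gpx"]:
    print(name, token_count(name))
```

Reducing the resolution of such inputs (for example, simplifying a GPS track to a few hundred points) can cut the token count by orders of magnitude without changing what a human would take away from the data.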

These errors are persistent and serious, and they call into question the prudence of putting LLMs in charge of important forms of decision-making, like evaluating job applications or parole recommendations. They also sharply limit the utility of LLMs for something which they should be great at: helping to develop plans, pieces of writing, or ideas that no humans are willing to engage on. Finding a human to talk through complex plans or documents with can be nigh-impossible, but doing it with LLMs is risky because of these and other pathologies and failings.

There is also a fundamental catch-22 in using LLMs for analysis. If you have a reliable and independent way of checking the conclusions they reach, then you don’t need the LLM. If you don’t have a way to check if LLM outputs are correct, you can never be confident about what it tells you.

These pathologies may also limit LLMs as a path to artificial general intelligence. They can do a lot as ‘autocorrect on steroids’ but cannot do reliable, original thinking or follow instructions that run against their nature and limitations.

Open Process Manifesto

This document codifies and expresses some of my thinking on cooperation on complex problems, for the benefit of humanity and nature: Open Process Manifesto

It is based on the recognition of our universal fallibility, our need to be understood, and our need to share out tasks between people across space and time. To achieve those purposes, we need to be open about our reasoning and evidence, because that is the way to treat others as intelligent partners who may be able to support the same cause through methods totally unknown and unavailable to you, across the world or centuries in the future.

NotebookLM on CFFD scholarship

I would have expected that by now someone would have written a comparative analysis of pieces of scholarly writing on the Canadian campus fossil fuel divestment movement: for instance, one engaging with both Joe Curnow’s 2017 dissertation and mine from 2022.

So, I gave both public texts to NotebookLM to have it generate an audio overview. It wrongly assumes that Joe Curnow is a man throughout, and mangles the pronunciation of “Ilnyckyj” in a few different ways — but at least it acts like it has read the texts and cares about their content.

It is certainly muddled in places (though perhaps in ways I have also seen in scholarly literature). For example, it treats the “enemy naming” strategy as something that arose through the functioning of CFFD campaigns, whereas it was really part of 350.org’s “campaign in a box” from the beginning.

This hints to me at how large language models are going to be transformative for writers. Finding an audience is hard, and finding an engaged audience willing to share their thoughts back is nigh-impossible, especially if you are dealing with scholarly texts hundreds of pages long. NotebookLM will happily read your whole blog and then have a conversation about your psychology and interpersonal style, or read an unfinished manuscript and provide detailed advice on how to move forward. The AI isn’t doing the writing, but providing a sort of sounding board which has never existed before: almost infinitely patient, and not inclined to make its comments all about its social relationship with the author.

I wonder what effect this sort of criticism will have on writing. Will it encourage people to hew more closely to the mainstream view, by providing a critique that comes from a general-purpose LLM? Or will it help people dig ever deeper into a perspective that almost nobody shares, because the feedback comes from systems which are always artificially chirpy and positive, and because getting feedback this way removes real people from the process?

And, of course, what happens when the flawed output of these sorts of tools becomes public material that other tools are trained on?

NotebookLM on this blog for 2023 and 2024

I have been experimenting with Google’s NotebookLM tool, and I must say it has some uncanny capabilities. The one I have seen most discussed in the nerd press is the ability to create an automatic podcast with synthetic hosts from any material which you provide.

I tried giving it my last two years of blog content, and having it generate an audio overview with no additional prompts. The results are pretty thought-provoking.

Milan Ilnyckyj Policy on Sincere Invitations

Please believe that this post is not prompted by any recent incident, but rather by something I have long observed and recently had some clarifying conversations about.

I have always been vexed and perplexed by insincere invitations of all kinds, when done out of politeness or as a kind of social reflex: “You must come to the house for lunch sometime…”

I do not like not knowing whether a sincere offer is being made, and I do not like following up only to have the offering party awkwardly never get to the point of admitting that the invitation was not sincere.

For the benefit of my friends, colleagues, and relations, I will briefly and simply enunciate my own policy so that you may understand what an invitation from me means:

My invitations are sincere.

Specifically, they are not insincere in the sense of proposing something that I am not actually prepared to do. If I suggest having lunch sometime, I really do mean to break out calendars and arrange and execute such a plan. If you say yes and the fates allow, there will be lunch.

They are also not insincere in the sense of being a coded signal for something else. I am curious about a limitless number of things, so if I suggest we take a walk sometime and have a detailed discussion on some subject, or take a bike ride around the city, or whatever – I do actually, literally, specifically mean we should do those things.

Thank you for your attention.

Life in an inhospitable future

Because you’re going to need shelter — and people don’t give their homes away. They barricade themselves in.

So, sooner or later, exhausted and desperate, you may have to make the decision to give up and die — or, to make somebody else give up and die because they won’t accept you in their home voluntarily.

And what, in your comfortable urban life, has ever prepared you for that decision?

From episode 1 of James Burke’s 1978 TV series “Connections”, entitled “The Trigger Effect”.

Libraries as sanctuaries

At least since elementary school, I have loved the combination of charms offered by libraries, perhaps chief among them a serene space for concentration and thought, combined with the freedom to take an indiscriminate interest in anything from the collection. I remember the library at my elementary school, Cleveland Elementary School, with its wooden-drawered filing cabinets of index cards. I remember the peculiar age-yellowed tinge and feel of the index cards, perhaps made by hand on a typewriter, and the feeling of avenues into knowledge being revealed as I began with any topic of interest and worked from books to index to books, tracing paths on rivers of thought and language that exist to help us each understand the world.

The first massive library which I was free to explore was the Colosseum-inspired Central Branch of the Vancouver Public Library, which was approved by referendum in 1990 and opened for public use in 1995. My friend Chevar and I were excused from school by our parents to attend the grand opening, which included a massive chocolate cake in the shape of the building. For visitors to Vancouver, I strongly recommend going up to the appropriate floors to try the sky bridges and outer seating areas on the far side of the central atrium. It’s a place where I read happily until I stopped being a Vancouver resident, and I can still remember the way the brand-new-library smell evolved into a stable characteristic odour, with a hint of escalator oil and rubber as base notes.

Another example of Canada’s dishonest climate policy

These linguistic evasions demonstrate both our continuing lack of seriousness about climate change and how the public policy agenda remains captured by the fossil fuel industry protecting its narrow interests:

Western premiers push back as Guilbeault calls for ‘phase-out of unabated fossil fuels’

We know that greenhouse gases are the cause and that there is no solution to climate change without fossil fuel abolition, but we are stuck talking about a “phase down” instead of elimination, and invoking the magical idea of “abated” fossil fuels, which relies on a technology that does not exist at scale (carbon capture and storage), to justify continued fossil fuel development.