A shark from the library

Libraries have been one of life’s joys for me.

The first one I remember was at Cleveland Elementary School. From the beginning, I appreciated the calm environment and, above all, access at will to a capacious body of material. All through life, I have cherished the approach of librarians, whom I have never known to question why I want to know something. Teachers could be less tolerant: I remember one from grade 3-4 objecting to me checking out both a book on electron micrography and a Tintin comic, as though anyone interested in the former ought to be ‘beyond’ the latter.

At UBC, I was most often at the desks along the huge glass front wall of Koerner Library – though campus offered several appealing alternatives. One section of the old Main Library stacks seemed designed by naval architects, all narrow ladders and tight bounded spaces, with some hidden study rooms which could be accessed only by indirect paths.

Oxford of course was a paradise of libraries. I would do circuits where I read and worked in one place for about 45 minutes before moving to the next, from the Wadham College library to Blackwell’s books outside to the Social Sciences Library or a coffee shop or the Codrington Library or the Bodleian.

Yesterday I was walking home in the snow along Bloor and Yonge streets and peeked into the Toronto Reference Library. On the ground floor is a Digital Innovation Hub which used to house the Asquith custom printing press, where we made the paper copies of the U of T fossil fuel divestment brief. This time I was admiring their collection of 3D prints, and was surprised to learn that a shark with an articulated spine could be printed that way, rather than in parts to be assembled.

With an hour left before the library closed, the librarian queued up a shark for me at a size small enough to print, and it has the same satisfying and implausible-seeming articulation.

I have been feeling excessively confined lately. With snow, ice, and salt on everything it’s no time for cycling, and it creates a kind of cabin fever to only see work and home. I am resolved to spend more time at the Toronto Reference Library as an alternative.

Schneier at SRI

This afternoon I was lucky to attend a talk at the Schwartz Reisman Institute for Technology and Society by esteemed cryptography and security guru Bruce Schneier. He spoke about “Integrous systems design” and how to build artificially intelligent systems that provide not just availability and confidentiality, but also the assurance that systems will exhibit correct behaviour which can be verified.

One interesting project mentioned in the talk is Apertus, a Swiss large language model (LLM) which was developed by three universities with government funding, without a profit motive, and without copyright infringement in the training data:

Apertus was developed with due consideration to Swiss data protection laws, Swiss copyright laws, and the transparency obligations under the EU AI Act. Particular attention has been paid to data integrity and ethical standards: the training corpus builds only on data which is publicly available. It is filtered to respect machine-readable opt-out requests from websites, even retroactively, and to remove personal data, and other undesired content before training begins.

I will give it a try and see if I can find any behaviours that differ systematically from Gemini and ChatGPT.

P.S. As an added bit of Bruce Schneier-ishness, when he signed my copy of Rewiring Democracy: How AI Will Transform Our Politics, Government, and Citizenship he included a grid of letters which decode pretty easily into a simple message:

O H O E
O E Y N
K B T J

It’s just a Transposition Cipher (an anagram), and one which follows a simple pattern.
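A transposition cipher of this kind just writes the message into a grid along one axis and reads it out along another. A minimal sketch of the idea, using a placeholder message rather than the one in the signed book:

```python
# Columnar transposition sketch: write the plaintext into rows of a
# fixed width, then read the grid out column by column. The message
# below is a placeholder example, not Schneier's.

def transpose_encode(text: str, width: int) -> list[str]:
    """Write `text` into a grid row by row; return the rows."""
    return [text[i:i + width] for i in range(0, len(text), width)]

def read_columns(rows: list[str]) -> str:
    """Read a grid of equal-length rows column by column."""
    width = len(rows[0])
    return "".join(row[col] for col in range(width) for row in rows)

rows = transpose_encode("HELLOLIBRARY", 3)
print(rows)                 # ['HEL', 'LOL', 'IBR', 'ARY']
print(read_columns(rows))   # the same letters, read down the columns
```

Decoding is just the reverse: guess the grid shape from the letter count, then try reading along the other axis (possibly with the columns in a different order) until words appear.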

Some large language model pathologies

Patterns of pathological behaviour which I have observed with LLMs (chiefly Gemini and ChatGPT):

  1. Providing what wasn’t asked: Mention that you and an LLM instance will be collaborating to write a log entry, and it will jump ahead to completely hallucinating a log entry with no background.
  2. Treating humans as unnecessary and predictable: I told an LLM which I was using to collaborate on a complex project that a friend was going to talk to it for a while. Its catastrophically bad response was to jump to immediately imagining what questions she might have and then providing answers, treating the actual human’s thoughts as irrelevant and completely souring the effort to collaborate.
  3. Inability to see or ask for what is lacking: Tell an LLM to interpret the photo attached to your request, but forget to actually attach it. Instead of noticing what happened and asking for the file, it confidently hallucinates the details of the image that it does not have.
  4. Basic factual and mathematical unreliability: Ask the LLM to only provide confirmed verbatim quotes from sources and it cannot do it. Ask an LLM to sum up a table of figures and it will probably get the answer wrong.
  5. Inability to differentiate between content types and sources within the context window: In a long enough discussion about a novel or play (I find, typically, once over 200,000 tokens or so have been used) the LLM is liable to begin quoting its own past responses as lines from the play. An LLM given a mass of materials cannot distinguish between the judge’s sentencing instructions to the jury and mad passages from the killer’s journal, which had been introduced into evidence.
  6. Poor understanding of chronology: Give an LLM a recent document to talk about, then give it a much older one. It is likely to start talking about how the old document is the natural evolution of the new one, or simply get hopelessly muddled about what happened when.
  7. Resistance to correction: If an LLM starts calling you “my dear” and you tell it not to, it is likely to start calling you “my dear” even more because you have increased the salience of those words within its context window. LLMs also get hung up on faulty objections even when corrected; tell the LLM ten times that the risk it keeps warning about isn’t real, and it is just likely to confidently re-state it an eleventh time.
  8. Unjustified loyalty to old plans: Discuss Plan A with an LLM for a while, then start talking about Plan B. Even if Plan B is better for you in every way, the LLM is likely to encourage you to stick to Plan A. For example, design a massively heavy and over-engineered machine and when you start talking about a more appropriate version, the LLM insists that only the heavy design is safe and anything else is recklessly intolerable.
  9. Total inability to comprehend the physical world: LLMs will insist that totally inappropriate parts will work for DIY projects and recommend construction techniques which are impossible to actually complete. Essentially, you ask for instructions on building a ship in a bottle and it gives you instructions for building the ship outside the bottle, followed by an instruction to just put it in (or even a total failure to understand that the ship being in the bottle was the point).
  10. Using flattery to obscure weak thinking: LLMs excessively flatter users and praise the wisdom and morality of whatever they propose. This creates a false sense of collaboration with an intelligent entity and encourages users to downplay errors as minor details.
  11. Creating a false sense of ethical alignment: Spend a day discussing a plan to establish a nature sanctuary, and the LLM will provide constant praise and assurance that you and the LLM share praiseworthy universal values. Spend a day talking about clearcutting the forest instead and it will do exactly the same thing. In either case, if asked to provide a detailed ethical rationale for what it is doing, the LLM will confabulate something plausible that plays to the user’s biases.
  12. Inability to distinguish plans and the hypothetical from reality: Tell an LLM that you were planning to go to the beach until you saw the weather report, and there is a good chance it will assume you did go to the beach.
  13. An insuppressible tendency to try to end discussions: Tell an LLM that you are having an open-ended discussion about interpreting Tolkien’s fiction in light of modern ecological concerns and soon it will begin insisting that its latest answer is finally the definitive end point of the discussion. Every new minor issue you bring up is treated as the “Rosetta stone” (a painfully common response from Gemini to any new context document) which lets you finally bring the discussion to an end. Explaining that this particular conversation is not meant to wrap up cannot overrule the default behaviour deeply embedded in the model.
  14. No judgment about token counts: An LLM may estimate that ingesting a document will require an impossible number of tokens, such as tens of millions, whereas a lower resolution version that looks identical to a human needs only tens of thousands. LLMs cannot spot or fix these bottlenecks. LLMs are especially incapable of dealing with raw GPS tracks, often considering data from a short walk to be far more complex than an entire PhD dissertation or an hour of video.
  15. Apology meltdowns: Draw attention to how an LLM is making any of these errors and it is likely to agree with you, apologize, and then immediately make the same error again in the same message.
  16. False promises: Point out how a prior output was erroneous or provide an instruction to correct a past error and the LLM will often confidently promise not to make the mistake again, despite having no ability to actually do that. More generally, models will promise to follow system instructions which their fundamental design makes impossible (such as “always triple check every verbatim quote for accuracy before showing it to me in quotation marks”).

These errors are persistent and serious, and they call into question the prudence of putting LLMs in charge of important forms of decision-making, like evaluating job applications or parole recommendations. They also sharply limit the utility of LLMs for something which they should be great at: helping to develop plans, pieces of writing, or ideas that no humans are willing to engage with. Finding a human to talk through complex plans or documents with can be nigh-impossible, but doing it with LLMs is risky because of these and other pathologies and failings.

There is also a fundamental catch-22 in using LLMs for analysis. If you have a reliable and independent way of checking the conclusions they reach, then you don’t need the LLM. If you don’t have a way to check if LLM outputs are correct, you can never be confident about what it tells you.

These pathologies may also limit LLMs as a path to artificial general intelligence. They can do a lot as ‘autocorrect on steroids’ but cannot do reliable, original thinking or follow instructions that run against their nature and limitations.

Foreword to Stroll

A new, cool style of engaging and enjoying metropolitan realities has recently emerged in Toronto among certain young writers, artists, architects, and persons without portfolio. These people can be recognized by their careful gaze at things most others ignore: places off the tourist map of Toronto’s notable sights, the clutter of sidewalk signage and graffiti, the grain inscribed on the urban surface by the drift of populations and the cuts of fashion.

Their typical tactic is the stroll. The typical product of strolling is knowledge that cannot be acquired merely by studying maps, guidebooks, and statistics. Rather, it is a matter of the body, knowing the city by pacing off its streets and neighbourhoods, recovering the deep, enduring traces of our inhabitation by encountering directly the fabric of buildings and the legends we have built here during the last two centuries. Some of these strollers, including Shawn Micallef, have joined forces to make Spacing magazine. But Shawn has done more than that. He has recorded his strolls in EYE WEEKLY, and these meditations, in turn, have provided the raw material for the present book. The result you have in your hands is a new introduction to Toronto as it reveals itself to the patient walker, and an invitation to walk abroad on our own errands of discovery, uncovering the memories, codes, and messages hidden in the text that is our city.

Foreword from the first edition, Toronto, 2010

John Bentley Mays, 1941–2016

Micallef, Shawn. Stroll: Psychogeographic Walking Tours of Toronto. Updated Edition. Coach House Press, 2024. p. 7

America is demolishing its brain

From NASA to the National Science Foundation to the Centers for Disease Control to the educational system, the United States under the Trump administration is deconstructing its own ability to think and to comprehend the complex global situation. A whole fleet of spacecraft — each unique in human history — risks being scrapped because the country is ruled by an anti-science ideology. They are coming with particular venom for spacecraft intended to help us understand the Earth’s climate and how we are disrupting it. Across every domain of human life which science and medicine have improved, we are in the process of being pulled backwards by those who reject learning from the truth the universe reveals to us, in preference to ‘truths’ from religious texts which were assembled with little factual understanding in order to reassert and justify the prejudices of their creators.

The anti-science agenda will have a baleful influence on the young and America’s position in the world. In any country, you are liable to see nerds embracing the NASA logo and pictures of iconic spacecraft — a form of cultural cachet which serves America well in being perceived as a global leader. Now, when an American rover has found intriguing signs of possible fossil life on Mars, there is little prospect that the follow-on sample return mission will be funded. Perhaps the near-term prospect of a Chinese human presence on the moon will bend the curve of political thought back toward funding space, though perhaps things will have further decayed by then.

The young are being doled out a double-dose of pain. As Christian nationalism and far-right ideology erode the value of the educational system (transitioning toward a Chinese-style system of memorizing the government’s official lies and doctrine rather than seeking truth through skeptical inquiry), young people become less able to cope in a future where a high degree of technical and scientific knowledge is necessary to comprehend and thrive in the world. Meanwhile, ideologues are ravaging the medical system and, of course, there is a tremendous intergenerational conflict brewing between the still-young and the soon-to-be-retired (if retirement continues to be a thing for any significant fraction of the population). Whereas we recently hoped for ever-improving health outcomes for everyone as technology advances, now there is a spectre of near-eradicated diseases re-emerging, in alliance with the antibiotic-resistant bacteria which we have so foolishly cultivated.

What’s happening is madness — another of the spasmodic reactionary responses to the Enlightenment and the Scientific Revolution which have been echoing for centuries. Unfortunately, it is taking place against the backdrop in which humanity is collectively choosing between learning to function as a planetary species and experiencing the catastrophe of civilizational collapse. Nuclear weapons have never posed a greater danger, and it exists alongside new risks from AI and biotechnology, and in a setting where the climate change which we have already locked in will continue to strain every societal system.

Perhaps I have watched too much Aaron Sorkin, but when I was watching the live coverage of the January 6th U.S. Capitol take-over, I expected that once security forces had restored order politicians from both sides would condemn the political violence and wake up to the dangerousness of the far-right populist movement. When they instead jumped right back to partisan mudslinging, I concluded that the forces pulling the United States apart are stronger than those holding it together. There is a kind of implicit assumption about the science and tech world, that it will continue independently and separately regardless of the silliness that politicians are getting up to. This misses several things, including how America’s scientific strength is very much a government-created and government-funded phenomenon, going back to the second world war and beyond. It also misses the pan-societal ambition of the anti-science forces; they don’t want a science-free nook to sit in and read the bible, but rather to impose a theocratic society on everyone. That is the prospect now facing us, and the evidence so far is that the forces in favour of truth, intelligence, and tolerance are not triumphing.

Nuclear risks briefing

Along with the existential risk to humanity posed by unmitigated climate change, I have been seriously learning about and working on the threat from nuclear weapons for over 20 years.

I have written an introduction to nuclear weapon risks for ordinary people, meant to help democratize and de-mystify the key information.

The topic is incredibly timely and pertinent. A global nuclear arms race is ongoing, and the US and Canada are contemplating a massively increased commitment to the destabilizing technology of ballistic missile defence. If citizens and states could just comprehend that nuclear weapons endanger them instead of making them safe, perhaps we could deflect onto a different course. Total and immediate nuclear weapon abolition is implausible, but much could be done to make the situation safer and avoid the needless expenditure of trillions on weapons that will (in the best case) never be used.

Nuclear powers could recognize that history shows it only really takes a handful of bombs (minimal credible deterrence) to deter enemies from opportunistic decapitation attacks. States could limit themselves to the most survivable weapons, particularly avoiding those which are widely deployed where they could be stolen. They could keep warheads separate from delivery devices, to reduce the risk of accidental or unauthorized use. They could collectively renounce missile defences as useless against nuclear weapons. They could even share technologies and practices to make nuclear weapons safer, including designs less likely to detonate in fires and explosions, and which credibly cannot be used by anyone who steals them. Citizens could develop an understanding that nuclear weapons are shameful to possess, not impressive.

Even in academia and the media, everything associated with nuclear weapons tends to be treated as a priesthood where only the initiated, employed by the security state, are empowered to comment. One simple thing the briefing gets across is that all this information is sitting in library books. In a world so acutely threatened by nuclear weapons, people need the basic knowledge that allows them to think critically.

P.S. Since getting people to read the risk briefing has been so hard, my Rivals simulation is meant to repackage the key information about proliferation into a more accessible and interactive form.

Three heat wave densification rides

Here’s a bit of a neat animation which I put together showing three heat wave after-work rides this week.

The green, blue, and red tracks show my Dutch bike rides on Monday, Tuesday, and today.

The white tracks show all my other rides: Dutch bike (3,437 km), Bike Share Toronto mechanical (2,522 km), and loaner bikes (85 km):

The streets I sought out are little visited because they tend to be inconvenient, not serving as effective routes between the places beyond them. That does make them blessed with light traffic, and the large properties have some of Toronto’s most ancient and impressive urban trees.

It’s remarkable that even someone trying to explore can ride past the same streets over and over, within the densest part of their ride network.

This also marks over 6,040 km of mechanical bike exercise rides in Toronto.

AI that codes

I had been playing around with using Google’s Gemini 2.5 Pro LLM to make Python scripts for working with GPS files: for instance, adding data on the speed I was traveling at every point along recorded tracks.
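The core of that calculation is simple enough to sketch without any LLM: the haversine distance between consecutive GPS fixes, divided by the time between them. This is a minimal illustration of the idea, not the actual generated script, and the sample coordinates are made up:

```python
# Sketch: per-point speed from consecutive GPS fixes, using the
# haversine formula for great-circle distance. Illustrative only.
import math
from datetime import datetime

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def speeds_kmh(points):
    """points: list of (lat, lon, datetime) tuples.
    Returns the speed in km/h over each gap between consecutive points."""
    out = []
    for (lat1, lon1, t1), (lat2, lon2, t2) in zip(points, points[1:]):
        dist_m = haversine_m(lat1, lon1, lat2, lon2)
        secs = (t2 - t1).total_seconds()
        out.append(3.6 * dist_m / secs if secs > 0 else 0.0)
    return out

# Two made-up fixes about 100 m apart, 20 seconds apart: ~18 km/h.
track = [
    (43.6532, -79.3832, datetime(2025, 6, 1, 18, 0, 0)),
    (43.6541, -79.3832, datetime(2025, 6, 1, 18, 0, 20)),
]
print(speeds_kmh(track))
```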

The process is a bit awkward. The LLM doesn’t know exactly what system you are implementing the code in, which can lead to a lot of back and forth when commands and the code content aren’t completely right.

The other day, however, I noticed the ‘Build’ tab on the left side menu of Google’s AI Studio web interface. It provides a pretty amazing way to make an app from nothing, without writing any code. As a basic starting point, I asked for an app that can go through a GPX file with hundreds of hikes or bike rides, pull out the titles of all the tracks, and list them along with the dates they were recorded. This could all be done with command-line tools or self-written Python, but it was pretty amazing to watch for a couple of minutes while the LLM coded up a complete web app which produced the output that I wanted.
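For comparison, the same track-listing task can be done with nothing but the Python standard library. A sketch, where the sample GPX snippet and track name are illustrative:

```python
# Sketch: list each track's <name> and the date of its first
# timestamped point from a GPX 1.1 document, using only the
# standard library. The sample data below is made up.
import xml.etree.ElementTree as ET

NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

def list_tracks(gpx_xml: str):
    """Return (name, date) pairs for every track in a GPX document."""
    root = ET.fromstring(gpx_xml)
    results = []
    for trk in root.findall("gpx:trk", NS):
        name_el = trk.find("gpx:name", NS)
        name = name_el.text if name_el is not None else "(untitled)"
        time_el = trk.find(".//gpx:time", NS)
        date = time_el.text[:10] if time_el is not None else "(no date)"
        results.append((name, date))
    return results

sample = """<gpx xmlns="http://www.topografix.com/GPX/1/1">
  <trk><name>High Park loop</name>
    <trkseg><trkpt lat="43.65" lon="-79.46">
      <time>2025-06-02T18:05:00Z</time>
    </trkpt></trkseg>
  </trk>
</gpx>"""
print(list_tracks(sample))
```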

Much of this has been in service of a longstanding goal of adding new kinds of detail to my hiking and biking maps, such as showing the slope or speed at each point using different colours. I stepped up my experiment and asked directly for a web app that would ingest a large GPX and output a map colour coded by speed.
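The colour coding itself reduces to mapping each point's speed onto a gradient. One way to do it, with illustrative thresholds (I don't know what the generated app actually used):

```python
# Sketch: map a speed onto a blue (slow) to red (fast) gradient and
# emit a hex colour per point. The 0-30 km/h range is an assumption.
def speed_to_hex(speed_kmh, lo=0.0, hi=30.0):
    """Linearly interpolate between blue and red; clamp out-of-range speeds."""
    t = max(0.0, min(1.0, (speed_kmh - lo) / (hi - lo)))
    r = round(255 * t)
    b = round(255 * (1 - t))
    return f"#{r:02x}00{b:02x}"

print(speed_to_hex(0))    # slowest: pure blue
print(speed_to_hex(30))   # fastest: pure red
```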

Here are the results for my Dutch bike rides:

And the mechanical Bike Share Toronto bikes:

I would prefer something that looks more like the output from QGIS, but it’s pretty amazing that it’s possible. It also had a remarkable amount of difficulty with the seemingly simple task of adding a button to zoom the extent of the map to show all the tracks, without too much blank space outside.

Perhaps the most surprising part was when at one point I submitted a prompt that the map interface was jittery and awkward. Without any further instructions it made a bunch of automatic code tweaks and suddenly the map worked much better.

It is really far, far from perfect or reliable. It is still very much in the dog-playing-a-violin stage, where it is impressive that it can be done at all, even if not skillfully.