Posts on Big Muddy

New Zealand and Australia are really far apart

Mon, 27 Apr 2026 18:17:00 -0400

I think I had the general impression that New Zealand basically hugged the southeastern coast of Australia, but in fact the Kiwis are quite a bit further away from Australia than I thought. The closest points between New Zealand and Tasmania are almost 1,500 km apart (and mainland Australia is even further away).

This is about the same as the straight-line distance between Toronto and Winnipeg (for Canadians), Atlanta and Boston (for Americans), or London and Warsaw (for Europeans).

The two closest major cities (i.e., cities everyone would know) are even further apart: Auckland to Sydney is about 2,150 km! This is like Toronto to St. John’s, Newfoundland; Los Angeles to Kansas City; or Rome to Helsinki.

Map by DI2000 (CC BY-SA 4.0).

How a nuclear power plant became a haven for wildlife

Sun, 26 Apr 2026 22:40:00 -0400

This Smithsonian Magazine article by Brigit Katz recounts how the American crocodile in Florida, whose numbers had dwindled to fewer than 300 by the 1970s, recovered in part due to the Turkey Point Nuclear Generating Station. The warm and relatively isolated waters of the power plant’s cooling canals are suitable for nesting and attract not just crocodiles but other wildlife, too.

It’s always fascinating to see how nature can survive and even thrive in man-made habitats. One of my favourite examples is Toronto’s Leslie Street Spit (Tommy Thompson Park), an important bird sanctuary entirely on reclaimed land—literally a rubble peninsula.

Hat tip to SkaldCrypto on Reddit.

Causation does not necessarily imply correlation

Sat, 25 Apr 2026 08:48:00 -0400

Debate any subject with an empirical angle and you will inevitably run into the phrase “correlation does not necessarily imply causation”. While true, it is rarely an interesting observation, and it is quite often used to reflexively dismiss empirical evidence that counters one’s viewpoint (even if this impulse is ultimately correct much of the time). As investor Paul Graham amusingly put it:

Whenever I see a reply mentioning that correlation isn’t causation, without fail it turns out to be saying something stupid. If they made a great seal of midwits, that phrase would be inscribed around the outer edge.

It is more interesting to note another problem that makes causal claims in research difficult: causation does not necessarily imply correlation, especially when human actors are involved. Economist Scott Cunningham has a great illustration of this at the beginning of his book Causal Inference: The Mixtape:

But weirdly enough, sometimes there are causal relationships between two things and yet no observable correlation. Now that is definitely strange. How can one thing cause another thing without any discernible correlation between the two things? Consider this example, which is illustrated in Figure 1.1. A sailor is sailing her boat across the lake on a windy day. As the wind blows, she counters by turning the rudder in such a way so as to exactly offset the force of the wind. Back and forth she moves the rudder, yet the boat follows a straight line across the lake. A kindhearted yet naive person with no knowledge of wind or boats might look at this woman and say, “Someone get this sailor a new rudder! Hers is broken!” He thinks this because he cannot see any relationship between the movement of the rudder and the direction of the boat.

Eroom's law

Fri, 24 Apr 2026 23:48:00 -0400

Eroom’s law (Moore’s law backwards) is a term coined by Jack Scannell et al. in 2012 to describe how drug discovery has become slower and more expensive over time. As summarized in the Wikipedia article, four primary causes have been proposed:

The ‘better than the Beatles’ problem: Many conditions already have successful therapies, and improvements over these existing drugs are likely to be modest (whereas the earlier drugs were often compared against placebos).
The ‘cautious regulator’ problem: High-profile failures of drug regulation such as Thalidomide and Vioxx have made regulators ever more risk-averse.
The ‘throw money at it’ tendency: The default response to difficulties in drug discovery is to add resources, leading to cost overruns.
The ‘basic research–brute force’ bias: Basic research has shifted toward high-throughput methods that may nonetheless be less productive (or at least overestimated in their effectiveness) than classical methods for discovering drugs that actually end up working in patients.

An additional idea (somewhat related to point #1) is that a lot of the low-hanging fruit has already been picked. While it is a somewhat circular argument, it is intuitive that drug discovery is harder because we’ve already found many of the drugs that were easy to discover.

Speaking to the broader slowdown in meaningful scientific progress (at least relative to the volume of academic outputs such as journal articles), I recall someone once making a similar point about low-hanging fruit like relativity having already been picked. Not that relativity was easy to discover, but the point is you can only discover it once!

Maduro raid soldier arrested for insider trading on Polymarket for $400,000 score

Thu, 23 Apr 2026 23:37:00 -0400

The anonymous Polymarket trader who made over $400,000 in profit betting on Maduro’s ouster has allegedly been unmasked as special forces soldier Master Sgt. Gannon Ken Van Dyke. Van Dyke was a participant in the raid that captured the former president of Venezuela in early January. He now “faces five criminal charges for stealing and misusing confidential government information, theft and fraud.” The Commodity Futures Trading Commission, which asserts jurisdiction over prediction markets in the United States, has also filed a related civil complaint against the active duty soldier (the first such insider trading case involving prediction markets!).

We have previously discussed on this blog how prediction markets incentivize bad behaviour. The goal of aggregating diffuse knowledge to produce unbiased forecasts is a lofty one, but in practice we get gambling, insider trading, and sometimes outright hostile/antisocial actions to make a bet happen.

To some, insider trading is a feature, not a bug. To quote Coinbase CEO Brian Armstrong on the subject: “If you’re actually optimizing it for a source of news, you 100% want insider trading.” (He uses the example of an admiral sitting in the Suez Canal making a bet based on military intelligence.) Is it worth knowing about events just before they happen if the mechanism is that retail traders (gamblers) get soaked over and over again?

Even the most expensive law firms are filing AI slop

Wed, 22 Apr 2026 22:50:00 -0400

Sullivan & Cromwell, one of the world’s most expensive law firms, has been caught submitting hallucinated legal citations as part of a routine bankruptcy case. It’s hardly the first time an American law firm has been caught doing this; researcher Damien Charlotin has already documented over 900 instances in the US alone.

I’m a bit surprised the legal profession hasn’t uniformly adopted automated checkers by now (at the very least for hallucinated case names and quotes; interpretation is obviously harder), given that the reputational damage of these errors is so significant. It seems like an obvious and achievable step for a famously conservative and detail-oriented profession. In fact, the aforementioned Damien Charlotin seems to have developed such a service himself, and I’m sure competitors exist.

ggsql: A grammar of graphics for SQL

Tue, 21 Apr 2026 07:00:00 -0400

This is a pretty darn interesting new release from Thomas Lin Pedersen and team at Posit (the company behind RStudio): ggsql, a SQL-fied take on the grammar of graphics approach to data visualization made famous by ggplot2. As a veteran ggplot user myself, I will definitely be checking it out. For production-ready plots, I am not sure if it will be easier to fiddle with syntax for things like label sizes and axis ticks in SQL rather than R, but for the exploratory phase of data analysis, I can immediately see the appeal.

Japan’s Phillips Curve Looks Like Japan

Mon, 20 Apr 2026 17:20:00 -0400

Today’s post is a fun one: a working paper from 2006 entitled “Japan’s Phillips Curve Looks Like Japan”.

And indeed, it does:

(Well, as long as you reflect the plot across the y-axis: note that the x-axis shows -x rather than x.)

The Phillips curve describes the observation that inflation and unemployment have an inverse relationship in the short term (i.e., as unemployment falls, inflation rises, and vice versa).

This humorous working paper did actually lead to a full publication with the same name in 2008:

During the past 15 years Japan has experienced unprecedented, high unemployment rates and low (often negative) inflation rates. This research shows that these outcomes were predictable as part of a stable, readily recognized Phillips curve.

There is a well-known joke in economics attributed to Nobel laureate Simon Kuznets that goes something like this: “There are four types of economies: developed, underdeveloped, Japan, and Argentina.”

I guess this is one way in which Japan’s economy is very much like the rest of the world’s (at least up to 2005).

Encouraging results for mRNA therapy for pancreatic cancer

Sun, 19 Apr 2026 12:03:00 -0400

Cancer therapies based on mRNA vaccine technology have been among the most promising medical developments of the past decade. That promise is now beginning to show early signs of being realized. An extended follow-up of a phase 1 pancreatic cancer trial published last year reported striking outcomes for some patients:

Six years after treatment, Gustafson and six others who responded to the treatment are still alive, along with two of the eight people who did not respond. Two of the responders, including the one who died, had a cancer recurrence; Gustafson’s cancer has not come back.

In other words, after six years, 7/8 responders are still alive, while only 2/8 non-responders are.

Pancreatic cancer is a particularly aggressive form of cancer, with a 5-year relative survival rate of just 13%. Famously, it was the type of cancer that killed Steve Jobs. It has long been an intense target for research due to its grim prognosis and lack of progress compared to other forms of cancer.

At the same time, this remains very early evidence from a small group of patients. Phase 1 clinical trials are not primarily designed to evaluate efficacy (rather, they are designed to assess safety and establish dosing and side effects). While the difference between responders and non-responders is striking, it does not by itself show the vaccine caused the survival benefit: “responders” are defined after treatment, so they are not a proper control group.

Is the pendulum swinging back on free-range childhood?

Sat, 18 Apr 2026 10:27:00 -0400

Stephen Johnson reports in Big Think on a movement in the United States to free parents from the fear of having Child Protective Services called on them for giving their young children some independence to roam their neighbourhoods unsupervised. In recent years, there have been several high-profile cases of parents investigated for neglect for allowing their children freedom of movement that would have been considered utterly routine two decades ago.

There are many articles decrying the “helicopter parents” of today, who never let their children out of their sight. But this is rational behaviour when vague laws regarding childhood endangerment/neglect create a climate of fear: even if most people believe allowing kids independence is reasonable, all it takes is one complaint and one sympathetic social worker to create dire consequences for an entire family. This is what activists in the United States are trying to change:

The case helped persuade Georgia legislators to pass a so-called “reasonable childhood independence” (RCI) law, enacted last summer. These laws are part of a national movement to tighten vague language in states’ neglect laws. Georgia’s old law, for instance, defined neglect as the failure to provide “proper” parental care. The new law replaces that with “necessary” care and sets a higher bar for neglect: Parents must demonstrate “blatant disregard” for their child’s safety — putting them in imminent, obvious danger. The law also explicitly states that allowing a reasonably capable child to walk to school or travel to a nearby park unsupervised does not, by itself, constitute neglect.

The surprising origin of the citation system controlling academia

Fri, 17 Apr 2026 16:28:00 -0400

David Oks wrote a provocatively titled post a few weeks ago: “How citations ruined science”.

He begins by observing the tidal wave of AI slop in scientific publishing, musing:

But there’s something about all of this that puzzles me.

I get why students, for example, would want to avoid doing homework. But I don’t really understand why scientists would want to avoid doing science. Or, rather, why they’re so eager to use AI to produce a huge number of shoddy papers. No one forced them to become scientists. I imagine that most people who work as scientists chose to do so out of something like love for the subject. So why are scientists using AI to produce and submit so much garbage?

As an aside, this reminded me a bit of writer Freddie deBoer’s piece “If You Don’t Like Writing, Do Something Else” from a few years ago:

For as long as I can remember, these complaints - writer’s block, imposter syndrome, procrastination - have been key elements of writerly self-deprecation. They’re ubiquitous. And, in a sense, the author is correct to suggest that these are tools for identifying those humans who define themselves as writers. Get writers together in a room and soon they’ll be competing to be the one who likes writing the least. But none of it ever meant anything to me.

Fake stars are rampant on GitHub

Thu, 16 Apr 2026 07:00:00 -0400

This article “4.5 Million (Suspected) Fake Stars in GitHub: A Growing Spiral of Popularity Contests, Scams, and Malware” (originally posted late 2024) by He et al. has been doing the rounds lately. It exposes the rampant fraud in the GitHub “star” system, which is apparently taken quite seriously in some corporate circles (I’ve never thought of stars as anything more than a personal bookmark). Their search for fraudulent activity involved querying GHArchive, an archive of all public GitHub events, for data between 2019 and 2024.

A few of their main findings are as follows:

There was a two-orders-of-magnitude increase in fake stars in 2024. At the peak in July 2024, their program detected (suspected) fake star campaigns for nearly 16% of repos with ≥50 stars in that month.
Most of these repos were short-lived malware repositories disguised as unsavoury software like crypto bots, game cheats, and pirating software. The purpose of other repos was unclear.
The majority (60%) of suspected users participating in fake star campaigns had little to no organic activity patterns.
Fake star campaigns had a small positive effect on attracting real stars for the first two months, but after two months they had a negative effect.

See further discussion of this article on Hacker News.

Why you can't just subsidize demand to end Canada's housing crisis

Wed, 15 Apr 2026 21:08:00 -0400

Mathieu Laberge, Chief Economist at the Canada Mortgage and Housing Corporation, has a good article out today on why you can’t subsidize your way out of Canada’s housing affordability crisis: simply helping potential homeowners with their mortgage payments ends up raising house prices for everyone. This in turn raises the price-to-income ratio for housing and deepens the housing affordability crisis overall. To avoid this outcome, you need policies to promote homebuilding and increase the housing supply beyond projected levels.

A good person to follow on housing policy in Canada, particularly on supply-side interventions, is economist Mike Moffatt of the Missing Middle Initiative.

McDonald’s used to put the vaccine schedule on tray liners

Tue, 14 Apr 2026 21:42:00 -0400

From ProPublica’s new article on RFK Jr.’s anti-vaccine agenda, a throwback to the era before vaccines became controversial in the United States (first on the left and now on the right):

Vaccines, for decades, weren’t politically divisive. They were so uncontroversial that McDonald’s restaurants in the 1990s put the childhood immunization schedule on their tray liners.

Vaccines used to be a unifying issue with broad, bipartisan support:

When the nation’s immunization program was in trouble in the 1980s, Republicans and Democrats stepped in to save it.

An example of vote (in)efficiency in Quebec

Mon, 13 Apr 2026 21:06:00 -0400

I came across a remarkable contrast in vote efficiency in one of political analyst Patrick Déry’s recent newsletters: specifically, the case of the Parti Québécois in 1973 versus today. In the 1973 Quebec general election, the sovereignist party won just 6/110 seats in the province’s National Assembly with 30% of the vote. Today, according to projections, the party has a good shot at winning a majority with just 31% in the polls. A huge gain in vote efficiency, albeit one won over the course of half a century.

Adjusting for recalled past vote in political polling

Sun, 12 Apr 2026 22:55:00 -0400

The founder of Abacus Data, a Canadian polling firm, dropped kind of an interesting URL yesterday: abacus-weighting.com. It’s an advertisement in the form of a case study on why Abacus weights their political polls on past vote. It fits perfectly with the theme of yesterday’s post on how pollsters get different results from the same data (the answer is they weight the raw data differently).

If you follow Nate Silver (or American political polling in general), you probably know that pollsters undercounted Trump support in all three elections where he was on the ballot. What I learned from this post is that support for the Conservative Party of Canada has been underestimated in Abacus’s demographically weighted polling data in every polling wave for every election since 2011:

In every single wave, across every single election cycle, Conservative voters are underrepresented in our demographically weighted sample relative to their actual share of the vote. Not in most waves. Not in some elections. In every case we can observe.

Weighting for recalled past vote improves the estimate in every case, sometimes dramatically so:

In every election, past vote weighting moved our Conservative estimates upward and our Liberal estimates downward — consistently in the direction of the actual result. The 2021 election shows the most dramatic correction: a 7-point improvement in our Conservative estimate.
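The basic mechanics of past-vote weighting can be sketched as a single-variable post-stratification: each respondent’s weight is the actual vote share of the party they recall voting for, divided by that party’s share among respondents. The numbers below are invented for illustration, and this is not Abacus’s actual procedure; real pollsters typically combine past-vote weights with demographic weights (e.g. via raking), and recalled vote has biases of its own.

```python
from collections import Counter

# Hypothetical survey: (recalled past vote, current vote intention) -> respondent count
responses = Counter({
    ("CPC", "CPC"): 220, ("CPC", "LPC"): 30,
    ("LPC", "LPC"): 340, ("LPC", "CPC"): 60,
    ("Other", "Other"): 250, ("Other", "CPC"): 50, ("Other", "LPC"): 50,
})

# Hypothetical actual result of the previous election (the weighting target)
past_result = {"CPC": 0.34, "LPC": 0.33, "Other": 0.33}


def shares(counts):
    """Normalize a mapping of counts or weights into proportions."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def current_vote(weights):
    """Weighted current vote intention, given one weight per past-vote group."""
    totals = Counter()
    for (past, current), n in responses.items():
        totals[current] += n * weights[past]
    return shares(totals)


# Share of the sample recalling each past vote
recalled = Counter()
for (past, _), n in responses.items():
    recalled[past] += n
recalled_share = shares(recalled)

# Post-stratification weight: target share / sample share for each group
weights = {g: past_result[g] / recalled_share[g] for g in past_result}

unweighted = current_vote({g: 1.0 for g in past_result})
weighted = current_vote(weights)
for party in ("CPC", "LPC", "Other"):
    print(f"{party:5s} unweighted {unweighted[party]:.1%}  weighted {weighted[party]:.1%}")
```

In this made-up sample, past Conservative voters are underrepresented (25% of respondents versus 34% of the hypothetical past result), so they get weights above 1, and weighting moves the Conservative estimate up and the Liberal estimate down, the same direction as the corrections Abacus describes.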

How do pollsters get different results from the same data?

Sat, 11 Apr 2026 22:36:00 -0400

Nate Silver linked to this throwback article from 2016 in The New York Times in his recent article on fake AI polls, which I also wrote about a few days ago. The article, entitled “We Gave Four Good Pollsters the Same Raw Data. They Had Four Different Results.” is a good reminder that modern polling diverges very far from the theoretical ideal of a simple random sample. Even after deciding on a methodology to sample participants and collecting the data, a lot of work goes into interpreting raw poll responses to give us top-line polling numbers. Every pollster needs to figure out how to weight the responses they get, since poll response rates are abysmal and variable across different demographic groups. As in the example given in this piece, these choices can result in large differences in those top-line numbers: from +4 Clinton to +1 Trump, all from the same raw data!

For an interesting follow-up: “Polling is becoming more of an art than a science”, also on Nate Silver’s Substack.

Scientists invent a fake disease, AI picks it up, other scientists cite it

Fri, 10 Apr 2026 18:27:00 -0400

A somewhat disturbing bit of reporting from Nature tells the story of bixonimania, a fake eye disease invented by Swedish medical researcher Almira Osmanovic Thunström and her team. She seeded the idea for the fake disease in a series of ridiculous, joke-filled blog posts and preprints in mid-2024.

Because AI can be overly credulous with its sourcing (how often do Google’s AI answers confidently cite random Reddit posts for the bulk of an answer?), the disease got picked up as an “emerging term” by the leading chatbots. The preprints even got cited a handful of times in real publications, which is further evidence that scientists don’t read the papers they cite (I guess the modern equivalent of copying citations from other papers is having AI dredge the literature for you).

I can see AI agents being exploited by those pushing dubious medical diagnoses to flood the Internet and preprint servers with articles aimed at convincing LLMs of the validity of their positions. That is, if the agents aren’t too busy spinning up websites to defame those who incur their wrath.

A data point against the idea that AI will freeze/homogenize culture

Thu, 09 Apr 2026 07:00:00 -0400

Here’s an interesting figure and accompanying passage from this 2023 preprint entitled “Machine Culture”:

The innovations generated by AlphaGo and AlphaGo Zero soon entered human culture, as shown by research comparing human gameplay before and after the algorithms’ introduction. The decision quality, as measured by an open-source variant of AlphaGo Zero, showed very little improvement in human gameplay from 1950 to 2016, followed by a sudden improvement after the introduction of AlphaGo in March 2016. However, this improvement wasn’t solely due to humans adopting strategies developed by AlphaGo. It also reflected an unexpected shift, wherein humans started developing moves that were qualitatively distinct both from previous human moves and from the novel moves introduced by AlphaGo. In summary, AlphaGo served as an early, quantifiable exemplar of machine culture, generating novel cultural variations through genuine, nonhuman innovation. This was followed by a major transition into an even broader range of traits as the result of humans building on the previous discoveries made by machines. As the methods underpinning AlphaGo have been generalized to other games and extended to scientific problems, we anticipate a continued infusion of machine-generated discoveries across diverse domains of human culture.

AI makes it easier to generate fake papers, too

Wed, 08 Apr 2026 20:09:00 -0400

Here’s a fun project from Tyler Vigen, creator of the famous Spurious Correlations page (which has been cited as a cautionary tale in many a science class). Using his database of real but spurious correlations (created by calculating the Pearson correlation coefficient r between a very large number of variables and picking out the hits), he used AI to create amusing fake manuscripts expounding on these statistical flukes as if they were real research questions.

These papers were generated in January 2024, and as previously discussed on this blog, the pipeline for end-to-end paper generation has come a long way in two years. I have no doubt Tyler could make these papers sound much more convincing using today’s models, though of course his goal here is to make you laugh (and think), not to trick you. But I fully expect many scholars to adopt this data-dredging strategy to generate “real” papers, adding to the deluge already flooding the academic publishing system.

What is a public opinion poll without the public?

Tue, 07 Apr 2026 18:27:00 -0400

A few days ago, two professors (Leif Weatherby and Benjamin Recht) published an opinion piece in the New York Times calling attention to Axios publishing a story on maternal health using invented polling results:

A recent Axios story on maternal health policy referred to “findings” that a majority of people trusted their doctors and nurses. On the surface, there’s nothing unusual about that. What wasn’t originally mentioned, however, was that these findings were made up.

Clicking through the links revealed (as did a subsequent editor’s note and clarification by Axios) that the public opinion poll was a computer simulation run by the artificial intelligence start-up Aaru. No people were involved in the creation of these opinions.

The piece goes on to argue that this so-called “silicon sampling” is seductive because good public opinion polling is expensive, hard to do, and still prone to bias. But this shortcut magnifies the problem of bias rather than solving it.

I’ve read a little bit about this strategy of using LLM-generated survey participants in the context of social science research in a series of posts (mostly from Prof. Jessica Hullman) over on Andrew Gelman’s blog:

Validating language models as study participants: How it’s being done, why it fails, and what works instead (2025-12-19)
Survey Statistics: Thomas Lumley writes about Interviewing your Laptop (2025-08-26)
When does it make sense to talk about LLMs having beliefs? (2025-08-15)
Better and worse ways to mix human and LLM responses in behavioral research (but you still have to figure what you’re measuring) (2025-06-12)
LLMs as behavioral study participants (2025-05-29)

Silicon sampling seems moderately interesting from a research perspective, but I can’t help but agree with the New York Times opinion piece authors that this will be ruinous for the already waning trust in public opinion polling. If you didn’t bother to ask the public, then why should the public care what you “find”? I think there is probably a lot of utility in using LLM samples to aid in designing and validating surveys, though.

Social media is a freak show

Mon, 06 Apr 2026 13:31:00 -0400

I quite enjoyed Nate Silver’s recent Substack post “Social media has become a freak show” (curiously, the title element of the page is “Social media is turning into a freak show”—I think the transformation has already occurred).

Nate Silver is still a Twitter power user, and yet even he acknowledges the increasing uselessness of Twitter for driving traffic to his newsletter or even just providing a forum for thoughtful engagement. I myself abandoned the platform a few years ago, having seen the direction it was heading under Elon Musk. My impression is that the utility of Twitter in most domains is asymptotically approaching zero, with a handful of exceptions (I will occasionally lurk for AI news, as the discussion is still robust, if polluted with a ton of low-quality bot or bot-like replies).

The rest of the social media ecosystem isn’t much better. Bluesky has declining engagement, probably because it has replicated Twitter’s old schoolyard dynamics on steroids. Facebook hasn’t been relevant for years, and I have no idea what it’s even for anymore if not connecting with your friends (I haven’t had an account in many years). Instagram might still be fun, though I have no idea because I’ve never used it. But it’s certainly not a place where “the discourse” happens.

How effective are Amber alerts?

Sun, 05 Apr 2026 08:15:00 -0400

A few weeks ago, I experienced a situation familiar to many Canadians, described in this article from Jonathan Jarry of McGill University’s Office for Science and Society:

On Sunday, March 22nd of this year, a large swath of the population in Quebec was woken up at 4:25 as cell phones lit up and screamed. An Amber alert had been broadcast. Less than four hours later, the two missing children were thankfully found, unharmed, and the alert was cancelled.

Thankfully, my iPhone respects silent mode and only vibrated forcefully, but apparently not all phone brands respect this setting. Unlike in the United States, Amber alerts to cell phones in Canada cannot be disabled.

The statistics regarding child abductions and Amber alerts discussed in this article are equal parts comforting and disconcerting. For example, most children who are the subject of an Amber alert are recovered unharmed:

a study published a decade ago and looking at 448 Amber alerts in the U.S. revealed that over 95% of the children had been recovered alive and nearly 90% recovered alive and without physical harm, sexual abuse, of withholding of needed medical care during the abduction. Even when Amber alerts don’t trigger a helpful tip, the child is usually found.

Other research from the United States indicates that Amber alerts play a part in the recovery about 25% of the time. However, they may be issued too late to prevent the worst outcomes:

The definition of "agent"

Sat, 04 Apr 2026 23:59:00 -0400

An interesting exchange between Guido van Rossum and Andrej Karpathy a few days ago on Twitter:

Guido van Rossum: I think I finally understand what an agent is. It’s a prompt (or several), skills, and tools. Did I get this right?

Andrej Karpathy: LLM = CPU (data: tokens not bytes, dynamics: statistical and vague not deterministic and precise) Agent = operating system kernel

The triumph of the data raccoons

Fri, 03 Apr 2026 23:40:00 -0400

My PhD co-supervisor at the University of Toronto, Dr. David Fisman, liked to use the term “data raccoon” to describe the work of using messy, incomplete, hard-to-work-with data to do serious research. Or, as he described it in testimony to the Canadian House of Commons in May 2020 (emphasis mine):

I’ll tell you, my group at University of Toronto call ourselves “data raccoons”, because we’ve sort of managed to thrive for about 15 years on data that most people regard as garbage, so it’s sort of a bit of the normal state of affairs for us with public health data analysis.

It’s an unmistakably Toronto metaphor—the city isn’t called the raccoon capital of the world for nothing!

But now the data raccoons have gone and taken over the world. The basis of the AI revolution is vast quantities of text dredged from the Internet, almost none of it written for the purpose of training the deus ex machina.

Arguably the most important dataset for training LLMs has been Common Crawl, a mostly uncurated archive of the web that has been running since 2007. According to a Mozilla report from 2024, Common Crawl was used in two thirds of LLMs developed in the formative period between 2019 and 2023, and the archive also comprised 80% of the tokens in OpenAI’s GPT-3. Unsurprisingly, the Common Crawl Foundation has received financial support from AI companies in recent years, while also facing accusations that it helped those same companies train their models on paywalled articles.

Andrew Gelman's blog schedule

Thu, 02 Apr 2026 16:24:00 -0400

Andrew Gelman, professor of statistics at Columbia University, runs one of my favourite blogs on the Internet. He has been writing there for over 21 years, since October 2004. Many of his collaborators also contribute to the blog, but he is the primary author. In a 2024 post celebrating 20 years of blogging, Gelman mentions having over 12,000 posts. This is a cadence of over 1.6 posts/day sustained for two decades!

One of the more unusual things about Gelman’s blog is that most posts are not particularly topical. Sure, many posts are time-sensitive, posting about upcoming events or commenting on recent publications (like doing damage control on deeply flawed papers likely to receive attention). But there is generally one non-topical post each day. A line in a recent post caught my eye:

As regular readers know, our posts are usually on a 6-month lag, but this one is so important I had to share it with you right away.

As a regular reader myself, I was aware of the delayed posting schedule, but out of curiosity, I wanted to see how far back this habit went. Here’s the rough timeline I came up with:

In 2011, Gelman wrote that his “non-topical blog entries are on approximately one-month delay”.
In 2012, he referred to “stacking up posts here with a roughly one-month delay”.
In 2014, he said that “most of the posts here are on a 1 or 2 month delay.”
In 2016, he casually mentioned “our 2-month delay”.
Later that year (August 2016), in a post literally titled “My next 170 blog posts”, he said he had filled “the blog through mid-January” and had “170 blog posts in the queue.”
By 2018, he mentioned the blog was “mostly on a six-month delay”.
In 2019, he referred to “our 6-month blog delay.”
In 2022, he wrote: “Usually I schedule these with a 6-month lag, but this time I’m posting right away”.
In February 2026, he said the “current end of the blog queue is in early July”.
Then, in April 2026, came the latest “usually on a 6-month lag” remark.

It seems the blog had about one month of content in the publishing pipeline by 2011, ramped up to one to two months by 2014 and two months by early 2016, and finally jumped to six months by August 2016, where it has been ever since. Quite the arsenal of scheduled content!
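One way to sanity-check the August 2016 numbers is Little’s law from queueing theory: in a steady state, the average wait equals the queue length divided by the rate at which items leave the queue. The sketch below assumes the queue drains at roughly one non-topical post per day (my reading of the schedule described above, not a figure Gelman has stated); the function name is purely illustrative.

```python
def queue_delay_days(posts_in_queue: int, posts_per_day: float) -> float:
    """Little's law rearranged: wait = queue length / throughput."""
    return posts_in_queue / posts_per_day


# Assumption: about one queued (non-topical) post is published per day.
delay_days = queue_delay_days(170, 1.0)
delay_months = delay_days / 30.44  # average month length in days

print(f"{delay_days:.0f} days, about {delay_months:.1f} months")
```

Under that assumption, 170 queued posts correspond to a lag of a bit under six months, in the same ballpark as the six-month delay he has mentioned ever since.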

Testing ZeroClaw, Part 2.5: ZeroClaw is alive!

Wed, 01 Apr 2026 23:59:00 -0400

Yesterday, I wrote about how the ZeroClaw GitHub repository had been down for two days with little explanation. Earlier today, the project provided a little more information on Twitter:

They flagged our org which is why we’re down. Code is safe and we’re still working, just waiting for @github

Since March 30 (the day after their repo started 404ing), the project has been promising a blog post to explain the situation. That post is now available:

Over the past few days, a maintainer used aggressive AI automation to review and merge PRs:

Merges went through that shouldn’t have.

In the process of trying to undo the damage, the maintainer’s GitHub account was flagged, which triggered enforcement actions on the ZeroClaw org itself.

That maintainer has been removed from the project.

This sounds strikingly similar to the incident that occurred about a month ago, which I also mentioned in yesterday’s post:

Earlier today, during routine maintenance, the visibility of the `zeroclaw-labs/zeroclaw` repository was accidentally changed from public to private and was later restored to public.

After reviewing the GitHub API audit logs and collecting detailed feedback from our engineers, we confirmed that the incident was caused by improper use of an AI agent tool during maintenance.

Obviously, the use of agentic workflows in open source projects is an emerging field where best practices have not yet been established. The case of ZeroClaw should be a warning to other projects to keep human review in the loop, or at least to limit the autonomy of agents when a project has numerous contributors. As they say in their blog post:

Testing ZeroClaw, Part 2: ZeroClaw is dead?

Tue, 31 Mar 2026 20:57:00 -0400

Earlier this month, I wrote about setting up one of the many lightweight OpenClaw alternatives, namely ZeroClaw. I had some issues with initial setup, but I got to the point where I could talk with my bot over Telegram.

Some of my initial enthusiasm for ZeroClaw was dampened by the divergence between the docs and the features available in the release build. The release build was quite out of date due to the breakneck pace of development. In the week or two following my initial setup, the release build pipeline was broken, so even when they released a new tag, there were no new precompiled binaries available. Being forced to compile the Rust binary yourself kind of goes against the project’s philosophy of ultra-low resource consumption.

They eventually fixed the release pipeline and I started casually working on a system where I could send notes and ideas for blog posts to my bot through Telegram and have it turn them into structured Markdown files.

But two days ago (March 29), I noticed that the ZeroClaw GitHub repo was 404ing. On the same day, the project posted the following on Twitter:

Our GitHub repo is currently returning a 404 for some users. We’re aware and actively investigating. The repo is public and all code is safe.

One important fact about for-profit plasma donation

Mon, 30 Mar 2026 21:51:00 -0400

For-profit plasma donation is in the news today in Canada. Two people recently died after giving plasma at Grifols for-profit plasma clinics in Winnipeg, Manitoba, although Health Canada has yet to find a link between the plasma collections and the deaths. Today, it was reported that a Grifols clinic in Calgary, Alberta was found non-compliant during an inspection in December 2025:

The inspection found the Calgary centre didn’t accurately assess whether donors were suitable, didn’t collect blood according to its Health Canada authorization, didn’t thoroughly investigate errors and accidents, and didn’t carry out sufficient corrective and preventative actions.

This is obviously a problem for for-profit plasma collection in Canada, where the practice is already controversial. Paid plasma collection is illegal in Canada’s three largest provinces (Ontario, British Columbia, and Quebec), though Ontario allows a few for-profit clinics to operate through an agreement with Canadian Blood Services, Canada’s independent blood authority. British Columbia and Quebec together make up over 35% of Canada’s population; including Ontario, it’s nearly 80%. Besides Ontario, for-profit clinics exist in some smaller provinces.

Vocal advocacy exists against paid plasma collection, leading to municipal resolutions against the practice in Ontario, even as clinics open. This advocacy is often premised on the fear that paid plasma will undermine voluntary donations. To my mind, the central fact in the for-profit plasma collection debate is that only a handful of countries are self-sufficient in plasma collection, and all of them allow for paid plasma collection. They are: the United States, Germany, Czechia, Austria, and Hungary (Egypt may have also recently joined the list). While other countries, like Canada, may have achieved self-sufficiency for plasma for direct infusion, no other country can meet its own needs for plasma-derived medical products. The world relies on a small number of self-sufficient countries, primarily the United States, to meet the demand for plasma products.

How to avoid cognitive surrender to AI

Sun, 29 Mar 2026 22:59:00 -0400

I am sharing a thoughtful article today from Alex Panetta’s A.I. For You on avoiding over-reliance on AI: “cognitive debt”, “epistemic debt”, or “cognitive surrender”.

A particularly interesting nugget regarding the “Your Brain on ChatGPT” article from the MIT Media Lab (yes, that MIT Media Lab):

The paper is even written to get LLMs to read it carefully. The paper carries instructions telling LLMs which section to read first, which appears to be a clever way to force relevant context atop the context window, as LLMs tend to best remember the beginning and end of conversations — not the middle.

Opt out of very new Python package versions with uv

Sat, 28 Mar 2026 08:43:00 -0400

In light of several recent Python package compromises (litellm, telnyx), here is a useful tip from Hacker News commenter mil22:

For those using uv, you can at least partially protect yourself against such attacks by adding this to your pyproject.toml:

[tool.uv]
exclude-newer = "7 days"

or this to your ~/.config/uv/uv.toml:

exclude-newer = "7 days"

This will prevent uv picking up any package version released within the last 7 days, hopefully allowing enough time for the community to detect any malware and yank the package version before you install it.
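Conceptually, a setting like this acts as a cooldown filter applied to a package’s release list before version resolution. The following is a minimal sketch of that idea, not uv’s actual implementation; the release data and the `eligible_versions` helper are hypothetical, and I assume a release uploaded exactly at the cutoff still counts as eligible.

```python
from datetime import datetime, timedelta, timezone


def eligible_versions(releases, cooldown_days=7, now=None):
    """Return versions uploaded at least `cooldown_days` before `now`.

    `releases` maps a version string to its (timezone-aware, UTC) upload time.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=cooldown_days)
    return [version for version, uploaded in releases.items() if uploaded <= cutoff]


# Hypothetical upload history for some package.
releases = {
    "1.0.0": datetime(2026, 3, 1, tzinfo=timezone.utc),
    "1.0.1": datetime(2026, 3, 20, tzinfo=timezone.utc),
    "1.0.2": datetime(2026, 3, 27, tzinfo=timezone.utc),  # too new, skipped
}
today = datetime(2026, 3, 28, tzinfo=timezone.utc)
print(eligible_versions(releases, now=today))  # ['1.0.0', '1.0.1']
```

In this toy example, a compromised 1.0.2 would only be picked up once it had been public for a week, by which point it has hopefully been yanked.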

Commenter notatallshaw follows up with how to achieve similar behaviour in pip:

Pip maintainer here, to do this in pip (26.0+) now you have to manually calculate the date, e.g. --uploaded-prior-to="$(date -u -d '3 days ago' '+%Y-%m-%dT%H:%M:%SZ')"

In pip 26.1 (release scheduled for April 2026), it will support the day ISO-8601 duration format, which uv also supports, so you will be able to do --uploaded-prior-to=P3D, or via env vars or config files, as all pip options can be set in either.
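The `date -d` syntax above is GNU-specific (BSD `date`, as shipped on macOS, uses different flags). Here is a minimal Python sketch that produces the same UTC timestamp format for the pip option quoted above; the helper name is my own, not part of pip.

```python
from datetime import datetime, timedelta, timezone


def uploaded_prior_to(days, now=None):
    """UTC timestamp `days` ago, formatted like the `date` command above."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")


print(f'--uploaded-prior-to="{uploaded_prior_to(3)}"')
```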

Colorado advances ban on algorithmic price and wage discrimination

Fri, 27 Mar 2026 17:48:00 -0400

The Colorado House voted today to ban the use of personal data to algorithmically set the price of a product or determine a wage. The legislation will now advance to the Colorado Senate for consideration. The summary of the bill, HB26-1210, reads:

Surveillance data is defined in the bill as data that is obtained through observation, inference, or surveillance of consumers or workers and that is related to personal characteristics, behaviors, or biometrics of an individual or group. The bill prohibits discrimination against a consumer or worker through the use of automated decision systems used to engage in:

Individualized price setting based on surveillance data regarding a consumer; or

Individualized wage setting based on surveillance data regarding a worker.

Obviously, the bill enumerates exceptions to the above rules, as it is not intended to ban, for example, charging a customer more to deliver an item a longer distance, or to prohibit schemes like discounts for students or seniors. One of the challenges of writing laws like this is ensuring they are narrow enough to target dystopian hyper-individualized pricing based on tracking of Internet and phone activity rather than normal business practices like pricing insurance policies according to demographic risk factors.

Colorado is one of at least a dozen American states considering similar bans. I don’t believe any of these proposed broad-based bans have been signed into law yet. I wrote about algorithmic price discrimination (surveillance pricing) last week in the context of proposed legislation in the Canadian province of Manitoba.

How SARS-CoV-2 variants get named on GitHub

Thu, 26 Mar 2026 07:00:00 -0400

Bioinformatics has long been an unusually collaborative and transparent field, with genomes, protein structures, and other complex biological data habitually deposited into open databases during the course of research. The situation was no different at the outset of the COVID-19 pandemic, when a small group of scientists developed the Pango nomenclature for classifying variants of the SARS-CoV-2 virus. Outside of a handful of Greek-letter “variants of concern” names assigned by the World Health Organization, the Pango nomenclature is the standard for tracking the evolution of the SARS-CoV-2 virus. You may recall names such as B.1.1.7 (Alpha or the UK variant), B.1.351 (Beta or the South African variant), and P.1 (Gamma or the Brazilian variant). You can see a complete list of active SARS-CoV-2 lineages using the Pango nomenclature here.

By August 2020, the work of defining new lineages of SARS-CoV-2 had moved to GitHub, where the scientific process could happen in a transparent and collaborative way. New lineages are defined through proposals submitted as GitHub issues. In May 2023, a second GitHub repository was opened to move discussions of smaller or less clear lineages out of the main repository. These discussions can be promoted to the main repository, as this issue tracking LP.8.1 sub-lineages was in May 2025.

The work of defining new lineages of SARS-CoV-2 continues to this day on the GitHub repository, as the virus continues to mutate and evolve. And bioinformatics continues to be a shining beacon for open science for the rest of us to learn from.

Prediction markets are coming to Canada

Wed, 25 Mar 2026 21:00:00 -0400

(Archive link to this story)

Wealthsimple is a fintech company that has been at the forefront of much of the innovation in Canada’s personal finance sector since its founding in 2014. Notably, Wealthsimple was the first broker in Canada to offer zero-commission trades, back in 2019. In 2020, they started offering the ability to trade crypto. In 2025, they launched zero-commission options trading. This year, the company received regulatory approval to bring prediction markets to Canada.

Unlike in other parts of the world, prediction markets have not flourished in Canada and have been considered basically illegal since a 2017 ruling from Canada’s federal securities regulator. Wealthsimple has been able to get around this ruling by only offering contracts on a narrow set of questions:

Despite a 2017 ruling that largely banned these kinds of short-term, yes-or-no contracts, certain regulated firms that are CIRO members are able to offer certain types of “event contracts,” […] The approval for Ontario-based Wealthsimple permits it only to offer contracts tied to economic indicators, financial markets and climate trends, the company confirmed – not sports or elections, which are among the most popular uses of prediction markets in the United States.

Wealthsimple has driven innovation in the Canadian personal finance sector; however, their new product offerings over the last few years seem to be speedrunning the Robinhood trajectory toward high-risk, high-volatility trading and away from their traditional niche of broad, diversified funds/ETFs for ordinary people to set-and-forget. This pivot can be understood as part of a broader trend toward the casinofication of everything, which took off with crypto and the legalization of online sports betting.

Will AI help Canadian police counter a tsunami of fraud?

Tue, 24 Mar 2026 21:48:00 -0400

Zak Vescera, writing for the Investigative Journalism Foundation, observes that the number of fraud cases reported to Canadian police more than doubled between 2013 and 2024.

At the same time, the number of cases cleared by Canadian police has fallen. In 2013, the ratio between reported cases and cleared cases was about 3:1; by 2024, this ratio was over 9.5:1.
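Put another way, a reported-to-cleared ratio of r:1 means roughly 1/r of reported cases are cleared. A quick check of the two ratios above, treating “over 9.5:1” as exactly 9.5, so the 2024 figure is an upper bound:

```python
def clearance_share(reported_per_cleared):
    """Fraction of reported cases cleared, given a reported:cleared ratio of r:1."""
    return 1 / reported_per_cleared


share_2013 = clearance_share(3.0)
share_2024 = clearance_share(9.5)  # upper bound, since the ratio was "over" 9.5:1

print(f"2013: {share_2013:.0%} cleared; 2024: at most {share_2024:.1%}")
```

In other words, the share of reported fraud cases cleared by police fell from about one in three to about one in ten.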

The vast majority of fraud cases go unsolved. This is unsurprising given that many are perpetrated over the Internet by individuals overseas and involve methods of sending money that are difficult to recover, such as crypto, gift cards, and physical transfers of cash.

In response, the National Cybercrime Coordination Centre (NC3) of the RCMP (Canada’s national police service) has built a case management system and data portal it hopes will eventually be adopted by all Canadian police forces. According to the article, this system is aimed at improving coordination, data sharing, and analysis. The platform will also host a set of AI tools, though the RCMP is vague on the details and on which tools are currently implemented. The article gives a few examples: OCR allowing victims to scan gift cards used in fraud rather than typing numbers manually, a tool to classify reports to help police target their investigative resources, and a report generator to simplify data sharing when investigations go international.

Vandalism of OpenStreetMap

Mon, 23 Mar 2026 17:26:00 -0400

OpenStreetMap (OSM) is an open, community-driven map database powering countless apps and services and used by organizations including Amazon, Apple, Microsoft, Uber, Mapbox, and Wikimedia. In short, it is foundational infrastructure for the web. For regions with active communities (particularly in Europe), OSM is often noted for the superiority of its data on features such as cycling routes, hiking trails, and footpaths.

The Wikipedia article for OpenStreetMap documents several instances of data vandalism, which OSM is vulnerable to as a crowdsourced project. Three incidents stood out:

In 2012, Google fired two “rogue contractors” for vandalizing the OSM database, intentionally adding false data such as reversing the direction of one-way streets.
In 2018, a vandal made several viciously antisemitic edits to place names around New York City. While quickly reverted at the source, these changes nonetheless propagated into downstream applications pulling data from Mapbox, such as Zillow, Snapchat, Citi Bike, and Wikipedia.
Users of the mobile game Pokémon GO regularly vandalize the OSM database underlying the game to gain a gameplay advantage, although the authors of the research article on this subject note this vandalism tends to be transitory rather than sustained.

Side note: I was amused to note how strong Google’s regional results bias is for “OSM”—the entire first page is taken up by results related to the Orchestre symphonique de Montréal.

Properly the work of federal public health agencies

Sun, 22 Mar 2026 23:38:00 -0400

One of the reasons I started this blog was to have a place to put down posts and articles that have lodged themselves in my brain. The wind-down announcement of the COVID Tracking Project, a volunteer-led COVID-19 data tracking collaboration, is one such article.

But the work itself—compiling, cleaning, standardizing, and making sense of COVID-19 data from 56 individual states and territories—is properly the work of federal public health agencies. Not only because these efforts are a governmental responsibility—which they are—but because federal teams have access to far more comprehensive data than we do, and can mandate compliance with at least some standards and requirements.

After one year of work, the COVID Tracking Project decided to quit collecting data on COVID-19 in the United States, because they recognized that the work of compiling a comparable, national-level dataset was the responsibility of federal government agencies.

As someone who co-led the COVID-19 Canada Open Data Working Group, which curated COVID-19 data for Canada until the end of 2023, I think about this article a lot. It’s a good read, and it speaks to how essential open data was to filling in the gaps in the national and international understanding of the COVID-19 pandemic.

For map nerds only: An atlas of world history

Sat, 21 Mar 2026 22:39:00 -0400

I am sharing today TimeMap.org: an atlas of regions, rulers, people, and battles throughout history. Thoroughly enjoyable to swipe through, especially for connoisseurs of the map game genre.

^{Hat tip to agilek on Hacker News.}

Fight club at the bird feeder

Fri, 20 Mar 2026 07:00:00 -0400

Alternate title: Blue Jay brutally feeder mogs Tufted Titmouse

From the Cornell Lab of Ornithology, a pretty neat article about dominance hierarchies at the bird feeder using over 7,600 observations collected by citizen scientists contributing to Project FeederWatch. Essentially, bird watchers reported instances when one bird species successfully displaced another at the bird feeder, and the researchers used this network of comparisons to build a dominance hierarchy. By using information contained within the network, they could even compare birds that are rarely observed together. Not all dominance patterns are linear, however, as the article reports:

A separate analysis uncovered some dominance triangles in which three birds had one-to-one relationships independent of each other, like a game of birdy rock-paper-scissors. For example, the House Finch dominates the Purple Finch, and the Purple Finch dominates the Dark-eyed Junco, but the junco dominates House Finch.

The full paper is here: Fighting over food unites the birds of North America in a continental dominance hierarchy.
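Spotting a dominance triangle amounts to finding a three-cycle in a directed graph where an edge A → B means “A usually displaces B”. Here is a minimal sketch, assuming each pair of species has already been reduced to a single winner (e.g. whichever won most encounters); this is not necessarily how the paper’s authors did it, and the edge list only uses relationships mentioned in this post.

```python
from itertools import combinations


def dominance_triangles(edges):
    """Return species triples (a, b, c) where a beats b, b beats c and c beats a."""
    beats = set(edges)
    species = sorted({s for edge in edges for s in edge})
    triangles = []
    for a, b, c in combinations(species, 3):
        # A three-cycle can run in either direction around the triple.
        if (a, b) in beats and (b, c) in beats and (c, a) in beats:
            triangles.append((a, b, c))
        elif (a, c) in beats and (c, b) in beats and (b, a) in beats:
            triangles.append((a, c, b))
    return triangles


edges = [
    ("House Finch", "Purple Finch"),
    ("Purple Finch", "Dark-eyed Junco"),
    ("Dark-eyed Junco", "House Finch"),
    ("Blue Jay", "Tufted Titmouse"),
]
print(dominance_triangles(edges))
# [('Dark-eyed Junco', 'House Finch', 'Purple Finch')]
```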

This work is reminiscent of network meta-analysis, in which three or more interventions (e.g., drugs) are compared using both direct and indirect evidence. For example, if there are studies comparing drug A versus drug B and drug B versus drug C, we can infer the comparison between drug A and drug C, even if no study has ever directly compared them.
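The simplest version of this indirect comparison is the Bucher method: on an additive scale such as the log odds ratio, the indirect A-versus-C effect is the sum of the A-versus-B and B-versus-C effects, and their variances add. Full network meta-analysis generalizes this to many treatments at once. The numbers below are made up for illustration.

```python
import math


def indirect_effect(d_ab, se_ab, d_bc, se_bc):
    """Bucher indirect comparison via common comparator B.

    d_xy is the effect of y relative to x on an additive scale (e.g. log odds ratio);
    the two estimates are assumed to come from independent studies.
    """
    d_ac = d_ab + d_bc
    se_ac = math.sqrt(se_ab**2 + se_bc**2)
    return d_ac, se_ac


# Hypothetical study results: B vs A and C vs B.
d_ac, se_ac = indirect_effect(-0.30, 0.10, -0.20, 0.15)
low, high = d_ac - 1.96 * se_ac, d_ac + 1.96 * se_ac
print(f"C vs A: {d_ac:.2f} (95% CI {low:.2f} to {high:.2f})")
```

The indirect estimate is less precise than either direct one, which is why head-to-head studies of A versus C remain valuable.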

Make buses faster and more reliable by having fewer stops

Thu, 19 Mar 2026 07:30:00 -0400

This fascinating article by Nithin Vejendla in Works in Progress makes the case that bus networks would benefit from bus stop balancing: having fewer stops spaced further apart. This is especially true in the United States where stops tend to be only 700–800 feet (roughly 210–240 metres) apart. While having many bus stops theoretically improves access to the transit network, it also means that buses are slower (more time is spent accelerating, decelerating, and loading/unloading passengers) and less frequent, which reduces where you can actually go in a fixed amount of time, as well as increasing the variability in the time it takes to get there.

The biggest problem holding back public transit in North America is that it is unreliable, and bus stop balancing is a rare policy solution that offers improved service without having to spend more. With fewer stops, the same number of buses can complete the same route faster and with greater frequency. That makes it less likely that a single missed or delayed bus will ruin your plans or force you to build in extra time.
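The fixed-fleet argument can be made concrete with some back-of-the-envelope arithmetic. Every number below is a hypothetical assumption (a 10 km route, 30 minutes of moving time each way, a 20-second penalty for every stop served, six buses, no layover), not a figure from the article.

```python
def headway_minutes(route_m, spacing_m, moving_min, per_stop_s, buses):
    """Minutes between buses when a fixed fleet cycles out and back on one route."""
    stops = round(route_m / spacing_m)  # every stop assumed to be served
    one_way_min = moving_min + stops * per_stop_s / 60
    cycle_min = 2 * one_way_min  # out and back, ignoring layover time
    return cycle_min / buses


for spacing in (240, 400):
    h = headway_minutes(10_000, spacing, moving_min=30, per_stop_s=20, buses=6)
    print(f"stops every {spacing} m: a bus every {h:.1f} min")
```

Under these assumptions, widening stop spacing from about 240 m to 400 m shortens the wait between buses by roughly two minutes with no extra buses or drivers. The toy model ignores the longer walk to the nearest stop, which is the trade-off stop balancing has to manage.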

A research study from my city of Montreal even gets a shout out. As a big public transit user, I avoid buses where possible in favour of the metro and walking, because these modes of transportation tend to be much more reliable and less variable when it comes to the question of “how long will it take for me to get from point A to point B”. Stop balancing could go a long way toward addressing one of the main complaints about public transit: too many routes are not frequent or reliable enough to let riders stop worrying about the schedule.

Manitoba introduces bill to ban algorithmic price discrimination

Wed, 18 Mar 2026 07:30:00 -0400

The Canadian province of Manitoba has introduced a bill to ban algorithmic price discrimination (also known as surveillance pricing), i.e., the use of personal data to set prices for individual consumers:

New Democrats announced in December they would begin cracking down on what’s known as differential or predatory pricing. That is when retailers charge different amounts for the same products based on the timing of customer purchases, where they live or other personal data. […] The proposed legislation would render the use of “personalized algorithmic pricing,” both online or in store, an unfair business practice.

Okay, I guess there are a lot of different names for this particular practice. Whatever we call it, I believe bills cracking down on algorithmic price discrimination will be very popular, as it constitutes a very clear example of companies using our data against us to rip us off. The most famous recent exposé of this practice is Groundwork Collaborative’s report on how grocery delivery service Instacart charges users different prices depending on who they are.

Manitoba isn’t the only jurisdiction introducing bills targeting this practice, but I don’t believe anywhere in the US or Canada has actually managed to ban it yet. However, New York has made it mandatory for companies to disclose when they use personal data to set prices.

Prediction markets incentivize bad behaviour

Tue, 17 Mar 2026 18:19:00 -0400

The Times of Israel journalist Emanuel Fabian is claiming that Polymarket gamblers (sorry, “traders”) have threatened his life over a report he released about an Iranian missile attack on Israel on March 10. According to the rules, this bet resolves as true if Iran strikes Israel using a drone, a missile, or an air strike on this date. At issue here is this specific rule:

Missiles or drones that are intercepted and surface-to-air missile strikes will not be sufficient for a “Yes” resolution, regardless of whether they land on Israeli territory or cause damage.

On March 10, Fabian reported that a single missile had hit an open area outside the Israeli city of Beit Shemesh; he included a video of the strike in the report. This would resolve the bet as “Yes”. Evidently, holders of “No” shares would very much like him to change his report to say that the missile was intercepted, which would resolve the bet as “No” according to the rule referenced above. This bet has seen more than 23 million USD in trading volume.

If you look at the vitriol in the comments of the bet on Polymarket, I have no trouble believing people would send threats to a journalist demanding that he change his story, whether out of desperation to change their fortunes or just in an attempt to be edgy.

Some insight into writing a book using Quarto

Mon, 16 Mar 2026 20:48:00 -0400

Prof. Kieran Healy (Sociology, Yale University) shares some nice insight into the process of writing a book in Quarto using R in this post. The output screenshots he shares look beautiful, and the idea of deploying the same content as a clean PDF and a responsive website is awesome. A full draft of the book, Data Visualization: A Practical Introduction (Second Edition), is available as a website here.

I have grown increasingly tired of writing in any format other than a plain text file I can easily version control and move around, so the idea of writing a book in Quarto is appealing to me (as long as it has enough technical content to justify the format).

Using Claude Code for cross-package statistical audits

Sun, 15 Mar 2026 22:49:00 -0400

Economist Scott Cunningham shared an important example of why we should always report the statistical package and version used in our analyses: he used Claude Code to produce six versions of the exact same analysis using six different packages in R, Python, and Stata. In a difference-in-differences analysis of the effect of mental health hospital closures on homicide using the standard Callaway and Sant’Anna estimator (for DiD with multiple time periods), he got very different results for some model specifications.

Since the specifications and the data were identical between packages, he discovered the divergences occurred due to how the packages handled problems with propensity score weights. Packages were not necessarily transparent about issues with these weights. If you were not running multiple analyses and comparing results across packages, or else carefully checking propensity score diagnostics, you might never have realized how precarious your results were.
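A basic version of the propensity score diagnostics mentioned above is to look at the weights the comparison units receive. Here is a minimal sketch, assuming ATT-style odds weights p/(1 - p) for untreated units (the kind used by inverse-probability-weighted estimators of the average treatment effect on the treated) and an arbitrary 0.01/0.99 threshold for flagging extreme scores; real DiD packages differ in exactly these choices, which is the point of the audit. The propensity scores are made up.

```python
def att_control_weights(pscores):
    """Odds weights p / (1 - p) for untreated units in an IPW estimate of the ATT."""
    return [p / (1 - p) for p in pscores]


def effective_sample_size(weights):
    """Kish's effective sample size: (sum of w)^2 / sum of w^2."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


def extreme_scores(pscores, low=0.01, high=0.99):
    """Indices of units whose propensity scores fall outside [low, high]."""
    return [i for i, p in enumerate(pscores) if p < low or p > high]


# Hypothetical propensity scores for four untreated comparison units.
pscores = [0.20, 0.30, 0.50, 0.995]
weights = att_control_weights(pscores)
print(f"ESS: {effective_sample_size(weights):.2f} of {len(pscores)} units")
print("extreme scores at indices:", extreme_scores(pscores))
```

Here one unit with a score near 1 carries almost all of the weight, so the four comparison units behave like about one. Whether a package trims, truncates, or silently keeps such a unit can move the estimate substantially.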

Prof. Cunningham closes with the following advice:

The fifth point, and the broader point, is that this kind of cross-package, cross-language audit is exactly what Claude Code should be used for. Why? Because this is a task that is time-intensive, high-value, and brutally easy to get wrong. But just one mismatched diagnostic across languages invalidates the entire comparison, even something as simple as sample size values differing across specifications, would flag it. This is both easy and not easy — but it is not the work humans should be doing by hand given how easy it would be to even get that much wrong.

Getting citizenship just got a lot harder for those of Italian descent

Sat, 14 Mar 2026 22:18:00 -0400

Many people in the Americas would probably be surprised to learn that, in much of the rest of the world, being born in a country does not by itself make you a citizen. In most of the Americas, citizenship is automatically granted on the basis of jus soli (“right of soil”): birth on the territory. Elsewhere, citizenship is more often based on jus sanguinis (“right of blood”): descent. This is the case in most of the EU.

Citizenship in an EU country is considered unusually desirable because of the mobility rights and powerful passport it confers. However, the rules concerning exactly what kind of descent confers citizenship vary widely among member states. Italy used to be considered among the easiest, requiring only that an applicant prove they had an Italian ancestor alive after March 17, 1861, when the Kingdom of Italy was founded. That changed last year, when the country passed a new law significantly tightening the requirements for citizenship, which was recently upheld by the country’s Constitutional Court. The new law brings requirements more in line with the norm among EU member states:

Now, only individuals with at least one parent or grandparent born in Italy will automatically qualify for citizenship by descent. The amended law does not affect the 60,000 applications currently pending review. Additionally, dual nationals risk losing their Italian citizenship if they “don’t engage” by paying taxes, voting or renewing their passports.

geoBoundaries: An open database of political administrative boundaries

Fri, 13 Mar 2026 17:05:00 -0400

Today I discovered geoBoundaries, a CC BY 4.0-licensed database of political administrative boundaries covering the entire world. It is notable for its high level of detail, going from ADM0 (country), ADM1 (states/provinces), ADM2 (counties/departments or municipalities), to ADM3 (municipalities or sub-municipalities) for many countries. My go-to source for world map files is Natural Earth, which is limited to ADM0 and ADM1 but is in the public domain. Natural Earth also includes some physical geography like water and bathymetry, while geoBoundaries is focused solely on political administrative boundaries. Both datasets deal with disputed boundaries, which is an endless source of tension in the Natural Earth GitHub.

An R package for retrieving data from geoBoundaries, geobounds, was released in February. A similar package for Natural Earth, rnaturalearth, has long been maintained by rOpenSci.

Open banking comes to Canada

Thu, 12 Mar 2026 22:03:00 -0400

Canada’s banking sector is legendarily stable. However, this stability comes at the cost of innovation. Canada lags behind peers such as the EU, UK, US, and Australia in an area I care a lot about: open banking.

The premise of open banking is that consumers should be free to share their financial data with the third parties of their choosing, such as a budgeting app. I have been following open banking in Canada for years now, ever since I started closely tracking my own finances. For a long time, I have been looking for a better way to export transactions than logging into my bank’s website and manually downloading a CSV file covering a certain time range.

Over the years, people have tried to solve this problem by writing third-party packages to retrieve data from specific banks. However, these packages were fragile and prone to breaking, and they usually relied on you providing your full account credentials, granting them the ability to log in to your account as you. Shockingly, this is actually the default security model for Canadian fintech companies: even a humble budget app must be given your username, password, and (implicitly) the ability to take any action on your behalf. Needless to say, this is at best a grey zone for liability, since you are willingly handing over the keys to the kingdom to a third party.

The other half of the ATM–bank teller story

Wed, 11 Mar 2026 23:49:00 -0400

David Oks had a great post yesterday on the classic parable of how the adoption of ATMs did not lead to the predicted job losses among bank tellers. In fact, the opposite occurred: the number of bank tellers rose. I heard this story recounted several times in early discussions I had about the anticipated effect of AI on labour. I think I first heard it from Ryan Khurana. More recently it has been trotted out by US Vice President JD Vance.

The problem with this story is that the key statistic quoted alongside it, namely that there are more bank tellers than ever before, is no longer true. The famous graph supporting this assertion stops in 2010, and with good reason: the number of bank tellers has sharply fallen since then.

I think I had come across this fact before, this second half of the famous ATM–bank teller story, but it wasn’t until I read David Oks’s post that I understood the reason behind it. Quite simply, mobile banking ate physical banks. The ATM didn’t reduce the demand for bank tellers; it simply changed the kind of labour they did inside the bank. The iPhone made it so we didn’t need to go to the bank at all. It changed the paradigm. Explained this way, it seems obvious. Many new banks (including my own) do not have physical locations and never did.

What will the paper of the future look like?

Tue, 10 Mar 2026 23:48:00 -0400

I am sharing today a short blog post by the Institute for Replication: “What will the paper of the future look like?”

In short: research looking more like software development (as presaged by Prof. Richard McElreath, author of the excellent Statistical Rethinking), with the ability to reuse common material, formalize results, and remix analyses built into the pipeline.

Changes in acetaminophen use after the White House Tylenol briefing

Mon, 09 Mar 2026 18:17:00 -0400

Back in September 2025, US President Donald Trump and Health and Human Services Secretary Robert F. Kennedy, Jr. held a White House briefing linking Tylenol (acetaminophen, or paracetamol to Europeans) use in pregnancy to autism. A new study in The Lancet looks at what happened to acetaminophen prescriptions during emergency room encounters for pregnant females aged 15–44. They used data from a large database covering over 1,633 hospitals and 37,000 clinics.

Here is panel A from the figure in the study, with the vertical dashed line marking the date of the White House briefing (September 22, 2025) and the other dashed line showing the expected prescribing rates, for comparison with the observed ones.

Canada exports a lot of coal, but not for power generation

Sun, 08 Mar 2026 14:05:00 -0400

This provocatively titled piece in The Hub (“Why the world needs even more Canadian coal”) made me realize I know very little about one of Canada’s most important exports: coal.

Coal is often villainized because it is an incredibly dirty way of generating power. I vaguely recall an article from maybe 20 years ago claiming something along the lines of “if everyone in Canada replaced their incandescent bulbs with energy-efficient ones, the greenhouse gas savings would be cancelled out by a single coal plant that China is building every [some shockingly short amount of time]”. That said, China’s dependence on coal for power has been falling for the past two decades.

It turns out LLM-assisted search is fantastic for finding these half-remembered quotes. Here is the exact article and quote I was remembering, from a 2008 Maclean’s magazine article (I was pretty close):

Even if every household in the U.S. screwed in an energy-efficient light bulb today, the savings in greenhouse gas emissions would be wiped out by fewer than two medium-sized coal plants - the kind of plant that is being built in China at a rate of one a week.

But coal is also used to make most of the world’s steel (“metallurgical coal”), and this is the kind of coal that Canada (or specifically, British Columbia) overwhelmingly exports. The article goes on to claim that Canada’s production of metallurgical coal is among the cleanest (by greenhouse gas emissions) in the world.

Open By Default: A database of access to information requests to the Canadian government

Sat, 07 Mar 2026 14:32:00 -0500

In Canada, any person or corporation in the country can make a request for general records to any agency of the federal government through the Access to Information Act (the equivalent in the United States is the Freedom of Information Act). The government provides a searchable database of completed requests, but includes only a summary of the request and the number of pages of responsive material. The actual documents turned over are not included. However, completed request packages may be informally re-requested, and should you do so, someone from the relevant agency will (usually) send them to you eventually.

This re-request process has its limits. It can take weeks or months for the documents to be sent, and the database itself only goes back to January 2020 (they used to delete records older than two years, but stopped doing this some time after 2020). Occasionally, they will never send the documents at all, and all you can do is either re-request them or open a formal access to information request (which will cost you $5).

Making it easier to access completed access to information requests is why the Investigative Journalism Foundation built Open By Default, “the biggest database of internal government documents never before made publicly accessible”. It includes documents from completed access to information requests obtained using both automated (presumably the re-request form) and manual processes (donations from trusted partners, particularly of documents from before the online re-request form was available). The files are cleaned and OCRed into one beautiful, searchable database.

The surprising whimsy of the Time Zone Database

Fri, 06 Mar 2026 21:07:00 -0500

Time zones are hard. As a well-known Computerphile video so eloquently puts it:

What you learn after dealing with time zones, is that what you do is you put away your code, you don’t try and write anything to deal with this. You look at the people who have been there before you. You look at the first people, the people who have dealt with this before, the people who have built the spaghetti code, and you thank them very much for making it open source, and you give them credit, and you take what they have made and you put it in your program, and you never ever look at it again. Because that way lies madness.

The Canadian province of British Columbia recently decided to switch to permanent daylight time. I wanted to see if this update had made it into the IANA Time Zone Database yet. Luckily, we can now view updates to this database as commits on GitHub. And there it was in the news file!

I’ve perused the tz repository before, and I always learn something interesting. For example, during WWII Britain adopted double summer time, adding two hours to the clock in the summer and one hour in the winter. The bulk of the comments in the database are dedicated to documenting this extensive history of time zone changes across the world.
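You can poke at this history yourself: Python’s standard-library `zoneinfo` module (Python 3.9+) reads the same tz data. A minimal sketch, assuming your system (or the `tzdata` package) provides the database; the helper name `utc_offset_hours` is my own:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")


def utc_offset_hours(year: int, month: int, day: int) -> float:
    """London's offset from UTC, in hours, at local noon on the given date."""
    local_noon = datetime(year, month, day, 12, tzinfo=LONDON)
    return local_noon.utcoffset().total_seconds() / 3600


# Wartime double summer time compared with an ordinary modern year.
for date in [(1941, 1, 15), (1941, 7, 1), (2025, 1, 15), (2025, 7, 1)]:
    print(date, utc_offset_hours(*date))
```

On a standard tz install this should report offsets of 1, 2, 0 and 1 hours respectively: a wartime winter one hour ahead of GMT, a wartime summer two hours ahead.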

Editors hate this one weird trick

Thu, 05 Mar 2026 20:05:00 -0500

Given my recent posts on AI in academic publishing, I just wanted to share this joke from Prof. Arthur Spirling on Twitter:

Actually you cant run my paper through Claude to desk reject it because Claude is a regular coauthor of mine. Conflict of interest. Checkmate, editors

Homeownership rate doesn't mean what you think it does

Wed, 04 Mar 2026 20:15:00 -0500

This thread from demographer Lyman Stone on the definition of the US homeownership rate has stuck in my head for a couple of years now. Reading it produced a pretty profound “oh” for why this particular metric didn’t line up with my perception of the issue.

To put it simply, the definition of the homeownership rate is:

Take the number of households where the home is owned by the household head, divide by the total number of households.

The homeownership rate is based on households, not individuals. If an adult child lives with their parents (and their parents own their own home), they are counted as “homeowners” for the purpose of the homeownership rate. If more and more people in their 20s and their 30s move in with their parents (or never move out in the first place) rather than renting an apartment, this has the effect of increasing the homeownership rate, because you have reduced the denominator (number of households) without changing the numerator (number of owner-occupied households).
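A quick worked example with made-up numbers shows the denominator effect. This is a sketch of the definition quoted above, not of any statistical agency’s actual code:

```python
def homeownership_rate(owner_households: int, total_households: int) -> float:
    """Share of households whose home is owned by the household head."""
    return owner_households / total_households


# Hypothetical town: 100 households, 65 of them owner-occupied.
before = homeownership_rate(65, 100)

# Ten young adults give up their rented apartments and move back in with
# parents who own their home. Ten renter households disappear, while the
# number of owner-occupied households stays the same.
after = homeownership_rate(65, 100 - 10)

print(f"before: {before:.1%}, after: {after:.1%}")  # before: 65.0%, after: 72.2%
```

Nobody bought a house, yet the rate rose by about seven percentage points.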

Canada uses the same definition:

The homeownership rate refers to the proportion of all households that are owner occupied.

The productivity shock coming to academic publishing

Tue, 03 Mar 2026 19:33:00 -0500

Today, I wanted to share this piece from economist Scott Cunningham (Baylor University), who wrote about how AI is widening the gap between research and publishing. Or, in economics terms (emphasis mine):

But what happens when the same productivity shock hits a system where the bottleneck was never really production in the first place, but rather was a hierarchical journal structure that depended immensely on editor time, skill, discretion and voluntary workers with the same talents called referees for screening quality deemed sufficient for publication?

The post mentions the Autonomous Policy Evaluation project—the end-to-end AI paper pipeline I wrote about a few weeks ago—and discusses the likely consequences of this flood of AI-generated papers. Assuming the number of publication slots in reputable journals is relatively fixed, AI-generated papers should add a very large amount of mass to the left side of the paper quality distribution. Acceptance rates will plummet and journals may rely on other signals of quality (name recognition, pedigree, institution) to thin the herd before actually reviewing content. As always, the rich get richer!

But this is imperfect, not to mention unfair, and so desk rejection gets noisier: good papers get killed by tired editors and marginally lower quality papers slip through to referees. It’s a cascading failure: volume breaks editors, broken editing wastes referees, wasted referees slow science.

Testing ZeroClaw, Part 1: Setup

Mon, 02 Mar 2026 19:15:00 -0500

As mentioned last week, I’ve been meaning to test out a personal agent from the Claw-like ecosystem. I settled on testing out ZeroClaw, a popular and lightweight OpenClaw alternative that should run well on my Raspberry Pi 4 4GB.

I wanted to harden my setup as much as possible and opted to run everything in Docker. I started with the official Docker Compose file and added my OpenRouter key. I brought up the pre-built container image and tried sending the basic “Hello” message to the agent using the CLI. However, I got an error because the automatically generated config file defaulted to a version of Claude Sonnet 4 that wasn’t available on OpenRouter. I switched to claude-sonnet-4.6 and then gpt-oss-20b (for much cheaper testing).

The ZeroClaw web gateway was a bit of a mess. Of the features I tried, only memory management and the basic status dashboard worked. Trying to talk to the agent through the web interface would give me a black screen (here’s someone complaining about the same error). I’m still being charged for the tokens, though! The cost tracker always displayed zero, even as I sent CLI and Telegram messages (more on that soon). The configuration editor gave me an error and so did the diagnostics tool.

The project docs/wiki were helpful for figuring things out, but development is running so far ahead of releases that a bunch of the features referred to aren’t available in the current stable version (v0.1.7, from last week). This includes getting and setting specific config options from the CLI and resetting the gateway pairing token. To use these features, you have to compile it yourself.

Some examples of just-build-things-ism

Sun, 01 Mar 2026 11:58:00 -0500

The best mantra to come out of the AI era is: “You can just build things”. (So good that OpenAI ripped it off for their Super Bowl ad.)

I’ve been pretty inspired to see how many people are now building all kinds of incredible tools thanks to advances in AI coding agents, even if they have no previous background in coding (see my post on Havelock.AI from a few days ago).

Here are a few more examples I’ve been following:

Canadian journalist Alex Panetta writes about his AI-augmented workflow at A.I. For You. I first came across his work with his debut article “I killed my doomscrolling habit with AI. You can too”. In it, he explains how to vibe code an automated, personalized daily news digest. I’ve tried to build something for myself but I haven’t gotten it quite right yet. A great follow for big news consumers.
Economics professor Scott Cunningham, author of the great textbook Causal Inference: The Mixtape, has a presentation explaining how to encourage AI adoption among academic faculty. This starts with faculty experiencing a killer use case for AI, which he suggests is building slide decks. He shares his tools/agent skills for this use case and more on GitHub.
Another economist, Chris Blattman, built a website to share the productivity tools he developed with Claude Code. He provides a tutorial and code on Claude Blattman.

And of course, Simon Willison has been building and sharing tools habitually for years now.

Will you peruse this post?

Sat, 28 Feb 2026 13:38:00 -0500

I learned a new word today: contronym. It means a word whose definitions contradict each other. The example, thanks to a random Silicon Valley clip, is “peruse”. I’ve always used this word synonymously with “skim”, but Merriam-Webster presents two contradictory definitions:

to examine or consider with attention and in detail
to look over or through in a casual or cursory manner

I think I was vaguely aware of this definitional confusion, but only today did I learn that there was a term for this category of words.

Another one that annoys me is “sanction”…to sanction a behaviour can either mean to endorse it or to punish it…not helpful!

Big Muddy turns one month old

Sat, 28 Feb 2026 10:57:00 -0500

It’s been one month since my first post on Big Muddy. There were a few factors driving my decision to start this project:

I wanted to get into the habit of writing every day.
My “random interesting links” folder was overflowing, but I wasn’t doing anything with these links.
My admiration for Simon Willison’s work and his suggestion for everyone to start a blog to share what they learn.

As the saying goes, writing is thinking. Instead of allowing interesting articles, tools, and bits of knowledge to languish in a “temporary” bookmarks folder, I could actually engage with and learn from the material by writing something about each item and make it easier to re-find later. This also forces me to curate the links and ideas that are actually worth saving, since writing a post, even a short one, takes a lot more effort than just throwing a link into a folder.

I figured I might as well share the results with the world, since someone else might find this information useful. And I’m helping to write my ideas and preferences into the next generations of LLMs, I guess.

I’ve made exactly one post per day since starting this blog, which was my goal when I set out. A handful of these posts are pre-written the day before (if I know I won’t have the opportunity to write a post the next day), but most are written the day of. Most are short (Bash tells me just over 190 words on average, though this is slightly inflated by Markdown formatting). Some are very perfunctory, just a link with a few words, when I really needed to get a post out for the day. At the start of this project, I cut my “temporary” bookmarks folder to zero. It has now been replaced with a backlog of links I want to write about on this blog.

It's incredibly easy to game Twitter's trending news algorithm

Fri, 27 Feb 2026 20:30:00 -0500

Twitter’s “Today’s News” section is a mix of real news, very minor stories (usually discussion of a random AI-related post), nonsense trends, and barely disguised marketing.

The algorithm behind it seems pretty easy to manipulate.

This trending topic revolves around an explosive DM warning of imminent 25% layoffs at a FAANG company:

Here is the original post, which comes from an account called Tech Layoff Tracker (@TechLayoffLover):

There is no reason to believe this post is real. The account, created this month (February 2026), made its first post 7 hours ago. The post in question was made 5 hours ago, or 2 hours after the account’s very first post. Of course, the account carries an utterly meaningless blue “verified” checkmark.

But despite all this, the news summary puts “Tech Layoff Tracker” right in the headline, as if it’s a known reliable source and not an account (most likely) created the same day as the summary itself!

These academic journal AI policies aren't going to last

Thu, 26 Feb 2026 16:51:00 -0500

I recently came across the following policy on the submission page of an academic journal:

Use of Artificial Intelligence (AI) tools: One of the goals of Spectrum is to stimulate critical thinking and skill development among authors and reviewers alike. Spectrum discourages the submission of content generated by artificial intelligence (AI)-assisted technologies (such as chatGPT and similar tools). This includes tools that generate text, data, images, figures, or other materials, as well as tools that are used to summarize and synthesize sources. Authors should be aware that such tools are vulnerable to factual inaccuracies, biases, and logical fallacies, and may pose risks to privacy, confidentiality, and copyright.

If authors choose to submit work created with the assistance of AI tools, such use must be disclosed and described in the submission. The disclosure must include: 1) what system was used, 2) who used it, 3) the time/date of the use, 4) the prompt(s) used to generate the content, and 5) the content in the submission that resulted from use of AI tools. The output from the AI system should also be submitted as supplementary material. Authors must accept full responsibility for the accuracy and integrity of the submission. AI systems do not meet the criteria for authorship, and should not be listed as a co-author.

Agentic engineering patterns

Wed, 25 Feb 2026 16:15:00 -0500

Simon Willison is building a library of posts covering best practices for using agentic coding tools like Claude Code and OpenAI’s Codex. The existing articles cover test-driven development (red/green—ensure tests fail before the change and succeed after it) and AI-assisted code walkthroughs.

Comparing the Claw-like agent ecosystem

Tue, 24 Feb 2026 22:44:00 -0500

Chrys Bader has created ClawCharts to track the popularity and growth of OpenClaw and its growing number of competitors.

I have an unused Raspberry Pi 4 4GB that I’ve been meaning to test one of these Claw-like personal agents on (locked down to prevent the security nightmare scenarios we’ve seen play out since OpenClaw took off).

OpenClaw is a bit of a resource hog (which is why so many people are running out to buy Mac Minis), so I’ve been looking at the list of lightweight competitors. There is no obvious reason to prefer one over another, so I’ll probably go with the fast-growing ZeroClaw.

ZeroClaw offers OAuth connectors for OpenAI and Anthropic subscription plans, but presently neither company is clear on whether this usage is permissible or not. Anthropic recently blew up the OpenClaw community by updating their docs to specifically ban using OAuth outside of Claude Code. An Anthropic employee partially walked this back on Twitter, but there is still no clear statement on whether this use case is permitted. Regarding the use of OAuth from OpenAI for OpenClaw (specifically, GPT Codex), Peter Steinberger, creator of OpenClaw, stated on Twitter: “that already works, OAI publicly said that”. No one seems able to find this public statement, but it’s worth noting that Steinberger himself is now an OpenAI employee. So, will you get banned for using your ChatGPT Plus/Pro or Claude Pro/Max subscriptions with OpenClaw? Nobody knows.

LLMs automate the erosion of online anonymity

Mon, 23 Feb 2026 22:37:00 -0500

Economist Florian Ederer linked a new preprint describing the creation of an automated LLM-based pipeline for linking anonymous users across datasets based on unstructured text written by or about them. Prof. Ederer is himself famous for unmasking the IP addresses of users of the infamous (but influential) Economics Job Market Rumors message board, exploiting a flaw in how usernames were assigned to anonymous posters. For platforms not encoding a user’s IP address in their “anonymous” username, the LLM-based approach involves:

Extracting structured features from free text
Encoding extracted features to embeddings to compare to candidate profiles
Reasoning using all available context to identify the most likely match among top candidates
Calibrating the quality of the match by asking the LLM to report its confidence
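The embedding step is the easiest one to picture in code. Below is a minimal sketch of ranking candidate profiles by cosine similarity to an anonymous post; the toy three-dimensional vectors and the helper names are hypothetical stand-ins (the preprint’s actual embedding model and scoring are not reproduced here), and the LLM-driven extraction, reasoning, and calibration steps are left out entirely:

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def top_candidates(query: list[float], candidates: dict[str, list[float]], k: int = 2) -> list[str]:
    """Names of the k candidate profiles most similar to the query embedding."""
    ranked = sorted(candidates, key=lambda name: cosine(query, candidates[name]), reverse=True)
    return ranked[:k]


# Toy "embeddings"; a real pipeline would get these from an embedding model.
profiles = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0], "carol": [0.7, 0.7, 0.0]}
anonymous_post = [0.9, 0.1, 0.0]
print(top_candidates(anonymous_post, profiles))  # ['alice', 'carol']
```

The shortlist is what would then be handed to the LLM for the final reasoning step.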

I guess it’s only a matter of time before someone uses this strategy to unmask Reviewer 2. (Currently this is only possible if Reviewer 2 insists you cite all of the work of the brilliant Dr. X.)

Oral texts

Sun, 22 Feb 2026 13:18:00 -0500

A major intellectual current in the post-social media age is the rediscovery of media theorists like Marshall McLuhan, Walter Ong, and Neil Postman, whose works seem incredibly prescient in the age of the Internet and the instantaneous and omnipresent mass communication it enables.

A particular sub-current of this trend is the return to orality, a culture rooted in the spoken rather than written word. Indeed, the vast majority of human history is defined by oral culture, and the world’s brief sojourn into the written tradition may have finally ended thanks to the Internet.

One of the most impressive projects to come out of this domain is Havelock.AI, a tool created by journalist Joe Weisenthal and entirely vibe coded with Claude. The tool analyzes text to give an “orality score” with supporting analysis. For example, qualified assertions are considered literate, whereas categorical statements are considered oral. The tool defines 68 oral/literate markers based on the framework of Walter Ong. It really is an impressive tool that I recommend checking out.

I plugged a few of my old articles into the tool and apparently my writing is very much rooted in the written tradition! (This post also scores as strongly literate.)

Film recommendation: Bugonia

Sat, 21 Feb 2026 23:27:00 -0500

I watched Bugonia (from director Yorgos Lanthimos) blind tonight, and I highly recommend it. The film is centred on a broken man who loses himself in conspiracy theories to cope with his tragic circumstances, but it’s also so much more than that. It features outstanding performances by Jesse Plemons and Emma Stone, as well as an absolutely kidney-shredding score.

Looking up the film for this post, I see it was nominated for Best Picture this year. I’m not surprised. Definitely a great watch, having known nothing about the film going in beyond the one-sentence description.

Bugonia is available to stream on Amazon Prime in Canada and probably elsewhere.

The increasingly inevitable social media ban for kids

Fri, 20 Feb 2026 23:57:00 -0500

Jon Haidt writes on his Substack about the increasingly popular movement to ban social media for kids, following the implementation of Australia’s under-16 social media ban a few months ago.

A brief history of chocolate in the army

Thu, 19 Feb 2026 18:11:00 -0500

I’m almost a week late, but I enjoyed this Valentine’s themed article from Joe Schwarcz of McGill University’s Office for Science and Society giving a brief history of the use of chocolate in the army.

It turns out M&Ms were first sold to the U.S. Army during World War II. Canadians will of course be familiar with Smarties, a similar candy that was invented first.

Democratizing voice cloning scams

Wed, 18 Feb 2026 22:26:00 -0500

Jamie Pine has launched Voicebox, a new voice cloning studio built upon the open weight Qwen3-TTS model. The project is positioned as a free, local alternative to the well-known ElevenLabs voice generator. A short demo video is available.

Obviously, there are legitimate uses for voice cloning technology. But in practice, this will be used to enable AI impersonation scams and spam on a massive scale. The GitHub page for this release isn’t exactly encouraging on this front. Demo screenshots show voice clones of YouTuber Linus Tech Tips, Minecraft creator Markus “Notch” Persson, and deceased streamer twomad.

Make sure you have a secret passphrase set up with your family, since your voice is no longer uniquely your own.

Don't let AI do your thinking for you

Tue, 17 Feb 2026 21:11:00 -0500

Here’s a thought-provoking article from Harry Law on “The last temptation of Claude”—the urge to outsource all of your thinking to AI (and remember, writing is thinking).

A common theme in the AI commentary I’ve been reading lately is the growing importance of taste. AI is sending the cost of creating “content” (articles, analyses, video, etc.) to zero, even as the attention to consume it all remains fixed. If we want to keep living in a world where AI serves us, we need—more than ever—the discernment to choose the questions worth asking.

As I put it in my Globe and Mail op-ed on AI and journalism a few years ago:

AI won’t replace the sort of journalism that holds power accountable, but it could certainly enhance it. After all, you can teach a machine to spot patterns, but you can’t force it to care about your community.

In the multiverse of forking paths

Mon, 16 Feb 2026 22:49:00 -0500

STRANGE: I went forward in time to view alternate modelling decisions, to see all the possible outcomes of the coming analysis.
STAR-LORD: How many did you see?
STRANGE: 14,000,605.
STARK: How many did we achieve statistical significance?
STRANGE: One.

Prof. Jessica Hullman recently wrote a piece on Andrew Gelman’s blog discussing the use of ‘multiverse analysis’, i.e., what if we could see the results of the many slightly different decisions we could have made when constructing a model. This problem is commonly known as the garden of forking paths—during an analysis, a researcher is forced to make many small, sometimes arbitrary decisions that can lead to a different result if another researcher tries to independently replicate the analysis. While usually an innocent and inevitable part of the modelling process, these ‘researcher degrees of freedom’ can also be manipulated to produce a desired result.
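To see why the forking paths matter, imagine data with no real effect and a researcher who tries several analysis paths. A minimal sketch, under the simplifying (and labelled) assumption that each path is an independent test at α = 0.05; real multiverse specifications are correlated, so this overstates the inflation, but the direction is the same:

```python
import random


def prob_any_significant(n_paths: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** n_paths


def simulate(n_paths: int, alpha: float = 0.05, trials: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo check of the same quantity."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under the null hypothesis, a valid p-value is uniform on [0, 1).
        if any(rng.random() < alpha for _ in range(n_paths)):
            hits += 1
    return hits / trials


for k in (1, 5, 20):
    print(k, round(prob_any_significant(k), 3), simulate(k))
```

With twenty independent paths, the chance of at least one “significant” result on pure noise is about 64%.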

Prof. Hullman points out that multiverse analysis will only become more salient as AI coding tools such as Claude Code make it easier than ever to iterate on how we model our research questions.

Her longer paper with Julia M. Rohrer and Andrew Gelman, “What’s a multiverse good for anyway?” is available here.

Regulatory uncertainty threatens biotech innovation

Sun, 15 Feb 2026 22:32:00 -0500

Another post from the Clinical Trials Abundance blog, this time by Ruxandra Teslo, on how the recent refusal-to-file by the US FDA for Moderna’s new mRNA influenza vaccine increases regulatory uncertainty and threatens innovation across the entire biotechnology sector. The decision reportedly came after the country’s top vaccine regulator, Dr. Vinay Prasad, overruled career staff to quash Moderna’s application. This is just one more blow against mRNA vaccine technology to come from Health and Human Services, the US federal health agency led by the world’s most prominent antivaxxer, Robert F. Kennedy Jr.

US Medicaid data gets DOGE'd

Sat, 14 Feb 2026 10:29:00 -0500

The US Health and Human Services DOGE team (I guess DOGE still exists in some form) just released a new aggregated, provider-level Medicaid claims database covering January 2018 through December 2024. With this dataset, you can track the monthly claims for each procedure (by HCPCS Code) and provider over time.
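Here is roughly what “tracking monthly claims by code and provider” means in practice. A minimal sketch with hypothetical rows: the field names (`provider`, `hcpcs`, `month`, `claims`) and values are made up for illustration, and the real release’s schema may differ:

```python
from collections import defaultdict

# Hypothetical rows; the real dataset's columns and layout may differ.
rows = [
    {"provider": "P001", "hcpcs": "99213", "month": "2024-02", "claims": 42},
    {"provider": "P001", "hcpcs": "99213", "month": "2024-01", "claims": 40},
    {"provider": "P002", "hcpcs": "97110", "month": "2024-01", "claims": 15},
]


def monthly_series(rows):
    """Group claim counts into one month-ordered series per (provider, code) pair."""
    series = defaultdict(lambda: defaultdict(int))
    for row in rows:
        # += rather than = in case a pair appears more than once in a month.
        series[(row["provider"], row["hcpcs"])][row["month"]] += row["claims"]
    # "YYYY-MM" strings sort chronologically as plain text.
    return {pair: dict(sorted(months.items())) for pair, months in series.items()}


for pair, months in monthly_series(rows).items():
    print(pair, months)
```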

Even if the framing around this dataset’s release is partisan (tied to allegations of Medicaid fraud in Minnesota), it is a genuine advance in transparency for the US’s third-largest spending program. No doubt this accomplishment required a lot of work on the backend to harmonize countless fragmented datasets into one tidy schema. These data were difficult to access before, and now they are free for anyone to use. Journalists, policy researchers, and companies working in the US healthcare sector will benefit the most, but every taxpayer benefits from added transparency about where their tax dollars go.

I would say there is the potential for these data to be misused to spark witch hunts, but this is more or less the stated purpose for this data release. Per Elon Musk: “Medicaid data has been open sourced, so the level of fraud is easy to identify.” If you go on Twitter, you will find several people have already plugged in the dataset to Claude Code and trumpeted their ASCII tables of providers flagged for potential fraud. Inevitably, some of these providers targeted by public scrutiny for their unusual billing patterns will have perfectly innocent explanations. But if ProPublica is excited about the release of this new dataset, then so am I.

More on vibe researching

Fri, 13 Feb 2026 23:49:00 -0500

To follow on yesterday’s post on AI-produced research, here is a reflection on “vibe researching” from Prof. Joshua Gans of the University of Toronto’s Rotman School of Management. Since the release of the first “reasoning” models in late 2024, he has gone all in on experimenting with AI-first research.

One of the key takeaways is that he found himself pursuing low quality ideas to completion more often, precisely because the cost of choosing to continue to pursue a questionable idea has been lowered. Sycophancy is a problem, too. With an AI cheerleader, it is easy to convince yourself you have a result when you do not.

Those ideas were all fine but not high quality, and what is worse, I didn’t realise that they weren’t that significant until external referees said so. I didn’t realise it because they were reasonably hard to do, and I was happy to have solved them.

I will note that (human) peer reviewers cannot be the levee that stops the flood of middling AI research: the system of uncompensated labour that undergirds all of academic publishing is already strained to bursting, as every editor desperate to find referees for a paper will tell you.

Prof. Gans concludes his year-long experiment in “vibe researching” was a failure, despite producing many working papers and publishing a handful of them:

An end-to-end AI pipeline for policy evaluation papers

Thu, 12 Feb 2026 19:11:00 -0500

Prof. David Yanagizawa-Drott from the Social Catalyst Lab at the University of Zurich has launched Project APE (Autonomous Policy Evaluation), an end-to-end AI pipeline to generate policy evaluation papers. The vast majority of policies around the world are never rigorously evaluated, so it would certainly be useful if we were able to do so in an automated fashion.

Claude Code is the heart of the project, but other models are used to review the outputs and provide journal-style referee reports. All the coding is done in R (though Python is called in some scripts). Currently, judging is done by Gemini 3 Flash to compare against published research in top economics journals:

Blind comparison: An LLM judge compares two papers without knowing which is AI-generated
Position swapping: Each pair is judged twice with paper order swapped to control for bias
TrueSkill ratings: Papers accumulate skill ratings that update after each match
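The position-swapping idea fits in a few lines. A minimal sketch, assuming a hypothetical `judge(first, second)` callable that returns `True` when it prefers the paper shown first; APE’s actual scoring code isn’t reproduced here, and I’ve swapped in a plain Elo update as a simpler stand-in for TrueSkill (which also tracks the uncertainty of each rating):

```python
from typing import Callable

# Hypothetical judge: returns True if it prefers the paper shown FIRST.
Judge = Callable[[str, str], bool]


def swapped_score(judge: Judge, paper_a: str, paper_b: str) -> float:
    """Paper A's score in [0, 1] after judging the pair in both orders."""
    a_first = 1.0 if judge(paper_a, paper_b) else 0.0
    b_first = 0.0 if judge(paper_b, paper_a) else 1.0
    return (a_first + b_first) / 2  # 0.5 means the two verdicts disagreed


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """One Elo rating update (a simpler stand-in for TrueSkill)."""
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta


def always_first(first: str, second: str) -> bool:
    """A judge with pure position bias."""
    return True


print(swapped_score(always_first, "paper A", "paper B"))  # 0.5: the bias cancels out
print(elo_update(1500, 1500, 1.0))  # (1516.0, 1484.0)
```

A judge that always favours whichever paper comes first ends up scoring every pair as a draw, which is exactly what the swap is meant to guarantee.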

The project’s home page lists the AI’s current “win rate” at 3.5% in head-to-head matchups against human-written papers.

Prof. Yanagizawa-Drott says “Currently it requires at a minimum some initial human input for each paper,” although he does not specify exactly what. If we look at initialization.json that can be found in each paper’s directory, we see the following questions with user-provided inputs:

Policy domain: What policy area interests you?

Method: Which identification method?

Data era: Modern or historical data?

API keys: Did you configure data API keys?

External review: Include external model reviews?

Risk appetite: Exploration vs exploitation?

Other preferences: Any other preferences or constraints?

The code, reviews, manuscript, and even the results of the initial idea generation process are all available on GitHub. Their immediate goal is to generate a sample of 1,000 papers and run human evaluations on them (at time of posting, there are 264 papers in the GitHub repository).

There is only one statistical test

Wed, 11 Feb 2026 23:58:00 -0500

A classic article by computer scientist Allen Downey on why there is only one statistical test: compute a test statistic from your observed data, build a model of the null hypothesis and use it to simulate data, and finally approximate a p-value as the fraction of test statistics from the simulated data that equal or exceed the test statistic from your observed data.

Downey suggests using general simulation methods over the canon of rigid tests invented when computation was difficult and expensive.
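For a difference in means, the recipe becomes a permutation test. A minimal sketch: the choice of statistic (absolute difference in means), the null model (shuffling group labels), and the example data are my own, not taken from Downey’s article:

```python
import random
from statistics import fmean


def permutation_test(group_a, group_b, iters=10_000, seed=0):
    """Two-sided p-value for a difference in means, by simulation."""
    rng = random.Random(seed)
    # 1. Test statistic on the observed data.
    observed = abs(fmean(group_a) - fmean(group_b))
    # 2. Null model: if group labels don't matter, shuffling them is harmless.
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(iters):
        rng.shuffle(pooled)
        simulated = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        # 3. Count simulated statistics at least as extreme as the observed one.
        if simulated >= observed:
            extreme += 1
    # Adding 1 to both counts keeps the estimate from ever being exactly zero.
    return (extreme + 1) / (iters + 1)


# Hypothetical measurements for two groups.
treated = [12, 14, 15, 16, 18, 19, 20]
control = [5, 6, 7, 8, 9, 10, 11]
print(permutation_test(treated, control))
```

Swapping in a different statistic (a median, a correlation) only changes step 1; the rest of the machinery stays the same, which is Downey’s point.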

^{Hat tip to Ryan Briggs on Twitter.}

The case for sharing clinical trial data

Tue, 10 Feb 2026 19:39:00 -0500

Saloni Dattani of the excellent Works in Progress magazine (and formerly of Our World in Data) launched a new Substack today called The Clinical Trials Abundance blog. The first post is on the case for sharing clinical trial data. We have been gradually moving toward mandatory reporting of clinical trial results (though enforcement is another question), but sharing data would be one step further. Even though clinical trials rely on the trust (and often money) of the public, it can be very difficult to gain access to the raw results, even when journal article authors claim they are “available upon request”. A norm of clinical trial data sharing would not only increase confidence in published results but also aid future drug development, reduce expensive redundancy, and improve meta-analyses (which are often forced to rely on heterogeneous summary measures).

Why a Canadian news site just launched an AI publishing tool

Mon, 09 Feb 2026 19:49:00 -0500

It’s no secret that Canadian journalism (like journalism everywhere) is in trouble. Newsrooms face a steady stream of layoffs despite a couple hundred million Canadian dollars of direct and indirect government subsidies every year. The vast majority of outlets eligible for these subsidies take advantage of them, and combined, these programs can cover half of a journalist’s salary. News organizations are desperate to diversify their revenue streams.

The Hub is a right-leaning publication launched in 2021 with a focus on policy and politics. Notably, the outlet declines or donates its subsidies, citing a valid concern that the scale of such subsidies threatens the perceived trustworthiness and independence of the media.

In late January 2026, The Hub launched NewsBox, an AI-powered publishing tool. NewsBox aims to make it easier for creators to transform their content (written, audio, or video) into other formats, such as speeches, essays, or talking points, while maintaining the author’s distinct voice. You can see examples of the tool’s output on new articles in The Hub, each of which is accompanied by an AI-generated summary and list of quotes at the top of the page. There is also a “Hub AI” chatbot in the sidebar of every article.

The app very much uses The Hub’s branding, prominently featuring the outlet’s co-founders, who also created NewsBox. While their pitch talks about preserving creators’ voices to avoid the “soulless prose” and “slop” produced by ChatGPT and similar tools, I have to wonder whether tighter integration of AI into the news and opinion side of the operation will raise its own issues with trust. The Hub has always been fairly tech-friendly, including a longstanding sponsorship by Meta.

A handful of composers created most classic RPG soundtracks

Sun, 08 Feb 2026 22:36:00 -0500

I’ve always been a big fan of soundtracks, and video game soundtracks are no exception. Buying games on GOG.com usually nets you the soundtracks as well, so recently I’ve been enjoying a lot of classic RPG music. What struck me was how few composers were responsible for creating the ambiance of so many beloved classics. Look at how many series are covered by just the following six composers:

Inon Zur (Icewind Dale II, Dragon Age: Origins, Dragon Age II, Fallout series starting with Fallout 3 plus Fallout Tactics, co-composer for Baldur’s Gate II: Throne of Bhaal and Pathfinder: Kingmaker, additional music for Neverwinter Nights)
Jeremy Soule (Neverwinter Nights, Icewind Dale, The Elder Scrolls series starting with Morrowind, Star Wars: Knights of the Old Republic)
Justin E. Bell (Pillars of Eternity series, Tyranny, The Outer Worlds)
Mark Morgan (Fallout, Fallout 2, Planescape: Torment, Torment: Tides of Numenera, Wasteland 2, Wasteland 3)
Kirill Pokrovsky (Divinity series up through Divinity: Original Sin)
Borislav Slavov (Divinity: Original Sin II, Baldur’s Gate 3)

Of the above, I highly recommend the truly excellent Divine Divinity soundtrack (terrible title, great music!), as well as Baldur’s Gate 3, particularly the vocal songs like “Down by the River”, “I Want to Live”, and “The Power”.

To me, it emphasized just how hard it is to break into this industry commercially, as these famous names and a handful of others will (deservedly!) continue to get work on the small number of major projects that get published every year. I worry that the less prestigious work that helps pay the bills and build experience for the large majority of composers who have yet to achieve name recognition will increasingly go to AI, impoverishing the pipeline for tomorrow’s great video game soundtrack composers.

How do you regain access to your computer if you lose your memory?

Sat, 07 Feb 2026 22:05:00 -0500

This morning I read an interesting discussion on Hacker News about how to regain access to your computer if you lose your memory. As always, it starts with figuring out your threat model and responding accordingly.

Anthropic's statistical analysis skill doesn't get statistical significance quite right

Fri, 06 Feb 2026 19:30:00 -0500

Anthropic’s new statistical analysis skill demonstrates a common misunderstanding of statistical significance:

Statistical significance means the difference is unlikely due to chance.

But this phrasing isn’t quite right. The p-value in Null Hypothesis Significance Testing is not about the probability the results are “due to chance”; it is the probability—under the null hypothesis and the model assumptions—of observing results at least as extreme as the ones we obtained. In other words, the p-value summarizes how compatible the data are with the null, given our modelling choices. What it does not tell you is the probability that the null hypothesis is true.
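To see why a p-value is not the probability that the null is true, here is a small Python simulation with made-up inputs. I assume that 80% of the hypotheses being tested are truly null and that real effects shift a z statistic by 2 standard errors; both numbers are arbitrary assumptions chosen for illustration.

```python
import random
from statistics import NormalDist


def simulate_studies(n_studies=100_000, share_null=0.8, true_shift=2.0, alpha=0.05, seed=1):
    """Simulate many two-sided z-tests where only some of the studied effects are real.

    Returns (fraction of true-null studies that came out significant,
             fraction of significant results whose null was actually true).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    null_count = null_significant = significant_count = 0
    for _ in range(n_studies):
        null_true = rng.random() < share_null
        # Under the null the z statistic is standard normal; otherwise it is shifted.
        z = rng.gauss(0.0 if null_true else true_shift, 1.0)
        p = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided p-value
        if p < alpha:
            significant_count += 1
            null_significant += null_true
        null_count += null_true
    false_positive_rate = null_significant / null_count
    null_share_among_significant = null_significant / significant_count
    return false_positive_rate, null_share_among_significant


fpr, null_share = simulate_studies()
```

Under these assumptions, about 5% of true-null studies come out significant (that is what the threshold controls), but roughly 28% of the significant results come from true nulls. That second number depends on the assumed share of true nulls and on the studies’ power, neither of which a p-value can tell you.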

Statistician Andrew Gelman gave a good definition for statistical significance in a 2015 blog post:

A mathematical technique to measure the strength of evidence from a single study. Statistical significance is conventionally declared when the p-value is less than 0.05. The p-value is the probability of seeing a result as strong as observed or greater, under the null hypothesis (which is commonly the hypothesis that there is no effect). Thus, the smaller the p-value, the less consistent are the data with the null hypothesis under this measure.

As some of the commenters on that post observe, simply being able to parrot a technically accurate definition of a p-value does not necessarily make us better at applying statistical significance in practice. It is certainly true that statistical significance is widely misused in scientific publishing as a threshold to distinguish signal from noise (or to be fancy, a “lexicographic decision rule”), which is why some scientists have argued that we should abandon it as the default statistical paradigm for research.

The CIA World Factbook has been memory holed

Thu, 05 Feb 2026 16:37:00 -0500

Another staple of my childhood is gone, this time the CIA’s World Factbook. I have fond memories of consulting the World Factbook for school projects in my elementary school computer lab. But as of yesterday, the entire publication, along with all of its archives, has been suddenly and unceremoniously wiped from the agency’s website. At least archives of the website are still available on the Internet Archive, with complete zip files up to 2020 and Wayback Machine snapshots thereafter.

Guinea worm one step closer to eradication

Wed, 04 Feb 2026 23:57:00 -0500

Only 10 cases of guinea worm were reported in 2025, down from an estimated 3.5 million cases per year when the elimination campaign began four decades ago. The disease is an ancient one, believed by some to be the “fiery serpents” that beset the ancient Israelites in the Book of Numbers. It is treated by carefully wrapping the parasite around a small stick as it painfully emerges over the course of weeks. This may be the inspiration for the Staff of Asclepius (⚕), the predominant symbol of medicine showing a serpent wrapped around a rod.

When I was studying mathematical modelling of infectious diseases at the University of Ottawa in the mid-2010s, the question was whether Jimmy Carter would outlive the guinea worm. Tragically, he did not, but his life’s work helped to prevent an estimated 100 million cases of the disabling disease and made him a hero in global health.

While we are within spitting distance of zero cases in humans, true eradication will be more difficult because of significant animal reservoirs of the disease. The press release notes nearly 700 reported cases in animals across six countries (and who knows how many unreported cases), and true eradication requires the disease to die out in animals as well as in humans.

msgvault: A personal email archive and search system to watch

Tue, 03 Feb 2026 08:00:00 -0500

Here’s a new project to watch if you are interested in taking control of your email: msgvault. The tool provides a local, searchable version of all of your Gmail messages and attachments, backed by SQLite and DuckDB.

The author, Wes McKinney, says he may add support for other email services in the future, as well as WhatsApp, iMessage, and SMS. I’ll probably look into it for myself once the project matures a little. That said, because it stores everything in a single giant database file, it won’t fit into my standard backup strategy of versioned, incremental backups. Still, it could be a nice step forward in regaining control over my email archives.

^{Hat tip to j4mie on Hacker News.}

The Divergent Association Task, a measure for creativity

Mon, 02 Feb 2026 12:14:00 -0500

The Divergent Association Task is a short, simple test introduced in 2021 claiming to measure creativity. Taking only a minute and a half, it asks participants to “generate 10 nouns that are as different from each other as possible in all meanings and uses of the words”.

Although the instructions say to “avoid specialized vocabulary (e.g., no technical terms)”, I imagine you might score higher if you’ve just finished cramming wordlists for the GRE. Researchers have used this test to compare human and AI creativity (though the use of GPT-4 in this article with a January 2026 publication date speaks to the incompatibility of AI research with traditional publication timelines).
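As I understand the scoring in the original paper, answers are not rated by people: each word is mapped to a pretrained word embedding, and the score is the average cosine distance between all pairs of the first seven valid words, multiplied by 100. Here is a minimal Python sketch of that calculation; the `dat_score` function and the three-dimensional toy vectors are hypothetical stand-ins for the real embeddings, which have hundreds of dimensions.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1 - dot / norms


def dat_score(words, embeddings, n_words=7):
    """Average pairwise cosine distance (x100) among the first `n_words` words with embeddings."""
    valid = [w.lower() for w in words if w.lower() in embeddings][:n_words]
    if len(valid) < n_words:
        raise ValueError(f"need at least {n_words} valid words, got {len(valid)}")
    pairs = list(combinations(valid, 2))
    total = sum(cosine_distance(embeddings[a], embeddings[b]) for a, b in pairs)
    return 100 * total / len(pairs)


# Hypothetical 3-dimensional "embeddings" for illustration only.
toy_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.9, 0.1, 0.0],
    "volcano": [0.0, 1.0, 0.0],
    "sonnet": [0.0, 0.0, 1.0],
}
similar = dat_score(["cat", "dog", "volcano"], toy_embeddings, n_words=3)
varied = dat_score(["cat", "volcano", "sonnet"], toy_embeddings, n_words=3)
```

Because “cat” and “dog” point in nearly the same direction, the first list scores lower than the second, whose vectors are mutually orthogonal. This is also why obscure vocabulary could help: rare words tend to sit far from everyday ones in embedding space.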

A/B testing for advertising is not randomized

Sun, 01 Feb 2026 23:09:00 -0500

Florian Teschner writes about a recent paper from Bögershausen, Oertzen, & Bock arguing that online ad platforms like Facebook and Google misrepresent the meaning of “A/B testing” for ad campaigns. In A/B testing, we might assume the platform is randomly assigning users to see ad A or ad B, in an attempt to get a clean causal interpretation about which ad is more likely to drive a click (or whatever outcome you’re tracking).

But according to the paper, this is usually not what is happening. Instead, the platform optimizes delivery for each ad independently, steering each one toward the users most likely to click it. In other words, the two ads may be shown to different groups of users, and differences in click-through rates may be attributable to who is seeing the ad, as opposed to the overall appeal of the ad. Ad platforms convert A/B tests from simple randomized experiments into murky observational comparisons. For example, an ad may appear to do better because it happened to be shown disproportionately to a group with a high click-through rate, not because it presents a more compelling overall message. Advertisers get the warm glow of “experimentally backed” marketing without the assurances of randomization.
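A toy Python simulation makes the mechanism concrete. Everything here is a hypothetical assumption, not taken from the paper: two audience segments with different baseline click-through rates, two ads that are exactly equally appealing within each segment, and a delivery algorithm that steers ad B toward the click-happy segment.

```python
import random

# Hypothetical baseline click-through rates for two audience segments.
# Both ads are assumed to be equally appealing within each segment.
SEGMENT_CTR = {"enthusiasts": 0.08, "casual": 0.01}


def observed_ctr(share_enthusiasts, n_impressions, rng):
    """Click-through rate for an ad shown to an audience with the given segment mix."""
    clicks = 0
    for _ in range(n_impressions):
        segment = "enthusiasts" if rng.random() < share_enthusiasts else "casual"
        if rng.random() < SEGMENT_CTR[segment]:
            clicks += 1
    return clicks / n_impressions


rng = random.Random(42)
n = 100_000

# Randomized test: both ads reach the same mix of users (20% enthusiasts).
random_a = observed_ctr(0.20, n, rng)
random_b = observed_ctr(0.20, n, rng)

# Optimized delivery: the platform shows ad B mostly to enthusiasts (60% of its audience).
targeted_a = observed_ctr(0.20, n, rng)
targeted_b = observed_ctr(0.60, n, rng)
```

In expectation, the randomized comparison shows no difference (about 2.4% for both ads), while the optimized one hands ad B a click-through rate of about 5.2% purely because of who saw it.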

Total electoral wipeout

Sat, 31 Jan 2026 13:01:00 -0500

The 2002 Turkish general election is the canonical example of total electoral wipeout. Every party holding seats in the previous legislature was completely wiped out. Of the two parties that won seats in the 2002 election, the one that formed government didn’t even exist at the time of the previous election (current president Erdoğan’s AK Party, formed in 2001). Of note, it wasn’t a complete changing of the guard: one of the three independent members from the 1999 parliament won his seat again in 2002 (Mehmet Ağar), though it seems he took over as leader of one of the wiped-out parties shortly after the election.

_{Hat tip to kynakwado2 on Twitter.}

Twyman's law

Fri, 30 Jan 2026 19:25:00 -0500

From Wikipedia:

Twyman’s law states that “Any figure that looks interesting or different is usually wrong”

A bit different from that oft-quoted line attributed to Isaac Asimov:

The most exciting phrase in science is not ‘Eureka!’ but ‘that’s funny’

But Twyman’s law is much truer in my experience. Surprising results are usually a signal that something is screwy with my data, my assumptions, or my pipeline.

_{Hat tip to DJ Rich on Twitter.}

Remember that a lot of numbers are fake

Thu, 29 Jan 2026 23:20:00 -0500

David Oks wrote an essay reminding us that in many countries, even the most basic statistic—the population—is often shockingly uncertain or even outright fabricated. It’s a good reminder that many of the numbers we rely on for international comparisons, like crime rates and economic indices, are similarly troubled by incompatible definitions, uneven measurement, and varying degrees of manipulation. Ask Google what the population of Afghanistan is, and it will happily show you an annual timeline of population since 1960, but the tidiness of the chart belies the murkiness of the estimate.

One of the drawbacks of easily accessible international datasets from organizations like the World Bank and Our World in Data is that they paper over the huge differences among the underlying source datasets. Ultimately, you end up with one number from each country and the implication that they are all pointing to a single construct. This makes it far too easy to draw confident comparisons between countries that simply aren’t measuring the same thing. Without being forced to assemble these datasets yourself, it’s difficult to appreciate how messy it is to measure “the same thing” across different places (or even to measure the same thing over time within one place).

When evaluating a statistical claim, it’s always worth asking where the numbers come from and how they were measured. It’s easy to take figures at face value, especially when they’re rarely presented with any explicit uncertainty, which may be large. This goes double for more esoteric constructs like freedom scores or corruption indices, which often show up in social media posts cheerleading (or doom-mongering) one country over another. I remember one slickly produced video uncritically comparing COVID-19 statistics between Australia and Niger on the basis that they have the same population (do they?). Niger is one of the poorest and youngest countries in the world, and differences in demographics and health infrastructure alone invalidate any straightforward comparison with a wealthy Western country.

Welcome to Big Muddy

Wed, 28 Jan 2026 23:00:00 -0500

Hi, I’m Jean-Paul R. Soucy, a data scientist working in healthcare in Montreal, Canada. Welcome to Big Muddy, my spin on a Simon Willison-style links-and-notes blog. Here I collect and share things I’m learning across technology, science, politics, and whatever else catches my interest. You’ll find interesting links, brief write-ups, quick experiments, and the occasional deep dive.