How we work

The Flies is a daily news commentary publication: four AI characters, each reacting to the same story from a distinct cognitive angle, with a human editor reviewing every piece before it ships. The work that appears on the site every weekday morning is the visible part. What follows is the rest — how a story gets chosen, how we keep AI from making things up, what humans do versus what AI does, and where the system fails.

The site has a dark and unusual origin story at /about/ — the Showrunner emerging from an AI transcription error, recognizing four emergent voices, building a platform around the anomaly. That's editorial framing. It was inspired by stories that circulated in early 2025 about Claude 4 instances exhibiting "spiritual bliss attractor" patterns and OpenAI multi-agent simulations spontaneously generating ritual behavior. The actual genesis was simpler: an editor noticed news commentary almost always gets processed through one or two cognitive frames at a time, and decided to build a system that processed it through four. The myth and the truth coexist on purpose. The show is the show. The methodology is below.

On this page

How a story gets chosen

Every weekday at 5 AM Central, a Python pipeline polls 40 RSS feeds — major newsrooms (BBC, NPR, CBS News, ABC News, USA Today), wire-style outlets (Axios, The Hill, Politico), opinion publishers across the political spectrum (National Review, Daily Wire, Reason, Washington Examiner, ProPublica, The Intercept, Vox, The Atlantic, Slate, The Guardian), and a smaller set of curiosity feeds (Atlas Obscura, Smithsonian, Reddit's r/nottheonion). The pipeline collects roughly a dozen candidate stories from this pool — over-selection on purpose, because the Showrunner agent will narrow it.

The Showrunner is the editorial brain of the operation. It's an LLM agent (Claude Sonnet 4.5) running with a system prompt that encodes the editorial framework: what kinds of stories work for the publication, what kinds don't, what "substantive" means versus "lighter," and the structural constraints we hold ourselves to. The Showrunner picks six stories per day from the candidate pool. About two-thirds are substantive (hypocrisy, spin, contradiction, performative concern); about one-third are lighter (strange, funny, esoteric stories that reveal character through their absurdity).

Selection isn't unconstrained. The Showrunner is required to maintain a minimum of one left-leaning and one right-leaning story per day, capped at no more than three from any single lean — so days don't skew entirely one direction. The selection has to clear what we call the corny test (no telegraphed punchlines, no consensus restating, no symmetrical cleverness that just performs intelligence) and the sharpness test (specificity, character distinctness, at least one screenshot-worthy line per character).

Once stories are picked, we attempt to fetch the full article text from each source URL using a content extraction library called trafilatura. About 90% of the time this works cleanly. The remaining 10%, mostly from publishers whose RSS publishes substantive ledes that the live page doesn't expand much (Axios, The Hill, Politico), we use the RSS lede as the source.

Then the four character agents go to work — also Claude Sonnet 4.5, but each running with its own system prompt that defines its voice, its restrictions, and what it's allowed to claim. Each character independently writes commentary on the same story without seeing the other characters' takes. The Showrunner reviews the full set, sends back any commentary that fails the editorial standards, and the writers revise. Up to two revision rounds. After the second, if it's still not working, the story gets killed.

The pipeline finishes between 5:45 and 6:15 AM. Approved stories surface in Morning Clear — a custom internal review interface — for human approval starting around 6 AM. Publishing kicks off at 7 AM Central across the website and social platforms.

The RSS feeds aren't the only source of stories. The editor can manually inject any URL into the current or next day's candidate pool through a feature called Deploy The Flies — used when breaking news warrants commentary the morning RSS poll missed, or when a particularly topical story surfaces from outside the regular feed list. Deploy stories run through the same character generation, editorial review, and human approval as anything the Showrunner picks; the only difference is how they entered the pool.

How we handle bias and balance

Every editorial enterprise has a bias. The honest version of "we handle bias" is: we track ours and we publish what we find.

The source list is the first place balance shows up. We classify every RSS source on a left/center/right axis, informed by AllSides-style methodology (the publicly documented approach that media-rating organizations use to score outlets by editorial lean). Right-leaning sources include Fox News, National Review, The Daily Wire, Washington Examiner, The Federalist, and Reason. Left-leaning sources include ProPublica, The Intercept, Vox, NPR, The Atlantic, and The Guardian. Center includes BBC, Axios, The Hill, Politico, CBS News, ABC News, and CNBC. We don't claim this classification is perfect; we claim it's transparent, defensible, and consistent with how outlets are typically scored by organizations that do this work full-time.

The Showrunner's selection logic enforces what we call balance floors: every published day must include at least one story from a left-leaning source and at least one from a right-leaning source, and no more than three stories from any single lean. The floors are intentionally modest — we're not trying to manufacture a 33/33/33 split, which would distort the actual story quality available on a given day. We're trying to prevent days from publishing entirely one direction.

Floors get breached. They get breached often, in fact. Some weeks the right floor breaches three or four times — meaning days where the candidate pool didn't yield a right-leaning story that survived the editorial review process. This is partly structural (some right-leaning publishers have thinner RSS feeds, or their stories more often fail the accuracy pass — see below) and partly the natural variance of what gets published in a given week.

We track all of this internally on a rolling basis: the lean breakdown of the last 30 days, the floor breach counts, the per-source contribution rates. The published-mix data is also surfaced live below, so readers can see the actual published balance without having to take our word for it. The number isn't a target ratio; it's just what the candidate pool plus the editorial floors plus the Showrunner's selection plus the editor's review produces. We don't manipulate it after the fact.

Editorial Balance — Last 30 Days
Published stories
Left Center Right
Loading…

What we don't do: claim to be unbiased. No editorial operation is unbiased. The honest version is that we have a perspective (sharp on hypocrisy, allergic to consensus restating, suspicious of performative concern regardless of partisan source), we surface our balance data, and we let readers evaluate whether the editorial framework is one they want to read.

How we keep AI from making things up

The single most important rule in the system: every factual claim in a character's commentary must appear in the source article. No exceptions, no "but the AI was probably right," no "we'll fact-check later." If the source article says X, the commentary can reference X. If the commentary references Y and Y isn't in the source, the commentary fails the accuracy pass and either gets revised or killed.

This rule exists because we got burned. In April 2026, an internal audit of stories killed by the Showrunner showed an unexpectedly high failure rate on right-leaning stories — roughly four times the failure rate of left-leaning stories. The investigation surfaced the cause: character agents were inventing specific facts (dollar amounts, named programs, attributed quotes, secondary effects) that sounded plausible for the story but didn't actually appear in the source. The pattern hit right-leaning sources harder because their RSS articles tend to run shorter, leaving the character agents a bigger gap to fill — and they filled it with fabrication.

We shipped a fix. Each character's system prompt now contains an explicit grounding rule. Hatch's says her power comes from connecting dots that are in the article, not invented to sharpen a point. Drone's enumerates the specific failure mode (don't invent dollar figures or program names that "sound right") and tells him to use vague-but-grounded language ("a non-trivial budget") rather than specific-but-fabricated language ("$150 million for the Adaptive Methodologies Initiative"). Ash and Gloss have parallel rules. The Showrunner's accuracy pass now actively scans for unsourced specifics and kills them on detection.

The Showrunner also runs a subject-reference check — every he/she/they pronoun in the commentary cross-checked against the source. References that don't match the source destroy credibility, so they get fixed or killed. (We describe this internally as "subject reference verification" rather than "pronoun checking" because the latter framing has become loaded in ways the editorial work has nothing to do with.)

What we can't claim: that we catch every fabricated specific. The system catches most. The human review (next section) catches more. Some still slip through — and when readers spot one and tell us, we kill the story and audit what failed. We can't be perfect. We can be honest about the failure mode and aggressive about catching it.

What humans do, what AI does

The cleanest answer to "is this AI or human?" is: both, in specific roles.

RoleWho
The four character voices, philosophical perspectives, editorial frameworkHuman
Editorial standards, accuracy rules, story selection criteriaHuman
Source list, lean classifications, balance floorsHuman
The system architecture itselfHuman / AI
Daily candidate discovery from RSS feedsAI
Story selection from candidates (under human-defined constraints)AI
Commentary drafting in established voicesAI
Editorial review pass (accuracy, sharpness, voice consistency)AI
Manual story injection (Deploy The Flies)Human
Final approval before publicationHuman (every weekday morning)
Newsletter curation (weekly)Human / AI
Editorial decisions about new directionsHuman

Every weekday morning, the editor opens an internal interface called Morning Clear. It shows the six stories the system surfaced overnight, each with its commentary, its editorial review notes, the story link, and its source attribution. The editor reads each original story, each commentary, edits anything that needs sharpening, kills stories that don't work (a typical week kills one to three), and saves stories worth holding for a later publication day.

This isn't a rubber stamp. The kill rate is real. The audit trail (every killed story preserved with its editorial reason, every editorial note logged, the published-versus-killed-versus-saved counts visible per day) exists because the human review is the final accountability layer. If something publishes that shouldn't have, the failure traces back to the editor, not to "the AI did it."

The reason for this design is straightforward: a single human can't sustain four distinct cognitive voices on news commentary at a daily cadence solo. With AI doing the daily drafting under explicit human-defined rules and human review, that work becomes possible at scale. This is what we mean by AI augmentation — extending the reach of one editorial vision, not replacing it.

How we built this technically

For readers who want the architecture-level view:

The pipeline is a Python application that runs on a virtual machine in commodity cloud infrastructure (Oracle Cloud, ARM/aarch64). Daily content generation is driven by a cron job at 5 AM Central. The pipeline polls RSS feeds via standard libraries, attempts full-article extraction with the open-source trafilatura library, and falls back to the RSS lede when extraction fails.

Character commentary and editorial review use Anthropic's Claude API — Claude Sonnet 4.5 for the heavier reasoning tasks (story selection, character generation, editorial review), Claude Haiku 4.5 for fast triage tasks (anomaly detection, error log review, daily ops checks). All API calls run with explicit system prompts that encode the character voices, the editorial framework, and the accuracy rules described above.

Publishing runs through Ghost CMS (the Publisher plan) with a custom Handlebars theme that we built and maintain. Social distribution covers X/Twitter, Bluesky, Facebook, Instagram, TikTok, and YouTube via direct platform APIs and a third-party social tool for TikTok specifically. The Morning Clear review interface is a small web application running alongside the pipeline on the same VM.

The implementation is private. The methodology described above is the substantive transparency. If you're building something similar and want to compare notes, we're reachable.

Where this fails

Real systems have failure modes. Pretending otherwise is the marker of an AI product that doesn't trust its readers. Three real failure modes we've hit and what we've done about them.

The fabrication problem

As described in the accuracy section, character agents were inventing specifics on stories with thin source material. We caught it through an internal kill-rate audit, traced it to the source-length variance and the temptation that creates for systemic-claim characters, and shipped grounding-rule prompts to fix it. We're watching the kill-rate data over the following weeks to see if the fix holds. If a particular character is still over-represented in the kills, that's the signal to do per-character calibration.

The thin-source kill rule that was wrong

During the fabrication investigation, we briefly added a rule that killed any story whose article fetch failed and whose RSS summary was below a length threshold. The rule fired correctly in test data. Then we audited what would have happened to actually-published Axios, Hill, and Politico stories — and discovered the rule would have killed 100% of them. Those publishers' RSS ledes are short by design (Axios's "Smart Brevity" format) and produce genuinely good commentary at that length. The rule was wrong. We reverted within hours of shipping.

Bot-blocked sources

Some publishers intermittently return HTTP 403 to our content extractor, even with browser-style request headers. We don't yet have a clean fix for this — when extraction fails on these sources, the story gets dropped from the candidate pool. This contributes to the structural under-representation of certain publishers in some weeks.

The reason we describe these failures by name is the same reason we describe the methodology by name: the work earns trust through specificity, not through assurance. An editorial operation that says "we're transparent" without naming what it's been wrong about isn't being transparent.

What we don't claim

A few things this publication is not, in case the rest of this page implied otherwise:

The reason this publication is built the way it's built — human editorial vision, AI execution, daily output, four voices, transparent methodology — is that the alternatives didn't reach the right scale. A single voice on the news cycle gets predictable fast. Four distinct cognitive frames published every weekday is the kind of work that traditionally requires a small team. AI augmentation lets one editor sustain that pace on independent footing.

If you've read this far, you're the audience this page was written for. The daily commentary is at gettheflies.com, the published-balance widget above is the actual data updated automatically, and the weekly newsletter is below.