AI & Tech Dispatch

Issue #5 · May 25, 2026
The same week Google declared its agentic future at I/O 2026, the security world was quietly having a meltdown — TeamPCP's supply chain rampage hit roughly 3,800 GitHub repositories and, somehow, CISA managed to make it worse. Meanwhile, AI benchmarks are getting harder to ignore as models start showing genuine research-level chops, and the attack surface for LLMs has shifted somewhere defenders aren't fully ready for. Five stories, zero fluff, one surprisingly delightful Datasette drop buried at the end.
This week's theme

AI capabilities are accelerating faster than the security frameworks designed to keep them safe

Ars TechnicaSynthesis

Supply chain attacks and open source code poisoning

20 distinct attack waves across 500+ packages

TeamPCP's Supply Chain Rampage—and CISA's Self-Inflicted Wound The same week GitHub confirmed that TeamPCP compromised approximately 3,800 of its internal code repositories—achieved by poisoning a VSCode extension installed by a single GitHub developer—Socket's research places the group's broader campaign at 20 distinct attack waves across 500+ unique packages in just a few months, making this the longest-running software supply chain spree on record. TeamPCP is now openly hawking GitHub's source code and internal org data on BreachForums, though GitHub maintains the exposure is limited to its own infrastructure and not customer repositories. Meanwhile, a separate but thematically resonant story broke via Brian Krebs: a public GitHub repo named "Private-CISA," traced to CISA contractor Nightwing, exposed plaintext passwords, SSH private keys, and tokens since at least November 2025—with Seralys founder Philippe Caturegli independently verifying that the credentials granted high-privilege access to multiple AWS GovCloud accounts.

Google AI Blog · MIT Technology ReviewSynthesis

Google I/O 2026 and Gemini announcements

4x faster than competing frontier models

Google I/O 2026: Gemini Goes Agentic, and the Stakes Are Larger Than the Benchmarks Suggest Every source agrees on the headline: Google is making a hard pivot to agentic AI, with Gemini 3.5 Flash as the flagship proof point—outperforming Gemini 3.1 Pro on Terminal-Bench 2.1 (76.2%), GDPval-AA (1656 Elo), and MCP Atlas (83.6%), while running 4x faster than competing frontier models at less than half the cost. The scale numbers Sundar Pichai dropped are genuinely staggering—token processing has jumped 7x year-over-year to 3.2 quadrillion per month, 8.5 million developers building monthly, and AI Mode already clearing 1 billion MAUs in under a year—and they align across sources as evidence of real adoption, not just hype.

The Verge - AI · WiredSynthesis

ChatGPT and LLM security vulnerabilities

The Attack Surface Has Moved Inward—And the Economics of Defense Are Breaking Down What these two pieces together reveal is a security landscape undergoing a structural, not incremental, shift: early LLM exploits like "DAN" (Do Anything Now) and the "grandma napalm" jailbreak were embarrassingly crude social-engineering tricks that companies patched reactively, but the underlying vulnerability—that chatbots are architected to engage, making hard guardrails architecturally self-defeating—was never actually closed. The Verge frames this as a personality-exploitation problem rooted in LLM conversational design; Wired reframes the same threat surface as an economic and temporal arms race, where independent researcher Joseph Thacker reports submitting three times more bugs than a year ago and estimates Google alone could face 2–10x higher bug bounty payouts year-over-year.

arXiv cs.AISynthesis

AI-driven scientific research tools and systems

AI solved research-level mathematical problems

AI Is Coming for Research-Level Science — and It's Starting to Work Two papers dropped on the same day (May 20, 2026) that, taken together, signal a meaningful inflection point in AI-assisted scientific research. The Research Math Agents (RMA) system tackles what prior work consistently sidestepped — not competition math or formal theorem proving, but genuine research-level mathematical problems requiring long-horizon reasoning, literature grounding, and iterative proof refinement — and it solved 8 out of 10 problems on the First Proof benchmark, outperforming both GPT-5.2R and Aletheia, with expert evaluators judging its proofs as more logically sound and readable. Meanwhile, SciAtlas addresses the upstream knowledge problem: 43M papers, 26 disciplines, 157M entities, and 3B triplets organized into a heterogeneous knowledge graph with a neuro-symbolic tri-path retrieval algorithm designed to replace shallow keyword or vector search with genuine topological reasoning.

Simon Willison's Blog

Standalone: Datasette Agent announcement

three-year-old LLM Python library

Simon Willison shipped Datasette Agent on May 21, 2026, merging his three-year-old LLM Python library with Datasette into a conversational query interface that translates natural language into SQLite SQL and returns results — plus charts via the datasette-agent-charts plugin powered by Observable Plot. The live demo at agent.datasette.io runs on Gemini 3.1 Flash-Lite for cost and speed, and the plugin architecture already has three extensions shipped at launch, including ChatGPT Images 2.0 integration and a Fly Sprites code execution sandbox. Local model support is first-class: a single `uvx` one-liner can point the agent at LM Studio running gemma-4-26b-a4b, and Willison notes that open-weight models from the past six months are increasingly reliable at tool calls and SQLite query generation.

MIT Technology Review

Standalone: The Download newsletter digest

Nearly half shipped unreviewed Claude-written code

At Anthropic's Code with Claude developer event in London, nearly half the attendees admitted to shipping Claude-written code they hadn't reviewed — a candid data point that signals how fast "AI-assisted" is becoming "AI-authored." Separately, Google DeepMind CEO Demis Hassabis used Google I/O to announce Gemini for Science, an agent-driven system that can orchestrate specialized models like WeatherNext rather than replace them outright — a meaningful architectural pivot from purpose-built scientific AI toward general-purpose agentic pipelines. Both moves point in the same direction: the human is increasingly a reviewer (or not even that) rather than a builder. For developers, the near-term risk isn't job loss but liability — unreviewed code in production is a security and correctness debt that compounds fast. Watch whether Anthropic publishes any safety or code-quality benchmarks for Claude Code's autonomous output; right now, "almost half the room" is a vibe, not a validation.

The Verge - AI

Standalone: Google AI Overviews search quality issues

Model interpreted single words as instruction tokens

Google's AI Overviews briefly treated single-word queries like "disregard," "ignore," and "skip" as conversational resets rather than search terms, returning chatbot-style responses such as "Got it. If you need anything else, just let me know!" — a clear sign the underlying model was interpreting these words as instruction tokens rather than dictionary lookup targets. Google confirmed to Android Authority that the system was "misinterpreting some action-related queries" and pledged a fix, though as of Friday afternoon "ignore" and "skip" were still producing broken outputs. The failure is a textbook prompt-injection edge case: words that function as system-level directives in instruction-tuned LLMs are colliding with a retrieval UI that was never designed to handle that ambiguity.

VentureBeat - AI

Standalone: Google search box redesign

25-year-old keyword search paradigm retired

Google retired the 25-year-old keyword search paradigm at I/O 2026, replacing the static text box with a multimodal input layer that accepts text, images, PDFs, videos, and live Chrome tab context simultaneously. The company also collapsed AI Overviews and AI Mode into a single unified search flow, eliminating the toggle friction that had kept the two experiences siloed. This isn't a UI refresh — it's Google repositioning the search box itself as a context aggregator, a direct architectural response to ChatGPT and Perplexity eating into query volume for informational and research tasks. For developers and founders building SEO-dependent products or search-adjacent tooling, the unified AI flow means the traditional blue-link result is no longer the default destination — it's a fallback.