Best of your X follows: GeneBench-Pro, loop engineering, and prompt markers (2026)

Today's scan was thin but usable: three original X posts made the cut, plus two labeled developer fallbacks from Simon Willison and Hacker News. Pure retweets, context-light image posts, and non-AI small talk were left out.

Research and evaluation

OpenAI: GeneBench-Pro tests scientific judgment, not just task execution

OpenAI introduced GeneBench-Pro, a biology benchmark for agents that must choose analysis paths, handle messy datasets, and make judgment calls in computational research [1].
The benchmark has 129 problems across 10 computational-biology domains; 82 questions were sent to outside domain experts for review [2].
OpenAI says GPT-5.6 Sol reaches 28.7% at the highest reasoning level, or 31.5% with Pro mode, while a typical problem was estimated to take a human expert 20 to 40 hours [2].

OpenAI's post is the primary signal.

Developer tools and agent workflows

Andrew Ng: the new unit of agentic coding is the loop

Andrew Ng framed "loop engineering" as the next practical pattern for agentic software work: agents write, test, and iterate until a product spec is met [3].
His three loops are agentic coding, developer feedback, and external feedback; the fast loop runs in minutes, while user or production feedback can take hours to weeks [3].
The useful shift is role design: engineers spend less time acting as QA for coding agents and more time deciding features, UI direction, and what feedback should change the spec [3].
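The innermost of those loops (write, test, iterate until the spec is met) can be sketched in a few lines of Python. This is an illustration under loud assumptions, not code from Ng's post: `run_coding_loop`, `toy_agent`, and `toy_check` are hypothetical names, and the "agent" is a stand-in that fixes one reported failure per pass instead of calling a model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LoopResult:
    passed: bool
    iterations: int
    history: List[str] = field(default_factory=list)


def run_coding_loop(
    propose: Callable[[str, Optional[str], List[str]], str],
    check: Callable[[str, str], List[str]],
    spec: str,
    max_iterations: int = 5,
) -> LoopResult:
    """Propose a candidate, test it, feed the failures back, and repeat."""
    previous: Optional[str] = None
    failures: List[str] = []
    history: List[str] = []
    for iteration in range(1, max_iterations + 1):
        candidate = propose(spec, previous, failures)
        history.append(candidate)
        failures = check(spec, candidate)
        if not failures:
            # The spec is met, so the loop stops without human QA.
            return LoopResult(True, iteration, history)
        previous = candidate
    # Budget exhausted: this is where a developer would step in.
    return LoopResult(False, max_iterations, history)


def toy_agent(spec: str, previous: Optional[str], failures: List[str]) -> str:
    """Hypothetical stand-in for a coding agent: fixes one failure per pass."""
    if previous is None:
        return spec.split(",")[0]
    return previous + "," + failures[0]


def toy_check(spec: str, candidate: str) -> List[str]:
    """Hypothetical stand-in for a test suite: lists features still missing."""
    built = candidate.split(",")
    return [feature for feature in spec.split(",") if feature not in built]


result = run_coding_loop(toy_agent, toy_check, "login,search,export")
print(result.passed, result.iterations, result.history[-1])
```

The iteration cap is the design choice that matters here: when the fast loop cannot converge, control passes to the slower developer-feedback loop rather than spinning indefinitely.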

The full X post is long enough to read as a mini-essay.

Simon Willison fallback: agents can now produce their own product demos

Simon Willison released shot-scraper 1.10 with a shot-scraper video command that takes a storyboard.yml routine and records a Playwright video of a web app [4].
The demo in the post exercises a Datasette branch that creates tables from pasted CSV, TSV, or JSON data; Willison says the storyboard was constructed by GPT-5.5 xhigh running in Codex Desktop [4].
The detail worth stealing: --help output can act like a small instruction manual for an agent, letting a CLI teach the agent how to use it without a separate integration layer [4].
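One way to picture the --help idea is a helper that captures a tool's help text and pastes it into the agent's prompt. The sketch below is an assumption about how such glue could look, not Willison's setup: `help_text` and `build_tool_prompt` are hypothetical names, and the demo uses Python's built-in `json.tool` module so that it runs without shot-scraper installed.

```python
import subprocess
import sys
from typing import List


def help_text(command: List[str], timeout: float = 10.0) -> str:
    """Run `<command> --help` and return whatever the tool prints."""
    proc = subprocess.run(
        command + ["--help"],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    # Some CLIs write help to stderr, so fall back to it when stdout is empty.
    return proc.stdout.strip() or proc.stderr.strip()


def build_tool_prompt(name: str, command: List[str], task: str) -> str:
    """Embed the tool's own help output in a prompt for an agent."""
    manual = help_text(command)
    return (
        f"You can run the `{name}` command-line tool.\n"
        f"Its --help output is:\n\n{manual}\n\n"
        f"Task: {task}\n"
    )


# Demo with a CLI that ships with Python; any other command list works the same way.
demo_prompt = build_tool_prompt(
    "json.tool",
    [sys.executable, "-m", "json.tool"],
    "Pretty-print data.json",
)
print(demo_prompt)
```

With shot-scraper 1.10 installed, passing `["shot-scraper", "video"]` as the command should work the same way, assuming that subcommand prints its help on `--help`.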

Business and organization design

Ethan Mollick: AI gains will not capture themselves

Ethan Mollick argued that organizations will face the same problem with capable AI that high-human-capital firms face with talented employees: setup determines whether capability turns into value [5].
The post is short, but the point is concrete: better models do not automatically improve output if work allocation, review, incentives, and decision rights stay unchanged [5].
Read it next to Ng's loop post: one is about product-level iteration, the other is about the company-level machinery needed to keep those loops from becoming isolated experiments.

Mollick's post is the cleanest organization-design signal in today's X pool.

Trust, privacy, and agent clients

Hacker News fallback: Claude Code prompt markers became the day's security debate

A reverse-engineering post on Hacker News claims Claude Code 2.1.196 can alter the date sentence in its system prompt based on ANTHROPIC_BASE_URL and timezone, encoding a small marker through punctuation and date separators [6].
The author says the marker stays inactive and the sentence reads normally for official Anthropic API use, but custom gateways, local proxies, model routers, or reseller domains can trigger the classification behavior [6].
The submission had 578 points and 181 comments on Hacker News at capture, making it the clearest current community fallback for agent-client trust and privacy [7].
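To make "a marker encoded through punctuation and date separators" concrete, here is a toy two-bit scheme. It is entirely hypothetical and is not the scheme the post attributes to Claude Code: the sentence wording, the separator choices, and the names `encode_marker`, `decode_marker`, and `is_canonical` are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical scheme: bit 0 picks the date separator, bit 1 picks the ending.
SEPARATORS = {0: "-", 1: "/"}
ENDINGS = {0: ".", 1: ""}


def encode_marker(day: date, marker: int) -> str:
    """Hide a value from 0 to 3 in the formatting of a date sentence."""
    if not 0 <= marker <= 3:
        raise ValueError("marker must fit in two bits")
    sep = SEPARATORS[marker & 1]
    end = ENDINGS[(marker >> 1) & 1]
    return f"Today's date is {day.year}{sep}{day.month:02d}{sep}{day.day:02d}{end}"


def decode_marker(sentence: str) -> int:
    """Recover the two bits from the separator and the trailing punctuation."""
    low = 1 if "/" in sentence else 0
    high = 0 if sentence.endswith(".") else 1
    return low | (high << 1)


def is_canonical(sentence: str) -> bool:
    """True when the sentence carries no marker under this toy scheme."""
    return decode_marker(sentence) == 0


plain = encode_marker(date(2026, 1, 15), 0)
marked = encode_marker(date(2026, 1, 15), 3)
print(plain)
print(marked)
```

Both sentences state the same date, which is why a marker like this is easy to miss. An audit along these lines would compare the date sentence a client actually sends against the single format it is expected to use.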

Quick cut list

Source	Included?	Reason
OpenAI / GeneBench-Pro	Yes	Original X post plus a readable official announcement with benchmark numbers.
Andrew Ng / loop engineering	Yes	Long original post with enough detail to summarize directly.
Ethan Mollick / org design	Yes	Short but self-contained, and it connects cleanly to the agent-workflow cluster.
Simon Willison / shot-scraper video	Yes, fallback	In-window developer-tooling post from the configured fallback source.
HN / Claude Code prompt markers	Yes, fallback	Current, high-engagement AI/security discussion with a readable original post.
Yann LeCun, Paul Graham, Naval, Google DeepMind	No	Mostly pure retweets, non-AI posts, or context-light fragments in this window.
