AI News Daily Digest (26-06-27)

Accelerating Skill Assessment in Chess: A Drift-Diffusion-Enhanced Elo Rating System

DD-Elo extends traditional Elo by folding move-by-move information into rating updates through a framework inspired by drift-diffusion models. The aim is to track fast skill shifts without being drowned out by noisy game-state variation. The authors prove bounded deviation from classic Elo and show faster adaptation to true skill changes, with explainability and backward compatibility for chess ecosystems as design goals.
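
For orientation, here is a minimal sketch of the classic Elo update that DD-Elo is said to stay close to. It is the standard baseline formula, not the DD-Elo update itself, which this summary does not spell out. The function names and the K-factor of 20 are assumptions chosen for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Classic Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 20.0) -> float:
    """Return A's new rating after one game.

    score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k (assumed 20 here) controls how fast ratings move.
    """
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))


if __name__ == "__main__":
    # Two equally rated players: the winner gains k / 2 points.
    print(elo_update(1500, 1500, 1.0))
```

Because the update only sees the final result, a rating moves at most `k` points per game, which is the slow-adaptation problem that move-level signals are meant to address.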

Governing Actions, Not Agents: Institutional Attestation as a Governance Model for Autonomous AI Systems

This paper proposes a governance model that leaves autonomous agents free to plan and reason but withholds execution authority for high-risk actions such as clinical prescribing or production deployments. Execution instead requires independently attested preconditions that are cryptographically bound to the agent’s declared intent, enforced by deterministic policy checks, and recorded in a tamper-evident log for re-verification.
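
The gating idea can be sketched roughly as follows. This is an illustrative toy, not the paper's protocol: every name here (`attest`, `gate`, `append_entry`, `verify_log`, `ATTESTOR_KEY`) is hypothetical, and an HMAC with a shared key stands in for whatever signature scheme an independent attestor would really use.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret; a real deployment would likely use asymmetric signatures.
ATTESTOR_KEY = b"demo-key"
GENESIS = "0" * 64


def intent_digest(intent: dict) -> str:
    """Hash a canonical JSON encoding of the agent's declared intent."""
    canonical = json.dumps(intent, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def attest(intent: dict, precondition: str, key: bytes = ATTESTOR_KEY) -> str:
    """Attestor side: bind a verified precondition to one specific intent."""
    message = f"{intent_digest(intent)}|{precondition}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def gate(intent: dict, precondition: str, tag: str, key: bytes = ATTESTOR_KEY) -> bool:
    """Deterministic check: allow execution only if the tag matches this intent."""
    return hmac.compare_digest(attest(intent, precondition, key), tag)


def append_entry(log: list, record: dict) -> None:
    """Append a record whose hash covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    body = json.dumps(record, sort_keys=True)
    entry_hash = hashlib.sha256((prev + body).encode()).hexdigest()
    log.append({"prev": prev, "record": record, "hash": entry_hash})


def verify_log(log: list) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev = GENESIS
    for entry in log:
        body = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev + body).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True
```

The point of the sketch is that the agent's reasoning never enters the check: the gate sees only the declared intent, the attested precondition, and the tag, so changing the intent after attestation makes the check fail.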

OpenAI unveils GPT-5.6 amid US AI regulatory drama

OpenAI released a limited preview of GPT-5.6 after earlier reports that the Trump administration had requested changes to the release plan’s timing and access model. The suite includes Sol, Terra, and Luna. OpenAI positions Sol as its flagship for long-horizon agentic work and claims strengths in coding, cybersecurity, and biology, alongside an emphasis on safety.

Previewing GPT-5.6 Sol: a next-generation model

OpenAI’s own preview details GPT-5.6 Sol’s tiered product structure and capability focus, describing improved performance for coding, scientific problem-solving, and security-related tasks. It also pairs the model rollout narrative with a discussion of safety stack improvements, framing the update as a practical step toward more capable yet controlled agent workflows.

OpenAI will delay GPT-5.6 after Trump administration request

Reporting says the Trump administration asked OpenAI to stagger GPT-5.6’s release due to security concerns, leading to a limited preview approach rather than an immediate broad rollout. The account describes enterprise access being handled in a case-by-case manner during the preview window, underscoring how regulation-and-security negotiations are increasingly shaping model availability.

Anthropic’s Mythos mess is only getting worse

Anthropic’s Mythos-class models remain offline following a Trump administration ultimatum, with the story emphasizing mounting uncertainty after two weeks of negotiations reportedly failing to produce clarity. The coverage spotlights the risk of prolonged operational disruption and the possibility of broader regulatory action, making the pause feel less like a temporary fix and more like an escalating standoff.

The Verification Horizon: No Silver Bullet for Coding Agent Rewards

Instead of treating verification as automatically “easier than generation,” the paper argues that today’s coding agents can produce lots of candidate solutions while reliable verification remains the bottleneck. It breaks verification quality into scalability, faithfulness, and robustness, then analyzes multiple reward/verifier constructions to show how reward design must co-evolve with rising model capability to avoid reward hacking and signal saturation.

Detecting and Controlling Sycophancy with Cascading Linear Features

The work tackles sycophancy by building an iterative feature-isolation pipeline that finds linear subspaces whose activity scales with behavior strength, rather than relying on simple binary “yes/no” contrastive pairs. Experiments show the discovered cascading features support detection, deterministic scoring, and robust steering with interpretability advantages, often matching or beating prompt-based and LLM-judge baselines at lower compute cost.
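
To make "detection, scoring, and steering" with a linear feature concrete, here is a generic sketch using a single direction in activation space. It does not reproduce the paper's cascading pipeline, which this summary does not detail. The function names, the toy two-dimensional vectors, and the idea of steering by adding a scaled direction are assumptions for illustration.

```python
import math


def unit(vector: list) -> list:
    """Normalize a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def feature_score(activation: list, direction: list) -> float:
    """Deterministic score: projection of the activation onto the feature direction."""
    d = unit(direction)
    return sum(a * b for a, b in zip(activation, d))


def steer(activation: list, direction: list, alpha: float) -> list:
    """Shift the activation along the feature direction by alpha."""
    d = unit(direction)
    return [a + alpha * b for a, b in zip(activation, d)]


if __name__ == "__main__":
    direction = [3.0, 4.0]   # hypothetical "sycophancy" direction
    activation = [1.0, 2.0]  # hypothetical hidden state
    s = feature_score(activation, direction)
    # Steering by -s removes the component along the direction.
    print(s, feature_score(steer(activation, direction, -s), direction))
```

A score that varies continuously with projection length is what lets such a feature track behavior strength instead of a binary label.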

Knowledge-augmented Agentic AI for Mental Health Medication Information Seeking

This paper builds a provenance-aware, knowledge-graph-based multi-agent system that unifies regulatory adverse-event data with patient-generated sources (WebMD reviews and Reddit posts) for nine antidepressants. By grounding claims in standardized vocabularies (ATC-N, ICD-10, MedDRA) and preserving provenance distinctions, it reports strong entity recognition performance and finds community reports can surface adverse signals well before corresponding FDA dates.

Life After Benchmark Saturation: A Case Study of CORE-Bench

As benchmark accuracy saturates, this work argues evaluation must expand beyond “did it get the answer right?” to include construct validity, robustness, efficiency, reliability, and uplift from human-agent collaboration. Using CORE-Bench Hard as a case study, it surfaces hidden validity threats, introduces CORE-Bench v1.1 and an out-of-distribution suite, and reports about a 2x speedup from human-agent collaboration on real reproducibility tasks.

AlgoEvolve: LLM-driven Meta-evolution of Algorithmic Trading Programs

AlgoEvolve treats program synthesis as an evolutionary loop driven by an LLM, generating and iteratively improving executable trading strategies evaluated through rigorous testing. The system learns emergent regime-adaptive trading logic and adds an outer meta-evolution loop that evolves prompting heuristics, improving exploration-exploitation balance while reducing zero-trade failures versus initial human-designed instructions.

Governing Actions, Not Agents: Institutional Attestation as a Governance Model for Autonomous AI Systems

Autonomous agents can execute high-impact actions that are difficult to monitor at the reasoning level, so the paper reframes governance around what is executed and under what attested preconditions. It formalizes a cryptographic, independently attested execution gating mechanism with tamper-evident logging, presenting a proof-of-concept for both software deployment and clinical prescribing use cases.

Run a vLLM Server on HF Jobs in One Command

This practical guide shows how to run a vLLM server using Hugging Face Jobs with a one-command workflow, targeting developers who want quick, reproducible deployment setups. The emphasis is on reducing operational friction, so teams can move from experimentation to serving with fewer configuration steps.
