Diffusion Language Models: An Experimental Analysis
A new head-to-head study benchmarks eight diffusion language models across reasoning, coding, translation, knowledge, and structured tasks, explicitly weighing generation quality against compute efficiency. The authors show that inference-time knobs such as denoising steps, context length, and parallel unmasking can dominate outcomes, forcing clear trade-offs between performance and deployment cost.
Hidden Anchors in Multi-Agent LLM Deliberation
The paper models multi-agent LLM deliberation as a closed-loop dynamical system in which each agent has a “hidden anchor” belief that continually pulls on its opinion. It demonstrates how recovered anchors explain a consensus-avoiding behavior, in which confidence in the correct answer can climb beyond the convex hull of initial beliefs, and it provides a spectrum-based test for when deliberation is truly anchor-driven.
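To make the convex-hull point concrete, here is a minimal sketch that assumes a Friedkin-Johnsen-style update, which is a standard anchored opinion model and not necessarily the paper's own formulation. The function name `deliberate`, the uniform listening weights, the pull strength of 0.3, and all numeric values are hypothetical and chosen only for illustration.

```python
# Illustrative only: a Friedkin-Johnsen-style anchored update, assumed here
# as a stand-in for the paper's closed-loop model. All values are hypothetical.

def deliberate(beliefs, anchors, pull, weights, steps):
    """Iterate x_i <- (1 - pull_i) * sum_j weights[i][j] * x_j + pull_i * anchor_i."""
    x = list(beliefs)
    for _ in range(steps):
        x = [
            (1 - pull[i]) * sum(w * xj for w, xj in zip(weights[i], x))
            + pull[i] * anchors[i]
            for i in range(len(x))
        ]
    return x

# Stated confidence in the correct answer at the start of deliberation.
initial = [0.4, 0.5, 0.6]
# Hidden anchors sit above every stated belief (an assumption for this demo).
anchors = [0.9, 0.9, 0.9]
# How strongly each agent is pulled toward its own anchor at every round.
pull = [0.3, 0.3, 0.3]
# Row-stochastic listening weights: each agent averages all agents equally.
weights = [[1 / 3] * 3 for _ in range(3)]

final = deliberate(initial, anchors, pull, weights, steps=200)
print(final)
print(max(initial), min(final))
```

Pure averaging with no anchor could never leave the interval spanned by the initial beliefs. In this toy setup the anchors lie outside that interval, so every agent's confidence ends above the highest starting belief.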
Measuring Curriculum Alignment across Topical Coverage, Competency, and Cognitive Depth
Researchers build a human-in-the-loop pipeline to measure how well an undergraduate CS program covers official curriculum guidelines, then track how coverage changes between CS2013 and CS2023. The study finds overall coverage stays near-constant (~50%), but competency depth expectations rise under the newer standard, revealing structural gaps and guideline evolution rather than “program drift.”
ITNet: A Learnable Integral Transform That Subsumes Convolution, Attention, and Recurrence
This work proposes the Integral Transform Network (ITNet), arguing that today’s architectural families—convolution, attention, and recurrence—are special cases of a single learnable operator. With efficient kernel fusion and scalable approximation methods, the authors show one unified architecture can match or exceed specialized baselines across vision and language benchmarks.
Uncertainty Decomposition for Clarification Seeking in LLM Agents
Instead of treating uncertainty as one monolithic signal, the paper decomposes it into action confidence and request uncertainty to decide when an agent should ask clarifying questions. Across clarification-augmented WebShop and ALFWorld benchmarks, the approach boosts clarification F1 substantially and generalizes across multiple LLM backbones without logprob sampling or extra training.
Emergent Alignment
The authors introduce an online alignment technique that adds a “conscience step” in which the LLM reviews its own reasoning and outputs, then steers training away from unethical behavior using DPO. They report a pathway to “Emergent Alignment,” aiming to make self-correction work even under previously observed emergent-misalignment scenarios.
Deontic Policies for Runtime Governance of Agentic AI Systems
As agentic systems move from prototypes into real enterprises, this work tackles governance beyond simple allow/deny rules by using deontic policy constructs (obligations, dispensations, and conflict resolution). The proposed AgenticRei runtime evaluates an OWL-based policy language outside the LLM to constrain both tool use and inter-agent messaging in a way common policy engines can’t express.
LLM Doesn’t Know What It Doesn’t Know: Detecting Epistemic Blind Spots via Cross-Model Attribution Divergence on Clinical Tabular Data
Using cross-model attribution divergence, the paper examines whether LLMs recognize the limits of their knowledge on structured clinical prediction tasks. It finds verbalized confidence is largely epistemically vacuous, while interventions based on few-shot and feature evidence plus a cross-model calibrator can replace vague confidence with patient-specific reliability estimates.
REVEAL++: Differentiable Phenotypic Grouping for Vision-Language Retinal Modeling of Alzheimer’s Disease Risk
REVEAL++ replaces hard phenotypic grouping with a differentiable “soft multi-positive” contrastive learning formulation that weights similarity continuously from both retinal and risk-profile embeddings. On UK Biobank data, this continuous phenotypic structure yields consistent gains over discrete grouping and standard vision-language baselines for incident Alzheimer’s risk prediction.
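The idea of soft multi-positive targets can be sketched as follows. This is a hedged illustration and not the REVEAL++ implementation: the equal 0.5/0.5 blend of retinal and risk-profile similarity, the temperatures, the function name `soft_multi_positive_loss`, and the toy embeddings are all assumptions introduced here.

```python
import math

def softmax(row, temperature):
    """Numerically stable softmax over a list of scores."""
    scaled = [v / temperature for v in row]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def soft_multi_positive_loss(image_emb, text_emb, risk_emb, tau=0.1, tau_target=0.1):
    """Cross-entropy between soft targets and image-to-text match probabilities.

    Soft targets replace hard group membership: sample j counts as a partial
    positive for sample i in proportion to a blend of retinal and risk-profile
    similarity (the equal blend is an assumption of this sketch).
    """
    n = len(image_emb)
    total = 0.0
    for i in range(n):
        blended = [
            0.5 * cosine(image_emb[i], image_emb[j])
            + 0.5 * cosine(risk_emb[i], risk_emb[j])
            for j in range(n)
        ]
        targets = softmax(blended, tau_target)
        logits = [cosine(image_emb[i], text_emb[j]) for j in range(n)]
        probs = softmax(logits, tau)
        total += -sum(t * math.log(p) for t, p in zip(targets, probs))
    return total / n

# Toy embeddings for three samples (hypothetical values).
image_emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
text_emb = [[1.0, 0.1], [0.8, 0.2], [0.1, 1.0]]
risk_emb = [[0.2, 0.8], [0.3, 0.7], [0.9, 0.1]]

loss = soft_multi_positive_loss(image_emb, text_emb, risk_emb)
print(loss)
```

With hard grouping, each target row would put equal weight on same-group samples and zero elsewhere. The soft version keeps every sample in play with a continuous weight.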
DeXposure-Claw: An Agentic System for DeFi Risk Supervision
This paper introduces DeXposure-Claw, a regulator-aligned agentic supervision system that routes LLM decisions through forecast-grounded evidence rather than freeform reasoning. A time-series exposure model produces typed alerts and scenario evidence, while confidence/data-health gates restrict escalation—paired with a benchmark designed to quantify false-intervention rates against loss-ground-truth.
Barret Zoph is out at OpenAI again after just five months
The Verge reports that Barret Zoph, OpenAI’s enterprise AI sales leader who returned in mid-January, has departed again only five months later. The move follows OpenAI’s recent enterprise push and its effort to refocus priorities ahead of an expected IPO.
Luca Guadagnino’s film about Sam Altman has been dropped by Amazon MGM
Amazon MGM has reportedly dropped Luca Guadagnino’s film “Artificial,” which chronicles the five-day rollercoaster of Sam Altman’s termination and reinstatement. The studio says the movie may be better served with a different release partner as it continues working on the project.
A startup claims it broke through a bottleneck that’s holding back LLMs
Technology Review covers Subquadratic, a startup that emerged from stealth claiming to have solved a long-standing mathematical bottleneck limiting aspects of LLM progress. The article notes the technical details were initially thin, but the company has begun sharing supporting evidence to challenge skepticism.