Midjourney Medical’s Ultrasound Scanner: The Promising Tech, the Missing Proof
Midjourney Medical shows a behind-the-scenes build of its dunk-tank ultrasound scanner: a hacked-together array of ultrasound probes paired with off-the-shelf compute. The video is a rare look at the hardware pipeline, but it leaves the big question open: where are the results proving reliability, accuracy, and clinical readiness at scale?
Diffusion Language Models for Interactive Radiology Report Drafting
DiffusionGemma-26B adapts diffusion-based text generation to medical visual question answering. It finds that, under the same LoRA recipe, the diffusion model matches or beats its autoregressive sibling. The bigger “radiology-friendly” leap is infill: clinicians can pin down fixed text fragments, and the model fills in the gaps between them in any order. That capability is inherently easier for diffusion than for left-to-right next-token generation.
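To make “any-order infill” concrete, here is a minimal sketch of confidence-ordered unmasking. The clinician's fragments stay fixed, and at each step the most confident masked slot is committed, wherever it sits in the sentence. The `toy_denoiser` is hypothetical and stands in for a real diffusion LM forward pass. The names `infill`, `MASK`, and `REFERENCE` are illustrative, not DiffusionGemma's API.

```python
from typing import Callable, Dict, List, Optional, Tuple

MASK = None  # marks a position the model still has to generate

# A denoiser maps the partially filled sequence to a (token, confidence)
# proposal for every masked slot. A diffusion LM would produce all of these
# in one forward pass; here it is any callable passed in (hypothetical).
Denoiser = Callable[[List[Optional[str]]], Dict[int, Tuple[str, float]]]


def infill(seq: List[Optional[str]], denoise: Denoiser,
           per_step: int = 1) -> List[Optional[str]]:
    """Fill masked slots in confidence order, never touching fixed tokens."""
    seq = list(seq)  # copy so the clinician's draft is left unchanged
    while any(tok is MASK for tok in seq):
        proposals = denoise(seq)
        if not proposals:
            raise ValueError("denoiser returned no proposals for masked slots")
        # Commit the most confident proposals first, regardless of position.
        ranked = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)
        for pos, (tok, _conf) in ranked[:per_step]:
            seq[pos] = tok
    return seq


# Hypothetical toy denoiser: it "knows" the target sentence and is more
# confident about slots that sit next to already-fixed text.
REFERENCE = ["No", "focal", "consolidation", "or", "pleural", "effusion", "."]


def toy_denoiser(seq: List[Optional[str]]) -> Dict[int, Tuple[str, float]]:
    known = [i for i, tok in enumerate(seq) if tok is not MASK]
    proposals = {}
    for i, tok in enumerate(seq):
        if tok is MASK:
            dist = min(abs(i - k) for k in known) if known else len(seq)
            proposals[i] = (REFERENCE[i], 1.0 / (1 + dist))
    return proposals


draft = ["No", MASK, MASK, "or", MASK, MASK, "."]  # clinician-fixed fragments
print(" ".join(infill(draft, toy_denoiser)))
```

An autoregressive model would have to regenerate everything after the first gap. Here the fixed words “or” and “.” constrain the slots on both sides of them.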
RLVR Proof of Concept for Tool-Use Agents on Atlassian Workflows
This work tests Reinforcement Learning with Verifiable Rewards (RLVR), with rewards computed directly from tool-call traces. It uses synthetic Jira REST v3 and Confluence v2 environments with schema-faithful checks. In scenarios where the reward signal is well-behaved, RL-trained policies climb from baseline rewards of roughly 0.35–0.92 to 0.95–1.00. That points to a path for outcome-optimized small models that needs neither live APIs nor human labeling.
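A “verifiable reward” here means a deterministic function of the trace, not a learned judge. The sketch below assumes a heavily simplified check loosely modeled on Jira's REST v3 create-issue call. The expected method/path, the required field types, and the partial-credit scoring are all illustrative assumptions, not the paper's actual checks.

```python
from typing import Any, Dict, List

# Hypothetical, simplified check loosely modeled on Jira's REST v3
# create-issue call; the paper's schema-faithful checks are richer.
EXPECTED_METHOD = "POST"
EXPECTED_PATH = "/rest/api/3/issue"
REQUIRED_FIELDS = {"project": dict, "summary": str, "issuetype": dict}


def verifiable_reward(trace: List[Dict[str, Any]], goal_summary: str) -> float:
    """Score a tool-call trace in [0, 1] using only deterministic checks."""
    creates = [c for c in trace
               if c.get("method") == EXPECTED_METHOD and c.get("path") == EXPECTED_PATH]
    checks = [len(creates) == 1]  # exactly one create call, no duplicates
    fields = creates[0].get("body", {}).get("fields", {}) if creates else {}
    for name, expected_type in REQUIRED_FIELDS.items():
        checks.append(isinstance(fields.get(name), expected_type))  # schema check
    checks.append(fields.get("summary") == goal_summary)  # outcome check
    return sum(checks) / len(checks)


good_trace = [
    {"method": "GET", "path": "/rest/api/3/project/OPS"},
    {"method": "POST", "path": "/rest/api/3/issue",
     "body": {"fields": {"project": {"key": "OPS"},
                         "summary": "Rotate expiring TLS certificate",
                         "issuetype": {"name": "Task"}}}},
]
print(verifiable_reward(good_trace, "Rotate expiring TLS certificate"))
print(verifiable_reward([], "Rotate expiring TLS certificate"))
```

Partial credit gives the policy a denser signal than a pass/fail bit. A stricter variant could return `float(all(checks))` instead.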
Auto-FL-Research: Agentic Search for Federated Learning Algorithms
Auto-FL-Research (AFR) uses constrained coding agents to search over federated learning “recipes” within a fixed task profile. A recipe covers aggregation rules, client update schedules, objectives, and model variants. Experiments across healthcare FL benchmarks show gains on many tasks. They also reveal seed-sensitive failures, and some improvements come from the same isolated mechanisms recurring rather than from general algorithmic breakthroughs.
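Seed-sensitive “wins” are a known hazard in automated search. One simple guard is to score each candidate recipe across several seeds and rank by mean minus a multiple of the standard deviation. The sketch below assumes a tiny hypothetical recipe space, where the names are labels only, and a made-up `toy_evaluate`. AFR's own search is agent-driven and not described at this level of detail.

```python
import itertools
import statistics
from typing import Callable, Dict, List, Tuple

Recipe = Dict[str, object]

# Hypothetical recipe space. In AFR, coding agents edit aggregation rules,
# client update schedules, objectives, and model variants; here they are labels.
SPACE: Dict[str, List[object]] = {
    "aggregator": ["fedavg", "fedprox", "median"],
    "local_epochs": [1, 5],
}


def robust_search(evaluate: Callable[[Recipe, int], float],
                  seeds: List[int], k_std: float = 1.0) -> Tuple[Recipe, float]:
    """Return the recipe with the best (mean - k_std * std) score across seeds."""
    keys = list(SPACE)
    best: Recipe = {}
    best_score = float("-inf")
    for values in itertools.product(*(SPACE[k] for k in keys)):
        recipe = dict(zip(keys, values))
        scores = [evaluate(recipe, seed) for seed in seeds]
        score = statistics.mean(scores) - k_std * statistics.pstdev(scores)
        if score > best_score:
            best, best_score = recipe, score
    return best, best_score


def toy_evaluate(recipe: Recipe, seed: int) -> float:
    # Made-up scores: one recipe only shines on seed 0 (a seed-sensitive "win").
    if recipe == {"aggregator": "median", "local_epochs": 5}:
        return 0.95 if seed == 0 else 0.40
    if recipe == {"aggregator": "fedprox", "local_epochs": 5}:
        return 0.80
    return 0.70


print(robust_search(toy_evaluate, seeds=[0]))        # picks the seed-sensitive recipe
print(robust_search(toy_evaluate, seeds=[0, 1, 2]))  # picks the stable one
```

With a single seed, the fragile recipe looks best. With three seeds, its variance penalty drops it below the steady alternative.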
Anthropic Wants to Develop Its Own Drugs
Anthropic’s “Claude Science” positions its AI workbench as a unified environment for scientific tools, datasets, and figure generation, aiming to accelerate discovery and healthcare intervention development. The company also signals an ambition to move beyond assistance into drug development itself, setting up a closer relationship between foundation models and biotech execution.
Wiola: A Fully Novel Small Language Model Architecture
Wiola claims a ground-up SLM design with no structural lineage to major families such as GPT, LLaMA, or Mistral. It is built from five novel components targeting positional encoding, attention, and stability. Among them, Spiral Rotary Positional Encoding and gated cross-layer attention target long-range coherence. Adaptive token merging reduces attention cost by shrinking the sequence, and a modified RMSNorm helps prevent representation collapse.
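Wiola's modified RMSNorm and its adaptive merging rule are not specified here. As a reference point, the sketch shows the standard RMSNorm formula, y = g · x / sqrt(mean(x²) + ε), and a simplified greedy merge of near-duplicate neighbouring tokens. The cosine threshold of 0.95 and the size-weighted averaging are assumptions for illustration, not Wiola's design.

```python
import math
from typing import List

Vector = List[float]


def rms_norm(x: Vector, gain: Vector, eps: float = 1e-6) -> Vector:
    """Standard RMSNorm: divide by the root-mean-square, then apply a learned gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def merge_adjacent(tokens: List[Vector], threshold: float = 0.95) -> List[Vector]:
    """Average runs of near-duplicate neighbouring tokens (size-weighted)."""
    merged: List[Vector] = []
    sizes: List[int] = []
    for tok in tokens:
        if merged and cosine(merged[-1], tok) >= threshold:
            n = sizes[-1]
            merged[-1] = [(m * n + t) / (n + 1) for m, t in zip(merged[-1], tok)]
            sizes[-1] = n + 1
        else:
            merged.append(list(tok))
            sizes.append(1)
    return merged


tokens = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.0, 1.0]]
merged = merge_adjacent(tokens)
# Self-attention scores every pair of tokens, so its cost scales with n * n.
print(len(tokens) ** 2, "->", len(merged) ** 2)
print(rms_norm([3.0, 4.0], gain=[1.0, 1.0]))
```

Because attention cost is quadratic in sequence length, halving the token count here cuts pairwise scores by a factor of four.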
World Feedback for Clinical Agents: Diagnosing RL in FHIR Environments
This paper audits MedAgentBench and finds a high “silent-finish” ceiling: because the evaluation does not sufficiently penalize inaction, an RL agent can learn to finish without acting. It introduces MedAgentBench-v3 with better task coverage. It also shows that RL training is held back by capability and format-knowledge barriers, suggesting a split fix: targeted SFT for code and format knowledge, plus RL for conditional decision logic.
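A simple way to measure a silent-finish ceiling is to grade a “null” agent that finishes immediately with no writes. The task list, the FHIR resource names used as expected writes, and both graders below are hypothetical. They only illustrate how a grader that checks “no wrong writes” rewards inaction, while one that checks “exactly the right writes” does not.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Task:
    task_id: str
    # FHIR resources the agent is expected to write; empty means "no action needed".
    expected_writes: Set[str] = field(default_factory=set)


Grader = Callable[[Task, Set[str]], bool]


def lenient_grade(task: Task, writes: Set[str]) -> bool:
    # Only penalizes wrong writes, so doing nothing can never be wrong.
    return writes <= task.expected_writes


def strict_grade(task: Task, writes: Set[str]) -> bool:
    # Requires exactly the expected writes, so inaction fails when action was needed.
    return writes == task.expected_writes


def silent_finish_ceiling(tasks: List[Task], grade: Grader) -> float:
    """Pass rate of a null agent that finishes immediately with no writes."""
    return sum(grade(t, set()) for t in tasks) / len(tasks)


tasks = [
    Task("order-potassium", {"MedicationRequest"}),
    Task("refer-cardiology", {"ServiceRequest"}),
    Task("record-vitals", {"Observation"}),
    Task("check-a1c-no-action", set()),
]
print(silent_finish_ceiling(tasks, lenient_grade))
print(silent_finish_ceiling(tasks, strict_grade))
```

Under the lenient grader, doing nothing scores perfectly, so RL has little reason to learn to act. The strict grader credits inaction only on the task where no action was needed.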
When Service Agents Reconsider: Difficulty-Routed Control in Customer-Service Operations
The work addresses a core risk in autonomous customer-service agents: acting too confidently when instructions, policy constraints, and backend writes interact. It proposes a difficulty-routed control system that escalates only operationally coupled or conflict-heavy requests for targeted reconsideration before consequential writes. This improves reliability while keeping routine sessions fast and low-friction.
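The routing idea can be sketched as a difficulty score that gates an extra reconsideration step before any backend write. The difficulty formula, the threshold, the request fields, and the toy `reconsider` policy (holding back refunds when a conflict is flagged) are all hypothetical. The paper's router is not specified at this level.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Request:
    text: str
    planned_writes: List[str]  # backend mutations the agent intends to make
    policy_conflicts: int = 0  # e.g. requested refund exceeds a policy limit


def difficulty(req: Request) -> int:
    # Hypothetical score: extra coupled writes and policy conflicts raise difficulty.
    coupling = max(0, len(req.planned_writes) - 1)
    return coupling + 2 * req.policy_conflicts


def route(req: Request,
          execute: Callable[[List[str]], List[str]],
          reconsider: Callable[[Request], List[str]],
          threshold: int = 2) -> Tuple[List[str], str]:
    """Fast path for routine requests; re-check the plan before hard ones write."""
    if difficulty(req) < threshold:
        return execute(req.planned_writes), "fast"
    approved = reconsider(req)  # e.g. re-read policy, drop or fix conflicting writes
    return execute(approved), "reconsidered"


def execute(writes: List[str]) -> List[str]:
    return [f"committed:{w}" for w in writes]


def reconsider(req: Request) -> List[str]:
    # Toy policy check: hold back refunds when a policy conflict was detected.
    if req.policy_conflicts:
        return [w for w in req.planned_writes if not w.startswith("refund")]
    return req.planned_writes


simple = Request("update shipping address", ["update_address"])
hard = Request("refund and cancel order",
               ["refund:120", "cancel_order", "update_crm"], policy_conflicts=1)
print(route(simple, execute, reconsider))
print(route(hard, execute, reconsider))
```

The routine request never pays for the extra check. Only the coupled, conflicting request is slowed down, and only before its writes land.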
Agent4cs: A Multi-agent System for Code Summarization in Large Hierarchical Repositories
Agent4cs breaks code summarization into a bottom-up multi-agent pipeline. Agents extract keywords from subfolders, write summaries, and then run a quality-assurance pass to keep output consistent across the repository hierarchy. In the reported evaluations, it improves semantic consistency across folder levels and raises normalized keyword coverage. This tackles the common failure mode where single-model summarizers treat a codebase as flat text.
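“Bottom-up” means a post-order walk: every folder is summarized only after its children. Keyword coverage can then be measured per summary. In the sketch below, the regex keyword extractor and the string-joining “summary” are hypothetical stand-ins for Agent4cs's LLM agents, and the QA pass is not modeled. The coverage function is one plausible reading of “normalized keyword coverage”, not the paper's exact metric.

```python
import re
from typing import Dict, Set, Union

Tree = Dict[str, Union["Tree", str]]  # folder -> subtree, or file name -> source text


def extract_keywords(text: str, min_len: int = 5) -> Set[str]:
    # Hypothetical keyword extractor: identifier-like tokens of a minimum length.
    return {w.lower() for w in re.findall(r"[A-Za-z_]+", text) if len(w) >= min_len}


def summarize(name: str, node: Tree, summaries: Dict[str, str],
              prefix: str = "") -> Set[str]:
    """Post-order walk: children are summarized before their parent folder."""
    path = f"{prefix}/{name}" if prefix else name
    keywords: Set[str] = set()
    for child, value in node.items():
        if isinstance(value, dict):
            keywords |= summarize(child, value, summaries, path)
        else:
            keywords |= extract_keywords(value)
    # Stand-in for an LLM call: the folder summary is built from child keywords.
    summaries[path] = f"{path}: " + ", ".join(sorted(keywords))
    return keywords


def keyword_coverage(summary: str, reference: Set[str]) -> float:
    """Fraction of reference keywords that appear in the summary text."""
    if not reference:
        return 1.0
    return sum(k in summary.lower() for k in reference) / len(reference)


repo: Tree = {
    "auth": {
        "login.py": "def verify_password(user, password): ...",
        "tokens.py": "def issue_token(user): ...",
    },
    "billing": {"invoice.py": "class Invoice: total = compute_total(items)"},
}
summaries: Dict[str, str] = {}
summarize("repo", repo, summaries)
for text in summaries.values():  # children appear before their parents
    print(text)
print(keyword_coverage(summaries["repo/auth"], {"password", "invoice"}))
```

Because parents are built from their children, the root summary inherits keywords from every level rather than from a flat dump of files.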
PACE: A Neuro-Symbolic Framework for Feasibility-Aware Counterfactual Explanations
PACE separates prediction from reasoning, so counterfactual recommendations must pass symbolic feasibility constraints rather than merely flip a classifier’s decision. Pairing a neural model with Answer Set Programming (ASP) rules, it generates explanations that are more realistic and actionable under domain-specific intervention limits. This improves the “plausibility vs. validity” trade-off common in explainable AI.
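The split between validity (the decision flips) and plausibility (the change is allowed) can be sketched with a brute-force search. Candidates must satisfy both conditions and are then ranked by cost. PACE encodes feasibility as ASP rules. Here, plain Python predicates stand in for them, and the linear `approve` model, the features, the 30% income cap, and the step grid are all hypothetical.

```python
import itertools
from typing import Callable, Dict, List

Profile = Dict[str, float]


def approve(p: Profile) -> bool:
    # Hypothetical stand-in for the neural predictor: a fixed linear score.
    return 0.5 * p["income"] - 0.8 * p["debt"] + 0.1 * p["age"] >= 20


# Hypothetical feasibility rules. PACE states such limits as ASP rules; plain
# Python predicates over (original, candidate) play that role here.
RULES: List[Callable[[Profile, Profile], bool]] = [
    lambda old, new: new["age"] == old["age"],              # age is not actionable
    lambda old, new: new["income"] <= 1.3 * old["income"],  # at most a 30% raise
    lambda old, new: new["debt"] >= 0,                      # debt cannot go negative
]


def cost(x: Profile, c: Profile) -> float:
    return sum(abs(c[k] - x[k]) for k in x)


def counterfactuals(x: Profile, steps: Dict[str, List[float]]) -> List[Profile]:
    """Enumerate edits; keep those that flip the decision AND satisfy every rule."""
    keys = list(steps)
    found = []
    for deltas in itertools.product(*(steps[k] for k in keys)):
        cand = dict(x)
        for k, d in zip(keys, deltas):
            cand[k] = x[k] + d
        flips = approve(cand) != approve(x)              # validity
        feasible = all(rule(x, cand) for rule in RULES)  # plausibility
        if flips and feasible:
            found.append(cand)
    return sorted(found, key=lambda c: cost(x, c))


applicant = {"income": 40.0, "debt": 10.0, "age": 30.0}
steps = {"income": [0, 5, 10, 15, 20], "debt": [0, -5, -10], "age": [0, 20]}
for cf in counterfactuals(applicant, steps)[:3]:
    print(cf)
```

A validity-only search would also accept edits such as a 50% income jump or a change in age. The rules filter those out before ranking.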
CreativityNeuro: Steering Model Weights to Improve Divergent Thinking
CreativityNeuro boosts divergent thinking in LLMs using data-free contrastive weight steering rather than retraining or gradient updates. Experiments report improvements on creativity benchmarks and show reduced mode collapse, with evidence that weight-space steering can transfer across tasks where activation steering doesn’t.
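The summary does not say how CreativityNeuro obtains its contrastive signal without data. The sketch therefore assumes two weight sets, `pos` and `neg` (hypothetical names for “more” and “less” divergent variants), are already given. It shows only the generic weight-space steering arithmetic, θ' = θ + α(θ_pos - θ_neg), with no gradients involved. Parameters outside the steering direction are left untouched.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # parameter name -> flattened values


def steering_direction(pos: Weights, neg: Weights) -> Weights:
    """Contrastive direction: per-parameter difference between the two weight sets."""
    return {name: [p - n for p, n in zip(pos[name], neg[name])] for name in pos}


def steer(base: Weights, direction: Weights, alpha: float) -> Weights:
    """Add alpha * direction to matching parameters; copy all others unchanged."""
    steered: Weights = {}
    for name, values in base.items():
        delta = direction.get(name)
        if delta is None:
            steered[name] = list(values)  # not part of the steering direction
        else:
            steered[name] = [b + alpha * d for b, d in zip(values, delta)]
    return steered


base = {"mlp.w": [1.0, 2.0], "head.w": [0.5]}
pos = {"mlp.w": [1.5, 2.0]}  # hypothetical "more divergent" weights
neg = {"mlp.w": [0.5, 2.0]}  # hypothetical "less divergent" weights
direction = steering_direction(pos, neg)
print(steer(base, direction, alpha=0.5))
```

The scale α acts like a dial: larger values push further along the direction, and negative values steer the opposite way.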
Procedural Memory Distillation: Online Reflection for Self-Improving Language Models
Procedural Memory Distillation (PMD) addresses a limitation of RLVR-style training: the rich experience in a model’s rollouts often isn’t reused effectively across episodes. PMD distills cross-episode procedural signals into a hierarchical memory during training and then absorbs that knowledge into the policy weights. It outperforms prior self-distillation approaches on scientific and coding benchmarks.