AI News Daily Digest (26-06-19)

Skill-Constrained Model Predictive Control for Resilient Manufacturing Supply Chains

This work tackles production planning when human certifications decay over time and training competes with production capacity, framing workforce capability as a real operational bottleneck. The authors benchmark a closed-loop skill-constrained model predictive controller against production-only, maintenance-only, and static “skill insurance” strategies in disruption-heavy SkillChain-Gym scenarios, finding predictive control wins when skill shortages are forecastable early—but no policy dominates under surprise shocks near demand-capacity boundaries.

Midjourney goes from generating cat images to full-body ultrasound scans

Midjourney CEO David Holz unveiled the company’s first medical hardware product: an ultrasound-based full-body scanner that captures internal “vertical slices” of anatomy using a ring of sensors. The company positions it as a route to high-quality body-composition and organ imaging, potentially at an annual or even daily cadence, and aims for MRI-comparable image quality in many cases.

Photoshop and Premiere now have AI assistants

Adobe is rolling out a public beta that embeds bespoke AI assistants into major Creative Cloud apps like Photoshop and Premiere, with each assistant designed to operate “as a specialist” inside its specific workflow. The assistants are powered by Adobe’s conversational creative-agent stack and are aimed at organizing work and automating tasks rather than just offering generic chat.

Who decides when AI is too dangerous?

An on-the-ground account of how AI safety, regulation, and geopolitics collide, centered on the US export controls that pulled Anthropic’s Claude Mythos/Fable family after a sudden national security scramble. The discussion frames the breakdown as a question of both technical risk (jailbreak claims) and governance mechanics (timelines, who gets consulted, and why compliance approaches ended up so blunt).

Is it agentic enough? Benchmarking open models on your own tooling

The Hugging Face team argues that “agentic” performance needs testing against the real tooling an agent will use—not just benchmarks that measure pure text quality. Their focus is practical: how to evaluate open models in setups that reflect permissions, environment constraints, and the actual operational loops agents must run to be considered truly agentic.

Distributed General-Purpose Agent Networks: Architecture, Key Mechanisms, and Prototypes

This paper proposes a blueprint for open peer-to-peer networks of heterogeneous agents that can discover each other, establish trust, and negotiate cooperation rules to execute long-horizon tasks. The authors highlight why agent networks aren’t “just P2P + multi-agent,” pushing a layered architecture with semantic declaration propagation, verifiable identities and multi-topic reputation, and mechanism design for open-ended execution.

SkillChain-Gym: A Benchmark for Reskilling-Aware Production-Inventory Control under Disruptions

SkillChain-Gym introduces a standardized testbed where workforce learning and forgetting are first-class citizens in production-inventory control, including certification thresholds and training actions that consume the same worker-hours as production. The benchmark’s seeded disruption framework and resilience metrics make it possible to compare policies that can train, maintain, or insure skills—and to study when adaptive training helps versus when lean static cross-training remains the safer bet.
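
To make the training-versus-production trade-off concrete, here is a minimal Python sketch of a single worker whose skill decays each period and recovers with training. It is not SkillChain-Gym's API: the `Worker` class, the `step` function, the decay and learning formulas, and every parameter value are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    # Hypothetical proficiency in [0, 1]; not a quantity defined by the benchmark.
    skill: float
    hours_per_period: float = 40.0


def step(worker, training_hours, *, learn_rate=0.02, forget_rate=0.05, threshold=0.6):
    """Advance one period and return (production_hours, certified).

    Assumed dynamics (illustrative only):
      - forgetting: skill shrinks by a fixed fraction each period
      - learning: each training hour closes a fixed fraction of the gap to 1.0
      - an uncertified worker contributes no production hours
    """
    if not 0 <= training_hours <= worker.hours_per_period:
        raise ValueError("training_hours must fit within the period's hours")

    decayed = worker.skill * (1 - forget_rate)
    gained = (1 - decayed) * (1 - (1 - learn_rate) ** training_hours)
    worker.skill = min(1.0, decayed + gained)

    certified = worker.skill >= threshold
    # Training and production draw on the same pool of worker-hours.
    production_hours = worker.hours_per_period - training_hours if certified else 0.0
    return production_hours, certified
```

Under these assumed numbers, a worker starting at skill 0.62 who skips training drops below the 0.6 threshold and produces nothing that period, while spending 8 of 40 hours on training keeps the certification and leaves 32 hours for production.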

When Rules Learn: A Self-Evolving Agent for Legal Case Retrieval

This method upgrades legal retrieval by letting an LLM agent iteratively generate and prune BM25-style query rewriting rules—without parameter training—based on experiment feedback. Tested on LeCaRD-v2, the self-evolving framework outperforms fixed baselines like human-designed rules and greedy selection, with analyses showing the agent learns which rule combinations to discard using prior validation results.

Nothing from Something: Can a Language Model Discover 0?

The research tests whether language-model generalization can go beyond training data into genuinely new mathematical structure—specifically, whether models can independently “discover” the concept of zero. Results show GPT-2-sized models can’t generalize to zero at test time under most conditions, but improve dramatically after training on a limited number of examples, with language pretraining effectively scaffolding discovery and halving sample requirements.

Beyond LoRA: Can you beat the most popular fine-tuning technique?

This post explores PEFT alternatives aimed at improving on LoRA’s fine-tuning efficiency while maintaining or boosting performance. The takeaway for practitioners: depending on model architecture and task, “beyond LoRA” strategies can deliver better adaptation quality under constrained compute and memory budgets.

Quantifying Consistency in LLM Logical Reasoning via Structural Uncertainty

Instead of judging reliability by output diversity, this work measures whether an LLM consistently ranks its own reasoning candidates—capturing instability and ambiguity directly in the reasoning process. The authors’ structural uncertainty framework uses self-preference-induced ranking distributions and shows a regime boundary: it helps identify unreliable logical/mathematical reasoning, while collapsing toward uniformity for factual retrieval tasks.
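
One way to picture "consistency of self-ranking" is to sample the model's ranking of the same candidates several times and measure how spread out those rankings are. The Python sketch below uses normalized Shannon entropy over the observed rankings. That choice of measure and the function name `ranking_entropy` are assumptions for illustration, not the authors' definition of structural uncertainty.

```python
import math
from collections import Counter


def ranking_entropy(rankings):
    """Normalized Shannon entropy of an empirical distribution over rankings.

    `rankings` is a list of orderings of the same candidates, one per trial.
    Returns 0.0 when every trial yields the same ordering and approaches 1.0
    as trials spread evenly over all n! possible orderings. With few trials
    the value is capped below 1.0, since at most len(rankings) distinct
    orderings can be observed.
    """
    if not rankings:
        raise ValueError("need at least one ranking")

    counts = Counter(tuple(r) for r in rankings)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())

    n_candidates = len(rankings[0])
    max_entropy = math.log(math.factorial(n_candidates))
    return entropy / max_entropy if max_entropy > 0 else 0.0
```

A score near 0 would indicate a stable preference order over the reasoning candidates, and a score near 1 would match the "collapse toward uniformity" the summary describes for factual retrieval.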

Improving health intelligence in ChatGPT

OpenAI highlights improvements to how ChatGPT handles health and wellness questions, emphasizing clearer reasoning, better context use, and physician-informed evaluations. The update frames “health intelligence” as more than answers—pushing toward safer, more clinically grounded response behavior.

SpeechDx: A Multi-Task Benchmark for Clinical Speech AI

SpeechDx builds a broad, cross-dataset benchmark for clinical speech AI by organizing tasks around the stage of speech production affected (conceptualization, formulation, articulation) rather than isolated disease-only evaluations. The results are sobering: even strong large-scale baselines don’t generalize reliably across the clinical speech landscape, suggesting the field still lacks representations that transfer across conditions and datasets.

Amazon employees say they’re facing termination for backing data center limits

In Seattle City Council testimony, Amazon software engineers alleged that the company retaliated after they supported a municipal move to restrict data center growth. The dispute escalates with HR investigations and claims of disciplinary action, pulling AI-adjacent infrastructure politics directly into workplace-risk and employment-law territory.

MemTrace: Probing What Final Accuracy Misses in Long-Term Memory

This benchmark targets a blind spot in long-term memory evaluation: scoring each question independently can hide whether a system tracks how a user fact changes over time. MemTrace measures memory by knowledge point (the fact itself) and tests it across controlled dimensions like memory age and evidence conditions, concluding that failures often stem from evidence use, not just retrieval, so “more memory” isn’t automatically “better memory.”
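
The gap between per-question and per-fact scoring can be shown with a small Python sketch. The aggregation rule used here (a knowledge point counts as tracked only if every question about it is answered correctly) and the sample data are assumptions for illustration, not MemTrace's actual metric or dataset.

```python
from collections import defaultdict


def question_accuracy(results):
    """Fraction of questions answered correctly, each scored independently."""
    return sum(correct for _, correct in results) / len(results)


def knowledge_point_accuracy(results):
    """Fraction of knowledge points whose questions were ALL answered correctly.

    The all-or-nothing rule is an assumed aggregation, chosen to show how
    grouping by fact can expose failures that flat accuracy averages away.
    """
    by_point = defaultdict(list)
    for point, correct in results:
        by_point[point].append(correct)
    return sum(all(answers) for answers in by_point.values()) / len(by_point)


# Hypothetical results as (knowledge_point, answered_correctly) pairs.
# "home_city" is asked before and after the user moves; the second is missed.
results = [
    ("home_city", True),
    ("home_city", False),
    ("employer", True),
    ("employer", True),
]
```

With this made-up data, `question_accuracy(results)` is 0.75 while `knowledge_point_accuracy(results)` is 0.5, because the system never tracked the changed fact as a whole.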
