AI News Daily Digest (26-06-30)

MER-R1: Multimodal Emotion Reasoning via Slow-Fast Thinking Synergy

Researchers show that “slow thinking” (deliberate multimodal reasoning) doesn’t automatically improve multimodal emotion recognition, and that fast, direct answering can outperform answering after deliberation. MER-R1 uses reinforcement learning to jointly optimize recall and precision with slow-fast confidence calibration, reaching state-of-the-art results on MERUniBench and MME-Emotion while making reasoning improve predictions rather than merely explain them.

Tree of Evidence (ToE): A Hierarchical and Explainable Claim Verification Framework with Dynamic Multi-source Evidence Retrieval and Aggregation

ToE targets the way AI-generated misinformation can poison retrieval pipelines under “Generative Engine Optimization” – turning adversarially crafted content into seemingly legitimate evidence. It builds dynamically expanding argument trees using a reinforcement learning retrieval agent plus evidence evaluation and aggregation, improving fact-checking accuracy by 4 to 24 percentage points and offering an error bound for policy convergence under adversarial conditions.

China’s Z.ai claims it can match Mythos on cybersecurity

Zhipu AI (Z.ai) released its open-weight GLM-5.2, and researchers claim it can match Anthropic’s Mythos in targeted cybersecurity bug-finding scenarios. The report raises geopolitical and security stakes by highlighting how quickly open-weight models are closing capability gaps – even if they still lag on broader tasks.

OpenAI is teasing new hardware… for Codex

OpenAI says a Codex-related device is landing July 15, shown as a square hardware unit with programmable shortcut-style buttons. The teaser points to a keyboard-and-macro accessory strategy (in partnership with Work Louder), suggesting OpenAI wants to move Codex beyond chat into faster, more tool-like developer workflows.

Suno launches Spark incubator program to feed independent artists to its AI machine

Suno is rolling out “Spark,” an incubator for independent artists featuring grants, mentorship, and marketing support, but it also requires participants to accept terms that let Suno remix their songs. The Verge flags the program’s licensing terms as a key friction point, even as Suno positions the effort as a pipeline to help artists break through.

Odyssey: Constructing Verifiable Local Truth-Preserving Foundation Models

ODYSSEY proposes a categorical framework for building foundation models that are “verifiable” and preserve local truths across composed components. Using “foundries” with explicit restriction, gluing, obstruction policies, and certification mechanics (including Foundry SQL with TICKET), the work argues you can construct models with durable evidence chains rather than treating training as an opaque black box.

Internalizing the Future: A Unified Agentic Training Paradigm for World Model Planning

This work tries to make LLM agents less reactive on long-horizon tasks by training a single autoregressive model to verbalize a prospective rollout plus a plan-conditioned success estimate (a text analogue of Q-values). It identifies a “format-capability gap” where naive look-ahead imitation fails, then introduces a three-stage pipeline (WM-AMT, FE-SFT, and foresight-conditioned RL) that improves results on planning and mathematical reasoning benchmarks.

Grounded Iterative Language Planning: How Parameterized World Models Reduce Hallucination Propagation in LLM Agents

The authors compare API-style “agentic” world modeling with parameterized transition predictors and show how hallucinations in language-based state updates become hard to score reliably. Their Grounded Iterative Language Planning (GILP) mixes a small parameterized backbone with LLM drafting and a consistency gate, cutting hallucinated-state rates on real GPT-4o-mini calls and boosting success with only modest extra LLM call overhead.
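
One way to picture a consistency gate of the kind described above is the minimal Python sketch below. It is an assumption-laden illustration, not the authors' implementation: the state representation, the names `consistency_gate` and `gated_step`, the tolerance rule, and the fallback to the parameterized prediction are all hypothetical, and the two callables stand in for an LLM drafter and a small learned transition predictor.

```python
from typing import Callable, Dict

# Hypothetical state representation: named integer fields.
State = Dict[str, int]
Transition = Callable[[State, str], State]


def consistency_gate(draft: State, predicted: State, tolerance: int = 0) -> bool:
    """Accept a drafted state only if it agrees with the predicted state.

    Agreement here means: same fields, and every field within `tolerance`.
    This rule is an assumption made for illustration.
    """
    if draft.keys() != predicted.keys():
        return False
    return all(abs(draft[k] - predicted[k]) <= tolerance for k in predicted)


def gated_step(
    state: State,
    action: str,
    llm_draft: Transition,
    predictor: Transition,
    tolerance: int = 0,
) -> State:
    """Take one planning step, keeping the LLM draft only if it passes the gate."""
    draft = llm_draft(state, action)
    predicted = predictor(state, action)
    # Fall back to the parameterized prediction when the draft is inconsistent,
    # so a hallucinated state is not carried into later steps.
    return draft if consistency_gate(draft, predicted, tolerance) else predicted


# Stand-ins for the two components (both hypothetical).
def toy_predictor(state: State, action: str) -> State:
    step = 1 if action == "right" else -1
    return {"x": state["x"] + step, "y": state["y"]}


def hallucinating_drafter(state: State, action: str) -> State:
    return {"x": state["x"] + 5, "y": state["y"]}  # implausible jump


next_state = gated_step({"x": 0, "y": 0}, "right", hallucinating_drafter, toy_predictor)
```

In this toy setup the drafted jump fails the gate, so the step returns the predictor's state instead.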

Lawmakers want to ban AI companies from selling your health data

A new version of the Health and Location Data Protection Act aims to stop data brokers from selling Americans’ health and location information, explicitly covering data collected or inferred through AI chatbot interactions like ChatGPT or Claude. The effort signals that policymakers are treating AI-era telemetry and conversational data as part of the same privacy problem as traditional broker pipelines.

Mapping Europe’s AI Workforce Opportunity

OpenAI’s report maps how AI could reshape jobs across the European Union, identifying which roles may be automated, which could grow, and how workflows might be reorganized. The focus is on transitions – not just job loss – giving executives and policymakers a way to anticipate where reskilling and new task design will matter most.

Tidal won’t pay royalties on AI-generated music but isn’t banning it outright

Tidal says it will label tracks identified as 100 percent AI-generated and start demoting them by removing monetization, while stopping short of an outright ban. The policy is framed as protecting royalties for music written and performed by people, but it also signals how streaming platforms may operationalize “AI music” categorization in revenue terms.

Agent confidence on the technical frontier

A new analysis looks at how agentic AI systems behave when pushed toward technical tasks where correctness and reliability matter – not just fluent answers. The reporting highlights the practical gap between “confident” model outputs and verifiable performance, setting up the next wave of tooling around calibration, evaluation, and safer agent planning.

AI agents are not your “coworkers”

Technology Review argues that labeling AI agents as coworkers obscures the real reliability and governance problems behind their deployments. It emphasizes that agent behavior – especially in planning and long-horizon work – needs measurable accountability rather than anthropomorphic framing.

DiScoFormer: One transformer for density and score, across distributions

DiScoFormer aims to unify modeling tasks by using a single transformer to handle both density and score estimation across multiple distributions. The approach targets a practical gap in generative modeling workflows where separate models or objectives are often used, promising a more streamlined path to better-calibrated distribution learning.
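
For readers unfamiliar with the two quantities named above: the score of a distribution is the gradient of its log-density with respect to the input. The minimal Python sketch below shows that relationship for a one-dimensional Gaussian, chosen here only because both quantities have closed forms. It is a generic illustration, not DiScoFormer's architecture or objective, and the function names are illustrative.

```python
import math


def gaussian_log_density(x: float, mu: float, sigma: float) -> float:
    """Log of the N(mu, sigma^2) density evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * sigma**2) - (x - mu) ** 2 / (2.0 * sigma**2)


def gaussian_score(x: float, mu: float, sigma: float) -> float:
    """Score function: d/dx log p(x) = -(x - mu) / sigma^2."""
    return -(x - mu) / sigma**2


def finite_difference_score(x: float, mu: float, sigma: float, eps: float = 1e-5) -> float:
    """Numerical derivative of the log-density, used to check the closed form."""
    upper = gaussian_log_density(x + eps, mu, sigma)
    lower = gaussian_log_density(x - eps, mu, sigma)
    return (upper - lower) / (2.0 * eps)


analytic = gaussian_score(1.5, mu=0.0, sigma=2.0)
numeric = finite_difference_score(1.5, mu=0.0, sigma=2.0)
```

The analytic and numerical values should agree closely, which is the sense in which a density model and a score model describe the same distribution from two angles.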

Understanding Rollout Error in Graph World Models

This paper studies how prediction errors in graph-structured world models can amplify over long horizons, depending on whether edges are fixed or dynamically predicted. It derives graph-valued rollout bounds, introduces Error-Aware GWM with methods like spectral regularization and critical-node weighting, and finds the models are most useful for dynamic graph rollouts rather than purely static settings.
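
To see why rollout error can amplify with horizon, consider a generic textbook-style recursion (not the paper's graph-valued bound). Assume the true dynamics are L-Lipschitz and the learned model's one-step error is at most eps. Then the accumulated error satisfies e_{t+1} <= eps + L * e_t with e_0 = 0, which sums to a geometric series. The Python sketch below evaluates that bound; the function and parameter names are illustrative.

```python
def rollout_error_bound(eps: float, lipschitz: float, horizon: int) -> float:
    """Upper bound on accumulated error after `horizon` model rollout steps.

    Solves e_{t+1} <= eps + lipschitz * e_t with e_0 = 0, i.e.
    eps * sum_{k=0}^{horizon-1} lipschitz**k.
    """
    if horizon < 0:
        raise ValueError("horizon must be non-negative")
    if lipschitz == 1.0:
        # The geometric series degenerates to linear growth.
        return eps * horizon
    return eps * (lipschitz**horizon - 1.0) / (lipschitz - 1.0)


# Contractive, neutral, and expansive dynamics over the same horizon.
bounds = {L: rollout_error_bound(0.1, L, 10) for L in (0.5, 1.0, 2.0)}
```

With L < 1 the bound saturates near eps / (1 - L), with L = 1 it grows linearly, and with L > 1 it grows exponentially in the horizon.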
