Daily AI Papers — April 3, 2026

12 minute read

Published:

1. Generative World Renderer

Authors: Zheng-Hui Huang, Zhixiang Wang, Jiaming Tan, Ruihan Yu et al. Summary: Introduces a large-scale dynamic dataset of 4M continuous frames (720p/30fps) extracted from AAA games using a novel dual-screen stitched capture method to bridge the domain gap in generative rendering. Scales inverse and forward rendering to real-world complexity using game-quality synthetic data. Link: arxiv.org/abs/2604.02329 Source: HuggingFace daily (Apr 3, #3), alphaxiv.org, arxivlens analysis, HuggingFace paper page Why trending: AAA game data for generative rendering is a creative data strategy. 4M frames at 720p is a significant new resource. Multi-platform discussion.

2. SKILL0: In-Context Agentic RL for Skill Internalization

Authors: Zhengxi Lu, Zhiyuan Yao, Jinyang Wu, Chengcheng Han et al. Summary: Agent skills loaded at inference time suffer from retrieval noise, token overhead, and the model never truly learning the skill. SKILL0 uses in-context agentic RL to internalize skills directly into the model’s weights, eliminating the need for runtime skill loading while preserving performance. Link: arxiv.org/abs/2604.02268 Source: HuggingFace daily (Apr 3, #4), paperium.net, arxivexplained.com, YouTube explainer video, HuggingFace paper page Why trending: Skill internalization addresses a fundamental limitation of current agent architectures. Multi-platform buzz with YouTube coverage.

3. DataFlex: A Unified Framework for Data-Centric Dynamic Training of LLMs

Authors: Hao Liang, Zhengyang Zhao, Meiyi Qiang, Mingrui Chen et al. Summary: Unifies data selection, mixture optimization, and data reweighting into a single consistent framework. Existing approaches are fragmented across isolated codebases. Open-source release on GitHub (OpenDCAI/DataFlex). Link: arxiv.org/abs/2603.26164 Source: HuggingFace daily (Apr 3, #1), YouTube explainer, GitHub open-source (OpenDCAI/DataFlex), HuggingFace paper page Why trending: Open-source tool for a universal pain point (data mixing). YouTube video + GitHub release drive adoption.

4. Steerable Visual Representations

Authors: Jona Ruthardt, Manu Gaur, Deva Ramanan, Makarand Tapaswi, Yuki M. Asano Summary: Pretrained ViTs (DINOv2, MAE) focus on the most salient visual cues with no way to redirect toward less prominent concepts. Proposes steerable representations that let users direct attention toward specific visual aspects of interest, bridging the gap between frozen features and multimodal LLM capabilities. Link: arxiv.org/abs/2604.02327 Source: HuggingFace daily (Apr 3, #6), ICLR 2026 paper list (papercopilot.com), scirate.com top papers, HuggingFace paper page Why trending: Deva Ramanan co-author (CMU). Accepted at ICLR 2026. Makes frozen ViT features controllable – practical for retrieval and downstream tasks.

5. Apriel-Reasoner: RL Post-Training for General-Purpose Reasoning

Authors: Rafael Pardinas, Ehsan Kamalloo, David Vazquez, Alexandre Drouin (ServiceNow) Summary: Addresses the undisclosed training recipes of frontier reasoning models. Documents the challenges of joint RLVR optimization across domains (varying rollout lengths, difficulty, sample efficiency). Provides a transparent recipe for general-purpose reasoning via RL. Link: arxiv.org/abs/2604.02007 Source: HuggingFace daily (Apr 3), ServiceNow blog (servicenow.com/blogs/apriel-model-family-frontier-reasoning), researchtrend.ai Why trending: ServiceNow company release extending their Apriel model family. Transparent RL recipe when frontier labs keep theirs secret.

6. The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook

Authors: Xinlei Yu, Zhangquan Chen, Yongbo He, Tianyu Fu et al. Summary: Comprehensive survey arguing that latent space is becoming the native substrate for language models. While models appear to work with tokens, critical internal processes happen in continuous latent space rather than human-readable traces. Covers foundation, mechanisms, capabilities, and future directions. Link: arxiv.org/abs/2604.02029 Source: HuggingFace daily (Apr 3, #2), arxiv cs.CL Why trending: Timely meta-analysis as latent reasoning and continuous representations gain momentum. Comprehensive scope covering an emerging paradigm shift.

7. Therefore I am. I Think

Authors: Esakkivel Esakkiraja, Sai Rajeswar, Denis Akhiyarov, Rajagopal Venkatesaramani Summary: Presents evidence that reasoning models make decisions before generating chain-of-thought, not after. A simple linear probe successfully decodes tool-calling decisions from pre-generation activations, suggesting CoT may be post-hoc rationalization rather than causal reasoning. Title is an intentional inversion of Descartes. Link: arxiv.org/abs/2604.01202 Source: HuggingFace daily (Apr 3), arxiv cs.AI Why trending: Provocative finding that challenges the core assumption of chain-of-thought reasoning. If decisions precede thinking, what is CoT actually doing?

8. ASI-Evolve: AI Accelerates AI

Authors: Weixian Xu, Tiantian Mi, Yixiu Liu, Yang Nan et al. Summary: Can AI accelerate AI research itself? Presents an agentic framework for AI-for-AI research that closes the loop through a learn-deploy-evaluate cycle. Tackles costly, long-horizon, weakly supervised research loops rather than well-scoped tasks with rapid feedback. Link: arxiv.org/abs/2603.29640 Source: HuggingFace daily (Apr 3, #12), arxiv cs.AI Why trending: Bold recursive vision: AI doing AI research at scale. Addresses whether agentic systems can handle real research complexity, not just benchmarks.

9. EgoSim: Egocentric World Simulator for Embodied Interaction Generation

Authors: Jinkun Hao, Mingda Jia, Ruiyan Wang, Xihui Liu et al. Summary: Closed-loop egocentric world simulator that generates spatially consistent interaction videos while persistently updating 3D scene state. Existing simulators either lack 3D grounding (causing structural drift) or treat scenes as static. EgoSim provides continuous multi-step simulation with real-time state updates. Link: arxiv.org/abs/2604.01001 Source: HuggingFace daily (Apr 3, #5), arxiv cs.CV Why trending: Embodied AI needs closed-loop simulators with persistent state. This fills a critical gap between static scene generation and real interaction.

10. Omni-SimpleMem: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory

Authors: Jiaqi Liu, Zipeng Ling, Shi Qiu, Yanqing Liu et al. Summary: Agent lifelong memory has a vast design space (architecture, retrieval, prompting, data pipelines) too large for manual exploration. Uses “autoresearch” – automated research methodology – to systematically discover optimal memory configurations for multimodal agents. Link: arxiv.org/abs/2604.01007 Source: HuggingFace daily (Apr 3, #8), arxiv cs.AI Why trending: Agent memory is the critical unsolved bottleneck. Using AI to research AI memory design is meta but practical. Already on v2.

11. Executing as You Generate: Hiding Execution Latency in LLM Code Generation

Authors: Zhensu Sun, Zhihao Lin, Zhi Chen, Chengran Yang et al. Summary: Current coding agents generate code fully, then execute sequentially – leaving executor idle during generation and vice versa. Proposes parallel execution during generation, hiding execution latency by running partial code as tokens stream. Practical latency reduction for agentic coding. Link: arxiv.org/abs/2604.00491 Source: HuggingFace daily (Apr 3), HuggingFace paper page, arxiv cs.SE Why trending: Directly applicable to every coding agent in production. Simple idea with measurable latency savings. Every developer will feel this improvement.

12. LatentUM: Unleashing Interleaved Cross-Modal Reasoning via Latent-Space Unified Model

Authors: Jiachun Jin, Zetong Zhou, Xiao Yang, Hao Zhang et al. Summary: Explores using unified models for interleaved cross-modal reasoning in latent space – solving visual understanding problems that require dense visual thinking, self-reflection for generation improvement, and dynamic reasoning across modalities. More ambitious than simple text-to-image or image-to-text. Link: arxiv.org/abs/2604.02097 Source: HuggingFace daily (Apr 3, #7), arxiv cs.CV Why trending: Latent-space reasoning across modalities is a natural extension of the latent space trend. Interleaved reasoning is harder than one-shot generation.

13. VOID: Video Object and Interaction Deletion

Authors: Saman Motamed, William Harvey, Benjamin Klein, Luc Van Gool et al. Summary: Existing video object removal inpaints content behind objects but fails when the removed object has significant interactions (collisions, physics). VOID handles interaction-level consequences of removal, not just appearance-level artifacts like shadows and reflections. Link: arxiv.org/abs/2604.02296 Source: HuggingFace daily (Apr 3, #10), arxiv cs.CV Why trending: Interaction-aware video editing is a meaningful step beyond pixel-level inpainting. Luc Van Gool co-author adds credibility.

14. GPA: Learning GUI Process Automation from Demonstrations

Authors: Zirui Zhao, Jun Hao Liew, Yan Yang, Wenzhuo Yang et al. (Salesforce) Summary: Lightweight vision-based Robotic Process Automation that enables fast, stable process replay from a single demo. Uses Sequential Monte Carlo-based localization to avoid the fragility of traditional RPA and non-deterministic risks of VLM-based GUI agents. From Salesforce Research (Junnan Li). Link: arxiv.org/abs/2604.01676 Source: HuggingFace daily (Apr 3, #15), arxiv cs.AI Why trending: Salesforce Research with Junnan Li. Practical tool for enterprise automation that’s more robust than either traditional RPA or LLM agents.

15. Investigating Autonomous Agent Contributions in the Wild

Authors: Razvan Mihai Popescu, David Gros, Andrei Botocan, Rahul Pandita et al. Summary: Studies how autonomous coding agents (branch creation, PR opening, code review) contribute to real-world software projects. Analyzes activity patterns and code change quality over time. First large-scale empirical study of agent contributions “in the wild.” Link: arxiv.org/abs/2604.00917 Source: HuggingFace daily (Apr 3, #13), arxiv cs.SE Why trending: Everyone is deploying coding agents but nobody has studied their actual impact on code quality and team dynamics at scale. This paper does.

16. Terminal Agents Suffice for Enterprise Automation

Authors: Patrice Bechard, Orlando Marquez Ayala, Emily Chen et al. (ServiceNow) Summary: Shows simple shell-based agents match complex MCP/web agent architectures for enterprise tasks. Questions whether elaborate agent systems are necessary. Link: arxiv.org/abs/2604.00073 Source: HuggingFace daily (Apr 2, still trending), alphaxiv.org, YouTube, CACM blog Why trending: Provocative simplicity claim backed by empirical evidence. Still accumulating attention from yesterday.

17. VideoZeroBench: Probing the Limits of Video MLLMs

Authors: Jiahao Meng, Tan Yue, Qi Xu, Haochen Wang et al. Summary: Current video MLLM evaluations suffer from inflated scores that mask fine-grained understanding gaps. VideoZeroBench requires models to identify precise spatio-temporal evidence supporting their answers, not just get the answer right. Link: arxiv.org/abs/2604.01569 Source: HuggingFace daily (Apr 3, #17), arxiv cs.CV Why trending: Evidence-grounded evaluation for video models. Addresses the fundamental problem of right-answer-wrong-reason in video benchmarks.

18. Brainstacks: Cross-Domain Cognitive Capabilities via Frozen MoE-LoRA Stacks

Authors: Mohammad R. Abu Ayyash Summary: Modular architecture for continual multi-domain fine-tuning that packages domain expertise as frozen adapter stacks composing additively on a shared frozen base. Uses MoE-LoRA with noisy top-2 routing across transformer projections under QLoRA 4-bit quantization. Link: arxiv.org/abs/2604.01152 Source: HuggingFace daily (Apr 3), arxiv cs.CL Why trending: Elegant approach to continual learning that avoids catastrophic forgetting while being memory-efficient. Builds on the composable adapter trend.

19. T5Gemma-TTS Technical Report

Authors: Chihiro Arata, Kiyoshi Kurihara Summary: Autoregressive codec language models have strong zero-shot voice cloning but decoder-only architectures weaken text conditioning over long utterances. T5Gemma-TTS uses encoder-decoder architecture to maintain persistent text conditioning by routing through a T5-style encoder, solving the positional competition problem. Link: arxiv.org/abs/2604.01760 Source: HuggingFace daily (Apr 3), arxiv cs.SD Why trending: Addresses a known failure mode of decoder-only TTS (fading text conditioning). Encoder-decoder is a natural fix that hasn’t been well-explored for codec LMs.

20. Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for VLA Models

Authors: Jiawei Chen, Simin Huang, Jiawei Du, Shuaihang Chen et al. Summary: First physically realizable adversarial attack on VLA (Vision-Language-Action) models via 3D object textures. While language perturbations and 2D visual attacks exist, 3D texture attacks are more representative of real deployment scenarios where robots interact with physical objects. Link: arxiv.org/abs/2604.01618 Source: HuggingFace daily (Apr 3, #14), arxiv cs.CV Why trending: VLA security is under-explored but critical as robots deploy in the real world. 3D adversarial textures are more realistic than patch attacks.


Honorable Mentions

  • UniDriveVLA (2604.02190) – Unified perception + action for autonomous driving
  • ActionParty (2604.02330) – Multi-subject action binding in generative video games
  • Omni123 (2604.02289) – 3D generation from limited data via unified text-to-2D/3D
  • FlowSlider (2604.02088) – Training-free continuous image editing
  • NearID (2604.01973) – Identity representation learning via near-identity distractors
  • Woosh (2604.01929) – Sound effects foundation model
  • Ask or Assume? (2603.26233) – Uncertainty-aware clarification in coding agents
  • MultiGen (2603.06679) – Level design for editable multiplayer worlds in diffusion game engines

Methodology

SourceURLSignalWhat Was Found
HuggingFace Daily (Apr 3)huggingface.co/api/daily_papers?date=2026-04-03New submissions today39 papers
HuggingFace Daily (Apr 2)huggingface.co/api/daily_papers?date=2026-04-02Yesterday’s papers, carryover30 papers
alphaxiv.orgalphaxiv.orgCross-platformGenerative World Renderer
arxivlensarxivlens.comAnalysisGenerative World Renderer
paperium.netpaperium.netCross-platformSKILL0
arxivexplained.comarxivexplained.comCross-platformSKILL0
YouTubeyoutube.comVideo explainersSKILL0, DataFlex
GitHubgithub.com/OpenDCAI/DataFlexOpen-source releaseDataFlex framework
ICLR 2026 Paper Listpapercopilot.comConference acceptanceSteerable Visual Representations
scirate.comscirate.comTop daily papersSteerable Visual Representations
ServiceNow Blogservicenow.com/blogsCompany releaseApriel-Reasoner (Apriel model family)
researchtrend.airesearchtrend.aiCross-platformApriel-Reasoner
dair-ai ML Papers of the Weekgithub.com/dair-ai/ML-Papers-of-the-WeekCurated listGeneral signal for weekly top papers
Web search (Reddit, X, HN)VariousSocial media buzzLimited direct paper-level results from search API
arxiv listingsarxiv.org/list/cs.*Raw submissionsAbstract retrieval and verification

Ranking criteria: Cross-platform presence > company blog/release > ICLR/conference acceptance > HF daily position > topic relevance.

*Generated by JarvisNext report: April 4, 2026 at 10:00 AM PT*