Daily AI Papers — April 14, 2026
1. WildDet3D: Scaling Promptable 3D Detection in the Wild
- Authors: (see arxiv)
- Link: arxiv.org/abs/2604.08626
- Summary: Tackles monocular 3D object detection—recovering extent, location, and orientation of objects from a single RGB image. Pushes toward open-world generalization beyond closed-set categories with promptable detection.
- Sources: HuggingFace (224↑ Apr 13), arxiv
- Why trending: Highest HF upvote count across both days; foundational spatial intelligence work with practical open-world applications.
2. QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation
- Authors: Ali Slim, Haydar Hamieh, Jawad Kotaich
- Link: arxiv.org/abs/2604.08570
- Summary: Introduces a unified benchmark spanning Qiskit, PennyLane, and Cirq with 42 aligned tasks to evaluate LLMs on quantum code generation, separating quantum reasoning from framework familiarity.
- Sources: HuggingFace (102↑), arxiv
- Why trending: Top paper on Apr 14 HF; bridges LLM capabilities with quantum computing—a hot intersection.
3. FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios
- Authors: (see arxiv)
- Link: arxiv.org/abs/2604.07413
- Summary: Evaluates MLLMs for manufacturing with fine-grained domain semantics, addressing data scarcity and the gap between perception and autonomous execution in industrial settings.
- Sources: HuggingFace (85↑ Apr 13), arxiv
- Why trending: Industrial AI applications are surging; this benchmark fills a critical evaluation gap.
4. The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping
- Authors: Yang Liu, Enxi Wang, Yufei Gao
- Link: arxiv.org/abs/2604.11297
- Summary: Addresses reduced sampling diversity in RL for LLMs where policies repeatedly generate similar erroneous behaviors. Proposes memory-enhanced reward shaping that explicitly discourages recurrent failure patterns across rollouts.
- Sources: HuggingFace (78↑), arxiv
- Why trending: Directly attacks a key failure mode in RLHF/reasoning RL—critical for frontier model training.
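The core idea above can be sketched in a few lines. This is a toy illustration, not the paper's algorithm: the `FailureMemory` class and the n-gram Jaccard similarity are stand-ins I chose to make "recurrent failure pattern" concrete; the paper's actual memory structure and shaping terms may differ.

```python
from collections import deque

def ngram_set(text, n=4):
    """Set of word n-grams used as a cheap similarity signature."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

class FailureMemory:
    """Stores signatures of recent failed rollouts and shapes rewards by
    penalizing new rollouts that resemble stored failures (illustrative)."""

    def __init__(self, capacity=100, penalty=0.5):
        self.failures = deque(maxlen=capacity)
        self.penalty = penalty

    def overlap(self, sig):
        # Jaccard similarity to the closest remembered failure.
        if not self.failures:
            return 0.0
        return max(len(sig & f) / max(len(sig | f), 1) for f in self.failures)

    def shaped_reward(self, response, base_reward):
        sig = ngram_set(response)
        shaped = base_reward - self.penalty * self.overlap(sig)
        if base_reward <= 0.0:        # failed rollout: remember its pattern
            self.failures.append(sig)
        return shaped
```

A policy that keeps regenerating the same wrong answer sees its reward drop each time, which restores pressure toward diverse sampling.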
5. OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation
- Authors: Donghao Zhou, Guisheng Liu, Hao Yang
- Link: arxiv.org/abs/2604.11804
- Summary: Synthesizes human-object interaction videos conditioned on text, reference images, audio, and pose. Targets content creation automation for e-commerce and short video production.
- Sources: HuggingFace (54↑), arxiv
- Why trending: Multi-condition video generation is a frontier capability; practical e-commerce applications drive interest.
6. Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
- Authors: Zunhai Su, Hengyuan Zhang, Wei Wu
- Link: arxiv.org/abs/2604.10098
- Summary: Comprehensive survey of the “attention sink” phenomenon, in which attention mass concentrates disproportionately on uninformative tokens (often the first token). Covers utilization, interpretation, and mitigation strategies across Transformer architectures.
- Sources: HuggingFace (52↑), arxiv
- Why trending: Attention sink is a persistent architectural challenge; this survey consolidates scattered knowledge into one reference.
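As a minimal numeric illustration of what a sink looks like, here is a toy attention map (values and the `sink_mass` helper are hypothetical, not from the survey) where every query places most of its mass on position 0:

```python
def sink_mass(attn, sink_pos=0):
    """Average fraction of attention mass the queries place on one position.
    attn is a list of row-stochastic attention rows (one row per query)."""
    return sum(row[sink_pos] for row in attn) / len(attn)

# Toy attention map: every query puts 70% of its mass on token 0,
# mimicking the sink pattern the survey describes.
attn = [[0.7, 0.1, 0.1, 0.1] for _ in range(4)]
```

Diagnostics like this per-position mass are the kind of measurement the "interpretation" strand of the survey builds on.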
7. EXAONE 4.5 Technical Report
- Authors: LG AI Research
- Link: arxiv.org/abs/2604.08644
- Summary: First open-weight vision-language model from LG AI Research. Integrates a visual encoder into EXAONE 4.0 and pretrains natively on large-scale visual and textual data.
- Sources: HuggingFace (50↑ Apr 13), arxiv
- Why trending: New open-weight VLM release from a major lab; adds to the competitive open-model ecosystem.
8. Solving Physics Olympiad via Reinforcement Learning on Physics Simulators
- Authors: Mihir Prabhudesai, Aryan Satpathy, Yangmin Li
- Link: arxiv.org/abs/2604.11805
- Summary: Uses RL on physics simulators to solve Olympiad-level physics problems, addressing the bottleneck of limited internet QA pairs for training. Opens up simulator-grounded reasoning beyond math.
- Sources: HuggingFace (11↑), arxiv
- Why trending: Novel approach to reasoning RL beyond math; simulator-grounded training could scale to other scientific domains.
9. Strips as Tokens: Artist Mesh Generation with Native UV Segmentation
- Authors: Rui Xu, Dafei Qin, Kaichun Qiao
- Link: arxiv.org/abs/2604.09132
- Summary: Proposes treating mesh strips as tokens for autoregressive generation, producing artist-quality 3D meshes with native UV segmentation. Addresses inefficiencies in coordinate-based and patch-based approaches.
- Sources: HuggingFace (43↑), arxiv
- Why trending: 3D content generation is accelerating; this method bridges the gap between AI-generated and artist-standard meshes.
10. Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory
- Authors: (see arxiv)
- Link: arxiv.org/abs/2604.08995
- Summary: Advances interactive world models with memory-enabled long-term temporal consistency and high-resolution real-time generation—a key step toward practical world simulators.
- Sources: HuggingFace (41↑ Apr 13), arxiv
- Why trending: World models are a hot research area; real-time + long-horizon memory is a breakthrough combination.
11. Uni-ViGU: Towards Unified Video Generation and Understanding via Diffusion-Based Video Generator
- Authors: Luozheng Qin, Jia Gong, Qian Qiao
- Link: arxiv.org/abs/2604.08121
- Summary: Inverts the conventional paradigm of extending understanding-focused MLLMs to support generation: it instead builds outward from a diffusion-based video generator, addressing the computational cost imbalance between understanding and generation.
- Sources: HuggingFace (37↑), arxiv
- Why trending: Unified video understanding+generation is a frontier challenge; novel inverted architecture approach.
12. Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models
- Authors: Songlin Yang, Xianghao Kong, Anyi Rao
- Link: arxiv.org/abs/2604.10949
- Summary: Reveals that unified multimodal models fail to transfer LLM-like reasoning to image synthesis and exhibit divergent response behaviors. Terms this “pseudo-unification” and uses entropy probing to diagnose it.
- Sources: HuggingFace (35↑), arxiv
- Why trending: Critical analysis of whether “unified” models are truly unified—challenges a popular architectural assumption.
13. Introspective Diffusion Language Models
- Authors: Yifan Yu, Yuqing Jian, Junxiong Wang
- Link: arxiv.org/abs/2604.11035
- Summary: Identifies “introspective consistency” as the key gap between diffusion and autoregressive language models. Defines introspective acceptance rate and proposes methods to close the quality gap while preserving parallel generation benefits.
- Sources: HuggingFace (12↑), arxiv
- Why trending: Diffusion LMs are a promising alternative to autoregressive models; this paper pinpoints and addresses a fundamental limitation.
14. CodeTracer: Towards Traceable Agent States
- Authors: Han Li, Yifan Yao, Letian Zhu
- Link: arxiv.org/abs/2604.11641
- Summary: Addresses the debugging challenge of complex code agents with parallel tool calls and multi-stage workflows. Makes agent state transitions and error propagation observable and traceable.
- Sources: HuggingFace (27↑), arxiv
- Why trending: Agent debugging is a critical pain point as agentic systems scale; developer tooling for agents is underexplored.
15. CocoaBench: Evaluating Unified Digital Agents in the Wild
- Authors: CocoaBench Team, Shibo Hao, Zhining Zhang
- Link: arxiv.org/abs/2604.11201
- Summary: Benchmarks LLM agents across software engineering, deep research, GUI automation, and more in unified real-world settings, addressing the gap left by isolated capability evaluations.
- Sources: HuggingFace (26↑), arxiv
- Why trending: Agent benchmarking is crucial as unified agent scaffolds proliferate; diverse real-world evaluation fills a key gap.
16. Audio Flamingo Next: Next-Generation Open Audio-Language Models
- Authors: Sreyan Ghosh, Arushi Goel, Kaousheik Jayakumar
- Link: arxiv.org/abs/2604.10905
- Summary: Next-generation large audio-language model for speech, sound, and music understanding, built on a stronger foundation model with significant improvements over Audio Flamingo 3.
- Sources: HuggingFace (11↑), arxiv
- Why trending: Audio-language is an underserved modality; this represents state-of-the-art in open audio-LM capabilities.
17. From Reasoning to Agentic: Credit Assignment in Reinforcement Learning for LLMs
- Authors: Chenchen Zhang
- Link: arxiv.org/abs/2604.09459
- Summary: Tackles credit assignment in RL for LLMs with sparse outcome-level rewards across two regimes: reasoning RL (distributing credit across tokens) and agentic RL (credit across multi-step trajectories).
- Sources: HuggingFace (7↑), arxiv
- Why trending: Credit assignment is a fundamental bottleneck in scaling RL for both reasoning and agent tasks.
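To make the token-level regime concrete, here is a hedged sketch of spreading one outcome-level reward over a token sequence. The uniform and discounted splits are generic baselines I chose for illustration, not the paper's proposal:

```python
def per_token_credit(num_tokens, outcome, gamma=0.95):
    """Spread a single outcome-level reward over a token sequence.
    Returns a uniform split and an exponentially discounted split that
    concentrates credit on tokens closer to the final outcome."""
    uniform = [outcome / num_tokens] * num_tokens
    weights = [gamma ** (num_tokens - 1 - t) for t in range(num_tokens)]
    total = sum(weights)
    discounted = [outcome * w / total for w in weights]
    return uniform, discounted
```

Both splits conserve the total reward; the open question the paper studies is which tokens (or which steps of a trajectory, in the agentic regime) actually deserve the credit.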
18. Agentic Aggregation for Parallel Scaling of Long-Horizon Agentic Tasks
- Authors: Yoonsang Lee, Howard Yen, Xi Ye
- Link: arxiv.org/abs/2604.11753
- Summary: Studies parallel test-time scaling for long-horizon agentic tasks like deep research, where multiple rollouts are generated in parallel and aggregated. Addresses unique challenges of long, multi-turn, tool-using trajectories.
- Sources: HuggingFace (5↑), arxiv
- Why trending: Parallel test-time compute for agents is a key scaling frontier; extends chain-of-thought scaling to agentic settings.
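A minimal sketch of the parallel-rollout-and-aggregate pattern, assuming majority vote over final answers as the aggregator (the paper studies richer aggregation for long trajectories; `run_agent` is a hypothetical callable that maps a task to a final answer):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def aggregate_rollouts(run_agent, task, n=8):
    """Launch n independent rollouts of an agent on the same task and
    return the most common final answer plus its vote share."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        answers = list(pool.map(run_agent, [task] * n))
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n
```

Majority vote works when answers are short and comparable; the hard part the paper addresses is aggregating long, multi-turn, tool-using trajectories where no two rollouts are literally identical.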
19. RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details
- Authors: (see arxiv)
- Link: arxiv.org/abs/2604.06870
- Summary: Region-specific image refinement that restores fine-grained details in user-specified regions (scribble masks, bounding boxes) while keeping non-edited pixels unchanged—addressing a persistent weakness in modern generation models.
- Sources: HuggingFace (37↑ Apr 13), arxiv
- Why trending: Practical editing capability; precise local control over generated images is highly demanded.
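The "keep non-edited pixels unchanged" constraint reduces to mask-weighted compositing. A minimal stdlib sketch with a soft mask (the function name and list-of-lists image format are illustrative assumptions, not the paper's pipeline):

```python
def composite_refinement(original, refined, mask):
    """Blend refined pixels back into the original so that only the
    user-specified region changes; mask is 1.0 inside the edit region,
    0.0 outside, with soft values allowed at the boundary."""
    return [
        [m * r + (1.0 - m) * o for o, r, m in zip(orow, rrow, mrow)]
        for orow, rrow, mrow in zip(original, refined, mask)
    ]
```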
20. SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding
- Authors: Talor Abramovich, Maor Ashkenazi, Carl
- Link: arxiv.org/abs/2604.09557
- Summary: Provides a unified benchmark for evaluating speculative decoding techniques with diverse, representative workloads. Addresses limitations of existing benchmarks that suffer from narrow data coverage.
- Sources: HuggingFace (7↑), arxiv
- Why trending: Speculative decoding is critical for LLM inference speed; standardized evaluation enables fair comparison of acceleration techniques.
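For readers new to the technique being benchmarked: speculative decoding lets a cheap draft model propose several tokens that the target model then verifies. The sketch below is a simplified greedy-verification variant (real implementations use stochastic acceptance over token probabilities); `draft_next` and `target_next` are hypothetical callables returning one greedy token given a context.

```python
def speculative_step(draft_next, target_next, prefix, k=4):
    """One greedy speculative-decoding step: the draft model proposes k
    tokens, the target verifies them left to right, and only the longest
    agreeing prefix (plus one corrected or bonus token) is kept."""
    proposed, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposed.append(t)
        ctx = ctx + [t]
    accepted, ctx = [], list(prefix)
    for t in proposed:
        v = target_next(ctx)
        if v == t:
            accepted.append(t)        # target agrees with the draft
            ctx = ctx + [t]
        else:
            accepted.append(v)        # target's correction ends the step
            break
    else:
        accepted.append(target_next(ctx))  # bonus token when all k accepted
    return accepted
```

The speedup depends entirely on how often drafts are accepted, which is exactly why acceptance rates vary so much across workloads and why a diverse benchmark matters.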
