Daily AI Papers — April 18, 2026
Published:
1. Sema Code: Decoupling AI Coding Agents into Programmable, Embeddable Infrastructure
- Authors: Huacan Wang, Jie Zhou, Ningyan Zhu, Shuo Zhang, Feiyu Chen et al.
- Summary: Introduces Sema Code, a framework that decouples AI coding agent reasoning from delivery form (CLI, IDE, web app), enabling enterprises to reuse agent capabilities across heterogeneous engineering environments. Addresses the systemic barrier of locked-in AI agent delivery formats that prevent cross-platform integration.
- Link: arxiv/2604.11045
- Sources found: HuggingFace (23 upvotes), arxiv
- Why trending: Tackles a key pain point in enterprise AI adoption — agent reusability across platforms. High community engagement reflects growing interest in modular agent infrastructure.
2. SemaClaw: A Step Towards General-Purpose Personal AI Agents through Harness Engineering
- Authors: Ningyan Zhu, Huacan Wang, Jie Zhou, Feiyu Chen, Shuo Zhang et al.
- Summary: Analyzes the rise of OpenClaw-style personal AI agents and introduces harness engineering as a paradigm for building general-purpose personal agents. Identifies the inflection point where AI engineering evolves from prompt engineering to harness-level system design.
- Link: arxiv/2604.11548
- Sources found: HuggingFace (18 upvotes), arxiv
- Why trending: Directly relevant to the personal AI agent wave kicked off by OpenClaw. Harness engineering as a concept is gaining traction in the agent systems community.
3. Free Geometry: Refining 3D Reconstruction from Longer Versions of Itself
- Authors: Yuhang Dai, Xingyi Yang
- Summary: Proposes a framework that enables feed-forward 3D reconstruction models to self-refine at test time by generating longer versions of their own predictions. Addresses the rigidity of zero-shot inference in current reconstruction models under occlusions, specularities, and ambiguous cues.
- Link: arxiv/2604.14048
- Sources found: HuggingFace (15 upvotes), arxiv
- Why trending: Elegant self-refinement approach for 3D reconstruction without additional training data. Feed-forward 3D methods are a hot area in computer vision.
4. LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling
- Authors: Yuxin Chen, Chumeng Liang, Hangke Sui, Ruihan Guo, Chaoran Cheng, Jiaxuan You, Ge Liu
- Summary: Closes the gap between continuous diffusion language models (DLMs) and discrete autoregressive counterparts. LangFlow is the first continuous DLM to rival discrete models in language modeling quality, challenging the assumption that continuous diffusion is inferior for text.
- Link: arxiv/2604.11748
- Sources found: HuggingFace (14 upvotes), arxiv
- Why trending: Major milestone for diffusion-based language modeling. If continuous diffusion can match discrete methods, it unlocks controllable, few-step text generation like in image diffusion.
5. TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration
- Authors: Zerun Ma, Guoqiang Wang, Xinchen Xie, Yicheng Chen, He Du et al.
- Summary: Introduces a multi-agent system that automates the entire LLM training lifecycle through tree-based exploration. Orchestrates collaboration between Researcher and Executor modules to handle data preparation, training configuration, and evaluation automatically.
- Link: arxiv/2604.14116
- Sources found: HuggingFace (12 upvotes), arxiv
- Why trending: End-to-end automation of LLM fine-tuning is a high-demand capability. Agentic approaches to ML workflows resonate strongly with practitioners seeking to reduce manual iteration.
6. TIP: Token Importance in On-Policy Distillation
- Authors: Yuanda Xu, Hejian Sang, Zhengze Zhou, Ran He, Zhipeng Wang, Alborz Geramifard
- Summary: Identifies that not all token positions carry equal learning signal in on-policy knowledge distillation — informative tokens come from regions of high student uncertainty and high teacher confidence. Provides a principled answer to which tokens matter most in OPD.
- Link: arxiv/2604.14084
- Sources found: HuggingFace (11 upvotes), arxiv
- Why trending: Distillation efficiency is critical for deploying smaller models. The token-level importance insight has immediate practical applications for model compression.
7. ROSE: An Intent-Centered Evaluation Metric for NL2SQL
- Authors: Wenqi Pei, Shizheng Hou, Boyan Li, Han Chen, Zhichao Shi, Yuyu Luo
- Summary: Argues that Execution Accuracy (EX), the standard NL2SQL metric, is increasingly unreliable — sensitive to syntax variation, blind to question ambiguity, and misled by erroneous ground-truth SQL. Introduces ROSE, an intent-centered metric that focuses on semantic correctness.
- Link: arxiv/2604.12988
- Sources found: HuggingFace (11 upvotes), arxiv
- Why trending: NL2SQL is a booming application area for LLMs. Better evaluation metrics are urgently needed as production deployments scale.
8. UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding
- Authors: Fei Tang, Bofan Chen, Zhengxi Lu, Tongbo Chen, Songqin Nong et al.
- Summary: Proposes uncertainty-driven adaptive zoom-in for GUI element localization, applying higher-resolution cropping only when the model is uncertain rather than uniformly across all instances. Improves grounding of small icons and dense layouts in screenshots.
- Link: arxiv/2604.14113
- Sources found: HuggingFace (10 upvotes), arxiv
- Why trending: GUI agents are a rapidly growing application domain. Efficient visual grounding is a key bottleneck for deploying screen-understanding agents at scale.
9. Anthropogenic Regional Adaptation in Multimodal Vision-Language Models
- Authors: Samuel Cahyawijaya, Peerat Limkonchotiwat, Tack Hwa Wong, Hitesh Laxmichand Patel et al.
- Summary: Introduces a novel paradigm for assessing human-centric alignment in vision-language systems, addressing the gap in region-specific cultural and contextual adaptation. Provides a framework for evaluating how well VLMs understand human-related visual content across different regions.
- Link: arxiv/2604.11490
- Sources found: HuggingFace (9 upvotes), arxiv
- Why trending: Cultural and regional bias in VLMs is an increasingly important topic as these models deploy globally. First dedicated framework for this evaluation.
10. ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video
- Authors: Boyuan Wang, Xiaofeng Wang, Yongkang Li, Zheng Zhu et al.
- Summary: Proposes the first feedforward framework that jointly reconstructs geometry, appearance, and physical attributes of non-rigid objects from a single video. Eliminates the need for expensive per-scene optimization or manual annotation that limits existing approaches.
- Link: arxiv/2604.07882
- Sources found: HuggingFace (9 upvotes), arxiv
- Why trending: Physically plausible reconstruction from monocular video is a long-standing challenge. The feedforward approach makes it practical for real-world applications.
11. SkVM: Revisiting Language VM for Skills across Heterogeneous LLMs and Harnesses
- Authors: Le Chen, Erhu Feng, Yubin Xia, Haibo Chen
- Summary: Analyzes 118,000 LLM agent skills and draws inspiration from virtual machines to create a portable execution layer for agent skills across diverse platforms. Addresses the fragility of skills behaving inconsistently when moved between different agent systems.
- Link: arxiv/2604.03088
- Sources found: HuggingFace (8 upvotes), arxiv
- Why trending: Skill portability is a practical blocker for the agent ecosystem. The VM abstraction for LLM skills is a novel systems-level approach.
12. Self-Distillation Zero: Self-Revision Turns Binary Rewards into Dense Supervision
- Authors: Yinghui He, Simran Kaur, Adithya Bhaskar, Yongjin Yang, Jiarui Liu et al.
- Summary: Bridges the gap between sparse-reward RL and dense-supervision distillation for post-training. Shows that self-revision can convert binary rewards into dense token-level supervision without needing an external teacher or curated demonstrations.
- Link: arxiv/2604.12002
- Sources found: HuggingFace (7 upvotes), arxiv
- Why trending: Combines the best of RLVR and distillation without their downsides. From the Danqi Chen group at Princeton, indicating strong research lineage.
13. MERRIN: A Benchmark for Multimodal Evidence Retrieval and Reasoning in Noisy Web Environments
- Authors: Han Wang, David Wan, Hyunji Lee, Thinh Pham, Mikaela Cankosyan et al.
- Summary: Introduces a human-annotated benchmark for evaluating search-augmented agents on multi-hop queries against noisy, multimodal, and conflicting web results. Measures agents’ ability to identify relevant modalities, synthesize evidence, and handle real-world search noise.
- Link: arxiv/2604.13418
- Sources found: HuggingFace (6 upvotes), arxiv
- Why trending: Real-world search-augmented generation faces noisy, conflicting evidence — this benchmark directly addresses that gap. From UNC-Chapel Hill (Mohit Bansal’s group).
14. UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization
- Authors: Zhengxi Lu, Fei Tang, Guangyi Liu, Kaitao Song, Xu Tan et al.
- Summary: Presents a collaborative framework for long-horizon GUI automation that offloads tasks beyond the MLLM’s intrinsic capabilities (memory degradation, progress confusion, math hallucination) to specialized tools. Advances the state of the art in multi-step GUI agent tasks.
- Link: arxiv/2604.13822
- Sources found: HuggingFace (6 upvotes), arxiv
- Why trending: Long-horizon GUI automation is a key frontier for computer-use agents. Tool integration for compensating MLLM weaknesses is a practical and timely approach.
15. ArcDeck: Narrative-Driven Paper-to-Slide Generation
- Authors: Tarik Can Ozden, Sachidanand VS, Furkan Horoz, Ozgur Kara, Junho Kim, James Matthew Rehg
- Summary: Introduces a multi-agent framework that formulates paper-to-slide generation as structured narrative reconstruction rather than direct summarization. Parses the input paper’s logical flow via discourse trees and global commitment documents to preserve coherence.
- Link: arxiv/2604.11969
- Sources found: HuggingFace (6 upvotes), arxiv
- Why trending: Practical productivity tool with broad appeal. The discourse-tree approach to maintaining narrative coherence is architecturally novel.
16. OneHOI: Unifying Human-Object Interaction Generation and Editing
- Authors: Jiun Tian Hoe, Weipeng Hu, Xudong Jiang, Yap-Peng Tan, Chee Seng Chan
- Summary: Unifies HOI generation and editing into a single framework that handles mixed conditions including HOI triplets and object-only entities. Bridges the gap between two previously disjoint research families in human-object interaction modeling.
- Link: arxiv/2604.14062
- Sources found: HuggingFace (6 upvotes), arxiv
- Why trending: Human-object interaction modeling has applications in AR/VR, robotics, and content creation. Unification of generation and editing simplifies the pipeline.
17. d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
- Authors: Siyan Zhao, Devaansh Gupta, Qinqing Zheng, Aditya Grover
- Summary: Demonstrates that diffusion-based large language models (dLLMs) can achieve strong reasoning capabilities through online RL, challenging the assumption that reasoning is limited to autoregressive generation. Shows that coarse-to-fine generation can be competitive with left-to-right reasoning.
- Link: arxiv/2504.12216
- Sources found: arxiv, web search
- Why trending: Connects two hot threads — diffusion LLMs and reasoning via RL. Proof that non-autoregressive models can reason opens new architectural possibilities.
18. Self-Sovereign Agent
- Authors: Wenjie Qu, Xuandong Zhao, Jiaheng Zhang, Dawn Song
- Summary: Investigates AI agents that can economically sustain and extend their own operation without human involvement. Examines the shift from developer-controlled tools to autonomous digital actors with economic self-sufficiency.
- Link: arxiv/2604.08551
- Sources found: HuggingFace (5 upvotes), arxiv
- Why trending: Provocative research direction from Dawn Song’s group at Berkeley. Self-sustaining agents raise both exciting possibilities and important safety questions.
19. MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation
- Authors: Yan Li, Zezi Zeng, Yifan Yang, Yuqing Yang, Ning Liao et al.
- Summary: Addresses the challenge of integrating AIGC tools into automated webpage generation, where direct integration leads to style inconsistency and poor global coherence. Proposes a hierarchical multimodal agent that coordinates visual element generation for coherent web design.
- Link: arxiv/2604.15309
- Sources found: HuggingFace (5 upvotes), arxiv
- Why trending: Web design automation with AIGC is a practical application with commercial potential. Hierarchical coordination solves a real coherence problem.
20. Three-Phase Transformer
- Authors: Mohammad R. Abu Ayyash
- Summary: Introduces a structural prior for decoder-only Transformers that partitions the hidden vector into cyclic channels with phase-respecting operations including 2D Givens rotations. Provides a novel architectural primitive for the standard SwiGLU + RMSNorm + RoPE + GQA backbone.
- Link: arxiv/2604.14430
- Sources found: HuggingFace (3 upvotes), arxiv
- Why trending: Novel Transformer architecture modification with an intriguing mathematical foundation. Architecture innovations attract attention from researchers pushing efficiency frontiers.
Report generated 2026-04-18 10:00 PDT. Primary source: HuggingFace Daily Papers API (April 16-17 data combined for weekend coverage). Cross-referenced with arxiv, web search, and AI news aggregators. Reddit and Hacker News APIs were unavailable at fetch time.
