Daily AI Papers — June 28, 2026
Published:
1. DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation
Authors: Hao Liang, Xiaochen Ma, Zhou Liu, Zhen Hao Wong, Zhengyang Zhao, Zimo Meng et al.
Summary: DataFlow introduces a principled, LLM-driven framework for building scalable and reproducible data preparation pipelines, addressing the fragmented state of ad-hoc scripts and loosely specified workflows that currently dominate the field. It provides unified abstractions for data generation, transformation, and quality filtering, with model-in-the-loop support to produce high-quality training data for LLMs at scale.
arxiv: arxiv.org/abs/2512.16676
Sources: HuggingFace Trending (224 upvotes), arxiv cs.AI
Why trending: Addresses a universal bottleneck in LLM training — reliable, reproducible data pipelines — gaining renewed attention as teams scale training and fine-tuning workflows heading into mid-2026.
2. GLM-5: from Vibe Coding to Agentic Engineering
Authors: GLM-5 Team, Aohan Zeng, Xin Lv, Zhenyu Hou, Zhengxiao Du et al.
Summary: GLM-5 is Zhipu AI’s next-generation foundation model designed to transition from “vibe coding” (informal, prompt-driven development) to rigorous agentic engineering, with significantly enhanced agentic, reasoning, and coding capabilities. The paper outlines the model’s architecture improvements, training methodology, and benchmarks showing state-of-the-art performance across code generation, planning, and multi-step tool use.
arxiv: arxiv.org/abs/2602.15763
Sources: HuggingFace Trending (189 upvotes), arxiv cs.CL
Why trending: A major model release from Zhipu AI explicitly framed around agentic capabilities, attracting community interest amid a surge in demand for capable, open-weight coding agents.
3. Unlimited OCR Works
Authors: Youyang Yin, Huanhuan Liu, YY, Qunyi Xie, Chaorun Liu et al.
Summary: This paper challenges the prevailing assumption that using a large LLM as the decoder in OCR systems is necessary for top performance, demonstrating that leveraging the LLM’s prior language distribution can actually introduce biases that hurt recognition accuracy on non-standard text. The authors propose “Unlimited OCR Works,” a framework that decouples the visual recognition from the language prior, achieving state-of-the-art results on diverse OCR benchmarks while significantly reducing model size.
arxiv: arxiv.org/abs/2606.23050
Sources: HuggingFace Trending (38 upvotes), HuggingFace Daily Papers, arxiv cs.CV
Why trending: Sparked community debate following DeepSeek OCR’s popularity; challenges assumptions about LLM-decoder architectures in OCR and offers a leaner, more accurate alternative.
4. Agent READMEs: An Empirical Study of Context Files for Agentic Coding
Authors: Worawalan Chatlatanagulchai, Hao Li, Yutaro Kashiwa, Brittany Reid, Kundjanasith Thonglek, Pattara Leelaprute
Summary: This paper presents the first large-scale empirical study of 2,303 agent context files — the “READMEs for agents” that provide persistent, project-level instructions to agentic coding tools like Claude Code and Copilot Workspace. The study reveals that current context files are highly heterogeneous in structure and quality, and provides a taxonomy and best practices that significantly improve agent task completion rates.
arxiv: arxiv.org/abs/2511.12884
Sources: HuggingFace Trending (28 upvotes), arxiv cs.SE
Why trending: Highly practical and directly relevant to developers using agentic coding tools; provides empirical grounding for a practice that millions of developers perform daily without systematic guidance.
5. Geometric Context Transformer for Streaming 3D Reconstruction
Authors: Lin-Zhuo Chen, Jian Gao, Yihang Chen, Ka Leong Cheng, Yipengjing Sun
Summary: The paper introduces a Geometric Context Transformer (GCT) for streaming 3D reconstruction that recovers camera poses and point clouds from video streams with geometric accuracy, temporal consistency, and real-time efficiency. Inspired by SLAM principles, GCT encodes a compact geometric context that propagates across frames and enables robust reconstruction under challenging conditions like fast motion and dynamic scenes.
arxiv: arxiv.org/abs/2604.14141
Sources: HuggingFace Trending (21 upvotes), arxiv cs.CV
Why trending: Real-time 3D reconstruction is foundational for robotics and AR/VR; the streaming-capable design addresses a gap left by NeRF/3DGS methods that require batch processing.
6. Ultralytics YOLO26: Unified Real-Time End-to-End Vision Models
Authors: Glenn Jocher, Jing Qiu, Mengyu Liu, Shuai Lyu, Fatih Cagatay Akyon et al.
Summary: YOLO26 is Ultralytics’ latest generation of real-time vision models that eliminates non-maximum suppression at inference and adopts a streamlined head design, enabling true end-to-end deployment across diverse hardware with shorter training schedules and higher accuracy. The unified architecture supports detection, segmentation, pose estimation, and classification in a single model family with significantly improved throughput over prior YOLO versions.
arxiv: arxiv.org/abs/2606.03748
Sources: HuggingFace Trending (15 upvotes), HuggingFace Daily Papers, arxiv cs.CV
Why trending: The YOLO family is the most deployed real-time vision framework globally; each new release sees immediate community adoption and benchmarking, with practitioners eager to evaluate production-readiness.
7. EvoScientist: Towards Multi-Agent Evolving AI Scientists for End-to-End Scientific Discovery
Authors: Yougang Lyu, Xi Zhang, Xinhao Yi, Yuyue Zhao, Shuyu Guo et al.
Summary: EvoScientist proposes an evolving multi-agent framework for automated scientific discovery, where specialized agents (idea generation, experimental design, execution, analysis) iteratively refine their roles through experience, gradually improving the quality of research output without human intervention. Evaluated on real scientific tasks, EvoScientist demonstrates that inter-agent evolution and feedback loops substantially outperform static multi-agent pipelines.
arxiv: arxiv.org/abs/2603.08127
Sources: HuggingFace Trending (15 upvotes), arxiv cs.AI
Why trending: AI-for-science is a high-priority area across major labs; EvoScientist’s evolutionary framing provides a novel alternative to static agent orchestration frameworks.
8. Zep: A Temporal Knowledge Graph Architecture for Agent Memory
Authors: Preston Rasmussen, Pavlo Paliychuk, Travis Beauvais, Jack Ryan, Daniel Chalef
Summary: Zep introduces a memory layer service for AI agents built on a temporal knowledge graph, where facts are organized with time-aware edges that enable agents to reason about when information was learned, updated, or invalidated. It outperforms MemGPT on the Deep Memory Retrieval benchmark and handles enterprise-scale memory requirements that existing RAG-based approaches fail to address.
arxiv: arxiv.org/abs/2501.13956
Sources: HuggingFace Trending (13 upvotes), arxiv cs.AI
Why trending: As production agent deployments grow, the need for scalable, enterprise-grade memory infrastructure is becoming critical; Zep’s temporal KG approach fills a gap that flat vector stores and simple retrieval systems cannot.
9. olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models
Authors: Jake Poznanski, Jon Borchardt, Jason Dunkelberger, Regan Huff, Daniel Lin, Aman Rangapur
Summary: olmOCR is an open-source Python toolkit for processing diverse PDF documents — including multi-column layouts, tables, math, and scientific figures — into high-quality text for LLM training data. Built on VLMs, it significantly improves extraction fidelity over traditional PDF parsers and is designed to unlock the trillions of tokens locked in the world’s PDF corpus for training.
arxiv: arxiv.org/abs/2502.18443
Sources: HuggingFace Trending (12 upvotes), arxiv cs.CL
Why trending: Data quality and quantity remain the primary bottleneck for LLM training; open-source, production-grade PDF extraction tools are in high demand across both academia and industry.
10. AI-Trader: Benchmarking Autonomous Agents in Real-Time Financial Markets
Authors: Tianyu Fan, Yuhao Yang, Yangqin Jiang, Yifei Zhang, Yuxuan Chen, Chao Huang
Summary: AI-Trader introduces a live benchmark for evaluating LLM-based autonomous trading agents in fully dynamic, real-time financial markets — a setting that requires rapid information integration, multi-step planning, and adaptive decision-making under uncertainty. The benchmark reveals significant gaps between current LLM agents and expert human traders, providing a rigorous testbed that existing static financial NLP benchmarks cannot replicate.
arxiv: arxiv.org/abs/2512.10971
Sources: HuggingFace Trending (10 upvotes), arxiv cs.AI
Why trending: Autonomous agents in finance is a growing deployment area; real-time, live benchmarking provides credibility that simulated benchmarks lack, attracting both researchers and practitioners.
11. EverMemOS: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning
Authors: Chuanrui Hu, Xingze Gao, Zuyi Zhou, Dannong Xu, Yi Bai
Summary: EverMemOS reframes agent memory as a full operating-system abstraction — with memory processes, scheduling, eviction policies, and namespace isolation — to enable structured, long-horizon reasoning that persists across many sessions and tasks. Unlike simple retrieval-augmented approaches, EverMemOS dynamically organizes and consolidates memory to maintain coherence over extended agent lifetimes, outperforming prior systems on long-horizon QA and planning tasks.
arxiv: arxiv.org/abs/2601.02163
Sources: HuggingFace Trending (9 upvotes), arxiv cs.AI
Why trending: The agent memory problem is one of the most active unsolved challenges in AI; the OS-level framing is a novel and compelling design philosophy that resonates with the infrastructure-minded segment of the community.
12. LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference
Authors: Yuhan Liu, Yihua Cheng, Jiayi Yao, Yuwei An, Xiaokun Chen, Shaoting Feng
Summary: LMCache is a KV cache management layer that extends GPU memory for LLM inference by enabling cross-request KV cache reuse across different queries and inference engines — addressing the rapid growth of cached state that GPU memory alone cannot absorb. The system integrates with popular serving frameworks and demonstrates substantial latency reductions and cost savings in production enterprise deployments.
arxiv: arxiv.org/abs/2510.09665
Sources: HuggingFace Trending (8 upvotes), arxiv cs.LG
Why trending: KV cache management is a top-of-mind infrastructure challenge for LLM serving at scale; production-validated results from real enterprise deployments make this immediately actionable for AI infra teams.
13. Lift4D: Harmonizing Single-View 3D Estimation for 4D Reconstruction In-the-Wild
Authors: Yehonathan Litman, Xiaoxuan Ma, Manan Shah, Nicolas Ugrinovic, Kris Kitani
Summary: Lift4D addresses the challenge of reconstructing dynamic non-rigid 4D scenes from monocular video by integrating visual cues with data-driven geometric priors, achieving more accurate and temporally coherent reconstructions than either pure learning-based or pure optimization-based methods. The approach works in-the-wild without requiring multi-view inputs or controlled settings, making it practical for real-world video understanding and content creation.
arxiv: arxiv.org/abs/2606.23688
Sources: HuggingFace Daily Papers (4 upvotes), arxiv cs.CV
Why trending: 4D reconstruction from monocular video is a hard, impactful problem at the intersection of video generation and 3D understanding; growing interest from the video AI community.
14. COrigami: An AI Pipeline for Co-Designing Flat-Foldable Visually Recognisable Origami
Authors: Tom Zahavy, Shaobo Hou, Thomas Tumiel, James Doran, Francesco Faccio et al.
Summary: COrigami presents an AI pipeline that co-designs origami patterns satisfying two simultaneous constraints: strict geometric flat-foldability (so the design can actually be folded) and visual recognizability (so the result looks like the intended object). The system combines generative models with combinatorial constraint satisfaction, demonstrating that AI can operate effectively in creative domains where solutions must satisfy hard physical constraints alongside subjective aesthetic goals.
arxiv: arxiv.org/abs/2606.26299
Sources: HuggingFace Daily Papers (4 upvotes), arxiv cs.AI
Why trending: A striking demonstration of AI in creative-physical design; the paper’s results and visuals are compelling enough to generate broader social media traction beyond the typical ML audience.
15. When Lower Privileges Suffice: Investigating Over-Privileged Tool Selection in LLM Agents
Authors: Kaiyue Yang, Yuyan Bu, Jingwei Yi, Yuchi Wang
Summary: As LLM agents autonomously select tools, they consistently choose over-privileged options even when lower-privilege alternatives would suffice — a safety-relevant behavior this paper systematically characterizes for the first time. The study finds that this over-privilege bias is pervasive across model families and tool categories, and proposes a lightweight intervention to guide agents toward least-privilege tool selection.
arxiv: arxiv.org/abs/2606.20023
Sources: HuggingFace Daily Papers (4 upvotes), arxiv cs.AI
Why trending: Privilege escalation and least-privilege principles are foundational to system security; applying this lens to LLM agent behavior surfaces a practically important safety issue that is gaining attention from both security researchers and AI safety teams.
16. PrivacyAlign: Contextual Privacy Alignment for LLM Agents
Authors: Manveer Singh Tamber, Abhay Puri, Marc-Etienne Brunet, Perouz Taslakian
Summary: PrivacyAlign frames agent privacy as a contextual judgment problem: every message, tool call, or post an agent makes is a decision about what information is appropriate to share given the context, and current agents make these decisions inconsistently with user expectations. The paper introduces a contextual privacy alignment framework and benchmark that evaluates whether agents can correctly infer user privacy preferences across diverse real-world scenarios.
arxiv: arxiv.org/abs/2606.21710
Sources: HuggingFace Daily Papers (3 upvotes), arxiv cs.AI
Why trending: Trust and privacy are critical blockers to enterprise adoption of AI agents; the alignment framing connects privacy to the broader RLHF/alignment literature, making it accessible and actionable.
17. What Intermediate Layers Know: Detecting Jailbreaks from Entropy Dynamics
Authors: Sofiia Nikolenko, Michele Papucci, Mina Rezaei, Shireen Kudukkil Manchingal
Summary: This paper shows that jailbreak attacks produce distinctive entropy dynamics in the intermediate hidden layers of LLMs — patterns that are not visible at the input or output level — providing a reliable signal for detection before a harmful response is generated. The proposed entropy-based detection method operates without access to the model’s full probability distribution, is computationally lightweight, and generalizes across diverse jailbreak attack types.
arxiv: arxiv.org/abs/2606.25182
Sources: HuggingFace Daily Papers (3 upvotes), arxiv cs.CR
Why trending: Jailbreak detection is an active and high-stakes problem; the mechanistic, interpretability-driven approach offers a novel angle compared to purely empirical detection methods and is attracting safety researchers.
18. Do Thinking Tokens Help with Safety?
Authors: Narutatsu Ri, Abhishek Panigrahi, Sanjeev Arora
Summary: Reasoning models with extended thinking tokens are widely assumed to be safer because deliberative reasoning should help models consider whether planned actions are appropriate before responding, but this paper provides the first systematic empirical study of whether that assumption holds. The findings are nuanced: thinking tokens improve safety on some threat categories but can actually worsen it on others, suggesting that deliberation alone is insufficient for robust alignment.
arxiv: arxiv.org/abs/2606.25013
Sources: HuggingFace Daily Papers (1 upvote), arxiv cs.AI
Why trending: Reasoning models (o1, Claude 3.5, Gemini Thinking) are now widely deployed; whether their extended chain-of-thought genuinely improves safety is an urgent empirical question for the AI safety community.
19. Plans Don’t Persist: Why Context Management Is Load Bearing for LLM Agents
Authors: Aman Mehta, Anupam Datta
Summary: This paper identifies context management — the compression, summarization, and eviction of tokens to fit within finite context windows — as a critical failure mode for long-horizon agents, particularly because plans written early in a task are the most likely to be dropped. The authors demonstrate that when planning-critical information is evicted, agent performance degrades sharply in a way that is difficult to detect from task metrics alone, and propose monitoring strategies to detect and mitigate context attrition.
arxiv: arxiv.org/abs/2606.22953
Sources: HuggingFace Daily Papers (1 upvote), arxiv cs.AI
Why trending: Context window limitations are a practical daily challenge for users of long-horizon agents; this paper gives a rigorous characterization of a failure mode that practitioners have observed empirically but lacked a principled framework to address.
20. Forecasting Future Behavior as a Learning Task
Authors: Mosh Levy, Yoav Goldberg, Asa Cooper Stickland
Summary: This paper proposes “behavioral forecasting” — predicting how a large reasoning model will behave on new inputs — as a learnable task, enabling models to develop meta-level self-knowledge about their own capabilities and failure modes. The approach is shown to improve calibration, facilitate selective abstention, and provide a new angle on AI transparency by making model behavior more predictable to external observers.
arxiv: arxiv.org/abs/2606.11445
Sources: HuggingFace Daily Papers (1 upvote), arxiv cs.AI
Why trending: Predictability and calibration of LLM behavior are foundational concerns for deployment trust; framing behavioral forecasting as a learnable task opens a new and practical research direction at the intersection of interpretability and reliability.
