Daily AI Papers — April 26, 2026

11 minute read

1. RAG-Anything: All-in-One RAG Framework

  • Authors: Zirui Guo, Xubin Ren, Lingrui Xu, Jiahao Zhang, Chao Huang, et al.
  • arxiv: arxiv.org/abs/2510.12323
  • Sources: Papers With Code (#3 trending), arXiv cs.IR
  • Summary: Proposes a unified RAG framework that ingests heterogeneous knowledge (text, tables, images, code, knowledge graphs) through a single multimodal indexing-and-retrieval pipeline, eliminating the patchwork of modality-specific retrievers most production stacks ship today. Reports SOTA on multimodal QA benchmarks while keeping the API surface to a single query() call; a minimal interface sketch follows below.
  • Why trending: Production RAG fragmentation is the loudest pain point in the agentic-app space right now, and “all-in-one” is exactly what infra teams want to ship.
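
For a feel of what "one pipeline, one call" means in practice, here is a minimal sketch of a unified multimodal index with a single query() entry point. Every name in it (MultimodalIndex, Chunk, the toy embedder) is hypothetical; this illustrates the shape of the design, not RAG-Anything's actual API.

```python
from dataclasses import dataclass, field

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

@dataclass
class Chunk:
    modality: str          # "text" | "table" | "image" | "code" | "kg"
    content: str           # serialized payload (caption, markdown table, ...)
    embedding: list = field(default_factory=list)

@dataclass
class MultimodalIndex:
    chunks: list = field(default_factory=list)

    def ingest(self, chunk, embed):
        # One pipeline for every modality: serialize, embed, store.
        chunk.embedding = embed(chunk.content)
        self.chunks.append(chunk)

    def query(self, question, embed, k=5):
        # Single entry point: rank all modalities in one shared space.
        q = embed(question)
        return sorted(self.chunks, key=lambda c: -dot(q, c.embedding))[:k]

# Toy embedder (character histogram) standing in for a real encoder.
embed = lambda text: [text.lower().count(c) for c in "abcdefghij"]

index = MultimodalIndex()
index.ingest(Chunk("text", "figure 3 shows ablation results"), embed)
index.ingest(Chunk("table", "| model | f1 | latency |"), embed)
print(index.query("which table reports latency?", embed, k=1)[0].modality)
```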

2. Kronos: A Foundation Model for the Language of Financial Markets

  • Authors: Yu Shi, Zongliang Fu, Shuo Chen, Bohan Zhao, Wei Xu
  • arxiv: arxiv.org/abs/2508.02739
  • Sources: Papers With Code (#2 trending), arXiv cs.LG
  • Summary: Builds a transformer foundation model pre-trained on a 12B-token corpus of financial K-line (candlestick) data spanning 45 global exchanges, with a tokenizer adapted to OHLCV (open-high-low-close-volume) quintuples; a toy tokenization sketch follows below. Beats supervised baselines on price-direction, volatility, and regime-shift forecasting in zero-shot settings.
  • Why trending: First credible “GPT for markets” with permissive code release; quant Twitter and HN both lit up over it this week.
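
The tokenizer is the interesting engineering bit: continuous OHLCV candles have to become discrete tokens before a decoder-only model can consume them. The uniform per-channel binning below is a toy assumption (Kronos's actual tokenizer is learned), but it shows the candle-to-token mapping concretely.

```python
import numpy as np

N_BINS = 256  # per-channel vocabulary size (an assumption, not Kronos's)

def tokenize_candles(ohlcv):
    """ohlcv: (T, 5) array of open, high, low, close, volume per step."""
    # Normalize each channel to [0, 1] over the context window...
    lo = ohlcv.min(axis=0, keepdims=True)
    hi = ohlcv.max(axis=0, keepdims=True)
    norm = (ohlcv - lo) / np.maximum(hi - lo, 1e-9)
    # ...then quantize each value into a discrete bin id.
    return np.clip((norm * N_BINS).astype(int), 0, N_BINS - 1)

window = np.abs(np.random.default_rng(0).normal(size=(32, 5)))  # toy candles
tokens = tokenize_candles(window)
print(tokens.shape, tokens.min(), tokens.max())  # (32, 5) ints in [0, 255]
```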

3. PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

  • Authors: Cheng Cui, Ting Sun, Suyin Liang, Tingquan Gao, Zelun Zhang, et al.
  • arxiv: arxiv.org/abs/2510.14528
  • Sources: Papers With Code (#12 trending), arXiv cs.CV
  • Summary: A 0.9B-parameter document VLM combining a NaViT-style dynamic-resolution encoder with an ERNIE-4.5-0.3B language backbone; covers 100+ languages and beats much larger models on layout, table, and formula extraction. Released under a permissive license and runs on a single consumer GPU.
  • Why trending: Tiny VLMs that match or beat 7B+ document models are eating the OCR-VL leaderboard; immediately useful for invoice/PDF pipelines.

4. MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

  • Authors: Junbo Niu, Zheng Liu, Zhuangcheng Gu, Bin Wang, Linke Ouyang, et al.
  • arxiv: arxiv.org/abs/2509.22186
  • Sources: Papers With Code (#11 trending), arXiv cs.CV
  • Summary: A 1.2B doc-parsing VLM that decouples global layout analysis from local content recognition via a coarse-to-fine two-stage pipeline (sketched below), reaching SOTA accuracy at a fraction of the FLOPs of monolithic VLMs. Open weights and a CLI ship together.
  • Why trending: Direct rival to PaddleOCR-VL; the two are fueling a “tiny doc-VLM” arms race that’s blowing up across PWC and HF Spaces.
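
A rough sketch of the decoupling, with stubs standing in for the real networks: stage one scans a downsampled page for regions, stage two recognizes each region from a full-resolution crop, so the expensive model never processes the whole high-res page at once. Function names and the list-of-lists "image" are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Region:
    kind: str    # "text" | "table" | "formula" | "figure"
    box: tuple   # (x0, y0, x1, y1) in full-resolution pixels

def parse_page(page, detect_layout, recognize, scale=4):
    # Stage 1 (coarse): layout analysis on a downsampled page, so the
    # global pass stays cheap regardless of page resolution.
    thumb = [row[::scale] for row in page[::scale]]
    regions = detect_layout(thumb)
    # Stage 2 (fine): recognize each region from a native-resolution
    # crop; cost now scales with content area, not page size.
    out = []
    for r in regions:
        x0, y0, x1, y1 = r.box
        crop = [row[x0:x1] for row in page[y0:y1]]
        out.append((r.kind, recognize(crop, r.kind)))
    return out

# Toy "page" and stub models.
page = [[0] * 64 for _ in range(64)]
detect_layout = lambda img: [Region("text", (0, 0, 32, 16))]
recognize = lambda crop, kind: f"<{kind} {len(crop)}x{len(crop[0])}>"
print(parse_page(page, detect_layout, recognize))  # [('text', '<text 16x32>')]
```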

5. VibeVoice Technical Report

  • Authors: Zhiliang Peng, Jianwei Yu, Wenhui Wang, Yaoyao Chang, Yutao Sun, et al.
  • arxiv: arxiv.org/abs/2508.19205
  • Sources: Papers With Code (#6 trending), arXiv cs.SD
  • Summary: Long-form, multi-speaker speech synthesis using next-token diffusion (autoregressively generating continuous latent vectors via a diffusion head) to produce hours-scale podcasts and dialogues with consistent voices; a toy generation loop is sketched below. Introduces a new continuous tokenizer that compresses speech 80× while preserving prosody.
  • Why trending: Next-token diffusion is the hottest unification trick of the season; voice-cloning communities are already remixing the released checkpoints.
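
The core loop is easy to miniaturize: at each step the autoregressive backbone produces a hidden state, and a small diffusion head denoises the next continuous latent frame conditioned on it. Everything below is a random-weight toy, not VibeVoice's architecture or checkpoints; it only demonstrates the next-token-diffusion control flow.

```python
import numpy as np

D, STEPS = 16, 8          # latent dim, denoising steps per frame
rng = np.random.default_rng(0)
W_backbone = rng.normal(size=(D, D)) * 0.1     # stand-in AR transformer
W_denoise = rng.normal(size=(2 * D, D)) * 0.1  # stand-in diffusion head

def backbone(prev):
    # The autoregressive context encoder, reduced to one linear step.
    return np.tanh(prev @ W_backbone)

def denoise_step(x, h):
    # Diffusion head: refine noisy latent x conditioned on hidden state h.
    return np.tanh(np.concatenate([x, h]) @ W_denoise)

def generate(n_frames):
    latents, prev = [], np.zeros(D)
    for _ in range(n_frames):
        h = backbone(prev)          # condition on generation history
        x = rng.normal(size=D)      # start each frame from pure noise
        for _ in range(STEPS):      # few-step denoising, not token sampling
            x = denoise_step(x, h)
        latents.append(x)
        prev = x                    # feed the latent back autoregressively
    return np.stack(latents)

print(generate(4).shape)  # (4, 16): four continuous speech latents
```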

6. Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

  • Authors: Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, Deshraj Yadav
  • arxiv: arxiv.org/abs/2504.19413
  • Sources: Papers With Code (#17 trending), arXiv cs.AI
  • Summary: Mem0 is a memory-centric agent architecture that extracts, consolidates, and retrieves persistent facts across sessions (a toy version of the loop is sketched below), sidestepping LLM context-window limits. Reports a 26% accuracy gain over OpenAI’s built-in memory on the LOCOMO benchmark, with 91% lower p95 latency and over 90% token savings versus full-context baselines.
  • Why trending: “Agent memory” went from niche to table-stakes this quarter; the open-source repo just crossed major star milestones, drawing fresh attention.
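
A toy version of the extract-consolidate-retrieve loop. The real system uses an LLM for fact extraction and conflict resolution; the keyword heuristics below are stand-ins that just show where each stage sits.

```python
store = {}  # subject -> fact: the persistent cross-session memory

def extract(turn):
    # Stand-in extractor: a "remember: subject = fact" convention
    # replaces the LLM-based fact extraction of the real system.
    if turn.lower().startswith("remember:") and "=" in turn:
        subj, fact = turn.split(":", 1)[1].split("=", 1)
        return [(subj.strip(), fact.strip())]
    return []

def consolidate(facts):
    # Newest fact wins here; real Mem0 merges, updates, or invalidates.
    for subj, fact in facts:
        store[subj] = fact

def retrieve(query, k=3):
    # Rank stored facts by word overlap with the query.
    words = set(query.lower().split())
    ranked = sorted(store.items(),
                    key=lambda kv: -len(words & set(kv[1].lower().split())))
    return ranked[:k]

consolidate(extract("remember: favorite_editor = vim with heavy plugins"))
print(retrieve("which editor and plugins does the user like"))
```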

7. Recursive Language Models

  • Authors: Alex L. Zhang, Tim Kraska, Omar Khattab
  • arxiv: arxiv.org/abs/2512.24601
  • Sources: Papers With Code (#10 trending), arXiv cs.CL
  • Summary: Treats long prompts as an external environment and lets the LLM programmatically peek, slice, and recurse on them, effectively turning context handling into an inference-time tool-use loop; a toy version of the loop is sketched below. Outperforms long-context models on multi-document reasoning at a fraction of the cost.
  • Why trending: Reframes the long-context debate as an algorithmic one rather than an architectural one; the Khattab/DSPy crowd is championing it.
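
A toy version of the loop: the document stays outside the model, and a stub "LLM" policy decides whether to slide the window or answer. The action vocabulary (next/answer) is illustrative; the paper's models can also grep, slice, and recurse on sub-spans.

```python
def run(llm, document, question, window=500):
    cursor = 0
    while cursor < len(document):
        view = document[cursor:cursor + window]  # model only sees a slice
        action = llm(question, view)             # decide the next move
        if action == "next":
            cursor += window                     # slide the window forward
        else:
            return action.split(":", 1)[1].strip()
    return "not found"

def stub_llm(question, view):
    # Toy policy: scan until the keyword appears, then answer.
    return "answer: found it" if "needle" in view else "next"

doc = "hay " * 1000 + "needle" + " hay" * 1000
print(run(stub_llm, doc, "where is the needle?"))  # found it
```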

8. SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

  • Authors: Ahmed Nassar, Andres Marafioti, Matteo Omenetti, Maksym Lysak, Nikolaos Livathinos
  • arxiv: arxiv.org/abs/2503.11576
  • Sources: Papers With Code (#19 trending), arXiv cs.CV, HuggingFace ecosystem
  • Summary: A sub-billion-parameter VLM that converts arbitrary document pages into DocTags, a new markup capturing layout, tables, formulas, and figures in one decode pass. Built jointly by IBM Research and Hugging Face, with easy fine-tuning recipes.
  • Why trending: The trio of tiny doc-VLMs (this one, PaddleOCR-VL, and MinerU2.5) is reshaping the doc-AI stack; SmolDocling is the most HF-native of the three.

9. OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

  • Authors: Qinglin Zhang, Luyao Cheng, Chong Deng, Qian Chen, Wen Wang
  • arxiv: arxiv.org/abs/2410.17799
  • Sources: Papers With Code (#23 trending), arXiv cs.CL
  • Summary: A full-duplex voice GPT that handles overlapping speech and natural turn-taking by flattening speech and text streams into a single autoregressive sequence; an interleaving sketch follows below. Hits human-like sub-200ms latency without a separate VAD or ASR module.
  • Why trending: Full-duplex voice agents are the next interface frontier, and this is the cleanest fully end-to-end recipe so far.
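
A sketch of the flattening trick: user-audio, assistant-text, and assistant-audio tokens are interleaved chunk by chunk into one sequence, so a single decoder models listening and speaking simultaneously. The chunk schedule below is illustrative, not OmniFlatten's exact pattern.

```python
def flatten(user_audio, bot_text, bot_audio, chunk=4):
    # Interleave the three streams chunk by chunk into one sequence;
    # the decoder keeps "hearing" user tokens while emitting its own.
    seq, i = [], 0
    while i * chunk < max(len(user_audio), len(bot_audio)):
        lo, hi = i * chunk, (i + 1) * chunk
        seq += [("U", t) for t in user_audio[lo:hi]]  # keep listening...
        seq += [("T", t) for t in bot_text[lo:hi]]    # ...plan text...
        seq += [("A", t) for t in bot_audio[lo:hi]]   # ...and speak
        i += 1
    return seq

print(flatten(list("hello..."), list("hi"), list("HI"))[:8])
```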

10. TradingAgents: Multi-Agents LLM Financial Trading Framework

  • Authors: Yijia Xiao, Edward Sun, Di Luo, Wei Wang
  • arxiv: arxiv.org/abs/2412.20138
  • Sources: Papers With Code (#5 trending), arXiv cs.AI
  • Summary: Decomposes algorithmic trading into role-played LLM agents (analyst, researcher, trader, risk manager) that debate, vote, and refine positions, mirroring a real trading-desk workflow; a toy version of the pipeline is sketched below. Reports significant alpha over single-agent and rule-based baselines on equity and crypto backtests.
  • Why trending: Pairs naturally with Kronos (#2): the “AI fund” narrative is back, and these two papers are the canonical references this week.
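
A toy pipeline showing the role decomposition: an analyst proposes, a bull/bear pair debates, the trader sizes the position off the debate, and risk can veto. Each "agent" is a stub function standing in for an LLM call.

```python
def analyst(data):
    return {"signal": "long" if data["momentum"] > 0 else "short"}

def bull(view):  return f"support {view['signal']}: trend intact"
def bear(view):  return f"challenge {view['signal']}: crowded trade"

def trader(view, debate):
    # An unresolved challenge from the debate tempers position size.
    size = 0.5 if "challenge" in debate[-1] else 1.0
    return {"action": view["signal"], "size": size}

def risk_manager(order, limits):
    # Risk gets the final veto, exactly like a desk sign-off.
    if order["size"] > limits["max_size"]:
        return {"action": "hold", "size": 0.0}
    return order

view = analyst({"momentum": 0.8})
debate = [bull(view), bear(view)]
print(risk_manager(trader(view, debate), {"max_size": 0.75}))
```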

11. LightRAG: Simple and Fast Retrieval-Augmented Generation

  • Authors: Zirui Guo, Lianghao Xia, Yanhua Yu, Tu Ao, Chao Huang
  • arxiv: arxiv.org/abs/2410.05779
  • Sources: Papers With Code (#18 trending), arXiv cs.IR
  • Summary: Combines a graph-structured knowledge index with dual-level retrieval (entity-level and topic-level; sketched below) to deliver GraphRAG-quality answers at a fraction of the indexing cost. Same lab as RAG-Anything (#1); LightRAG is the lean cousin focused on speed and incremental updates.
  • Why trending: Most production teams want GraphRAG quality without the GraphRAG bill; LightRAG is the pragmatic answer that keeps trending months later.
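
A sketch of dual-level retrieval over a toy graph index: entity-level lookups answer specific questions, topic-level lookups answer broad ones, and both hit the same store. The data structures are illustrative, not LightRAG's schema.

```python
graph = {
    # entity -> description (fine-grained, low-level keys)
    "entities": {"LoRA": "low-rank adapter fine-tuning method",
                 "QLoRA": "quantized LoRA variant"},
    # topic -> member entities (coarse, high-level keys)
    "topics": {"efficient fine-tuning": ["LoRA", "QLoRA"]},
}

def retrieve(query):
    q = query.lower()
    # Entity level: direct hits for specific questions.
    entity_hits = [(e, d) for e, d in graph["entities"].items()
                   if e.lower() in q]
    # Topic level: summaries and members for broad questions.
    topic_hits = [(t, m) for t, m in graph["topics"].items()
                  if any(w in q for w in t.split())]
    return {"entity": entity_hits, "topic": topic_hits}

print(retrieve("how does QLoRA help with efficient fine-tuning?"))
```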

12. A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications

  • Authors: Siyuan Mu, Sen Lin
  • arxiv: arxiv.org/abs/2503.07137
  • Sources: Papers With Code (#13 trending), arXiv cs.LG
  • Summary: A 100-plus-page sweep of MoE: routing algorithms, load-balancing losses, system-side primitives, theoretical convergence and expressivity results, and a taxonomy of every major MoE LLM/VLM/diffusion variant shipped to date; a worked routing example follows below. The current go-to MoE reference.
  • Why trending: With every frontier release going MoE (Qwen3-Omni, DeepSeek-V3.x, etc.), this survey is the CliffsNotes the community keeps reaching for.
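
Two primitives the survey keeps returning to, worked end to end: top-k token routing and a Switch-style load-balancing auxiliary loss (fraction of tokens per expert times mean router probability, summed over experts).

```python
import numpy as np

T, E, K = 8, 4, 2   # tokens, experts, experts chosen per token
rng = np.random.default_rng(0)

logits = rng.normal(size=(T, E))                      # router outputs
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
topk = np.argsort(-probs, axis=-1)[:, :K]             # top-k routing

# f_e: fraction of tokens hitting expert e; P_e: mean router probability.
f = np.array([(topk == e).any(axis=-1).mean() for e in range(E)])
P = probs.mean(axis=0)
aux_loss = E * (f * P).sum()    # minimized when load is uniform

print("loads:", f, "aux loss:", round(float(aux_loss), 3))
```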

13. A decoder-only foundation model for time-series forecasting (TimesFM)

  • Authors: Abhimanyu Das, Weihao Kong, Rajat Sen, Yichen Zhou
  • arxiv: arxiv.org/abs/2310.10688
  • Sources: Papers With Code (#15 trending), arXiv cs.LG
  • Summary: Google Research’s decoder-only forecasting foundation model trained on 100B+ time points, with patch-based input tokens and a long-context decoder; a patching sketch follows below. Zero-shot accuracy approaches supervised SOTA across Monash, Darts, and ETT benchmarks.
  • Why trending: Time-series foundation models are having their “GPT moment”; TimesFM keeps re-emerging on PWC as benchmark refreshes drop.
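
Patching in miniature: the series is split into fixed-length windows, and each window becomes one "token" embedding for the decoder. Patch length 32 matches the paper's input patching; the random projection is a stand-in for the learned one.

```python
import numpy as np

PATCH = 32                                  # input patch length, per the paper
series = np.sin(np.linspace(0, 20, 256))    # toy series, length 256

patches = series.reshape(-1, PATCH)         # (8, 32): one "token" per patch
W = np.random.default_rng(0).normal(size=(PATCH, 64))
embeddings = patches @ W                    # (8, 64): decoder input states

print(patches.shape, embeddings.shape)
```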

14. OpenHands: An Open Platform for AI Software Developers as Generalist Agents

  • Authors: Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, et al.
  • arxiv: arxiv.org/abs/2407.16741
  • Sources: Papers With Code (#16 trending), arXiv cs.SE
  • Summary: Open-source platform (formerly OpenDevin) where coding agents browse, edit, run code, and call tools through a sandboxed runtime, with a unified evaluation harness spanning SWE-bench, WebArena, and GAIA. Now the de facto reference implementation for software-engineering agents.
  • Why trending: With every frontier lab claiming SWE-bench wins, OpenHands is the open scaffold researchers actually compare against.

15. AutoDev: Automated AI-Driven Development

  • Authors: Michele Tufano, Anisha Agarwal, Jinu Jang, Roshanak Zilouchian Moghaddam, Neel Sundaresan
  • arxiv: arxiv.org/abs/2403.08299
  • Sources: Papers With Code (#14 trending), arXiv cs.SE
  • Summary: Microsoft’s framework that gives the AI full IDE primitives (file edits, build, test, git, terminal) under a permission model that mirrors human dev workflows. Demonstrates strong autonomous task completion on enterprise repos.
  • Why trending: As Copilot Workspace and Cursor-style agents proliferate, AutoDev is the academic write-up the industry keeps citing.

16. LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

  • Authors: Yaowei Zheng, Richong Zhang, Junhao Zhang, Yanhan Ye, Zheyan Luo
  • arxiv: arxiv.org/abs/2403.13372
  • Sources: Papers With Code (#25 trending), arXiv cs.CL
  • Summary: A unified CLI plus no-code WebUI for fine-tuning 100+ open LLMs with LoRA, QLoRA, DPO, PPO, and reward modeling. It has become one of the most-starred open-source LLM fine-tuning projects and the default training stack for many Asia-based labs.
  • Why trending: With Qwen3.5, Yi, and DeepSeek series releases this week, LlamaFactory’s ranking jumped because everyone is fine-tuning something.

17. GigaWorld-Policy: An Efficient Action-Centered World-Action Model

  • Authors: Angen Ye, Boyuan Wang, Chaojun Ni, Guan Huang, Guosheng Zhao, et al.
  • arxiv: arxiv.org/abs/2603.17240
  • Sources: Papers With Code (#31 trending), arXiv cs.RO
  • Summary: Decouples future-frame prediction from policy generation in a video-pretrained World-Action Model, slashing rollout cost while retaining the generalization gains of world-model pretraining. Beats prior WAMs on real-robot manipulation suites.
  • Why trending: World-action models are the robotics community’s hot frontier, and “decoupled” recipes are where the field is converging.

18. LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

  • Authors: Lucas Maes, Quentin Le Lidec, Damien Scieur, Yann LeCun, Randall Balestriero
  • arxiv: arxiv.org/abs/2603.19312
  • Sources: Papers With Code (#24 trending), arXiv cs.LG
  • Summary: A pixel-to-latent JEPA that trains end-to-end without EMA targets, frozen encoders, or auxiliary losses, clean enough that representation collapse is provably avoided; a toy latent-prediction objective is sketched below. LeCun’s lab is leaning hard into the “world model from raw pixels” thesis.
  • Why trending: Yann LeCun byline + a clean JEPA recipe = guaranteed Twitter weekend.
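
A toy version of the objective: encode both frames with one shared encoder (no EMA copy), predict the future embedding from the current one, and regress in latent space. Note the variance penalty below is exactly the kind of auxiliary crutch the paper claims to eliminate; it is included only to keep this toy from collapsing and stands in for LeWorldModel's actual mechanism.

```python
import numpy as np

D = 8
rng = np.random.default_rng(0)
W_enc = rng.normal(size=(D, D)) * 0.3    # one shared encoder, no EMA twin
W_pred = rng.normal(size=(D, D)) * 0.3   # latent-space predictor

def encode(x):
    return np.tanh(x @ W_enc)

def jepa_loss(frame_t, frame_t1):
    z_t, z_t1 = encode(frame_t), encode(frame_t1)
    pred = ((z_t @ W_pred - z_t1) ** 2).mean()   # predict in latent space
    # Generic anti-collapse stand-in (NOT the paper's mechanism):
    # penalize per-dimension variance falling below 1.
    var_penalty = np.maximum(0.0, 1.0 - z_t1.var(axis=0)).mean()
    return pred + var_penalty

frames = rng.normal(size=(16, D))                # stand-in "pixel" frames
next_frames = frames + 0.1 * rng.normal(size=(16, D))
print(round(float(jepa_loss(frames, next_frames)), 3))
```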

19. DFlash: Block Diffusion for Flash Speculative Decoding

  • Authors: Jian Chen, Yesheng Liang, Zhijian Liu
  • arxiv: arxiv.org/abs/2602.06036
  • Sources: Papers With Code (#20 trending), arXiv cs.LG
  • Summary: Replaces the autoregressive draft model in speculative decoding with a tiny block-diffusion drafter that proposes K tokens in parallel via a few denoising steps, lifting acceptance rates and total wall-clock speedup; the verify loop is sketched below. Plugs into any HF or vLLM stack.
  • Why trending: Speculative decoding remains the lowest-hanging fruit for serving cost; diffusion drafters are this year’s twist on the recipe.
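
The verify loop in miniature: the drafter proposes K tokens at once (one shot here, standing in for a few denoising steps), and the target keeps the longest agreeing prefix. Real implementations verify all K positions in one batched forward pass and use rejection sampling rather than the greedy match shown.

```python
def speculative_decode(target, drafter, prompt, k=4, max_new=16):
    seq = list(prompt)
    while len(seq) < len(prompt) + max_new:
        block = drafter(seq, k)            # K draft tokens, proposed at once
        accepted = 0
        for tok in block:
            if target(seq) == tok:         # target agrees: keep the token
                seq.append(tok)
                accepted += 1
            else:                          # first disagreement: correct, stop
                seq.append(target(seq))
                break
        if accepted == k:                  # full accept earns a bonus token
            seq.append(target(seq))
    return seq

# Toy models over integer tokens: the target counts upward; the drafter
# is right everywhere except its last position.
target = lambda seq: seq[-1] + 1
drafter = lambda seq, k: [seq[-1] + i + 1 + (2 if i == k - 1 else 0)
                          for i in range(k)]
print(speculative_decode(target, drafter, [0]))  # 0, 1, 2, 3, ...
```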

20. Temporally Extended Mixture-of-Experts Models

  • Authors: Zeyu Shen, Peter Henderson
  • arxiv: arxiv.org/abs/2604.20156
  • Sources: HuggingFace daily papers, arXiv cs.LG
  • Summary: Argues that token-level expert switching causes pathological cache churn once a model outgrows GPU memory, and proposes a temporal MoE that holds an expert active across spans of tokens; span routing is sketched below. Recovers training and inference throughput without quality regressions.
  • Why trending: Hits the exact failure mode every team scaling MoE past single-node has run into; Henderson’s name brings the policy/efficiency crowd along.
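
Span routing in one picture: re-deciding the expert every token versus once per span of S tokens. The mean-of-logits span rule below is an illustrative choice, not necessarily the paper's; the point is the drop in expert switches.

```python
import numpy as np

T, E, S = 16, 4, 4    # tokens, experts, span length
rng = np.random.default_rng(0)
logits = rng.normal(size=(T, E))          # per-token router logits

token_route = logits.argmax(-1)           # classic MoE: decide every token
span_route = np.repeat(                   # temporal MoE: decide once per span
    logits.reshape(-1, S, E).mean(axis=1).argmax(-1), S)

switches = lambda r: int((np.diff(r) != 0).sum())
print("expert switches:", switches(token_route), "->", switches(span_route))
```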

Key Themes Today

  • Doc-AI tiny-VLM arms race: PaddleOCR-VL, MinerU2.5, and SmolDocling are converging on the sub-1.5B “good enough” frontier.
  • Voice goes end-to-end: VibeVoice (synthesis) + OmniFlatten (full-duplex dialog) signal the death of cascaded ASR→LLM→TTS pipelines.
  • Finance LLM moment: Kronos + TradingAgents is the “AI quant fund” pairing the week converged on.
  • MoE everywhere: survey, temporal-MoE, and downstream MoE references reflect that every frontier release is now sparse.
  • RAG matures: RAG-Anything and LightRAG (same lab) frame the future of retrieval as multimodal-graph by default.