paper_collect - a xlbqc Collection

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Paper • 2502.14499 • Published 8 days ago • 167

Note benchmark A benchmark for AI research agents.

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published 8 days ago • 92

Note Benchmark A graduate-level, multi-discipline test for LLMs; DeepSeek currently answers about 60% correctly.

How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

Paper • 2502.14502 • Published 8 days ago • 80

Note Fine-tuning, LoRA Mixing a certain proportion of already-known knowledge into the fine-tuning data can improve fine-tuning results.
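
To make the data-mixing idea in this note concrete, here is a minimal sketch of blending known facts into a fine-tuning set at a fixed ratio. The helper name `mix_known_facts`, the `known_ratio` parameter, and the sample strings are all assumptions for illustration; the paper's actual data pipeline and ratios are not reproduced here.

```python
import random

def mix_known_facts(new_samples, known_samples, known_ratio, seed=0):
    """Return a shuffled training set in which about `known_ratio` of the
    examples are facts the base model already knows (hypothetical helper)."""
    if not 0 <= known_ratio < 1:
        raise ValueError("known_ratio must be in [0, 1)")
    rng = random.Random(seed)
    # Solve k / (len(new_samples) + k) = known_ratio for k.
    n_known = round(len(new_samples) * known_ratio / (1 - known_ratio))
    # Never ask for more known samples than are available.
    n_known = min(n_known, len(known_samples))
    mixed = list(new_samples) + rng.sample(list(known_samples), n_known)
    rng.shuffle(mixed)
    return mixed

# Illustrative data only: 80 new facts plus a pool of 100 known facts.
new_facts = [f"new-{i}" for i in range(80)]
known_facts = [f"known-{i}" for i in range(100)]
mixed = mix_known_facts(new_facts, known_facts, known_ratio=0.2)
```

The ratio is expressed as a share of the final mixed set rather than of the new samples, so the proportion stays easy to read when the amount of new data changes.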

PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC

Paper • 2502.14282 • Published 8 days ago • 17

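Note GUI-Agent, PC-Agent 1. Perception enhancement (APM): uses pywinauto and OCR to improve perception (similar to ARIA, A11y, and drawing bounding boxes on the image). 2. Proposes a hierarchical planning framework with three levels: instruction, subtask, action.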
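
The three-level hierarchy in point 2 can be pictured as a nested data structure. The sketch below is only an illustration of that shape: the class names, the action names, and the example plan are assumptions, not PC-Agent's actual interfaces.

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str    # hypothetical action type, e.g. "open", "type"
    target: str  # the UI element or file the action applies to

@dataclass
class Subtask:
    goal: str
    actions: list = field(default_factory=list)

@dataclass
class Instruction:
    text: str
    subtasks: list = field(default_factory=list)

def flatten(instruction):
    """Yield (subtask goal, action) pairs in execution order."""
    for subtask in instruction.subtasks:
        for action in subtask.actions:
            yield subtask.goal, action

# A hand-written plan; in a real agent each level would be produced by a planner.
plan = Instruction(
    text="Copy the sales total from a spreadsheet into a note",
    subtasks=[
        Subtask("Read the total", [Action("open", "sales.xlsx"), Action("read", "cell B10")]),
        Subtask("Write the note", [Action("open", "Notepad"), Action("type", "total value")]),
    ],
)
steps = list(flatten(plan))
```

Keeping the levels separate means a failed action can be retried or replanned within its subtask without discarding the rest of the plan.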

From RAG to Memory: Non-Parametric Continual Learning for Large Language Models

Paper • 2502.14802 • Published 8 days ago • 11

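Note RAG, Memory, KG Builds a knowledge graph offline. At query time, it retrieves triples from the graph and filters them (rerank first, then filter), and the graph search uses the PPR (Personalized PageRank) algorithm, which gives each node its own personalization weight.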
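
As a rough illustration of the PPR step mentioned in this note, the sketch below runs Personalized PageRank by power iteration on a toy entity graph. The function name, the graph, the seed weights, and the damping value are assumptions for illustration; this is generic PPR, not the paper's implementation.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """graph: dict mapping node -> list of neighbour nodes (all neighbours are keys).
    seeds: dict mapping node -> non-negative weight (the restart distribution)."""
    if not seeds or any(node not in graph for node in seeds):
        raise ValueError("seeds must be a non-empty subset of the graph nodes")
    total = sum(seeds.values())
    restart = {node: seeds.get(node, 0.0) / total for node in graph}
    rank = dict(restart)
    for _ in range(iterations):
        # With probability (1 - damping) the walk jumps back to a seed node.
        nxt = {node: (1 - damping) * restart[node] for node in graph}
        for node, neighbours in graph.items():
            if neighbours:
                share = damping * rank[node] / len(neighbours)
                for nb in neighbours:
                    nxt[nb] += share
            else:
                # Dangling node: return its mass to the seeds.
                for other in graph:
                    nxt[other] += damping * rank[node] * restart[other]
        rank = nxt
    return rank

# Toy graph: entities linked when they co-occur in a triple.
graph = {
    "Paris": ["France"],
    "France": ["Paris", "Europe"],
    "Europe": ["France"],
    "Tokyo": ["Japan"],
    "Japan": ["Tokyo"],
}
# The query mentions "Paris", so all restart weight goes to that node.
scores = personalized_pagerank(graph, seeds={"Paris": 1.0})
```

Nodes connected to the seed accumulate score, while the unrelated `Tokyo`/`Japan` component receives none, which is what makes the ranking query-specific.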

LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers

Paper • 2502.15007 • Published 8 days ago • 151

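Note LLM, Context, model-mechanism research In LLMs and in context prompts, articles, stop words, and punctuation have a major impact on the model and often carry the most information; removing them hurts performance considerably. Removing stop words and punctuation costs 8% of performance. But isn't that simply because the meaning is no longer expressed clearly? It seems so to me. The paper mainly provides a toolkit for studying the internal mechanisms of LLMs, and uses the removal of stop words and articles only to demonstrate it.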
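
The ablation described in this note amounts to stripping stop words and punctuation from a prompt before feeding it to the model. A minimal sketch of that preprocessing step follows; the function name and the tiny stop-word list are assumptions for illustration and are not taken from the paper's toolkit.

```python
import string

# Tiny illustrative stop-word list; a real study would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "is", "are", "to", "and"}

def strip_tokens(text, drop_stop_words=True, drop_punctuation=True):
    """Return `text` with ASCII punctuation and/or stop words removed."""
    if drop_punctuation:
        text = text.translate(str.maketrans("", "", string.punctuation))
    words = text.split()
    if drop_stop_words:
        words = [w for w in words if w.lower() not in STOP_WORDS]
    return " ".join(words)

original = "The cat sat on the mat, and it is happy."
ablated = strip_tokens(original)
```

Comparing model accuracy on `original`-style prompts against `ablated` ones is one simple way to check the note's question of whether the drop comes from lost meaning rather than from the tokens themselves.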