- A Survey on Language Models for Code
  Paper • 2311.07989 • Published • 22
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
  Paper • 2310.06770 • Published • 4
- CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
  Paper • 2401.03065 • Published • 11
- Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming
  Paper • 2402.14261 • Published • 10
Collections including paper arxiv:2311.09204
- Fusion-Eval: Integrating Evaluators with LLMs
  Paper • 2311.09204 • Published • 6
- Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer
  Paper • 2311.06720 • Published • 8
- Safurai 001: New Qualitative Approach for Code LLM Evaluation
  Paper • 2309.11385 • Published • 2
- Assessment of Pre-Trained Models Across Languages and Grammars
  Paper • 2309.11165 • Published • 1
- Contrastive Chain-of-Thought Prompting
  Paper • 2311.09277 • Published • 35
- Tied-LoRA: Enhancing parameter efficiency of LoRA with weight tying
  Paper • 2311.09578 • Published • 15
- Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation
  Paper • 2311.08877 • Published • 7
- Fusion-Eval: Integrating Evaluators with LLMs
  Paper • 2311.09204 • Published • 6
- Language Models can be Logical Solvers
  Paper • 2311.06158 • Published • 19
- Fusion-Eval: Integrating Evaluators with LLMs
  Paper • 2311.09204 • Published • 6
- Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation
  Paper • 2311.08877 • Published • 7
- Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"
  Paper • 2311.07587 • Published • 4
- Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
  Paper • 2310.08491 • Published • 54
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
  Paper • 2310.11511 • Published • 76
- Calibrating LLM-Based Evaluator
  Paper • 2309.13308 • Published • 12
- Fusion-Eval: Integrating Evaluators with LLMs
  Paper • 2311.09204 • Published • 6