RegularizedSelfPlay/sppo_forwardimportance10-0.01-PromptABC-LLAMA-3-8B-Instruct-SPPO-Iter3 Text Generation • Updated Jan 30 • 11
RegularizedSelfPlay/sppo_forwardimportance10-0.01-PromptABC-LLAMA-3-8B-Instruct-SPPO-Iter2 Text Generation • Updated Jan 30 • 12
RegularizedSelfPlay/sppo_forwardimportance10-0.01-PromptABC-LLAMA-3-8B-Instruct-SPPO-Iter1 Text Generation • Updated Jan 30 • 21
RegularizedSelfPlay/sppo_forwardimportance10-0.1-PromptABC-LLAMA-3-8B-Instruct-SPPO-Iter2 Text Generation • Updated Jan 25 • 11
RegularizedSelfPlay/sppo_forwardimportance10-0.1-PromptABC-LLAMA-3-8B-Instruct-SPPO-Iter3 Text Generation • Updated Jan 25 • 12
RegularizedSelfPlay/sppo_reversekl-0.05-PromptABC-LLAMA-3-8B-Instruct-SPPO-Iter3 Text Generation • Updated Jan 25 • 15
RegularizedSelfPlay/sppo_forward1reverse5-0.1-PromptABC-LLAMA-3-8B-Instruct-SPPO-Iter3 Text Generation • Updated Jan 25 • 12
RegularizedSelfPlay/sppo_forward1reverse5-0.1-PromptABC-LLAMA-3-8B-Instruct-SPPO-Iter2 Text Generation • Updated Jan 25 • 15
RegularizedSelfPlay/sppo_forwardimportance10-0.1-PromptABC-LLAMA-3-8B-Instruct-SPPO-Iter1 Text Generation • Updated Jan 25 • 16
RegularizedSelfPlay/sppo_reversekl-2-PromptABC-LLAMA-3-8B-Instruct-SPPO-Iter2 Text Generation • Updated Jan 25 • 15
RegularizedSelfPlay/sppo_reversekl-2-PromptABC-LLAMA-3-8B-Instruct-SPPO-Iter3 Text Generation • Updated Jan 25 • 16