Scilons Project

non-profit

scilons

AI & ML interests

None defined yet.

Recent Activity

malteos authored a paper 3 days ago

Tokenizer Choice For LLM Training: Negligible or Crucial?

malteos authored a paper 3 days ago

Towards an Open Platform for Legal Information

malteos authored a paper 3 days ago

Aspect-based Document Similarity for Research Papers

scilons's activity

malteos

authored 10 papers 3 days ago

Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings

Paper • 2202.06671 • Published Feb 14, 2022 • 2

Specialized Document Embeddings for Aspect-based Similarity of Research Papers

Paper • 2203.14541 • Published Mar 28, 2022

Investigating Gender Bias in Turkish Language Models

Paper • 2404.11726 • Published Apr 17, 2024 • 1

Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning

Paper • 2301.09626 • Published Jan 23, 2023 • 2

Progress Report: Towards European LLMs

Paper • 2410.03730 • Published Sep 30, 2024 • 2

Data Processing for the OpenGPT-X Model Family

Paper • 2410.08800 • Published Oct 11, 2024 • 1

MMTEB: Massive Multilingual Text Embedding Benchmark

Paper • 2502.13595 • Published 5 days ago • 26

sobamchan

updated a model 3 months ago

scilons/roberta-base-512-110k-steps-texts_pq_3-deduped-Eng_Latn

Fill-Mask • Updated Nov 13, 2024 • 134

lfoppiano

updated 4 datasets 5 months ago

scilons/texts_pq_3-deduped-Eng_Latn

Viewer • Updated Oct 7, 2024 • 10M • 21

scilons/texts_pq_3-filtered-Eng_Latn

Viewer • Updated Oct 6, 2024 • 12M • 24

scilons/texts_pq_3

Viewer • Updated Oct 2, 2024 • 35.1M • 18

scilons/texts_pq_2.1

Viewer • Updated Oct 2, 2024 • 919k • 25

pjox

authored a paper 8 months ago

mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus

Paper • 2406.08707 • Published Jun 13, 2024 • 16

lfoppiano

authored 2 papers 10 months ago

Semi-automatic staging area for high-quality structured data extraction from scientific literature

Paper • 2309.10923 • Published Sep 19, 2023

Mining experimental data from Materials Science literature with Large Language Models: an evaluation study

Paper • 2401.11052 • Published Jan 19, 2024 • 1

domenicrosati

authored a paper about 1 year ago

Mixture of Soft Prompts for Controllable Data Generation

Paper • 2303.01580 • Published Mar 2, 2023 • 1

pjox

authored a paper about 1 year ago

CamemBERT: a Tasty French Language Model

Paper • 1911.03894 • Published Nov 10, 2019 • 3

Team members 9
