	---
	tags:
	- bertopic
	library_name: bertopic
	pipeline_tag: text-classification
	---

	# short-arxiv-bertopic

	This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
	BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

	## Usage

	To use this model, please install BERTopic:

	```
	pip install -U bertopic
	```

	You can use the model as follows:

	```python
	from bertopic import BERTopic
	topic_model = BERTopic.load("etanios/short-arxiv-bertopic")

	topic_model.get_topic_info()
	```
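Beyond inspecting the topic table, a loaded model can assign topics to new abstracts. The sketch below relies on the `transform` and `get_topic` methods of a fitted BERTopic model. The helper name `assign_topics`, the `demo` function, and the sample abstract are illustrative and not part of this repository.

```python
def assign_topics(topic_model, docs):
    """Map each document to the topic ID predicted by a fitted BERTopic model."""
    # transform() returns (topics, probabilities); only the topic IDs are used here.
    topics, _ = topic_model.transform(docs)
    return {doc: int(topic) for doc, topic in zip(docs, topics)}


def demo():
    # Imported lazily so the helper above can be used without bertopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load("etanios/short-arxiv-bertopic")
    # Illustrative abstract, not taken from the training data.
    docs = ["We prove regret bounds for a contextual bandit algorithm."]
    for doc, topic in assign_topics(topic_model, docs).items():
        # get_topic() returns the (word, score) pairs for a topic ID.
        print(topic, topic_model.get_topic(topic)[:5], doc)
```

Calling `demo()` downloads the model from the Hub. A predicted ID of -1 means the document was assigned to the outlier topic.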

	## Topic overview

	* Number of topics: 38 (37 topics plus the outlier topic -1)
	* Number of training documents: 9999

	<details>
	<summary>Click here for an overview of all topics.</summary>

	| Topic ID | Topic Keywords | Topic Frequency | Label |
	|----------|----------------|-----------------|-------|
	| -1 | data - learning - model - based - algorithm | 50 | Machine Learning and Data Analysis |
	| 0 | deep - networks - neural - training - network | 2896 | Advances in Deep Neural Networks for Computer Vision |
	| 1 | neural - word - model - language - models | 1036 | Neural Language Models |
	| 2 | regret - bandit - online - algorithm - problem | 755 | Optimization of regret in multi-armed bandit problems |
	| 3 | policy - reinforcement - reinforcement learning - learning - control | 552 | Reinforcement Learning Policies |
	| 4 | clustering - clusters - data - means - cluster | 504 | Clustering algorithms and techniques |
	| 5 | classification - classifiers - class - classifier - ensemble | 463 | Machine Learning Ensembles |
	| 6 | gradient - stochastic - convex - convergence - optimization | 293 | Optimization techniques for non-convex problems |
	| 7 | learning - epsilon - distribution - complexity - bounds | 271 | Machine Learning and Complexity |
	| 8 | matrix - rank - low rank - low - completion | 257 | Matrix Completion and Robust Matrix Completion |
	| 9 | sparse - dictionary - signal - signals - sensing | 218 | Sparse Coding and Dictionary Learning |
	| 10 | kernel - kernels - learning - kernel learning - mkl | 185 | Kernel Learning and Multiple Kernels |
	| 11 | topic - topics - lda - model - topic models | 173 | Topic Modeling and LDA |
	| 12 | bayesian - structure - bayesian networks - bayesian network - network | 163 | Structure learning of Bayesian networks |
	| 13 | users - user - recommendation - items - collaborative | 162 | Recommendation Systems: Collaborative Filtering |
	| 14 | inference - posterior - variational - mcmc - carlo | 157 | Bayesian Inference Techniques |
	| 15 | feature - selection - feature selection - data - classification | 145 | Data Preparation for Cancer Classification |
	| 16 | active - active learning - learning - optimization - bayesian optimization | 144 | Active Learning and Optimization |
	| 17 | lasso - sparse - group - sparsity - regression | 137 | High-dimensional regression with sparsity |
	| 18 | distributed - ml - communication - machine - data | 126 | Distributed Machine Learning |
	| 19 | privacy - private - differential privacy - differential - differentially | 117 | Privacy and Data Mining |
	| 20 | anomaly - detection - anomaly detection - data - anomalies | 99 | Anomaly Detection in Data Sets |
	| 21 | ranking - rank - items - pairwise - comparisons | 92 | Ranking and Preference Learning |
	| 22 | metric - metric learning - distance - learning - similarity | 87 | Metric Learning |
	| 23 | svm - support - support vector - svms - vector | 79 | Efficient and Fast SVM Algorithms |
	| 24 | hashing - hash - binary - codes - bit | 76 | Large-scale search and indexing using hashing methods |
	| 25 | graph - graphs - nodes - relational - kernels | 75 | Graph-based Semi-supervised Learning |
	| 26 | manifold - dimensional - data - manifold learning - embedding | 74 | Manifold Learning and Dimensionality Reduction |
	| 27 | tensor - decomposition - tensors - rank - tensor decomposition | 74 | Tensor Decomposition and Rank |
	| 28 | bethe - belief propagation - belief - propagation - bp | 73 | Inference in Graphical Models |
	| 29 | image - semantic - images - visual - shot | 65 | Zero-shot learning for image recognition |
	| 30 | gp - gaussian - gaussian process - process - covariance | 65 | Gaussian Processes for Large Data |
	| 31 | domain - adaptation - domain adaptation - target - source | 64 | Domain Adaptation |
	| 32 | crowdsourcing - workers - labels - crowd - worker | 56 | Crowdsourced Labeling and Task Assignment |
	| 33 | causal - variables - data - discovery - cause | 55 | Causal Discovery and Inference |
	| 34 | label - multi label - multi - labels - multi label classification | 55 | Multi-Label Classification |
	| 35 | protein - proteins - prediction - structure - amino | 54 | Protein structure prediction and sequence analysis |
	| 36 | nmf - nonnegative - matrix - nonnegative matrix - factorization | 52 | Nonnegative Matrix Factorization (NMF) |

	</details>
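The frequencies in the overview can be checked against the training document count, and they show how unevenly documents are spread across topics. The following sketch copies the frequency column by hand; the variable names are illustrative.

```python
# Topic frequencies copied from the overview table, keyed by topic ID.
topic_frequency = {
    -1: 50, 0: 2896, 1: 1036, 2: 755, 3: 552, 4: 504, 5: 463, 6: 293,
    7: 271, 8: 257, 9: 218, 10: 185, 11: 173, 12: 163, 13: 162, 14: 157,
    15: 145, 16: 144, 17: 137, 18: 126, 19: 117, 20: 99, 21: 92, 22: 87,
    23: 79, 24: 76, 25: 75, 26: 74, 27: 74, 28: 73, 29: 65, 30: 65,
    31: 64, 32: 56, 33: 55, 34: 55, 35: 54, 36: 52,
}

# The frequencies should add up to the number of training documents.
total = sum(topic_frequency.values())
print(total)  # 9999

# Topic IDs of the three most frequent topics and their combined share.
top3 = sorted(topic_frequency, key=topic_frequency.get, reverse=True)[:3]
top3_share = sum(topic_frequency[t] for t in top3) / total
print(top3, f"{top3_share:.1%}")  # [0, 1, 2] 46.9%

# Share of documents left in the outlier topic (-1).
outlier_share = topic_frequency[-1] / total
print(f"{outlier_share:.1%}")  # 0.5%
```

Topics 0 to 2 together cover nearly half of the corpus, while only 50 documents fall into the outlier topic.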

	## Training hyperparameters

	* calculate_probabilities: False
	* language: None
	* low_memory: False
	* min_topic_size: 10
	* n_gram_range: (1, 1)
	* nr_topics: None
	* seed_topic_list: None
	* top_n_words: 10
	* verbose: True
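These values correspond to keyword arguments of the `BERTopic` constructor. The sketch below builds an unfitted model with the same settings. It is not a recipe for reproducing this model, because the card does not record the embedding, UMAP, HDBSCAN, or vectorizer sub-models, nor the training corpus. The names `HYPERPARAMETERS` and `build_topic_model` are illustrative.

```python
# Hyperparameters as listed on this card.
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": None,  # as recorded on the card; the BERTopic default is "english"
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}


def build_topic_model():
    """Return an unfitted BERTopic instance configured with the values above."""
    # Imported lazily so the dictionary can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMETERS)
```

A model built this way still has to be fitted with `fit_transform` on a corpus. Results will likely differ from the topics listed above unless the same sub-models and documents are used.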

	## Framework versions

	* Numpy: 1.23.5
	* HDBSCAN: 0.8.33
	* UMAP: 0.5.4
	* Pandas: 1.5.3
	* Scikit-Learn: 1.2.2
	* Sentence-transformers: 2.2.2
	* Transformers: 4.33.2
	* Numba: 0.56.4
	* Plotly: 5.15.0
	* Python: 3.10.12
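Loading a saved model in an environment with different library versions can sometimes cause problems, so it may help to compare the local installation with the versions above. The sketch below uses only the standard library. The mapping from the names on this card to PyPI distribution names (for example `umap-learn` for UMAP) is an assumption, and `version_mismatches` is an illustrative helper.

```python
from importlib import metadata

# Versions listed on this card, keyed by assumed PyPI distribution name.
PINNED_VERSIONS = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.4",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.33.2",
    "numba": "0.56.4",
    "plotly": "5.15.0",
}


def version_mismatches(pinned=PINNED_VERSIONS):
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, expected in pinned.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches
```

For example, `for name, (want, have) in version_mismatches().items(): print(name, want, have)` lists every package whose installed version differs from the one recorded here.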