Hub Python Library documentation

Search the Hub

You are viewing v0.23.0.rc1 version. A newer version v1.0.0.rc7 is available.

In this tutorial, you will learn how to search models, datasets and spaces on the Hub using huggingface_hub.

How to list repositories?

The huggingface_hub library includes an HTTP client, HfApi, to interact with the Hub. Among other things, it can list the models, datasets and Spaces stored on the Hub:

>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> models = api.list_models()

The output of list_models() is an iterator over the models stored on the Hub.

Similarly, you can use list_datasets() to list datasets and list_spaces() to list Spaces.
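Because these helpers return iterators, results can be consumed lazily instead of being loaded all at once. The following is a minimal sketch of that idea: take and first_repo_ids are hypothetical helper names introduced here for illustration, not part of huggingface_hub, and first_repo_ids needs network access to run.

```python
from itertools import islice


def take(iterable, n):
    """Return the first `n` items of `iterable` without consuming the rest."""
    return list(islice(iterable, n))


def first_repo_ids(n=3):
    """Fetch the ids of the first `n` models, datasets and Spaces (needs network access)."""
    # Imported here so that `take` stays usable without the library installed.
    from huggingface_hub import HfApi

    api = HfApi()
    return {
        "models": [m.id for m in take(api.list_models(), n)],
        "datasets": [d.id for d in take(api.list_datasets(), n)],
        "spaces": [s.id for s in take(api.list_spaces(), n)],
    }
```

Calling first_repo_ids() only pulls as many items from each iterator as it needs, which avoids walking the full listing.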

How to filter repositories?

Listing repositories is useful, but you will usually want to narrow your search. The list helpers accept several parameters, such as:

  • filter
  • author
  • search

Two of these parameters are intuitive (author and search), but what about filter? filter takes a ModelFilter object (or a DatasetFilter for datasets). You instantiate it with the properties that the returned models must have.

Let’s see an example that gets all models on the Hub that do image classification, have been trained on the imagenet dataset, and run with PyTorch. This can be done with a single ModelFilter. Its attributes are combined with a logical AND.

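from huggingface_hub import HfApi, ModelFilter

api = HfApi()
# All three conditions must hold for a model to be returned (logical AND).
models = api.list_models(
    filter=ModelFilter(
        task="image-classification",
        library="pytorch",
        trained_dataset="imagenet",
    )
)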
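The author and search parameters mentioned earlier can be used in a similar way. The sketch below assumes you want the ids of a few models from one author whose ids contain a search string; find_model_ids and demo are hypothetical helper names, and the "google" and "bert" values are only illustrative.

```python
def find_model_ids(api, author, search, limit=5):
    """Return the ids of up to `limit` models by `author` matching `search`."""
    models = api.list_models(author=author, search=search, limit=limit)
    return [model.id for model in models]


def demo():
    """Example call against the real Hub (needs network access)."""
    from huggingface_hub import HfApi

    # Illustrative values; any author and search string can be used.
    return find_model_ids(HfApi(), author="google", search="bert")
```

Passing the client as an argument keeps the helper easy to exercise with a stand-in object when no network is available.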

While filtering, you can also sort the results and keep only the top ones. For example, the following call fetches the 5 most downloaded datasets on the Hub:

>>> from huggingface_hub import list_datasets
>>> list(list_datasets(sort="downloads", direction=-1, limit=5))
[DatasetInfo(
	id='argilla/databricks-dolly-15k-curated-en',
	author='argilla',
	sha='4dcd1dedbe148307a833c931b21ca456a1fc4281',
	last_modified=datetime.datetime(2023, 10, 2, 12, 32, 53, tzinfo=datetime.timezone.utc),
	private=False,
	downloads=8889377,
	(...)

To explore the available filters on the Hub, visit the models and datasets pages in your browser, apply some filters, and look at the values in the URL.
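Reading the values out of such a URL can also be done programmatically. The following is a small sketch using only the standard library; filters_from_hub_url is a hypothetical helper name, and the URL and its parameter names are illustrative assumptions rather than a documented list of what the website emits.

```python
from urllib.parse import parse_qs, urlparse


def filters_from_hub_url(url):
    """Return the query parameters of a Hub listing URL as a dict of value lists."""
    return parse_qs(urlparse(url).query)


# Illustrative URL copied from a browser; the parameter names are an assumption.
url = (
    "https://huggingface.co/models"
    "?pipeline_tag=image-classification&library=pytorch&sort=downloads"
)
print(filters_from_hub_url(url))
# {'pipeline_tag': ['image-classification'], 'library': ['pytorch'], 'sort': ['downloads']}
```

The extracted values (for instance "image-classification" or "pytorch") are the kind of strings you would then pass to the list helpers.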
