cgeorgiaw (HF Staff) committed on
Commit c3c3c81 · verified · 1 Parent(s): bf89f04

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,79 @@
- ---
- license: apache-2.0
- ---
+ ---
+ pretty_name: Perch
+ license: apache-2.0
+ tags:
+ - audio
+ - bird
+ - nature
+ - science
+ - vocalization
+ - bio
+ - birds-classification
+ - bioacoustics
+ ---
+
+ # Perch Bird Vocalizations
+
+ Perch is a bioacoustics model trained to classify nearly 15,000 species and to generate audio embeddings that are useful for a variety of downstream applications (such as individual identification or estimating coral reef health). It has been used to detect critically endangered birds and to power audio search engines.
+
+ The current model (Perch 2.0) updates our original Perch model with improved embedding and prediction quality, as well as support for many new (non-avian) taxa. The model was trained on a combination of publicly available audio from Xeno-Canto, iNaturalist, the Animal Sound Archive, and FSD50K. If you like this model, consider recording some interesting audio and contributing it to a public source!
+
+ Perch makes predictions for most bird species as well as a variety of frogs, crickets, grasshoppers, and mammals. Note, however, that the output logits are uncalibrated and may be unreliable for rare species; we recommend using your own data to tune detection thresholds.
+
+ The embeddings were trained with the goal of being linearly separable, so in most cases training a simple linear classifier on top of the model's outputs works well. For most bioacoustics applications we recommend an agile-modelling (human-annotator-in-the-loop) workflow.
+
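As a concrete illustration of the linear-probe approach, the sketch below fits a logistic-regression classifier on embedding vectors using scikit-learn. The `embeddings` and `labels` arrays here are random placeholders; in practice they would come from `model.embed` and your own annotations.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder data standing in for real Perch embeddings and annotations.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 1536))  # one 1536-dim vector per 5 s clip
labels = rng.integers(0, 2, size=200)      # binary presence/absence labels

# A simple linear probe on top of the (frozen) embeddings.
clf = LogisticRegression(max_iter=1000).fit(embeddings, labels)
probs = clf.predict_proba(embeddings)[:, 1]  # per-clip detection scores
```

The same pattern extends to multi-label setups by training one probe per target class.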
+ ### Model Quality
+ The Perch 2.0 model was evaluated on a variety of tasks and domains: species classification in avian soundscapes, call-type and dialect recognition, individual identification of dogs and bats, event detection in coral reefs, and more. It achieves state-of-the-art scores on bioacoustics benchmarks such as BirdSet and BEANS. See our paper for more details.
+
+ ### Model Description
+ Perch 2.0's embedding model is based on an **EfficientNet-B3** architecture with approximately **12 million parameters**. The species classification head adds a further **91 million parameters** due to the large number of classes.
+
+ The model outputs 1536-dimensional embeddings. It is also possible to retrieve the embeddings before spatial pooling; these have shape `(5, 3, 1536)`.
+
+ > **Note:** This version of the model requires **TensorFlow 2.20.rc0** and a **GPU**.
+ > A CPU variant will be added soon.
+
+ ---
+
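To illustrate how the two embedding outputs relate: the pooled 1536-dimensional vector is a spatial reduction of the `(5, 3, 1536)` tensor. The snippet below assumes average pooling purely for illustration; check the model implementation for the exact pooling it uses.

```python
import numpy as np

# Stand-in for the un-pooled spatial embedding returned by the model.
spatial = np.random.default_rng(0).normal(size=(5, 3, 1536))

# Reducing over the two spatial axes leaves one 1536-dim vector.
pooled = spatial.mean(axis=(0, 1))
```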
+ ## Input
+
+ - The model consumes **5-second segments** of audio sampled at **32 kHz**.
+ - For audio with other sample rates, you can:
+   - Resample the audio.
+   - Apply pitch shifting (this works well for bats in some cases).
+   - Feed the audio at its native sample rate as an array of **160,000 values**.
+
+ ---
+
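If your recordings are not at 32 kHz, one quick way to conform them is to resample and then pad or trim to 160,000 samples. The sketch below uses simple linear interpolation via NumPy for portability; a dedicated resampler (e.g. `scipy.signal.resample_poly` or librosa) will generally give better fidelity.

```python
import numpy as np

def to_perch_input(waveform, sample_rate, target_rate=32000, seconds=5):
    """Resample a mono waveform to 32 kHz, then pad/trim to 5 s (160,000 samples)."""
    if sample_rate != target_rate:
        # Crude linear-interpolation resampling; fine for a sketch.
        n_out = int(round(len(waveform) * target_rate / sample_rate))
        old_t = np.arange(len(waveform)) / sample_rate
        new_t = np.arange(n_out) / target_rate
        waveform = np.interp(new_t, old_t, waveform)
    target = seconds * target_rate
    if len(waveform) < target:
        waveform = np.pad(waveform, (0, target - len(waveform)))
    return waveform[:target].astype(np.float32)
```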
+ ## Outputs
+
+ The model produces the following outputs:
+
+ 1. **Spectrogram** computed from the input audio.
+ 2. **Embedding**: a 1536-dimensional vector.
+ 3. **Spatial Embedding**: un-pooled embeddings with shape `(5, 3, 1536)`.
+ 4. **Logit Predictions** for ~15,000 classes (of which ~10,000 are birds).
+    - The predicted classes are detailed in [`assets/labels.csv`](assets/labels.csv), following the *iNaturalist* taxonomy.
+    - An additional set of conversions to **eBird** six-letter codes is provided in [`assets/perch_v2_ebird_classes.csv`](assets/perch_v2_ebird_classes.csv).
+
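For example, to turn a logit vector into a ranked species list, take the top-k indices and map them through the class list. The `class_names` below are placeholders; in practice they would be read from `assets/labels.csv`.

```python
import numpy as np

# Stand-in logit vector and placeholder class names (real names come
# from assets/labels.csv; the true class count is ~15,000).
logits = np.random.default_rng(1).normal(size=14000)
class_names = [f"species_{i}" for i in range(len(logits))]

# Indices of the five highest-scoring classes, best first.
top5 = np.argsort(logits)[::-1][:5]
top5_names = [class_names[i] for i in top5]
```

Remember that these logits are uncalibrated, so the ranking is more trustworthy than the absolute scores.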
+ ## Example Use
+
+ ```python
+ !pip install git+https://github.com/google-research/perch-hoplite.git
+ !pip install "tensorflow[and-cuda]~=2.20.0rc0"
+
+ import numpy as np
+ from perch_hoplite.zoo import model_configs
+
+ # Input: 5 seconds of silence as mono 32 kHz waveform samples.
+ waveform = np.zeros(5 * 32000, dtype=np.float32)
+
+ # Automatically downloads the model from Kaggle.
+ model = model_configs.load_model_by_name('perch_v2')
+
+ outputs = model.embed(waveform)
+ # Do something with outputs.embeddings and outputs.logits['label'].
+ ```
assets/labels.csv ADDED
The diff for this file is too large to render. See raw diff
 
assets/perch_v2_ebird_classes.csv ADDED
The diff for this file is too large to render. See raw diff
 
fingerprint.pb ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:20274176be0d4f7009c4f5e6b2103519f5b961906218c6bf0f29fd465cf3d88d
+ size 96
saved_model.pb ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d28faa13aa61eb369b9d8d66d483186da65b15d9220bb051c0003553ecae2766
+ size 2701811
variables/variables.data-00000-of-00001 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:69571ece6a9229bd339af37f935821b9e3f53869298bd9ad97135a0c87efcea1
+ size 407104092
variables/variables.index ADDED
Binary file (9.24 kB). View file