cgeorgiaw (HF Staff) committed on
Commit c3c3c81 · verified · 1 Parent(s): bf89f04

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,79 @@
- ---
- license: apache-2.0
- ---
+ ---
+ pretty_name: Perch
+ license: apache-2.0
+ tags:
+ - audio
+ - bird
+ - nature
+ - science
+ - vocalization
+ - bio
+ - birds-classification
+ - bioacoustics
+ ---
+
+ # Perch Bird Vocalizations
+
+ Perch is a bioacoustics model trained to classify nearly 15,000 species and to generate audio embeddings that are useful for a variety of downstream applications (such as individual identification or estimating coral reef health). It has been used to detect critically endangered birds and to power audio search engines.
+
+ The current model (Perch 2.0) updates our original Perch model with improved embedding and prediction quality, as well as support for many new (non-avian) taxa. The model was trained on a combination of publicly available audio from Xeno-Canto, iNaturalist, the Animal Sound Archive, and FSD50K. If you like this model, consider recording some interesting audio and contributing it to a public source!
+
+ Perch makes predictions for most bird species as well as a variety of frogs, crickets, grasshoppers, and mammals. Note, however, that the output logits are uncalibrated and may be unreliable for rare species; we recommend using your own data to tune detection thresholds.
+
+ The embeddings were trained with the goal of being linearly separable, so in most cases training a simple linear classifier on top of the model's outputs works well. For most bioacoustics applications we recommend an agile-modelling (human-annotator-in-the-loop) workflow.
+
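As a concrete illustration of the linear-probe approach, the sketch below fits a logistic-regression classifier on embedding vectors using scikit-learn. The `embeddings` and `labels` arrays here are random placeholders; in practice they would come from `model.embed` and your own annotations.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder data standing in for real Perch embeddings and annotations.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 1536))  # one 1536-dim vector per 5 s clip
labels = rng.integers(0, 2, size=200)      # binary presence/absence labels

# A simple linear probe on top of the (frozen) embeddings.
clf = LogisticRegression(max_iter=1000).fit(embeddings, labels)
probs = clf.predict_proba(embeddings)[:, 1]  # per-clip detection scores
```

The same pattern extends to multi-label setups by training one probe per target class.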
+ ### Model Quality
+ The Perch 2.0 model was evaluated on a variety of tasks and domains: species classification in avian soundscapes, call-type and dialect recognition, individual identification of dogs and bats, event detection in coral reefs, and more. It achieves state-of-the-art scores on bioacoustics benchmarks such as BirdSet and BEANS. See our paper for more details.
+
+ ### Model Description
+ Perch 2.0's embedding model is based on an **EfficientNet-B3** architecture with approximately **12 million parameters**. The species classification head adds a further **91 million parameters** due to the large number of classes.
+
+ The model outputs 1536-dimensional embeddings. It is also possible to retrieve the embeddings before spatial pooling; these have shape `(5, 3, 1536)`.
+
+ > **Note:** This version of the model requires **TensorFlow 2.20.rc0** and a **GPU**.
+ > A CPU variant will be added soon.
+
+ ---
+
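To illustrate how the two embedding outputs relate: the pooled 1536-dimensional vector is a spatial reduction of the `(5, 3, 1536)` tensor. The snippet below assumes average pooling purely for illustration; check the model implementation for the exact pooling it uses.

```python
import numpy as np

# Stand-in for the un-pooled spatial embedding returned by the model.
spatial = np.random.default_rng(0).normal(size=(5, 3, 1536))

# Reducing over the two spatial axes leaves one 1536-dim vector.
pooled = spatial.mean(axis=(0, 1))
```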
+ ## Input
+
+ - The model consumes **5-second segments** of audio sampled at **32 kHz**.
+ - For audio with other sample rates, you can:
+   - Resample the audio.
+   - Apply pitch shifting (this works well for bats in some cases).
+   - Feed the audio at its native sample rate as an array of **160,000 values**.
+
+ ---
+
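If your recordings are not at 32 kHz, one quick way to conform them is to resample and then pad or trim to 160,000 samples. The sketch below uses simple linear interpolation via NumPy for portability; a dedicated resampler (e.g. `scipy.signal.resample_poly` or librosa) will generally give better fidelity.

```python
import numpy as np

def to_perch_input(waveform, sample_rate, target_rate=32000, seconds=5):
    """Resample a mono waveform to 32 kHz, then pad/trim to 5 s (160,000 samples)."""
    if sample_rate != target_rate:
        # Crude linear-interpolation resampling; fine for a sketch.
        n_out = int(round(len(waveform) * target_rate / sample_rate))
        old_t = np.arange(len(waveform)) / sample_rate
        new_t = np.arange(n_out) / target_rate
        waveform = np.interp(new_t, old_t, waveform)
    target = seconds * target_rate
    if len(waveform) < target:
        waveform = np.pad(waveform, (0, target - len(waveform)))
    return waveform[:target].astype(np.float32)
```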
+ ## Outputs
+
+ The model produces the following outputs:
+
+ 1. **Spectrogram** computed from the input audio.
+ 2. **Embedding**: a 1536-dimensional vector.
+ 3. **Spatial Embedding**: un-pooled embeddings with shape `(5, 3, 1536)`.
+ 4. **Logit Predictions** for ~15,000 classes (of which ~10,000 are birds).
+    - The predicted classes are detailed in [`assets/labels.csv`](assets/labels.csv), following the *iNaturalist* taxonomy.
+    - An additional set of conversions to **eBird** six-letter codes is provided in [`assets/perch_v2_ebird_classes.csv`](assets/perch_v2_ebird_classes.csv).
+
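For example, to turn a logit vector into a ranked species list, take the top-k indices and map them through the class list. The `class_names` below are placeholders; in practice they would be read from `assets/labels.csv`.

```python
import numpy as np

# Stand-in logit vector and placeholder class names (real names come
# from assets/labels.csv; the true class count is ~15,000).
logits = np.random.default_rng(1).normal(size=14000)
class_names = [f"species_{i}" for i in range(len(logits))]

# Indices of the five highest-scoring classes, best first.
top5 = np.argsort(logits)[::-1][:5]
top5_names = [class_names[i] for i in top5]
```

Remember that these logits are uncalibrated, so the ranking is more trustworthy than the absolute scores.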
+ ## Example Use
+
+ ```python
+ !pip install git+https://github.com/google-research/perch-hoplite.git
+ !pip install "tensorflow[and-cuda]~=2.20.0rc0"
+
+ import numpy as np
+ from perch_hoplite.zoo import model_configs
+
+ # Input: 5 seconds of silence as mono 32 kHz waveform samples.
+ waveform = np.zeros(5 * 32000, dtype=np.float32)
+
+ # Automatically downloads the model from Kaggle.
+ model = model_configs.load_model_by_name('perch_v2')
+
+ outputs = model.embed(waveform)
+ # Do something with outputs.embeddings and outputs.logits['label'].
+ ```
assets/labels.csv ADDED
The diff for this file is too large to render. See raw diff
 
assets/perch_v2_ebird_classes.csv ADDED
The diff for this file is too large to render. See raw diff
 
fingerprint.pb ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:20274176be0d4f7009c4f5e6b2103519f5b961906218c6bf0f29fd465cf3d88d
+ size 96
saved_model.pb ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d28faa13aa61eb369b9d8d66d483186da65b15d9220bb051c0003553ecae2766
+ size 2701811
variables/variables.data-00000-of-00001 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:69571ece6a9229bd339af37f935821b9e3f53869298bd9ad97135a0c87efcea1
+ size 407104092
variables/variables.index ADDED
Binary file (9.24 kB). View file