README: Update it.
README.md
CHANGED
@@ -7,6 +7,12 @@ library_name: sklearn
 
 Prediction of aerobicity (whether a bacterium or archaeon is aerobic) based on gene copy numbers. The prediction problem is posed as a 2-class problem (the prediction is either aerobic or anaerobic).
 
+This predictor was used in this (currently pre-publication) manuscript; please cite it if appropriate:
+
+Davin, Adrian A., Ben J. Woodcroft, Rochelle M. Soo, Ranjani Murali, Dominik Schrempf, James Clark, Bastien Boussau et al. "An evolutionary timescale for Bacteria calibrated using the Great Oxidation Event." bioRxiv (2023): 2023-08.
+https://www.biorxiv.org/content/10.1101/2023.08.08.552427v1.full
+
+## Installation
 To apply the predictor, first set up the conda environment:
 
 ```
@@ -15,13 +21,14 @@ $ mamba env create -p env -f env-apply.yml
 $ conda activate ./env
 ```
 
-and download the eggNOG database (we use version 2.1.3
+and download the eggNOG database (we use version 2.1.3, as specified in the `env-apply.yml` conda environment file)
 
 ```
 $ download_eggnog_data.py
 ```
 
-
+## Usage
+To apply the predictor, run against a test genome, replacing `EGGNOG_DATA_DIR` with the path to the eggNOG data directory:
 
 ```
 $ ./17_apply_to_proteome.py --protein-fasta data/RS_GCF_000515355.1_protein.faa --eggnog-data-dir EGGNOG_DATA_DIR
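
As a note alongside the diff: the usage command in the updated README could be wrapped for scripted runs roughly as follows. This is a minimal sketch, not part of the commit. The script name and the `--protein-fasta` and `--eggnog-data-dir` flags come from the README; the function names `build_command` and `run_predictor` and the example data directory are hypothetical. It assumes the conda environment from the Installation step is already active, and it returns the script's output unparsed because the README does not describe the output format.

```python
import subprocess

# Script name and both flags are taken from the README.
# Function names and the example directory below are hypothetical.
SCRIPT = "./17_apply_to_proteome.py"


def build_command(protein_fasta, eggnog_data_dir):
    """Return the argument list for one predictor run."""
    return [
        SCRIPT,
        "--protein-fasta", str(protein_fasta),
        "--eggnog-data-dir", str(eggnog_data_dir),
    ]


def run_predictor(protein_fasta, eggnog_data_dir):
    """Run the predictor and return its raw standard output.

    check=True raises subprocess.CalledProcessError if the script
    exits with a non-zero status.
    """
    result = subprocess.run(
        build_command(protein_fasta, eggnog_data_dir),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # without the eggNOG database being present.
    cmd = build_command(
        "data/RS_GCF_000515355.1_protein.faa",
        "/path/to/eggnog_data",  # placeholder for EGGNOG_DATA_DIR
    )
    print(" ".join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a proteome path contains spaces.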