---
title: Raccoon
emoji: 🦝
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.2.0
python_version: 3.9
app_file: app.py
pinned: false
license: mit
---
# Raccoon
## Installation
It is recommended to use a virtual environment created with [`venv`](https://docs.python.org/3/library/venv.html).
The following steps set up the project:
- If you are on Apple Silicon, install Rust with `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh` and CMake with `brew install cmake`.
- Create the virtual environment: `python3 -m venv .venv`
- Activate the virtual environment: `source .venv/bin/activate`
- To deactivate the virtual environment, run `deactivate` inside it.
- Install the required packages: `.venv/bin/pip install -r requirements.txt`
- Install the project in editable mode: `.venv/bin/pip install -e .`
- [Create a custom search engine in Google](https://programmablesearchengine.google.com/controlpanel/all).
- Create an API key for the custom search engine.
- Add the API key and the custom search engine ID to `.streamlit/secrets.toml`:
```toml
google_search_api_key = "api-key"
google_search_engine_id = "search-engine-id"
```
- To start the interface: `streamlit run app.py`
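
As a rough illustration of how the two secrets might be used, the sketch below builds a request URL for Google's Custom Search JSON API. The helper name `build_search_url` and its parameters are hypothetical and not taken from this repository; only the endpoint and the `key`, `cx`, `q`, and `num` query parameters come from the Google API.

```python
from urllib.parse import urlencode

# Endpoint of the Google Custom Search JSON API.
GOOGLE_SEARCH_URL = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, engine_id, num_results=10):
    """Build a search request URL (hypothetical helper, not part of Raccoon).

    api_key   -> value of google_search_api_key in .streamlit/secrets.toml
    engine_id -> value of google_search_engine_id in .streamlit/secrets.toml
    """
    params = {"key": api_key, "cx": engine_id, "q": query, "num": num_results}
    return f"{GOOGLE_SEARCH_URL}?{urlencode(params)}"


# Inside a Streamlit app the values could be read from st.secrets, e.g.:
#   import streamlit as st
#   url = build_search_url(
#       "raccoon habitat",
#       st.secrets["google_search_api_key"],
#       st.secrets["google_search_engine_id"],
#   )
```

The function only assembles the URL, so it can be checked without network access; the actual request could then be sent with any HTTP client.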
### Todo
- [ ] Improve fetched content.
- [x] Fix issue of duplicate content extracted by beautifulsoup.
- [x] Exclude code from content.
- [x] Find sentences that contain the search keywords.
- [ ] Find sentences that contain the search keywords, taking into account different spellings (e.g. "health care" vs. "healthcare").
- [ ] Get some content from every search result.
- [ ] Divs with text & tags: extract text from the tags and then decompose the tags. Keep the order of content and avoid duplicates.
- [ ] Summarization requires truncation. Find a solution where it is not needed.
- [ ] Support German content with a language switcher.
- [ ] Improve queries to include more keywords (expand abbreviations & define context).
- [ ] Control the number of results from the UI.
- [ ] Control summary length via settings: https://docs.streamlit.io/library/advanced-features/session-state
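
One possible approach to the spelling-variant item above is to compare text with spaces and hyphens removed. This is only a minimal sketch under that assumption; the function names `normalize` and `sentences_with_keyword` are hypothetical and do not reflect how Raccoon currently matches sentences.

```python
import re


def normalize(text):
    """Lowercase and drop whitespace and hyphens, so that
    'health care', 'health-care' and 'healthcare' compare equal."""
    return re.sub(r"[\s\-]+", "", text.lower())


def sentences_with_keyword(text, keyword):
    """Return the sentences whose normalized form contains the normalized keyword."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    needle = normalize(keyword)
    return [s for s in sentences if needle in normalize(s)]


text = "Healthcare costs rose. Health care is complex. Taxes fell."
matches = sentences_with_keyword(text, "health care")
# matches == ["Healthcare costs rose.", "Health care is complex."]
```

Because word boundaries are discarded, this can over-match (for example, "mental health careers" would match "healthcare"), so a real implementation would likely need a stricter comparison.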