# Open Deep Research

Welcome to this open replication of [OpenAI's Deep Research](https://openai.com/index/introducing-deep-research/)! This agent attempts to replicate OpenAI's system and reach similar performance on research tasks.

Read more about this implementation's goals and methods in our [blog post](https://huggingface.co/blog/open-deep-research).

This agent achieves **55% pass@1** on the GAIA validation set, compared to **67%** for the original Deep Research.

## Setup

To get started, follow the steps below.

### Clone the repository

```bash
git clone https://github.com/huggingface/smolagents.git
cd smolagents/examples/open_deep_research
```

### Install dependencies

Run the following command to install the required dependencies from the `requirements.txt` file:

```bash
pip install -r requirements.txt
```
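
### Install the development version of `smolagents`

From the `open_deep_research` folder, install `smolagents` from the repository root in editable mode, with its `dev` extras (the quotes keep shells such as zsh from treating the brackets as a glob):

```bash
pip install -e "../../.[dev]"
```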

### Set up environment variables

The agent uses the `GoogleSearchTool` for web search, which requires an API key for the selected search provider, set as an environment variable:

- `SERPAPI_API_KEY` for SerpApi: [Sign up here to get a key](https://serpapi.com/users/sign_up)
- `SERPER_API_KEY` for Serper: [Sign up here to get a key](https://serper.dev/signup)

Depending on the model you want to use, you may need to set additional environment variables.
For example, to use the default `o1` model, you need to set the `OPENAI_API_KEY` environment variable ([sign up here to get a key](https://platform.openai.com/signup)).

> [!WARNING]
> Use of the default `o1` model is restricted to tier-3 API access. See [OpenAI's help article on API access to o1 and o3-mini](https://help.openai.com/en/articles/10362446-api-access-to-o1-and-o3-mini).
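
Before running the agent, it can help to confirm that the keys are actually exported in your current shell. The snippet below is a minimal sketch, not part of the repository: `check_keys` is a hypothetical helper that assumes you use the default `o1` model (so `OPENAI_API_KEY` is required) and that one search key, either `SERPAPI_API_KEY` or `SERPER_API_KEY`, is enough.

```bash
# Export the keys first (placeholder values; replace them with your own):
#   export SERPER_API_KEY="your-serper-key"   # or SERPAPI_API_KEY for SerpApi
#   export OPENAI_API_KEY="your-openai-key"   # required for the default o1 model

# check_keys (hypothetical helper): returns 0 only if OPENAI_API_KEY is set
# and at least one search provider key is set; otherwise it prints what is
# missing to stderr and returns 1.
check_keys() {
  local status=0
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "missing: OPENAI_API_KEY" >&2
    status=1
  fi
  if [ -z "${SERPAPI_API_KEY:-}" ] && [ -z "${SERPER_API_KEY:-}" ]; then
    echo "missing: SERPAPI_API_KEY or SERPER_API_KEY" >&2
    status=1
  fi
  return "$status"
}
```

For example, `check_keys && python run.py --model-id "o1" "Your question here!"` only starts the agent once both checks pass. If you pick a non-OpenAI model, adjust the first check to whatever key that provider expects.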

## Usage

Once setup is complete, you're good to go! Run the `run.py` script, for example:

```bash
python run.py --model-id "o1" "Your question here!"
```

## Full reproducibility of results

The data used in our submissions to GAIA was augmented as follows:

- Each single-page `.pdf` or `.xls` file was opened in a file reader (macOS Sonoma Numbers or Preview), and a `.png` screenshot of it was taken and added to the folder.
- Then, for any file used in a question, the file-loading system checks whether a `.png` version of the file exists and, if so, loads it instead of the original.

This process was done manually but could be automated.
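
The loading rule in the second step above can be sketched in a few lines of shell. This is an illustration, not the repository's actual loader: `resolve_file` is a hypothetical helper, and it assumes the screenshot sits next to the original with only the extension swapped (so `report.pdf` becomes `report.png`).

```bash
# resolve_file (hypothetical helper): print the path that should be loaded.
# Assumed naming rule: the .png screenshot replaces the file's last extension
# and lives in the same folder as the original.
resolve_file() {
  local path="$1"
  local name="${path##*/}"            # file name without its directories
  if [[ "$name" == *.* ]]; then       # only swap when the name has an extension
    local png="${path%.*}.png"        # e.g. data/report.pdf -> data/report.png
    if [ -f "$png" ]; then
      printf '%s\n' "$png"            # a screenshot exists: load it instead
      return 0
    fi
  fi
  printf '%s\n' "$path"               # otherwise fall back to the original file
}
```

Checking only the file name (not the full path) for a dot keeps directories such as `tmp.abc/` from being mistaken for an extension. You would then pass `"$(resolve_file path/to/file.pdf)"` wherever the original path was used.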

After processing, the annotated data was uploaded to a [new dataset](https://huggingface.co/datasets/smolagents/GAIA-annotated). You need to request access to it, but access is granted instantly.