
	# Open Deep Research

	Welcome to this open replication of [OpenAI's Deep Research](https://openai.com/index/introducing-deep-research/)! This agent attempts to replicate OpenAI's model and achieve similar performance on research tasks.

	Read more about this implementation's goal and methods in our [blog post](https://huggingface.co/blog/open-deep-research).


	This agent achieves 55% pass@1 on the GAIA validation set, compared to 67% for the original Deep Research.

	## Setup

	To get started, follow the steps below:

	### Clone the repository

	```bash
	git clone https://github.com/huggingface/smolagents.git
	cd smolagents/examples/open_deep_research
	```

	### Install dependencies

	Run the following command to install the required dependencies from the `requirements.txt` file:

	```bash
	pip install -r requirements.txt
	```

	### Install the development version of `smolagents`

	```bash
	pip install -e "../../.[dev]"
	```

	### Set up environment variables

	The agent uses `GoogleSearchTool` for web search, which requires an API key for the selected provider, set as an environment variable:
	- `SERPAPI_API_KEY` for SerpApi: [Sign up here to get a key](https://serpapi.com/users/sign_up)
	- `SERPER_API_KEY` for Serper: [Sign up here to get a key](https://serper.dev/signup)

	Depending on the model you want to use, you may also need to set that model provider's API key as an environment variable.
	For example, to use the default `o1` model, set the `OPENAI_API_KEY` environment variable.
	[Sign up here to get a key](https://platform.openai.com/signup).

	> [!WARNING]
	> Use of the default `o1` model is restricted to tier-3 API access. See [API access to o1 and o3-mini](https://help.openai.com/en/articles/10362446-api-access-to-o1-and-o3-mini).
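Before running the agent, it can help to confirm that these variables are actually set. The shell function below is a hypothetical pre-flight check, not part of the repository: it verifies that at least one search key is present and, for `o1`-family model IDs, that `OPENAI_API_KEY` is set. Only the variable names come from this README; the `preflight` function name and the `o1*` matching pattern are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of this repository): verify that the
# environment variables described above are set before launching run.py.
preflight() {
  local model_id="${1:-o1}"   # o1 is the default model named in this README
  local ok=0

  # GoogleSearchTool needs a key for one of the two supported providers.
  if [ -z "${SERPAPI_API_KEY:-}" ] && [ -z "${SERPER_API_KEY:-}" ]; then
    echo "missing: SERPAPI_API_KEY or SERPER_API_KEY" >&2
    ok=1
  fi

  # Only the o1 -> OPENAI_API_KEY requirement comes from this README;
  # add cases here for any other model providers you use.
  case "$model_id" in
    o1*)
      if [ -z "${OPENAI_API_KEY:-}" ]; then
        echo "missing: OPENAI_API_KEY (required for $model_id)" >&2
        ok=1
      fi
      ;;
  esac

  return "$ok"
}
```

For example, `preflight "o1" && python run.py --model-id "o1" "Your question here!"` only launches the agent when the required keys are present, and otherwise prints which variables are missing.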


	## Usage

	Then you're good to go! Run the `run.py` script, for example:
	```bash
	python run.py --model-id "o1" "Your question here!"
	```

	## Full reproducibility of results

	The data used in our submissions to GAIA was augmented in this way:
	- Each single-page `.pdf` or `.xls` file was opened in a file reader (Numbers or Preview on macOS Sonoma), and a `.png` screenshot of it was taken and added to the folder.
	- Then, for any file used in a question, the file-loading system checks whether a `.png` version of the file exists and, if so, loads it instead of the original.

	This process was done manually but could be automated.
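The `.png` substitution rule described above can be sketched as a small shell function. This is an illustration only, not the repository's own file loader, and `resolve_attachment` is a hypothetical name. It assumes the screenshot sits next to the original file with the same base name and a `.png` extension (for example, `report.pdf` becomes `report.png`); if your screenshots were saved as `report.pdf.png` instead, change how `png` is built.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the ".png substitution" lookup: print the screenshot's
# path if one exists next to the file, otherwise print the original path.
# Assumes the screenshot shares the original's base name (report.pdf -> report.png).
resolve_attachment() {
  local path="$1"
  local base="${path##*/}"   # file name without its directory
  local png

  case "$base" in
    *.*) png="${path%.*}.png" ;;   # swap the last extension for .png
    *)   png="$path.png" ;;        # no extension: append .png
  esac

  # Fall back to the original when no screenshot exists (or it already is one).
  if [ "$png" != "$path" ] && [ -f "$png" ]; then
    echo "$png"
  else
    echo "$path"
  fi
}
```

Looking only at the file name when stripping the extension keeps directory names that contain dots (such as `/tmp/tmp.abc123/`) from being mangled.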

	After processing, the annotated data was uploaded to a [new dataset](https://huggingface.co/datasets/smolagents/GAIA-annotated). You need to request access, which is granted instantly.
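Once access is granted, one way to fetch the dataset is with the `huggingface-cli` tool that ships with `huggingface_hub`, after authenticating with `huggingface-cli login` or by exporting an `HF_TOKEN`. The wrapper function `download_gaia_annotated` and its default target directory are hypothetical; check where your evaluation scripts expect the files before choosing `--local-dir`.

```bash
#!/usr/bin/env bash
# Sketch only: download the gated GAIA-annotated dataset after access is approved.
# Requires huggingface_hub (pip install huggingface_hub) and prior authentication
# via `huggingface-cli login` or an HF_TOKEN environment variable.
download_gaia_annotated() {
  # Placeholder default; not a path the repository's scripts are known to expect.
  local target_dir="${1:-data/gaia}"
  huggingface-cli download smolagents/GAIA-annotated \
    --repo-type dataset \
    --local-dir "$target_dir"
}
```

For example, `download_gaia_annotated ./gaia_annotated` places the dataset files under `./gaia_annotated`.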