# Proxy & Security

Configure proxy settings and enhance security features in Crawl4AI for reliable data extraction.
## Basic Proxy Setup

Configure a simple proxy with `BrowserConfig`:

```python
from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import BrowserConfig

# Using an HTTP proxy URL
browser_config = BrowserConfig(proxy="http://proxy.example.com:8080")
async with AsyncWebCrawler(config=browser_config) as crawler:
    result = await crawler.arun(url="https://example.com")

# Using a SOCKS proxy
browser_config = BrowserConfig(proxy="socks5://proxy.example.com:1080")
async with AsyncWebCrawler(config=browser_config) as crawler:
    result = await crawler.arun(url="https://example.com")
```
## Authenticated Proxy

Use an authenticated proxy with `BrowserConfig`:

```python
from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import BrowserConfig

proxy_config = {
    "server": "http://proxy.example.com:8080",
    "username": "user",
    "password": "pass"
}

browser_config = BrowserConfig(proxy_config=proxy_config)
async with AsyncWebCrawler(config=browser_config) as crawler:
    result = await crawler.arun(url="https://example.com")
```
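Hardcoding credentials as above is convenient for examples but risky in practice. One common alternative is to build the `proxy_config` dict from environment variables; the variable names below (`PROXY_SERVER`, `PROXY_USERNAME`, `PROXY_PASSWORD`) are illustrative assumptions, not names Crawl4AI itself reads:

```python
import os

def proxy_config_from_env():
    # Build the proxy_config dict from environment variables instead of
    # hardcoding credentials in source. Variable names are illustrative.
    server = os.getenv("PROXY_SERVER")
    if not server:
        return None  # no proxy configured
    return {
        "server": server,
        "username": os.getenv("PROXY_USERNAME", ""),
        "password": os.getenv("PROXY_PASSWORD", ""),
    }
```

The returned dict can then be passed to `BrowserConfig(proxy_config=...)` exactly as in the snippet above, with `None` meaning "run without a proxy".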
## Rotating Proxies

Example using a proxy rotation service and updating `BrowserConfig` dynamically:

```python
from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import BrowserConfig

async def get_next_proxy():
    # Your proxy rotation logic here
    return {"server": "http://next.proxy.com:8080"}

# Placeholder list of pages to crawl
urls = ["https://example.com/page1", "https://example.com/page2"]

browser_config = BrowserConfig()
async with AsyncWebCrawler(config=browser_config) as crawler:
    # Update the proxy before each request
    for url in urls:
        proxy = await get_next_proxy()
        browser_config.proxy_config = proxy
        result = await crawler.arun(url=url, config=browser_config)
```
## Custom Headers

Add security-related headers via `BrowserConfig`:

```python
from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import BrowserConfig

headers = {
    "X-Forwarded-For": "203.0.113.195",
    "Accept-Language": "en-US,en;q=0.9",
    "Cache-Control": "no-cache",
    "Pragma": "no-cache"
}

browser_config = BrowserConfig(headers=headers)
async with AsyncWebCrawler(config=browser_config) as crawler:
    result = await crawler.arun(url="https://example.com")
```
## Combining with Magic Mode

For maximum protection, combine a proxy with Magic Mode via `CrawlerRunConfig` and `BrowserConfig`:

```python
from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import BrowserConfig, CrawlerRunConfig

browser_config = BrowserConfig(
    proxy="http://proxy.example.com:8080",
    headers={"Accept-Language": "en-US"}
)
crawler_config = CrawlerRunConfig(magic=True)  # Enable all anti-detection features

async with AsyncWebCrawler(config=browser_config) as crawler:
    result = await crawler.arun(url="https://example.com", config=crawler_config)
```