| Feature | Description |
|---|---|
| Cloudflare Bypass | Automatically handles Cloudflare protection using the cloudscraper library |
| Multiple Transports | Supports both stdio and HTTP transport protocols |
| Content Cleaning | Converts HTML to clean, LLM-friendly Markdown |
| Smart Chunking | Automatically splits large responses into chunks of at most 10,000 tokens |
| Docker Support | Production-ready containerized deployment |
| Multiple Methods | Supports GET and POST HTTP methods |
| Binary Handling | Returns non-text content Base64-encoded |
| File Export | Saves scraped content directly to disk |
| Tool | Return Type | Use Case | Chunking Support | File Output |
|---|---|---|---|---|
| `scrape_url` | String (content only) | Quick content retrieval for AI processing | Yes | No |
| `scrape_url_raw` | Dictionary (metadata + content) | Full response details, including headers and timing | Yes | No |
| `scrape_url_to_file` | Dictionary (save confirmation) | Export content to workspace files | No | Yes |
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `url` | string | Yes | - | Target URL to scrape |
| `method` | string | No | `"GET"` | HTTP method (`GET` or `POST`) |
| `clean_content` | boolean | No | `true` | Convert HTML to Markdown |
| `continuation_token` | string | No | `null` | Token for retrieving the next chunk |
scrape_url Response Fields
| Field | Type | Description |
|---|---|---|
| Response | string | Page content, with chunk continuation instructions included when applicable |
Note: When content exceeds 10,000 tokens, the response includes continuation instructions embedded in the text.
scrape_url_raw Response Fields
| Field | Type | Present | Description |
|---|---|---|---|
| `status_code` | integer | Always | HTTP response status code |
| `headers` | object | Always | Response headers (hop-by-hop headers removed) |
| `content` | string | Always | Page content or current chunk |
| `content_type` | string | Always | MIME type of the response |
| `response_time` | number | Always | Request duration in seconds |
| `chunked` | boolean | When chunked | Indicates the response was split |
| `chunk_index` | integer | When chunked | Current chunk number (1-based) |
| `total_chunks` | integer | When chunked | Total number of chunks |
| `continuation_token` | string | When more chunks remain | Token for retrieving the next chunk |
| `total_tokens` | integer | When chunked | Total tokens in the full response |
| `message` | string | When chunked | Human-readable chunk status |
| `error` | string | On failure | Error description |
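A client that needs the whole page can call `scrape_url_raw` repeatedly, passing back the returned `continuation_token` each time until no token comes back. The sketch below assumes a hypothetical `call_tool(name, arguments)` callable that returns the response dictionary described above. In practice this would wrap an MCP client call. It also assumes the chunk texts can simply be concatenated. Because `url` is a required parameter, the sketch sends it again with every token.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: call_tool(tool_name, arguments) -> response dict
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def fetch_all_chunks(call_tool: ToolCaller, url: str, max_calls: int = 100) -> str:
    """Reassemble a chunked scrape_url_raw response by following continuation tokens."""
    args: Dict[str, Any] = {"url": url}
    parts: List[str] = []
    for _ in range(max_calls):
        response = call_tool("scrape_url_raw", args)
        if response.get("error"):
            raise RuntimeError(response["error"])
        parts.append(response["content"])
        token = response.get("continuation_token")
        if not token:
            # No token: the response was not chunked, or this was the last chunk
            return "".join(parts)
        # url is required, so it is sent again alongside the token
        args = {"url": url, "continuation_token": token}
    raise RuntimeError(f"gave up after {max_calls} calls")
```

Cached chunks expire after about two minutes, so a loop like this should run to completion rather than pause between calls. The `max_calls` cap only guards against a misbehaving server.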
scrape_url_to_file Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `url` | string | Yes | - | Target URL to scrape |
| `file_path` | string | Yes | - | Path where the content should be saved |
| `method` | string | No | `"GET"` | HTTP method (`GET` or `POST`) |
| `clean_content` | boolean | No | `false` | Convert HTML to Markdown before saving |
| `overwrite` | boolean | No | `false` | Replace the file if it already exists |
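The `overwrite` flag implies a save step roughly like the one below. This is a sketch, not the server's code. The helper name `save_content` is hypothetical. The sketch also assumes that text is written as UTF-8, that missing parent directories are created, and that the reported `file_path` is the resolved absolute path.

```python
from pathlib import Path
from typing import Any, Dict, Union


def save_content(content: Union[str, bytes], file_path: str, overwrite: bool = False) -> Dict[str, Any]:
    """Hypothetical helper: write scraped content to disk, honoring `overwrite`."""
    path = Path(file_path).expanduser().resolve()
    if path.exists() and not overwrite:
        return {"error": f"File already exists: {path} (set overwrite to true to replace it)"}
    # Text is encoded as UTF-8; bytes (e.g. binary responses) are written unchanged
    data = content.encode("utf-8") if isinstance(content, str) else content
    path.parent.mkdir(parents=True, exist_ok=True)
    # "wb" truncates an existing file; this line is only reached when overwriting is allowed
    with open(path, "wb") as f:
        bytes_written = f.write(data)
    return {
        "file_path": str(path),
        "bytes_written": bytes_written,
        "message": f"Saved {bytes_written} bytes to {path}",
    }
```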
scrape_url_to_file Response Fields
| Field | Type | Present | Description |
|---|---|---|---|
| `status_code` | integer | Always | HTTP response status code |
| `headers` | object | Always | Response headers (hop-by-hop headers removed) |
| `content_type` | string | Always | MIME type of the saved content |
| `response_time` | number | Always | Request duration in seconds |
| `file_path` | string | On success | Absolute path to the saved file |
| `bytes_written` | integer | On success | Number of bytes written to disk |
| `message` | string | On success | Confirmation message |
| `error` | string | On failure | Error description |
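Both dictionary-returning tools report `headers` with hop-by-hop headers removed. The sketch below uses the conventional hop-by-hop list (RFC 2616 §13.5.1) and also drops any header named in `Connection` (RFC 7230 §6.1). Whether the server filters exactly this set is an assumption, and `strip_hop_by_hop` is a hypothetical name.

```python
from typing import Dict, Mapping, Set

# Conventional hop-by-hop headers, compared case-insensitively
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}


def strip_hop_by_hop(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `headers` without hop-by-hop fields."""
    # Headers listed in Connection are hop-by-hop for this message as well
    listed: Set[str] = set()
    for name, value in headers.items():
        if name.lower() == "connection":
            listed.update(tok.strip().lower() for tok in value.split(",") if tok.strip())
    drop = HOP_BY_HOP | listed
    return {name: value for name, value in headers.items() if name.lower() not in drop}
```

Because it only relies on `.items()`, the function also accepts a `requests`-style case-insensitive header mapping.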
| Requirement | Version | Purpose |
|---|---|---|
| Python | 3.10+ | Runtime environment |
| uv | Latest | Dependency management |
| Git | Any | Repository cloning |
Clone the repository and install dependencies:

```bash
git clone https://github.com/yourusername/cloudscraper-mcp-server.git
cd cloudscraper-mcp-server
uv sync
```
| Transport | Best For | Configuration |
|---|---|---|
| stdio | Claude Code, VS Code, direct AI integration | Default mode; no environment variables needed |
| http | n8n, web apps, API integrations, remote access | Requires `MCP_TRANSPORT=http` |
| Variable | Default | Options | Description |
|---|---|---|---|
| `MCP_TRANSPORT` | `stdio` | `stdio`, `http` | Transport protocol selection |
| `MCP_HOST` | `0.0.0.0` | Any valid IP address | Host binding for HTTP mode |
| `MCP_PORT` | `8000` | Any valid port | Port for HTTP mode |
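The variables above could be read and validated as sketched below. The names `TransportConfig` and `load_transport_config` are hypothetical. The server may also parse these values differently; for example, it may not lowercase the transport name or reject unknown values.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TransportConfig:
    transport: str
    host: str
    port: int


def load_transport_config(env: Optional[Mapping[str, str]] = None) -> TransportConfig:
    """Read MCP_TRANSPORT, MCP_HOST and MCP_PORT, applying the documented defaults."""
    env = os.environ if env is None else env
    transport = env.get("MCP_TRANSPORT", "stdio").strip().lower()
    if transport not in ("stdio", "http"):
        raise ValueError(f"MCP_TRANSPORT must be 'stdio' or 'http', got {transport!r}")
    host = env.get("MCP_HOST", "0.0.0.0")
    port = int(env.get("MCP_PORT", "8000"))
    if not 0 < port < 65536:
        raise ValueError(f"MCP_PORT out of range: {port}")
    # host and port are only used when transport == "http"
    return TransportConfig(transport, host, port)
```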
Running with Stdio Transport (Default)

```bash
uv run server.py
```
Running with HTTP Transport

```bash
MCP_TRANSPORT=http MCP_HOST=0.0.0.0 MCP_PORT=8000 uv run server.py
```
Register the server with Claude Code (stdio transport):

```bash
claude mcp add --transport stdio cloudscraper-mcp -- \
  uv --directory /path/to/cloudscraper-mcp-server run server.py
```
Equivalent JSON configuration for clients that use the `mcpServers` format:

```json
{
  "mcpServers": {
    "cloudscraper-mcp": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "server.py"],
      "cwd": "/path/to/cloudscraper-mcp-server"
    }
  }
}
```
For containerized deployment instructions, see `DOCKER.md`.
| Component | Technology | Purpose |
|---|---|---|
| Protocol | FastMCP 3.0+ | Model Context Protocol implementation |
| Scraping | cloudscraper 1.2.71+ | Cloudflare bypass engine |
| Compression | brotli 1.0.9+ | Response decompression |
| Parsing | beautifulsoup4 4.10.0+ | HTML parsing |
| Conversion | markdownify 0.11.6+ | HTML-to-Markdown conversion |
| Tokenization | tiktoken 0.5.0+ | Token counting for chunking |
| Testing | pytest 8.0+ | Integration test suite |
| Setting | Value | Description |
|---|---|---|
| Max tokens per chunk | 10,000 | Maximum tokens in a single response |
| Chunk expiry | 2 minutes | How long cached chunks remain retrievable |
| Token encoding | `cl100k_base` | tiktoken encoding used for counting |
| Continuation pattern | `chunk_id:index` | Token format for sequential retrieval |
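The chunking settings above can be sketched in three steps:

1. Encode the text once.
2. Slice the token list into windows of at most 10,000 tokens and decode each window.
3. Store the chunks in a short-lived cache keyed by a random `chunk_id`.

Everything here is an assumption about how such a scheme could work, not the server's actual code. That includes the names `ChunkCache` and `split_into_chunks`, the use of `uuid4` for `chunk_id`, and expiry measured from creation. It also treats `index` in `chunk_id:index` as the 1-based number of the next chunk to return. The encoder is passed in so the sketch runs without tiktoken. With tiktoken installed you would pass `tiktoken.get_encoding("cl100k_base")`.

```python
import time
import uuid
from typing import Dict, List, Optional, Protocol, Tuple

MAX_TOKENS_PER_CHUNK = 10_000
CHUNK_EXPIRY_SECONDS = 120  # 2 minutes


class Encoding(Protocol):
    """Anything with tiktoken-style encode_ordinary/decode methods."""

    def encode_ordinary(self, text: str) -> List[int]: ...

    def decode(self, tokens: List[int]) -> str: ...


def split_into_chunks(text: str, enc: Encoding, max_tokens: int = MAX_TOKENS_PER_CHUNK) -> List[str]:
    """Split text into pieces of at most max_tokens tokens each."""
    tokens = enc.encode_ordinary(text)
    if len(tokens) <= max_tokens:
        return [text]
    return [enc.decode(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]


class ChunkCache:
    """Hypothetical in-memory store: chunk_id -> (created_at, chunks), with expiry."""

    def __init__(self, ttl: float = CHUNK_EXPIRY_SECONDS) -> None:
        self.ttl = ttl
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def store(self, chunks: List[str]) -> str:
        chunk_id = uuid.uuid4().hex
        self._entries[chunk_id] = (time.monotonic(), chunks)
        return chunk_id

    def get(self, token: str) -> Tuple[str, Optional[str]]:
        """Resolve 'chunk_id:index' to (chunk, next_token or None)."""
        chunk_id, sep, index_str = token.rpartition(":")
        if not sep or not index_str.isdigit():
            raise ValueError(f"malformed continuation token: {token!r}")
        entry = self._entries.get(chunk_id)
        if entry is None or time.monotonic() - entry[0] > self.ttl:
            self._entries.pop(chunk_id, None)
            raise KeyError("continuation token expired or unknown")
        chunks = entry[1]
        index = int(index_str)  # assumed 1-based, like chunk_index
        if not 1 <= index <= len(chunks):
            raise ValueError(f"chunk index out of range: {index}")
        next_token = f"{chunk_id}:{index + 1}" if index < len(chunks) else None
        return chunks[index - 1], next_token
```

The sketch uses `encode_ordinary` rather than `encode`. tiktoken's `encode` raises by default when the text contains a special-token string such as `<|endoftext|>`, which scraped pages can contain. Decoding each slice independently can also split a multi-byte character across a chunk boundary. A more careful version could shift the boundary to avoid that.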
| Header | Value | Purpose |
|---|---|---|
| `User-Agent` | Chrome 120 | Browser impersonation |
| `Sec-Ch-Ua` | Chrome/Chromium | Client hints |
| `Sec-Fetch-*` | `cors` / `same-origin` | Fetch metadata |
| `Origin` / `Referer` | Auto-generated | Request legitimacy |
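The table does not say how `Origin` and `Referer` are generated. One plausible approach, offered only as an assumption, is to derive both from the target URL so the request looks like it came from a page on the same site. `Sec-Fetch-Site: same-origin` then matches. `build_browser_headers` is a hypothetical name, and the `User-Agent` and `Sec-Ch-Ua` strings are illustrative Chrome 120 values, not necessarily the ones the server sends.

```python
from typing import Dict
from urllib.parse import urlsplit

# Illustrative Chrome 120 values; the server's exact strings may differ
CHROME_120_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
CHROME_120_SEC_CH_UA = '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"'


def build_browser_headers(url: str) -> Dict[str, str]:
    """Hypothetical helper: browser-like headers with Origin/Referer derived from url."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    # Drop any user:password@ prefix; an origin is scheme://host[:port] only
    host = parts.netloc.rpartition("@")[2].lower()
    origin = f"{parts.scheme}://{host}"
    return {
        "User-Agent": CHROME_120_UA,
        "Sec-Ch-Ua": CHROME_120_SEC_CH_UA,
        "Sec-Fetch-Mode": "cors",
        # same-origin is consistent with an Origin equal to the target's own origin
        "Sec-Fetch-Site": "same-origin",
        "Origin": origin,
        "Referer": origin + "/",
    }
```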
Made with CloudScraper and FastMCP