How to handle rate limits when using multiple agents in parallel? #4078
Looking for an answer.
Replies: 5 comments
Had this exact problem when I built my multi-agent system. Here's what actually worked for me:

The easiest fix is to use different API keys for different agents, if you have them. If you're on a single key like I was, set `max_rpm` in your agent config: pass `max_rpm=20` or `max_rpm=30` as a parameter alongside `role` and `goal` in your `Agent` definition. This tells CrewAI to limit requests per minute, and it automatically paces calls so you don't hit OpenAI's limits.

If you still hit rate limits with `max_rpm` set, try switching from parallel to sequential processing. Yes, it's slower, but it's far more reliable; I now use parallel only for truly independent tasks.

One more thing that helped: if you're using GPT-4 for all agents, move some of them to GPT-4o-mini. It's cheaper and has higher rate limits, so save GPT-4 for just your critical-thinking agent.

The combination of `max_rpm` limiting, sequential processing, and mixing models basically solved all my rate-limit issues. Hope this helps!
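If it helps, here's roughly what that `max_rpm` setting looks like in practice. This is a sketch: the `role`, `goal`, and `backstory` values are placeholders of mine, not from the thread.

```python
from crewai import Agent

researcher = Agent(
    role="Researcher",                    # placeholder values, use your own
    goal="Gather sources on the topic",
    backstory="An experienced analyst.",
    max_rpm=20,  # CrewAI paces this agent to at most 20 requests per minute
)
```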
Rate limiting with parallel agents is tricky. Here are the patterns that work:

**1. Semaphore / token bucket**

```python
import asyncio
from asyncio import Semaphore

rate_limiter = Semaphore(5)  # max 5 concurrent calls

async def rate_limited_call(agent, task):
    async with rate_limiter:
        return await agent.execute(task)
```

**2. Per-provider limits**

**3. Exponential backoff with jitter**

```python
import random
import time

from openai import RateLimitError  # or your provider's equivalent

def backoff_retry(func, max_retries=5):
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError:
            wait = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(wait)
    raise Exception("Max retries exceeded")
```

**4. Request queuing**

**5. Caching**

We run multi-agent systems at Revolution AI with mixed provider strategies: some agents on GPT-4o, others on Claude, which spreads the rate-limit budget. Works well for production workloads.

What provider are you hitting limits with?
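For the caching point, a minimal sketch: memoize responses by prompt hash so repeated identical prompts never hit the API a second time. The `cached_llm` and `fake_llm` names are mine for illustration, not from any library.

```python
import functools
import hashlib

def cached_llm(call_fn):
    """Cache responses by prompt hash so repeated prompts skip the API."""
    cache = {}

    @functools.wraps(call_fn)
    def wrapper(prompt):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key not in cache:
            cache[key] = call_fn(prompt)
        return cache[key]

    wrapper.cache = cache
    return wrapper

calls = []

@cached_llm
def fake_llm(prompt):
    calls.append(prompt)  # stands in for a real (rate-limited) API request
    return f"answer to: {prompt}"

fake_llm("summarize X")
fake_llm("summarize X")  # second call is served from the cache
```

Obvious caveat: only cache when identical prompts really should return identical answers (deterministic settings, no time-sensitive context).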
Rate limiting with parallel agents is a real challenge! At RevolutionAI (https://revolutionai.io), here is what works:

**Token bucket approach:**

```python
import asyncio
from asyncio import Semaphore

class RateLimiter:
    def __init__(self, rpm=60):
        self.semaphore = Semaphore(rpm)

    async def acquire(self):
        await self.semaphore.acquire()
        # Return the permit after 60s, so at most `rpm` calls start per minute
        asyncio.create_task(self._release_after(60))

    async def _release_after(self, delay):
        await asyncio.sleep(delay)
        self.semaphore.release()
```

**Per-provider strategies**

**Agent-level controls**

**Monitoring**

The key is making rate limiting transparent to agents: they should not need to know about it!
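One way to sketch the per-provider idea: a separate semaphore per provider, so a burst of calls against one provider cannot starve calls to another. Provider names and limits below are illustrative, not from the post.

```python
import asyncio

async def call_provider(limiters, provider, make_request):
    # Each provider gets its own concurrency cap.
    async with limiters[provider]:
        return await make_request()

async def main():
    # Illustrative caps; tune to each provider's actual limits.
    limiters = {
        "openai": asyncio.Semaphore(5),
        "anthropic": asyncio.Semaphore(3),
    }

    async def fake_request():
        await asyncio.sleep(0.01)  # stands in for a real API call
        return "ok"

    return await asyncio.gather(
        *(call_provider(limiters, "openai", fake_request) for _ in range(10))
    )

results = asyncio.run(main())
```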
Rate limiting with parallel agents needs coordination. Here are patterns that work:

**1. Shared rate limiter**

```python
from asyncio import Semaphore

# Global semaphore for API calls
api_semaphore = Semaphore(5)  # max 5 concurrent calls

class RateLimitedLLM:
    async def call(self, prompt):
        async with api_semaphore:
            return await self.llm.call(prompt)
```

**2. Token bucket per model**

```python
from aiolimiter import AsyncLimiter

# OpenAI: 10K TPM, ~3 requests/sec
openai_limiter = AsyncLimiter(3, 1)  # 3 per second

async def rate_limited_call(llm, prompt):
    await openai_limiter.acquire()
    return await llm.call(prompt)
```

**3. Sequential for rate-sensitive tasks**

```python
from crewai import Crew, Process

crew = Crew(
    agents=[agent1, agent2, agent3],
    tasks=[task1, task2, task3],
    process=Process.sequential,  # not parallel
)
```

**4. LiteLLM with built-in rate limiting**

```python
from litellm import completion

response = completion(
    model="gpt-4o",
    messages=[...],
    max_retries=3,
    timeout=60,
    # LiteLLM handles 429s automatically
)
```

**5. Stagger agent starts**

```python
import asyncio

async def run_agents_staggered(agents, delay=2):
    tasks = []
    for agent in agents:
        tasks.append(asyncio.create_task(agent.run()))
        await asyncio.sleep(delay)  # stagger starts
    return await asyncio.gather(*tasks)  # keep task refs and wait for all
```

We run multi-agent systems at Revolution AI; token bucket + LiteLLM retry is the most reliable combo.
Rate limiting with parallel agents is critical! At RevolutionAI (https://revolutionai.io) we handle this:

**Solutions:**

```python
from asyncio import Semaphore

rate_limiter = Semaphore(5)  # 5 concurrent calls

async def rate_limited_call(agent, task):
    async with rate_limiter:
        return await agent.execute(task)
```

```python
from litellm import Router

router = Router(
    model_list=[...],
    routing_strategy="least-busy",
    num_retries=3,
)
```

```python
from tenacity import retry, wait_exponential

@retry(wait=wait_exponential(min=1, max=60))
def call_llm(prompt):
    ...
```

Most reliable: use LiteLLM for automatic rate-limit handling!