Merge customer data from multiple sources into enriched profiles using parallel Render Workflows.
This demo showcases:
- Parallel processing: 10 shards processed simultaneously
- Multi-source merge: CRM + Billing + Product + Support → Enriched profiles
- High throughput: 400K records processed in seconds
- Both Python and TypeScript: Identical implementations in both languages
```
┌───────────────────────────────────────────────────────────────────┐
│                        FRONTEND (Next.js)                         │
│                      UI - Trigger & Monitor                       │
└───────────────────────────────────────────────────────────────────┘
                                  │
                 ┌────────────────┴────────────────┐
                 ▼                                 ▼
    ┌─────────────────────────┐       ┌─────────────────────────┐
    │       Python API        │       │     TypeScript API      │
    │        (FastAPI)        │       │        (Fastify)        │
    └─────────────────────────┘       └─────────────────────────┘
                 │                                 │
                 ▼                                 ▼
    ┌─────────────────────────┐       ┌─────────────────────────┐
    │     Python Workflow     │       │   TypeScript Workflow   │
    │      (render_sdk)       │       │     (@renderinc/sdk)    │
    └─────────────────────────┘       └─────────────────────────┘
                 │                                 │
                 └────────────────┬────────────────┘
                                  ▼
┌───────────────────────────────────────────────────────────────────┐
│                            SAMPLE DATA                            │
│    crm.csv │ billing.csv │ product.csv │ support.csv (100K each)  │
└───────────────────────────────────────────────────────────────────┘
```
The workflow uses hash-based sharding to ensure deterministic routing:

- Load: Read all 4 CSV source files
- Route: Hash each `customer_id` to assign records to 10 shards
- Process: Spawn 10 parallel subtasks (one per shard)
- Merge: Each shard merges its customers' data from all sources
- Enrich: Calculate `health_score`, `churn_risk`, `expansion_potential`
- Aggregate: Combine all shard results into the final output
```
customer_id → hash(customer_id) % 10 → shard_id
```

The same customer always routes to the same shard across all files.
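The routing rule can be sketched in a few lines of Python. Note that Python's built-in `hash()` is salted per process for strings, so a content hash such as MD5 is needed to keep assignments stable across runs (and across the Python and TypeScript implementations). This is a sketch of the idea, not the repo's actual `sharding.py`:

```python
import hashlib

NUM_SHARDS = 10

def shard_for(customer_id: str) -> int:
    """Deterministically map a customer_id to one of NUM_SHARDS shards."""
    digest = hashlib.md5(customer_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# The same ID always lands on the same shard, whichever source file it came from.
assert shard_for("CUST-00042") == shard_for("CUST-00042")
```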
- Python 3.11+
- Node.js 20+
- Render CLI 2.11.0+ (`brew install render` on macOS)
- A Render account with Workflows access
1. Generate sample data:

   ```bash
   cd scripts
   python generate_data.py --rows 1000      # Small dataset for testing
   # python generate_data.py --rows 100000  # Full 100K dataset
   ```

2. Start the local workflow server (pick Python or TypeScript). The Render CLI runs a local task server on port 8120.

   Python:

   ```bash
   cd python/workflows
   pip install -r requirements.txt
   render workflows dev -- python main.py
   ```

   TypeScript:

   ```bash
   cd typescript/workflows
   npm install
   render workflows dev -- npx tsx src/main.ts
   ```

   Verify the tasks registered:

   ```bash
   render workflows list --local
   ```

3. Start the matching API (pick one). Set `RENDER_USE_LOCAL_DEV=true` so the API triggers the local workflow server instead of Render's API.

   Python (default, runs on http://localhost:8001):

   ```bash
   cd python/api
   pip install -r requirements.txt
   RENDER_USE_LOCAL_DEV=true python main.py
   ```

   TypeScript (runs on http://localhost:8002):

   ```bash
   cd typescript/api
   npm install
   RENDER_USE_LOCAL_DEV=true npm run dev
   ```

   If using the TypeScript API, also set `NEXT_PUBLIC_API_URL=http://localhost:8002` before starting the frontend.

4. Start the frontend:

   ```bash
   cd frontend
   npm install
   npm run dev   # Runs on http://localhost:3000
   ```

5. Open http://localhost:3000 and click Run Workflow.
| Variable | Default | Used by |
|---|---|---|
| `RENDER_API_KEY` | (required for deployed services) | API services |
| `RENDER_USE_LOCAL_DEV` | `false` | API services (set `true` for local dev) |
| `WORKFLOW_SLUG` | `data-processor-workflows-py` / `data-processor-workflows-ts` | API services |
| `DATA_DIR` | `../../sample_data` | Workflow services |
| `NEXT_PUBLIC_API_URL` | `http://localhost:8001` | Frontend |
The Blueprint (`render.yaml`) deploys the frontend and the Python API by default. If you prefer TypeScript, edit `render.yaml` to uncomment the TypeScript API and comment out the Python one (see the instructions in the file).
Or manually:
- Push this repo to GitHub/GitLab
- In Render Dashboard: New → Blueprint
- Connect your repo and deploy
Workflows are not yet supported in Blueprints. Create them manually.

Python workflow:

1. In Render Dashboard: New → Workflow
2. Connect your repo
3. Settings:
   - Name: `data-processor-workflows-py`
   - Root Directory: `python/workflows`
   - Build Command: `pip install -r requirements.txt`
   - Start Command: `python main.py`
4. Deploy

TypeScript workflow:

1. In Render Dashboard: New → Workflow
2. Connect your repo
3. Settings:
   - Name: `data-processor-workflows-ts`
   - Root Directory: `typescript/workflows`
   - Build Command: `npm install && npm run build`
   - Start Command: `npm start`
4. Deploy
On each API service, set:

- `RENDER_API_KEY`: Your Render API key (create one at Dashboard → Account → API Keys)
- `WORKFLOW_SLUG`: The workflow service name (the API appends `/merge_customer_data` automatically), e.g.:
  - Python: `data-processor-workflows-py`
  - TypeScript: `data-processor-workflows-ts`
```
/
├── frontend/                    # Next.js brutalist UI
│   ├── app/
│   │   ├── page.tsx             # Main demo page
│   │   └── how-it-works/
│   │       └── page.tsx         # Workflow visualizer
│   ├── components/
│   │   ├── WorkflowTrigger.tsx  # Run button
│   │   ├── EventLog.tsx         # Terminal-style log
│   │   ├── DataPreview.tsx      # Before/after view
│   │   └── ResultsSummary.tsx   # Stats and shard timings
│   └── lib/
│       ├── api.ts               # API client
│       └── workflow-config.ts   # Visualizer config
│
├── python/
│   ├── api/                     # FastAPI service
│   │   └── main.py              # Trigger endpoints
│   └── workflows/               # Render Workflow
│       ├── main.py              # Task definitions
│       ├── sharding.py          # Hash-based routing
│       └── enrichment.py        # Score calculations
│
├── typescript/
│   ├── api/                     # Fastify service
│   │   └── src/index.ts         # Trigger endpoints
│   └── workflows/               # Render Workflow
│       └── src/
│           ├── main.ts          # Task definitions
│           ├── sharding.ts      # Hash-based routing
│           └── enrichment.ts    # Score calculations
│
├── sample_data/                 # Generated CSVs
├── scripts/
│   └── generate_data.py         # Data generator
│
├── render.yaml                  # Blueprint (frontend + APIs)
└── README.md
```
`crm.csv`:

```
customer_id,email,company_name,industry,employee_count,deal_stage,deal_value,sales_owner,last_contact
```

`billing.csv`:

```
customer_id,email,plan,mrr,payment_status,subscription_start,last_payment
```

`product.csv`:

```
customer_id,email,signup_date,last_active,total_sessions,features_used,usage_pct,account_status
```

`support.csv`:

```
customer_id,email,total_tickets,open_tickets,avg_resolution_hrs,last_ticket_date,nps_score,csat_score
```

The output contains all fields merged, plus calculated fields:

- `health_score`: 0-100, based on usage, payments, NPS, and support tickets
- `churn_risk`: LOW / MEDIUM / HIGH
- `expansion_potential`: LOW / MEDIUM / HIGH
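A simplified sketch of the merge-and-enrich step for one customer. The field names come from the schemas above, but the scoring weights here are invented for illustration only — the real formulas live in `enrichment.py` / `enrichment.ts`:

```python
def merge_customer(customer_id, crm, billing, product, support):
    """Combine one customer's rows from all four sources into a single profile."""
    profile = {"customer_id": customer_id}
    for source in (crm, billing, product, support):
        profile.update(source.get(customer_id, {}))

    # Illustrative scoring only -- the demo's actual weights live in enrichment.py.
    usage = float(profile.get("usage_pct", 0))
    nps = float(profile.get("nps_score", 0))
    open_tickets = int(profile.get("open_tickets", 0))
    score = round(0.6 * usage + 4 * nps - 5 * open_tickets)
    profile["health_score"] = max(0, min(100, score))
    profile["churn_risk"] = (
        "HIGH" if profile["health_score"] < 40
        else "MEDIUM" if profile["health_score"] < 70
        else "LOW"
    )
    return profile

row = merge_customer(
    "CUST-00001",
    crm={"CUST-00001": {"company_name": "Acme", "deal_value": "12000"}},
    billing={"CUST-00001": {"plan": "pro", "mrr": "499"}},
    product={"CUST-00001": {"usage_pct": "80"}},
    support={"CUST-00001": {"nps_score": "9", "open_tickets": "0"}},
)
# With these inputs and these made-up weights: health_score 84, churn_risk "LOW".
```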
With 100K rows per source (400K total records):
| Metric | Value |
|---|---|
| Total records | 400,000 |
| Shards | 10 |
| Parallel tasks | 10 |
| Estimated time | 2-5 seconds |
| Sequential estimate | ~20+ seconds |
| Speedup | ~5-10x |
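The fan-out/fan-in shape behind these numbers can be approximated locally with an ordinary worker pool. This is a structural sketch only — the demo itself spawns Render Workflow subtasks, not threads:

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 10

def process_shard(shard_id: int) -> dict:
    # Stand-in for the per-shard merge + enrich subtask.
    return {"shard_id": shard_id, "records": 0}

# Fan out: one task per shard, all in flight at once.
with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
    results = list(pool.map(process_shard, range(NUM_SHARDS)))

# Fan in (aggregate): combine shard results into the final output.
assert len(results) == NUM_SHARDS
```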
Edit `NUM_SHARDS` in:

- `python/workflows/sharding.py`
- `typescript/workflows/src/sharding.ts`

Edit the calculation functions in:

- `python/workflows/enrichment.py`
- `typescript/workflows/src/enrichment.ts`
Regenerate the data at a different size:

```bash
python scripts/generate_data.py --rows 10000     # 10K rows
python scripts/generate_data.py --rows 1000000   # 1M rows
```

If the workflow fails to trigger:

- Check that `WORKFLOW_SLUG` matches the workflow service name in the Dashboard
- Ensure the workflow deployed successfully in the Dashboard
- Verify `RENDER_API_KEY` is set correctly
- Make sure the API key has Workflows permissions

If data files are missing:

- Check the `DATA_DIR` environment variable
- Ensure the CSVs are accessible from the workflow runtime