AgentAid
System Design
A real-time call lead marketplace connecting inbound insurance calls to licensed agents via browser softphone — built for reliability, auditability, and scale.
Inbound calls.
Licensed agents.
Zero hardware.
AgentAid routes consumer-initiated inbound phone calls to licensed final expense insurance agents through a browser-based softphone.
- Agents prepay balance, go online, receive calls
- Automatically charged per connected call
- Browser-based softphone — no hardware required
The system must be reliable, auditable, event-driven, and ready to evolve into a white-label marketplace platform.
Business Goals
- Route inbound calls to the right licensed agent in real time
- Enforce online, licensed, funded, and daily cap eligibility
- Debit agent balance per connected call automatically
- Real-time visibility for admins
- Support future verticals and white-label deployments
Engineering Goals
- High availability for call routing
- Reliable event processing
- Safe payment and balance handling
- Full routing auditability
- Minimal operational complexity for MVP
- Scale without major rewrites
North Star
Simple enough to ship fast. Reliable enough for real money, real calls, and real-time routing.
Evolution Path
Architecture designed to evolve into a white-label marketplace platform without major rewrites.
Decisions driven by
constraints, not trends
Every technology choice maps to a product requirement — not resume-driven development.
Modular Monolith
One deployable unit with clear internal modules. Fast to build, simple to operate.
Event-Driven Core
Webhooks ingest fast. Workers process async. Every state change is an auditable event.
Ledger, Not Balance
Every credit and debit is a transaction record. Balance is derived, never blindly mutated.
Idempotent Everything
Duplicate webhooks must never double-charge or double-route. Store raw events first.
Real-Time by Default
WebSockets for agent ringing, admin monitoring, and balance updates — not polling.
AWS-Native MVP
Managed services for database, queue, secrets, and deployment. Minimize ops surface area.
Eight components.
One critical path.
flowchart TD
A["Inbound Call / CallGrid"] --> B["Webhook API"]
B --> C["PostgreSQL Event Store"]
B --> D["AWS SQS Queue"]
D --> E["Routing Worker"]
E --> F["PostgreSQL"]
E --> G["Twilio Voice SDK / Twilio Voice"]
E --> H["Audit Log"]
G --> I["Browser Softphone - React"]
I --> J["Agent Dashboard"]
K["Stripe Checkout"] --> L["Stripe Webhook API"]
L --> M["PostgreSQL Ledger"]
L --> D
F --> N["Admin Dashboard"]
E --> O["WebSocket Gateway"]
O --> N
O --> J
From inbound ring
to agent connection
CALL_RECEIVEDCALL_CONNECTEDAgent is approved
AND agent is online
AND agent is licensed in caller state
AND agent has enough balance
AND agent has not reached daily cap
AND agent is not currently busy
MVP Selection Strategy
- Filter eligible agents
- Prioritize agents with higher balance
- Respect daily cap
- Avoid sending multiple calls to the same busy agent
- Round-robin or weighted round-robin among eligible agents
SQS decouples
ingestion from logic
Webhook handlers: receive → validate signature → store raw event → publish to queue → return 200. Business processing happens in workers.
Why SQS over RabbitMQ?
Reduces coupling between webhook ingestion and business processing. Retries, durability, and failure isolation — without managing RabbitMQ. AWS-native, no exchange routing needed at MVP.
Why Event-Driven?
The system depends on external systems, asynchronous workflows, retries, and auditability. Prevents webhook timeouts and reduces the risk of losing critical events.
Never trust the frontend
for money
Credit Flow — Stripe Top-Up
payment_intent.succeeded webhookPAYMENT_RECEIVEDDebit Flow — Per Connected Call
CALL_BILLING_REQUESTEDUse a ledger model, not only a mutable balance field. Each balance change generates a transaction record — auditable and safer.
CREDIT
Stripe payment
DEBIT
Connected call
REFUND
Stripe refund
ADMIN_ADJUSTMENT
Manual admin override
Browser softphone.
No hardware required.
Agents receive calls directly in Chrome. No mobile app, no desk phone, no provisioning delay.
Why Twilio Voice SDK?
Removes physical phones, mobile apps, and hardware. Agents receive calls directly in Chrome. Backend generates access tokens; React registers a Twilio Device.
Agent Dashboard
Onboarding · Licensed states · NPN verification · Prepaid balance · Online/offline toggle · Browser softphone · Balance, call history, daily cap
Admin Dashboard
Live call monitor · Agent approve/suspend · Balance overrides · Campaign & phone number config · Routing logs · Revenue reports
WebSocket Gateway
Agent call ringing · Agent availability · Admin live call monitor · Balance updates · Call status changes
Strong consistency
for money & routing
PostgreSQL stores agents, campaigns, call records, routing attempts, balance ledger, Stripe transactions, and audit logs.
Near 100% availability
on the routing path
Managed infra.
Security by design.
MVP Deployment
React frontend · Node.js API · PostgreSQL on RDS · SQS · Worker process · Twilio · Stripe · WebSocket gateway.
Start with ECS Fargate if possible. EC2 acceptable for v1. RDS from day one — database reliability is critical.
Security Requirements
- HTTPS everywhere · secrets in Secrets Manager
- Validate Stripe & Twilio webhook signatures
- RBAC for admin · audit logs for admin actions
- Never expose Stripe/Twilio secrets to frontend
- Least-privilege IAM · encrypt DB at rest
- Secure JWT/session management
If routing fails,
revenue stops immediately
Operations and product analytics are directly connected. Observability is a revenue protection strategy.
- Calls received, routed, connected, missed
- Revenue by campaign and date
- Agent balance and online time
- Conversion rate by campaign
- API and webhook latency
- SQS queue depth and worker failures
- Routing latency end-to-end
- Twilio / Stripe webhook failure rates
- WebSocket disconnects · error rates
- Database query performance
CloudWatch
Infrastructure metrics, log aggregation, alerting on queue depth and error rates.
Sentry
Application error tracking with stack traces. Catch routing and billing exceptions in production.
Structured Logs
JSON logs with correlation IDs across webhook → queue → worker → database for end-to-end tracing.
Six phases.
Incremental value.
- AWS setup · PostgreSQL schema · Node.js backend
- Authentication · SQS queue · Worker service
- Raw webhook ingestion · Deployment pipeline
- Campaign phone number mapping
- Eligibility rules · state licensing · online/offline
- Daily cap · balance validation · routing audit logs
- Twilio Voice SDK token generation
- Browser softphone · incoming call flow
- Call connected/ended events · call history
- Stripe Checkout · Stripe webhooks
- Balance ledger · per-call debit
- Auto-offline on low balance
- Live call monitor · agent management
- Campaign management · revenue reporting
- Routing logs
- E2E call testing · payment testing
- Webhook replay testing · load testing
- Monitoring dashboards · soft launch with seed agents
Speed today.
Scale tomorrow.
Simple enough to build quickly, but reliable enough to handle real money, real phone calls, and real-time routing. Not microservices from day one.
Speed
Fast to build, simple to deploy, easier to maintain as the first developer. Clear module ownership inside one codebase.
Reliability
Event-driven, idempotent, ledger-based. Durable queue, raw event storage, and transactional billing from day one.
Scale Path
High-volume modules (routing, billing) can be extracted into separate services when metrics demand it — not before.