Skip to content

Integration Patterns

Production-validated integration patterns for ai.matey, discovered and tested through comprehensive integration testing.

These patterns emerged from real-world testing scenarios and represent battle-tested approaches to common challenges:

Pattern Use Case Complexity Status
Complexity-Based Routing Cost/quality optimization Moderate ✅ Production-ready
Parallel Provider Aggregation A/B testing, redundancy Advanced ✅ Production-ready
Automatic Failover Resilience Advanced ✅ Production-ready
Cost-Optimized Selection Cost reduction Moderate ✅ Production-ready
WebSocket Real-Time Streaming Real-time chat Advanced ✅ Production-ready
Batch Processing High throughput Advanced ✅ Production-ready
Middleware Composition Request pipeline Moderate ✅ Production-ready
Health Monitoring Observability Moderate ✅ Production-ready

Use Case: Route queries to appropriate providers based on complexity to optimize cost and quality.

Different queries require different model capabilities. Routing all queries to the most powerful (and expensive) model wastes money. Routing to cheaper models sacrifices quality on complex tasks.

Analyze query complexity and route to appropriate providers based on a complexity score.

import { Router } from 'ai.matey.core';
// Complexity analyzer
function analyzeComplexity(query: string): number {
let score = 0;
// Word count (longer = more complex)
const wordCount = query.split(/\s+/).length;
score += Math.min(wordCount / 2, 30);
// Keywords indicating complexity
const complexKeywords = ['analyze', 'explain', 'compare', 'evaluate'];
if (complexKeywords.some(kw => query.toLowerCase().includes(kw))) {
score += 20;
}
// Technical indicators
if (/\d+\s*[+\-*/]\s*\d+/.test(query)) score += 15; // Math
if (/[A-Z]{2,}/.test(query)) score += 10; // Acronyms
return Math.min(score, 100);
}
// Router with custom complexity-based routing
const router = new Router(new OpenAIFrontendAdapter(), {
backends: [groq, deepseek, openai, anthropic],
strategy: 'custom',
customStrategy: (request) => {
const query = request.messages[request.messages.length - 1]?.content;
const complexity = analyzeComplexity(query.toString());
if (complexity < 25) return 0; // Groq: Fast & cheap
if (complexity < 50) return 1; // DeepSeek: Cost-effective
if (complexity < 80) return 2; // OpenAI: Powerful
return 3; // Anthropic: Most capable
}
});
  • ✅ Automatic provider selection
  • ✅ Cost optimization while maintaining quality
  • ✅ 18 diverse test queries validated
  • Multiple providers with different capabilities/costs
  • Query complexity varies significantly
  • Need to optimize for both cost and quality

Use Case: Compare responses from multiple providers side-by-side or ensure redundancy.

You want to compare responses from multiple providers simultaneously, or ensure redundancy by calling multiple providers in parallel.

Execute the same request to multiple providers in parallel and aggregate streaming responses in real-time.

import { Bridge } from 'ai.matey.core';
import { EventEmitter } from 'events';
class ParallelAggregator extends EventEmitter {
async executeParallel(request: any, backends: any[]) {
const startTime = Date.now();
const promises = backends.map(async (backend) => {
const bridge = new Bridge(
new OpenAIFrontendAdapter(),
backend
);
try {
const response = await bridge.chat(request);
const duration = Date.now() - startTime;
this.emit('response', {
provider: backend.name,
response,
duration,
success: true
});
return { provider: backend.name, response, success: true };
} catch (error) {
this.emit('error', {
provider: backend.name,
error: error.message
});
return { provider: backend.name, error, success: false };
}
});
// Wait for all (graceful failure handling)
const results = await Promise.allSettled(promises);
return results.map(r => r.status === 'fulfilled' ? r.value : null);
}
}
// Usage
const aggregator = new ParallelAggregator();
aggregator.on('response', ({ provider, response, duration }) => {
console.log(`${provider}: ${response.content} (${duration}ms)`);
});
const results = await aggregator.executeParallel(request, backends);
  • ✅ Parallel streaming from 3 providers
  • ✅ Real-time chunk processing
  • ✅ Graceful failure handling
  • A/B testing different providers
  • Ensuring redundancy (if one fails, others succeed)
  • Comparing response quality
  • Performance benchmarking

Use Case: Automatically retry failed requests with fallback providers for high availability.

Providers fail due to rate limits, network issues, or service outages. Manual intervention is too slow.

Custom middleware that automatically retries failed requests with the next provider in a fallback chain.

import { Router } from 'ai.matey.core';
const router = new Router(new OpenAIFrontendAdapter(), {
backends: [
openaiBackend, // Primary
anthropicBackend, // Secondary
groqBackend, // Tertiary
deepseekBackend // Last resort
],
strategy: 'priority',
fallbackOnError: true,
healthCheck: {
enabled: true,
interval: 60000, // Check every minute
},
});
// Health tracking prevents repeated requests to failing providers
router.on('backend:failed', ({ backend, error }) => {
console.log(`${backend} failed: ${error.message}`);
});
router.on('backend:switch', ({ from, to }) => {
console.log(`Switched from ${from} to ${to}`);
});
  • ✅ Automatic failover between 4 providers
  • ✅ Health tracking prevents repeated failures
  • ✅ Comprehensive audit trail
  • ✅ 10 failure scenarios tested
  • Production systems requiring high availability
  • When provider reliability varies
  • Need automatic recovery from failures
  • Want comprehensive failure logging

Use Case: Minimize API costs while maintaining quality guarantees.

AI API costs can be high. Different providers have vastly different pricing, but you still need quality guarantees.

Estimate request costs across providers, filter by quality tier, and select the cheapest option.

// Provider pricing (per 1K tokens)
const PRICING = {
deepseek: 0.0002,
groq: 0.00027,
anthropic: 0.0008, // Haiku
openai: 0.0015 // GPT-3.5
};
// Quality tiers
const QUALITY_TIERS = {
low: ['deepseek'],
medium: ['deepseek', 'groq'],
high: ['groq', 'anthropic', 'openai'],
ultra: ['anthropic', 'openai']
};
function selectCheapestProvider(
prompt: string,
qualityTier: keyof typeof QUALITY_TIERS
) {
const estimatedTokens = Math.ceil((prompt.length + 200) / 4);
const eligibleProviders = QUALITY_TIERS[qualityTier];
let cheapest = eligibleProviders[0];
let lowestCost = PRICING[cheapest] * estimatedTokens / 1000;
for (const provider of eligibleProviders) {
const cost = PRICING[provider] * estimatedTokens / 1000;
if (cost < lowestCost) {
cheapest = provider;
lowestCost = cost;
}
}
return { provider: cheapest, estimatedCost: lowestCost };
}
  • 84% cost savings vs always using OpenAI
  • ✅ Quality tiers ensure acceptable response quality
  • ✅ Provider distribution: 50% DeepSeek, 30% Groq, 20% Haiku
  • Cost optimization is a priority
  • Flexible quality requirements
  • Processing high volumes of requests
  • Providers offer similar capabilities at different prices

Use Case: Bi-directional real-time communication for chat applications.

Need real-time communication for chat applications with streaming AI responses.

WebSocket server with per-client conversation history and real-time streaming delivery.

import WebSocket from 'ws';
import { Bridge } from 'ai.matey.core';
const wss = new WebSocket.Server({ port: 8080 });
// Per-client state
const clients = new Map();
wss.on('connection', (ws) => {
const clientId = `client-${Date.now()}`;
clients.set(ws, { id: clientId, history: [] });
ws.on('message', async (data) => {
const message = JSON.parse(data.toString());
const client = clients.get(ws);
if (message.type === 'chat') {
const bridge = new Bridge(
new OpenAIFrontendAdapter(),
backend
);
const stream = await bridge.chatStream({
messages: [{ role: 'user', content: message.content }],
stream: true
});
for await (const chunk of stream) {
if (ws.readyState === WebSocket.OPEN) {
ws.send(JSON.stringify({
type: 'chunk',
data: chunk
}));
}
}
ws.send(JSON.stringify({ type: 'done' }));
}
});
});
  • ✅ 15/15 tests passing (100%)
  • 101ms latency (ping/pong)
  • 19 chunks/sec streaming rate
  • ✅ Multi-client concurrent handling
  • Real-time chat applications
  • Interactive AI assistants
  • Bi-directional communication requirements
  • Per-client state management needed

Use Case: Process many requests quickly without hitting API rate limits.

Need to process many requests quickly while respecting provider rate limits.

Queue management with sliding window rate limiting and configurable concurrency.

class BatchProcessor {
private queue: any[] = [];
private processing = 0;
private requestTimes: number[] = [];
constructor(
private config: {
concurrency: number;
rateLimit: { maxRequests: number; perMinutes: number };
}
) {}
private canMakeRequest(): boolean {
const now = Date.now();
const windowStart = now - (this.config.rateLimit.perMinutes * 60 * 1000);
// Remove old timestamps
this.requestTimes = this.requestTimes.filter(t => t > windowStart);
return this.requestTimes.length < this.config.rateLimit.maxRequests;
}
async add(request: any): Promise<any> {
return new Promise((resolve, reject) => {
this.queue.push({ request, resolve, reject });
this.processQueue();
});
}
private async processQueue() {
while (
this.queue.length > 0 &&
this.processing < this.config.concurrency &&
this.canMakeRequest()
) {
const { request, resolve, reject } = this.queue.shift();
this.processing++;
this.requestTimes.push(Date.now());
try {
const result = await this.execute(request);
resolve(result);
} catch (error) {
reject(error);
} finally {
this.processing--;
this.processQueue();
}
}
}
}
  • 100% success on standard config
  • ✅ Sliding window rate limiting
  • ✅ Configurable concurrency
  • High-volume batch processing
  • Need to respect rate limits
  • Want configurable concurrency
  • Processing queued requests

Use Case: Build complex request pipelines with multiple middleware layers.

import {
createLoggingMiddleware,
createRetryMiddleware,
createCachingMiddleware,
createCostTrackingMiddleware,
} from 'ai.matey.middleware';
const bridge = new Bridge(
new OpenAIFrontendAdapter(),
backend
);
// Build middleware stack (order matters!)
bridge.use(createLoggingMiddleware({ level: 'info' }));
bridge.use(createRetryMiddleware({ maxAttempts: 3 }));
bridge.use(createCachingMiddleware({ ttl: 3600 }));
bridge.use(createCostTrackingMiddleware({ budgetLimit: 100 }));
  • Production applications
  • Need logging, retry, caching, etc.
  • Want separation of concerns
  • Building reusable pipelines

Use Case: Monitor provider health and performance in real-time.

class HealthMonitor {
private metrics = new Map();
trackRequest(provider: string, duration: number, success: boolean) {
if (!this.metrics.has(provider)) {
this.metrics.set(provider, {
requests: 0,
failures: 0,
totalDuration: 0,
});
}
const m = this.metrics.get(provider);
m.requests++;
if (!success) m.failures++;
m.totalDuration += duration;
}
getHealth(provider: string) {
const m = this.metrics.get(provider);
if (!m) return null;
return {
errorRate: m.failures / m.requests,
avgLatency: m.totalDuration / m.requests,
healthy: m.failures / m.requests < 0.05, // <5% errors
};
}
}
  • Production monitoring
  • Need provider health insights
  • Want to track performance
  • Building observability