Routing Engine
Intelligent model routing across 5 policies — pick the right model for every task automatically.
The routing engine is the brain of Aura Work's provider selection. It evaluates each task based on its type, sensitivity, required capabilities, and your configured policy, then selects the model best suited to it. This lets you get strong results while keeping cost and privacy under your control.
How Routing Works
When you send a prompt, the routing engine:
1. Analyzes the task: determines whether it is coding, research, document work, data analysis, browser automation, etc.
2. Checks capabilities: does the task need vision, tool calling, long context, or reasoning?
3. Evaluates sensitivity: is the data normal, sensitive, or secret-risk?
4. Applies your policy: quality-first, cost-first, privacy-first, local-only, or manual.
5. Selects the model: picks the best provider and model combination.
6. Returns a decision: includes the selected model, estimated cost, and confidence.
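The six steps above can be compressed into a small function. This is a minimal sketch, not Aura Work's implementation: `Model`, `Decision`, and `route` are hypothetical names, the quality scores and per-1K-token prices are made up, confidence is simply reused from the quality score, step 3 (sensitivity) is left out, and only the quality-first and cost-first policies are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    # Hypothetical model descriptor; all fields are illustrative.
    name: str                  # "provider/model"
    capabilities: frozenset    # e.g. frozenset({"text", "tools", "vision"})
    quality: float             # hypothetical benchmark score in [0, 1]
    usd_per_1k_tokens: float   # hypothetical blended price


@dataclass(frozen=True)
class Decision:
    model: str
    estimated_cost: float
    confidence: float


def route(needs: frozenset, est_tokens: int,
          models: list, policy: str) -> Decision:
    # Step 2: keep only models that offer every required capability.
    capable = [m for m in models if needs <= m.capabilities]
    if not capable:
        raise LookupError(f"no model supports {sorted(needs)}")
    # Step 4: rank the capable models by policy (two policies sketched).
    if policy == "quality-first":
        chosen = max(capable, key=lambda m: m.quality)
    elif policy == "cost-first":
        chosen = min(capable, key=lambda m: m.usd_per_1k_tokens)
    else:
        raise ValueError(f"policy not covered by this sketch: {policy}")
    # Steps 5-6: report the choice together with an estimated cost.
    cost = est_tokens / 1000 * chosen.usd_per_1k_tokens
    # Hypothetical confidence: simply reuse the quality score.
    return Decision(chosen.name, round(cost, 6), chosen.quality)


models = [
    Model("cloud-a/large", frozenset({"text", "tools", "vision"}), 0.92, 0.010),
    Model("cloud-b/small", frozenset({"text", "tools"}), 0.75, 0.001),
    Model("local/tiny", frozenset({"text"}), 0.60, 0.0),
]
needs = frozenset({"text", "tools"})
print(route(needs, 2000, models, "quality-first"))  # cloud-a/large, cost 0.02
print(route(needs, 2000, models, "cost-first"))     # cloud-b/small, cost 0.002
```

The capability check runs before the policy, so a cheap model that lacks tool calling (here `local/tiny`) is never considered for a tool-using task, even under cost-first.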
📊 Task Types
The routing engine recognizes these task types:
| Type | Description | Best Providers |
|---|---|---|
| coding | Code generation, refactoring, debugging | Anthropic, DeepSeek, OpenAI |
| research | Information gathering, analysis | OpenAI, Gemini, Anthropic |
| document | Writing, editing, formatting | OpenAI, Anthropic, Gemini |
| data | Data analysis, spreadsheet work | OpenAI, Gemini, DeepSeek |
| browser | Web browsing, form filling | OpenAI, Anthropic (vision) |
| review | Code review, feedback | Anthropic, OpenAI |
| security | Security analysis, vulnerability detection | Anthropic, OpenAI |
| general | General conversation, Q&A | Any provider |
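One simple way to picture step 1 is a keyword heuristic that maps a prompt to one of the task types in the table and then looks up the preferred providers. This is an illustrative stand-in, not how Aura Work actually classifies tasks: `classify`, `KEYWORDS`, and the keyword lists are hypothetical, and only the provider preferences are copied from the table.

```python
import re

# Preferred providers per task type, copied from the table above.
BEST_PROVIDERS = {
    "coding": ["anthropic", "deepseek", "openai"],
    "research": ["openai", "gemini", "anthropic"],
    "document": ["openai", "anthropic", "gemini"],
    "data": ["openai", "gemini", "deepseek"],
    "browser": ["openai", "anthropic"],
    "review": ["anthropic", "openai"],
    "security": ["anthropic", "openai"],
    "general": [],  # any provider
}

# Hypothetical keyword hints, checked in order; the first match wins.
KEYWORDS = [
    ("security", ["vulnerability", "cve", "exploit", "injection"]),
    ("review", ["review", "feedback"]),
    ("coding", ["refactor", "debug", "function", "compile", "bug"]),
    ("data", ["spreadsheet", "csv", "pivot", "dataset"]),
    ("browser", ["website", "browse", "form", "click"]),
    ("research", ["research", "compare", "sources"]),
    ("document", ["draft", "edit", "format", "essay"]),
]


def classify(prompt: str) -> str:
    # Compare whole lowercase words, so "format" does not match "form".
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    for task_type, hints in KEYWORDS:
        if words & set(hints):
            return task_type
    return "general"


task = classify("Please refactor this function")
print(task, BEST_PROVIDERS[task])
```

Ordering matters in this sketch: a prompt about SQL injection in a login form mentions both a security term and a browser term, and checking security first keeps it out of the browser bucket.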
Five Routing Policies
Quality-first
Selects the most capable model available across all enabled providers, ranked by quality benchmarks such as HumanEval accuracy and response speed. Optimizes for result quality over cost. Best for complex coding tasks, research, and other critical work.
Cost-first
Routes to the cheapest model that meets the task's minimum capability requirements (text, tool calling, etc.). Ideal for bulk processing, large-scale data processing, routine tasks, and experimentation.
Privacy-first
Prefers local models (Ollama, LM Studio) and requires at least one local provider to be configured. Requests with high-sensitivity data stay on local models; if a cloud model must be used for other requests, sensitive data is automatically redacted before anything is sent. Best for confidential projects.
Local-only
Strict air-gapped mode. Only local models are used, all data stays on your machine, and network requests to cloud providers are blocked entirely. If no local provider is available, the task fails. Meant for fully isolated environments.
Manual
Full manual control. You pick the provider and model yourself, and the routing engine defers to your choice without any automatic model selection. For example, to choose the model for a single request:
aura --provider openai --model gpt-4o
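The five policies differ mainly in which models are eligible and how the eligible ones are ranked. The sketch below illustrates that under stated assumptions: `Candidate`, `select`, and `NoEligibleModel` are hypothetical names, the scores and prices are invented, and `high_sensitivity` stands in for the result of the sensitivity check. It follows the policy descriptions but is not Aura Work's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    # Hypothetical model record; scores and prices are illustrative.
    name: str
    quality: float
    price: float
    local: bool


class NoEligibleModel(Exception):
    """Raised instead of silently falling back to a cloud provider."""


def select(policy: str, candidates: list, high_sensitivity: bool,
           manual_choice: Optional[str] = None) -> Candidate:
    if policy == "manual":
        # Defer to the user's explicit choice; no automatic selection.
        for c in candidates:
            if c.name == manual_choice:
                return c
        raise NoEligibleModel(f"unknown model: {manual_choice}")
    local = [c for c in candidates if c.local]
    if policy == "local-only":
        pool = local
    elif policy == "privacy-first":
        # High-sensitivity work stays local; otherwise local is preferred,
        # and cloud is used only when no local model exists (the redaction
        # step described above is not modeled here).
        pool = local if (high_sensitivity or local) else list(candidates)
    elif policy in ("quality-first", "cost-first"):
        pool = list(candidates)
    else:
        raise ValueError(f"unknown policy: {policy}")
    if not pool:
        raise NoEligibleModel(f"{policy}: no eligible model")
    if policy == "cost-first":
        return min(pool, key=lambda c: c.price)
    return max(pool, key=lambda c: c.quality)


models = [
    Candidate("cloud/large", 0.92, 0.010, local=False),
    Candidate("cloud/small", 0.75, 0.001, local=False),
    Candidate("ollama/llama3", 0.60, 0.0, local=True),
]
for p in ("quality-first", "cost-first", "privacy-first", "local-only"):
    print(p, "->", select(p, models, high_sensitivity=True).name)
print("manual ->", select("manual", models, False, "cloud/small").name)
```

In this sketch quality-first and cost-first ignore sensitivity entirely, which is the gap that privacy-first and local-only are meant to close.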
Cost Optimization
To reduce AI costs without sacrificing quality:
- Use the cost-first policy for routine daily tasks (formatting, simple lookups, Q&A)
- Set a policy per project — production projects use quality-first, while experiments use cost-first
- Monitor token usage in the Dashboard to spot expensive patterns early
- Use Ollama locally for repeated development and testing with no per-token API cost
- Set strict cost limits per task or per day to prevent unexpected bills
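A per-task and per-day cost limit comes down to two checks before a request is sent: the task's estimated cost must fit under the per-task cap, and today's running total plus that estimate must fit under the per-day cap. This is a minimal sketch, assuming a hypothetical `CostGuard` helper and per-million-token prices you supply yourself; the default limits mirror the `costLimit` values in the fallback configuration example later on this page.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CostGuard:
    # Hypothetical helper; limits mirror costLimit.perTask / perDay (USD).
    per_task: float = 0.50
    per_day: float = 10.00
    spent: dict = field(default_factory=dict)  # date -> USD spent that day

    @staticmethod
    def estimate(input_tokens: int, output_tokens: int,
                 usd_per_m_in: float, usd_per_m_out: float) -> float:
        # Prices are commonly quoted in USD per million tokens.
        return (input_tokens * usd_per_m_in
                + output_tokens * usd_per_m_out) / 1_000_000

    def allow(self, cost: float, day: date) -> bool:
        # Reject tasks over the per-task cap or that would exceed today's cap.
        if cost > self.per_task:
            return False
        return self.spent.get(day, 0.0) + cost <= self.per_day

    def record(self, cost: float, day: date) -> None:
        # Add a completed task's cost to that day's running total.
        self.spent[day] = self.spent.get(day, 0.0) + cost


guard = CostGuard()
# Hypothetical prices: $3 per 1M input tokens, $15 per 1M output tokens.
cost = guard.estimate(20_000, 2_000, 3.0, 15.0)  # 0.06 + 0.03 = 0.09 USD
today = date.today()
if guard.allow(cost, today):
    guard.record(cost, today)
print(f"estimated ${cost:.2f}, spent today ${guard.spent.get(today, 0.0):.2f}")
```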
Privacy-Based Routing
When working with proprietary code, sensitive data, or personal information, the privacy-first policy keeps sensitive data from leaving your machine:
🔒 How Privacy-first Works
- The routing engine analyzes the sensitivity of the data in every request (passwords, keys, personal data)
- If high sensitivity is detected, routing is restricted to local providers only
- If no local provider is available, the task fails rather than sending sensitive data to the cloud
- You can always enforce this behavior with the local-only policy regardless of detected sensitivity
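A rough way to picture the check above is pattern matching for secrets and personal data, followed by a routing restriction. The patterns, the function names `sensitivity` and `allowed_providers`, and the rule that anything above "normal" counts as high sensitivity are all assumptions for illustration; this is not Aura Work's actual detector, and real secret scanners use far larger rule sets.

```python
import re

# Hypothetical detection rules; real scanners cover many more formats.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),        # PEM private key
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                       # AWS access key ID shape
    re.compile(r"(?i)\b(?:password|passwd|secret)\s*[:=]\s*\S+"),
]
PERSONAL_PATTERNS = [
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),              # email address
]


def sensitivity(text: str) -> str:
    # Return one of the three levels named in "How Routing Works".
    if any(p.search(text) for p in SECRET_PATTERNS):
        return "secret-risk"
    if any(p.search(text) for p in PERSONAL_PATTERNS):
        return "sensitive"
    return "normal"


def allowed_providers(text: str, local: list, cloud: list) -> list:
    # Assumption: anything above "normal" counts as high sensitivity.
    if sensitivity(text) != "normal":
        if not local:
            # Fail rather than send sensitive data to the cloud.
            raise RuntimeError("sensitive request but no local provider available")
        return list(local)
    # Non-sensitive requests may use any provider, local ones listed first.
    return list(local) + list(cloud)


print(allowed_providers("Summarize this article", ["ollama"], ["openai"]))
print(allowed_providers("db password = hunter2", ["ollama"], ["openai"]))
```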
Setting Up Routing
Configure your routing policy in Settings → Routing:
1. Choose your default policy (applies to all tasks unless overridden).
2. Optionally set per-project policies (overrides for specific projects).
3. Configure fallback providers (used if the primary provider fails).
4. Set cost limits (maximum tokens or cost per task and per day).
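Steps 1 and 2 describe a default policy with optional per-project overrides, so resolving the policy for a task is a lookup that falls back to the default. The settings shape below (a `default` key plus a `projects` map) and the `effective_policy` function are hypothetical; they mirror the structure the steps describe, not Aura Work's actual settings format.

```python
from typing import Optional

POLICIES = {"quality-first", "cost-first", "privacy-first", "local-only", "manual"}


def effective_policy(settings: dict, project: Optional[str]) -> str:
    # Hypothetical settings shape: {"default": str, "projects": {name: str}}.
    # A per-project entry overrides the default; otherwise the default applies.
    policy = settings.get("projects", {}).get(project, settings["default"])
    if policy not in POLICIES:
        raise ValueError(f"unknown routing policy: {policy!r}")
    return policy


settings = {
    "default": "cost-first",
    "projects": {"client-portal": "quality-first", "payroll": "local-only"},
}
print(effective_policy(settings, "client-portal"))  # per-project override
print(effective_policy(settings, "scratchpad"))     # falls back to default
```

Validating the policy name at resolution time means a typo in a project override surfaces as an error instead of silently routing with the wrong policy.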
💡 When to Use Each Policy
| Policy | Best For | Trade-offs |
|---|---|---|
| Quality-first | Important tasks, production code, client work | Higher cost, slower |
| Cost-first | Routine tasks, development, testing | May use weaker models |
| Privacy-first | Sensitive data, proprietary code, personal info | Sensitive tasks limited to local models |
| Local-only | Offline work, air-gapped environments | Requires local hardware |
| Manual | Learning, comparing providers, specific model needs | Slower workflow |
🔄 Fallback Chain
If the primary provider fails, the routing engine tries alternatives:
An example fallback configuration:
```json
{
  "routing": {
    "policy": "quality-first",
    "fallback": [
      "anthropic/claude-3-5-sonnet",
      "openai/gpt-4o",
      "deepseek/deepseek-coder",
      "ollama/llama3"
    ],
    "maxRetries": 3,
    "costLimit": {
      "perTask": 0.50,
      "perDay": 10.00
    }
  }
}
```
The engine tries each fallback in order until one succeeds. If all fail, it notifies you and asks for manual intervention.
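The ordered fallback behavior can be sketched as a loop over the `fallback` list from the configuration. This assumes a hypothetical `call(target, prompt)` provider client and hypothetical names `run_with_fallback` and `AllProvidersFailed`; it also ignores `maxRetries`, since the text above does not say whether it counts attempts per provider or across the whole chain.

```python
import json
from typing import Callable

# The example configuration from above, parsed as plain JSON.
CONFIG = """
{
  "routing": {
    "policy": "quality-first",
    "fallback": ["anthropic/claude-3-5-sonnet", "openai/gpt-4o",
                 "deepseek/deepseek-coder", "ollama/llama3"],
    "maxRetries": 3,
    "costLimit": {"perTask": 0.50, "perDay": 10.00}
  }
}
"""


class AllProvidersFailed(Exception):
    """Carries a {target: error message} map so the user can intervene."""


def run_with_fallback(prompt: str, chain: list,
                      call: Callable[[str, str], str]) -> tuple:
    # Try each "provider/model" entry in order; return the first success.
    errors = {}
    for target in chain:
        try:
            return target, call(target, prompt)
        except Exception as exc:  # e.g. timeout, rate limit, outage
            errors[target] = str(exc)
    # Every entry failed: surface all errors for manual intervention.
    raise AllProvidersFailed(errors)


def fake_call(target: str, prompt: str) -> str:
    # Hypothetical stand-in for a provider client: only local models respond.
    if not target.startswith("ollama/"):
        raise ConnectionError(f"{target} unreachable")
    return f"[{target}] ok"


chain = json.loads(CONFIG)["routing"]["fallback"]
print(run_with_fallback("hello", chain, fake_call))
```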