Routing Engine

Intelligent model routing across five policies: pick the right model for every task automatically.

Policies: Quality-first, Cost-first, Privacy-first, Local-only, Manual model

The routing engine is the brain of Aura Work's provider selection. It evaluates each task based on its type, sensitivity, required capabilities, and your configured policy to select a model. The policy determines how result quality, cost, and privacy are traded off against each other.

How Routing Works

When you send a prompt, the routing engine:

  1. Analyzes the task: determines whether it is coding, research, document work, data analysis, browser automation, etc.
  2. Checks capabilities: does the task need vision, tool calling, long context, or reasoning?
  3. Evaluates sensitivity: is the data normal, sensitive, or secret-risk?
  4. Applies your policy: quality-first, cost-first, privacy-first, local-only, or manual
  5. Selects the model: picks the best provider + model combination
  6. Returns a decision: includes the selected model, estimated cost, and confidence
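
The steps above can be sketched roughly as follows. This is a minimal illustration, not Aura Work's actual implementation: the model catalog, prices, quality scores, and the `route` function are all hypothetical, task analysis (step 1) is assumed to have already produced the required capabilities, and "secret-risk" is assumed to be the level that forces local routing under privacy-first.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str
    local: bool
    capabilities: frozenset
    cost_per_1k_tokens: float  # made-up price, for illustration only
    quality: float             # made-up score in [0, 1]

# Hypothetical catalog; real providers and numbers will differ.
CATALOG = [
    Model("cloud/large", False, frozenset({"text", "tools", "vision"}), 0.015, 0.95),
    Model("cloud/small", False, frozenset({"text", "tools"}), 0.001, 0.70),
    Model("local/small", True, frozenset({"text"}), 0.0, 0.55),
]

@dataclass(frozen=True)
class Decision:
    model: str
    estimated_cost: float
    confidence: float

def route(required, sensitivity, policy, est_tokens=1000, manual_model=None):
    # Step 2: keep only models that offer every required capability.
    candidates = [m for m in CATALOG if required <= m.capabilities]
    # Steps 3 and 4: apply sensitivity and policy restrictions.
    if policy == "local-only" or (policy == "privacy-first" and sensitivity == "secret-risk"):
        candidates = [m for m in candidates if m.local]
    if policy == "manual":
        candidates = [m for m in candidates if m.name == manual_model]
    if not candidates:
        raise LookupError("no model satisfies this task under the current policy")
    # Step 5: pick a model according to the policy's priority.
    if policy == "cost-first":
        chosen = min(candidates, key=lambda m: m.cost_per_1k_tokens)
    elif policy == "privacy-first":
        chosen = max(candidates, key=lambda m: (m.local, m.quality))  # local first
    else:
        chosen = max(candidates, key=lambda m: m.quality)
    # Step 6: return the decision (quality score stands in for confidence here).
    cost = chosen.cost_per_1k_tokens * est_tokens / 1000
    return Decision(chosen.name, cost, chosen.quality)
```

For example, `route({"text", "tools"}, "normal", "cost-first")` would pick the cheapest model in this toy catalog that supports tool calling.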

📊 Task Types

The routing engine recognizes these task types:

| Type | Description | Best Providers |
|---|---|---|
| coding | Code generation, refactoring, debugging | Anthropic, DeepSeek, OpenAI |
| research | Information gathering, analysis | OpenAI, Gemini, Anthropic |
| document | Writing, editing, formatting | OpenAI, Anthropic, Gemini |
| data | Data analysis, spreadsheet work | OpenAI, Gemini, DeepSeek |
| browser | Web browsing, form filling | OpenAI, Anthropic (vision) |
| review | Code review, feedback | Anthropic, OpenAI |
| security | Security analysis, vulnerability detection | Anthropic, OpenAI |
| general | General conversation, Q&A | Any provider |
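
One way to picture how the table could be used is a lookup from task type to a provider list. The sketch below copies the provider names from the table; the `preferred_providers` helper is hypothetical, and treating the order in each row as a ranking is an assumption the table itself does not state.

```python
# Provider lists copied from the task-type table; order is ASSUMED to be a ranking.
BEST_PROVIDERS = {
    "coding": ["Anthropic", "DeepSeek", "OpenAI"],
    "research": ["OpenAI", "Gemini", "Anthropic"],
    "document": ["OpenAI", "Anthropic", "Gemini"],
    "data": ["OpenAI", "Gemini", "DeepSeek"],
    "browser": ["OpenAI", "Anthropic"],
    "review": ["Anthropic", "OpenAI"],
    "security": ["Anthropic", "OpenAI"],
}

def preferred_providers(task_type, enabled):
    """Return the enabled providers for a task type, most preferred first.

    'general' (and any unlisted type) accepts any enabled provider.
    """
    ranked = BEST_PROVIDERS.get(task_type)
    if ranked is None:
        return sorted(enabled)
    return [p for p in ranked if p in enabled]
```

With only OpenAI and DeepSeek enabled, a coding task would resolve to `["DeepSeek", "OpenAI"]` under this assumed ranking.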

Quality-first

Best model for the job. Default.

The routing engine selects the most capable model available across all enabled providers. Optimizes for result quality over cost. Best for complex coding tasks, research, and critical work.

Cost-first

Cheapest model that can do the task.

Routes to the most affordable model that meets the minimum capability requirements (text, tool-calling, etc.). Ideal for bulk processing, routine tasks, and experimentation.

Privacy-first

Prefer local; redact secrets before cloud.

Prioritizes local models (Ollama, LM Studio) for sensitive tasks. When cloud models must be used, sensitive data is automatically redacted. Best for confidential projects.
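
To make the redaction step concrete, here is a small sketch of masking secrets before a prompt leaves the machine. The patterns and the `redact` function are illustrative assumptions, not Aura Work's actual detectors; real redaction would need far broader coverage.

```python
import re

# Illustrative patterns only (assumed, not the product's real rules).
KEY_VALUE_SECRET = re.compile(
    r"\b(password|passwd|api[_-]?key|token|secret)\b(\s*[:=]\s*)(\S+)",
    re.IGNORECASE,
)
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")

def redact(text):
    """Mask secret values and email addresses, keeping the surrounding text."""
    # Keep the key name and separator, replace only the value.
    text = KEY_VALUE_SECRET.sub(lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", text)
    return EMAIL.sub("[REDACTED_EMAIL]", text)

print(redact("password=hunter2 and mail bob@example.com"))
# prints: password=[REDACTED] and mail [REDACTED_EMAIL]
```

Keeping the key name visible lets the model still reason about the structure of the input while the value itself stays local.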

Local-only

Ollama/LM Studio only. No cloud requests.

Strict air-gapped mode. Only local models are used. All data stays on your machine. Network requests to cloud providers are blocked entirely.

Manual model

Use the model you pick per provider.

Full manual control. You select exactly which model to use for each provider. The routing engine defers to your choice without any automatic model selection.

Five Routing Strategies

Quality-first

Routes the request to the best available provider. Uses quality benchmarks (HumanEval accuracy, response speed, etc.). Suited for critical tasks.

Cost-first

Chooses the cheapest provider that meets the minimum quality bar. Suited for simple tasks or large-scale data processing.

Privacy-first

Prefers local providers (Ollama, LM Studio). Highly sensitive data is never sent to the cloud, and secrets are redacted before any other cloud request. Configure at least one local provider so that sensitive tasks do not fail.

Local-only

Uses local providers exclusively. Fails if no local provider is available. Meant for fully isolated environments.

Manual

You pick the provider and model manually for each request: aura --provider openai --model gpt-4o

Cost Optimization

To reduce AI costs without sacrificing quality:

  • Use the cost-first policy for routine daily tasks (formatting, simple lookups, Q&A)
  • Set a policy per project — production projects use quality-first, while experiments use cost-first
  • Monitor token usage from the Dashboard to spot expensive patterns early
  • Use Ollama locally for repeated development and testing at zero cost
  • Set strict cost limits per task or per day to prevent unexpected bills
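
A per-task and per-day limit could be enforced along these lines. This is a hedged sketch: the `CostGuard` class and its methods are hypothetical, and how Aura Work actually tracks spend is not described in this document.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class CostGuard:
    per_task: float  # maximum cost allowed for a single task
    per_day: float   # maximum total cost allowed per calendar day
    spent: dict = field(default_factory=dict)  # date -> cost recorded so far

    def allows(self, estimated_cost, today=None):
        """Return True if a task with this estimated cost fits both limits."""
        today = today or date.today()
        if estimated_cost > self.per_task:
            return False
        return self.spent.get(today, 0.0) + estimated_cost <= self.per_day

    def record(self, actual_cost, today=None):
        """Add the actual cost of a finished task to today's total."""
        today = today or date.today()
        self.spent[today] = self.spent.get(today, 0.0) + actual_cost
```

A router could call `allows` with the estimated cost from its decision before dispatching, and `record` once the real cost is known.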

Privacy-Based Routing

When working with proprietary code, sensitive data, or personal information, the privacy-first policy keeps highly sensitive data on your machine:

🔒 How Privacy-first Works

  • The routing engine analyzes the sensitivity of the data in every request (passwords, keys, personal data)
  • If high sensitivity is detected, routing is restricted to local providers only
  • If no local provider is available, the task fails rather than sending sensitive data to the cloud
  • You can always enforce this behavior with the local-only policy regardless of detected sensitivity
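
The rules above might look like this in code. Everything here is an assumption made for illustration: the detectors are deliberately simplistic, the mapping of findings to the "secret-risk" and "sensitive" levels is guessed, and `eligible_providers` is a hypothetical helper.

```python
import re

# Simplistic, assumed detectors for passwords/keys and personal data.
SECRET = re.compile(
    r"-----BEGIN [A-Z ]*PRIVATE KEY-----|\b(?:password|api[_-]?key)\s*[:=]",
    re.IGNORECASE,
)
PERSONAL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")  # email addresses only

class NoLocalProviderError(RuntimeError):
    """Raised instead of falling back to the cloud with sensitive data."""

def classify(text):
    if SECRET.search(text):
        return "secret-risk"
    if PERSONAL.search(text):
        return "sensitive"
    return "normal"

def eligible_providers(text, providers, policy="privacy-first"):
    """providers maps a provider name to True if it runs locally."""
    local = [name for name, is_local in providers.items() if is_local]
    cloud = [name for name, is_local in providers.items() if not is_local]
    must_stay_local = policy == "local-only" or classify(text) == "secret-risk"
    if not must_stay_local:
        return local + cloud  # prefer local, allow cloud
    if not local:
        raise NoLocalProviderError("sensitive task and no local provider configured")
    return local
```

Failing loudly when no local provider exists mirrors the third bullet: the task is rejected rather than silently routed to the cloud.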

Setting Up Routing

Configure your routing policy in Settings → Routing:

  1. Choose your default policy (applies to all tasks unless overridden)
  2. Optionally set per-project policies (override for specific projects)
  3. Configure fallback providers (what to use if primary fails)
  4. Set cost limits (maximum tokens/cost per task)
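
The relationship between the default policy and per-project overrides can be sketched as a simple lookup. The settings shape used here (a top-level `policy` plus a `projects` map) is a hypothetical layout for illustration; the actual settings are managed through the UI and their stored format is not documented here.

```python
VALID_POLICIES = {"quality-first", "cost-first", "privacy-first", "local-only", "manual"}

def effective_policy(settings, project=None):
    """Return the project's override if one exists, else the default policy."""
    overrides = settings.get("projects", {})
    policy = overrides.get(project, {}).get("policy", settings["policy"])
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown routing policy: {policy!r}")
    return policy

# Hypothetical settings: quality-first by default, cost-first for one project.
settings = {
    "policy": "quality-first",
    "projects": {"scratch": {"policy": "cost-first"}},
}
```

Here `effective_policy(settings, "scratch")` resolves to the override, while any other project falls back to the default.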

💡 When to Use Each Policy

| Policy | Best For | Trade-offs |
|---|---|---|
| Quality-first | Important tasks, production code, client work | Higher cost, slower |
| Cost-first | Routine tasks, development, testing | May use weaker models |
| Privacy-first | Sensitive data, proprietary code, personal info | Sensitive tasks limited to local models |
| Local-only | Offline work, air-gapped environments | Requires local hardware |
| Manual | Learning, comparing providers, specific model needs | Slower workflow |

🔄 Fallback Chain

If the primary provider fails, the routing engine tries alternatives:

// Example fallback configuration
{
  "routing": {
    "policy": "quality-first",
    "fallback": [
      "anthropic/claude-3-5-sonnet",
      "openai/gpt-4o",
      "deepseek/deepseek-coder",
      "ollama/llama3"
    ],
    "maxRetries": 3,
    "costLimit": {
      "perTask": 0.50,
      "perDay": 10.00
    }
  }
}

The engine tries each fallback in order until one succeeds. If all fail, it notifies you and asks for manual intervention.
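
The in-order fallback behavior can be sketched as a loop over the chain. This is an illustration under stated assumptions: `run_with_fallback`, `AllProvidersFailed`, and the `call` parameter are hypothetical names, and `maxRetries` is interpreted here as attempts per model, which the configuration example does not specify.

```python
class AllProvidersFailed(RuntimeError):
    """Raised when every model in the chain has failed; carries the errors."""
    def __init__(self, errors):
        super().__init__(f"all {len(errors)} attempts failed")
        self.errors = errors

def run_with_fallback(prompt, chain, call, max_retries=3):
    """Try each model in order; return (model, result) from the first success.

    `call(model, prompt)` is a caller-supplied function that raises on failure.
    """
    errors = []
    for model in chain:
        for _attempt in range(max_retries):  # ASSUMED: retries are per model
            try:
                return model, call(model, prompt)
            except Exception as exc:
                errors.append((model, exc))
    # Nothing succeeded: surface the failures so the user can intervene.
    raise AllProvidersFailed(errors)
```

Collecting the errors rather than discarding them gives the final notification something concrete to show when manual intervention is needed.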