# Cross-Cluster Routing
When NovaEdge federation is running in mesh or unified mode, the controller merges service endpoints from all federated clusters into a single endpoint list. Agents then use locality-aware load balancing to route traffic to the nearest healthy backend, overflowing to remote clusters only when the share of healthy local endpoints drops below a configurable threshold.
## How Endpoint Merging Works

Each federated controller maintains a `RemoteEndpointCache` that stores `ServiceEndpoints` received from peer clusters. During every config snapshot build cycle, the snapshot builder queries this cache and merges remote endpoints into the local endpoint list for each service.
```mermaid
flowchart LR
    subgraph ClusterA["Cluster US"]
        WA["Watch<br/>EndpointSlices"] --> SBA["Snapshot Builder"]
        SBA --> SA["ConfigSnapshot<br/>(local endpoints)"]
    end
    subgraph ClusterB["Cluster EU"]
        WB["Watch<br/>EndpointSlices"] --> SBB["Snapshot Builder"]
        SBB --> SB["ConfigSnapshot<br/>(local endpoints)"]
    end
    SBA -->|"ServiceEndpoints<br/>via federation sync"| Cache["RemoteEndpoint<br/>Cache"]
    SBB -->|"ServiceEndpoints<br/>via federation sync"| Cache
    Cache --> Merge["Endpoint Merge"]
    SA --> Merge
    Merge --> Final["Final ConfigSnapshot<br/>(local + remote endpoints)"]
    Final -->|"gRPC push"| Agents["Node Agents"]
    style Cache fill:#fff4e6
    style Merge fill:#e8f5e9
    style Final fill:#e1f5ff
```
### Merge Pipeline

- Local endpoints are collected from Kubernetes EndpointSlices by the controller's informers
- Remote endpoints arrive via the federation sync protocol and are stored in the `RemoteEndpointCache`, keyed by cluster name and `namespace/service`
- The snapshot builder calls `GetForService(namespace, serviceName)` to retrieve all remote endpoints for a given service
- Endpoints are combined into a single list, with each endpoint annotated with locality labels (`novaedge.io/cluster`, `novaedge.io/region`, `novaedge.io/zone`)
- The merged snapshot is pushed to local agents via gRPC (a code sketch of the cache and merge steps follows the diagram below)
```mermaid
flowchart TB
    subgraph Input["Inputs"]
        Local["Local EndpointSlices<br/>(informer)"]
        Remote["Remote ServiceEndpoints<br/>(federation cache)"]
    end
    Local --> Label1["Add labels:<br/>novaedge.io/cluster=local<br/>novaedge.io/region=us-east-1<br/>novaedge.io/zone=us-east-1a"]
    Remote --> Label2["Labels already set by<br/>originating cluster"]
    Label1 --> Combine["Combine endpoint lists"]
    Label2 --> Combine
    Combine --> Dedupe["Deduplicate by<br/>address:port"]
    Dedupe --> Snapshot["ConfigSnapshot.Clusters[].Endpoints"]
    style Input fill:#fff4e6
    style Combine fill:#e8f5e9
    style Snapshot fill:#e1f5ff
```
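Conceptually, the cache and merge steps reduce to a few data structures. The following Go sketch is illustrative: only `RemoteEndpointCache` and `GetForService` are named in this guide, and the surrounding types and signatures are assumptions, not NovaEdge's actual source.

```go
package federation

import "sync"

// Endpoint is a routable backend plus the locality labels set by its
// originating cluster (novaedge.io/cluster, novaedge.io/region,
// novaedge.io/zone). Illustrative shape; the real type may differ.
type Endpoint struct {
	Address string            // e.g. "10.8.1.23:8080"
	Labels  map[string]string // locality labels
}

// RemoteEndpointCache stores ServiceEndpoints received from peer clusters,
// keyed by cluster name and namespace/service.
type RemoteEndpointCache struct {
	mu        sync.RWMutex
	byCluster map[string]map[string][]Endpoint // cluster -> "ns/svc" -> endpoints
}

// GetForService returns every remote endpoint for a service, across all peers.
func (c *RemoteEndpointCache) GetForService(namespace, serviceName string) []Endpoint {
	c.mu.RLock()
	defer c.mu.RUnlock()
	key := namespace + "/" + serviceName
	var out []Endpoint
	for _, services := range c.byCluster {
		out = append(out, services[key]...)
	}
	return out
}

// mergeEndpoints combines local and remote endpoints and deduplicates
// by address:port, as in the merge pipeline above.
func mergeEndpoints(local, remote []Endpoint) []Endpoint {
	seen := make(map[string]bool, len(local)+len(remote))
	merged := make([]Endpoint, 0, len(local)+len(remote))
	for _, ep := range append(append([]Endpoint{}, local...), remote...) {
		if !seen[ep.Address] {
			seen[ep.Address] = true
			merged = append(merged, ep)
		}
	}
	return merged
}
```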
## Locality-Aware Routing

Agents use a three-tier locality hierarchy (zone, then region, then any cluster) to prefer nearby endpoints. The locality balancer checks endpoint labels to determine affinity:
| Label | Source | Example |
|---|---|---|
| `novaedge.io/zone` | Set by originating cluster | `us-east-1a` |
| `novaedge.io/region` | Set by originating cluster | `us-east-1` |
| `novaedge.io/cluster` | Set by originating cluster | `cluster-us` |
| `topology.kubernetes.io/zone` | Standard Kubernetes node label | `us-east-1a` |
| `topology.kubernetes.io/region` | Standard Kubernetes node label | `us-east-1` |
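For example, a remote endpoint synced from an EU cluster might look like this in the merged list (address and label values are illustrative):

```json
{
  "address": "10.8.1.23:8080",
  "labels": {
    "novaedge.io/cluster": "cluster-eu",
    "novaedge.io/region": "eu-west-1",
    "novaedge.io/zone": "eu-west-1b"
  }
}
```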
### Overflow Behavior

The agent's `LocalityConfig` controls when traffic overflows from local to remote endpoints:

- **Collect local-zone endpoints** -- endpoints in the same zone as the agent
- **Check healthy ratio** -- if `healthy_local / total_local >= MinHealthyPercent` (default: 70%), traffic stays in the local zone
- **Expand to same-region** -- if the local zone is below threshold, include all endpoints in the same region
- **Expand to all** -- if the entire region is below threshold, use all endpoints across all clusters (this tiered selection is sketched in Go after the diagram below)
```mermaid
flowchart TB
    Start["Request arrives"] --> CheckZone{"Local zone endpoints<br/>healthy >= 70%?"}
    CheckZone -->|"Yes"| UseZone["Route to<br/>same-zone endpoint"]
    CheckZone -->|"No"| CheckRegion{"Same region endpoints<br/>healthy >= 70%?"}
    CheckRegion -->|"Yes"| UseRegion["Route to<br/>same-region endpoint"]
    CheckRegion -->|"No"| UseAll["Route to<br/>any healthy endpoint<br/>(cross-cluster)"]
    style UseZone fill:#e8f5e9
    style UseRegion fill:#e1f5ff
    style UseAll fill:#fff5e6
```
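The tier expansion amounts to a short piece of selection logic. This Go sketch reuses the illustrative `Endpoint` type from the merge example above; the function name and signature are assumptions, not NovaEdge's actual API.

```go
// pickTier narrows the candidate set zone -> region -> all clusters,
// following the overflow steps above.
func pickTier(all []Endpoint, agentZone, agentRegion string,
	minHealthyPercent float64, healthy func(Endpoint) bool) []Endpoint {

	// withLabel returns the endpoints whose label matches the given value.
	withLabel := func(label, value string) []Endpoint {
		var out []Endpoint
		for _, ep := range all {
			if ep.Labels[label] == value {
				out = append(out, ep)
			}
		}
		return out
	}

	// healthyRatio computes healthy_local / total_local for a candidate set.
	healthyRatio := func(eps []Endpoint) float64 {
		if len(eps) == 0 {
			return 0
		}
		n := 0
		for _, ep := range eps {
			if healthy(ep) {
				n++
			}
		}
		return float64(n) / float64(len(eps))
	}

	threshold := minHealthyPercent / 100
	if zone := withLabel("novaedge.io/zone", agentZone); healthyRatio(zone) >= threshold {
		return zone // enough healthy capacity in the local zone
	}
	if region := withLabel("novaedge.io/region", agentRegion); healthyRatio(region) >= threshold {
		return region // overflow to same-region endpoints
	}
	return all // overflow to any healthy endpoint across all clusters
}
```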
The `MinHealthyPercent` threshold is configurable per load balancer:

```yaml
apiVersion: novaedge.io/v1alpha1
kind: ProxyBackend
metadata:
  name: my-service
spec:
  loadBalancer:
    algorithm: LeastConn
    localityAware:
      enabled: true
      minHealthyPercent: 70
```
## Configuring RemoteClusterRouting

The `NovaEdgeRemoteCluster` resource includes a `routing` section that controls how traffic flows to and from the remote cluster:
```yaml
spec:
  routing:
    # Enable or disable routing to this cluster
    enabled: true
    # Failover priority (lower = higher priority)
    # Local cluster typically has priority 50, remote clusters 100+
    priority: 100
    # Traffic weight for weighted load balancing (0-100)
    # Distribute 70% local / 30% remote:
    #   local weight: 70, remote weight: 30
    weight: 50
    # Prefer backends within the requesting cluster before cross-cluster
    localPreference: true
    # Allow cross-cluster traffic when local backends are unavailable
    allowCrossClusterTraffic: true
    # Filter which endpoints from this cluster to include
    endpoints:
      # Only include endpoints from these namespaces
      namespaces:
        - default
        - production
      # Exclude endpoints from these namespaces
      excludeNamespaces:
        - kube-system
        - monitoring
      # Only include endpoints matching these labels
      matchLabels:
        tier: frontend
```
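After applying the resource, you can confirm the routing block with kubectl. The resource plural (`novaedgeremoteclusters`) and the name `cluster-eu` are assumptions following the usual CRD naming convention:

```bash
# Print the spec.routing block of a remote cluster resource
# ("cluster-eu" is an illustrative name)
kubectl get novaedgeremoteclusters cluster-eu -o jsonpath='{.spec.routing}'
```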
### Field Reference

| Field | Default | Description |
|---|---|---|
| `enabled` | `true` | Whether to route traffic to/from this cluster |
| `priority` | `100` | Failover ordering; lower values are preferred (range: 1-1000) |
| `weight` | `100` | Traffic weight for weighted distribution (range: 0-100) |
| `localPreference` | `true` | Prefer local backends before cross-cluster |
| `allowCrossClusterTraffic` | `true` | Allow failover to this cluster when local backends are down |
| `endpoints.namespaces` | (all) | Restrict to endpoints in these namespaces |
| `endpoints.excludeNamespaces` | (none) | Exclude endpoints from these namespaces |
| `endpoints.matchLabels` | (none) | Only include endpoints with matching labels |
## Failover Behavior
When local endpoints become unhealthy, the system follows a predictable failover sequence:
```mermaid
sequenceDiagram
    participant Client
    participant Agent as Local Agent
    participant LB as Locality LB
    participant Local as Local Endpoints
    participant Remote as Remote Endpoints
    Client->>Agent: Request
    Agent->>LB: Pick endpoint
    alt Local zone healthy >= 70%
        LB->>Local: Route to local zone
        Local-->>Agent: Response
    else Local zone degraded
        LB->>LB: Expand to same region
        alt Same region healthy >= 70%
            LB->>Local: Route to same region
            Local-->>Agent: Response
        else Region degraded
            LB->>Remote: Route to remote cluster
            Remote-->>Agent: Response
        end
    end
    Agent-->>Client: Response
```
### Failover Triggers

- **Endpoint health checks fail** -- passive failure detection (error responses) or active probing marks endpoints as unhealthy
- **Healthy ratio drops below threshold** -- when fewer than `MinHealthyPercent` of local endpoints are healthy, the locality balancer expands to the next tier
- **Remote cluster disconnects** -- if federation health checks mark a remote cluster as unhealthy, its endpoints are excluded from the merged list
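The disconnect trigger can also be watched externally. Here is a minimal Prometheus alerting-rule sketch using the `novaedge_federation_peers_healthy` metric from the Key Metrics table below; the alert name and expected peer count are illustrative:

```yaml
groups:
  - name: novaedge-federation
    rules:
      - alert: NovaEdgeFederationPeerDown
        # Fire when fewer peers are healthy than this deployment expects;
        # a peer's endpoints are dropped from the merged list while it is down.
        expr: novaedge_federation_peers_healthy < 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Fewer healthy NovaEdge federation peers than expected"
```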
### Failback

When local endpoints recover:

- Health checks mark endpoints as healthy again
- The healthy ratio rises above `MinHealthyPercent`
- The locality balancer resumes preferring local endpoints
- Traffic gradually shifts back (no sudden switchover)
## Monitoring Cross-Cluster Traffic

### Endpoint Labels
Every endpoint in the merged list carries cluster, region, and zone labels. You can query Prometheus metrics with these labels to understand traffic distribution:
```promql
# Requests routed to remote clusters
sum by (cluster) (
  rate(novaedge_upstream_requests_total{cluster!="local"}[5m])
)

# Cross-cluster latency
histogram_quantile(0.99,
  rate(novaedge_upstream_duration_seconds_bucket{cluster!="local"}[5m])
)
```
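To see what share of traffic is leaving the local cluster, divide the remote rate by the total, using the same metric and label conventions as above:

```promql
# Fraction of requests served by remote clusters
sum(rate(novaedge_upstream_requests_total{cluster!="local"}[5m]))
/
sum(rate(novaedge_upstream_requests_total[5m]))
```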
### Key Metrics

| Metric | Description |
|---|---|
| `novaedge_upstream_requests_total` | Total upstream requests, labeled by cluster/region/zone |
| `novaedge_upstream_duration_seconds` | Upstream request duration histogram |
| `novaedge_federation_peers_healthy` | Number of healthy federation peers |
| `novaedge_federation_sync_duration_seconds` | Time taken for federation sync per peer |
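Slow federation sync delays endpoint propagation, so the sync duration is worth a quantile query as well. This sketch assumes the metric is exported as a histogram with a per-peer label, which the table above does not confirm:

```promql
# p99 federation sync latency per peer (assumes a histogram with a "peer" label)
histogram_quantile(0.99,
  sum by (peer, le) (
    rate(novaedge_federation_sync_duration_seconds_bucket[5m])
  )
)
```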
### Checking Endpoint Distribution
Inspect the merged endpoint list for a service:
```bash
# View the cluster endpoints in the current config snapshot
kubectl -n novaedge-system exec deploy/novaedge-agent -- \
  novaedge-agent --dump-config | \
  jq '.clusters[] | select(.name=="my-service") | .endpoints[] | {address, labels}'
```
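With one local and one remote backend, the output might look like this (addresses and label values are illustrative):

```json
{
  "address": "10.0.4.17:8080",
  "labels": {
    "novaedge.io/cluster": "local",
    "novaedge.io/region": "us-east-1",
    "novaedge.io/zone": "us-east-1a"
  }
}
{
  "address": "10.8.1.23:8080",
  "labels": {
    "novaedge.io/cluster": "cluster-eu",
    "novaedge.io/region": "eu-west-1",
    "novaedge.io/zone": "eu-west-1b"
  }
}
```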
## Related Guides
- Federation -- federation setup and modes
- Federation Tunnels -- network tunnels for environments with NAT or firewalls
- Load Balancing -- load balancing algorithms including locality-aware
- Health Checks -- endpoint health check configuration