Service Mesh¶
NovaEdge includes a sidecar-free service mesh for east-west (pod-to-pod) traffic. It intercepts ClusterIP traffic transparently using iptables TPROXY, authenticates services with SPIFFE-based mTLS, and enforces authorization policies -- all without injecting sidecar containers.
Overview¶
Traditional service meshes inject a sidecar proxy into every pod, adding latency, memory overhead, and operational complexity. NovaEdge takes a different approach: the node agent (DaemonSet) intercepts service traffic at the kernel level using TPROXY and tunnels it over mTLS HTTP/2 connections between nodes.
Key properties:
- No sidecars -- traffic interception happens at the node level via iptables TPROXY
- Opt-in per service -- annotate services with `novaedge.io/mesh: "enabled"` to enroll them
- SPIFFE identities -- each agent gets a workload certificate with a SPIFFE URI SAN
- mTLS everywhere -- node-to-node tunnel traffic is encrypted with TLS 1.3
- Authorization policies -- control which services can communicate using ALLOW/DENY rules
- Automatic certificate rotation -- certificates are renewed at 80% of their 24-hour lifetime
Architecture¶
```mermaid
flowchart TB
    subgraph Node1["Node 1 (NovaEdge Agent)"]
        Pod1["Pod A<br/>(client)"] -->|"ClusterIP:port"| IPT1["iptables TPROXY<br/>NOVAEDGE_MESH chain"]
        IPT1 -->|"redirect"| TL1["Transparent Listener<br/>:15001"]
        TL1 --> PD1["Protocol Detect"]
        PD1 --> ST1["Service Table<br/>Lookup"]
        ST1 --> TP1["Tunnel Pool<br/>(HTTP/2 mTLS client)"]
    end
    TP1 -->|"mTLS HTTP/2<br/>CONNECT :15002"| TS2
    subgraph Node2["Node 2 (NovaEdge Agent)"]
        TS2["Tunnel Server<br/>:15002"] --> AZ2["Authorizer<br/>(ALLOW/DENY)"]
        AZ2 --> Pod2["Pod B<br/>(backend)"]
    end
    subgraph Controller["NovaEdge Controller"]
        CA["Mesh CA<br/>(ECDSA P-384)"]
        SB["Config Snapshot<br/>Builder"]
    end
    CA -.->|"Sign CSR<br/>(gRPC)"| TL1
    SB -.->|"Push services +<br/>authz policies"| Node1
    SB -.->|"Push services +<br/>authz policies"| Node2
    style Pod1 fill:#e1f5ff
    style Pod2 fill:#e1f5ff
    style CA fill:#fff4e6
    style AZ2 fill:#f3e5f5
    style TS2 fill:#e8f5e9
    style TL1 fill:#e8f5e9
```
Components¶
| Component | File | Port | Purpose |
|---|---|---|---|
| TPROXY Manager | `internal/agent/mesh/tproxy.go` | -- | Manages iptables mangle rules in the NOVAEDGE_MESH chain |
| Transparent Listener | `internal/agent/mesh/listener.go` | 15001 | Accepts TPROXY-redirected connections |
| Protocol Detector | `internal/agent/mesh/detect.go` | -- | Peeks at first bytes to identify HTTP/1, HTTP/2, TLS, or opaque TCP |
| Service Table | `internal/agent/mesh/manager.go` | -- | Maps ClusterIP:port to backend endpoints with round-robin LB |
| Tunnel Server | `internal/agent/mesh/tunnel.go` | 15002 | HTTP/2 CONNECT server for incoming mTLS tunnels |
| Tunnel Pool | `internal/agent/mesh/tunnel.go` | -- | Persistent HTTP/2 client pool for outbound tunnels |
| TLS Provider | `internal/agent/mesh/tls.go` | -- | Manages TLS certificates with mutex-protected rotation |
| Certificate Requester | `internal/agent/mesh/cert.go` | -- | Generates CSR, requests cert from controller, auto-renews |
| Authorizer | `internal/agent/mesh/authz.go` | -- | Evaluates ALLOW/DENY policies per service |
| Mesh CA | `internal/controller/meshca/ca.go` | -- | Controller-side CA that signs workload certificates |
Enabling the Service Mesh¶
Annotate services¶
Add the novaedge.io/mesh annotation to any Kubernetes Service you want to enroll:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-backend
  annotations:
    novaedge.io/mesh: "enabled"
spec:
  selector:
    app: my-backend
  ports:
    - port: 8080
      targetPort: 8080
```
When the NovaEdge controller detects this annotation, it includes the service in the InternalService list pushed to agents via ConfigSnapshot. The agent then creates iptables TPROXY rules to intercept traffic to the service's ClusterIP.
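Equivalently, an existing Service can be enrolled from the command line (Service name taken from the example above):

```bash
kubectl annotate service my-backend novaedge.io/mesh=enabled
```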
Disable mesh for a service¶
Remove the annotation or set it to any value other than "enabled":
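```bash
# Remove the annotation from the example Service (the trailing dash deletes it) ...
kubectl annotate service my-backend novaedge.io/mesh-

# ... or overwrite it with a non-"enabled" value
kubectl annotate service my-backend novaedge.io/mesh=disabled --overwrite
```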
The agent will remove the corresponding TPROXY rules on the next config reconciliation.
How TPROXY Interception Works¶
NovaEdge uses iptables TPROXY in the mangle table to intercept traffic destined to mesh-enrolled ClusterIP services without modifying the packets. This preserves the original destination address so the agent can look up the correct backend.
Packet flow¶
```mermaid
sequenceDiagram
    participant App as Pod A (client)
    participant IPT as iptables (mangle)
    participant TL as Transparent Listener (:15001)
    participant ST as Service Table
    participant Backend as Pod B (backend)
    App->>IPT: TCP SYN to 10.43.0.50:8080 (ClusterIP)
    Note over IPT: PREROUTING -> NOVAEDGE_MESH chain<br/>Match: -d 10.43.0.50 --dport 8080<br/>Action: TPROXY --on-port 15001<br/>Set fwmark 0x1
    IPT->>TL: Connection redirected (original dst preserved)
    TL->>TL: Read original destination via getsockname()
    TL->>TL: DetectProtocol (peek first 16 bytes)
    TL->>ST: Lookup("10.43.0.50", 8080)
    ST-->>TL: Endpoint{address: "10.42.3.15", port: 8080}
    TL->>Backend: TCP connect to 10.42.3.15:8080
    Note over TL,Backend: Bidirectional proxy (io.Copy)
```
iptables rules created¶
The TPROXY manager creates the following network configuration:
```bash
# 1. Custom chain in the mangle table
iptables -t mangle -N NOVAEDGE_MESH

# 2. Jump from PREROUTING to the custom chain
iptables -t mangle -A PREROUTING -j NOVAEDGE_MESH

# 3. Per-service TPROXY rules (one per ClusterIP:port)
iptables -t mangle -A NOVAEDGE_MESH \
  -p tcp -d 10.43.0.50 --dport 8080 \
  -j TPROXY --tproxy-mark 0x1/0x1 --on-port 15001

# 4. Policy routing to deliver marked packets locally
ip rule add fwmark 1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100
```
Rules are reconciled on every config update: new services get rules added, removed services get rules deleted. On shutdown, all rules are cleaned up.
Why TPROXY instead of REDIRECT¶
TPROXY preserves the original destination address on the intercepted socket, so the transparent listener can read it directly with getsockname(). With REDIRECT/DNAT, the destination is rewritten to the proxy's address and must be recovered separately (e.g., via the conntrack-backed SO_ORIGINAL_DST socket option). TPROXY also avoids the connection tracking overhead of NAT.
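For illustration, here is a minimal Go sketch of a TPROXY-capable listener (intended to resemble, not reproduce, `listener.go`; it uses golang.org/x/sys/unix and requires CAP_NET_ADMIN). The IP_TRANSPARENT socket option is what lets the socket accept connections addressed to the intercepted ClusterIPs:

```go
package main

import (
	"context"
	"log"
	"net"
	"syscall"

	"golang.org/x/sys/unix"
)

func main() {
	lc := net.ListenConfig{
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			if err := c.Control(func(fd uintptr) {
				// IP_TRANSPARENT allows binding/accepting for non-local
				// addresses -- required for TPROXY-redirected connections.
				sockErr = unix.SetsockoptInt(int(fd), unix.SOL_IP, unix.IP_TRANSPARENT, 1)
			}); err != nil {
				return err
			}
			return sockErr
		},
	}
	ln, err := lc.Listen(context.Background(), "tcp", ":15001")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		// With TPROXY, the socket's local address IS the original
		// destination (e.g., 10.43.0.50:8080) -- no recovery step needed.
		log.Printf("intercepted connection for %s from %s",
			conn.LocalAddr(), conn.RemoteAddr())
		conn.Close()
	}
}
```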
How the mTLS Tunnel Works¶
When a connection needs to reach a pod on a different node, the agent establishes an HTTP/2 CONNECT tunnel through the peer agent's tunnel server. All tunnel traffic is encrypted with mTLS using SPIFFE certificates.
```mermaid
sequenceDiagram
    participant Client as Source Agent (Node 1)
    participant Server as Dest Agent (Node 2)
    participant Backend as Backend Pod
    Note over Client,Server: TLS 1.3 handshake<br/>ALPN: h2<br/>Both sides present SPIFFE certs
    Client->>Server: HTTP/2 CONNECT 10.42.3.15:8080<br/>X-NovaEdge-Source-ID: spiffe://cluster.local/agent/node-1<br/>X-NovaEdge-Dest-Service: my-backend.default
    Server->>Server: Verify client cert (mTLS)
    Server->>Server: Extract SPIFFE ID from peer cert
    Server->>Server: Authorize(source, dest, method, path)
    alt Authorized
        Server->>Backend: TCP connect to 10.42.3.15:8080
        Server-->>Client: 200 OK
        Note over Client,Backend: Bidirectional data over HTTP/2 stream
    else Denied
        Server-->>Client: 403 Forbidden
    end
```
Tunnel configuration¶
| Parameter | Value | Description |
|---|---|---|
| Port | 15002 | Tunnel server listen port |
| TLS version | TLS 1.3 minimum | Enforced via `MinVersion: tls.VersionTLS13` |
| Client auth | `RequireAndVerifyClientCert` | Both sides must present valid certificates |
| ALPN | `h2` | HTTP/2 protocol negotiation |
| Connect timeout | 5 seconds | Timeout for dialing backend pods |
Connection pooling¶
The TunnelPool maintains persistent HTTP/2 connections to peer agents, keyed by node address. Multiple tunnel streams are multiplexed over a single TLS connection, reducing handshake overhead for subsequent requests to the same node.
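As a sketch of how one such stream might be opened (function and variable names are assumed; the actual pool logic lives in `tunnel.go`), using golang.org/x/net/http2:

```go
package mesh

import (
	"fmt"
	"io"
	"net/http"
	"net/url"

	"golang.org/x/net/http2"
)

// openTunnel issues an HTTP/2 CONNECT for backendAddr through the peer
// agent's tunnel server at peerAddr. Reads come from resp.Body and writes
// go into the request-body pipe; http2 multiplexes many such streams over
// the single pooled mTLS connection.
func openTunnel(t *http2.Transport, peerAddr, backendAddr, sourceID string) (io.ReadCloser, io.WriteCloser, error) {
	pr, pw := io.Pipe() // client-to-server direction of the tunnel
	req := &http.Request{
		Method: http.MethodConnect,
		URL:    &url.URL{Scheme: "https", Host: peerAddr}, // dial target
		Host:   backendAddr,                               // :authority carries the backend
		Header: make(http.Header),
		Body:   pr,
	}
	req.Header.Set("X-NovaEdge-Source-ID", sourceID)
	resp, err := t.RoundTrip(req)
	if err != nil {
		pw.Close()
		return nil, nil, err
	}
	if resp.StatusCode != http.StatusOK {
		resp.Body.Close()
		pw.Close()
		return nil, nil, fmt.Errorf("CONNECT returned status %d", resp.StatusCode)
	}
	return resp.Body, pw, nil
}
```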
Certificate Lifecycle¶
NovaEdge uses SPIFFE-compatible workload certificates for mesh identity. The certificate lifecycle is fully automatic.
```mermaid
sequenceDiagram
    participant Agent as Node Agent
    participant CR as Cert Requester
    participant Controller as Controller (Mesh CA)
    Note over Agent: Agent starts mesh manager
    CR->>CR: Generate ECDSA P-256 key pair
    CR->>CR: Build CSR with SPIFFE URI SAN<br/>spiffe://cluster.local/agent/<node-name>
    CR->>Controller: gRPC: RequestMeshCertificate(CSR, nodeName)
    Controller->>Controller: Verify CSR signature
    Controller->>Controller: Sign with CA key (ECDSA P-384)<br/>Validity: 24 hours<br/>ExtKeyUsage: ClientAuth + ServerAuth
    Controller-->>CR: Certificate + CA bundle + SPIFFE ID + Expiry
    CR->>Agent: UpdateTLSCertificate(cert, key, ca, spiffeID)
    Agent->>Agent: TLSProvider updates cert under write lock
    Note over Agent: Tunnel server and pool use<br/>dynamic TLS callbacks (read lock)
    Note over CR: Wait for 80% of lifetime (19.2h)
    CR->>CR: Generate new key pair + CSR
    CR->>Controller: gRPC: RequestMeshCertificate(CSR, nodeName)
    Controller-->>CR: New certificate
    CR->>Agent: UpdateTLSCertificate(...)
    Note over Agent: Zero-downtime rotation<br/>(mutex-protected swap)
```
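The CSR-building step can be sketched in Go as follows (a minimal sketch assuming the default `cluster.local` trust domain; the helper name is illustrative):

```go
package mesh

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"encoding/pem"
	"net/url"
)

// buildCSR generates a fresh ECDSA P-256 key and a PEM-encoded CSR
// carrying the agent's SPIFFE identity as a URI SAN.
func buildCSR(nodeName string) ([]byte, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	spiffeID, err := url.Parse("spiffe://cluster.local/agent/" + nodeName)
	if err != nil {
		return nil, nil, err
	}
	der, err := x509.CreateCertificateRequest(rand.Reader, &x509.CertificateRequest{
		URIs:               []*url.URL{spiffeID}, // SPIFFE URI SAN
		SignatureAlgorithm: x509.ECDSAWithSHA256,
	}, key)
	if err != nil {
		return nil, nil, err
	}
	csrPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE REQUEST", Bytes: der})
	return csrPEM, key, nil
}
```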
Certificate properties¶
| Property | Value |
|---|---|
| Key algorithm | ECDSA P-256 (workload), ECDSA P-384 (CA) |
| SPIFFE URI SAN | spiffe://<trust-domain>/agent/<node-name> |
| Default trust domain | cluster.local |
| Workload cert validity | 24 hours |
| Root CA validity | ~10 years |
| Renewal threshold | 80% of lifetime (19.2 hours for 24h certs) |
| Minimum renewal interval | 30 seconds (prevents tight loops) |
| CSR request timeout | 30 seconds |
| Retry delay on failure | 5 seconds |
Mesh CA¶
The controller runs an embedded Mesh CA (`internal/controller/meshca/`) that signs workload certificates:
- Root CA key: ECDSA P-384, stored in the Kubernetes Secret `novaedge-mesh-ca` in namespace `novaedge-system`
- On first startup, the CA generates a new root key and persists it to the Secret
- On subsequent startups, it loads the existing key from the Secret
- Issued certificates include SPIFFE URI SANs and both `ClientAuth` and `ServerAuth` extended key usage
TLS rotation¶
The TLSProvider uses dynamic TLS callbacks (GetCertificate, GetClientCertificate, GetConfigForClient) so that certificate rotation is transparent to active connections. New connections automatically use the latest certificate without restarting the tunnel server or pool.
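A minimal sketch of such a provider (type and method names assumed; CA-bundle verification omitted for brevity):

```go
package mesh

import (
	"crypto/tls"
	"errors"
	"sync"
)

var errNoCert = errors.New("no mesh TLS certificate loaded")

type TLSProvider struct {
	mu   sync.RWMutex
	cert *tls.Certificate
}

// Update swaps in a freshly issued certificate under the write lock.
func (p *TLSProvider) Update(cert *tls.Certificate) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.cert = cert
}

// current reads the latest certificate under the read lock.
func (p *TLSProvider) current() (*tls.Certificate, error) {
	p.mu.RLock()
	defer p.mu.RUnlock()
	if p.cert == nil {
		return nil, errNoCert
	}
	return p.cert, nil
}

// ServerConfig mirrors the tunnel-server settings documented above.
// Because the certificate is resolved per handshake via GetCertificate,
// rotation never requires restarting the listener; the client side uses
// GetClientCertificate the same way.
func (p *TLSProvider) ServerConfig() *tls.Config {
	return &tls.Config{
		MinVersion: tls.VersionTLS13,
		ClientAuth: tls.RequireAndVerifyClientCert,
		NextProtos: []string{"h2"},
		GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
			return p.current()
		},
	}
}
```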
Authorization Policies¶
The mesh authorizer enforces service-level access control. Policies are pushed by the controller as part of the ConfigSnapshot.
Policy evaluation order¶
- DENY policies are evaluated first. If any DENY rule matches, the request is denied immediately.
- ALLOW policies are evaluated next. If any ALLOW rule matches, the request is allowed.
- If ALLOW policies exist but none match, the request is denied (default-deny when explicit ALLOW rules are present).
- If only DENY policies exist and none match, the request is allowed.
- If no policies exist for the destination service, the request is allowed (default-allow).
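A minimal Go sketch of the evaluation order above (types simplified and assumed; the actual implementation lives in `internal/agent/mesh/authz.go`):

```go
package mesh

// Request carries the simplified attributes the authorizer matches on.
type Request struct {
	SourceNamespace, SourceServiceAccount, SourceSPIFFEID string
	Method, Path                                          string
}

// Policy pairs an action with a match predicate (details elided here).
type Policy struct {
	Action  string // "ALLOW" or "DENY"
	Matches func(Request) bool
}

// authorize returns true if the request may proceed.
func authorize(policies []Policy, req Request) bool {
	// 1. Any matching DENY rule rejects immediately.
	for _, p := range policies {
		if p.Action == "DENY" && p.Matches(req) {
			return false
		}
	}
	// 2. Any matching ALLOW rule accepts.
	hasAllow := false
	for _, p := range policies {
		if p.Action == "ALLOW" {
			hasAllow = true
			if p.Matches(req) {
				return true
			}
		}
	}
	// 3. Explicit ALLOW rules present but none matched: default-deny.
	// 4./5. DENY-only or no policies at all: default-allow.
	return !hasAllow
}
```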
Policy structure¶
Policies are defined per target service and include source (from) and destination (to) constraints:
```yaml
MeshAuthorizationPolicy:
  name: string
  action: "ALLOW" | "DENY"
  target_service: string    # e.g., "my-backend"
  target_namespace: string  # e.g., "default"
  rules:
    - from:                 # Source constraints (empty = match all)
        - namespaces: [...]
          serviceAccounts: [...]
          spiffeIds: [...]  # Glob patterns
      to:                   # Destination constraints (empty = match all)
        - methods: [...]    # HTTP methods (case-insensitive)
          paths: [...]      # Glob patterns
```
Source matching¶
| Field | Match type | Example |
|---|---|---|
| `namespaces` | Exact | `["production", "staging"]` |
| `serviceAccounts` | Exact | `["frontend-sa"]` |
| `spiffeIds` | Glob | `["spiffe://cluster.local/ns/*/sa/frontend-*"]` |
Destination matching¶
| Field | Match type | Example |
|---|---|---|
| `methods` | Case-insensitive exact | `["GET", "POST"]` |
| `paths` | Glob | `["/api/*", "/health"]` |
For opaque TCP connections (non-HTTP), destination rules with methods or paths set will not match. Use source-only rules for L4 authorization.
Example: allow only frontend to access backend¶
```yaml
# Pushed via ConfigSnapshot (protobuf MeshAuthorizationPolicy)
action: ALLOW
target_service: my-backend
target_namespace: default
rules:
  - from:
      - namespaces: ["default"]
        serviceAccounts: ["frontend-sa"]
    to:
      - methods: ["GET", "POST"]
        paths: ["/api/*"]
```
Example: deny a specific namespace¶
```yaml
action: DENY
target_service: my-backend
target_namespace: default
rules:
  - from:
      - namespaces: ["untrusted"]
```
Troubleshooting¶
Check if mesh is active on a node¶
```bash
# Verify the NOVAEDGE_MESH chain exists
iptables -t mangle -L NOVAEDGE_MESH -n -v

# Expected output shows per-service TPROXY rules:
# Chain NOVAEDGE_MESH (1 references)
#  pkts bytes target prot opt in out source    destination
#   142  8520 TPROXY tcp  --  *  *   0.0.0.0/0 10.43.0.50  tcp dpt:8080 TPROXY redirect 0.0.0.0:15001 mark 0x1/0x1
```
Check TPROXY routing setup¶
```bash
# Verify the ip rule for fwmark 1
ip rule show | grep "fwmark 0x1"
# Expected: 0: from all fwmark 0x1 lookup 100

# Verify the local route in table 100
ip route show table 100
# Expected: local default dev lo scope host
```
Check the transparent listener¶
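```bash
# The transparent listener should be bound on port 15001
ss -tlnp | grep 15001
```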
Check the tunnel server¶
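```bash
# The tunnel server should be bound on port 15002
ss -tlnp | grep 15002
```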
Check certificate status¶
```bash
# Check agent logs for certificate lifecycle events
kubectl logs -n novaedge-system -l app.kubernetes.io/name=novaedge-agent | grep "mesh.*cert"

# Expected log lines:
# "Mesh certificate obtained, scheduling renewal" expiry=... lifetime=24h0m0s renew_in=19h12m0s
# "Mesh certificate applied" spiffe_id=spiffe://cluster.local/agent/node-1

# Verify the CA secret exists
kubectl get secret novaedge-mesh-ca -n novaedge-system
```
Check mesh service count¶
```bash
# Look for mesh config application in agent logs
kubectl logs -n novaedge-system -l app.kubernetes.io/name=novaedge-agent | grep "Mesh config applied"

# Expected: "Mesh config applied" services=5 intercept_rules=8 routing_entries=8 authz_policies=3
```
Connection not being intercepted¶
If traffic to a mesh-enrolled service is not being intercepted:
- Verify the service has the annotation: `kubectl get svc <name> -o jsonpath='{.metadata.annotations.novaedge\.io/mesh}'`
- Check that the corresponding iptables rule exists: `iptables -t mangle -L NOVAEDGE_MESH -n | grep <clusterIP>`
- Verify the agent received the service in its config: check agent logs for the `intercept_rules` count
- Confirm the transparent listener is accepting connections: `ss -tlnp | grep 15001`
Tunnel connection failures¶
```bash
# Check for tunnel errors in agent logs
kubectl logs -n novaedge-system -l app.kubernetes.io/name=novaedge-agent | grep -i tunnel

# Common issues:
# - "no mesh TLS certificate loaded"  -> cert requester has not obtained a cert yet
# - "CONNECT ... returned status 403" -> authorization policy is denying the connection
# - "Failed to dial backend"          -> backend pod is unreachable from the destination node
```
Authorization denied unexpectedly¶
```bash
# Check authorizer debug logs (set log level to debug)
kubectl logs -n novaedge-system -l app.kubernetes.io/name=novaedge-agent | grep "mesh authorization"

# Expected for denials:
# "mesh authorization denied by DENY policy" policy=... source=... dest=...
# "mesh authorization denied: no ALLOW policy matched" source=... dest=...
```
Protocol Detection¶
The transparent listener peeks at the first 16 bytes of each intercepted connection to detect the application protocol:
| Protocol | Detection method | Handling |
|---|---|---|
| HTTP/1.x | Starts with `GET`, `POST`, `PUT`, etc. | L4 proxy (L7 routing planned) |
| HTTP/2 | Starts with `PRI * HTTP/2` (connection preface) | L4 proxy |
| TLS | Starts with `0x16 0x03` (ClientHello) | L4 proxy |
| Opaque TCP | None of the above | L4 proxy (passthrough) |
All protocols are currently proxied as L4 TCP. HTTP-aware routing (L7 mesh) is planned for a future release.
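A minimal Go sketch of this peek-based classification (constant and function names assumed); the buffered reader must be used for all subsequent reads, since the peeked bytes stay in its buffer:

```go
package mesh

import (
	"bufio"
	"bytes"
	"net"
)

type Protocol int

const (
	ProtoTCP   Protocol = iota // opaque passthrough
	ProtoHTTP1                 // HTTP/1.x
	ProtoHTTP2                 // HTTP/2 (prior knowledge)
	ProtoTLS                   // TLS ClientHello
)

// detectProtocol peeks at the first bytes without consuming them and
// returns the reader that now owns the connection's buffered data.
func detectProtocol(conn net.Conn) (Protocol, *bufio.Reader) {
	br := bufio.NewReader(conn)
	head, _ := br.Peek(16) // may return fewer bytes on short reads
	switch {
	case bytes.HasPrefix(head, []byte("PRI * HTTP/2")):
		return ProtoHTTP2, br // HTTP/2 connection preface
	case len(head) >= 2 && head[0] == 0x16 && head[1] == 0x03:
		return ProtoTLS, br // TLS record header (ClientHello)
	case bytes.HasPrefix(head, []byte("GET ")) ||
		bytes.HasPrefix(head, []byte("POST ")) ||
		bytes.HasPrefix(head, []byte("PUT ")):
		return ProtoHTTP1, br // common HTTP/1.x methods (non-exhaustive)
	default:
		return ProtoTCP, br // opaque TCP
	}
}
```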
Related Pages¶
- TLS -- TLS certificate management for ingress traffic
- Policies -- Rate limiting, authentication, and WAF policies for north-south traffic
- VIP Management -- Virtual IP management for external access