feat(dns-strict-resolver): add unconnected-UDP strict DNS sample#223
feat(dns-strict-resolver): add unconnected-UDP strict DNS sample#223officialasishkumar wants to merge 3 commits intomainfrom
Conversation
Adds a minimal self-contained Go HTTP server that exercises the unconnected-UDP + RFC 5452 strict-source-validation DNS client path — the exact path that exposes the bug fixed in keploy/keploy#4093 and keploy/ebpf#97 (tracking issue keploy/keploy#4092). The server opens a UDP socket, sends a raw DNS A query to the system resolver, and accepts the reply only if its source address matches the nameserver it queried. Any mismatch is counted and discarded. Under the buggy Keploy build the reply arrived from <agent_ip>:<keploy_dns_port> instead of the real nameserver, the source check rejected it, and the endpoint eventually returned HTTP 502 with source_mismatches > 0 — the userspace equivalent of the production "Temporary failure in name resolution" / EAI_AGAIN symptom. After the fix, every client passes with source_mismatches == 0. net.LookupHost is not used here on purpose: on glibc (cgo) it predominantly uses connected UDP, which was already rescued by the existing cgroup/getpeername4 hook and therefore never exposed the bug. A raw UDP client is the smallest deterministic repro of the production failure mode. Public domains (google.com, cloudflare.com, example.com) only; no internal hostnames. Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
There was a problem hiding this comment.
Pull request overview
Adds a new dns-strict-resolver/ Go sample to reproduce and guard against unconnected-UDP DNS reply source-mismatch behavior (RFC 5452 strict source validation) relevant to keploy/keploy#4092 regression testing.
Changes:
- Introduces a stdlib-only Go HTTP server with
/resolveperforming raw UDP DNS A queries and strict reply source validation. - Adds a simple curl traffic generator script for CI/e2e usage.
- Documents purpose, usage, and Keploy record/test instructions for the sample.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| dns-strict-resolver/main.go | Implements raw DNS query construction, strict UDP reply acceptance, and HTTP endpoints. |
| dns-strict-resolver/go.mod | Declares a standalone Go module for the new sample. |
| dns-strict-resolver/curl.sh | Provides traffic generation for exercising /resolve in e2e flows. |
| dns-strict-resolver/README.md | Explains rationale, running instructions, and Keploy usage for the regression sample. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| r.Error = err.Error() | ||
| return r | ||
| } | ||
| defer conn.Close() |
There was a problem hiding this comment.
defer conn.Close() drops the returned error, which errcheck will report in this repo. Please either handle the close error in a deferred func or explicitly suppress it with //nolint:errcheck (as done elsewhere in the repo).
| off := 12 | ||
| for q := uint16(0); q < qd && off < len(reply); q++ { | ||
| off = skipName(reply, off) | ||
| off += 4 | ||
| } |
There was a problem hiding this comment.
parseReply can return nil error for truncated/malformed replies: in the question section, it does off += 4 without checking off+4 <= len(reply), so malformed packets can silently produce empty results. Consider validating bounds and returning an error when the message is structurally invalid.
| line = strings.TrimSpace(line) | ||
| if strings.HasPrefix(line, "nameserver ") { | ||
| return net.JoinHostPort(strings.TrimPrefix(line, "nameserver "), "53") |
There was a problem hiding this comment.
defaultNameserver only matches lines starting with the literal prefix "nameserver ", so it will miss valid resolv.conf entries that use tabs/multiple spaces, and it may also break if the line has an inline comment (e.g. nameserver 1.1.1.1 # ...). Consider parsing with strings.Fields and stripping # comments before extracting the IP.
| line = strings.TrimSpace(line) | |
| if strings.HasPrefix(line, "nameserver ") { | |
| return net.JoinHostPort(strings.TrimPrefix(line, "nameserver "), "53") | |
| line, _, _ = strings.Cut(line, "#") | |
| fields := strings.Fields(line) | |
| if len(fields) >= 2 && fields[0] == "nameserver" { | |
| return net.JoinHostPort(fields[1], "53") |
| ```bash | ||
| sudo -E env PATH=$PATH keploy record -c "./dns-strict-resolver" | ||
| # hit /resolve endpoints, then stop keploy | ||
|
|
||
| sudo -E env PATH=$PATH keploy test -c "./dns-strict-resolver" --delay 10 | ||
| ``` |
There was a problem hiding this comment.
The “Under Keploy” section runs keploy record -c "./dns-strict-resolver" / keploy test -c "./dns-strict-resolver" but doesn’t mention building the binary first. As written, these commands will fail unless the user has already run go build . (producing ./dns-strict-resolver). Consider adding an explicit build step (or use -c "go run .").
| func buildQuery(domain string, txid uint16) ([]byte, error) { | ||
| var b bytes.Buffer | ||
| binary.Write(&b, binary.BigEndian, txid) | ||
| binary.Write(&b, binary.BigEndian, uint16(0x0100)) // RD=1 | ||
| binary.Write(&b, binary.BigEndian, uint16(1)) // QDCOUNT | ||
| binary.Write(&b, binary.BigEndian, uint16(0)) // ANCOUNT | ||
| binary.Write(&b, binary.BigEndian, uint16(0)) // NSCOUNT | ||
| binary.Write(&b, binary.BigEndian, uint16(0)) // ARCOUNT |
There was a problem hiding this comment.
buildQuery ignores the returned errors from binary.Write. With errcheck enabled in this repo, this will be reported (and it also makes error propagation inconsistent if the writer ever changes). Please handle the errors (and return them) or build the header using byte-slice appends that don’t involve error-returning writes.
| b.WriteByte(byte(len(label))) | ||
| b.WriteString(label) | ||
| } | ||
| b.WriteByte(0) | ||
| binary.Write(&b, binary.BigEndian, uint16(1)) // QTYPE A | ||
| binary.Write(&b, binary.BigEndian, uint16(1)) // QCLASS IN |
There was a problem hiding this comment.
buildQuery also ignores errors from WriteByte / WriteString (and later binary.Write calls). With errcheck enabled, this will be flagged. Please handle these errors (or restructure to avoid error-returning writes).
|
|
||
| http.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) { | ||
| w.WriteHeader(http.StatusOK) | ||
| fmt.Fprint(w, "ok") |
There was a problem hiding this comment.
The /health handler ignores the error return from fmt.Fprint. With errcheck enabled, this will be flagged. Please check the returned error (or explicitly suppress with a justified //nolint:errcheck).
| fmt.Fprint(w, "ok") | |
| if _, err := fmt.Fprint(w, "ok"); err != nil { | |
| fmt.Fprintf(os.Stderr, "failed to write /health response: %v; check whether the client disconnected before retrying the request\n", err) | |
| } |
| return r | ||
| } | ||
| for time.Now().Before(deadline) { | ||
| _ = conn.SetReadDeadline(time.Now().Add(800 * time.Millisecond)) |
There was a problem hiding this comment.
The error from SetReadDeadline is intentionally ignored (_ = ...), but with errcheck enabled this will be flagged, and if setting the deadline fails the read loop behavior becomes unpredictable. Please handle the error (e.g., return it via res.Error) or explicitly justify/suppress it.
| _ = conn.SetReadDeadline(time.Now().Add(800 * time.Millisecond)) | |
| if err := conn.SetReadDeadline(time.Now().Add(800 * time.Millisecond)); err != nil { | |
| r.Error = err.Error() | |
| r.ElapsedMS = time.Since(start).Milliseconds() | |
| return r | |
| } |
| ns, err := net.ResolveUDPAddr("udp", nsAddr) | ||
| if err != nil { | ||
| r.Error = err.Error() | ||
| return r | ||
| } |
There was a problem hiding this comment.
resolveStrict returns early on several error paths without setting ElapsedMS, so the JSON response can show elapsed_ms: 0 even though work was done (and it differs from later error paths where ElapsedMS is set). For consistency and easier CI diagnostics, consider setting ElapsedMS on all returns (e.g., via a defer that updates it).
| r.Error = err.Error() | ||
| return r | ||
| } | ||
| conn, err := net.ListenUDP("udp", &net.UDPAddr{IP: net.IPv4zero, Port: 0}) |
There was a problem hiding this comment.
net.ListenUDP is currently bound to 0.0.0.0 (IPv4 only). If /etc/resolv.conf (or the nameserver query param) provides an IPv6 nameserver, the lookup will fail due to address-family mismatch. Consider selecting udp4 vs udp6 and binding to the matching unspecified address based on ns.IP.To4() (or use a dual-stack approach).
| conn, err := net.ListenUDP("udp", &net.UDPAddr{IP: net.IPv4zero, Port: 0}) | |
| listenNetwork := "udp4" | |
| listenAddr := &net.UDPAddr{IP: net.IPv4zero, Port: 0} | |
| if ns.IP.To4() == nil { | |
| listenNetwork = "udp6" | |
| listenAddr = &net.UDPAddr{IP: net.IPv6unspecified, Port: 0} | |
| } | |
| conn, err := net.ListenUDP(listenNetwork, listenAddr) |
The companion CI job moved from non-docker loopback mode to docker mode (keploy PR #4093). Bare-Linux loopback on ubuntu-latest turned out to not invoke cgroup/recvmsg4 for the sample's unconnected-UDP reads, which made the fix look like a no-op even when the build binary had it — docker mode (sample container → CoreDNS container over a bridge network) reliably exercises recvmsg4 and mirrors the original Flipkart production topology. Adds: - Dockerfile — stdlib-only Go build on golang:1.22-alpine, runtime on alpine:3.20, exposes :8086. - coredns/Corefile + coredns/zone — minimal CoreDNS config with hardcoded A records for google.com, cloudflare.com, example.com so the test does not depend on external resolvers. README stays accurate — strict-source validation is still exercised end-to-end; only the CI topology changed. Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
Adds an optional NAMESERVER env var (ip:port) that, when set, appends `&nameserver=<value>` to each /resolve call. Lets the CI harness aim queries at a known CoreDNS fixture container even when keploy rewrites the sample's `--net` into `--network=container:<keploy-v3>`, in which case the sample's own /etc/resolv.conf (e.g. 127.0.0.11) doesn't point anywhere reachable from inside the shared netns. Unchanged by default — runs without NAMESERVER still use whatever /etc/resolv.conf resolves to, which is what you want for local standalone testing outside keploy. Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
What this adds
A new
dns-strict-resolver/sample — a minimal self-contained Go HTTP server (Go stdlib only, no third-party deps) that exercises the unconnected-UDP + RFC 5452 strict-source-validation DNS client path.The server:
/etc/resolv.conf) or an explicit?nameserver=ip:port.dnspython, rawrecvfrom-based clients, and glibc'sres_sendunconnected path perform).If the strict check rejects every reply,
/resolvereturns HTTP 502 withsource_mismatches > 0anderror: "no accepted reply ..."— the userspace equivalent of the productionjava.net.UnknownHostException: Temporary failure in name resolution/EAI_AGAINsymptom we chase in keploy/keploy#4092.Why the sample exists
Keploy has existing DNS samples (
dns-dedup) but they usenet.LookupHost, which on glibc (cgo) predominantly uses connected UDP. Connected-UDP clients are rescued by Keploy's existingcgroup/getpeername4hook and therefore never exposed the bug this sample regresses against. A raw UDP client is the smallest deterministic reproducer of the production failure.Regression target
The companion CI job in
keploy/keploy—dns_strict_resolver_testin.github/workflows/golang_linux.yml— builds and runs this sample underkeploy recordandkeploy testand fails if any/resolvecall reportssource_mismatches > 0or returns no IPs.Files
dns-strict-resolver/main.go— ~230 LOC, stdlib-only.dns-strict-resolver/curl.sh— hits/resolveforgoogle.com,cloudflare.com,example.com.dns-strict-resolver/go.mod— no dependencies.dns-strict-resolver/README.md— why and how, with a working example.All public domains — no internal hostnames.
How to try it
Expected (post-fix or no Keploy):
{"domain":"google.com","nameserver":"127.0.0.11:53","rcode":0,"ips":["142.250.x.x"],"source_mismatches":0,"attempts":1,"elapsed_ms":6}Under a buggy (pre-fix) Keploy recording:
{"domain":"google.com","nameserver":"127.0.0.11:53","rcode":0,"ips":null,"source_mismatches":3,"attempts":3,"elapsed_ms":3000,"error":"no accepted reply from 127.0.0.11:53 after 3 attempts"}Signed-off-by: Asish Kumar officialasishkumar@gmail.com