[NDMII-3459] Add VPN tunnels metadata collection for SNMP #37339

Pierre-L42 · 2025-05-23T16:18:54Z

What does this PR do?

This PR adds VPN tunnel and route table metadata collection to the SNMP check. It adds a new VPNTunnelsMetadata to the NetworkDevicesMetadata payload sent to the backend.

Motivation

https://datadoghq.atlassian.net/wiki/spaces/II/pages/5095162627/On-Prem+VPN+Data+Collection+for+Site-to-Site+VPN+Resolution

Describe how you validated your changes

Unit tests
Installed the Agent on an instance in AWS "Datadog Sandbox" that sets up a site-to-site VPN tunnel, and verified that vpn_tunnels appears in the output of the datadog-agent check snmp command.

Possible Drawbacks / Trade-offs

Additional Notes

agent-platform-auto-pr · 2025-05-23T16:44:48Z

Uncompressed package size comparison

Comparison with ancestor ef24500285bcdf4fe2ba51516b1ed68353372306

Diff per package

package	diff	status	size	ancestor	threshold
datadog-agent-x86_64-rpm	0.03MB	⚠️	783.25MB	783.22MB	0.50MB
datadog-agent-x86_64-suse	0.03MB	⚠️	783.25MB	783.22MB	0.50MB
datadog-agent-amd64-deb	0.03MB	⚠️	774.30MB	774.27MB	0.50MB
datadog-heroku-agent-amd64-deb	0.03MB	⚠️	380.68MB	380.65MB	0.50MB
datadog-iot-agent-amd64-deb	0.03MB	⚠️	62.83MB	62.80MB	0.50MB
datadog-iot-agent-x86_64-rpm	0.03MB	⚠️	62.91MB	62.88MB	0.50MB
datadog-iot-agent-x86_64-suse	0.03MB	⚠️	62.91MB	62.88MB	0.50MB
datadog-agent-aarch64-rpm	0.03MB	⚠️	769.33MB	769.30MB	0.50MB
datadog-agent-arm64-deb	0.03MB	⚠️	760.40MB	760.37MB	0.50MB
datadog-iot-agent-arm64-deb	0.02MB	⚠️	59.39MB	59.37MB	0.50MB
datadog-iot-agent-aarch64-rpm	0.02MB	⚠️	59.48MB	59.45MB	0.50MB
datadog-dogstatsd-amd64-deb	0.00MB	✅	31.99MB	31.99MB	0.50MB
datadog-dogstatsd-x86_64-rpm	0.00MB	✅	32.07MB	32.07MB	0.50MB
datadog-dogstatsd-x86_64-suse	0.00MB	✅	32.07MB	32.07MB	0.50MB
datadog-dogstatsd-arm64-deb	0.00MB	✅	30.47MB	30.47MB	0.50MB

Decision

⚠️ Warning

cit-pr-commenter · 2025-05-23T17:04:48Z

Regression Detector

Regression Detector Results

Metrics dashboard
Target profiles
Run ID: 455b7ca7-6910-426c-8204-11b7df9a0e1c

Baseline: a95af0e
Comparison: 14b117d
Diff

Optimization Goals: ✅ Improvement(s) detected

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
✅	file_tree	memory utilization	-6.62	[-6.81, -6.44]	1	Logs
✅	docker_containers_cpu	% cpu utilization	-7.12	[-10.08, -4.16]	1	Logs

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	ddot_logs	memory utilization	+0.64	[+0.52, +0.76]	1	Logs
➖	ddot_metrics	memory utilization	+0.47	[+0.35, +0.59]	1	Logs
➖	uds_dogstatsd_to_api_cpu	% cpu utilization	+0.40	[-0.45, +1.24]	1	Logs
➖	tcp_syslog_to_blackhole	ingress throughput	+0.40	[+0.34, +0.45]	1	Logs
➖	quality_gate_idle_all_features	memory utilization	+0.36	[+0.28, +0.44]	1	Logs bounds checks dashboard
➖	quality_gate_logs	% cpu utilization	+0.26	[-2.47, +2.98]	1	Logs bounds checks dashboard
➖	otlp_ingest_metrics	memory utilization	+0.20	[+0.03, +0.36]	1	Logs
➖	otlp_ingest_logs	memory utilization	+0.09	[-0.04, +0.22]	1	Logs
➖	tcp_dd_logs_filter_exclude	ingress throughput	+0.01	[-0.02, +0.03]	1	Logs
➖	file_to_blackhole_0ms_latency_http1	egress throughput	-0.00	[-0.61, +0.61]	1	Logs
➖	uds_dogstatsd_to_api	ingress throughput	-0.01	[-0.27, +0.25]	1	Logs
➖	file_to_blackhole_1000ms_latency	egress throughput	-0.02	[-0.63, +0.59]	1	Logs
➖	file_to_blackhole_0ms_latency	egress throughput	-0.02	[-0.61, +0.57]	1	Logs
➖	file_to_blackhole_500ms_latency	egress throughput	-0.03	[-0.58, +0.53]	1	Logs
➖	file_to_blackhole_300ms_latency	egress throughput	-0.04	[-0.60, +0.53]	1	Logs
➖	file_to_blackhole_100ms_latency	egress throughput	-0.04	[-0.58, +0.49]	1	Logs
➖	file_to_blackhole_1000ms_latency_linear_load	egress throughput	-0.05	[-0.27, +0.18]	1	Logs
➖	file_to_blackhole_0ms_latency_http2	egress throughput	-0.06	[-0.66, +0.54]	1	Logs
➖	quality_gate_idle	memory utilization	-0.08	[-0.14, -0.01]	1	Logs bounds checks dashboard
➖	uds_dogstatsd_20mb_12k_contexts_20_senders	memory utilization	-0.29	[-0.35, -0.23]	1	Logs
➖	docker_containers_memory	memory utilization	-2.11	[-2.17, -2.05]	1	Logs
✅	file_tree	memory utilization	-6.62	[-6.81, -6.44]	1	Logs
✅	docker_containers_cpu	% cpu utilization	-7.12	[-10.08, -4.16]	1	Logs

Bounds Checks: ✅ Passed

perf	experiment	bounds_check_name	replicates_passed	links
✅	docker_containers_cpu	simple_check_run	10/10
✅	docker_containers_memory	memory_usage	10/10
✅	docker_containers_memory	simple_check_run	10/10
✅	file_to_blackhole_0ms_latency	lost_bytes	10/10
✅	file_to_blackhole_0ms_latency	memory_usage	10/10
✅	file_to_blackhole_0ms_latency_http1	lost_bytes	10/10
✅	file_to_blackhole_0ms_latency_http1	memory_usage	10/10
✅	file_to_blackhole_0ms_latency_http2	lost_bytes	10/10
✅	file_to_blackhole_0ms_latency_http2	memory_usage	10/10
✅	file_to_blackhole_1000ms_latency	memory_usage	10/10
✅	file_to_blackhole_1000ms_latency_linear_load	memory_usage	10/10
✅	file_to_blackhole_100ms_latency	lost_bytes	10/10
✅	file_to_blackhole_100ms_latency	memory_usage	10/10
✅	file_to_blackhole_300ms_latency	lost_bytes	10/10
✅	file_to_blackhole_300ms_latency	memory_usage	10/10
✅	file_to_blackhole_500ms_latency	lost_bytes	10/10
✅	file_to_blackhole_500ms_latency	memory_usage	10/10
✅	quality_gate_idle	intake_connections	10/10	bounds checks dashboard
✅	quality_gate_idle	memory_usage	10/10	bounds checks dashboard
✅	quality_gate_idle_all_features	intake_connections	10/10	bounds checks dashboard
✅	quality_gate_idle_all_features	memory_usage	10/10	bounds checks dashboard
✅	quality_gate_logs	intake_connections	10/10	bounds checks dashboard
✅	quality_gate_logs	lost_bytes	10/10	bounds checks dashboard
✅	quality_gate_logs	memory_usage	10/10	bounds checks dashboard

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we flag a change in performance as a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".
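
The three criteria above can be sketched as a small predicate. This is only an illustration of the rule as described in this comment, not the Regression Detector's actual code; the type experimentResult, its fields, and the function isSignificantChange are hypothetical names. The sample values come from the tables above.

```go
package main

import (
	"fmt"
	"math"
)

// effectSizeTolerancePct is the 5.00% threshold quoted in the report.
const effectSizeTolerancePct = 5.0

// experimentResult is a hypothetical container for one row of the table.
type experimentResult struct {
	DeltaMeanPct float64 // estimated Δ mean %
	CILow        float64 // lower bound of the 90% confidence interval
	CIHigh       float64 // upper bound of the 90% confidence interval
	Erratic      bool    // true when the experiment is configured as "erratic"
}

// isSignificantChange applies the three criteria: the effect is large
// enough, the confidence interval excludes zero, and the experiment is
// not marked erratic.
func isSignificantChange(r experimentResult) bool {
	bigEnough := math.Abs(r.DeltaMeanPct) >= effectSizeTolerancePct
	excludesZero := r.CILow > 0 || r.CIHigh < 0
	return bigEnough && excludesZero && !r.Erratic
}

func main() {
	fileTree := experimentResult{DeltaMeanPct: -6.62, CILow: -6.81, CIHigh: -6.44}
	dockerMem := experimentResult{DeltaMeanPct: -2.11, CILow: -2.17, CIHigh: -2.05}
	fmt.Println(isSignificantChange(fileTree), isSignificantChange(dockerMem))
}
```

Whether a flagged change counts as an improvement or a regression then depends on the sign of Δ mean % relative to the optimization goal, which this sketch does not model.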

CI Pass/Fail Decision

✅ Passed. All Quality Gates passed.

quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.

agent-platform-auto-pr · 2025-05-26T13:11:47Z

Static quality checks

✅ Please find below the results from static quality gates
Comparison made with ancestor a95af0e

Successful checks

Info

	Quality gate	Delta (on disk)	On disk size (MiB)	Delta (on wire)	On wire size (MiB)
✅	agent_deb_amd64	+0.03	697.05 < 697.37	+0.01	176.11 < 177.03
✅	agent_deb_amd64_fips	+0.03	695.33 < 695.59	+0.06	175.57 < 176.51
✅	agent_heroku_amd64	+0.03	358.7 < 359.67	+0.01	96.52 < 97.47
✅	agent_msi	+0.11	958.99 < 959.86	+0.04	146.32 < 147.27
✅	agent_rpm_amd64	+0.03	697.04 < 697.36	+0.02	177.71 < 178.56
✅	agent_rpm_amd64_fips	+0.03	695.32 < 695.58	-0.03	177.54 < 178.43
✅	agent_rpm_arm64	+0.03	687.06 < 687.37	-0.01	161.1 < 161.99
✅	agent_rpm_arm64_fips	+0.03	685.46 < 685.72	-0.03	160.22 < 161.11
✅	agent_suse_amd64	+0.03	697.04 < 697.36	+0.02	177.71 < 178.56
✅	agent_suse_amd64_fips	+0.03	695.32 < 695.58	-0.03	177.54 < 178.43
✅	agent_suse_arm64	+0.03	687.06 < 687.37	-0.01	161.1 < 161.99
✅	agent_suse_arm64_fips	+0.03	685.46 < 685.72	-0.03	160.22 < 161.11
✅	docker_agent_amd64	+0.03	780.84 < 781.16	+0.01	268.8 < 269.63
✅	docker_agent_arm64	+0.03	794.31 < 794.62	+0.01	256.15 < 257.0
✅	docker_agent_jmx_amd64	+0.03	972.03 < 972.35	-0	337.77 < 338.6
✅	docker_agent_jmx_arm64	+0.03	974.1 < 974.41	+0.02	321.1 < 321.97
✅	docker_agent_windows1809	+0.22	1487.13 < 1488.0	+0.34	488.03 < 488.95
✅	docker_agent_windows1809_core	+0.11	6217.09 < 6218.0	0	2048.0 < 2049.0
✅	docker_agent_windows1809_core_jmx	+0.31	6338.86 < 6361.0	0	2048.0 < 2049.0
✅	docker_agent_windows1809_jmx	+0.11	1608.76 < 1609.5	+0.02	530.33 < 531.32
✅	docker_agent_windows2022	-0.19	1506.26 < 1507.0	-0.04	500.76 < 501.7
✅	docker_agent_windows2022_core	+0.21	6190.31 < 6311.0	0	2048.0 < 2049.0
✅	docker_agent_windows2022_core_jmx	+0.09	6311.83 < 6313.0	0	2048.0 < 2049.0
✅	docker_agent_windows2022_jmx	-0.35	1627.88 < 1628.16	-0.02	543.05 < 543.98
✅	docker_cluster_agent_amd64	+0.03	212.87 < 213.79	-0	72.38 < 73.33
✅	docker_cluster_agent_arm64	+0.06	228.75 < 229.64	-0.01	68.65 < 69.6
✅	docker_cws_instrumentation_amd64	0	7.08 < 7.12	-0	2.95 < 3.29
✅	docker_cws_instrumentation_arm64	0	6.69 < 6.92	+0	2.7 < 3.07
✅	docker_dogstatsd_amd64	-0	39.22 < 39.57	-0	15.12 < 15.76
✅	docker_dogstatsd_arm64	0	37.88 < 38.2	+0	14.53 < 14.83
✅	dogstatsd_deb_amd64	0	30.45 < 31.4	-0	8.0 < 8.95
✅	dogstatsd_deb_arm64	0	29.02 < 29.97	-0	6.94 < 7.89
✅	dogstatsd_rpm_amd64	0	30.45 < 31.4	-0	8.01 < 8.96
✅	dogstatsd_suse_amd64	0	30.45 < 31.4	-0	8.01 < 8.96
✅	iot_agent_deb_amd64	+0.03	50.46 < 51.38	+0.01	12.85 < 13.79
✅	iot_agent_deb_arm64	+0.03	47.92 < 48.85	+0	11.14 < 12.09
✅	iot_agent_deb_armhf	+0.03	47.5 < 48.42	+0.01	11.21 < 12.16
✅	iot_agent_rpm_amd64	+0.03	50.46 < 51.38	+0.01	12.87 < 13.81
✅	iot_agent_rpm_arm64	+0.03	47.93 < 48.85	+0.01	11.16 < 12.11
✅	iot_agent_suse_amd64	+0.03	50.46 < 51.38	+0.01	12.87 < 13.81

hmahmood · 2025-06-05T20:48:15Z

pkg/collector/corechecks/snmp/internal/report/report_device_metadata.go

+		routePrefixLen := netmaskToPrefixlen(strings.Join(indexElems[4:8], "."))
+		nextHopIP := strings.Join(indexElems[9:13], ".")
+
+		ifIndex := store.GetColumnAsString("ipforward_deprecated.if_index", strIndex)


Is ifIndex always non-empty?

It is non-empty, but if a route doesn't have an interface, then the value will be 0
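
For readers following this thread, here is a minimal sketch of the two ideas in the hunk: converting a dotted-decimal netmask to a prefix length, and treating an ifIndex of "0" as "no interface". The function names maskToPrefixLen and routeHasInterface are hypothetical and are not the PR's netmaskToPrefixlen; the PR's actual signature and error handling are not shown in this excerpt.

```go
package main

import (
	"fmt"
	"net"
)

// maskToPrefixLen converts a dotted-decimal IPv4 netmask such as
// "255.255.255.0" into its prefix length. It returns an error when the
// input is not an IPv4 address or is not a contiguous mask.
func maskToPrefixLen(netmask string) (int, error) {
	ip := net.ParseIP(netmask).To4()
	if ip == nil {
		return 0, fmt.Errorf("invalid IPv4 netmask %q", netmask)
	}
	// IPMask.Size returns (0, 0) when the mask is not in canonical form.
	ones, bits := net.IPMask(ip).Size()
	if bits == 0 {
		return 0, fmt.Errorf("non-contiguous netmask %q", netmask)
	}
	return ones, nil
}

// routeHasInterface reports whether a route's ifIndex refers to a real
// interface. Per the reply above, a route without an interface reports "0".
func routeHasInterface(ifIndex string) bool {
	return ifIndex != "" && ifIndex != "0"
}

func main() {
	n, err := maskToPrefixLen("255.255.255.0")
	fmt.Println(n, err)
	fmt.Println(routeHasInterface("0"), routeHasInterface("3"))
}
```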

hmahmood · 2025-06-05T20:51:42Z

pkg/collector/corechecks/snmp/internal/report/report_device_metadata.go

+			nextHopIP = strings.Join(indexElems[currMaxIndex-nextHopLength:currMaxIndex], ".")
+		}
+
+		ifIndex := store.GetColumnAsString("ipforward.if_index", strIndex)


Again: is ifIndex always non-empty?

Same as above

dustmop

LGTM for agent-configuration owned files

TCheruy

Code is very clean! 👏 left a few questions but looks good overall

TCheruy · 2025-06-16T09:18:31Z

pkg/collector/corechecks/snmp/internal/report/report_device_metadata.go

+			// 4 ipCidrRouteDest
+			// 4 ipCidrRouteMask
+			// 1 ipCidrRouteTos
+			// 4 ipCidrRouteNextHop


Let's add a log here

Done in d9e394b
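
The 4 + 4 + 1 + 4 layout in the comment above gives a 13-element index. A rough sketch of parsing it with a length check and a log line follows, assuming the requested log covers an index of unexpected length. The names cidrRouteIndex and parseCidrRouteIndex are hypothetical, and the standard library log package stands in for the Agent's logger.

```go
package main

import (
	"fmt"
	"log"
	"strings"
)

// cidrRouteIndexLen is the expected number of index elements:
// 4 (ipCidrRouteDest) + 4 (ipCidrRouteMask) + 1 (ipCidrRouteTos) + 4 (ipCidrRouteNextHop).
const cidrRouteIndexLen = 4 + 4 + 1 + 4

// cidrRouteIndex holds the fields decoded from the table index.
type cidrRouteIndex struct {
	Dest    string
	Mask    string
	NextHop string
}

// parseCidrRouteIndex splits a dotted index string and extracts the
// destination, mask, and next hop. It logs and returns false when the
// index does not have the expected 13 elements.
func parseCidrRouteIndex(strIndex string) (cidrRouteIndex, bool) {
	elems := strings.Split(strIndex, ".")
	if len(elems) != cidrRouteIndexLen {
		log.Printf("skipping route: index %q has %d elements, expected %d",
			strIndex, len(elems), cidrRouteIndexLen)
		return cidrRouteIndex{}, false
	}
	return cidrRouteIndex{
		Dest:    strings.Join(elems[0:4], "."),
		Mask:    strings.Join(elems[4:8], "."),
		NextHop: strings.Join(elems[9:13], "."), // element 8 is the TOS field
	}, true
}

func main() {
	idx, ok := parseCidrRouteIndex("10.0.0.0.255.255.255.0.0.192.168.1.1")
	fmt.Println(idx, ok)
}
```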

TCheruy · 2025-06-16T09:25:46Z

pkg/collector/corechecks/snmp/internal/report/report_device_metadata.go

+		}
+
+		destAddrType := indexElems[currMaxIndex-2]
+		if destAddrType != inetAddressIPv4 {


we don't want to support ipv6?

not for now

TCheruy · 2025-06-16T09:28:20Z

pkg/collector/corechecks/snmp/internal/report/report_device_metadata.go

+			NextHopIP:   nextHopIP,
+			IfIndex:     ifIndex,
+		}
+		routesByIfIndex[ifIndex] = append(routesByIfIndex[ifIndex], route)


If a device exposes its routes through both the deprecated and current OIDs, will we get duplicated entries in the metadata?

You are right. I added a set for routes to avoid duplicates in 421fb44, and did the same for tunnels.

I also realized that when converting the VPN tunnels map values to a slice, the order is not always the same, so I modified the ToSlice method to always produce the same output.
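
As an illustration of the two fixes described in this reply, the sketch below deduplicates routes with a set keyed on the route value and sorts before returning a slice, since Go map iteration order is not stable. The types route and routeSet and their fields are hypothetical and only loosely modeled on the hunk above; the code in 421fb44 may be structured differently.

```go
package main

import (
	"fmt"
	"sort"
)

// route is a comparable struct, so it can be used directly as a map key.
type route struct {
	Destination string
	PrefixLen   int
	NextHopIP   string
	IfIndex     string
}

// routeSet stores each distinct route once.
type routeSet struct {
	seen map[route]struct{}
}

func newRouteSet() *routeSet {
	return &routeSet{seen: make(map[route]struct{})}
}

// Add inserts r and reports whether it was new. A route already seen,
// for example from both the deprecated and current tables, is ignored.
func (s *routeSet) Add(r route) bool {
	if _, ok := s.seen[r]; ok {
		return false
	}
	s.seen[r] = struct{}{}
	return true
}

// ToSlice returns the routes in a deterministic order, sorted by
// IfIndex, then Destination, then PrefixLen, then NextHopIP.
func (s *routeSet) ToSlice() []route {
	out := make([]route, 0, len(s.seen))
	for r := range s.seen {
		out = append(out, r)
	}
	sort.Slice(out, func(i, j int) bool {
		a, b := out[i], out[j]
		if a.IfIndex != b.IfIndex {
			return a.IfIndex < b.IfIndex
		}
		if a.Destination != b.Destination {
			return a.Destination < b.Destination
		}
		if a.PrefixLen != b.PrefixLen {
			return a.PrefixLen < b.PrefixLen
		}
		return a.NextHopIP < b.NextHopIP
	})
	return out
}

func main() {
	s := newRouteSet()
	s.Add(route{Destination: "10.0.0.0", PrefixLen: 24, NextHopIP: "192.168.1.1", IfIndex: "2"})
	s.Add(route{Destination: "10.0.0.0", PrefixLen: 24, NextHopIP: "192.168.1.1", IfIndex: "2"})
	fmt.Println(len(s.ToSlice()))
}
```

Note that this sketch compares IfIndex as a string, so "10" sorts before "2". That is still deterministic, which is the property the reply is after.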

Create buildVPNTunnelsMetadata

517a725

github-actions bot added the team/ndm-core and medium review labels May 23, 2025

Pierre-L42 added 3 commits May 26, 2025 10:16

Add option collect_vpn

f7d61c5

Add tests for collect_vpn

eb03f09

Fix nil map

2c92d47

github-actions bot added the long review label and removed the medium review label May 26, 2025

Pierre-L42 added 2 commits May 26, 2025 13:08

Add VPNTunnels to payload

c005111

Sort indexes

7d54ce5

Pierre-L42 added 12 commits May 26, 2025 15:47

Add interface ID to VPN tunnels

dab697e

Add tests for buildVPNTunnelsMetadata

14842af

Remove route type

7c7b578

Remove route type from tests

e34944e

Remove ipforward_obsolete

ca8409e

Add release note

f03a32c

Revert

e68fc8b

Create vpn_tunnels.go

b265410

Create vpn_tunnels_test.go

1bdd39a

Merge branch 'main' into pierre.lin/vpn-tunnels-2

0ea83b2

Fix lints

e06dbd1

Fix lint

913f87e

Pierre-L42 added the qa/done label May 28, 2025

Remove prints

2149944

hmahmood reviewed Jun 5, 2025

Pierre-L42 added 2 commits June 6, 2025 15:29

Make interface optional and handle tunnels with same outside IP

80e7cb0

Merge branch 'main' into pierre.lin/vpn-tunnels-2

c43f286

Pierre-L42 marked this pull request as ready for review June 10, 2025 10:16

Pierre-L42 requested review from a team as code owners June 10, 2025 10:16

Pierre-L42 requested review from TCheruy and dustmop June 10, 2025 10:16

michaelcretzman approved these changes Jun 10, 2025

dustmop approved these changes Jun 11, 2025

TCheruy reviewed Jun 16, 2025

Pierre-L42 added 2 commits June 16, 2025 13:29

Add logs

d9e394b

Check for duplicate route/tunnel and normalize/sort returned VPN tunnel slice

421fb44

Pierre-L42 added this to the 7.69.0 milestone Jun 16, 2025

Merge branch 'main' into pierre.lin/vpn-tunnels-2

14b117d

jmw51798 approved these changes Jun 16, 2025
