Experiments on New Archipelago (Ark) Measurement System

General approach: implement Linux OS-level controls to allow experimenters to run arbitrary Linux software in a constrained manner that protects the running software on the vantage point, protects privacy between multiple experimenters, and respects the vantage point host's preferences as to what network packet traffic is acceptable on that network.

Rough Goals:

Every measurement we do with the Ark nodes now should be constructible in the proposed framework.

The owner of the vantage point should have sufficient control over what we do with it to prevent network abuse and legal problems, and to stop us from using it for experiments he's not comfortable with, but without getting so granular that it becomes hard to experiment.

Experimenters should have broad access to run arbitrary software in the computer language of their choice, up to and including shell access.

It should not be possible for experimenters to disrupt each other or crash the vantage point.

Permissioning:

Vantage Point Owner View

Option: No third-party experiments. Only experiments created by CAIDA experimenters are permitted.

Option: Restrict DNS to whitelist of CAIDA-vetted names. Experiment must use getaddrinfo() or glean the allowed DNS server from /etc/resolv.conf. This would disable DNS experiments.

Option: Restrict TCP to whitelist of CAIDA-vetted IP addresses and ports. This would disable general purpose web site probing. We'd probably consider the explicit IP/port the experimenter requested for a real-time back channel as a CAIDA-vetted destination.

Option: Restrict UDP to whitelist of CAIDA-vetted IP addresses and ports. This would disable probing non-local DNS servers.

Option: Restrict all packets to whitelist of CAIDA-vetted IP addresses and ports. Note that this would disable prefix probing entirely.

Option: Block all ICMP except echo-request, echo-reply, time exceeded and destination unreachable.

Option: Allow UDP and TCP for all DNS servers (port 53). Overrides otherwise restricted TCP and UDP.

Option: TCP port whitelist. TCP is allowed to all Internet destinations but only the destination ports on the whitelist.

Option: UDP port whitelist. UDP is allowed to all Internet destinations but only the destination ports on the whitelist.

Option: no TCP data packets. Allow TCP SYN packets to be used for traceroute to any address (overrides whitelist) but do not allow any data to be sent.

Option: Allow experiments with IP protocols other than ICMP, UDP and TCP.

Option: Allow spoofed source IP address. For example, for BCP38 compliance testing. Spoofed packets will not be subject to NAT. Limited to a CAIDA-vetted list of destination IP addresses.

Maximum megabits per second. Maximum megabytes per day.

Defaults: 10 Mbps. No daily max. All experimenters. CAIDA whitelist for UDP and TCP. All DNS. All ICMP. No protocols other than ICMP, UDP, TCP.
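
A minimal sketch of how the defaults above might be recorded per vantage point. The class name, field names and the helper are hypothetical, not part of any existing Ark code, and a megabyte is assumed to be 10^6 bytes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OwnerPolicy:
    """Hypothetical record of one vantage point owner's choices."""
    max_mbps: float = 10.0                      # "10 Mbps"
    max_mbytes_per_day: Optional[int] = None    # "No daily max"
    third_party_experiments: bool = True        # "All experimenters"
    tcp_whitelist_only: bool = True             # "CAIDA whitelist" for TCP
    udp_whitelist_only: bool = True             # "CAIDA whitelist" for UDP
    allow_all_dns: bool = True                  # "All DNS"
    allow_all_icmp: bool = True                 # "All ICMP"
    allow_other_ip_protocols: bool = False      # only ICMP, UDP, TCP


def seconds_to_daily_cap(policy: OwnerPolicy) -> Optional[float]:
    """Seconds of sending at the full rate before the daily cap is reached.

    Returns None when the owner set no daily maximum.
    """
    if policy.max_mbytes_per_day is None:
        return None
    megabits = policy.max_mbytes_per_day * 8
    return megabits / policy.max_mbps
```

For example, a 1000 megabyte daily cap at the default rate is used up after 800 seconds of full-rate sending, which is worth showing to owners when they pick the two numbers.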

Note that experiment packets targeted at restricted addresses (except the default router) are always dropped: 0.0.0.0/8, 10.0.0.0/8, 169.254.0.0/16, 172.16.0.0/12, 192.168.0.0/16, 224.0.0.0/3
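
The restricted-address rule can be sketched with Python's standard ipaddress module. The function name and the default_router parameter are illustrative; the real enforcement would happen in the packet filter, not in Python.

```python
import ipaddress

# The restricted IPv4 prefixes listed above.
RESTRICTED_V4 = [ipaddress.ip_network(n) for n in (
    "0.0.0.0/8", "10.0.0.0/8", "169.254.0.0/16",
    "172.16.0.0/12", "192.168.0.0/16", "224.0.0.0/3",
)]


def is_restricted(dst, default_router=None):
    """Return True if a packet to dst should always be dropped."""
    addr = ipaddress.ip_address(dst)
    # The default router is the one stated exception.
    if default_router is not None and addr == ipaddress.ip_address(default_router):
        return False
    # Membership is False for an IPv6 address tested against an IPv4 network.
    return any(addr in net for net in RESTRICTED_V4)
```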

Note: allow the vantage point owner to go to a web page and pull up a list of experiments completed, ongoing and planned which are set to happen on his node. Provide the short title for the experiment and a contact email address for the experimenter. If he sees traffic he doesn't like, he can pull up the experiments list at any time to see whose experiment is running. Give him a Stop button too so he can shut down any experiment causing him grief without pulling the plug on the whole node.

Experimenter View

Experimenters: names and email addresses of each person involved in operating the experiment.

Maximum experiment duration in hours.

Expected CPU usage: minimal (only a few packets at a time and little local processing), moderate (large number of packets or non-trivial local processing), substantial (lots of packets and local processing)

Maximum RAM in megabytes (note that if the experiment's programs together exceed this amount, further allocations will fail or the kernel will kill a process, most likely crashing the experiment)

Maximum megabits per second. Maximum megabytes per day.

Select: IPv4, IPv6, both. Recommend splitting the experiment into one of each unless the experiment seeks to explicitly contrast IPv4 and IPv6 activity on the same vantage point.

DNS hostnames queried. Subdomains of listed domains will be allowed. Use "any" for no restriction. Applies only to local DNS server queries. To query remote DNS servers include "0.0.0.0/0:53" in the UDP and TCP destinations below.
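
A sketch of the matching rule just described (listed domains, their subdomains, and "any"). The function name is hypothetical; case and a trailing dot are ignored here, which is an assumption rather than something the notes above specify.

```python
def hostname_allowed(qname, allowed_domains):
    """Return True if qname is a listed domain or a subdomain of one."""
    if "any" in allowed_domains:
        return True
    name = qname.rstrip(".").lower()
    for domain in allowed_domains:
        d = domain.rstrip(".").lower()
        # Require a label boundary so "badexample.org" does not match "example.org".
        if name == d or name.endswith("." + d):
            return True
    return False
```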

Outbound ICMP types used: e.g. echo-request.

UDP destination prefixes and ports used. Examples: none, 0.0.0.0/0:*, 198.51.100.1/32:1-1024

TCP destination prefixes and ports used. Examples: none, 0.0.0.0/0:*, 198.51.100.1/32:80,443
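
The prefix:ports notation used in the UDP and TCP examples could be parsed roughly as follows. This is a sketch with hypothetical function names; it only assumes the three forms shown in the examples (a port list, a port range, and "*"), plus "none".

```python
import ipaddress


def parse_destination(spec):
    """Parse 'prefix:ports' into (network, [(low, high), ...]).

    Returns None for 'none'. '*' means every port.
    """
    spec = spec.strip()
    if spec == "none":
        return None
    # Split on the last colon so the prefix part is kept whole.
    prefix, sep, ports = spec.rpartition(":")
    if not sep or not prefix:
        raise ValueError(f"expected prefix:ports, got {spec!r}")
    network = ipaddress.ip_network(prefix)  # raises ValueError on a bad prefix
    if ports == "*":
        return network, [(0, 65535)]
    ranges = []
    for item in ports.split(","):
        low, _, high = item.partition("-")
        lo_port = int(low)
        hi_port = int(high) if high else lo_port
        if not 0 <= lo_port <= hi_port <= 65535:
            raise ValueError(f"bad port range {item!r}")
        ranges.append((lo_port, hi_port))
    return network, ranges


def destination_allowed(rule, addr, port):
    """Check one address and port against a parsed rule (None allows nothing)."""
    if rule is None:
        return False
    network, ranges = rule
    in_prefix = ipaddress.ip_address(addr) in network
    return in_prefix and any(lo <= port <= hi for lo, hi in ranges)
```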

TCP SYN-only ports (for TCP traceroute). Example: 22,80,443

Other IP protocols, by number. Examples: 50 (IPsec ESP) and 47 (GRE). Note that protocols other than ICMP, UDP and TCP will most likely operate usefully only on IPv6, since IPv4 traffic passes through NAT.

Spoofed source: Source addresses which will be spoofed (maximum 4) and the destinations to which the spoofed packets will be sent (also maximum 4).

CAIDA container version, so we run the experiment in the same container that it was built for.

Architecture restrictions: x86_64, aarch64, armv7l, any. If the experimenter provides any Linux binaries (instead of just scripts, e.g. Python) then they'll have to provide them for all vantage point CPU architectures they want to run on.

Linux Container:

The basic OS components under which experiments can run. This is what sits inside the mount namespace available to the experiments.

To run an experiment, we take the container, add the experiment's software, set up all the magic with namespaces, cgroups, iptables and so on, and then execute the experiment inside the container.

Includes common Linux utilities and the python and bash interpreters, each with network "capabilities" set.

Add python networking packages and additional interpreters as we find them commonly useful.

Includes a "pcap" program that's setuid root. When run, the pcap program opens the virtual ethernet interface and streams captured packets in pcap format to stdout. Experiments can access this with popen("/usr/bin/pcap","r"). Unfiltered - all packets received on the virtual ethernet are captured. Up to the experimenter to discard the ones they don't care about.
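
A sketch of how an experiment might consume the helper's output from Python. It assumes the helper writes the classic pcap file format (24-byte global header, 16-byte per-packet headers, microsecond timestamps) rather than pcapng; that is an assumption, since the exact format is not pinned down above. The function names are hypothetical.

```python
import struct
import subprocess


def read_pcap_records(stream):
    """Yield (timestamp, packet_bytes) from a classic pcap byte stream."""
    header = stream.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    magic = header[:4]
    if magic == b"\xd4\xc3\xb2\xa1":      # 0xa1b2c3d4 written little-endian
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":    # 0xa1b2c3d4 written big-endian
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap stream")
    while True:
        record = stream.read(16)
        if len(record) < 16:
            return                        # end of stream
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            return                        # capture ended mid-packet
        yield ts_sec + ts_usec / 1e6, data


def capture():
    """Run the helper and yield packets as they arrive (subprocess form of popen)."""
    proc = subprocess.Popen(["/usr/bin/pcap"], stdout=subprocess.PIPE)
    yield from read_pcap_records(proc.stdout)
```

Since the capture is unfiltered, the experiment would discard unwanted packets inside its own loop over capture().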

Available for download from the CAIDA web site, so that experimenters can develop their experiment and then test it locally in a comparable environment to what will run when they're ready to do the full experiment.

Versioned. The experimenter declares the version as part of the permissioning process so that the experiment runs in the exact same environment where they built and tested it.

Data collection:

CAIDA collects anything the experiment outputs to stdout and stderr and returns them to the experimenter in batch. Scamper?

Experimenter can request TCP to an external IP/port endpoint at which they implement a real-time data collection protocol of their choice. Subject to the vantage point owner's willingness to allow the IP and port. Note that the experimenter could use this to give themselves a literal bash shell inside the Linux container on the vantage point. This is intentional, but the experimenter must take measures to ensure that only the people CAIDA approved to perform the experiment can access the vantage point in this manner.

Available Linux knobs and controls:

CPU:

nice - force experiments to run at a lower priority than administrative access so that the nodes are likely to be remotely accessible even if an experiment goes awry.

cgroups2 cpu.max - limit % of CPU core an experiment can use.

timeout - run experiments with a hard limit on the total (wall-clock) run time so they don't accidentally keep running.

"ulimit -t" - run experiments with a hard limit on CPU time (in seconds) so that unexpected infinite loops get stopped

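The wall-clock and CPU-time limits can be illustrated from a Python launcher. This is a sketch, not the planned launcher: it swaps the timeout command for subprocess's own timeout argument and sets RLIMIT_CPU (the limit behind "ulimit -t") directly. The function name is hypothetical. Linux/Unix only.

```python
import resource
import subprocess
import sys


def run_limited(argv, cpu_seconds, wall_seconds):
    """Run argv with a CPU-time limit and a wall-clock limit.

    Raises subprocess.TimeoutExpired if the wall-clock limit is hit;
    the kernel signals the child if it exceeds the CPU-time limit.
    """
    def set_cpu_limit():
        # Runs in the child just before exec; same effect as "ulimit -t".
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    return subprocess.run(argv, preexec_fn=set_cpu_limit, timeout=wall_seconds,
                          capture_output=True, text=True)


if __name__ == "__main__":
    probe = "import resource; print(resource.getrlimit(resource.RLIMIT_CPU)[0])"
    result = run_limited([sys.executable, "-c", probe], cpu_seconds=5, wall_seconds=30)
    print(result.stdout.strip())
```
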
Memory:

cgroups2 memory.max - limit the amount of RAM the experiment can use

Mount namespace: Constrain the experiment to a CAIDA-standardized subset of Linux, plus the software the experimenter explicitly adds. Other experiments and the host operating system are not visible to the experiment. Mount namespace is read-only. Optionally includes a small ramdisk if it turns out read-only is not usable for experimenters.

cgroups2 io.max - limit the rate of IO from the flash card. Limited use if we're making the mount read-only.

Processes:

cgroups2 pids.max - prevent the experiment from spawning so many processes or threads that it overwhelms the machine
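
A sketch of the values that would be written for the cpu.max, memory.max and pids.max controls. The function names are hypothetical, the cgroup directory is whatever the launcher creates under the cgroup v2 mount, and megabytes are taken as 2^20 bytes here (an assumption). io.max is left out because it needs the device's major:minor numbers.

```python
import os


def cgroup_settings(cpu_percent, memory_mbytes, max_pids, period_us=100000):
    """Map experiment limits to cgroup v2 control file contents."""
    # cpu.max is "<quota> <period>" in microseconds; quota/period is the CPU share.
    quota_us = int(period_us * cpu_percent / 100)
    return {
        "cpu.max": f"{quota_us} {period_us}",
        "memory.max": str(memory_mbytes * 1024 * 1024),  # bytes
        "pids.max": str(max_pids),
    }


def apply_settings(cgroup_dir, settings):
    """Write each value into its control file inside cgroup_dir."""
    for name, value in settings.items():
        with open(os.path.join(cgroup_dir, name), "w") as f:
            f.write(value + "\n")
```

For example, cgroup_settings(50, 256, 64) limits the experiment to half of one core, 256 megabytes and 64 processes or threads.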

Not root - Run experiments as a non-root user so that the experiment can't override these controls. Use Linux capabilities to allow non-root users to perform root-like network operations: CAP_NET_ADMIN, CAP_NET_BIND_SERVICE, CAP_NET_BROADCAST, CAP_NET_RAW. Implement CAIDA-provided helper programs to access any other critical network functions such as eBPF packet capture.

Process namespace - Use a Linux "PID" namespace so that the experiment can only see its own running processes, not anything else running on the vantage point.

"ulimit -u" - limit processes run by user. cgroups2 pids.max allows better control.

setuid - so that program running as a non-root user can invoke a root helper program for things like packet capture

Network:

Network namespace - Use a Linux "network" namespace so that the process only has access to a virtual network card specific to the experiment, not the wider network available to the vantage point host.

Use NAT on the vantage point to translate IPv4 traffic to/from the experiment to the Internet-facing address used by the vantage point. Note that this restricts protocol use by experiments to NAT-friendly protocols: ICMP, UDP and TCP.

Assign a unique IPv6 address to the experiment while it runs. Use proxy-ND and IPv6 routing to move traffic for the address between the vantage point's physical network interface and the experiment's virtual network interface.
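
A rough sketch of the host-side commands the IPv4 NAT and IPv6 proxy-ND steps might translate to. It only builds the argument lists and runs nothing. Interface names and addresses are placeholders, the function name is hypothetical, and namespace creation, forwarding sysctls and the routes inside the namespace are deliberately left out.

```python
def translation_commands(host_if, veth_host, v4_exp, v6_exp):
    """Return host-side commands for IPv4 NAT and IPv6 proxy-ND.

    host_if   - physical, Internet-facing interface of the vantage point
    veth_host - host end of the experiment's virtual ethernet pair
    v4_exp    - private IPv4 address used inside the experiment
    v6_exp    - unique IPv6 address assigned to the experiment
    """
    return [
        # IPv4: rewrite the experiment's source address to the host's address.
        ["iptables", "-t", "nat", "-A", "POSTROUTING",
         "-s", f"{v4_exp}/32", "-o", host_if, "-j", "MASQUERADE"],
        # IPv6: answer neighbor discovery for the experiment's address ...
        ["sysctl", "-w", f"net.ipv6.conf.{host_if}.proxy_ndp=1"],
        ["ip", "-6", "neigh", "add", "proxy", v6_exp, "dev", host_if],
        # ... and route that address to the experiment's virtual interface.
        ["ip", "-6", "route", "add", f"{v6_exp}/128", "dev", veth_host],
    ]
```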

tc - use Linux traffic control to limit the network bandwidth available to the experiment
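
One possible shape for the tc rule, using a token bucket filter (tbf) qdisc. Choosing tbf, and the burst and latency values, are assumptions for illustration. The function name is hypothetical and the command is only built, not executed.

```python
def tbf_command(dev, mbps, burst_kbytes=32, latency_ms=400):
    """Build a tc command that caps dev at mbps megabits per second."""
    return ["tc", "qdisc", "add", "dev", dev, "root", "tbf",
            "rate", f"{mbps}mbit",          # sustained rate
            "burst", f"{burst_kbytes}kb",   # bucket size in kilobytes
            "latency", f"{latency_ms}ms"]   # how long a packet may wait in the queue
```

A qdisc on the host end of the virtual ethernet pair shapes what the host sends on that interface, so limiting both directions would need a rule on each side of the experiment's traffic path.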

capabilities - Use Linux capabilities to allow non-root users to perform root-like network operations from all executables in the experiment: CAP_NET_ADMIN, CAP_NET_BIND_SERVICE, CAP_NET_BROADCAST, CAP_NET_RAW.
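
An experiment could check at startup that it actually holds these capabilities by decoding the CapEff line of /proc/self/status. A sketch with hypothetical function names; the bit numbers are the ones defined in linux/capability.h.

```python
# Bit positions from linux/capability.h.
CAP_BITS = {
    "CAP_NET_BIND_SERVICE": 10,
    "CAP_NET_BROADCAST": 11,
    "CAP_NET_ADMIN": 12,
    "CAP_NET_RAW": 13,
}


def missing_capabilities(cap_eff_hex):
    """Return the names of the four network capabilities absent from the mask."""
    mask = int(cap_eff_hex, 16)
    return sorted(name for name, bit in CAP_BITS.items() if not mask & (1 << bit))


def effective_mask_of_self():
    """Read this process's effective capability mask as a hex string."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise RuntimeError("CapEff not found in /proc/self/status")
```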

capture helper - Implement CAIDA-provided helper program to perform packet capture on the virtual network interface (as root) and relay the results to the experiment's software.

iptables - limit the sorts of packets the experiment can send towards the network based on what the vantage point's owner is willing to allow.

Always ban packets towards RFC1918 and other restricted addresses which may be in the vantage point owner's private network
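
A sketch of how owner choices might be turned into iptables rules. The chain name, log prefix and function name are hypothetical. The rules are only built as argument lists. The default-router exception is omitted, and the multiport match accepts at most 15 ports per rule, so long whitelists would need splitting.

```python
RESTRICTED_V4 = ("0.0.0.0/8", "10.0.0.0/8", "169.254.0.0/16",
                 "172.16.0.0/12", "192.168.0.0/16", "224.0.0.0/3")


def filter_rules(chain, tcp_ports=(), udp_ports=(), icmp_types=(),
                 log_prefix="ARK-DROP: "):
    """Build iptables commands: drop restricted, accept allowed, log and drop the rest."""
    def rule(*args):
        return ["iptables", "-A", chain, *args]

    # Restricted destinations are dropped before any accept rule is consulted.
    rules = [rule("-d", net, "-j", "DROP") for net in RESTRICTED_V4]
    for proto, ports in (("tcp", tcp_ports), ("udp", udp_ports)):
        if ports:
            port_list = ",".join(str(p) for p in ports)
            rules.append(rule("-p", proto, "-m", "multiport",
                              "--dports", port_list, "-j", "ACCEPT"))
    for icmp_type in icmp_types:
        rules.append(rule("-p", "icmp", "--icmp-type", icmp_type, "-j", "ACCEPT"))
    # Anything not accepted above is logged to the kernel log, then dropped.
    rules.append(rule("-j", "LOG", "--log-prefix", log_prefix))
    rules.append(rule("-j", "DROP"))
    return rules
```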

kern.log - log information about packets dropped by the filters and alert CAIDA staff, so that if an experiment tries to send packets it said it wouldn't, we can assess and potentially terminate the experiment.
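
A sketch of pulling the useful fields out of a dropped-packet line in kern.log. It assumes the drops are logged by the iptables LOG target with a known prefix. The "ARK-DROP: " prefix and the function name are hypothetical.

```python
import re

# The LOG target writes space-separated KEY=value pairs after the prefix.
FIELD = re.compile(r"(\w+)=(\S*)")


def parse_drop_line(line, prefix="ARK-DROP: "):
    """Return the KEY=value fields of a logged drop, or None for other lines."""
    index = line.find(prefix)
    if index < 0:
        return None
    # Bare flags such as DF or SYN carry no '=' and are skipped.
    return dict(FIELD.findall(line[index + len(prefix):]))
```

The SRC, DST, PROTO and DPT fields are the ones an alert to CAIDA staff would most likely need, alongside which experiment owns the source address.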

DNS proxy - where requested by the vantage point owner, restrict the experiment's port 53 traffic to a proxy that limits DNS lookups to CAIDA-approved domain names.