Optional "Via" tagging/tunnelling method for TRRP

Basic TRRP encapsulates packets in a GRE tunnel. This is widely supported in existing software, but it consumes an additional 28 bytes of space in the packet, breaks normal ICMP functionality and causes trouble for packet filtering firewalls.

The Via method solves these problems. The Via ITR tags the packet with an IP option header instead of encapsulating it in a tunnelling protocol. The Via option header functions much like loose source routing.

IPv4 version:

DNS Route Specifier:

TRRP specifies routes via DNS TXT records in the format "pp,ii,route pp,ii,route ..." where pp is a hexadecimal priority, ii is a protocol Identifier and route is the egress information needed by the Identifier.

Identifier: v4 = Via for IPv4.

Route: The "route" parameter offers the IP address of the router the packet should hit before being sent to the destination address. 

Example:

The ITR makes a DNS TXT request for 1.101.168.192.v4.trrp.arpa. It receives one TXT record in response:

80,v4,10.0.0.1 90,g4,1.2.3.4

The best priority route uses the IPv4 "Via" protocol to route the packet first to 10.0.0.1 before releasing it to the address in the header's destination field.
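As a minimal sketch of how an ITR might read such a record, the Python below splits the example TXT string into entries and picks the preferred one. The names Route, parse_routes and best_route are illustrative and not part of TRRP. The sketch assumes the numerically lowest hexadecimal priority is the best, which is what the example above implies.

```python
from typing import List, NamedTuple


class Route(NamedTuple):
    priority: int    # parsed from the hexadecimal "pp" field
    identifier: str  # protocol Identifier, e.g. "v4" for Via over IPv4
    route: str       # egress information, here the ETR address


def parse_routes(txt: str) -> List[Route]:
    """Split a 'pp,ii,route pp,ii,route ...' TXT string into Route entries."""
    routes = []
    for entry in txt.split():
        pp, ii, route = entry.split(",", 2)
        routes.append(Route(int(pp, 16), ii, route))
    return routes


def best_route(routes: List[Route]) -> Route:
    # Assumption: a lower priority value means a more preferred route.
    return min(routes, key=lambda r: r.priority)


best = best_route(parse_routes("80,v4,10.0.0.1 90,g4,1.2.3.4"))
print(best.identifier, best.route)
```

A real ITR would also skip entries whose Identifier it does not implement before choosing.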

IP option 30:

Regular IP header:

+ Bits     0-3          4-7              8-15              16-18     19-31
0          Version=4    Header length    Type of Service   Total Length
32         Identification                                  Flags     Fragment Offset
64         Time to Live                  Protocol          Header Checksum
96         Source Address
128        Destination Address
160        Options
160 or     Data
192+

IP header with Option 30:
Note that the use of option number "30" is purloined; it must be replaced by an IANA-assigned number before this protocol sees production use.

+ Bits     0-3          4-7              8-15              16-18     19-31
0          Version=4    Header length    Type of Service   Total Length
32         Identification                                  Flags     Fragment Offset
64         Time to Live                  Protocol          Header Checksum
96         Source Address
128        Egress Tagging Router (ETR) Address
160        IP Option Type = 0x9e         Option Length     reserved (zero)
192        Destination Address
224        Zero or more "been there" ETR addresses
224 or     More Options
256+
224 or     Data
256+

Type. 8 bits. Set to 0x9e.

00   01 02   03 04 05 06 07
C    Class   Option

C, Copy flag. 1 bit. Set to 1 (0x80)
Indicates that this option should be copied into all fragments.

Class. 2 bits. Set to 0 (0x00)
This is a control option.

Option. 5 bits. Set to 30 (0x1e).
The IP option number.
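The three fields pack into the single type octet as follows. This is only a sketch of the bit arithmetic; the function names are illustrative.

```python
def option_type(copy: int, option_class: int, number: int) -> int:
    """Pack the copy flag (1 bit), class (2 bits) and number (5 bits)."""
    return ((copy & 0x1) << 7) | ((option_class & 0x3) << 5) | (number & 0x1F)


def split_option_type(value: int):
    """Inverse of option_type: return (copy, class, number)."""
    return (value >> 7) & 0x1, (value >> 5) & 0x3, value & 0x1F


# Copy flag set, control class, placeholder option number 30.
VIA_OPTION_TYPE = option_type(copy=1, option_class=0, number=30)
print(hex(VIA_OPTION_TYPE))  # 0x9e
```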

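Length: 8 bits. 8+(n*4) where n is the number of "been there" addresses. As with other IPv4 options, the length counts the type and length bytes themselves, so the minimum value is the 8 bytes of the option header with no "been there" addresses.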

How to Route a packet with the Via option:

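The Ingress Tagging Router (ITR) for this packet will add an 8-byte option header at byte 21, immediately after the normal IP destination address field. The original destination address moves into the option and the ETR address takes its place in the header's destination field. Any other option headers must be pushed back, truncated and/or discarded to accommodate this option.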
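A minimal sketch of the tagging step follows, using only the Python standard library. It handles the simple case of an IPv4 header that carries no other options, and it assumes the option's length byte counts the whole option (8 bytes here), as other IPv4 options do. The function names are illustrative, and 0x9e is the placeholder option type from the layout above.

```python
import socket
import struct

VIA_OPTION_TYPE = 0x9E  # placeholder option number 30 with the copy flag set


def checksum(header: bytes) -> int:
    """Internet checksum over an IPv4 header (its length is always even)."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tag_packet(packet: bytes, etr: str) -> bytes:
    """Insert a Via option after the fixed header and redirect to the ETR."""
    if (packet[0] & 0x0F) * 4 != 20:
        raise ValueError("this sketch only handles option-free headers")
    header = bytearray(packet[:20])
    original_dst = bytes(header[16:20])
    header[16:20] = socket.inet_aton(etr)  # the ETR becomes the destination
    # Type, length (assumed to cover the whole option), reserved, destination.
    option = struct.pack("!BBH", VIA_OPTION_TYPE, 8, 0) + original_dst
    header += option
    header[0] = (header[0] & 0xF0) | (len(header) // 4)  # new header length
    total_len = struct.unpack("!H", header[2:4])[0] + len(option)
    header[2:4] = struct.pack("!H", total_len)
    header[10:12] = b"\x00\x00"  # zero the checksum field before recomputing
    header[10:12] = struct.pack("!H", checksum(bytes(header)))
    return bytes(header) + packet[20:]
```

The header length, total length and header checksum all change because the header grows by 8 bytes. Fragmenting to leave room for the option is a separate step and is not shown.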

Once tagged with a Via option header, the packet will be routed normally to the ETR. The option will be ignored by all routers until the packet reaches the ETR. The address block containing the ETR is part of the "globally routable" space and has already been propagated by BGP.

Once the packet reaches the ETR, the ETR should determine whether it has a local route to the destination IP address. If there is a local route, the ETR should strip the Via option header, return the final destination address to the destination portion of the IP header and deliver the packet per the normal IPv4 routes offered via a normal interior routing protocol.
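The stripping step at the ETR can be sketched as the reverse operation. As before this is an illustration under assumptions, not a reference implementation: the Via option is expected to sit directly after the fixed 20-byte header, its length byte is assumed to count the whole option, and the function names are made up for the example.

```python
import struct

VIA_OPTION_TYPE = 0x9E  # placeholder option number 30 with the copy flag set


def checksum(header: bytes) -> int:
    """Internet checksum over an IPv4 header (its length is always even)."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def untag_packet(packet: bytes) -> bytes:
    """Strip a leading Via option and restore the final destination."""
    ihl = (packet[0] & 0x0F) * 4
    if ihl < 28 or packet[20] != VIA_OPTION_TYPE:
        raise ValueError("no Via option directly after the fixed header")
    opt_len = packet[21]  # assumed to count the whole option, 8 + 4n
    if opt_len < 8 or opt_len % 4 or 20 + opt_len > ihl:
        raise ValueError("malformed Via option")
    # Keep the fixed header plus any options that followed the Via option.
    header = bytearray(packet[:20] + packet[20 + opt_len:ihl])
    header[16:20] = packet[24:28]  # final destination stored in the option
    header[0] = (header[0] & 0xF0) | (len(header) // 4)
    total_len = struct.unpack("!H", packet[2:4])[0] - opt_len
    header[2:4] = struct.pack("!H", total_len)
    header[10:12] = b"\x00\x00"  # zero the checksum field before recomputing
    header[10:12] = struct.pack("!H", checksum(bytes(header)))
    return bytes(header) + packet[ihl:]
```

The ETR would call this only after deciding that it has a local route and is authoritative for the destination.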

If the ETR is configured to be a valid exit for the destination address but it does not currently have a route then the ETR must add its own IP address to the end of the "been there" list and reroute the packet to the next best preference ETR. The ETR will act as a normal ITR in selecting this new address and must not select an ETR which already appears in the "been there" list. If there is no more room for "been there" addresses in the IP header then the packet should be discarded and an ICMP host unreachable message sent per normal rules.
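The rerouting decision can be sketched as below. The 60-byte limit is the maximum IPv4 header size; everything else is an assumption of the example: the function name, the candidate list being ordered best preference first, the 8+(n*4) option size, and returning None both when the header is full and when no unvisited ETR remains.

```python
MAX_IPV4_HEADER = 60  # the 4-bit header length field allows at most 15 words
FIXED_HEADER = 20
VIA_BASE = 8          # option type, length, reserved and destination address


def reroute(candidates, current_etr, been_there, other_option_bytes=0):
    """Return (next_etr, new_been_there), or None if the packet must be dropped.

    candidates: ETR addresses ordered best preference first (assumed to come
    from the same lookup an ITR would perform).
    """
    new_been_there = list(been_there) + [current_etr]
    header_len = (FIXED_HEADER + VIA_BASE + 4 * len(new_been_there)
                  + other_option_bytes)
    if header_len > MAX_IPV4_HEADER:
        return None  # no room left for another "been there" address
    for etr in candidates:
        if etr not in new_been_there:
            return etr, new_been_there
    return None  # every candidate has already been tried


print(reroute(["10.0.0.1", "10.0.0.2"], "10.0.0.1", []))
```

With no other options present, the header has room for at most eight "been there" addresses.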

If the new lookup calls for a tunnelling protocol other than Via, the Via header should be removed from the IP packet before encapsulation. If the alternate protocol supports a sufficiently robust "been there" specification then the current ETR's IP address and all addresses in the "been there" list should be copied to the new protocol's "been there" list.

If an ETR is unreachable, the Via-tagged packet will follow the default route and fall back into an ITR. The ITR should cache the knowledge that the ETR is unreachable, move the ETR to the packet's "been there" list and resend it to the next best ETR. If no further ETRs are available, the Via option header should be stripped and an ICMP host unreachable message should be returned to the originating host.

Path MTU and handling of Fragmentation Needed messages

By default, an ITR should assume the path MTU to the ETR is 1280 bytes and leave sufficient space in the header for the 8 required bytes plus 8 additional been-there bytes. As a result, it should fragment the original packets to 1264 bytes before tagging. If the Don't Fragment bit is set then an ICMP destination unreachable - fragmentation needed message should be returned to the originating host announcing an MTU of 1264 bytes.

If a Via-aware router other than the ITR generates a fragmentation needed message, it should strip the Via header and restore the destination address before sending the ICMP message. The reported MTU should be the actual MTU minus the greater of 16 bytes or the actual length of the Via option header.

If a Via-aware host receives a Fragmentation Needed message in which the Via option header is still present, the host should strip the Via header and restore the destination address before considering the message. It should subtract the greater of 16 bytes or the actual length of the Via option header from the reported MTU.
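The MTU arithmetic in the preceding paragraphs is small enough to write out. The constant and function names are illustrative only.

```python
DEFAULT_PATH_MTU = 1280  # assumed ITR-to-ETR path MTU
VIA_RESERVED = 16        # 8 required bytes plus 8 been-there bytes


def itr_fragment_size(path_mtu: int = DEFAULT_PATH_MTU) -> int:
    """Largest original packet the ITR should pass without fragmenting."""
    return path_mtu - VIA_RESERVED


def reported_mtu(actual_mtu: int, via_option_len: int) -> int:
    """MTU to announce in (or read from) a fragmentation needed message."""
    return actual_mtu - max(VIA_RESERVED, via_option_len)


print(itr_fragment_size())     # 1264
print(reported_mtu(1500, 8))   # 1484, the 16-byte reservation dominates
print(reported_mtu(1500, 24))  # 1476, a long been-there list dominates
```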

The minimum path MTU between any ITR and ETR is required to be no less than 1280 bytes unless the routers on both ends of the smaller link are Via-aware and generate corrected fragmentation-needed messages. Should a router on a smaller link generate a fragmentation needed message without first stripping the Via option header, the originating host may not be able to understand it and correctly adjust its MTU. In such a situation, end-to-end communications would fail.

1280 bytes has been selected as the path minimum MTU because it is the link minimum MTU for IPv6 and backbone links are expected to already support this minimum MTU.

The ITR should proactively adjust the TCP MSS in any tagged SYN packets such that the subsequent TCP packets will be small enough to reach the ETR without fragmentation. While this is not supposed to be necessary, the reality on the ground is that ignorant firewall administrators have broadly broken path MTU discovery.
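As a rough illustration of the clamp, the sketch below lowers an advertised MSS so that a full segment still fits once the Via option is added. It assumes the default 1280-byte path MTU and 16-byte reservation described above, and 20-byte IP and TCP headers with no further options; the function name is made up for the example.

```python
def clamp_mss(advertised_mss: int, path_mtu: int = 1280,
              via_reserved: int = 16, ip_header: int = 20,
              tcp_header: int = 20) -> int:
    """Return the MSS the ITR should write into a tagged SYN packet."""
    ceiling = path_mtu - via_reserved - ip_header - tcp_header
    return min(advertised_mss, ceiling)


print(clamp_mss(1460))  # 1224
print(clamp_mss(536))   # 536, already small enough
```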

Optional discovery of a larger path MTU

After initial transmission to an ETR, a Via ITR may attempt to discover the actual path MTU for use in future fragmentation-needed reports associated with that ETR. It will perform this function by sending an ICMP echo-request message to the ETR set to the size of its interface MTU with the DF bit set. The ETR must respond to the echo-request with a normal echo-reply. All networks upstream of the ETR and ITR must permit the transmission of ICMP echo-request and echo-reply messages for the ETR and ITR.

Upon receiving a fragmentation needed message the ITR will reduce the packet size and try again. The ITR shall accept and cache the MTU for the ETR only after an echo-reply is successfully received. If the resulting MTU is larger than 1280 bytes, the ITR will periodically re-poll the ETR to verify that the path MTU has not fallen. Such re-polls may only be sent in conjunction with actual packets sent via the ETR which are larger than 1280 bytes.

ICMP echo-request polls used to determine and verify the path MTU will be sent to an ETR no more frequently than one packet per minute. They will only be sent in conjunction with other traffic to the ETR, thus if packets to the ETR cease, so will path MTU discovery attempts.

If the ITR receives a host unreachable (not fragmentation needed) message in response to one of its path MTU discovery polls, it will assume the ETR is unreachable and will cease using it until it has successfully received a ping response from the ETR. Such pings shall be the minimum ping size, will be attempted no more frequently than once per minute and will only be performed as a result of packet traffic that would otherwise be sent to that ETR.

If the ITR receives no response to three path MTU discovery ping requests in a row it will immediately attempt one echo-request again using the minimum size. If it receives no response to that as well, it will assume the ETR is unreachable as above. If it does receive a response it will assume that path MTU discovery is broken along the path, use a path MTU of 1280 bytes and cease trying to discover a larger MTU.
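The discovery rules above amount to a small per-ETR state machine. The sketch below is one possible reading of them, with no real network I/O: the caller reports events and asks when a poll may be sent. The class, method and state names are invented for the example, the 28-byte minimum ping (IPv4 header plus ICMP echo header, no payload) is an assumption, and the single traffic_pending flag simplifies the rule that re-polls accompany packets larger than 1280 bytes.

```python
MIN_PATH_MTU = 1280
MIN_PING_SIZE = 28    # assumption: 20-byte IP header + 8-byte ICMP header
POLL_INTERVAL = 60.0  # at most one poll per minute, in seconds


class PathMtuProbe:
    def __init__(self, interface_mtu: int):
        self.probe_size = interface_mtu  # size of the next echo-request
        self.path_mtu = MIN_PATH_MTU     # cached value used for reports
        self.confirmed = False           # set once a probe has been answered
        self.misses = 0
        self.state = "probing"  # probing | fallback | disabled | unreachable
        self.last_poll = None

    def next_poll(self, now: float, traffic_pending: bool):
        """Return the echo-request size to send now, or None."""
        if self.state == "disabled" or not traffic_pending:
            return None
        if self.last_poll is not None and now - self.last_poll < POLL_INTERVAL:
            return None
        if (self.state == "probing" and self.confirmed
                and self.path_mtu <= MIN_PATH_MTU):
            return None  # nothing larger than the minimum to re-verify
        self.last_poll = now
        return MIN_PING_SIZE if self.state == "unreachable" else self.probe_size

    def on_reply(self, size: int):
        if self.state == "fallback":
            # Small pings pass but large ones vanish: discovery is broken.
            self.state, self.path_mtu = "disabled", MIN_PATH_MTU
        elif self.state == "unreachable":
            self.state, self.misses = "probing", 0
        else:
            self.path_mtu, self.confirmed, self.misses = size, True, 0

    def on_frag_needed(self, reported_mtu: int):
        self.probe_size = max(MIN_PATH_MTU, reported_mtu)
        self.misses = 0

    def on_host_unreachable(self):
        self.state = "unreachable"

    def on_timeout(self):
        """Return the size of an echo-request to send immediately, or None."""
        if self.state == "fallback":
            self.state = "unreachable"
        elif self.state == "probing":
            self.misses += 1
            if self.misses >= 3:
                self.state = "fallback"
                return MIN_PING_SIZE
        return None
```

While the state is "unreachable" the ITR would stop using the ETR; the minimum-size pings returned by next_poll are what eventually bring it back.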

IPv6 version:

DNS Route Specifier:

TRRP specifies routes via DNS TXT records in the format "pp,ii,route pp,ii,route ..." where pp is a hexadecimal priority, ii is a protocol Identifier and route is the egress information needed by the Identifier.

Identifier: v6 = Via for IPv6.

Route: The "route" parameter offers the IP address of the router the packet should hit before being sent to the destination address. 

Example:

The ITR makes a DNS TXT request for 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.0.4.6.8.a.0.c.2.0.0.2.v6.trrp.arpa. It receives one TXT record in response:

80,v6,2000:abcd::1 90,g4,1.2.3.4

The best priority route uses the IPv6 "Via" protocol to route the packet first to 2000:abcd::1 before releasing it to the address in the header's destination field.
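The query names in both examples follow the familiar reverse-lookup pattern: dotted octets reversed for IPv4, hexadecimal nibbles reversed for IPv6. A small sketch of that construction follows, using the standard ipaddress module. The function name is illustrative, and the v4.trrp.arpa and v6.trrp.arpa suffixes are simply taken from the examples.

```python
import ipaddress


def trrp_query_name(address: str) -> str:
    """Build the DNS name an ITR would query for a destination address."""
    ip = ipaddress.ip_address(address)
    if ip.version == 4:
        labels = str(ip).split(".")                # one label per octet
        suffix = "v4.trrp.arpa"
    else:
        labels = list(ip.exploded.replace(":", ""))  # one label per nibble
        suffix = "v6.trrp.arpa"
    return ".".join(reversed(labels)) + "." + suffix


print(trrp_query_name("192.168.101.1"))  # 1.101.168.192.v4.trrp.arpa
```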

IP option header:

[to do]

General notes (IPv4 and IPv6)

Security Considerations

Packet filtering firewalls may process packets with the Via option as if the option were not present. When serving a multihomed customer, take care to allow for the possibility that the network will emit packets with nonlocal source addresses but destination addresses which correspond to the customer.

The ETR must not blindly decode packets sent to it. It must first check against a list of destinations it considers itself to be authoritative for. Packets for destinations for which it is not authoritative should be discarded. Note that during failures the ETR may be authoritative for packets for which it does not currently have a route, thus the presence or absence of routes in its routing table may not be used to determine whether the ETR is authoritative.
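A minimal sketch of that check is shown below. The authoritative list is held as configured prefixes, kept entirely separate from the routing table, so a destination stays acceptable even while its route is down. The class name and the example prefix are hypothetical.

```python
import ipaddress


class EtrPolicy:
    def __init__(self, authoritative_prefixes):
        # Configured list, deliberately independent of current routes.
        self.authoritative = [ipaddress.ip_network(p)
                              for p in authoritative_prefixes]

    def accepts(self, final_destination: str) -> bool:
        """True if the ETR may decode a packet bound for this destination."""
        addr = ipaddress.ip_address(final_destination)
        return any(addr in net for net in self.authoritative)


policy = EtrPolicy(["192.168.101.0/24"])
print(policy.accepts("192.168.101.1"))  # True
print(policy.accepts("8.8.8.8"))        # False, discard the packet
```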

Discussion:

Should we include the ITR's IP address in the Via header so that if the packet falls back into an ITR later that ITR can notify the original ITR that the ETR is unreachable? Or is that situation presumed to be rare enough that it's not worth the extra protocol work? After all, if the ETR has reachability problems, the DNS Route Server should notice and cut it from the list.

Joel Halpern mentions that many routers will kick any IPv4 packets which contain options to the slow path, that is, software forwarding instead of hardware forwarding. This is because options are currently rare and the router can't know ahead of time whether it needs to take any action on a packet containing options. IPv6 attempted to solve this problem with the RouterAlert option but the routers generally still kick the packet to the slow path in order to read the options list and determine whether the packet needs to be handled specially.

The upside of this approach is:

  1. It's the original IP packet on the wire, so filtering and ICMP messages mostly work as expected.
  2. It consumes as little as 8 bytes of overhead.
  3. It supports a relatively smooth failover where packets can be rerouted by the first destination network after the failure occurs but before new best-route information has propagated.

The downside is:

  1. Faulty backbone links can result in situations where the host cannot understand or recover from Path MTU Discovery.
  2. The ITR may be slow to detect that an ETR is completely unreachable since any host unreachable packets will return to the originating host.

The Via IP option was inspired by a discussion with Owen DeLong. He deserves much of the credit and none of the blame. Robin Whittle offered insights which moved Via from being a neat idea to actually workable.