Bill's throwaway Multipath TCP protocol

Designed to

General design:

TCP Options

Multipath negotiates compatibility and initiates multipath operation via a new TCP packet Option "Multipath TCP".

Announce Multipath Capability

8 bits Multipath TCP Option
8 bits size = 3
3 bits 0 = initialize
1 bit 0 or 1, pause user data
4 bits reserved

Must be present in both the SYN and SYN-ACK packets if multipath TCP will be used for this connection. If missing from either packet, Multipath TCP is not supported on this connection.

Exception: if this connection is intended to be a subflow which joins an existing Multipath TCP connection, it must instead contain a Connection ID IS Multipath TCP option.

Multipath TCP may or may not start with the SYN packet. When and if it does start, it will start with a Propose Connection ID option sent only by the connection's initiator.

The Announce Multipath TCP option MUST NOT be sent in any packet besides the SYN and SYN/ACK.

The pause user data bit is associated with negotiating encryption or any other setup activity which must complete before user data transmission can begin. Implementation is optional, but if not implemented or if the recipient is unwilling to pause, the recipient MUST RST the connection. The sender MUST NOT set the pause bit unless it intends to RST the connection if MultipathTCP is not supported.
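As a rough illustration of the 3-byte Announce layout above, the sketch below packs and parses the option. It assumes the fields are packed most-significant-bit first, and it uses a hypothetical option kind number (253, one of the experimental TCP option kinds) because this draft does not assign one.

```python
import struct

# Hypothetical placeholder: this draft does not assign a TCP option kind.
MPTCP_OPTION_KIND = 253  # assumption: experimental TCP option kind


def build_announce(pause: bool, kind: int = MPTCP_OPTION_KIND) -> bytes:
    """Encode kind, size=3, then subtype(3 bits) | pause(1 bit) | reserved(4 bits)."""
    subtype = 0  # 0 = initialize
    flags = (subtype << 5) | (int(pause) << 4)  # reserved bits stay zero
    return struct.pack("!BBB", kind, 3, flags)


def parse_announce(opt: bytes) -> bool:
    """Return the pause bit, or raise ValueError if this is not an Announce option."""
    if len(opt) != 3 or opt[1] != 3:
        raise ValueError("Announce option must be exactly 3 bytes")
    if opt[2] >> 5 != 0:
        raise ValueError("not an Announce (subtype 0) option")
    return bool((opt[2] >> 4) & 1)
```

A receiver that parses a set pause bit but cannot honor it would RST the connection, per the paragraph above.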

Propose Connection ID

8 bits Multipath TCP Option
8 bits size = 8
3 bits 1 = request
45 bits Proposed connection ID

Propose a connection ID and offer to start Multipath TCP.

Proposed connection ID is a random number which is not already in use by the initiating host.

MUST only be sent by the connection's initiator and MUST NOT be sent more than once per connection, but MAY be sent in any TCP packet for which the initiator expects an ACK.
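The 8-byte options (Propose, Alternate, Reject, Connection ID IS) all share one shape: a 3-bit subtype followed by a 45-bit connection ID. A minimal sketch of packing that shape and picking a random unused ID follows. The option kind number and the function names are hypothetical, and most-significant-bit-first packing is assumed.

```python
import secrets
import struct
from typing import Set, Tuple

MPTCP_OPTION_KIND = 253  # hypothetical: the draft assigns no option kind
ID_BITS = 45


def pack_id_option(subtype: int, connection_id: int,
                   kind: int = MPTCP_OPTION_KIND) -> bytes:
    """Pack kind, size=8, then subtype (3 bits) and connection ID (45 bits)."""
    if not 0 <= subtype < 8:
        raise ValueError("subtype must fit in 3 bits")
    if not 0 <= connection_id < (1 << ID_BITS):
        raise ValueError("connection ID must fit in 45 bits")
    body = (subtype << ID_BITS) | connection_id  # 48 bits in total
    return struct.pack("!BB", kind, 8) + body.to_bytes(6, "big")


def unpack_id_option(opt: bytes) -> Tuple[int, int]:
    """Return (subtype, connection_id) from an 8-byte option."""
    if len(opt) != 8 or opt[1] != 8:
        raise ValueError("expected an 8-byte option")
    body = int.from_bytes(opt[2:], "big")
    return body >> ID_BITS, body & ((1 << ID_BITS) - 1)


def propose_connection_id(in_use: Set[int]) -> int:
    """Pick a random 45-bit ID that the initiating host is not already using."""
    while True:
        candidate = secrets.randbits(ID_BITS)
        if candidate not in in_use:
            return candidate
```

For example, `pack_id_option(1, propose_connection_id(set()))` would produce a Propose Connection ID option under these assumptions.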

Alternate Connection ID

8 bits Multipath TCP Option
8 bits size = 8
3 bits 2 = try another
45 bits Alternate proposed connection ID

The host supports multipath TCP but the proposed connection ID is not available for use. Propose an alternate connection ID. Should be sent in every packet (including ACKs) until the other side either accepts the connection ID or changes its proposal to a different connection ID than the last one proposed.

Packets proposing an alternate connection ID will go back and forth between the initiator and acceptor until one or the other accepts or rejects the proposed connection ID.

Reject Multipath TCP

8 bits Multipath TCP Option
8 bits size = 8
3 bits 3 = reject; non-multipath TCP continues
45 bits last proposed connection ID

Reject the attempt to initiate Multipath TCP and send no more Multipath TCP options. Regular TCP continues in which the source/destination IP address and the source/destination port form the connection ID.

Should be sent in exactly those packets which ACK a packet containing a Multipath TCP proposal after the host decides to abandon the attempt to start Multipath TCP.

If the hosts can't agree on a connection ID, they should give up after X tries.

A host which supports multipath TCP but does not wish to use it on this connection should reject multipath TCP.

Connection ID IS

8 bits Multipath TCP Option
8 bits size = 8
3 bits 7 = this packet's connection ID is
45 bits connection ID

The established connection ID of this TCP connection. Once the first packet with this Multipath TCP option is received, every later packet must carry the option, with two exceptions: ACKs for outstanding segments transmitted before this packet are still accepted without it, and retransmissions of segments sent before this packet are still sent without it.

Once this option is received for the first time, the original TCP connection is demoted to a subflow of the Multipath TCP connection. For every packet containing this option, the data portion of the TCP packet is broken up further.

Subflows maintain their own sequence numbers independent from the overall Multipath TCP connection. Multipath's starting sequence number is the same sequence number as the subflow's in the first packet containing the Connection ID IS option.

If this option is present in an initiator's SYN packet, it indicates an attempt to join a new subflow to an existing TCP connection. Such a packet must contain the same destination port number as the original TCP connection but may contain any values for source/destination IP and  If the SYN/ACK packet does not also contain the same connection ID, the initiator should RST and should not attempt to initiate communications with that destination IP address again unless the acceptor resends an add-address control.

Data section of a Multipath TCP packet

When Multipath TCP is operational (the packet contains Multipath option 7 above) the data portion of the packet is either exactly empty (for packets which only ACK data, SYN/FIN packets and so forth) or it contains the following structure:

32 bits MultipathTCP sequence number across all subflows
8 bits encryption employed starting with next byte. 0=none >0=which already-negotiated encryption.
8 bits reserved
16 bits start of user payload (data) in 32-bit words offset from the TCP data start + 2 words
0-262140 bytes MultipathTCP controls
remainder payload; user data
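A minimal sketch of splitting a data section according to the structure above. It assumes network byte order and does not attempt decryption, so for a non-zero encryption byte the returned controls and payload would still be ciphertext. The function name is hypothetical.

```python
import struct


def parse_data_section(data: bytes):
    """Return (sequence, encryption, controls, payload), or None when empty."""
    if not data:
        return None  # pure ACK, SYN, FIN and so forth: exactly empty
    if len(data) < 8:
        raise ValueError("data section shorter than its 8-byte header")
    seq, encryption, _reserved, offset_words = struct.unpack("!IBBH", data[:8])
    # The offset is counted in 32-bit words from the TCP data start + 2 words,
    # so a value of 0 means the payload starts right after the header.
    payload_start = (offset_words + 2) * 4
    if payload_start > len(data):
        raise ValueError("payload offset points past the end of the segment")
    return seq, encryption, data[8:payload_start], data[payload_start:]
```

With an offset of 1 the controls area is exactly one word long, which is enough for a single 4-byte control such as Finish Sending.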

 

Multipath TCP controls are a new set of options that apply to the operation of Multipath TCP. They follow the same form:

16 bits Offset to the start of the next option in 32-bit words + 1 word
1 bit 0=Instruction, 1=Proposal
7 bits Type
1-262136 bytes option data

An instruction requires no response. The other host will act if it supports the instruction or ignore the instruction if it does not.

A proposal MUST be accepted or rejected in the first packet which ACKs the packet that contained the proposal and any packet which ACKs the proposal's retransmissions. If the receiver does not understand the proposal, it MUST reject the proposal. All proposals contain a 4-byte token immediately following the proposal type which identifies the proposal in the associated acceptance or rejection instruction.  The sender MUST NOT reuse a particular token until it receives an explicit accept or reject.
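The common control form can be walked with a simple loop. This is a sketch assuming network byte order, where the 16-bit offset plus one word gives the total length of the control in 32-bit words. The helper names are hypothetical.

```python
import struct


def iter_controls(controls: bytes):
    """Yield (is_proposal, type, data) for each control in the controls area."""
    pos = 0
    while pos < len(controls):
        if len(controls) - pos < 4:
            raise ValueError("truncated control header")
        offset_words, type_byte = struct.unpack_from("!HB", controls, pos)
        length = (offset_words + 1) * 4  # offset 0 means a 4-byte control
        if pos + length > len(controls):
            raise ValueError("control runs past the end of the controls area")
        is_proposal = bool(type_byte & 0x80)  # top bit: 0=Instruction, 1=Proposal
        yield is_proposal, type_byte & 0x7F, controls[pos + 3:pos + length]
        pos += length


def proposal_token(data: bytes) -> int:
    """Proposals carry their 4-byte token immediately after the type byte."""
    return int.from_bytes(data[:4], "big")
```

A receiver would call `proposal_token` only on entries whose `is_proposal` flag is set, then answer with an accept or reject carrying the same token.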

 

Multipath TCP Controls:

Reject Proposal

16 bits Offset = 1 (8 bytes)
1 bit 0=Instruction
7 bits Type = 0 (reject proposal)
8 bits reserved
32 bits token

I reject the proposal offered with the given ack token.

Support: mandatory

Accept Proposal:

16 bits Offset = 1 (8 bytes)
1 bit 0=Instruction
7 bits Type=1 (accept proposal)
8 bits reserved
32 bits token

I accept the proposal offered with the given ack token and will use it in all packets I send after this one.

Support: mandatory
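A sketch of how a receiver might answer proposals with the two controls above, including the rule that an unknown proposal type must be rejected. The `handlers` table and its callables are hypothetical; only the 8-byte layout comes from this draft.

```python
import struct

REJECT_PROPOSAL = 0
ACCEPT_PROPOSAL = 1


def build_answer(accept: bool, token: int) -> bytes:
    """Offset = 1 (8 bytes), instruction type 0 or 1, reserved byte, 32-bit token."""
    ctype = ACCEPT_PROPOSAL if accept else REJECT_PROPOSAL
    return struct.pack("!HBBI", 1, ctype, 0, token)


def answer_proposal(proposal_type: int, token: int, handlers: dict) -> bytes:
    """Accept only if a handler exists for this type and it returns True."""
    handler = handlers.get(proposal_type)  # hypothetical: type -> callable
    accepted = handler is not None and bool(handler())
    return build_answer(accepted, token)
```

The answer would travel in the first packet that ACKs the proposal, and again in any packet that ACKs a retransmission of it.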

Unpause

Instruction type #2. Support Optional. See Encryption.

Finish Sending

16 bits Offset = 0 (4 bytes)
1 bit 0=Instruction
7 bits Type = 3 (FIN)
8 bits reserved

Half-close the overall Multipath TCP connection. Sender is done sending on all subflows. The whole connection fully closes when this same control is received in the other direction or when all subflows FIN.

A subflow TCP FIN (as opposed to this FIN control) in any direction is a request to close the subflow in both directions. The recipient of a subflow TCP FIN should cease queueing segments for the subflow and emit a TCP FIN for the subflow following transmission of the last queued segment.

Multipath TCP SHOULD attempt to cleanly TCP FIN all subflows before shutting down as a result of receiving a Multipath FIN Instruction on a connection where the recipient's side is already closed.

Support: mandatory

Abort and Reset Connection

16 bits Offset = 0 (4 bytes)
1 bit 0=Instruction
7 bits Type = 4 (RST)
8 bits reserved

Abort the Multipath TCP connection now, RSTing all subflows.

If the initiator receives RSTs on all subflows but does not receive this control, it MUST attempt to establish new subflows via any addresses it knows for itself and the acceptor and must keep trying until a timeout passes or one of the subflow connections succeeds.

Support: mandatory

I See You

16 bits Offset = 3 (length=16 bytes)
1 bit 0=Instruction
7 bits Type = 5 ( IPv4 I see you)
8 bits reserved
32 bits my IPv4 address
32 bits your IPv4 address
16 bits my TCP port
16 bits your TCP port

I understand this subflow to have the listed IPv4 addresses and port numbers. If you instruct me to take action on this subflow via a different subflow, use these values.

Middleboxes such as NATs may alter the subflow IP addresses perceived by either end of the connection. This control allows each end to understand what the other sees.

This control SHOULD be sent by both ends immediately after establishing a new subflow, including the demotion of the original TCP connection to a subflow. The same packet MAY contain user data.

Support: optional but strongly encouraged.

Control 6 is the same as control 5 but for IPv6 with the 32-bit IPv4 addresses replaced with 128-bit IPv6 addresses.
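A sketch of building the I See You control for either address family. The IPv4 offset of 3 is taken from the layout above. The IPv6 offset is not stated in this draft and is derived here from the general rule (total length in words minus one), so treat it as an assumption.

```python
import ipaddress
import struct


def build_i_see_you(my_ip: str, your_ip: str, my_port: int, your_port: int) -> bytes:
    """Encode control 5 (IPv4) or control 6 (IPv6) for one subflow."""
    mine = ipaddress.ip_address(my_ip)
    yours = ipaddress.ip_address(your_ip)
    if mine.version != yours.version:
        raise ValueError("both addresses must be the same IP version")
    ctype = 5 if mine.version == 4 else 6
    body = mine.packed + yours.packed + struct.pack("!HH", my_port, your_port)
    total = 4 + len(body)  # 16 bytes for IPv4, 40 bytes for IPv6
    # Header: offset in words, instruction bit 0 with 7-bit type, reserved byte.
    return struct.pack("!HBB", total // 4 - 1, ctype, 0) + body
```

Comparing the addresses in a received control with the ones seen locally is one way an end could notice that a NAT sits on the path.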

Abort Subflow

16 bits Offset = 3 (length=16 bytes)
1 bit 0=Instruction
7 bits Control = 7 ( IPv4 abort subflow now)
8 bits reserved
32 bits my IPv4 address
32 bits your IPv4 address
16 bits my TCP port
16 bits your TCP port

I am no longer capable of supporting the listed subflow and can't even bring it to a well ordered FIN. Please immediately abort it from your end and retransmit any affected packets via the other subflows.

Probably means I lost my IP address or the interface with that IP address went down. No host is required to send this control, but it SHOULD accept and process the control.

Support: optional but strongly encouraged.

Control 8 is the same as control 7 but for IPv6 with the 32-bit IPv4 addresses replaced with 128-bit IPv6 addresses.

Set Weight

16 bits Offset = 0 (4 bytes)
1 bit 0=Instruction
7 bits Type=9 (subflow weight)
8 bits weight

This subflow's weight for segment distribution purposes as defined in Accepting Addresses. Explicitly sets this subflow to a static weight in the direction from the control's receiver to its sender.

If this control is not sent, a subflow's default weight is:

Acceptor side: 128 / number of open subflows from the same initiator address
Initiator side: Acceptor's indicated address weight / number of open subflows with the same acceptor address

Support: optional but strongly encouraged.

Authentication

Security Rank

All connections have a security rank expressed as a 32-bit unsigned integer. The default rank is 0. Higher is better. Applications may choose to require a minimum security rank before unpausing the user data flow. If the security rank is not met, Multipath TCP MUST RST the connection after all possible attempts to achieve the security rank fail.

Accepting Addresses

16 bits Offset = variable
1 bit 0=Instruction
7 bits Type = 10 (accepting IPv4 addresses)
8 bits number of IP addresses
8 bits first weight
8 bits second weight
[...]
8 bits nth weight
0-24 bits null padding to 32-bits
32 bits first IPv4 address
32 bits second IPv4 address
[...]
32 bits nth IPv4 address

Set the IPv4 addresses and weights which the connection's initiator may attempt to initiate subflows to the acceptor on.
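A sketch of encoding the IPv4 Accepting Addresses control, showing where the null padding falls: the weight bytes are padded to a 32-bit boundary before the address list starts. The function name and the list-of-pairs input are hypothetical.

```python
import ipaddress
import struct


def build_accepting_addresses(entries) -> bytes:
    """entries: list of (ipv4_address_string, weight 0-255) pairs."""
    if len(entries) > 255:
        raise ValueError("the address count must fit in 8 bits")
    weights = bytes(weight for _, weight in entries)
    weights += b"\x00" * (-len(weights) % 4)  # 0-24 bits of null padding
    addresses = b"".join(ipaddress.IPv4Address(a).packed for a, _ in entries)
    total = 4 + len(weights) + len(addresses)
    # Header: offset in words, instruction type 10, number of addresses.
    return struct.pack("!HBB", total // 4 - 1, 10, len(entries)) + weights + addresses
```

Two addresses give 4 header bytes, 4 bytes of padded weights and 8 address bytes, so the offset field carries 3.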

The weight means: everything else being equal, open [weight] subflows via this address for every SUM(weights) subflows you open and send [weight] bytes via the subflows for each address for every SUM(weights) bytes you send.

Weight indicates administrative preference, not congestion control. Congestion control MUST be handled independent of and take priority over weight. The Multipath TCP's total throughput MUST NOT be throttled to meet the administrative weight.

A weight of 0 means: do not attempt to start a subflow via this IP address or send any user payload packets via a subflow to this address unless you have determined that you can no longer reach me via any non-zero weighted addresses. The initiator may but is not required to FIN subflows whose weight changes to 0.

The host MUST respect the meaning of the zero weight (use only as a last resort). It SHOULD use any other weight hints to scale its own decision about which subflows to create.
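The weight rule can be read as a target share per address. The sketch below computes that share and applies the zero-weight rule. Splitting equally among zero-weight addresses in the last-resort case is an assumption, since the draft does not say how to divide traffic there, and the `reachable` set stands in for whatever reachability test a host uses.

```python
def address_shares(weights: dict, reachable: set) -> dict:
    """Map each usable address to its target fraction of subflows and bytes."""
    usable = {a: w for a, w in weights.items() if w > 0 and a in reachable}
    if usable:
        total = sum(usable.values())
        return {a: w / total for a, w in usable.items()}
    # Last resort: no non-zero weighted address is reachable any more.
    fallback = [a for a, w in weights.items() if w == 0 and a in reachable]
    return {a: 1 / len(fallback) for a in fallback}  # assumption: equal split
```

These shares are only the administrative hint. Congestion control still takes priority, and total throughput is not throttled to hit them.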

A host SHOULD scale the packet distribution choice based on the subflows' packet loss rate, round trip times and the weight hints in that order. Recommendations on the dynamic alteration of weight as a function of loss rates and round trip delays to be made after testing.

Regardless of weight, the initiator MUST NOT open more than one subflow to the acceptor with the same address pair.

The default weight for any address is 128.

The acceptor may be behind a NAT using RFC1918 addresses. If so the acceptor SHOULD NOT include RFC1918 addresses in the list of accepting addresses unless the originator is also using an RFC1918 address. The acceptor MAY include addresses which it has been explicitly configured to expect the NAT device to translate its RFC1918 addresses to.
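One way an acceptor might apply the RFC1918 rule when building its address list is sketched below. The function names and the `configured_external` parameter (addresses the operator has configured as the NAT's translations) are hypothetical.

```python
import ipaddress

RFC1918_NETWORKS = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(addr: str) -> bool:
    """True if addr falls inside one of the three RFC1918 private ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918_NETWORKS)


def advertisable(local_addrs, initiator_addr, configured_external=()):
    """Filter the acceptor's addresses before listing them as accepting addresses."""
    keep_private = is_rfc1918(initiator_addr)  # initiator is private as well
    result = [a for a in local_addrs if keep_private or not is_rfc1918(a)]
    result += [a for a in configured_external if a not in result]
    return result
```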

Support: mandatory

Control 11 is the same as control 10 but for IPv6 with the 32-bit IPv4 addresses replaced with 128-bit IPv6 addresses.

Initiator's Addresses

16 bits Offset = variable
1 bit 1=Proposal
7 bits Type=5 (initiator's IPv4 addresses)
32 bits ack token
8 bits number of IP addresses
32 bits first IPv4 address
32 bits second IPv4 address
[...]
32 bits nth IPv4 address

Authentication option. The initiator asserts that the listed IPv4 addresses are the only valid source IPv4 addresses from which the acceptor should permit subflows to be established. If any subflows exist which do not match this IPv4 list, immediately RST the whole Multipath TCP connection.

A control with 0 addresses means no IPv4 source addresses are legitimate for this initiator.

The initiator SHOULD avoid sending this option if he has reason to believe he is behind a NAT firewall and does not explicitly know his external IP addresses.

If the connection's security rank is less than 5, acceptance of this proposal sets it to 5.

Proposal type #8 is the same for IPv6 addresses.

Support: optional

Hostname

16 bits Offset = variable
1 bit 1=Proposal
7 bits Type=12 (my hostname is)
32 bits ack token
1-n bytes DNS hostname padded to a 32 bit boundary with zeros

Use hostname to determine the list of my valid IP addresses.

Once a host accepts the hostname, it will determine the source's IP addresses by performing a DNS lookup for A and AAAA records for the given hostname.

The recipient MUST FIN any open subflows associated with IP addresses which are not attached to the hostname. However, the recipient MUST NOT FIN the subflow on which the hostname control was received if that subflow is the last one open.

A host MUST NOT accept subflow requests which do not originate from one of the IP addresses listed in the DNS lookup. It MUST RST such subflow requests. It SHOULD re-query the hostname each time such a RST is made, but it MUST NOT query more than once per 10 seconds.
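The RST-then-re-query rule with its 10 second floor could look roughly like this. The `resolve` callable is a hypothetical stand-in for a DNS lookup of A and AAAA records, and the injectable `clock` exists only to make the sketch testable.

```python
import time


class HostnameGate:
    """Decide whether a subflow request's source address matches the hostname."""

    MIN_REQUERY_INTERVAL = 10.0  # seconds; MUST NOT query more often than this

    def __init__(self, resolve, clock=time.monotonic):
        self._resolve = resolve  # hypothetical: returns an iterable of IP strings
        self._clock = clock
        self._allowed = set(resolve())
        self._last_query = clock()

    def permits(self, source_ip: str) -> bool:
        """Return True to accept the request, False to RST it."""
        if source_ip in self._allowed:
            return True
        now = self._clock()
        if now - self._last_query >= self.MIN_REQUERY_INTERVAL:
            # Re-query after deciding to RST, so only later requests benefit.
            self._allowed = set(self._resolve())
            self._last_query = now
        return False
```

Note that the request which triggers the re-query is still refused. The refreshed address list only helps the next attempt.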

Where a hostname identifies the acceptor, the initiator SHOULD add any IP addresses it does not already know from the Accepting IP address controls to its list of addresses for the acceptor with a weight of 0.

Where the initiator loses all its subflows and is unable to establish new subflows on any known source/destination pairs, it SHOULD re-perform the DNS lookup to determine if any more IP addresses are available for the remote host.

Note that the hostname presented in the Hostname option is not necessarily the same host name used to first establish the connection. A query for www.example.com may offer addresses for two physical machines, each of which maintains a distinct hostname for the purposes of maintaining Multipath TCP.

The recipient MUST NOT use any IP addresses offered by other means if they do not match an address retrieved by looking up the given hostname. However, it SHOULD remember those addresses so that it can use them if they later become available in a re-query of the hostname.

A host MAY re-query the offered host name at any time and proceed with the IP address information then available.

Note that the initiator may re-poll the DNS entry for the acceptor up until the connection timeout, trying to find new addresses at which to reach the acceptor and start a new subflow.

Note that the acceptor can not initiate new subflows to the initiator if his addresses change too rapidly to communicate the fact of the failure along the open subflows. The acceptor should keep the Multipath TCP connection open for the full time out even if all subflows have failed so that if the initiator discovers the failure (e.g. by sending keepalives on a subflow) it can attempt to establish new ones and continue the connection.

The initiator SHOULD avoid sending this option if he has reason to believe he is behind a NAT firewall and does not explicitly know his external IP addresses.

If the connection's security rank is less than 10, acceptance of this proposal sets it to 10.

Support: optional

Re-Query

16 bits Offset = 0 (4 bytes)
1 bit 0=Instruction
7 bits Type=12 (re-query hostname)
8 bits reserved

Re-query my hostname now. My addresses have either just changed or the TTL has expired after a change, so new addresses are available and old addresses may have been removed.

Support: optional

 

Encryption:

The Pause Bit

The Pause Bit appears in the Multipath TCP option in the SYN and SYN/ACK packets of the original connection. The pause bit is a demand to negotiate encryption or authentication. Support is optional but if the recipient does not support the pause bit, it MUST RST the connection.

If pause user data is set on either end, no user data will be sent until the Multipath TCP unpause control is received after initiating a multipath connection.

If either end wants to negotiate encryption or some other capability before sending user data, it should set the pause bit. Once set in the SYN packet it MUST also be set in the SYN/ACK packet. If it is not, the initiator MUST RST the connection.

If the initiator or acceptor would like to engage opportunistic encryption, they MAY leave the pause bit clear but then attempt encryption negotiation anyway before sending user data.

Unpause

16 bits Offset = 0 (4 bytes)
1 bit 0=Instruction
7 bits Type = 2 (unpause)
8 bits reserved

End the user data pause request made in the SYN packet. If the connection is not yet capable of the minimum security rating required by the application or the packet containing the Unpause does not meet the minimum security rating, the recipient MUST RST the connection.

Any new subflow joined to the connection MUST receive an unpause before the host transmits or accepts any user data.

If the sender subsequently offers any packets whose encryption method has a lower security rating than the connection requires, the subflow on which the packet is received MUST be RST.

Support: optional.

Public Key

16 bits Offset = variable
1 bit 1=Proposal
7 bits Type: 13 (offer a public key)
32 bits ack token
8 bits key id
8 bits preference
2 bits 0-3 trailing bytes are not part of the key
1 bit verify signature
5 bits length of encryption name in bytes +2
2-34 bytes encryption name
1-n bytes the key

Please send any further traffic to me only after first encrypting it with the listed public key.

KeyID is the same number as the "encryption employed" byte in the TCP data segment. If KeyID is 0, KeyType must also be 0 and the preference setting will be used to determine whether to send data unencrypted.

Preference dictates which of the KeyID's I've set that I'd like you to use. Use the highest-numbered preference key currently available. A preference of 0 deletes the given key ID.

Each possible encryption algorithm and each possible format for offering a signed or unsigned key will have a unique name. For each such encryption the receiver supports, it will have an algorithm which computes a numeric "security rating" for the offered encryption, taking into account whether the key signature is verified, how long the key is and the general security of the algorithm itself. If the computed security rating is below the minimum acceptable rating the application has requested or application requirements like signature verification have not been met, the recipient MUST reject the proposal.

Trailing bytes: to pad this control to 32 bits, between 0 and three trailing null bytes are padded on the end. The actual key is present in the control after the encryption name and before these padding bytes.

If Verify Signature is set, the recipient must verify the signature on the offered public key before using it. If the verification fails, the receiver must terminate the entire multipath TCP connection with a RST. If verification is not requested, the recipient MAY attempt to verify any signatures found on the key, but it MUST NOT lower the KeyID's security rating or take any other adverse action if the key verification fails.

KeyID 0 is always no encryption and may not be overridden. KeyID 0's default preference is 128.

Note that calling this the "public key proposal" is technically a misnomer since the "key" could as easily be a reference to a secret key already available to both machines. The central point is that the contents of the key may be sent in the clear as it does not contain the information necessary to decrypt packets encrypted by the requested algorithm.

16 bits Offset = variable
8 bits Control = 13 (supported public key encryption names)
32 bits ack token
1-n bytes encryption name list

If a host rejects a public key encryption proposal but supports public key encryption, the TCP packet containing the rejection control must also contain a supported encryptions list control. If it contains only a rejection, the host which proposed encryption MUST assume that no public key encryptions are supported on the connection.

Encryption name list: a nul (ascii 0) padded list of \n (ascii 10) separated names of public and private key encryption methods supported on this connection sorted in order of preferred encryptions first.

 

16 bits Offset = variable
8 bits Control = 132 (offer a secret key)
32 bits ack token
8 bits key id
8 bits preference
2 bits 0-3 trailing bytes are not part of the key
1 bit verify signature
5 bits length of encryption name in bytes +2
2-34 bytes encryption name
1-n bytes the key

Mostly the same as offer public key except it applies to a symmetric cipher using a shared secret key.

The shared secret proposal MUST be offered in a packet encrypted by a previously defined public or secret key encryption method and MUST be rejected if it is not. The sender SHOULD use the highest ranked encryption method defined to send the key, even if the highest ranked encryption is not the one preferred by the recipient.

The KeyID's security rating is the lesser of this algorithm's security rating and the security rating of the KeyID used to transmit this proposal. After all, this algorithm is only as secure as the secrecy of the key.
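The two rules above, send the secret under the highest rated key and cap the new key's rating at that of its transport, can be sketched as follows. The `ratings` mapping from KeyID to security rating is a hypothetical data structure. KeyID 0 is skipped because it always means no encryption.

```python
def choose_transport_key(ratings: dict) -> int:
    """Pick the already-negotiated KeyID with the highest security rating."""
    candidates = {key_id: r for key_id, r in ratings.items() if key_id != 0}
    if not candidates:
        raise ValueError("a secret key may only be offered in an encrypted packet")
    return max(candidates, key=candidates.get)


def secret_key_rating(algorithm_rating: int, transport_rating: int) -> int:
    """The new KeyID is only as secure as the channel that carried its key."""
    return min(algorithm_rating, transport_rating)
```

For instance, a cipher rated 90 whose key was sent under a key rated 75 ends up rated 75 under this rule.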

16 bits Offset = variable
8 bits Control = 14 (supported secret key encryption names)
32 bits ack token
1-n bytes encryption name list

Same as supported public key names but sent with a secret key rejection instead of a public key rejection.