Bill's throwaway Multipath TCP protocol

Designed to

General design:

TCP Options

Multipath TCP negotiates compatibility and initiates multipath operation via a new TCP option, "Multipath TCP".

Start Multipath TCP

8 bits Multipath TCP Option
8 bits size = 8
1 bit 0 or 1, pause user data
1 bit 0 = new multipath TCP connection
46 bits My Connection ID

Offer to initiate multipath TCP.  Must be present in all packets with the SYN flag set. If missing from either the SYN or SYN/ACK, multipath TCP is disabled.

My Connection ID is a randomly chosen 46-bit unsigned integer which uniquely identifies the TCP connection to the sender. If the receiver needs to negotiate any actions associated with this connection with the sender, it must reference the connection by this ID.

The pause user data bit is associated with negotiating encryption or any other setup activity which must complete while the connection is still synchronized before user data transmission can begin. Implementation is optional, but if not implemented or if the recipient is unwilling to pause, the recipient MUST RST the connection. The sender MUST NOT set the pause bit unless it intends to RST the connection if MultipathTCP is not supported.

Join Multipath TCP

8 bits Multipath TCP Option
8 bits size = 8
1 bit 0 = no pause
1 bit 1 = join existing multipath TCP connection
46 bits Your Connection ID

Ask to join a new subflow to an existing multipath TCP connection.

In the direction of the connection's initiator to the acceptor, this option MUST be present in a SYN packet sent to the acceptor's alternate address using the same destination TCP port as the other subflows. It MAY use different source ports than the other subflows. It likewise MUST be present in the SYN/ACK packet accepting the new subflow. If missing from either or if the connection ID in the SYN/ACK is not what the initiator is expecting, the subflow must be RST.

In the direction of the connection's acceptor back to the initiator, it's anticipated that TCP SYN packets are blocked. Accordingly, Multipath TCP should send a packet including this option with the SYN flag clear, the ACK flag set and the acked sequence number set to zero. The initiator should respond with a packet containing this option with the SYN flag clear, the ACK flag set and the sequence numbers correct.

The pause user data flag MUST be set to 0. Pause is only supported when the connection is first initiated, not when a subflow starts. If the original pause is still in effect when the subflow starts, it will be in effect for the subflow. Otherwise it won't. The receiver SHOULD RST any subflow join attempts which have the pause flag set.

Your Connection ID is the 46-bit unsigned integer that was offered by the destination when multipath TCP first started.
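
As an illustration of the two option layouts above (Start and Join share the same 8 bytes), here is a minimal Python sketch of packing and parsing. Several details are assumptions rather than part of this design: the option kind value (253 is only a placeholder), network byte order, and the placement of the pause bit as the most significant bit of the 48-bit field. The function names are hypothetical.

```python
import secrets
import struct

OPTION_KIND = 253  # placeholder: the text does not assign an option kind number
OPTION_SIZE = 8    # total option size in bytes, as given in the layout
ID_BITS = 46
ID_MASK = (1 << ID_BITS) - 1


def new_connection_id() -> int:
    """Pick a random 46-bit unsigned connection ID."""
    return secrets.randbits(ID_BITS)


def pack_option(connection_id: int, pause: bool = False, join: bool = False) -> bytes:
    """Pack kind, size, pause bit, join bit and the 46-bit ID into 8 bytes."""
    if not 0 <= connection_id <= ID_MASK:
        raise ValueError("connection ID must fit in 46 bits")
    if join and pause:
        raise ValueError("pause must be 0 when joining a subflow")
    # Assumed bit order: pause is bit 47, join is bit 46, ID is bits 45..0.
    body = (int(pause) << 47) | (int(join) << 46) | connection_id
    return struct.pack("!BB", OPTION_KIND, OPTION_SIZE) + body.to_bytes(6, "big")


def unpack_option(raw: bytes) -> tuple[bool, bool, int]:
    """Return (pause, join, connection_id) from an 8-byte option."""
    if len(raw) != OPTION_SIZE or raw[0] != OPTION_KIND or raw[1] != OPTION_SIZE:
        raise ValueError("not a Multipath TCP option")
    body = int.from_bytes(raw[2:], "big")
    return bool(body >> 47), bool((body >> 46) & 1), body & ID_MASK
```

A Start option would be built with pack_option(new_connection_id(), pause=...) and a Join option with pack_option(peer_id, join=True), where peer_id is the ID the other side offered at connection start.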

Data carried inside Multipath TCP

Once multipath TCP is established, each subflow consists of a byte stream that collects into chunks. A chunk is essentially a packet encoded into the byte stream. Each chunk contains either exactly one multipath TCP "control" or some number of bytes of user data. These data chunks may only be delivered to the multipath connection in order by the chunk sequence number. A particular chunk will be transmitted in its entirety over the same subflow.

The format of a chunk is as follows:

32 bits Chunk sequence number across all subflows. Increases by exactly 1 for each chunk.
16 bits length of chunk in bytes starting with byte 8.
8 bits chunk encoding starting with next byte. 0=none >0=which already-negotiated encoding.
8 bits Chunk type. 0=user data. >0 = Multipath TCP control ID
0-65535 bytes Chunk data
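
The chunk format above can be sketched in Python as follows. This is a rough illustration, not a reference implementation. It assumes network byte order, assumes the length field counts only the data bytes that follow the 8-byte header, and assumes the sequence number wraps at 2^32. The names pack_chunk and unpack_chunk are hypothetical.

```python
import struct

# Sequence number (32), length (16), encoding (8), chunk type (8).
CHUNK_HEADER = struct.Struct("!IHBB")


def pack_chunk(seq: int, chunk_type: int, data: bytes, encoding: int = 0) -> bytes:
    """Encode one chunk as header plus data."""
    if len(data) > 0xFFFF:
        raise ValueError("chunk data is limited to 65535 bytes")
    # Assumed wraparound of the 32-bit sequence number.
    return CHUNK_HEADER.pack(seq & 0xFFFFFFFF, len(data), encoding, chunk_type) + data


def unpack_chunk(stream: bytes):
    """Return (seq, encoding, chunk_type, data, rest), or None if incomplete."""
    if len(stream) < CHUNK_HEADER.size:
        return None
    seq, length, encoding, chunk_type = CHUNK_HEADER.unpack_from(stream)
    end = CHUNK_HEADER.size + length
    if len(stream) < end:
        return None  # wait for more bytes from the subflow
    return seq, encoding, chunk_type, stream[CHUNK_HEADER.size:end], stream[end:]
```

Because a subflow is a byte stream, the parser returns None until a whole chunk has arrived and hands back the unconsumed remainder for the next call.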

In order to guarantee sufficient receive buffer space, the following rules must be observed:

  1. Unless the receiver has raised the maximum chunk size, the maximum chunk size the sender may use is 2040 bytes of data.
  2. The sender must completely transmit a chunk on a particular subflow before it may begin transmitting the next chunk on any subflow. If a change in window size prevents the complete transmission of a chunk, the sender must wait until the window opens.
  3. If a subflow collapses while any chunks are outstanding (unacked), the sender must open a new subflow (or wait for a new subflow to be opened) and retransmit the outstanding chunks before it may continue sending data on the other subflows. It MUST NOT attempt to retransmit the lost chunks on the other open subflows - that way leads to deadlock.

This means that for user data, a chunk should generally be sized to fit within the segment in which it is being transmitted.
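
A sketch of how a sender might cut user data into chunks under rule 1 and the sizing advice above. It assumes the 8-byte chunk header shares the segment with the data and that sequence numbers wrap at 2^32. Both helper names are hypothetical.

```python
from typing import Iterator

DEFAULT_MAX_CHUNK_DATA = 2040  # rule 1: limit unless the receiver raised it
CHUNK_HEADER_SIZE = 8


def chunk_data_limit(segment_payload: int, receiver_max: int = DEFAULT_MAX_CHUNK_DATA) -> int:
    """Largest data size that keeps header plus data inside one segment."""
    return max(1, min(receiver_max, segment_payload - CHUNK_HEADER_SIZE))


def split_user_data(data: bytes, next_seq: int,
                    max_data: int = DEFAULT_MAX_CHUNK_DATA) -> Iterator[tuple[int, bytes]]:
    """Yield (sequence number, data) pairs, one per chunk, in order."""
    if max_data < 1:
        raise ValueError("max_data must be at least 1")
    for offset in range(0, len(data), max_data):
        yield next_seq, data[offset:offset + max_data]
        next_seq = (next_seq + 1) % 2**32  # increases by exactly 1 per chunk
```

With a 1460-byte segment payload, chunk_data_limit gives 1452 bytes of data per chunk, so each chunk travels in a single segment.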

Controls:

Multipath TCP controls are a new set of options that apply to the operation of Multipath TCP. They consist of instructions and proposals.

An instruction requires no response. The other host will act if it supports the instruction or ignore the instruction if it does not.

A proposal MUST be accepted or rejected in the first packet which ACKs the packet that contained the proposal and any packet which ACKs the proposal's retransmissions. If the receiver does not understand the proposal, it MUST reject the proposal. All proposals contain a 4-byte token immediately following the proposal type which identifies the proposal in the associated acceptance or rejection instruction.  The sender MUST NOT reuse a particular token until it receives an explicit accept or reject.
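
One way a sender could honor the token reuse rule is to keep a table of outstanding proposals. The sketch below is only an illustration: the class name is hypothetical, and choosing tokens at random is an assumption, since the text only requires that a token not be reused while it is outstanding.

```python
import secrets


class ProposalTokens:
    """Track proposals that have not yet been accepted or rejected."""

    def __init__(self) -> None:
        self._pending: dict[int, object] = {}

    def allocate(self, proposal: object) -> int:
        """Reserve a 32-bit token that is not currently outstanding."""
        while True:
            token = secrets.randbits(32)
            if token not in self._pending:
                self._pending[token] = proposal
                return token

    def resolve(self, token: int) -> object:
        """Release a token on explicit accept or reject; return its proposal."""
        return self._pending.pop(token)

    def is_pending(self, token: int) -> bool:
        return token in self._pending
```

resolve raises KeyError for an unknown token, which is where an implementation would decide how to treat an accept or reject it never asked for.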

The chunk type field is broken into two parts:

1 bit high order bit, 0=instruction, 1=proposal.
7 bits Control ID.
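
The split of the chunk type byte can be sketched as two small Python helpers. The names are hypothetical. The only assumption is that "high order bit" means the value 0x80 in the byte.

```python
PROPOSAL_BIT = 0x80  # high order bit: 0 = instruction, 1 = proposal
CONTROL_ID_MASK = 0x7F


def make_chunk_type(control_id: int, proposal: bool = False) -> int:
    """Combine the proposal flag and a 7-bit control ID into one byte."""
    if not 0 <= control_id <= CONTROL_ID_MASK:
        raise ValueError("control ID must fit in 7 bits")
    return (PROPOSAL_BIT if proposal else 0) | control_id


def split_chunk_type(chunk_type: int) -> tuple[bool, int]:
    """Return (is_proposal, control_id) for a chunk type byte."""
    return bool(chunk_type & PROPOSAL_BIT), chunk_type & CONTROL_ID_MASK
```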

Multipath TCP Chunks:

User Data

1 bit 0=instruction
7 bits 0=user data
0-65535 bytes user data

Bytes which will be passed to the application.

Support: mandatory

Reject Proposal

1 bit 0=Instruction
7 bits 1 = reject proposal
32 bits token

I reject the proposal offered with the given ack token.

Support: mandatory

Accept Proposal:

1 bit 0=Instruction
7 bits 2=accept proposal
32 bits token

I accept the proposal offered with the given ack token and will use it in all packets I send after this one.

Support: mandatory

Unpause

Instruction type #3. Support Optional. See Encryption.

Finish Sending

1 bit 0=Instruction
7 bits 4=FIN

Half-close the overall Multipath TCP connection. Sender is done sending user data on all subflows. No user data will be sent in later numbered chunks. The whole connection fully closes when this same control is received in the other direction or when all subflows FIN.

A subflow TCP FIN (as opposed to this FIN control) in any direction is a request to close the subflow in both directions. The recipient of a subflow TCP FIN should cease queueing segments for the subflow and emit a TCP FIN for the subflow following transmission of the last queued segment.

Multipath TCP SHOULD attempt to cleanly TCP FIN all subflows before shutting down as a result of receiving a Multipath FIN Instruction on a connection where the recipient's side is already closed.

Support: mandatory

Abort and Reset Connection

1 bit 0=Instruction
7 bits 5=RST

Abort the Multipath TCP connection now, RSTing all subflows.

If the initiator receives RSTs on all subflows but does not receive this control, it MUST attempt to establish new subflows via any addresses it knows for itself and the acceptor and must keep trying until a timeout passes or one of the subflow connections succeeds.

Support: mandatory

I See You

1 bit 0=Instruction
7 bits 6=IPv4 I see you
32 bits my IPv4 address
32 bits your IPv4 address
16 bits my TCP port
16 bits your TCP port

I understand this subflow to have the listed IPv4 addresses and port numbers. If you instruct me to take action on this subflow via a different subflow, use these values.

Middleboxes such as NATs may alter the subflow IP addresses perceived by either end of the connection. This control allows each end to understand what the other sees.
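
For illustration, the 12-byte body of the IPv4 I See You control could be built and read as below. This sketch assumes network byte order and covers only the control body, not the chunk header that carries it. The function names are hypothetical, and the addresses in the usage note are documentation examples.

```python
import ipaddress
import struct


def pack_i_see_you_v4(my_addr: str, your_addr: str, my_port: int, your_port: int) -> bytes:
    """Encode addresses and ports as the sender of the control sees them."""
    return (ipaddress.IPv4Address(my_addr).packed
            + ipaddress.IPv4Address(your_addr).packed
            + struct.pack("!HH", my_port, your_port))


def unpack_i_see_you_v4(payload: bytes) -> tuple[str, str, int, int]:
    """Return (sender address, receiver address, sender port, receiver port)."""
    if len(payload) != 12:
        raise ValueError("IPv4 I See You body is 12 bytes")
    my_port, your_port = struct.unpack("!HH", payload[8:])
    return (str(ipaddress.IPv4Address(payload[:4])),
            str(ipaddress.IPv4Address(payload[4:8])),
            my_port, your_port)
```

Comparing the decoded values with the local socket's own view of the subflow, for example 192.0.2.1:40000 against what the peer reports, shows whether a NAT rewrote the addresses in between.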

This control SHOULD be sent by both ends immediately after establishing a new subflow, including the demotion of the original TCP connection to a subflow. The same packet MAY contain user data.

Support: optional but strongly encouraged.

Control 7 is the same as control 6 but for IPv6 with the 32-bit IPv4 addresses replaced with 128-bit IPv6 addresses.

Abort Subflow

1 bit 0=Instruction
7 bits 8=IPv4 abort subflow now
32 bits my IPv4 address
32 bits your IPv4 address
16 bits my TCP port
16 bits your TCP port

I am no longer capable of supporting the listed subflow and can't even bring it to a well ordered FIN. Please immediately abort it from your end and retransmit any affected packets via the other subflows.

Probably means I lost my IP address or the interface with that IP address went down. No host is required to send this control, but it SHOULD accept and process the control.

Support: optional but strongly encouraged.

Control 9 is the same as control 8 but for IPv6 with the 32-bit IPv4 addresses replaced with 128-bit IPv6 addresses.

Set Weight

1 bit 0=Instruction
7 bits 10=subflow weight
8 bits weight

This subflow's weight for segment distribution purposes as defined in Accepting Addresses. Explicitly sets this subflow to a static weight in the direction from the control's receiver to its sender.

If this control is not sent, a subflow's default weight is:

Acceptor side: 128 / number of open subflows from the same initiator address
Initiator side: Acceptor's indicated address weight / number of open subflows with the same acceptor address
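
The two default weight formulas can be written out as a short sketch. Integer division is an assumption here (the weight field is 8 bits and the text does not say how to round), and the function names are hypothetical.

```python
def acceptor_default_weight(open_subflows_from_initiator_address: int) -> int:
    """128 divided by the number of open subflows from the same initiator address."""
    if open_subflows_from_initiator_address < 1:
        raise ValueError("need at least one open subflow")
    return 128 // open_subflows_from_initiator_address  # assumed: round down


def initiator_default_weight(acceptor_address_weight: int,
                             open_subflows_to_acceptor_address: int) -> int:
    """Acceptor's address weight divided by the open subflows sharing that address."""
    if open_subflows_to_acceptor_address < 1:
        raise ValueError("need at least one open subflow")
    return acceptor_address_weight // open_subflows_to_acceptor_address
```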

Support: optional but strongly encouraged.

Authentication

Security Rank

All connections have a security rank expressed as a 32-bit unsigned integer. The default rank is 0. Higher is better. Applications may choose to require a minimum security rank before unpausing the user data flow. If the security rank is not met, Multipath TCP MUST RST the connection after all possible attempts to achieve the security rank fail.

Accepting Addresses

1 bit 0=Instruction
7 bits 11=accepting IPv4 addresses
8 bits number of IP addresses
8 bits first weight
8 bits second weight
[...]
8 bits nth weight
32 bits first IPv4 address
32 bits second IPv4 address
[...]
32 bits nth IPv4 address

Set the IPv4 addresses, and their weights, to which the connection's initiator may attempt to initiate subflows.
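
A sketch of the body layout listed above: one count byte, then all the weights, then all the addresses. It assumes network byte order and covers only the control body. The function names are hypothetical.

```python
import ipaddress


def pack_accepting_v4(entries: list[tuple[str, int]]) -> bytes:
    """Encode (address, weight) pairs as count, weights, then addresses."""
    if len(entries) > 255:
        raise ValueError("the count field is 8 bits")
    weights = bytes(weight for _, weight in entries)  # each weight must be 0..255
    addresses = b"".join(ipaddress.IPv4Address(addr).packed for addr, _ in entries)
    return bytes([len(entries)]) + weights + addresses


def unpack_accepting_v4(payload: bytes) -> list[tuple[str, int]]:
    """Decode the body back into (address, weight) pairs."""
    if not payload:
        raise ValueError("empty body")
    count = payload[0]
    if len(payload) != 1 + 5 * count:
        raise ValueError("length does not match the address count")
    weights = payload[1:1 + count]
    entries = []
    for i in range(count):
        start = 1 + count + 4 * i
        address = ipaddress.IPv4Address(payload[start:start + 4])
        entries.append((str(address), weights[i]))
    return entries
```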

The weight means: everything else being equal, open [weight] subflows via this address for every SUM(weights) subflows you open and send [weight] bytes via the subflows for each address for every SUM(weights) bytes you send.

Weight indicates administrative preference, not congestion control. Congestion control MUST be handled independent of and take priority over weight. The Multipath TCP's total throughput MUST NOT be throttled to meet the administrative weight.

A weight of 0 means: do not attempt to start a subflow via this IP address or send any user payload packets via a subflow to this address unless you have determined that you can no longer reach me via any non-zero weighted addresses. The initiator may but is not required to FIN subflows whose weight changes to 0.

The host MUST respect the meaning of the zero weight (use only as a last resort). It SHOULD use any other weight hints to scale its own decision about which subflows to create.
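
The byte-share reading of weight, together with the zero-weight rule, can be sketched as a simple selection function. This is only one possible scheduler and is an assumption, not something this design prescribes: it picks the reachable address that is furthest behind its weighted share and falls back to zero-weight addresses only when no other address is reachable. Congestion control, which takes priority over weight, is deliberately left out. All names are hypothetical.

```python
from fractions import Fraction
from typing import Optional


def pick_address(weights: dict[str, int], bytes_sent: dict[str, int],
                 reachable: set[str]) -> Optional[str]:
    """Choose the address to send the next bytes to, or None if none is usable."""
    preferred = [a for a, w in weights.items() if w > 0 and a in reachable]
    if preferred:
        # Lowest bytes-per-weight is furthest behind its share; exact arithmetic.
        return min(preferred, key=lambda a: Fraction(bytes_sent.get(a, 0), weights[a]))
    # Zero weight means last resort only.
    last_resort = [a for a, w in weights.items() if w == 0 and a in reachable]
    return last_resort[0] if last_resort else None
```

With weights of 3 and 1, repeated calls settle into sending three quarters of the bytes to the first address and one quarter to the second.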

A host SHOULD scale the packet distribution choice based on the subflows' packet loss rate, round trip times and the weight hints in that order. Recommendations on the dynamic alteration of weight as a function of loss rates and round trip delays to be made after testing.

Regardless of weight, the initiator MUST NOT open more than one subflow to the acceptor with the same address pair.

The default weight for any address is 128.

The acceptor may be behind a NAT using RFC1918 addresses. If so the acceptor SHOULD NOT include RFC1918 addresses in the list of accepting addresses unless the originator is also using an RFC1918 address. The acceptor MAY include addresses which it has been explicitly configured to expect the NAT device to translate its RFC1918 addresses to.

Support: mandatory

Control 11 is the same as control 10 but for IPv6 with the 32-bit IPv4 addresses replaced with 128-bit IPv6 addresses.

Initiator's Addresses

1 bit 1=Proposal
8 bits Type=5 (initiator's IPv4 addresses)
32 bits ack token
8 bits number of IP addresses
32 bits first IPv4 address
32 bits second IPv4 address
[...]
32 bits nth IPv4 address

Authentication option. The initiator asserts that the listed IPv4 addresses are the only valid source IPv4 addresses from which the acceptor should permit subflows to be established. If any subflows exist which do not match this IPv4 list, immediately RST the whole Multipath TCP connection.

A control with 0 addresses means no IPv4 source addresses are legitimate for this initiator.

The initiator SHOULD avoid sending this option if he has reason to believe he is behind a NAT firewall and does not explicitly know his external IP addresses.

If the connection's security rank is less than 5, acceptance of this proposal sets it to 5.

Proposal type #8 is the same for IPv6 addresses.

Support: optional

Hostname

16 bits Offset = variable
1 bit 1=Proposal
7 bits Type=12 (my hostname is)
32 bits ack token
1-n bytes DNS hostname padded to a 32 bit boundary with zeros

Use hostname to determine the list of my valid IP addresses.

Once a host accepts the hostname, it will determine the source's IP addresses by performing a DNS lookup for A and AAAA records for the given hostname.

The recipient MUST FIN any open subflows associated with IP addresses which are not attached to the hostname. However, the recipient MUST NOT FIN the subflow on which the hostname control was received if that subflow is the last one open.

A host MUST NOT accept subflow requests which do not originate from one of the IP addresses listed in the DNS lookup. It MUST RST such subflow requests. It SHOULD re-query the hostname each time such a RST is made, but it MUST NOT query more than once per 10 seconds.
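
The limit of one query per 10 seconds could be enforced with a small gate like the one below. This is a sketch: the class name and the injectable clock are hypothetical, and the DNS lookup itself is left out.

```python
import time
from typing import Callable, Optional


class RequeryLimiter:
    """Allow a hostname re-query at most once per MIN_INTERVAL seconds."""

    MIN_INTERVAL = 10.0

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._last: Optional[float] = None

    def may_query(self) -> bool:
        """Return True and record the time if a query is allowed now."""
        now = self._clock()
        if self._last is not None and now - self._last < self.MIN_INTERVAL:
            return False
        self._last = now
        return True
```

A host would call may_query() each time it RSTs a subflow request from an unknown address and perform the lookup only when it returns True.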

Where a hostname identifies the acceptor, the initiator SHOULD add any IP addresses it does not already know from the Accepting IP address controls to its list of addresses for the acceptor with a weight of 0.

Where the initiator loses all its subflows and is unable to establish new subflows on any known source/destination pairs, it SHOULD re-perform the DNS lookup to determine if any more IP addresses are available for the remote host.

Note that the hostname presented in the Hostname option is not necessarily the same host name used to first establish the connection. A query for www.example.com may offer addresses for two physical machines, each of which maintains a distinct hostname for the purposes of maintaining Multipath TCP.

The recipient MUST NOT use any IP addresses offered by other means if they do not match an address retrieved by looking up the given hostname. However, it SHOULD remember those addresses so that it can use them if they later become available in a re-query of the hostname.

A host MAY re-query the offered host name at any time and proceed with the IP address information then available.

Note that the initiator may re-poll the DNS entry for the acceptor up until the connection timeout, trying to find new addresses at which to reach the acceptor and start a new subflow.

Note that the acceptor can not initiate new subflows to the initiator if his addresses change too rapidly to communicate the fact of the failure along the open subflows. The acceptor should keep the Multipath TCP connection open for the full time out even if all subflows have failed so that if the initiator discovers the failure (e.g. by sending keepalives on a subflow) it can attempt to establish new ones and continue the connection.

The initiator SHOULD avoid sending this option if he has reason to believe he is behind a NAT firewall and does not explicitly know his external IP addresses.

If the connection's security rank is less than 10, acceptance of this proposal sets it to 10.
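
The rank rules for the two authentication proposals amount to "never lower the rank, raise it to the proposal's value if it is below". A minimal sketch, with hypothetical names:

```python
INITIATOR_ADDRESSES_RANK = 5  # rank granted by accepting Initiator's Addresses
HOSTNAME_RANK = 10            # rank granted by accepting Hostname


def rank_after_accept(current_rank: int, proposal_rank: int) -> int:
    """Raise the connection's rank to the proposal's rank if it is lower."""
    return max(current_rank, proposal_rank)


def meets_minimum(current_rank: int, required_rank: int) -> bool:
    """True if the connection satisfies the application's minimum rank."""
    return current_rank >= required_rank
```

Starting from the default rank of 0, accepting Hostname gives 10, and a later Initiator's Addresses acceptance leaves it at 10.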

Support: optional

Re-Query

16 bits Offset = 0 (4 bytes)
1 bit 0=Instruction
7 bits Type=12 (re-query hostname)
8 bits reserved

Re-query my hostname now. My addresses have either just changed or the TTL has expired after a change, so new addresses are available and old addresses may have been removed.

Support: optional

 

Encryption:

The Pause Bit

The Pause Bit appears in the Multipath TCP option in the SYN and SYN/ACK packets of the original connection. The pause bit is a demand to negotiate encryption or authentication. Support is optional but if the recipient does not support the pause bit, it MUST RST the connection.

If pause user data is set on either end, no user data will be sent until the Multipath TCP unpause control is received after initiating a multipath connection.

If either end wants to negotiate encryption or some other capability before sending user data, it should set the pause bit. Once set in the SYN packet it MUST also be set in the SYN/ACK packet. If it is not, the initiator MUST RST the connection.

If the initiator or acceptor would like to engage opportunistic encryption, they MAY leave the pause bit clear but then attempt encryption negotiation anyway before sending user data.

Unpause

1 bit 0=Instruction
7 bits 3=unpause

End the user data pause request made in the TCP SYN packet. If the connection is not yet capable of the minimum security rating required by the application or the packet containing the Unpause does not meet the minimum security rating, the recipient MUST RST the connection.

If the sender subsequently offers any chunks whose encryption method has a lower security rating than the connection requires, the connection MUST be RST.

Support: optional.

Public Key

16 bits Offset = variable
1 bit 1=Proposal
7 bits Type: 13 (offer a public key)
32 bits ack token
8 bits key id
8 bits preference
2 bits 0-3 trailing bytes are not part of the key
1 bit verify signature
5 bits length of encryption name in bytes +2
2-34 bytes encryption name
1-n bytes the key

Please send any further traffic to me only after first encrypting it with the listed public key.

KeyID is the same number as the "encryption employed" byte in the TCP data segment. If KeyID is 0, KeyType must also be 0 and the preference setting will be used to determine whether to send data unencrypted.

Preference dictates which of the KeyID's I've set that I'd like you to use. Use the highest-numbered preference key currently available. A preference of 0 deletes the given key ID.

Each possible encryption algorithm and each possible format for offering a signed or unsigned key will have a unique name. For each such encryption the receiver supports, it will have an algorithm which computes a numeric "security rating" for the offered encryption, taking into account whether the key signature is verified, how long the key is and the general security of the algorithm itself. If the computed security rating is below the minimum acceptable rating the application has requested or application requirements like signature verification have not been met, the recipient MUST reject the proposal.

Trailing bytes: to pad this control to a 32-bit boundary, between zero and three trailing null bytes are appended. The actual key is present in the control after the encryption name and before these padding bytes.

If Verify Signature is set, the recipient must verify the signature on the offered public key before using it. If the verification fails, the receiver must terminate the entire multipath TCP connection with a RST. If verification is not requested, the recipient MAY attempt to verify any signatures found on the key, but it MUST NOT lower the KeyID's security rating or take any other adverse action if the key verification fails.

KeyID 0 is always no encryption and may not be overridden. KeyID 0's default preference is 128.
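
The preference rules can be sketched as a small key table. Two points are assumptions: that a preference of 0 offered for KeyID 0 removes the option to send unencrypted, and that ties are broken in favor of the lower KeyID (the text does not say). All names are hypothetical.

```python
from typing import Optional

NO_ENCRYPTION = 0  # KeyID 0 is always "no encryption"


def new_key_table() -> dict[int, int]:
    """Map of KeyID to preference; KeyID 0 starts at preference 128."""
    return {NO_ENCRYPTION: 128}


def apply_offer(table: dict[int, int], key_id: int, preference: int) -> None:
    """Record an accepted offer; a preference of 0 deletes the KeyID."""
    if preference == 0:
        table.pop(key_id, None)
    else:
        table[key_id] = preference


def choose_key(table: dict[int, int]) -> Optional[int]:
    """Pick the KeyID with the highest preference, or None if none remain."""
    if not table:
        return None
    # Assumed tie-break: lower KeyID wins among equal preferences.
    return max(table, key=lambda key_id: (table[key_id], -key_id))
```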

Note that calling this the "public key proposal" is technically a misnomer since the "key" could as easily be a reference to a secret key already available to both machines. The central point is that the contents of the key may be sent in the clear as it does not contain the information necessary to decrypt packets encrypted by the requested algorithm.

16 bits Offset = variable
8 bits Control = 13 (supported public key encryption names)
32 bits ack token
1-n bytes encryption name list

If a host rejects a public key encryption proposal but supports public key encryption, the TCP packet containing the rejection control must also contain a supported encryptions list control. If it contains only a rejection, the host which proposed encryption MUST assume that no public key encryptions are supported on the connection.

Encryption name list: a nul (ascii 0) padded list of \n (ascii 10) separated names of public and private key encryption methods supported on this connection, sorted with the preferred encryptions first.
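
A sketch of building and reading the encryption name list. Padding to a 32-bit boundary is an assumption (the text says only that the list is nul padded), names are assumed to be ASCII, and the names used in the usage note are made up for illustration.

```python
def pack_name_list(names: list[str]) -> bytes:
    """Join names with newlines, preferred first, then pad with nul bytes."""
    for name in names:
        if not name or "\n" in name or "\x00" in name:
            raise ValueError("names must be non-empty and free of \\n and nul")
    raw = "\n".join(names).encode("ascii")
    return raw + b"\x00" * (-len(raw) % 4)  # assumed: pad to a multiple of 4 bytes


def unpack_name_list(payload: bytes) -> list[str]:
    """Strip the nul padding and split the list back into names."""
    text = payload.rstrip(b"\x00").decode("ascii")
    return text.split("\n") if text else []
```

For example, pack_name_list(["alpha", "beta"]) yields 10 bytes of text followed by 2 nul bytes, and unpack_name_list returns the names in the sender's preference order.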

 

16 bits Offset = variable
8 bits Control = 132 (offer a secret key)
32 bits ack token
8 bits key id
8 bits preference
2 bits 0-3 trailing bytes are not part of the key
1 bit verify signature
5 bits length of encryption name in bytes +2
2-34 bytes encryption name
1-n bytes the key

Mostly the same as offer public key except it applies to a symmetric cipher using a shared secret key.

The shared secret proposal MUST be offered in a packet encrypted by a previously defined public or secret key encryption method and MUST be rejected if it is not. The sender SHOULD use the highest ranked encryption method defined to send the key, even if the highest ranked encryption is not the one preferred by the recipient.

The KeyID's security rating is the lesser of this algorithm's security rating and the security rating of the KeyID used to transmit this proposal. After all, this algorithm is only as secure as the secrecy of the key.

16 bits Offset = variable
8 bits Control = 14 (supported secret key encryption names)
32 bits ack token
1-n bytes encryption name list

Same as supported public key names but sent with a secret key rejection instead of a public key rejection.