Problem Statement:
IPv6 comes with too much baggage. All we really needed was IPv4 with a larger address space.
General Solution Statement:
If IP option header #29 is present starting at the sixth word in the IPv4 header then the "source" part of the IP header actually contains the four high-order bytes of the 64-bit IP destination address while the seventh and eighth words contain the 64-bit source address.
IP Header
Regular IP header:
Bit offset | Bits 0-3 | 4-7 | 8-15 | 16-18 | 19-31 |
---|---|---|---|---|---|
0 | Version=4 | Header length | Type of Service | Total Length | |
32 | Identification | | | Flags | Fragment Offset |
64 | Time to Live | | Protocol | Header Checksum | |
96 | 32-bit Source Address | | | | |
128 | 32-bit Destination Address | | | | |
160 | Options | | | | |
160 or 192+ | Data | | | | |
IP header with Option 29:
Bit offset | Bits 0-3 | 4-7 | 8-15 | 16-18 | 19-31 |
---|---|---|---|---|---|
0 | Version=4 | Header length | Type of Service | Total Length | |
32 | Identification | | | Flags | Fragment Offset |
64 | Time to Live | | Protocol | Header Checksum | |
96 | 64-bit Destination Address | | | | |
128 | (continued) | | | | |
160 | IP Option = 0x9d | | Option Length = 12 | unused | |
192 | 64-bit Source Address | | | | |
224 | (continued) | | | | |
256 | More Options | | | | |
256 or 288+ | Data | | | | |
Note that option #29 is valid only if present starting at bit 160. This allows router hardware to find the addresses at precise offsets within the packet without having to follow a linked list of options.
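The layout above can be sketched as a header builder. This is a minimal illustration, not a reference implementation: the names `build_ipxl_header` and `ip_checksum` are made up for this example, addresses are passed as plain 64-bit integers, and identification, flags and fragment offset are simply left at zero.

```python
import struct

OPTION_29 = 0x9D       # copy flag 0x80 | class 0 | option number 29
OPTION_29_LENGTH = 12  # type + length + 2 unused bytes + 8 address bytes
HEADER_WORDS = 8       # 5 fixed words + 3 words of option 29


def ip_checksum(header: bytes) -> int:
    """Ones' complement checksum over 16-bit words, as used by plain IP."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipxl_header(src: int, dst: int, payload_len: int,
                      protocol: int = 17, ttl: int = 64) -> bytes:
    """Pack a 32-byte header with option 29 starting at bit 160."""
    def pack(checksum: int) -> bytes:
        return struct.pack(
            "!BBHHHBBHQBBHQ",
            (4 << 4) | HEADER_WORDS,         # version and header length
            0,                               # type of service
            HEADER_WORDS * 4 + payload_len,  # total length
            0, 0,                            # identification, flags/offset
            ttl, protocol, checksum,
            dst,                             # bits 96-159
            OPTION_29, OPTION_29_LENGTH, 0,  # bits 160-191
            src,                             # bits 192-255
        )
    # Pack once with a zero checksum, then again with the real one.
    return pack(ip_checksum(pack(0)))
```

Because every field sits at a fixed offset, the builder needs no option walking at all, which is the same property the note above relies on for router hardware.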
Option Type. 8 bits. Set to 0x9d.
00 | 01 | 02 | 03 | 04 | 05 | 06 | 07 |
---|---|---|---|---|---|---|---|
C | Class | | Option | | | | |
C, Copy flag. 1 bit. Set to 1 (0x80).
Indicates that this option should be copied into all fragments.
Class. 2 bits. Set to 0 (0x00).
This is a control option.
Option. 5 bits. Set to 29 (0x1d).
The IP option number.
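As a quick check of the arithmetic, the three fields combine into the 0x9d shown in the header layout. The helper name `option_type` is only for illustration.

```python
def option_type(copy: int, option_class: int, number: int) -> int:
    """Combine the copy flag (1 bit), class (2 bits) and number (5 bits)."""
    if copy not in (0, 1) or not 0 <= option_class < 4 or not 0 <= number < 32:
        raise ValueError("field out of range")
    return (copy << 7) | (option_class << 5) | number


print(hex(option_type(copy=1, option_class=0, number=29)))  # 0x9d
```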
How it works:
If the IP header length is at least 8 words and the 16 bits starting at bit 160 = 0x9d0c then the datagram uses 64-bit IP addresses as specified in this document. If not then it's a regular IPv4 datagram with 32-bit addresses.
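A receiver could apply that test roughly as follows. This is a sketch under assumptions: `parse_addresses` is a hypothetical name, addresses come back as integers, and regular datagrams are returned with their 32-bit values unchanged rather than widened.

```python
import struct


def parse_addresses(header: bytes):
    """Return (is_ipxl, source, destination) with addresses as integers."""
    version, header_words = header[0] >> 4, header[0] & 0x0F
    if version != 4:
        raise ValueError("not a version 4 datagram")
    # Bit 160 is byte 20: option type 0x9d followed by length 12 (0x0c).
    if header_words >= 8 and header[20:22] == b"\x9d\x0c":
        (dst,) = struct.unpack("!Q", header[12:20])
        (src,) = struct.unpack("!Q", header[24:32])
        return True, src, dst
    src, dst = struct.unpack("!II", header[12:20])
    return False, src, dst
```

Both checks read fixed offsets, so the decision costs two comparisons before any address is extracted.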
When a 32-bit IP address is expressed as a 64-bit address it's prefixed by 0.0.0.1. Thus 192.168.100.1 becomes 0.0.0.1.192.168.100.1.
In all other respects, IPxl functions exactly as defined for plain old IP.
Shorthand
When writing a full IP address, the leading zeros can be dropped for the sake of convenience. Thus 0.0.0.1.192.168.100.1 can be written as 1.192.168.100.1. This may only be done when no trailing numbers are dropped. For example, 192.168/16 may not be written as 1.192.168/16; it must be written as 1.192.168.0.0/16.
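The widening and shorthand rules can be sketched like this. The function names are hypothetical, and two choices are assumptions of this example rather than part of the proposal: the parser only accepts 5 to 8 octets so that a plain dotted quad is never misread, and the formatter always keeps at least five octets.

```python
import ipaddress


def widen(ipv4: str) -> int:
    """Express a 32-bit address as a 64-bit one by prefixing 0.0.0.1."""
    return (1 << 32) | int(ipaddress.IPv4Address(ipv4))


def parse_ipxl(text: str) -> int:
    """Parse a full or shorthand address, restoring dropped leading zeros."""
    octets = [int(part) for part in text.split(".")]
    if not 5 <= len(octets) <= 8 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"bad IPxl address: {text!r}")
    return int.from_bytes(bytes(octets).rjust(8, b"\0"), "big")


def format_ipxl(value: int) -> str:
    """Write an address in shorthand, dropping leading zero octets."""
    octets = list(value.to_bytes(8, "big"))
    while len(octets) > 5 and octets[0] == 0:
        octets.pop(0)
    return ".".join(str(o) for o in octets)


print(format_ipxl(widen("192.168.100.1")))  # 1.192.168.100.1
```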
The existing "/bits" CIDR netmask convention is retained. Positive numbers refer to the lower 32 bits while negative numbers refer to the high 32 bits. Thus 4.0.0.0.0/-2 covers the addresses from 4.0.0.0.0 through 7.255.255.255.255.
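One reading that fits the worked example is that the "/bits" number counts prefix bits relative to the start of the lower 32-bit word, so the full 64-bit prefix length is 32 + bits and /-2 leaves 34 host bits. The sketch below assumes that reading; `cidr_range` is a hypothetical helper that takes the base address as an integer.

```python
def cidr_range(base: int, bits: int) -> tuple:
    """Return (first, last) for base/bits, with bits in the range -32..32."""
    if not -32 <= bits <= 32:
        raise ValueError("bits must be between -32 and 32")
    host_bits = 32 - bits          # /16 leaves 16 host bits, /-2 leaves 34
    mask = (1 << host_bits) - 1
    first = base & ~mask
    return first, first | mask


first, last = cidr_range(4 << 32, -2)
print(hex(first), hex(last))  # 0x400000000 0x7ffffffff
```

Under this reading an ordinary /16 keeps its usual meaning inside the lower word, and the two conventions meet at /0, which covers one entire 32-bit supernet.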
How a non-IPxl host handles an IPxl packet:
Because the 0.0.0.1 in the destination address occupies the same word as the 32-bit source address did in a non-IPxl packet, hosts which do not understand IPxl will attempt to respond to 0.0.0.1. As 0.0.0.1 is a dead address that doesn't route anywhere, those reply packets will be promptly dropped into the bit bucket.
Originating IPxl hosts should mitigate this behavior by retrying, with plain old IP packets where possible, any communications which do not immediately succeed with IPxl packets.
How a non-IPxl router handles an IPxl packet:
If a router which does not support IPxl sees an IPxl packet, it will attempt to route it based on where it would have found the original 32-bit destination address. This happens to be the lower 4 octets of the 64-bit destination address. Where the high octets are 0.0.0.1, this will result in the packet being forwarded in the desired direction. As a result, two IPxl endpoints have a high probability of successfully communicating across a non-IPxl network.
If a router which receives an IPxl packet but does not understand IPxl needs to return an ICMP packet to the sender, it will address that packet to the octets found in the original location of the IP source address, which now holds the high-order octets of the destination address: 0.0.0.1. This will fail. As a result, hosts sending IPxl packets won't receive host-unreachable or fragmentation-needed messages sent by pre-IPxl routers. IPxl hosts should mitigate these problems by promptly falling back to regular IP when possible and by limiting the default non-local originating MTU to 1450 bytes until such a time as support for IPxl is reasonably ubiquitous.
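To make the legacy behavior concrete, here is what a pre-IPxl host or router would read from the fixed source and destination offsets of an IPxl header. The name `legacy_view` is hypothetical, and the example header is packed by hand with a zero checksum for brevity.

```python
import ipaddress
import struct


def legacy_view(header: bytes):
    """Read the two words a plain IP stack treats as source and destination."""
    src, dst = struct.unpack("!II", header[12:20])
    return ipaddress.IPv4Address(src), ipaddress.IPv4Address(dst)


# IPxl header from 0.0.0.1.10.0.0.1 to 0.0.0.1.192.168.100.1.
src = (1 << 32) | 0x0A000001
dst = (1 << 32) | 0xC0A86401
header = struct.pack("!BBHHHBBHQBBHQ",
                     0x48, 0, 32, 0, 0, 64, 17, 0, dst, 0x9D, 12, 0, src)

print(legacy_view(header))
# (IPv4Address('0.0.0.1'), IPv4Address('192.168.100.1'))
```

The apparent destination is the correct lower four octets, so forwarding works, while the apparent source is 0.0.0.1, so any reply or ICMP error goes nowhere.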
Special IANA trickery improves compatibility with non-IPxl routers:
IANA should initially assign supernets 0.0.0.2.*-0.0.0.16.* in a special way. IANA should select the largest prefix operated by each of the 26,000 ASes which announce routes into the DFZ today. Those exact prefixes will be valid in each of networks 2-16 as well. Other prefixes from supernets 2-16 which overlap valid public unicast space in supernet 0.0.0.1 will not be allocated until such a time as IPxl deployment is ubiquitous. As a result of this deployment strategy, routers which do not support IPxl will still route IPxl packets towards the correct AS.
IANA should continue the existing procedure of allocating blocks to regional registries in /8-sized blocks of 2^24 (about 16.8 million) addresses. However, until such a time as IPxl becomes ubiquitous, IANA should only allocate blocks which do not overlap public unicast space in the original IP. This means they can allocate 0.0.0.2.0.*, 0.0.0.2.10.*, 0.0.0.2.127.*, 0.0.0.2.225.*, etc. These routes will only function with routers that understand IPxl. Because they overlap dead space in the original IP, routers which do not understand IPxl will tend to discard the packets rather than routing them to an incorrect organization.
Sockets API
The existing PF_INET domain for the sockets API is partially compatible with IPxl. It cannot originate connections to highspace hosts or bind explicit local highspace addresses, but it can reasonably accept connections from remote highspace addresses. This will allow server software such as Apache or SSHd to function with IPxl by simply replacing the dynamic libraries, albeit with incorrect logging. An additional ioctl should be provided for fetching the full address so that server software can be quickly revised to log correctly.
In addition, the sockets API should be extended to support IPxl with a new socket domain PF_IPXL which is identical to PF_INET in every respect save that the IP addresses are 8 bytes long instead of 4. PF_IPXL sockets may originate regular IP packets (without option 29) if the leading octets of the source and destination addresses are both 0.0.0.1. Originating packets from IPxl sockets as regular IP packets where possible should be an OS-tunable parameter and should default to "on."
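The decision a PF_IPXL socket would make before sending can be sketched as below. Nothing here is a real API: `may_send_plain_ip` and its `prefer_plain_ip` flag are hypothetical stand-ins for the stack's internal check and the OS tunable, and addresses are plain 64-bit integers.

```python
LEGACY_PREFIX = 1  # the high-order word 0.0.0.1


def may_send_plain_ip(src: int, dst: int, prefer_plain_ip: bool = True) -> bool:
    """True when a packet can go out as regular IP without option 29."""
    both_legacy = (src >> 32) == LEGACY_PREFIX and (dst >> 32) == LEGACY_PREFIX
    return prefer_plain_ip and both_legacy
```

With the tunable left "on", two hosts in the 0.0.0.1 supernet keep exchanging ordinary 20-byte headers, and option 29 only appears once either end lives in highspace.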
DNS
Name to address mapping is performed with the AX record. AX returns a 64-bit IP address instead of a 32-bit IP address. A DNS server which supports IPxl should automatically translate any known A records for a given name to AX records by prepending 0.0.0.1 to the address. It should likewise automatically translate any AX records with the prefix 0.0.0.1 into A records.
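The two automatic translations amount to adding or stripping the 0.0.0.1 prefix. A minimal sketch, with hypothetical function names and AX data held as a 64-bit integer:

```python
import ipaddress
from typing import Optional

LEGACY_PREFIX = 1  # the high-order word 0.0.0.1


def a_to_ax(a_record: str) -> int:
    """Translate an A record into AX data by prepending 0.0.0.1."""
    return (LEGACY_PREFIX << 32) | int(ipaddress.IPv4Address(a_record))


def ax_to_a(ax_record: int) -> Optional[str]:
    """Translate AX data into an A record, or None outside 0.0.0.1."""
    if ax_record >> 32 != LEGACY_PREFIX:
        return None
    return str(ipaddress.IPv4Address(ax_record & 0xFFFFFFFF))
```

An AX record in any other supernet has no A equivalent, which is why the second function can return nothing.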
PTR records for IPxl are located under "ipxl.arpa" and work the same way as the records in "in-addr.arpa". A special tree will be created for *.1.0.0.0.ipxl.arpa which CNAMEs each individual record to the respective record in in-addr.arpa leaving the in-addr.arpa zone authoritative.
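Assuming the reverse name is built the same way as in in-addr.arpa, one label per octet in reverse order, the PTR owner name and the CNAME target for the 0.0.0.1 supernet could be derived as follows. Both function names are hypothetical.

```python
from typing import Optional


def ptr_name(address: int) -> str:
    """Owner name of the PTR record for a 64-bit address."""
    octets = address.to_bytes(8, "big")
    return ".".join(str(o) for o in reversed(octets)) + ".ipxl.arpa"


def legacy_cname_target(address: int) -> Optional[str]:
    """in-addr.arpa name that a *.1.0.0.0.ipxl.arpa record would CNAME to."""
    if address >> 32 != 1:
        return None
    low = (address & 0xFFFFFFFF).to_bytes(4, "big")
    return ".".join(str(o) for o in reversed(low)) + ".in-addr.arpa"


addr = (1 << 32) | 0xC0A86401          # 0.0.0.1.192.168.100.1
print(ptr_name(addr))                  # 1.100.168.192.1.0.0.0.ipxl.arpa
print(legacy_cname_target(addr))       # 1.100.168.192.in-addr.arpa
```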
Rationale:
Why not just increment the IP version number instead of wasting four extra bytes on an option header?
Existing routers can't handle ethernet type 0x0800 packets (IP packets) whose version number is not 4. As a result, they'd just discard the packets. Most of this equipment can discard non-IP ethernet frames at the hardware level while 0x0800 frames have to be passed to the driver and checksummed before the version number is considered. So, not only do they discard the packets, they discard them the expensive way.
This is one of the reasons why IPv6 uses a new ethernet type instead of keeping 0x0800 and incrementing the IP version number to 6.