An Encoding Scheme For 2048 Bit Integers
Overview
Number of Bytes | Number of Data Bits | First Value | Last Value | 1st Byte | 2nd Byte | All Other Bytes |
---|---|---|---|---|---|---|
1 | 0 | 0 | 0 | 00000000 | N/A | N/A |
--- | --- | --- | --- | --- | --- | --- |
0+2 | (0*8)+0 | 2^0 | 2^1-1 | 00000001 | 00000000 | N/A |
0+2 | (0*8)+1 | 2^1 | 2^2-1 | 0000001x | 00000000 | N/A |
0+2 | (0*8)+2 | 2^2 | 2^3-1 | 000001xx | 00000000 | N/A |
0+2 | (0*8)+3 | 2^3 | 2^4-1 | 00001xxx | 00000000 | N/A |
0+2 | (0*8)+4 | 2^4 | 2^5-1 | 0001xxxx | 00000000 | N/A |
0+2 | (0*8)+5 | 2^5 | 2^6-1 | 001xxxxx | 00000000 | N/A |
0+2 | (0*8)+6 | 2^6 | 2^7-1 | 01xxxxxx | 00000000 | N/A |
0+2 | (0*8)+7 | 2^7 | 2^8-1 | 1xxxxxxx | 00000000 | N/A |
--- | --- | --- | --- | --- | --- | --- |
1+2 | (1*8)+0 | 2^8 | 2^9-1 | 00000001 | 00000001 | xxxxxxxx |
1+2 | (1*8)+1 | 2^9 | 2^10-1 | 00000001 | 0000001x | xxxxxxxx |
1+2 | (1*8)+2 | 2^10 | 2^11-1 | 00000001 | 000001xx | xxxxxxxx |
1+2 | (1*8)+3 | 2^11 | 2^12-1 | 00000001 | 00001xxx | xxxxxxxx |
1+2 | (1*8)+4 | 2^12 | 2^13-1 | 00000001 | 0001xxxx | xxxxxxxx |
1+2 | (1*8)+5 | 2^13 | 2^14-1 | 00000001 | 001xxxxx | xxxxxxxx |
1+2 | (1*8)+6 | 2^14 | 2^15-1 | 00000001 | 01xxxxxx | xxxxxxxx |
1+2 | (1*8)+7 | 2^15 | 2^16-1 | 00000001 | 1xxxxxxx | xxxxxxxx |
--- | --- | --- | --- | --- | --- | --- |
2+2 | (2*8)+0 | 2^16 | 2^17-1 | 00000010 | 00000001 | xxxxxxxx |
2+2 | (2*8)+1 | 2^17 | 2^18-1 | 00000010 | 0000001x | xxxxxxxx |
2+2 | (2*8)+2 | 2^18 | 2^19-1 | 00000010 | 000001xx | xxxxxxxx |
2+2 | (2*8)+3 | 2^19 | 2^20-1 | 00000010 | 00001xxx | xxxxxxxx |
2+2 | (2*8)+4 | 2^20 | 2^21-1 | 00000010 | 0001xxxx | xxxxxxxx |
2+2 | (2*8)+5 | 2^21 | 2^22-1 | 00000010 | 001xxxxx | xxxxxxxx |
2+2 | (2*8)+6 | 2^22 | 2^23-1 | 00000010 | 01xxxxxx | xxxxxxxx |
2+2 | (2*8)+7 | 2^23 | 2^24-1 | 00000010 | 1xxxxxxx | xxxxxxxx |
... | ... | ... | ... | ... | ... | ... |
255+2 | (255*8)+0 | 2^2040 | 2^2041-1 | 11111111 | 00000001 | xxxxxxxx |
255+2 | (255*8)+1 | 2^2041 | 2^2042-1 | 11111111 | 0000001x | xxxxxxxx |
255+2 | (255*8)+2 | 2^2042 | 2^2043-1 | 11111111 | 000001xx | xxxxxxxx |
255+2 | (255*8)+3 | 2^2043 | 2^2044-1 | 11111111 | 00001xxx | xxxxxxxx |
255+2 | (255*8)+4 | 2^2044 | 2^2045-1 | 11111111 | 0001xxxx | xxxxxxxx |
255+2 | (255*8)+5 | 2^2045 | 2^2046-1 | 11111111 | 001xxxxx | xxxxxxxx |
255+2 | (255*8)+6 | 2^2046 | 2^2047-1 | 11111111 | 01xxxxxx | xxxxxxxx |
255+2 | (255*8)+7 | 2^2047 | 2^2048-1 | 11111111 | 1xxxxxxx | xxxxxxxx |
Comments
This encoding scheme is mostly just big-endian with an 8-bit length prefix, but there's at least one difference. Specifically: Most single-byte values are encoded by adding a trailing null byte, not an 8-bit length prefix. That makes overlong encodings impossible, and it frees up one possible value from the length prefix. We could use that extra value to encode larger numbers, but it's probably better if we can't encode values in the range [2^2048, 2^2056-1]. Since the extra value isn't otherwise useful, we simply allow 0x00 to be encoded as a single byte.
You could actually modify this encoding such that other values in the range [0x00, 0xFF] are encoded using a single byte. If you encoded values in the range [0x00, 0x80] using a single byte, then latin text (and many text-based file formats) could be encoded more efficiently. I believe that 2^1024-1 is the largest value that could be encoded, in that case.