An Introduction to SHA256 Hash Extension Attack
SHA-256 extension attack enables attacker to easily append additional data to an existing message and generate a new valid hash value for the extended message without needing to know the original message content.
TLDR; A SHA-256 extension attack can create a new hash value for a modified message without knowledge of the original message.
SHA-256 works by taking an input message and processing it through a series of flow operations to produce a fixed-size (256-bit) output, known as the hash value or digest. This output is typically represented as a 64-character hexadecimal number (each character represents 4 bits, ranging from 0x0000
to 0x1111
), used to verify the integrity and authenticity of the input data.
If digital signatures are not used to protect the integrity of the SHA-256 hash value, an attacker can easily append additional data to an existing message (right after the original hash value) and generate a new valid hash value for the extended message without needing to know the original message content. This technique is known as a SHA-256 extension attack.
Here is an example: a system manager sends a YAML file with a SHA-256 digest to the server, and the server program can assign users specified in the YAML to the system.
The system is required to verify that the hash value of the received YAML is the same as the hash value from the sender's side in order to ensure integrity.
The system manager sends a YAML file contains:
The corresponding SHA-256 digest for this plain text is:
b3318b395ecbea550974e6558a782971262af8f786af4f01e8f8a2ba18bc1102
Charlie, the man-in-the-middle, can only write bytes to the YAML and read the digest. He can use an extension attack to modify the YAML (without reading it) and the hash to forge a modified YAML and the corresponding hash digest. He can then encapsulate the forged message and send it to the receiver to gain access privileges.
The following YAML has been modified by Charlie; he some characters to declare an access privilege, although he cannot see the message.
Charlie can generate the corresponding SHA-256 digest for the new message and then send it to the receiver to deceive them.
a40cf7232a3e7065cbab6d70a76328143b30a519ecca5a626b8b77f3c6c4924e
Charlie successfully exploit the server by modifying both YAML and the digest with the help of hash extension attack.
The following paragraphs will introduce you to the technique and implementation for the extension attack.
The SHA-256 Padding Algorithm
Before running into the hash algorithm, message should be padded to length by a multiple of 64 bytes. As an example, for the padding of the message helloworld
.
The hexadecimal value (ASCII) for helloworld
is
68 65 6c 6c 6f 77 6f 72 6c 64
^ ^ ^ ^ ^ ^ ^ ^ ^ ^
h e l l o w o r l d
length = 10
The first step is to append a 0x80
to the hexadecimal value.
68 65 6c 6c 6f 77 6f 72 6c 64 80
The second step is to append many 0x00
until length equals to $n \times 64 - 8$, where $n > 0$.
68 65 6c 6c 6f 77 6f 72 6c 64 80 00 ....... 00
| count: 45 |
length = 56
The final step is to append 8 bytes as the length indicator of the original message ($10$ bytes = $80$ bits = $50_{16}$ bits).
68 65 6c 6c 6f 77 6f 72 6c 64 80 00 ....... 00 80
| count: 52 |
length = 64
SHA-256
The SHA-256 hash is a non-reversible hash function, which works by 8 hashing values ([h]), say h0, h1, ... to h7. The value of each [h] is initialized to a specific value. As illustrated in the Figure.
The padded message is then divided into n 64-byte blocks (n=3 in this example, where n depends on the length of the message), named [block1], [block2], and [block3]. The hash algorithm then processes each block sequentially, applying the compression function F
to each block and updating the h values.
The code illustrates the compression function F
. The final hash value is obtained by concatenating the eight h values after processing all blocks with the compression function. The initial h values are typically:
h0 = 0x6a09e667
h1 = 0xbb67ae85
h2 = 0x3c6ef372
h3 = 0xa54ff53a
h4 = 0x510e527f
h5 = 0x9b05688c
h6 = 0x1f83d9ab
h7 = 0x5be0cd19
These values go through the F
function, and the function updates the h values (block-wise) with the block input, finally outputting the h values. These h values constitute the hash digest.
Merkle–Damgård Length Extension Attack
The Merkle–Damgård length extension attack forges a new message hash that is an extension of the original message. Since the output is the concatenation of h values, SHA-256 uses h values to feed into the algorithm and generate the hash digest.
The original message the system manager sent firstly goes through the SHA-256 hash algorithm and gets a hash value. This hash value is then used as the initial hash value for the extension message: , charlie
, which is then processed by the SHA256 hash algorithm. A final hash value is then generated.
Mentioned that the original message is padded and appended with length follows the padding algorithm:
name: alice, bob<0x80><0x00>..<0x00><length>
Since there will be only one padding and length running into the hash function F
, the original padding and length for the message "ends with bob
" will be garbage characters. Therefore, we need to reconstruct the new padding and length through manual calculation in order to feed them into the hash function F
.
The new length can be calculated using the previous length. We need to obtain the length of the previous message from metadata in order to calculate the new length accurately.
The message block to be fed into the extension F
(right most) is:
name: alice, bob<0x80><0x00>..<0x00><length>, charlie<0x80><0x00>...<0x00><new_length>
where, alice
, bob<0x80><0x00>..<0x00><length>
, and charlie
will be counted as three users to be added as the root user to the system.
In conclusion, a hash extension attack is a type of attack where an attacker is able to calculate the hash value of a longer message based on the hash value of a shorter message and some additional information. By exploiting the structure of certain hash functions, an attacker can append data to the original message and generate a valid hash value for the extended message without knowing the original message content. This type of attack highlights the importance of using secure hash functions that are resilient against such attacks.