  This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know' to do Cryptography: a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. To finish the basic schemes section, we look at one of the most popular hash function designs...
  A Merkle-Damgaard (MD) hash function is a hash function built by extending the domain of a collision-resistant compression function in a way that preserves collision resistance. This means we can take a small, fixed-width compression function, prove it is secure, and then use it to build a variable-length hash function. Whilst other methods for building hash functions exist, MD is by far the most "popular" (well, most frequently used at least!), with examples including MD5, SHA1 and SHA2. So, time to break those terms down:

Secure Hash Function?

Traditionally, a secure hash function h should be:
  • Pre-image resistant: given h(x), it is hard to find x.
  • Second pre-image resistant: given x, it is hard to find y≠x such that h(x)=h(y).
  • Collision resistant: it is hard to find any pair x≠y such that h(x)=h(y).
If a hash function is collision resistant then clearly it must also be second pre-image resistant (a second pre-image y for x immediately gives the collision x,y), so it is collision resistance that we will focus on.
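The implication above can be checked on a toy example. The following is a minimal sketch, not a real attack: `toy_hash` is a hypothetical, deliberately weak hash (SHA-256 truncated to 16 bits) chosen only so that brute force finishes quickly, and the function names are made up for illustration.

```python
import hashlib
from itertools import count
from typing import Tuple

def toy_hash(data: bytes) -> bytes:
    """Hypothetical weak hash: SHA-256 truncated to 16 bits (assumption for the demo)."""
    return hashlib.sha256(data).digest()[:2]

def find_second_preimage(x: bytes) -> bytes:
    """Brute-force search for some y != x with toy_hash(y) == toy_hash(x)."""
    target = toy_hash(x)
    for i in count():
        y = i.to_bytes(4, "big")
        if y != x and toy_hash(y) == target:
            return y

def find_collision() -> Tuple[bytes, bytes]:
    """Any second pre-image finder is also a collision finder: fix x, ask for y."""
    x = b"any message"
    return x, find_second_preimage(x)

x, y = find_collision()
assert x != y and toy_hash(x) == toy_hash(y)
```

Read in the other direction, this is the statement in the text: if nobody can find collisions, then nobody can find second pre-images either.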

Compression Function

A compression function f:{0,1}^n×{0,1}^r→{0,1}^n is a function that, as the name suggests, compresses n+r bits of input into an n-bit output. As you might expect, a collision-resistant compression function is a compression function that is collision resistant. So, it can be thought of as a fixed input length hash function, but what happens if we want our hash function to be secure for any input length?

The MD hash function construction

The MD construction provides a method for extending the domain of a fixed length compression function into a variable input length hash function. Using a compression function f as above, we are going to use the n-bit value as our internal state, and feed in r bits each iteration (it's quite common to set r=n). To do this, we begin with an initial value (IV) and split the message M up into blocks of r bits, M=M_0 M_1 ⋯ M_m, and then simply iterate the construction by setting:

S_0 = IV,    S_{i+1} = f(S_i, M_i) for i = 0, …, m,    h(M) = S_{m+1}.

Confused? Perhaps the following diagram will help:
Diagram of the MD construction (from Wikipedia)
The most important thing about the MD construction is that if the compression function is collision resistant, then so is the overall construction (as proven independently by Merkle and Damgård). This gives us a secure method for building hash functions out of smaller, more easily studied primitives.
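To make the iteration concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the widths n = r = 256 bits, the all-zero IV, and in particular `compress`, which simply calls SHA-256 on the state and block as a stand-in for a real compression function f. It has no padding and no finalisation, so it only accepts messages that are a whole number of blocks.

```python
import hashlib

N_BYTES = 32          # state width n, in bytes (assumed)
R_BYTES = 32          # block width r, in bytes (assumed; r = n here)
IV = bytes(N_BYTES)   # arbitrary all-zero initial value (assumed)

def compress(state: bytes, block: bytes) -> bytes:
    """Stand-in for f: maps n + r bits to n bits via SHA-256(state || block)."""
    assert len(state) == N_BYTES and len(block) == R_BYTES
    return hashlib.sha256(state + block).digest()

def md_iterate(message: bytes, state: bytes = IV) -> bytes:
    """Plain MD iteration: feed one r-bit block into the state per step."""
    if len(message) % R_BYTES != 0:
        raise ValueError("this sketch has no padding: use whole blocks only")
    for i in range(0, len(message), R_BYTES):
        state = compress(state, message[i:i + R_BYTES])
    return state

# Two-block message: the digest is f(f(IV, M_0), M_1).
m0, m1 = b"a" * R_BYTES, b"b" * R_BYTES
assert md_iterate(m0 + m1) == compress(compress(IV, m0), m1)
```

The `state` parameter of `md_iterate` is exposed on purpose: it makes visible that the digest of a message is just the internal state after its last block.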

Length Extension

You might notice that the diagram has an extra stage that my description didn't: the "finalisation" stage. This is to prevent length extension attacks. For example, if N is a single block (i.e. N∈{0,1}^r) and the attacker knows h(M)=x, then he can very easily calculate h(M||N) without knowing M, because h(M||N)=f(h(M),N)=f(x,N). So, some form of finalisation function has to be used to break this relationship.
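The attack is short enough to run. The sketch below makes the same assumptions as before (a SHA-256-based stand-in `compress` for f, 32-byte blocks, an all-zero IV, whole-block messages), and `h_finalised` is one hypothetical finalisation chosen purely for illustration; whether a particular finalisation is actually secure needs its own argument.

```python
import hashlib

R_BYTES = 32      # block width r, in bytes (assumed; r = n here)
IV = bytes(32)    # arbitrary all-zero initial value (assumed)

def compress(state: bytes, block: bytes) -> bytes:
    """Stand-in for the compression function f."""
    return hashlib.sha256(state + block).digest()

def h_plain(message: bytes) -> bytes:
    """MD iteration with no finalisation (messages are whole blocks here)."""
    state = IV
    for i in range(0, len(message), R_BYTES):
        state = compress(state, message[i:i + R_BYTES])
    return state

def h_finalised(message: bytes) -> bytes:
    """Hypothetical finalisation: hash the last state once more under a fixed tag."""
    return hashlib.sha256(b"finalise" + h_plain(message)).digest()

# M is unknown to the attacker; only x = h(M) is public.
M = b"secret block 0".ljust(32, b".") + b"secret block 1".ljust(32, b".")
N = b"attacker block".ljust(32, b".")

x = h_plain(M)
forged = compress(x, N)            # computed from x and N alone
assert forged == h_plain(M + N)    # h(M||N) = f(h(M), N)

# With the finalisation step, the published digest is no longer the internal
# state, so the same one-line extension does not produce the right digest.
assert compress(h_finalised(M), N) != h_finalised(M + N)
```

Note that `forged` never touches M: the published digest is exactly the state the attacker needs to carry on iterating.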

