# MD5 is less secure than SHA-1 so why...

## SubAtomic

MD5 is less secure than SHA-1 so why do we guarantee the authenticity and integrity of sources downloaded from gentoo mirrors using MD5?

It is suggested (Schneier, Zimmermann and others) that we should no longer use MD5 and should move to a more secure hash algorithm such as SHA-1 or even RIPEMD-160, RIPEMD-256, or RIPEMD-320. These algorithms are easily implemented and very cheap (computationally cheap that is).

Please refer to the following sources for more information on the topic...

http://samsimpson.com/cryptography/pgp/pgpfaqnew.html#SubMD5Broke

and

[book] Secrets and Lies, Digital Security in a Networked World by Bruce Schneier [/book]

Of course there are many other sources available.

Because gentoo is bleeding edge and we all want to ensure the absolute integrity of our downloaded sources, arent these reasons enough for gentoo to move to a more secure hashing system?

I welcome your thoughts on this subject.

----------

## kashani

Link the 1996 paper detailing most of this.

ftp://ftp.rsasecurity.com/pub/cryptobytes/crypto2n2.pdf

"If essentially weaker cryptographic properties than

collision-resistance suffice then the use of MD5

might still be secure. MD5 can still be used as a

one-way function. The HMAC due to Bellare,

Canetti, and Krawczyk [3, 4] is not touched by the

recent analytic progress."

I'm assuming this is why md5 is used so often for source code and not so much for encryption systems. 

kashani

----------

## georwell

I would wager that md5 is used to verify that the download of the tar file wasn't corrupted not for serious verification.  If you can hack the server you can add your md5 if you want.

Gentoo has talked about signing each package with a developers GPG key and using that to verify authenticity, perhaps with portage-ng?

----------

## asimon

 *georwell wrote:*   

> 
> 
> Gentoo has talked about signing each package with a developers GPG key and using that to verify authenticity, perhaps with portage-ng?

 

Yes, I remember such talk 2 years ago. In the meantime we even have heard of a compromised Gentoo server   :Twisted Evil: 

BTW it has become very silent on the portage-ng front. Is there any news or progress?

----------

## twoblink

I'm fairly curious, I'm not a big fan of MD5, I think it should be retired because we have things like SHA256 which is a bit more secure.

Why didn't the portage integrity checks use SHA instead of MD5?  It would have been just as easy and a bit more secure..

----------

## kashani

https://forums.gentoo.org/viewtopic.php?t=148344&highlight=

I believe the plan is still to replace the hash stuff with something that adds security in a meaningful way.

kashani

----------

## GentooBox

 *twoblink wrote:*   

> I'm fairly curious, I'm not a big fan of MD5, I think it should be retired because we have things like SHA256 which is a bit more secure.
> 
> Why didn't the portage integrity checks use SHA instead of MD5?  It would have been just as easy and a bit more secure..

 

MD5 works like it should, dosent it ?

MD5 is just a watermark of the file, and thats all what portage needs.

so why should portage change to SHA ?

----------

## SubAtomic

It seems the trustworthiness of MD5 to correctly verify the integrity of packages has been broken by Xiaoyun Wang et al with their collision attack.

http://www.doxpara.com/md5_someday.pdf

There are also questions now about the security of SHA-0 and SHA-1.

http://www.freedom-to-tinker.com/archives/000661.html

http://www.freedom-to-tinker.com/archives/000664.html

----------

## mrpdaemon

The attack by Wang et al. is a collision attack (finding two distinct things that hash to the same value). It is of little applicable use (compared to, for example a preimage attack), and it would be way too paranoid to replace Gentoo's integrity check with SHA-1 for this. If someone has the computational resources to find second-preimages to MD5 which contain malicious code, than we are in more serious trouble than imaginable anyways  :Smile: 

----------

## SubAtomic

The insecurities found (i.e. changes can be made to source data that does not affect the md5 hash) are much more concerning than the ability of finding two distinct things that hash to the same value. 

The ability of making changes to the source without affecting the hash itself is reason enough to reconsider the use of md5.

----------

## mrpdaemon

See, every hash algorithm is bound to have collisions. Infinitely many of them, as a matter of fact. But with good hash algorithms, they are notoriously hard to find. The attacks found by Wang et al. (not published yet, we need to wait to see how they do it) are very likely to be extremely limited in scope and applicability.

You are confusing collision with second preimage. It is one thing to find two values that hash to the same value (which Wang et al. did) and it is completely another thing to find a second value which hashes to the same value with a file you have. The latter is deemed to be much harder.

However it is true that these attacks demonstrate that MD5 is indeed not a very strong hash function (and likely to be pretty badly broken in this decade), so it is a wise idea to phase it out gradually, in favor or hash functions we think are more secure. No reason to panic though, since it was known for quite some time that MD5 has weaknesses, and there is no attack demonstrated to find arbitrary second preimages.

----------

## vonhelmet

The idea is this:

You have some piece of vicious code that you want to disseminate. You know the MD5 hash of this piece of code, and the size. You then find a piece of innocent code that is of the same size and has the same MD5 hash. You can then add either piece of code to a package and the two packages will have the same size and hash, but one will be dangerous and the other safe. You then get someone to review and approve the safe code, then replace it with the dangerous code. As the package will have the same hash and size, people won't suspect that it is any different. Havoc ensues.

I don't think this is dangerous at the moment, as it is probably still exceedingly difficult to do this. Yes, it has been shown as proof of concept with about 16 bytes of data, but at this stage I think it would be extremely difficult to find a collision for the hundreds or thousands of bytes needed for, say, a rootkit.

We're safe for now. Yes, we will need to move on from MD5 in the future, but the same thing will happen in time with whatever cryptography we use. It's not a big issue at the moment, and it should be dealt with in time.

----------

## jeroo

doEs anyone kNow oF 

Data-Hiding encryption Capacity of Image Sources on web pages?

found some ReAl interesting methods with Crypt0graphy Research Labs go to their lab, they have a method of hiding a encryption function behind an image and other stuff I've nevEr heard oF..crazy. 

http://www.cryptanalysis.biz/index.html

other old Steganography stuff 

http://www-staff.lboro.ac.uk/~elaja/IEEElectronicsLetters2002.pdf

cheers~roo

(' :Twisted Evil: ')

----------

## littlebuddy

Sorry for reviving an old thread...

So gentoo still uses MD5 for it's checksum, which may or may not necessarily be a bad thing.

What hasn't been answered in the thread is this: "For what purpose are the md5 sums being used?"

If it's just for transmission integrity, then the use of md5 is probably fine (but there are multiple levels of network error-correction...).

If it's to protect against an adversary of some sort, what kind?  In this case gentoo should really consider switching to another hash function - there is a recent paper (http://eprint.iacr.org/2005/075) which suggests that collisions in md5 could be found in only a couple of minutes on a laptop.

----------

## kashani

Is transmission integrity any different than making sure people haven't downloaded tainted sources? Not really in my opinion. And as all the papers written by people who know what they are talking about say:

"md5 has more collisions than we thought. That's bad.

Hacking up random software so that the hashes collide is still non trivial. That's good.

But it's probably a good idea to change to something stonger in the next year or so."

kashani

----------

## littlebuddy

 *kashani wrote:*   

> Is transmission integrity any different than making sure people haven't downloaded tainted sources? Not really in my opinion.

 

I would have to disagree with you there.  In the first case (guarding against transmission failures) there is no adversary, in the second case, there is.  From a mathematical point of view, for two randomly chosen messages M,M' from the domain of MD5, the probability that MD5(M) == MD5(M') is assumed (correctly, I think) to be around 1/2^{128}.  In this case MD5 is still an excellent choice for determining if two files differ.  As the recent attacks show, messages with carefully chosen differentials can be shown to collide with probability much less than that (more than 1/2^{40}).

Incidentally, if all gentoo is looking to do with MD5 is find transmission errors, there is a set of functions (Universal Hash Families) that do just that, except much, much faster (factor of 10 or thereabouts).

 *kashani wrote:*   

> 
> 
> And as all the papers written by people who know what they are talking about say:
> 
> "md5 has more collisions than we thought. That's bad.
> ...

 

Saying that MD5 has 'more collisions than we thought' doesn't make much sense.  It 'has' the same number of collisions it always did, we just know more efficient ways of finding them now.

'Hacking up random software so that the hashes collide' is NOT non-trivial.  There is a recent paper (http://eprint.iacr.org/2005/067) that gives two X.509 certificates with the same MD5 hash.

Perhaps what you mean is that finding second pre-images is still considered to be difficult.  This is currently true, but I would not be surprised if this were to change soon with respect to MD5.

----------

## Jake

 *mrpdaemon wrote:*   

> The attack by Wang et al. is a collision attack (finding two distinct things that hash to the same value). It is of little applicable use (compared to, for example a preimage attack), and it would be way too paranoid to replace Gentoo's integrity check with SHA-1 for this. If someone has the computational resources to find second-preimages to MD5 which contain malicious code, than we are in more serious trouble than imaginable anyways 

 

Too paranoid? I just compared the hash times (3 run averages) for a 100Mb random file- MD5 took 11.04sec and SHA1 took 12.23sec. MD5 is only getting weaker, and I'd rather waste an extra ~10% of CPU time than have to quickly switch when the next breakthrough comes out.

----------

## kashani

Littlebuddy,

I'll take the correction that there are indeed not more MD5 collisions, but that lately collisions have been proven easier to find. I'll also take the correction that you can checksum a file and that's a different kettle of fish than a cryptographic hash. Though the point I badly made was a cryptographic hash can do both and that's where I think the Gentoo devs were going with it.

However you have yet to convince me that producing collisions starting from J Random package anything less than non-trivial. Quoting the paper you linked to

 *Quote:*   

> 
> 
> The heart of our construction is that, starting from a specially crafted
> 
> MD5-collision produced by the group of Xiaoyun Wang, we can construct
> ...

 

and

 *Quote:*   

> 
> 
> 3. Using the techniques developed by Xiaoyun Wang et al.2 we construct two
> 
> different bitstrings b1 and b2, of 1024 bits each, for which the MD5
> ...

 

To me that states, "if we carefully choose X we can create Y with the same hash." However I still reach the conclusion that starting with random data X it is still non-trivial to create Y with a matching hash, much less contruct Y with a malicious payload. Unless you've got a link to another paper that makes this claim.

kashani

----------

## littlebuddy

 *kashani wrote:*   

> To me that states, "if we carefully choose X we can create Y with the same hash." However I still reach the conclusion that starting with random data X it is still non-trivial to create Y with a matching hash, much less contruct Y with a malicious payload. Unless you've got a link to another paper that makes this claim.
> 
> kashani

 

That's completely correct.  There is no literature yet which performs the attack you mention.

But, as others have noted, attacks never get worse, they only seem to get better.  Note that in the first paper (on the X.509 certificates), the collisions were found in very specific types of strings - this is getting closer to the attack you mention.  I envision a time not too distant from the present where finding a useful collision for J random package is possible - why wait until that happens to switch?

We can speculate for a long time on how long it will take to find an attack of the form you mention, but I really just want to know for what purpose MD5 is being used now.

----------

## oglueck

Because it merely takes 45 minutes to generate a file that has the same MD5 value as any given file.

read http://it.slashdot.org/article.pl?sid=05/11/15/2037232&tid=172&tid=93&tid=228

----------

## xces

 *oglueck wrote:*   

> Because it merely takes 45 minutes to generate a file that has the same MD5 value as any given file.

 

MD5 is fine for a plain check, if the downloaded files are corrupted (not compromised). If you are paranoid, check the genuineness of your files with GnuPG. You can set your FEATURES in make.conf accordingly.

----------

## kashani

 *oglueck wrote:*   

> Because it merely takes 45 minutes to generate a file that has the same MD5 value as any given file.
> 
> read http://it.slashdot.org/article.pl?sid=05/11/15/2037232&tid=172&tid=93&tid=228

 

While the above is basically true it doesn't mean that md5 is broken as a way to check that you're getting the right source code under normal conditions. However I do agree that it is time to start moving to something else. 

http://www.cryptography.com/cnews/hash.html

 *the experts wrote:*   

> 
> 
> Q: Do these attacks allow somebody to break tools that use MD5 or SHA-1 to check for malicious binaries?
> 
> A: Not usually, as this would require a preimage attack. It would, however, be possible for someone to construct an innocuous program and a malicious program with the same hash. If this adversary could get the innocuous version on the "good" list (e.g. by having a trusted authority sign the hash value), the malicious program would also be accepted.
> ...

 

kashani

----------

