Should you use MD5 and SHA-1 hashes for file integrity?

The Mandiant advanced persistent threat reports were published on a website along with MD5 and SHA-1 hashes intended… actually, to do what? The reports were published on Mandiant's website, intelreport.mandiant.com, which is served over plain HTTP without SSL. At the bottom of the page are the following hashes:

Mandiant_APT1_Report.pdf MD5: 936FEB234F60CFBF6916BA61FBAB2781 SHA-1: 3974687624EB85CDCF1FC9CCFB68EEA052971E84

Mandiant_APT1_Report_Appendix.zip MD5: FD103F16BBBB28162C23BE3A47371AA9 SHA-1: ABF9D09A991E56393D18433644FF0DBA907A9154
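The check being recommended amounts to the following. This is a minimal Python sketch that uses the standard `hashlib` module. `file_digests` and `matches_published` are hypothetical helper names, and the local file name used below is an assumption. A match proves only that your copy is identical to whatever the same web page told you to expect.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Return (md5_hex, sha1_hex) for the file at path, reading it in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Read 1 MiB at a time so large downloads don't have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def matches_published(path, published_md5, published_sha1):
    """Compare a file against hex digests copied from a web page."""
    md5_hex, sha1_hex = file_digests(path)
    # hexdigest() returns lower-case hex; the values on the page are upper-case.
    return (md5_hex == published_md5.strip().lower()
            and sha1_hex == published_sha1.strip().lower())
```

For the report, the call would be `matches_published("Mandiant_APT1_Report.pdf", "936FEB234F60CFBF6916BA61FBAB2781", "3974687624EB85CDCF1FC9CCFB68EEA052971E84")`, assuming the download was saved under that name.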

On Twitter, people recommended checking these hashes after downloading. But how useful are they actually? I'd say: not useful at all.

If the file is damaged in transit, you will notice when your PDF reader or ZIP unpacker reports that the file is corrupt. An incomplete download is realistically the only way this happens, because checksums at lower network layers (such as the Ethernet CRC and the TCP checksum) already catch accidental bit flips in the file contents. And ZIP itself stores a CRC-32 checksum for every file in the archive.
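ZIP's built-in CRC-32 checks are easy to exercise without any published hash. Below is a minimal sketch using Python's standard `zipfile` module. `check_zip` is a hypothetical helper, and the archive is a toy built in memory. These checks catch accidental damage only: anyone who modifies an archive on purpose can simply recompute the CRC-32 values.

```python
import io
import zipfile
from typing import Optional


def check_zip(data: bytes) -> Optional[str]:
    """Return a description of the damage, or None if the archive checks out."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            # testzip() reads every member and verifies its stored CRC-32;
            # it returns the name of the first bad member, or None.
            bad_member = zf.testzip()
    except zipfile.BadZipFile as exc:
        # A truncated download usually loses the central directory stored at
        # the end of the file, so the archive cannot even be opened.
        return f"unreadable archive: {exc}"
    if bad_member is not None:
        return f"CRC-32 mismatch in {bad_member}"
    return None


if __name__ == "__main__":
    # Build a small stored (uncompressed) archive in memory.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("report.pdf", b"A" * 1000)
    good = buf.getvalue()

    flipped = bytearray(good)
    flipped[good.index(b"A" * 1000) + 500] ^= 0xFF  # simulate one flipped byte
    truncated = good[: len(good) // 2]               # simulate an incomplete download

    print(check_zip(good))            # None
    print(check_zip(bytes(flipped)))  # CRC-32 mismatch in report.pdf
    print(check_zip(truncated))       # starts with "unreadable archive:"
```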

So maybe the hashes are meant to prevent intentional manipulation of the documents (for example, by the Great Firewall of China)? It seems that many people still perceive MD5 and SHA-1 hashes as a form of integrity and authenticity protection, as explained in the article "Use MD5 hashes to verify software downloads" (unrelated to Mandiant):

It’s always a good idea to make sure someone has not somehow arranged for your download to be compromised so that you get a modified or different file that can be used to crack security on your computer when executed.

And this is where it gets plainly wrong. The problem reduces to the classic chicken-and-egg problem, known in information security as the trust anchor problem:

1. There's a file posted on a website, and you want to make sure it is authentic.
2. There is a cryptographic hash posted along with the file.
3. But how do you know the hash is authentic, given that it's posted on the same site as the untrusted file?

HTTPS comes in handy here. It provides confidentiality and integrity of data in transit through encryption and message authentication, and server authenticity through X.509 certificates. This is exactly the problem Netscape set out to solve when it created SSL: how can you send your login and password to a server if you don't know who owns it? HTTPS helps build the trust path:

1. My browser trusts the issuer who signed the website's certificate (the trust anchor),
2. the certificate chain validates and the server proves it holds the matching private key, so the website is authentic (trust path validation),
3. so the hashes posted there, which TLS also protects from modification in transit, are authentic as well,
4. so I can use them to verify the file's authenticity.

This works for publication, but not necessarily for long-term storage, where you will probably need to look at digital signatures based on PGP or S/MIME.
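The argument can be condensed into a small Python simulation. It is a sketch built on several assumptions. The byte strings stand in for the report. The attacker is a hypothetical on-path party who can rewrite plain-HTTP responses. `fetch_over_verified_https` is an illustrative helper built on the standard `ssl` and `urllib.request` modules; it is not how Mandiant or any browser actually does it. The simulation shows that a hash served alongside the file proves nothing, while a hash fetched over a verified connection is anchored in the trusted CA certificates.

```python
import hashlib
import ssl
import urllib.request


def sha1_matches(data: bytes, expected_hex: str) -> bool:
    """True if data hashes to expected_hex (case-insensitive hex)."""
    return hashlib.sha1(data).hexdigest() == expected_hex.lower()


def simulate_plain_http_swap():
    """Hypothetical scenario: an attacker rewrites a plain-HTTP page in transit."""
    genuine = b"%PDF-1.4 genuine report"
    genuine_hash = hashlib.sha1(genuine).hexdigest()  # what the publisher posted

    # Whoever can replace the file can also replace the hash shown next to it.
    forged = b"%PDF-1.4 report carrying a payload"
    forged_hash = hashlib.sha1(forged).hexdigest()

    check_against_same_page = sha1_matches(forged, forged_hash)      # passes
    check_against_trusted_copy = sha1_matches(forged, genuine_hash)  # fails
    return check_against_same_page, check_against_trusted_copy


def fetch_over_verified_https(url: str, timeout: float = 10.0) -> bytes:
    """Fetch a page (e.g. one listing hashes) with certificate and hostname checks.

    ssl.create_default_context() loads the system's trusted CA certificates
    (the trust anchors), requires a valid certificate chain and checks that
    the certificate matches the host name; a failed check raises an error
    instead of returning data.
    """
    if not url.startswith("https://"):
        raise ValueError("refusing to fetch hashes over an unauthenticated channel")
    context = ssl.create_default_context()
    with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
        # urlopen follows redirects; make sure we did not end up on plain HTTP.
        if not resp.geturl().startswith("https://"):
            raise ValueError("redirected away from HTTPS")
        return resp.read()
```

`simulate_plain_http_swap()` returns `(True, False)`. The check against the tampered page passes, and only a hash obtained through a trusted channel exposes the swap.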