I needed to develop an MD5 checksum routine to verify files that were being picked up from a Unix host. I found a number of blogs and code samples that showed the basics of using the MD5CryptoServiceProvider, so I quickly put together the code.
Unfortunately, when I tested the results against the sample data, my checksums were different.
After some research, I narrowed the problem down to the Unix md5sum reading the file in text mode. That is the native mode on Unix, but not on Windows.
OK, so I can just change my routine to use text mode as well? Not quite that simple.
I could not find a way to get C# on .NET to read a file in text mode.
After some more research, I found that the main difference when reading in text mode on Windows is that each \r\n (carriage return, line feed) pair has the \r removed. In binary mode the bytes are left untouched.
So I did a quick test and did a string replace prior to doing the MD5 hash, and voila!!! I got the match.
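To make that quick test concrete, here is a minimal sketch of the idea. The sample strings and the LineEndingDemo and Md5Hex names are made up for illustration, and it assumes plain ASCII content. It uses MD5.Create() rather than MD5CryptoServiceProvider, which should produce the same hashes.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class LineEndingDemo
{
    // Hashes the ASCII bytes of the text and returns the MD5 as uppercase hex.
    public static string Md5Hex(string text)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.ASCII.GetBytes(text));
            return BitConverter.ToString(hash).Replace("-", string.Empty);
        }
    }

    static void Main()
    {
        // Same two lines of text, once with Windows endings and once with Unix endings.
        string windowsText = "line one\r\nline two\r\n";
        string unixText = "line one\nline two\n";

        // The extra \r bytes change the hash.
        Console.WriteLine(Md5Hex(windowsText) == Md5Hex(unixText));                        // False

        // After the string replace, the bytes (and so the hashes) are identical.
        Console.WriteLine(Md5Hex(windowsText.Replace("\r\n", "\n")) == Md5Hex(unixText));  // True
    }
}
```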
Here is the class I created to support both modes when doing the MD5 checksum.
Hope this saves someone else some time!
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

class MD5FileCheckSum
{
    public static string CalculateChecksum(Stream inputDataStream, bool textMode)
    {
        using (MD5 md5Computer = new MD5CryptoServiceProvider())
        {
            byte[] hash;
            if (textMode)
            {
                // Text mode, for Unix compatibility: strip the \r from every \r\n line ending.
                // This matches what C's fopen in "r" (text) mode does on Windows.
                // ISO-8859-1 maps each byte to exactly one char, so decoding and re-encoding
                // changes nothing except the removed \r bytes. (ASCII would turn bytes above 127 into '?'.)
                Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
                // The reader is deliberately not disposed, so the caller's stream stays open.
                StreamReader sr = new StreamReader(inputDataStream, latin1, false);
                // Remove the carriage returns.
                string normalized = sr.ReadToEnd().Replace("\r\n", "\n");
                // Convert back to a byte array and hash it.
                hash = md5Computer.ComputeHash(latin1.GetBytes(normalized));
            }
            else
            {
                // Binary mode: just hash the whole stream as-is.
                hash = md5Computer.ComputeHash(inputDataStream);
            }
            // Uppercase hex with no separators. md5sum prints lowercase, so compare case-insensitively.
            return BitConverter.ToString(hash).Replace("-", String.Empty);
        }
    }
}
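One thing to be aware of: the text mode branch reads the whole file into a string before hashing. For large files, a byte-level variant may be worth considering. The following is only a sketch under my own assumptions. StreamingTextModeMd5 and ComputeTextModeMd5 are hypothetical names, and it uses MD5.Create() with TransformBlock instead of MD5CryptoServiceProvider. It drops a \r only when the next byte is \n, including when the pair is split across two reads.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

static class StreamingTextModeMd5
{
    // Hashes the stream in chunks, removing the \r from every \r\n pair.
    public static string ComputeTextModeMd5(Stream input)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] readBuffer = new byte[8192];
            byte[] outBuffer = new byte[readBuffer.Length + 1]; // +1 for a \r carried over from the previous read
            bool pendingCr = false; // true when the last byte seen was a \r not yet written out
            int read;
            while ((read = input.Read(readBuffer, 0, readBuffer.Length)) > 0)
            {
                int outCount = 0;
                for (int i = 0; i < read; i++)
                {
                    byte b = readBuffer[i];
                    // A held-back \r that is not followed by \n is a lone \r, so keep it.
                    if (pendingCr && b != (byte)'\n')
                        outBuffer[outCount++] = (byte)'\r';
                    pendingCr = (b == (byte)'\r');
                    if (!pendingCr)
                        outBuffer[outCount++] = b;
                }
                md5.TransformBlock(outBuffer, 0, outCount, null, 0);
            }
            // Flush a trailing lone \r, if any, and finish the hash.
            md5.TransformFinalBlock(new byte[] { (byte)'\r' }, 0, pendingCr ? 1 : 0);
            return BitConverter.ToString(md5.Hash).Replace("-", string.Empty);
        }
    }

    static void Main()
    {
        // Made-up sample data: the same text with Windows and Unix line endings.
        byte[] windowsBytes = Encoding.ASCII.GetBytes("line one\r\nline two\r\n");
        byte[] unixBytes = Encoding.ASCII.GetBytes("line one\nline two\n");
        string a = ComputeTextModeMd5(new MemoryStream(windowsBytes));
        string b = ComputeTextModeMd5(new MemoryStream(unixBytes));
        Console.WriteLine(a == b); // True
    }
}
```

Because it never decodes the bytes into text, this version does not depend on the file's encoding at all.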