The Magic Of Hash (And I Mean Of The MD5 and SHA-1 Vintage)

Hashing is one of the most useful practices a sysadmin can employ. I think too many sysadmins do not use this as a regular part of their everyday work. While there are security concerns that go with hashing for verification purposes, once you understand what it does you’ll find there are many uses for it too.


Just the facts, Please

This article is about what a hash is and how you can use it every day, whether you are an end user or a sysadmin. ESPECIALLY if you are a sysadmin, you should read this if you are not already using hashing. I then go through examples of how to do it on Linux, AIX, and Windows.

If you’re only interested in how to calculate hashes, I have a separate, updated article here on calculating hashes in Linux, AIX, and Windows:

Getting MD5 and SHA-1 hash values on linux, AIX, and Windows

Okay that being said, onward!!

What is a hash?

So what exactly is a hash? A hash function takes some sort of data, runs an algorithm on that data, and outputs a fixed-length string (the hash).

The idea behind this is that different content will, for all practical purposes, output a different hash. If you run a hash on two pieces of identical content, you will get the same hash.
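
Here is a minimal sketch of that idea, assuming a Linux box with GNU md5sum and awk installed. The helper name md5_of_string is made up for this illustration.

```shell
# Hash a string given as the first argument and print only the hex digest.
# md5sum prints "<digest>  -" when reading stdin, so keep just the first field.
md5_of_string() {
    printf '%s' "$1" | md5sum | awk '{print $1}'
}

a=$(md5_of_string "same content")
b=$(md5_of_string "same content")
c=$(md5_of_string "same content!")

# Identical input always gives an identical hash.
[ "$a" = "$b" ] && echo "identical input, identical hash"
# Changing a single character gives a completely different hash.
[ "$a" != "$c" ] && echo "one extra character, different hash"
```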

As an example, a fingerprint is one type of “data”, and when you run a hashing algorithm on it, it will output a unique string (hash)


[pic from (excellent site which I'll mention later in this post)]

The md5sum, sha1sum, and csum Commands For Hashing

So what commands do I use to find hashes on my servers?

The most commonly used algorithms are MD5 and SHA-1, and each OS has commands for them.

The commands you would use on Linux would be

md5sum [filename]
sha1sum [filename]

and on AIX

csum -h MD5 [filename]
csum -h SHA1 [filename]
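
If you work on both Linux and AIX, a small wrapper can hide the difference. This is only a sketch: the function name md5_of_file is hypothetical, and it assumes both tools print the digest as the first field, as in the transcripts in this article.

```shell
# Print the MD5 digest of a file using whichever tool is installed:
# md5sum (Linux) or csum -h MD5 (AIX).
md5_of_file() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum "$1" | awk '{print $1}'
    elif command -v csum >/dev/null 2>&1; then
        csum -h MD5 "$1" | awk '{print $1}'
    else
        echo "no MD5 tool found" >&2
        return 1
    fi
}
```

Usage would be something like `md5_of_file 1.txt`, which prints just the hash without the filename.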

Examples of using hashing on linux and AIX

Here I will cat the contents of the file, then run the hashing commands. You’ll see the unique hash outputs.

AIX (using csum command)

aixtestbox01 / # cat 1.txt
aixtestbox01 / # cat 2.txt
aixtestbox01 / # cat 3.txt
aixtestbox01 / # csum -h MD5 1.txt
ba1f2511fc30423bdbb183fe33f3dd0f 1.txt
aixtestbox01 / # csum -h MD5 2.txt
ba1f2511fc30423bdbb183fe33f3dd0f 2.txt
aixtestbox01 / # csum -h MD5 3.txt
e7df7cd2ca07f4f1ab415d457a6e1c13 3.txt
aixtestbox1 / # csum -h SHA1 1.txt
a8fdc205a9f19cc1c7507a60c4f01b13d11d7fd0 1.txt
aixtestbox1 / # csum -h SHA1 2.txt
a8fdc205a9f19cc1c7507a60c4f01b13d11d7fd0 2.txt
aixtestbox1 / # csum -h SHA1 3.txt
1be168ff837f043bde17c0314341c84271047b31 3.txt
aixtestbox1 / #

Linux (using md5sum and sha1sum command)

linuxtestbox01:~ # cat 5.txt
linuxtestbox01:~ # cat 6.txt
linuxtestbox01:~ # md5sum 5.txt
ba1f2511fc30423bdbb183fe33f3dd0f 5.txt
linuxtestbox01:~ # md5sum 6.txt
e7df7cd2ca07f4f1ab415d457a6e1c13 6.txt
linuxtestbox01:~ # sha1sum 5.txt
a8fdc205a9f19cc1c7507a60c4f01b13d11d7fd0 5.txt
linuxtestbox01:~ # sha1sum 6.txt
1be168ff837f043bde17c0314341c84271047b31 6.txt
linuxtestbox01:~ #

So note that when the contents are the same, the hashes match, even between the AIX box and the Linux box. Hashing algorithms are standardized, so there should be a way to run them on a file on any OS and get the same result.

Windows (using cygwin’s md5sum command)

Update: Use Microsoft’s FCIV tool. I’ve written about that here: Getting MD5 and SHA-1 hash values on linux, aix, and Windows

For Windows, I’m sure there is GUI-based software. I prefer to use Cygwin (being the Unix/Linux guy I am), which has lots of great utilities, including the md5sum and sha1sum commands.


So how does this help us as a computer user or a sysadmin?

When would you ever want to verify that a file’s contents are the same as another file’s contents? The primary reason is security, which is also what hashing is most often used for.

This is why, if you’ve ever downloaded any type of open source software (like Apache) or any installation .iso (like CentOS), there is usually a hash (sha1 or md5) on the vendor site. This is especially true if the vendor employs mirror sites for downloads (like CentOS).

But hashing is MUCH MORE useful than just for security. You can use it to check how pristine and intact files are whenever they are copied somewhere.


Some examples of me actually using hashing in my everyday job and everyday life

* Verify downloaded software is intact

  1. Download .iso off mirror site
  2. Run the hash on the .iso you downloaded
  3. Compare hash to the hash on the actual vendor site
  4. They match? Your .iso you downloaded is intact
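
The steps above can be sketched as a small shell function. This is an illustration only: verify_download is a made-up name, it assumes the vendor publishes a SHA-1 hash, and it assumes GNU sha1sum is installed.

```shell
# verify_download FILE EXPECTED_SHA1
# Returns 0 when the file's SHA-1 matches the published value.
verify_download() {
    actual=$(sha1sum "$1" | awk '{print $1}')
    # Published hashes are sometimes upper case; sha1sum prints lower case.
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $1 is intact"
    else
        echo "MISMATCH: $1 (got $actual, expected $expected)" >&2
        return 1
    fi
}
```

You would paste the hash from the vendor site as the second argument, so you compare strings with the shell instead of by eye.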

* In place of recursive diff when comparing two directories

Because a recursive diff on two large directories can sometimes take a long time.

  1. md5sum hash all files in directory #1 and output to a file
  2. md5sum hash all files in directory #2 and output to a file
  3. diff the two files.
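
A rough sketch of those three steps, assuming GNU find, md5sum, sort, and mktemp. The function names hash_dir and compare_dirs are hypothetical.

```shell
# hash_dir DIR: print "digest  ./relative/path" for every file under DIR,
# sorted by path so that two listings line up for diff.
hash_dir() {
    (cd "$1" && find . -type f -exec md5sum {} + | LC_ALL=C sort -k 2)
}

# compare_dirs DIR1 DIR2: exit 0 only if both trees hash identically.
compare_dirs() {
    list1=$(mktemp) && list2=$(mktemp) || return 2
    hash_dir "$1" > "$list1"
    hash_dir "$2" > "$list2"
    diff "$list1" "$list2"      # prints the lines that differ, if any
    status=$?
    rm -f "$list1" "$list2"
    return $status
}
```

Running cd into each directory first keeps the paths relative, so the same file shows up under the same name in both listings.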

* Same as above, but now comparing directories on different servers

What if the directories you want to compare are on different servers? For example a data directory on a production server and a data directory on a disaster recovery server miles away. Recursive diff is not practical

  1. md5sum hash all files on production server and output to a file
  2. md5sum hash all files on disaster recovery server and output to a file
  3. diff the two files
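
As a variation on the diff step, GNU md5sum can check a saved listing for you with its -c option. The sketch below is an alternative to the steps above, not the exact method I described. The function names are made up, and the manifest file should live outside the directory being hashed.

```shell
# make_manifest DIR MANIFEST: record the hash of every file under DIR.
make_manifest() {
    (cd "$1" && find . -type f -exec md5sum {} + | LC_ALL=C sort -k 2) > "$2"
}

# check_manifest DIR MANIFEST: re-hash the files under DIR and compare them
# to the recorded hashes. --quiet prints only the files that fail.
check_manifest() {
    # Turn the manifest path into an absolute path before changing directory.
    manifest=$(cd "$(dirname "$2")" && pwd)/$(basename "$2")
    (cd "$1" && md5sum -c --quiet "$manifest")
}
```

You would run make_manifest on production, copy the (small) manifest file to the disaster recovery server, and run check_manifest there. Note that md5sum -c only checks the files listed in the manifest, so it will not notice extra files on the second server.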

* What about if there’s no network access? And there are thousands of files

You don’t want to eyeball the hashes of thousands of files

  1. md5sum hash all files on production server and output to a file
  2. md5sum hash the output file (just one file)
  3. md5sum hash all files on disaster recovery server and output to a file
  4. md5sum hash the output file (just one file)
  5. now you only need compare the md5hash of one file on production and one file on disaster recovery

These are the very simple steps. You need to make sure that the listings are in the same format and the same order on both sides. I usually use “sort” on the filename so that everything is listed in order.

Also, in this last example, an extra file in one directory will throw the final md5 hash off, even if every file the two directories have in common is identical.
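
Here is a sketch of the "hash of the hashes" trick, assuming GNU find, md5sum, and sort. The name dir_fingerprint is made up for this example.

```shell
# dir_fingerprint DIR: print one MD5 digest that summarizes every file
# under DIR. The listing is sorted by path first, because find does not
# guarantee the same order on every server.
dir_fingerprint() {
    (cd "$1" && find . -type f -exec md5sum {} + \
        | LC_ALL=C sort -k 2 \
        | md5sum \
        | awk '{print $1}')
}
```

Run it on each server and read the two 32-character strings to each other over the phone if you have to. Because the listing includes file names, a renamed or extra file changes the result just like a changed file does.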

Okay..more uses!

* Oracle export and import to copy a database from one server to another ..with NO network access!

  1. DBA exported an Oracle database at one colocation site.
  2. md5sum hash all files that were exported by DBA
  3. Copy all files to USB drive
  4. md5sum hash all files on USB drive and compare to original files
  5. Everything intact: ship the USB drive to second colocation site
  6. At second colo site, copy all files to second server
  7. md5sum hash files that are on the second server
  8. Everything intact: import to the database knowing all data is intact
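
Each "copy, then hash and compare" hop in that list can be sketched as one function. This is only an illustration with a hypothetical name, copy_verified, and it handles one file at a time.

```shell
# copy_verified SRC DEST: copy one file, then confirm that the copy
# hashes to the same value as the original.
copy_verified() {
    cp "$1" "$2" || return 1
    src_hash=$(md5sum < "$1" | awk '{print $1}')
    dst_hash=$(md5sum < "$2" | awk '{print $1}')
    if [ "$src_hash" = "$dst_hash" ]; then
        echo "intact: $2"
    else
        echo "CORRUPT: $2 ($dst_hash, expected $src_hash)" >&2
        return 1
    fi
}
```

One caution with removable media: a hash taken right after the copy may be read back from the operating system's cache instead of the drive itself, so it is safer to unmount and remount the USB drive before hashing the copies.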

You probably get the idea. Any time I send someone a binary, a driver, an installation executable, I will md5sum hash the file and send the string too. When they download the file to their location, they can md5sum hash the file and make sure it is intact

You may think this is overkill…

But in my years of sysadmin experience, corrupt files have caused a lot of headaches

One final example

* Finding new install media from non-official sources

A user asked for a windows XP VM to be installed. As much as we tried to get this VM up and running, each time we handed it over the VM acted “funky”

Eventually I logged in to the Microsoft MSDN site, looked up the published sha1 hash, and ran sha1sum on our iso. Lo and behold, our iso was not pristine. Somehow it had become corrupted over the years.


Right away, that’s helpful.


When I went to download the .iso from the MSDN site, it was NO LONGER AVAILABLE!! And this was an emergency

So what did I do? Well, the name of the .iso is right there.


So I googled it. Naturally there are a lot of downloads available from “other” sites.


So I had a lot of options for downloading that .iso. I eventually did. And then ran the sha1sum on it and compared it to Microsoft’s site. And it matched!

Just for fun, I downloaded it from quite a few sites and they all matched.

So voila, recovered .iso until we could get it from Microsoft again

Is that safe? Are these hashing algorithms really safe?

Well, ….. theoretically yes, and I would say for the majority of everyday use, sure. It can’t hurt, it can only help. But be mindful that yes, of course eventually someone will figure ways to hack or crack anything.

A fantastic article about the security of hashing is right here:

What we really should be wary of: two different files that have the same hash. One could be benign, one could be a virus or malware

I love one example in the article where the author talked about where they had two postscript files with an identical md5 hash, but one letter was a security clearance while the other was a letter of recommendation.

However, for a malicious attack like this, where two different files have the same hash, both files need to be created by the attacker. (You can read about it here: ) AND the hash cannot be targeted: an attacker cannot take the true hash published by the vendor and design a file to match it. All the attacker can do is create two different files of their own that have identical hashes. In cryptographic terms, this is a collision attack, not a preimage attack. So if you use the hash from the vendor, you know that is the actual hash you can check against.

The other issue: since there is a finite number of possible hashes (depending on the length of the hash: 128 bits for MD5, 160 bits for SHA-1), it is possible that by dumb random luck two different files end up with the same hash. This is called a “collision”. Good hashing algorithms are supposed to be “collision resistant”.
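
To put a rough number on “dumb random luck”, here is a back-of-the-envelope estimate. It assumes the hash behaves like a uniformly random function, which is an idealization I am adding for illustration, and the figure of one million files is just an example.

```latex
% A b-bit hash has N = 2^b possible values.
% For n files with independent, uniformly random hashes, the union bound
% over all n(n-1)/2 pairs gives
\[
  P(\text{at least one accidental collision})
  \;\le\; \frac{n(n-1)}{2N}
  \;<\; \frac{n^{2}}{2^{\,b+1}} .
\]
% Example: MD5 (b = 128) and n = 10^6 files:
\[
  \frac{(10^{6})^{2}}{2^{129}}
  \;\approx\; \frac{10^{12}}{6.8 \times 10^{38}}
  \;\approx\; 1.5 \times 10^{-27} .
\]
```

So for accidental corruption checks, a chance collision is not the thing to worry about. The deliberate, attacker-made collisions described above are the real concern.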

So yes, I still use hashing…all the time

It’s too useful not to use it, and the security holes, while there, should not be too big a worry. And it’s always good to be aware of security issues around the situation you are using it for.

For me specifically, IT’S AWESOME! Many of my uses are internal (like the Oracle import/export or the internal comparison of directories on servers with no network access). Which is why I also use whatever is fastest (md5sum seems fast for me at least).

If the situation you are using it for dictates you be more mindful about security, like downloading third-party software, then use a stronger hashing algorithm if available. Keep in mind that stronger, more complex algorithms will take longer (BUT IT’S SO WORTH IT!)

In the end…

If you are using the hash for security reasons, such as downloading an .iso from a third-party site not related to the vendor (like my Windows example), use the most secure hash that is given to you.

If the reason you are using the hash is for internal checking, such as my Oracle export/import example, then security is not so much of a concern as file checking AND speed – so you can use the fastest hashing algorithm.
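
One way to keep that choice in a single place is a small wrapper that takes the algorithm as an argument. This is a sketch with a hypothetical name, hash_file. Including sha256 as the stronger option is my assumption, based on sha256sum shipping alongside md5sum and sha1sum in GNU coreutils.

```shell
# hash_file ALGO FILE: print the digest of FILE using md5, sha1, or sha256.
hash_file() {
    case "$1" in
        md5)    tool=md5sum ;;      # fast, fine for internal integrity checks
        sha1)   tool=sha1sum ;;
        sha256) tool=sha256sum ;;   # stronger, for security-sensitive checks
        *)      echo "unknown algorithm: $1" >&2; return 1 ;;
    esac
    "$tool" "$2" | awk '{print $1}'
}
```

If you want to see the speed difference on your own hardware, in bash you can prefix a call with the time keyword and run it against the same large file with each algorithm.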

Hope this has been helpful!
