Checksum
Algorithms
Site Map Feedback
Up Adler32 CRC32 MD5 Roller32

Check or Compare Data

The algorithms in this section generate Checksums.
Checksums have various uses, hence the different versions.
The basic idea is that you have some data (for example a text file) that you want to check.
The quickest way would be to add together (sum) all the Bytes in the file and see what number you got.
If you had another copy of a similar file, you could add all the Bytes together in that file too.
If the sum of Bytes in both files is different, you know that the files have different data.
Unfortunately, if the sums are both the same, the files may still be different!
If one file contained Bytes:
1 2 3 4
and the other contained:
4 1 2 3
Both add up to 8 but the files are not the same!
So real checksums are more complex than simple adders.
They use equations that give different answers if the same Bytes occur in a different order.

The Checksum most likely to give a different Checksum for different data is MD5.
For data of any length you get a Checksum (or Hash Code) which is 32 Bytes long.
People love to use this to encrypt Passwords, for example.
The user types in their Password, the web-page finds the MD5 Checksum of the Password text and sends only the MD5 to the server, where it can be stored in a database.
There is no record of the actual password text at all.
When the user next logs in, all that matters is that the Password text they type has the same MD5 checksum that it had when they set up their password.
To hack someone's password from the MD5 would require going through every possible password text, finding the MD5 checksum and seeing if it matched.
While not impossible, that is more effort than most people would go to!
For more security, the password Checksum only needs to be hidden again in some way (exclusive-ORed with a suitable number) to stop that sort of attack.

Most File Checksum use is to make sure that you have the latest version of a downloaded File, that the file isn't corrupted (bits flipped), or that the File you have hasn't had extra data added, or been zeroed by a Virus.
This sort of check can be done with a simpler checksum and CRC32 is very popular (used in .zip files, for example).

Adler32 is another checksum alogorithm that is used as a "quick and dirty" version of the CRC32 algorithm. You get a single 32-bit number that represents a block of data (of any size but >128 bytes preferably). If the data changes in any way (even a single bit) the Adler32 calculation would yield a completely different value.
A very fast checksum, which takes very little code to implement is:
unsigned Checksum(const unsigned char* Buffer, unsigned Len) {
  for(unsigned i=0,s1=0,s2=0; i++<Len; s2+=s1+=13+*Buffer++);
  return s1+(s2<<16);
}
This is used in a program which compares files to see if they are the same without comparing every byte of the files (because the files are separated by a slow medium, like the Internet).
The fast (but weak) checksum helps indicate that the files may be the same.
Once you know they may be the same you can do a stronger checksum like MD5 to get a better comparison.

The last type of checksum is one that can be maintained for the current data block in a Stream while the Stream changes.
This 'Rolling' Checksum scrolls through data: you can remove a Byte from the start of the Data, and add a Byte at the end, and the new Checksum Value is the same as if you can recalculated the Checksum of the whole block of data!
For example, for a block size of 10 and data:
1 2 3 4 5 6 7 8 9 10
The sum of numbers is 55.
If the 1 is removed and an 11 added
the new data is:
2 3 4 5 6 7 8 9 10 11
and the new sum of numbers is 55-1+11 which is easy when just adding again!
Not so easy for a real checksum!
Again, it's fast and intended as a first check for similar data.
If you find the checksum you're looking for in the Stream of Data, it's worth checking that data using a stronger Checksum like MD5.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.