The Simplest Way to Calculate a Checksum

This tutorial shows the simplest way to calculate a checksum. It shares a Python function that computes MD5 and SHA256 hashes, which you can use to check the integrity of your files.

1. What is a Checksum?

If you work with computers, or even in cybersecurity, you may have heard the term checksum thrown around here and there. A checksum is a sequence of numbers and letters used to check data for errors. If you know the checksum of an original file, you can use a checksum utility to confirm that your copy is identical. To produce a checksum, you run a program that puts the file through an algorithm. The algorithm uses a cryptographic hash function that takes an input and produces a string (a sequence of numbers and letters) of a fixed length. The input can be a small 1 MB file or a huge 4 GB file, but in either case you get a checksum of the same length. Checksums are also called “hashes”.
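
As a small illustration of the fixed-length property, the sketch below hashes two inputs of very different sizes with SHA256. The input sizes are arbitrary choices made for this example only.

```python
import hashlib

# Two inputs of very different sizes (sizes chosen only for illustration).
small_input = b"a"
large_input = b"a" * 1_000_000

small_digest = hashlib.sha256(small_input).hexdigest()
large_digest = hashlib.sha256(large_input).hexdigest()

# SHA256 always yields 256 bits, i.e. 64 hexadecimal characters,
# no matter how long the input is.
print(len(small_digest), len(large_digest))
```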

When are checksums useful? You can use checksums to check files and other data for errors that occur during transmission or storage. For example, a file may not have downloaded correctly because of network problems, or hard disk problems may have corrupted a file on disk. Computers use checksum techniques in the background to look for such problems, but you can also do it yourself. You can use checksums to verify the integrity of any type of file, from applications to documents and media. A checksum is produced by a procedure called a checksum function, which you may also see referred to as a checksum algorithm. The design of the algorithm may vary, but in general an effective checksum algorithm must generate a significantly different value, even for the smallest change to the input, regardless of the type of document or data.
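
To see how a tiny change in the input affects the result, here is a minimal sketch that changes one character and counts how many output bits differ. The sample strings are made up for this example, and SHA256 stands in for "an effective checksum algorithm".

```python
import hashlib

original = b"The quick brown fox"
modified = b"The quick brown fax"  # a single character changed

digest_a = hashlib.sha256(original).digest()
digest_b = hashlib.sha256(modified).digest()

# Count how many of the 256 output bits differ between the two digests.
differing_bits = sum(bin(x ^ y).count("1") for x, y in zip(digest_a, digest_b))
print(differing_bits, "of 256 bits differ")
```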

1.1. Parity byte or parity word

We start with the simplest checksum algorithm, known as the longitudinal parity check. It divides the data into words with a fixed number of bits (n), then calculates the exclusive or (XOR) of all those words. The result is appended to the message as an additional word.

To verify the integrity of a message, the recipient calculates the XOR of all its words, checksum included, and checks whether the result is a word composed of n zeros. If it is, no error was detected; otherwise, the recipient knows that an error occurred somewhere during transmission.
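
The two steps above can be sketched in a few lines of Python. This is only an illustration: it assumes 8-bit words (n = 8), and the function names `parity_word` and `parity_verify` are made up for this example.

```python
from functools import reduce


def parity_word(data: bytes) -> int:
    """XOR all 8-bit words together (n = 8 is an assumption of this sketch)."""
    return reduce(lambda acc, byte: acc ^ byte, data, 0)


def parity_verify(message_with_checksum: bytes) -> bool:
    """The XOR of all words, checksum included, must be a word of zeros."""
    return parity_word(message_with_checksum) == 0


message = b"hello"
sent = message + bytes([parity_word(message)])  # append the parity word
print(parity_verify(sent))  # True

corrupted = bytes([sent[0] ^ 0x01]) + sent[1:]  # flip one bit in transit
print(parity_verify(corrupted))  # False
```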

1.2. Sum complement

Because the parity check has weaknesses, there was an attempt to improve on it, and so the sum complement algorithm was designed. Unlike the parity check, it adds all the words as unsigned binary numbers, discarding any overflow bits, and appends the two's complement of the total as the checksum. To validate a message, the recipient adds all the words in the same way, checksum included. If the result is not a word filled with zeros, a transmission error occurred at some point.
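
A minimal sketch of this scheme, again assuming 8-bit words, might look as follows. The names `sum_complement` and `sum_verify` are hypothetical and exist only for this example.

```python
def sum_complement(data: bytes) -> int:
    """Add all 8-bit words, drop overflow bits, return the two's complement."""
    total = sum(data) & 0xFF  # keep only the low 8 bits (discard overflow)
    return (-total) & 0xFF    # two's complement of the total, as one byte


def sum_verify(message_with_checksum: bytes) -> bool:
    """The sum of all words, checksum included, must be zero modulo 256."""
    return (sum(message_with_checksum) & 0xFF) == 0


message = b"hello"
sent = message + bytes([sum_complement(message)])
print(sum_verify(sent))  # True

corrupted = bytes([sent[0] ^ 0x01]) + sent[1:]  # flip one bit in transit
print(sum_verify(corrupted))  # False
```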

1.3. Position-dependent

Position-dependent checksum algorithms were intended to correct the flaws of the two functions described above, which cannot detect many common errors that affect more than one bit at a time. Examples of such errors include inserting or deleting words with all bits set to zero and changing the order of the data words.
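
The text does not name a specific algorithm, so the sketch below uses Fletcher-16, one well-known position-dependent checksum, purely as an example. Because the second running sum accumulates the first, each byte's contribution depends on where it sits in the data.

```python
def fletcher16(data: bytes) -> int:
    """Fletcher-16: two running sums modulo 255, combined into 16 bits."""
    sum1 = 0  # plain sum of the bytes
    sum2 = 0  # sum of the running sums, which makes position matter
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1


print(hex(fletcher16(b"abcde")))  # 0xc8f0

# Reordering the data changes the checksum...
print(fletcher16(b"ab") != fletcher16(b"ba"))  # True
# ...and so does inserting an all-zero byte between two words.
print(fletcher16(b"ab") != fletcher16(b"a\x00b"))  # True
```

A plain sum or XOR of the same bytes would give identical results in both of those cases, which is exactly the weakness described above.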

1.4. Fuzzy checksum

This type of checksum algorithm was developed as an effective way to detect e-mail spam. It works by building cooperative databases, fed by several ISPs, of e-mails that are suspected of being spam or have been reported as spam. However, since the content of one junk mail may differ in its details from the next, conventional checksum algorithms would be ineffective here and not worth using. A fuzzy checksum algorithm instead reduces the body of an e-mail to its characteristic minimum, then calculates a checksum as usual.
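
The idea can be illustrated with a toy example. The reduction step used here (lowercase the text and keep only the letters) is an assumption made for this sketch, not how any particular spam filter works, and the function name `fuzzy_checksum` and the sample messages are made up.

```python
import hashlib
import re


def fuzzy_checksum(body: str) -> str:
    """Reduce an e-mail body to a minimal form, then hash it as usual."""
    # Hypothetical reduction: lowercase, then drop everything except a-z,
    # so amounts, offer numbers, and punctuation no longer matter.
    reduced = re.sub(r"[^a-z]", "", body.lower())
    return hashlib.md5(reduced.encode("utf-8")).hexdigest()


spam_a = "WIN $1000 now!!! Click here, offer #4821"
spam_b = "win $2500 NOW - click here (offer #9917)"

# The two bodies differ in detail but reduce to the same minimal text.
print(fuzzy_checksum(spam_a) == fuzzy_checksum(spam_b))  # True
```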

1.5. How to use a file checksum?

If you are serious about the safety of your computer and you understand the dangers of downloading files left and right from shady websites, you may want to use checksums to verify the integrity of your files before opening them or copying them anywhere they could do damage.
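
In practice this usually means comparing the checksum you compute locally with the one the publisher lists next to the download. Here is a minimal sketch, assuming the publisher provides a SHA256 value; the function name `matches_published_checksum` is hypothetical.

```python
import hashlib


def matches_published_checksum(path, published_sha256):
    """Return True if the file's SHA256 equals the published value."""
    with open(path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    # Published values are sometimes upper-case or carry stray whitespace.
    return actual == published_sha256.strip().lower()
```

If the function returns False, the copy differs from the original and should not be trusted.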

2. Checksum: MD5 and SHA256

The simple Python function below calculates the checksum of a file using the MD5 or SHA256 hashing function.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-


import hashlib


def get_checksum(filename, hash_function):
    """Generate checksum for file based on hash function (MD5 or SHA256).

    Args:
        filename (str): Path to file that will have the checksum generated.
        hash_function (str): Hash function name - supports MD5 or SHA256

    Returns:
        str: Checksum based on Hash function of choice.

    Raises:
        ValueError: Invalid hash function is entered.

    """
    hash_function = hash_function.lower()

    with open(filename, "rb") as f:
        data = f.read()  # read the whole file as bytes
        if hash_function == "md5":
            readable_hash = hashlib.md5(data).hexdigest()
        elif hash_function == "sha256":
            readable_hash = hashlib.sha256(data).hexdigest()
        else:
            raise ValueError(
                "{} is an invalid hash function. Please enter MD5 or SHA256".format(
                    hash_function
                )
            )

    return readable_hash
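
The function above reads the whole file into memory at once, which can be a problem for very large files such as the 4 GB example mentioned earlier. One possible variation, shown here only as a sketch, reads the file in chunks and feeds them to the hash object incrementally. The name `get_checksum_chunked` and the 1 MB default chunk size are assumptions made for this example.

```python
import hashlib


def get_checksum_chunked(filename, hash_function, chunk_size=1024 * 1024):
    """Compute a file checksum without loading the whole file into memory."""
    # hashlib.new() raises ValueError for names it does not recognize.
    hasher = hashlib.new(hash_function.lower())
    with open(filename, "rb") as f:
        # Read chunk_size bytes at a time until read() returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Note that `hashlib.new()` accepts other algorithm names besides MD5 and SHA256, so this variant is less strict about its input than the original function.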

3. Test case

To test it, I downloaded a random image online and computed its MD5 and SHA256 checksums with the Python function, then compared them with the ones generated by Unix commands.

import os

photo = "g_circle-300x300.png"
md5_result = get_checksum(photo, "md5")
sha256_result = get_checksum(photo, "sha256")

# "md5" is the macOS command; on Linux the equivalent is "md5sum"
os.system("md5 {}".format(photo))
print('Hash Function: MD5 - Checksum: {}'.format(md5_result))

os.system("shasum -a 256 {}".format(photo))
print('Hash Function: SHA256 - Checksum: {}'.format(sha256_result))

which prints as

MD5 (g_circle-300x300.png) = 0100c08784ed0f8defa8c9156b45a97e
Hash Function: MD5 - Checksum: 0100c08784ed0f8defa8c9156b45a97e
132124053114b94b46e421dbc4587a15e6962d812389d1705c66d431f1944b9e  g_circle-300x300.png
Hash Function: SHA256 - Checksum: 132124053114b94b46e421dbc4587a15e6962d812389d1705c66d431f1944b9e

Conclusion

In summary, I hope this tutorial was useful for you. Hopefully, you learned how simple it is to check file integrity with MD5 and SHA256.

Comment below to tell me what kind of application you need MD5 and SHA256 hashes for.
