This tutorial shows the simplest way to calculate a checksum. It shares a Python function that handles the MD5 and SHA256 hashing functions, which can be used to check the integrity of your files.
1. What is a Checksum?
If you work with computers, and especially in cybersecurity, you have probably heard the term checksum thrown around. A checksum is a sequence of numbers and letters used to check data for errors. If you know the checksum of an original file, you can use a checksum utility to confirm that your copy is identical. To produce a checksum, you run a program that passes the file through an algorithm. For file verification, that algorithm is typically a cryptographic hash function, which takes an input and produces a string (a sequence of numbers and letters) of a fixed length. The input can be a small 1 MB file or a huge 4 GB file, but in either case you get a checksum of the same length. Checksums are also called "hashes".
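The fixed-length property can be checked directly with Python's standard `hashlib` module. The snippet below is a minimal sketch; the input sizes are arbitrary values chosen for illustration.

```python
import hashlib

# Two inputs of very different sizes (sizes chosen arbitrarily for illustration)
small_input = b"a"
large_input = b"a" * 1_000_000

small_digest = hashlib.sha256(small_input).hexdigest()
large_digest = hashlib.sha256(large_input).hexdigest()
md5_digest = hashlib.md5(large_input).hexdigest()

# SHA256 always yields 256 bits (64 hex characters),
# MD5 always yields 128 bits (32 hex characters)
print(len(small_digest), len(large_digest), len(md5_digest))
```

Whatever the size of the input, the length of the digest depends only on the hash function used.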
When checksums are useful
You can use checksums to check files and other data for errors that occur during transmission or storage. For example, a file may not have downloaded correctly because of network problems, or hard disk problems may have corrupted a file on disk. Computers use checksum techniques in the background to look for such problems, but you can also do it yourself. You can use checksums to verify the integrity of any type of file, from applications to documents and media.
A checksum does not appear on its own: it is produced by an operation called a checksum function, which you may also see referred to as a checksum algorithm. The design of the algorithm may vary, but in general an effective checksum algorithm must generate a significantly different value (checksum) for even the smallest change to the input, regardless of the type of document or data.
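To see how strongly a small change affects the result, the following sketch hashes two strings that differ by a single character and counts how many bits of the two SHA256 digests differ. The example strings are arbitrary.

```python
import hashlib

original = b"The quick brown fox jumps over the lazy dog"
modified = b"The quick brown fox jumps over the lazy cog"  # one character changed

digest_a = hashlib.sha256(original).digest()
digest_b = hashlib.sha256(modified).digest()

# XOR the digests byte by byte and count the bits that are set
differing_bits = sum(bin(x ^ y).count("1") for x, y in zip(digest_a, digest_b))

print(digest_a.hex())
print(digest_b.hex())
print("bits that differ:", differing_bits, "of", len(digest_a) * 8)
```

For a good hash function, roughly half of the output bits are expected to change, so the two checksums look unrelated.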
1.1. Parity word or parity byte
We start with the simplest checksum algorithm, known as the longitudinal parity check. It divides the data into words with a fixed number of bits (n), then computes the exclusive or (XOR) of all those words. The result is appended to the message as an additional word.
To verify the integrity of a message, the recipient computes the XOR of all the words, including the checksum, and checks whether the result is a word made up of n zeros. If it is, everything is fine; otherwise, the recipient knows that an error occurred somewhere during transmission.
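The parity scheme described above can be sketched in a few lines. The helper names `parity_word` and `is_intact` and the 8-bit sample words are made up for this illustration.

```python
from functools import reduce


def parity_word(words):
    """XOR all fixed-width words together to obtain the parity word."""
    return reduce(lambda acc, word: acc ^ word, words, 0)


def is_intact(words_with_parity):
    """XOR of all words, parity word included, is zero for an intact message."""
    return parity_word(words_with_parity) == 0


message = [0b10110010, 0b01101100, 0b11110000]  # three 8-bit words
sent = message + [parity_word(message)]         # append the parity word

corrupted = sent.copy()
corrupted[1] ^= 0b00000100                      # flip a single bit in transit

print(is_intact(sent))       # True
print(is_intact(corrupted))  # False
```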
1.2. Sum complement
As you can imagine, the weaknesses of the parity check led to attempts to improve on it, and the sum complement algorithm was the result. Unlike the parity check, the sum complement algorithm adds all the words as unsigned binary numbers, discards any overflow bits, and appends the two's complement of the total as the checksum. To validate a message, the recipient adds all the words in the same way, including the checksum. If the result is not a word filled with zeros, it is a strong indicator that a transmission error occurred at some point.
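A minimal sketch of the sum complement idea, assuming 8-bit words. The names `sum_complement`, `is_intact`, and `WORD_BITS` are illustrative choices, not part of any library.

```python
WORD_BITS = 8                  # word size assumed for this example
MASK = (1 << WORD_BITS) - 1    # 0xFF, keeps only the low 8 bits


def sum_complement(words):
    """Add the words as unsigned numbers, drop overflow, return the two's complement."""
    total = sum(words) & MASK
    return (-total) & MASK


def is_intact(words_with_checksum):
    """The sum of all words plus the checksum is zero (mod 2**n) if nothing changed."""
    return (sum(words_with_checksum) & MASK) == 0


message = [0xB2, 0x6C, 0xF0]
checksum = sum_complement(message)
sent = message + [checksum]

corrupted = sent.copy()
corrupted[0] += 1              # simulate a transmission error

print(hex(checksum))           # 0xf2
print(is_intact(sent))         # True
print(is_intact(corrupted))    # False
```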
1.3. Position-dependent
Position-dependent checksum algorithms were designed to correct the flaws of the two functions described above, which cannot detect many common errors that affect more than one bit. Examples of such errors include the insertion or deletion of words with all bits set to zero, and changes to the order of the data words.
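The text does not name a specific position-dependent algorithm, so the sketch below uses Fletcher-16, one well-known example, purely as an illustration. It is compared against a plain modular sum (`simple_sum`, a helper made up for this example) to show which errors each one notices.

```python
def simple_sum(data: bytes) -> int:
    """Order-independent checksum: plain sum of the bytes, modulo 255."""
    return sum(data) % 255


def fletcher16(data: bytes) -> int:
    """Fletcher-16: a second running sum makes the result depend on byte positions."""
    sum1 = 0
    sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1


original = b"abcde"
reordered = b"abced"          # last two bytes swapped
zero_inserted = b"ab\x00cde"  # an all-zero byte inserted in the middle

# The plain sum cannot tell the three messages apart
print(simple_sum(original), simple_sum(reordered), simple_sum(zero_inserted))

# Fletcher-16 gives a different value for each of them
print(hex(fletcher16(original)))
print(hex(fletcher16(reordered)))
print(hex(fletcher16(zero_inserted)))
```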
1.4. Fuzzy checksum
This type of checksum algorithm was developed as an effective way to detect email spam. It works by building cooperative databases from several ISPs, containing emails that are suspected of being spam or have been reported as spam. However, since the content of each spam message may differ slightly from the next, conventional checksum algorithms would be ineffective and not worth using. A fuzzy checksum algorithm instead reduces the body of an email to its characteristic minimum, then calculates a checksum in the usual way.
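As a toy illustration only, the sketch below reduces an email body by lowercasing it and removing digits, punctuation, and extra whitespace before hashing. Real spam filters use more elaborate reduction steps; the function `fuzzy_checksum`, its normalisation rules, and the sample messages are all assumptions made for this example.

```python
import hashlib
import re


def fuzzy_checksum(body: str) -> str:
    """Reduce the text to a normalised core, then hash it as usual."""
    text = body.lower()
    text = re.sub(r"[^a-z\s]", "", text)       # drop digits and punctuation
    text = re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


spam_a = "WIN a FREE prize!!!  Claim code: 48213"
spam_b = "Win a free prize! Claim code: 90377"

# A conventional checksum treats the two variants as unrelated messages
plain_a = hashlib.sha256(spam_a.encode("utf-8")).hexdigest()
plain_b = hashlib.sha256(spam_b.encode("utf-8")).hexdigest()
print(plain_a == plain_b)                                  # False

# After reduction, both variants map to the same checksum
print(fuzzy_checksum(spam_a) == fuzzy_checksum(spam_b))    # True
```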
1.5. How to use a file checksum?
If you are serious about the safety of your computer and understand the dangers of downloading files left and right from shady websites, you may want to use a checksum to check the integrity of your files before opening them or copying them to a place where they could do damage.
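In practice this usually means comparing the checksum of the downloaded file with the one published by its provider. The sketch below assumes a SHA256 checksum was published; the helper name `matches_published_checksum` is hypothetical, and a temporary file stands in for a real download.

```python
import hashlib
import hmac
import os
import tempfile


def matches_published_checksum(path, expected_hex):
    """Return True if the file's SHA256 checksum equals the published value."""
    with open(path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    # Normalise case and whitespace before comparing the two hex strings
    return hmac.compare_digest(actual, expected_hex.strip().lower())


# Throwaway file standing in for a downloaded file (placeholder content)
content = b"pretend this is an installer"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(content)
    downloaded_path = tmp.name

published = hashlib.sha256(content).hexdigest().upper()  # as a site might list it

good = matches_published_checksum(downloaded_path, published)
bad = matches_published_checksum(downloaded_path, "0" * 64)
os.remove(downloaded_path)

print(good)  # True
print(bad)   # False
```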
2. Checksum: MD5 and SHA256
First and foremost, the simple Python function below can be used to calculate the checksum of a file using the MD5 and SHA256 hashing functions.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import hashlib


def get_checksum(filename, hash_function):
    """Generate checksum for file based on hash function (MD5 or SHA256).

    Args:
        filename (str): Path to file that will have the checksum generated.
        hash_function (str): Hash function name - supports MD5 or SHA256

    Returns:
        str: Checksum based on hash function of choice.

    Raises:
        ValueError: Invalid hash function is entered.
    """
    hash_function = hash_function.lower()

    with open(filename, "rb") as f:
        data = f.read()  # read the entire file as bytes

    if hash_function == "md5":
        readable_hash = hashlib.md5(data).hexdigest()
    elif hash_function == "sha256":
        readable_hash = hashlib.sha256(data).hexdigest()
    else:
        raise ValueError(
            "{} is an invalid hash function. Please enter MD5 or SHA256".format(hash_function)
        )

    return readable_hash
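The function above reads the whole file into memory, which may be a problem for very large files such as the 4 GB example mentioned earlier. One possible variant, sketched here under the hypothetical name `get_checksum_chunked`, feeds the file to the hash object in fixed-size chunks. The 64 KB chunk size is an arbitrary choice.

```python
import hashlib
import os
import tempfile


def get_checksum_chunked(filename, hash_function, chunk_size=65536):
    """Compute an MD5 or SHA256 checksum without loading the whole file at once."""
    hash_function = hash_function.lower()
    if hash_function not in ("md5", "sha256"):
        raise ValueError(
            "{} is an invalid hash function. Please enter MD5 or SHA256".format(hash_function)
        )

    hasher = hashlib.new(hash_function)
    with open(filename, "rb") as f:
        # Read until f.read() returns an empty bytes object (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# Quick self-check on a temporary file larger than one chunk
payload = b"x" * 200_000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    tmp_path = tmp.name

chunked_result = get_checksum_chunked(tmp_path, "SHA256")
os.remove(tmp_path)

print(chunked_result == hashlib.sha256(payload).hexdigest())  # True
```

Updating the hash object chunk by chunk gives the same digest as hashing all the bytes in one call, so the two approaches are interchangeable for the caller.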
3. Test case
I downloaded a random image online and computed its MD5 and SHA256 checksums with the Python function, then compared them to the ones generated by Unix commands.
import os

photo = "g_circle-300x300.png"
md5_result = get_checksum(photo, "md5")
sha256_result = get_checksum(photo, "sha256")

# Compare with the system utilities (`md5` is the macOS/BSD tool; on Linux use `md5sum`)
os.system("md5 {}".format(photo))
print('Hash Function: MD5 - Checksum: {}'.format(md5_result))
os.system("shasum -a 256 {}".format(photo))
print('Hash Function: SHA256 - Checksum: {}'.format(sha256_result))
which prints as
MD5 (g_circle-300x300.png) = 0100c08784ed0f8defa8c9156b45a97e
Hash Function: MD5 - Checksum: 0100c08784ed0f8defa8c9156b45a97e
132124053114b94b46e421dbc4587a15e6962d812389d1705c66d431f1944b9e g_circle-300x300.png
Hash Function: SHA256 - Checksum: 132124053114b94b46e421dbc4587a15e6962d812389d1705c66d431f1944b9e
More Resources
Here are three of my favorite Python books in case you want to learn more about the language.
- Python Cookbook, Third Edition by David Beazley and Brian K. Jones
- Learning Python, 5th Edition by Mark Lutz
- Python Pocket Reference: Python In Your Pocket by Mark Lutz
Conclusion
In conclusion, this tutorial presented a simple way to calculate checksums to verify the integrity of files. With the Python function handling the MD5 and SHA256 hashing functions, you can generate a checksum and compare it against the one published for the original data to confirm that the file was not altered or corrupted during transmission or storage. This is particularly useful for applications where data accuracy and reliability are critical. Keep in mind that a checksum detects changes rather than preventing them, and that MD5 is no longer considered collision-resistant, so SHA256 is the better choice when deliberate tampering is a concern.