How do QR Codes Work - Overview of the QR Code

A comprehensive overview of QR codes: their structure, input modes, character encoding, error correction, data masking, and the ZXing-cpp library for reading and writing them.

About QR code

A QR code (quick-response code) is a type of two-dimensional matrix barcode invented in 1994 by Masahiro Hara of the Japanese company Denso Wave for labelling automobile parts. It features black squares on a white background with fiducial markers, is readable by imaging devices like cameras, and is processed using Reed-Solomon error correction until the image can be appropriately interpreted. The required data is then extracted from patterns that are present in both the horizontal and the vertical components of the QR image.

Version 40 QR code. Image: Bobmath and authors of wiki:QRcode

Simply put, a QR code is an encoded form of a string: the string is converted into binary, and various auxiliary and error correction information is added to it.

There are 40 versions of QR codes, and each version corresponds to a different size. Version 1 is a 21x21 matrix of modules. Each time the version number increases by one, the width and height of the matrix grow by 4 modules, so the side length for a given version is:

\[(Version - 1) \times 4 + 21\]
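The formula above can be written as a small helper. This is only a sketch; the function name `qrSideLength` is made up for illustration.

```cpp
#include <stdexcept>

// Side length, in modules, of a QR code of the given version (1 to 40).
int qrSideLength(int version)
{
    if (version < 1 || version > 40)
        throw std::out_of_range("QR version must be between 1 and 40");
    return (version - 1) * 4 + 21;
}
```

For example, `qrSideLength(1)` gives 21 and `qrSideLength(40)` gives 177.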

Structure

finder patterns from Bobmath and authors of wiki:QRcode

There are three finder patterns (also called position markers), located in the top-left, top-right, and bottom-left corners. Scanners use them to locate the QR code and determine its orientation quickly and accurately, no matter which direction the code is scanned from. Each finder pattern has a fixed size of 7x7 modules.

There is always a white bar one module wide around each finder pattern, called the separator, which helps the scanner distinguish the finder patterns from the actual data.

finder patterns from Fast Adaptive Binarization of QR Code Images for Automatic Sorting in Logistics Systems

The alignment patterns are smaller than the finder patterns. They help the scanner correct for distortion, for example when the code is printed on a curved surface or photographed at an angle.

Different versions have different numbers of alignment patterns and their locations are also different.

Timing patterns are lines of alternating black and white modules, one horizontal and one vertical, that run between the finder patterns. They help the scanner determine the size of a module and the position of each row and column, which facilitates the correct interpretation of the QR code data.

Format information patterns store the error correction level and the data mask pattern, which the scanner needs before it can decode the rest of the code.

Format information stores the error correction level and the mask pattern
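As a rough sketch of how the 15-bit format information can be built: the post does not spell out the bit layout, so the details below are taken from the QR specification as I understand it and should be read as assumptions. They are the two-bit level codes (L = 01, M = 00, Q = 11, H = 10), the BCH generator 10100110111, and the final XOR mask 101010000010010. The names `EcLevel` and `formatInfoBits` are made up for illustration.

```cpp
#include <cstdint>
#include <string>

// Two-bit error correction level indicators (assumed from the QR specification).
enum class EcLevel : std::uint32_t { M = 0b00, L = 0b01, H = 0b10, Q = 0b11 };

// Builds the 15-bit format information string for a level and a mask (0 to 7).
std::string formatInfoBits(EcLevel level, std::uint32_t mask)
{
    const std::uint32_t generator = 0b10100110111;    // BCH(15,5) generator polynomial
    const std::uint32_t xorMask = 0b101010000010010;  // fixed mask applied at the end

    // 5 data bits: 2 bits of level followed by 3 bits of mask pattern.
    const std::uint32_t data = (static_cast<std::uint32_t>(level) << 3) | (mask & 0b111);

    // Polynomial long division over GF(2): cancel the top bits one at a time.
    std::uint32_t rem = data << 10;
    for (int bit = 14; bit >= 10; --bit)
        if (rem & (1u << bit))
            rem ^= generator << (bit - 10);

    const std::uint32_t value = ((data << 10) | rem) ^ xorMask;
    std::string out;
    for (int bit = 14; bit >= 0; --bit)
        out += ((value >> bit) & 1u) ? '1' : '0';
    return out;
}
```

Under these assumptions, `formatInfoBits(EcLevel::L, 0)` yields `111011111000100`.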

Version information patterns are only present in versions 7 and above and represent the version number.

The data and error correction area stores the actual data to be read, together with the error correction codewords that let a scanner recover the data even if parts of the code are damaged or unreadable.

QR codes allow users to choose different levels of error correction (L, M, Q, H) depending on how much damage is expected, with higher levels providing more redundancy but potentially requiring a larger code area.

| Level | Capability |
| --- | --- |
| L (Low) | Recovers 7% of data |
| M (Medium) | Recovers 15% of data |
| Q (Quartile) | Recovers 25% of data |
| H (High) | Recovers 30% of data |

Input Mode

The input mode determines how characters are encoded into bits, and therefore how much data a QR code can hold. The mode is indicated by a mode indicator at the start of each data segment. The four basic modes are numeric, alphanumeric, byte (binary), and kanji. In addition, Extended Channel Interpretation (ECI) mode can declare a different character set, such as UTF-8, for the data that follows. However, not all QR code readers support ECI.

Numeric mode

Stores the digits 0-9. This mode is the most efficient and can store up to 7,089 characters.

Alphanumeric mode

Stores the digits 0-9, uppercase letters A-Z, the space character, and the symbols $, %, *, +, -, ., /, and :. This mode can store up to 4,296 characters.

Byte mode

Stores characters from the ISO-8859-1 character set. This mode can store up to 2,953 characters.

Kanji mode

Stores double-byte characters from the Shift JIS character set. This is the original mode developed by Denso Wave. It has the lowest maximum capacity of the four modes, at 1,817 characters.

There are two additional modes which are modifications of the other types:

  • Structured Append mode encodes data across multiple QR codes, allowing up to 16 QR codes at once.
  • FNC1 mode allows QR codes to incorporate GS1 barcode functionality.

Upper Limit

There is an upper limit on the data capacity of QR codes. A Version-40 QR code has the highest capacity. The figures below are for Version 40 with error correction level L.

| Input Mode | Max. characters | Bits/char. |
| --- | --- | --- |
| Numeric | 7089 | 10/3 |
| Alphanumeric | 4296 | 11/2 |
| Byte | 2953 | 8 |
| Kanji | 1817 | 13 |

Mode Indicator

Each input mode has a 4-bit mode indicator that identifies it. The encoded data must start with the appropriate mode indicator that specifies the mode being used for the bits that come after it.

| Input Mode | Indicator |
| --- | --- |
| Numeric | 0001 |
| Alphanumeric | 0010 |
| Byte | 0100 |
| Kanji | 1000 |
| ECI | 0111 |
| FNC1 (first position) | 0101 |
| FNC1 (second position) | 1001 |
| Structured append | 0011 |
| End of message (Terminator) | 0000 |

Input modes can be mixed as needed within a QR code:

[Mode Indicator][bitstream] --> [Mode Indicator][bitstream] --> etc... --> [0000(Terminator)]

Character Count Indicator

The character count indicator gives the number of characters being encoded. It is placed immediately after the mode indicator.

The character count indicator has a specific bit length that depends on the QR code version and the input mode:

| Input Mode | Versions 1-9 | Versions 10-26 | Versions 27-40 |
| --- | --- | --- | --- |
| Numeric | 10 bits | 12 bits | 14 bits |
| Alphanumeric | 9 bits | 11 bits | 13 bits |
| Byte | 8 bits | 16 bits | 16 bits |
| Kanji | 8 bits | 10 bits | 12 bits |
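The table above translates directly into a lookup. The following is a minimal sketch; the `Mode` enum and the function name `charCountBits` are hypothetical names chosen for this example.

```cpp
#include <stdexcept>

enum class Mode { Numeric, Alphanumeric, Byte, Kanji };

// Bit length of the character count indicator for a mode and version (1 to 40).
int charCountBits(Mode mode, int version)
{
    if (version < 1 || version > 40)
        throw std::out_of_range("QR version must be between 1 and 40");
    // Column 0: versions 1-9, column 1: versions 10-26, column 2: versions 27-40.
    const int range = version <= 9 ? 0 : (version <= 26 ? 1 : 2);
    static const int table[4][3] = {
        {10, 12, 14},  // Numeric
        {9, 11, 13},   // Alphanumeric
        {8, 16, 16},   // Byte
        {8, 10, 12},   // Kanji
    };
    return table[static_cast<int>(mode)][range];
}
```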

Encoding

Numeric encoding

To encode a string of digits in numeric mode, first split it into groups of three. If the length of the string is not a multiple of 3, the last group will have only one or two digits.

20250126 –> 202 501 26

Convert each group of numbers to binary:

202 –> 0011 0010 10

501 –> 0111 1101 01

26 –> 0011 010

A full three-digit group is always converted into 10 binary bits, even if it starts with a zero. A final group of two digits is converted into 7 binary bits, and a final group of one digit is converted into 4 binary bits.

Together with the mode and character count indicators, the final result is:

0001 0000001000 0011001010 0111110101 0011010
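The numeric example can be reproduced with a short sketch. It assumes a Version 1-9 code (10-bit character count) and input that contains only digits; `toBits` and `encodeNumeric` are illustrative names, not part of any library.

```cpp
#include <bitset>
#include <cstddef>
#include <string>

// Returns the lowest `width` bits of `value` as a string of '0' and '1'.
std::string toBits(unsigned value, int width)
{
    return std::bitset<16>(value).to_string().substr(16 - width);
}

// Encodes a digit string as a numeric-mode segment (mode indicator,
// 10-bit character count for versions 1-9, then the digit groups).
// Assumes `digits` contains only '0'..'9'.
std::string encodeNumeric(const std::string& digits)
{
    std::string bits = "0001";                                  // mode indicator
    bits += toBits(static_cast<unsigned>(digits.size()), 10);   // character count
    for (std::size_t i = 0; i < digits.size(); i += 3) {
        const std::string group = digits.substr(i, 3);
        // 3 digits -> 10 bits, 2 digits -> 7 bits, 1 digit -> 4 bits.
        const int width = group.size() == 3 ? 10 : (group.size() == 2 ? 7 : 4);
        bits += toBits(static_cast<unsigned>(std::stoul(group)), width);
    }
    return bits;
}
```

Calling `encodeNumeric("20250126")` returns the same bit string as the result above, without the spaces.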

Alphanumeric encoding

Unlike numeric encoding, alphanumeric encoding groups characters into pairs. For example, HELLO WORLD is split as follows (the third pair is O followed by a space):

HE, LL, O , WO, RL, D

Each alphanumeric character is represented by a number according to the Alphanumeric Table:

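| Char | Value | Char | Value | Char | Value | Char | Value | Char | Value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | A | 10 | K | 20 | U | 30 | + | 40 |
| 1 | 1 | B | 11 | L | 21 | V | 31 | - | 41 |
| 2 | 2 | C | 12 | M | 22 | W | 32 | . | 42 |
| 3 | 3 | D | 13 | N | 23 | X | 33 | / | 43 |
| 4 | 4 | E | 14 | O | 24 | Y | 34 | : | 44 |
| 5 | 5 | F | 15 | P | 25 | Z | 35 | | |
| 6 | 6 | G | 16 | Q | 26 | (space) | 36 | | |
| 7 | 7 | H | 17 | R | 27 | $ | 37 | | |
| 8 | 8 | I | 18 | S | 28 | % | 38 | | |
| 9 | 9 | J | 19 | T | 29 | * | 39 | | |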

For each pair of characters, take the number representation of the first character, multiply it by 45, and add the number representation of the second character. Convert the result into an 11-bit binary string. If the string has an odd number of characters, the final unpaired character is converted into a 6-bit binary string instead.

H –> 17

E –> 14

\[17 \times 45 + 14 = 779\]

779 –> 01100001011

Together with the mode and character count indicators, the final result is:

0010 000001011 01100001011 01111000110 10001011100 10110111000 10011010100 001101
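A minimal sketch of the same procedure, again assuming a Version 1-9 code (9-bit character count). The helper names `toBits`, `alphanumericValue`, and `encodeAlphanumeric` are made up for this example.

```cpp
#include <bitset>
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns the lowest `width` bits of `value` as a string of '0' and '1'.
std::string toBits(unsigned value, int width)
{
    return std::bitset<16>(value).to_string().substr(16 - width);
}

// Value of a character in the alphanumeric table (0 to 44).
unsigned alphanumericValue(char c)
{
    static const std::string table = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:";
    const std::size_t pos = table.find(c);
    if (pos == std::string::npos)
        throw std::invalid_argument("character not allowed in alphanumeric mode");
    return static_cast<unsigned>(pos);
}

// Encodes text as an alphanumeric-mode segment (mode indicator,
// 9-bit character count for versions 1-9, then the character pairs).
std::string encodeAlphanumeric(const std::string& text)
{
    std::string bits = "0010";                               // mode indicator
    bits += toBits(static_cast<unsigned>(text.size()), 9);   // character count
    for (std::size_t i = 0; i < text.size(); i += 2) {
        if (i + 1 < text.size())  // full pair: first * 45 + second, 11 bits
            bits += toBits(alphanumericValue(text[i]) * 45 + alphanumericValue(text[i + 1]), 11);
        else                      // final unpaired character, 6 bits
            bits += toBits(alphanumericValue(text[i]), 6);
    }
    return bits;
}
```

Calling `encodeAlphanumeric("HELLO WORLD")` returns the bit string shown above, without the spaces.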

Byte encoding

The default character set for byte mode is ISO 8859-1, so the text should first be converted to this character set. If the text has characters that cannot be encoded in ISO 8859-1, you can use UTF-8 encoding instead, as some QR code readers are able to detect and display UTF-8 correctly in byte mode.

After converting the text string to ISO 8859-1 or UTF-8, split it into 8-bit bytes and convert each byte into an 8-bit binary string. For example, for the string Hello, world!:

H –> 0x48 –> 01001000

e –> 0x65 –> 01100101

l –> 0x6c –> 01101100

l –> 0x6c –> 01101100

o –> 0x6f –> 01101111

, –> 0x2c –> 00101100

(space) –> 0x20 –> 00100000

w –> 0x77 –> 01110111

o –> 0x6f –> 01101111

r –> 0x72 –> 01110010

l –> 0x6c –> 01101100

d –> 0x64 –> 01100100

! –> 0x21 –> 00100001

Together with the mode and character count indicators, the final result is:

0100 00001101 01001000 01100101 01101100 01101100 01101111 00101100 00100000 01110111 01101111 01110010 01101100 01100100 00100001
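The byte-mode example can be sketched as follows. It assumes a Version 1-9 code (8-bit character count, so at most 255 bytes) and input that is already in the target character set; `encodeBytes` is an illustrative name.

```cpp
#include <bitset>
#include <stdexcept>
#include <string>

// Encodes raw bytes as a byte-mode segment (mode indicator,
// 8-bit character count for versions 1-9, then 8 bits per byte).
std::string encodeBytes(const std::string& bytes)
{
    if (bytes.size() > 255)
        throw std::length_error("an 8-bit character count holds at most 255 bytes");
    std::string bits = "0100";                           // mode indicator
    bits += std::bitset<8>(bytes.size()).to_string();    // character count
    for (unsigned char byte : bytes)
        bits += std::bitset<8>(byte).to_string();
    return bits;
}
```

Calling `encodeBytes("Hello, world!")` returns the bit string shown above, without the spaces.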

Add Pad Bytes

After obtaining a string of bits that consists of the mode indicator, the character count indicator, and the data bits, it may be necessary to add 0s and pad bytes to fill the total data capacity of the QR code. First add the terminator, which is up to four 0s (fewer if the capacity is reached first). Then, if the number of bits in the string is not a multiple of 8, pad the string with 0s to make the length a multiple of 8.

If the string is still too short, add the following two pad bytes alternately until the string reaches the required length:

11101100 00010001

The encoding result of HELLO WORLD, padded to the 13 data bytes of a Version 1 code with error correction level Q, consists of the following parts:

0010 // mode indicator
000001011 // character count indicator
01100001011 01111000110 10001011100 10110111000 10011010100 001101 // data bits
0000 // terminator
00 // pad 0s
11101100 00010001 11101100 // pad bytes
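The padding steps can be sketched as below. The function name `padToCapacity` is hypothetical, and the data capacity in bits is passed in by the caller because it depends on the version and error correction level (104 bits, i.e. 13 bytes, is assumed for the HELLO WORLD example).

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>

// Appends the terminator, the 0s up to a byte boundary, and the alternating
// pad bytes. `capacityBits` is assumed to be a multiple of 8.
std::string padToCapacity(std::string bits, std::size_t capacityBits)
{
    if (bits.size() > capacityBits)
        throw std::length_error("data does not fit in the given capacity");
    // Terminator: up to four 0 bits, fewer if the capacity is reached first.
    bits.append(std::min<std::size_t>(4, capacityBits - bits.size()), '0');
    // Pad with 0s up to the next multiple of 8.
    bits.append((8 - bits.size() % 8) % 8, '0');
    // Alternate the two pad bytes until the capacity is filled.
    const char* padBytes[2] = {"11101100", "00010001"};
    for (int i = 0; bits.size() < capacityBits; i ^= 1)
        bits += padBytes[i];
    return bits;
}
```

For the 74 data bits of HELLO WORLD and a capacity of 104 bits, this appends 4 terminator bits, 2 padding 0s, and 3 pad bytes.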

Error Correction

The data encoding result is divided into at most two groups, and each group is further divided into one or more blocks. Error correction codewords are calculated separately for each block.

The error correction codewords are calculated from the data encoding result using the Reed-Solomon error correction algorithm. This algorithm is widely used: in QR codes to resist scanning errors, and in disks to resist data loss. In advanced storage systems such as Google’s GFS and BigTable, it resists data loss and reduces read latency. Reed-Solomon arithmetic is carried out in the finite field $\mathbb{F}_{256}$, also written $GF(2^8)$, whose elements fit exactly in one byte. It can be implemented efficiently on computers, so the system can follow the mathematical theory directly without worrying about the overflow usually encountered when modeling.

For a given version, the total number of bytes (data plus error correction) is fixed. For example, Version 5 has 134 bytes in total. How they are split between data and error correction depends on the error correction level.
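To illustrate why arithmetic in $GF(2^8)$ never overflows a byte, here is a minimal sketch of field multiplication. It assumes the reduction polynomial $x^8 + x^4 + x^3 + x^2 + 1$ (0x11D), which is the one QR codes use as far as I know; the function name `gfMultiply` is made up for this example.

```cpp
#include <cstdint>

// Multiplies two elements of GF(2^8), assuming the reduction polynomial
// x^8 + x^4 + x^3 + x^2 + 1 (0x11D).
std::uint8_t gfMultiply(std::uint8_t a, std::uint8_t b)
{
    std::uint16_t product = 0;
    std::uint16_t x = a;  // holds a * x^i, always reduced below 0x100
    for (int i = 0; i < 8; ++i) {
        if (b & (1u << i))
            product ^= x;   // addition in GF(2^8) is XOR
        x <<= 1;
        if (x & 0x100)
            x ^= 0x11D;     // reduce when the degree reaches 8
    }
    return static_cast<std::uint8_t>(product);
}
```

For example, `gfMultiply(2, 128)` is 29 rather than 256: the result wraps around inside the field instead of overflowing.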

The blocks are then interleaved. Take the first data byte of each block in order, then the second data byte of each block, and so on. Do the same for the error correction bytes, and place the interleaved error correction bytes after the interleaved data bytes.

If the result does not fill the entire data and error correction area, add 0s at the end.
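The interleaving step can be sketched as follows. The function name `interleave` is hypothetical; it would be called once for the data blocks and once for the error correction blocks. Blocks may differ in length, so shorter blocks are skipped once they run out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Takes the first byte of each block, then the second byte of each block,
// and so on, skipping blocks that have no byte at that position.
std::vector<std::uint8_t> interleave(const std::vector<std::vector<std::uint8_t>>& blocks)
{
    std::size_t longest = 0;
    for (const auto& block : blocks)
        longest = std::max(longest, block.size());

    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i < longest; ++i)
        for (const auto& block : blocks)
            if (i < block.size())
                out.push_back(block[i]);
    return out;
}
```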

Data Masking

An uneven distribution of black and white modules results in large areas of white or black, making scanning and recognition difficult.

To solve this problem, the QR code provides mask patterns. Before the QR code is finally generated, the data and error correction modules are XOR-ed with a mask pattern. The function patterns, such as the finder, alignment, and timing patterns, are not masked. There are 8 mask patterns available, numbered 0 to 7 (or 000 to 111 in binary). The purpose of this process is to make the QR code easier for scanners to read.

Eight mask patterns

After a mask pattern has been applied to the QR matrix, it is given a penalty score based on four evaluation conditions. A QR code encoder must apply all eight mask patterns and evaluate each one. Whichever mask pattern results in the lowest penalty score is the mask pattern that must be used for the final output.
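The eight mask patterns are each defined by a simple condition on the module's row and column. The conditions below are written from the QR specification as I recall it, so treat them as an assumption to verify against the standard. The names `maskFlips` and `applyMask` are made up for this sketch, and the penalty scoring is not shown.

```cpp
#include <stdexcept>

// Returns true if mask pattern `pattern` (0 to 7) flips the module at
// (row, col), with both coordinates counted from 0 at the top-left corner.
bool maskFlips(int pattern, int row, int col)
{
    switch (pattern) {
    case 0: return (row + col) % 2 == 0;
    case 1: return row % 2 == 0;
    case 2: return col % 3 == 0;
    case 3: return (row + col) % 3 == 0;
    case 4: return (row / 2 + col / 3) % 2 == 0;
    case 5: return (row * col) % 2 + (row * col) % 3 == 0;
    case 6: return ((row * col) % 2 + (row * col) % 3) % 2 == 0;
    case 7: return ((row + col) % 2 + (row * col) % 3) % 2 == 0;
    default: throw std::out_of_range("mask pattern must be 0 to 7");
    }
}

// XORs one data or error correction module with the mask.
// Function-pattern modules must not be passed through this function.
bool applyMask(bool dark, int pattern, int row, int col)
{
    return dark != maskFlips(pattern, row, col);
}
```

An encoder would apply each of the eight patterns to a copy of the matrix, score each result, and keep the one with the lowest penalty.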

ZXing-cpp

ZXing-C++ (“zebra crossing”) is an open-source, multi-format linear/matrix barcode image processing library implemented in C++.

It was originally ported from the Java ZXing Library but has been developed further and now includes many improvements in terms of runtime and detection performance. It can both read and write barcodes in a number of formats.

GitHub source: https://github.com/zxing-cpp/zxing-cpp

A very simple example looks like this:

#include "ZXing/ReadBarcode.h"
#include <iostream>
#include <vector>

int main(){
    // Placeholder input: a blank white greyscale image, one byte per pixel.
    // Replace it with pixel data loaded from your own image.
    int width = 200, height = 200;
    std::vector<unsigned char> data(width * height, 255);
    // ImageFormat::Lum assumes grey scale image data.
    auto image = ZXing::ImageView(data.data(), width, height, ZXing::ImageFormat::Lum);
    auto options = ZXing::ReaderOptions().setFormats(ZXing::BarcodeFormat::Any);
    auto barcodes = ZXing::ReadBarcodes(image, options);

    for(const auto& b : barcodes){
        std::cout << ZXing::ToString(b.format()) << ": " << b.text() << "\n";
    }
    return 0;
}

It also has wrappers/bindings for Android, C, iOS, Kotlin/Native, .NET, Python, Rust, WebAssembly, WinRT, Flutter (external project).

This post is licensed under CC BY 4.0 by the author.