Caesar Cipher Cryptanalysis & Frequency Analysis

Caesar Cipher: Formal Representation

Number the letters a = 0, b = 1, ..., z = 25.
Plaintext: P = {sequence of plaintext letters}.
Key: k ∈ {i | 0 ≤ i ≤ 25}.
Encryption: E(p) = (p + k) mod 26.
Decryption: D(c) = (26 + c − k) mod 26.
Example: if k = 25, the shift maps a → z, b → a, and so on.
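The two formulas can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names (`encrypt`, `decrypt`, `_shift_letter`) are hypothetical, and the choice to pass non-letters through unchanged is an assumption the notes do not state.

```python
def _shift_letter(ch: str, offset: int) -> str:
    """Shift one ASCII letter by `offset` positions mod 26; leave other characters alone."""
    if not (ch.isascii() and ch.isalpha()):
        return ch  # assumption: spaces, digits and punctuation are not encrypted
    base = ord("a") if ch.islower() else ord("A")
    return chr((ord(ch) - base + offset) % 26 + base)


def encrypt(plaintext: str, k: int) -> str:
    # E(p) = (p + k) mod 26
    return "".join(_shift_letter(ch, k) for ch in plaintext)


def decrypt(ciphertext: str, k: int) -> str:
    # D(c) = (26 + c - k) mod 26
    return "".join(_shift_letter(ch, 26 - k) for ch in ciphertext)


print(encrypt("ab", 25))                           # za
print(decrypt(encrypt("attack at dawn", 10), 10))  # attack at dawn
```

The first call reproduces the k = 25 case (a → z, b → a); the second checks that decryption undoes encryption.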

Attacking the Caesar Cipher

Common methods to solve or attack a Caesar (shift) cipher include:

  1. Brute force: Try all possible keys (0–25) and inspect the results.
  2. Statistical (frequency) analysis: Use letter frequency distributions of the language to infer likely mappings.
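Brute force is small enough to sketch directly: with only 26 keys, every candidate decryption can be listed and read by eye. The helper names (`shift_back`, `brute_force`) and the sample ciphertext `khoor zruog` are assumptions made for illustration.

```python
def shift_back(ciphertext: str, k: int) -> str:
    """Shift every ASCII letter backward by k positions (mod 26)."""
    out = []
    for ch in ciphertext:
        if ch.isascii() and ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base - k) % 26 + base))
        else:
            out.append(ch)  # assumption: non-letters were left unencrypted
    return "".join(out)


def brute_force(ciphertext: str) -> list:
    """Return (key, candidate plaintext) pairs for all 26 keys."""
    return [(k, shift_back(ciphertext, k)) for k in range(26)]


# Sample ciphertext chosen for illustration only.
for k, candidate in brute_force("khoor zruog"):
    print(f"{k:2d}: {candidate}")
```

For this sample, the line for key 3 reads `hello world`; every other line is gibberish, which is what makes inspection by eye workable for short texts.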

Frequency Analysis: Basic Idea

Certain letters appear more frequently than others in a given language. By comparing ciphertext letter frequencies to natural language frequencies, you can match ciphertext characters to likely plaintext letters.

Steps for Frequency Analysis
  1. Count occurrences: Count how many times each letter appears in the ciphertext.
  2. Compute frequencies: For each letter, divide its count by the total number of letters to get relative frequency.
  3. Correlation / match: Compare ciphertext frequencies to English frequencies and propose mappings.
  4. Test keys: Calculate the shift from candidate mappings and apply the shift; verify if the result is readable English.
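
Steps 1 and 2 above, plus the ranking that feeds step 3, might look like this in Python. The function names and the sample ciphertext `KHOOR ZRUOG` are assumptions for illustration; the sketch counts only the ASCII letters A to Z.

```python
from collections import Counter
import string


def letter_frequencies(ciphertext: str) -> list:
    """Return 26 relative frequencies, index 0 = A ... index 25 = Z."""
    letters = [ch for ch in ciphertext.upper() if ch in string.ascii_uppercase]
    total = len(letters)
    if total == 0:
        return [0.0] * 26
    counts = Counter(letters)                       # step 1: count occurrences
    return [counts[letter] / total                  # step 2: divide by total letters
            for letter in string.ascii_uppercase]


def most_frequent(freqs: list, top: int = 3) -> list:
    """Return the `top` most frequent letters with their frequencies."""
    order = sorted(range(26), key=lambda i: freqs[i], reverse=True)
    return [(string.ascii_uppercase[i], freqs[i]) for i in order[:top]]


freqs = letter_frequencies("KHOOR ZRUOG")  # sample ciphertext, illustration only
print(most_frequent(freqs, 2))             # [('O', 0.3), ('R', 0.2)]
```

The ranked letters are the candidates to pair with the most frequent English letters in step 3.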

English letter frequencies (letter/index: frequency):
a/0: 0.080, b/1: 0.015, c/2: 0.030, d/3: 0.040, e/4: 0.130, f/5: 0.020, g/6: 0.015,
h/7: 0.060, i/8: 0.065, j/9: 0.005, k/10: 0.005, l/11: 0.035, m/12: 0.030, n/13: 0.070,
o/14: 0.080, p/15: 0.020, q/16: 0.002, r/17: 0.065, s/18: 0.060, t/19: 0.090, u/20: 0.030,
v/21: 0.010, w/22: 0.015, x/23: 0.005, y/24: 0.020, z/25: 0.002

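Example: If ciphertext letter O has the highest frequency, it might correspond to plaintext E, the most frequent English letter. If ciphertext R is the next most frequent (e.g., 0.2), it may correspond to plaintext T, the second most frequent English letter. Because a Caesar cipher shifts every letter by the same amount, each such pairing implies one candidate key, and two pairings can both be correct only if they imply the same key.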

Calculate the Shift

If O in the ciphertext corresponds to E in plaintext: O is the 15th letter of the alphabet and E is the 5th (indices 14 and 4 when counting from a = 0). Shift = (ciphertext position − plaintext position) mod 26 = 15 − 5 = 10, so the key might be 10. Test this key by shifting every ciphertext letter backward by 10 positions, wrapping around from a to z, to attempt decryption.
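
The same arithmetic as a short Python sketch. The names `shift_from_pair` and `shift_back` are hypothetical, and the mod 26 is included so that pairings where the ciphertext letter comes earlier in the alphabet than the plaintext letter still give a key in 0..25.

```python
def shift_from_pair(cipher_letter: str, plain_letter: str) -> int:
    """Candidate key implied by assuming cipher_letter encrypts plain_letter."""
    c = ord(cipher_letter.upper()) - ord("A")
    p = ord(plain_letter.upper()) - ord("A")
    return (c - p) % 26


def shift_back(ciphertext: str, k: int) -> str:
    """Shift every ASCII letter backward by k positions, wrapping a -> z."""
    out = []
    for ch in ciphertext:
        if ch.isascii() and ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base - k) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


key = shift_from_pair("O", "E")
print(key)                          # 10
print(shift_from_pair("R", "T"))    # 24
print(shift_back("DRO", key))       # THE
```

Note that pairing O with E implies key 10 while pairing R with T implies key 24, so at most one of those two guesses can hold for a single Caesar ciphertext.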

When the Key Isn't Obvious

If the key is not immediately obvious, use correlation scoring for each possible key.

For each candidate key i, compute a correlation score φ(i) by summing, over the 26 letters, the product of each ciphertext letter's frequency and the English frequency of the letter it would decrypt to under that key:

φ(i) = Σ_{c=0}^{25} f(c) × p(c − i)

Here f(c) is the relative frequency of letter c in the ciphertext, p(x) is the English frequency of letter x from the table above, and the index c − i is taken mod 26. The most likely key is the i with the highest score.

Example correlation scores by key: 0: 0.0482, 1: 0.0364, 2: 0.0410, 3: 0.0575, 4: 0.0252, 5: 0.0190, 6: 0.0660, 7: 0.0442, 8: 0.0202, 9: 0.0267, 10: 0.0635, 11: 0.0262, 12: 0.0325, 13: 0.0520, 14: 0.0535, 15: 0.0226, 16: 0.0322, 17: 0.0392, 18: 0.0299, 19: 0.0315, 20: 0.0302, 21: 0.0517, 22: 0.0380, 23: 0.0370, 24: 0.0316, 25: 0.0430. Here the highest scores belong to keys 6, 10, 3, and 14, in that order. If the top-scoring key does not produce readable plaintext, test the remaining keys in descending order of score.
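
A minimal Python sketch of the correlation score, using the English frequency table given earlier. The function names and the sample ciphertext `KHOOR ZRUOG` are assumptions; the notes do not say which ciphertext produced the listed scores, although this sample gives the same values for the keys checked here.

```python
# English letter frequencies from the table above, index 0 = a ... 25 = z.
ENGLISH_FREQ = [
    0.080, 0.015, 0.030, 0.040, 0.130, 0.020, 0.015, 0.060, 0.065,
    0.005, 0.005, 0.035, 0.030, 0.070, 0.080, 0.020, 0.002, 0.065,
    0.060, 0.090, 0.030, 0.010, 0.015, 0.005, 0.020, 0.002,
]


def letter_frequencies(text: str) -> list:
    """Relative frequency of each letter A..Z among the letters of text."""
    letters = [ord(ch) - ord("A") for ch in text.upper() if "A" <= ch <= "Z"]
    if not letters:
        return [0.0] * 26
    return [letters.count(i) / len(letters) for i in range(26)]


def correlation_scores(ciphertext: str) -> list:
    """Score every key i as the sum over c of f(c) * p((c - i) mod 26)."""
    f = letter_frequencies(ciphertext)
    return [
        sum(f[c] * ENGLISH_FREQ[(c - i) % 26] for c in range(26))
        for i in range(26)
    ]


scores = correlation_scores("KHOOR ZRUOG")  # sample ciphertext, illustration only
ranked = sorted(range(26), key=lambda i: scores[i], reverse=True)
for i in ranked[:4]:
    print(i, round(scores[i], 4))
```

With this sample, key 6 scores 0.0660, key 10 scores 0.0635, and key 3 scores 0.0575. Key 3 is the one that yields readable text (`HELLO WORLD`), which shows why lower-ranked keys still need to be tested on short ciphertexts.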

Index of Coincidence (IC) and Vigenère

The Index of Coincidence (IC) is the probability that two letters picked at random from the ciphertext are the same. It is used to estimate the period (key length) of a Vigenère cipher. Let n be the total number of letters in the ciphertext and n_i the count of the i-th letter (for i = 0..25). A common formula for IC is:

IC = (1 / (n (n − 1))) × Σ_{i=0}^{25} n_i (n_i − 1)

Equivalently, compute for each letter: (count × (count − 1)), sum those values, and divide by total letters × (total letters − 1).
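
The formula translates almost line for line into Python. This is a sketch under the assumption that only the ASCII letters A to Z are counted; the function name `index_of_coincidence` is hypothetical, and returning 0.0 for texts with fewer than two letters is a choice made here to avoid dividing by zero.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """IC = sum of n_i * (n_i - 1) over all letters, divided by n * (n - 1)."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0  # assumption: undefined case reported as 0.0
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


print(index_of_coincidence("AAAA"))  # 1.0
print(index_of_coincidence("ABCD"))  # 0.0
```

A text made of one repeated letter gives the maximum value of 1.0, and a text with no repeated letters gives 0.0; real ciphertexts fall between those extremes.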

Typical expected IC values for English text by period (period: IC): 1: 0.066, 2: 0.052, 3: 0.047, 5: 0.044, 10: 0.041, large: 0.038. Compare the measured IC with these values to estimate the likely period. Then split the ciphertext into that many groups, where group j holds every letter whose position is congruent to j modulo the period, and perform frequency analysis on each group (treat each group like a Caesar cipher) to identify the key letters and decrypt using the full Vigenère key.
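
The period estimate and the split into groups can be sketched as below. The nearest-value lookup is a rough heuristic assumed here, not a method the notes prescribe, and it knows only the periods listed in the table (the "large" entry is left out); the function names are hypothetical.

```python
# Expected IC by period, copied from the values listed in these notes.
EXPECTED_IC = {1: 0.066, 2: 0.052, 3: 0.047, 5: 0.044, 10: 0.041}


def nearest_period(ic: float) -> int:
    """Return the listed period whose expected IC is closest to the measured IC."""
    return min(EXPECTED_IC, key=lambda period: abs(EXPECTED_IC[period] - ic))


def split_into_groups(ciphertext: str, period: int) -> list:
    """Group j collects letters at positions j, j + period, j + 2 * period, ..."""
    letters = [ch for ch in ciphertext.upper() if "A" <= ch <= "Z"]
    return ["".join(letters[j::period]) for j in range(period)]


print(nearest_period(0.045))            # 5
print(split_into_groups("ABCDEFG", 3))  # ['ADG', 'BE', 'CF']
```

Each returned group was encrypted with a single key letter, so it can be attacked with the same frequency or correlation analysis used for a Caesar cipher. Because the expected IC values are close together for longer periods, the estimate is best treated as a starting guess and neighboring periods tried as well.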
