How to compute Shannon entropy for file analysis: a technical deep dive
- Step 1: Compute the byte frequency table — Slice the file into 256-byte chunks. For each chunk, count the occurrences of all 256 possible byte values. Divide each count by 256 to get probabilities.
- Step 2: Apply the Shannon formula — H = -Σ p(b) × log₂(p(b)), summed over each byte value b where p(b) > 0. The result is a float between 0 and 8.
- Step 3: Plot the profile — Plot H values on the y-axis against chunk index (× 256 = file offset) on the x-axis. The resulting chart shows how randomness varies across the file.
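The three steps above can be sketched in a few lines of Python. The function names (`shannon_entropy`, `entropy_profile`) are illustrative, not part of any particular library:

```python
import math

def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a chunk, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    # Step 1: byte frequency table -> probabilities.
    counts = [0] * 256
    for b in chunk:
        counts[b] += 1
    n = len(chunk)
    # Step 2: H = -sum p(b) * log2(p(b)) over byte values with p(b) > 0.
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

def entropy_profile(data: bytes, window: int = 256) -> list[float]:
    """Step 3: one entropy value per fixed-size window.

    Chunk index * window gives the file offset for the x-axis.
    """
    return [shannon_entropy(data[i:i + window])
            for i in range(0, len(data), window)]
```

The resulting list can be handed to any plotting library, with chunk index on the x-axis and H on the y-axis.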
Frequently asked questions
Why is 8 bits/byte the maximum?
log₂(256) = 8. A file where every byte value appears exactly once per 256-byte window achieves maximum entropy. Truly random data approaches but rarely reaches this theoretical maximum.
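Both halves of that claim are easy to verify directly. A minimal check, reusing the same illustrative entropy helper:

```python
import math
import os

def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy in bits per byte."""
    n = len(chunk)
    counts = [0] * 256
    for b in chunk:
        counts[b] += 1
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

# A 256-byte window containing every byte value exactly once hits the ceiling.
perm = bytes(range(256))

# 256 truly random bytes almost always contain repeats, so they fall just short.
rand = os.urandom(256)
```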
What is the entropy of AES-encrypted data?
AES-GCM output is computationally indistinguishable from random, so entropy approaches 7.99+ bits/byte across the ciphertext. Only the 28-byte header (salt + IV) shows lower entropy from the format markers.
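Producing real AES-GCM ciphertext requires a third-party crypto library, but since the point is precisely that ciphertext is computationally indistinguishable from random bytes, `os.urandom` works as a stand-in for this measurement. Over a megabyte of data, measured entropy lands just below the 8-bit ceiling:

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

# Stand-in for AES-GCM ciphertext: uniformly random bytes (1 MiB).
ciphertext_standin = os.urandom(1 << 20)
H = shannon_entropy(ciphertext_standin)
```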
How does DEFLATE compression affect entropy?
DEFLATE output has entropy of 7.5–7.9 bits/byte — high but not maximal, because Huffman coding introduces minor regularities. This is why gzip/zlib compressed regions appear in the high-entropy zone alongside encrypted data.
Privacy first
Every JAD Security operation runs entirely in your browser. Files, passwords, and PGP private keys never leave your device — verified by zero outbound network requests during processing.