Learn Ethical Hacking (#9) - Cryptography for Hackers - What Protects Data (and What Doesn't)
What will I learn
- Symmetric encryption (AES) vs asymmetric encryption (RSA) -- when to use which;
- How TLS really works -- the handshake, certificates, and what breaks when it's misconfigured;
- Hashing beyond passwords: integrity verification, digital signatures, HMACs;
- Common crypto failures: ECB mode, padding oracles, key reuse, weak random numbers;
- Building a simple encrypted messenger in Python to understand the concepts hands-on;
- Why "roll your own crypto" is the most dangerous phrase in security.
Requirements
- A working modern computer running macOS, Windows or Ubuntu;
- Your hacking lab from Episode 2;
- Python 3 with the cryptography library (pip install cryptography);
- The ambition to learn ethical hacking and security research.
Difficulty
- Beginner
Curriculum (of the Learn Ethical Hacking series):
- Learn Ethical Hacking (#1) - Why Hackers Win
- Learn Ethical Hacking (#2) - Your Hacking Lab
- Learn Ethical Hacking (#3) - How the Internet Actually Works - For Attackers
- Learn Ethical Hacking (#4) - Reconnaissance - The Art of Not Being Noticed
- Learn Ethical Hacking (#5) - Active Scanning - Mapping the Attack Surface
- Learn Ethical Hacking (#6) - The AI Slop Epidemic - Why AI-Generated Code Is a Security Disaster
- Learn Ethical Hacking (#7) - Passwords - Why Humans Are the Weakest Cipher
- Learn Ethical Hacking (#8) - Social Engineering - Hacking the Human
- Learn Ethical Hacking (#9) - Cryptography for Hackers - What Protects Data (and What Doesn't) (this post)
Solutions to Episode 8 Exercises
Exercise 1 -- Phishing page clone:
The key factors for believability (in order of importance):
1. Exact CSS/styling match (fonts, colors, spacing, logo)
2. Correct page title and favicon
3. Matching URL structure in the address bar (hardest to fake)
4. Realistic form field names and placeholder text
5. No certificate warning (requires an attacker-controlled domain with a valid TLS certificate)
A basic clone takes 15-30 minutes with browser "Save Page As" + modification.
A convincing clone with matching domain takes 1-2 hours including DNS setup.
The key insight: the technology for cloning a login page is trivial -- wget --mirror captures everything. The hard part is the delivery: making the victim visit YOUR page instead of the real one. That's where DNS spoofing, typosquatting (g00gle.com), and phishing emails come in.
Exercise 2 -- Phishing email analyzer:
```python
import re

URGENCY_WORDS = ['immediately', 'urgent', 'expires', 'suspended',
                 'verify', 'confirm', 'within 24', 'action required']

def analyze_email(text):
    """Score an email from 0 to 10 using simple phishing heuristics."""
    score = 0
    findings = []
    # Check urgency keywords
    for word in URGENCY_WORDS:
        if word.lower() in text.lower():
            score += 1
            findings.append(f"Urgency keyword: '{word}'")
    # Check URL mismatch (simplified: markdown-style links)
    links = re.findall(r'\[([^\]]+)\]\((http[^\)]+)\)', text)
    for display, actual in links:
        if 'http' in display and display != actual:
            score += 3
            findings.append(f"URL mismatch: displays '{display}' links to '{actual}'")
    # Check sender spoofing patterns (display name words missing from the address)
    if re.search(r'From:.*<[^>]+>', text):
        from_match = re.search(r'From:\s*([^<]+)<([^>]+)>', text)
        if from_match:
            display_name = from_match.group(1).strip().lower()
            actual_addr = from_match.group(2).strip().lower()
            if not any(w in actual_addr for w in display_name.split()):
                score += 2
                findings.append(f"Sender mismatch: '{display_name}' vs '{actual_addr}'")
    return min(score, 10), findings
```
The key insight: even basic heuristics catch obvious phishing. Production email security combines these with ML models trained on millions of known phishing emails, reputation databases for sender domains, and sandboxing for URL detonation.
Exercise 3 -- Spear phishing campaign design:
Target selection rationale (fictional AcmeTech):
1. Finance controller (high-value: can authorize wire transfers,
likely has access to banking portals, less technical)
2. HR coordinator (has access to employee PII, SSNs, payroll data,
regularly receives attachments from unknown external parties)
3. Junior developer (has code repo access, VPN credentials,
likely to click "urgent security update" links)
Defense recommendations:
- Wire transfer verification: mandatory phone callback to known number
- HR: sandbox all inbound attachments, restrict PII access with MFA
- Dev: hardware security keys for VPN + repo access
- All: simulated phishing exercises quarterly, no-blame reporting
The key insight: target selection is about access, not rank. A junior HR coordinator with access to the payroll system is a higher-value target than a VP with no system access. Attackers map organizational access, not org charts.
Learn Ethical Hacking (#9) - Cryptography for Hackers
Cryptography is the math that makes internet security possible. Without it, every password, every bank transaction, every private message would be readable by anyone sitting on the wire. We saw this in episode 3, where Wireshark showed every unencrypted byte on the wire in plain text. Episode 5 showed us open ports and services. Episode 7 showed us how passwords are stored (and broken). Episode 8 showed us how humans are manipulated into handing over credentials.
All of those attacks have one thing in common: at some point, data needs to be protected in transit or at rest. Cryptography is how that protection works -- when it's done right. The problem? Most developers never actually study cryptography. They call library functions, use whatever defaults the framework gives them, and assume it's secure because "it's encrypted." Attackers absolutely love this, because the devil lives in the implementation details.
We're not going to become mathematicians today. We're going to understand enough crypto to recognize what's secure, what's broken, and what looks secure but isn't. That distinction is where real vulnerabilities live.
Symmetric Encryption: One Key, Two Directions
Symmetric encryption uses the same key to encrypt and decrypt. Think of it as a combination lock -- the same code locks and unlocks. You need to share that code with anyone who needs access, which immediately creates the key distribution problem (how do you securely share the key without someone intercepting it?).
AES (Advanced Encryption Standard) is the current standard. It's used everywhere -- HTTPS, disk encryption, messaging apps, VPNs, file archives. If data is encrypted in 2026, there's a very high probability AES is doing the work:
```python
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.primitives import padding
import os

def aes_encrypt(plaintext, key):
    """AES-256-CBC encryption."""
    iv = os.urandom(16)  # random initialization vector (MUST be unique per message)
    cipher = Cipher(algorithms.AES(key), modes.CBC(iv))
    encryptor = cipher.encryptor()
    # Pad plaintext to AES block size (16 bytes)
    padder = padding.PKCS7(128).padder()
    padded = padder.update(plaintext) + padder.finalize()
    ciphertext = encryptor.update(padded) + encryptor.finalize()
    return iv + ciphertext  # prepend IV so decryptor knows it

def aes_decrypt(data, key):
    """AES-256-CBC decryption."""
    iv = data[:16]
    ciphertext = data[16:]
    cipher = Cipher(algorithms.AES(key), modes.CBC(iv))
    decryptor = cipher.decryptor()
    padded = decryptor.update(ciphertext) + decryptor.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()

# Usage
key = os.urandom(32)  # 256-bit key
message = b"Attack at dawn"
encrypted = aes_encrypt(message, key)
decrypted = aes_decrypt(encrypted, key)
print(f"Plaintext: {message}")
print(f"Encrypted: {encrypted.hex()[:60]}...")
print(f"Decrypted: {decrypted}")
```
The encrypted output looks like random noise. Without the key, recovering the plaintext is computationally infeasible -- as in "the sun will burn out first."
But there's a critical detail I buried in that code: the IV (Initialization Vector). The IV must be random and unique for every single message. If you reuse an IV with the same key in CBC mode, two messages that start with the same bytes produce identical ciphertext blocks, so an observer learns when messages share a prefix (and in counter-based modes like CTR and GCM, reusing a nonce is far worse). This is one of the most common crypto implementation mistakes, and we'll see exactly why in a moment.
The ECB Penguin: When Crypto Fails Visually
The most famous demonstration of bad encryption is the ECB penguin. ECB (Electronic Codebook) mode encrypts each 16-byte block independently. If two blocks of plaintext are identical, the ciphertext blocks are identical too.
Encrypt an image with ECB mode and the structure of the image is perfectly visible in the ciphertext. The famous "Tux penguin" example shows the Linux mascot clearly recognizable even after encryption -- because large areas of the same color produce identical ciphertext blocks. You can literally SEE the penguin in the encrypted data. The encryption is technically happening but providing zero practical confidentiality. That's not encryption, that's an illusion.
```python
# DON'T DO THIS -- ECB mode leaks patterns
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)
# ECB: each block encrypted independently (INSECURE for structured data)
cipher_ecb = Cipher(algorithms.AES(key), modes.ECB())
# CBC: each plaintext block XORed with the previous ciphertext block before encryption
# (hides repeated blocks, but is still unauthenticated -- see padding oracles below)
iv = os.urandom(16)
cipher_cbc = Cipher(algorithms.AES(key), modes.CBC(iv))
```
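To see the penguin effect without an image, here's a small sketch that encrypts three identical 16-byte blocks (think: a run of same-colored pixels) under ECB and under CBC, then counts how many distinct ciphertext blocks come out. The plaintext and the blocks() helper are made up for illustration.

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)
# Three identical 16-byte blocks, like a run of same-colored pixels in an image
plaintext = b"SAME_COLOR_PIXEL" * 3  # exactly 48 bytes, so no padding is needed

def blocks(data):
    """Split data into 16-byte AES blocks."""
    return [data[i:i + 16] for i in range(0, len(data), 16)]

ecb = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
ecb_ct = ecb.update(plaintext) + ecb.finalize()

cbc = Cipher(algorithms.AES(key), modes.CBC(os.urandom(16))).encryptor()
cbc_ct = cbc.update(plaintext) + cbc.finalize()

ecb_blocks = blocks(ecb_ct)
cbc_blocks = blocks(cbc_ct)
# ECB: identical plaintext blocks give identical ciphertext blocks (the repetition shows)
print("ECB distinct blocks:", len(set(ecb_blocks)))
# CBC: chaining makes every ciphertext block different
print("CBC distinct blocks:", len(set(cbc_blocks)))
```

ECB collapses to a single distinct block; CBC produces three. That repetition is exactly what draws the penguin.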
Never use ECB mode. Use CBC with a random IV, or better yet, use GCM (Galois/Counter Mode) which provides both encryption AND authentication (tamper detection). GCM is what you actually want in almost every real-world scenario:
```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)
nonce = os.urandom(12)  # 96-bit nonce for GCM -- never reuse one with the same key

# GCM encrypts AND authenticates -- tampered ciphertext is rejected
ciphertext = aesgcm.encrypt(nonce, b"Secret message", b"additional data")
plaintext = aesgcm.decrypt(nonce, ciphertext, b"additional data")
```
The "additional data" parameter is for associated data -- information that should be authenticated (verified untampered) but NOT encrypted. Think of it as the envelope of a letter: you want to make sure nobody changed the address, but the address doesn't need to be secret. In TLS 1.3, each record's header is the associated data while the payload is both encrypted and authenticated.
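Here's a short sketch of the envelope analogy in action: the header travels in the clear, but changing either the header or a single ciphertext bit makes decryption fail with InvalidTag. The header format and the try_decrypt() helper are hypothetical, made up for this example.

```python
import os
from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)
nonce = os.urandom(12)
header = b"to: bob; msg-id: 42"   # visible on the wire, but must not be altered
body = b"Meet at the usual spot"  # secret

ciphertext = aesgcm.encrypt(nonce, body, header)

def try_decrypt(ct, aad):
    """Return the plaintext, or None if GCM's authentication check fails."""
    try:
        return aesgcm.decrypt(nonce, ct, aad)
    except InvalidTag:
        return None

print(try_decrypt(ciphertext, header))                  # untouched: decrypts fine
print(try_decrypt(ciphertext, b"to: eve; msg-id: 42"))  # altered header: None

tampered = bytearray(ciphertext)
tampered[0] ^= 0x01                                     # flip one ciphertext bit
print(try_decrypt(bytes(tampered), header))             # altered body: None
```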
Asymmetric Encryption: Two Keys, One Direction Each
Asymmetric encryption uses a key pair: a public key (anyone can have it) and a private key (kept secret). Data encrypted with the public key can only be decrypted with the private key, and the private key can create signatures that anyone holding the public key can verify (more on that under digital signatures). This elegantly solves the key distribution problem -- you can publish your public key to the entire world, and only you can decrypt messages encrypted with it.
RSA is the most well-known asymmetric algorithm. Its security rests on the difficulty of factoring a very large number that is the product of two large primes (recovering the primes from their product):
```python
from cryptography.hazmat.primitives.asymmetric import rsa, padding as asym_padding
from cryptography.hazmat.primitives import hashes

# Generate key pair
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# Encrypt with public key (anyone can do this)
message = b"Top secret intelligence"
ciphertext = public_key.encrypt(
    message,
    asym_padding.OAEP(
        mgf=asym_padding.MGF1(algorithm=hashes.SHA256()),
        algorithm=hashes.SHA256(),
        label=None
    )
)

# Decrypt with private key (only key holder can do this)
plaintext = private_key.decrypt(
    ciphertext,
    asym_padding.OAEP(
        mgf=asym_padding.MGF1(algorithm=hashes.SHA256()),
        algorithm=hashes.SHA256(),
        label=None
    )
)
print(f"Decrypted: {plaintext}")
```
Notice we're using OAEP padding -- not the older PKCS#1 v1.5 padding. This matters. PKCS#1 v1.5 is vulnerable to Bleichenbacher's attack (1998), where an attacker can decrypt RSA ciphertext by sending millions of carefully crafted messages to a server and observing which ones produce padding errors. OAEP was designed to prevent this. If you see RSA with PKCS#1 v1.5 padding in production code, that's a finding.
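To see the arithmetic underneath (and why raw RSA always needs a padding scheme), here's textbook RSA with the classic tiny primes 61 and 53. It's a sketch for intuition only: no padding, absurdly small numbers, trivially factorable.

```python
# Textbook RSA with tiny primes -- for intuition only, NEVER for real data
p, q = 61, 53
n = p * q                 # public modulus
phi = (p - 1) * (q - 1)   # Euler's totient of n
e = 17                    # public exponent, coprime with phi
d = pow(e, -1, phi)       # private exponent via modular inverse (Python 3.8+)

m = 65                    # the "message" must be a number smaller than n
c = pow(m, e, n)          # encrypt with the public key (e, n)
recovered = pow(c, d, n)  # decrypt with the private key (d, n)
print(n, d, c, recovered)

# Unpadded RSA is deterministic: the same message always yields the same ciphertext,
# so an attacker can confirm a guess just by encrypting it with the public key
print(pow(65, e, n) == c)
```

This prints 3233 2753 2790 65 and then True. That last line is the problem OAEP solves: it mixes randomness and structure into the message before exponentiation, so identical plaintexts no longer produce identical ciphertexts.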
Asymmetric encryption is MUCH slower than symmetric -- roughly 1000x or more. So in practice, we use hybrid encryption: asymmetric crypto to exchange a symmetric key, then symmetric crypto for the actual data. This is exactly how TLS works, and it's exactly what we'll build at the end of this episode.
How TLS Actually Works
When your browser connects to https://hive.blog, here's what happens in the simplified TLS 1.3 handshake:
```
Client                                             Server
  |                                                  |
  |--- ClientHello (cipher suites, DH key share) --->|
  |                                                  |
  |<-- ServerHello (chosen suite, DH key share) -----|
  |                                                  |
  |     [Both sides derive handshake keys]           |
  |                                                  |
  |<-- {Certificate, CertificateVerify, Finished} ---|
  |                                                  |
  |     [Client verifies certificate chain           |
  |      against trusted CA list]                    |
  |                                                  |
  |--- {Finished} ---------------------------------->|
  |                                                  |
  |<====== Application data (e.g. AES-GCM) =========>|

  { } = encrypted with keys derived from the DH exchange
```
The certificate is the critical piece. It's signed by a Certificate Authority (CA) that your browser trusts. Your browser ships with a list of ~150 trusted CAs (the "root store"). The server proves its identity by presenting a certificate chain that leads back to one of those trusted roots. If the certificate is invalid, self-signed, or expired, your browser shows a warning.
If you click through that warning, you're trusting that the server IS who it says it is -- which an attacker performing a man-in-the-middle (MITM) attack would very much like you to believe. In a corporate environment, companies often install their own CA certificate in the root store, which means they can generate valid-looking certificates for ANY domain and intercept all TLS traffic. Your employer can read your "encrypted" browsing. Corporate TLS inspection proxies do exactly this, and on company-owned hardware they are generally legal (subject to local privacy law).
Having said that, modern TLS 1.3 uses ephemeral Diffie-Hellman key exchange instead of RSA key transport. The practical difference? Even if someone steals the server's private key LATER, they can't decrypt traffic they captured EARLIER. This property is called forward secrecy (sometimes "perfect forward secrecy") and it's one of the most important improvements in TLS 1.3 over older versions. Every session generates a new key pair that's thrown away after use -- there's nothing to steal retroactively.
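Here's a minimal sketch of an ephemeral key exchange using the cryptography library's X25519 primitives (X25519 is one of the groups TLS 1.3 commonly negotiates; picking it here is an assumption). Both sides generate a throwaway key pair, swap only the public halves, and arrive at the same secret, which HKDF stretches into a session key. The real TLS 1.3 key schedule has many more HKDF steps; the info label and the derive_session_key() helper are hypothetical.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def derive_session_key(shared_secret):
    """Turn the raw DH output into a 256-bit symmetric key (much simpler than TLS 1.3's schedule)."""
    return HKDF(
        algorithm=hashes.SHA256(),
        length=32,
        salt=None,
        info=b"toy handshake",  # hypothetical context label
    ).derive(shared_secret)

# Each side generates a FRESH key pair for this session only
client_eph = X25519PrivateKey.generate()
server_eph = X25519PrivateKey.generate()

# Only the public halves cross the wire (the "key share" values)
client_pub = client_eph.public_key()
server_pub = server_eph.public_key()

# Each side combines its own private key with the peer's public key
client_key = derive_session_key(client_eph.exchange(server_pub))
server_key = derive_session_key(server_eph.exchange(client_pub))
print("Keys match:", client_key == server_key)

# Forward secrecy: after the session, the ephemeral private keys are thrown away,
# so a later compromise of the server's long-term key can't rebuild this session key
del client_eph, server_eph
```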
Hashing: Integrity, Not Secrecy
We covered password hashing extensively in episode 7 (bcrypt, argon2, the whole slow-hashing story). But hashing has other critical security roles beyond passwords.
HMAC (Hash-based Message Authentication Code) lets you verify both the integrity AND the authenticity of a message. Unlike a plain hash (which anyone can compute), an HMAC requires a shared secret key:
```python
import hmac
import hashlib

secret_key = b"shared_secret_between_parties"
message = b"Transfer 100 HIVE to scipio"

# Create HMAC
mac = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
print(f"HMAC: {mac}")

# Verify HMAC (receiver side)
received_mac = mac  # would come with the message
expected_mac = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
if hmac.compare_digest(received_mac, expected_mac):
    print("Message authentic and untampered")
else:
    print("WARNING: message tampered or wrong key!")
```
Notice hmac.compare_digest() instead of ==. This is a constant-time comparison -- it takes the same amount of time regardless of how many bytes match. A regular == comparison short-circuits on the first mismatched byte, which means an attacker can measure the comparison time and determine how many bytes of their forged HMAC are correct. This is called a timing attack, and yes, people have exploited it in the real world. The difference is microseconds, but with enough measurements and statistical analysis, it's enough.
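Timing differences of microseconds are hard to show in a short script, so this sketch replaces time with a counter. The hypothetical leaky_equal() mimics a byte-by-byte comparison that exits at the first mismatch and reports how many bytes it examined; that count is the signal an attacker would otherwise measure as time.

```python
def leaky_equal(a: bytes, b: bytes):
    """Naive early-exit comparison. Returns (result, bytes_examined)."""
    if len(a) != len(b):
        return False, 0
    examined = 0
    for x, y in zip(a, b):
        examined += 1
        if x != y:
            return False, examined  # stops early: work done depends on the secret
    return True, examined

secret_mac = bytes.fromhex("a1b2c3d4")
# Each correct leading byte in the guess makes the comparison do more work
for guess_hex in ["00000000", "a1000000", "a1b20000", "a1b2c300"]:
    _, work = leaky_equal(secret_mac, bytes.fromhex(guess_hex))
    print(guess_hex, "->", work, "bytes compared")
```

The attacker fixes one byte at a time: whichever guess for the next byte makes the comparison "take longer" is the correct one. hmac.compare_digest() removes that signal by always doing the same amount of work.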
API webhooks use HMACs constantly. When GitHub sends a webhook to your server, it includes an X-Hub-Signature-256 header containing an HMAC-SHA256 of the payload (formatted as sha256= followed by the hex digest). Your server recomputes the HMAC with the shared secret and compares. If they match, you know: (a) the message came from someone who knows the secret (authenticity), and (b) the message wasn't modified in transit (integrity). If anyone on the wire changed a single byte of the payload, the HMAC won't match.
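A minimal sketch of the receiving side, assuming you already have the raw request body as bytes and the header value as a string (how you get them depends on your web framework, so that part is left out). The secret and payload below are made up.

```python
import hashlib
import hmac

def verify_github_signature(secret: bytes, raw_body: bytes, signature_header: str) -> bool:
    """Check an X-Hub-Signature-256 value of the form 'sha256=<hex HMAC of the raw body>'."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)  # constant-time comparison

# Simulated delivery (hypothetical secret and payload)
secret = b"webhook_secret_from_repo_settings"
body = b'{"action": "opened", "number": 9}'
header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()

print(verify_github_signature(secret, body, header))                       # genuine
print(verify_github_signature(secret, body.replace(b"9", b"10"), header))  # modified payload
```

One design note: compute the HMAC over the raw bytes exactly as received. Parsing the JSON and re-serializing it can change whitespace or key order, and the signature will no longer match.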
Digital Signatures and Integrity
Cryptography isn't just about secrecy. Digital signatures prove that a message hasn't been tampered with and was created by a specific key holder:
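```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import rsa, padding as asym_padding
from cryptography.hazmat.primitives import hashes

# Same kind of RSA key pair as above, regenerated so this snippet runs on its own
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

pss = asym_padding.PSS(
    mgf=asym_padding.MGF1(hashes.SHA256()),
    salt_length=asym_padding.PSS.MAX_LENGTH
)

# Sign with private key
signature = private_key.sign(b"I authorize this transaction", pss, hashes.SHA256())

# Anyone with public key can verify -- no exception = signature valid
public_key.verify(signature, b"I authorize this transaction", pss, hashes.SHA256())

# Tampered message = InvalidSignature exception
try:
    public_key.verify(signature, b"I authorize this transaction x1000", pss, hashes.SHA256())
except InvalidSignature:
    print("Tampered message rejected")
```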
This is how HTTPS certificates work (the CA signs the certificate), how software updates are verified (the developer signs the package), how git commits can be GPG-signed, and how blockchain transactions are authorized (the account owner signs with their private key). If you've been following along with this series, you'll recognise this pattern from how Hive broadcasting works -- every transaction is signed with a posting or active key before it's accepted by the network.
The fundamental difference between HMAC and digital signatures: HMAC uses a shared secret (both parties need the same key), while digital signatures use asymmetric keys (only the signer needs the private key, anyone with the public key can verify). This means digital signatures provide non-repudiation -- the signer can't deny they signed it, because only they had the private key. HMAC can't provide this because both parties know the secret.
Common Crypto Failures
Attackers don't break AES or RSA mathematically. They exploit implementation mistakes. These are the vulnerabilities you'll encounter most often in real penetration tests:
1. Key Reuse with Stream Ciphers/CTR Mode
```python
# If you encrypt two messages with the same key AND nonce in CTR mode:
# C1 = P1 XOR keystream
# C2 = P2 XOR keystream
# C1 XOR C2 = P1 XOR P2 (keystream cancels out!)
# Attacker can now recover both plaintexts using frequency analysis
```
This is called a "two-time pad" and it completely breaks the encryption. The one-time pad is theoretically perfect -- but only if the pad is truly used ONCE. Microsoft's PPTP VPN protocol made this exact mistake: it reused the RC4 keystream for both directions of the connection, allowing traffic to be decrypted. It was one of several flaws that eventually discredited PPTP as a serious VPN protocol.
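Here's a short sketch of the two-time pad with AES-CTR, deliberately reusing the same key and nonce for two messages of equal length. The messages and the ctr_encrypt()/xor() helpers are made up for the demo.

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)
nonce = os.urandom(16)  # CTR's 16-byte initial counter block

def ctr_encrypt(plaintext):
    # BUG ON PURPOSE: the same key AND the same nonce for every message
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    return enc.update(plaintext) + enc.finalize()

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

p1 = b"Transfer 100 HIVE to scipio"
p2 = b"Meeting moved to 3pm room 4"  # same length as p1 (27 bytes)
c1, c2 = ctr_encrypt(p1), ctr_encrypt(p2)

# The keystream cancels out: C1 XOR C2 == P1 XOR P2, no key required
print(xor(c1, c2) == xor(p1, p2))

# If the attacker knows (or guesses) one plaintext, the other falls out directly
print(xor(xor(c1, c2), p1))
```

With a fresh random nonce per message, c1 XOR c2 is just noise, which is why every AEAD API asks you for a new nonce every time.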
2. Padding Oracle Attacks
When a server decrypts CBC-mode data and returns different errors for "bad padding" vs "bad data", the attacker can decrypt the ENTIRE message one byte at a time by observing which error is returned. No key needed. The attacker sends modified ciphertext blocks and watches the server's response. "Invalid padding" means one thing, "invalid content" means another. Each response only answers "was the padding valid?", but by trying at most 256 values per byte, the attacker pins down every plaintext byte, working backwards from the end of each block.
This attack (CVE-2014-3566, also known as POODLE) broke SSL 3.0 and forced the industry to deprecate it entirely. It's also why AES-GCM is preferred over AES-CBC -- GCM verifies its authentication tag before releasing any plaintext and uses no padding at all, so tampered ciphertext is simply rejected. No oracle, no attack.
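To make the byte-at-a-time idea concrete, here's a sketch of a complete padding oracle attack against a toy "server" that lives in the same script. The only thing the attacker ever calls is padding_oracle(), which answers True/False about the padding; KEY, encrypt(), padding_oracle(), recover_block() and attack() are all hypothetical names for this demo. It relies on the library raising ValueError for invalid PKCS#7 padding.

```python
import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

BLOCK = 16
KEY = os.urandom(16)  # known only to the "server"

def encrypt(plaintext):
    """Server-side AES-CBC encryption: returns IV || ciphertext."""
    iv = os.urandom(BLOCK)
    padder = padding.PKCS7(128).padder()
    padded = padder.update(plaintext) + padder.finalize()
    enc = Cipher(algorithms.AES(KEY), modes.CBC(iv)).encryptor()
    return iv + enc.update(padded) + enc.finalize()

def padding_oracle(data):
    """The vulnerable server: decrypts and reveals ONLY whether the padding was valid."""
    iv, ct = data[:BLOCK], data[BLOCK:]
    dec = Cipher(algorithms.AES(KEY), modes.CBC(iv)).decryptor()
    padded = dec.update(ct) + dec.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    try:
        unpadder.update(padded)
        unpadder.finalize()
        return True
    except ValueError:  # raised by the library on invalid padding
        return False

def recover_block(prev, block):
    """Recover one plaintext block using only the oracle (no key)."""
    intermediate = bytearray(BLOCK)  # AES-decryption of `block`, learned from the end
    for pad in range(1, BLOCK + 1):
        pos = BLOCK - pad
        forged = bytearray(BLOCK)
        # Force the already-recovered tail bytes to decrypt to the current pad value
        for j in range(pos + 1, BLOCK):
            forged[j] = intermediate[j] ^ pad
        for guess in range(256):
            forged[pos] = guess
            if not padding_oracle(bytes(forged) + block):
                continue
            if pad == 1:
                # Rule out an accidental \x02\x02-style padding by disturbing the byte before
                check = bytearray(forged)
                check[pos - 1] ^= 0xFF
                if not padding_oracle(bytes(check) + block):
                    continue
            intermediate[pos] = guess ^ pad
            break
        else:
            raise RuntimeError("oracle never reported valid padding")
    # Real plaintext = intermediate XOR the real previous block (or the IV)
    return bytes(i ^ p for i, p in zip(intermediate, prev))

def attack(data):
    """Decrypt IV || ciphertext block by block, then strip the PKCS#7 padding."""
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    padded = b"".join(recover_block(blocks[i - 1], blocks[i]) for i in range(1, len(blocks)))
    unpadder = padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()

secret = b"Transfer 100 HIVE to scipio"
recovered = attack(encrypt(secret))
print(recovered)
```

The attacker never touches KEY: a few thousand yes/no answers about padding are enough to recover the whole message.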
3. Weak Random Number Generation
```python
# INSECURE -- predictable "random" numbers
import random
key = random.randbytes(32)  # Mersenne Twister -- predictable after 624 outputs! (Python 3.9+)

# SECURE -- cryptographically secure randomness
import os
key = os.urandom(32)  # OS CSPRNG (e.g. getrandom() or /dev/urandom on Linux) -- unpredictable

# Also secure (Python 3.6+)
import secrets
key = secrets.token_bytes(32)  # explicitly for cryptographic use
```
If the attacker can predict your "random" numbers, they can predict your keys. The Mersenne Twister PRNG (Python's random module) is NOT cryptographically secure -- its internal state can be fully reconstructed from 624 consecutive 32-bit outputs. Once the state is known, every future "random" value is predictable. The secrets module was added to Python specifically to make this mistake harder to make -- if you're generating tokens, keys, or nonces, use secrets, not random.
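Here's a sketch of that state recovery against CPython's random module. It assumes the attacker sees 624 consecutive getrandbits(32) outputs (say, leaked session IDs); the helper names are hypothetical. Each output is "untempered" back into one word of the generator's internal state, and the rebuilt state is loaded into a clone with setstate().

```python
import random

def undo_right_shift_xor(y, shift):
    """Invert y = x ^ (x >> shift) for a 32-bit x."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x

def undo_left_shift_xor(y, shift, mask):
    """Invert y = x ^ ((x << shift) & mask) for a 32-bit x."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x

def untemper(y):
    """Reverse MT19937's output tempering, recovering one internal state word."""
    y = undo_right_shift_xor(y, 18)
    y = undo_left_shift_xor(y, 15, 0xEFC60000)
    y = undo_left_shift_xor(y, 7, 0x9D2C5680)
    y = undo_right_shift_xor(y, 11)
    return y

victim = random.Random()  # seeded from OS entropy; the attacker never sees the seed
observed = [victim.getrandbits(32) for _ in range(624)]  # e.g. leaked tokens

state = [untemper(v) for v in observed]
clone = random.Random()
# Version 3 state format: 624 words plus an index; 624 means "regenerate on next call"
clone.setstate((3, tuple(state) + (624,), None))

# The clone now produces exactly the victim's future "random" key bytes
print(clone.randbytes(32) == victim.randbytes(32))
```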
4. Rolling Your Own Crypto
The most dangerous crypto failure: inventing your own encryption algorithm or protocol. "I XOR the data with a secret key" is NOT encryption. "I hash the password and use the hash as a key" is NOT key derivation. "I encrypt each block independently" is ECB mode and we already saw what happens. Professional cryptographers spend years designing algorithms and they STILL find flaws during peer review. You are not going to do better than AES by being clever on a Thursday afternoon.
Use existing libraries. Always.
I cannot stress this enough. If you're implementing crypto in an application, use a high-level library like cryptography's Fernet (symmetric) or NaCl/libsodium (asymmetric). These make the right choices by default -- secure modes, proper nonces, authenticated encryption. The "hazmat" (hazardous materials) primitives we've been using in this episode exist for educational purposes and for building higher-level protocols. In production code, use the highest-level abstraction that fits your use case and let the cryptographers worry about the implementation details.
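For comparison, here's roughly what the high-level route looks like with the cryptography library's Fernet, plus a password-derived key via PBKDF2HMAC (the proper fix for "I hash the password and use the hash as a key"). The iteration count of 600,000 is an assumption; check current guidance for your threat model. The password and messages are made up.

```python
import base64
import os
from cryptography.fernet import Fernet, InvalidToken
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

# Option A: a random key (store it in a secrets manager, never in source code)
key = Fernet.generate_key()
f = Fernet(key)
token = f.encrypt(b"Attack at dawn")  # random IV, AES-128-CBC + HMAC-SHA256, handled for you
print(f.decrypt(token))

# Option B: a key derived from a password with a real KDF (random salt + many iterations)
salt = os.urandom(16)  # store it next to the ciphertext; the salt is not secret
kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=600_000)
password_key = base64.urlsafe_b64encode(kdf.derive(b"correct horse battery staple"))
f_pw = Fernet(password_key)
pw_token = f_pw.encrypt(b"Attack at dawn")

# Tampering is detected: flip one bit inside the token's ciphertext portion
raw = bytearray(base64.urlsafe_b64decode(token))
raw[30] ^= 0x01
tampered = base64.urlsafe_b64encode(bytes(raw))
try:
    f.decrypt(tampered)
    rejected = False
except InvalidToken:
    rejected = True
print("Tampered token rejected:", rejected)
```

Notice what you didn't have to choose: no mode, no IV handling, no padding, no MAC construction. That's the point.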
Building a Hybrid Encrypted Messenger
Time to put it all together. This is hybrid encryption in practice -- RSA for key exchange, AES-GCM for the actual message encryption. This is a simplified version of how PGP, Signal, and TLS all work under the hood:
```python
#!/usr/bin/env python3
"""
Hybrid encryption messenger -- combines RSA + AES-GCM.
Demonstrates the pattern used by TLS, PGP, and Signal.
"""
from cryptography.hazmat.primitives.asymmetric import rsa, padding as asym_padding
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
import os
import json
import base64

def generate_keypair():
    """Generate RSA-2048 key pair."""
    private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public = private.public_key()
    return private, public

def serialize_public_key(public_key):
    """Export public key to PEM format (shareable)."""
    return public_key.public_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PublicFormat.SubjectPublicKeyInfo
    )

def encrypt_message(plaintext, recipient_public_key):
    """Encrypt a message using hybrid RSA + AES-GCM."""
    # Step 1: generate a random AES session key
    session_key = AESGCM.generate_key(bit_length=256)
    # Step 2: encrypt the session key with RSA (only recipient can decrypt)
    encrypted_key = recipient_public_key.encrypt(
        session_key,
        asym_padding.OAEP(
            mgf=asym_padding.MGF1(algorithm=hashes.SHA256()),
            algorithm=hashes.SHA256(),
            label=None
        )
    )
    # Step 3: encrypt the actual message with AES-GCM
    aesgcm = AESGCM(session_key)
    nonce = os.urandom(12)
    ciphertext = aesgcm.encrypt(nonce, plaintext.encode(), None)
    # Package everything together
    return json.dumps({
        'encrypted_key': base64.b64encode(encrypted_key).decode(),
        'nonce': base64.b64encode(nonce).decode(),
        'ciphertext': base64.b64encode(ciphertext).decode(),
    })

def decrypt_message(encrypted_package, recipient_private_key):
    """Decrypt a hybrid-encrypted message."""
    pkg = json.loads(encrypted_package)
    # Step 1: decrypt the session key with RSA
    session_key = recipient_private_key.decrypt(
        base64.b64decode(pkg['encrypted_key']),
        asym_padding.OAEP(
            mgf=asym_padding.MGF1(algorithm=hashes.SHA256()),
            algorithm=hashes.SHA256(),
            label=None
        )
    )
    # Step 2: decrypt the message with AES-GCM
    aesgcm = AESGCM(session_key)
    plaintext = aesgcm.decrypt(
        base64.b64decode(pkg['nonce']),
        base64.b64decode(pkg['ciphertext']),
        None
    )
    return plaintext.decode()

# Demo
alice_priv, alice_pub = generate_keypair()
bob_priv, bob_pub = generate_keypair()

# Alice sends encrypted message to Bob
encrypted = encrypt_message("Meet me at the usual spot. Bring the USB drive.", bob_pub)
print(f"Encrypted package length: {len(encrypted)} bytes")

# Bob decrypts
decrypted = decrypt_message(encrypted, bob_priv)
print(f"Decrypted: {decrypted}")

# Eve intercepts -- she has the encrypted package but not Bob's private key
try:
    decrypt_message(encrypted, alice_priv)  # Alice's key won't work
except Exception as e:
    print(f"Eve's decryption failed: {type(e).__name__}")
```
Run this and you'll see: Alice's message is encrypted with a random AES session key, that session key is encrypted with Bob's RSA public key, and only Bob's private key can unlock it. Eve (the attacker) can intercept the entire encrypted package -- she can see every byte -- but without Bob's private key, it's computationally useless. The AES session key is different for every message, so even if one message's key is somehow compromised, all other messages remain secure. One caveat: this design has no forward secrecy. If Bob's RSA private key ever leaks, every package ever encrypted to him can be opened, which is exactly the gap that ephemeral Diffie-Hellman closes in TLS 1.3.
This is the same principle behind every secure communication protocol in widespread use today. The details differ (Signal uses the Double Ratchet algorithm instead of RSA, TLS 1.3 uses ephemeral Diffie-Hellman), but the fundamental pattern -- asymmetric crypto for key exchange, symmetric crypto for bulk data -- is universal.
Certificate Pinning and Trust
One thing worth understanding about the TLS certificate system: it's built on trust, and trust can be misplaced. Your browser trusts ~150 Certificate Authorities. If ANY of those CAs is compromised, an attacker can generate valid certificates for ANY domain.
This has actually happened. In 2011, the DigiNotar CA was compromised and the attacker issued fraudulent certificates for google.com, used for MITM attacks against Iranian Gmail users. In 2015, CNNIC (the Chinese Internet Network Information Center) issued unauthorized certificates for Google domains through a subordinate CA. Both CAs were subsequently removed from browser trust stores, but the damage was done.
Certificate pinning is the defense: your application remembers the specific certificate (or public key) it expects for a particular domain and rejects any other certificate, even if it's technically valid. Mobile banking apps do this extensively -- they pin their server's certificate so a compromised CA can't help an attacker impersonate the bank.
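```python
# Conceptual certificate pinning check: pin the SHA-256 of the server's public key (SPKI)
import base64
import hashlib
import socket
import ssl

from cryptography import x509
from cryptography.hazmat.primitives import serialization

EXPECTED_PIN = "sha256/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="

def verify_pin(hostname, port=443):
    """Check if the server's public key matches our pinned hash."""
    ctx = ssl.create_default_context()  # normal CA + hostname validation still runs
    with socket.create_connection((hostname, port)) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as s:
            cert_der = s.getpeercert(binary_form=True)
    # Hash the public key, not the whole certificate, so the pin survives
    # certificate renewals that keep the same key pair
    public_key = x509.load_der_x509_certificate(cert_der).public_key()
    spki = public_key.public_bytes(
        encoding=serialization.Encoding.DER,
        format=serialization.PublicFormat.SubjectPublicKeyInfo,
    )
    actual_pin = f"sha256/{base64.b64encode(hashlib.sha256(spki).digest()).decode()}"
    if actual_pin != EXPECTED_PIN:
        raise ssl.SSLError(f"Certificate pin mismatch for {hostname}!")
    print(f"Pin verified for {hostname}")
```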
This is a simplified version of what HPKP (HTTP Public Key Pinning) did before it was deprecated, and what curl --pinnedpubkey still does. Certificate pinning is powerful but dangerous -- if you pin the wrong key or lose your key, your application stops working entirely and there's no fallback. Google Chrome removed HPKP support in 2018, partly because of the risk of "ransom pinning": an attacker who compromises a site sets HPKP headers pointing to their own keys, then demands payment to release the pin ;-)
Exercises
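Exercise 1: Using the AES-CBC code from this episode, write a program that demonstrates the IV reuse vulnerability. Encrypt two messages that share the same first 16 bytes but differ afterwards (for example b"Transfer to acc: 1111" and b"Transfer to acc: 2222") with the SAME key and SAME IV, and compare the first ciphertext block of each. Then repeat with a fresh random IV per message. Finally, switch to AES-CTR (modes.CTR) with the same key and nonce for both messages and XOR the two ciphertexts together. Write 3-4 sentences explaining what you observe and why IV/nonce reuse breaks security. Hint: with CBC and a reused IV, identical first blocks encrypt to identical ciphertext blocks, revealing that the messages share a prefix; with CTR and a reused nonce, XORing the ciphertexts gives you the XOR of the plaintexts -- and from that, an attacker can use letter frequency analysis to recover both messages.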
Exercise 2: Build the full hybrid encrypted message exchange from this episode as two separate scripts: sender.py and receiver.py. receiver.py generates an RSA key pair, saves the public key to receiver_pub.pem. sender.py loads the public key, takes a message from sys.argv[1], encrypts it using hybrid RSA + AES-GCM, and saves the encrypted package to message.enc. receiver.py then loads message.enc and decrypts it. Test with 5 different messages. Then try tampering with the encrypted file (change one byte) and confirm that AES-GCM rejects the tampered ciphertext. This is a simplified version of how PGP email encryption works.
Exercise 3: Use openssl s_client to inspect the TLS certificate of three different websites: openssl s_client -connect hive.blog:443 -brief, then try example.com:443 and expired.badssl.com:443 (a test site with an expired certificate). For each, document: the certificate issuer, the cipher suite negotiated, the TLS version, and whether the certificate chain is valid. What happens when you connect to the expired certificate site? Write a short explanation of why expired certificates are a security risk even if the encryption itself is still strong.