Learn AI Series (#5) - Patterns in Data - What "Learning" Actually Looks Like
What will I learn
- You will learn how to spot relationships between variables by looking at data;
- what correlation means and what it doesn't (spoiler: not causation);
- the "line of best fit" -- what your eye does naturally;
- the difference between signal and noise in real data;
- what happens when one feature isn't enough;
- the conceptual leap from human pattern recognition to machine pattern recognition.
Requirements
- A working modern computer running macOS, Windows or Ubuntu;
- An installed Python 3(.11+) distribution;
- The ambition to learn AI and machine learning.
Difficulty
- Beginner
Curriculum (of the Learn AI Series):
- Learn AI Series (#1) - What Machine Learning Actually Is
- Learn AI Series (#2) - Setting Up Your AI Workbench - Python and NumPy
- Learn AI Series (#3) - Your Data Is Just Numbers - How Machines See the World
- Learn AI Series (#4) - Your First Prediction - No Math, Just Intuition
- Learn AI Series (#5) - Patterns in Data - What "Learning" Actually Looks Like (this post)
Learn AI Series (#5) - Patterns in Data - What "Learning" Actually Looks Like
When someone says a machine learning model "learned" something, what actually happened? It didn't read a textbook. It didn't have an "aha!" moment in the shower. It found a pattern in numbers -- a relationship between inputs and outputs that holds well enough to make useful predictions.
Today we're looking at what those patterns look like, how to spot them, and why some are easy to find while others are buried in noise. By the end, you'll have the intuition for why we need the math that's coming in the next episodes -- and why that math isn't scary once you understand what it's actually trying to do ;-)
Two variables, one relationship
The simplest pattern is between two numbers. Let's create some data that has a clear relationship and some that doesn't.
import numpy as np
np.random.seed(42)
# Strong relationship: bigger houses cost more
square_meters = np.array([40, 55, 65, 70, 80, 85, 95, 100, 110, 120, 130, 150])
prices = square_meters * 2500 + np.random.randn(12) * 15000
# price ≈ 2500 * sqm + noise
print("Square meters -> Price:")
for sqm, price in zip(square_meters, prices):
print(f" {sqm:>3d} sqm -> €{price:>8,.0f}")
Even without plotting this, you can see the pattern: as square meters go up, prices go up. Not perfectly - there's noise - but the trend is clear. If I told you about a 90 sqm apartment, you'd guess something around €225,000, and you'd be close.
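That eyeball guess is already a tiny model: a rate times a size. Written as code (using the €2,500/sqm rate we baked into the data above, so this is the best-case scenario):

# Your mental model: price ≈ rate * sqm
rate = 2500        # €/sqm, the same rate used to generate the data
guess = rate * 90  # the 90 sqm apartment
print(f"Eyeball prediction: €{guess:,}")  # €225,000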
Now compare with data that has no pattern:
# No relationship: shoe size and income
shoe_sizes = np.array([38, 42, 40, 45, 37, 43, 41, 39, 44, 36, 42, 40])
incomes = np.array([38, 28, 67, 52, 35, 48, 29, 45, 41, 55, 61, 33]) * 1000
print("\nShoe size -> Income:")
for shoe, income in zip(shoe_sizes, incomes):
print(f" Size {shoe} -> €{income:>6,}")
No pattern. Knowing someone's shoe size tells you nothing useful about their income. An ML model trained on this data would perform no better than guessing the average.
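To put a number on "no better than guessing the average", here's a quick sanity check with the arrays above: compare the error of always predicting the mean income against the error of the best straight line through shoe size (found with np.polyfit, which we'll meet properly later).

# Baseline: always predict the mean income
baseline_mse = np.mean((incomes - incomes.mean()) ** 2)
# Best straight line through shoe size -> income
slope, intercept = np.polyfit(shoe_sizes, incomes, 1)
line_mse = np.mean((incomes - (slope * shoe_sizes + intercept)) ** 2)
print(f"MSE, always guess the mean:  {baseline_mse:,.0f}")
print(f"MSE, best line on shoe size: {line_mse:,.0f}")
# When there's no real relationship, the fitted line barely beats the mean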
Correlation: measuring the strength of a relationship
We can quantify how strongly two variables relate with the correlation coefficient - a number between -1 and +1.
# Correlation: how linearly related are two variables?
corr_sqm_price = np.corrcoef(square_meters, prices)[0, 1]
corr_shoe_income = np.corrcoef(shoe_sizes, incomes)[0, 1]
print(f"Correlation (sqm ↔ price): {corr_sqm_price:+.3f}")
print(f"Correlation (shoe ↔ income): {corr_shoe_income:+.3f}")
The sqm-price correlation will be close to +1.0 (strong positive relationship). The shoe-income correlation will be close to 0.0 (no relationship).
What the values mean:
- +1.0: Perfect positive relationship. X goes up, Y always goes up proportionally.
- 0.0: No linear relationship. Knowing X tells you nothing about Y.
- -1.0: Perfect negative relationship. X goes up, Y always goes down proportionally.
Values between these extremes indicate partial relationships. In practice, a correlation above 0.5 or below -0.5 usually indicates something worth paying attention to.
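To see those extremes concretely, here's a tiny demonstration with made-up arrays: a perfectly proportional relationship, a perfectly inverse one, and pure randomness.

x = np.arange(1, 11)
perfect_pos = x * 3             # y rises exactly in step with x
perfect_neg = 100 - x * 3       # y falls exactly in step with x
random_y = np.random.randn(10)  # nothing to do with x
print(f"Perfect positive: {np.corrcoef(x, perfect_pos)[0, 1]:+.3f}")  # +1.000
print(f"Perfect negative: {np.corrcoef(x, perfect_neg)[0, 1]:+.3f}")  # -1.000
print(f"Random noise:     {np.corrcoef(x, random_y)[0, 1]:+.3f}")     # near 0 (small samples wander a bit)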
The critical caveat: correlation is not causation
Ice cream sales and drowning deaths are highly correlated. Does ice cream cause drowning? Obviously not - both increase in summer because of temperature. The correlation is real, but the causal story is wrong.
In ML, we mostly don't care about causation. We care about prediction: does this feature help predict the target? If ice cream sales predict drowning risk (they do!), a model can use it, even if the causal mechanism is indirect. But when interpreting your model's decisions or making policy recommendations, the distinction matters enormously.
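You can watch a confounder at work in a few lines. In this made-up simulation, temperature drives both ice cream sales and drownings; neither causes the other, yet the two correlate strongly.

# A hidden third variable (temperature) drives both quantities
temperature = np.random.uniform(10, 35, 200)                # °C across 200 summer-ish days
ice_cream = temperature * 30 + np.random.randn(200) * 40    # sales per day
drownings = temperature * 0.1 + np.random.randn(200) * 0.5  # incidents per day
print(f"Correlation (ice cream ↔ drownings): {np.corrcoef(ice_cream, drownings)[0, 1]:+.3f}")
# Strongly positive - even though neither causes the other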
The line of best fit: your eye already knows
Look at the apartment data again. If you plotted square meters on one axis and price on the other, your brain would automatically imagine a line through the middle of the points. That line is the line of best fit - the straight line that comes closest to all the data points simultaneously.
Let's find it without any fancy math - just by trying different lines:
# A line is: price = slope * sqm + intercept
# Let's try a few manually
def predict_with_line(sqm, slope, intercept):
    return slope * sqm + intercept

def compute_error(actual, predicted):
    return np.mean((actual - predicted) ** 2)

# Try different slopes and intercepts
candidates = [
    (2000, 20000, "gentle slope, high start"),
    (2500, 0, "steeper slope, zero start"),
    (3000, -50000, "steep slope, negative start"),
    (2500, 10000, "medium slope, small start"),
]
print("Trying different lines:\n")
for slope, intercept, desc in candidates:
    preds = predict_with_line(square_meters, slope, intercept)
    error = compute_error(prices, preds)
    print(f" y = {slope}*x + {intercept:>6d} "
          f"({desc:>35s}) MSE: {error:>15,.0f}")
One of these will have a lower error than the others - that's the best of our four guesses. But it's not the best line: there are infinitely many possible slopes and intercepts. Finding the actual best one is what linear regression does. We'll build it from scratch in episode #10.
For now, the point is this: the "best line" is the one that makes the smallest total error. That's all optimization means in ML - finding the settings (slope, intercept, weights, whatever) that minimize the error.
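To make "smallest total error" concrete, here's the same idea pushed a bit further: instead of four hand-picked lines, sweep a grid of slopes and intercepts and keep whichever pair scores the lowest MSE. This brute-force search is only for intuition - linear regression will find the answer directly when we build it in episode #10.

# Brute-force search: try many (slope, intercept) pairs, keep the best one
best = (None, None, float("inf"))
for slope in np.arange(1500, 3501, 50):
    for intercept in np.arange(-60000, 60001, 5000):
        preds = predict_with_line(square_meters, slope, intercept)
        error = compute_error(prices, preds)
        if error < best[2]:
            best = (slope, intercept, error)
slope, intercept, error = best
print(f"Best line on the grid: price = {slope} * sqm + {intercept}  (MSE: {error:,.0f})")

The winner should land somewhere near the €2,500/sqm rate we used to generate the data - the noise nudges it around a little.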
Noise: why data doesn't follow the pattern perfectly
Real data is messy. Even when there's a strong underlying relationship, individual data points deviate from the pattern. This deviation is noise.
# Same underlying relationship, different noise levels
true_slope = 2500
true_intercept = 5000
sqm = np.linspace(40, 150, 20)
# Low noise
low_noise = true_slope * sqm + true_intercept + np.random.randn(20) * 5000
# High noise
high_noise = true_slope * sqm + true_intercept + np.random.randn(20) * 40000
print("Low noise - prices cluster tightly around the trend:")
print(f" Std of residuals: €{np.std(low_noise - (true_slope * sqm + true_intercept)):,.0f}")
print("High noise - prices scatter widely:")
print(f" Std of residuals: €{np.std(high_noise - (true_slope * sqm + true_intercept)):,.0f}")
Where does noise come from? Everything we didn't measure:
- Location (a studio in central Amsterdam vs suburbs)
- Condition (renovated vs needs work)
- The seller's urgency
- Market timing
- Random fluctuation
The more relevant features we include, the less "noise" remains. Some of what looks like noise is actually signal from missing features. This is why feature engineering - creating and selecting the right features - is so important.
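Here's a small illustration of that last point, with made-up numbers: price depends on size and on a "central location" flag we pretend not to have measured. Seen through size alone, the location effect shows up as extra scatter; once the flag is accounted for, the residuals shrink.

# Price depends on size AND location, but imagine we only measured size
sqm_demo = np.random.uniform(40, 150, 100)
central = np.random.randint(0, 2, 100)  # 1 = central location (the "missing" feature)
price_demo = sqm_demo * 2500 + central * 60000 + np.random.randn(100) * 10000
# Residual scatter around the size-only trend vs the size + location trend
print(f"Scatter, size only:       €{np.std(price_demo - sqm_demo * 2500):,.0f}")
print(f"Scatter, size + location: €{np.std(price_demo - (sqm_demo * 2500 + central * 60000)):,.0f}")
# Part of what looked like noise was really the unmeasured location feature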
When one feature isn't enough
So far we've looked at one feature predicting one target. But real predictions need multiple features working together.
# Apartment data with multiple features
# [sqm, rooms, has_balcony, floor, age_years]
apartments = np.array([
    [65, 2, 0, 3, 15],
    [82, 3, 1, 1, 5],
    [45, 1, 0, 5, 30],
    [120, 4, 1, 2, 2],
    [55, 2, 0, 4, 10],
    [90, 3, 1, 3, 8],
    [70, 2, 1, 1, 20],
    [110, 4, 0, 6, 1],
])
prices = np.array([185, 240, 130, 350, 165, 260, 195, 310], dtype=np.float64) * 1000
# Correlation of each feature with price
feature_names = ["sqm", "rooms", "balcony", "floor", "age"]
print("Feature correlations with price:\n")
for i, name in enumerate(feature_names):
    corr = np.corrcoef(apartments[:, i], prices)[0, 1]
    print(f" {name:>8s}: {corr:+.3f}")
You'll see that square meters and rooms correlate strongly and positively with price (bigger = more expensive), age correlates negatively (older = cheaper), the balcony flag adds a modest positive bump, and floor barely registers in this tiny sample.
No single feature perfectly predicts price. But together, they tell a much more complete story. A machine learning model takes all these features as input simultaneously and finds the combination that best predicts the target.
This is the key insight: ML models find patterns across multiple dimensions at once - something humans are terrible at beyond 2-3 variables. When your data has 50 features, you can't scatter-plot your way to understanding. You need a model.
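As a teaser of what that looks like (we'll earn this properly in later episodes), NumPy can already find the best linear combination of all five features with an ordinary least-squares solve. The exact weights don't matter yet - the point is that one model uses every column at once.

# One weight per feature (plus a constant) combined into a single price prediction
X = np.column_stack([apartments, np.ones(len(apartments))])  # extra column of 1s for the intercept
weights, *_ = np.linalg.lstsq(X, prices, rcond=None)
predicted = X @ weights
print("Predicted vs actual (in €1,000s):")
for pred, actual in zip(predicted / 1000, prices / 1000):
    print(f" {pred:6.0f} vs {actual:6.0f}")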
Nonlinear patterns: not everything is a straight line
So far we've focused on linear relationships (straight lines). But many real-world patterns are curved.
# A nonlinear relationship: diminishing returns
experience_years = np.arange(0, 30)
salary = 30000 + 5000 * np.sqrt(experience_years) + np.random.randn(30) * 2000
print("Experience -> Salary (notice diminishing returns):")
for yr in [0, 5, 10, 15, 20, 25, 29]:
print(f" {yr:2d} years -> €{salary[yr]:>7,.0f}")
# Linear correlation misses the full picture
corr = np.corrcoef(experience_years, salary)[0, 1]
print(f"\nLinear correlation: {corr:+.3f}")
# Still positive, but a straight line won't fit perfectly
The relationship between experience and salary isn't linear - each additional year of experience adds less than the previous one (diminishing returns). A straight line would miss this curvature.
This is why we'll eventually learn polynomial regression, decision trees, and neural networks - they can capture nonlinear patterns that straight lines can't. But the thinking process is the same: find the pattern, fit a model, measure the error.
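As a small preview of that flexibility: compare a straight line fitted to raw experience with a line fitted to the square root of experience - the shape we actually used to generate the salaries. This is a sketch of the idea, not the full treatment.

# Straight line on raw experience vs a line on sqrt(experience)
linear_preds = np.polyval(np.polyfit(experience_years, salary, 1), experience_years)
sqrt_feature = np.sqrt(experience_years)
sqrt_preds = np.polyval(np.polyfit(sqrt_feature, salary, 1), sqrt_feature)
print(f"MSE, straight line:   {np.mean((salary - linear_preds) ** 2):,.0f}")
print(f"MSE, line on sqrt(x): {np.mean((salary - sqrt_preds) ** 2):,.0f}")
# The transformed feature matches the curve, so its error is noticeably lower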
The leap: from human to machine
Here's what we've been doing with our brains throughout this episode:
- Look at data
- Notice a pattern (or lack thereof)
- Imagine a simple rule that approximates the pattern
- Judge how well the rule works
A machine learning model does exactly the same thing, but it:
- Handles thousands of features instead of two
- Processes millions of data points instead of a dozen
- Finds optimal parameters through systematic search instead of intuition
- Quantifies its confidence instead of saying "probably"
The gap between "I see a pattern" and "the computer found the pattern" is just math. Math that formalizes what your brain does intuitively: measure the error, adjust the parameters, repeat until it stops improving.
That's where we're heading next.
Quick recap
- Patterns in data are relationships between variables - some strong, some weak, some nonexistent;
- Correlation measures the strength of a linear relationship (-1 to +1) but does NOT imply causation;
- The "line of best fit" is the line that minimizes total error - finding it automatically is what ML does;
- Noise comes from unmeasured factors - more features reduce apparent noise;
- Single features rarely tell the whole story - ML combines many features simultaneously to find multi-dimensional patterns humans can't see;
- Not all patterns are linear - curved relationships need more flexible models;
- Machine learning is your brain's pattern recognition, formalized and scaled up with math.