Learn Zig Series (#33) - Building a State Machine with Tagged Unions
What will I learn
- You will learn how to write solutions for the Episode 32 exercises;
- You will learn modeling states as enum variants in a tagged union;
- You will learn transition functions that return the next state;
- You will learn exhaustive switch guarantees: the compiler catches missing states;
- You will learn state-specific data carried in union payloads;
- You will learn the state machine pattern for protocol parsing;
- You will learn nested state machines and hierarchical states;
- You will learn testing state machines: verifying transition sequences;
- You will learn a practical example: an HTTP request parser as a state machine.
Requirements
- A working modern computer running macOS, Windows or Ubuntu;
- An installed Zig 0.14+ distribution (download from ziglang.org);
- The ambition to learn Zig programming.
Difficulty
- Intermediate
Curriculum (of the Learn Zig Series):
- Zig Programming Tutorial - ep001 - Intro
- Learn Zig Series (#2) - Hello Zig, Variables and Types
- Learn Zig Series (#3) - Functions and Control Flow
- Learn Zig Series (#4) - Error Handling (Zig's Best Feature)
- Learn Zig Series (#5) - Arrays, Slices, and Strings
- Learn Zig Series (#6) - Structs, Enums, and Tagged Unions
- Learn Zig Series (#7) - Memory Management and Allocators
- Learn Zig Series (#8) - Pointers and Memory Layout
- Learn Zig Series (#9) - Comptime (Zig's Superpower)
- Learn Zig Series (#10) - Project Structure, Modules, and File I/O
- Learn Zig Series (#11) - Mini Project: Building a Step Sequencer
- Learn Zig Series (#12) - Testing and Test-Driven Development
- Learn Zig Series (#13) - Interfaces via Type Erasure
- Learn Zig Series (#14) - Generics with Comptime Parameters
- Learn Zig Series (#15) - The Build System (build.zig)
- Learn Zig Series (#16) - Sentinel-Terminated Types and C Strings
- Learn Zig Series (#17) - Packed Structs and Bit Manipulation
- Learn Zig Series (#18) - Async Concepts and Event Loops
- Learn Zig Series (#18b) - Addendum: Async Returns in Zig 0.16
- Learn Zig Series (#19) - SIMD with @Vector
- Learn Zig Series (#20) - Working with JSON
- Learn Zig Series (#21) - Networking and TCP Sockets
- Learn Zig Series (#22) - Hash Maps and Data Structures
- Learn Zig Series (#23) - Iterators and Lazy Evaluation
- Learn Zig Series (#24) - Logging, Formatting, and Debug Output
- Learn Zig Series (#25) - Mini Project: HTTP Status Checker
- Learn Zig Series (#26) - Writing a Custom Allocator
- Learn Zig Series (#27) - C Interop: Calling C from Zig
- Learn Zig Series (#28) - C Interop: Exposing Zig to C
- Learn Zig Series (#29) - Inline Assembly and Low-Level Control
- Learn Zig Series (#30) - Thread Safety and Atomics
- Learn Zig Series (#31) - Memory-Mapped I/O and Files
- Learn Zig Series (#32) - Compile-Time Reflection with @typeInfo
- Learn Zig Series (#33) - Building a State Machine with Tagged Unions (this post)
Learn Zig Series (#33) - Building a State Machine with Tagged Unions
Solutions to Episode 32 Exercises
Exercise 1 - Generic structFromEnv that populates a struct from environment variables:
const std = @import("std");
fn structFromEnv(comptime T: type) !T {
const info = @typeInfo(T).@"struct";
var result: T = undefined;
inline for (info.fields) |field| {
const env_name = comptime blk: {
var upper: [field.name.len]u8 = undefined;
for (field.name, 0..) |c, i| {
upper[i] = if (c >= 'a' and c <= 'z') c - 32 else c;
}
break :blk upper;
};
const maybe_val = std.posix.getenv(&env_name);
if (maybe_val) |val| {
@field(result, field.name) = try parseEnvValue(field.type, val);
} else if (field.default_value_ptr) |ptr| {
const default_ptr: *align(1) const field.type = @ptrCast(ptr);
@field(result, field.name) = default_ptr.*;
} else {
return error.MissingEnvVar;
}
}
return result;
}
fn parseEnvValue(comptime T: type, val: []const u8) !T {
if (T == []const u8) return val;
if (T == bool) return std.mem.eql(u8, val, "true") or std.mem.eql(u8, val, "1");
if (@typeInfo(T) == .int) return std.fmt.parseInt(T, val, 10);
return error.UnsupportedType;
}
const Config = struct {
host: []const u8,
port: u32,
debug: bool = false,
};
pub fn main() !void {
const cfg = structFromEnv(Config) catch |e| {
std.debug.print("Failed to load config: {}\n", .{e});
return;
};
std.debug.print("host={s} port={d} debug={}\n", .{ cfg.host, cfg.port, cfg.debug });
}
The neat thing here is the comptime uppercasing of field names. The env_name block runs entirely at compile time -- it iterates the field name characters and converts lowercase to uppercase using plain ASCII arithmetic. At runtime the code just calls getenv with a string literal the compiler baked in. The default_value_ptr handling is a bit gnarly because @typeInfo gives you an *anyopaque that you have to cast back to the correct field type (with align(1) because default value pointers have minimal alignment).
Exercise 2 - Generic structDiff showing which fields differ:
const std = @import("std");
fn structDiff(a: anytype, b: @TypeOf(a)) void {
const T = @TypeOf(a);
const info = @typeInfo(T).@"struct";
var diffs: usize = 0;
inline for (info.fields) |field| {
const va = @field(a, field.name);
const vb = @field(b, field.name);
const differs = if (comptime isSlice(field.type))
!std.mem.eql(sliceChild(field.type), va, vb)
else if (@typeInfo(field.type) == .@"struct")
!structEqlRecursive(va, vb)
else
va != vb;
if (differs) {
std.debug.print("DIFF .{s}: ", .{field.name});
printVal(field.type, va);
std.debug.print(" -> ", .{});
printVal(field.type, vb);
std.debug.print("\n", .{});
diffs += 1;
}
}
if (diffs == 0) std.debug.print("No differences.\n", .{});
}
fn isSlice(comptime T: type) bool {
return switch (@typeInfo(T)) {
.pointer => |p| p.size == .slice,
else => false,
};
}
fn sliceChild(comptime T: type) type {
return @typeInfo(T).pointer.child;
}
fn structEqlRecursive(a: anytype, b: @TypeOf(a)) bool {
const info = @typeInfo(@TypeOf(a)).@"struct";
inline for (info.fields) |field| {
const va = @field(a, field.name);
const vb = @field(b, field.name);
if (comptime isSlice(field.type)) {
if (!std.mem.eql(sliceChild(field.type), va, vb)) return false;
} else if (va != vb) return false;
}
return true;
}
fn printVal(comptime T: type, val: T) void {
if (comptime isSlice(T) and sliceChild(T) == u8) {
std.debug.print("\"{s}\"", .{val});
} else if (T == bool) {
std.debug.print("{}", .{val});
} else {
std.debug.print("{any}", .{val});
}
}
const Vec2 = struct { x: f32, y: f32 };
const Player = struct {
id: u32,
name: []const u8,
health: i32,
pos: Vec2,
active: bool,
};
pub fn main() void {
const a = Player{ .id = 1, .name = "alice", .health = 100, .pos = .{ .x = 0, .y = 0 }, .active = true };
const b = Player{ .id = 1, .name = "alice", .health = 73, .pos = .{ .x = 5.0, .y = 0 }, .active = false };
structDiff(a, b);
}
This follows the same reflection pattern from the episode but adds recursive struct comparison for nested types like Vec2. The printVal helper dispatches on type to give readable output -- strings get quoted, bools print as true/false, everything else uses {any}.
Exercise 3 - validateStruct with comptime-generated error set:
const std = @import("std");
fn ValidateErrors(comptime T: type) type {
const info = @typeInfo(T).@"struct";
var errs: [info.fields.len]std.builtin.Type.Error = undefined;
var count: usize = 0;
for (info.fields) |field| {
if (field.type == []const u8 or field.type == [:0]const u8) {
errs[count] = .{ .name = field.name ++ "_empty" };
count += 1;
} else if (@typeInfo(field.type) == .int) {
errs[count] = .{ .name = field.name ++ "_invalid" };
count += 1;
}
}
if (count == 0) return error{ValidationFailed};
return @Type(.{ .error_set = errs[0..count] });
}
fn validateStruct(value: anytype) ValidateErrors(@TypeOf(value))!void {
const T = @TypeOf(value);
const info = @typeInfo(T).@"struct";
inline for (info.fields) |field| {
const val = @field(value, field.name);
if (field.type == []const u8 or field.type == [:0]const u8) {
if (val.len == 0) return @field(ValidateErrors(T), field.name ++ "_empty");
} else if (@typeInfo(field.type) == .int) {
if (val <= 0) return @field(ValidateErrors(T), field.name ++ "_invalid");
}
}
}
const UserProfile = struct {
name: []const u8,
age: i32,
email: []const u8,
score: u32,
};
pub fn main() void {
const bad = UserProfile{ .name = "", .age = -5, .email = "test@example.com", .score = 0 };
validateStruct(bad) catch |e| {
std.debug.print("Validation failed: {}\n", .{e});
return;
};
std.debug.print("All valid!\n", .{});
}
This one is the trickiest of the three. ValidateErrors is a comptime function that returns a TYPE -- specifically an error set type built from field names. It iterates the struct's fields, and for each string or integer field it creates an error variant (name_empty, age_invalid, etc.). Then @Type(.{ .error_set = ... }) constructs the actual error set type at compile time. The validateStruct function uses that error set as its return type, so the caller gets precise error variants telling exactly WHICH field failed validation.
Right, with those out of the way, let's talk about state machines. If you've been following this series you've already seen tagged unions in episode 6, comptime in episode 9, and compile-time reflection in the previous episode. Today we're combining the core tagged union concept with explicit transition logic to build state machines -- one of the most useful patterns in systems programming. Parsers, protocol handlers, game logic, connection managers, UI flows -- once you see state machines you start seeing them everywhere ;-)
Modeling states as enum variants in a tagged union
A state machine is, at its core, a thing that can be in exactly one state at a time, and transitions between states based on input events. In Zig the natural representation for "exactly one of N alternatives" is a tagged union. Each state becomes a union variant, and each variant can carry data specific to that state.
Here's a simple example -- a door that can be locked, closed, or open:
const std = @import("std");
const DoorState = union(enum) {
locked: u8, // number of locks engaged
closed,
open: f32, // angle in degrees
};
const DoorEvent = enum {
turn_key,
push,
pull,
slam,
};
fn transition(state: DoorState, event: DoorEvent) DoorState {
return switch (state) {
.locked => |locks| switch (event) {
.turn_key => if (locks <= 1) DoorState{ .closed = {} } else DoorState{ .locked = locks - 1 },
.push, .pull => state, // can't move a locked door
.slam => state,
},
.closed => switch (event) {
.turn_key => DoorState{ .locked = 1 },
.push, .pull => DoorState{ .open = 90.0 },
.slam => DoorState{ .open = 180.0 },
},
.open => switch (event) {
.push => state,
.pull => DoorState{ .closed = {} },
.slam => DoorState{ .closed = {} },
.turn_key => state, // can't lock an open door
},
};
}
pub fn main() void {
var door: DoorState = .{ .locked = 2 };
const events = [_]DoorEvent{ .turn_key, .turn_key, .push, .slam, .pull, .turn_key };
for (events) |evt| {
const prev = door;
door = transition(door, evt);
std.debug.print("{s} --[{s}]--> {s}\n", .{
@tagName(prev), @tagName(evt), @tagName(door),
});
}
}
Output:
locked --[turn_key]--> locked
locked --[turn_key]--> closed
closed --[push]--> open
open --[slam]--> closed
closed --[pull]--> open
open --[turn_key]--> open
Notice how the locked state carries a u8 (lock count) and the open state carries an f32 (door angle). The closed state has no payload. This is the power of tagged unions for state machines -- each state can hold exactly the data relevant to that state and NOTHING else. You can't accidentially access the "angle" when the door is locked because the compiler won't let you destructure the wrong variant.
Transition functions that return the next state
The transition function pattern is straightforward: take the current state and an event, return the new state. The function is pure -- no side effects, no mutations, just input to output. This makes it extremely testable and easy to reason about.
const std = @import("std");
const ConnState = union(enum) {
disconnected,
connecting: struct { host: []const u8, attempts: u8 },
connected: struct { socket_fd: i32, bytes_sent: u64 },
error_state: []const u8,
};
const ConnEvent = enum {
connect,
ack_received,
data_sent,
timeout,
reset,
};
fn connTransition(state: ConnState, event: ConnEvent) ConnState {
return switch (state) {
.disconnected => switch (event) {
.connect => ConnState{ .connecting = .{ .host = "example.com", .attempts = 1 } },
else => state,
},
.connecting => |info| switch (event) {
.ack_received => ConnState{ .connected = .{ .socket_fd = 42, .bytes_sent = 0 } },
.timeout => if (info.attempts < 3)
ConnState{ .connecting = .{ .host = info.host, .attempts = info.attempts + 1 } }
else
ConnState{ .error_state = "max retries exceeded" },
.reset => ConnState.disconnected,
else => state,
},
.connected => |info| switch (event) {
.data_sent => ConnState{ .connected = .{ .socket_fd = info.socket_fd, .bytes_sent = info.bytes_sent + 1024 } },
.timeout => ConnState{ .error_state = "connection timed out" },
.reset => ConnState.disconnected,
else => state,
},
.error_state => switch (event) {
.reset => ConnState.disconnected,
else => state,
},
};
}
pub fn main() void {
var conn: ConnState = .disconnected;
const events = [_]ConnEvent{ .connect, .timeout, .timeout, .ack_received, .data_sent, .data_sent, .timeout, .reset };
for (events) |evt| {
conn = connTransition(conn, evt);
switch (conn) {
.disconnected => std.debug.print("[disconnected]\n", .{}),
.connecting => |i| std.debug.print("[connecting] attempt {d}\n", .{i.attempts}),
.connected => |i| std.debug.print("[connected] fd={d} sent={d}\n", .{ i.socket_fd, i.bytes_sent }),
.error_state => |msg| std.debug.print("[error] {s}\n", .{msg}),
}
}
}
Output:
[connecting] attempt 1
[connecting] attempt 2
[connecting] attempt 3
[error] max retries exceeded
[error] max retries exceeded
[error] max retries exceeded
[error] max retries exceeded
[disconnected]
See what happened? After 3 timeouts the machine transitions to error_state, and from there the only valid transition is reset back to disconnected. The ack_received and data_sent events in error state are silently ignored (return state). This is a common pattern -- many events are simply no-ops in certain states, and the nested switch makes that explicit without any if/else chains.
Having said that, there's a design choice here. I returned the current state for invalid transitions (else => state). Some people prefer returning an error for invalid transitions, which makes bugs louder. Both are legitimate depending on your use case -- protocol parsers typically want errors, game logic often prefers silent ignoring.
Exhaustive switch guarantees: the compiler catches missing states
This is where Zig really shines for state machines. When you switch on a tagged union, the compiler REQUIRES you to handle every variant. If you add a new state later and forget to update the transition function, you get a compile error -- not a runtime bug.
const std = @import("std");
const TrafficLight = union(enum) {
red: u32, // seconds remaining
yellow: u32,
green: u32,
flashing_red, // emergency mode
};
fn tick(light: TrafficLight) TrafficLight {
return switch (light) {
.red => |secs| if (secs > 1)
TrafficLight{ .red = secs - 1 }
else
TrafficLight{ .green = 30 },
.yellow => |secs| if (secs > 1)
TrafficLight{ .yellow = secs - 1 }
else
TrafficLight{ .red = 45 },
.green => |secs| if (secs > 1)
TrafficLight{ .green = secs - 1 }
else
TrafficLight{ .yellow = 5 },
.flashing_red => TrafficLight.flashing_red,
};
}
pub fn main() void {
var light: TrafficLight = .{ .green = 3 };
for (0..12) |i| {
switch (light) {
.red => |s| std.debug.print("tick {d:>2}: RED ({d}s)\n", .{ i, s }),
.yellow => |s| std.debug.print("tick {d:>2}: YELLOW ({d}s)\n", .{ i, s }),
.green => |s| std.debug.print("tick {d:>2}: GREEN ({d}s)\n", .{ i, s }),
.flashing_red => std.debug.print("tick {d:>2}: FLASH RED\n", .{i}),
}
light = tick(light);
}
}
Now imagine you add a .flashing_yellow variant to TrafficLight. The compiler will immediately flag BOTH tick() AND the display switch in main() as incomplete. You can't forget. Try doing that with an integer state variable and a bunch of if (state == 3) checks -- you'd be hunting runtime bugs for days. The exhaustive switch is probably the single biggest reason I reach for tagged unions when building state machines in Zig ;-)
State-specific data carried in union payloads
We've already been doing this in the examples above, but it deserves its own focus. The union payload is what makes Zig's state machines fundamentally different from the classic "enum + separate data struct" pattern you see in C.
const std = @import("std");
const ParseState = union(enum) {
idle,
reading_header: struct {
name_start: usize,
name_end: usize,
},
reading_value: struct {
key: []const u8,
value_start: usize,
},
reading_quoted: struct {
key: []const u8,
value_start: usize,
escape_next: bool,
},
done: struct {
pairs: u32,
},
err: struct {
position: usize,
message: []const u8,
},
};
Each state carries ONLY the data it needs. idle has nothing. reading_header tracks where the header name starts and ends in the input buffer. reading_value has the resolved key string plus where the value starts. reading_quoted adds an escape flag on top of that. And the error state has a position and message.
The alternative (what you'd do in C or in languages without tagged unions) is a struct that has ALL possible fields:
// DON'T do this -- the "kitchen sink" approach
const BadParseState = struct {
state: enum { idle, reading_header, reading_value, reading_quoted, done, err },
name_start: usize = 0,
name_end: usize = 0,
key: []const u8 = "",
value_start: usize = 0,
escape_next: bool = false,
pairs: u32 = 0,
error_position: usize = 0,
error_message: []const u8 = "",
};
This compiles fine but it's a bug magnet. Nothing stops you from reading escape_next when you're in the idle state. Nothing prevents you from modifying pairs when you're in reading_header. The tagged union makes invalid states unrepresentable -- and that's the phrase you'll hear from Rust and Haskell people, because it applies equally to Zig. If the type system won't let you construct a bad state, you don't need to write tests for bad states.
The state machine pattern for protocol parsing
Protocol parsers are the classic application for state machines. You consume input one byte (or chunk) at a time, transitioning between states as you recognize tokens. Here's a simple key-value parser that handles input like name=Alice age=30 city="New York":
const std = @import("std");
const KvState = union(enum) {
seeking_key,
reading_key: usize,
seeking_value: []const u8,
reading_value: struct { key: []const u8, start: usize },
reading_quoted: struct { key: []const u8, start: usize },
};
const KvPair = struct {
key: []const u8,
value: []const u8,
};
fn parseKv(input: []const u8, results: []KvPair) usize {
var state: KvState = .seeking_key;
var count: usize = 0;
for (input, 0..) |ch, pos| {
state = switch (state) {
.seeking_key => if (ch != ' ' and ch != '\t' and ch != '\n')
KvState{ .reading_key = pos }
else
state,
.reading_key => |start| if (ch == '=')
KvState{ .seeking_value = input[start..pos] }
else if (ch == ' ' or ch == '\n')
KvState.seeking_key // malformed, reset
else
state,
.seeking_value => |key| if (ch == '"')
KvState{ .reading_quoted = .{ .key = key, .start = pos + 1 } }
else if (ch != ' ')
KvState{ .reading_value = .{ .key = key, .start = pos } }
else
state,
.reading_value => |info| if (ch == ' ' or ch == '\n' or pos == input.len - 1) blk: {
const end = if (ch == ' ' or ch == '\n') pos else pos + 1;
if (count < results.len) {
results[count] = .{ .key = info.key, .value = input[info.start..end] };
count += 1;
}
break :blk KvState.seeking_key;
} else state,
.reading_quoted => |info| if (ch == '"') blk: {
if (count < results.len) {
results[count] = .{ .key = info.key, .value = input[info.start..pos] };
count += 1;
}
break :blk KvState.seeking_key;
} else state,
};
}
// Handle value that runs to end of input without trailing space
switch (state) {
.reading_value => |info| {
if (count < results.len) {
results[count] = .{ .key = info.key, .value = input[info.start..] };
count += 1;
}
},
else => {},
}
return count;
}
pub fn main() void {
const input = "name=Alice age=30 city=\"New York\" role=admin";
var buf: [16]KvPair = undefined;
const n = parseKv(input, &buf);
for (buf[0..n]) |pair| {
std.debug.print("{s} = {s}\n", .{ pair.key, pair.value });
}
}
Output:
name = Alice
age = 30
city = New York
role = admin
The parser walks through the input one character at a time. In seeking_key it skips whitespace. Once it hits a non-space character it transitions to reading_key carrying the start position. When it finds = it grabs the key slice and transitions to seeking_value. From there, a " sends it into reading_quoted mode, otherwise it starts reading_value. Both value modes collect characters until they hit their terminator (space or quote) and emit a pair.
This pattern scales. Real protocol parsers (HTTP, SMTP, DNS) work the same way -- you just have more states and more complex transition rules. The tagged union ensures you always know exactly what data is available in each state.
Nested state machines and hierarchical states
Sometimes a single flat state machine isn't enough. Complex protocols have sub-states within states -- for example, an HTTP parser might have a top-level state machine (request line, headers, body) where the "headers" state itself contains a sub-machine (field name, colon, field value, line ending).
const std = @import("std");
const HeaderSubState = union(enum) {
field_name: usize,
colon_seen: []const u8,
field_value: struct { name: []const u8, start: usize },
line_end: struct { name: []const u8, value: []const u8 },
};
const HttpParseState = union(enum) {
request_line: struct { start: usize },
headers: struct {
sub: HeaderSubState,
header_count: u16,
},
body: struct {
content_length: usize,
consumed: usize,
},
complete,
parse_error: []const u8,
};
const Header = struct {
name: []const u8,
value: []const u8,
};
fn feedHeaderByte(sub: HeaderSubState, ch: u8, pos: usize, input: []const u8) struct { state: HeaderSubState, header: ?Header } {
_ = input;
return switch (sub) {
.field_name => |start| if (ch == ':')
.{ .state = .{ .colon_seen = input[start..pos] }, .header = null }
else
.{ .state = sub, .header = null },
.colon_seen => |name| if (ch != ' ')
.{ .state = .{ .field_value = .{ .name = name, .start = pos } }, .header = null }
else
.{ .state = sub, .header = null },
.field_value => |info| if (ch == '\n')
.{ .state = .{ .line_end = .{ .name = info.name, .value = input[info.start..pos] } }, .header = .{ .name = info.name, .value = input[info.start..pos] } }
else
.{ .state = sub, .header = null },
.line_end => if (ch == '\n')
.{ .state = sub, .header = null } // double newline means end of headers
else
.{ .state = .{ .field_name = pos }, .header = null },
};
}
pub fn main() void {
const raw = "GET / HTTP/1.1\nHost: example.com\nAccept: text/html\n\nHello";
var state: HttpParseState = .{ .request_line = .{ .start = 0 } };
var headers: [16]Header = undefined;
var header_count: usize = 0;
for (raw, 0..) |ch, pos| {
state = switch (state) {
.request_line => |_| if (ch == '\n')
HttpParseState{ .headers = .{
.sub = .{ .field_name = pos + 1 },
.header_count = 0,
} }
else
state,
.headers => |info| blk: {
const result = feedHeaderByte(info.sub, ch, pos, raw);
var new_count = info.header_count;
if (result.header) |h| {
if (header_count < headers.len) {
headers[header_count] = h;
header_count += 1;
}
new_count += 1;
}
// Check for double newline (end of headers)
if (ch == '\n') {
switch (result.state) {
.line_end => break :blk HttpParseState{ .body = .{
.content_length = raw.len - pos - 1,
.consumed = 0,
} },
else => {},
}
}
break :blk HttpParseState{ .headers = .{
.sub = result.state,
.header_count = new_count,
} };
},
.body => |info| blk: {
const new_consumed = info.consumed + 1;
if (new_consumed >= info.content_length) {
break :blk HttpParseState.complete;
}
break :blk HttpParseState{ .body = .{
.content_length = info.content_length,
.consumed = new_consumed,
} };
},
.complete => state,
.parse_error => state,
};
}
std.debug.print("Parsed {d} headers:\n", .{header_count});
for (headers[0..header_count]) |h| {
std.debug.print(" {s}: {s}\n", .{ h.name, h.value });
}
std.debug.print("Final state: {s}\n", .{@tagName(state)});
}
The top-level machine has states for request_line, headers, body, and complete. The headers state contains a HeaderSubState sub-machine that tracks where we are within a single header line. When feedHeaderByte processes the header sub-machine, it returns both the new sub-state AND optionally a completed header. The parent machine updates its own state accordingly.
This composition pattern is really powerfull. You can build complex parsers by nesting small, testable state machines inside larger ones. Each level only worries about its own transitions. The outer machine delegates to the inner machine and reacts to its outputs.
Testing state machines: verifying transition sequences
Because transition functions are pure (state in, state out), testing them is trivial. You just feed a sequence of events and assert on the resulting states:
const std = @import("std");
const testing = std.testing;
const Turnstile = union(enum) {
locked,
unlocked,
broken: []const u8,
};
const TurnstileEvent = enum { coin, push, kick };
fn turnstileTransition(state: Turnstile, event: TurnstileEvent) Turnstile {
return switch (state) {
.locked => switch (event) {
.coin => Turnstile.unlocked,
.push => state,
.kick => Turnstile{ .broken = "kicked while locked" },
},
.unlocked => switch (event) {
.coin => state, // already unlocked, extra coin
.push => Turnstile.locked, // person walked through
.kick => Turnstile{ .broken = "kicked while unlocked" },
},
.broken => state, // once broken, stays broken
};
}
test "basic coin-push cycle" {
var s: Turnstile = .locked;
s = turnstileTransition(s, .coin);
try testing.expectEqual(Turnstile.unlocked, s);
s = turnstileTransition(s, .push);
try testing.expectEqual(Turnstile.locked, s);
}
test "push on locked does nothing" {
var s: Turnstile = .locked;
s = turnstileTransition(s, .push);
try testing.expectEqual(Turnstile.locked, s);
}
test "kick breaks it permanently" {
var s: Turnstile = .locked;
s = turnstileTransition(s, .kick);
try testing.expect(s == .broken);
// Once broken, nothing changes it
s = turnstileTransition(s, .coin);
try testing.expect(s == .broken);
s = turnstileTransition(s, .push);
try testing.expect(s == .broken);
}
test "full sequence" {
const events = [_]TurnstileEvent{ .push, .coin, .push, .coin, .coin, .push };
const expected = [_]std.meta.Tag(Turnstile){ .locked, .unlocked, .locked, .unlocked, .unlocked, .locked };
var s: Turnstile = .locked;
for (events, expected) |evt, exp| {
s = turnstileTransition(s, evt);
try testing.expectEqual(exp, std.meta.activeTag(s));
}
}
pub fn main() void {
std.debug.print("Run with: zig test this_file.zig\n", .{});
}
The std.meta.activeTag function extracts the tag (enum variant) from a tagged union value, and std.meta.Tag(Turnstile) gives you the tag's type. This lets you compare which variant is active without needing to match payload data. For testing sequences like the "full sequence" test, you define parallel arrays of events and expected tag outcomes, then loop through them. If ANY transition produces the wrong state, the test fails immediately with the index.
This is how I test every state machine I write. The transition function is pure, so tests are deterministic and fast -- no setup, no teardown, no mocking. You can easily add property-based tests too: generate random event sequences and verify invariants (like "the turnstile should never go from broken back to locked" or "after a coin, the next push should always lock it").
Practical example: an HTTP request parser as a state machine
Let's put everthing together with a more complete HTTP request line parser. This handles the GET /path HTTP/1.1 part of an HTTP request, extracting the method, path, and version:
const std = @import("std");
const Method = enum {
GET,
POST,
PUT,
DELETE,
HEAD,
OPTIONS,
fn fromString(s: []const u8) ?Method {
const map = .{
.{ "GET", Method.GET },
.{ "POST", Method.POST },
.{ "PUT", Method.PUT },
.{ "DELETE", Method.DELETE },
.{ "HEAD", Method.HEAD },
.{ "OPTIONS", Method.OPTIONS },
};
inline for (map) |entry| {
if (std.mem.eql(u8, s, entry[0])) return entry[1];
}
return null;
}
};
const RequestLine = struct {
method: Method,
path: []const u8,
version_major: u8,
version_minor: u8,
};
const ReqLineState = union(enum) {
method: usize,
method_sp: []const u8,
path: struct { method: Method, start: usize },
path_sp: struct { method: Method, path: []const u8 },
version: struct { method: Method, path: []const u8, start: usize },
done: RequestLine,
fail: []const u8,
};
fn parseRequestLine(input: []const u8) ?RequestLine {
var state: ReqLineState = .{ .method = 0 };
for (input, 0..) |ch, pos| {
state = switch (state) {
.method => |start| if (ch == ' ') blk: {
break :blk .{ .method_sp = input[start..pos] };
} else state,
.method_sp => |raw| blk: {
const m = Method.fromString(raw) orelse
break :blk ReqLineState{ .fail = "unknown method" };
break :blk ReqLineState{ .path = .{ .method = m, .start = pos } };
},
.path => |info| if (ch == ' ')
ReqLineState{ .path_sp = .{ .method = info.method, .path = input[info.start..pos] } }
else
state,
.path_sp => |info| ReqLineState{ .version = .{
.method = info.method,
.path = info.path,
.start = pos,
} },
.version => |info| if (ch == '\n' or ch == '\r' or pos == input.len - 1) blk: {
const ver_str = input[info.start .. pos + 1];
// Parse "HTTP/1.1" or "HTTP/1.0"
if (ver_str.len >= 8 and std.mem.startsWith(u8, ver_str, "HTTP/")) {
const major = ver_str[5] -% '0';
const minor = if (ver_str.len > 7) ver_str[7] -% '0' else 0;
break :blk ReqLineState{ .done = .{
.method = info.method,
.path = info.path,
.version_major = major,
.version_minor = minor,
} };
}
break :blk ReqLineState{ .fail = "bad version" };
} else state,
.done => state,
.fail => state,
};
}
return switch (state) {
.done => |req| req,
else => null,
};
}
pub fn main() void {
const lines = [_][]const u8{
"GET / HTTP/1.1\n",
"POST /api/users HTTP/1.0\n",
"DELETE /item/42 HTTP/1.1\n",
"PATCH /nope HTTP/1.1\n",
};
for (lines) |line| {
if (parseRequestLine(line)) |req| {
std.debug.print("{s} {s} HTTP/{d}.{d}\n", .{
@tagName(req.method),
req.path,
req.version_major,
req.version_minor,
});
} else {
std.debug.print("FAILED: {s}\n", .{line[0 .. @min(line.len, 40)]});
}
}
}
Output:
GET / HTTP/1.1
POST /api/users HTTP/1.0
DELETE /item/42 HTTP/1.1
FAILED: PATCH /nope HTTP/1.1
The parser walks through the request line one byte at a time. State transitions carry forward the accumulated data -- the method string gets resolved to an enum via fromString, then the path gets sliced out, then the version gets parsed. The PATCH line fails because we didn't include it in the Method enum (on purpose, to show failure handling).
Each state knows exactly what it has accumulated so far. The path state knows the method. The version state knows both method and path. The done state has the complete parsed result. This kind of incremental accumulation through state transitions is the bread and butter of protocol parsing.
Real HTTP parsers (like Zig's own std.http.Server) are more complex -- they handle chunked encoding, keep-alive, header continuation, and plenty of edge cases. But the core pattern is identical: tagged union states, byte-by-byte processing, pure transition functions. Once you nail this pattern, scaling up is just adding more states and more careful transition logic.
Wat we geleerd hebben
- Tagged unions are the natural way to model state machines in Zig -- each variant represents a state, each payload carries state-specific data.
- Transition functions take
(state, event) -> new_stateand should be pure (no side effects) for testability. - Zig's exhaustive switch means the compiler catches any state you forget to handle -- add a new variant and every switch on that union becomes a compile error until updated.
- Union payloads make invalid states unrepresentable: you CAN'T access data that doesn't exist in the current state because the compiler won't let you destructure the wrong variant.
- Protocol parsing maps perfectly to state machines: consume input byte by byte, transition between states, accumulate parsed data in payloads.
- Nested state machines handle hierarchical protocols by embedding a sub-machine inside a parent state's payload.
- Testing pure transition functions is straightforward: define event sequences, assert on resulting states, no mocking needed.
The combination of tagged unions + exhaustive switches + zero-cost payload data gives Zig state machines a level of compile-time safety that's hard to match in C (where state machines are typically switch(state_int) with manual data management). Next time you find yourself reaching for a bunch of boolean flags or an integer state variable with a comment saying "1 = connecting, 2 = connected, 3 = error" -- that's the smell of a tagged union waiting to happen ;-)
Exercises
Build a tokenizer for simple math expressions using a state machine. The input is a string like
"123 + 45.6 * (7 - 2)"and the output is a sequence of tokens:number(123),op(+),number(45.6),op(*),paren_open,number(7),op(-),number(2),paren_close. Model the tokenizer state as a tagged union with variants likeidle,reading_integer,reading_decimal,reading_operator. Process the input one character at a time and emit tokens into a result buffer.Implement a "retry with backoff" connection state machine. States:
idle,connecting(with attempt count and current delay),connected(with uptime counter),backing_off(with remaining wait ticks),failed(with error message). Events:start,tick,success,failure,reset. The backoff delay should double after each failed attempt (1, 2, 4, 8 ticks) up to a maximum of 5 attempts. Write the transition function and a test that verifies the full connect-fail-backoff-retry-succeed cycle.Build a Markdown inline parser that handles
**bold**,*italic*, and`code`spans. Model it as a state machine with states for normal text, potential bold/italic (seen one*), bold (seen two*), italic (inside single*), and code (inside backtick). The parser should take a string and return an array of spans, each tagged with its style (plain, bold, italic, code). Handle the edge case where*appears at end-of-input without a closing match (emit it as plain text).
Bedankt en tot de volgende keer!