Learn Zig Series (#39) - Markdown to HTML: Renderer and CLI
Project A: Markdown to HTML Converter (3/3)
What will I learn
- You will learn walking the AST to emit HTML tags;
- You will learn escaping HTML entities in text content;
- You will learn handling inline elements within block elements;
- You will learn pretty-printing vs minified HTML output;
- You will learn CLI interface: read from file or stdin, write to file or stdout;
- You will learn piping support: cat README.md | zigmd > output.html;
- You will learn adding basic CSS inline or via a template;
- You will learn end-to-end testing: Markdown file in, HTML file out, diff against expected.
Requirements
- A working modern computer running macOS, Windows or Ubuntu;
- An installed Zig 0.14+ distribution (download from ziglang.org);
- The ambition to learn Zig programming.
Difficulty
- Intermediate
Curriculum (of the Learn Zig Series):
- Zig Programming Tutorial - ep001 - Intro
- Learn Zig Series (#2) - Hello Zig, Variables and Types
- Learn Zig Series (#3) - Functions and Control Flow
- Learn Zig Series (#4) - Error Handling (Zig's Best Feature)
- Learn Zig Series (#5) - Arrays, Slices, and Strings
- Learn Zig Series (#6) - Structs, Enums, and Tagged Unions
- Learn Zig Series (#7) - Memory Management and Allocators
- Learn Zig Series (#8) - Pointers and Memory Layout
- Learn Zig Series (#9) - Comptime (Zig's Superpower)
- Learn Zig Series (#10) - Project Structure, Modules, and File I/O
- Learn Zig Series (#11) - Mini Project: Building a Step Sequencer
- Learn Zig Series (#12) - Testing and Test-Driven Development
- Learn Zig Series (#13) - Interfaces via Type Erasure
- Learn Zig Series (#14) - Generics with Comptime Parameters
- Learn Zig Series (#15) - The Build System (build.zig)
- Learn Zig Series (#16) - Sentinel-Terminated Types and C Strings
- Learn Zig Series (#17) - Packed Structs and Bit Manipulation
- Learn Zig Series (#18) - Async Concepts and Event Loops
- Learn Zig Series (#18b) - Addendum: Async Returns in Zig 0.16
- Learn Zig Series (#19) - SIMD with @Vector
- Learn Zig Series (#20) - Working with JSON
- Learn Zig Series (#21) - Networking and TCP Sockets
- Learn Zig Series (#22) - Hash Maps and Data Structures
- Learn Zig Series (#23) - Iterators and Lazy Evaluation
- Learn Zig Series (#24) - Logging, Formatting, and Debug Output
- Learn Zig Series (#25) - Mini Project: HTTP Status Checker
- Learn Zig Series (#26) - Writing a Custom Allocator
- Learn Zig Series (#27) - C Interop: Calling C from Zig
- Learn Zig Series (#28) - C Interop: Exposing Zig to C
- Learn Zig Series (#29) - Inline Assembly and Low-Level Control
- Learn Zig Series (#30) - Thread Safety and Atomics
- Learn Zig Series (#31) - Memory-Mapped I/O and Files
- Learn Zig Series (#32) - Compile-Time Reflection with @typeInfo
- Learn Zig Series (#33) - Building a State Machine with Tagged Unions
- Learn Zig Series (#34) - Performance Profiling and Optimization
- Learn Zig Series (#35) - Cross-Compilation and Target Triples
- Learn Zig Series (#36) - Mini Project: CLI Task Runner
- Learn Zig Series (#37) - Markdown to HTML: Tokenizer and Lexer
- Learn Zig Series (#38) - Markdown to HTML: Parser and AST
- Learn Zig Series (#39) - Markdown to HTML: Renderer and CLI (this post)
Here we go -- the final piece of the puzzle! ;-)
In episode 37 we built a tokenizer that breaks raw Markdown into a flat stream of tokens. In episode 38 we wrote a parser that organizes those tokens into a tree -- an AST with block nodes, inline nodes, and all the nesting relationships that make a document a document rather than just a bag of text. But a tree sitting in memory doesn't help anyone read your README. We need something that walks that tree and produces actual HTML.
That's what we're building today: the renderer. It takes an AST in, spits HTML out. And then we'll wrap the whole pipeline in a CLI so you can actually use this thing from the terminal like a real tool. Read from a file, read from stdin, write to a file, pipe it around -- the works. By the end of this episode, you'll have a fully functional Markdown-to-HTML converter written entirely in Zig, across three episodes, from scratch.
The HtmlRenderer struct
The renderer walks the AST using the visitor pattern we set up last episode and writes HTML into a growable buffer. The core idea: for every node type, we emit an opening tag when we enter the node and a closing tag when we leave it. Text nodes just dump their (escaped) content. The tree structure guarantees correct nesting -- if bold is inside a paragraph, the visitor enters the paragraph first, then the bold, then leaves bold, then leaves paragraph. The HTML tags come out in the right order automatically.
const std = @import("std");
const Node = @import("parser.zig").Node;
const NodeType = @import("parser.zig").NodeType;
const Visitor = @import("parser.zig").Visitor;

pub const HtmlRenderer = struct {
    buffer: std.ArrayList(u8),
    indent_level: usize,
    pretty: bool,

    pub fn init(allocator: std.mem.Allocator, pretty: bool) HtmlRenderer {
        return .{
            .buffer = std.ArrayList(u8).init(allocator),
            .indent_level = 0,
            .pretty = pretty,
        };
    }

    pub fn deinit(self: *HtmlRenderer) void {
        self.buffer.deinit();
    }

    pub fn render(self: *HtmlRenderer, root: *const Node) ![]const u8 {
        const visitor = Visitor(HtmlRenderer){
            .enterFn = enterNode,
            .leaveFn = leaveNode,
        };
        visitor.walk(root, self);
        return self.buffer.items;
    }

    fn write(self: *HtmlRenderer, bytes: []const u8) void {
        self.buffer.appendSlice(bytes) catch {};
    }

    fn writeIndent(self: *HtmlRenderer) void {
        if (!self.pretty) return;
        var i: usize = 0;
        while (i < self.indent_level) : (i += 1) {
            self.write(" ");
        }
    }

    fn newline(self: *HtmlRenderer) void {
        if (self.pretty) self.write("\n");
    }
};
The pretty flag controls whether we emit whitespace and indentation (for human-readable output) or pack everything tight (for production use). The buffer is just an ArrayList(u8) -- we keep appending bytes as we walk the tree, and at the end the caller gets a contiguous slice of HTML.
One thing worth noting: we're using catch {} on the appendSlice call inside write. In a production tool you'd want proper error propagation, but for our visitor callbacks (which can't return errors due to the function pointer signature), swallowing allocation failures is the pragmatic choice. If the allocator is completely out of memory you have bigger problems anyway.
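If swallowing the error bothers you, one middle ground is to remember that an append failed and report it once the walk is over. The sketch below is an assumption on my side, not code from the series: TrackingBuffer, its failed field and finish() are made-up names, and it assumes the managed std.ArrayList API of Zig 0.14.

```zig
const std = @import("std");

// Hypothetical variant of the renderer's buffer handling: instead of
// silently discarding allocation failures, remember that one happened.
const TrackingBuffer = struct {
    buffer: std.ArrayList(u8),
    failed: bool = false,

    fn init(allocator: std.mem.Allocator) TrackingBuffer {
        return .{ .buffer = std.ArrayList(u8).init(allocator) };
    }

    fn deinit(self: *TrackingBuffer) void {
        self.buffer.deinit();
    }

    // Same void signature as the renderer's write(), so it still fits
    // callbacks that cannot return errors.
    fn write(self: *TrackingBuffer, bytes: []const u8) void {
        self.buffer.appendSlice(bytes) catch {
            self.failed = true;
        };
    }

    // Called once after the walk; surfaces the deferred failure.
    fn finish(self: *TrackingBuffer) error{OutOfMemory}![]const u8 {
        if (self.failed) return error.OutOfMemory;
        return self.buffer.items;
    }
};
```

The callbacks stay void, but render() could end with a call like finish() and the caller would get a real error.OutOfMemory instead of truncated HTML.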
Entering and leaving nodes -- the tag logic
This is where the actual HTML generation happens. Every node type maps to an HTML tag (or in some cases, raw content):
fn enterNode(self: *HtmlRenderer, node: *const Node, depth: usize) void {
    _ = depth;
    switch (node.node_type) {
        .document => {},
        .heading => {
            self.writeIndent();
            // Clamp the level to the valid h1..h6 range
            const level = if (node.level > 6) @as(u8, 6) else if (node.level == 0) @as(u8, 1) else node.level;
            var tag_buf: [4]u8 = undefined;
            const tag = std.fmt.bufPrint(&tag_buf, "h{d}", .{level}) catch "h1";
            self.write("<");
            self.write(tag);
            self.write(">");
        },
        .paragraph => {
            self.writeIndent();
            self.write("<p>");
        },
        .code_block => {
            self.writeIndent();
            if (node.language.len > 0) {
                self.write("<pre><code class=\"language-");
                self.write(node.language);
                self.write("\">");
            } else {
                self.write("<pre><code>");
            }
            // Write code content escaped
            escapeHtmlTo(&self.buffer, node.content);
            // Close tags immediately -- code blocks have no children
            self.write("</code></pre>");
            self.newline();
        },
        .unordered_list => {
            self.writeIndent();
            self.write("<ul>");
            self.newline();
            self.indent_level += 1;
        },
        .ordered_list => {
            self.writeIndent();
            self.write("<ol>");
            self.newline();
            self.indent_level += 1;
        },
        .list_item => {
            self.writeIndent();
            self.write("<li>");
        },
        .blockquote => {
            self.writeIndent();
            self.write("<blockquote>");
            self.newline();
            self.indent_level += 1;
        },
        .horizontal_rule => {
            self.writeIndent();
            self.write("<hr>");
            self.newline();
        },
        .text => {
            escapeHtmlTo(&self.buffer, node.content);
        },
        .bold => self.write("<strong>"),
        .italic => self.write("<em>"),
        .code_span => {
            self.write("<code>");
            escapeHtmlTo(&self.buffer, node.content);
            self.write("</code>");
        },
        .link => {
            self.write("<a href=\"");
            escapeHtmlTo(&self.buffer, node.url);
            self.write("\">");
        },
        .image => {
            self.write("<img src=\"");
            escapeHtmlTo(&self.buffer, node.url);
            self.write("\" alt=\"");
            escapeHtmlTo(&self.buffer, node.alt_text);
            self.write("\">");
        },
        .line_break => self.write("<br>"),
    }
}
And the leave function closes everything that was opened:
fn leaveNode(self: *HtmlRenderer, node: *const Node, depth: usize) void {
    _ = depth;
    switch (node.node_type) {
        .document => {},
        .heading => {
            const level = if (node.level > 6) @as(u8, 6) else if (node.level == 0) @as(u8, 1) else node.level;
            var tag_buf: [4]u8 = undefined;
            const tag = std.fmt.bufPrint(&tag_buf, "h{d}", .{level}) catch "h1";
            self.write("</");
            self.write(tag);
            self.write(">");
            self.newline();
        },
        .paragraph => {
            self.write("</p>");
            self.newline();
        },
        .code_block => {}, // already closed in enter
        .unordered_list => {
            self.indent_level -= 1;
            self.writeIndent();
            self.write("</ul>");
            self.newline();
        },
        .ordered_list => {
            self.indent_level -= 1;
            self.writeIndent();
            self.write("</ol>");
            self.newline();
        },
        .list_item => {
            self.write("</li>");
            self.newline();
        },
        .blockquote => {
            self.indent_level -= 1;
            self.writeIndent();
            self.write("</blockquote>");
            self.newline();
        },
        .bold => self.write("</strong>"),
        .italic => self.write("</em>"),
        .link => self.write("</a>"),
        // These don't have closing tags
        .horizontal_rule, .text, .code_span, .image, .line_break => {},
    }
}
The pattern is really satisfying once you see it working: the visitor's natural enter/leave pairing perfectly mirrors HTML's open/close tag structure. We don't need to manually track which tags are still "open" -- the tree traversal does it for us. This is exactly why we bothered building an AST in the first place rather than trying to do a single-pass token-to-HTML translation.
Notice the code_block special case: we emit both the opening and closing tags in enterNode and then do nothing in leaveNode. That's because code block content is already stored in node.content (not as children), so we don't need the visitor to recurse into children. We just dump the escaped content right there between the tags.
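To see the enter/leave pairing in isolation, here is a stripped-down sketch. MiniNode and walk are hypothetical stand-ins (not the Node and Visitor types from episode 38), and the code assumes Zig 0.14's managed std.ArrayList. The error set is spelled out explicitly because the function is recursive.

```zig
const std = @import("std");

// Hypothetical miniature node type, not the series' real Node struct.
const MiniNode = struct {
    tag: []const u8, // an empty tag means "plain text leaf"
    text: []const u8 = "",
    children: []const MiniNode = &[_]MiniNode{},
};

fn walk(out: *std.ArrayList(u8), node: MiniNode) std.mem.Allocator.Error!void {
    if (node.tag.len == 0) {
        try out.appendSlice(node.text);
        return;
    }
    // "enter": emit the opening tag
    try out.appendSlice("<");
    try out.appendSlice(node.tag);
    try out.appendSlice(">");
    // recurse into children between the two tags
    for (node.children) |child| try walk(out, child);
    // "leave": emit the closing tag
    try out.appendSlice("</");
    try out.appendSlice(node.tag);
    try out.appendSlice(">");
}
```

Feeding it a p node containing the text "Some " and a strong node with the text "bold" yields <p>Some <strong>bold</strong></p>. The closing tags come out in the right order without any explicit stack of open tags, because the call stack already is that stack.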
Escaping HTML entities
This is one of those things that seems trivial but will absolutely bite you if you skip it. If someone writes x < 10 in their Markdown, the HTML output needs x &lt; 10. Otherwise the browser will try to parse < 10 as the start of a tag and produce garbage.
The five characters that need escaping in HTML content:
fn escapeHtmlTo(buffer: *std.ArrayList(u8), input: []const u8) void {
    for (input) |c| {
        switch (c) {
            '&' => buffer.appendSlice("&amp;") catch {},
            '<' => buffer.appendSlice("&lt;") catch {},
            '>' => buffer.appendSlice("&gt;") catch {},
            '"' => buffer.appendSlice("&quot;") catch {},
            '\'' => buffer.appendSlice("&#39;") catch {},
            else => buffer.append(c) catch {},
        }
    }
}
Five characters. Five substitutions. That's the entire function. You might wonder -- do we need to escape all of these everywhere? Strictly speaking, > doesn't need escaping in regular text content, double quotes only matter inside double-quoted attribute values, and single quotes only matter inside single-quoted ones. But escaping all five everywhere is safe, fast, and means you never have to think about context. The performance cost of writing &gt; instead of > for those rare greater-than signs in your prose is effectively zero.
One detail: we call escapeHtmlTo for text nodes, code spans, code block content, link URLs, and image alt text. Basically everything that comes from the original Markdown source gets escaped. The only things that DON'T get escaped are our own generated tags (<p>, <strong>, etc.) because those are hardcoded strings, not user input.
If you've ever done web development (even a little), this concept is familiar. It's the exact same idea as parameterized SQL queries preventing SQL injection -- you separate the structure (HTML tags) from the data (user text), and the data gets sanitized before insertion. The renderer produces the structure, escapeHtmlTo sanitizes the data.
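A quick sketch of why the quote escaping matters for link URLs in particular. escapeInto is a stand-in for escapeHtmlTo written with try so the example is self-contained, renderHref is a hypothetical helper, and the hostile URL is made up for illustration. It assumes Zig 0.14's managed std.ArrayList.

```zig
const std = @import("std");

// Stand-in for the article's escapeHtmlTo, propagating errors with `try`.
fn escapeInto(out: *std.ArrayList(u8), input: []const u8) !void {
    for (input) |c| {
        switch (c) {
            '&' => try out.appendSlice("&amp;"),
            '<' => try out.appendSlice("&lt;"),
            '>' => try out.appendSlice("&gt;"),
            '"' => try out.appendSlice("&quot;"),
            '\'' => try out.appendSlice("&#39;"),
            else => try out.append(c),
        }
    }
}

// Hypothetical helper: emit an opening <a> tag with an escaped href.
fn renderHref(out: *std.ArrayList(u8), url: []const u8) !void {
    try out.appendSlice("<a href=\"");
    try escapeInto(out, url);
    try out.appendSlice("\">");
}
```

Called with the URL x" onclick="alert(1), this produces <a href="x&quot; onclick=&quot;alert(1)">. The quotes stay inside the attribute value, so the onclick never becomes an attribute of its own.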
Pretty-printing vs minified output
I mentioned the pretty flag earlier. Here's what the difference looks like. Given this Markdown:
# Hello
Some **bold** text.
- first
- second
Pretty-printed output:
<h1>Hello</h1>
<p>Some <strong>bold</strong> text.</p>
<ul>
<li>first</li>
<li>second</li>
</ul>
Minified output:
<h1>Hello</h1><p>Some <strong>bold</strong> text.</p><ul><li>first</li><li>second</li></ul>
The pretty version adds newlines after block elements and indents nested content (list items inside lists, paragraphs inside blockquotes). The minified version just concatenates everything. Both produce identical rendering in a browser -- whitespace between block elements doesn't affect how the page is laid out.
The implementation is dead simple because we already built writeIndent() and newline() as separate methods. Pretty mode uses them, minified mode skips them. The actual tag-emitting logic is identical either way. This kind of separation -- the rendering logic doesn't know or care about formatting -- is a good design habit.
Why bother with minified output? Two reasons. First, smaller file size (matters if you're generating thousands of pages). Second, some tools (diffing, testing, embedding in JSON) work better with single-line output where whitespace is deterministic. Having said that, for most use cases the pretty version is what you want.
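The same idea in miniature: one flag, one code path, two outputs. Out, line and renderList are hypothetical names for this sketch, the two-space indent width is my assumption, and the code relies on Zig 0.14's managed std.ArrayList.

```zig
const std = @import("std");

// Hypothetical mini-writer: formatting decisions live here, so the
// code that emits tags never checks the flag itself.
const Out = struct {
    buf: std.ArrayList(u8),
    pretty: bool,
    indent: usize = 0,

    fn line(self: *Out, text: []const u8) !void {
        if (self.pretty) try self.buf.appendNTimes(' ', self.indent * 2);
        try self.buf.appendSlice(text);
        if (self.pretty) try self.buf.append('\n');
    }
};

// Emits a fixed two-item list; the caller owns the returned slice.
fn renderList(allocator: std.mem.Allocator, pretty: bool) ![]u8 {
    var out = Out{ .buf = std.ArrayList(u8).init(allocator), .pretty = pretty };
    errdefer out.buf.deinit();
    try out.line("<ul>");
    out.indent += 1;
    try out.line("<li>first</li>");
    try out.line("<li>second</li>");
    out.indent -= 1;
    try out.line("</ul>");
    return out.buf.toOwnedSlice();
}
```

With pretty set to false the result is one line with no whitespace at all, which is exactly what makes it convenient for string comparisons in tests.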
The CLI -- reading input and writing output
Now we wire everything together into a proper command-line tool. The main.zig reads input (from a file argument or stdin), runs the tokenizer + parser + renderer pipeline, and writes HTML to stdout (or a file):
const std = @import("std");
const Document = @import("parser.zig").Document;
const HtmlRenderer = @import("renderer.zig").HtmlRenderer;

const css_template =
    \\<style>
    \\ body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
    \\ max-width: 800px; margin: 0 auto; padding: 2rem; line-height: 1.6; }
    \\ pre { background: #f4f4f4; padding: 1rem; overflow-x: auto; border-radius: 4px; }
    \\ code { font-family: 'Fira Code', monospace; font-size: 0.9em; }
    \\ blockquote { border-left: 3px solid #ccc; margin: 0; padding-left: 1rem; color: #555; }
    \\ img { max-width: 100%; }
    \\ a { color: #0366d6; }
    \\</style>
;

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const args = try std.process.argsAlloc(allocator);
    defer std.process.argsFree(allocator, args);

    var input_path: ?[]const u8 = null;
    var output_path: ?[]const u8 = null;
    var pretty = true;
    var include_css = false;
    var full_html = false;

    var i: usize = 1;
    while (i < args.len) : (i += 1) {
        const arg = args[i];
        if (std.mem.eql(u8, arg, "--minify")) {
            pretty = false;
        } else if (std.mem.eql(u8, arg, "--css")) {
            include_css = true;
        } else if (std.mem.eql(u8, arg, "--full")) {
            full_html = true;
            include_css = true;
        } else if (std.mem.eql(u8, arg, "-o") or std.mem.eql(u8, arg, "--output")) {
            i += 1;
            if (i < args.len) output_path = args[i];
        } else if (arg.len > 0 and arg[0] != '-') {
            // Guard against an empty argument before indexing into it
            input_path = arg;
        }
    }

    // Read input
    const source = if (input_path) |path| blk: {
        const file = try std.fs.cwd().openFile(path, .{});
        defer file.close();
        break :blk try file.readToEndAlloc(allocator, 10 * 1024 * 1024);
    } else blk: {
        // Read from stdin
        const stdin = std.io.getStdIn();
        break :blk try stdin.reader().readAllAlloc(allocator, 10 * 1024 * 1024);
    };
    defer allocator.free(source);

    // Parse
    var doc = try Document.init(source, allocator);
    defer doc.deinit();

    // Render
    var renderer = HtmlRenderer.init(allocator, pretty);
    defer renderer.deinit();
    const html_body = try renderer.render(doc.root);

    // Write output: to the -o file if given, otherwise to stdout
    const out_file = if (output_path) |path|
        try std.fs.cwd().createFile(path, .{})
    else
        std.io.getStdOut();
    // Close the file we created ourselves; leave stdout alone
    defer if (output_path != null) out_file.close();
    const writer = out_file.writer();

    if (full_html) {
        try writer.writeAll("<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"utf-8\">\n");
        if (include_css) try writer.writeAll(css_template);
        try writer.writeAll("\n</head>\n<body>\n");
    } else if (include_css) {
        try writer.writeAll(css_template);
        try writer.writeAll("\n");
    }
    try writer.writeAll(html_body);
    if (full_html) {
        try writer.writeAll("</body>\n</html>\n");
    }
}
The argument parsing is deliberately simple -- no fancy argument parsing library, just a loop over args. For a small tool like this, that's the right call. We support:
- zigmd input.md -- read file, output to stdout
- zigmd input.md -o output.html -- read file, write to file
- cat input.md | zigmd -- read from stdin (piping)
- zigmd --minify input.md -- minified output
- zigmd --css input.md -- prepend a basic CSS stylesheet
- zigmd --full input.md -- full HTML document with doctype, head, body
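One advantage of keeping the loop this small is that it's easy to pull into a pure function and test. The sketch below is a hypothetical refactoring, not the code from main.zig: Options, parseArgs and the MissingOutputPath error are names I made up, and unlike the loop above it rejects a trailing -o with no path instead of ignoring it.

```zig
const std = @import("std");

// Hypothetical options struct mirroring the variables in main().
const Options = struct {
    input_path: ?[]const u8 = null,
    output_path: ?[]const u8 = null,
    pretty: bool = true,
    include_css: bool = false,
    full_html: bool = false,
};

fn parseArgs(args: []const []const u8) error{MissingOutputPath}!Options {
    var opts = Options{};
    var i: usize = 1; // args[0] is the program name
    while (i < args.len) : (i += 1) {
        const arg = args[i];
        if (std.mem.eql(u8, arg, "--minify")) {
            opts.pretty = false;
        } else if (std.mem.eql(u8, arg, "--css")) {
            opts.include_css = true;
        } else if (std.mem.eql(u8, arg, "--full")) {
            opts.full_html = true;
            opts.include_css = true;
        } else if (std.mem.eql(u8, arg, "-o") or std.mem.eql(u8, arg, "--output")) {
            i += 1;
            if (i >= args.len) return error.MissingOutputPath;
            opts.output_path = args[i];
        } else if (arg.len > 0 and arg[0] != '-') {
            opts.input_path = arg;
        }
    }
    return opts;
}
```

The parameter type is simplified for the example. std.process.argsAlloc hands back sentinel-terminated strings, so wiring this into the real main would need a matching parameter type or a small conversion step.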
The stdin detection is interesting. We don't explicitly check "is stdin a pipe?" -- we just always try to read from stdin if no input file is given. If someone runs zigmd with no arguments and no pipe, it'll block waiting for input (which is standard Unix behavior -- cat does the same thing). You could detect whether stdin is a TTY (std.posix.isatty, or the isTty() method on std.fs.File) and print a usage message instead, but that's a polish item, not a correctness issue.
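If you do want that bit of polish, a minimal sketch could look like this. shouldPrintUsage and the usage text are hypothetical, and the calls assume Zig 0.14, where std.io.getStdIn() returns a std.fs.File that has an isTty() method.

```zig
const std = @import("std");

// Hypothetical helper: only complain when there is no input file AND
// stdin is an interactive terminal (nothing is being piped in).
fn shouldPrintUsage(input_path: ?[]const u8, stdin_is_tty: bool) bool {
    return input_path == null and stdin_is_tty;
}

pub fn main() !void {
    const stdin = std.io.getStdIn();
    // No file argument in this demo, so only the TTY check decides.
    if (shouldPrintUsage(null, stdin.isTty())) {
        try std.io.getStdErr().writer().writeAll("usage: zigmd [options] [input.md]\n");
        std.process.exit(1);
    }
    // ...otherwise fall through and read from stdin as before.
}
```

Keeping the decision in a pure function means it can be tested without faking a terminal.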
Piping support and Unix philosophy
The tool follows the Unix philosophy: read from stdin, write to stdout, compose with other tools. This means you can do things like:
# Basic conversion
zigmd README.md > README.html
# Pipe from another command
cat README.md | zigmd > output.html
# Pipe through sed first to fix something, then convert
sed 's/old-url/new-url/g' README.md | zigmd --full -o page.html
# Chain with other tools
zigmd README.md | wc -c # count HTML bytes
zigmd README.md | tidy -i # run through HTML tidy
Making this work required no special code at all. The input side reads from getStdIn() when there's no file argument. The output side writes to getStdOut() when there's no -o flag. That's it. The writer we get back is unbuffered, but since we emit the HTML in a handful of large writeAll calls that doesn't cost us anything noticeable, and the OS handles the pipe plumbing. One of the nice things about systems programming -- you're close enough to the OS that "just use stdin/stdout" actually works without layers of abstraction getting in the way.
The readAllAlloc call has a 10MB limit. For Markdown files that's absurdly generous -- even a book-length document would be well under 1MB. But it's good to have an explicit limit rather than letting a malicious input consume all memory. We covered why explicit limits matter way back in the memory management episodes (ep7, ep8).
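To see the limit in action without a 10MB file, here's a small sketch that reads from an in-memory stream. readCapped is a hypothetical wrapper name; readAllAlloc and std.io.fixedBufferStream are the std calls as they exist in Zig 0.14.

```zig
const std = @import("std");

// Hypothetical wrapper: read everything from `reader`, but refuse to
// hold more than `max_bytes`. On overflow readAllAlloc returns
// error.StreamTooLong and frees whatever it had buffered so far.
fn readCapped(allocator: std.mem.Allocator, reader: anytype, max_bytes: usize) ![]u8 {
    return reader.readAllAlloc(allocator, max_bytes);
}
```

Reading a 10-byte stream with a cap of 4 fails with error.StreamTooLong, which is the error main would surface if someone piped in more than 10MB.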
Adding basic CSS
The --css flag injects a minimal stylesheet so the HTML doesn't look like raw browser default styling. The CSS is stored as a compile-time string using Zig's multiline string literals (the \\ prefix syntax):
const css_template =
\\<style>
\\ body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
\\ max-width: 800px; margin: 0 auto; padding: 2rem; line-height: 1.6; }
\\ pre { background: #f4f4f4; padding: 1rem; overflow-x: auto; border-radius: 4px; }
\\ code { font-family: 'Fira Code', monospace; font-size: 0.9em; }
\\ blockquote { border-left: 3px solid #ccc; margin: 0; padding-left: 1rem; color: #555; }
\\ img { max-width: 100%; }
\\ a { color: #0366d6; }
\\</style>
;
Nothing fancy -- just enough to make code blocks readable, paragraphs nicely spaced, and the page centered. The --full flag goes further and wraps everything in a proper HTML5 document structure with <!DOCTYPE html>, <head>, and <body> tags. This is useful when you want to open the output directly in a browser without embedding it in an existing page.
You could make this more elaborate -- load CSS from an external file, support a --template flag that takes an HTML file with a {{content}} placeholder, etc. But for a learning project, the inline approach keeps the code simple and self-contained. A good extension exercise if you want to take it further.
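As a starting point for that exercise, here is a rough sketch of the placeholder substitution. applyTemplate and the MissingPlaceholder error are hypothetical names, and only the first {{content}} occurrence is replaced.

```zig
const std = @import("std");

// Hypothetical helper for a --template flag: splice `content` into the
// template at the first {{content}} marker. Caller frees the result.
fn applyTemplate(allocator: std.mem.Allocator, template: []const u8, content: []const u8) ![]u8 {
    const marker = "{{content}}";
    const pos = std.mem.indexOf(u8, template, marker) orelse
        return error.MissingPlaceholder;
    return std.mem.concat(allocator, u8, &[_][]const u8{
        template[0..pos],
        content,
        template[pos + marker.len ..],
    });
}
```

Given the template <body>{{content}}</body> and the content <p>hi</p>, this returns <body><p>hi</p></body>. Returning an error for a template without the marker is a design choice; silently appending the content would hide a typo in the template file.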
End-to-end testing
Unit tests for individual components are great (we wrote plenty in episodes 37 and 38), but for the final project we also want end-to-end tests: feed a .md file through the entire pipeline and compare the output against an expected .html file. If they match, the whole stack works correctly. If they diverge, something broke somewhere.
const std = @import("std");
const Document = @import("parser.zig").Document;
const HtmlRenderer = @import("renderer.zig").HtmlRenderer;

fn renderMarkdown(allocator: std.mem.Allocator, source: []const u8) ![]const u8 {
    var doc = try Document.init(source, allocator);
    defer doc.deinit();
    var renderer = HtmlRenderer.init(allocator, false); // minified for deterministic comparison
    defer renderer.deinit();
    const result = try renderer.render(doc.root);
    // Copy the HTML out, because the renderer's buffer is freed on return
    return try allocator.dupe(u8, result);
}

test "heading renders to h1" {
    const html = try renderMarkdown(std.testing.allocator, "# Hello World\n");
    defer std.testing.allocator.free(html);
    try std.testing.expect(std.mem.indexOf(u8, html, "<h1>Hello World</h1>") != null);
}

test "bold renders to strong" {
    const html = try renderMarkdown(std.testing.allocator, "Some **bold** text\n");
    defer std.testing.allocator.free(html);
    try std.testing.expect(std.mem.indexOf(u8, html, "<strong>bold</strong>") != null);
}

test "code block renders with language class" {
    const html = try renderMarkdown(std.testing.allocator, "```zig\nconst x = 42;\n```\n");
    defer std.testing.allocator.free(html);
    try std.testing.expect(std.mem.indexOf(u8, html, "class=\"language-zig\"") != null);
    try std.testing.expect(std.mem.indexOf(u8, html, "const x = 42;") != null);
}

test "html entities are escaped" {
    const html = try renderMarkdown(std.testing.allocator, "x < 10 & y > 5\n");
    defer std.testing.allocator.free(html);
    try std.testing.expect(std.mem.indexOf(u8, html, "&lt;") != null);
    try std.testing.expect(std.mem.indexOf(u8, html, "&amp;") != null);
    try std.testing.expect(std.mem.indexOf(u8, html, "&gt;") != null);
    // Must NOT contain raw < (except in tags)
}

test "link renders with href" {
    const html = try renderMarkdown(std.testing.allocator, "[click](https://example.com)\n");
    defer std.testing.allocator.free(html);
    try std.testing.expect(std.mem.indexOf(u8, html, "<a href=\"https://example.com\">click</a>") != null);
}

test "unordered list renders as ul/li" {
    const html = try renderMarkdown(std.testing.allocator, "- alpha\n- beta\n");
    defer std.testing.allocator.free(html);
    try std.testing.expect(std.mem.indexOf(u8, html, "<ul>") != null);
    try std.testing.expect(std.mem.indexOf(u8, html, "<li>alpha</li>") != null);
    try std.testing.expect(std.mem.indexOf(u8, html, "<li>beta</li>") != null);
}

test "full document pipeline" {
    const markdown =
        \\# My Document
        \\
        \\This is a paragraph with **bold** and *italic* text.
        \\
        \\## Section Two
        \\
        \\- item one
        \\- item two
        \\
        \\```zig
        \\const x = 42;
        \\```
        \\
        \\Check [this link](https://example.com) for details.
    ;
    const html = try renderMarkdown(std.testing.allocator, markdown);
    defer std.testing.allocator.free(html);
    // Verify all major elements rendered
    try std.testing.expect(std.mem.indexOf(u8, html, "<h1>") != null);
    try std.testing.expect(std.mem.indexOf(u8, html, "<h2>") != null);
    try std.testing.expect(std.mem.indexOf(u8, html, "<p>") != null);
    try std.testing.expect(std.mem.indexOf(u8, html, "<strong>") != null);
    try std.testing.expect(std.mem.indexOf(u8, html, "<em>") != null);
    try std.testing.expect(std.mem.indexOf(u8, html, "<ul>") != null);
    try std.testing.expect(std.mem.indexOf(u8, html, "<li>") != null);
    try std.testing.expect(std.mem.indexOf(u8, html, "<code") != null);
    try std.testing.expect(std.mem.indexOf(u8, html, "<a href=") != null);
}
We use minified output for test comparisons -- no whitespace variations to worry about. The renderMarkdown helper runs the full pipeline (tokenize, parse, render) and returns a heap-allocated string that the test frees when done. std.testing.allocator catches any leaks in the pipeline, which means these tests simultaneously verify both correctness and memory safety.
For a more thorough approach, you could load .md files from a tests/ directory, render them, and compare against corresponding .html expected files:
test "file-based end-to-end" {
    const test_dir = "tests/fixtures/";
    var dir = try std.fs.cwd().openDir(test_dir, .{ .iterate = true });
    defer dir.close();

    var iter = dir.iterate();
    while (try iter.next()) |entry| {
        if (!std.mem.endsWith(u8, entry.name, ".md")) continue;

        // Read input .md
        const md_file = try dir.openFile(entry.name, .{});
        defer md_file.close();
        const source = try md_file.readToEndAlloc(std.testing.allocator, 1024 * 1024);
        defer std.testing.allocator.free(source);

        // Render
        const html = try renderMarkdown(std.testing.allocator, source);
        defer std.testing.allocator.free(html);

        // Read expected .html
        var expected_name: [256]u8 = undefined;
        const base = entry.name[0 .. entry.name.len - 3];
        const expected_path = std.fmt.bufPrint(&expected_name, "{s}.html", .{base}) catch continue;
        const exp_file = dir.openFile(expected_path, .{}) catch continue;
        defer exp_file.close();
        const expected = try exp_file.readToEndAlloc(std.testing.allocator, 1024 * 1024);
        defer std.testing.allocator.free(expected);

        try std.testing.expectEqualStrings(expected, html);
    }
}
This pattern scales beautifully. Every time you find a bug, add a new .md / .html pair to the fixtures directory and the test suite picks it up automatically. No code changes needed for new test cases. It's the same approach that CommonMark (the Markdown specification) uses for its test suite -- hundreds of input/expected pairs, one generic test runner.
Updating the build.zig for the final project
The final build.zig ties everything together -- the executable, all three test suites (tokenizer, parser, renderer), and a run step:
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    const exe = b.addExecutable(.{
        .name = "zigmd",
        .root_source_file = b.path("src/main.zig"),
        .target = target,
        .optimize = optimize,
    });
    b.installArtifact(exe);

    const run_cmd = b.addRunArtifact(exe);
    run_cmd.step.dependOn(b.getInstallStep());
    if (b.args) |args| run_cmd.addArgs(args);
    const run_step = b.step("run", "Run zigmd");
    run_step.dependOn(&run_cmd.step);

    // Tests for each module
    const modules = [_][]const u8{ "src/tokenizer.zig", "src/parser.zig", "src/renderer.zig" };
    const test_step = b.step("test", "Run all tests");
    for (modules) |mod| {
        const t = b.addTest(.{
            .root_source_file = b.path(mod),
            .target = target,
            .optimize = optimize,
        });
        test_step.dependOn(&b.addRunArtifact(t).step);
    }
}
Now zig build compiles the zigmd binary, zig build run -- input.md runs it, and zig build test executes all test suites across all three modules. We don't have to list dependencies between the source files anywhere: the compiler follows the @import chain, so if renderer.zig imports parser.zig which imports tokenizer.zig, all three get compiled whenever renderer.zig is the root source file.
What we learned
- Walking an AST with the visitor pattern to emit HTML tags -- enter/leave callbacks map directly to opening/closing tags, so the tree structure guarantees correct nesting without manual tracking
- HTML entity escaping for the five critical characters (&, <, >, ", ') and why escaping everything unconditionally is the safest approach
- Handling inline elements within block elements by letting the visitor recurse naturally -- bold inside paragraphs, italic inside list items, text inside links, all handled by the same enter/leave logic
- Pretty-printing vs minified output controlled by a simple boolean flag, with writeIndent() and newline() methods that become no-ops in minified mode
- A CLI that reads from file or stdin and writes to file or stdout, following Unix conventions so it composes naturally with pipes and other tools
- Adding inline CSS with multiline string literals and a --full flag for complete HTML5 document generation
- End-to-end testing by running the entire tokenizer-parser-renderer pipeline and checking the HTML output, plus file-based fixture tests that scale without code changes
And with that, Project A is done! We built a Markdown-to-HTML converter from scratch across three episodes. The tokenizer breaks text into tokens, the parser builds a tree, the renderer walks the tree to emit HTML, and the CLI wraps it all up in a usable tool. No external dependencies, no magic -- just data structures, algorithms, and the Zig standard library.
The concepts from this project -- tokenizing, parsing, AST construction, tree walking, code generation -- show up everywhere in programming. Compilers, template engines, configuration parsers, syntax highlighters, linters -- they all follow this same pipeline. If you understood how our little Markdown converter works, you understand the skeleton of much larger systems.
Regards!