Learn Zig Series (#49) - Build a Shell: Built-in Commands
Project D: Build Your Own Shell (3/4)
What will I learn
- You will learn built-in vs external commands: why cd can't be external;
- You will learn implementing cd with std.posix.chdir;
- You will learn implementing pwd with std.fs.cwd();
- You will learn implementing export for environment variables;
- You will learn implementing exit with proper cleanup;
- You will learn implementing echo with escape sequence support;
- You will learn a dispatch table: mapping command names to built-in functions;
- You will learn the command resolution order: built-in first, then PATH search.
Requirements
- A working modern computer running macOS, Windows or Ubuntu;
- An installed Zig 0.14+ distribution (download from ziglang.org);
- The ambition to learn Zig programming.
Difficulty
- Intermediate
Curriculum (of the Learn Zig Series):
- Zig Programming Tutorial - ep001 - Intro
- Learn Zig Series (#2) - Hello Zig, Variables and Types
- Learn Zig Series (#3) - Functions and Control Flow
- Learn Zig Series (#4) - Error Handling (Zig's Best Feature)
- Learn Zig Series (#5) - Arrays, Slices, and Strings
- Learn Zig Series (#6) - Structs, Enums, and Tagged Unions
- Learn Zig Series (#7) - Memory Management and Allocators
- Learn Zig Series (#8) - Pointers and Memory Layout
- Learn Zig Series (#9) - Comptime (Zig's Superpower)
- Learn Zig Series (#10) - Project Structure, Modules, and File I/O
- Learn Zig Series (#11) - Mini Project: Building a Step Sequencer
- Learn Zig Series (#12) - Testing and Test-Driven Development
- Learn Zig Series (#13) - Interfaces via Type Erasure
- Learn Zig Series (#14) - Generics with Comptime Parameters
- Learn Zig Series (#15) - The Build System (build.zig)
- Learn Zig Series (#16) - Sentinel-Terminated Types and C Strings
- Learn Zig Series (#17) - Packed Structs and Bit Manipulation
- Learn Zig Series (#18) - Async Concepts and Event Loops
- Learn Zig Series (#18b) - Addendum: Async Returns in Zig 0.16
- Learn Zig Series (#19) - SIMD with @Vector
- Learn Zig Series (#20) - Working with JSON
- Learn Zig Series (#21) - Networking and TCP Sockets
- Learn Zig Series (#22) - Hash Maps and Data Structures
- Learn Zig Series (#23) - Iterators and Lazy Evaluation
- Learn Zig Series (#24) - Logging, Formatting, and Debug Output
- Learn Zig Series (#25) - Mini Project: HTTP Status Checker
- Learn Zig Series (#26) - Writing a Custom Allocator
- Learn Zig Series (#27) - C Interop: Calling C from Zig
- Learn Zig Series (#28) - C Interop: Exposing Zig to C
- Learn Zig Series (#29) - Inline Assembly and Low-Level Control
- Learn Zig Series (#30) - Thread Safety and Atomics
- Learn Zig Series (#31) - Memory-Mapped I/O and Files
- Learn Zig Series (#32) - Compile-Time Reflection with @typeInfo
- Learn Zig Series (#33) - Building a State Machine with Tagged Unions
- Learn Zig Series (#34) - Performance Profiling and Optimization
- Learn Zig Series (#35) - Cross-Compilation and Target Triples
- Learn Zig Series (#36) - Mini Project: CLI Task Runner
- Learn Zig Series (#37) - Markdown to HTML: Tokenizer and Lexer
- Learn Zig Series (#38) - Markdown to HTML: Parser and AST
- Learn Zig Series (#39) - Markdown to HTML: Renderer and CLI
- Learn Zig Series (#40) - Key-Value Store: In-Memory Store
- Learn Zig Series (#41) - Key-Value Store: Write-Ahead Log
- Learn Zig Series (#42) - Key-Value Store: TCP Server
- Learn Zig Series (#43) - Key-Value Store: Client Library and Benchmarks
- Learn Zig Series (#44) - Image Tool: Reading and Writing PPM/BMP
- Learn Zig Series (#45) - Image Tool: Pixel Operations
- Learn Zig Series (#46) - Image Tool: CLI Pipeline
- Learn Zig Series (#47) - Build a Shell: Parsing Commands
- Learn Zig Series (#48) - Build a Shell: Process Spawning
- Learn Zig Series (#49) - Build a Shell: Built-in Commands (this post)
After two episodes of building our shell -- parsing commands in episode 47 and spawning processes in episode 48 -- we have a shell that can actually run programs. You type ls | grep foo | wc -l and it does the right thing. Pipes work, redirections work, error reporting works. Pretty good for a weekend project.
But there's a category of commands that our current architecture fundamentally cannot handle. Try typing cd /tmp into our shell right now. What happens? The shell forks a child process, the child calls chdir("/tmp"), the child's working directory changes.. and then the child exits. Our shell's working directory hasn't moved at all. The cd happened in a completely separate process and died with it.
This is one of those "aha" moments in systems programming where you realize that the fork/exec model, as powerful as it is, has a fundamental limitation: a child process cannot modify its parent's state. The parent's memory, environment, and working directory are all isolated. Which means certain commands MUST be handled by the shell itself, without forking. These are the built-in commands, and today we're building them.
Here we go!
Why built-in commands exist
Every Unix shell has a set of commands that are implemented inside the shell rather than as separate executables. The most obvious ones:
- cd -- changes the working directory. Must be built-in because chdir() only affects the calling process.
- pwd -- prints the current working directory. Could technically be external (there IS a /usr/bin/pwd) but it's built-in for performance and because it should always reflect the shell's actual cwd, not whatever an external binary thinks.
- export -- sets environment variables. Must be built-in because environment modifications only propagate to child processes, not to the parent.
- exit -- terminates the shell. Obviously has to be built-in because you can't exit a process from a child.
- echo -- prints its arguments. Could be external (and /usr/bin/echo exists) but shells build it in for performance since it's called so frequently.
The pattern is clear: any command that needs to modify the shell's own state (working directory, environment, running status) MUST be built-in. Commands that are just really common (echo, test, [) are built-in for speed -- spawning a new process for every echo in a script would be wasteful.
Having said that, not all shells agree on what should be built-in. Bash has around 60 built-in commands. Zsh has even more. The POSIX standard mandates only a handful. For our shell, we'll implement the essentials: cd, pwd, export, unset, exit, and echo. That covers the cases where being built-in is either mandatory or clearly beneficial.
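To make the isolation concrete, here is a minimal POSIX-only sketch (fork is not available on Windows). The helper name parentCwdSurvivesChildChdir is made up for this illustration and is not part of the shell's code. It forks, lets the child change directory, and then checks that the parent has not moved:

```zig
const std = @import("std");
const posix = std.posix;

/// Returns true if the parent's working directory is unchanged after a
/// forked child calls chdir. Hypothetical helper, for illustration only.
fn parentCwdSurvivesChildChdir() !bool {
    var before_buf: [std.fs.max_path_bytes]u8 = undefined;
    const before = try std.process.getCwd(&before_buf);

    const pid = try posix.fork();
    if (pid == 0) {
        // Child: change directory, then exit immediately.
        posix.chdir("/") catch posix.exit(1);
        posix.exit(0);
    }

    // Parent: wait for the child, then look at our own cwd again.
    _ = posix.waitpid(pid, 0);

    var after_buf: [std.fs.max_path_bytes]u8 = undefined;
    const after = try std.process.getCwd(&after_buf);
    return std.mem.eql(u8, before, after);
}
```

The child's chdir succeeds, but the parent still reports the directory it started in. That is the situation cd would be in if we spawned it like any other program.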
The dispatch table: how to route commands
Before we implement individual built-ins, we need a mechanism to check if a command is built-in and route it accordingly. This is the dispatch table -- a mapping from command name strings to handler functions.
In Zig, we can use a comptime-generated structure for this. A std.StaticStringMap is perfect -- it maps string keys to values at compile time, giving us fast lookups without any runtime allocation:
const std = @import("std");
const posix = std.posix;
const BuiltinFn = *const fn (
args: []const []const u8,
env: *Environment,
stdout: std.fs.File.Writer,
) BuiltinError!BuiltinResult;
const BuiltinResult = enum {
ok,
exit_shell,
};
const BuiltinError = error{
WriteFailed,
ChdirFailed,
GetcwdFailed,
InvalidArgs,
EnvError,
};
const builtin_map = std.StaticStringMap(BuiltinFn).initComptime(.{
.{ "cd", &builtinCd },
.{ "pwd", &builtinPwd },
.{ "export", &builtinExport },
.{ "unset", &builtinUnset },
.{ "exit", &builtinExit },
.{ "echo", &builtinEcho },
});
fn isBuiltin(name: []const u8) bool {
return builtin_map.has(name);
}
fn executeBuiltin(
name: []const u8,
args: []const []const u8,
env: *Environment,
stdout: std.fs.File.Writer,
) BuiltinError!BuiltinResult {
const func = builtin_map.get(name) orelse unreachable;
return func(args, env, stdout);
}
We covered StaticStringMap back in episode 22 when we looked at hash maps. The .initComptime call means the map is built entirely at compile time -- zero runtime cost, zero allocation. The function pointer type BuiltinFn ensures every built-in has the same signature: it receives the arguments, a mutable reference to our environment state, and a writer for output.
The BuiltinResult enum lets us distinguish between normal completions and the special case where the shell should exit. We can't just call std.process.exit() from the exit built-in because we might want to run cleanup code first (closing file descriptors, flushing buffers, etc.).
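If the full signature above is a lot to take in at once, the same idea fits in a few lines. This is a stripped-down sketch with a hypothetical Handler type and two toy commands (cmdTrue and cmdFalse are invented for the example). Unlike executeBuiltin, it returns null for unknown names instead of relying on unreachable:

```zig
const std = @import("std");

// Hypothetical, simplified handler type: takes arguments, returns a status code.
const Handler = *const fn (args: []const []const u8) u8;

fn cmdTrue(args: []const []const u8) u8 {
    _ = args;
    return 0;
}

fn cmdFalse(args: []const []const u8) u8 {
    _ = args;
    return 1;
}

// Built at compile time, exactly like builtin_map in the episode.
const handlers = std.StaticStringMap(Handler).initComptime(.{
    .{ "true", &cmdTrue },
    .{ "false", &cmdFalse },
});

/// Looks up `name` and runs its handler; null means "not in the table".
fn dispatch(name: []const u8, args: []const []const u8) ?u8 {
    const handler = handlers.get(name) orelse return null;
    return handler(args);
}
```

Returning an optional keeps the "is it a built-in?" check and the call in one place, at the cost of the caller having to handle null.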
The Environment struct
Built-in commands like export and cd need to modify shared shell state. We need a struct to hold this state:
const Environment = struct {
vars: std.StringHashMap([]const u8),
cwd_buf: [std.fs.max_path_bytes]u8,
allocator: std.mem.Allocator,
fn init(allocator: std.mem.Allocator) Environment {
return .{
.vars = std.StringHashMap([]const u8).init(allocator),
.cwd_buf = undefined,
.allocator = allocator,
};
}
fn deinit(self: *Environment) void {
var iter = self.vars.iterator();
while (iter.next()) |entry| {
self.allocator.free(entry.key_ptr.*);
self.allocator.free(entry.value_ptr.*);
}
self.vars.deinit();
}
fn setVar(self: *Environment, key: []const u8, value: []const u8) !void {
// Free old value if key already exists
if (self.vars.fetchRemove(key)) |old| {
self.allocator.free(old.key);
self.allocator.free(old.value);
}
const owned_key = try self.allocator.dupe(u8, key);
errdefer self.allocator.free(owned_key);
const owned_value = try self.allocator.dupe(u8, value);
errdefer self.allocator.free(owned_value);
try self.vars.put(owned_key, owned_value);
}
fn getVar(self: *const Environment, key: []const u8) ?[]const u8 {
// Check shell-local vars first, then OS environment
if (self.vars.get(key)) |val| return val;
return std.posix.getenv(key);
}
fn removeVar(self: *Environment, key: []const u8) void {
if (self.vars.fetchRemove(key)) |old| {
self.allocator.free(old.key);
self.allocator.free(old.value);
}
}
};
The Environment struct wraps a string hash map (same type we used in the key-value store, episode 40) and owns all its keys and values. The setVar function handles the case where a variable already exists -- it frees the old key and value before inserting the new ones. The errdefer on the duped strings ensures we don't leak if the put call fails. This is the exact ownership pattern from episode 7.
Notice that getVar checks our local variables first, then falls back to std.posix.getenv() to read from the inherited process environment. This means export FOO=bar will override any existing FOO from the parent process, but variables we haven't explicitly set will still be visible. Real shells do this too -- your shell inherits PATH, HOME, USER etc. from the login process and you can override them with export.
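One detail of setVar is worth a second look: it frees the old entry before copying the new one, so a failed allocation loses the old value, and passing in a slice that points at the old value would read freed memory. Below is an alternative sketch (my variation, not the episode's code) using a hypothetical Vars struct that copies first and swaps afterwards via getOrPut:

```zig
const std = @import("std");

const Vars = struct {
    map: std.StringHashMap([]const u8),
    allocator: std.mem.Allocator,

    fn init(allocator: std.mem.Allocator) Vars {
        return .{
            .map = std.StringHashMap([]const u8).init(allocator),
            .allocator = allocator,
        };
    }

    fn deinit(self: *Vars) void {
        var it = self.map.iterator();
        while (it.next()) |entry| {
            self.allocator.free(entry.key_ptr.*);
            self.allocator.free(entry.value_ptr.*);
        }
        self.map.deinit();
    }

    fn set(self: *Vars, key: []const u8, value: []const u8) !void {
        // Copy the new value first: if this fails, the old value is untouched.
        const owned_value = try self.allocator.dupe(u8, value);
        errdefer self.allocator.free(owned_value);

        const gop = try self.map.getOrPut(key);
        if (gop.found_existing) {
            // The map already owns the key; only the value is swapped.
            self.allocator.free(gop.value_ptr.*);
        } else {
            // New entry: replace the borrowed key with an owned copy.
            gop.key_ptr.* = self.allocator.dupe(u8, key) catch |err| {
                _ = self.map.remove(key);
                return err;
            };
        }
        gop.value_ptr.* = owned_value;
    }
};
```

With this ordering, something like vars.set("FOO", vars.map.get("FOO").?) is safe, because the old bytes are still alive while the copy is made.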
Implementing cd
The cd command is the poster child for built-in commands. Here's the implementation:
fn builtinCd(
args: []const []const u8,
env: *Environment,
stdout: std.fs.File.Writer,
) BuiltinError!BuiltinResult {
_ = stdout;
const target = if (args.len > 0)
args[0]
else
env.getVar("HOME") orelse {
return error.InvalidArgs;
};
// Handle ~ expansion
var path_buf: [std.fs.max_path_bytes]u8 = undefined;
const resolved = if (target.len > 0 and target[0] == '~') blk: {
const home = env.getVar("HOME") orelse return error.ChdirFailed;
if (target.len == 1) {
break :blk home;
}
if (target[1] == '/') {
const rest = target[2..];
const len = home.len + 1 + rest.len;
if (len > path_buf.len) return error.ChdirFailed;
@memcpy(path_buf[0..home.len], home);
path_buf[home.len] = '/';
@memcpy(path_buf[home.len + 1 ..][0..rest.len], rest);
break :blk path_buf[0..len];
}
break :blk target; // ~user syntax not supported
} else target;
// Convert to null-terminated for the syscall
var nt_buf: [std.fs.max_path_bytes + 1]u8 = undefined;
if (resolved.len >= nt_buf.len) return error.ChdirFailed;
@memcpy(nt_buf[0..resolved.len], resolved);
nt_buf[resolved.len] = 0;
const nt_path: [*:0]const u8 = @ptrCast(nt_buf[0..resolved.len :0]);
posix.chdirZ(nt_path) catch return error.ChdirFailed;
return .ok;
}
A few things worth noting here. When called with no arguments (cd by itself), bash changes to the home directory. We do the same by looking up HOME in the environment. The tilde expansion (~/Documents becomes /home/user/Documents) is something the shell has to do -- the OS doesn't understand ~.
The most interesting part is the null-termination. The underlying chdir syscall expects a NUL-terminated C string, and Zig exposes that directly as posix.chdirZ, which takes a [*:0]const u8. Our parsed arguments are regular slices without a null terminator. We covered sentinel-terminated types in episode 16 -- the short version is that C APIs (and the Z variants of the POSIX wrappers) need that trailing zero byte. We copy the path into a stack buffer with room for the sentinel and cast the pointer. (posix.chdir, without the Z, accepts a plain []const u8 and does an equivalent copy internally, so passing resolved to it directly would also work.)
The posix.chdirZ() call is what actually does the work. It calls the chdir syscall, which changes the calling process's current working directory. Because this runs inside our shell process (not in a forked child), the directory change persists. That's the whole point.
One thing we're NOT implementing: cd - (change to the previous directory). That requires keeping an OLDPWD variable around and swapping it on every cd call. It's straightforward to add but I'll leave that as something to explore on your own -- the principle is the same as any other environment variable management.
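Going back to the tilde handling inside builtinCd: that logic is easier to test when it lives in its own function. Here is a small sketch of how it could be pulled out. The name expandTilde and its signature are my own choices, not something from the episode. It follows the same rules as above: a bare ~ is HOME, ~/rest is HOME plus /rest, and ~user is left untouched.

```zig
const std = @import("std");

/// Expands a leading "~" or "~/" in `target` using `home`.
/// `buf` is scratch space for the joined path; the result may point into
/// `buf`, `home`, or `target`. Hypothetical helper, for illustration only.
fn expandTilde(buf: []u8, home: []const u8, target: []const u8) ![]const u8 {
    if (target.len == 0 or target[0] != '~') return target;
    if (target.len == 1) return home;
    if (target[1] != '/') return target; // ~user syntax not supported

    // target[1..] still starts with '/', so plain concatenation is enough.
    const joined = try std.fmt.bufPrint(buf, "{s}{s}", .{ home, target[1..] });
    return joined;
}
```

Because it is a pure function of its inputs, it can be tested without touching the real working directory or the real HOME.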
Implementing pwd
pwd is delightfully simple:
fn builtinPwd(
args: []const []const u8,
env: *Environment,
stdout: std.fs.File.Writer,
) BuiltinError!BuiltinResult {
_ = args;
_ = env;
const cwd = std.fs.cwd().realpathAlloc(std.heap.page_allocator, ".") catch
return error.GetcwdFailed;
defer std.heap.page_allocator.free(cwd);
stdout.print("{s}\n", .{cwd}) catch return error.WriteFailed;
return .ok;
}
std.fs.cwd() returns a Dir handle representing the current working directory. We call realpathAlloc on "." to resolve the absolute path, print it, and free the allocation. The page_allocator is fine here since pwd is called infrequently and the allocation is tiny.
You might wonder why we don't just read the PWD environment variable. The reason is that PWD can get out of sync -- if some other part of the code calls chdir() without updating PWD, the environment variable would be stale. Asking the OS directly via realpathAlloc(".") always gives the truth. This is the approach POSIX pwd -P takes (the "physical" directory, resolving all symlinks).
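If you would rather avoid the heap allocation entirely, std.process.getCwd fills a caller-provided buffer instead. A minimal sketch, with currentDir as a made-up wrapper name:

```zig
const std = @import("std");

/// Writes the current working directory into `buf` and returns the filled
/// part of it. Hypothetical wrapper around std.process.getCwd.
fn currentDir(buf: []u8) ![]const u8 {
    const cwd = try std.process.getCwd(buf);
    return cwd;
}
```

A caller would declare something like var buf: [std.fs.max_path_bytes]u8 = undefined; and pass &buf. The cwd_buf field in our Environment struct looks like a natural home for such a buffer, though that is my guess at its purpose rather than something the code above relies on.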
Implementing export
export sets environment variables. The syntax is export NAME=VALUE or just export NAME (which marks an existing variable for export to child processes):
fn builtinExport(
args: []const []const u8,
env: *Environment,
stdout: std.fs.File.Writer,
) BuiltinError!BuiltinResult {
if (args.len == 0) {
// No arguments: print all variables
var iter = env.vars.iterator();
while (iter.next()) |entry| {
stdout.print("export {s}=\"{s}\"\n", .{
entry.key_ptr.*,
entry.value_ptr.*,
}) catch return error.WriteFailed;
}
return .ok;
}
for (args) |arg| {
// Find the '=' separator
if (std.mem.indexOf(u8, arg, "=")) |eq_pos| {
const key = arg[0..eq_pos];
const value = arg[eq_pos + 1 ..];
if (key.len == 0) {
stdout.print("export: invalid variable name\n", .{}) catch
return error.WriteFailed;
continue;
}
// Validate variable name: must start with letter or underscore,
// contain only alphanumeric or underscore
if (!isValidVarName(key)) {
stdout.print("export: '{s}': not a valid identifier\n", .{key}) catch
return error.WriteFailed;
continue;
}
env.setVar(key, value) catch return error.EnvError;
} else {
// export NAME without = -- mark for export (we just acknowledge it)
if (!isValidVarName(arg)) {
stdout.print("export: '{s}': not a valid identifier\n", .{arg}) catch
return error.WriteFailed;
}
// In a full shell, this would mark the variable for
// inclusion in child environments. Our simple implementation
// already passes all vars to children, so nothing to do.
}
}
return .ok;
}
fn isValidVarName(name: []const u8) bool {
if (name.len == 0) return false;
const first = name[0];
if (!std.ascii.isAlphabetic(first) and first != '_') return false;
for (name[1..]) |c| {
if (!std.ascii.isAlphanumeric(c) and c != '_') return false;
}
return true;
}
The variable name validation is important -- POSIX says variable names must start with a letter or underscore, and contain only letters, digits, and underscores. Without validation, a user could do export 123=hello or export foo-bar=baz and end up with names that other shells and most programs cannot reference. The isValidVarName function is also a good example of Zig's character classification functions from std.ascii -- much cleaner than manual range checks.
When called with no arguments, export lists all currently set variables. This matches bash behavior and is useful for debugging ("what did I actually set?").
The export NAME form (without =) is interesting. In real shells, there's a distinction between shell-local variables (only visible inside the shell) and exported variables (passed to child processes). Our simplified shell doesn't make that distinction -- all variables in our Environment struct are effectively exported. But we accept the syntax without error for compatibility.
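The NAME=VALUE splitting inside builtinExport has one property that is easy to miss: only the first = separates key from value, so values may contain = themselves. A tiny sketch that isolates just that step (Assignment and splitAssignment are hypothetical names for this example):

```zig
const std = @import("std");

const Assignment = struct {
    key: []const u8,
    value: []const u8,
};

/// Splits "NAME=VALUE" at the first '='. Returns null when there is no '='
/// at all (the bare `export NAME` form).
fn splitAssignment(arg: []const u8) ?Assignment {
    const eq = std.mem.indexOfScalar(u8, arg, '=') orelse return null;
    return .{ .key = arg[0..eq], .value = arg[eq + 1 ..] };
}
```

So export URL=http://example.com?a=b keeps everything after the first = as the value, and export EMPTY= produces an empty string rather than an error.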
Implementing unset
The counterpart to export -- removes a variable from the environment:
fn builtinUnset(
args: []const []const u8,
env: *Environment,
stdout: std.fs.File.Writer,
) BuiltinError!BuiltinResult {
_ = stdout;
for (args) |name| {
env.removeVar(name);
}
return .ok;
}
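Short and sweet. unset takes one or more variable names and removes them. No error if the variable doesn't exist -- that's the standard behavior. POSIX says unset of a non-existent variable is a no-op, and I think that's the right call. You don't want scripts to fail just because they tried to clean up a variable that wasn't set. One caveat in our implementation: removeVar only touches the shell-local map. If the variable was inherited from the parent process, getVar will still find it through the std.posix.getenv fallback, so unset PATH does not actually hide PATH. A full shell would need to track removals as well.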
Implementing exit
exit terminates the shell, optionally with a specific exit code:
fn builtinExit(
args: []const []const u8,
env: *Environment,
stdout: std.fs.File.Writer,
) BuiltinError!BuiltinResult {
_ = env;
if (args.len > 0) {
const code = std.fmt.parseInt(u8, args[0], 10) catch {
stdout.print("exit: {s}: numeric argument required\n", .{args[0]}) catch {};
return .exit_shell;
};
_ = code; // In a full implementation, we'd pass this to std.process.exit
}
return .exit_shell;
}
The .exit_shell result tells our main REPL loop to break out and shut down. In the last episode we had a simple string comparison for exit directly in the REPL -- now we've moved it to the built-in system. The optional exit code (exit 0, exit 1, exit 42) is parsed but in our simplified shell we don't actually propagate it -- a full implementation would store it and pass it to std.process.exit() after cleanup.
The catch {} on the error print is deliberate -- if we can't even write to stdout, there's nothing we can do about it, and we still want to exit the shell. Silently dropping the write error is the right thing here.
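For the curious, here is one way the exit code could be carried out of the built-in instead of being discarded: turn the result into a tagged union whose exit variant holds the code. This is a sketch under my own assumptions, not the episode's design. Result and exitResult are hypothetical names, and the conventions (no argument reuses the last status, a non-numeric argument yields 2, values wrap modulo 256) mirror what bash commonly does.

```zig
const std = @import("std");

// Hypothetical replacement for BuiltinResult that carries the exit code.
const Result = union(enum) {
    ok,
    exit_shell: u8,
};

/// Decides the exit code for `exit [N]`. `last_status` is the status of the
/// most recently run command, used when no argument is given.
fn exitResult(args: []const []const u8, last_status: u8) Result {
    if (args.len == 0) return .{ .exit_shell = last_status };

    // Parse wide, then wrap into 0..255 the way exit statuses do.
    const n = std.fmt.parseInt(i64, args[0], 10) catch {
        return .{ .exit_shell = 2 }; // assumption: bash-style "numeric argument required"
    };
    const code: u8 = @intCast(@mod(n, 256));
    return .{ .exit_shell = code };
}
```

The REPL would then switch on the result, run its cleanup, and finish with std.process.exit(code) once the deferred cleanup has been taken care of.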
Implementing echo
echo is one of those commands that seems trivial but has surprising depth. The basic form just prints its arguments separated by spaces:
fn builtinEcho(
args: []const []const u8,
env: *Environment,
stdout: std.fs.File.Writer,
) BuiltinError!BuiltinResult {
_ = env;
var i: usize = 0;
var newline = true;
var interpret_escapes = false;
// Handle flags
while (i < args.len) {
if (std.mem.eql(u8, args[i], "-n")) {
newline = false;
i += 1;
} else if (std.mem.eql(u8, args[i], "-e")) {
interpret_escapes = true;
i += 1;
} else if (std.mem.eql(u8, args[i], "-E")) {
interpret_escapes = false;
i += 1;
} else {
break;
}
}
// Print arguments
var first = true;
while (i < args.len) : (i += 1) {
if (!first) {
stdout.writeByte(' ') catch return error.WriteFailed;
}
first = false;
if (interpret_escapes) {
writeEscaped(stdout, args[i]) catch return error.WriteFailed;
} else {
stdout.writeAll(args[i]) catch return error.WriteFailed;
}
}
if (newline) {
stdout.writeByte('\n') catch return error.WriteFailed;
}
return .ok;
}
fn writeEscaped(writer: std.fs.File.Writer, input: []const u8) !void {
var j: usize = 0;
while (j < input.len) {
if (input[j] == '\\' and j + 1 < input.len) {
switch (input[j + 1]) {
'n' => {
try writer.writeByte('\n');
j += 2;
},
't' => {
try writer.writeByte('\t');
j += 2;
},
'\\' => {
try writer.writeByte('\\');
j += 2;
},
'r' => {
try writer.writeByte('\r');
j += 2;
},
'a' => {
try writer.writeByte(0x07); // bell
j += 2;
},
'b' => {
try writer.writeByte(0x08); // backspace
j += 2;
},
'0' => {
// Octal escape: \0NNN
var val: u8 = 0;
var k: usize = j + 2;
var digits: u8 = 0;
while (k < input.len and digits < 3) : (k += 1) {
if (input[k] >= '0' and input[k] <= '7') {
// Wrapping arithmetic: an input like \0777 would overflow a u8 and panic otherwise
val = val *% 8 +% (input[k] - '0');
digits += 1;
} else break;
}
try writer.writeByte(val);
j = k;
},
else => {
try writer.writeByte('\\');
j += 1;
},
}
} else {
try writer.writeByte(input[j]);
j += 1;
}
}
}
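The -n flag suppresses the trailing newline, -e enables interpretation of escape sequences, and -E disables them (the default). This mirrors the flags that bash's built-in echo and GNU coreutils echo accept, with one simplification: each flag has to be its own argument, so combined forms like -ne are printed as plain text. The escape sequence parser handles the common ones: \n, \t, \r, \\, \a (bell), \b (backspace), and \0NNN (octal).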
The octal escape is worth looking at. \0 followed by up to three octal digits specifies a byte value. So echo -e "\0101" prints A (octal 101 = decimal 65 = ASCII 'A'). The parsing is careful about counting digits -- at most three, and it stops at the first non-octal character. This kind of manual character-by-character parsing is very typical in systems programming and I think it's something Zig handles particularly cleanly because you have full control over the indexing without any iterator magic getting in the way.
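The octal branch can be isolated into a small pure function, which makes the digit counting easy to check. The sketch below uses hypothetical names (Octal, parseOctal) and wrapping arithmetic, so that an oversized value such as \0777 wraps around instead of tripping Zig's overflow check on a u8:

```zig
const std = @import("std");

const Octal = struct {
    value: u8, // the decoded byte
    next: usize, // index of the first character after the escape
};

/// Parses up to three octal digits starting at `start`, which should be the
/// index just past the "\0" prefix. Hypothetical helper, for illustration.
fn parseOctal(input: []const u8, start: usize) Octal {
    var value: u8 = 0;
    var k = start;
    while (k < input.len and k - start < 3) : (k += 1) {
        const c = input[k];
        if (c < '0' or c > '7') break;
        // *% and +% wrap modulo 256 instead of panicking on overflow.
        value = value *% 8 +% (c - '0');
    }
    return .{ .value = value, .next = k };
}
```

For the input \0101 with start at 2, the three digits 1, 0, 1 accumulate to 65, which is 'A', and next points just past the last digit.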
The command resolution order
Now we need to wire the built-in dispatch into our REPL from episode 48. The resolution order is:
- Check if the command is a built-in. If yes, execute it directly.
- Otherwise, spawn it as an external command (the existing executePipeline code).
Here's the updated REPL:
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer {
const check = gpa.deinit();
if (check == .leak) std.debug.print("WARNING: memory leak detected\n", .{});
}
const allocator = gpa.allocator();
var env = Environment.init(allocator);
defer env.deinit();
const stdin = std.io.getStdIn().reader();
const stdout = std.io.getStdOut().writer();
try stdout.print("zsh-lite> ", .{});
var line_buf: [4096]u8 = undefined;
while (stdin.readUntilDelimiter(&line_buf, '\n')) |line| {
if (line.len == 0) {
try stdout.print("zsh-lite> ", .{});
continue;
}
const tokens = tokenize(allocator, line) catch |err| {
switch (err) {
error.UnterminatedQuote => try stdout.print("Error: unterminated quote\n", .{}),
error.TrailingEscape => try stdout.print("Error: trailing backslash\n", .{}),
else => try stdout.print("Tokenize error: {}\n", .{err}),
}
try stdout.print("zsh-lite> ", .{});
continue;
};
defer {
for (tokens) |tok| {
if (tok.kind == .word) allocator.free(tok.value);
}
allocator.free(tokens);
}
if (tokens.len == 0) {
try stdout.print("zsh-lite> ", .{});
continue;
}
const pipeline = parsePipeline(allocator, tokens) catch |err| {
switch (err) {
error.EmptyCommand => try stdout.print("Error: empty command\n", .{}),
error.EmptyPipeline => try stdout.print("Error: empty pipeline\n", .{}),
error.MissingRedirectTarget => try stdout.print("Error: redirect needs a filename\n", .{}),
error.TrailingPipe => try stdout.print("Error: trailing pipe\n", .{}),
else => try stdout.print("Parse error: {}\n", .{err}),
}
try stdout.print("zsh-lite> ", .{});
continue;
};
defer pipeline.deinit(allocator);
// Check if single command and built-in
if (pipeline.commands.len == 1) {
const cmd = &pipeline.commands[0];
if (isBuiltin(cmd.program)) {
const result = executeBuiltin(
cmd.program,
cmd.args,
&env,
stdout,
) catch |err| {
switch (err) {
error.ChdirFailed => try stdout.print(
"cd: {s}: No such file or directory\n",
.{if (cmd.args.len > 0) cmd.args[0] else "~"},
),
error.InvalidArgs => try stdout.print(
"{s}: invalid arguments\n",
.{cmd.program},
),
error.GetcwdFailed => try stdout.print(
"pwd: error reading current directory\n",
.{},
),
else => try stdout.print(
"{s}: error\n",
.{cmd.program},
),
}
try stdout.print("zsh-lite> ", .{});
continue;
};
if (result == .exit_shell) break;
try stdout.print("zsh-lite> ", .{});
continue;
}
}
// External command(s) -- use existing pipeline execution
const exit_code = executePipeline(allocator, &pipeline) catch |err| {
switch (err) {
error.CommandNotFound => try stdout.print(
"{s}: command not found\n",
.{pipeline.commands[0].program},
),
error.RedirectFailed => try stdout.print(
"Error: could not open file for redirection\n",
.{},
),
else => try stdout.print("Execution error: {}\n", .{err}),
}
try stdout.print("zsh-lite> ", .{});
continue;
};
if (exit_code != 0) {
try stdout.print("(exit {d})\n", .{exit_code});
}
try stdout.print("zsh-lite> ", .{});
} else |err| {
if (err != error.EndOfStream) {
std.debug.print("Read error: {}\n", .{err});
}
}
try stdout.print("Bye!\n", .{});
}
The key change is the built-in check right after parsing. If the pipeline contains exactly one command and that command name is in our builtin_map, we handle it in-process. Otherwise we fall through to the existing executePipeline code. This follows the same basic idea real shells use -- built-ins are checked first, external programs are the fallback (real shells also consult aliases and functions along the way).
There's an important subtlety with pipes: echo hello | wc -l has echo as the first command, but it's in a pipeline. Should we use the built-in echo or spawn /usr/bin/echo? The answer is: we spawn the external one, because built-ins in a pipeline would need to write to the pipe fd instead of stdout, and our simple built-in interface doesn't support that. Real shells like bash DO run built-ins in pipelines (by threading the pipe fd through), but that adds complexity we can skip for now. Our pipeline.commands.len == 1 check handles this -- built-ins only fire for standalone commands.
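The routing rule from the last two paragraphs boils down to a single condition. Here it is as a standalone sketch. Route and route are hypothetical names, and the input is simply the list of program names in the pipeline rather than the real Pipeline struct from episode 48:

```zig
const std = @import("std");

// Key-only map: with a `void` value type, each entry is just the name.
const builtin_names = std.StaticStringMap(void).initComptime(.{
    .{"cd"},
    .{"pwd"},
    .{"export"},
    .{"unset"},
    .{"exit"},
    .{"echo"},
});

const Route = enum { builtin, external };

/// `programs` holds the program name of each pipeline stage, in order.
/// Built-ins only run in-process when they are the sole command.
fn route(programs: []const []const u8) Route {
    if (programs.len == 1 and builtin_names.has(programs[0])) return .builtin;
    return .external;
}
```

Note what this implies for something like cd /tmp | cat: it goes down the external path, where cd is either not found or runs in a child and changes nothing for the shell.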
Passing environment to child processes
There's one more piece we need: when we spawn external commands, they should see the variables we've exported. Right now our executePipeline uses the default behavior which inherits the parent's environment. But variables set via export in our shell live in our Environment struct, not in the actual process environment.
To fix this properly, we need to merge our shell variables into the environment block passed to child processes:
fn buildChildEnv(
allocator: std.mem.Allocator,
env: *const Environment,
) !std.process.EnvMap {
var child_env = std.process.EnvMap.init(allocator);
errdefer child_env.deinit();
// Start with inherited environment
var os_env = try std.process.getEnvMap(allocator);
defer os_env.deinit();
var os_iter = os_env.iterator();
while (os_iter.next()) |entry| {
try child_env.put(entry.key_ptr.*, entry.value_ptr.*);
}
// Override with our shell-local variables
var shell_iter = env.vars.iterator();
while (shell_iter.next()) |entry| {
try child_env.put(entry.key_ptr.*, entry.value_ptr.*);
}
return child_env;
}
This merges the inherited OS environment with our shell-local variables. Shell-local variables take priority (they're inserted second, overwriting any duplicates). The resulting EnvMap can be passed to std.process.Child via the .env_map field.
This is the kind of plumbing that's invisible when you use a shell day-to-day, but it's what makes export PATH=$PATH:/my/new/dir actually work. The shell stores the modified PATH, and every time it spawns a child, it passes the updated environment. Without this, the child would see the original, unmodified PATH.
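To see the .env_map field in action, here is a small POSIX-only sketch that spawns sh with one extra variable. The function name runWithGreeting is invented for the example, and it assumes a sh binary can be found on PATH. It takes a shortcut compared to buildChildEnv: std.process.getEnvMap already returns a map we own, so the override is put straight into it.

```zig
const std = @import("std");

/// Runs `sh -c script` with GREETING=hello added to the child's environment
/// only. Returns the child's exit code, or 255 if it did not exit normally.
/// Hypothetical helper, for illustration only.
fn runWithGreeting(allocator: std.mem.Allocator, script: []const u8) !u8 {
    // Start from the inherited environment, then layer our variable on top.
    var env_map = try std.process.getEnvMap(allocator);
    defer env_map.deinit();
    try env_map.put("GREETING", "hello");

    const argv = [_][]const u8{ "sh", "-c", script };
    var child = std.process.Child.init(&argv, allocator);
    child.env_map = &env_map; // the child sees this map instead of ours

    const term = try child.spawnAndWait();
    return switch (term) {
        .Exited => |code| code,
        else => 255,
    };
}
```

Calling it with a script such as test "$GREETING" = hello lets the child confirm that it sees the variable, while the shell process itself never had GREETING in its own OS environment.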
Testing the built-ins
Let's write tests for our built-in commands. Built-ins are easier to test than external commands because there's no process spawning involved -- everything happens in-process.
test "cd changes directory" {
const allocator = std.testing.allocator;
var env = Environment.init(allocator);
defer env.deinit();
// Remember the current directory and restore it when the test ends,
// even if one of the checks below fails
const original = try std.fs.cwd().realpathAlloc(allocator, ".");
defer allocator.free(original);
defer posix.chdir(original) catch {};
// cd to /tmp
const args = [_][]const u8{"/tmp"};
const result = try builtinCd(&args, &env, std.io.getStdOut().writer());
try std.testing.expect(result == .ok);
// Verify we're in /tmp (note: on macOS, /tmp resolves to /private/tmp)
const after = try std.fs.cwd().realpathAlloc(allocator, ".");
defer allocator.free(after);
try std.testing.expectEqualStrings("/tmp", after);
}
test "cd with no args goes to HOME" {
const allocator = std.testing.allocator;
var env = Environment.init(allocator);
defer env.deinit();
try env.setVar("HOME", "/tmp");
// Remember the current directory and restore it when the test ends
const original = try std.fs.cwd().realpathAlloc(allocator, ".");
defer allocator.free(original);
defer posix.chdir(original) catch {};
const args = [_][]const u8{};
const result = try builtinCd(&args, &env, std.io.getStdOut().writer());
try std.testing.expect(result == .ok);
const after = try std.fs.cwd().realpathAlloc(allocator, ".");
defer allocator.free(after);
try std.testing.expectEqualStrings("/tmp", after);
}
test "export and getVar" {
const allocator = std.testing.allocator;
var env = Environment.init(allocator);
defer env.deinit();
try env.setVar("MY_VAR", "hello");
const val = env.getVar("MY_VAR");
try std.testing.expect(val != null);
try std.testing.expectEqualStrings("hello", val.?);
}
test "unset removes variable" {
const allocator = std.testing.allocator;
var env = Environment.init(allocator);
defer env.deinit();
try env.setVar("TO_DELETE", "temp");
try std.testing.expect(env.getVar("TO_DELETE") != null);
env.removeVar("TO_DELETE");
// After removal, should fall through to OS env (which probably doesn't have it)
const from_os = std.posix.getenv("TO_DELETE");
try std.testing.expect(env.vars.get("TO_DELETE") == null);
// OS-level check is separate, we just verify it's gone from our map
_ = from_os;
}
test "exit returns exit_shell" {
const allocator = std.testing.allocator;
var env = Environment.init(allocator);
defer env.deinit();
const args = [_][]const u8{};
const result = try builtinExit(&args, &env, std.io.getStdOut().writer());
try std.testing.expect(result == .exit_shell);
}
test "isBuiltin identifies built-in commands" {
try std.testing.expect(isBuiltin("cd"));
try std.testing.expect(isBuiltin("pwd"));
try std.testing.expect(isBuiltin("export"));
try std.testing.expect(isBuiltin("unset"));
try std.testing.expect(isBuiltin("exit"));
try std.testing.expect(isBuiltin("echo"));
try std.testing.expect(!isBuiltin("ls"));
try std.testing.expect(!isBuiltin("grep"));
try std.testing.expect(!isBuiltin("cat"));
}
test "isValidVarName rejects invalid names" {
try std.testing.expect(isValidVarName("FOO"));
try std.testing.expect(isValidVarName("_bar"));
try std.testing.expect(isValidVarName("my_var_2"));
try std.testing.expect(!isValidVarName("123abc"));
try std.testing.expect(!isValidVarName(""));
try std.testing.expect(!isValidVarName("foo-bar"));
try std.testing.expect(!isValidVarName("foo.bar"));
}
test "export with invalid name" {
const allocator = std.testing.allocator;
var env = Environment.init(allocator);
defer env.deinit();
// Should not crash, just print error
const args = [_][]const u8{"123=bad"};
const result = try builtinExport(&args, &env, std.io.getStdOut().writer());
try std.testing.expect(result == .ok);
// Variable should NOT have been set
try std.testing.expect(env.getVar("123") == null);
}
Notice the cd tests save and restore the original directory. This is important for test hygiene -- if a test changes the working directory and crashes, subsequent tests would run in the wrong directory. The defer block with posix.chdir ensures cleanup even on test failure.
The isBuiltin test is a simple sanity check that our StaticStringMap is set up correctly. It costs nothing and catches typos in the dispatch table -- imagine if you typed "cdd" instead of "cd" in the map. Without this test, cd would quietly fall through to the external-command path, where it either fails with "cd: command not found" or runs in a child process and changes nothing.
A session with our upgraded shell
Let's see everything working together:
zsh-lite> pwd
/home/user/projects/zsh-lite
zsh-lite> cd /tmp
zsh-lite> pwd
/tmp
zsh-lite> cd ~
zsh-lite> pwd
/home/user
zsh-lite> export GREETING=hello
zsh-lite> export
export GREETING="hello"
zsh-lite> echo $GREETING
$GREETING
zsh-lite> ls | head -3
Desktop
Documents
Downloads
zsh-lite> cd /nonexistent
cd: /nonexistent: No such file or directory
zsh-lite> echo -e "line1\nline2\nline3"
line1
line2
line3
zsh-lite> echo -n "no newline"
no newlinezsh-lite> exit
Bye!
One thing you'll notice: echo $GREETING prints the literal string $GREETING instead of hello. That's because we haven't implemented variable expansion yet -- the tokenizer from episode 47 treats $GREETING as a plain word. Variable expansion is a layer between parsing and execution that substitutes $NAME with the variable's value. We mentioned this as a natural extension at the end of last episode, and now you can see exactly where it would slot in: after tokenization, before command dispatch.
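As a rough idea of what that layer could look like, here is a deliberately small sketch. expandVars is a hypothetical helper, not part of the episode's code. It only handles the bare $NAME form, ignores quoting and ${NAME}, expands unknown variables to the empty string, and writes into a caller-provided buffer:

```zig
const std = @import("std");

/// Expands `$NAME` references in `word` using `vars`, writing into `out`.
/// Returns the filled part of `out`, or error.NoSpaceLeft if it is too small.
fn expandVars(
    out: []u8,
    word: []const u8,
    vars: *const std.StringHashMap([]const u8),
) ![]const u8 {
    var len: usize = 0;
    var i: usize = 0;
    while (i < word.len) {
        if (word[i] == '$') {
            // Collect the longest run of name characters after '$'.
            var end = i + 1;
            while (end < word.len and
                (std.ascii.isAlphanumeric(word[end]) or word[end] == '_')) end += 1;
            if (end > i + 1) {
                const value = vars.get(word[i + 1 .. end]) orelse "";
                if (len + value.len > out.len) return error.NoSpaceLeft;
                @memcpy(out[len..][0..value.len], value);
                len += value.len;
                i = end;
                continue;
            }
        }
        // Ordinary character (or a lone '$'): copy it through unchanged.
        if (len >= out.len) return error.NoSpaceLeft;
        out[len] = word[i];
        len += 1;
        i += 1;
    }
    return out[0..len];
}
```

In the REPL this would run over each word token after tokenize and before the isBuiltin check, with a lookup that goes through Environment.getVar so inherited variables expand too.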
The project so far
Three episodes in, our shell now handles:
- Parsing: tokenization, quoting, escaping, pipes, redirections (ep47)
- Process spawning: fork/exec, pipe plumbing, I/O redirection, PATH resolution (ep48)
- Built-in commands: cd, pwd, export, unset, exit, echo -- with a dispatch table for clean routing (this episode)
- Environment management: shell-local variables, inherited OS env, merge for child processes (this episode)
What's still missing for the final episode: job control and signals. Right now if you press Ctrl+C while a command is running, our entire shell dies. A real shell catches SIGINT and only kills the foreground process. Background jobs (command &), job listing (jobs), foreground/background switching (fg, bg), and signal handling with sigaction -- that's the last piece of the puzzle that turns our tool into something you could actually use as your daily driver shell. Well, almost ;-)
What we learned
- Why certain commands MUST be built-in: cd, export, and exit need to modify the shell's own process state, which child processes cannot do
- Using std.StaticStringMap as a compile-time dispatch table for routing command names to handler functions with zero runtime allocation cost
- The Environment struct pattern: wrapping a StringHashMap with owned keys/values, layered on top of std.posix.getenv for inherited variables
- posix.chdir for changing the working directory, including null-termination requirements for the POSIX syscall interface
- Tilde expansion (~/path to /home/user/path) as a shell responsibility -- the operating system doesn't understand ~
- Variable name validation following POSIX rules (letter or underscore first, then alphanumerics and underscores)
- The echo flag parsing pattern: -n for no trailing newline, -e/-E for escape interpretation toggle
- Octal escape sequences (\0NNN) in echo -- manual character-by-character parsing with digit counting
- The built-in vs pipeline distinction: standalone built-ins run in-process, but built-ins inside pipelines need special handling (or fall through to external commands)
- Merging shell-local environment variables with the inherited OS environment for child process spawning
Cheers!