Learn Zig Series (#50) - Build a Shell: Job Control and Signals
Project D: Build Your Own Shell (4/4)
What will I learn
- You will learn signal handling: installing signal handlers with sigaction so Ctrl+C doesn't kill the shell;
- You will learn process groups: why every pipeline needs its own group and how setpgid works;
- You will learn terminal control: tcsetpgrp and which group owns the terminal at any given moment;
- You will learn background processes: parsing the & suffix and launching jobs without waiting;
- You will learn a job table: tracking background jobs by ID, PID, status, and command string;
- You will learn how to implement the jobs, fg, and bg built-in commands;
- You will learn reaping children: using waitpid with WNOHANG to detect completed background jobs;
- You will learn the ~/.zigshrc config file: reading and executing startup commands.
Requirements
- A working modern computer running macOS or Linux (on Windows, use WSL: this episode relies on POSIX signals and process groups);
- An installed Zig 0.14+ distribution (download from ziglang.org);
- The ambition to learn Zig programming.
Difficulty
- Advanced
Curriculum (of the Learn Zig Series):
- Zig Programming Tutorial - ep001 - Intro
- Learn Zig Series (#2) - Hello Zig, Variables and Types
- Learn Zig Series (#3) - Functions and Control Flow
- Learn Zig Series (#4) - Error Handling (Zig's Best Feature)
- Learn Zig Series (#5) - Arrays, Slices, and Strings
- Learn Zig Series (#6) - Structs, Enums, and Tagged Unions
- Learn Zig Series (#7) - Memory Management and Allocators
- Learn Zig Series (#8) - Pointers and Memory Layout
- Learn Zig Series (#9) - Comptime (Zig's Superpower)
- Learn Zig Series (#10) - Project Structure, Modules, and File I/O
- Learn Zig Series (#11) - Mini Project: Building a Step Sequencer
- Learn Zig Series (#12) - Testing and Test-Driven Development
- Learn Zig Series (#13) - Interfaces via Type Erasure
- Learn Zig Series (#14) - Generics with Comptime Parameters
- Learn Zig Series (#15) - The Build System (build.zig)
- Learn Zig Series (#16) - Sentinel-Terminated Types and C Strings
- Learn Zig Series (#17) - Packed Structs and Bit Manipulation
- Learn Zig Series (#18) - Async Concepts and Event Loops
- Learn Zig Series (#18b) - Addendum: Async Returns in Zig 0.16
- Learn Zig Series (#19) - SIMD with @Vector
- Learn Zig Series (#20) - Working with JSON
- Learn Zig Series (#21) - Networking and TCP Sockets
- Learn Zig Series (#22) - Hash Maps and Data Structures
- Learn Zig Series (#23) - Iterators and Lazy Evaluation
- Learn Zig Series (#24) - Logging, Formatting, and Debug Output
- Learn Zig Series (#25) - Mini Project: HTTP Status Checker
- Learn Zig Series (#26) - Writing a Custom Allocator
- Learn Zig Series (#27) - C Interop: Calling C from Zig
- Learn Zig Series (#28) - C Interop: Exposing Zig to C
- Learn Zig Series (#29) - Inline Assembly and Low-Level Control
- Learn Zig Series (#30) - Thread Safety and Atomics
- Learn Zig Series (#31) - Memory-Mapped I/O and Files
- Learn Zig Series (#32) - Compile-Time Reflection with @typeInfo
- Learn Zig Series (#33) - Building a State Machine with Tagged Unions
- Learn Zig Series (#34) - Performance Profiling and Optimization
- Learn Zig Series (#35) - Cross-Compilation and Target Triples
- Learn Zig Series (#36) - Mini Project: CLI Task Runner
- Learn Zig Series (#37) - Markdown to HTML: Tokenizer and Lexer
- Learn Zig Series (#38) - Markdown to HTML: Parser and AST
- Learn Zig Series (#39) - Markdown to HTML: Renderer and CLI
- Learn Zig Series (#40) - Key-Value Store: In-Memory Store
- Learn Zig Series (#41) - Key-Value Store: Write-Ahead Log
- Learn Zig Series (#42) - Key-Value Store: TCP Server
- Learn Zig Series (#43) - Key-Value Store: Client Library and Benchmarks
- Learn Zig Series (#44) - Image Tool: Reading and Writing PPM/BMP
- Learn Zig Series (#45) - Image Tool: Pixel Operations
- Learn Zig Series (#46) - Image Tool: CLI Pipeline
- Learn Zig Series (#47) - Build a Shell: Parsing Commands
- Learn Zig Series (#48) - Build a Shell: Process Spawning
- Learn Zig Series (#49) - Build a Shell: Built-in Commands
- Learn Zig Series (#50) - Build a Shell: Job Control and Signals (this post)
This is it -- the final episode of our shell project. Over the last three episodes we built a parser (episode 47), a process spawner with pipes and redirections (episode 48), and a set of built-in commands (episode 49). Our shell can parse complex command lines, run programs, pipe data between them, and handle cd/export/exit. Pretty decent.
But there's one massive problem: press Ctrl+C while a command is running, and our entire shell dies. That's because SIGINT gets delivered to the entire foreground process group -- and right now our shell and all its children are in the same group. A real shell catches Ctrl+C, kills only the foreground job, and keeps running. That's job control, and it's what separates a toy shell from something you could actually use.
This episode covers the final pieces: signal handling, process groups, terminal ownership, background jobs with &, the jobs/fg/bg built-ins, and a startup config file. We're also going to step back and look at what real shells do that we intentionally skipped. Here we go!
Signal handling: surviving Ctrl+C
The first thing we need is to prevent SIGINT from killing our shell. On Unix, signals are delivered to processes based on their process group membership. When you press Ctrl+C, the terminal sends SIGINT to every process in the foreground process group. If the shell and its children share a group (the default), everyone dies together.
The fix has two parts: (1) put child processes in their own process group, and (2) install a signal handler in the shell that ignores SIGINT (or handles it gracefully).
Let's start with the signal handler. Zig exposes the raw POSIX sigaction interface through std.posix:
const std = @import("std");
const posix = std.posix;
const linux = std.os.linux;
const SignalHandler = struct {
original_sigint: posix.Sigaction,
original_sigtstp: posix.Sigaction,
original_sigchld: posix.Sigaction,
fn install() SignalHandler {
var self: SignalHandler = undefined;
// Ignore SIGINT in the shell process -- children get their own group
var sa_int: posix.Sigaction = .{
.handler = .{ .handler = SignalHandler.handleSigint },
.mask = posix.empty_sigset,
.flags = 0,
};
posix.sigaction(posix.SIG.INT, &sa_int, &self.original_sigint);
// Ignore SIGTSTP (Ctrl+Z) in the shell itself
var sa_tstp: posix.Sigaction = .{
.handler = .{ .handler = posix.SIG.IGN },
.mask = posix.empty_sigset,
.flags = 0,
};
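posix.sigaction(posix.SIG.TSTP, &sa_tstp, &self.original_sigtstp);
// Also ignore SIGTTOU: calling tcsetpgrp from a background group would
// otherwise stop the shell when it takes the terminal back
posix.sigaction(posix.SIG.TTOU, &sa_tstp, null);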
// Handle SIGCHLD to reap background children
var sa_chld: posix.Sigaction = .{
.handler = .{ .handler = SignalHandler.handleSigchld },
.mask = posix.empty_sigset,
.flags = posix.SA.NOCLDSTOP, // posix.SA also resolves on macOS, unlike std.os.linux
};
posix.sigaction(posix.SIG.CHLD, &sa_chld, &self.original_sigchld);
return self;
}
fn restore(self: *const SignalHandler) void {
posix.sigaction(posix.SIG.INT, &self.original_sigint, null);
posix.sigaction(posix.SIG.TSTP, &self.original_sigtstp, null);
posix.sigaction(posix.SIG.CHLD, &self.original_sigchld, null);
}
fn handleSigint(_: c_int) callconv(.C) void {
// Do nothing -- the shell survives. The foreground child
// (in its own process group) will still receive SIGINT
// from the terminal.
}
fn handleSigchld(_: c_int) callconv(.C) void {
// Reap any finished background children.
// We can't do complex work in a signal handler, so we just
// call waitpid in a loop with WNOHANG to collect zombies.
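// libc's waitpid is called directly (libc is already linked for
// std.c.environ): std.posix.waitpid takes its flags as a u32 and treats
// ECHILD, the "no children left" result, as unreachable.
var status: c_int = 0;
while (std.c.waitpid(-1, &status, posix.W.NOHANG) > 0) {}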
}
};
Three signals, three handlers:
SIGINT (Ctrl+C): Our handler does nothing. The shell ignores it. The foreground child process will still receive SIGINT because it's in a different process group that the terminal targets directly. We'll set that up in the next section.
SIGTSTP (Ctrl+Z): Ignored in the shell, so Ctrl+Z can never suspend the shell itself. The foreground job keeps the default disposition and gets stopped; the shell learns about the stop through waitpid and takes the terminal back, which we wire up in the terminal control section.
SIGCHLD: Delivered when a child process changes state (exits, gets stopped, gets continued). We use this to reap zombie processes. The SA.NOCLDSTOP flag means we only get notified on termination, not on stop/continue -- keeps things simpler.
The handleSigchld function calls waitpid(-1, WNOHANG) in a loop. The -1 means "any child process", and WNOHANG means "don't block if no child has exited". Each successful waitpid collects one zombie. We loop until there are no more to collect. This is a common pattern -- you need the loop because multiple children might have exited between signal deliveries (standard signals are not queued: SIGCHLDs that arrive while one is already pending collapse into a single delivery).
One critical thing about signal handlers in Zig (and C): you can only call async-signal-safe functions inside them. That means no allocation, no std.debug.print, no mutex locking. The waitpid syscall is safe, which is why we do the reaping right in the handler. For anything more complex (like updating our job table), we'd use a self-pipe trick or just check in the main loop. We covered the threading and atomics side of this in episode 30.
Process groups: isolating children from the shell
A process group is a collection of processes that can be signaled together. Every process belongs to exactly one process group, identified by a PGID (process group ID). When the terminal driver sends SIGINT (Ctrl+C) or SIGTSTP (Ctrl+Z), it sends them to all processes in the foreground process group of the terminal.
The key insight: if we put each pipeline's processes in their own group, signals from the keyboard will target ONLY that pipeline, leaving our shell untouched.
Here's how to set up process groups when spawning children:
const JobId = u16;
const ProcessGroup = struct {
pgid: posix.pid_t,
fn create(first_child_pid: posix.pid_t) ProcessGroup {
// The first child's PID becomes the PGID for the group
return .{ .pgid = first_child_pid };
}
fn addProcess(self: *const ProcessGroup, pid: posix.pid_t) void {
// Move this process into our group
// We call this from the parent after fork, AND the child calls
// it on itself -- belt-and-suspenders to avoid race conditions
posix.setpgid(pid, self.pgid) catch {};
}
};
fn spawnInGroup(
allocator: std.mem.Allocator,
argv: []const []const u8,
group: ?*ProcessGroup,
stdin_fd: ?posix.fd_t,
stdout_fd: ?posix.fd_t,
) !posix.pid_t {
const pid = try posix.fork();
if (pid == 0) {
// Child process
// Set our process group
if (group) |g| {
posix.setpgid(0, g.pgid) catch {};
} else {
// First process in pipeline -- become our own group leader
posix.setpgid(0, 0) catch {};
}
// Wire up stdin/stdout
if (stdin_fd) |fd| {
posix.dup2(fd, posix.STDIN_FILENO) catch std.process.exit(126);
posix.close(fd);
}
if (stdout_fd) |fd| {
posix.dup2(fd, posix.STDOUT_FILENO) catch std.process.exit(126);
posix.close(fd);
}
// Reset signal handlers to default for the child
var sa_default: posix.Sigaction = .{
.handler = .{ .handler = posix.SIG.DFL },
.mask = posix.empty_sigset,
.flags = 0,
};
posix.sigaction(posix.SIG.INT, &sa_default, null);
posix.sigaction(posix.SIG.TSTP, &sa_default, null);
posix.sigaction(posix.SIG.TTOU, &sa_default, null); // in case the shell ignores it
// Exec the program
const argv_z = allocator.allocSentinel(?[*:0]const u8, argv.len, null) catch std.process.exit(126);
for (argv, 0..) |arg, i| {
argv_z[i] = (allocator.dupeZ(u8, arg) catch std.process.exit(126)).ptr;
}
// execvpeZ expects a many-item pointer, so pass the slice's .ptr
const err = posix.execvpeZ(argv_z[0].?, argv_z.ptr, std.c.environ);
_ = err;
std.process.exit(127); // exec failed
}
// Parent process
if (group) |g| {
g.addProcess(pid);
} else {
// First child -- create the group with this PID
posix.setpgid(pid, pid) catch {};
}
return pid;
}
The double setpgid call (once in the child, once in the parent) is intentional. There's a race condition: the parent might try to set the child's group before the child has started, or the child might run execve before the parent gets to setpgid. By doing it in both places, whoever runs first succeeds and the second call is a harmless no-op. This double-call pattern is standard in shell implementations -- you'll find it in bash source code too.
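Notice that the child resets SIGINT and SIGTSTP to SIG.DFL (default behavior) before exec. Signal dispositions are inherited across fork, and an ignored signal stays ignored across exec as well (a signal with a handler installed is reset to the default by exec, because the handler's code no longer exists in the new program). Our shell ignores SIGTSTP, so without the reset the child could never be suspended with Ctrl+Z. Resetting SIGINT too is cheap insurance in case the shell ever switches from a do-nothing handler to SIG.IGN. The child should die on SIGINT -- that's how the user cancels commands.
The execvpeZ call at the end does PATH resolution and replaces the child's memory with the target program. If it fails (program not found, permission denied), we exit with code 127 -- the convention for "command not found" that shells use. (Strictly, shells report "found but not executable" as 126; we fold every exec failure into 127 to keep the code short.) We can't return an error here because after fork() we're in a completely separate process.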
Terminal control: who owns the TTY?
Process groups alone aren't enough. The terminal has a concept of its foreground process group -- the group that receives keyboard-generated signals AND can read from the terminal. Only one group can be the foreground group at a time. All other groups are "background": they get SIGTTIN if they try to read from the terminal, and SIGTTOU if they try to write to it while the terminal's tostop flag is set.
When we spawn a foreground job, we need to:
- Put it in its own process group (done above)
- Give that group ownership of the terminal (tcsetpgrp)
- Wait for the job to finish
- Take back terminal ownership
const Terminal = struct {
tty_fd: posix.fd_t,
shell_pgid: posix.pid_t,
fn init() Terminal {
const tty_fd = posix.open("/dev/tty", .{ .ACCMODE = .RDWR }, 0) catch
std.io.getStdIn().handle; // Fallback to stdin
const shell_pgid = posix.getpgrp();
return .{
.tty_fd = tty_fd,
.shell_pgid = shell_pgid,
};
}
fn giveToGroup(self: *const Terminal, pgid: posix.pid_t) void {
// Make this process group the foreground group of our terminal
posix.tcsetpgrp(self.tty_fd, pgid) catch {};
}
fn takeBack(self: *const Terminal) void {
// Shell takes back terminal ownership
posix.tcsetpgrp(self.tty_fd, self.shell_pgid) catch {};
}
};
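The tcsetpgrp call is the key. Before we wait for a foreground child, we give the terminal to the child's group. After the child finishes (or is stopped with Ctrl+Z), we take it back. If we forget to take it back, the shell won't be able to read input and the user gets a frozen terminal. One caveat: tcsetpgrp called from a background group raises SIGTTOU in the caller unless that signal is ignored or blocked, so the shell has to ignore SIGTTOU for takeBack to work reliably.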
Here's the updated flow for running a foreground command:
fn runForegroundJob(
self: *Shell,
pipeline: *const Pipeline,
) !u8 {
const pids = try self.spawnPipeline(pipeline, false);
defer self.allocator.free(pids);
if (pids.len == 0) return 0;
// The first child's PID is the process group ID
const pgid = pids[0];
// Give terminal to the child group
self.terminal.giveToGroup(pgid);
// Wait for all processes in the pipeline
var last_status: u8 = 0;
for (pids) |pid| {
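// Wait with libc's waitpid so we can handle ECHILD ourselves: the SIGCHLD
// handler may have reaped this child already, and std.posix.waitpid treats
// that errno as unreachable. WUNTRACED also reports Ctrl+Z stops.
var raw: c_int = 0;
var rc = std.c.waitpid(pid, &raw, posix.W.UNTRACED);
while (rc < 0 and posix.errno(rc) == .INTR) {
rc = std.c.waitpid(pid, &raw, posix.W.UNTRACED);
}
if (rc < 0) continue; // already reaped by the handler
const status: u32 = @bitCast(raw);
if (posix.W.IFEXITED(status)) {
last_status = posix.W.EXITSTATUS(status);
} else if (posix.W.IFSIGNALED(status)) {
const sig = posix.W.TERMSIG(status);
last_status = 128 + @as(u8, @truncate(sig));
const stdout = std.io.getStdOut().writer();
if (sig != posix.SIG.INT) {
// Print signal info for non-SIGINT signals
stdout.print("\n[Terminated by signal {d}]\n", .{sig}) catch {};
} else {
// Print newline after ^C
stdout.print("\n", .{}) catch {};
}
} else if (posix.W.IFSTOPPED(status)) {
// Job was stopped (Ctrl+Z) -- add to job table
_ = self.job_table.addStopped(pgid, pipeline) catch 0;
last_status = 148; // 128 + SIGTSTP(20)
} else {
last_status = 1;
}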
}
// Take back the terminal
self.terminal.takeBack();
return last_status;
}
The .stopped case is where Ctrl+Z kicks in. When a child receives SIGTSTP, it stops (suspends), and waitpid returns with a "stopped" status. At that point we add the job to our job table as a stopped background job and return control to the shell. The user can later resume it with fg or bg.
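The 128 + signal convention used above can be isolated into a small pure function. The following is a minimal sketch, not the shell's actual code: the Outcome union and shellExitCode are hypothetical names standing in for the decoded wait status, and the signal numbers in the example (2 for SIGINT, 20 for SIGTSTP) are the Linux values.

```zig
const std = @import("std");

/// How a child ended, reduced to the three cases the shell cares about.
/// Hypothetical stand-in for the decoded wait status.
const Outcome = union(enum) {
    exited: u8, // normal exit with this code
    signaled: u8, // killed by this signal number
    stopped: u8, // suspended by this signal number
};

/// Map an outcome to the value the shell would store in last_exit_code.
fn shellExitCode(outcome: Outcome) u8 {
    return switch (outcome) {
        .exited => |code| code,
        // 128 + signal number; saturating add so the sum cannot overflow u8
        .signaled, .stopped => |sig| 128 +| sig,
    };
}

pub fn main() void {
    std.debug.print("Ctrl+C -> {d}\n", .{shellExitCode(.{ .signaled = 2 })});
    std.debug.print("Ctrl+Z -> {d}\n", .{shellExitCode(.{ .stopped = 20 })});
}
```

Keeping the mapping separate from the waitpid call makes it testable without spawning a process.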
Background processes: parsing & and launching jobs
A command ending in & should run in the background. The shell doesn't wait for it -- it just prints a job ID and prompts for the next command. We need to detect the & in our parser and adjust behavior accordingly.
First, the parser change. We extend our Pipeline struct from episode 47:
const Pipeline = struct {
commands: []Command,
background: bool, // true if line ends with &
fn deinit(self: *const Pipeline, allocator: std.mem.Allocator) void {
for (self.commands) |cmd| cmd.deinit(allocator);
allocator.free(self.commands);
}
};
fn parsePipeline(allocator: std.mem.Allocator, tokens: []const Token) !Pipeline {
var background = false;
// Check if last token is '&'
var effective_tokens = tokens;
if (tokens.len > 0 and tokens[tokens.len - 1].kind == .ampersand) {
background = true;
effective_tokens = tokens[0 .. tokens.len - 1];
}
// ... rest of existing parser logic using effective_tokens ...
const commands = try parseCommands(allocator, effective_tokens);
return .{
.commands = commands,
.background = background,
};
}
Simple -- if the last token is &, strip it off and set the background flag. The rest of the parser doesn't need to change at all.
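To try the trailing-& rule in isolation, here is a small sketch that works on plain string tokens instead of the Token type from episode 47. Split and splitBackground are hypothetical names used only for this illustration.

```zig
const std = @import("std");

/// Result of looking for a trailing "&" (hypothetical helper type).
const Split = struct {
    tokens: []const []const u8,
    background: bool,
};

/// Strip a trailing "&" token and report whether one was present.
fn splitBackground(tokens: []const []const u8) Split {
    if (tokens.len > 0 and std.mem.eql(u8, tokens[tokens.len - 1], "&")) {
        return .{ .tokens = tokens[0 .. tokens.len - 1], .background = true };
    }
    return .{ .tokens = tokens, .background = false };
}

pub fn main() void {
    const line = [_][]const u8{ "sleep", "30", "&" };
    const split = splitBackground(&line);
    const mode: []const u8 = if (split.background) "background" else "foreground";
    std.debug.print("{d} tokens, {s}\n", .{ split.tokens.len, mode });
}
```

Only a final & is recognized here. An & in the middle of a line (as in a & b) separates two jobs in a full shell, which neither this sketch nor our parser handles.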
Now the execution side. Background jobs don't get the terminal and we don't wait for them:
fn runBackgroundJob(
self: *Shell,
pipeline: *const Pipeline,
) !void {
const pids = try self.spawnPipeline(pipeline, true);
defer self.allocator.free(pids);
if (pids.len == 0) return;
const pgid = pids[0];
const job_id = try self.job_table.addRunning(pgid, pids, pipeline);
const stdout = std.io.getStdOut().writer();
stdout.print("[{d}] {d}\n", .{ job_id, pgid }) catch {};
}
That's it. We spawn the pipeline (without giving it the terminal), add it to the job table, print the [1] 12345 style notification that every Unix user recognizes, and return immediately to the prompt. The job runs asynchronously and we'll hear about it when it finishes (via our SIGCHLD handler or periodic checks).
The job table: tracking background work
The job table is the data structure that tracks all background jobs. Each entry has a job ID (the [1], [2] numbers users see), the process group ID, individual PIDs, the current status, and a string representation of the command for display:
const JobStatus = enum {
running,
stopped,
done,
};
const Job = struct {
id: JobId,
pgid: posix.pid_t,
pids: []posix.pid_t,
status: JobStatus,
command_str: []const u8,
};
const JobTable = struct {
jobs: std.ArrayList(Job),
next_id: JobId,
allocator: std.mem.Allocator,
fn init(allocator: std.mem.Allocator) JobTable {
return .{
.jobs = std.ArrayList(Job).init(allocator),
.next_id = 1,
.allocator = allocator,
};
}
fn deinit(self: *JobTable) void {
for (self.jobs.items) |job| {
self.allocator.free(job.pids);
self.allocator.free(job.command_str);
}
self.jobs.deinit();
}
fn addRunning(
self: *JobTable,
pgid: posix.pid_t,
pids: []const posix.pid_t,
pipeline: *const Pipeline,
) !JobId {
const id = self.next_id;
self.next_id += 1;
const owned_pids = try self.allocator.dupe(posix.pid_t, pids);
errdefer self.allocator.free(owned_pids);
const cmd_str = try pipelineToString(self.allocator, pipeline);
errdefer self.allocator.free(cmd_str);
try self.jobs.append(.{
.id = id,
.pgid = pgid,
.pids = owned_pids,
.status = .running,
.command_str = cmd_str,
});
return id;
}
fn addStopped(
self: *JobTable,
pgid: posix.pid_t,
pipeline: *const Pipeline,
) !JobId {
const id = self.next_id;
self.next_id += 1;
const pids = try self.allocator.alloc(posix.pid_t, 1);
errdefer self.allocator.free(pids); // don't leak if a later step fails
pids[0] = pgid; // Simplified -- just track the group leader
const cmd_str = try pipelineToString(self.allocator, pipeline);
errdefer self.allocator.free(cmd_str);
try self.jobs.append(.{
.id = id,
.pgid = pgid,
.pids = pids,
.status = .stopped,
.command_str = cmd_str,
});
return id;
}
fn findById(self: *JobTable, id: JobId) ?*Job {
for (self.jobs.items) |*job| {
if (job.id == id) return job;
}
return null;
}
fn findMostRecent(self: *JobTable) ?*Job {
if (self.jobs.items.len == 0) return null;
// Return the last added job (most recent)
return &self.jobs.items[self.jobs.items.len - 1];
}
fn removeCompleted(self: *JobTable) void {
var i: usize = 0;
while (i < self.jobs.items.len) {
if (self.jobs.items[i].status == .done) {
const job = self.jobs.orderedRemove(i);
self.allocator.free(job.pids);
self.allocator.free(job.command_str);
} else {
i += 1;
}
}
}
};
fn pipelineToString(allocator: std.mem.Allocator, pipeline: *const Pipeline) ![]const u8 {
var result = std.ArrayList(u8).init(allocator);
errdefer result.deinit();
for (pipeline.commands, 0..) |cmd, i| {
if (i > 0) try result.appendSlice(" | ");
try result.appendSlice(cmd.program);
for (cmd.args) |arg| {
try result.append(' ');
try result.appendSlice(arg);
}
}
if (pipeline.background) try result.appendSlice(" &");
return try result.toOwnedSlice();
}
The pipelineToString function reconstructs a human-readable command string from the parsed pipeline. This is what gets displayed in jobs output. We store a copy because the original tokens will be freed after the command line is processed.
The removeCompleted function cleans up finished jobs from the table. We call it periodically (just before printing the prompt is a good place) so the table doesn't grow without bound.
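Calling orderedRemove inside a loop shifts the tail of the list once per removed job. That is fine for a handful of jobs. If the table ever grew large, a single-pass compaction would be an alternative. The sketch below shows that swapped-in technique on a bare slice of statuses; Status and compactActive are hypothetical names, and a real version would also free each removed job's pids and command_str.

```zig
const std = @import("std");

const Status = enum { running, stopped, done };

/// Move every entry that is not `.done` to the front, preserving order,
/// and return how many entries were kept.
fn compactActive(statuses: []Status) usize {
    var kept: usize = 0;
    var i: usize = 0;
    while (i < statuses.len) : (i += 1) {
        const s = statuses[i];
        if (s != .done) {
            // kept never exceeds i, so this write cannot clobber unread data
            statuses[kept] = s;
            kept += 1;
        }
    }
    return kept;
}

pub fn main() void {
    var table = [_]Status{ .done, .running, .done, .stopped };
    const kept = compactActive(&table);
    std.debug.print("{d} jobs still active\n", .{kept});
}
```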
Reaping background jobs: checking who's done
Between prompts, we need to check if any background jobs have finished. The SIGCHLD handler does basic reaping (preventing zombies), but we also need to update our job table. Here's a function to call before each prompt:
fn checkBackgroundJobs(self: *Shell) void {
const stdout = std.io.getStdOut().writer();
for (self.job_table.jobs.items) |*job| {
if (job.status == .done) continue;
// Non-blocking wait on the group leader. We use libc's waitpid so that
// ECHILD (the SIGCHLD handler got there first) can be handled here;
// std.posix.waitpid treats that errno as unreachable.
var raw: c_int = 0;
const rc = std.c.waitpid(job.pgid, &raw, posix.W.NOHANG);
if (rc != 0) {
// rc > 0: reaped just now; rc < 0: already reaped by the handler
job.status = .done;
stdout.print("[{d}] Done {s}\n", .{
job.id,
job.command_str,
}) catch {};
}
// rc == 0: still running, do nothing
}
self.job_table.removeCompleted();
}
We call waitpid with WNOHANG for each job's group leader. A positive return value means the job just finished; a negative one means the SIGCHLD handler had already reaped it. Either way we mark it done, print the notification (same format as bash: [1] Done sleep 10), and clean up. If waitpid returns 0, the process is still running.
This check-and-notify approach is the default behavior in bash and most other interactive shells. You run sleep 30 &, type a few more commands, and eventually see the [1] Done message appear just before a prompt. That's this function running.
Implementing jobs, fg, and bg
Now we wire up the built-in commands that let the user interact with the job table. These integrate into the dispatch table from episode 49:
fn builtinJobs(
args: []const []const u8,
shell: *Shell,
stdout: std.fs.File.Writer,
) BuiltinError!BuiltinResult {
_ = args;
for (shell.job_table.jobs.items) |job| {
const status_str = switch (job.status) {
.running => "Running",
.stopped => "Stopped",
.done => "Done",
};
stdout.print("[{d}] {s: <20}{s}\n", .{
job.id,
status_str,
job.command_str,
}) catch return error.WriteFailed;
}
return .ok;
}
fn builtinFg(
args: []const []const u8,
shell: *Shell,
stdout: std.fs.File.Writer,
) BuiltinError!BuiltinResult {
// Find the job -- either by ID (%1, %2) or most recent
const job = if (args.len > 0) blk: {
const id_str = if (args[0][0] == '%') args[0][1..] else args[0];
const id = std.fmt.parseInt(JobId, id_str, 10) catch
return error.InvalidArgs;
break :blk shell.job_table.findById(id) orelse
return error.InvalidArgs;
} else blk: {
break :blk shell.job_table.findMostRecent() orelse
return error.InvalidArgs;
};
stdout.print("{s}\n", .{job.command_str}) catch {};
// Give terminal to the job's process group
shell.terminal.giveToGroup(job.pgid);
// If stopped, send SIGCONT to resume it
if (job.status == .stopped) {
posix.kill(-(job.pgid), posix.SIG.CONT) catch {};
job.status = .running;
}
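// Wait for it in the foreground. libc's waitpid lets us handle ECHILD
// (child already reaped by the SIGCHLD handler), which std.posix.waitpid
// treats as unreachable; WUNTRACED reports a new Ctrl+Z stop.
var raw: c_int = 0;
var rc = std.c.waitpid(job.pgid, &raw, posix.W.UNTRACED);
while (rc < 0 and posix.errno(rc) == .INTR) {
rc = std.c.waitpid(job.pgid, &raw, posix.W.UNTRACED);
}
const status: u32 = @bitCast(raw);
if (rc > 0 and posix.W.IFSTOPPED(status)) {
job.status = .stopped;
stdout.print("\n[{d}] Stopped {s}\n", .{
job.id, job.command_str,
}) catch {};
} else {
// Exited, killed by a signal, or already reaped
job.status = .done;
}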
// Take back terminal
shell.terminal.takeBack();
return .ok;
}
fn builtinBg(
args: []const []const u8,
shell: *Shell,
stdout: std.fs.File.Writer,
) BuiltinError!BuiltinResult {
const job = if (args.len > 0) blk: {
const id_str = if (args[0][0] == '%') args[0][1..] else args[0];
const id = std.fmt.parseInt(JobId, id_str, 10) catch
return error.InvalidArgs;
break :blk shell.job_table.findById(id) orelse
return error.InvalidArgs;
} else blk: {
break :blk shell.job_table.findMostRecent() orelse
return error.InvalidArgs;
};
if (job.status != .stopped) {
stdout.print("bg: job {d} already running\n", .{job.id}) catch {};
return .ok;
}
// Send SIGCONT but don't give it the terminal -- it runs in background
posix.kill(-(job.pgid), posix.SIG.CONT) catch {};
job.status = .running;
stdout.print("[{d}] {s} &\n", .{ job.id, job.command_str }) catch {};
return .ok;
}
The fg command is the most complex. It does three things: (1) give the terminal to the job's process group, (2) send SIGCONT if the job was stopped, and (3) wait for the job to finish or get stopped again. When waitpid returns "stopped", we put the job back in the table and take the terminal back. When it returns "exited" or "signal", the job is done.
The negative PID in posix.kill(-(job.pgid), posix.SIG.CONT) is how you send a signal to an entire process group. A negative PID argument to kill() targets the group whose PGID is the absolute value. So if the group has 3 processes (a pipeline), all three get SIGCONT and resume together.
The bg command is simpler -- it just sends SIGCONT without giving the terminal. The job resumes executing in the background. If it tries to read from stdin, it'll get SIGTTIN and stop again (since it's not the foreground group), but most commands you'd run in the background either don't need stdin or have already read everything they need.
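Both fg and bg accept a job spec such as %1 or a bare 1. The parsing step can be pulled out into a helper that also copes with an empty argument, which the inline args[0][0] check above does not guard against. This is a sketch under that assumption; parseJobSpec is a hypothetical name, and rejecting 0 reflects the fact that our job IDs start at 1.

```zig
const std = @import("std");

const JobId = u16;

/// Parse "%3" or "3" into a job ID; return null for anything else.
fn parseJobSpec(arg: []const u8) ?JobId {
    const digits = if (std.mem.startsWith(u8, arg, "%")) arg[1..] else arg;
    if (digits.len == 0) return null;
    const id = std.fmt.parseInt(JobId, digits, 10) catch return null;
    // Job IDs are handed out starting at 1, so 0 never names a job
    return if (id == 0) null else id;
}

pub fn main() void {
    if (parseJobSpec("%3")) |id| {
        std.debug.print("job {d}\n", .{id});
    }
    if (parseJobSpec("%") == null) {
        std.debug.print("'%' alone is not a job spec\n", .{});
    }
}
```

Returning an optional instead of an error keeps the call site short: the built-in can turn null into error.InvalidArgs in one place.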
The startup config file: ~/.zigshrc
Real shells read a config file at startup. Bash has .bashrc, Zsh has .zshrc -- we'll have .zigshrc. This lets the user set environment variables, define aliases, or run any commands they want on every shell start.
The implementation is straightforward -- read the file line by line and execute each line as if the user typed it:
fn loadRcFile(self: *Shell) void {
const home = std.posix.getenv("HOME") orelse return;
var path_buf: [std.fs.max_path_bytes]u8 = undefined;
const rc_path = std.fmt.bufPrint(&path_buf, "{s}/.zigshrc", .{home}) catch return;
const file = std.fs.cwd().openFile(rc_path, .{}) catch return;
defer file.close();
const reader = file.reader();
var line_buf: [4096]u8 = undefined;
while (reader.readUntilDelimiter(&line_buf, '\n')) |line| {
// Skip empty lines and comments
const trimmed = std.mem.trim(u8, line, " \t");
if (trimmed.len == 0) continue;
if (trimmed[0] == '#') continue;
// Execute the line as a command
self.executeLine(trimmed) catch {};
} else |err| {
if (err != error.EndOfStream) return;
}
}
We silently ignore errors -- if .zigshrc doesn't exist, that's fine. If a line in it fails, we skip it and continue. This matches how real shells handle rc files -- you don't want a typo in your config to prevent the shell from starting.
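The skip-blank-lines-and-comments rule is easy to check on its own. Here is a minimal sketch; rcCommand is a hypothetical helper name, and trimming '\r' is an extra assumption (not in loadRcFile above) so that files saved with Windows line endings still work.

```zig
const std = @import("std");

/// Return the command text on an rc-file line, or null for blanks and comments.
fn rcCommand(line: []const u8) ?[]const u8 {
    // '\r' is trimmed too, an assumption beyond the original loadRcFile
    const trimmed = std.mem.trim(u8, line, " \t\r");
    if (trimmed.len == 0) return null;
    if (trimmed[0] == '#') return null;
    return trimmed;
}

pub fn main() void {
    const lines = [_][]const u8{
        "# Set up environment",
        "  export EDITOR=vim",
        "",
        "echo hello",
    };
    for (lines) |line| {
        if (rcCommand(line)) |cmd| {
            std.debug.print("run: {s}\n", .{cmd});
        }
    }
}
```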
A typical .zigshrc might look like:
# Set up environment
export EDITOR=vim
export PATH=/usr/local/bin:/usr/bin:/bin
# Greeting
echo Welcome to zsh-lite!
We call loadRcFile right after initializing the shell, before entering the main REPL loop. The export lines create variables in our Environment struct, and the echo just prints a greeting. Simple, effective.
Putting it all together: the Shell struct
Let's look at the complete shell structure that ties everything together:
const Shell = struct {
allocator: std.mem.Allocator,
env: Environment,
job_table: JobTable,
terminal: Terminal,
signal_handler: SignalHandler,
should_exit: bool,
last_exit_code: u8,
fn init(allocator: std.mem.Allocator) Shell {
var self = Shell{
.allocator = allocator,
.env = Environment.init(allocator),
.job_table = JobTable.init(allocator),
.terminal = Terminal.init(),
.signal_handler = SignalHandler.install(),
.should_exit = false,
.last_exit_code = 0,
};
// Ensure the shell is its own process group leader
posix.setpgid(0, 0) catch {};
// Take control of the terminal
self.terminal.takeBack();
// Load config
self.loadRcFile();
return self;
}
fn deinit(self: *Shell) void {
self.signal_handler.restore();
self.job_table.deinit();
self.env.deinit();
}
fn run(self: *Shell) void {
const stdin = std.io.getStdIn().reader();
const stdout = std.io.getStdOut().writer();
var line_buf: [4096]u8 = undefined;
while (!self.should_exit) {
// Check for completed background jobs
self.checkBackgroundJobs();
// Print prompt
stdout.print("zsh-lite> ", .{}) catch break;
// Read line
const line = stdin.readUntilDelimiter(&line_buf, '\n') catch |err| {
if (err == error.EndOfStream) break;
continue;
};
if (line.len == 0) continue;
self.executeLine(line) catch {};
}
stdout.print("Bye!\n", .{}) catch {};
}
fn executeLine(self: *Shell, line: []const u8) !void {
const tokens = try tokenize(self.allocator, line);
defer {
for (tokens) |tok| {
if (tok.kind == .word) self.allocator.free(tok.value);
}
self.allocator.free(tokens);
}
if (tokens.len == 0) return;
const pipeline = try parsePipeline(self.allocator, tokens);
defer pipeline.deinit(self.allocator);
// Check for built-in (single command, not backgrounded)
if (pipeline.commands.len == 1 and !pipeline.background) {
const cmd = &pipeline.commands[0];
if (isBuiltin(cmd.program)) {
// ... built-in dispatch (same as episode 49)
return;
}
}
// External command
if (pipeline.background) {
try self.runBackgroundJob(&pipeline);
} else {
self.last_exit_code = try self.runForegroundJob(&pipeline);
}
}
// ... spawnPipeline, runForegroundJob, runBackgroundJob, etc.
};
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer {
const check = gpa.deinit();
if (check == .leak) std.debug.print("WARNING: memory leak detected\n", .{});
}
var shell = Shell.init(gpa.allocator());
defer shell.deinit();
shell.run();
}
The initialization order matters: we install signal handlers first (so we're protected from SIGINT immediately), set ourselves as process group leader (so we can control the terminal), take terminal ownership, and then load the rc file. The defer shell.deinit() ensures we restore original signal handlers on exit -- good citizenship.
A complete session
Let's see the finished shell in action:
$ ./zsh-lite
Welcome to zsh-lite!
zsh-lite> sleep 30 &
[1] 54321
zsh-lite> sleep 60 &
[2] 54322
zsh-lite> jobs
[1] Running sleep 30 &
[2] Running sleep 60 &
zsh-lite> fg %1
sleep 30
^C
zsh-lite> jobs
[2] Running sleep 60 &
zsh-lite> ls | grep src | head -5
src
zsh-lite> sleep 10
^Z
[3] Stopped sleep 10
zsh-lite> bg %3
[3] sleep 10 &
zsh-lite> jobs
[2] Running sleep 60 &
[3] Running sleep 10 &
zsh-lite> exit
Bye!
Background jobs work, Ctrl+C kills only the foreground job, Ctrl+Z stops a job and lets you resume it with bg. This is a REAL shell -- not production-ready, but functionally complete for learning purposes.
Project retrospective: what real shells do that we skipped
Four episodes, roughly 1500 lines of Zig, and we have a working shell with parsing, process spawning, pipes, redirections, built-ins, environment management, job control, and signal handling. That's a lot. But real shells like bash, zsh, and fish have hundreds of thousands of lines of code. What did we skip?
Things we skipped that matter:
- Variable expansion: $HOME, $PATH, $? (last exit code), $$ (shell PID), $! (last background PID). This is a substantial subsystem -- you need to expand variables in words, handle ${var:-default} syntax, array variables, etc.
- Globbing: *.txt, src/**/*.zig, file[0-9].log. Real shells expand wildcards before executing commands. This requires walking the filesystem with pattern matching.
- Command substitution: $(command) or backtick syntax. Running a command and inserting its output as text in another command.
- Here documents: cat <<EOF ... EOF. Multi-line input redirection from the script itself.
- Aliases and functions: User-defined command shortcuts and shell functions with local variables.
- History: Command history with up/down arrows, !!, !$, !-2, history search. Requires a line editor (readline or equivalent).
- Line editing: Cursor movement, editing in place, tab completion. This alone is a massive project -- look at linenoise or GNU readline.
- Conditional execution: &&, ||, if/then/else/fi, while/do/done. A full scripting language.
- Quoting edge cases: The interaction between single quotes, double quotes, backslashes, and variable expansion is fiendishly complex. Our parser handles the basics but a POSIX-compliant implementation has dozens of edge cases.
- Subshells: (command; command) runs commands in a subshell (forked copy of the shell).
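To give a feel for the "pattern matching" half of globbing, here is a minimal sketch of a wildcard matcher. It is not part of the shell we built: globMatch is a hypothetical helper name, it only understands * and ?, and it ignores character classes like [0-9], recursive **, and the usual rule that * skips hidden files. The filesystem walk is left out entirely.

```zig
const std = @import("std");

/// Hypothetical helper (not from the shell above): returns true if `name`
/// matches `pattern`, where `*` matches any run of characters (including
/// none) and `?` matches exactly one character.
fn globMatch(pattern: []const u8, name: []const u8) bool {
    var p: usize = 0; // current position in pattern
    var n: usize = 0; // current position in name
    var star: ?usize = null; // index of the most recent `*` in pattern
    var mark: usize = 0; // position in name where that `*` started matching

    while (n < name.len) {
        if (p < pattern.len and pattern[p] == '*') {
            // Remember the star and first try matching it against nothing.
            star = p;
            mark = n;
            p += 1;
        } else if (p < pattern.len and (pattern[p] == '?' or pattern[p] == name[n])) {
            p += 1;
            n += 1;
        } else if (star) |s| {
            // Mismatch: backtrack and let the last `*` absorb one more character.
            p = s + 1;
            mark += 1;
            n = mark;
        } else {
            return false;
        }
    }
    // Trailing stars can match the empty string.
    while (p < pattern.len and pattern[p] == '*') : (p += 1) {}
    return p == pattern.len;
}

test "globMatch handles * and ?" {
    try std.testing.expect(globMatch("*.txt", "notes.txt"));
    try std.testing.expect(!globMatch("*.txt", "notes.zig"));
    try std.testing.expect(globMatch("file?.log", "file1.log"));
    try std.testing.expect(!globMatch("file?.log", "file.log"));
}
```

A real implementation would run a matcher like this against each directory entry and replace the token with the sorted list of matches before the command is executed.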
Things we intentionally simplified:
- Our signal handler does basic reaping but doesn't handle all race conditions perfectly. A production shell needs very careful handling of the SIGCHLD/waitpid interaction.
- We track only the group leader PID for stopped jobs. A proper implementation tracks every process in the pipeline.
- Our fg only waits on the group leader. If the pipeline has 3 processes, we should wait for all of them.
- We don't implement OLDPWD for cd -.
- We don't handle SIGTTIN/SIGTTOU (background processes trying to access the terminal).
- Error messages could be more specific and match POSIX conventions.
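As a rough idea of what "tracking every process in the pipeline" could look like, here is a sketch of a job record that stores one state per member process. Everything in it is an assumption for illustration: PipelineJob, ProcState, max_procs, and the method names are hypothetical and are not the job table from this episode. PIDs are plain i32 values so the sketch stays independent of the platform's pid_t.

```zig
const std = @import("std");

const max_procs = 8; // assumed upper bound on pipeline length

const ProcState = enum { running, stopped, exited };

/// Hypothetical job record that tracks every process in a pipeline,
/// not only the group leader.
const PipelineJob = struct {
    pgid: i32,
    pids: [max_procs]i32 = undefined,
    states: [max_procs]ProcState = undefined,
    count: usize = 0,

    /// Register a freshly forked pipeline member.
    fn addProcess(self: *PipelineJob, pid: i32) !void {
        if (self.count == max_procs) return error.TooManyProcesses;
        self.pids[self.count] = pid;
        self.states[self.count] = .running;
        self.count += 1;
    }

    /// Record a state change reported by waitpid for one member.
    /// Returns false if the pid does not belong to this job.
    fn update(self: *PipelineJob, pid: i32, state: ProcState) bool {
        for (self.pids[0..self.count], 0..) |p, idx| {
            if (p == pid) {
                self.states[idx] = state;
                return true;
            }
        }
        return false;
    }

    /// The job is finished only once every member has exited.
    fn isDone(self: *const PipelineJob) bool {
        for (self.states[0..self.count]) |s| {
            if (s != .exited) return false;
        }
        return true;
    }
};

test "job is done only after all members exit" {
    var job = PipelineJob{ .pgid = 100 };
    try job.addProcess(100);
    try job.addProcess(101);
    try job.addProcess(102);

    try std.testing.expect(job.update(100, .exited));
    try std.testing.expect(!job.isDone()); // two members still running
    try std.testing.expect(job.update(101, .exited));
    try std.testing.expect(job.update(102, .exited));
    try std.testing.expect(job.isDone());
}
```

With a record like this, fg would keep waiting on the process group and feeding each reported PID into update until isDone returns true, instead of returning as soon as the leader exits.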
What we DID build well:
- A tokenizer that correctly handles quoting, escaping, and operator detection
- A pipe plumbing system that avoids deadlocks by closing unused fd ends
- A clean built-in dispatch table using StaticStringMap
- Proper memory management with no leaks (verified by the GPA)
- Process group separation so signals reach the right targets
- A job table with fg/bg/jobs support
If you want to push this further, variable expansion is the most rewarding next step. It touches the parser (recognizing $ in tokens), needs a new expansion pass between parsing and execution, and makes the shell feel dramatically more useful. After that, globbing and command substitution are the features that turn a shell from "educational toy" to "I could actually use this for simple scripts".
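To make the "expansion pass" idea concrete, here is a minimal sketch of a function that could run on each word between parsing and execution. It is an illustration only: expandWord is a hypothetical name, the variables come from a plain StringHashMap rather than our environment struct, and it handles only the simple $NAME form (no ${var:-default}, no $?, and no awareness of single quotes).

```zig
const std = @import("std");

/// Hypothetical expansion pass: replaces each $NAME in `word` with its value
/// from `vars`. Unknown variables expand to the empty string, and a `$` that
/// is not followed by a name is kept literally. Caller owns the result.
fn expandWord(
    allocator: std.mem.Allocator,
    word: []const u8,
    vars: *const std.StringHashMap([]const u8),
) ![]u8 {
    var out: std.ArrayListUnmanaged(u8) = .empty;
    errdefer out.deinit(allocator);

    var i: usize = 0;
    while (i < word.len) {
        if (word[i] != '$') {
            try out.append(allocator, word[i]);
            i += 1;
            continue;
        }
        // Scan the variable name: letters, digits, and underscores.
        var end = i + 1;
        while (end < word.len and
            (std.ascii.isAlphanumeric(word[end]) or word[end] == '_')) : (end += 1)
        {}
        if (end == i + 1) {
            // `$` with no name after it: keep it as-is.
            try out.append(allocator, '$');
        } else if (vars.get(word[i + 1 .. end])) |value| {
            try out.appendSlice(allocator, value);
        }
        i = end;
    }
    return out.toOwnedSlice(allocator);
}

test "expandWord substitutes known variables" {
    const a = std.testing.allocator;
    var vars = std.StringHashMap([]const u8).init(a);
    defer vars.deinit();
    try vars.put("HOME", "/home/zig");

    const out = try expandWord(a, "$HOME/src", &vars);
    defer a.free(out);
    try std.testing.expectEqualStrings("/home/zig/src", out);
}
```

In the shell itself, the tokenizer would also need to record which tokens came from single-quoted text so this pass can skip them.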
But even without those features -- congratulations. You've built something that actually works at a systems level. You understand how fork/exec creates processes, how pipes thread data between them, how signals control process lifecycle, and how terminal ownership determines who gets keyboard input. That knowledge transfers directly to any systems programming you do in the future, regardless of language.
What we learned
- Signal handling with sigaction: installing custom handlers for SIGINT, SIGTSTP, and SIGCHLD, with proper async-signal-safety constraints
- Process groups with setpgid: isolating child pipelines into their own groups so keyboard signals target only the foreground job
- Terminal control with tcsetpgrp: transferring terminal ownership between the shell and foreground jobs, then taking it back after the job finishes or stops
- Background job launching: detecting & in the parser, spawning without waiting, and printing the [N] PID notification
- The job table: tracking active jobs by ID, PGID, status, and command string, with cleanup of completed entries
- Non-blocking child reaping: using waitpid with WNOHANG to detect finished background jobs without blocking the shell
- The fg workflow: give terminal to job group, send SIGCONT if stopped, wait for termination or re-stop, take terminal back
- The bg workflow: send SIGCONT without terminal transfer, job continues in background
- RC file loading: reading ~/.zigshrc line by line and executing each as a command, silently handling missing files
- The full architecture: Shell struct combining allocator, environment, job table, terminal state, signal handlers, and the REPL loop
And with that, Project D is complete. Fifty episodes of Zig -- from "hello world" to building a shell with job control. Next time we're starting something completely different: an HTTP server from scratch. Parsing requests, routing, serving static files, middleware -- a whole new systems domain where Zig's performance and control really shine ;-)
Thanks for reading!