Learn Zig Series (#50) - Build a Shell: Job Control and Signals

Project D: Build Your Own Shell (4/4)

What will I learn

You will learn signal handling: installing signal handlers with sigaction so Ctrl+C doesn't kill the shell;
You will learn process groups: why every pipeline needs its own group and how setpgid works;
You will learn terminal control: tcsetpgrp and which group owns the terminal at any given moment;
You will learn background processes: parsing the & suffix and launching jobs without waiting;
You will learn how to build a job table: tracking background jobs by ID, PID, status, and command string;
You will learn how to implement the jobs, fg, and bg built-in commands;
You will learn reaping children: using waitpid with WNOHANG to detect completed background jobs;
You will learn the ~/.zigshrc config file: reading and executing startup commands.

Requirements

A working modern computer running macOS, Windows or Ubuntu;
An installed Zig 0.14+ distribution (download from ziglang.org);
The ambition to learn Zig programming.

Difficulty

Advanced

This is it -- the final episode of our shell project. Over the last three episodes we built a parser (episode 47), a process spawner with pipes and redirections (episode 48), and a set of built-in commands (episode 49). Our shell can parse complex command lines, run programs, pipe data between them, and handle cd/export/exit. Pretty decent.

But there's one massive problem: press Ctrl+C while a command is running, and our entire shell dies. That's because SIGINT gets delivered to the entire foreground process group -- and right now our shell and all its children are in the same group. A real shell catches Ctrl+C, kills only the foreground job, and keeps running. That's job control, and it's what separates a toy shell from something you could actually use.

This episode covers the final pieces: signal handling, process groups, terminal ownership, background jobs with &, the jobs/fg/bg built-ins, and a startup config file. We're also going to step back and look at what real shells do that we intentionally skipped. Here we go!

Signal handling: surviving Ctrl+C

The first thing we need is to prevent SIGINT from killing our shell. On Unix, keyboard-generated signals are delivered to a whole process group rather than to a single process. When you press Ctrl+C, the terminal sends SIGINT to every process in the foreground process group. If the shell and its children share a group (the default), everyone dies together.

The fix has two parts: (1) put child processes in their own process group, and (2) install a signal handler in the shell that ignores SIGINT (or handles it gracefully).

Let's start with the signal handler. Zig exposes the raw POSIX sigaction interface through std.posix:

const std = @import("std");
const posix = std.posix;
const linux = std.os.linux;

const SignalHandler = struct {
    original_sigint: posix.Sigaction,
    original_sigtstp: posix.Sigaction,
    original_sigchld: posix.Sigaction,

    fn install() SignalHandler {
        var self: SignalHandler = undefined;

        // Catch SIGINT with a no-op handler so Ctrl+C cannot kill the shell -- children get their own group
        var sa_int: posix.Sigaction = .{
            .handler = .{ .handler = SignalHandler.handleSigint },
            .mask = posix.empty_sigset,
            .flags = 0,
        };
        posix.sigaction(posix.SIG.INT, &sa_int, &self.original_sigint);

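        // Ignore SIGTSTP (Ctrl+Z) in the shell itself
        var sa_tstp: posix.Sigaction = .{
            .handler = .{ .handler = posix.SIG.IGN },
            .mask = posix.empty_sigset,
            .flags = 0,
        };
        posix.sigaction(posix.SIG.TSTP, &sa_tstp, &self.original_sigtstp);
        // Also ignore SIGTTOU: the shell calls tcsetpgrp while it is still in
        // a background group, and SIGTTOU would otherwise stop the shell there
        posix.sigaction(posix.SIG.TTOU, &sa_tstp, null);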

        // Handle SIGCHLD to reap background children
        var sa_chld: posix.Sigaction = .{
            .handler = .{ .handler = SignalHandler.handleSigchld },
            .mask = posix.empty_sigset,
            .flags = linux.SA.NOCLDSTOP,
        };
        posix.sigaction(posix.SIG.CHLD, &sa_chld, &self.original_sigchld);

        return self;
    }

    fn restore(self: *const SignalHandler) void {
        posix.sigaction(posix.SIG.INT, &self.original_sigint, null);
        posix.sigaction(posix.SIG.TSTP, &self.original_sigtstp, null);
        posix.sigaction(posix.SIG.CHLD, &self.original_sigchld, null);
    }

    fn handleSigint(_: c_int) callconv(.C) void {
        // Do nothing -- the shell survives. The foreground child
        // (in its own process group) will still receive SIGINT
        // from the terminal.
    }

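    fn handleSigchld(_: c_int) callconv(.C) void {
        // Reap any finished background children.
        // We can't do complex work in a signal handler, so we just
        // call waitpid in a loop with WNOHANG to collect zombies.
        // The raw syscall wrapper is used because std.posix.waitpid takes a
        // u32 flag mask and treats "no children left" (ECHILD) as unreachable.
        var status: u32 = 0;
        while (true) {
            const rc: isize = @bitCast(linux.waitpid(-1, &status, linux.W.NOHANG));
            if (rc <= 0) break; // 0: nothing ready, negative: error such as ECHILD
        }
    }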
};

Three signals, three handlers:

SIGINT (Ctrl+C): Our handler does nothing. The shell ignores it. The foreground child process will still receive SIGINT because it's in a different process group that the terminal targets directly. We'll set that up in the next section.

SIGTSTP (Ctrl+Z): Ignored in the shell, so Ctrl+Z never suspends the shell itself. The foreground child keeps the default behavior and gets stopped; the shell notices that through waitpid and regains control, which we wire up later in this episode.

SIGCHLD: Delivered when a child process changes state (exits, gets stopped, gets continued). We use this to reap zombie processes. The SA.NOCLDSTOP flag means we only get notified on termination, not on stop/continue -- keeps things simpler.

The handleSigchld function calls waitpid(-1, WNOHANG) in a loop. The -1 means "any child process", and WNOHANG means "don't block if no child has exited". Each successful waitpid collects one zombie. We loop until there are no more to collect. This is a common pattern -- you need the loop because standard signals are not queued: if several children exit while a SIGCHLD is already pending, the handler runs only once for all of them.

One critical thing about signal handlers in Zig (and C): you can only call async-signal-safe functions inside them. That means no allocation, no std.debug.print, no mutex locking. The waitpid syscall is safe, which is why we do the reaping right in the handler. For anything more complex (like updating our job table), we'd use a self-pipe trick or just check in the main loop. We covered the threading and atomics side of this in episode 30.
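
The "just check in the main loop" option can be sketched with a single atomic flag. This is a minimal sketch and not what the shell above does: sigchld_pending, noteSigchld, and takeSigchld are names made up for this example, and with this design the actual reaping would move out of the handler and into the main loop.

```zig
const std = @import("std");

// Hypothetical flag: set from the signal handler, consumed by the main loop.
var sigchld_pending = std.atomic.Value(bool).init(false);

// The handler only records that a child changed state. An atomic store does
// not allocate or lock, so it is safe to do here.
// (`.c` is the Zig 0.14+ spelling of the `.C` calling convention used above.)
fn noteSigchld(_: c_int) callconv(.c) void {
    sigchld_pending.store(true, .seq_cst);
}

// Called from the main loop, for example right before printing the prompt.
// Reads and clears the flag in one atomic step, so a signal that arrives
// in between is not lost.
fn takeSigchld() bool {
    return sigchld_pending.swap(false, .seq_cst);
}
```

The main loop would then call takeSigchld() before each prompt and only walk the job table when it returns true. Several signals in a row collapse into one "go and check", which is fine because the check itself loops over all jobs.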

Process groups: isolating children from the shell

A process group is a collection of processes that can be signaled together. Every process belongs to exactly one process group, identified by a PGID (process group ID). When the terminal driver sends SIGINT (Ctrl+C) or SIGTSTP (Ctrl+Z), it sends them to all processes in the foreground process group of the terminal.

The key insight: if we put each pipeline's processes in their own group, signals from the keyboard will target ONLY that pipeline, leaving our shell untouched.

Here's how to set up process groups when spawning children:

const JobId = u16;

const ProcessGroup = struct {
    pgid: posix.pid_t,

    fn create(first_child_pid: posix.pid_t) ProcessGroup {
        // The first child's PID becomes the PGID for the group
        return .{ .pgid = first_child_pid };
    }

    fn addProcess(self: *const ProcessGroup, pid: posix.pid_t) void {
        // Move this process into our group
        // We call this from the parent after fork, AND the child calls
        // it on itself -- belt-and-suspenders to avoid race conditions
        posix.setpgid(pid, self.pgid) catch {};
    }
};

fn spawnInGroup(
    allocator: std.mem.Allocator,
    argv: []const []const u8,
    group: ?*ProcessGroup,
    stdin_fd: ?posix.fd_t,
    stdout_fd: ?posix.fd_t,
) !posix.pid_t {
    const pid = try posix.fork();

    if (pid == 0) {
        // Child process

        // Set our process group
        if (group) |g| {
            posix.setpgid(0, g.pgid) catch {};
        } else {
            // First process in pipeline -- become our own group leader
            posix.setpgid(0, 0) catch {};
        }

        // Wire up stdin/stdout
        if (stdin_fd) |fd| {
            posix.dup2(fd, posix.STDIN_FILENO) catch std.process.exit(126);
            posix.close(fd);
        }
        if (stdout_fd) |fd| {
            posix.dup2(fd, posix.STDOUT_FILENO) catch std.process.exit(126);
            posix.close(fd);
        }

        // Reset signal handlers to default for the child
        var sa_default: posix.Sigaction = .{
            .handler = .{ .handler = posix.SIG.DFL },
            .mask = posix.empty_sigset,
            .flags = 0,
        };
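        posix.sigaction(posix.SIG.INT, &sa_default, null);
        posix.sigaction(posix.SIG.TSTP, &sa_default, null);
        // SIGTTOU as well: a shell normally ignores it, and ignored signals
        // stay ignored across exec
        posix.sigaction(posix.SIG.TTOU, &sa_default, null);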

        // Exec the program
        const argv_z = allocator.allocSentinel(?[*:0]const u8, argv.len, null) catch std.process.exit(126);
        for (argv, 0..) |arg, i| {
            argv_z[i] = (allocator.dupeZ(u8, arg) catch std.process.exit(126)).ptr;
        }

        const err = posix.execvpeZ(argv_z[0].?, argv_z, std.c.environ);
        _ = err;
        std.process.exit(127); // exec failed
    }

    // Parent process
    if (group) |g| {
        g.addProcess(pid);
    } else {
        // First child -- create the group with this PID
        posix.setpgid(pid, pid) catch {};
    }

    return pid;
}

The double setpgid call (once in the child, once in the parent) is intentional. There's a race condition: the parent might try to set the child's group before the child has started, or the child might run execve before the parent gets to setpgid. By doing it in both places, whoever runs first succeeds and the second call is a harmless no-op. This double-call pattern is standard in shell implementations -- you'll find it in bash source code too.

Notice that the child resets SIGINT and SIGTSTP to SIG.DFL (default behavior) before exec. Signal dispositions are inherited across fork, and an ignored signal stays ignored across exec. Our shell ignores SIGTSTP, so without the reset the child would ignore Ctrl+Z as well. SIGINT is a slightly different case: the shell catches it with a handler, and exec resets caught signals to their default action on its own, so the explicit reset there is a harmless safeguard. Either way, the child should die on SIGINT -- that's how the user cancels commands.

The execvpeZ call at the end does PATH resolution and replaces the child's memory with the target program. If it fails (program not found, permission denied), we exit with code 127 -- the convention for "command not found" that shells use. Strictly speaking, shells reserve 126 for a command that was found but could not be executed; we fold both cases into 127 to keep the code short. We can't return an error here because after fork() we're in a completely separate process.

Terminal control: who owns the TTY?

Process groups alone aren't enough. The terminal has a concept of its foreground process group -- the group that receives keyboard-generated signals AND can read from the terminal. Only one group can be the foreground group at a time. All other groups are "background": they get SIGTTIN if they try to read from the terminal, and SIGTTOU if they try to write to it while the terminal's TOSTOP flag is set.

When we spawn a foreground job, we need to:

Put it in its own process group (done above)
Give that group ownership of the terminal (tcsetpgrp)
Wait for the job to finish
Take back terminal ownership

const Terminal = struct {
    tty_fd: posix.fd_t,
    shell_pgid: posix.pid_t,

    fn init() Terminal {
        const tty_fd = posix.open("/dev/tty", .{ .ACCMODE = .RDWR }, 0) catch
            std.io.getStdIn().handle; // Fallback to stdin

        const shell_pgid = posix.getpgrp();

        return .{
            .tty_fd = tty_fd,
            .shell_pgid = shell_pgid,
        };
    }

    fn giveToGroup(self: *const Terminal, pgid: posix.pid_t) void {
        // Make this process group the foreground group of our terminal
        posix.tcsetpgrp(self.tty_fd, pgid) catch {};
    }

    fn takeBack(self: *const Terminal) void {
        // Shell takes back terminal ownership
        posix.tcsetpgrp(self.tty_fd, self.shell_pgid) catch {};
    }
};

The tcsetpgrp call is the key. Before we wait for a foreground child, we give the terminal to the child's group. After the child finishes (or is stopped with Ctrl+Z), we take it back. If we forget to take it back, the shell won't be able to read input and the user gets a frozen terminal.

Here's the updated flow for running a foreground command:

fn runForegroundJob(
    self: *Shell,
    pipeline: *const Pipeline,
) !u8 {
    const pids = try self.spawnPipeline(pipeline, false);
    defer self.allocator.free(pids);

    if (pids.len == 0) return 0;

    // The first child's PID is the process group ID
    const pgid = pids[0];

    // Give terminal to the child group
    self.terminal.giveToGroup(pgid);

    // Wait for all processes in the pipeline. WUNTRACED makes waitpid also
    // return when a child is stopped (Ctrl+Z), not only when it exits.
    // std.posix.waitpid returns the raw status word, decoded with posix.W.
    var last_status: u8 = 0;
    for (pids) |pid| {
        const status = posix.waitpid(pid, posix.W.UNTRACED).status;
        if (posix.W.IFEXITED(status)) {
            last_status = @intCast(posix.W.EXITSTATUS(status));
        } else if (posix.W.IFSIGNALED(status)) {
            const sig = posix.W.TERMSIG(status);
            last_status = 128 + @as(u8, @truncate(sig));
            const stdout = std.io.getStdOut().writer();
            if (sig != posix.SIG.INT) {
                // Print signal info for non-SIGINT signals
                stdout.print("\n[Terminated by signal {d}]\n", .{sig}) catch {};
            } else {
                // Print newline after ^C
                stdout.print("\n", .{}) catch {};
            }
        } else if (posix.W.IFSTOPPED(status)) {
            // Job was stopped (Ctrl+Z) -- add it to the job table once and
            // stop waiting; the rest of the group is stopped as well
            self.job_table.addStopped(pgid, pipeline) catch {};
            last_status = 128 + @as(u8, @truncate(posix.W.STOPSIG(status))); // 148 for SIGTSTP(20)
            break;
        } else {
            last_status = 1;
        }
    }

    // Take back the terminal
    self.terminal.takeBack();

    return last_status;
}

The .stopped case is where Ctrl+Z kicks in. When a child receives SIGTSTP, it stops (suspends), and waitpid returns with a "stopped" status. At that point we add the job to our job table as a stopped background job and return control to the shell. The user can later resume it with fg or bg.
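
To make the 128 + signal convention concrete, here is a small sketch that turns a raw wait status word into the exit code a shell would report. It assumes the Linux bit layout of the status word (other systems may differ), and shellExitCode is a hypothetical helper that is not part of our shell.

```zig
const std = @import("std");

// Assumption: Linux encoding of the wait status word.
//   normal exit:      low 7 bits are 0, exit code in bits 8..15
//   stopped:          low byte is 0x7f, stop signal in bits 8..15
//   killed by signal: low 7 bits hold the signal number
fn shellExitCode(status: u32) u8 {
    const low7: u8 = @intCast(status & 0x7f);
    const high: u8 = @intCast((status >> 8) & 0xff);

    if (low7 == 0) return high; // exited normally
    if ((status & 0xff) == 0x7f) return 128 +| high; // stopped (Ctrl+Z)
    return 128 +| low7; // terminated by a signal
}
```

With this layout, a process that called exit(1) maps to 1, one killed by SIGINT (signal 2) maps to 130, and one stopped by SIGTSTP (signal 20) maps to 148, the same number hard-coded in the listing above.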

Background processes: parsing `&` and launching jobs

A command ending in & should run in the background. The shell doesn't wait for it -- it just prints a job ID and prompts for the next command. We need to detect the & in our parser and adjust behavior accordingly.

First, the parser change. We extend our Pipeline struct from episode 47:

const Pipeline = struct {
    commands: []Command,
    background: bool, // true if line ends with &

    fn deinit(self: *const Pipeline, allocator: std.mem.Allocator) void {
        for (self.commands) |cmd| cmd.deinit(allocator);
        allocator.free(self.commands);
    }
};

fn parsePipeline(allocator: std.mem.Allocator, tokens: []const Token) !Pipeline {
    var background = false;

    // Check if last token is '&'
    var effective_tokens = tokens;
    if (tokens.len > 0 and tokens[tokens.len - 1].kind == .ampersand) {
        background = true;
        effective_tokens = tokens[0 .. tokens.len - 1];
    }

    // ... rest of existing parser logic using effective_tokens ...
    const commands = try parseCommands(allocator, effective_tokens);

    return .{
        .commands = commands,
        .background = background,
    };
}

Simple -- if the last token is &, strip it off and set the background flag. The rest of the parser doesn't need to change at all.

Now the execution side. Background jobs don't get the terminal and we don't wait for them:

fn runBackgroundJob(
    self: *Shell,
    pipeline: *const Pipeline,
) !void {
    const pids = try self.spawnPipeline(pipeline, true);
    defer self.allocator.free(pids);

    if (pids.len == 0) return;

    const pgid = pids[0];
    const job_id = try self.job_table.addRunning(pgid, pids, pipeline);

    const stdout = std.io.getStdOut().writer();
    stdout.print("[{d}] {d}\n", .{ job_id, pgid }) catch {};
}

That's it. We spawn the pipeline (without giving it the terminal), add it to the job table, print the [1] 12345 style notification that every Unix user recognizes, and return immediately to the prompt. The job runs asynchronously and we'll hear about it when it finishes (via our SIGCHLD handler or periodic checks).

The job table: tracking background work

The job table is the data structure that tracks all background jobs. Each entry has a job ID (the [1], [2] numbers users see), the process group ID, individual PIDs, the current status, and a string representation of the command for display:

const JobStatus = enum {
    running,
    stopped,
    done,
};

const Job = struct {
    id: JobId,
    pgid: posix.pid_t,
    pids: []posix.pid_t,
    status: JobStatus,
    command_str: []const u8,
};

const JobTable = struct {
    jobs: std.ArrayList(Job),
    next_id: JobId,
    allocator: std.mem.Allocator,

    fn init(allocator: std.mem.Allocator) JobTable {
        return .{
            .jobs = std.ArrayList(Job).init(allocator),
            .next_id = 1,
            .allocator = allocator,
        };
    }

    fn deinit(self: *JobTable) void {
        for (self.jobs.items) |job| {
            self.allocator.free(job.pids);
            self.allocator.free(job.command_str);
        }
        self.jobs.deinit();
    }

    fn addRunning(
        self: *JobTable,
        pgid: posix.pid_t,
        pids: []const posix.pid_t,
        pipeline: *const Pipeline,
    ) !JobId {
        const id = self.next_id;
        self.next_id += 1;

        const owned_pids = try self.allocator.dupe(posix.pid_t, pids);
        errdefer self.allocator.free(owned_pids);

        const cmd_str = try pipelineToString(self.allocator, pipeline);
        errdefer self.allocator.free(cmd_str);

        try self.jobs.append(.{
            .id = id,
            .pgid = pgid,
            .pids = owned_pids,
            .status = .running,
            .command_str = cmd_str,
        });

        return id;
    }

    fn addStopped(
        self: *JobTable,
        pgid: posix.pid_t,
        pipeline: *const Pipeline,
    ) !JobId {
        const id = self.next_id;
        self.next_id += 1;

        const pids = try self.allocator.alloc(posix.pid_t, 1);
        errdefer self.allocator.free(pids); // don't leak if a later step fails
        pids[0] = pgid; // Simplified -- just track the group leader

        const cmd_str = try pipelineToString(self.allocator, pipeline);
        errdefer self.allocator.free(cmd_str);

        try self.jobs.append(.{
            .id = id,
            .pgid = pgid,
            .pids = pids,
            .status = .stopped,
            .command_str = cmd_str,
        });

        return id;
    }

    fn findById(self: *JobTable, id: JobId) ?*Job {
        for (self.jobs.items) |*job| {
            if (job.id == id) return job;
        }
        return null;
    }

    fn findMostRecent(self: *JobTable) ?*Job {
        if (self.jobs.items.len == 0) return null;
        // Return the last added job (most recent)
        return &self.jobs.items[self.jobs.items.len - 1];
    }

    fn removeCompleted(self: *JobTable) void {
        var i: usize = 0;
        while (i < self.jobs.items.len) {
            if (self.jobs.items[i].status == .done) {
                const job = self.jobs.orderedRemove(i);
                self.allocator.free(job.pids);
                self.allocator.free(job.command_str);
            } else {
                i += 1;
            }
        }
    }
};

fn pipelineToString(allocator: std.mem.Allocator, pipeline: *const Pipeline) ![]const u8 {
    var result = std.ArrayList(u8).init(allocator);
    errdefer result.deinit();

    for (pipeline.commands, 0..) |cmd, i| {
        if (i > 0) try result.appendSlice(" | ");
        try result.appendSlice(cmd.program);
        for (cmd.args) |arg| {
            try result.append(' ');
            try result.appendSlice(arg);
        }
    }

    if (pipeline.background) try result.appendSlice(" &");
    return try result.toOwnedSlice();
}

The pipelineToString function reconstructs a human-readable command string from the parsed pipeline. This is what gets displayed in jobs output. We store a copy because the original tokens will be freed after the command line is processed.

The removeCompleted function cleans up finished jobs from the table. We call it periodically (before printing the prompt is a good place) so the table doesn't grow unbounded.

Reaping background jobs: checking who's done

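Between prompts, we need to check if any background jobs have finished. The SIGCHLD handler does basic reaping (preventing zombies), but we also need to update our job table. Keep in mind that a child's exit status can be collected only once: if the handler reaps a child first, a later waitpid on that PID fails with ECHILD instead of reporting the exit. Here's a function to call before each prompt: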

fn checkBackgroundJobs(self: *Shell) void {
    const stdout = std.io.getStdOut().writer();

    for (self.job_table.jobs.items) |*job| {
        if (job.status == .done) continue;

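        // Try non-blocking wait on the group leader. The raw syscall wrapper
        // lets us treat ECHILD (already reaped by the SIGCHLD handler) as
        // "finished" instead of hitting std.posix.waitpid's unreachable.
        var status: u32 = 0;
        const rc: isize = @bitCast(linux.waitpid(job.pgid, &status, linux.W.NOHANG));
        if (rc != 0) {
            // Leader has exited (rc > 0) or was reaped earlier (rc < 0)
            job.status = .done;
            stdout.print("[{d}]  Done                    {s}\n", .{
                job.id,
                job.command_str,
            }) catch {};
        }
        // rc == 0: still running, do nothing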
    }

    self.job_table.removeCompleted();
}

We call waitpid with WNOHANG for each job's group leader. If it returns a positive PID, the job has finished. We mark it done, print the notification (same format as bash: [1] Done sleep 10), and clean up. If waitpid returns 0, the process is still running.

This check-and-notify approach is how bash and most other interactive shells behave by default. You run sleep 30 &, type a few more commands, and eventually see the [1] Done message appear just before a prompt. That's this function running.

Implementing `jobs`, `fg`, and `bg`

Now we wire up the built-in commands that let the user interact with the job table. These integrate into the dispatch table from episode 49:

fn builtinJobs(
    args: []const []const u8,
    shell: *Shell,
    stdout: std.fs.File.Writer,
) BuiltinError!BuiltinResult {
    _ = args;

    for (shell.job_table.jobs.items) |job| {
        const status_str = switch (job.status) {
            .running => "Running",
            .stopped => "Stopped",
            .done => "Done",
        };
        stdout.print("[{d}]  {s: <20}{s}\n", .{
            job.id,
            status_str,
            job.command_str,
        }) catch return error.WriteFailed;
    }

    return .ok;
}

fn builtinFg(
    args: []const []const u8,
    shell: *Shell,
    stdout: std.fs.File.Writer,
) BuiltinError!BuiltinResult {
    // Find the job -- either by ID (%1, %2) or most recent
    const job = if (args.len > 0) blk: {
        const id_str = if (args[0][0] == '%') args[0][1..] else args[0];
        const id = std.fmt.parseInt(JobId, id_str, 10) catch
            return error.InvalidArgs;
        break :blk shell.job_table.findById(id) orelse
            return error.InvalidArgs;
    } else blk: {
        break :blk shell.job_table.findMostRecent() orelse
            return error.InvalidArgs;
    };

    stdout.print("{s}\n", .{job.command_str}) catch {};

    // Give terminal to the job's process group
    shell.terminal.giveToGroup(job.pgid);

    // If stopped, send SIGCONT to resume it
    if (job.status == .stopped) {
        posix.kill(-(job.pgid), posix.SIG.CONT) catch {};
        job.status = .running;
    }

    // Wait for it in the foreground; WUNTRACED also reports Ctrl+Z stops.
    // std.posix.waitpid returns the raw status word, decoded with posix.W.
    const status = posix.waitpid(job.pgid, posix.W.UNTRACED).status;
    if (posix.W.IFSTOPPED(status)) {
        job.status = .stopped;
        stdout.print("\n[{d}]  Stopped                 {s}\n", .{
            job.id, job.command_str,
        }) catch {};
    } else {
        // Exited normally or was killed by a signal
        job.status = .done;
    }

    // Take back terminal
    shell.terminal.takeBack();

    return .ok;
}

fn builtinBg(
    args: []const []const u8,
    shell: *Shell,
    stdout: std.fs.File.Writer,
) BuiltinError!BuiltinResult {
    const job = if (args.len > 0) blk: {
        const id_str = if (args[0][0] == '%') args[0][1..] else args[0];
        const id = std.fmt.parseInt(JobId, id_str, 10) catch
            return error.InvalidArgs;
        break :blk shell.job_table.findById(id) orelse
            return error.InvalidArgs;
    } else blk: {
        break :blk shell.job_table.findMostRecent() orelse
            return error.InvalidArgs;
    };

    if (job.status != .stopped) {
        stdout.print("bg: job {d} already running\n", .{job.id}) catch {};
        return .ok;
    }

    // Send SIGCONT but don't give it the terminal -- it runs in background
    posix.kill(-(job.pgid), posix.SIG.CONT) catch {};
    job.status = .running;

    stdout.print("[{d}]  {s} &\n", .{ job.id, job.command_str }) catch {};

    return .ok;
}

The fg command is the most complex. It does three things: (1) give the terminal to the job's process group, (2) send SIGCONT if the job was stopped, and (3) wait for the job to finish or get stopped again. When waitpid returns "stopped", we put the job back in the table and take the terminal back. When it returns "exited" or "signal", the job is done.

The negative PID in posix.kill(-(job.pgid), posix.SIG.CONT) is how you send a signal to an entire process group. A negative PID argument to kill() targets the group whose PGID is the absolute value. So if the group has 3 processes (a pipeline), all three get SIGCONT and resume together.

The bg command is simpler -- it just sends SIGCONT without giving the terminal. The job resumes executing in the background. If it tries to read from stdin, it'll get SIGTTIN and stop again (since it's not the foreground group), but most commands you'd run in the background either don't need stdin or have already read everything they need.
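
Both fg and bg repeat the same "%1 or 1" parsing, and indexing args[0][0] would go out of bounds on an empty argument. One way to tidy that up is a small shared helper. This is only a sketch: parseJobSpec is a hypothetical name, and JobId is redeclared here so the snippet compiles on its own.

```zig
const std = @import("std");

const JobId = u16;

// Accepts "%3" or "3". Returns null for anything that is not a plain
// decimal job number, including "", "%" and values that overflow JobId.
fn parseJobSpec(arg: []const u8) ?JobId {
    const digits = if (std.mem.startsWith(u8, arg, "%")) arg[1..] else arg;
    return std.fmt.parseInt(JobId, digits, 10) catch null;
}
```

In builtinFg and builtinBg the lookup would then shrink to something like parseJobSpec(args[0]) orelse return error.InvalidArgs, followed by the existing findById call.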

The startup config file: `~/.zigshrc`

Real shells read a config file at startup. Bash has .bashrc, Zsh has .zshrc -- we'll have .zigshrc. This lets the user set environment variables, define aliases, or run any commands they want on every shell start.

The implementation is straightforward -- read the file line by line and execute each line as if the user typed it:

fn loadRcFile(self: *Shell) void {
    const home = std.posix.getenv("HOME") orelse return;

    var path_buf: [std.fs.max_path_bytes]u8 = undefined;
    const rc_path = std.fmt.bufPrint(&path_buf, "{s}/.zigshrc", .{home}) catch return;

    const file = std.fs.cwd().openFile(rc_path, .{}) catch return;
    defer file.close();

    const reader = file.reader();
    var line_buf: [4096]u8 = undefined;

    while (reader.readUntilDelimiter(&line_buf, '\n')) |line| {
        // Skip empty lines and comments
        const trimmed = std.mem.trim(u8, line, " \t");
        if (trimmed.len == 0) continue;
        if (trimmed[0] == '#') continue;

        // Execute the line as a command
        self.executeLine(trimmed) catch {};
    } else |err| {
        if (err != error.EndOfStream) return;
    }
}

We silently ignore errors -- if .zigshrc doesn't exist, that's fine. If a line in it fails, we skip it and continue. This matches how real shells handle rc files -- you don't want a typo in your config to prevent the shell from starting.
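
The skip rules are easy to pull out into a pure function, which also makes them testable without touching the filesystem. A minimal sketch, assuming a hypothetical helper called rcCommand; trimming '\r' is an addition of this example (so files with Windows line endings still work) and is not in the loop above.

```zig
const std = @import("std");

// Returns the command text for one rc-file line, or null if the line is
// blank or a comment and should be skipped.
fn rcCommand(line: []const u8) ?[]const u8 {
    // '\r' is trimmed too (assumption: tolerate CRLF line endings)
    const trimmed = std.mem.trim(u8, line, " \t\r");
    if (trimmed.len == 0) return null;
    if (trimmed[0] == '#') return null;
    return trimmed;
}
```

Inside loadRcFile, the loop body would then reduce to: if (rcCommand(line)) |cmd| self.executeLine(cmd) catch {};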

A typical .zigshrc might look like:

# Set up environment
export EDITOR=vim
export PATH=/usr/local/bin:/usr/bin:/bin

# Greeting
echo Welcome to zsh-lite!

We call loadRcFile right after initializing the shell, before entering the main REPL loop. The export lines create variables in our Environment struct, and the echo just prints a greeting. Simple, effective.

Putting it all together: the Shell struct

Let's look at the complete shell structure that ties everything together:

const Shell = struct {
    allocator: std.mem.Allocator,
    env: Environment,
    job_table: JobTable,
    terminal: Terminal,
    signal_handler: SignalHandler,
    should_exit: bool,
    last_exit_code: u8,

    fn init(allocator: std.mem.Allocator) Shell {
        var self = Shell{
            .allocator = allocator,
            .env = Environment.init(allocator),
            .job_table = JobTable.init(allocator),
            .terminal = Terminal.init(),
            .signal_handler = SignalHandler.install(),
            .should_exit = false,
            .last_exit_code = 0,
        };

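        // Ensure the shell is its own process group leader
        posix.setpgid(0, 0) catch {};

        // Terminal.init() ran before setpgid, so refresh the cached PGID:
        // as a group leader, our PGID now equals our PID
        self.terminal.shell_pgid = linux.getpid();

        // Take control of the terminal
        self.terminal.takeBack();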

        // Load config
        self.loadRcFile();

        return self;
    }

    fn deinit(self: *Shell) void {
        self.signal_handler.restore();
        self.job_table.deinit();
        self.env.deinit();
    }

    fn run(self: *Shell) void {
        const stdin = std.io.getStdIn().reader();
        const stdout = std.io.getStdOut().writer();

        var line_buf: [4096]u8 = undefined;

        while (!self.should_exit) {
            // Check for completed background jobs
            self.checkBackgroundJobs();

            // Print prompt
            stdout.print("zsh-lite> ", .{}) catch break;

            // Read line
            const line = stdin.readUntilDelimiter(&line_buf, '\n') catch |err| {
                if (err == error.EndOfStream) break;
                continue;
            };

            if (line.len == 0) continue;

            self.executeLine(line) catch {};
        }

        stdout.print("Bye!\n", .{}) catch {};
    }

    fn executeLine(self: *Shell, line: []const u8) !void {
        const tokens = try tokenize(self.allocator, line);
        defer {
            for (tokens) |tok| {
                if (tok.kind == .word) self.allocator.free(tok.value);
            }
            self.allocator.free(tokens);
        }

        if (tokens.len == 0) return;

        const pipeline = try parsePipeline(self.allocator, tokens);
        defer pipeline.deinit(self.allocator);

        // Check for built-in (single command, not backgrounded)
        if (pipeline.commands.len == 1 and !pipeline.background) {
            const cmd = &pipeline.commands[0];
            if (isBuiltin(cmd.program)) {
                // ... built-in dispatch (same as episode 49)
                return;
            }
        }

        // External command
        if (pipeline.background) {
            try self.runBackgroundJob(&pipeline);
        } else {
            self.last_exit_code = try self.runForegroundJob(&pipeline);
        }
    }

    // ... spawnPipeline, runForegroundJob, runBackgroundJob, etc.
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer {
        const check = gpa.deinit();
        if (check == .leak) std.debug.print("WARNING: memory leak detected\n", .{});
    }

    var shell = Shell.init(gpa.allocator());
    defer shell.deinit();

    shell.run();
}

The initialization order matters: we install signal handlers first (so we're protected from SIGINT immediately), set ourselves as process group leader (so we can control the terminal), take terminal ownership, and then load the rc file. The defer shell.deinit() ensures we restore original signal handlers on exit -- good citizenship.

A complete session

Let's see the finished shell in action:

$ ./zsh-lite
Welcome to zsh-lite!
zsh-lite> sleep 30 &
[1] 54321
zsh-lite> sleep 60 &
[2] 54322
zsh-lite> jobs
[1]  Running             sleep 30 &
[2]  Running             sleep 60 &
zsh-lite> fg %1
sleep 30
^C
zsh-lite> jobs
[2]  Running             sleep 60 &
zsh-lite> ls | grep src | head -5
src
zsh-lite> sleep 10
^Z
[3]  Stopped             sleep 10
zsh-lite> bg %3
[3]  sleep 10 &
zsh-lite> jobs
[2]  Running             sleep 60 &
[3]  Running             sleep 10 &
zsh-lite> exit
Bye!

Background jobs work, Ctrl+C kills only the foreground job, Ctrl+Z stops a job and lets you resume it with bg. This is a REAL shell -- not production-ready, but functionally complete for learning purposes.

Project retrospective: what real shells do that we skipped

Four episodes, roughly 1500 lines of Zig, and we have a working shell with parsing, process spawning, pipes, redirections, built-ins, environment management, job control, and signal handling. That's a lot. But real shells like bash, zsh, and fish have hundreds of thousands of lines of code. What did we skip?

Things we skipped that matter:

Variable expansion: $HOME, $PATH, $? (last exit code), $$ (shell PID), $! (last background PID). This is a substantial subsystem -- you need to expand variables in words, handle ${var:-default} syntax, array variables, etc.
Globbing: *.txt, src/**/*.zig, file[0-9].log. Real shells expand wildcards before executing commands. This requires walking the filesystem with pattern matching.
Command substitution: $(command) or backtick syntax. Running a command and inserting its output as text in another command.
Here documents: cat <<EOF ... EOF. Multi-line input redirection from the script itself.
Aliases and functions: User-defined command shortcuts and shell functions with local variables.
History: Command history with up/down arrows, !!, !$, !-2, history search. Requires a line editor (readline or equivalent).
Line editing: Cursor movement, editing in place, tab completion. This alone is a massive project -- look at linenoise or GNU readline.
Conditional execution: &&, ||, if/then/else/fi, while/do/done. A full scripting language.
Quoting edge cases: The interaction between single quotes, double quotes, backslashes, and variable expansion is fiendishly complex. Our parser handles the basics but a POSIX-compliant implementation has dozens of edge cases.
Subshells: (command; command) runs commands in a subshell (forked copy of the shell).
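To give a feel for the pattern-matching half of globbing, here is a minimal sketch of a wildcard matcher. It is not part of the shell we built: `globMatch` is a hypothetical helper, it only understands `*` and `?`, and it leaves out character classes like `[0-9]`, recursive `**`, and the directory walk that a real implementation needs.

```zig
const std = @import("std");

/// Returns true when `name` matches `pattern`.
/// Supported: `*` (any run of bytes, including none) and `?` (exactly one byte).
/// Character classes and `**` are deliberately not handled in this sketch.
fn globMatch(pattern: []const u8, name: []const u8) bool {
    var p: usize = 0; // index into pattern
    var n: usize = 0; // index into name
    var star_p: ?usize = null; // pattern index just after the last `*` seen
    var star_n: usize = 0; // name index that the last `*` currently extends to

    while (n < name.len) {
        if (p < pattern.len and pattern[p] == '*') {
            // Remember the star and first try matching it against nothing.
            star_p = p + 1;
            star_n = n;
            p += 1;
        } else if (p < pattern.len and (pattern[p] == '?' or pattern[p] == name[n])) {
            p += 1;
            n += 1;
        } else if (star_p) |sp| {
            // Mismatch: let the last `*` absorb one more byte and retry.
            star_n += 1;
            n = star_n;
            p = sp;
        } else {
            return false;
        }
    }

    // The name is consumed; only trailing `*`s may remain in the pattern.
    while (p < pattern.len and pattern[p] == '*') p += 1;
    return p == pattern.len;
}

pub fn main() void {
    const names = [_][]const u8{ "main.zig", "notes.txt", "build.zig", "file7.log" };
    for (names) |name| {
        if (globMatch("*.zig", name)) std.debug.print("{s}\n", .{name});
    }
}
```

In a shell, the matcher would run against each directory entry, and the matching names would replace the original word in the argument list before the command is spawned.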

Things we intentionally simplified:

Our signal handler does basic reaping but doesn't handle all race conditions perfectly. A production shell needs very careful handling of the SIGCHLD/waitpid interaction.
We track only the group leader PID for stopped jobs. A proper implementation tracks every process in the pipeline.
Our fg only waits on the group leader. If the pipeline has 3 processes, we should wait for all of them.
We don't implement OLDPWD for cd -.
We don't handle SIGTTIN/SIGTTOU (background processes trying to access the terminal).
Error messages could be more specific and match POSIX conventions.
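The two points about tracking only the group leader could be addressed with a job record that remembers every process in the pipeline. The sketch below shows only the bookkeeping, under some assumptions: `PipelineJob`, its field names, and the fixed capacity of 16 are all hypothetical, and PIDs are plain `i32` values rather than whatever type our job table actually uses.

```zig
const std = @import("std");

/// Hypothetical job record that remembers every process in a pipeline,
/// not just the group leader. Names and capacity are illustrative only.
const PipelineJob = struct {
    const max_procs = 16;

    pgid: i32,
    pids: [max_procs]i32 = undefined,
    exited: [max_procs]bool = [_]bool{false} ** max_procs,
    count: usize = 0,

    /// Record one spawned child of the pipeline.
    fn addProcess(self: *PipelineJob, pid: i32) error{TooManyProcesses}!void {
        if (self.count == max_procs) return error.TooManyProcesses;
        self.pids[self.count] = pid;
        self.count += 1;
    }

    /// Mark a pid as reaped. Returns false if it does not belong to this job.
    fn markExited(self: *PipelineJob, pid: i32) bool {
        for (self.pids[0..self.count], 0..) |p, i| {
            if (p == pid) {
                self.exited[i] = true;
                return true;
            }
        }
        return false;
    }

    /// The job is finished only when every member has been reaped.
    fn isComplete(self: *const PipelineJob) bool {
        for (self.exited[0..self.count]) |done| {
            if (!done) return false;
        }
        return true;
    }
};

pub fn main() !void {
    var job = PipelineJob{ .pgid = 4000 };
    try job.addProcess(4000);
    try job.addProcess(4001);

    _ = job.markExited(4000);
    std.debug.print("complete after first exit: {}\n", .{job.isComplete()});
    _ = job.markExited(4001);
    std.debug.print("complete after second exit: {}\n", .{job.isComplete()});
}
```

With a record like this, fg would keep waiting on the process group and call `markExited` for each PID that waitpid reports, returning to the prompt only once `isComplete` is true.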

What we DID build well:

A tokenizer that correctly handles quoting, escaping, and operator detection
A pipe plumbing system that avoids deadlocks by closing unused fd ends
A clean built-in dispatch table using StaticStringMap
Proper memory management with no leaks (verified by the GPA)
Process group separation so signals reach the right targets
A job table with fg/bg/jobs support

If you want to push this further, variable expansion is the most rewarding next step. It touches the parser (recognizing $ in tokens), needs a new expansion pass between parsing and execution, and makes the shell feel dramatically more useful. After that, globbing and command substitution are the features that turn a shell from "educational toy" to "I could actually use this for simple scripts".
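As a starting point, an expansion pass for plain `$NAME` references might look like the sketch below. Everything here is an assumption made for illustration: `Var`, `lookup`, and `expandWord` are hypothetical names, variables come from a simple slice instead of the shell's real environment, and the output goes into a caller-supplied buffer to keep the example allocator-free. It ignores `${var:-default}`, `$?`, `$$`, and quoting rules entirely.

```zig
const std = @import("std");

/// Hypothetical name/value pair standing in for the shell's environment.
const Var = struct { name: []const u8, value: []const u8 };

/// Linear lookup; unset variables expand to the empty string.
fn lookup(vars: []const Var, name: []const u8) []const u8 {
    for (vars) |v| {
        if (std.mem.eql(u8, v.name, name)) return v.value;
    }
    return "";
}

fn isNameByte(c: u8) bool {
    return std.ascii.isAlphanumeric(c) or c == '_';
}

/// Copies `word` into `out`, replacing each $NAME with its value.
/// Returns the written part of `out`, or error.NoSpaceLeft if it does not fit.
fn expandWord(out: []u8, word: []const u8, vars: []const Var) error{NoSpaceLeft}![]const u8 {
    var len: usize = 0;
    var i: usize = 0;
    while (i < word.len) {
        if (word[i] == '$') {
            // Scan the variable name that follows the dollar sign.
            var end = i + 1;
            while (end < word.len and isNameByte(word[end])) end += 1;
            if (end > i + 1) {
                const value = lookup(vars, word[i + 1 .. end]);
                if (len + value.len > out.len) return error.NoSpaceLeft;
                @memcpy(out[len..][0..value.len], value);
                len += value.len;
                i = end;
                continue;
            }
            // A `$` with no name after it falls through and is copied literally.
        }
        if (len == out.len) return error.NoSpaceLeft;
        out[len] = word[i];
        len += 1;
        i += 1;
    }
    return out[0..len];
}

pub fn main() !void {
    const vars = [_]Var{.{ .name = "HOME", .value = "/home/me" }};
    var buf: [128]u8 = undefined;
    const expanded = try expandWord(&buf, "$HOME/projects", &vars);
    std.debug.print("{s}\n", .{expanded});
}
```

A real pass would run over every word token after parsing and before spawning, and it would need the tokenizer to record whether a word was single-quoted so that those words are left untouched.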

But even without those features -- congratulations. You've built something that actually works at a systems level. You understand how fork/exec creates processes, how pipes thread data between them, how signals control process lifecycle, and how terminal ownership determines who gets keyboard input. That knowledge transfers directly to any systems programming you do in the future, regardless of language.

What we learned

Signal handling with sigaction: installing custom handlers for SIGINT, SIGTSTP, and SIGCHLD, with proper async-signal-safety constraints
Process groups with setpgid: isolating child pipelines into their own groups so keyboard signals target only the foreground job
Terminal control with tcsetpgrp: transferring terminal ownership between the shell and foreground jobs, then taking it back after the job finishes or stops
Background job launching: detecting & in the parser, spawning without waiting, and printing the [N] PID notification
The job table: tracking active jobs by ID, PGID, status, and command string, with cleanup of completed entries
Non-blocking child reaping: using waitpid with WNOHANG to detect finished background jobs without blocking the shell
The fg workflow: give terminal to job group, send SIGCONT if stopped, wait for termination or re-stop, take terminal back
The bg workflow: send SIGCONT without terminal transfer, job continues in background
RC file loading: reading ~/.zigshrc line by line and executing each as a command, silently handling missing files
The full architecture: Shell struct combining allocator, environment, job table, terminal state, signal handlers, and the REPL loop

And with that, Project D is complete. Fifty episodes of Zig -- from "hello world" to building a shell with job control. Next time we're starting something completely different: an HTTP server from scratch. Parsing requests, routing, serving static files, middleware -- a whole new systems domain where Zig's performance and control really shine ;-)

Thanks for reading!

Hive account@scipio
