A first try on Zig and C interop

Tuesday 18 June 2024 · 44 mins read

Table of contents
Introduction
The concept
Zig and tricks
Developing the AV1 transcoder
- Developing the Remuxer
- Developing the Transcoder
Last part: the build system
Conclusion
References

Introduction

WARNING

Zig is still in development, and the language API is not stable. The code in this article may not work in future versions of Zig.

The version of Zig used in this article is 0.13.0-dev.380+b32aa99b8.

Lately, I've been programming in Zig for the Advent of Code 2023. What I've learned about the language is that it works perfectly well for low-level programming, but is lacking in some areas (my biggest gripe is the lack of HTTP/2 support).

However, Zig claims to interoperate easily with C, which is something I've been wanting to try for a long time. As you may know, I used to use CGO in Go, which lets me use complex C libraries behind a high-level Go layer.

In this article, I will share my experience with Zig and C interop, using a simple example: C libraries that transcode a video into the AV1 format, and a Zig program that uses them.

The concept

Transcoding a video seems quite a complex task, but thanks to the libavcodec and libavformat C libraries, it is quite "easy" to do (or at least, to understand).

Transcoding a video follows these steps:

Open the input video file.
Demux the input video file: read packets from the input video file.
Decode the packets: decode the packets into frames by passing them to the decoder.
Encode the frames: encode the frames into packets by passing them to the encoder.
Mux the packets: write the packets to the output video file.

The aim of this article is to demonstrate how to use a C library without the need to make a Zig wrapper around it. This is a common practice in Go, where you can use CGO to call C functions directly.

Zig and tricks

I'm assuming that readers of this article are not familiar with Zig, so I will explain some concepts I've learned while programming in it.

Memory allocators

Zig does not have a garbage collector, so you have to manage memory "yourself". By yourself, I mean that you have to allocate and deallocate memory manually.

Unlike C, where you are mostly tied to malloc and free, Zig lets you choose which allocator to use. The standard library provides a std.mem.Allocator interface that you can implement to create your own allocator, and functions that allocate usually take an Allocator as a parameter.

In this article, I will use two allocators:

std.heap.GeneralPurposeAllocator: a general-purpose allocator with safety checks such as leak detection. We could have used std.heap.page_allocator or std.heap.c_allocator, but for the sake of simplicity, I will use the GeneralPurposeAllocator.
std.heap.ArenaAllocator: an allocator that groups allocations in arenas. This is useful when you want to deallocate a group of allocations at once, like the arguments of the program.
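The "arguments of the program" case boils down to many small allocations that share one lifetime. A minimal sketch of that pattern, with a hypothetical copyAll helper and plain strings standing in for the real arguments:

```zig
const std = @import("std");

/// Hypothetical helper: copies every string into an arena and returns the
/// total number of bytes copied. The copies are never freed one by one:
/// `arena.deinit()` releases all of them at once when the function returns.
fn copyAll(child: std.mem.Allocator, strings: []const []const u8) !usize {
    var arena = std.heap.ArenaAllocator.init(child);
    defer arena.deinit();
    const allocator = arena.allocator();

    var total: usize = 0;
    for (strings) |s| {
        const copy = try allocator.dupe(u8, s); // no matching free()
        total += copy.len;
    }
    return total;
}
```

With real arguments, std.process.argsAlloc(arena.allocator()) fits the same pattern: no argsFree is needed, since the arena owns the memory.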

In Zig, the GeneralPurposeAllocator can be used to detect memory leaks:

```zig
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
const gpa_allocator = gpa.allocator();

pub fn main() !void {
    // Detect memory leaks
    defer std.debug.assert(gpa.deinit() == .ok);
    // Your code here
}
```

Oh yeah, Zig has a defer statement, which is quite similar to Go's defer, except that it is scoped to the enclosing block (the curly braces), whereas Go's defer is scoped to the function. And .ok is an enum value: gpa.deinit() returns .leak if memory was leaked.

About the gpa declaration: I'm calling a function with an "empty" struct as an argument, and that function returns a type. I'm quoting "empty" because Zig struct fields can have default values, which is quite useful: .{} is a configuration where every field keeps its default.

To instantiate the type std.heap.GeneralPurposeAllocator(.{}), we add curly braces after the type name:

```zig
const MyStruct = struct {
    a: u8 = 0, // default values make `MyStruct{}` valid
    b: u8 = 0,
};

const object = MyStruct{};
const object2: MyStruct = .{}; // Equivalent
```
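The same mechanism explains GeneralPurposeAllocator(.{}): a function receives a comptime configuration struct whose fields have defaults, and returns a new type. A minimal sketch with a hypothetical FixedBuffer and Config (not from the standard library):

```zig
const std = @import("std");

/// Hypothetical configuration: every field has a default value, so `.{}`
/// is a valid (not really empty) configuration.
const Config = struct {
    capacity: usize = 4,
};

/// A function that takes a comptime configuration and returns a type,
/// like std.heap.GeneralPurposeAllocator(.{}).
fn FixedBuffer(comptime config: Config) type {
    return struct {
        items: [config.capacity]u8 = undefined,
        len: usize = 0,

        const Self = @This();

        fn push(self: *Self, byte: u8) !void {
            if (self.len == config.capacity) return error.Full;
            self.items[self.len] = byte;
            self.len += 1;
        }
    };
}
```

FixedBuffer(.{}){} uses the default capacity, while FixedBuffer(.{ .capacity = 1 }){} overrides only that field.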

Easy memory management with `defer`

The best feature of Zig is the defer statement because it completes the "flow" of the function. Similar to Go, defer can be used to clean up resources at the end of the function:

```zig
fn do(allocator: std.mem.Allocator, array_size: usize) !void {
    const array = try allocator.alloc(i64, array_size);
    defer allocator.free(array);

    // Your code here
}
```

Compared to C:

```c
int do() {
    int ret = 0;
    int *array = malloc(array_size * sizeof(int));
    if (array == NULL) {
        ret = 1;
        goto end;
    }

    // Your code here

end:
    if (array != NULL) free(array);
    return ret;
}
```

But one issue I've had is that, since defer is scoped to the curly braces, it's almost unusable inside if statements:

```zig
var data: ?[]u8 = null;
if (first_file) {
    data = try allocator.alloc(u8, data_size);
    defer allocator.free(data.?);
} // Freed here, at the end of the if block
```

Zig has errdefer, which is "almost" what I want, but it only triggers when an error is returned:

```zig
var data: ?[]u8 = null;
if (first_file) {
    data = try allocator.alloc(u8, data_size);
    errdefer allocator.free(data.?);
}

return; // errdefer is not triggered: no error returned.
```

It would be nice to have an actual equivalent of Go's defer in Zig.
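A workaround that gets close to Go's behavior is to declare the optional outside the if and register the defer at function scope, unwrapping it at cleanup time. A minimal sketch, where processFile and its parameters are hypothetical:

```zig
const std = @import("std");

/// Hypothetical example: `data` is only allocated for the first file, but
/// must stay alive until the end of the function.
fn processFile(allocator: std.mem.Allocator, first_file: bool, data_size: usize) !usize {
    var data: ?[]u8 = null;
    // Registered at function scope: it runs when the function returns,
    // and only frees the buffer if it was actually allocated.
    defer if (data) |d| allocator.free(d);

    if (first_file) {
        data = try allocator.alloc(u8, data_size);
        @memset(data.?, 0);
    }

    // ... the buffer is still alive here ...
    return if (data) |d| d.len else 0;
}
```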

Error handling

Zig's error handling is similar to Go's, but slightly more primitive. To compare, in C:

```c
int my_function() {
    if (error_condition) {
        return -1;
    }
    return 0;
}
```

Go:

```go
func myFunction() error {
    if errorCondition {
        return errors.New("my error")
    }
    return nil
}
```

Zig:

```zig
fn my_function() !void {
    if (error_condition) {
        return error.MyError;
    }
}
```

In Zig, errors are not implementations of an error interface like in Go: they are values of error sets, which behave much like enums. Error sets can be subsets and supersets of each other, which is somewhat confusing at first, but quite powerful:

```zig
const std = @import("std");

const FileOpenError = error{
    AccessDenied,
    OutOfMemory,
    FileNotFound,
};

const AllocationError = error{
    OutOfMemory,
};

test "coerce subset to superset" {
    const err = subset_to_superset(AllocationError.OutOfMemory);
    try std.testing.expect(err == FileOpenError.OutOfMemory);
}

fn subset_to_superset(err: AllocationError) FileOpenError {
    return err;
}
```

anyerror is the global error set: it is the superset of all error sets.

As you can see, error handling in Zig is flexible, but it has one downside: an error carries no value (no message, no custom data). However, Zig records error return traces (enabled by default in Debug and ReleaseSafe builds), which show where an error was returned and how it propagated. This is quite useful for debugging and can reduce the need to pass custom data in the error.
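When an error does need context, a common Zig convention is to pass a "diagnostics" struct that the callee fills in before returning the error. A minimal sketch with a hypothetical parseDigits function and Diagnostics struct:

```zig
const std = @import("std");

/// Hypothetical diagnostics struct: since Zig errors carry no payload, the
/// caller passes a struct that the callee fills in before returning an error.
const Diagnostics = struct {
    index: usize = 0,
    message: []const u8 = "",
};

const ParseError = error{ InvalidCharacter, Overflow };

/// Hypothetical parser for an unsigned decimal number.
fn parseDigits(input: []const u8, diag: *Diagnostics) ParseError!u32 {
    var value: u32 = 0;
    for (input, 0..) |ch, i| {
        if (ch < '0' or ch > '9') {
            diag.* = .{ .index = i, .message = "expected a decimal digit" };
            return error.InvalidCharacter;
        }
        // std.math.mul/add return error.Overflow instead of wrapping.
        value = try std.math.mul(u32, value, 10);
        value = try std.math.add(u32, value, ch - '0');
    }
    return value;
}
```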

Zig's quick error handling

In Go, we often have this pattern:

```go
func myFunction() error {
    if err := someFunction(); err != nil {
        return err
    }
    return nil
}
```

In Zig, we can use the try keyword to return an error immediately:

```zig
fn my_function() !void {
    try some_function();
}
```

This helps to streamline the error handling in Zig:

```zig
fn complex_function() !void {
    try handle_data(try fetch_data());
}
```
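try is shorthand for catch |err| return err, and catch can also recover with a fallback value instead of propagating the error. A minimal sketch comparing them, with a hypothetical fetchData function:

```zig
const std = @import("std");

/// Hypothetical fallible function used by the examples below.
fn fetchData(ok: bool) !u32 {
    if (!ok) return error.Unavailable;
    return 42;
}

/// `try x` propagates the error to the caller...
fn withTry(ok: bool) !u32 {
    return try fetchData(ok);
}

/// ...exactly like `x catch |err| return err`.
fn withCatch(ok: bool) !u32 {
    return fetchData(ok) catch |err| return err;
}

/// `catch` can also swallow the error and use a fallback value.
fn withDefault(ok: bool) u32 {
    return fetchData(ok) catch 0;
}
```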

C interop

Zig ships with its own C compiler (based on Clang) and its own C-to-Zig translator. To include a C library in Zig, you use the @cImport builtin:

```zig
const c = @cImport({
    @cInclude("libavcodec/avcodec.h");
    @cInclude("libavformat/avformat.h");
    @cInclude("libavutil/avutil.h");
});

pub fn main() !void {
    c.call_some_c_function();
}
```

Which is quite similar to Go:

```go
/*
#cgo pkg-config: libavformat libavcodec libavutil
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavutil/avutil.h>
*/
import "C"

func main() {
    C.call_some_c_function()
}
```

Instead of cgo's pkg-config directive, Zig has its own build system: you create a build.zig and declare the C libraries you want to link (by default, linkSystemLibrary2 queries pkg-config for each library):

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    const exe = b.addExecutable(.{
        .name = "av1-transcoder",
        .root_source_file = b.path("src/main.zig"),
        .target = target,
        .optimize = optimize,
        .link_libc = true,
    });

    exe.addIncludePath(.{
        .src_path = .{ .owner = b, .sub_path = "src" },
    });
    exe.linkSystemLibrary2("libavcodec", .{ .preferred_link_mode = .static });
    exe.linkSystemLibrary2("libavutil", .{ .preferred_link_mode = .static });
    exe.linkSystemLibrary2("libavformat", .{ .preferred_link_mode = .static });
    exe.linkSystemLibrary2("swresample", .{ .preferred_link_mode = .static });
    exe.linkSystemLibrary2("SvtAv1Dec", .{ .preferred_link_mode = .static });
    exe.linkSystemLibrary2("SvtAv1Enc", .{ .preferred_link_mode = .static });

    b.installArtifact(exe);
}
```

Your IDE/LSP won't be able to detect the C symbols at first, but after compiling the project, it will be able to detect them.

But one difference stands out: Zig has fewer "bridges" between Zig and C, which makes the code more readable than Go's:

```zig
// In the .zig-cache directory, there is the translation of the C library to Zig
pub extern fn avformat_open_input(ps: [*c][*c]AVFormatContext, url: [*c]const u8, fmt: [*c]const AVInputFormat, options: [*c]?*AVDictionary) c_int;

// Usage
fn test_avformat_open_input(input: [*:0]const u8) c_int {
    var ifmt_ctx: ?*c.AVFormatContext = null;
    return c.avformat_open_input(&ifmt_ctx, input, null, null);
}
```

Go:

```go
// This is stored in the $HOME/.cache/go-build directory. The translation is not readable.
//go:cgo_unsafe_args
func _Cfunc_avformat_open_input(p0 **_Ctype_struct_AVFormatContext, p1 *_Ctype_char, p2 *_Ctype_struct_AVInputFormat, p3 **_Ctype_struct_AVDictionary) (r1 _Ctype_int) {
	_cgo_runtime_cgocall(_cgo_fa42a779fc4c_Cfunc_avformat_open_input, uintptr(unsafe.Pointer(&p0)))
	if _Cgo_always_false {
		_Cgo_use(p0)
		_Cgo_use(p1)
		_Cgo_use(p2)
		_Cgo_use(p3)
	}
	return
}

// Usage (the cgo preamble must also #include <stdlib.h> for C.free)
func testAVFormatOpenInput(input string) C.int {
	var ifmt *C.AVFormatContext = nil
	cInput := C.CString(input) // allocated with malloc: must be freed
	defer C.free(unsafe.Pointer(cInput))
	return C.avformat_open_input(&ifmt, cInput, nil, nil)
}
```

Oh wait, there's more! Here, Zig has multiple features that make the code safer:

[*:0]const u8 is a many-item pointer ([*]const u8) to data terminated by a 0 sentinel (the :0 part). In other words, this is a C string. Zig strings are usually slices ([]const u8), which carry their length, so they don't need a terminator or pointer manipulation.
?*c.AVFormatContext is an optional (nullable) pointer to a single c.AVFormatContext struct. Zig has basic null safety: the pointer must be unwrapped (for example with .?) before use. Since C has no null safety, translated C signatures use C pointers instead, such as [*c]c.AVFormatContext, which can be null.
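To make the difference between Zig slices and C strings concrete, here is a minimal sketch in pure Zig. cStyleLength is a hypothetical stand-in for a C function taking a const char *; in the transcoder, the same conversion would apply before passing a runtime path to c.avformat_open_input:

```zig
const std = @import("std");

/// Hypothetical stand-in for a C function taking `const char *`.
/// std.mem.span walks to the 0 sentinel, like strlen().
fn cStyleLength(s: [*:0]const u8) usize {
    return std.mem.span(s).len;
}

/// A runtime slice has no terminator, so it must be copied into a
/// sentinel-terminated buffer before being handed to C.
fn passSliceToC(allocator: std.mem.Allocator, path: []const u8) !usize {
    const c_path = try allocator.dupeZ(u8, path); // [:0]u8, one extra byte for the 0
    defer allocator.free(c_path);
    return cStyleLength(c_path.ptr);
}
```

String literals are already 0-terminated, so they coerce to [*:0]const u8 directly; only runtime slices need the copy.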

Both languages suffer from one major issue: the comments are not passed to the translation, which means that deprecation notices or warnings are not passed to the Zig/Go code.

Overall, Zig has a slightly better C interop than Go.

Limitations of the C interop

Zig has some limitations when it comes to C interop:

Some macros are translated to Zig, but not all of them. You may have to write the Zig equivalent of the macro:
```
pub const av_err2str = @compileError("unable to translate C expr: expected ')' instead got '['");
```
The worst issue that I had is due to the strictness of Zig's type system. Some macros do not translate well between u64, c_int, usize... This is quite a pain, because FFmpeg (libavutil) uses a macro to define errors at compile time.
Const hell. Zig tracks const at the pointer level (the pointed-to value is immutable) and at the variable level (the struct itself is immutable). However, Zig is quite picky when passing a const pointer to a C function whose parameters are not declared const (the library developers' fault):
```
c.av_guess_frame_rate(@constCast(ifmt_ctx), @constCast(in_stream), null);
```
Technically, av_guess_frame_rate could accept const pointers, since it does not modify its arguments, but its C prototype declares them non-const, hence the @constCast.
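For the error macros, one option is to port them by hand. A sketch of libavutil's MKTAG and FFERRTAG in Zig, plus a hypothetical mapping from return codes to a Zig error set, named after the AVError and ret_to_error used in the transcoder code (their real definitions are not shown, so this is only a guess at their shape). It assumes Linux, where EAGAIN is 11:

```zig
const std = @import("std");

/// Port of libavutil's MKTAG(a, b, c, d): packs four bytes into a 32-bit tag,
/// with `a` in the lowest byte.
fn mkTag(a: u8, b: u8, c: u8, d: u8) u32 {
    return @as(u32, a) | (@as(u32, b) << 8) | (@as(u32, c) << 16) | (@as(u32, d) << 24);
}

/// Port of FFERRTAG(a, b, c, d), defined in C as -(int)MKTAG(a, b, c, d).
/// Assumes a 32-bit c_int (@bitCast refuses to compile otherwise).
fn ffErrTag(a: u8, b: u8, c: u8, d: u8) c_int {
    return -@as(c_int, @bitCast(mkTag(a, b, c, d)));
}

/// Evaluated at compile time, like the C macro.
const AVERROR_EOF: c_int = ffErrTag('E', 'O', 'F', ' ');

/// Linux value of EAGAIN (assumption: other platforms may differ).
/// AVERROR(e) is simply -(e) for POSIX error codes.
const EAGAIN: c_int = 11;

/// Hypothetical error set and helper for negative libav return codes.
const AVError = error{ EAGAIN, EOF, Unknown };

fn retToError(ret: c_int) AVError {
    return switch (ret) {
        -EAGAIN => error.EAGAIN,
        AVERROR_EOF => error.EOF,
        else => error.Unknown,
    };
}
```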

Visibility

Zig visibility is scoped to the file, which is more similar to Python or C than Go. This forces you to mostly develop in a single file, which is quite a pain when you have a large project.

To export a function, you have to use the pub keyword:

1pub fn my_function() void {
2    // Your code here
3}

To import a function, you have, well..., to import it:

1const my_module = @import("my_module.zig");
2
3pub fn main() void {
4    my_module.my_function();
5}

Personally, I prefer Go's visibility, which is scoped to the package and makes it easy to separate responsibilities.

I mean, just look at the std package in Zig. It's quite a mess (for example, general_purpose_allocator.zig). Tests are also in the same file, which, I guess, is fine.

The reason why I think Zig is messier than C and Python is that C's headers explicitly indicate what is exported and what is not. And Python doesn't really have a visibility system, but it's quite easy to understand what is exported simply by looking at variable and function names.

Overall, Zig's visibility tries to be the best of both worlds: everything is private to its file like in C, but without the hassle of header files, at the cost of large, messy files. I hope there will be some style guidelines in the future.

Syntax

Zig's syntax has a lot of quality of life improvements compared to C and Go. I won't go too much into detail, but here are some examples:

Dereferencing and null-safety chaining:

```zig
my_ptr.?.*.another_ptr.?.*.nullable_struct.?.ptr.*
```

Go equivalent:

```go
if myPtr != nil && myPtr.anotherPtr != nil && myPtr.anotherPtr.nullableStruct != nil {
    myPtr.anotherPtr.nullableStruct.ptr // Implicit dereference
}
```

C equivalent:

```c
if (my_ptr != NULL && my_ptr->another_ptr != NULL && my_ptr->another_ptr->nullable_struct != NULL) {
    *(my_ptr->another_ptr->nullable_struct->ptr);
}
```

For loops can use ranges and iterate over several arrays or slices at once (like zip).
Etc...
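A minimal sketch of these loops, with hypothetical helper functions:

```zig
const std = @import("std");

/// Iterates two slices at once; their lengths must match
/// (checked at runtime in safe build modes).
fn dot(a: []const i64, b: []const i64) i64 {
    var total: i64 = 0;
    for (a, b) |x, y| total += x * y;
    return total;
}

/// Ranges: `0..n` yields 0, 1, ..., n - 1.
fn sumBelow(n: usize) usize {
    var total: usize = 0;
    for (0..n) |i| total += i;
    return total;
}

/// A slice combined with an open range gives the index of each element.
fn indexOf(haystack: []const u8, needle: u8) ?usize {
    for (haystack, 0..) |ch, i| {
        if (ch == needle) return i;
    }
    return null;
}
```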

Developing the AV1 transcoder

Developing the Remuxer

I had some issues developing with the libav* libraries due to the lack of resources. But after a while, I was able to do it in Zig without any wrapper.

To transcode a video, you must first think about remuxing it: reading packets and writing them into a new container. Remuxing is relatively easy, since the packets are copied without being decoded or re-encoded.

The steps are:

Open the input video file.
Open the output video file.
Demux the input video file: read packets (av_read_frame) from the input video file in a while loop.
Process the packet: rescale the timestamps of the packet, and fix possible discontinuities.
Mux the packets: write the packets to the output video file (av_interleaved_write_frame).
Close the input and output video files. Clean up everything.
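The rescaling step converts each timestamp from the input stream's time base to the output stream's, which is what av_rescale_q and av_packet_rescale_ts do. A simplified pure-Zig stand-in shows the arithmetic (rescaleQ and Rational are hypothetical, and this is not FFmpeg's implementation):

```zig
const std = @import("std");

/// Mirror of FFmpeg's AVRational (num/den), redeclared so the sketch runs
/// without libavutil.
const Rational = struct { num: i64, den: i64 };

/// Simplified stand-in for av_rescale_q(a, bq, cq): computes
/// a * bq.num * cq.den / (bq.den * cq.num), rounded to the nearest integer
/// with ties away from zero (like AV_ROUND_NEAR_INF).
/// Unlike FFmpeg, it ignores AV_NOPTS_VALUE and relies on i128 to avoid overflow.
fn rescaleQ(a: i64, bq: Rational, cq: Rational) i64 {
    const num: i128 = @as(i128, a) * bq.num * cq.den;
    const den: i128 = @as(i128, bq.den) * cq.num;
    // Round half away from zero (assumes den > 0).
    const half = @divTrunc(den, 2);
    const rounded = if (num >= 0) @divTrunc(num + half, den) else @divTrunc(num - half, den);
    return @intCast(rounded);
}
```

In the real remuxer, av_packet_rescale_ts(pkt, in_stream.time_base, out_stream.time_base) applies this conversion to the packet's pts, dts and duration at once.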

The example given by the FFmpeg documentation is quite accurate (minus the mpegts discontinuities fix)

I won't go too much into details, but here are the nice parts that Zig improves:

No more goto, no more dangling ret. Zig's defer is powerful enough to handle most of the memory management:

```zig
// Using Zig's allocator
const stream_mapping = try allocator.alloc(i64, stream_mapping_size);
defer allocator.free(stream_mapping);

// Using C's (libav) allocator
var enc_ctx = c.avcodec_alloc_context3(enc);
if (enc_ctx == null) {
  // Error handling
}
// avcodec_free_context takes a pointer to the pointer, so enc_ctx must be var
defer c.avcodec_free_context(&enc_ctx);
```

Slightly object-oriented, to bind multiple lifecycles into one:

```zig
const Context = struct {
  stream_mapping: []i64,
  dts_offset: []i64,
  // ...
  allocator: std.mem.Allocator,

  fn init(allocator: std.mem.Allocator, size: usize) !Context {
    const stream_mapping = try allocator.alloc(i64, size);
    // Only runs if a later allocation fails, so nothing leaks on error.
    errdefer allocator.free(stream_mapping);
    const dts_offset = try allocator.alloc(i64, size);
    // ...
    return .{
      .stream_mapping = stream_mapping,
      .dts_offset = dts_offset,
      // ...
      .allocator = allocator,
    };
  }

  fn deinit(self: *Context) void {
    self.allocator.free(self.stream_mapping);
    self.allocator.free(self.dts_offset);
    // ...
  }
};
```
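A cut-down, runnable variant of this Context, with a hypothetical countMapped caller, shows how a single defer releases every buffer the struct owns:

```zig
const std = @import("std");

/// Cut-down Context: two buffers sharing one lifecycle.
const Context = struct {
    stream_mapping: []i64,
    dts_offset: []i64,
    allocator: std.mem.Allocator,

    fn init(allocator: std.mem.Allocator, size: usize) !Context {
        const stream_mapping = try allocator.alloc(i64, size);
        errdefer allocator.free(stream_mapping); // undo if the next alloc fails
        const dts_offset = try allocator.alloc(i64, size);
        @memset(stream_mapping, -1); // -1: stream not mapped
        @memset(dts_offset, 0);
        return .{ .stream_mapping = stream_mapping, .dts_offset = dts_offset, .allocator = allocator };
    }

    fn deinit(self: *Context) void {
        self.allocator.free(self.stream_mapping);
        self.allocator.free(self.dts_offset);
    }
};

/// Hypothetical caller: one `defer` cleans up everything the context owns.
fn countMapped(allocator: std.mem.Allocator, nb_streams: usize) !usize {
    var ctx = try Context.init(allocator, nb_streams);
    defer ctx.deinit();

    ctx.stream_mapping[0] = 0; // map the first input stream to output stream 0
    var mapped: usize = 0;
    for (ctx.stream_mapping) |m| {
        if (m >= 0) mapped += 1;
    }
    return mapped;
}
```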

Now, let's develop the transcoding side of the program.

Developing the Transcoder

Transcoding a video adds 6 steps:

Initializing the decoder.
Initializing the encoder.
Add a while loop to decode frames.
Add a while loop to encode frames.
Flush the decoder.
Flush the encoder (after the decoder, since the decoder's last frames still have to be encoded).

The steps also include fixing the timestamps and the frame rate.

You can pretty much use the example given by the FFmpeg documentation (minus the filters).

The code looks like this:

```zig
// .. First loop, in the remuxer
while (true) {
    ret = c.av_read_frame(ifmt_ctx, pkt);
    if (ret < 0) {
        // No more packets
        break;
    }
    defer c.av_packet_unref(pkt);

    const in_stream_index = @as(usize, @intCast(pkt.stream_index));

    // Packet is blacklisted
    if (in_stream_index >= stream_mapping_size or stream_mapping[in_stream_index] < 0) {
        continue;
    }
    const out_stream_index = @as(usize, @intCast(stream_mapping[in_stream_index]));
    pkt.stream_index = @as(c_int, @intCast(out_stream_index));

    const stream_ctx = stream_ctxs[out_stream_index];

    try stream_ctx.fix_discontinuity_ts(pkt);

    // Input to decoder timebase
    try stream_ctx.transcode_write_frame(pkt);
} // while packets.

// ...

// Second loop, in the decoder (stream_ctx)
fn transcode_write_frame(self: StreamContext, pkt: ?*c.AVPacket) !void {
    // Send packet to decoder
    try self.decoder.send_packet(pkt);

    while (true) {
        // Fetch decoded frame from decoded packet
        const frame = self.decoder.receive_frame() catch |e| switch (e) {
            AVError.EAGAIN => return,
            AVError.EOF => return,
            else => return e,
        };
        defer c.av_frame_unref(frame);

        frame.*.pts = frame.*.best_effort_timestamp;

        if (frame.*.pts != c.AV_NOPTS_VALUE) {
            frame.*.pts = c.av_rescale_q(frame.*.pts, self.decoder.dec_ctx.?.*.pkt_timebase, self.encoder.enc_ctx.?.*.time_base);
        }

        try self.encode_write_frame(frame);
    }
}

// Third loop, in the encoder (also in stream_ctx)
fn encode_write_frame(self: StreamContext, dec_frame: ?*c.AVFrame) !void {
    self.encoder.unref_pkt();

    try self.encoder.send_frame(dec_frame);

    while (true) {
        // Read encoded data from the encoder.
        const pkt = self.encoder.receive_packet() catch |e| switch (e) {
            AVError.EAGAIN => return,
            AVError.EOF => return,
            else => return e,
        };

        // Remux the packet
        pkt.stream_index = @as(c_int, @intCast(self.stream_index));

        // Encoder to output timebase
        c.av_packet_rescale_ts(pkt, self.encoder.enc_ctx.?.*.time_base, self.out_stream.*.time_base);

        try self.fix_monotonic_ts(pkt);

        // Write packet
        const ret = c.av_interleaved_write_frame(self.ofmt_ctx, pkt);
        if (ret < 0) {
            err.print("av_interleaved_write_frame", ret);
            return ret_to_error(ret);
        }
    }
}
```

Or in simple words:

Read a packet from the input video file by calling av_read_frame.
Send packet to the decoder by calling avcodec_send_packet.
Receive a frame from the decoder by calling avcodec_receive_frame.
Send frame to the encoder by calling avcodec_send_frame.
Receive a packet from the encoder by calling avcodec_receive_packet.
Write the packet to the output video file by calling av_interleaved_write_frame.
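The inner loops stop on EAGAIN (the codec needs more input) and EOF (the codec is fully drained). Flushing reuses the same functions: sending a null packet to avcodec_send_packet, or a null frame to avcodec_send_frame, puts the codec in draining mode, and the receive calls return AVERROR_EOF once everything has been output. A toy model of that contract in pure Zig (ToyCodec and sendAndDrain are hypothetical, not FFmpeg APIs):

```zig
const std = @import("std");

/// Hypothetical toy codec modelling libavcodec's send/receive contract:
/// it holds back up to `delay` inputs, answers error.EAGAIN when it needs
/// more input, and error.EOF once it has been flushed and fully drained.
const ToyCodec = struct {
    delay: usize,
    queued: usize = 0,
    flushing: bool = false,

    /// Like avcodec_send_packet/avcodec_send_frame: null starts draining.
    fn send(self: *ToyCodec, input: ?u32) void {
        if (input == null) {
            self.flushing = true;
        } else {
            self.queued += 1;
        }
    }

    /// Like avcodec_receive_frame/avcodec_receive_packet.
    fn receive(self: *ToyCodec) error{ EAGAIN, EOF }!void {
        if (self.queued == 0) {
            if (self.flushing) return error.EOF;
            return error.EAGAIN;
        }
        if (!self.flushing and self.queued <= self.delay) return error.EAGAIN;
        self.queued -= 1;
    }
};

/// Same shape as transcode_write_frame: send one input (or null to flush),
/// then receive until the codec answers EAGAIN or EOF.
fn sendAndDrain(codec: *ToyCodec, input: ?u32) usize {
    codec.send(input);
    var produced: usize = 0;
    while (true) {
        codec.receive() catch return produced; // EAGAIN or EOF: stop draining
        produced += 1;
    }
}
```

Without the final null input, the last buffered frames would never come out, which is why both codecs must be flushed.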

And that's it! You have a video transcoder in Zig. (You'll also need to fix the timestamps and discontinuities, but that's another story).

Last part: the build system

The build.zig file that I've given earlier is enough to build the project. With preferred_link_mode = .static, Zig links the FFmpeg and SVT-AV1 libraries statically, which makes the executable more portable.

Oh, and you'll need to fork the SVT-AV1 ebuild to support the static-libs USE flag:

```bash
# svt-av1-9999.ebuild, using Gentoo's Portage to build the package

multilib_src_configure() {
  append-ldflags -Wl,-z,noexecstack

  local mycmakeargs=(
    -DBUILD_TESTING=OFF
    -DCMAKE_OUTPUT_DIRECTORY="${BUILD_DIR}"
    -DBUILD_SHARED_LIBS="$(usex static-libs OFF ON)" # Enable static libraries based on the USE flag "static-libs"
  )

  [[ ${ABI} != amd64 ]] && mycmakeargs+=(-DCOMPILE_C_ONLY=ON)

  cmake_src_configure
}
```

However, I have one MAJOR issue: libc is not statically linked, which means I'm unable to create a distroless Docker image. It would have been nice to have a preferred_link_mode for libc.

When disabling link_libc, the executable does not compile since symbols are missing (even with linkSystemLibrary2("c", .{ .preferred_link_mode = .static })). I expected Zig to link libc statically, as it does when targeting musl, but with the native glibc target, libc is linked dynamically.

To force the static linking, you can enable .linkage = .static in the addExecutable function. And instead of using linkSystemLibrary2, you can use addObjectFile. The issue with this technique is that you have to do everything manually, and you cannot use pkg-config to find the missing includes and libraries:

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    const exe = b.addExecutable(.{
        .name = "av1-transcoder",
        .root_source_file = b.path("src/main.zig"),
        .target = target,
        .optimize = optimize,
        .linkage = .static,
        .link_libc = true,
    });

    exe.addIncludePath(.{
        .src_path = .{ .owner = b, .sub_path = "src" },
    });
    exe.addIncludePath(.{
        .src_path = .{ .owner = b, .sub_path = "/usr/include" },
    });
    exe.addObjectFile(.{ .src_path = .{
        .owner = b,
        .sub_path = "/usr/lib/libavcodec.a",
    } });
    exe.addObjectFile(.{ .src_path = .{
        .owner = b,
        .sub_path = "/usr/lib/libavutil.a",
    } });
    exe.addObjectFile(.{ .src_path = .{
        .owner = b,
        .sub_path = "/usr/lib/libavformat.a",
    } });
    exe.addObjectFile(.{ .src_path = .{
        .owner = b,
        .sub_path = "/usr/lib/libswresample.a",
    } });
    exe.addObjectFile(.{ .src_path = .{
        .owner = b,
        .sub_path = "/usr/lib/libSvtAv1Dec.a",
    } });
    exe.addObjectFile(.{ .src_path = .{
        .owner = b,
        .sub_path = "/usr/lib/libSvtAv1Enc.a",
    } });

    b.installArtifact(exe);
}
```

At this point, the generated artifact is a static executable:

```shell
$ ldd ./zig-out/bin/av1-transcoder
ldd: ./zig-out/bin/av1-transcoder: Not a valid dynamic program
```

Yay!

Small issue: Because the paths are hardcoded, it will be quite difficult to cross-compile the project at the moment.
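One way to soften this, sketched here under assumptions, is to turn the hardcoded directories into build options and loop over the archive names. The option names lib-dir and include-dir are hypothetical, and you would still need headers and static archives built for the target (for example, from a sysroot):

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    // Hypothetical options, e.g. `zig build -Dlib-dir=/path/to/sysroot/lib`.
    const lib_dir = b.option([]const u8, "lib-dir", "Directory with the static libav*/SVT-AV1 archives") orelse "/usr/lib";
    const include_dir = b.option([]const u8, "include-dir", "Directory with the libav* headers") orelse "/usr/include";

    const exe = b.addExecutable(.{
        .name = "av1-transcoder",
        .root_source_file = b.path("src/main.zig"),
        .target = target,
        .optimize = optimize,
        .linkage = .static,
        .link_libc = true,
    });
    exe.addIncludePath(b.path("src"));
    // cwd_relative also accepts absolute paths.
    exe.addIncludePath(.{ .cwd_relative = include_dir });

    // Same archives, in the same order, as the hardcoded version.
    const archives = [_][]const u8{
        "libavcodec.a",    "libavutil.a",    "libavformat.a",
        "libswresample.a", "libSvtAv1Dec.a", "libSvtAv1Enc.a",
    };
    for (archives) |archive| {
        exe.addObjectFile(.{ .cwd_relative = b.pathJoin(&.{ lib_dir, archive }) });
    }

    b.installArtifact(exe);
}
```

Without the options, this behaves like the hardcoded version; pkg-config support is still missing.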

Conclusion

Zig C interop is almost impeccable, and at least way better than Go's. Symbols are directly translated to Zig, and the memory management is quite easy to handle. The Zig API plugs well with the C API, making C code slightly safer.

However, the build system, while quite powerful, lacks flexibility around libc linking and pkg-config support. Perhaps sticking to a Makefile would be better for now.

Overall, Zig shows potential for low-level programming, or at least for developing dynamic libraries. Its features and syntax complement C very well. But I would still recommend C++ if you want to develop stable, production-ready software. While C++ is complex due to the richness of the language, it offers its own kind of safety (smart pointers), which can help you avoid memory leaks and dangling pointers.

Lastly, because Zig is still in development, it lacks high-level libraries and frameworks, which limits its use in production (no gRPC, for example).

So, to conclude, I will use Zig for competitive programming and for basic API like my AV1 transcoder bot. But for production, I will stick to Go.
