Box64 vs FEX emulation performance

This is a quick comparison of Box64 and FEX, two x86_64 (AMD64) emulators designed to run x86 binaries on the ARM architecture.
The test compares the speed of emulated printer drivers against their native ARM64 builds.

We'll test two popular open-source drivers: foo2zjs (HP LaserJet raster filter) and splix (Samsung SCX raster filter).

Both drivers were downloaded in binary form directly from the ARM64 and AMD64 Debian 13 (Trixie) repositories, which means they were built without special compilation flags or optimizations for a specific CPU core.
Box64 and FEX, however, were compiled for the ARMv8(.0) Cortex-A53, with VFPv4 and NEON, specifically for the UoWPrint print server.

Cortex-A53 is an energy-efficient core from ARM with a 2-way superscalar, in-order execution pipeline.
The Allwinner H618 SoC used in the print server has 1 MB of L2 cache.


foo2zjs

foo2zjs is a single-threaded binary written in C that accepts the netpbm 1-bit black-and-white file format and has a single dependency: libJBIG.
We're testing raw emulation performance: neither emulator has a wrapper/thunk for libJBIG, so no calls are redirected to the native version of the library.

Test outline:

  1. Convert 4-page document-a4.pdf file to 1-bit PBM file for foo2zjs (1200×600 DPI)
  2. Run native ARM64 foo2zjs, get the execution time
  3. Run x86_64 foo2zjs via box64 with the stock settings
  4. Run x86_64 foo2zjs via box64 with BOX64_DYNAREC_FORWARD=512 (larger JIT code block size) and BOX64_DYNAREC_CALLRET=1 (try to optimize CALL/RET, skipping the jump table when possible)
  5. Run x86_64 foo2zjs via FEX, stock settings

The output of each command is SHA1-hashed to ensure byte-for-byte identical output.
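
The outline above can be scripted roughly as follows. This is only a sketch: the helper names hash_output and run_case are made up for illustration, and the Ghostscript command in the comment is an assumption, since the conversion tool is not named here. The bash `time` keyword is used so that the driver and sha1sum are timed together.

```shell
# Assumed conversion step (tool not stated in this write-up):
#   gs -q -dNOPAUSE -dBATCH -sDEVICE=pbmraw -r1200x600 \
#      -sOutputFile=document-a4.pbm document-a4.pdf

# Run a command and hash whatever it writes to stdout.
hash_output() {
    "$@" | sha1sum
}

# Print a header, then time the command together with the hashing step.
# `time` is a bash keyword here, so this function needs bash.
run_case() {
    label=$1
    shift
    echo "=== Running ${label} ==="
    time hash_output "$@"
}

# Example calls, using the paths from the transcripts:
#   run_case "native ARM64 foo2zjs" /home/user/arm64/usr/bin/foo2zjs document-a4.pbm
#   BOX64_DYNAREC_CALLRET=1 BOX64_DYNAREC_FORWARD=512 \
#       run_case "x86_64 foo2zjs in box64, optimized settings" \
#       box64 /home/user/x86_64/usr/bin/foo2zjs document-a4.pbm
```

Identical hashes across all runs mean the emulated binaries produced exactly the same printer data as the native one.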

=== Running native ARM64 foo2zjs ===
+ /home/user/arm64/usr/bin/foo2zjs document-a4.pbm
+ sha1sum
5835de22b8e0ebf9b32b41e1a83647ee740d8b74  -

real    0m7.803s
user    0m7.683s
sys     0m0.138s

=== Running x86_64 foo2zjs in box64, stock settings ===
+ box64 /home/user/x86_64/usr/bin/foo2zjs document-a4.pbm
+ sha1sum
5835de22b8e0ebf9b32b41e1a83647ee740d8b74  -

real    0m26.086s
user    0m25.976s
sys     0m0.126s

=== Running x86_64 foo2zjs in box64, optimized settings ===
+ BOX64_DYNAREC_CALLRET=1
+ BOX64_DYNAREC_FORWARD=512
+ box64 /home/user/x86_64/usr/bin/foo2zjs document-a4.pbm
+ sha1sum
5835de22b8e0ebf9b32b41e1a83647ee740d8b74  -

real    0m19.246s
user    0m19.121s
sys     0m0.141s

=== Running x86_64 foo2zjs in FEX ===
+ FEX /home/user/x86_64/usr/bin/foo2zjs document-a4.pbm
+ sha1sum
5835de22b8e0ebf9b32b41e1a83647ee740d8b74  -

real    0m21.231s
user    0m20.101s
sys     0m0.101s

Box64 is slower than FEX in the stock configuration (26.1 s vs 21.2 s); however, with a bit of tuning it beats FEX by 2 seconds.
Both emulators are still far from the native version, though: the slowdown is about ×2.5 for tuned Box64, ×2.7 for FEX, and ×3.3 for stock Box64.
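
The slowdown factors can be rechecked from the `real` timings in the transcripts. A small sketch, assuming awk is available; the function names to_seconds and slowdown are made up for illustration:

```shell
# Convert `time` output such as "1m29.166s" to seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN { split(t, p, /[ms]/); printf "%.3f\n", p[1] * 60 + p[2] }'
}

# Print emulated/native as a slowdown factor with two decimals.
# Usage: slowdown NATIVE_REAL EMULATED_REAL
slowdown() {
    awk -v n="$(to_seconds "$1")" -v e="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", e / n }'
}

# foo2zjs, native vs emulated:
#   slowdown 0m7.803s 0m26.086s   # box64, stock      -> 3.34
#   slowdown 0m7.803s 0m19.246s   # box64, optimized  -> 2.47
#   slowdown 0m7.803s 0m21.231s   # FEX               -> 2.72
```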


splix

Splix is a multi-threaded C++ binary which accepts CUPS-raster, a file format very similar to the industry-established Apple Raster (URF) and PWG Raster. In fact, it is the ancestor of both, as all three formats were created by Michael Sweet: the CUPS developer, former Apple employee, and PWG specification author.

Test outline:

  1. Convert 4-page document-a4.pdf file to 1-bit CUPS-Raster for splix (600×600 DPI)
  2. Run native ARM64 rastertoqpdl, get the execution time
  3. Run x86_64 rastertoqpdl via box64 with the stock settings
  4. Run x86_64 rastertoqpdl via box64 with BOX64_DYNAREC_FORWARD=512 (larger JIT code block size) and BOX64_DYNAREC_CALLRET=1 (try to optimize CALL/RET, skipping the jump table when possible)
  5. Run x86_64 rastertoqpdl via FEX, stock settings

Unlike foo2zjs, splix is a "native" CUPS filter, which means it is integrated with the CUPS printing workflow, reads printer options directly from the PPD printer description file, and reads the CUPS-raster data with the aid of the libcupsimage.so library.

Box64 includes a wrapper for the libcups.so library, but with only a handful of functions wrapped, which is not sufficient for loading libcupsimage.so. Running splix in Box64 produces the following internal loader error:

[BOX64] Using emulated /lib/x86_64-linux-gnu/libcupsimage.so.2
[BOX64] Using native(wrapped) libcups.so.2
[BOX64] Using emulated /lib/x86_64-linux-gnu/libjbig.so.0
[BOX64] Using emulated /lib/x86_64-linux-gnu/libstdc++.so.6
[BOX64] Using emulated /lib/x86_64-linux-gnu/libgcc_s.so.1
[BOX64] Using native(wrapped) libc.so.6
[BOX64] Using native(wrapped) ld-linux-x86-64.so.2
[BOX64] Using native(wrapped) libpthread.so.0
[BOX64] Using native(wrapped) libdl.so.2
[BOX64] Using native(wrapped) libutil.so.1
[BOX64] Using native(wrapped) libresolv.so.2
[BOX64] Using native(wrapped) librt.so.1
[BOX64] Using native(wrapped) libbsd.so.0
[BOX64] Using native(wrapped) libm.so.6
[BOX64] Error: Symbol _cupsRasterWriteHeader not found, cannot apply R_X86_64_JUMP_SLOT @0x7fff00003f88 (0x1046) in /lib/x86_64-linux-gnu/libcupsimage.so.2 (ver=1 / (none))
[BOX64] Error: Symbol _cupsRasterReadHeader not found, cannot apply R_X86_64_JUMP_SLOT @0x7fff00003f98 (0x1066) in /lib/x86_64-linux-gnu/libcupsimage.so.2 (ver=1 / (none))
[BOX64] Error: Symbol _cupsRasterInterpretPPD not found, cannot apply R_X86_64_JUMP_SLOT @0x7fff00003fa0 (0x1076) in /lib/x86_64-linux-gnu/libcupsimage.so.2 (ver=1 / (none))
[BOX64] Error: Symbol _cupsRasterNew not found, cannot apply R_X86_64_JUMP_SLOT @0x7fff00003fa8 (0x1086) in /lib/x86_64-linux-gnu/libcupsimage.so.2 (ver=1 / (none))
[BOX64] Error: Symbol _cupsRasterInitPWGHeader not found, cannot apply R_X86_64_JUMP_SLOT @0x7fff00003fb0 (0x1096) in /lib/x86_64-linux-gnu/libcupsimage.so.2 (ver=1 / (none))
[BOX64] Error: Symbol _cupsRasterWritePixels not found, cannot apply R_X86_64_JUMP_SLOT @0x7fff00003fc0 (0x10b6) in /lib/x86_64-linux-gnu/libcupsimage.so.2 (ver=1 / (none))
[BOX64] Error: Symbol _cupsRasterDelete not found, cannot apply R_X86_64_JUMP_SLOT @0x7fff00003fc8 (0x10c6) in /lib/x86_64-linux-gnu/libcupsimage.so.2 (ver=1 / (none))
[BOX64] Error: Symbol _cupsRasterErrorString not found, cannot apply R_X86_64_JUMP_SLOT @0x7fff00003fd0 (0x10d6) in /lib/x86_64-linux-gnu/libcupsimage.so.2 (ver=1 / (none))
[BOX64] Error: Symbol _cupsRasterReadPixels not found, cannot apply R_X86_64_JUMP_SLOT @0x7fff00003fd8 (0x10e6) in /lib/x86_64-linux-gnu/libcupsimage.so.2 (ver=1 / (none))
[BOX64] Error: relocating Plt symbols in elf libcupsimage.so.2
[BOX64] Error initializing needed lib libc.so.6
[BOX64] Error loading one of needed lib
[BOX64] Error: Loading needed libs in elf /home/user/x86_64/usr/lib/cups/filter/rastertoqpdl

Let's try forcing emulation of libcups.so.2 by setting the BOX64_EMULATED_LIBS=libcups.so.2 environment variable.
The result is as follows:

[BOX64] Using emulated /lib/x86_64-linux-gnu/libcupsimage.so.2
[BOX64] Using emulated /lib/x86_64-linux-gnu/libcups.so.2
[BOX64] Using emulated /lib/x86_64-linux-gnu/libjbig.so.0
[BOX64] Using emulated /lib/x86_64-linux-gnu/libstdc++.so.6
[BOX64] Using emulated /lib/x86_64-linux-gnu/libgcc_s.so.1
[BOX64] Using native(wrapped) libc.so.6
[BOX64] Using native(wrapped) ld-linux-x86-64.so.2
[BOX64] Using native(wrapped) libpthread.so.0
[BOX64] Using native(wrapped) libdl.so.2
[BOX64] Using native(wrapped) libutil.so.1
[BOX64] Using native(wrapped) libresolv.so.2
[BOX64] Using native(wrapped) librt.so.1
[BOX64] Using native(wrapped) libbsd.so.0
[BOX64] Using native(wrapped) libm.so.6
[BOX64] Using native(wrapped) libgssapi_krb5.so.2
[BOX64] Using emulated /lib/x86_64-linux-gnu/libavahi-common.so.3
[BOX64] Using emulated /lib/x86_64-linux-gnu/libavahi-client.so.3
[BOX64] Using native(wrapped) libgnutls.so.30
[BOX64] Using native(wrapped) libz.so.1
[BOX64] Using native(wrapped) libdbus-1.so.3
[BOX64] Error: Symbol __strlcpy_chk not found, cannot apply R_X86_64_JUMP_SLOT @0x7fff0109eeb8 (0x1f256) in /lib/x86_64-linux-gnu/libcups.so.2 (optver=6 / GLIBC_2.38)
[BOX64] Error: relocating Plt symbols in elf libcups.so.2
[BOX64] Error initializing needed lib libgcc_s.so.1
[BOX64] Error loading one of needed lib
[BOX64] Error: Loading needed libs in elf /home/user/x86_64/usr/lib/cups/filter/rastertoqpdl
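
With logs this long, it helps to reduce them to the list of unresolved symbols. A minimal sketch, assuming the "Error: Symbol NAME not found" wording seen in the logs above; missing_symbols is a made-up helper name:

```shell
# Read a Box64 log on stdin and print each unresolved symbol once.
missing_symbols() {
    sed -n 's/.*Error: Symbol \([^ ]*\) not found.*/\1/p' | sort -u
}

# Example call (stderr is merged in, in case the log is written there):
#   BOX64_EMULATED_LIBS=libcups.so.2 \
#       box64 /home/user/x86_64/usr/lib/cups/filter/rastertoqpdl \
#       1 1 1 1 PageSize=A4 document-a4.cups 2>&1 | missing_symbols
```

For the second log this leaves a single name, __strlcpy_chk.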

Box64 relies heavily on library wrapping, especially for lower-level components such as libc and libdl, and for core functions such as malloc() redirection.
Emulating these libraries is not supported: an attempt to use an emulated-only environment will crash Box64.

Both emulators are used mostly for gaming on ARM64 and thus target the Steam userspace, which is based on a rather outdated Ubuntu release. Our Debian 13 is too recent a target for Box64, which is why the __strlcpy_chk libc symbol is missing from the wrapper.
Fortunately, adding a wrapper for this symbol is just a single-line patch:

diff --git a/src/wrapped/wrappedlibc_private.h b/src/wrapped/wrappedlibc_private.h
index 51eac52..7476206 100644
--- a/src/wrapped/wrappedlibc_private.h
+++ b/src/wrapped/wrappedlibc_private.h
@@ -1995,6 +1995,7 @@ GO(__strncat_chk, pFppLL)
 GO(strncmp, iFppL)
 GO(strncpy, pFppL)
 GO(__strncpy_chk, pFppLL)
+GO(__strlcpy_chk, LFppLL)
 GO(__strndup, pFpL)
 GO(strndup, pFpL)
 GO(strnlen, LFpL)

Just as before, the numbers below show raw emulation performance, since none of the heavy-lifting driver libraries are wrapped.
In any case, NetPBM and CUPS-Raster are very similar, simple uncompressed bitmap formats, so native redirection of libcupsimage would bring little benefit.

=== Running native ARM64 splix ===
+ /home/user/arm64/usr/lib/cups/filter/rastertoqpdl 1 1 1 1 PageSize=A4 document-a4.cups
+ sha1sum
c6badbc46d7575d3de875d0b304b5bff8f0da1ec  -

real    0m0.381s
user    0m0.484s
sys     0m0.140s

=== Running x86_64 splix in box64, stock settings ===
+ box64 /home/user/x86_64/usr/lib/cups/filter/rastertoqpdl 1 1 1 1 PageSize=A4 document-a4.cups
+ sha1sum
c6badbc46d7575d3de875d0b304b5bff8f0da1ec  -

real    0m0.908s
user    0m1.155s
sys     0m0.217s

=== Running x86_64 splix in box64, optimized settings ===
+ BOX64_DYNAREC_CALLRET=1
+ BOX64_DYNAREC_FORWARD=512
+ box64 /home/user/x86_64/usr/lib/cups/filter/rastertoqpdl 1 1 1 1 PageSize=A4 document-a4.cups
+ sha1sum
c6badbc46d7575d3de875d0b304b5bff8f0da1ec  -

real    0m0.949s
user    0m1.123s
sys     0m0.254s

=== Running x86_64 splix in FEX ===
+ FEX /home/user/x86_64/usr/lib/cups/filter/rastertoqpdl 1 1 1 1 PageSize=A4 document-a4.cups
+ sha1sum
c6badbc46d7575d3de875d0b304b5bff8f0da1ec  -

real    0m2.260s
user    0m1.730s
sys     0m0.187s

The results are quite different from the first test. FEX is the only contestant whose real time is larger than its user time: the multi-threading of splix slowed it down instead of speeding it up, resulting in a ×6 performance penalty compared to the native version.

Box64 is ×2.4-×2.5 slower than native, but unlike in the first test, the special tunables made it slightly slower.
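
One way to put a number on the threading effect is to divide CPU time (user + sys) by wall-clock time (real). A value above 1 means more than one core was busy on average; a value below 1 suggests the process spent part of its time waiting. This is only a rough indicator, and cpu_per_wall is a made-up helper name:

```shell
# Print (user + sys) / real with two decimals; all arguments in seconds.
# Usage: cpu_per_wall REAL USER SYS
cpu_per_wall() {
    awk -v r="$1" -v u="$2" -v s="$3" 'BEGIN { printf "%.2f\n", (u + s) / r }'
}

# splix timings from the transcripts:
#   cpu_per_wall 0.381 0.484 0.140   # native ARM64      -> 1.64
#   cpu_per_wall 0.908 1.155 0.217   # box64, stock      -> 1.51
#   cpu_per_wall 0.949 1.123 0.254   # box64, optimized  -> 1.45
#   cpu_per_wall 2.260 1.730 0.187   # FEX               -> 0.85
```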


Conclusion

FEX is slower but easy to handle: just like qemu-user, it is compatible with any userspace, old or new.
Box64 is faster, but using it in a non-Steam userspace requires thorough testing and either wrapper modifications or forced emulation of particular libraries. Unlike FEX, it has extensive tunables and hacks, which can be applied per process.

Both are, in general, about ×4 faster than qemu-user's TCG, and both are great software!

foo2zjs FEX:        real    0m21.263s
foo2zjs qemu-user:  real    1m29.166s

Versions used: