Firewall throughput measurements: OPNsense on APU4d4 and Edge4Go, OPNsense in a Proxmox VM, and OpenWRT on Turris Omnia
Why
For a few weeks, I have been struggling to get good performance out of OPNsense on my low-power test box, an APU4d4. While OPNsense is very well done from a firewall rules management point of view (although I am not happy that forwarding rules cannot specify both incoming and outgoing interfaces like it is possible with Linux Netfilter…) and has many features of expensive firewall products (including web interface based management for clustering/failover), the FreeBSD/HardenedBSD kernel seems to struggle with higher throughputs. After making no progress with simple trial and error using various settings gathered from different howto guides, I decided to first measure my current status properly.
For the last ten years or so, I have been running my home lab setup on OpenWRT based routers (for a long time on a Mikrotik RB2011, which is extremely power efficient for what it can do), more recently a Turris Omnia for the automatic updates coupled with maximum flexibility (and the snapshot features are really well integrated). However, for teaching our course on “Network Security” at the Institute of Networks and Security at JKU Linz, we decided to use OPNsense because it comes with an easy-to-understand web interface and is open source. A direct comparison therefore seems useful.
All systems under test have a roughly equal IP (v4 and v6) and firewall rules configuration. For completeness, I compare the OPNsense installation on the APU4d4 to a similarly configured OPNsense instance inside a VM on the same Proxmox host.
I have now added a Thomas Krenn Edge4Go hardware box to my homelab testing mix, because it is also small and power-efficient, but comes with a much more powerful CPU (Intel Celeron J3455) compared to the APU4d4 (AMD Embedded G series GX-412TC). Both have 4 Gigabit Ethernet ports, 4 GB RAM, and are fanless and therefore completely quiet designs. This post has therefore been updated with the new throughput numbers measured on this hardware.
How
My setup is pretty simple: a Proxmox server hosting a small number of VMs that are connected to a DMZ VLAN, attached through a Linux host bridge that connects virtio network interfaces for the VMs with a tagged VLAN on the hardware NIC as a trunk to the local Ethernet switch. On the same switch, I have a desktop connected through a 1Gbps link. The switch is configured as a pure L2 switch (with multiple VLANs), all routing is done through the firewalls under test. One VM on the Proxmox host runs the iperf3 service, the host itself (on a different VLAN) as well as the separate desktop run iperf3 clients.
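The host-side plumbing described above can be sketched with plain `ip` commands. This is a hypothetical minimal reconstruction, not my actual Proxmox configuration: the interface names (`enp1s0`, `vmbr2`, `tap100i0`) and the VLAN ID (30) are placeholders.

```shell
# Tagged VLAN subinterface on the hardware NIC (trunk port to the switch)
ip link add link enp1s0 name enp1s0.30 type vlan id 30

# Linux host bridge for the DMZ VLAN
ip link add name vmbr2 type bridge
ip link set enp1s0.30 master vmbr2
ip link set enp1s0.30 up
ip link set vmbr2 up

# The VM's virtio tap device (created by Proxmox/QEMU) is enslaved
# to the same bridge, connecting the VM to the tagged VLAN:
ip link set tap100i0 master vmbr2
```

Proxmox generates equivalent configuration from `/etc/network/interfaces`; the commands above only illustrate the resulting topology.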
The four systems to compare are:
| | Turris Omnia | APU4d4 OPNsense | Edge4Go OPNsense | VM OPNsense |
|---|---|---|---|---|
| CPU | Marvell Armada 380/385 | AMD GX-412TC | Intel Celeron J3455 | Intel Celeron G3900 |
| CPU speed | 1.6 GHz | 1 GHz (1.4 GHz boost) | 1.5 GHz | 2.8 GHz |
| CPU cores | 2 | 4 | 4 | 2 |
| RAM | 2GB | 4GB | 4GB | 4GB |
| NIC | builtin | Intel i211AT | Intel I210 | virtio (vhost_net) / Intel 82599 |
| OS version | 5.1.10 (Linux kernel 4.14.222) | 26.1.4 (FreeBSD 14.3-RELEASE-p9) | 26.1.4 (FreeBSD 14.3-RELEASE-p9) | 26.1.4 (FreeBSD 14.3-RELEASE-p9) |
| Power usage (normal load) | 14-16 W | 9-11 W | 11-14 W | marginal (the host is running anyway) |
| Power usage (Gigabit load) | 16 W | 14 W | 16 W | marginal (the host is running anyway) |
For reference, this is the previous configuration for the initial measurements in 2021:
| | Turris Omnia | APU4d4 OPNsense | VM OPNsense |
|---|---|---|---|
| CPU | Marvell Armada 380/385 | AMD GX-412TC | Intel Celeron G3900 |
| CPU speed | 1.6 GHz | 1 GHz (1.4 GHz boost) | 2.8 GHz |
| CPU cores | 2 | 4 | 2 |
| RAM | 2GB | 4GB | 4GB |
| NIC | builtin | Intel i211AT | virtio (vhost_net) / Intel 82599 |
| OS version | 5.1.10 (Linux kernel 4.14.222) | 21.1.4 (FreeBSD 12.1-RELEASE-p15-HBSD) | 21.1 (FreeBSD 12.1-RELEASE-p12-HBSD) |
| Power usage (under load) | 14-16 W | 9-14 W | marginal (the host is running anyway) |
OPNsense on the APU4d4 and Edge4Go has the recommended settings from here and here applied. OPNsense inside the VM has the NIC hardware offload features disabled and the VM configured with the recommended settings from here, as well as all VLANs terminated on the Linux kernel host side and bridged into the VM as independent virtual network interfaces (the consensus seems to be that VLAN tag handling is faster on Linux than on BSD in such a virtualized setting).
Results
First I took a baseline measurement with an iperf3 client running on the VM host itself, connecting to an iperf3 server running within a Debian 10 VM without any of these test systems in the routing path, but simply a virtio network connection on a single VLAN / IP subnet. The limit was CPU bound, as my Proxmox host (with a low-power CPU) ran at around 90% load over both cores during this baseline test.
All measurements were taken with iperf3 in TCP mode with 1 or 4 parallel streams and a test duration of 20 seconds (`-t 20`):

```shell
iperf3 -c <server IP> -P <number of streams> -t 20
```
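The full measurement matrix can be driven by a small wrapper loop. This is a sketch, not the exact script I used; the server addresses are placeholders, and it assumes the iperf3 server is reachable over both address families.

```shell
#!/bin/sh
# Placeholder addresses of the iperf3 server VM
SERVER4=192.0.2.10
SERVER6=2001:db8::10

for streams in 1 4; do
    # IPv4 run: -P sets parallel streams, -t the duration in seconds
    iperf3 -c "$SERVER4" -P "$streams" -t 20
    # IPv6 run against the server's v6 address
    iperf3 -6 -c "$SERVER6" -P "$streams" -t 20
done
```

The average throughput and total TCP retransmissions ("Retr" in the iperf3 summary line) are what the tables below report.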
| Baseline | Average throughput (retry packets) |
|---|---|
| IPv4 1 stream | 4.47 Gbps (273 retries) |
| IPv6 1 stream | 4.25 Gbps (229 retries) |
| IPv4 4 streams | 4.45 Gbps (4233 retries) |
| IPv6 4 streams | 4.48 Gbps (5247 retries) |
Measuring from the VM host (to benchmark the VM OPNsense without Gigabit Ethernet limitations) to the VM (but on different VLANs, forcing traffic to be routed through the firewall under test), first without IPsec active (all transfer rates in Mbps):
| VM->VM no IPsec | Turris Omnia | APU4D4 OPNsense | Edge4Go OPNsense | VM OPNsense |
|---|---|---|---|---|
| IPv4 1 stream | 695 (1485 retr) | 355 (417 retr) | 868 (383 retr) | 1180 (365 retr) |
| IPv6 1 stream | 422 (1327 retr) | 368 (294 retr) | 869 (271 retr) | 1060 (174 retr) |
| IPv4 4 streams | 732 (10981 retr) | 569 (2501 retr) | 879 (1529 retr) | 1150 (4110 retr) |
| IPv6 4 streams | 415 (6570 retr) | 647 (3486 retr) | 869 (1884 retr) | 1150 (2732 retr) |
Note that the VM->VM measurements, when going through the VM OPNsense instance, are all on the same physical host and therefore not bound by any hardware network limits, but only by CPU and the efficiency of the three network stacks involved. It’s interesting that IPv6 traffic was faster than IPv4 in this case.
Again the original measurements from 2021 (with older OPNsense versions) for reference (Turris Omnia unchanged because I haven’t re-measured that one):
| VM->VM no IPsec | Turris Omnia | APU4D4 OPNsense | VM OPNsense |
|---|---|---|---|
| IPv4 1 stream | 695 (1485 retr) | 493 (1095 retr) | 1090 (148 retr) |
| IPv6 1 stream | 422 (1327 retr) | 341 (719 retr) | 714 (retr) |
| IPv4 4 streams | 732 (10981 retr) | 736 (12236 retr) | 1140 (1993 retr) |
| IPv6 4 streams | 415 (6570 retr) | 629 (10386 retr) | 793 (997 retr) |
Then with two IPsec tunnels to external sites configured and up/routed, but with the traffic under test explicitly not being routed through the tunnels. That is, the tunnel policies are loaded in the kernel, but the test traffic should not match any of them. As the results show, there is nonetheless a very clear performance impact on OPNsense when we hit CPU limits (I only measured 4 streams, as we already know that single-stream performance is limited on OPNsense with low-power CPUs):
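To verify that the test traffic really is outside the loaded tunnel policies, the kernel IPsec databases can be inspected on the OPNsense shell. A minimal sketch using the standard FreeBSD `setkey` tool (run as root):

```shell
# Dump the security policy database (SPD): the selectors here show
# which source/destination prefixes are matched by the tunnel policies
setkey -DP

# Dump the security association database (SAD): confirms the tunnels
# are actually up with negotiated SAs
setkey -D
```

If the iperf3 endpoints do not fall into any SPD selector, the slowdown observed below cannot be explained by traffic being encrypted, only by the policy lookup itself.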
(These are only the original numbers from 2021, as I have since moved from IPsec to WireGuard tunnels and no longer see this slowdown.)
| VM->VM with IPsec | Turris Omnia | APU4D4 OPNsense | VM OPNsense |
|---|---|---|---|
| IPv4 4 streams | 689 (10369 retr) | 551 (9520 retr) | 869 (1526 retr) |
| IPv6 4 streams | 405 (5854 retr) | 413 (4911 retr) | 799 (430 retr) |
Repeating the measurements from a physically separate client, with the test traffic going through a physical switch to the (physical or virtual) firewall under test, then (for the two physical firewalls, not the VM one) through the same switch (different VLAN) and to the Proxmox host, where the VLAN tagged traffic is bridged into the VM running the iperf3 server:
| Desktop->VM no IPsec | Turris Omnia | APU4d4 OPNsense | Edge4Go OPNsense | VM OPNsense |
|---|---|---|---|---|
| IPv4 1 stream | 771 (669 retr) | 373 (315 retr) | 825 (474 retr) | 811 (19 retr) |
| IPv6 1 stream | 435 (586 retr) | 338 (86 retr) | 907 (136 retr) | 803 (822 retr) |
| IPv4 4 streams | 777 (1805 retr) | 841 (4674 retr) | 924 (634 retr) | 784 (2342 retr) |
| IPv6 4 streams | 413 (873 retr) | 563 (1949 retr) | 884 (205 retr) | 780 (1564 retr) |
Again the original measurements from 2021 (with older OPNsense versions) for reference (Turris Omnia unchanged because I haven’t re-measured that one):
| Desktop->VM no IPsec | Turris Omnia | APU4d4 OPNsense | VM OPNsense |
|---|---|---|---|
| IPv4 1 stream | 771 (669 retr) | 525 (62 retr) | 737 (15 retr) |
| IPv6 1 stream | 435 (586 retr) | 401 (38 retr) | 653 (9 retr) |
| IPv4 4 streams | 777 (1805 retr) | 867 (302 retr) | 784 (661 retr) |
| IPv6 4 streams | 413 (873 retr) | 663 (662 retr) | 699 (513 retr) |
| Desktop->VM with IPsec | Turris Omnia | APU4d4 OPNsense | VM OPNsense |
|---|---|---|---|
| IPv4 4 streams | 737 (1577 retr) | 562 (304 retr) | 710 (505 retr) |
| IPv6 4 streams | 402 (1584 retr) | 585 (206 retr) | 728 (220 retr) |
Conclusions
For standard routing and firewalling of multiple parallel streams, OPNsense on a low-power APU4d4 system performs a bit better (noticeably better with IPv6) than a Turris Omnia, with slightly lower electrical power draw under load. OPNsense has the advantage of a much nicer UI for firewall rules (including the possibility to define host objects and groups spanning IPv4 and IPv6), more control in terms of monitoring the firewall, nicely integrated modules like VPN protocols, and the beginnings of an API for automated configuration. Pretty much all of that can also be done with OpenWRT, but mostly on the shell or through a wide variety of config files. None of these physical systems reach full Gigabit firewalling speed like the even lower powered Mikrotik systems with RouterOS and FastTrack do.
However, there are currently a few areas of concern:
- Single-stream performance is worse, and this is a known problem for FreeBSD kernels. For a single stream (e.g. uploads/downloads to/from a local fileserver), OPNsense (both physical on the APU4d4 and virtual on a power efficient server CPU) is limited to about half the maximum Ethernet throughput. This may or may not be relevant to your use case.
- When IPsec is active - even if the relevant traffic is not part of the IPsec policy - throughput decreases by nearly a third. This seems like a real performance issue / bug in the FreeBSD/HardenedBSD kernel. I will need to try VTI-based IPsec routing to see if the in-kernel policy matching is the problem.
- These tests intentionally deactivated some of the interesting OPNsense features such as traffic analysis with samplicate/flowd_aggregate. Enabling them costs another 150-200 Mbps of throughput on the APU4d4, stacked on top of the IPsec performance drop if all are enabled.
Update 2026: Repeating the measurements 5 years later with a broadly comparable configuration, but a bit of added complexity in terms of the number of firewall rules (and an active WireGuard tunnel), the APU4d4 became slightly worse in terms of single-stream throughput. That is a clear sign that the AMD GX-412TC CPU is a major bottleneck for the single-thread performance of OPNsense and the FreeBSD network/pf stack.
On the other hand, newer hardware - in particular the low-power Edge4Go with an Intel Celeron J3455 - allows a current OPNsense to reach throughputs that saturate Gigabit Ethernet even for single-stream traffic (on par with OPNsense in a VM). The web interface also feels significantly snappier than on the older APU4d4 CPU. This is not completely free, though, as power consumption of the Edge4Go is marginally higher (2-3 W more than the APU4d4, measured with a very simple power meter, so take this with a grain of salt).
I do not know whether the IPsec performance degradation is still an issue, as I have since abandoned IPsec in favor of WireGuard.