Firewall throughput measurements: OPNsense on APU4d4 and Edge4Go, OPNsense in a Proxmox VM, and OpenWRT on Turris Omnia
Why
For a few weeks, I have been struggling to get good performance out of OPNsense on my low-power test box, an APU4d4. While OPNsense is very well done from a firewall rules management point of view (although I am not happy that forwarding rules cannot specify both incoming and outgoing interfaces like it is possible with Linux Netfilter…) and has many features of expensive firewall products (including web interface based management for clustering/failover), the FreeBSD/HardenedBSD kernel seems to struggle with higher throughputs. After making no progress with simple trial and error using various settings gathered from different howto guides, I decided to first measure my current status properly.
For the last ten years or so, I have been running my home lab setup on OpenWRT based routers (for a long time on a Mikrotik RB2011, which is extremely power efficient for what it can do), more recently a Turris Omnia for the automatic updates coupled with maximum flexibility (and the snapshot features are really well integrated). However, for teaching our course on “Network Security” at the Institute of Networks and Security at JKU Linz, we decided to use OPNsense because it comes with an easy-to-understand web interface and is open source. A direct comparison therefore seems useful.
All systems under test have a roughly equal IP (v4 and v6) and firewall rules configuration. For completeness, I compare the OPNsense installation on the APU4d4 to a similarly configured OPNsense instance inside a VM on the same Proxmox host.
I have now added a Thomas Krenn Edge4Go hardware box to my homelab testing mix, because it is also small and power-efficient, but comes with a much more powerful CPU (Intel Celeron J3455) compared to the APU4d4 (AMD Embedded G series GX-412TC). Both have 4 Gigabit Ethernet ports, 4 GB RAM, and are fanless and therefore completely quiet designs. This post has therefore been updated with the new throughput numbers measured on this hardware.
How
My setup is pretty simple: a Proxmox server hosting a small number of VMs that are connected to a DMZ VLAN, attached through a Linux host bridge that connects virtio network interfaces for the VMs with a tagged VLAN on the hardware NIC as a trunk to the local Ethernet switch. On the same switch, I have a desktop connected through a 1Gbps link. The switch is configured as a pure L2 switch (with multiple VLANs), all routing is done through the firewalls under test. One VM on the Proxmox host runs the iperf3 service, the host itself (on a different VLAN) as well as the separate desktop run iperf3 clients.
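The host-side plumbing described above can be sketched with plain `ip` commands. This is a hypothetical minimal reconstruction, not my actual Proxmox configuration: the interface names (`enp1s0`, `vmbr2`, `tap100i0`) and the VLAN ID (30) are placeholders.

```shell
# Tagged VLAN subinterface on the hardware NIC (trunk port to the switch)
ip link add link enp1s0 name enp1s0.30 type vlan id 30

# Linux host bridge for the DMZ VLAN
ip link add name vmbr2 type bridge
ip link set enp1s0.30 master vmbr2
ip link set enp1s0.30 up
ip link set vmbr2 up

# The VM's virtio tap device (created by Proxmox/QEMU) is enslaved
# to the same bridge, connecting the VM to the tagged VLAN:
ip link set tap100i0 master vmbr2
```

Proxmox generates equivalent configuration from `/etc/network/interfaces`; the commands above only illustrate the resulting topology.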
The four systems to compare are:
| | Turris Omnia | APU4d4 OPNsense | Edge4Go OPNsense | VM OPNsense |
|---|---|---|---|---|
| CPU | Marvell Armada 380/385 | AMD GX-412TC | Intel Celeron J3455 | Intel Celeron G3900 |
| CPU speed | 1.6 GHz | 1 GHz (1.4 GHz boost) | 1.5 GHz | 2.8 GHz |
| CPU cores | 2 | 4 | 4 | 2 |
| RAM | 2GB | 4GB | 4GB | 4GB |
| NIC | builtin | Intel i211AT | Intel I210 | virtio (vhost_net) / Intel 82599 |
| OS version | 5.1.10 (Linux kernel 4.14.222) | 26.1.4 (FreeBSD 14.3-RELEASE-p9) | 26.1.4 (FreeBSD 14.3-RELEASE-p9) | 26.1.4 (FreeBSD 14.3-RELEASE-p9) |
| Power usage (normal load) | 14-16 W | 9-11 W | 11-14 W | marginal (the host is running anyway) |
| Power usage (Gigabit load) | 16 W | 14 W | 16 W | marginal (the host is running anyway) |
For reference, this is the previous configuration for the initial measurements in 2021:
| | Turris Omnia | APU4d4 OPNsense | VM OPNsense |
|---|---|---|---|
| CPU | Marvell Armada 380/385 | AMD GX-412TC | Intel Celeron G3900 |
| CPU speed | 1.6 GHz | 1 GHz (1.4 GHz boost) | 2.8 GHz |
| CPU cores | 2 | 4 | 2 |
| RAM | 2GB | 4GB | 4GB |
| NIC | builtin | Intel i211AT | virtio (vhost_net) / Intel 82599 |
| OS version | 5.1.10 (Linux kernel 4.14.222) | 21.1.4 (FreeBSD 12.1-RELEASE-p15-HBSD) | 21.1 (FreeBSD 12.1-RELEASE-p12-HBSD) |
| Power usage (under load) | 14-16 W | 9-14 W | marginal (the host is running anyway) |
OPNsense on the APU4d4 and Edge4Go has the recommended settings from here and here applied. OPNsense inside the VM has the NIC hardware offload features disabled and the VM configured with the recommended settings from here, as well as all VLANs terminated on the Linux kernel host side and bridged into the VM as independent virtual network interfaces (the consensus seems to be that VLAN tag handling is faster on Linux than on BSD in such a virtualized setting).
Results
First I took a baseline measurement with an iperf3 client running on the VM host itself, connecting to an iperf3 server running within a Debian 10 VM without any of these test systems in the routing path, but simply a virtio network connection on a single VLAN / IP subnet. The limit was CPU bound, as my Proxmox host (with a low-power CPU) ran at around 90% load over both cores during this baseline test.
All measurements were taken with iperf3 in TCP mode with 1 or 4 parallel streams and a test duration of 20 seconds (`-t 20`):

```shell
iperf3 -c <server IP> -P <number of streams> -t 20
```
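The full measurement matrix can be driven by a small wrapper loop. This is a sketch, not the exact script I used; the server addresses are placeholders, and it assumes the iperf3 server is reachable over both address families.

```shell
#!/bin/sh
# Placeholder addresses of the iperf3 server VM
SERVER4=192.0.2.10
SERVER6=2001:db8::10

for streams in 1 4; do
    # IPv4 run: -P sets parallel streams, -t the duration in seconds
    iperf3 -c "$SERVER4" -P "$streams" -t 20
    # IPv6 run against the server's v6 address
    iperf3 -6 -c "$SERVER6" -P "$streams" -t 20
done
```

The average throughput and total TCP retransmissions ("Retr" in the iperf3 summary line) are what the tables below report.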
| Baseline | Average throughput (retry packets) |
|---|---|
| IPv4 1 stream | 4.47 Gbps (273 retries) |
| IPv6 1 stream | 4.25 Gbps (229 retries) |
| IPv4 4 streams | 4.45 Gbps (4233 retries) |
| IPv6 4 streams | 4.48 Gbps (5247 retries) |
Measuring from the VM host (to benchmark the VM OPNsense without Gigabit Ethernet limitations) to the VM (but on different VLANs, forcing traffic to be routed through the firewall under test), first without IPsec active (all transfer rates in Mbps):
| VM->VM no IPsec | Turris Omnia | APU4D4 OPNsense | Edge4Go OPNsense | VM OPNsense |
|---|---|---|---|---|
| IPv4 1 stream | 695 (1485 retr) | 355 (417 retr) | 868 (383 retr) | 1180 (365 retr) |
| IPv6 1 stream | 422 (1327 retr) | 368 (294 retr) | 869 (271 retr) | 1060 (174 retr) |
| IPv4 4 streams | 732 (10981 retr) | 569 (2501 retr) | 879 (1529 retr) | 1150 (4110 retr) |
| IPv6 4 streams | 415 (6570 retr) | 647 (3486 retr) | 869 (1884 retr) | 1150 (2732 retr) |
Note that the VM->VM measurements, when going through the VM OPNsense instance, are all on the same physical host and therefore not bound by any hardware network limits, but only by CPU and the efficiency of the three network stacks involved. It’s interesting that IPv6 traffic was faster than IPv4 in this case.
Again the original measurements from 2021 (with older OPNsense versions) for reference (Turris Omnia unchanged because I haven’t re-measured that one):
| VM->VM no IPsec | Turris Omnia | APU4D4 OPNsense | VM OPNsense |
|---|---|---|---|
| IPv4 1 stream | 695 (1485 retr) | 493 (1095 retr) | 1090 (148 retr) |
| IPv6 1 stream | 422 (1327 retr) | 341 (719 retr) | 714 (retr) |
| IPv4 4 streams | 732 (10981 retr) | 736 (12236 retr) | 1140 (1993 retr) |
| IPv6 4 streams | 415 (6570 retr) | 629 (10386 retr) | 793 (997 retr) |
Then with two IPsec tunnels to external sites configured and up/routed, but with the traffic under test explicitly not being routed through the tunnels. That is, the tunnel policies are loaded in the kernel, but the test traffic should not match any of them. As the results show, there is nonetheless a very clear performance impact on OPNsense when we hit CPU limits (I only measured 4 streams, as we already know that single-stream performance is limited on OPNsense with low-power CPUs):
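To verify that the test traffic really is outside the loaded tunnel policies, the kernel IPsec databases can be inspected on the OPNsense shell. A minimal sketch using the standard FreeBSD `setkey` tool (run as root):

```shell
# Dump the security policy database (SPD): the selectors here show
# which source/destination prefixes are matched by the tunnel policies
setkey -DP

# Dump the security association database (SAD): confirms the tunnels
# are actually up with negotiated SAs
setkey -D
```

If the iperf3 endpoints do not fall into any SPD selector, the slowdown observed below cannot be explained by traffic being encrypted, only by the policy lookup itself.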
(These are only the original numbers from 2021, as I have since moved from IPsec to WireGuard tunnels and no longer see this slowdown.)
| VM->VM with IPsec | Turris Omnia | APU4D4 OPNsense | VM OPNsense |
|---|---|---|---|
| IPv4 4 streams | 689 (10369 retr) | 551 (9520 retr) | 869 (1526 retr) |
| IPv6 4 streams | 405 (5854 retr) | 413 (4911 retr) | 799 (430 retr) |
Repeating the measurements from a physically separate client, with the test traffic going through a physical switch to the (physical or virtual) firewall under test, then (for the two physical firewalls, not the VM one) through the same switch (different VLAN) and to the Proxmox host, where the VLAN tagged traffic is bridged into the VM running the iperf3 server:
| Desktop->VM no IPsec | Turris Omnia | APU4d4 OPNsense | Edge4Go OPNsense | VM OPNsense |
|---|---|---|---|---|
| IPv4 1 stream | 771 (669 retr) | 373 (315 retr) | 825 (474 retr) | 811 (19 retr) |
| IPv6 1 stream | 435 (586 retr) | 338 (86 retr) | 907 (136 retr) | 803 (822 retr) |
| IPv4 4 streams | 777 (1805 retr) | 841 (4674 retr) | 924 (634 retr) | 784 (2342 retr) |
| IPv6 4 streams | 413 (873 retr) | 563 (1949 retr) | 884 (205 retr) | 780 (1564 retr) |
Again the original measurements from 2021 (with older OPNsense versions) for reference (Turris Omnia unchanged because I haven’t re-measured that one):
| Desktop->VM no IPsec | Turris Omnia | APU4d4 OPNsense | VM OPNsense |
|---|---|---|---|
| IPv4 1 stream | 771 (669 retr) | 525 (62 retr) | 737 (15 retr) |
| IPv6 1 stream | 435 (586 retr) | 401 (38 retr) | 653 (9 retr) |
| IPv4 4 streams | 777 (1805 retr) | 867 (302 retr) | 784 (661 retr) |
| IPv6 4 streams | 413 (873 retr) | 663 (662 retr) | 699 (513 retr) |
| Desktop->VM with IPsec | Turris Omnia | APU4d4 OPNsense | VM OPNsense |
|---|---|---|---|
| IPv4 4 streams | 737 (1577 retr) | 562 (304 retr) | 710 (505 retr) |
| IPv6 4 streams | 402 (1584 retr) | 585 (206 retr) | 728 (220 retr) |
Conclusions
For standard routing and firewalling of multiple parallel streams, OPNsense on a low-power APU4d4 system performs a bit better (noticeably better with IPv6) than a Turris Omnia, with slightly lower electrical power draw under load. OPNsense has the advantage of a much nicer UI for firewall rules (including the possibility to define host objects and groups spanning IPv4 and IPv6), more control in terms of monitoring the firewall, nicely integrated modules like VPN protocols, and the beginnings of an API for automated configuration. Pretty much all of that can also be done with OpenWRT, but mostly on the shell or through a wide variety of config files. None of these physical systems reach full Gigabit firewalling speed like the even lower powered Mikrotik systems with RouterOS and FastTrack do.
However, there are currently a few areas of concern:
- Single-stream performance is worse, and this is a known problem for FreeBSD kernels. For a single stream (e.g. uploads/downloads to/from a local fileserver), OPNsense (both physical on the APU4d4 and virtual on a power efficient server CPU) is limited to about half the maximum Ethernet throughput. This may or may not be relevant to your use case.
- When IPsec is active - even if the relevant traffic is not part of the IPsec policy - throughput decreases by nearly a third. This seems like a real performance issue / bug in the FreeBSD/HardenedBSD kernel. I will need to try VTI-based IPsec routing to see if the in-kernel policy matching is the problem.
- These tests intentionally deactivated some of the interesting OPNsense features such as traffic analysis with samplicate/flowd_aggregate. Enabling them costs another 150-200 Mbps of throughput on the APU4d4, stacked on top of the IPsec performance drop if all are enabled.
Update 2026: Repeating the measurements 5 years later with a broadly comparable configuration, but a bit of added complexity in terms of the number of firewall rules (and an active WireGuard tunnel), the APU4d4 became slightly worse in terms of single-stream throughput. That is a clear sign that the AMD GX-412TC CPU is a major bottleneck for the single-thread performance of OPNsense and the FreeBSD network/pf stack.
On the other hand, newer hardware - in particular the low-power Edge4Go with an Intel Celeron J3455 - allows a current OPNsense to reach throughputs that saturate Gigabit Ethernet even for single-stream traffic (on par with OPNsense in a VM). The web interface also feels significantly snappier than on the older APU4d4 CPU. This is not completely free, though, as power consumption of the Edge4Go is marginally higher (2-3 W more than the APU4d4, measured with a very simple power meter, so take this with a grain of salt).
I do not know whether the IPsec performance degradation is still an issue, as I have since abandoned IPsec in favor of WireGuard.