Userspace is not slow, some kernel interfaces are

We have significantly improved the throughput of wireguard-go, the userspace WireGuard® implementation used by Tailscale. What this means for you: improved performance of the Tailscale client on Linux. We also intend to upstream these changes to WireGuard.

You can experience these improvements in the current unstable version of the Tailscale client, as well as in Tailscale v1.36, available in early 2023. Read on to find out how we did it, or skip to the Results section if you just want the numbers.

Background

The Tailscale client leverages wireguard-go, a userspace WireGuard implementation written in Go, for its data plane. In Tailscale, wireguard-go receives unencrypted packets from the kernel, encrypts them, and sends them over a UDP socket to another WireGuard peer. The flow is reversed when receiving from a peer: wireguard-go first reads encrypted packets from a UDP socket, then decrypts them and writes them back to the kernel. This is a simplified view of the pipeline inside wireguard-go; the Tailscale client adds further functionality on top, such as NAT traversal, access control, and key distribution.
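
To make the shape of that send path concrete, here is a minimal sketch in Go. It is not wireguard-go's actual code: the TUN device is stood in for by a plain io.Reader, the peer is a local placeholder UDP socket, and key and nonce handling are simplified (real WireGuard derives per-peer keys from its handshake and uses counter-based nonces). The sketch only illustrates the read, encrypt, and send loop described above, using the same ChaCha20-Poly1305 AEAD that WireGuard uses for transport data.

package main

import (
	"crypto/rand"
	"io"
	"log"
	"net"

	"golang.org/x/crypto/chacha20poly1305"
)

// sendLoop mirrors the send path described above: read an unencrypted
// packet from a TUN-like source, encrypt it, and send it to a peer over UDP.
func sendLoop(tun io.Reader, conn *net.UDPConn, key []byte) error {
	aead, err := chacha20poly1305.New(key) // WireGuard's transport AEAD
	if err != nil {
		return err
	}
	buf := make([]byte, 65535)
	nonce := make([]byte, chacha20poly1305.NonceSize)
	for {
		n, err := tun.Read(buf) // unencrypted packet handed down by the kernel
		if err != nil {
			return err // io.EOF once the source is closed
		}
		// Real WireGuard uses a per-peer counter nonce; random is sketch-only.
		if _, err := rand.Read(nonce); err != nil {
			return err
		}
		ciphertext := aead.Seal(nil, nonce, buf[:n], nil) // encrypt the packet
		if _, err := conn.Write(ciphertext); err != nil { // one UDP datagram per packet
			return err
		}
	}
}

func main() {
	// Stand-in for the TUN device: an in-memory pipe carrying one fake packet.
	tunReader, tunWriter := io.Pipe()
	go func() {
		tunWriter.Write([]byte("pretend this is an IP packet"))
		tunWriter.Close()
	}()

	// Stand-in peer: a local UDP socket instead of a real WireGuard endpoint.
	peer, err := net.ListenUDP("udp", &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1)})
	if err != nil {
		log.Fatal(err)
	}
	conn, err := net.DialUDP("udp", nil, peer.LocalAddr().(*net.UDPAddr))
	if err != nil {
		log.Fatal(err)
	}

	key := make([]byte, chacha20poly1305.KeySize) // a real key comes from the handshake
	if err := sendLoop(tunReader, conn, key); err != nil && err != io.EOF {
		log.Fatal(err)
	}
}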

Baselines

Network performance is a complicated subject, largely because networked applications can have radically different requirements and goals. In this article, we will focus on throughput. By throughput, we mean the amount of data that can be transferred between Tailscale clients within a given time frame.

Benchmarks Disclaimer: This article contains benchmarks! These benchmarks are reproducible at the time of writing, and we provide details of the environments in which we ran them. Benchmark results tend to vary across environments, and they also tend to become outdated over time. Your mileage may vary.

We'll start with some baseline numbers for wireguard-go and in-kernel WireGuard, and show the results of our changes toward the end. Throughput tests are performed using iperf3 over a single TCP stream, with cubic congestion control. All hosts run Ubuntu 22.04.
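
Concretely, each test drives a single iperf3 client against an iperf3 server on the other host. The client invocation matches the transcripts below (iperf3 -i 0 -c <server address> -t 10 -C cubic -V); the server side is assumed here to be a plain iperf3 -s listening on the default port.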

For these baseline tests, we use two c6i.8xlarge instances in AWS. They have fast network interfaces and enough CPU capacity to handle encryption at network speed, and they are in the same region and availability zone:

ubuntu@thru6:~$ ec2metadata | grep -E 'instance-type:|availability-zone:'
availability-zone: us-west-2d
instance-type: c6i.8xlarge
ubuntu@thru7:~$ ec2metadata | grep -E 'instance-type:|availability-zone:'
availability-zone: us-west-2d
instance-type: c6i.8xlarge
ubuntu@thru6:~$ ping 172.31.56.191 -c 5 -q
PING 172.31.56.191 (172.31.56.191) 56(84) bytes of data.

--- 172.31.56.191 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4099ms
rtt min/avg/max/mdev = 0.098/0.119/0.150/0.017 ms

This first benchmark does not use wireguard-go. It sets a throughput baseline without any WireGuard overhead:

ubuntu@thru6:~$ iperf3 -i 0 -c 172.31.56.191 -t 10 -C cubic -V
iperf 3.9
Linux thru6 5.15.0-1026-aws #30-Ubuntu SMP Wed Nov 23 14:15:21 UTC 2022 x86_64
Control connection MSS 8949
Time: Thu, 08 Dec 2022 19:29:39 GMT
Connecting to host 172.31.56.191, port 5201
      Cookie: dcnjnuzjeobo4dne6djnj3waeq4dugc2fh7a
      TCP MSS: 8949 (default)
[  5] local 172.31.51.101 port 40158 connected to 172.31.56.191 port 5201
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test, tos 0
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-10.00  sec  11.1 GBytes  9.53 Gbits/sec    0   1.29 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  11.1 GBytes  9.53 Gbits/sec    0             sender
[  5]   0.00-10.04  sec  11.1 GBytes  9.49 Gbits/sec                  receiver
CPU Utilization: local/sender 10.0% (0.2%u/9.7%s), remote/receiver 4.4% (0.3%u/4.0%s)
snd_tcp_congestion cubic
rcv_tcp_congestion cubic
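
As a quick sanity check on units (our own arithmetic, not part of the iperf3 output): iperf3 reports transfer in binary gibibytes but bitrate in decimal gigabits per second, so 11.1 GBytes is roughly 11.1 × 2^30 × 8 ≈ 95.3 × 10^9 bits, and 95.3 Gbit over the 10.00-second run gives the reported 9.53 Gbit/s.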

This second benchmark uses in-kernel WireGuard:

ubuntu@thru6:~$ iperf3 -i 0...
