1.4 Performance
As you’re probably aware, computation speed is a significant limiting factor for any cryptographic operation. OpenSSL comes with a built-in benchmarking tool that you can use to get an idea about a system’s capabilities and limits. You can invoke the benchmark using the speed command.
If you invoke speed without any parameters, OpenSSL produces a lot of output, little of which will be of interest. A better approach is to test only those algorithms that are directly relevant to you. For example, for usage in a secure web server, you might care about the performance of RSA and ECDSA and will do something like this:
$ openssl speed rsa ecdsa
The first part of the resulting output consists of the OpenSSL version number and compile-time configuration. This information is useful for record-keeping and if you’re testing different versions of OpenSSL:
OpenSSL 1.1.1f 31 Mar 2020
built on: Mon Apr 20 11:53:50 2020 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -fdebug-prefix-map=/build/openssl-P_ODHM/openssl-1.1.1f=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_TLS_SECURITY_LEVEL=2 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
The rest of the output contains the benchmark results. Let’s first take a look at the RSA key operations:
                    sign    verify   sign/s verify/s
rsa   512 bits 0.000073s 0.000005s  13736.4 187091.4
rsa  1024 bits 0.000207s 0.000014s   4828.4  71797.6
rsa  2048 bits 0.000991s 0.000045s   1009.1  22220.4
rsa  3072 bits 0.004796s 0.000096s    208.5  10463.5
rsa  4096 bits 0.011073s 0.000165s     90.3   6054.5
rsa  7680 bits 0.090541s 0.000565s     11.0   1769.7
rsa 15360 bits 0.521500s 0.002204s      1.9    453.7
RSA is most commonly used at 2,048 bits. In my results, one CPU of the tested server can perform about 1,000 sign (server) operations and 22,000 verify (client) operations every second. As for ECDSA, it’s typically used only at 256 bits. We can see that at this length, ECDSA can do about 20 times as many signatures (roughly 20,500 per second). On the other hand, it’s slower when it comes to verifications, at barely 6,500 operations per second:
                             sign  verify   sign/s verify/s
256 bits ecdsa (nistp256) 0.0000s 0.0002s  20508.1   6566.2
384 bits ecdsa (nistp384) 0.0017s 0.0013s    580.4    755.0
521 bits ecdsa (nistp521) 0.0006s 0.0012s   1711.5    840.8
In practice, you care more about the sign operations because servers are designed to provide services to a great many clients. The clients, on the other hand, are typically communicating with only a small number of servers at the same time. The fact that ECDSA is slower in this scenario doesn’t matter much.
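The comparison between RSA and ECDSA can be recomputed directly from the sample figures. The following is a minimal sketch, assuming a POSIX shell with awk; the helper name ratio is hypothetical and not part of OpenSSL, and the numbers are copied from the output shown above.

```shell
#!/bin/sh
# ratio A B: print A/B rounded to one decimal place.
# Hypothetical helper for illustration; not part of OpenSSL.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Operations per second, copied from the sample output above.
rsa2048_sign=1009.1
rsa2048_verify=22220.4
p256_sign=20508.1
p256_verify=6566.2

echo "ECDSA P-256 signs $(ratio "$p256_sign" "$rsa2048_sign")x as fast as RSA-2048"
echo "RSA-2048 verifies $(ratio "$rsa2048_verify" "$p256_verify")x as fast as ECDSA P-256"
```

With these figures, the signing ratio comes out at 20.3 and the verification ratio at 3.4. Your own measurements will differ, so substitute the values from your server.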
What is the output of speed useful for? You should be able to compare how compile-time options affect speed or how different versions of OpenSSL compare on the same platform. If you’re thinking of switching servers, benchmarking OpenSSL can give you an idea of the differences in computing power. You can also verify that the hardware acceleration is in place.
Using the benchmark results to estimate deployment performance is not straightforward because of the great number of factors that influence performance in real life. Further, many of those factors lie outside TLS (e.g., HTTP keep-alive settings and caching). At best, you can use these numbers only for a rough estimate.
But before you can do that, you need to consider something else. By default, the speed command will use only a single process. Most servers have multiple cores, so to find out how many TLS operations are supported by the entire server, you must instruct speed to use several instances in parallel. You can achieve this with the -multi switch. My server has two cores, so that’s what I’m going to specify:
$ openssl speed -multi 2 rsa
[...]
                    sign    verify   sign/s verify/s
rsa   512 bits 0.000037s 0.000003s  27196.5 367409.6
rsa  1024 bits 0.000106s 0.000007s   9467.8 144188.0
rsa  2048 bits 0.000503s 0.000023s   1988.1  43838.4
rsa  3072 bits 0.002415s 0.000050s    414.1  20152.2
rsa  4096 bits 0.005589s 0.000084s    178.9  11880.8
rsa  7680 bits 0.045659s 0.000285s     21.9   3506.1
rsa 15360 bits 0.264904s 0.001130s      3.8    884.8
As expected, the performance is about two times better. I’m again looking at how many RSA signatures can be completed per second, because this is the most CPU-intensive cryptographic operation performed on a server and is thus always the first bottleneck. The result of 1,988 signatures/second (with a 2,048-bit key) tells us that this small server will most definitely handle hundreds of brand-new TLS connections per second. (We have to assume that the server will do other things, not only TLS handshakes.) In my case, that’s sufficient, with a very healthy safety margin. Because I also have session resumption enabled on the server, and that bypasses public-key cryptography, I know that the performance will be even better.
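To make the arithmetic behind this estimate explicit, here is a rough sketch, again assuming a POSIX shell with awk. The helper names scaling and budget are hypothetical, and the 25 percent CPU share reserved for handshakes is an assumption chosen purely for illustration; pick a value that reflects your own workload.

```shell
#!/bin/sh
# scaling ONE MULTI: how much faster the multi-process run is (two decimals).
scaling() {
    awk -v one="$1" -v multi="$2" 'BEGIN { printf "%.2f\n", multi / one }'
}

# budget RATE PCT: full handshakes per second if only PCT percent of the
# CPU time is available for signing. PCT is an assumption, not a measurement.
budget() {
    awk -v rate="$1" -v pct="$2" 'BEGIN { printf "%d\n", rate * pct / 100 }'
}

# RSA-2048 sign/s from the single-process and the -multi 2 runs above.
scaling 1009.1 1988.1    # prints 1.97
budget 1988.1 25         # prints 497

# To benchmark all cores on a system that provides nproc (GNU coreutils):
#   openssl speed -multi "$(nproc)" rsa2048
```

Even with three quarters of the CPU set aside for other work, the result stays in the hundreds of new connections per second, which matches the estimate above.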
When testing speed, it’s important to always enable hardware acceleration using the -evp switch. If you don’t, the results can be vastly different. As an illustration, take a look at the performance differences on a server that supports AES-NI hardware acceleration. I got the following with a software-only implementation:
$ openssl speed aes-128-cbc
[...]
The 'numbers' are in 1000s of bytes per second processed.
type                  16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc         131377.50k   135401.41k   134796.12k   133931.35k   134778.95k
The performance is more than three times better with hardware acceleration:
$ openssl speed -evp aes-128-cbc
[...]
The 'numbers' are in 1000s of bytes per second processed.
type                  16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc         421949.23k   451223.42k   460066.13k   463651.84k   462883.50k
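The speedup can be computed from the result lines themselves. This is a small sketch, assuming a POSIX shell with awk and assuming that the throughput for the largest block size is always the last column of a result line; last_col is a hypothetical helper name.

```shell
#!/bin/sh
# last_col: read a result line on stdin and print its last column
# (the 8192-byte throughput here) without the trailing "k".
last_col() {
    awk '{ v = $NF; sub(/k$/, "", v); print v }'
}

# Result lines copied from the two runs above.
soft=$(echo "aes-128 cbc 131377.50k 135401.41k 134796.12k 133931.35k 134778.95k" | last_col)
accel=$(echo "aes-128-cbc 421949.23k 451223.42k 460066.13k 463651.84k 462883.50k" | last_col)

awk -v s="$soft" -v a="$accel" 'BEGIN { printf "speedup: %.1fx\n", a / s }'
# prints: speedup: 3.4x
```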
When you’re looking at the speed of cryptographic operations, you should focus on the primitives you will actually deploy. For example, CBC is obsolete, so you want to use AES in GCM mode instead. In these results, GCM is three to four times faster than CBC at the larger block sizes, although it is slower with 16-byte blocks:
$ openssl speed -evp aes-128-gcm
[...]
The 'numbers' are in 1000s of bytes per second processed.
type                  16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-gcm         219599.85k   588822.40k  1313242.97k  1680529.75k  1989388.97k
Then there is ChaCha20-Poly1305, which is a relatively recent addition. Its performance can’t compete with hardware-accelerated AES, but it doesn’t need to; this authenticated cipher is designed to be fast on mobile phones. Compare its speed to nonaccelerated AES-128-CBC instead.
$ openssl speed -evp chacha20-poly1305
[...]
The 'numbers' are in 1000s of bytes per second processed.
type                  16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
chacha20-poly1305   148729.65k   273026.35k   590953.90k  1027021.82k  1092427.78k
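Because throughput depends heavily on block size, it can help to compare two ciphers column by column instead of by a single number. The sketch below assumes a POSIX shell with awk and assumes that the last five fields of each result line are the five block-size columns, which also copes with labels that contain a space. The helper name col_ratios is hypothetical.

```shell
#!/bin/sh
# col_ratios: read exactly two result lines on stdin and print, for each of
# the five block sizes, the second line's throughput divided by the first's.
col_ratios() {
    awk '{
        for (i = 0; i < 5; i++) {
            v = $(NF - 4 + i)      # i-th of the last five fields
            sub(/k$/, "", v)       # drop the "k" suffix
            r[NR, i] = v
        }
    }
    END {
        for (i = 0; i < 5; i++)
            printf "%s%.2f", (i ? " " : ""), r[2, i] / r[1, i]
        print ""
    }'
}

# Accelerated CBC versus GCM, copied from the runs above.
col_ratios <<'EOF'
aes-128-cbc 421949.23k 451223.42k 460066.13k 463651.84k 462883.50k
aes-128-gcm 219599.85k 588822.40k 1313242.97k 1680529.75k 1989388.97k
EOF
```

For these two lines, GCM reaches roughly half the CBC rate with 16-byte blocks and more than four times the CBC rate with 8,192-byte blocks. The same helper can be fed the ChaCha20-Poly1305 line to compare it against nonaccelerated AES-128-CBC.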
Starting with OpenSSL 3.0, hardware acceleration is always used when supported by the CPU, irrespective of the presence or absence of the -evp switch.
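If your benchmarking scripts run against several OpenSSL versions, they can decide from the version string whether -evp still matters. The following is a sketch only, assuming a POSIX shell with awk and a version string of the form printed by openssl version for OpenSSL itself (for example, "OpenSSL 1.1.1f  31 Mar 2020"); other libraries that provide an openssl binary are not handled. The helper name evp_matters is hypothetical.

```shell
#!/bin/sh
# evp_matters VERSION_STRING: succeed (exit status 0) when the major version
# is below 3, that is, when -evp still changes which implementation is tested.
evp_matters() {
    major=$(printf '%s\n' "$1" | awk '{ split($2, v, "."); print v[1] }')
    [ "$major" -lt 3 ]
}

# In real use, pass "$(openssl version)" instead of a literal string.
if evp_matters "OpenSSL 1.1.1f  31 Mar 2020"; then
    echo "run: openssl speed -evp aes-128-cbc"
else
    echo "run: openssl speed aes-128-cbc"
fi
```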