By Janne Ruostemaa, UpCloud
Finding the best cloud servers for your use case can be challenging. Most providers offer a number of different configurations with varying amounts of resources to choose from. However, depending on your requirements, similar system specifications might not result in comparable performance. To help save valuable time and make an educated choice, you should begin by checking out benchmarks.
In this post, we’ll go over the different aspects of benchmarking cloud servers and how to evaluate the results. Follow along if you’d like to gain a better understanding of cloud server performance and how it is measured.
The goal of benchmarking cloud servers is to allow you to compare systems across different tasks and platforms. Benchmarks are essentially a series of tasks used to evaluate system performance. While such tasks could be performed manually, it is far easier to use a trusted benchmarking tool such as sysbench.
Sysbench is a popular, open source, scriptable and multi-threaded benchmarking suite. It provides extensive statistics about operation rates and latency with minimal overhead even with thousands of concurrent threads. Sysbench is a great tool for testing anything from databases to general system performance. It is one of the best options around for reliable server benchmarking.
Sysbench is available in the public repositories for most Linux distributions. The commands and results in this post use release version 1.0, which can be found in the Ubuntu 18.04 package repositories.
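For example, on Ubuntu 18.04 it can be installed straight from the package manager with the commands below. Package names may vary slightly on other distributions.

sudo apt-get update
sudo apt-get install sysbench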
Running a suite of benchmarks, such as sysbench, can yield a wealth of interesting data, from raw throughput numbers to system averages. While numeric scores are useful for ranking hosts, actually understanding the results is just as valuable. But due to the vast amount of data provided by most benchmarking tools, getting the hang of reading the results can be tough.
So what do benchmarks measure? Commonly, servers are tested for their system resources: CPU, memory, storage, and networking. Although application-specific benchmarks are also useful, resource-oriented tests aim to give a good overview of the host’s general performance. Continue on as we break down benchmarking each of the three main resources and how to read the results.
Computing performance can be measured by how many operations the system is capable of performing within a given time (events/sec) or by how long a certain task takes to complete.
The results largely depend on the number of virtual CPU cores allocated to the server, but that is not the whole story. While the race for higher clock speeds has slowed down, there are still noticeable differences between CPU models and generational upgrades. Therefore, the same number of cores might not perform the same between providers.
Below is an example of a sysbench command for testing CPU performance. The test calculates prime numbers up to the configured maximum using a set number of threads for 60 seconds.
sysbench cpu --cpu-max-prime=20000 --threads=4 --time=60 run
The output will include results such as in the following example from a benchmark on a single CPU core.
CPU speed:
events per second: 494.97
General statistics:
total time: 60.0003s
total number of events: 29699
Latency (ms):
min: 2.01
avg: 2.02
max: 2.91
95th percentile: 2.03
sum: 59994.07
Threads fairness:
events (avg/stddev): 29699.0000/0.00
execution time (avg/stddev): 59.9941/0.00
In the above example, the CPU ran the task for 60 seconds, reaching close to 30k total events while averaging 494.97 events per second. Such results can be put in better perspective when compared to other configurations, but single-core performance is often the best starting point.
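To see how performance scales with the allocated cores, one simple approach is to repeat the same test with an increasing thread count, as in the rough sketch below. The thread counts are just examples and should match the number of vCPUs on your server.

# Repeat the CPU test with 1, 2 and 4 threads and print the events per second line
for threads in 1 2 4; do
  echo "Threads: $threads"
  sysbench cpu --cpu-max-prime=20000 --threads=$threads --time=60 run | grep "events per second"
done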
The freedom of virtualisation on cloud servers brings many benefits. On shared infrastructure, such as a public cloud, it can give both users and providers additional control over resource usage.
Some public clouds are only suited for short, intermittent bursts of usage. These types of hosts might offer good enough performance for certain kinds of use cases. However, the provider could throttle sustained CPU usage to guarantee that the burst power is available when needed. Some providers also offer specialised computing platforms aimed at high CPU usage, but these come with premium pricing.
It is exactly because of these different approaches to CPU scheduling that benchmarks must account for hidden management systems. The duration of the CPU stress test matters: too short a run can produce inflated numbers from limited-time boost clocks.
Another approach that more dedicated testers can take is to spread the benchmarks over a longer period of time while limiting the CPU load to avoid use that might be considered abusive. This type of CPU endurance testing is vital for evaluating the stability of available CPU cycles over time and detecting any underlying computation quotas.
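There is no single right way to do this, but as a rough sketch, you could run a short, single-threaded test at regular intervals and log the events-per-second figure with a timestamp, then look for drops over time. The interval and number of runs below are arbitrary examples.

# Endurance sketch: one short CPU test roughly every 10 minutes, logged with a timestamp
for run in $(seq 1 6); do
  echo -n "$(date -Is) " >> cpu-endurance.log
  sysbench cpu --cpu-max-prime=20000 --threads=1 --time=30 run | grep "events per second" >> cpu-endurance.log
  sleep 570
done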
The primary purpose of system memory is to reduce the time it takes for processes to access information. While fetching records from RAM is much faster than having to read them from a storage device, it is still considered slow in terms of CPU speeds.
Luckily, system memory is one of the simpler things to benchmark. Sysbench, for example, has easy-to-run throughput tests for both reads and writes, like the command underneath. This test commands the system to write 100GB worth of data into memory with a 30-second time limit to prevent prolonged tests on slower hosts.
sysbench memory --memory-oper=write --memory-block-size=1K --memory-scope=global --memory-total-size=100G --threads=4 --time=30 run
Memory performance is usually measured in either transfer rate (MB/s) or operations rate (ops/sec). The results from the above test will show something along the lines of the example below.
Total operations: 104857600 (5501166.03 per second)
102400.00 MiB transferred (5372.23 MiB/sec)
General statistics:
total time: 19.0596s
total number of events: 104857600
Latency (ms):
min: 0.00
avg: 0.00
max: 1.09
95th percentile: 0.00
sum: 63754.83
Threads fairness:
events (avg/stddev): 26214400.0000/0.00
execution time (avg/stddev): 15.9387/0.01
The important numbers in the results above are the total time taken as well as the low latencies which indicate stable performance. Notice how the total time is actually less than the server was afforded during the test, meaning that it was capable of finishing the task in 2/3 of the time.
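Sysbench can run the equivalent read test by switching the --memory-oper option, which is worth running alongside the write test for a fuller picture of memory performance.

sysbench memory --memory-oper=read --memory-block-size=1K --memory-scope=global --memory-total-size=100G --threads=4 --time=30 run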
However, the results can differ between providers due to differences in server memory speeds. Newer CPU architectures support faster memory and offer better performance in general. Because of system-wide advances, memory speeds often go hand in hand with CPU performance.
Not everything can always be stored in system memory and occasionally processes must access the storage device. Whether it’s to read from a database or write to a system log, the speeds and latencies of these operations can make a big difference. And as expected, sysbench has great tools for testing storage speeds.
Running storage benchmarks using sysbench requires test files. The following fileio command prepares a number of test files for a total size of 10 gigabytes. It is important to make sure the total file size exceeds the amount of system memory to avoid inflated results due to caching.
sysbench fileio --file-total-size=10G prepare
Storage speed tests are generally divided between reads and writes due to the differences in these operations, but also by whether the storage is accessed randomly or sequentially. Certain types of tasks benefit differently from storage performance in random and sequential operations. Sysbench can run any combination of these test types and provides results applicable to most use cases. Regardless of the access type or pattern, storage throughput is commonly measured in megabytes per second (MB/s) or input/output operations per second (IOPS).
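In sysbench 1.0, the access type and pattern are selected with the --file-test-mode option, which supports the following modes:

seqwr: sequential write
seqrewr: sequential rewrite
seqrd: sequential read
rndrd: random read
rndwr: random write
rndrw: combined random read/write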
Random reads and writes are probably the more common types of storage loads. Due to different requests from varying tasks, consecutive accesses to storage rarely fall in neighbouring addresses, hence the access pattern is called random.
The next example command runs a random read test on the storage disk using the files prepared in advance with a 4-kilobyte block size.
sysbench fileio --file-test-mode=rndrd --file-total-size=10G --file-block-size=4K --threads=4 --time=60 run
The output in the example underneath shows results from the above random read test.
File operations:
reads/s: 55136.12
writes/s: 0.00
fsyncs/s: 0.00
Throughput:
read, MiB/s: 215.38
written, MiB/s: 0.00
General statistics:
total time: 60.0013s
total number of events: 3308356
Latency (ms):
min: 0.00
avg: 0.07
max: 6.28
95th percentile: 0.11
sum: 239277.34
Threads fairness:
events (avg/stddev): 827089.0000/664.42
execution time (avg/stddev): 59.8193/0.00
The numbers to focus on in the above are reads per second, read throughput, and the average latencies. The balance between IOPS and throughput depends heavily on the block size, where 4K is a balanced option. Try running the same benchmark with different block sizes to see how it affects the results.
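A quick way to do that is to repeat the run over a few block sizes and compare the IOPS and throughput figures, as in the sketch below; the sizes are just examples.

# Sweep a few block sizes over the same random read test
for bs in 4K 16K 64K; do
  echo "Block size: $bs"
  sysbench fileio --file-test-mode=rndrd --file-total-size=10G --file-block-size=$bs --threads=4 --time=60 run | grep -E "reads/s|read, MiB/s"
done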
Sequential access to storage is common with large file sizes such as audio and video. When a system is reading or writing in sequential order, the storage device wastes less time in related operations. Thanks to the faster access, sequential operations provide better throughput and benchmark scores. For the same reason manufacturers often quote sequential operations when listing disk speeds.
Below is an example of a sequential write test with the usual 4K block size.
sysbench fileio --file-test-mode=seqwr --file-total-size=10G --file-block-size=4K --threads=4 --time=60 run
The following output shows the results from a sequential write test such as the command above.
File operations:
reads/s: 0.00
writes/s: 40914.64
fsyncs/s: 52368.85
Throughput:
read, MiB/s: 0.00
written, MiB/s: 159.82
General statistics:
total time: 60.0011s
total number of events: 5597287
Latency (ms):
min: 0.00
avg: 0.04
max: 6.58
95th percentile: 0.12
sum: 238350.67
Threads fairness:
events (avg/stddev): 1399321.7500/2122.56
execution time (avg/stddev): 59.5877/0.00
The test results look much the same as for the previous example for the random read. The important bits are again in the number of operations per second (IOPS), throughput (MB/s), and latencies (ms).
As mentioned, the time it takes for the device to complete a request is another interesting metric for storage performance. The latency is usually displayed in milliseconds (ms), and in sysbench results, the important numbers to watch are the averages and 95th percentiles. Systems that mainly serve static web pages rely on low latencies in storage reads to provide snappy user experiences.
You should also note that while most providers have moved mainly to SSD-backed storage and beyond, it’s still possible to come across an HDD backend for cheaper mass storage. The differences between spinning disks and solid-state drives in both price and performance are huge, and the two are not directly comparable. As the capacities of individual SSDs increase, the need for the slower alternative is diminishing.
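Once you are done with the fileio benchmarks, remember to remove the test files created in the prepare step.

sysbench fileio --file-total-size=10G cleanup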
Finding the best cloud servers for your use case becomes a lot easier with good benchmarks. You can gain real insight by understanding how benchmarks work and what the results mean. Many of the benchmark metrics go hand in hand and should be evaluated together.
Having a good grasp of benchmarking methods will also help in testing providers yourself. With great readily available benchmarking tools such as sysbench, public results are easy to validate.
While benchmark scores are great for objectively rating providers, actual use testing should still be the follow-up. In many cases, you might achieve a significant performance boost on a different provider even at a similar price and configuration. Comparing benchmarks will help you pick the best candidates for real testing and avoid wasting time configuring test environments on unsuitable options. Furthermore, good benchmarks should help in finding the sweet spot in balancing system resources with throughputs and response times to optimize your platform.