CAF 0.18 is out!
More than a year has passed from designing and prototyping the new messaging layer to cutting the release that implements the changes. In our announcement on breaking changes from November 2019, we cited performance considerations as one of our primary motivations for implementing the new messaging layer:
Over the course of the last years of developing CAF, we came to understand message as a variant-like type (…), but the implementation (…) comes at a cost. Obvious costs through virtual dispatch, but also runtime overhead for matching message handler signatures to the content of incoming messages. Further, because a message can essentially hold anything, CAF has to include a lot of meta data in each message on the wire. This increases size overhead on the wire but also makes deserializing messages costly.
Prior to CAF 0.18, we relied on std::type_info and held a run-time generated table that users filled with custom types by calling actor_system_config::add_message_type prior to initializing the actor_system.
With the new type ID blocks, users instead have to list all their types statically. This also comes with the benefit of allowing CAF to check whether a type was listed as a legal message type. Not calling actor_system_config::add_message_type was a common source of error, especially when introducing new message types to an existing code base. With the new type ID blocks, you still have to initialize the run-time lookup table (manually or by passing the type ID block to CAF_MAIN), but this is robust under refactoring: adding a type to the block also updates the initialization logic.
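The idea of a single static list that serves both as the source of IDs and as a compile-time check can be illustrated without CAF. The following is a minimal sketch under stated assumptions, not CAF's implementation: `type_list`, `index_of`, `is_listed`, `type_id_of` and `my_types` are hypothetical names introduced only for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a type ID block: one static list of message types.
template <class... Ts>
struct type_list {};

// Position of T in the list, or the list size if T is not listed.
template <class T, class List>
struct index_of;

template <class T>
struct index_of<T, type_list<>> : std::integral_constant<std::size_t, 0> {};

template <class T, class U, class... Us>
struct index_of<T, type_list<U, Us...>>
  : std::integral_constant<
      std::size_t,
      std::is_same<T, U>::value ? 0
                                : 1 + index_of<T, type_list<Us...>>::value> {};

template <class List>
struct list_size;

template <class... Ts>
struct list_size<type_list<Ts...>>
  : std::integral_constant<std::size_t, sizeof...(Ts)> {};

// True if T appears in the list; usable in static_assert.
template <class T, class List>
constexpr bool is_listed() {
  return index_of<T, List>::value < list_size<List>::value;
}

// Derives a 16-bit ID from the position; fails to compile for unlisted types.
template <class T, class List>
constexpr std::uint16_t type_id_of() {
  static_assert(is_listed<T, List>(), "type is not a listed message type");
  return static_cast<std::uint16_t>(index_of<T, List>::value);
}

struct foo {
  std::int32_t a;
  std::int32_t b;
};

// Adding a type here is the only step needed to make it a legal message type.
using my_types = type_list<std::int32_t, std::string, foo>;
```

In this sketch, forgetting to list a type turns into a compile error at the first use of `type_id_of` instead of a run-time failure.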
The new type IDs in CAF allow us to reduce the amount of meta information drastically. A caf::message simply points to an array of 16-bit values that encode the run-time types of all elements. On the wire, CAF no longer prefixes a message with the full type names as strings but again with a list of 16-bit integers. The new implementation for caf::message also uses a flat memory layout instead of the previous tree structure.
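To get a feeling for the difference in meta data size, the following sketch compares a name-based prefix with a list of 16-bit IDs. It does not reproduce CAF's wire format: the one-byte terminator per name, the big-endian byte order, and the function names `meta_bytes_with_names` and `encode_type_ids` are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Meta data size when each element type is announced by its name.
// Assumption for this sketch: each name is followed by one terminator byte.
std::size_t meta_bytes_with_names(const std::vector<std::string>& names) {
  std::size_t total = 0;
  for (const auto& name : names)
    total += name.size() + 1;
  return total;
}

// Writes one 16-bit ID per element, high byte first (assumed byte order).
std::vector<std::uint8_t>
encode_type_ids(const std::vector<std::uint16_t>& ids) {
  std::vector<std::uint8_t> out;
  out.reserve(ids.size() * 2);
  for (auto id : ids) {
    out.push_back(static_cast<std::uint8_t>(id >> 8));
    out.push_back(static_cast<std::uint8_t>(id & 0xFF));
  }
  return out;
}
```

For a message with the two elements `int32_t` and `std::string`, the name-based variant in this sketch needs 20 bytes of meta data, while the ID-based variant always needs two bytes per element.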
So how much performance improvement, if any, can we expect after porting an application to 0.18? To answer this question, we have implemented a set of micro benchmarks. The full source code is available online on GitHub.
Our first benchmark set is called message_creation. It looks at the run-time cost of creating messages. CAF has two APIs to do this: pass all elements at once to make_message or create a message dynamically using a message_builder:
BENCHMARK_F(message_creation, make_message)(benchmark::State& state) {
  for (auto _ : state) {
    auto msg = make_message(size_t{0});
    benchmark::DoNotOptimize(msg);
  }
}

BENCHMARK_F(message_creation, message_builder)(benchmark::State& state) {
  for (auto _ : state) {
    message_builder mb;
    message msg = mb.append(size_t{0}).move_to_message();
    benchmark::DoNotOptimize(msg);
  }
}
The next two benchmark sets, pattern_matching and or_else, measure how long it takes to dispatch a message to a behavior. The only difference between the two sets is that the former creates the behavior from a single list of lambda expressions, while the latter wraps each lambda into an individual behavior before gluing all of them together via or_else. The messages as well as the behaviors are created once during setup before running the benchmark loop. The implementation for these benchmarks involves setting up fixtures, so we omit them here for brevity.
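The difference between the two construction styles can be sketched in plain C++. This is a strongly simplified model and not how CAF implements behaviors: here a `message` holds exactly one value of a fixed set of types, and `handler`, `on`, and this `behavior` struct are hypothetical names for the example.

```cpp
#include <functional>
#include <string>
#include <variant>
#include <vector>

// Simplification: a message holds exactly one value of a fixed set of types.
using message = std::variant<int, std::string>;

// A handler returns true if it accepted the message.
using handler = std::function<bool(const message&)>;

// Wraps a callable taking `const T&` into a handler that matches only T.
template <class T, class F>
handler on(F f) {
  return [f](const message& msg) {
    if (auto ptr = std::get_if<T>(&msg)) {
      f(*ptr);
      return true;
    }
    return false;
  };
}

struct behavior {
  std::vector<handler> handlers;

  // Tries the handlers in order and stops at the first match.
  bool operator()(const message& msg) const {
    for (const auto& h : handlers)
      if (h(msg))
        return true;
    return false;
  }

  // Glues two behaviors together: handlers of `other` are only tried
  // if no handler of this behavior matches.
  behavior or_else(const behavior& other) const {
    behavior result{handlers};
    result.handlers.insert(result.handlers.end(), other.handlers.begin(),
                           other.handlers.end());
    return result;
  }
};
```

In this model, a behavior built from a single handler list and one glued together via `or_else` dispatch identically; the two benchmark sets measure whether the extra wrapping costs anything in the real implementation.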
The last benchmark set is called serialization and it measures the run-time cost of saving or loading builtin and custom types. As custom types, the benchmark uses a simple POD called foo and a slightly more “advanced” type bar that includes the custom type foo as a member variable to force CAF to recursively call inspect:
struct foo {
  int32_t a;
  int32_t b;
};

struct bar {
  foo a;
  std::string b;
};

#if CAF_VERSION >= 1800

template <typename Inspector>
bool inspect(Inspector& f, foo& x) {
  return f.object(x).fields(f.field("a", x.a), f.field("b", x.b));
}

template <typename Inspector>
bool inspect(Inspector& f, bar& x) {
  return f.object(x).fields(f.field("a", x.a), f.field("b", x.b));
}

#else // CAF_VERSION >= 1800

template <typename Inspector>
typename Inspector::result_type inspect(Inspector& f, foo& x) {
  return f(x.a, x.b);
}

template <typename Inspector>
typename Inspector::result_type inspect(Inspector& f, bar& x) {
  return f(x.a, x.b);
}

#endif // CAF_VERSION >= 1800
This benchmark also creates the messages once before running the benchmark loops. Again, we omit the implementation details for brevity. The sources are just one click away, though.
Hardware in use:
Software versions:
b7d063b0d884ab9c210221c5d101e4a4797b2b7e
0.16.5
0.17.6
8d27730124440c5c7422126a222b1bf2a0262e3f (slightly before the final release tag)
Without further ado, here are the results for the micro benchmarks using the last three major release versions. All values are wall-clock time in nanoseconds as reported by Google Benchmark.
Benchmark | CAF 0.16.5 | CAF 0.17.6 | CAF 0.18 |
---|---|---|---|
message_creation/make_message | 57.3 | 57.3 | 53.5 |
message_creation/message_builder | 258 | 248 | 339 |
pattern_matching/make_message | 233 | 219 | 103 |
pattern_matching/message_builder | 239 | 234 | 102 |
or_else/make_message | 364 | 347 | 168 |
or_else/message_builder | 348 | 346 | 167 |
serialization/save_foo_binary | 143 | 136 | 110 |
serialization/save_bar_binary | 174 | 174 | 127 |
serialization/save_msg_2int_binary | 247 | 237 | 138 |
serialization/load_foo_binary | 47.8 | 31.0 | 8.44 |
serialization/load_bar_binary | 84.7 | 55.5 | 16.8 |
serialization/load_msg_2int_binary | 627 | 566 | 266 |
Creating messages using make_message is slightly faster than before, due to the simpler (flat) data structure used internally as of 0.18. However, the same design makes the message_builder slower than before, because once all values are added to the builder, it has to allocate a new block of memory and move or copy everything into place.
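Why a flat layout penalizes dynamic construction can be seen in a small model. The sketch below is an illustration under heavy simplifying assumptions and not CAF's message_builder: it only supports trivially copyable types, ignores alignment by packing raw bytes, and `flat_message` and `builder` are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <type_traits>
#include <vector>

// A message whose elements live in one contiguous block of memory.
struct flat_message {
  std::unique_ptr<unsigned char[]> storage;
  std::size_t size = 0;
};

class builder {
public:
  // Stores a copy of `x` in its own chunk (first allocation per element).
  template <class T>
  builder& append(const T& x) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "this sketch supports trivially copyable types only");
    auto* first = reinterpret_cast<const unsigned char*>(&x);
    chunks_.emplace_back(first, first + sizeof(T));
    return *this;
  }

  // The total size is only known now, so the flat block has to be
  // allocated at this point and every element copied into place.
  flat_message move_to_message() {
    std::size_t total = 0;
    for (const auto& chunk : chunks_)
      total += chunk.size();
    flat_message msg;
    msg.storage.reset(new unsigned char[total]);
    msg.size = total;
    std::size_t offset = 0;
    for (const auto& chunk : chunks_) {
      std::memcpy(msg.storage.get() + offset, chunk.data(), chunk.size());
      offset += chunk.size();
    }
    chunks_.clear();
    return msg;
  }

private:
  std::vector<std::vector<unsigned char>> chunks_;
};
```

A function that receives all elements at once knows the total size up front and can write directly into the final block, which is the step the builder in this model has to repeat at the end.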
Selecting a matching handler from a behavior is much quicker than before, with a speedup of more than 2x compared to earlier versions of CAF.
The serialization components also show significant performance improvements. Saving values to a binary_serializer is now 20-70% faster, because CAF has to generate less data. The biggest win comes when deserializing objects from a binary_deserializer. There is less data to read and string lookups became simple offset lookups due to the new type IDs.
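The difference between the two lookup styles can be sketched as follows. This is only an illustration of the principle; `meta_object`, `find_by_name` and `find_by_id` are hypothetical names, and the tables do not mirror CAF's internal data structures.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical per-type meta data a deserializer needs to look up.
struct meta_object {
  const char* name;
  std::size_t size;
};

// Name-based lookup: hashes the string and compares characters.
const meta_object*
find_by_name(const std::unordered_map<std::string, meta_object>& table,
             const std::string& name) {
  auto i = table.find(name);
  return i != table.end() ? &i->second : nullptr;
}

// ID-based lookup: the 16-bit ID is simply an offset into the table.
template <std::size_t N>
const meta_object* find_by_id(const std::array<meta_object, N>& table,
                              std::uint16_t id) {
  return id < N ? &table[id] : nullptr;
}
```

The ID-based variant needs one bounds check and one array access, independent of how long the type names are.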
When browsing the micro benchmarks repository, you might notice one additional benchmark not covered here: actors/spawn_and_await. This benchmark spawns an actor that does nothing and then waits until it has terminated. The values we measured in our performance study were 41355ns, 41142ns and 42618ns for CAF 0.16.5, 0.17.6 and 0.18, respectively. This result fluctuates more than the others. We did not focus on this result, because it measures many components in CAF all at once.
If you have followed our past benchmarking articles or read some of our research papers, you may also remember the benchmark suite consisting of mixed_case, actor_creation, and mailbox_performance. For the sake of completeness, we also recorded values for those. This time, all values are in seconds:
Benchmark | CAF 0.16.5 | CAF 0.17.6 | CAF 0.18 |
---|---|---|---|
mixed_case | 22.8 | 26.8 | 24.2 |
actor_creation | 2.9 | 1.3 | 0.7 |
mailbox_performance | 34.9 | 35.2 | 33.7 |
Results shown in the table above are the average of ten runs. None of these benchmarks do any networking and thus no serialization. We included them mostly as a sanity check here to make sure CAF 0.18 does not introduce performance regressions. You can also find these three benchmarks (among others) online on GitHub. The exact commit used here is 383b745a4ad18b22a48166a048cb34e1bce40aaf.
CAF 0.18 introduced a new type inspection API and static type IDs for user-defined types. These two changes are most obvious when switching to the new release, because they require code changes.
Overall, we are very happy with the performance we get from the new messaging layer. Keep in mind that micro-benchmark results for serialization with speedup of 2 or more do not translate to a speedup of that magnitude for a distributed CAF application. Serialization is only one fraction of the total work.
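A quick back-of-the-envelope calculation with Amdahl's law shows why. In the sketch below, `overall_speedup` is a helper introduced for this example, and the 20% share of serialization in the total run time is a purely hypothetical figure, not a measurement.

```cpp
// Amdahl's law: overall speedup when a fraction `p` of the total run time
// becomes `s` times faster and the remaining work is unchanged.
constexpr double overall_speedup(double p, double s) {
  return 1.0 / ((1.0 - p) + p / s);
}

// Hypothetical application: 20% of the time is spent on serialization,
// which becomes 2x faster. The result is 1 / (0.8 + 0.1), roughly 1.11.
constexpr double example = overall_speedup(0.2, 2.0);
```

Even if serialization became infinitely fast in this hypothetical scenario, the application as a whole could not run more than 1.25 times faster.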
Aside from any performance gains, we are most excited about new capabilities opening up as a result of the new type inspection and messaging layer. The new type inspection API passes more information to CAF (especially field names), which allows us to read and write more data formats such as JSON. Using more compact meta information on the wire and avoiding dependencies on std::type_info also allows us to tune CAF for resource-constrained devices to enable more IoT and edge computing use cases.