Discussion:
Performance of term_to_binary vs binary_to_term
Valentin Micic
2021-06-06 00:07:26 UTC
Hi all,

I did some performance measurements recently that included converting an arbitrary Erlang term to its external binary representation via term_to_binary/1, as well as reversing the result using binary_to_term/1.

I’ve noticed that term_to_binary/1 is significantly faster than binary_to_term/1.

Also, I’ve observed that binary_to_term/1 performance gets considerably worse as the complexity of the specified term increases, whilst term_to_binary/1 maintains (more or less) steady performance.

(***@MacBook-Pro)40> tconvert:run( a, 10000000 ).

term_to_binary/1 RETURN VALUE:<<131,100,0,1,97>>
REQUEST COUNT:10000000
ELAPSED TIME (usec):97070
TIME PER REQUEST (usec): 0.009707
PROJECTED RATE (req/sec): 103018440.30081384

binary_to_term/1 RETURN VALUE:a
REQUEST COUNT:10000000
ELAPSED TIME (usec):3383483
TIME PER REQUEST (usec): 0.3383483
PROJECTED RATE (req/sec): 2955534.2822765773
ok

(***@MacBook-Pro)41> tconvert:run( {a,<<1,2,3>>, b, [1,2,3], c, {1,2,3}, d, #{a=>1, b=>2, c=>3}}, 10000000 ).

term_to_binary/1 RETURN VALUE:<<131,104,8,100,0,1,97,109,0,0,0,3,1,2,3,100,0,1,
98,107,0,3,1,2,3,100,0,1,99,104,3,97,1,97,2,97,
3,100,0,1,100,116,0,0,0,3,100,0,1,97,97,1,100,
0,1,98,97,2,100,0,1,99,97,3>>
REQUEST COUNT:10000000
ELAPSED TIME (usec):97307
TIME PER REQUEST (usec): 0.0097307
PROJECTED RATE (req/sec): 102767529.57135664

binary_to_term/1 RETURN VALUE:{a,<<1,2,3>>,
b,
[1,2,3],
c,
{1,2,3},
d,
#{a => 1,b => 2,c => 3}}
REQUEST COUNT:10000000
ELAPSED TIME (usec):8747426
TIME PER REQUEST (usec): 0.8747426
PROJECTED RATE (req/sec): 1143193.4377038456
ok



I’ve performed testing on R21.1.
Any thoughts?

V/
Valentin Micic
2021-06-07 12:25:19 UTC
Hmmm… I realised that my previous email may be seen as a comment rather than a question, so let me ask the question clearly.

Given that binary_to_term/1 is about two orders of magnitude slower than term_to_binary/1, is there anyone out there who may have a reasonable explanation for that?

Kind regards

V/
Thomas Depierre
2021-06-07 12:38:36 UTC
Hi Valentin.

Yes, there is a pretty simple answer :) Parsing is far harder than
serialization! For parsing, you have to read a bit of the binary stream,
find out what type it encodes, then translate the data into a data type,
which means allocating memory. On top of that, you need to validate that
it is not a mangled binary stream. And you need to do it piecemeal,
keeping a lot of state about the current step. This is far more complex
than translating an in-memory structure whose size you already know into
a binary stream.

Now, is there room for a speedup? Maybe. It is always an interesting area
to explore. But fundamentally, parsing being slower than serialisation is
not surprising.
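
The atom case from the benchmark makes those steps concrete. Below is a toy hand-decoder for just the ATOM_EXT tag of the external term format; it is only a sketch of the work a parser has to do, not how binary_to_term/1 is actually implemented (the Decode fun and its error handling are illustrative):

```erlang
%% Sketch: hand-decode only the ATOM_EXT case of the external format.
%% 131 is the format version byte; tag 100 (ATOM_EXT) is followed by a
%% 2-byte big-endian length and the atom name in Latin-1. Even this
%% tiny parser must dispatch on the tag, check the length against the
%% input, and create/intern the atom on success.
Decode = fun(<<131, 100, Len:16/big, Name:Len/binary>>) ->
                 binary_to_atom(Name, latin1);
            (_) ->
                 erlang:error(badarg)  % anything else: treat as mangled
         end,
a = Decode(<<131,100,0,1,97>>).  % the bytes shown in the benchmark above
```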

Thomas Depierre
Lukas Larsson
2021-06-07 12:47:32 UTC
Post by Thomas Depierre
Yes there is a pretty simple answer :) Parsing is far harder than serialization! [...]
Another thing that comes to mind is that the GC may be interfering with the
results as binary_to_term would create more garbage than term_to_binary.
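
A rough way to see that effect, assuming the loops run in a single process, is to read the process's GC counters before and after a decode loop via process_info/2. The exact numbers will vary, and minor_gcs resets after a full sweep, so treat this as indicative only:

```erlang
%% Sketch: watch the minor-GC counter climb while decoding the same
%% binary repeatedly. Each binary_to_term/1 call builds a fresh copy of
%% the term on the process heap, which eventually triggers collections.
Bin = term_to_binary({a, [1,2,3]}),
{garbage_collection, Before} = process_info(self(), garbage_collection),
_ = [binary_to_term(Bin) || _ <- lists:seq(1, 1000000)],
{garbage_collection, After} = process_info(self(), garbage_collection),
{proplists:get_value(minor_gcs, Before),
 proplists:get_value(minor_gcs, After)}.
```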

Lukas
Valentin Micic
2021-06-07 13:04:08 UTC
Hi Thomas,

Thanks for your input.

Indeed, parsing being slower than serialisation is not necessarily surprising. However, when the difference in performance is, well, two orders of magnitude, then it may be reasonable to ask whether term_to_binary/1 does any actual work. Put differently, is there any difference between internal and external term representation?


V/
Lukas Larsson
2021-06-07 13:16:24 UTC
Post by Valentin Micic
Put differently, is there any difference between internal and external
term representation?
They are very different. The internal representation is larger in size but
faster to navigate and manipulate. The two formats also have to satisfy
very different requirements, which is what drives the differences.

You can read more about the external term format here:
https://erlang.org/doc/apps/erts/erl_ext_dist.html
And the internal format here:
https://blog.stenmans.org/theBeamBook/#CH-TypeSystem
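
For a quick feel of the difference from the shell, you can compare the external size in bytes with the internal footprint in heap words. erts_debug:flat_size/1 is an internal debugging helper, so take its output as illustrative only:

```erlang
%% Sketch: external (wire) size vs internal (heap) size of one term.
%% Numbers vary by OTP version and word size.
T = {a, <<1,2,3>>, [1,2,3], #{a => 1}},
ExternalBytes = byte_size(term_to_binary(T)),  % bytes in the external format
InternalWords = erts_debug:flat_size(T),       % heap words (8 bytes each on 64-bit)
{ExternalBytes, InternalWords}.
```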

Lukas
Valentin Micic
2021-06-07 13:42:01 UTC
Thanks Lukas.
Valentin Micic
2021-06-07 13:42:13 UTC
Thank you.
Post by Valentin Micic
is there any difference between internal and external term representation?
There must be, at least for integers, as ETF uses network order (also known as big-endian), while internally the VM uses whatever is native for the given machine.
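
That is easy to check in the shell: integers outside 0..255 are encoded as INTEGER_EXT (tag 98), a 32-bit signed integer in network byte order, regardless of the host CPU:

```erlang
%% 1000 does not fit SMALL_INTEGER_EXT (a single byte), so it is
%% encoded as INTEGER_EXT: version 131, tag 98, then 32 bits big-endian.
<<131, 98, B3, B2, B1, B0>> = term_to_binary(1000),
1000 = (B3 bsl 24) bor (B2 bsl 16) bor (B1 bsl 8) bor B0.
```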
--
Łukasz Niemier
Antoine Koener
2021-06-07 18:22:19 UTC
Not sure that this will explain all of it, but reconstructing terms needs more work than serialising them: think about atoms that may need to be created, or complex structures that need allocation.

Converting everything to binary doesn't need any checks at all, whereas the reverse is not true.

What do you think?
Valentin Micic
2021-06-07 22:02:41 UTC
Yes, your (as well as everyone else’s) explanation may account for binary_to_term/1 being slower. I get it: it’s a story about complexity and all… however, something does not add up.

Consider the first example in my measurements:

The term being converted to external representation is the atom ‘a’.

So, conversion from the atom ‘a’ to its external representation, <<131,100,0,1,97>> (using term_to_binary/1), is projected to run at a rate of more than 103 million conversions per second.

Since this term has a very simple external representation, one would expect the conversion back to its internal representation to be just marginally slower.

And yet, binary_to_term/1 is an order of magnitude (more than 34 times) slower!

I don’t think that the story of complexity passes a red-face test here.

Kind regards

V/
Jacob
2021-06-08 05:45:56 UTC
Hi,

I've tried to reproduce the measurement, but according to my
measurements, there is just a factor of 2 on Erlang/OTP 22.

1> timer:tc(fun () -> bench:t2b(a, 1000000) end).
{109357,<<131,100,0,1,97>>}
2> timer:tc(fun () -> bench:b2t(<<131,100,0,1,97>>, 1000000) end).
{199488,a}


If I do not use the result of each term_to_binary call, the factor (~14)
is much closer to your measurements:

3> timer:tc(fun () -> bench:broken_t2b(a, 1000000) end).
{14404,<<>>}

Are you indeed sure that the compiler did not optimise away the entire
call?

/Jacob

======================== bench.erl ==============================
-module(bench).

-export([t2b/2, b2t/2, broken_t2b/2]).

%% Call term_to_binary/1 N times, keeping each result so the call
%% cannot be optimised away.
t2b(T, N) -> t2b(T, N, undefined).

t2b(_, 0, R) -> R;
t2b(T, N, _) -> R = term_to_binary(T), t2b(T, N-1, R).

%% Call binary_to_term/1 N times, keeping each result.
b2t(T, N) -> b2t(T, N, undefined).

b2t(_, 0, R) -> R;
b2t(T, N, _) -> R = binary_to_term(T), b2t(T, N-1, R).

%% Same loop, but the result of term_to_binary/1 is discarded, so the
%% compiler is free to drop the call entirely.
broken_t2b(T, N) -> broken_t2b(T, N, undefined).

broken_t2b(_, 0, R) -> R;
broken_t2b(T, N, R) -> _ = term_to_binary(T), broken_t2b(T, N-1, R).
=================================================================