Long gen_server init question (not about order of messages :)

Discussion:

Stanislav Ledenev

2021-05-15 18:55:17 UTC

Hi!
I need advice from an experienced community about synchronizing a gen_server
that takes a long time to initialize with the other parts of an application.

Let's say we have 4 gen_servers:
* mod_api - accept requests;
* mod_func - real job;
* mod_x, mod_y - users of mod_func.

mod_api accepts requests and transforms them into calls to mod_x and mod_y.
mod_func is crucial for the application, but it needs a time-consuming
initialization procedure. IRL it is some cryptography-related stuff.
While mod_func is in the initialization state, mod_api must return
'not_ready' for all requests.
While mod_func is initializing, it is not available for any requests.

I was thinking about options for implementing a notification mechanism and
see two of them:
1. Some kind of polling from mod_api to mod_func using gen_server:call with
a timeout: after its own initialization, mod_api begins a send_after loop
that calls mod_func with a timeout;
2. A gen_event based solution: start a gen_event and wait for a notification
about the readiness of mod_func.
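
For illustration only, option 1 might look roughly like the sketch below. Nothing in it comes from the original post beyond the idea: the module name mod_api_poll, the is_ready request, and both timing values are assumptions, and mod_func is assumed to be registered locally under that name and to answer is_ready with true once it has finished initializing.

```erlang
-module(mod_api_poll).
-behaviour(gen_server).

-export([start_link/0, request/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(POLL_INTERVAL, 1000). % ms between polls (assumed value)
-define(CALL_TIMEOUT, 500).   % ms to wait for mod_func (assumed value)

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

request(Req) ->
    gen_server:call(?MODULE, {request, Req}).

init([]) ->
    %% Trigger the first poll as soon as init/1 has returned.
    self() ! poll,
    {ok, #{ready => false}}.

handle_call({request, _Req}, _From, #{ready := false} = State) ->
    {reply, not_ready, State};
handle_call({request, Req}, _From, State) ->
    %% Hypothetical placeholder: forward to mod_x / mod_y here.
    {reply, {ok, Req}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(poll, State) ->
    case is_ready() of
        true ->
            {noreply, State#{ready := true}};
        false ->
            erlang:send_after(?POLL_INTERVAL, self(), poll),
            {noreply, State}
    end;
handle_info(_Other, State) ->
    %% Also drops a late reply from a call that already timed out.
    {noreply, State}.

%% A timeout or a missing process both count as "not ready yet".
is_ready() ->
    try gen_server:call(mod_func, is_ready, ?CALL_TIMEOUT) of
        true -> true;
        _ -> false
    catch
        exit:_ -> false
    end.
```

One cost of this shape is that mod_api itself is blocked for up to the call timeout on every unsuccessful poll, so incoming requests wait that long before they get their 'not_ready'.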

Am I missing something? Are there any better solutions for such a task?

Leonard B

2021-05-15 23:50:18 UTC

Why not notify mod_api from mod_func once it's initialized? If you wanted
mod_func initialization to be non-blocking, you could use handle_continue.

Kind regards,
Leonard

Stanislav Ledenev

2021-05-16 05:53:19 UTC

I know about handle_continue, but my question is not about a long init()
function.
To make it simpler, it is about long-running functions inside of
handle_continue.
It may be possible to spawn those functions in separate processes, but that
is not desirable because of requirements.

mod_func is quite general (general like a library) and should not know about
anything specific outside of it. Passing something like a 'reply_to'
reference to mod_func could be possible, but it is fragile: the receiving
side might not be ready at the moment of replying, and this *single*
notification would just vanish.
So notification from mod_func to mod_api is the same problem, but from the
other side.

Ali Sabil

2021-05-16 09:57:42 UTC

Did you consider changing the overall architecture and use 1 process for
each request?

Stanislav Ledenev

2021-05-18 06:56:04 UTC

Post by Ali Sabil
Did you consider changing the overall architecture and use 1 process for
each request?

What part of my message made you think that I have any kind of
problem with load balancing or something similar? I do not.

In general, all I need is a method of notification between multiple
gen_servers about their readiness to do some work. And this is not about the
init callback. It is about their internal initialization process. For
example, some specific kinds of cryptography require a lot of calculation,
plus user interaction to make those calculations possible.

Roger Lipscombe

2021-05-18 08:10:31 UTC

Assumptions:

- The server is a singleton (within the node).
- While the server is initializing, a caller should get 'not_ready'.
It should not block.

I can see two ways to do this, both involving an extra process:

1. The proxy starts up the server, passing self(). When the server is
ready, it notifies the proxy. If the proxy receives a request while
the server's not ready, it replies with 'not_ready'. If the server's
ready, the proxy passes the request on.
2. The server starts in 'not_ready' mode, and starts a worker. The
worker does the initialization, and when it's done, it sends the
resulting state to the server. From that point, the server can reply
to requests.
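
As a rough illustration of way 1, a proxy could be sketched like this. The module name ready_proxy, the StartFun argument, and the {ready, Pid} message are all assumptions made for the sketch, not something taken from a real code base. StartFun is expected to start (and link) the real server and return quickly, for example because the server defers its slow work to handle_continue.

```erlang
-module(ready_proxy).
-behaviour(gen_server).

-export([start_link/1, call/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% StartFun is a fun(ProxyPid) -> {ok, ServerPid} (hypothetical contract).
%% The server must send {ready, self()} to ProxyPid when it is ready.
start_link(StartFun) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, StartFun, []).

call(Request) ->
    gen_server:call(?MODULE, {call, Request}).

init(StartFun) ->
    {ok, Server} = StartFun(self()),
    {ok, #{server => Server, ready => false}}.

handle_call({call, _Request}, _From, #{ready := false} = State) ->
    {reply, not_ready, State};
handle_call({call, Request}, _From, #{server := Server} = State) ->
    %% Simplest form: the proxy blocks while the server does the work.
    {reply, gen_server:call(Server, Request), State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Only accept the notification from the server this proxy started.
handle_info({ready, Server}, #{server := Server} = State) ->
    {noreply, State#{ready := true}};
handle_info(_Other, State) ->
    {noreply, State}.
```

Because the proxy exists before the server is started, the single ready message is queued in the proxy's mailbox and cannot be lost while both processes are alive.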

If you want to do this without an extra process:

- The server starts. It's initially unregistered. Any calls to the
server look up the registration. If it's not registered, the call
returns 'not_ready'. When it's ready, the server registers itself,
meaning that calls can find the pid and make the call.

We use a variant of this in production.

Stanislav Ledenev

2021-05-18 13:13:50 UTC

Thank you for your response.
The option with some kind of middle-man (proxy) is one of the implementations
I was thinking about.

I'd like to clarify the option with an unregistered server.
It is very interesting, especially since some variant of it is used in
production.
But it sounds like the unregistered server starts outside of the
supervision tree, or maybe without OTP at all. Is that so?

Roger Lipscombe

2021-05-18 13:46:36 UTC

Post by Stanislav Ledenev
I'd like to clarify the option with an unregistered server.
It is very interesting, especially due to the fact that it is used in
production (some variant of it).
But it sounds like that unregistered server starts outside of the
supervision tree.
Or maybe without OTP at all. Is it so?

No.

Where you'd ordinarily write something like this...

start_link(Opts) -> gen_server:start_link({local, ?SERVER}, ?MODULE, Opts, []).

...which registers ?SERVER as soon as possible, you'd defer that part
(leave out the {local, ?SERVER} part):

start_link(Opts) -> gen_server:start_link(?MODULE, Opts, []).

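init(Opts) ->
    %% Return immediately and defer the slow work to handle_continue/2.
    %% The 'initialize' term is an arbitrary tag chosen for this example.
    {ok, Opts, {continue, initialize}}.

handle_continue(initialize, State) ->
    % hard things
    register(?SERVER, self()),
    {noreply, State}.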

And then the caller would be something like this:

do_call(Args) ->
    case whereis(?SERVER) of
        undefined -> not_ready;
        Pid -> gen_server:call(Pid, Args)
        % ...
    end.

You get the idea, hopefully.

There are a couple of rough edges here:
- If the process is already registered and you start it again, you'll
get an error _after_ you've done the hard work. You can fix that by
checking whether the process exists first, but there's a small race
condition there. If you know that you'll only ever start one, you can
skip that part.
- There's a really small race between the whereis and the
gen_server:call; if this happens when the process is starting, no
problem -- you can ignore that. If the server process dies in between
the two steps, you'll get an error in the gen_server:call. But you
need to deal with that anyway.
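
A minimal sketch of dealing with that second race might look like the following. The module name safe_call and the choice to map a vanished process back to 'not_ready' are assumptions for illustration; the registered name is passed as an argument so that the example stands alone.

```erlang
-module(safe_call).
-export([do_call/2]).

%% Name is the atom the server registers once it is ready.
do_call(Name, Args) ->
    case whereis(Name) of
        undefined ->
            not_ready;
        Pid ->
            try
                gen_server:call(Pid, Args)
            catch
                %% The server died between whereis/1 and the call.
                exit:{noproc, _} -> not_ready;
                %% Timeout, or the server died while handling the call.
                exit:{Reason, _} -> {error, Reason}
            end
    end.
```

Whether a dead server should look like 'not_ready' or like an error to the caller is a design decision; the sketch only shows where each case surfaces.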

All of the above can still live in a supervision tree -- supervised
processes don't have to be registered. The ID that you give to the
supervisor is only scoped to that supervisor, and isn't registered
elsewhere.

The variant we have in production runs across two nodes. Node A
depends on a process in node B. When b_server finishes starting, it
notifies a named process in node A, which puts that pid in an ETS
table in A. If the process dies, or we get a nodedown message, A
clears the ETS table. Then the caller (in A) can grab the pid from the
ETS table and make a call to the process in B.

Now I look back at it, it's overcomplicated for what it does (we'd
planned to have multiple instances of B and have each register for a
shard, but it turned out not to be needed). We could have just used
'global' or 'pg2' (now just 'pg').

Jesper Louis Andersen

2021-05-18 14:07:08 UTC

Post by Stanislav Ledenev
mod_api accepts requests and transforms them to calls to mod_x, mod_y.
mod_func is crucial for the application but it needs a time consuming
procedure of initialization. IRL it is some cryptography related stuff.
While mod_func is in the initialization state, mod_api must return
'not_ready' for all requests.
While mod_func initializing it is not available for any requests.

I'd definitely go with Roger's idea.

Spawn mod_func as part of a supervision tree. This starts out as the proxy.
It spawns mod_func_bg which does the block and the initialization. While
initialization is ongoing, mod_func responds not_ready, and will do so with
a low latency. Once mod_func_bg is done, it sends its data to mod_func. Now
mod_func "becomes" the real process, and can answer requests. This avoids
the proxy in the common path.

In this solution the "notification" *is* the data.

A crash of mod_func repeats the process. You will go back to the not_ready
state. And once you are ready for processing, the state will flip with a
new notification from mod_func_bg. Failure in mod_func_bg can be detected
by a link (or monitor). I'd probably just link them. The two processes'
lifetimes are following each other anyway.
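
The design described above could be sketched roughly as follows. The module name mod_func_sketch, the InitFun argument, the {initialized, Pid, Data} message, and the echo reply are all assumptions made to keep the example self-contained; the worker here is an anonymous linked process rather than a separate mod_func_bg module.

```erlang
-module(mod_func_sketch).
-behaviour(gen_server).

-export([start_link/1, work/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% InitFun is a fun() -> Data that performs the slow initialization.
start_link(InitFun) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, InitFun, []).

work(Request) ->
    gen_server:call(?MODULE, {work, Request}).

init(InitFun) ->
    Parent = self(),
    %% Linked: if the worker crashes, this server goes down with it and
    %% its supervisor (if any) restarts the whole thing.
    Worker = spawn_link(fun() ->
        Parent ! {initialized, self(), InitFun()}
    end),
    {ok, {not_ready, Worker}}.

handle_call({work, _Request}, _From, {not_ready, _Worker} = State) ->
    {reply, not_ready, State};
handle_call({work, Request}, _From, {ready, Data} = State) ->
    %% Hypothetical "real job": echo the request along with the data.
    {reply, {ok, Request, Data}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% The notification carries the data; receiving it flips the state.
handle_info({initialized, Worker, Data}, {not_ready, Worker}) ->
    {noreply, {ready, Data}};
handle_info(_Other, State) ->
    {noreply, State}.
```

The worker exits normally after sending its result, which does not affect the linked server; only an abnormal exit propagates.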

--
J.

Stanislav Ledenev

2021-05-18 14:51:37 UTC

Roger, thanks again for such a comprehensive answer!
I fully understand the idea and the risks.

Jesper, thank you too for participating in this discussion!