
@dnhatn (Member) commented Jan 10, 2019

Today a peer recovery may run into a deadlock if the value of
`node_concurrent_recoveries` is too high. This happens because peer
recovery is executed in a blocking fashion: the source holds a thread
while it waits for responses from the recovery target. This commit
attempts to make the recovery source partially non-blocking. I will
follow up with three changes to make it fully non-blocking: (1) sending
translog operations, (2) primary relocation, and (3) sending commit files.

Relates #36195
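The description does not spell out the exact deadlock, so the following is only a generic illustration under stated assumptions: it models recoveries as tasks on a fixed-size Java thread pool. The class `RecoveryStarvationSketch` and its methods are hypothetical and are not Elasticsearch code. In the blocking variant, each recovery holds a pool thread while it waits for a sub-step that also needs that pool, so once the number of concurrent recoveries reaches the pool size, no thread is left to run the sub-steps. The non-blocking variant chains the sub-step as a continuation with `CompletableFuture.thenCompose`, so no thread waits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration of thread starvation on a bounded pool; not Elasticsearch code.
final class RecoveryStarvationSketch {

    // Blocking style: each "recovery" holds a pool thread while waiting for a
    // sub-step that must run on the same pool. Returns how many recoveries starved.
    static int blockingRecoveries(int poolSize, int recoveries) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch allSubmitted = new CountDownLatch(1); // only makes the demo deterministic
        try {
            List<Future<Boolean>> outer = new ArrayList<>();
            for (int i = 0; i < recoveries; i++) {
                outer.add(pool.submit(() -> {
                    allSubmitted.await();
                    Future<?> step = pool.submit(() -> { }); // e.g. "send a file chunk"
                    try {
                        step.get(200, TimeUnit.MILLISECONDS); // blocks this pool thread
                        return true;
                    } catch (TimeoutException e) {
                        return false; // all threads are waiting, none is free to run the step
                    }
                }));
            }
            allSubmitted.countDown();
            int starved = 0;
            for (Future<Boolean> f : outer) {
                if (!f.get()) {
                    starved++;
                }
            }
            return starved;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    // Non-blocking style: each recovery registers a continuation for the sub-step
    // and returns its thread to the pool instead of waiting.
    static int nonBlockingRecoveries(int poolSize, int recoveries) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<CompletableFuture<Boolean>> all = new ArrayList<>();
            for (int i = 0; i < recoveries; i++) {
                all.add(CompletableFuture.runAsync(() -> { }, pool)                  // start the recovery
                        .thenCompose(v -> CompletableFuture.runAsync(() -> { }, pool)) // sub-step, no waiting
                        .thenApply(v -> true));
            }
            int completed = 0;
            for (CompletableFuture<Boolean> f : all) {
                if (f.orTimeout(5, TimeUnit.SECONDS).join()) {
                    completed++;
                }
            }
            return completed;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With a two-thread pool, `blockingRecoveries(2, 1)` returns 0, while `blockingRecoveries(2, 2)` returns 2 because both recoveries give up after the 200 ms timeout; a real blocking wait without a timeout would hang instead. `nonBlockingRecoveries(2, 8)` completes all eight recoveries on the same two threads.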

@dnhatn added the >enhancement, :Distributed Indexing/Recovery, v7.0.0, and v6.7.0 labels Jan 10, 2019
@elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed

@dnhatn changed the title Partially make recovery source non-blocking Jan 10, 2019
@s1monw (Contributor) left a comment:

Did a first pass. I would love to minimize the steps we make async in this PR even further.

@dnhatn (Member Author) commented Jan 10, 2019:

@s1monw Thanks for looking. I've minimized the changes in this PR; it now just provides the infrastructure for the next steps. Would you please take another look?

@dnhatn requested a review from s1monw January 10, 2019 23:26
@dnhatn (Member Author) commented Jan 11, 2019:

@elasticmachine run gradle build tests 1

dnhatn added a commit that referenced this pull request Jan 11, 2019
@s1monw (Contributor) left a comment:

Took another round.

@dnhatn (Member Author) commented Jan 11, 2019:

@s1monw I pushed changes. Can you have another look?

@dnhatn requested a review from s1monw January 11, 2019 17:59
dnhatn added a commit that referenced this pull request Jan 11, 2019
This commit introduces StepListener, which provides a simple way to write
a flow consisting of multiple asynchronous steps without nested
callbacks.

Relates #37291
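To illustrate the idea behind StepListener, here is a hypothetical, simplified `SimpleStepListener` built on `java.util.concurrent.CompletableFuture`. It is not the class added by this commit, and its method names (`onResponse`, `onFailure`, `whenComplete`) and the demo steps (`prepareTarget`, `sendFiles`) are assumptions chosen for the example. Each step gets its own listener, and the next step is registered on the previous one, so a multi-step flow reads top to bottom instead of as nested callbacks.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Consumer;

// Hypothetical, simplified step listener (not Elasticsearch's StepListener).
final class SimpleStepListener<T> {
    private final CompletableFuture<T> delegate = new CompletableFuture<>();

    // Completes the step; CompletableFuture ignores any later completion attempts.
    void onResponse(T response) {
        delegate.complete(response);
    }

    void onFailure(Exception e) {
        delegate.completeExceptionally(e);
    }

    // Registers what happens next; runs immediately if the step is already done.
    void whenComplete(Consumer<T> onResponse, Consumer<Exception> onFailure) {
        delegate.whenComplete((result, error) -> {
            if (error != null) {
                onFailure.accept(toException(error));
                return;
            }
            try {
                onResponse.accept(result);
            } catch (Exception e) {
                onFailure.accept(e); // a failing next step is reported, not swallowed
            }
        });
    }

    private static Exception toException(Throwable t) {
        Throwable cause = (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
        return cause instanceof Exception ? (Exception) cause : new RuntimeException(cause);
    }
}

final class StepListenerDemo {
    // Hypothetical asynchronous steps that complete their listener on another thread.
    static void prepareTarget(SimpleStepListener<String> listener) {
        CompletableFuture.runAsync(() -> listener.onResponse("target-ready"));
    }

    static void sendFiles(String state, SimpleStepListener<Integer> listener) {
        CompletableFuture.runAsync(() -> listener.onResponse(3)); // pretend 3 files were sent
    }

    // The flow is written step by step, without nesting one callback inside another.
    static CompletableFuture<String> recover() {
        CompletableFuture<String> result = new CompletableFuture<>();
        SimpleStepListener<String> prepareStep = new SimpleStepListener<>();
        SimpleStepListener<Integer> sendFilesStep = new SimpleStepListener<>();

        prepareTarget(prepareStep);
        prepareStep.whenComplete(state -> sendFiles(state, sendFilesStep), result::completeExceptionally);
        sendFilesStep.whenComplete(count -> result.complete("recovered, files sent: " + count),
                result::completeExceptionally);
        return result;
    }
}
```

`StepListenerDemo.recover().join()` returns `"recovered, files sent: 3"`. A failure in any step goes straight to the final result's failure handler, which is what keeps the flow flat.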
dnhatn added a commit that referenced this pull request Jan 12, 2019
dnhatn added a commit that referenced this pull request Jan 12, 2019
@s1monw (Contributor) left a comment:

LGTM, left 2 comments.

@dnhatn (Member Author) commented Jan 12, 2019:

Thanks @s1monw.

@dnhatn merged commit 44a1071 into elastic:master Jan 12, 2019
@dnhatn deleted the non-blocking-recovery branch January 12, 2019 17:49
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Jan 12, 2019
dnhatn added a commit that referenced this pull request Jan 13, 2019
dnhatn added a commit that referenced this pull request Jan 14, 2019
dnhatn added a commit that referenced this pull request Jan 15, 2019
dnhatn added a commit that referenced this pull request Jan 15, 2019
This commit prepares the infrastructure required to make sending a
translog snapshot from the recovery source non-blocking. I'll follow up
by making the send snapshot method itself non-blocking.

Relates #37291
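A non-blocking translog send can be sketched as a loop driven by acknowledgements: read the next batch from the snapshot, send it, and send the following batch only from the acknowledgement callback, so no thread sits waiting on the target. In the sketch below, `AsyncSnapshotSender` is hypothetical, strings stand in for translog operations, and an `Executor` stands in for the round trip to the recovery target; it does not reflect the actual recovery source code or its batching rules.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical sketch: sends a snapshot of operations in batches without blocking a thread.
final class AsyncSnapshotSender {
    private final Iterator<String> snapshot;   // stands in for a translog snapshot
    private final int batchSize;
    private final Executor target;              // stands in for the round trip to the target node
    private final List<List<String>> received = new ArrayList<>(); // batches the "target" stored
    private final CompletableFuture<Integer> done = new CompletableFuture<>();
    private int batchesSent = 0;

    AsyncSnapshotSender(Iterator<String> snapshot, int batchSize, Executor target) {
        this.snapshot = snapshot;
        this.batchSize = batchSize;
        this.target = target;
    }

    // Starts sending and returns immediately; the future completes with the number of batches sent.
    CompletableFuture<Integer> start() {
        sendNextBatch();
        return done;
    }

    List<List<String>> received() {
        return received;
    }

    private void sendNextBatch() {
        List<String> batch = new ArrayList<>();
        while (batch.size() < batchSize && snapshot.hasNext()) {
            batch.add(snapshot.next());
        }
        if (batch.isEmpty()) {
            done.complete(batchesSent); // snapshot exhausted
            return;
        }
        // Simulated asynchronous send: the "target" stores the batch on its own thread.
        CompletableFuture.runAsync(() -> received.add(batch), target)
                .whenComplete((ack, error) -> {
                    if (error != null) {
                        done.completeExceptionally(error);
                        return;
                    }
                    batchesSent++;
                    sendNextBatch(); // the next batch is sent from the ack callback; nothing waits
                });
    }
}
```

Sending five operations with a batch size of 2 completes with 3 batches. One design caveat: if an acknowledgement is already complete when the callback is registered, the callback runs on the calling thread, so a very long snapshot could recurse deeply; a production version would need to guard against that.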
dnhatn added a commit that referenced this pull request Jan 23, 2019
kovrus added a commit to crate/crate that referenced this pull request Sep 11, 2019
kovrus added a commit to crate/crate that referenced this pull request Sep 11, 2019
kovrus added a commit to crate/crate that referenced this pull request Sep 12, 2019
kovrus added a commit to crate/crate that referenced this pull request Sep 12, 2019
kovrus added a commit to crate/crate that referenced this pull request Sep 12, 2019
mergify bot pushed a commit to crate/crate that referenced this pull request Sep 17, 2019
Labels
:Distributed Indexing/Recovery, >enhancement, v6.7.0, v7.0.0-beta1
4 participants