Allocate_stale_primary appears to succeed on wrong node

Today, if you issue an allocate_stale_primary reroute command that asks for the primary to be allocated on a node that does not hold a stale copy of the shard in question, the reroute command still returns 200 OK. The subsequent recovery then fails, of course, because there is no copy of the shard from which to recover:

[2019-01-03T08:48:16,731][WARN ][o.e.c.r.a.AllocationService] [node-0] failing shard [failed shard, shard [i][1], node[0ZoVYAp6TzC6VjGW7qo2-w], [P], recovery_source[existing store recovery; bootstrap_history_uuid=true], s[INITIALIZING], a[id=H9r7KK24TkuzUz6OdNzwlg], unassigned_info[[reason=ALLOCATION_FAILED], at[2019-01-03T08:47:40.487Z], failed_attempts[1], delayed=false, details[failed shard on node [0ZoVYAp6TzC6VjGW7qo2-w]: failed recovery, failure RecoveryFailedException[[i][1]: Recovery failed on {node-1}{0ZoVYAp6TzC6VjGW7qo2-w}{psSlOmPKQZeVqw63jcncyA}{127.0.0.1}{127.0.0.1:9301}{ml.machine_memory=17179869184, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]; nested: IndexShardRecoveryException[failed to fetch index version after copying it over]; nested: IndexShardRecoveryException[shard allocated for local recovery (post api), should exist, but doesn't, current files: []]; nested: FileNotFoundException[no segments* file found in store(ByteSizeCachingDirectory(MMapDirectory@/Users/davidturner/discuss/162719/elasticsearch-6.5.1/data-1/nodes/0/indices/YPnPuK8hRnyDHjzM2EhLtQ/1/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@46a205e7)): files: []]; ], allocation_status[no_valid_shard_copy]], message [failed recovery], failure [RecoveryFailedException[[i][1]: Recovery failed on {node-1}{0ZoVYAp6TzC6VjGW7qo2-w}{psSlOmPKQZeVqw63jcncyA}{127.0.0.1}{127.0.0.1:9301}{ml.machine_memory=17179869184, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]; nested: IndexShardRecoveryException[failed to fetch index version after copying it over]; nested: IndexShardRecoveryException[shard allocated for local recovery (post api), should exist, but doesn't, current files: []]; nested: FileNotFoundException[no segments* file found in store(ByteSizeCachingDirectory(MMapDirectory@/Users/davidturner/discuss/162719/elasticsearch-6.5.1/data-1/nodes/0/indices/YPnPuK8hRnyDHjzM2EhLtQ/1/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@46a205e7)): files: []]; ], markAsStale [true]]
org.elasticsearch.indices.recovery.RecoveryFailedException: [i][1]: Recovery failed on {node-1}{0ZoVYAp6TzC6VjGW7qo2-w}{psSlOmPKQZeVqw63jcncyA}{127.0.0.1}{127.0.0.1:9301}{ml.machine_memory=17179869184, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}
	at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2139) ~[elasticsearch-6.5.1.jar:6.5.1]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624) ~[elasticsearch-6.5.1.jar:6.5.1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: org.elasticsearch.index.shard.IndexShardRecoveryException: failed to fetch index version after copying it over
	at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:389) ~[elasticsearch-6.5.1.jar:6.5.1]
	at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95) ~[elasticsearch-6.5.1.jar:6.5.1]
	at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:302) ~[elasticsearch-6.5.1.jar:6.5.1]
	at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93) ~[elasticsearch-6.5.1.jar:6.5.1]
	at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1645) ~[elasticsearch-6.5.1.jar:6.5.1]
	at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2135) ~[elasticsearch-6.5.1.jar:6.5.1]
	... 4 more
Caused by: org.elasticsearch.index.shard.IndexShardRecoveryException: shard allocated for local recovery (post api), should exist, but doesn't, current files: []
	at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:374) ~[elasticsearch-6.5.1.jar:6.5.1]
	at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95) ~[elasticsearch-6.5.1.jar:6.5.1]
	at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:302) ~[elasticsearch-6.5.1.jar:6.5.1]
	at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93) ~[elasticsearch-6.5.1.jar:6.5.1]
	at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1645) ~[elasticsearch-6.5.1.jar:6.5.1]
	at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2135) ~[elasticsearch-6.5.1.jar:6.5.1]
	... 4 more
Caused by: java.io.FileNotFoundException: no segments* file found in store(ByteSizeCachingDirectory(MMapDirectory@/Users/davidturner/discuss/162719/elasticsearch-6.5.1/data-1/nodes/0/indices/YPnPuK8hRnyDHjzM2EhLtQ/1/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@46a205e7)): files: []
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683) ~[lucene-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:13]
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:640) ~[lucene-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:13]
	at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:442) ~[lucene-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:13]
	at org.elasticsearch.common.lucene.Lucene.readSegmentInfos(Lucene.java:131) ~[elasticsearch-6.5.1.jar:6.5.1]
	at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:201) ~[elasticsearch-6.5.1.jar:6.5.1]
	at org.elasticsearch.index.store.Store.readLastCommittedSegmentsInfo(Store.java:186) ~[elasticsearch-6.5.1.jar:6.5.1]
	at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:364) ~[elasticsearch-6.5.1.jar:6.5.1]
	at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95) ~[elasticsearch-6.5.1.jar:6.5.1]
	at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:302) ~[elasticsearch-6.5.1.jar:6.5.1]
	at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93) ~[elasticsearch-6.5.1.jar:6.5.1]
	at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1645) ~[elasticsearch-6.5.1.jar:6.5.1]
	at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2135) ~[elasticsearch-6.5.1.jar:6.5.1]
	... 4 more
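To make the failure mode concrete, here is a minimal Python sketch (the report itself contains no code). It builds the body for `POST /_cluster/reroute` with an `allocate_stale_primary` command, and adds a hypothetical client-side guard, `checked_stale_primary_command`, that refuses a target node which does not appear in the `GET /<index>/_shard_stores` output for that shard. The helper names and the sample response below are made up for illustration, and the response layout is simplified; this is an assumption-laden workaround, not how Elasticsearch validates the command internally.

```python
import json


def stale_primary_command(index, shard, node):
    """Build a POST /_cluster/reroute body that forces a stale primary onto `node`."""
    return {
        "commands": [
            {
                "allocate_stale_primary": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    # Elasticsearch rejects this command unless data loss is accepted explicitly.
                    "accept_data_loss": True,
                }
            }
        ]
    }


def nodes_with_copy(shard_stores, index, shard):
    """Return the names of nodes that report an on-disk copy of [index][shard].

    Assumption: `shard_stores` is the parsed JSON from GET /<index>/_shard_stores.
    Each store entry holds one node-id key mapped to a dict with a "name" field,
    alongside scalar keys such as "allocation" and "allocation_id".
    """
    stores = shard_stores["indices"][index]["shards"][str(shard)]["stores"]
    names = set()
    for store in stores:
        for value in store.values():
            # Only the node entry is a dict carrying a "name" field.
            if isinstance(value, dict) and "name" in value:
                names.add(value["name"])
    return names


def checked_stale_primary_command(shard_stores, index, shard, node):
    """Hypothetical guard: refuse to build the command for a node with no copy."""
    if node not in nodes_with_copy(shard_stores, index, shard):
        raise ValueError(f"node {node!r} holds no copy of [{index}][{shard}]")
    return stale_primary_command(index, shard, node)


# Made-up sample response: only node-0 holds a copy of [i][1], mirroring the
# situation in the log above, where the primary was forced onto node-1.
example_stores = {
    "indices": {
        "i": {
            "shards": {
                "1": {
                    "stores": [
                        {
                            "hypothetical-node-0-id": {"name": "node-0"},
                            "allocation_id": "hypothetical-allocation-id",
                            "allocation": "unused",
                        }
                    ]
                }
            }
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(checked_stale_primary_command(example_stores, "i", 1, "node-0"), indent=2))
    try:
        checked_stale_primary_command(example_stores, "i", 1, "node-1")
    except ValueError as err:
        print("refused:", err)
```

With this guard, the request for node-1 is rejected before it reaches the cluster instead of returning 200 OK and failing later during recovery.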

Allocate_stale_primary appears to succeed on wrong node #37098

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Allocate_stale_primary appears to succeed on wrong node #37098

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions