Closed
Labels
:Distributed Coordination/Allocation (All issues relating to the decision making around placing a shard, both master logic & on the nodes), >bug, v6.5.1
Description
Today, if one issues an allocate_stale_primary reroute command requesting the primary to be allocated on a node which does not hold a stale copy of the shard in question, the reroute command still returns 200 OK. The recovery subsequently fails, of course, because there is no copy of the shard from which to recover:
[2019-01-03T08:48:16,731][WARN ][o.e.c.r.a.AllocationService] [node-0] failing shard [failed shard, shard [i][1], node[0ZoVYAp6TzC6VjGW7qo2-w], [P], recovery_source[existing store recovery; bootstrap_history_uuid=true], s[INITIALIZING], a[id=H9r7KK24TkuzUz6OdNzwlg], unassigned_info[[reason=ALLOCATION_FAILED], at[2019-01-03T08:47:40.487Z], failed_attempts[1], delayed=false, details[failed shard on node [0ZoVYAp6TzC6VjGW7qo2-w]: failed recovery, failure RecoveryFailedException[[i][1]: Recovery failed on {node-1}{0ZoVYAp6TzC6VjGW7qo2-w}{psSlOmPKQZeVqw63jcncyA}{127.0.0.1}{127.0.0.1:9301}{ml.machine_memory=17179869184, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]; nested: IndexShardRecoveryException[failed to fetch index version after copying it over]; nested: IndexShardRecoveryException[shard allocated for local recovery (post api), should exist, but doesn't, current files: []]; nested: FileNotFoundException[no segments* file found in store(ByteSizeCachingDirectory(MMapDirectory@/Users/davidturner/discuss/162719/elasticsearch-6.5.1/data-1/nodes/0/indices/YPnPuK8hRnyDHjzM2EhLtQ/1/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@46a205e7)): files: []]; ], allocation_status[no_valid_shard_copy]], message [failed recovery], failure [RecoveryFailedException[[i][1]: Recovery failed on {node-1}{0ZoVYAp6TzC6VjGW7qo2-w}{psSlOmPKQZeVqw63jcncyA}{127.0.0.1}{127.0.0.1:9301}{ml.machine_memory=17179869184, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]; nested: IndexShardRecoveryException[failed to fetch index version after copying it over]; nested: IndexShardRecoveryException[shard allocated for local recovery (post api), should exist, but doesn't, current files: []]; nested: FileNotFoundException[no segments* file found in store(ByteSizeCachingDirectory(MMapDirectory@/Users/davidturner/discuss/162719/elasticsearch-6.5.1/data-1/nodes/0/indices/YPnPuK8hRnyDHjzM2EhLtQ/1/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@46a205e7)): 
files: []]; ], markAsStale [true]]
org.elasticsearch.indices.recovery.RecoveryFailedException: [i][1]: Recovery failed on {node-1}{0ZoVYAp6TzC6VjGW7qo2-w}{psSlOmPKQZeVqw63jcncyA}{127.0.0.1}{127.0.0.1:9301}{ml.machine_memory=17179869184, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}
at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2139) ~[elasticsearch-6.5.1.jar:6.5.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624) ~[elasticsearch-6.5.1.jar:6.5.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: org.elasticsearch.index.shard.IndexShardRecoveryException: failed to fetch index version after copying it over
at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:389) ~[elasticsearch-6.5.1.jar:6.5.1]
at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95) ~[elasticsearch-6.5.1.jar:6.5.1]
at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:302) ~[elasticsearch-6.5.1.jar:6.5.1]
at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93) ~[elasticsearch-6.5.1.jar:6.5.1]
at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1645) ~[elasticsearch-6.5.1.jar:6.5.1]
at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2135) ~[elasticsearch-6.5.1.jar:6.5.1]
... 4 more
Caused by: org.elasticsearch.index.shard.IndexShardRecoveryException: shard allocated for local recovery (post api), should exist, but doesn't, current files: []
at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:374) ~[elasticsearch-6.5.1.jar:6.5.1]
at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95) ~[elasticsearch-6.5.1.jar:6.5.1]
at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:302) ~[elasticsearch-6.5.1.jar:6.5.1]
at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93) ~[elasticsearch-6.5.1.jar:6.5.1]
at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1645) ~[elasticsearch-6.5.1.jar:6.5.1]
at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2135) ~[elasticsearch-6.5.1.jar:6.5.1]
... 4 more
Caused by: java.io.FileNotFoundException: no segments* file found in store(ByteSizeCachingDirectory(MMapDirectory@/Users/davidturner/discuss/162719/elasticsearch-6.5.1/data-1/nodes/0/indices/YPnPuK8hRnyDHjzM2EhLtQ/1/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@46a205e7)): files: []
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683) ~[lucene-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:13]
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:640) ~[lucene-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:13]
at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:442) ~[lucene-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:13]
at org.elasticsearch.common.lucene.Lucene.readSegmentInfos(Lucene.java:131) ~[elasticsearch-6.5.1.jar:6.5.1]
at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:201) ~[elasticsearch-6.5.1.jar:6.5.1]
at org.elasticsearch.index.store.Store.readLastCommittedSegmentsInfo(Store.java:186) ~[elasticsearch-6.5.1.jar:6.5.1]
at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:364) ~[elasticsearch-6.5.1.jar:6.5.1]
at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95) ~[elasticsearch-6.5.1.jar:6.5.1]
at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:302) ~[elasticsearch-6.5.1.jar:6.5.1]
at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93) ~[elasticsearch-6.5.1.jar:6.5.1]
at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1645) ~[elasticsearch-6.5.1.jar:6.5.1]
at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2135) ~[elasticsearch-6.5.1.jar:6.5.1]
... 4 more
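For reference, the kind of request described above can be sketched as follows. This is a minimal illustration, not the reproduction actually used for this report: the index name `i` and shard number `1` are taken from the log, `node-1` is assumed to be the target node, and the helper names `build_allocate_stale_primary` and `node_holds_copy` are hypothetical. The second helper shows one way a caller might check for an on-disk copy before issuing the command, assuming a response shaped like that of the shard stores API (`GET /<index>/_shard_stores`); that shape is an assumption here and should be checked against the documentation for the version in use.

```python
import json


def build_allocate_stale_primary(index, shard, node):
    """Return (path, body) for a cluster reroute request.

    The body uses the allocate_stale_primary command, which requires
    accept_data_loss to be set explicitly.
    """
    body = {
        "commands": [
            {
                "allocate_stale_primary": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    "accept_data_loss": True,
                }
            }
        ]
    }
    return "/_cluster/reroute", body


def node_holds_copy(shard_stores, index, shard, node_name):
    """Return True if node_name appears among the stores listed for the shard.

    Assumed (hypothetical) input shape, modelled on the shard stores API:
    {"indices": {index: {"shards": {"<shard>": {"stores": [
        {"<node id>": {"name": "<node name>", ...}, "allocation": "..."}
    ]}}}}}
    """
    stores = (
        shard_stores.get("indices", {})
        .get(index, {})
        .get("shards", {})
        .get(str(shard), {})
        .get("stores", [])
    )
    for store in stores:
        # Each store entry maps a node id to a dict describing that node.
        for value in store.values():
            if isinstance(value, dict) and value.get("name") == node_name:
                return True
    return False


if __name__ == "__main__":
    path, body = build_allocate_stale_primary("i", 1, "node-1")
    print("POST", path)
    print(json.dumps(body, indent=2))
```

With an empty `stores` list for the shard, `node_holds_copy` returns `False`, which is the situation in which the command above is accepted today and the recovery then fails.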