Conversation

@original-brownbear (Contributor) commented Dec 28, 2018

* Randomly doing non-atomic writes causes rare 0-byte reads from `index-N` files in tests
* Removing this randomness fixes these random failures and is valid because the randomness does not reproduce a real-world failure mode:
  * Cloud-based blob stores (S3, GCS, and Azure) do not have inconsistent partial reads of a blob: you either read a complete blob or nothing
  * For file-system-based blob stores, we write a file atomically via an atomic move with `java.nio.file.StandardCopyOption#ATOMIC_MOVE` (sketched after this list), which would throw if the file system does not provide atomic moves
* Closes elastic#37005
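
For reference, a minimal sketch of that temp-file-plus-atomic-move pattern, with illustrative names rather than the actual `FsBlobContainer` code:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class AtomicWriteSketch {
    // Write the blob to a temp file in the container directory first, then move
    // it into place atomically. If the file system cannot perform an atomic
    // move, Files.move throws AtomicMoveNotSupportedException instead of
    // silently degrading to a non-atomic copy.
    static void writeBlobAtomic(Path containerPath, String blobName, InputStream data) throws IOException {
        final Path tempFile = Files.createTempFile(containerPath, "pending-", ".tmp");
        try {
            Files.copy(data, tempFile, StandardCopyOption.REPLACE_EXISTING);
            Files.move(tempFile, containerPath.resolve(blobName), StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tempFile); // no-op if the move already succeeded
        }
    }
}
```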
@original-brownbear added the >test (Issues or PRs that are addressing/adding tests), :Distributed Coordination/Snapshot/Restore (Anything directly related to the `_snapshot/*` APIs), v7.0.0, and v6.7.0 labels Dec 28, 2018
@elasticmachine (Collaborator) commented

Pinging @elastic/es-distributed

@ywelsch (Contributor) commented Dec 28, 2018

Yes, it's not an issue for the repository plugins that we provide, but the BlobContainer interface allows for it (see the Javadocs of BlobContainer#writeBlobAtomic). I suggest we change the spec in this regard for 7.0 (requiring writes to be atomic when performed via this method), and only then proceed with the change here.

@original-brownbear (Contributor, Author) commented

> I suggest we change the spec in this regard for 7.0 (requiring writes to be atomic when performed via this method), and only then proceed with the change here.

Should we fix the production code in 6.x then, so that these tests pass? We could simply break out of the read logic for repository data here https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java#L670 and return empty data if we read an empty blob (i.e. treat an empty blob like a missing file)?
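
A rough sketch of that suggestion, with hypothetical stand-ins rather than the actual `BlobStoreRepository` code:

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of the suggested 6.x workaround: when an index-N blob
// exists but is empty (a torn, non-atomic write), treat it exactly like a
// missing blob instead of failing to parse zero bytes.
class EmptyBlobAsMissingSketch {

    // Minimal stand-in for the relevant slice of the BlobContainer API.
    interface BlobReader {
        InputStream readBlob(String blobName) throws IOException;
    }

    static byte[] readIndexBlobOrNull(BlobReader container, String blobName) throws IOException {
        try (InputStream in = container.readBlob(blobName)) {
            final byte[] bytes = in.readAllBytes();
            return bytes.length == 0 ? null : bytes; // empty blob => treat as missing
        }
    }
}
```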

@original-brownbear (Contributor, Author) commented

@ywelsch as a matter of fact, I think org.elasticsearch.repositories.hdfs.HdfsBlobContainer is an implementation whose writes aren't atomic (I missed that one when I opened the PR). We could make it atomic, though, by doing the same thing we do for the fs repository (write to a temp file -> then move)?

=> Fix the hdfs repository (roughly as sketched below), adjust the docs, and just target 7.0 here?
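
Roughly what that would look like against the Hadoop FileSystem API, with illustrative names rather than the eventual HdfsBlobContainer change:

```java
import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class HdfsAtomicWriteSketch {
    // Write to a temp file in the same directory, then rename into place;
    // an HDFS rename within a single directory is an atomic namenode operation.
    static void writeBlobAtomic(FileSystem fs, Path dir, String blobName, InputStream data) throws IOException {
        final Path tempBlob = new Path(dir, "pending-" + blobName);
        final Path targetBlob = new Path(dir, blobName);
        try (FSDataOutputStream out = fs.create(tempBlob)) {
            final byte[] buffer = new byte[8192];
            int read;
            while ((read = data.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        if (fs.rename(tempBlob, targetBlob) == false) {
            fs.delete(tempBlob, false);
            throw new IOException("failed to rename " + tempBlob + " to " + targetBlob);
        }
    }
}
```

The extra create-then-rename is also the source of the additional namenode metadata operations discussed below.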

@ywelsch (Contributor) commented Dec 28, 2018

The problem with moving HDFS to the "write temp file -> atomic rename" pattern is that it will make writing to an HDFS repository slower (more metadata requests, which are executed sequentially on the namenode). In my prior experience with HDFS, these operations can take upwards of a few hundred milliseconds to complete. This might be an acceptable tradeoff, though, if we can simplify the spec in ES 7. We still have to decide what to do for 6.x. Is there an alternative to removing allow_atomic_operations on 6.x?

@original-brownbear (Contributor, Author) commented Dec 28, 2018

@ywelsch

> Is there an alternative to removing allow_atomic_operations on 6.x?

I suggested one above in #37011 (comment): simply break out of the read on an empty blob and treat it like a missing blob (sketched above). WDYT?

... that may also be an alternative for 7.0 if we don't want to make the trade-off, but it seems to me that there are possible production issues with this method not being consistent?

@ywelsch (Contributor) commented Dec 31, 2018

> simply break out of the read on an empty blob and treat it like a missing blob

What if the file is not empty, but partially written?

> Fix the hdfs repository, adjust the docs

Let's do this for both 7.0 and 6.x.

@original-brownbear (Contributor, Author) commented

@ywelsch

> What if the file is not empty, but partially written?

I don't think this can happen with the file system store, because the content of that file is small enough to fit within a single sector and should therefore sync atomically, right?
It seems to me that handling the partially-written case is potentially overthinking the problem. In the real world this isn't going to happen with any of the cloud blob stores, with HDFS (the block size there is always going to be larger than this file), or with the fs repository, as pointed out above.

=> Is there really a practical situation where we want to support blob repositories that need multi-part writes for this little data?

@ywelsch (Contributor) commented Dec 31, 2018

I wouldn't want to make any strong assumptions about the size of the files that are to be written with writeAtomic. Let's make HDFS atomic and proceed with this change here, both on 6.x and 7.0.

@original-brownbear (Contributor, Author) commented

@ywelsch

> I suggest we change the spec in this regard for 7.0 (requiring writes to be atomic when performed via this method), and only then proceed with the change here.

I guess it's not only changing the docs here, right? Basically we'd change the interface to require an implementation of org.elasticsearch.common.blobstore.BlobContainer#writeBlobAtomic and remove the default fallback to writeBlob?

@ywelsch (Contributor) commented Dec 31, 2018

> I guess it's not only changing the docs here, right? Basically we'd change the interface to require an implementation of org.elasticsearch.common.blobstore.BlobContainer#writeBlobAtomic and remove the default fallback to writeBlob?

Yes, we can restrict that change to 7.0, though.
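
In sketch form, the change being discussed (the real BlobContainer methods take additional parameters; this simplified stand-in just shows the shape):

```java
import java.io.IOException;
import java.io.InputStream;

interface BlobContainerSketch {
    void writeBlob(String blobName, InputStream inputStream, long blobSize) throws IOException;

    // 6.x shape: a default method silently falls back to the non-atomic write.
    //
    // default void writeBlobAtomic(String blobName, InputStream inputStream, long blobSize) throws IOException {
    //     writeBlob(blobName, inputStream, blobSize);
    // }

    // 7.0 shape: abstract, so every implementation has to provide an atomic
    // write itself instead of inheriting the non-atomic fallback.
    void writeBlobAtomic(String blobName, InputStream inputStream, long blobSize) throws IOException;
}
```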

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jan 2, 2019
* Implement atomic writes the same way we do for the FsBlobContainer: via rename, which is atomic
* Relates elastic#37011
original-brownbear added a commit that referenced this pull request Jan 3, 2019
* Implement atomic writes the same way we do for the FsBlobContainer: via rename, which is atomic
* Relates #37011
original-brownbear added a commit that referenced this pull request Jan 4, 2019
* Implement atomic writes the same way we do for the FsBlobContainer: via rename, which is atomic
* Relates #37011
@original-brownbear (Contributor, Author) commented

@ywelsch with HDFS now fixed, we could change the interface to drop the default fallback to a non-atomic write for master/7.x.

But do we even want to do anything else in this PR (since it needs to fix both master and 6.x)?

The docs there simply state:

> When the BlobContainer implementation does not provide a specific implementation of writeBlobAtomic(String, InputStream, long), then the {@link #writeBlob(String, InputStream, long, boolean)} method is used.

This is factually correct, and we don't want to change it for 6.x, so what would we even change in the docs in this PR?

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jan 6, 2019
* With elastic#37066 introducing atomic writes to the HDFS repository, we can enforce atomic write capabilities on this interface
* The overrides on the other three cloud implementations are OK because:
   * https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html states that "Amazon S3 never adds partial objects; if you receive a success response, Amazon S3 added the entire object to the bucket."
   * https://cloud.google.com/storage/docs/consistency states that GCS has strong read-after-write consistency
   * https://docs.microsoft.com/en-us/rest/api/storageservices/put-block#remarks Azure has the concept of committing blobs, so there's no partial content there either
* Relates elastic#37011
@DaveCTurner removed their request for review January 7, 2019 11:10
original-brownbear added a commit that referenced this pull request Jan 7, 2019
* With #37066 introducing atomic writes to the HDFS repository, we can enforce atomic write capabilities on this interface
* The overrides on the other three cloud implementations are OK because:
   * https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html states that "Amazon S3 never adds partial objects; if you receive a success response, Amazon S3 added the entire object to the bucket."
   * https://cloud.google.com/storage/docs/consistency states that GCS has strong read-after-write consistency
   * https://docs.microsoft.com/en-us/rest/api/storageservices/put-block#remarks Azure has the concept of committing blobs, so there's no partial content there either
* Relates #37011
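
In other words, on those stores the override can simply delegate, because a single-object upload is already all-or-nothing. A hypothetical sketch of that delegation, not the actual plugin code:

```java
import java.io.IOException;
import java.io.InputStream;

class CloudBlobContainerSketch {
    void writeBlob(String blobName, InputStream inputStream, long blobSize) throws IOException {
        // a single PUT / upload call against the cloud SDK would go here;
        // per the docs cited above, the service stores the whole object or nothing
    }

    void writeBlobAtomic(String blobName, InputStream inputStream, long blobSize) throws IOException {
        writeBlob(blobName, inputStream, blobSize); // a single upload is already all-or-nothing
    }
}
```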
@original-brownbear (Contributor, Author) commented

@ywelsch thanks, will merge once green. The new commit I pushed simply re-enables the test fixed here.

@original-brownbear merged commit 82b1f10 into elastic:master Jan 7, 2019
@original-brownbear deleted the 37005 branch January 7, 2019 14:24
original-brownbear added a commit that referenced this pull request Jan 7, 2019
* Randomly doing non-atomic writes causes rare 0-byte reads from `index-N` files in tests
* Removing this randomness fixes these random failures and is valid because the randomness does not reproduce a real-world failure mode:
  * Cloud-based blob stores (S3, GCS, and Azure) do not have inconsistent partial reads of a blob: you either read a complete blob or nothing
  * For file-system-based blob stores, we write a file atomically via an atomic move with `java.nio.file.StandardCopyOption#ATOMIC_MOVE`, which would throw if the file system does not provide atomic moves
* Closes #37005