Conversation

zhjunqin
Contributor

@zhjunqin zhjunqin commented May 22, 2020

In direct_session.cc (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/common_runtime/direct_session.cc#L1514), the session always emplaces the original key into executors_, so a new entry is added to the map for every distinct key ordering, which leads to very high memory usage.

With 10 input tensors there are 10! = 3,628,800 possible key orderings, so the map can grow enormous.

  // See if we already have the executors for this run.
  {
    mutex_lock l(executor_lock_);
    auto it = executors_.find(sorted_key);
    if (it != executors_.end()) {
      *executors_and_keys = it->second.get();

      // Insert this under the original key.  
      executors_.emplace(key, it->second); 
      return Status::OK();
    }
  }

Please check the attached file.
serving_nommap.0042.heap.base0007.pdf

Also check issue #1215

I'm not sure whether fixing the TF code or the TF Serving code would be better, so I submitted another PR, tensorflow/tensorflow#39743; please help review.

@algorithmdog

Thank you. We used the code from this PR, and it solved the out-of-memory problem in our production system.

@netfs
Collaborator

netfs commented May 29, 2020

Thanks for the bug report and the PR.

I think this is best fixed in (TF) direct_session (tensorflow/tensorflow#39743) rather than in TF Serving.

Though I am wondering why you have so many keys in your setup. If the input/output ordering is kept consistent across requests, we should not have this many keys, no?

@zhjunqin
Contributor Author

zhjunqin commented May 30, 2020

> Thanks for the bug report and the PR.
>
> I think this is best fixed in (TF) direct_session (tensorflow/tensorflow#39743) rather than in TF Serving.
>
> Though I am wondering why you have so many keys in your setup. If the input/output ordering is kept consistent across requests, we should not have this many keys, no?

I think I didn't make it clear: the root cause is that the inputs map in PredictRequest is not an ordered map.

message PredictRequest {
  // Model Specification. If version is not specified, will use the latest
  // (numerical) version.
  ModelSpec model_spec = 1;

  // Input tensors.
  // Names of input tensor are alias names. The mapping from aliases to real
  // input tensor names is stored in the SavedModel export as a prediction
  // SignatureDef under the 'inputs' field.
  map<string, TensorProto> inputs = 2;

  // Output filter.
  // Names specified are alias names. The mapping from aliases to real output
  // tensor names is stored in the SavedModel export as a prediction
  // SignatureDef under the 'outputs' field.
  // Only tensors specified here will be run/fetched and returned, with the
  // exception that when none is specified, all tensors specified in the
  // named signature will be run/fetched and returned.
  repeated string output_filter = 3;
}

So the inputs in PredictRequest can be iterated in any order inside RunPredict, even when the gRPC request message sent is identical.

Status RunPredict(const RunOptions& run_options,
                  const MetaGraphDef& meta_graph_def,
                  const optional<int64>& servable_version, Session* session,
                  const PredictRequest& request, PredictResponse* response) {
  // Validate signatures.
  const string signature_name = request.model_spec().signature_name().empty()
                                    ? kDefaultServingSignatureDefKey
                                    : request.model_spec().signature_name();
  auto iter = meta_graph_def.signature_def().find(signature_name);
  if (iter == meta_graph_def.signature_def().end()) {
    return errors::FailedPrecondition(strings::StrCat(
        "Serving signature key \"", signature_name, "\" not found."));
  }
  SignatureDef signature = iter->second;

  MakeModelSpec(request.model_spec().name(), signature_name, servable_version,
                response->mutable_model_spec());

  std::vector<std::pair<string, Tensor>> input_tensors;
  std::vector<string> output_tensor_names;
  std::vector<string> output_tensor_aliases;
  TF_RETURN_IF_ERROR(PreProcessPrediction(signature, request, &input_tensors,
                                          &output_tensor_names,
                                          &output_tensor_aliases));
@zhjunqin
Contributor Author

zhjunqin commented May 30, 2020

For example:

map<string, TensorProto> inputs = 2;

There are 3 inputs, "feature1", "feature2", and "feature3", in request.inputs, but the iteration order can differ even when the same gRPC message is sent.

  for (const auto& input : request.inputs()) {
    const string& alias = input.first;
    std::cout << alias << std::endl;  // order may vary between requests
  }
@zhjunqin zhjunqin force-pushed the add-sort-to-signature branch from a0c8875 to 7916b6d on November 4, 2020