Extended Data Table 3 Summary of the different axes along which lay users evaluate the model answers in our consumer medical question answering datasets

From: Large language models encode clinical knowledge

  1. We use a pool of five lay (non-expert) users to evaluate the quality of model-generated and human-generated answers along these axes.