-
Notifications
You must be signed in to change notification settings - Fork 25.5k
Closed
Labels
Description
At the moment, there is no difference in handling between COUNT(*)
and COUNT(expression)
when it comes to non-null values (ie all values are considered). Whereas the COUNT(ALL|DISTINCT expression)
should consider only non null values.
The current implementation generates and uses a query like the following to perform this select select author,count(ip_addr) from ip_index group by author having count(ip_addr)>1
:
{
"size": 0,
"_source": false,
"stored_fields": "_none_",
"aggregations": {
"groupby": {
"composite": {
"size": 1000,
"sources": [
{
"1562": {
"terms": {
"field": "author.keyword",
"order": "asc"
}
}
}
]
},
"aggregations": {
"1572": {
"bucket_selector": {
"buckets_path": {
"a0": "_count"
},
"script": {
"source": "params.a0 > params.v0",
"lang": "painless",
"params": {
"v0": 1
}
},
"gap_policy": "skip"
}
}
}
}
}
}
where the relevant part is "buckets_path": { "a0": "_count" }
which simply counts the documents. To also consider non-null values something like this should work:
{
"size": 0,
"_source": false,
"stored_fields": "_none_",
"aggregations": {
"groupby": {
"composite": {
"size": 1000,
"sources": [
{
"1395": {
"terms": {
"field": "author.keyword",
"order": "asc"
}
}
}
]
},
"aggregations": {
"non_null_values": {
"filter": {
"exists": {
"field": "ip_addr"
}
}
},
"1406": {
"bucket_selector": {
"buckets_path": {
"a0": "non_null_values._count"
},
"script": {
"source": "params.a0 == params.v0",
"lang": "painless",
"params": {
"v0": 1
}
},
"gap_policy": "skip"
}
}
}
}
}
}