This article lists all built-in aggregate functions (UDAF) supported by Hive 0.13.0.

# Aggregate Functions (UDAF)

Return Type | Name(Signature) | Description |
---|---|---|

bigint | count(*), count(expr), count(DISTINCT expr[, expr_.]) | count(*) - Returns the total number of retrieved rows, including rows containing NULL values; count(expr) - Returns the number of rows for which the supplied expression is non-NULL; count(DISTINCT expr[, expr]) - Returns the number of rows for which the supplied expression(s) are unique and non-NULL. |

double | sum(col), sum(DISTINCT col) | Returns the sum of the elements in the group or the sum of the distinct values of the column in the group |

double | avg(col), avg(DISTINCT col) | Returns the average of the elements in the group or the average of the distinct values of the column in the group |

double | min(col) | Returns the minimum of the column in the group |

double | max(col) | Returns the maximum value of the column in the group |

double | variance(col), var_pop(col) | Returns the variance of a numeric column in the group |

double | var_samp(col) | Returns the unbiased sample variance of a numeric column in the group |

double | stddev_pop(col) | Returns the standard deviation of a numeric column in the group |

double | stddev_samp(col) | Returns the unbiased sample standard deviation of a numeric column in the group |

double | covar_pop(col1, col2) | Returns the population covariance of a pair of numeric columns in the group |

double | covar_samp(col1, col2) | Returns the sample covariance of a pair of a numeric columns in the group |

double | corr(col1, col2) | Returns the Pearson coefficient of correlation of a pair of a numeric columns in the group |

double | percentile(BIGINT col, p) | Returns the exact p^{th} percentile of a column in the group (does not work with floating point types). p must be between 0 and 1. NOTE: A true percentile can only be computed for integer values. Use PERCENTILE_APPROX if your input is non-integral. |

array<double> | percentile(BIGINT col, array(p_{1} [, p_{2}]...)) |
Returns the exact percentiles p_{1}, p_{2}, ... of a column in the group (does not work with floating point types). p_{i} must be between 0 and 1. NOTE: A true percentile can only be computed for integer values. Use PERCENTILE_APPROX if your input is non-integral. |

double | percentile_approx(DOUBLE col, p [, B]) | Returns an approximate p^{th} percentile of a numeric column (including floating point types) in the group. The B parameter controls approximation accuracy at the cost of memory. Higher values yield better approximations, and the default is 10,000. When the number of distinct values in col is smaller than B, this gives an exact percentile value. |

array<double> | percentile_approx(DOUBLE col, array(p_{1} [, p_{2}]...) [, B]) |
Same as above, but accepts and returns an array of percentile values instead of a single one. |

array<struct {'x','y'}> |
histogram_numeric(col, b) | Computes a histogram of a numeric column in the group using b non-uniformly spaced bins. The output is an array of size b of double-valued (x,y) coordinates that represent the bin centers and heights |

array | collect_set(col) | Returns a set of objects with duplicate elements eliminated |

## Comments

0 comments

Please sign in to leave a comment.