Arm Treasure Data provides Hive v0.13. There are many useful functions added since Apache Hive v0.13.0. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF Of these newer functions we support the following:
Since Hive v1.1.0
greatest
- Name:
greatest(T v1, T v2, ...)
- Return Type:
T
- Description: Returns the greatest value of the list of values. If null is contained, the result is always null.
Example:
SELECT greatest(1, 2, 3, 3) ==> 3
least
- Name:
least(T v1, T v2, ...)
- Return Type:
T
- Description: Returns the least value of the list of values. If null is contained, the result is always null.
Example:
SELECT least(1, 2, 3, 3) ==> 1
add_months
- Name:
add_months(string start_date, int num_months)
- Return Type:
string
- Description: Returns the date that is num_months after start_date.
Example:
SELECT add_months('2017-06-03',6) as add1, add_months('2017-06-03 00:00:00',7), add_months('2017-06-03 00:00:00',-1) ==> 2017-12-03 2018-01-03 2017-05-03
last_day
- Name:
last_day(string date)
- Return Type:
string
- Description: Returns the last day of the month which the date belongs to. date is a string in the format ‘yyyy-MM-dd HH:mm:ss’ or ‘yyyy-MM-dd’.
Example:
SELECT last_day('2017-06-03'), last_day('2017-06-03 00:00:00') ==> 2017-06-30 2017-06-30
initcap
- Name:
initcap(string A)
- Return Type:
string
- Description: Returns string, with the first letter of each word in uppercase, all other letters in lowercase. Words are delimited by whitespace
Example:
SELECT initcap('abc'), initcap('ABC'), initcap('abc'), initcap('ABC') ==> Abc Abc Abc Abc
Since Hive v1.2.0
factorial
- Name:
factorial(INT a)
- Return Type:
BIGINT
- Description: Returns the factorial of a. Valid a is [0..20].
cbrt
- Name:
cbrt(DOUBLE a)
- Return Type:
DOUBLE
- Description: Returns the cube root of a double value.
shiftleft
- Name:
shiftleft(TINYINT|SMALLINT|INT a, INT b)
/shiftleft(BIGINT a, INT b)
- Return Type:
INT
/BIGINT
- Description: Bitwise left shift. Shifts a b positions to the left. / Returns int for tinyint, smallint and int a. Returns bigint for bigint a.
shiftright
- Name:
shiftright(TINYINT|SMALLINT|INT a, INT b)
/shiftright(BIGINT a, INT b)
- Return Type:
INT
/BIGINT
- Description: Bitwise right shift. Shifts a b positions to the right. / Returns int for tinyint, smallint and int a. Returns bigint for bigint a.
shiftrightunsigned
- Name:
shiftrightunsigned(TINYINT|SMALLINT|INT a, INT b)
/shiftrightunsigned(BIGINT a, INT b)
- Return Type:
INT
/BIGINT
- Description: Bitwise unsigned right shift. Shifts a b positions to the right. / Returns int for tinyint, smallint and int a. Returns bigint for bigint a.
current_date
- Name:
current_date()
- Return Type:
date
- Description: Returns the current date at the start of query evaluation. All calls of current_date within the same query return the same value. (TD_SCHEDULED_TIME() is recommended on Treasure Data.)
On Treasure Data, date type in Hive is not supported. Convert the value to String or Timestamp type within SQL. |
current_timestamp
- Name:
current_timestamp()
- Return Type:
timestamp
- Description: Returns the current timestamp at the start of query evaluation. All calls of current_timestamp within the same query return the same value. (TD_SCHEDULED_TIME is recommended in Treasure Data.)
Example:
SELECT current_timestamp() ==> 2017-06-04 01:52:17.292
next_day
- Name:
next_day(string start_date, string day_of_week)
- Return Type:
string
- Description: Returns the first date which is later than start_date and named as day_of_week. start_date is a string/date/timestamp. day_of_week is 2 letters, 3 letters or full name of the day of the week (e.g. Mo, tue, FRIDAY). The time part of start_date is ignored.
Example:
SELECT next_day('2017-06-10', 'MON'), next_day('2017-06-10', 'TUE'), next_day('2017-06-10', 'WED'), next_day('2017-06-10', 'THU'), next_day('2017-06-10', 'FRI'), next_day('2017-06-10', 'SAT'), next_day('2017-06-10', 'SUN') ==> 2017-06-12 2017-06-13 2017-06-14 2017-06-15 2017-06-16 2017-06-17 2017-06-11
trunc
- Name:
trunc(string date, string format)
- Return Type:
string
- Description: Returns date truncated to the unit specified by the format. Supported formats: MONTH/MON/MM, YEAR/YYYY/YY. If NOT supported format is used, the result is NULL.
Example:
SELECT trunc('2017-08-05', 'MONTH'), trunc('2017-08-05 03:00:00', 'YEAR'), trunc('2016-03-01 03:00:00', 'DAY') ==> 2017-08-01 2017-01-01 NULL
months_between
- Name:
months_between(date1, date2)
- Return Type:
double
- Description: Returns number of months between dates date1 and date2. If date1 is later than date2, then the result is positive. If date1 is earlier than date2, then the result is negative. If date1 and date2 are either the same days of the month or both last days of months, then the result is always an integer. Otherwise the UDF calculates the fractional portion of the result based on a 31-day month and considers the difference in time components date1 and date2. date1 and date2 type can be date, timestamp or string in the format ‘yyyy-MM-dd’ or ‘yyyy-MM-dd HH:mm:ss’. The result is rounded to 8 decimal places.
Example:
SELECT months_between('2017-08-01', '2017-01-01'), months_between('2017-01-01', '2017-08-05'), months_between('2017-08-05', '2017-01-01'), months_between('2017-01-01 00:00:00', '2016-01-01 03:00:00') ==> 7.0 -7.12903226 7.12903226 12.0
date_format
- Name:
date_format(date/timestamp/string ts, string fmt)
- Return Type:
string
- Description: Converts a date/timestamp/string to a value of string in the format specified by the date format fmt. Supported formats are Java SimpleDateFormat formats – https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html. The second argument fmt should be constant.
Example:
SELECT date_format('2015-04-08', 'y'), date_format('2015-04-08', 'd'), date_format('2015-04-08', 's') ==> 2015 8 0
levenshtein
- Name:
levenshtein(string A, string B)
- Return Type:
int
- Description: Returns the Levenshtein distance between two strings.
Example:
SELECT levenshtein('xxx', 'xyx') ==> 1
soundex
- Name:
soundex(string A)
- Return Type:
string
- Description: Returns soundex code of the string. (Ref: https://en.wikipedia.org/wiki/Soundex)
Example:
SELECT soundex('TreasureData') ==> T626
quarter
- Name:
quarter(date/timestamp/string)
- Return Type:
int
- Description: Returns the quarter of the year for a date, timestamp, or string in the range 1 to 4.
Example:
SELECT quarter('2017-06-01') ==> 2
chr
- Name:
chr(bigint|double A)
- Return Type:
string
- Description: Returns the ASCII character having the binary equivalent to A. If A is larger than 256 the result is equivalent to chr(A % 256). https://msdn.microsoft.com/en-us/library/4z4t9ed1(v=vs.80).aspx
Example:
SELECT chr(48) ==> 0
replace
- Name:
replace(string A, string OLD, string NEW)
- Return Type:
string
- Description: Returns the string A with all non-overlapping occurrences of OLD replaced with NEW.
Example:
SELECT replace("ababab", "abab", "Z") ==> Zab
substring_index
- Name:
substring_index(string A, string delim, int count)
- Return Type: string
- Description: Returns the substring from string A before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. Substring_index performs a case-sensitive match when searching for delim.
Example:
SELECT substring_index('www.apache.org', '.', 2) ==> www.apache
md5
- Name:
md5(string/binary)
- Return Type:
string
- Description: Calculates an MD5 128-bit checksum for the string or binary (as of Hive 1.3.0). The value is returned as a string of 32 hex digits, or NULL if the argument was NULL. This is same function as TD_MD5.
Example:
SELECT MD5('abc'), TD_MD5('abc') ==> 900150983cd24fb0d6963f7d28e17f72 900150983cd24fb0d6963f7d28e17f72
sha1
- Name:
sha1(string/binary)
- Return Type:
string
- Description: Calculates the SHA-1 digest for string or binary and returns the value as a hex string.
crc32
- Name:
crc32(string/binary)
- Return Type:
bigint
- Description: Computes a cyclic redundancy check value for string or binary argument and returns bigint value.
sha2
- Name:
sha2(string/binary, int)
- Return Type:
string
- Description: Calculates the SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512). The first argument is the string or binary to be hashed. The second argument indicates the necessary bit length of the result, which must have a value of 224, 256, 384, 512, or 0 (which is equivalent to 256). SHA-224 is supported starting from Java 8. If either argument is NULL or the hash length is not one of the permitted values, the return value is NULL.
- Example:
sha2('ABC', 256) = 'b5d4045c3f466fa91fe2cc6abe79232a1a57cdf104f7a26e716e0a1e2789df78'
aes_encrypt
- Name:
aes_encrypt(input string/binary, key string/binary)
- Return Type:
binary
- Description: Encrypt input using AES. Key lengths of 128 bits can be used. If either argument is NULL or the key length is not the permitted value, the return value is NULL.
Example:
SELECT base64(aes_encrypt('ABC','1234567890123456')) ==> y6Ss+zCYObpCbgfWfyNWTw==
aes_decrypt
- Name:
aes_decrypt(input binary, key string/binary)
- Return Type:
binary
- Description: Decrypt input using AES. Key lengths of 128 bits can be used. If either argument is NULL or the key length is not the permitted value, the return value is NULL.
Example:
SELECT decode(aes_decrypt(unbase64('y6Ss+zCYObpCbgfWfyNWTw=='), '1234567890123456'), 'UTF-8') ==> ABC
Since Hive v2.1.0
mask
- Name:
mask(string str[, string upper[, string lower[, string number]]])
- Return Type:
string
- Description: Returns a masked version of str. By default, upper case letters are converted to “X”, lower case letters are converted to “x” and numbers are converted to “n”. For example mask(“abcd-EFGH-8765-4321”) results in xxxx-XXXX-nnnn-nnnn. You can override the characters used in the mask by supplying additional arguments: the second argument controls the mask character for upper case letters, the third argument for lower case letters and the fourth argument for numbers. Limitation: Only alphabet and number are masked.
Example:
SELECT mask('Testing000@gmail.com'), mask('Testing000@gmail.com','x'), mask('Testing000@gmail.com','x','y'), mask('Testing000@gmail.com','x','y','z') ==> Xxxxxxxnnn@xxxxx.xxx xxxxxxxnnn@xxxxx.xxx xyyyyyynnn@yyyyy.yyy xyyyyyyzzz@yyyyy.yyy
mask_first_n
- Name:
mask_first_n(string str[, int n])
- Return Type:
string
- Description: Returns a masked version of str with the first n values masked. Upper case letters are converted to “X”, lower case letters are converted to “x” and numbers are converted to “n”. Default “n” is 4.
Example:
SELECT mask_first_n('Testing000@gmail.com'), mask_first_n('Testing000@gmail.com',7) ==> Xxxxing000@gmail.com Xxxxxxx000@gmail.com
mask_last_n
- Name:
mask_last_n(string str[, int n])
- Return Type:
string
- Description: Returns a masked version of str with the last n values masked. Upper case letters are converted to “X”, lower case letters are converted to “x” and numbers are converted to “n”. Default “n” is 4.
Example:
SELECT mask_last_n('Testing000@gmail.com'), mask_last_n('Testing000@gmail.com',7) ==> Testing000@gmail.xxx Testing000@gmxxx.xxx
mask_show_first_n
- Name:
mask_show_first_n(string str[, int n])
- Return Type:
string
- Description: Returns a masked version of str, showing the first n characters unmasked). Upper case letters are converted to “X”, lower case letters are converted to “x” and numbers are converted to “n”. Default “n” is 4.
Example:
SELECT mask_show_first_n('Testing000@gmail.com'), mask_show_first_n('Testing000@gmail.com',7) ==> Testxxxnnn@xxxxx.xxx Testingnnn@xxxxx.xxx
mask_show_last_n
- Name:
mask_show_last_n(string str[, int n])
- Return Type:
string
- Description: Returns a masked version of str, showing the last n characters unmasked. Upper case letters are converted to “X”, lower case letters are converted to “x” and numbers are converted to “n”. Default “n” is 4.
Example:
SELECT mask_show_last_n('Testing000@gmail.com'), mask_show_last_n('Testing000@gmail.com',7) ==> Xxxxxxxnnn@xxxxx.com Xxxxxxxnnn@xxail.com
mask_hash
- Name:
mask_hash(string|char|varchar str)
- Return Type:
string
- Description: Returns a hashed value based on str. The hash is consistent and can be used to join masked values together across tables. This function returns null for non-string types.
Example:
SELECT mask_hash('Testing000@gmail.com') ==> e979e1ca866a125460db6fb26995df2b
Since Hive v2.2.0
nullif
- Name:
nullif( a, b )
- Return Type:
T
- Description: Returns NULL if a=b; otherwise returns a. Shorthand for: CASE WHEN a = b then NULL else a.
Example:
SELECT nullif( 0, 0), nullif( 1, 0) ==> NULL 1
character_length
- Name:
character_length(string str)
- Return Type:
int
- Description: Returns the number of UTF-8 characters contained in str. The function char_length is shorthand for this function.
Example:
SELECT character_length('abc'), character_length('あいう') ==> 3 3
regr_avgx
- Name:
regr_avgx(independent, dependent)
- Return Type:
double
- Description: Equivalent to avg(dependent).
regr_avgy
- Name:
regr_avgy(independent, dependent)
- Return Type:
double
- Description: Equivalent to avg(independent).
regr_count
- Name:
regr_count(independent, dependent)
- Return Type:
double
- Description: Returns the number of non-null pairs used to fit the linear regression line.
regr_intercept
- Name:
regr_intercept(independent, dependent)
- Return Type:
double
- Description: Returns the y-intercept of the linear regression line, i.e. the value of b in the equation dependent = a * independent + b.
regr_r2
- Name:
regr_r2(independent, dependent)
- Return Type:
double
- Description: Returns the coefficient of determination for the regression.
regr_slope
- Name:
regr_slope(independent, dependent)
- Return Type:
double
- Description: Returns the slope of the linear regression line, i.e. the value of a in the equation dependent = a * independent + b.
regr_sxx
- Name:
regr_sxx(independent, dependent)
- Return Type:
double
- Description: Equivalent to regr_count(independent, dependent) * var_pop(dependent).
regr_sxy
- Name:
regr_sxy(independent, dependent)
- Return Type:
double
- Description: Equivalent to regr_count(independent, dependent) * covar_pop(independent, dependent).
regr_syy
- Name:
regr_syy(independent, dependent)
- Return Type:
double
- Description: Equivalent to regr_count(independent, dependent) * var_pop(independent).
Comments
0 comments
Please sign in to leave a comment.