The Presto query engine has some limitations compared to Hive at this stage. Arm Treasure Data welcomes your feedback.
Column Name Escaping
When a column name matches the name of a reserved keyword, the name of the column needs to be quoted. While in Hive, as most of the SQL-based query languages, the quotation character is the back tick character `. In Presto, quoting of a column name is accomplished with the double quote character instead “. See the following comparison:
SELECT `join` FROM mytbl # Hive SELECT "join" FROM mytbl # Presto
Quoting a column name in single quotes ‘ will make the query parser interpret the content as a simple string that might produce an unexpected result. For example:
SELECT "join" FROM mytbl LIMIT 10 result: value1 value2 value3 value4 ... value10 SELECT 'join' FROM mytbl result: join join join join ... join
48 Hour Limit for Presto Queries
Presto queries that run more than 48 hours are automatically cancelled. Typically, Presto queries that run more than a day or so are failing queries.
For long running queries, you can rewrite the Presto query as a Hive query.
Table Schemas are Required
When querying using Presto, you must set a Schema for the tables; the
v syntax doesn’t work.
Presto does not support cost-based JOIN optimizations, meaning JOINs are not automatically reordered based on table size. Make sure that smaller tables are on the right hand size of JOIN, and they must fit in memory. Otherwise out of memory exceptions will cause the query to fail.
SELECT ... FROM large_table JOIN small_table
INSERT OVERWRITE Statements are NOT Supported
Presto does not support INSERT OVERWRITE Statements. Make sure that you delete the table before using INSERT INTO. See the details here.
Error Message – NOT_SUPPORTED: NULL values are not allowed on the probe side of SemiJoin operator
This error is caused by using NULL IN empty subquery has an issue on result correctness.
The IN predicate determines if any values produced by the subquery are equal to the provided expression. The result of IN follows the standard rules for nulls. The subquery must produce exactly one column.
Example: “user” column contains NULL value, then the following query can be failed.
select * from www_access where user in (select 0 as user_id) => Query 20170526_042346_18708_6huz3 failed: NULL values are not allowed on the probe side of SemiJoin operator. See the query plan for details.
As a workaround, adding IS NOT NULL clause works.
select * from www_access where user in (select 0 as user_id) and user is not null