The Presto query engine has some limitations compared to Hive at this stage. Arm Treasure Data welcomes your feedback and will make improvements based on these.
Column Name Escaping
When a column name matches the name of a reserved keyword, the name of the columns needs to be quoted. While in Hive, as most the SQL-based query languages, the quotation character is the back tick character `, in Presto quoting of a column name is accomplished with the double quote character instead “. See the comparison below:
SELECT `join` FROM mytbl # Hive SELECT "join" FROM mytbl # Presto
Please be aware that quoting a column name in single quotes ‘ will make the query parser interpret the content as a simple string producing a perhaps unexpected result:
SELECT "join" FROM mytbl LIMIT 10 result: value1 value2 value3 value4 ... value10 SELECT 'join' FROM mytbl result: join join join join ... join
Table Schemas are Required
When querying via Presto, you must set a Schema for the table(s); the
v syntax doesn’t work.
Presto does not currently support cost-based JOIN optimizations, meaning JOINs are not automatically reordered based on table size. Please make sure that smaller tables are on the right hand size of JOIN, and they must fit in memory. Otherwise out of memory exceptions will cause the query to fail.
SELECT ... FROM large_table JOIN small_table
INSERT OVERWRITE Statements are NOT Supported
Presto does not currently support INSERT OVERWRITE Statements. Please delete table before INSERT INTO. See the detail here.
Error Message – NOT_SUPPORTED: NULL values are not allowed on the probe side of SemiJoin operator
This error is caused by using NULL IN empty subquery has an issue on result correctness.
The IN predicate determines if any values produced by the subquery are equal to the provided expression. The result of IN follows the standard rules for nulls. The subquery must produce exactly one column.
Example: “user” column contains NULL value, then the following query can be failed.
select * from www_access where user in (select 0 as user_id) => Query 20170526_042346_18708_6huz3 failed: NULL values are not allowed on the probe side of SemiJoin operator. See the query plan for details.
As a workaround, adding IS NOT NULL clause works.
select * from www_access where user in (select 0 as user_id) and user is not null