The Presto query engine has some known limitations.
Column Name Escaping
When a column name matches the name of a reserved keyword, the name of the column needs to be quoted.
- In Hive, the quotation character is the back tick character `.
SELECT `join` FROM mytbl
- In Presto, quoting of a column name is accomplished with the double quote character instead “.
SELECT "join" FROM mytbl
Quoting a column name in single quotes ‘ will make the query parser interpret the content as a simple string that might produce an unexpected result. For example:
SELECT "join" FROM mytbl LIMIT 10 result: value1 value2 value3 value4 ... value10 SELECT 'join' FROM mytbl result: join join join join ... join
48 Hour Limit for Presto Queries
Presto queries that run more than 48 hours are automatically cancelled. Typically, Presto queries that run more than a day or so are failing queries.
For long running queries, you can rewrite the Presto query as a Hive query.
Table Schemas are Required
When querying using Presto, you must set a schema for the tables. The
v syntax doesn’t work.
Presto does not support cost-based JOIN optimizations. This means that JOINs are not automatically reordered based on table size. Make sure that smaller tables are on the right hand size of JOIN, and they must fit in memory. Otherwise out of memory exceptions cause the queries to fail.
SELECT ... FROM large_table JOIN small_table
INSERT OVERWRITE Statements are NOT Supported
Presto does not support INSERT OVERWRITE statements. Make sure that you delete the table before using INSERT INTO. See Presto Query FAQs.
Error Message – NOT_SUPPORTED: NULL values are not allowed on the probe side of SemiJoin operator
This error is caused by using NULL IN an empty subquery that has an issue with result correctness.
The IN predicate determines if any values produced by the subquery are equal to the provided expression. The result of IN follows the standard rules for nulls. The subquery must produce exactly one column.
For example, the user column contains a NULL value, which might cause the following query to fail.
select * from www_access where user in (select 0 as user_id) => Query 20170526_042346_18708_6huz3 failed: NULL values are not allowed on the probe side of SemiJoin operator. See the query plan for details.
As a workaround, use the IS NOT NULL clause instead.
select * from www_access where user in (select 0 as user_id) and user is not null
Error Message - Killed by the system because this query generates output for more than 200GB.
This error is caused by trying to generate more than 200GB output by a single query.
As a workaround, use result_output_redirect option or CREATE TABLE AS.
- result_output_redirect option
Solution: [Experimental] Use result_output_redirect
- CREATE TABLE AS
Solution: Use CREATE TABLE (table) AS SELECT …
Error Message - Accessing too many(N) tables. Maximum 300 tables allowed
This error is caused by trying to scan more than 300 tables by a single query.
As a workaround, tune the SQL as reducing scan tables, or divide single query to multiple jobs.
Error Message - Creating partitions(N) exceeds the limit(500000)
This error is caused by trying to commit more than 500,000 (500K) partitions by a single query.
As a workaround, use User Defined Partitioning (UDP) or tune parameters for UDP.
- Use User Defined Partitioning
User Defined Partitioning for Presto
- Tune parameters for UDP. For example:
- Reduce bucket_count to 256. (512 in default)
- Update max_time_range to adopt more broad time range.
- Increase max_file_size to 512MB. (256MB in default)