This article provides a work-around to perform row level deletions from an Arm Treasure Data table.
If you would like to delete rows imported to table within a time range (based on time column), consider using the partial delete feature.
- Basic knowledge of Treasure Data, including the TD toolbelt
- Understanding of Presto or Hive
As you use the procedures in this article, refer to the following articles:
- Presto DELETE Statement Syntax and also Presto DELETE limitations.
- Partial Delete: Efficient for small tables because it reloads the entire table
- Partial Delete based on TIME Column: Efficient for huge tables. You take a superset of rows, which includes rows to be deleted and filtered based on the
Using Presto DELETE Statements
Presto Delete enables you to issues DELETE statement queries against any table in Treasure Data.
- Use the following syntax to isolate the row you want to delete and delete it:
DELETE FROM table_name [ WHERE condition ]
DELETE FROM emp WHERE name = 'beekeeper1'
This method requires the creation of a new table and the movement of data.
- Create a temporary table with the required data from your original table, using the following syntax:
Query Engine Syntax Presto CREATE TABLE tmp AS SELECT * from original_table where col1 is not null Hive
-td table:create <db> tmp
-INSERT INTO TABLE tmp select * from <original_table> where <col1> is not null
- Swap table names of <tmp> and <original_table>. For example:
td table:swap tmp original_table
- Delete the <
Partial Delete Filtered on the TIME Column
Efficient for huge tables. You create a superset of rows, which includes rows to be deleted and filtered based on the
- Load data from your original table to a temporary table within the time range that you want to delete. For example:
Query engine Example Syntax Presto CREATE TABLE tmp AS SELECT from original_table where td_time_range(time, '2018-06-10 12:00:00 ' , ' 2018-06-10 14:00:00 ' ) Hive
-td table:create <db> tmp
-INSERT INTO TABLE tmp select * from <original_table> where td_time_range(time, '2018-06-10 12:00:00', '2018-06-10 14:00:00')
- Using the following syntax, delete data that is in the same time range from your original table.
td table:partial_delete example_db original_table --from 1465560000 --to 1465567200
- Reload the required data into your original table:
Query Engine Example Syntax Presto INSERT INTO original_table select * from tmp where col1 is not null Hive INSERT INTO TABLE original_table select * from tmp where col1 is not null
- Ensure that the original_table has the required data.
- Delete the tmp table.