When your data is stored in Arm Treasure Data, it is partitioned by timestamp. The 'time' column determines the partition, and partitions generally cover one hour each.
By constraining the 'time' column, you avoid processing the entire data set and can target only the data you need. The partitioning enables good performance, efficient data management, and increased availability.
In the following examples, only records that fit the specified time range are selected.
--example 1:
SELECT ... WHERE TD_TIME_RANGE(time, NULL, '2013-01-01', 'PDT')
--example 2:
SELECT COUNT(1) FROM table WHERE TD_TIME_RANGE(time, '2017-07-01', '2017-07-02', 'UTC')
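To see why a time constraint reduces the work a query does, the following Python sketch models hourly partitions as buckets keyed by the start of each hour. It is a toy model, not Treasure Data's implementation: the helper names (`partition_start`, `partitions_for_range`, `to_unix`) are hypothetical, and the range is assumed to be half-open (start included, end excluded).

```python
from datetime import datetime, timezone
from typing import List

HOUR = 3600  # assumed partition width in seconds (one-hour partitions)


def partition_start(ts: int) -> int:
    """Return the start of the hourly bucket that contains Unix time ts."""
    return ts - ts % HOUR


def partitions_for_range(start: int, end: int) -> List[int]:
    """List the hourly buckets that overlap the half-open range [start, end)."""
    if end <= start:
        return []
    first = partition_start(start)
    last = partition_start(end - 1)  # end is exclusive
    return list(range(first, last + HOUR, HOUR))


def to_unix(day: str) -> int:
    """Convert a 'YYYY-MM-DD' date (interpreted as UTC) to Unix seconds."""
    parsed = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


# Same range as example 2: one full day in UTC.
start = to_unix("2017-07-01")
end = to_unix("2017-07-02")
buckets = partitions_for_range(start, end)
print(len(buckets))  # 24 hourly buckets instead of the whole table
```

In this model, a one-day range touches only 24 buckets regardless of how much data the table holds outside that day.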
TD provides real-time storage and archive storage for customer data. The partitioned data is moved into real-time storage and then moved into archive storage. Data imported into Treasure Data through a streaming import API, such as td-agent and JS/Mobile SDK, is initially stored in the real-time storage. Data connectors or the bulk import API load data directly into archive storage.
User-defined partitioning is an alternative to timestamp-based partitioning. User-defined partitioning allows other data partitioning strategies that can improve performance when working with non-time-series data. For more information, see User Defined Partitioning.
Movement of Partitioned Data in Treasure Data
Accessing Partitioned Data in Treasure Data
When data is imported into Treasure Data through a streaming import, users can immediately query data. Real-time storage is meant to hold many small files with few records that users can query using the timestamp. However, the data does not remain in real-time storage. Treasure Data moves the data from real-time storage to archive storage where large volumes of data can be more efficiently managed.
The process of merging the data into the archive storage is performed on a regular basis. As in real-time storage, the data moved to the archive storage is partitioned into hourly buckets, according to the value in the 'time' column. Partitioning by time enables efficient data scanning because you avoid reading unnecessary data. You can use the time column to filter a query with TD_TIME_RANGE, or to delete data in a specified time range with the partial_delete command or the Presto DELETE command.
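The merge step can be pictured with a short Python sketch: many small real-time files are combined into hourly buckets keyed by the 'time' value. This is only an illustration under stated assumptions. The function name `merge_into_hourly_buckets`, the record layout, and the in-memory dictionaries are hypothetical and do not reflect how Treasure Data stores files internally.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

HOUR = 3600  # assumed bucket width in seconds


def merge_into_hourly_buckets(
    small_files: Iterable[List[dict]],
) -> Dict[int, List[dict]]:
    """Group records from many small files into hourly buckets by 'time'."""
    buckets = defaultdict(list)
    for records in small_files:
        for record in records:
            # Bucket key is the start of the hour that contains the record.
            bucket = record["time"] - record["time"] % HOUR
            buckets[bucket].append(record)
    return dict(buckets)


# Three small "real-time" files with a few records each (made-up data).
realtime_files = [
    [{"time": 10, "event": "a"}],
    [{"time": 3700, "event": "b"}, {"time": 20, "event": "c"}],
    [{"time": 7199, "event": "d"}],
]

archive = merge_into_hourly_buckets(realtime_files)
print(sorted(archive))  # [0, 3600]
```

Here three small files collapse into two hourly buckets, which is the property that lets a time-constrained query or delete touch only the relevant buckets.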
For examples of how to use time-based partitioning in Treasure Data, refer to: