This article explains Treasure Data’s bulk-export feature, which lets you dump your data into your own Amazon S3 bucket.
At Treasure Data, we believe that your data belongs to you, even after you import it to our platform. We believe that vendor lock-in MUST be stopped.
|Export capability is limited to S3 buckets in the us-east region only.|
- Basic knowledge of Treasure Data, including the Treasure Data Toolbelt.
- An Amazon AWS account and an Amazon S3 bucket.
The td table:export command dumps all the data uploaded to TD into your Amazon S3 bucket. Specify the database and table from which to dump your data.
$ td table:export database_name table_name \
    --s3-bucket <S3_BUCKET_NAME> \
    --prefix <S3_FILE_PREFIX> \
    --aws-key-id <AWS_KEY> \
    --aws-secret-key <AWS_SECRET_KEY> \
    --file-format jsonl.gz
|We highly recommend using the jsonl.gz or tsv.gz format, because we have specific performance optimizations for these formats. Other formats are significantly slower.|
The dump is performed via MapReduce jobs, and the location of the bucket is expressed as an S3 path with the AWS public and private access keys embedded in it.
usage:
  $ td table:export <db> <table>

example:
  $ td table:export example_db table1 --s3-bucket mybucket -k KEY_ID -s SECRET_KEY

description:
  Dump logs in a table to the specified storage

options:
  -w, --wait                          wait until the job is completed
  -f, --from TIME                     export data which is newer than or same with the TIME (unixtime e.g. 1446617523)
  -t, --to TIME                       export data which is older than the TIME (unixtime e.g. 1480383205)
  -b, --s3-bucket NAME                name of the destination S3 bucket (required)
  -p, --prefix PATH                   path prefix of the file on S3
  -k, --aws-key-id KEY_ID             AWS access key id to export data (required)
  -s, --aws-secret-key SECRET_KEY     AWS secret access key to export data (required)
  -F, --file-format FILE_FORMAT       file format for exported data. Available formats are tsv.gz
                                      (tab-separated values per line) and jsonl.gz (JSON record per line).
                                      The json.gz and line-json.gz formats are default and still available,
                                      but only for backward-compatibility purposes; their use is discouraged
                                      because they have far lower performance.
  -O, --pool-name NAME                specify resource pool by name
  -e, --encryption ENCRYPT_METHOD     export with server-side encryption with the ENCRYPT_METHOD
  -a, --assume-role ASSUME_ROLE_ARN   export with assume role with ASSUME_ROLE_ARN as role arn
Support for Server-Side Encryption
Server-side encryption protects data at rest. Our Bulk Export supports some server-side encryption methods.
The td table:export command with the --encryption ENCRYPT_METHOD option dumps all the data uploaded to TD into your encrypted storage. This option has been available in the td command since version 0.14.0.
The following command is an example of using x-amz-server-side-encryption: AES256 on S3:
$ td table:export example_db table1 -F jsonl.gz --s3-bucket mybucket -k KEY_ID -s SECRET_KEY --encryption s3
Best Practices for Achieving Time Partitioning in S3 of Data Exported using Bulk Export
A common question we’ve received is “Can Treasure Data make sure the data exported through this process is partitioned into hourly buckets, similar to the partitioning strategy maintained within the core Treasure Data system?”
Unfortunately, as noted above, the Bulk Export command no longer supports partitioning of exported data. This is to optimize the speed of export, which the majority of users found too slow to meet their requirements.
If you do require partitioning, we recommend using this command to export one-hour segments at a time, automating the process with a script. While we know this isn’t the most convenient approach, it is currently the way to achieve time-based partitioning in your bulk-exported data. We will continue to consider improvements to this process in the future.
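One way to script hourly exports is to loop over unixtime hour boundaries and pass each window via --from/--to, writing every segment under its own S3 prefix. The sketch below only prints the commands for review (remove the echo to actually run them); the database, table, bucket name, keys, and prefix scheme are placeholders, not values from your account:

```shell
#!/bin/sh
# Sketch: export one-hour segments, one td table:export job per hour.
# START is an example hour boundary (2016-11-29 00:00:00 UTC); adjust as needed.
START=1480377600
HOURS=3
i=0
while [ "$i" -lt "$HOURS" ]; do
  FROM=$(( START + i * 3600 ))   # inclusive lower bound (unixtime)
  TO=$(( FROM + 3600 ))          # exclusive upper bound (unixtime)
  # Each hour lands under its own prefix, e.g. hourly/1480377600/.
  # Drop the leading "echo" to submit the jobs for real.
  echo td table:export example_db table1 \
    --s3-bucket mybucket --prefix "hourly/${FROM}/" \
    -k KEY_ID -s SECRET_KEY -F jsonl.gz \
    --from "$FROM" --to "$TO" --wait
  i=$(( i + 1 ))
done
```

The --wait flag keeps the loop sequential so only one export job runs at a time; drop it if your account's resource pool can handle concurrent jobs.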
- Protecting Data Using Server-Side Encryption
- Protecting Data Using Server-Side Encryption with Amazon S3-Managed Encryption Keys (SSE-S3)