Arm Treasure Data recommends using Embulk instead of td import command. This tool is under maintenance mode.
This article explains how to bulk-import data using the
td import command for version 0.10.84 and above.
- Basic knowledge of Treasure Data, including the toolbelt.
- Java runtime (6 and above).
Why Bulk Import?
Because Treasure Data is a cloud service, the data needs to be transferred via an Internet network connection. This can get tricky once the data size gets big (> 100MB). Consider a couple of cases:
- If the network becomes unstable, the import could fail halfway through the data transfer. There’s no easy way to pick up from where you left off, and you will need to restart the upload from scratch.
- Your company sometimes has bandwidth limits for transferring huge data with a single stream. Also, the limitations of the TCP/IP protocol make it difficult for applications to saturate a network connection.
We designed our bulk import feature to overcome these problems. You can now break your larger data sets into smaller chunks and upload them in parallel. If the upload of a particular chunk fails, you can restart the upload for that chunk only. This parallelism will improve your overall upload speed.
Because bulk import is a complex way to achieve performance reliability, you can use these short cuts to achieve your goal.
- Legacy Bulk Import from JSON file
- Legacy Bulk Import from Amazon S3
- Legacy Bulk Import from MySQL
- Legacy Bulk Import from MongoDB
To understand the bulk import internals or tips and tricks, please refer the documents below.