The Arm Treasure Data CLI (‘Command Line Interface’ or ‘Toolbelt’) allows you to create databases and tables, import data into and export data from tables, set and modify table schemas, issue queries, monitor job status, view and download job results, create scheduled queries, and much more.
Step 1: Installation & Update
Install the Treasure Data Toolbelt to set up your local workstation with td, the Treasure Data command-line client. Refer to Installing the Treasure Data CLI for information on how to install the CLI in various environments.
The page also contains important information about updating the td CLI.
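Before moving on, you can confirm that td is actually on your PATH. The sketch below uses a hypothetical helper, not part of the toolbelt. It assumes your td build accepts the global --version flag. If it does not, run td help.

```shell
# Hypothetical pre-flight helper (not part of the td toolbelt).
require_td() {
  # command -v succeeds only if "td" resolves to something runnable.
  if ! command -v td >/dev/null 2>&1; then
    echo "td not found on PATH; install the Treasure Data Toolbelt first." >&2
    return 1
  fi
  # Assumption: the global --version flag prints the installed client version.
  td --version
}
```

Call require_td || exit 1 at the top of any script that depends on td. The printed version also tells you whether an update is due.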
Step 2: Authorize
After you’ve installed the toolbelt, you’ll have access to the td command from your command line. Next, you need to set up the credentials for your account.
There are two ways to sign up: password-based authentication and Google Single Sign-On. Regardless of how you signed up, the toolbelt requires your TD API key to authorize its requests. The key is saved in the ~/.td/td.conf file for future use. Here is how to set it up:
Password-based Authentication Users
Set up the account credentials with the td account command. Use the email address and password that you used when you signed up:
$ td -e https://api.treasuredata.com account -f
Email: email@example.com
Password (typing will be hidden):
Authenticated successfully.
This command will create or update the ~/.td/td.conf file with your master API key.
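Whichever way you authorize, a quick check that ~/.td/td.conf actually holds a key can save debugging later. This is a minimal sketch, and the helper name is hypothetical. It assumes the file stores the key on a line of the form apikey = <key>, so look at your own td.conf before relying on it. The function never prints the key itself.

```shell
# Hypothetical helper: succeed if the td config file contains an API key entry.
# Assumption: the key is stored on a line such as "  apikey = <key>".
td_conf_has_apikey() {
  conf="${1:-$HOME/.td/td.conf}"   # default path used throughout this article
  [ -f "$conf" ] || return 1
  # Look for "apikey", optional spaces, "=", then a non-empty value.
  # grep -q prints nothing, so the key itself is never echoed.
  grep -Eq '^[[:space:]]*apikey[[:space:]]*=[[:space:]]*[^[:space:]]' "$conf"
}
```

For example, td_conf_has_apikey && echo "API key configured" checks the default location. Pass a path to check another file.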
Google SSO Users
If you signed up with Google SSO, set your API key directly:
$ td apikey:set <your_apikey>
This command will create or update the ~/.td/td.conf file with the provided API key.
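On a CI server or another non-interactive machine, you will usually want to read the key from a secret rather than type it. The following is a minimal sketch. The helper name configure_td_apikey and the variable name TD_API_KEY are this example's own choices.

```shell
# Hypothetical helper: configure td from a key provided in the environment.
configure_td_apikey() {
  if [ -z "${TD_API_KEY:-}" ]; then
    echo "TD_API_KEY is not set; refusing to configure td." >&2
    return 1
  fi
  # Same command as above. The literal key is never typed on the command
  # line, so it stays out of your shell history.
  td apikey:set "$TD_API_KEY"
}
```

Export TD_API_KEY from your CI secret store, then call configure_td_apikey once before any other td command.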
The configuration file also records the default endpoint. If you need a different endpoint, set it with the td server:endpoint command:
$ td server:endpoint https://api.treasuredata.com
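If the same scripts run against different endpoints, the endpoint can come from a variable. The following is a minimal sketch. TD_ENDPOINT and set_td_endpoint are hypothetical names. Rejecting non-HTTPS URLs is a design choice of this example, not a td requirement.

```shell
# Hypothetical helper: set the td endpoint from TD_ENDPOINT, falling back
# to the endpoint shown above.
set_td_endpoint() {
  endpoint="${TD_ENDPOINT:-https://api.treasuredata.com}"
  case "$endpoint" in
    https://*) ;;   # accept HTTPS URLs only
    *) echo "refusing non-HTTPS endpoint: $endpoint" >&2; return 1 ;;
  esac
  td server:endpoint "$endpoint"
}
```

To pass an endpoint on the command line for a single command, use the global -e option instead, as in the td account example above.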
Step 3: Query the Sample Dataset
Let’s issue an SQL query. Out of the box, a table called www_access is available in the sample_datasets database. The following query calculates the distribution of HTTP status codes.
$ td query -w -d sample_datasets \
  "SELECT code, COUNT(1) AS cnt FROM www_access GROUP BY code"
queued...
started at 2012-04-10T23:44:41Z
2012-04-10 23:43:12,692 Stage-1 map = 0%, reduce = 0%
2012-04-10 23:43:18,766 Stage-1 map = 100%, reduce = 0%
2012-04-10 23:43:29,925 Stage-1 map = 100%, reduce = 33%
2012-04-10 23:43:32,973 Stage-1 map = 100%, reduce = 100%
Status     : success
Result     :
+------+------+
| code | cnt  |
+------+------+
| 404  | 17   |
| 500  | 2    |
| 200  | 4981 |
+------+------+
The command above takes about 15 to 45 seconds, mainly because of the overhead of setting up jobs in the cloud-based MapReduce engine.
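To use the result in another tool, you can write it to a file instead of printing a table. The following is a minimal sketch. It assumes your td version supports the -f (format) and -o (output path) options of td query, so check td help query to confirm. The function name and default file name are illustrative.

```shell
# Illustrative helper: rerun the status-code query and save the result as CSV.
# Assumption: td query supports -f/--format and -o/--output in your version.
export_status_codes() {
  out="${1:-status_codes.csv}"   # illustrative default file name
  # -w waits for the job to finish, so the result is written before the call returns.
  td query -w -d sample_datasets -f csv -o "$out" \
    "SELECT code, COUNT(1) AS cnt FROM www_access GROUP BY code"
}
```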
Issue Idempotent Queries with a Domain Key for Your Batches
Beginning with td v0.14, the td query command supports a domain key. A domain key makes query submission idempotent: if a query is submitted again with a domain key that is already in use, no new job is created, and the CLI reports the job that already holds the key. For more information about idempotency and the underlying mechanism, refer to the corresponding REST API article.
$ td query -d sample_datasets --domain-key domainkey-test -T presto "select * from www_access"
Job 92034375 is queued.
Use 'td job:show 92034375' to show the status.
# If the command did not return a job ID (for example, because of an API error),
# you can safely issue the same query with the same domain key again
$ td query -d sample_datasets --domain-key domainkey-test -T presto "select * from www_access"
Error: Query failed: ["Domain key has already been taken"]: conflicts_with job:92034375
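In a batch job, you can wrap the pattern above so that the script always ends up with a job ID, whether this call created the job or an earlier, unacknowledged call did. The following is a minimal sketch. submit_once is a hypothetical helper, and it parses the two messages shown above. Adjust the patterns if your td version words them differently.

```shell
# Hypothetical helper: submit a query under a domain key and print the job ID,
# whether the job was created now or by an earlier call with the same key.
submit_once() {
  key="$1"; sql="$2"
  # td may exit non-zero on a key conflict, so capture output either way.
  out=$(td query -d sample_datasets --domain-key "$key" -T presto "$sql" 2>&1) || true
  # Case 1: "Job <id> is queued." means this call created the job.
  id=$(printf '%s\n' "$out" | sed -n 's/^Job \([0-9][0-9]*\) is queued.*/\1/p')
  if [ -z "$id" ]; then
    # Case 2: "... conflicts_with job:<id>" means an earlier call created it.
    id=$(printf '%s\n' "$out" | sed -n 's/.*conflicts_with job:\([0-9][0-9]*\).*/\1/p')
  fi
  # Anything else is a real failure: show td's output and fail.
  [ -n "$id" ] || { printf '%s\n' "$out" >&2; return 1; }
  echo "$id"
}
```

For example, you can derive the key from the batch date, as in submit_once "daily-$(date +%Y%m%d)" "select * from www_access". Reruns on the same day then return the existing job, which you can inspect with td job:show.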
Step 4: Import Data Into A Table
You’re now ready to import your real data to the cloud! The following tutorials will explain how to import your data (e.g. Application Logs, Middleware Logs) from various sources. For a deeper understanding of the platform, refer to the architecture overview article.
This example shows how to use the CLI to generate a sample Apache log in JSON format and import it into a brand-new table in the ‘own_database’ database.
$ td sample:apache sample_apache.json
# If you don't have your own database yet, create one before importing data.
$ td database:create own_database
$ td table:import own_database sample_tbl \
  --auto-create-table -f json sample_apache.json
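When the import runs from a script, you may not know whether the database already exists. The following is a minimal sketch using a hypothetical import_json helper. It assumes that td database:show exits with a non-zero status for a missing database and that td table:import accepts several files in one call.

```shell
# Hypothetical helper: create the database only if needed, then import JSON files.
import_json() {
  db="$1"; table="$2"; shift 2   # remaining arguments are the files to import
  # Assumption: database:show fails for a database that does not exist.
  if ! td database:show "$db" >/dev/null 2>&1; then
    td database:create "$db" || return 1
  fi
  td table:import "$db" "$table" --auto-create-table -f json "$@"
}
```

Running import_json own_database sample_tbl sample_apache.json reproduces the example above. The difference is that it does not fail when own_database already exists.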
Languages and Frameworks
Ruby or Rails, Java, Perl
Running td help:all shows the commands available in Treasure Data:
$ td help:all
  database:list                # Show list of tables in a database
  database:show <db>           # Describe a information of a database
  database:create <db>         # Create a database
  database:delete <db>         # Delete a database
  ....
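The full list is long. To see only one group of commands, you can filter it. The following is a minimal sketch with a hypothetical helper name. It assumes each command starts its own line in the help:all output, as shown above.

```shell
# Hypothetical helper: print only the help:all lines for one command group.
td_commands() {
  group="$1"
  # Keep lines whose first non-blank text is "<group>:".
  td help:all | grep -E "^[[:space:]]*${group}:"
}
```

For example, td_commands table shows only the table:* commands.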
For more information about an individual command, run td help <command>:<subcommand>. For example:
$ td help table:list
usage:
  $ td table:list [db]

example:
  $ td table:list
  $ td table:list example_db
  $ td tables

description:
  Show list of tables

options:
  -n, --num_threads VAL            number of threads to get list in parallel
      --show-bytes                 show estimated table size in bytes
See the td command-line tool reference page for a complete list of commands and their options.