In this tutorial we will run a workflow that imports data into Treasure Data, runs a Presto query, and sends the query's output out of Treasure Data.
For importing data into Treasure Data, we will be utilizing Data Connectors. If you don't know much about this functionality, you can read our overview. If you'd like to know which sources we can currently pull data from, see the list in the "Data Collection (Bulk)" section.
For this tutorial, we will be using our SFTP Data Connector.
For exporting data out of Treasure Data, we will be using our Result Output functionality (please see the "Data Delivery and Activation" section). This allows you to run a query and submit its results to another service.
We will be using Result Output to SFTP.
TD Workflows Introductory Tutorial
If you haven’t already, be sure to go through the Introductory Tutorial to install TD Workflows and submit a sample workflow project to Treasure Data.
Download Project & Review the Workflow
# Download the sample project
$ cd ~/Downloads
$ curl -o ftp_pull_push.tar.gz -L https://gist.github.com/rbparrish/8547d6cf5f835a505f8eb4f68ec1bc23/raw/67681fa0275b1f53e29117fe6d6e0993a25bb83e/ftp_pull_push.tar.gz

# Extract the downloaded project
$ tar xzvf ftp_pull_push.tar.gz

# Enter the workflow project directory
$ cd ftp_pull_push
The workflow you downloaded is as follows:
_export:
  td:
    database: workflow_temp

+import_from_sftp:
  td_load>: imports/sftp_load.yml
  table: pull_push_tutorial

+export_to_sftp:
  td>: queries/count_rows.sql
  result_url: ftp://wf_test:***@treasure.brickftp.com/count.csv
|This Data Connector yml configuration is not operational. You will need to replace it with your own configuration to run this project yourself.|
Data Connector Operator
The first difference you may notice is the use of the td_load> operator, which runs a Data Connector job from a pre-created yml config. The td_load> operator needs both the path to that yml config and the table: parameter defined somewhere in the workflow file.
+import_from_sftp:
  td_load>: imports/sftp_load.yml
  table: pull_push_tutorial
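The yml file referenced by td_load> is a standard Data Connector config. As a rough sketch only, an imports/sftp_load.yml for pulling a CSV file over SFTP might look like the following; every value here (host, user, password, path prefix, and columns) is a placeholder you would replace with your own, and the exact fields available depend on the SFTP Data Connector options:

in:
  type: sftp
  host: treasure.brickftp.com    # placeholder SFTP host
  port: 22
  user: wf_test                  # placeholder user
  password: "*****"              # or key-based auth, per the connector docs
  path_prefix: /import/data_     # pull every file matching this prefix
  parser:
    type: csv
    delimiter: ","
    skip_header_lines: 1
    columns:
      - {name: id, type: long}
      - {name: value, type: string}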
Create required tables
In order to run any Data Connector job in Treasure Data, the database and table you're loading data into must already exist. Use this command to create the table:
$ td table:create workflow_temp pull_push_tutorial
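If the workflow_temp database itself does not exist yet, create it first:

$ td db:create workflow_temp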
TD Query Operator with Result Output
For exporting data out of Treasure Data, we run a normal Presto query using the td> operator and include the result_url: parameter to define which third-party destination the query result is sent to.
+export_to_sftp:
  td>: queries/count_rows.sql
  result_url: ftp://wf_test:***@treasure.brickftp.com/count.csv
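The query file itself is not shown above. The sample project's queries/count_rows.sql is presumably a simple row count over the imported table; a minimal Presto query along those lines would be:

-- count the rows imported into the tutorial table
SELECT COUNT(1) AS row_count
FROM pull_push_tutorial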
|You can also use a saved result output (a "favorite") that you created in the console, instead of the URL notation included in this example. See how below.|
If you use a saved result output, the configuration would instead look like this:
+export_to_sftp:
  td>: queries/count_rows.sql
  result_url: my_saved_output_name
To create a saved output:
1. Go to create a new query.
2. Select "Add" in the result export section.
3. Choose the service that you want to send your data to under "Export to:".
4. Enter your credentials and give the output a name.
5. Click "Add to Favorites".
Now you can use the text string of your new favorite output within your workflow!
Submit Workflow to Treasure Data
$ td wf push ftp_pull_push
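The td wf command wraps the Digdag client, so you can confirm the project was uploaded by listing the workflows registered on the server:

$ td wf workflows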
Execute the Workflow on Treasure Data
|When learning to use TD Workflows, it can be helpful to also follow along on the jobs page in Treasure Data while running your workflows. https://console.treasuredata.com/jobs|
Run the following command to execute the workflow you already submitted to Treasure Data:
$ td wf start ftp_pull_push pull_push --session now
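Once started, you can also check on the run from the command line; for example, listing the sessions for the project:

$ td wf sessions ftp_pull_push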
And that’s it! Now try creating your own.
If you have any feedback, we welcome your thoughts on our TD Workflows ideas forum.
Also, if you have any ideas or feedback on the tutorial itself, we’d welcome them here!