This example introduces you to a workflow that includes steps for:
- importing data to Treasure Data
- running a Presto query
- sending query results out of Treasure Data
To make this example functional for you:
- Define a data import using the SFTP Data Connector. You can define the connector through the TD Console; it requires several connection details that you will need to provide and use to modify the example workflow.
- Define a data export using Result Output to SFTP. You can define the connector through the TD Console. As with the data import connector, you will need to provide connection details and use them to modify the example workflow.
- Download the sample project, review the workflow, and modify the workflow definition to match your connection details.
Downloading the Sample Project and Reviewing the Workflow
- Download the sample project from the following URL
- Extract the downloaded project. For example, from the command line you could run:
$ tar xzvf ftp_pull_push.tar.gz
- Navigate to the newly created workflow project directory. For example:
$ cd ftp_pull_push
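To see how the pieces of the project fit together before opening any files, the layout below recreates the expected directory structure with empty placeholder files. The file names are inferred from the workflow definition; your extracted archive is what actually matters.

```shell
# Recreate the expected project layout (names inferred from pull_push.dig;
# the placeholder files here are empty, unlike the real project):
mkdir -p ftp_pull_push/imports ftp_pull_push/queries
touch ftp_pull_push/pull_push.dig \
      ftp_pull_push/imports/sftp_load.yml \
      ftp_pull_push/queries/count_rows.sql
find ftp_pull_push -type f | sort
# ftp_pull_push/imports/sftp_load.yml
# ftp_pull_push/pull_push.dig
# ftp_pull_push/queries/count_rows.sql
```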
- Verify that the pull_push.dig file contains the following syntax:
_export:
  td:
    database: workflow_temp

+import_from_sftp:
  td_load>: imports/sftp_load.yml
  table: pull_push_tutorial

+export_to_sftp:
  td>: queries/count_rows.sql
  result_url: ftp://wf_test:***@treasure.brickftp.com/count.csv
The td_load>: imports/sftp_load.yml line runs a Data Connector job with a precreated yml config. The td_load> operator needs both database: and table: defined somewhere in the workflow file.
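The precreated yml config referenced by td_load> might look like the following sketch. The exact keys depend on your SFTP connection; the host, user, credential, and path values below are placeholders matching this example, and the simplest way to generate a correct config is through the TD Console connector setup.

```yaml
# imports/sftp_load.yml -- illustrative sketch only; generate the real
# config from your SFTP Data Connector settings in the TD Console.
in:
  type: sftp
  host: treasure.brickftp.com
  port: 22
  user: wf_test
  password: "***"            # or key-based auth, per your connection
  path_prefix: /path/to/data # placeholder: directory holding your files
out:
  mode: append
```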
Creating the required tables
To run any data connector job in Treasure Data, you must first have the database and table you’re loading data into created.
- Open the TD Console.
- Navigate to Data Workbench > Databases.
- If workflow_temp doesn't exist, create a New Database and name it workflow_temp.
- Create a new table and name it pull_push_tutorial.
Alternatively, you can run the following command from a machine where the TD Toolbelt is installed:
$ td table:create workflow_temp pull_push_tutorial
Adding a TD Query with Result Export to your Workflow
The sample workflow that you downloaded already contains the syntax to include a query and have the results of the query exported to a specified URL. For example:
+export_to_sftp:
  td>: queries/count_rows.sql
  result_url: ftp://wf_test:***@treasure.brickftp.com/count.csv
The workflow definition could be modified to work with your existing queries and output connectors. For example:
+export_to_sftp:
  td>: queries/count_rows.sql
  result_url: my_saved_output_name
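The referenced queries/count_rows.sql file is not shown in this listing. A minimal query consistent with its name and the table created above could be the following; the exact query in the sample project may differ.

```sql
-- queries/count_rows.sql -- assumed minimal example
SELECT COUNT(1) AS row_count
FROM pull_push_tutorial
```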
To create a saved output:
- Navigate to Data Workbench > Queries.
- Select a query that returns results that you would like to export from Treasure Data.
- Open the query and select Export Results.
- Select an integration, for example, select the SFTP connection that you defined as your result output.
- Type in any required values or credentials.
- Save the query.
- Edit the workflow definition.
- Replace 'my_saved_output_name' with the name of the integration you specified as the Export Results To for your query. For example, change 'my_saved_output_name' to 'sftp_result_output'.
Adding the Workflow file to Treasure Data
- From the TD Console, navigate to Data Workbench > Workflows.
- Add a new workflow and name it ftp_pull_push.
- Select a blank template.
- In the workflow definition, copy and paste the contents of the pull_push.dig file shown earlier.
- Replace 'wf_test:***@treasure.brickftp.com' with your data export information.
- Save and commit the workflow.
Alternatively, to upload the queries and workflow files into Treasure Data:
- On the machine where the TD Toolbelt is installed, open a command window.
- Navigate to the ftp_pull_push directory.
- Type the following:
$ td wf push ftp_pull_push
The necessary files for the workflow are now registered and viewable from within the TD Console.
Executing the Workflow on Treasure Data
- Navigate to Data Workbench > Workflows and select pull_push.
- Select New Run.
- Review the Run History to watch the execution steps of the workflow.
Alternatively, run the following command to execute the workflow already submitted to Treasure Data:
$ td wf start ftp_pull_push pull_push --session now