In this tutorial, you will use the command line interface to run your first workflow: two Treasure Data Presto jobs that run one right after the other.
Let’s get started!
Install TD Toolbelt
Use the TD Toolbelt to interact with Treasure Data’s many services. If it is not already installed and configured, first follow Installing and Updating the Treasure Data CLI, then run the following commands in your terminal.
# Set up the toolbelt to access your account
$ td account
# Follow the prompts to enter your Treasure Data username & password
If you already have TD Toolbelt installed, update it to the latest version.
$ td update
Now install the TD Toolbelt Workflow module by running the workflow command. Answer Y when prompted.
$ td workflow
The td workflow command can also be abbreviated to td wf. We will use this shorter form throughout the rest of this tutorial.
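Before moving on, it can help to confirm that the td command is actually on your PATH. The following is a minimal sketch, not part of TD Toolbelt: check_cli is a hypothetical helper name introduced here for illustration, and it only tests whether a command can be found, not whether your account is configured.

```shell
#!/bin/sh
# check_cli is a hypothetical helper (an assumption of this sketch).
# It reports whether the named command can be found on PATH.
check_cli() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not found"
    return 1
  fi
}
```

Calling check_cli td after the installation steps prints "td: found" when the toolbelt is reachable from your shell and returns a non-zero status otherwise.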
Prepare the database on Treasure Data for this tutorial
To run this tutorial you’ll need to create the following database in your Treasure Data account. Run this command using TD Toolbelt.
$ td db:create workflow_temp
Create your first workflow project
Download sample workflow project
With this first example, we’ll help you a bit by having you download your first Workflow project directory. This will include a sample workflow & Presto SQL commands that you’ll be running.
# Download the sample project
$ cd ~/Downloads
$ curl -o nasdaq_analysis.zip -L https://gist.github.com/danielnorberg/f839e5f2fd0d1a27d63001f4fd19b947/raw/d2d6dd0e3d419ea5d18b1c1e7ded9ec106c775d4/nasdaq_analysis.zip
# Extract the downloaded project
$ unzip nasdaq_analysis.zip
# Enter the workflow project directory
$ cd nasdaq_analysis
Check out the contents of your workflow file
# Print the contents of the workflow file.
$ cat nasdaq_analysis.dig
The workflow that prints is made up of 3 sections, described as follows:
In section 1, you see the definition of the interval on which the workflow will run:
timezone: UTC

schedule:
  daily>: 07:00:00
In section 2, you see how to choose which Treasure Data database the workflow will run against:
_export:
  td:
    database: workflow_temp
In section 3, you see the workflow definition, which consists of 2 tasks:
+task1:
  td>: queries/daily_open.sql
  create_table: daily_open

+task2:
  td>: queries/monthly_open.sql
  create_table: monthly_open
+ signifies a new task. The text that follows, before the :, is the name you give the task.
td> signifies that the query that follows will run against Treasure Data. This is automatically set to run a Presto query. The > signifies that this is where the “action” part of the task is defined: the specific processing to run.
The create_table:___ parameter performs a “Drop table if exists + create table as” operation, creating the new table from the output of the task’s query.
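Because each td> line refers to a SQL file by a path relative to the project directory, the tasks can only run if those files are present. The sketch below checks for the workflow file and the two query files named in the definition. check_project is a hypothetical helper introduced for illustration, and the file list assumes the downloaded archive uses exactly the paths shown in nasdaq_analysis.dig.

```shell
#!/bin/sh
# check_project is a hypothetical helper (an assumption of this sketch).
# The file list is taken from the paths referenced in the workflow definition.
check_project() {
  dir="${1:-.}"   # project directory; defaults to the current directory
  missing=0
  for f in nasdaq_analysis.dig queries/daily_open.sql queries/monthly_open.sql; do
    if [ -f "$dir/$f" ]; then
      echo "ok: $f"
    else
      echo "missing: $f"
      missing=1
    fi
  done
  # Non-zero status when at least one expected file is absent
  return "$missing"
}
```

From inside the nasdaq_analysis directory, check_project . lists each expected file and returns a non-zero status if any of them is missing.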
Run the workflow
Typically, when developing your workflow you will start by editing a workflow from your local machine. You can run & iterate on a workflow of steps that all occur within the TD cloud environment, while creating the workflow definition & execution pattern locally.
Before running your first workflow, we recommend opening up your jobs page so you can see the execution happen live.
This command lets you run the sample workflow once, from your local machine. It will not be scheduled until you push the workflow to Treasure Data.
$ td wf run nasdaq_analysis
Running workflow "nasdaq_analysis"...
You’ve run your first workflow!
Optional: See that your workflow executed.
This workflow created two tables, named daily_open and monthly_open, in the database workflow_temp.
You can use TD Toolbelt to see some basic information on the created tables as follows:
$ td table:show workflow_temp daily_open
$ td table:show workflow_temp monthly_open
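If you prefer to check both tables in one step, the two commands above can be wrapped in a small loop. This is only a sketch: show_created_tables and the TD_CMD variable are conventions invented here, not TD Toolbelt features. TD_CMD exists so that you can substitute another program, such as echo, for a dry run.

```shell
#!/bin/sh
# TD_CMD is an assumption of this sketch, not a TD Toolbelt setting.
# It defaults to the real td binary.
TD_CMD="${TD_CMD:-td}"

# show_created_tables DATABASE TABLE [TABLE...]
# Runs `td table:show` for each table and stops at the first failure.
show_created_tables() {
  db="$1"
  shift
  for table in "$@"; do
    "$TD_CMD" table:show "$db" "$table" || return 1
  done
}
```

For this tutorial, the call would be show_created_tables workflow_temp daily_open monthly_open.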
Submit the workflow to Treasure Data
Now that you’ve created a workflow, you will often want it to run on a scheduled basis. Remember that we defined the schedule as follows:
timezone: UTC

schedule:
  daily>: 07:00:00
Run this command to submit the workflow to Treasure Data:
$ td wf push nasdaq_analysis
Submitting workflow "nasdaq_analysis"...
That’s it! Now your workflow of steps will run every day at 7am UTC!
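Since a pushed workflow runs on its schedule without further input, you may want to push only after a local run has succeeded. The sketch below chains the two commands used in this tutorial. It rests on two assumptions: that td wf run exits with a non-zero status when the run fails, and that the project and the workflow share one name, as they do here. run_then_push and TD_CMD are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# TD_CMD is an assumption of this sketch, not a TD Toolbelt setting.
# It defaults to the real td binary and can be replaced for a dry run.
TD_CMD="${TD_CMD:-td}"

# run_then_push NAME
# Runs the workflow locally, then pushes the project only if the run succeeded.
# Assumes `td wf run` signals failure through its exit status.
run_then_push() {
  name="$1"
  if "$TD_CMD" wf run "$name"; then
    "$TD_CMD" wf push "$name"
  else
    echo "local run failed; not pushing $name" >&2
    return 1
  fi
}
```

From the project directory, run_then_push nasdaq_analysis performs both steps in order.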
List the workflows registered on Treasure Data
$ td wf workflows
You can also see the definition of your submitted workflow, as pulled from Treasure Data:
# This command takes the form of:
# `td wf workflows <project_name> <workflow_name>`
$ td wf workflows nasdaq_analysis nasdaq_analysis
Find out what workflows are scheduled to run next on Treasure Data
$ td wf schedules
Learn more about using Treasure Workflows with the following tutorials:
If you have any ideas or feedback on this tutorial, we’d welcome them here!