In this tutorial, you use the command line interface to run your first workflow: two Treasure Data Presto jobs, one running right after the other.
Let’s get started!
Install TD Toolbelt
Use the TD Toolbelt to interact with Treasure Data’s many services. If it is not already installed and configured, run the following commands in your terminal.
Set up the toolbelt to access your account
$ td account
Follow the prompts for inputting your Treasure Data username and password.
If you already have TD Toolbelt installed, update it to the latest version.
$ td update
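If you are not sure whether the toolbelt is already installed, a small shell check can tell you which of the steps above applies. This is a minimal sketch that only assumes the executable is named td, as in the commands above; the toolbelt_status function name is made up for illustration and is not part of TD Toolbelt.

```shell
# toolbelt_status is a hypothetical helper name, not part of TD Toolbelt.
# It prints "installed" if the given command (default: td) is found on PATH,
# and "missing" otherwise.
toolbelt_status() {
  if command -v "${1:-td}" >/dev/null 2>&1; then
    echo "installed"
  else
    echo "missing"
  fi
}

toolbelt_status
```

If the check prints installed, running td update is enough; otherwise install the toolbelt first.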
Now install the TD Toolbelt Workflow module by running the workflow command.
$ td workflow
Enter Y when prompted.
The td workflow command can also be abbreviated to td wf. We use this shorter form throughout the rest of this tutorial.
Create the workflow_temp database
To run this tutorial you’ll need to create the following database in your Treasure Data account. Run this command using TD Toolbelt.
$ td db:create workflow_temp
Create your first workflow project
Download sample workflow project
For this first example, download a ready-made workflow project directory. It includes a sample workflow and the Presto SQL queries that you will run.
Navigate to the download directory and download the sample project:
$ cd ~/Downloads
$ curl -o nasdaq_analysis.zip -L https://gist.github.com/danielnorberg/f839e5f2fd0d1a27d63001f4fd19b947/raw/d2d6dd0e3d419ea5d18b1c1e7ded9ec106c775d4/nasdaq_analysis.zip
Extract the downloaded project:
$ unzip nasdaq_analysis.zip
Navigate to the workflow project directory:
$ cd nasdaq_analysis
Check out the contents of your workflow file
Print the contents of the workflow file.
$ cat nasdaq_analysis.dig
The printed workflow is made up of three sections: timezone and schedule, export, and tasks.
In section 1, you see the interval on which the workflow runs:
timezone: UTC
schedule:
  daily>: 07:00:00
In section 2, you see how to specify the Treasure Data database against which the workflow runs.
_export:
  td:
    database: workflow_temp
In section 3, you see that the workflow definition has two tasks.
+task1:
  td>: queries/daily_open.sql
  create_table: daily_open
+task2:
  td>: queries/monthly_open.sql
  create_table: monthly_open
+ signifies a new task. The text that follows, before the :, is the name you give the task.
td> signifies that the query that follows will run against Treasure Data. It is automatically set to run a Presto query. The > signifies that this is where the “action” part of the task is defined: the specific processing to run.
The create_table: parameter performs a “drop table if exists + create table as” operation, creating the named table from the output of the task’s query.
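Putting the three sections together, the workflow file can be pictured as in the following sketch. It writes an annotated copy to a scratch file rather than touching the project. The # comments are annotations added here, and the exact layout and spacing of the downloaded nasdaq_analysis.dig may differ.

```shell
# Write an annotated copy of the workflow to a scratch file
# (this does not modify the real nasdaq_analysis.dig).
sketch_file="$(mktemp)"

cat > "$sketch_file" <<'EOF'
# Section 1: when the workflow runs
timezone: UTC

schedule:
  daily>: 07:00:00

# Section 2: the Treasure Data database the tasks run against
_export:
  td:
    database: workflow_temp

# Section 3: the two tasks, one running right after the other
+task1:
  td>: queries/daily_open.sql
  create_table: daily_open

+task2:
  td>: queries/monthly_open.sql
  create_table: monthly_open
EOF

cat "$sketch_file"
```

Indentation carries the structure here: daily> belongs to schedule, database belongs to td, and each td> and create_table pair belongs to the task named on the line above it.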
Run the workflow
Typically, you develop a workflow by editing it on your local machine. You create the workflow definition and execution pattern locally, while the steps you run and iterate on all execute within the TD environment.
Before running your first workflow, we recommend opening up your jobs page so you can see the execution happen live.
This command lets you run the sample workflow once, from your local machine.
$ td wf run nasdaq_analysis
Running workflow “nasdaq_analysis”...
You’ve run your first workflow!
Optional: See that your workflow executed
This workflow created two tables, named daily_open and monthly_open, in the workflow_temp database.
You can use TD Toolbelt to see some basic information on the created tables as follows:
$ td table:show workflow_temp daily_open
$ td table:show workflow_temp monthly_open
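When a workflow creates more tables, the same check can be wrapped in a loop. The sketch below is only an illustration: show_tables and TD_BIN are hypothetical names introduced here, not TD Toolbelt features. TD_BIN exists so the function can be dry-run with another command (for example echo) on a machine without the toolbelt.

```shell
# Assumption: TD_BIN is a hypothetical override for the toolbelt executable;
# it defaults to the td command used throughout this tutorial.
TD_BIN="${TD_BIN:-td}"

# show_tables DATABASE TABLE...
# Runs "table:show DATABASE TABLE" for each table, in order,
# and stops at the first command that exits with a non-zero status.
show_tables() {
  db="$1"
  shift
  for table in "$@"; do
    "$TD_BIN" table:show "$db" "$table" || return 1
  done
}
```

Calling show_tables workflow_temp daily_open monthly_open runs the same two commands shown above.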
Submit the workflow to Treasure Data
To run the workflow on a scheduled basis, the workflow file needs the following lines, which the sample nasdaq_analysis.dig already contains:
timezone: UTC
schedule:
  daily>: 07:00:00
Run this command to submit the workflow to Treasure Data:
$ td wf push nasdaq_analysis
Submitting workflow "nasdaq_analysis"...
That’s it! Your workflow will now run every day at 07:00 UTC.
List the workflows registered on Treasure Data
To retrieve a list of projects and workflows defined in your Treasure Data environment, issue the following:
$ td wf workflows
The command returns a list, followed by this closing hint:
Use `td workflow workflows <project-name> <name>` to show details.
To see the definition of your submitted workflow, use td workflow workflows <project-name> <name>. For example:
$ td wf workflows nasdaq_analysis nasdaq_analysis
Find out what workflows are scheduled to run next on Treasure Data
$ td wf schedules