Jobs are executed in parallel on the cloud. The degree of parallelism is chosen automatically based on your plan, your remaining capacity, and the size of our cluster at that time.
You can view job information and set job priority using the TD Console or the TD toolbelt.
Optionally, and upon request, resource pools can be used to help manage priority and job processing. For more information, see Hive and Other Hadoop Based Resource Pools.
- Open the TD Console.
- Navigate to Jobs.
The total number of jobs is listed in the upper right of the page.
- Optionally, use filters, such as job owner, date, and database name, to narrow the list of jobs and locate the ones you are interested in.
- Select a job to open it and view results, query definition, logs, and other details.
- Each tab offers different actions. For example:
  - query results can be copied to the clipboard or downloaded to a file for later use
  - the syntax of the job can be viewed
  - a query editor can be launched
  - log information can be copied to the clipboard
Setting Job Priority
Scheduled jobs must sometimes be set as high or low priority items.
The very high and high options submit jobs to the Hadoop cluster at a higher priority than jobs with normal priority. The cluster assigns resources to the highest-priority jobs first: jobs at the same, highest priority level run in parallel until all available resources are used, and only as resources free up do jobs at the next-highest priority level start. For example, given a job queue similar to the following, if you ran a new job with priority 2 (very high), the normal-priority query1 would drop off the queue and the new job would run in parallel with H-query1 and H-query2.
| Job Priority | Job Name |
| --- | --- |
| 2 (Very High) | H-query1 |
| 2 (Very High) | H-query2 |
| 0 (Normal) | query1 |
These settings can help you prevent normal priority jobs from running in parallel with the high priority jobs.
- Navigate to Data Workbench > Queries.
- Select the query for which you want to set priority.
- Select the ellipsis.
- Select Edit query settings.
- Set the Priority.
- Select OK to save changes and exit.
Optionally, from the command line you can use:
td query -P <priority> <query_id>
| Priority | Value | Example |
| --- | --- | --- |
| very low | -2 | $ td query -P -2 <query_id> |
| high | 1 | $ td query -P 1 <query_id> |
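Putting the options together, a hedged example of submitting a query at very high priority (the queue example above shows that 2 maps to Very High; the toolbelt and an API key are assumed to be configured):

```shell
# Sketch: run a query at very high priority (2) and wait for it to complete.
# -P sets the priority, -w waits for the job, -d selects the database.
$ td query -P 2 -w -d testdb "SELECT COUNT(1) FROM www_access"
```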
Typically, when using the TD toolbelt to run a query you want to wait for the job to complete. For example, use the -w option:
$ td query -w -d testdb "SELECT COUNT(1) FROM www_access"
Job 702 is started.
If you issue a query without using the -w option, the command ends immediately after submitting the job. For example:
$ td query -d testdb "SELECT COUNT(1) FROM www_access"
Job 704 is started.
Use 'td job 704' to show the status.
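Since td job prints the job status, a small polling loop can wait on a job that was submitted without -w. This is a sketch only; the exact "Status" line format is an assumption modeled on the td query output further below:

```shell
#!/bin/sh
# Sketch: poll job 704 every 10 seconds until it is no longer running.
# Assumes `td job <id>` prints a line like "Status : running" while in progress.
while td job 704 | grep -q 'Status *: *running'; do
  sleep 10
done
td job 704   # show the final status
```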
You can write the job results to local disk in CSV format instead of printing them to STDOUT.
$ td query -o test.csv --format csv -w -d testdb 'SELECT COUNT(1) FROM www_access'
Status : success
Result : written to test.csv in csv format
The td jobs command lists your submitted jobs. The most recent 20 jobs are shown by default.
$ td jobs
To see older jobs, specify the page number as an argument. For example:
| Command example | To view |
| --- | --- |
| $ td jobs -p 0 | the most recent 20 jobs |
| $ td jobs -p 1 | jobs 21-40 |
| $ td jobs -p 2 | jobs 41-60 |
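Because the page number is just an argument, several pages can be listed in one pass with a small loop (a sketch; assumes the toolbelt is installed and configured):

```shell
# Sketch: print the 60 most recent jobs by walking pages 0 through 2.
for page in 0 1 2; do
  td jobs -p "$page"
done
```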
TD Job <job_ID>
The td job <job_ID> command shows detailed information for a specific job.
$ td job 10349872
TD Job Types
Within Arm Treasure Data, there are several different job types:
| Job Type | Description |
| --- | --- |
| Presto | A query issued using the Presto engine. |
| Hive | A query issued using the Hive query language. |
| Data Import | A job created to import data into Treasure Data. |
| Result Export | A job issued for a result export query. |
| PartialDeleteJob | A job created from a Partial Delete operation. |
| Legacy Bulk Import | A job created from the Bulk Import command. |