This article explains the different options for processing data within Arm Treasure Data. Data processing within the platform includes importing and exporting data, creating queries and running jobs, and managing workflows.
You can perform most data processing tasks directly on our console at https://console.treasuredata.com.
Data Processing Options
Treasure Data provides two major ways to process data collected from both batch and streaming sources.
Data Processing with Multiple Engines
Treasure Data allows users to issue queries through the API, JDBC/ODBC, the web console, scheduled queries, or our hosted workflow execution framework.
All of these queries are managed as separate jobs (see Job Management). For every query you issue, you can specify which data processing engine to use. Currently, we support two data processing engines:
Heavy Lifting SQL (Hive)
Hive is a MapReduce-based SQL engine. It is well suited to processing large volumes of data and running heavy JOINs, and is often used for ETL or sessionization.
Interactive SQL (Presto)
Presto provides low-latency SQL access to the dataset.
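The per-query engine choice described above can be sketched as a small data model. This is an illustration only, not the Treasure Data API: the `QueryJob` class, the `make_query_job` helper, the lowercase engine identifiers, the Presto default, and the database and table names are all hypothetical.

```python
from dataclasses import dataclass

# The two engines named in this article. The lowercase identifiers are an
# assumption made for this sketch.
SUPPORTED_ENGINES = ("hive", "presto")


@dataclass(frozen=True)
class QueryJob:
    """Hypothetical description of one query job and the engine it runs on."""
    database: str
    sql: str
    engine: str


def make_query_job(database: str, sql: str, engine: str = "presto") -> QueryJob:
    """Build a job description, rejecting engines this article does not list.

    Defaulting to Presto is an assumption of this sketch, not documented behavior.
    """
    normalized = engine.strip().lower()
    if normalized not in SUPPORTED_ENGINES:
        raise ValueError(
            f"unsupported engine {engine!r}; expected one of {SUPPORTED_ENGINES}"
        )
    return QueryJob(database=database, sql=sql, engine=normalized)


if __name__ == "__main__":
    # Heavy ETL-style work goes to Hive; a quick interactive lookup goes to Presto.
    etl = make_query_job(
        "example_db",
        "SELECT user_id, COUNT(1) FROM example_events GROUP BY user_id",
        engine="hive",
    )
    lookup = make_query_job("example_db", "SELECT COUNT(1) FROM example_events")
    print(etl.engine, lookup.engine)
```

Keeping the engine as an explicit field on each job mirrors the idea that every query is a separate job with its own engine setting.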
Treasure Data has a scheduler feature called Scheduled Jobs that supports periodic query execution. This allows you to launch hourly, daily, weekly, or monthly jobs without running your own cron daemon.
We take great care in distributing and operating our scheduler in order to achieve high availability. You can use any of the engines listed in the preceding section for scheduled jobs.
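As a rough illustration of the hourly, daily, weekly, and monthly cadences mentioned above, the following sketch maps each one to a standard five-field cron expression. It assumes the scheduler accepts standard cron syntax, which this article does not state; the `cron_for` helper and the chosen run times (top of the hour, midnight, Sunday, first of the month) are hypothetical.

```python
# Standard five-field cron expressions: minute hour day-of-month month day-of-week.
# The specific run times are arbitrary choices made for this sketch.
CRON_BY_CADENCE = {
    "hourly": "0 * * * *",   # minute 0 of every hour
    "daily": "0 0 * * *",    # 00:00 every day
    "weekly": "0 0 * * 0",   # 00:00 every Sunday
    "monthly": "0 0 1 * *",  # 00:00 on the first day of each month
}


def cron_for(cadence: str) -> str:
    """Return the cron expression for a named cadence (hypothetical helper)."""
    key = cadence.strip().lower()
    if key not in CRON_BY_CADENCE:
        raise ValueError(
            f"unknown cadence {cadence!r}; expected one of {sorted(CRON_BY_CADENCE)}"
        )
    return CRON_BY_CADENCE[key]


if __name__ == "__main__":
    for name in ("hourly", "daily", "weekly", "monthly"):
        print(name, cron_for(name))
```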
After establishing your data transfer, you can manage data input as a job.
Result Output (or Output Results) is a feature that pushes Treasure Data query results into other systems, such as an RDBMS (MySQL, PostgreSQL, Redshift), Google Spreadsheet, or FTP. By using this feature, you can integrate Treasure Data with your existing systems.
Queries and jobs are used for data output. You can use Scheduled Jobs with Result Output, so that you can periodically launch Treasure Data jobs and write the result somewhere else.
You can find the Output Results option as a checkbox in the Query Editor. Use the option to set up an export of query results.
You can use TD Workflow to manage and automate jobs. TD Workflow is often used for incremental processing and is also used in data segmentation.