This article explains the options for processing data within Arm Treasure Data. Data processing within the platform includes importing and exporting data, creating queries, running jobs, and managing workflows.
You can perform most data processing tasks directly on TD Console at https://console.treasuredata.com.
Data Processing Engine Option
Treasure Data allows users to issue queries from the API, JDBC/ODBC, the TD Console, scheduled queries, or our hosted workflow execution framework.
Each issued query is managed as a separate job. Treasure Data provides two major engines for processing data collected from both batch and streaming sources. For every query you issue, you can specify one of the following data processing engines:
- Presto for ad-hoc and shorter batch workloads. Presto provides low-latency SQL access to the data set.
- Hive for large or complex batch workloads. Hive is a MapReduce-based SQL engine. It is well suited to processing large data sets and running heavy JOINs, and is often used for ETL or sessionization.
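The rule of thumb in the list above can be sketched in a few lines of Python. This is only an illustration: `QueryJob`, `choose_engine`, and `make_job` are hypothetical names that do not belong to any Treasure Data client library, and the selection rule is a simplification of the guidance above.

```python
from dataclasses import dataclass

# Engine names come from the list above; every other name is hypothetical.
ENGINES = ("presto", "hive")


@dataclass(frozen=True)
class QueryJob:
    """A query together with the engine that should run it."""
    database: str
    sql: str
    engine: str


def choose_engine(ad_hoc: bool, heavy_joins: bool) -> str:
    """Pick Hive for heavy or non-interactive work, Presto otherwise."""
    if heavy_joins or not ad_hoc:
        return "hive"
    return "presto"


def make_job(database: str, sql: str, engine: str) -> QueryJob:
    """Build a job description, rejecting engines not listed above."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    return QueryJob(database=database, sql=sql, engine=engine)


job = make_job(
    "sample_db",  # hypothetical database name
    "SELECT COUNT(1) FROM events",  # hypothetical table
    choose_engine(ad_hoc=True, heavy_joins=False),
)
```

Keeping the engine as an explicit field on the job mirrors the fact that the engine is chosen per query rather than per account or per database.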
Treasure Data has a scheduler feature called Scheduled Jobs that supports periodic query execution. This allows you to launch hourly, daily, weekly, or monthly jobs, without having to use a cron daemon.
We take great care in distributing and operating our scheduler to achieve high availability. You can use any of the engines listed in the preceding section for scheduled jobs.
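For readers used to cron, the hourly, daily, weekly, and monthly frequencies mentioned above correspond to standard five-field cron expressions. The sketch below shows that mapping in Python. The helper `cron_for` is hypothetical, and whether the scheduler accepts exactly this syntax is an assumption to check against the scheduler documentation.

```python
# Standard five-field cron expressions:
# minute, hour, day of month, month, day of week.
CRON_PRESETS = {
    "hourly": "0 * * * *",   # at minute 0 of every hour
    "daily": "0 0 * * *",    # at 00:00 every day
    "weekly": "0 0 * * 0",   # at 00:00 every Sunday
    "monthly": "0 0 1 * *",  # at 00:00 on the first day of each month
}


def cron_for(frequency: str, minute: int = 0) -> str:
    """Return the cron expression for a frequency, offset to a given minute."""
    if frequency not in CRON_PRESETS:
        raise ValueError(f"unsupported frequency: {frequency!r}")
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    fields = CRON_PRESETS[frequency].split()
    fields[0] = str(minute)  # only the minute field is adjusted
    return " ".join(fields)


daily_at_half_past = cron_for("daily", minute=30)
```

Offsetting the minute field is a simple way to keep many scheduled jobs from all starting at the top of the hour.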
Input Transfers (Data Import)
After establishing your data transfer, you can manage data input as a job.
Export Results pushes Treasure Data query results into other systems, such as MySQL, PostgreSQL, Redshift, Google Sheets, and FTP. By using this feature, you can integrate Treasure Data with your existing systems.
To export results, write a query that selects the data you want, make sure the Export Results checkbox is selected, then save and run the query. For ongoing management of the export results, use the jobs area of the TD Console.
You can use workflows to manage and automate jobs, perform incremental processing, and segment data.
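As an illustration of incremental processing, the Python sketch below computes the next time window that a workflow run could process, so that each run touches only new data. It is a minimal sketch under stated assumptions: the function names are hypothetical, and it assumes records carry a `time` column holding Unix seconds, which should be checked against your table schema.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def next_window(
    last_processed: datetime, now: datetime, step: timedelta
) -> Optional[Tuple[datetime, datetime]]:
    """Return the next half-open [start, end) window, or None if it is not complete yet."""
    end = last_processed + step
    if end > now:
        return None  # the window is still filling; wait for the next run
    return last_processed, end


def window_filter(start: datetime, end: datetime) -> str:
    """Build a SQL predicate for the window (assumes `time` is Unix seconds)."""
    return f"time >= {int(start.timestamp())} AND time < {int(end.timestamp())}"


last = datetime(2024, 1, 1, tzinfo=timezone.utc)
window = next_window(last, last + timedelta(minutes=90), timedelta(hours=1))
if window is not None:
    predicate = window_filter(*window)
```

Half-open windows avoid processing a boundary record twice, because the end of one window is the start of the next.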