This article explains how to import data from MySQL database to Treasure Data using embulk-input-mysql input plugin.
- Basic knowledge of Treasure Data.
- Basic knowledge of Embulk
- Embulk and embulk-output-td plugin installed on your machine.
Step 0: Install embulk-input-mysql plugin
To install embulk-input-mysql plugin, run the following command:
$ embulk gem install embulk-input-mysql
Step 1: Create seed configuration file
Using your favorite text editor, create embulk config file (for eg:seed.yml) defining input(MySQL) and output(TD) parameters. Example:
in: type: mysql host: localhost port: 3306 user: username password: password database: mysql_db select: "col1, col2, datecolumn" where: "col4 != 'a'" out: type: td apikey: xxxxxxxxxxxx endpoint: api.treasuredata.com database: dbname table: tblname time_column: datecolumn mode: replace #by default mode: append is used, if not defined. Imported records are appended to the target table with this mode. #mode: replace, replaces existing target table default_timestamp_format: '%d/%m/%Y'
For further details about additional parameters and other examples are available, refer Embulk Input MySQL
Step 2: Guess Fields (Generate load.yml)
Embulk guess option uses
seed.yml to read the target file and automatically guesses the column types/settings and creates a new file
load.yml with this information.
$ embulk guess seed.yml -o load.yml
Now, you may preview the data using
embulk preview load.yml command. If any of the column types or data seems incorrect you may edit
load.yml file directly and preview again to verify. If
guess option doesn’t yield satisfactory results, you may change parameters in
load.yml according to your requirement manually.
Step 3: Execute Load Job
Finally, issue the import job by running the following command:
$ embulk run load.yml
It may take few mins to hours for the job to complete, depending on the size of the data.