The Data Connector for Google DDP via SFTP enables you to import status data from Google DoubleClick Data Platform's SFTP server into Arm Treasure Data.
Prerequisites
- Basic knowledge of Treasure Data
Configure connection
Go to Integrations Hub > Catalog. Search for and select SFTP DoubleClick Data Platform (hint: search for 'sftp double'):
The following dialog opens:
SFTP credentials are provided by Google DDP, so you don't need to enter connection details. Just name your new Google DDP via SFTP connection:
Usage
TD Console
Configure the connection by specifying the parameters.
Connection parameter:
- File names: The names of the files that you want to capture status from. The file names are taken from the Google AdWords on DDP job output logs.
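For reference, a file name typically follows the pattern used in the sample configuration later in this article, for example:

dmp_20180926_123456789.dat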
Next, you see a preview of your data similar to the following dialog. To make changes, click Advanced Settings; otherwise, click Next.
From here, if you want to change some options, such as limiting the total file count, click Advanced Settings:
The next step is to select the database and table where you want to transfer the data, as shown in the following dialog:
Finally, specify the schedule of the data transfer using the following dialog and click Start Transfer.
Command-line interface
Install ‘td’ command v0.11.9 or later
Install the most current Treasure Data Toolbelt.
$ td --version
0.11.10
Create Seed Config File (seed.yml)
in:
  type: sftp_ddp
  file_names:
    - dmp_20180926_123456789.dat
out:
  mode: append
exec: {}
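Because file_names is a YAML list, you can presumably import more than one status file in a single job by adding additional entries. A minimal sketch (both file names below are placeholders):

in:
  type: sftp_ddp
  file_names:
    - dmp_20180926_123456789.dat
    - dmp_20180927_123456789.dat
out:
  mode: append
exec: {}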
Guess Fields (Generate load.yml)
Use connector:guess. This command automatically reads the source file and assesses (uses logic to guess) the file format.
$ td connector:guess seed.yml -o load.yml
If you open load.yml, you'll see the guessed file format definitions. This example loads CSV files.
in:
  type: sftp_ddp
  file_names:
    - dmp_20180926_10130066a050f4e68b46d5b052abaedb05dd2f.dat
out:
  mode: append
exec: {}
Then, you can preview how the system will parse the file by using the preview command.
$ td connector:preview load.yml
+------------+----------------------------------+
| recordInfo | rawData                          |
+------------+----------------------------------+
|            | 1 lines were processed correctly |
| 1111111111 | Line 1111111111 could not find   |
+------------+----------------------------------+
Execute Load Job
Finally, submit the load job:
$ td connector:issue load.yml --database td_sample_db --table td_sample_table
The connector:issue command assumes that you have already created a database (td_sample_db) and a table (td_sample_table). If the database or the table does not exist in TD, the connector:issue command fails. Create the database and table manually, or use the --auto-create-table option with the td connector:issue command to create them automatically:
$ td connector:issue load.yml --database td_sample_db --table td_sample_table --time-column created_at --auto-create-table
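If you prefer to create the destination yourself before issuing the load job, a minimal sketch using the standard td db:create and td table:create commands (with the sample names used in this article) looks like this:

$ td db:create td_sample_db
$ td table:create td_sample_db td_sample_table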
Scheduled execution
You can schedule periodic Data Connector execution for incremental SFTP DDP file import. We manage the scheduler to ensure high availability. By using this feature, you no longer need a cron daemon in your local data center.
Create the schedule
A new schedule can be created using the td connector:create command. The following are required: the name of the schedule, the cron-style schedule, the database and table where the data will be stored, and the Data Connector configuration file.
$ td connector:create \
    daily_import \
    "10 0 * * *" \
    td_sample_db \
    td_sample_table \
    load.yml
The `cron` parameter also accepts three special options: `@hourly`, `@daily` and `@monthly`.
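For example, if you simply want the import to run once a day and don't need to pin an exact minute, the same schedule could presumably be created with the @daily shortcut instead of a cron expression:

$ td connector:create \
    daily_import \
    "@daily" \
    td_sample_db \
    td_sample_table \
    load.yml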
By default, the schedule is set up in the UTC timezone. You can set the schedule in a different timezone using the -t or --timezone option. Note that the `--timezone` option supports only extended timezone formats such as 'Asia/Tokyo' and 'America/Los_Angeles'. Timezone abbreviations such as PST and CST are *not* supported and may lead to unexpected schedules.
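As a sketch, creating the same schedule in the Asia/Tokyo timezone with the --timezone option mentioned above would look something like this (same sample names as before):

$ td connector:create \
    daily_import \
    "10 0 * * *" \
    td_sample_db \
    td_sample_table \
    load.yml \
    -t Asia/Tokyo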
List the Schedules
You can see the list of currently scheduled entries by running the command td connector:list.
$ td connector:list
+--------------+--------------+----------+-------+--------------+-----------------+------------------------------------------------+
| Name         | Cron         | Timezone | Delay | Database     | Table           | Config                                         |
+--------------+--------------+----------+-------+--------------+-----------------+------------------------------------------------+
| daily_import | 10 0 * * *   | UTC      | 0     | td_sample_db | td_sample_table | {"in"=>{"type"=>"sftp_ddp", "access_key_id"....|
+--------------+--------------+----------+-------+--------------+-----------------+------------------------------------------------+
Show the Setting
td connector:show shows the execution setting of a schedule entry.
% td connector:show daily_import
Name     : daily_import
Cron     : 10 0 * * *
Timezone : UTC
Delay    : 0
Database : td_sample_db
Table    : td_sample_table
Config
---
in:
  type: sftp_ddp
  file_names:
    - dmp_20180926_10130066a050f4e68b46d5b052abaedb05dd2f.dat
out:
  mode: append
exec: {}
Delete the Schedule
td connector:delete removes the schedule.
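For example, to remove the daily_import schedule created above:

$ td connector:delete daily_import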
Troubleshooting
Review the job log. Warnings and errors provide information about the success of your import.
You can read more about locating the source file names associated with import errors.
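If you ran the load from the CLI, one way to pull up a job's details and log is the td job:show command, assuming you noted the job ID printed when the job was issued (12345 below is a placeholder):

$ td job:show 12345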