This article explains how to write job results to your Salesforce.com organization.
- Basic knowledge of Arm Treasure Data, including the toolbelt
- Salesforce.com organization and username, password, and security token for API integration
- User has “API Enabled” permission
- Target Salesforce.com Object should exist with read/write permissions for the User
A front-end application streams data to be collected in Treasure Data via Log/data collector daemon or Mobile SDKs. You can also bulk import your data using Bulk Import from the CLI. A scheduled query is setup in Treasure Data to run periodically on the data and write the result of each query execution into your Salesforce.com Object.
The above is a fairly common architecture; for example:
Example 1: Ranking: What are the “Top N of X?”
Every social/mobile application calculates the “top N of X” (ex: top 5 movies watched today). Treasure Data already handles the raw data warehousing; the “write-to-Salesforce.com” feature enables Treasure Data to find the “top N” data as well.
Example 2: Dashboard Application
If you’re a data scientist, you need to keep track of a range of metrics every hour/day/month and make them accessible via visualizations. Using this “write-to-Salesforce.com” feature, you can streamline the process and focus on building visualizations of your query results via Reports and Dashboards on the Salesforce.com organization.
Result Output URL Format
The result output target is represented by a URL with the following format:
- sfdc is an identifier for result output to Salesforce.com;
- username and password are the credential to your Salesforce.com organization;
- security_token is the additional credential for API access;
- hostname is the host name of the Salesforce.com organization. Usually this is ‘login.salesforce.com’ for production environments and ‘test.salesforce.com’ for sandbox environments. In case where you configure custom domain for your organization specify the hostname you’re using for login;
- object_name is the target Salesforce.com Object API name (e.g. ResultOutput__c). The Object and columns for data integration must be defined beforehand;
For example with:
- username: email@example.com
- password: PASSWORD
- security_token: 7SMvicR9ojdPz0XLtlWi3Rtw
The URL will look like:
|Make sure that you escape the '@' in the username with '%40'.|
Result output to Salesforce.com supports various options that can be specified as optional URL parameters. The options are compatible with each other and can be combined.
Where applicable, the default behavior is indicated.
Update mode option
Controls the various ways of modifying the database data.
The append mode is the default that is used when no mode option is provided in the URL. In this mode, the query results are appended to the object.
Because mode=append is the default behavior, these two URLs are equivalent:
With the truncate mode the system first truncates the existing records in the Salesforce.com Object and moves them into the Trashbin, then inserts the query results.
|You can specify the hard_delete=true option for mode=truncate to delete records instead of moving it to the Trashbin. To use this option, the user must have the 'Bulk API Hard Delete' permission.|
With the update mode, a row is inserted unless it would cause a duplicate value in the external key columns specified in the “unique” parameter. In such a case, an update is performed instead. The “unique” parameter is required with this mode and must be defined as an external key when used with the update mode.
The default behavior for the ‘update’ mode is actually ‘upsert’. If you do not want to “upsert” but only “update”, you can add the upsert=false option. Then it updates existing records based on “unique” parameter match and does not insert new records.
Upload concurrency_mode option
The concurrency_mode option controls how the data is uploaded to the Salesforce.com organization. The default mode is parallel; it is the recommended method for most situations.
With the parallel method, data is uploaded in parallel. This is the most reliable and effective method and it is recommended for most situations.
Because concurrency_mode=parallel is the default behavior, these two URLs are equivalent:
Uploading records in parallel is recommended. However, if you see “UNABLE_TO_LOCK_ROW” in an error message, try concurrency_mode=serial instead.
Updating A Salesforce.com Object acquires a lock for the Object and parent Object referenced by columns. If you upload objects in parallel and multiple objects have reference to the same parent object, Salesforce.com is not able to acquire the lock for insert/update and returns an ‘UNABLE_TO_LOCK_ROW’ error. In such cases, specify the concurrency_mode=serial option.
Authentication session_id option
If you have a Salesforce.com Session ID, you can authenticate with the session_id option instead of username, password, and security token (that is, username, password, and security token can be omitted from the URL).
This options sets the number of attempts the Treasure Data export worker makes to write the result to the configured Salesforce.com destination, if errors occur. If the export fails more than the set number of retries, the query fails.
The default number of retries is retry=2 but one can virtually set it to any number. The number of retries affect the overall duration of a query.
Split Records options
The Treasure Data result export splits the records in the result of a query in chunks of 10000 records by default and bulk upload one chunk at a time. The split_records option configures the size of this chunk, if required.
To output the result of a single query to Salesforce.com organization add the -r / —result option to the td query command. After the job is finished, the results are written into your Salesforce.com organization Object:
$ td query -w -d testdb \ --result 'sfdc://login.salesforce.com/CustomObject__c?session_id=.....' \ "SELECT code as Code__c, COUNT(1) as Count__c FROM www_access GROUP BY code"
To create a scheduled query whose output is systematically written to Salesforce.com organization, add the -r / —result option when creating the schedule through the td sched:create command:
$ td sched:create hourly_count_example "0 * * * *" -d testdb \ --result 'sfdc://user%40treasure-data.com:PASSWORDsecuritytoken@login.salesforce.com/CustomObject__c' \ "SELECT COUNT(*) as Count__c FROM www_access"
Using the TD Console
Queries tab, click New Query.
Select a database and table from the popup menus. Enter your query and select Output results.
|To avoid any issues with result export, define column aliases in your query such that resulting column names from the query match the Salesforce field names for default fields and API names (usually ending with __c) for custom fields.|
To export the result of a query or schedule to Salesforce.com, specify Result Export information for a new query or an existing job/query:
If you get the following type of error due to the Result Output to SFDC job, you can see the error of the job on SFDC. As shown in the following example, check “XXXXXXXXXXX” on SFDC. You can figure out a detail of the error.
17/05/01 03:35:05 INFO sfdc.BulkAPIJob: Job XXXXXXXXXXX finished: Total 1, Completed 0, Failed 1 17/05/01 03:35:05 INFO sfdc.BulkAPIClient: Batch jobs failed (1/1)