This article explains how to write job results to your existing MongoDB instance.
Prerequisites
- Basic knowledge of Arm Treasure Data, including the toolbelt.
- A MongoDB instance.
- The MongoDB user that Treasure Data connects as must have write privileges on the destination database.
A front-end application continuously streams data to Treasure Data via Treasure Agent. Treasure Data periodically runs jobs on the data and then writes the job results to your MongoDB collections.
Example 1: Ranking: What are the “Top N of X?”
Many social and mobile applications calculate a “top N of X” (for example, the top 5 movies watched today). Treasure Data already handles the raw data warehousing; the “write-to-mongodb” feature lets Treasure Data compute the “top N” data as well and deliver it to MongoDB.
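The ranking itself is ordinary aggregation. As a rough illustration of the shape of the data that ends up in the collection, the Python sketch below counts hypothetical view events per movie and keeps the N most-watched titles, turning each ranked row into one document. In practice your Treasure Data query would compute the ranking; the event list and the `title`, `views`, and `rank` field names are invented for this example.

```python
from collections import Counter


def top_n(events, n=5):
    """Rank titles by how often they appear and return the top n as documents.

    `events` is an iterable of dicts with a hypothetical "title" key.
    """
    counts = Counter(event["title"] for event in events)
    # most_common() sorts by count, highest first; ties keep first-seen order
    return [
        {"rank": rank, "title": title, "views": views}
        for rank, (title, views) in enumerate(counts.most_common(n), start=1)
    ]


if __name__ == "__main__":
    # Hypothetical raw events standing in for data collected by Treasure Agent
    events = [{"title": t} for t in ["A", "B", "A", "C", "B", "A", "D"]]
    for doc in top_n(events, n=2):
        print(doc)
```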
Example 2: Dashboard Application
Data scientists often need to track a range of metrics every hour, day, or month and make them accessible through visualizations. With the “write-to-mongodb” feature, you can streamline this process and focus on your queries and on visualizing their results.
|You can restrict access to your MongoDB instance to a list of Treasure Data's static IP addresses. Contact firstname.lastname@example.org if you need the list.|
When running a query from the Treasure Data console, check the 'Output Results' checkbox.
You have two options when creating the result output connector:
- Use an existing connection: type the name of the connection in the prompt and select it.
- Create a new connection by filling in the following fields:
Nodes: comma-separated list of nodes
Use SSL?: whether to use SSL
Auth Method: select either
Username: username for basic authentication
Password: password for the above user
Mode: select either
Index: the name of the index
Type: the name of the type
ID: (optional) the name of the ID column
When you execute your query, the Treasure Data query results are automatically imported into Elastic Cloud. Basic authentication is supported, including the “Security” feature (formerly “Shield”) of Elastic Cloud. However, the LDAP and Active Directory authentication provided by “Security” is not supported.
For On-demand Jobs
For on-demand jobs, add the --result (or -r) option to the td query command. After the job finishes, the results are written into your collection.
$ td query --result 'mongodb://user:password@host:1234/database/collection' \
    -w -d testdb "SELECT code, COUNT(1) FROM www_access GROUP BY code"
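If you launch on-demand jobs from a script, one option is to assemble the same td query invocation as an argument list instead of a shell string, so that characters in the result URL (such as those in a password) are never interpreted by a shell. This is a minimal sketch assuming td is installed and on your PATH; the helper name `build_td_query` is hypothetical, and it uses only the --result, -w, and -d options from the command above.

```python
import shlex


def build_td_query(result_url, database, sql, wait=True):
    """Assemble the td query command shown above as an argument list."""
    cmd = ["td", "query", "--result", result_url]
    if wait:
        cmd.append("-w")  # wait for the job to finish, as in the example
    cmd += ["-d", database, sql]
    return cmd


if __name__ == "__main__":
    cmd = build_td_query(
        "mongodb://user:password@host:1234/database/collection",
        "testdb",
        "SELECT code, COUNT(1) FROM www_access GROUP BY code",
    )
    # Print the equivalent, safely quoted shell command line
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would run the job without going through a shell.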
For Scheduled Jobs
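For scheduled jobs, add the --result (or -r) option when scheduling a job. Every time the job runs, the results are written into your collection. In the following example, td result:create first registers the MongoDB connection under the name mydb, and --result mydb:mycollection then sends the results to the mycollection collection.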
$ td result:create mydb 'mongodb://user:password@host:1234/database'
$ td sched:create hourly_count_example "0 * * * *" \
    -d testdb "select count(*) from www_access" --result mydb:mycollection
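The schedule "0 * * * *" in the example is a cron expression that fires at minute 0 of every hour, so the job writes a fresh count to the collection once per hour. The sketch below computes the next few run times for that one pattern; it is a hypothetical helper for illustration, not a general cron parser, and it ignores time zones.

```python
from datetime import datetime, timedelta


def next_hourly_runs(now, count=3):
    """Return the next `count` run times for the cron expression "0 * * * *".

    "0 * * * *" fires at minute 0 of every hour. Only this one pattern is
    handled, and `now` is treated as a naive local time.
    """
    # Truncate to the top of the current hour, then move to the next hour,
    # so every returned time is strictly after `now`.
    first = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return [first + timedelta(hours=i) for i in range(count)]


if __name__ == "__main__":
    for run in next_hourly_runs(datetime(2024, 1, 1, 10, 15)):
        print(run.isoformat())
```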
The result output target is represented by a URL with the following format:
mongodb://username:password@hostname:port/database/collection
- mongodb identifies the target as result output to MongoDB;
- username and password are the credentials for the MongoDB instance;
- hostname is the host name of the MongoDB instance;
- port is the port number on which the MongoDB instance is accessible. This is optional;
- database is the name of the destination database;
- collection is the name of the destination collection.
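To see how the components listed above map onto a URL, the sketch below splits a result URL with Python's standard urllib.parse module. It is an illustration under standard URL parsing rules, not Treasure Data's own parser: the function name is hypothetical, and percent-decoding the credentials is an assumption about how special characters in a password would be handled. The collection may be absent, as in the td result:create example, where it is supplied later as mydb:mycollection.

```python
from urllib.parse import unquote, urlsplit


def parse_result_url(url):
    """Split a mongodb:// result URL into its components (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme != "mongodb":
        raise ValueError("expected a mongodb:// URL")
    # The path is "/database/collection"; the collection may be omitted.
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        raise ValueError("missing database name")
    return {
        # urlsplit leaves credentials percent-encoded, so decode them here
        "username": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "hostname": parts.hostname,
        "port": parts.port,  # None when the optional port is omitted
        "database": segments[0],
        "collection": segments[1] if len(segments) > 1 else None,
    }
```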
You can control whether the query results are added to or replace the existing data by setting one of the following modes:
mongodb://user:password@host:1234/database/collection                              # append
mongodb://user:password@host:1234/database/collection?mode=append                  # append
mongodb://user:password@host:1234/database/collection?mode=replace                 # replace
mongodb://user:password@host:1234/database/collection?mode=truncate                # truncate
mongodb://user:password@host:1234/database/collection?mode=update&unique=key1      # update
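The mode is an ordinary query-string parameter, and a URL without one behaves like mode=append. The sketch below reads the mode the way a standard URL parser would; the function name and the error on unknown modes are assumptions made for illustration, not documented Treasure Data behavior.

```python
from urllib.parse import parse_qs, urlsplit

MODES = {"append", "replace", "truncate", "update"}


def output_mode(url):
    """Return the write mode selected by a result URL (append if none is given)."""
    query = parse_qs(urlsplit(url).query)
    mode = query.get("mode", ["append"])[0]  # append is the default mode
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return mode
```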
APPEND
This is the default mode. The query results are appended to the collection. If the collection does not exist yet, a new collection is created.
This method is atomic.
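The append behavior can be modeled with a plain dictionary standing in for a MongoDB database: a missing collection is created empty, and existing documents are kept. This is a sketch of the semantics only; the `db` dictionary and the `append` helper are hypothetical and do not talk to a real server.

```python
def append(db, name, rows):
    """Model of APPEND: add rows to collection `name`, creating it if needed.

    `db` is a plain dict (collection name -> list of documents) standing in for
    a MongoDB database; it is not a real connection.
    """
    collection = db.setdefault(name, [])  # a missing collection is created empty
    collection.extend(dict(row) for row in rows)  # existing documents are kept
    return collection
```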
REPLACE
If the collection already exists, the rows of the existing collection are replaced with the query results. If the collection does not exist yet, a new collection is created.
We achieve atomicity (so that a consumer of the collection always sees consistent data) by performing the following three steps:
- Create a temporary collection.
- Write to the temporary collection.
- Replace the existing collection with the temporary collection using the RENAME command.
This method is atomic.
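The three steps above can be sketched with a dictionary standing in for the database. Readers keep seeing the old documents while the temporary collection is filled, and the swap happens in a single assignment, which is why consumers never observe a half-written collection. The temporary naming scheme and the helper are invented for this sketch.

```python
import uuid


def replace(db, name, rows):
    """Model of REPLACE: fill a temporary collection, then rename it over the target.

    `db` is a plain dict standing in for a MongoDB database, and the temporary
    naming scheme is made up for this sketch.
    """
    tmp = f"{name}_tmp_{uuid.uuid4().hex}"
    # Steps 1 and 2: create the temporary collection and write the results to it.
    # Readers of `name` still see the old documents while this runs.
    db[tmp] = [dict(row) for row in rows]
    # Step 3: swap the temporary collection in under the target name in one step.
    db[name] = db.pop(tmp)
    return db[name]
```

Against a real server, the final step would correspond to MongoDB's renameCollection command with dropTarget set to true (in pymongo, `Collection.rename(new_name, dropTarget=True)`); whether Treasure Data issues exactly this command is not stated here.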
TRUNCATE
The system first truncates the existing collection, then inserts the query results. If the collection does not exist yet, a new collection is created.
|Unlike REPLACE, TRUNCATE retains the indexes of your collection.|
This method is atomic.
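The difference in index handling between TRUNCATE and REPLACE comes from which collection object survives. In the model below (a dictionary per collection with hypothetical `docs` and `indexes` keys), TRUNCATE empties and refills the existing collection, so its indexes stay, while swapping in a freshly built collection keeps only the default `_id_` index. This is an illustration of the behavior described above, not Treasure Data's implementation.

```python
def new_collection():
    # Every MongoDB collection has the default "_id_" index
    return {"docs": [], "indexes": ["_id_"]}


def truncate(db, name, rows):
    """Model of TRUNCATE: empty the existing collection in place, then insert.

    The collection object itself is reused, so its indexes are kept.
    """
    coll = db.setdefault(name, new_collection())
    coll["docs"].clear()
    coll["docs"].extend(dict(row) for row in rows)
    return coll


def replace_with_fresh(db, name, rows):
    """For contrast: swapping in a newly built collection, as REPLACE does,
    loses secondary indexes that existed only on the old collection."""
    coll = new_collection()
    coll["docs"].extend(dict(row) for row in rows)
    db[name] = coll
    return coll
```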
UPDATE
This mode uses MongoDB’s find and “upsert” method (see MongoDB’s documentation). In short, a row is inserted unless it would create a duplicate value in the unique index or primary key, in which case the existing document is updated instead. Make sure you have already created a unique index on the fields you specify in the unique option. When this mode is used, the unique option is required.
Because this mode does not write the rows in a single transaction, it cannot guarantee atomicity.
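The per-row upsert logic can be modeled over an in-memory list of documents: if a document already has the same values for every unique-key field, it is updated; otherwise the row is inserted. This roughly mirrors an update with upsert enabled in MongoDB, but the helper and its return values are hypothetical, not Treasure Data's internals.

```python
def upsert(docs, row, unique_keys):
    """Model of UPDATE mode for one row.

    If a document already has the same values for every field in `unique_keys`,
    its fields are overwritten with the row's values; otherwise the row is
    inserted. `docs` is an in-memory list standing in for the collection.
    """
    key = {k: row[k] for k in unique_keys}
    for doc in docs:
        if all(k in doc and doc[k] == v for k, v in key.items()):
            doc.update(row)  # matching document found: update it
            return "updated"
    docs.append(dict(row))  # no match: insert as a new document
    return "inserted"
```

Because each row is applied separately in this model, a failure partway through a job would leave the earlier rows already written, which illustrates why this mode is not atomic.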
The unique option is relevant, and required, only in update mode. It takes the name of the unique key column, or several column names separated by commas, to use when updating the MongoDB collection.
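A sketch of how the unique option might be read from a result URL: the comma-separated value is split into column names, and update mode without it is rejected. The function name, the whitespace trimming, and the choice to return an empty list outside update mode are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def unique_keys(url):
    """Return the unique-key columns from a result URL (hypothetical helper).

    The unique option is required in update mode and ignored otherwise.
    """
    query = parse_qs(urlsplit(url).query)
    mode = query.get("mode", ["append"])[0]
    raw = query.get("unique", [""])[0]
    # "key1,key2" -> ["key1", "key2"], skipping empty entries
    keys = [k.strip() for k in raw.split(",") if k.strip()]
    if mode != "update":
        return []
    if not keys:
        raise ValueError("mode=update requires the unique option")
    return keys
```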