You can write job results to your existing MongoDB instance.
- Basic knowledge of Arm Treasure Data, including the toolbelt.
- A MongoDB instance.
- Treasure Data must have proper privileges.
A front-end application sends data to Treasure Data via Treasure Agent. Treasure Data periodically runs jobs on the data, then writes the job results to your MongoDB collections.
Example 1: Ranking: What are the “Top N of X?”
Every social/mobile application calculates the “top N of X” (ex: top 5 movies watched today). Treasure Data already handles the raw data warehousing; the “write-to-mongodb” feature enables Treasure Data to find the “top N” data as well.
Example 2: Dashboard Application
If you’re a data scientist, you need to keep track of a range of metrics every hour/day/month and make them accessible via visualizations. Using this “write-to-mongodb” feature, you can streamline the process and focus on your queries and your visualizations of the query results.
You can limit the access to your MongoDB instance by using a list of static IPs. Contact firstname.lastname@example.org if you need it.
Check the 'Output Results' checkbox.
You have two options when creating the result output connector.
- Use an existing connection.
1. Type the name of the connection in the prompt and select it.
- Create a new authentication.
- Host: The hostname or IP address of the remote server. (You can add more than one IP address, depending on your MongoDB setup.)
- Username: Username to connect to the remote database.
- Password: Password to connect to the remote database.
- Database name: The name of the database to which you are transferring data. (Ex.
- Table Name: The name of the collection to which you are transferring data.
- Append - Add the query results to the existing records in the database.
- Replace - Replace the existing records with the query results.
When you execute your query, the results are automatically written to your MongoDB instance.
Using the CLI to output query results.
For On-demand Jobs
For on-demand jobs, just add the --result / -r option to the td query command. After the job is finished, the results are written into your collection.
$ td query --result 'mongodb://user:password@host:1234/database/collection' \
  -w -d testdb "SELECT code, COUNT(1) FROM www_access GROUP BY code"
For Scheduled Jobs
For scheduled jobs, add the --result / -r option when scheduling a job. Every time the job runs, the results are written into the specified collection.
$ td result:create mydb 'mongodb://user:password@host:1234/database'
$ td sched:create hourly_count_example "0 * * * *" \
  -d testdb "select count(*) from www_access" --result mydb:mycollection
The result output target is represented by a URL with the following format:
mongodb://username:password@hostname:port/database/collection
- mongodb identifies the result output target as MongoDB;
- username and password are the credentials for the MongoDB instance;
- hostname is the host name of the MongoDB instance;
- port is the port number through which the MongoDB instance is accessible. This is optional;
- database is the name of the destination database;
- collection is the name of the destination collection.
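As a minimal sketch of how these components fit together, the following Python helper assembles a result URL from its parts. The function name `build_result_url` is hypothetical, and percent-encoding reserved characters in the credentials is an assumption borrowed from ordinary URL rules; confirm how your Treasure Data setup expects special characters to be written.

```python
from urllib.parse import quote


def build_result_url(user, password, host, database, collection, port=None):
    """Assemble a mongodb:// result URL from its components (illustrative helper)."""
    # Assumption: reserved characters in credentials are percent-encoded,
    # as in ordinary URLs. This is not confirmed by the Treasure Data docs.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    # The port is optional, so it is appended only when given.
    authority = host if port is None else f"{host}:{port}"
    return f"mongodb://{credentials}@{authority}/{database}/{collection}"


print(build_result_url("user", "password", "host", "database", "collection", port=1234))
```

The resulting string can be passed to the --result / -r option in place of a hand-written URL.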
You can add or delete data using the following modes:
mongodb://user:password@host:1234/database/collection                          # append
mongodb://user:password@host:1234/database/collection?mode=append              # append
mongodb://user:password@host:1234/database/collection?mode=replace             # replace
mongodb://user:password@host:1234/database/collection?mode=truncate            # truncate
mongodb://user:password@host:1234/database/collection?mode=update&unique=key1  # update
Append is the default mode. The query results are appended to a collection. If the collection does not exist yet, a new collection is created.
This method is atomic.
In replace mode, if the collection already exists, the rows of the existing collection are replaced with the query results. If the collection does not exist yet, a new collection is created.
We achieve atomicity (so that a consumer of the collection always has consistent data) by performing the following three steps in a single transaction.
- Create a temporary collection.
- Write to the temporary collection.
- Replace the existing collection with the temporary collection using the RENAME command.
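The three steps above can be sketched with an in-memory stand-in for a database. This is an illustration of the write-then-rename pattern only, not Treasure Data's implementation: `db` is a plain dict mapping collection names to lists of rows, and the `_tmp` suffix is a hypothetical naming choice.

```python
def replace_collection(db, name, rows):
    """Simulate replace mode on a dict that maps collection names to row lists."""
    tmp_name = f"{name}_tmp"  # hypothetical temporary collection name
    # Step 1: create a temporary collection.
    db[tmp_name] = []
    # Step 2: write the query results to the temporary collection.
    db[tmp_name].extend(rows)
    # Step 3: swap it in under the target name. Readers of db[name] see either
    # the old rows or the new rows, never a partially written mix.
    db[name] = db.pop(tmp_name)


db = {"ranking": [{"code": 200, "cnt": 10}]}
replace_collection(db, "ranking", [{"code": 200, "cnt": 12}, {"code": 404, "cnt": 3}])
print(db["ranking"])
```

Because the new rows are fully written before the swap, a failure during step 2 leaves the existing collection untouched.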
This method is atomic.
In truncate mode, the system first truncates the existing collection, then inserts the query results. If the collection does not exist yet, a new collection is created.
Unlike REPLACE, TRUNCATE retains the indexes of your collection.
This method is atomic.
The update mode uses MongoDB’s find and “upsert” method (see MongoDB’s documentation). In short, a row is inserted unless it would cause a duplicate value in the unique index or primary key, in which case an update is performed. Make sure you have already created a unique index on the fields you specify in the unique option. When this mode is used, the unique option is required.
Because MongoDB doesn’t support transactions, this mode cannot guarantee transaction atomicity.
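The insert-or-update behavior described above can be sketched in plain Python. This is a simulation on a list of dicts, not a call into MongoDB; the function name `upsert_rows` is hypothetical, and it assumes that a matched row has its fields overwritten by the incoming values.

```python
def upsert_rows(collection, rows, unique):
    """Simulate update mode: insert each row, or update the row matching `unique`."""
    for row in rows:
        key = tuple(row[field] for field in unique)
        for existing in collection:
            if tuple(existing.get(field) for field in unique) == key:
                # A row with the same unique key exists: update it in place.
                # Assumption: incoming fields overwrite the stored ones.
                existing.update(row)
                break
        else:
            # No match on the unique key: insert a copy of the row.
            collection.append(dict(row))
    return collection


stored = [{"code": 200, "cnt": 10}]
upsert_rows(stored, [{"code": 200, "cnt": 12}, {"code": 404, "cnt": 3}], unique=["code"])
print(stored)
```

Each row is applied independently, which mirrors why the mode as a whole is not atomic: a failure midway leaves earlier rows already written.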
The unique option is only relevant, and is required, with the update mode. It takes the name of the unique key column, or a comma-separated list of column names, to use for updating the MongoDB collection.
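To tie the mode and unique rules together, here is a small sketch that appends these parameters to a result URL and rejects combinations the text rules out. The helper `with_mode` is hypothetical; rejecting a unique option on non-update modes is a choice made for this sketch, since the text only says the option is not relevant there.

```python
from urllib.parse import urlencode

VALID_MODES = ("append", "replace", "truncate", "update")


def with_mode(base_url, mode="append", unique=None):
    """Add mode (and unique, for update mode) as query parameters to a result URL."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "update" and not unique:
        raise ValueError("update mode requires the unique option")
    if mode != "update" and unique:
        # Sketch-level choice: fail loudly rather than silently ignore the option.
        raise ValueError("the unique option only applies to update mode")
    params = {"mode": mode}
    if unique:
        # Multiple unique keys are joined with commas.
        params["unique"] = ",".join(unique)
    return f"{base_url}?{urlencode(params, safe=',')}"


base = "mongodb://user:password@host:1234/database/collection"
print(with_mode(base, "replace"))
print(with_mode(base, "update", unique=["key1"]))
```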