The query string
parser plugin for the Arm Treasure Data Data Connector transform query string data as key=value&key2=value2
line to {key: "value", key2: "value2"}
. It takes 6 options as explained below. Detail is here.
Option | Description |
---|---|
strip_quote | If you have quoted lines file such as “foo=FOO&bar=BAR”, should be true for strip their quotes. (bool, default: true) |
strip_whitespace | Strip whitespace before parsing lines for any indented line parse correctly such as ‘ foo=FOO’. (bool, default: true) |
capture | Capture valuable text from each line using Regexp. Matched first pattern (a.k.a $1) will be used. See also partial-config.yml (string, default: nil) |
charset | Character encoding (eg. ISO-8859-1, UTF-8) |
newline | Newline character (CRLF, LF or CR) |
columns | Columns (see below) |
The columns
option declares the list of columns.
Column | Description |
---|---|
name | Name of the column |
type | Type of the column (see below). long: 64-bit signed integers timestamp: Date and time with nano-seconds precision double: 64-bit floating point numbers string: Strings |
Configuration
Set the parser
section like an example:
in: ... parser: strip_quote: true strip_whitespace: true charset: UTF-8 newline: CRLF type: query_string columns: - {name: id, type: long} - {name: name, type: string} - {name: price, type: double} - {name: datetime, type: timestamp} out: ...
Use Case Example: Simple case
Example of Source data
"user_id=42&some_param=ABC&price=100.95&datetime=2014-05-14_10:11:25" "user_id=43&some_param=EFG&price=200.04&datetime=2015-05-15_11:11:25" "user_id=44&some_param=XYZ&price=300.34&datetime=2016-05-16_12:11:25" "user_id=44&some_param=XYZ&price=531.24&datetime=2017-05-17_23:11:25"
Example of load.yml
in: ... parser: strip_quote: true strip_whitespace: true charset: UTF-8 newline: CRLF type: query_string columns: - {name: user_id, type: long} - {name: some_param, type: string} - {name: price, type: double} - {name: datetime, type: timestamp} filters: [] out: {mode: append} exec: {}
You can show parsed query string data by preview
command.
$ td connector:preview load.yml +--------------+-------------------+--------------+---------------------------+ | user_id:long | some_param:string | price:double | datetime:timestamp | +--------------+-------------------+--------------+---------------------------+ | 42 | "ABC" | 100.95 | "2014-05-14 10:11:25 UTC" | | 43 | "EFG" | 200.04 | "2015-05-15 11:11:25 UTC" | | 44 | "XYZ" | 300.34 | "2016-05-16 12:11:25 UTC" | | 44 | "XYZ" | 531.24 | "2017-05-17 23:11:25 UTC" | +--------------+-------------------+--------------+---------------------------+
If your data doesn’t have a time column you may add it using add_time
filter option. More details at add_time filter plugin
$ td connector:issue load.yml --database <database name> --table <table name> --time-column timestamp --auto-create-table
Use Case Example: Using “capture” option
Source data is.
site A: "user_id=42&some_param=ABC&price=100.95&datetime=2014-05-14_10:11:25" site B: "user_id=43&some_param=EFG&price=200.04&datetime=2015-05-15_11:11:25" site C: "user_id=44&some_param=XYZ&price=300.34&datetime=2016-05-16_12:11:25" site A: "customer_id=44&some_param=XYZ&price=531.24&datetime=2017-05-17_23:11:25"
If you use following load.yml
for it.
in: ... parser: strip_quote: true strip_whitespace: true capture: 'site A: ("u.*")' charset: UTF-8 newline: CRLF type: query_string columns: - {name: user_id, type: long} - {name: some_param, type: string} - {name: price, type: double} - {name: datetime, type: timestamp} filters: [] out: {mode: append} exec: {}
The capture
option matches first row only, so result is below.
$ td connector:preview load.yml +--------------+-------------------+--------------+---------------------------+ | user_id:long | some_param:string | price:double | datetime:timestamp | +--------------+-------------------+--------------+---------------------------+ | 42 | "ABC" | 100.95 | "2014-05-14 10:11:25 UTC" | +--------------+-------------------+--------------+---------------------------+
- “site B” and “site C” don’t match “site A: ” part.
- Second “site A” row doesn’t match “("u.*”)“ part because query string started from c (customer_id).
Comments
0 comments
Please sign in to leave a comment.