This article explains how to tail Apache logs with td-agent so that access logs are continuously imported into the cloud.
td-agent needs to be installed on your application servers. td-agent is a daemon program dedicated to the streaming upload of any kind of time-series data. It is developed and maintained by Arm Treasure Data.
To set up td-agent, please refer to the following articles; we provide deb/rpm packages for Linux systems.
| If you have...        | Please refer to...                           |
| MacOS X               | Installing td-agent on MacOS X               |
| Ubuntu System         | Installing td-agent for Debian and Ubuntu    |
| RHEL / CentOS System  | Installing td-agent for Redhat and CentOS    |
| AWS Elastic Beanstalk | Installing td-agent on AWS Elastic Beanstalk |
td-agent is fully open-sourced under the fluentd project. td-agent extends fluentd with custom plugins for Treasure Data.
Next, please specify your authentication key by setting the apikey option in your td-agent.conf file. You can view your API key from the console.
Note: YOUR_API_KEY should be your actual apikey string.
# Tailing the Apache Log
<source>
  type tail
  path /var/log/httpd-access.log
  pos_file /var/log/td-agent/httpd-access.pos
  tag td.production.access
  format apache2
</source>

# Treasure Data Input and Output
<match td.*.*>
  type tdlog
  endpoint api.treasuredata.com
  apikey YOUR_API_KEY
  auto_create_table
  buffer_type file
  buffer_path /var/log/td-agent/buffer/td
  use_ssl true
</match>
Please restart your agent once these lines are in place.
$ sudo /etc/init.d/td-agent restart
td-agent will now keep tailing the log, buffer the entries in /var/log/td-agent/buffer/td, and automatically upload them to the cloud.
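To make the `format apache2` setting in the configuration above more concrete, the following Python sketch splits one access log line in the Apache combined format into named fields. It is only an illustration: the regular expression is a simplified approximation written for this example rather than the pattern td-agent itself uses, and the `parse_apache2` helper and the sample line are hypothetical.

```python
import re

# Simplified approximation of the Apache combined log format.
# Assumption: illustrative pattern only, not td-agent's own parser.
APACHE2_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)(?: \S+)?" '
    r'(?P<code>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)


def parse_apache2(line):
    """Return a dict of fields for one access log line, or None if it does not match."""
    match = APACHE2_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Apache writes "-" when the response has no body; treat that as absent.
    record["size"] = None if record["size"] == "-" else int(record["size"])
    record["code"] = int(record["code"])
    return record


# Hypothetical sample line in combined format.
sample = (
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] '
    '"GET /index.html HTTP/1.1" 200 2326 '
    '"http://example.com/start.html" "Mozilla/5.0"'
)
record = parse_apache2(sample)
print(record["host"], record["method"], record["path"], record["code"])
```

Each parsed line becomes one structured record, which is roughly what ends up as a row in the destination table.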
Confirming Data Import
Sending a SIGUSR1 signal will flush td-agent’s buffer; upload will start immediately.
# generate access logs
$ curl http://host:port/

# flush the buffer
$ kill -USR1 `cat /var/run/td-agent/td-agent.pid`
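If you prefer to trigger the flush from a script, the same signal can be sent from Python. This is a minimal sketch that assumes the PID file path shown in the command above; the `flush_buffer` helper is a hypothetical name introduced for this example.

```python
import os
import signal


def flush_buffer(pid_file="/var/run/td-agent/td-agent.pid"):
    """Send SIGUSR1 to the process whose PID is stored in pid_file."""
    # Read the daemon's process ID from its PID file.
    with open(pid_file) as f:
        pid = int(f.read().strip())
    # SIGUSR1 asks td-agent to flush its buffer, as described above.
    os.kill(pid, signal.SIGUSR1)
    return pid
```

Calling `flush_buffer()` typically requires the same privileges as the `kill` command, since the signal is sent to a process owned by another user.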
To confirm that your data has been uploaded successfully, issue the
td tables command as shown below.
$ td tables
+------------+------------+------+-----------+
| Database   | Table      | Type | Count     |
+------------+------------+------+-----------+
| production | access     | log  | 1         |
+------------+------------+------+-----------+
Check /var/log/td-agent.log if it’s not working correctly. The td-agent:td-agent user and group need permission to read the logs.
td-agent handles log rotation. td-agent keeps a record of the last position of the log, ensuring that each line is read exactly once even if the td-agent process goes down. However, since the information is kept in a file, the "exactly once" guarantee breaks down if the file becomes corrupted.
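The idea behind the position record can be sketched in a few lines of Python. This is a simplified illustration, not td-agent's implementation: it stores a single byte offset in a plain text file, treats a shrinking log as a sign of rotation or truncation, and uses a hypothetical helper named `read_new_lines`. td-agent's real position file and rotation handling are more involved.

```python
import os


def read_new_lines(log_path, pos_path):
    """Return complete lines appended to log_path since the offset saved in pos_path."""
    # Load the last recorded byte offset. A missing or unreadable position
    # file means starting over, which is where lines can be read twice.
    try:
        with open(pos_path) as f:
            offset = int(f.read().strip() or 0)
    except (FileNotFoundError, ValueError):
        offset = 0

    # If the log is now shorter than the saved offset, assume it was
    # rotated or truncated and read it from the beginning.
    if offset > os.path.getsize(log_path):
        offset = 0

    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()

    # Consume only complete lines; a partial trailing line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()

    # Persist the new offset so a restart resumes where this call stopped.
    with open(pos_path, "w") as f:
        f.write(str(offset + end))
    return lines
```

Because the offset survives in a file, a restarted process picks up where the previous one stopped. The `ValueError` branch shows the caveat mentioned above: once the saved position is unreadable, the reader cannot know which lines were already processed.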
We offer a schema mechanism that is more flexible than that of traditional RDBMSs. For queries, we leverage the Hive Query Language.