Treasure Data features include:
Ready-to-Use Data Collection Capabilities
Treasure Data provides flexible options to simplify collecting data in a wide variety of scenarios:
For many common use cases, you can install Treasure Agent, our data collection daemon, in your existing infrastructure. It can collect records from a wide range of data sources and continuously upload them to TD’s cloud storage.
For collecting data from mobile devices, we provide easy-to-use API libraries that you can embed in iOS or Android apps.
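Under the hood, an event upload from a mobile SDK boils down to serializing one JSON record and POSTing it to an ingestion endpoint over HTTPS. The sketch below is a hedged illustration only: the endpoint URL, auth header, and API key are hypothetical placeholders, not Treasure Data's actual API.

```python
import json
from urllib import request

# Illustrative placeholders -- NOT Treasure Data's real endpoint or auth scheme.
ENDPOINT = "https://ingest.example.com/v1/events"
API_KEY = "YOUR_WRITE_ONLY_KEY"

def build_event(table, record):
    """Return the (url, body, headers) triple for one event upload."""
    body = json.dumps({"table": table, "record": record}).encode("utf-8")
    headers = {"Content-Type": "application/json",
               "Authorization": "Bearer " + API_KEY}
    return ENDPOINT, body, headers

url, body, headers = build_event("app_events", {"event": "launch", "os": "ios"})
# request.urlopen(request.Request(url, data=body, headers=headers))  # actual send
```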
Data Deduplication for Integrations
Treasure Data supports data deduplication through Treasure Agent, our collection mechanism for streaming data:
Treasure Agent assigns a universally unique identifier (UUID) to each chunk of data.
Treasure Agent retries whenever it detects a network failure. However, a retry can sometimes result in the same chunk of data being sent more than once.
When a chunk arrives, Treasure Data’s API endpoint inspects the chunk’s UUID and, to avoid duplication, discards the chunk if one with the same ID has already been processed in the last 10 minutes.
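The deduplication window described above can be sketched as a small cache keyed by chunk UUID. This is a simplified illustration of the idea, not Treasure Data's actual implementation:

```python
import time
import uuid

class ChunkDeduplicator:
    """Discard chunks whose UUID was already seen in the last `window` seconds."""

    def __init__(self, window=600.0):  # 10-minute window, as described above
        self.window = window
        self.seen = {}  # chunk UUID -> last-seen timestamp

    def accept(self, chunk_id, now=None):
        now = time.time() if now is None else now
        # Drop expired entries so the cache stays bounded.
        self.seen = {cid: t for cid, t in self.seen.items() if now - t < self.window}
        if chunk_id in self.seen:
            return False  # duplicate within the window: discard
        self.seen[chunk_id] = now
        return True

# A retried upload re-sends the same chunk UUID and is rejected:
dedup = ChunkDeduplicator()
cid = str(uuid.uuid4())
assert dedup.accept(cid, now=0.0) is True
assert dedup.accept(cid, now=5.0) is False    # retry inside the window: discarded
assert dedup.accept(cid, now=700.0) is True   # outside the window: treated as fresh
```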
Storage for High Performance
All your data is stored in the cloud in a columnar format, which achieves far better query performance and compression than traditional row-oriented RDBMSs.
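One reason columnar layouts compress well is that all values of a column sit next to each other, so simple schemes such as run-length encoding become very effective. The toy sketch below illustrates the principle only; it is not Treasure Data's actual on-disk format.

```python
def run_length_encode(column):
    """Collapse runs of equal adjacent values into [value, count] pairs."""
    encoded = []
    for v in column:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded

# A column stored contiguously has long runs of repeated values,
# so nine cells shrink to three pairs:
country = ["US"] * 4 + ["JP"] * 3 + ["DE"] * 2
assert run_length_encode(country) == [["US", 4], ["JP", 3], ["DE", 2]]
```

In a row-oriented layout the same values would be interleaved with every other field of each record, leaving no runs to collapse.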
The data is stored in Amazon S3, for maximum scalability, availability, and read/write performance.
We also believe that your data is yours: you can bulk-export your data at any time.
Schema-Free Data Representation
Our internal data formats are schema-free, to keep up with the variety of data types and fast-evolving schemas that are increasingly common in big, heterogeneous data use cases.
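Schema-free ingestion means later records can carry new fields without a schema migration. The sketch below illustrates the idea with plain Python dicts; the field-inference helper `infer_fields` is a hypothetical illustration, not a Treasure Data API.

```python
def infer_fields(records):
    """Return the union of field names seen across records,
    mapped to the type name of the first value observed."""
    fields = {}
    for record in records:
        for key, value in record.items():
            fields.setdefault(key, type(value).__name__)
    return fields

# The second record introduces "amount" -- no predeclared schema required:
records = [
    {"time": 1700000000, "event": "click", "url": "/home"},
    {"time": 1700000001, "event": "purchase", "amount": 19.99},
]
assert infer_fields(records) == {
    "time": "int", "event": "str", "url": "str", "amount": "float",
}
```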
SQL-Style Query Languages: Presto and Hive
Treasure Data lets you analyze your data using industry-standard Big Data query engines: Presto and Hive. Queries are executed in parallel on our elastic clusters, which scale to keep up with your demanding requirements for both interactive query performance and high-volume batch processing.
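The queries themselves are ordinary SQL. The sketch below uses Python's sqlite3 purely as a local stand-in to show the shape of such an aggregate; against Treasure Data the same kind of statement would be submitted to Presto or Hive, and the table and data are illustrative.

```python
import sqlite3

# sqlite3 as a local stand-in: a simple GROUP BY aggregate,
# the bread-and-butter shape of an analytics query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pageviews (path TEXT, user_id INTEGER)")
conn.executemany("INSERT INTO pageviews VALUES (?, ?)",
                 [("/home", 1), ("/home", 2), ("/pricing", 1)])

rows = conn.execute("""
    SELECT path, COUNT(*) AS views
    FROM pageviews
    GROUP BY path
    ORDER BY views DESC
""").fetchall()
assert rows == [("/home", 2), ("/pricing", 1)]
```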
Export to an RDBMS or a Traditional Data Warehouse
A built-in export capability writes summarized data from Treasure Data to a traditional RDBMS or data warehouse. This enables efficient processing of large data volumes, with Treasure Data acting both as a primary analytics engine and as a preprocessing platform.
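The export step amounts to writing summarized rows into a destination database. Below is a minimal sketch using sqlite3 as a stand-in warehouse; the table name and rows are illustrative, not part of any Treasure Data API.

```python
import sqlite3

# Summarized results (as produced by an aggregate query upstream);
# these rows are illustrative.
summary = [("2024-01-01", "/home", 2), ("2024-01-01", "/pricing", 1)]

# sqlite3 stands in for the destination RDBMS / data warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE daily_pageviews (day TEXT, path TEXT, views INTEGER)")
warehouse.executemany(
    "INSERT INTO daily_pageviews VALUES (?, ?, ?)", summary)

# Only the compact summary lands in the warehouse, not the raw event data.
assert warehouse.execute(
    "SELECT SUM(views) FROM daily_pageviews").fetchone()[0] == 3
```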
Hadoop Clusters Across Multiple Data Centers
Our computing resources are always shared across our users. If a Hadoop cluster dies, jobs are automatically reassigned to another live cluster.
Third-Party Tool Connectivity
For data transfer to BI tools, Treasure Data provides:
- JDBC interface for batch queries

This eliminates the need for custom coding and maintenance to link these environments with the primary data store.