Data sometimes produces poor or undesired prediction results, and syndication based on such predictive scores does not contribute to your business success. Treasure Data provides auxiliary information about the latest predictive scoring result to help you refine your predictive scoring.
How to interpret the accuracy of prediction
When you click Run on the Predictive Scoring view, an evaluation runs in parallel to estimate the accuracy of prediction. The evaluation completes the following steps:
- Splits customers in population into 80% training and 20% testing samples.
- Builds a predictive model by using only the 80% train set.
- Computes a predictive score for customers in the 20% test set.
- Assigns an estimated “converted or not” label to each of the 20% test customers based on the predictive score; that is, if the predictive score is greater than 50, the customer is labeled as “convert in near future” (is estimated to be a positive sample).
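The split and labeling steps above can be sketched as follows. This is only an illustration, not Treasure Data's implementation: the function names, the random shuffle, and the seed are assumptions, and the model-building step is omitted.

```python
import random

def split_customers(customer_ids, train_ratio=0.8, seed=0):
    """Split customers into training and testing samples (assumed: random shuffle)."""
    ids = list(customer_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]

def estimate_label(score, threshold=50):
    """Return 1 ("convert in near future") if the score is greater than the threshold."""
    return 1 if score > threshold else 0

# Hypothetical population of 100 customers identified by integer IDs.
train_ids, test_ids = split_customers(range(100))
print(len(train_ids), len(test_ids))
print(estimate_label(72), estimate_label(50))
```

Note that a score of exactly 50 is labeled 0, because the rule is "greater than 50".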
As a consequence, each test customer has both a predictive score and an estimated label. And because the positive samples are known, the truth label for each test customer is also known. We can compute the accuracy of prediction by comparing the truth label to the predictive score or the estimated label.
In the following image, the metrics Accuracy and AUC, as shown in the dashboard view, are respectively derived from estimated label and predictive score for the 20% test customers:
Accuracy is the percentage of “correct” estimated labels, computed over pairs of truth label and estimated label, where:
- The truth label is positive when the customer is in positive samples.
- The estimated label is positive when the predictive score > 50.
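As a minimal sketch of this computation, assuming truth labels are encoded as 1 (in positive samples) or 0, and using made-up scores for five hypothetical test customers:

```python
def accuracy(truth_labels, scores, threshold=50):
    """Fraction of test customers whose estimated label matches the truth label."""
    estimated = [1 if s > threshold else 0 for s in scores]
    correct = sum(1 for t, e in zip(truth_labels, estimated) if t == e)
    return correct / len(truth_labels)

# Illustrative data only: truth labels and predictive scores for five test customers.
truth = [1, 0, 1, 0, 1]
scores = [80, 30, 45, 60, 90]
print(accuracy(truth, scores))  # 3 of the 5 estimated labels are correct: 0.6
```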
Because a rounded or truncated estimated label is less informative than a raw predictive score, Accuracy is not a very reliable metric. Therefore, we recommend that you also consider the AUC (short for Area Under the ROC Curve), which is a metric computed from the raw predictive score.
AUC is a float value in the range [0.0, 1.0], and larger is better. A higher predictive score for a customer whose truth label is 1, or a lower predictive score for a customer whose truth label is 0, increases the AUC value, and vice versa. As a rule of thumb, you can interpret the AUC value as follows:
- If AUC is less than 0.7 (in other words, poor), refine the predictive model to improve the accuracy of prediction.
- If AUC is greater than 0.9 (in other words, exceptionally good), consider the possibility of data leakage.
- Otherwise, your predictive scoring works reasonably well.
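To make the metric concrete, the sketch below computes AUC in its pairwise form: the fraction of (positive, negative) customer pairs in which the positive customer has the higher score, counting ties as half. This is a standard formulation of AUC, not necessarily how Treasure Data computes it internally. The function names and sample data are illustrative assumptions.

```python
def auc(truth_labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(truth_labels, scores) if t == 1]
    neg = [s for t, s in zip(truth_labels, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))

def interpret_auc(value):
    """Apply the rule of thumb described above."""
    if value < 0.7:
        return "poor"
    if value > 0.9:
        return "exceptionally good (check for leakage)"
    return "reasonable"

# Illustrative data only.
truth = [1, 0, 1, 0, 1]
scores = [80, 30, 45, 60, 90]
value = auc(truth, scores)
print(round(value, 3), interpret_auc(value))
```

In this example, 5 of the 6 positive/negative pairs are ranked correctly, so the AUC is about 0.833 even though one positive customer scores below the threshold of 50.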
Leakage can occur when you choose attributes that are directly related to the answer to your problem. Using the answer itself for prediction is a kind of cheating. For example, in the churn prediction tutorial, the attribute churn must not be included in the columns configured in the Predictive Scoring view.
Tune predictive scoring and make AUC more reasonable
Case 1: AUC is poor (< 0.7)
When AUC is low, it is usually because the quality of input to the ML algorithm is poor. You can exercise one of the following options to improve AUC:
- Manually add more features (attributes)
- Rethink your problem (such as the definition of population, positive samples, and scoring target segment)
- Revisit your audience definition (such as customer data) and utilize more data and attributes
The first approach, Manually add more features, requires that you edit the configuration in the Predictive Scoring setting view:
As described in How Feature Guess Works, Guess Columns automatically drops some attributes that are likely to be meaningless, but misclassification can still occur. Therefore, based on your understanding of the data, try adding columns that are likely to be informative to the input boxes.
If the AUC is still poor regardless of the choice of columns, your audience might not have reasonably informative attributes, or the number of customers might be too small. In that case, think about importing more data to TD and integrating it with the audience as additional behaviors or attributes.
Meanwhile, you might have to reconsider the definition of population, positive samples, and scoring target segments. For instance, if all customers are used for building the predictive model, consider narrowing your focus to a specific population segment.
Case 2: AUC is exceptionally good (> 0.9)
Exceptionally good accuracy might occur as a result of an inappropriate choice of features. Check the feature importance on the dashboard, and see if the absolute values of importance for the top features are unusually large:
For exceptionally good accuracy, do the following:
- If AUC is greater than 0.9, drop the attribute corresponding to the most important feature on the Predictive Scoring setting view.
- Click Save and Run on Predictive Scoring to rebuild the predictive model, and then click Run Audience to refresh the dashboard.
- Repeat these steps until the AUC decreases to a reasonable value.
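The loop above is carried out manually in the console, but its logic can be sketched in code. Everything below is hypothetical: `evaluate` stands in for "rebuild the model and read AUC and feature importance from the dashboard", and `fake_evaluate` is a toy stub with made-up numbers, not a Treasure Data API.

```python
def drop_until_reasonable(columns, evaluate, upper=0.9):
    """Drop the most important column until AUC is no longer above `upper`.

    evaluate(columns) must return (auc, importance), where importance maps
    each column name to its importance value.
    """
    columns = list(columns)
    dropped = []
    while columns:
        auc_value, importance = evaluate(columns)
        if auc_value <= upper:
            break
        # Pick the column with the largest absolute importance.
        top = max(importance, key=lambda c: abs(importance[c]))
        columns.remove(top)
        dropped.append(top)
    return columns, dropped

def fake_evaluate(cols):
    """Toy stub: the leaky 'churn' column inflates AUC while it is present."""
    base = {"churn": 5.0, "age": 0.3, "visits": -0.4}
    importance = {c: base[c] for c in cols}
    return (0.99 if "churn" in cols else 0.8), importance

kept, dropped = drop_until_reasonable(["age", "churn", "visits"], fake_evaluate)
print(kept, dropped)
```

With this stub, one iteration drops churn and the loop stops because the simulated AUC falls to 0.8.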
Possibility of over-fitting
The reason for very low or exceptionally high accuracy is sometimes over-fitting, also referred to as overtraining. Over-fitting occurs when the prediction model memorizes the data points in their entirety rather than learning patterns. As an example, note the many store#xxx entries in the following list of Top 20 Features:
The preceding image means that the predictive score for a customer who has a specific value in the attribute store is likely to become large, regardless of the values of the other attributes. Consequently, predictive scores take undesired values across the audience, and accuracy might be strangely biased.
If you observe over-fitting when you view the dashboard, we recommend that you drop the over-fitted column on the Predictive Scoring setting view and rebuild the predictive model.
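One way to spot such a column is to count how many top features come from the same attribute. The sketch below assumes feature names follow the attribute#value pattern suggested by store#xxx. The feature list and the function name are made up for illustration.

```python
from collections import Counter

def attribute_counts(top_features):
    """Count top features per attribute, assuming names look like 'attribute#value'."""
    return Counter(name.split("#", 1)[0] for name in top_features)

# Hypothetical excerpt of a Top 20 Features list.
top_features = ["store#101", "store#205", "age", "store#317", "gender#female", "store#422"]
counts = attribute_counts(top_features)
print(counts.most_common(1))  # [('store', 4)]
```

An attribute that dominates the list, like store here, is a candidate column to drop.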