Wednesday, September 08, 2010
   

Logistic regression modeling

The Analysis Studio logistic regression modeling methodology handles all aspects of model generation, from data extraction to model deployment. As data mining requirements grow, organizations aim for fast model production and verification.
Reduce model generation time: during model generation, you can go back and forth between the steps and change the settings selected in each one.

Step 1 Pick variables – Pick the explained variable and select the column(s) that explain it. In this step you can select the variable insertion method (all selected variables, stepwise selection by p-value or stepwise selection).

Logistic regression model step 1 - select variables

Step 2 Preview – Preview the generated model and refine it before you publish it, using the set of preview screens that Analysis Studio provides. For further logistic regression analysis, see the logistic regression analysis article.

ROC Curve – Gives a fast first impression of the generated model

Logistic regression model step 2 - preview model ROC
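The area under the ROC curve, and the Gini coefficient derived from it, can be sketched in a few lines. This is a minimal illustration, not Analysis Studio's own computation: the function names `roc_auc` and `gini_from_auc` are hypothetical, and the relation Gini = 2 × AUC − 1 is the common scoring convention, assumed here to match the product's definition.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # tie
    return wins / (len(pos) * len(neg))


def gini_from_auc(auc):
    """Gini coefficient under the assumed convention Gini = 2 * AUC - 1."""
    return 2.0 * auc - 1.0


# Made-up outcomes (1 = event) and model probabilities, for illustration only.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.5]
auc = roc_auc(labels, scores)
print(round(auc, 4), round(gini_from_auc(auc), 4))
```

An AUC of 0.5 corresponds to a model that ranks cases no better than chance, which is why the ROC curve works as a quick first check.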

Cases screen – Check how many data rows were included in the generated model

Logistic regression model step 2 - preview selected cases

Variables screen – Preview the variables that comprise the logistic regression model including: Coefficient, Standard Error, Wald, p-value, importance and lower and upper limits.

Logistic regression model step 2 - preview variable parameters
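To make the columns on the Variables screen concrete, the sketch below derives a Wald statistic, its p-value and confidence limits from a coefficient and its standard error. It assumes the usual textbook definitions (Wald = (coefficient / standard error)², compared against a chi-square distribution with one degree of freedom, and limits taken on the coefficient scale at 95%); whether Analysis Studio reports limits on the coefficient or on the odds ratio is not stated in this article. The name `wald_summary` and the input values are hypothetical.

```python
import math
from statistics import NormalDist


def wald_summary(coef, std_err, level=0.95):
    """Wald statistic, two-sided p-value and confidence limits for one coefficient."""
    if std_err <= 0:
        raise ValueError("standard error must be positive")
    z = coef / std_err
    wald = z * z                                      # chi-square with 1 degree of freedom
    p_value = math.erfc(abs(z) / math.sqrt(2.0))      # equals P(chi2_1 > wald)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level=0.95
    return {
        "wald": wald,
        "p_value": p_value,
        "lower": coef - z_crit * std_err,
        "upper": coef + z_crit * std_err,
    }


# Hypothetical coefficient and standard error.
print(wald_summary(0.8, 0.4))
```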

Parameters screen – Preview the logistic regression model parameters: Variable selection method, Null deviance, Model deviance, Improvement, Wald, Pearson Chi Square, Score Test, p-value, Cox and Snell R Square, McFadden R Square, Nagelkerke R Square, Area Under ROC Curve, Gini Coefficient, Hosmer-Lemeshow Cp, Hosmer-Lemeshow Probability.

Logistic regression model step 2 - preview logistic regression model parameters
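Several of the parameters listed above are simple functions of the null and model deviance. The sketch below uses the standard formulas for ungrouped binary data, where the deviance equals −2 × log-likelihood; it assumes "Improvement" means null deviance minus model deviance, which the article does not confirm. The function name and the input numbers are hypothetical.

```python
import math


def pseudo_r_squared(null_deviance, model_deviance, n_cases):
    """McFadden, Cox and Snell, and Nagelkerke R Square from two deviances."""
    if null_deviance <= 0 or n_cases <= 0:
        raise ValueError("null deviance and case count must be positive")
    improvement = null_deviance - model_deviance        # assumed meaning of "Improvement"
    mcfadden = 1.0 - model_deviance / null_deviance
    cox_snell = 1.0 - math.exp(-improvement / n_cases)
    max_cox_snell = 1.0 - math.exp(-null_deviance / n_cases)
    nagelkerke = cox_snell / max_cox_snell               # rescaled so the maximum is 1
    return {
        "improvement": improvement,
        "mcfadden": mcfadden,
        "cox_snell": cox_snell,
        "nagelkerke": nagelkerke,
    }


# Hypothetical deviances for a model fitted on 100 rows.
print(pseudo_r_squared(null_deviance=138.63, model_deviance=100.0, n_cases=100))
```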

HL Table screen – Preview the Hosmer-Lemeshow table.

Logistic regression model step 2 - preview HL Table
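A Hosmer-Lemeshow table compares observed and expected event counts across groups of rows ordered by predicted probability. The following is a minimal sketch of the textbook statistic, assuming equal-sized groups; the grouping rule and layout that Analysis Studio uses are not described here, and `hosmer_lemeshow` is a hypothetical name.

```python
def hosmer_lemeshow(labels, probs, groups=10):
    """Return (table, statistic); table rows are (size, observed, expected)."""
    pairs = sorted(zip(probs, labels))          # order rows by predicted probability
    n = len(pairs)
    table, stat = [], 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)     # events actually seen in the group
        expected = sum(p for p, _ in chunk)     # events the model predicts
        mean_p = expected / size
        table.append((size, observed, expected))
        if 0.0 < mean_p < 1.0:
            stat += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return table, stat


# Made-up data, two groups for brevity.
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1, 0, 1]
print(hosmer_lemeshow(labels, probs, groups=2))
```

A small statistic means observed and expected counts agree group by group; the statistic is usually compared against a chi-square distribution with (groups − 2) degrees of freedom.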

Classifications screen – This screen shows the model performance with a 50% cut value alongside the model performance with an optimized cut value. This helps you decide which cut value to use in the deployment process.

Logistic regression model step 2 - preview model hits and misses on model data
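The difference between a 50% cut value and an optimized one can be illustrated as follows. The sketch assumes "optimized" means the cut value that maximizes the share of hits on the model data; the criterion Analysis Studio actually optimizes is not stated in this article, and the function names and data are hypothetical.

```python
def hit_rate(labels, probs, cut):
    """Share of rows where the classification at `cut` matches the outcome."""
    hits = sum(1 for y, p in zip(labels, probs) if (p >= cut) == (y == 1))
    return hits / len(labels)


def best_cut(labels, probs):
    """Candidate cut value (taken from the observed probabilities) with the most hits."""
    candidates = sorted(set(probs))
    return max(candidates, key=lambda c: hit_rate(labels, probs, c))


# Made-up outcomes and probabilities.
labels = [0, 0, 1, 0, 1, 1, 1, 1]
probs = [0.1, 0.3, 0.35, 0.45, 0.4, 0.6, 0.7, 0.9]

cut = best_cut(labels, probs)
print("50% cut:", hit_rate(labels, probs, 0.5))
print("optimized cut:", cut, hit_rate(labels, probs, cut))
```

On skewed data the two cut values can differ considerably, which is the practical reason to compare them before deployment.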

Step 3 Publish – Publish the model as a part of your Analysis Studio data mining project. Once a model is published, you can scrutinize and verify it, including its independent variables and its behavior under different values.

The published model includes different views and measurements that allow in-depth analysis of the generated logistic regression model, in addition to the measurements and parameters described in the previous step.

Logistic model charts include model lift, gain, and x and y diagnostic graphs, which can be seen in the Chart section of the published model.

Logistic regression model gain curve

Logistic regression model lift curve

Logistic regression model x diagnosis curve

Logistic regression model y diagnosis curve
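The numbers behind gain and lift charts can be sketched as below, using the common definitions: rows are ranked by descending probability, gain is the cumulative share of events captured, and lift is gain divided by the share of rows examined. The function name and data are hypothetical, and the exact chart construction in Analysis Studio may differ.

```python
def gain_and_lift(labels, probs, groups=10):
    """Return rows of (share_of_rows, cumulative_gain, lift), best-scored rows first."""
    ranked = [y for _, y in sorted(zip(probs, labels), key=lambda t: t[0], reverse=True)]
    n, total_events = len(ranked), sum(ranked)
    if total_events == 0:
        raise ValueError("no events in the data")
    rows = []
    for g in range(1, groups + 1):
        top = ranked[: g * n // groups]       # rows in the top g groups
        if not top:
            continue
        share = len(top) / n
        gain = sum(top) / total_events        # share of all events captured so far
        rows.append((share, gain, gain / share))
    return rows


# Made-up data: ten rows, four events, five groups.
probs = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]
for row in gain_and_lift(labels, probs, groups=5):
    print(row)
```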

Additional charts are included to allow viewing the classified data. Charts include: model classifications by deciles, model hits, model misses and the relation between them.

Logistic regression model hits per deciles

Logistic regression model hits vs. misses per decile chart

Logistic regression model hits per decile chart

Logistic regression model misses per decile chart

What If? – The "What If?" scenario screen lets you test the probability that the model assigns when you change the values of individual variables. It gives you an intuitive view of how changes in each variable affect the model outcome.

Logistic regression model what if scenario screen
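Underneath, a what-if scenario amounts to re-evaluating the logistic function with different inputs. The sketch below shows this with a hypothetical model: the intercept, the coefficient values and the variable names `age` and `balance` are invented for illustration and do not come from Analysis Studio.

```python
import math


def sigmoid(logit):
    """Numerically stable logistic function."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def predict_probability(intercept, coefs, values):
    """Probability from intercept + sum(coefficient * value) passed through the logistic function."""
    logit = intercept + sum(coefs[name] * values[name] for name in coefs)
    return sigmoid(logit)


# Hypothetical model and baseline row.
intercept = -3.0
coefs = {"age": 0.05, "balance": 0.001}
baseline = {"age": 40, "balance": 1000}

# What if the same row had age 60 instead of 40?
scenario = dict(baseline, age=60)
print(predict_probability(intercept, coefs, baseline))
print(predict_probability(intercept, coefs, scenario))
```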

Sensitivity Table – Compute a range of values for a selected pivot variable and view the model outcome across the entire range. Sensitivity tables show at a glance how strongly the model responds to that variable.

Logistic regression model sensitivity table screen
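A sensitivity table can be thought of as the same probability calculation repeated over an evenly spaced range of one pivot variable while the other variables stay fixed. This is a minimal sketch under that assumption; the model coefficients, variable names and the function `sensitivity_table` are hypothetical.

```python
import math


def sensitivity_table(intercept, coefs, base_values, pivot, start, stop, steps):
    """Return (pivot_value, probability) rows for `steps` evenly spaced pivot values."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    rows = []
    for i in range(steps):
        x = start + (stop - start) * i / (steps - 1)
        values = dict(base_values)
        values[pivot] = x                      # only the pivot variable changes
        logit = intercept + sum(coefs[k] * values[k] for k in coefs)
        # Numerically stable logistic function.
        if logit >= 0:
            prob = 1.0 / (1.0 + math.exp(-logit))
        else:
            prob = math.exp(logit) / (1.0 + math.exp(logit))
        rows.append((x, prob))
    return rows


# Hypothetical model: vary "age" from 20 to 60 with "balance" held at 1000.
table = sensitivity_table(-3.0, {"age": 0.05, "balance": 0.001},
                          {"age": 40, "balance": 1000}, "age", 20, 60, 5)
for x, prob in table:
    print(x, round(prob, 4))
```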

Step 4 Deploy – Deploy the logistic regression model on a future data set or an additional test data set. The results of the deployment are displayed as calculated variables in the data grid. Newly imported data is displayed as a separate data source in your project.

The calculated model variables are standard transformed variables, so they can be analyzed and tested with a crosstab or any other statistical method.

Logistic regression deployment

Deployment variables

  • Probability – The actual model outcome – a number between 0 and 1 or NaN
  • Decile – A number between 1 and 10 that represents the decile to which the model assigned the row (e.g. a row with a probability of 0.87 is classified in the 8th decile).
  • Did Hit – A variable indicating whether the model classification is considered as a hit.

The expression that produces each variable can be edited by clicking the expression link.
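As an illustration of what such expressions compute, the sketch below derives a Probability and a Did Hit value for each deployed row. It is not the expression syntax of Analysis Studio. It assumes that a row with a missing or non-numeric input yields NaN, that a hit means the classification at the chosen cut value matches a known outcome, and that Did Hit is undefined when the outcome is unknown. The Decile variable is left out because the binning rule is not specified in this article. All names and coefficients are hypothetical.

```python
import math


def score_row(intercept, coefs, row):
    """Model probability for one row, or NaN when an input is missing or non-numeric."""
    try:
        logit = intercept + sum(coefs[k] * float(row[k]) for k in coefs)
    except (KeyError, TypeError, ValueError):
        return math.nan
    if math.isnan(logit):
        return math.nan
    # Numerically stable logistic function.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    return math.exp(logit) / (1.0 + math.exp(logit))


def did_hit(probability, actual, cut=0.5):
    """True/False when the classification can be checked, None otherwise."""
    if actual is None or math.isnan(probability):
        return None
    return (probability >= cut) == (actual == 1)


# Hypothetical model and deployment rows; "outcome" is known only for test data.
coefs = {"age": 0.05}
rows = [{"age": 60, "outcome": 1}, {"age": 20, "outcome": 1}, {"age": None, "outcome": 0}]
for row in rows:
    p = score_row(-3.0, coefs, row)
    print(p, did_hit(p, row["outcome"]))
```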

 

Next steps to start using Analysis Studio.