  • What is Logistic Risk Analysis?
  • An expository simulation
  • Enter the discrete decision maker
  • The odds are …
  • Predict now …
  • A deposit
  • References

What is Logistic Risk Analysis?

In many applications we want to predict the odds of an outcome, for example default, product acceptance, machine failure, or fraud. In risk management we can use the predicted odds to quantify the likelihood of a risk event occurring.

This simulation illustrates the idea:

  • Create two risk factors that drive the occurrence of a risk event, for example vendor inability to deliver, borrower inability to pay, or customer refusal to buy.

  • Combine these risk factors into a score

library(tidyverse)            # tibble, dplyr, and ggplot2
library(plotly)               # interactive plots

n <- 1000
set.seed(555)
x1 <- rnorm(n)                # some continuous variables we can call risk factors
x2 <- rnorm(n)
z <- 1 + 2*x1 + 3*x2          # the "risk score" is a linear combination with a bias
xz_df <- tibble(x1 = x1, x2 = x2, z = z)
p <- ggplot(xz_df, aes(x = 1:length(z), y = z)) +
  geom_line(color = "blue", size = 1.00) +
  geom_point(color = "red", size = 1.25)
ggplotly(p)

The odds that an event occurs are defined as the ratio of the probability that the event occurs, π, to the probability that the event does not occur, 1 - π. We observe several vendor, credit, or product acceptance accounts. When an account is not accepted as a vendor, or a customer defaults, or a product is refused, we assign a 1; otherwise we assign a 0. The observed odds end up being the count of 1's divided by the count of 0's across accounts.
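As a small sketch of this counting, with a made-up indicator vector standing in for observed accounts:

# hypothetical outcomes across ten accounts: 1 = event occurred, 0 = it did not
outcomes <- c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0)
p_hat <- mean(outcomes)                              # empirical probability of the event
odds_hat <- sum(outcomes == 1) / sum(outcomes == 0)  # count of 1's over count of 0's
c(probability = p_hat, odds = odds_hat)              # odds_hat equals p_hat / (1 - p_hat)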

We hypothesize that several factors might help systematically explain the choices behind, for example, default, acceptance, or refusal. Suppose all of the reasons why a default, acceptance, or refusal might occur are summarized in a score we will call z. To examine the hypothesis that z explains behavior, we set the behaviorally observed odds equal to a predictor model:

$$\frac{\pi}{1-\pi} = \exp(z),$$

and solve for π to get

$$\pi = \frac{1}{1 + \exp(-z)}.$$

This is the traditional statistical derivation behind the use of a logistic representation of a decision maker's choice. This paper further asks how a discrete choice derivation of the logistic representation can improve our interpretation of the statistics of a decision maker's binary choice.
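As a quick numerical check of this inversion in R (the score values below are arbitrary):

z_check <- c(-2, 0, 1.5)                # arbitrary score values
p_check <- 1/(1 + exp(-z_check))        # probability implied by each score
odds_check <- p_check / (1 - p_check)   # recover the odds from the probability
all.equal(odds_check, exp(z_check))     # TRUE: the odds equal exp(z)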

An expository simulation

Let’s translate the mathematical expression for the logistic probability π directly into R. Then let’s simulate Bernoulli coin-toss trials. A Bernoulli coin flip is a single trial in which the event either occurs (e.g., “heads = default”, indicated by a 1) or does not occur (indicated by a 0). We then plot the results.

prob <- 1/(1 + exp(-z))       # pass the score through the inverse-logit function:
                              # the probability that the event (a 1) occurs
y <- rbinom(n, 1, prob)       # Bernoulli (0/1) response variable
y_df <- tibble(y = y, z = z, prob = prob)
p <- ggplot(y_df, aes(x = z, y = prob )) +
  geom_line(color = "blue", size = 1.00) +
  geom_point(color = "red", size = 1.25)
ggplotly(p)                # plot logistic curve

The plot of probability versus the z score traces out the logistic S curve, with a cloud of probable occurrences between 0 and 1. The next scatter plot shows the indicators themselves gathered at 1 and 0.

p <- ggplot(y_df, aes(x = z, y = y)) +
  geom_point(color = "blue", size = 1.25)
ggplotly(p)

More insight can be gained by looking at the cross scatter plots and histograms of the event indicator y and the predictor variables x1 and x2.

# Load the GGally library to view pairwise relationships and histograms
# install.packages("GGally") # if necessary
library(GGally)
yx1x2_df <- tibble(y = y, x1 = x1, x2 = x2) # collect the variables in one tibble
ggpairs(yx1x2_df)

Let’s build the predictor model using the generalized linear model (GLM) rather than ordinary least squares (OLS). GLM assumes that the indicator variable is binomially distributed, like a (loaded) coin toss; OLS assumes that the outcomes are normally distributed.

fit_logistic <- glm(y ~ x1 + x2, data = yx1x2_df, family = "binomial")
summary(fit_logistic)
## 
## Call:
## glm(formula = y ~ x1 + x2, family = "binomial", data = yx1x2_df)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.81788  -0.31398   0.07909   0.43924   2.25308  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   1.1599     0.1198   9.684   <2e-16 ***
## x1            2.0469     0.1627  12.583   <2e-16 ***
## x2            3.1007     0.2118  14.638   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1341.0  on 999  degrees of freedom
## Residual deviance:  594.7  on 997  degrees of freedom
## AIC: 600.7
## 
## Number of Fisher Scoring iterations: 6
#fit_ols <- lm( y~x1+x2,data = yx1x2_df)
#summary(fit_ols)

Let’s attempt an interpretation of the coefficients of the logistic regression. To do that we must first discuss what the odds are.

Enter the discrete decision maker

The logistic-binomial factor model derives from an agent’s discrete (that is, binary) choices under constraint. When modeling risk, we consider unsystematic deviations from systematic expectations. Decision makers have preferences for risk relative to performance, and these preferences can be expressed in terms of von Neumann-Morgenstern utility functions.

Let i index decision makers and j alternatives, so that $x_{ij}$ is a vector of features of alternative j for individual decision maker i, and $s_i$ is a vector of characteristics of decision maker i. Together $z_{ij} = z_{ij}(x_{ij}, s_i)$ describes the features and characteristics for each decision maker i and alternative j.

Each decision maker has an indirect utility $u(z_{ij})$ based on quantities and qualities of chosen goods and services. In turn each decision maker forms expectations $V$ about utility. In this way indirect utility decomposes into

$$u(z_{ij}) = V(z_{ij}) + \varepsilon_{ij},$$

where $\varepsilon_{ij}$ is a mean zero, finite variance random term.

Decision maker i will choose alternative j over k, for $j \neq k$, when $u(z_{ij}) > u(z_{ik})$, so that

$$V(z_{ij}) + \varepsilon_{ij} > V(z_{ik}) + \varepsilon_{ik}.$$

Collecting like terms gives

$$\varepsilon_{ik} - \varepsilon_{ij} < V(z_{ij}) - V(z_{ik}),$$

and similarly with the roles of j and k transposed. The probability that decision maker i will choose alternative j over k is then the cumulative probability $Q_{ij}$

$$Q_{ij} = \Pr[\varepsilon_{ik} - \varepsilon_{ij} < V(z_{ij}) - V(z_{ik})].$$

In the binary case with two alternatives, $j \in \{1, 2\}$, and a single decision maker, there is

$$Q_1 = \Pr[\varepsilon_2 - \varepsilon_1 < V(z_1) - V(z_2)].$$

When indirect utility is linear in $z_{ij}$, there is

$$V(z_{ij}) = \beta^{T} z_{ij}.$$

Further assuming that the $\varepsilon_{ij}$ are independent of $z_{ij}$ and statistically independent of one another, there remains the question of which distributions of $\varepsilon$ give rise to a logistic representation of discrete choice.

Suppose we assume a simple one-parameter version of the Weibull function. In its multi-parameter form this function is widely used in reliability analysis owing to the richness of potential shapes of its distribution function. For a very naive one-parameter version, with Weibull realizations $w_j$ for alternative j, we suppose that $w_j = \alpha + \varepsilon_j$. Then the cumulative distribution function (CDF) is

$$\Pr[\varepsilon_j < \varepsilon] = \exp\left(-e^{-(\alpha + \varepsilon)}\right)$$

Differentiating with respect to ε we get the probability density function (PDF)

$$\Pr[\varepsilon] = e^{-(\alpha + \varepsilon)} \exp\left(-e^{-(\alpha + \varepsilon)}\right)$$

Interpreting deviations $\varepsilon_j$ from anticipated indirect utility $V_j$ as Weibull distributed random variables yields:

  1. Logistic derivation for unanticipated movements in indirect utility (see the sketch after this list)

  2. Sampling properties

  3. Interpretations of odds as marginal utilities of risk due to decision factors
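As a pointer for the first item, here is a brief sketch of the standard McFadden-style result (stated without proof; it is not the full derivation this paper intends to develop). If $\varepsilon_1$ and $\varepsilon_2$ are independent draws from the extreme-value CDF above, their difference $\varepsilon_2 - \varepsilon_1$ follows a logistic distribution, so that

$$Q_1 = \Pr[\varepsilon_2 - \varepsilon_1 < V(z_1) - V(z_2)] = \frac{1}{1 + \exp\left(-(V(z_1) - V(z_2))\right)} = \frac{\exp(V(z_1))}{\exp(V(z_1)) + \exp(V(z_2))}.$$

With linear indirect utility $V(z_j) = \beta^T z_j$, the log odds of choosing alternative 1 over alternative 2 are $\beta^T(z_1 - z_2)$, which is exactly the logistic form fit in the regression above.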

See the McFadden and the Hogg and Craig references below for details.

The odds are …

Everything with odds starts with the concept of probability. Probability in a decision context expresses preferences for risk and return. At one stage in a decision these preferences act as prioritization weights. Such weights can further be modelled in a linear goal programming model.

Let’s say that the probability of success of some event is 0.8. Then the probability of failure is 1 - 0.8 = 0.2. The odds of success are defined as the ratio of the probability of success to the probability of failure.

In our example, the odds of success are 0.8/0.2 = 4. That is to say, the odds of success are 4 to 1. If the probability of success is 0.5, i.e., a 50-50 chance, then the odds of success are 1 to 1.

The transformation from probability to odds is monotonic: the odds increase as the probability increases and vice versa. Probability ranges from 0 to 1. Odds range from 0 to positive infinity. Below is a table of the transformation from probability to odds, where the odds in favor of an event are read as x favorable to 1 unfavorable.

library(knitr)                # for kable tables
probability <- seq(0.01, 0.99, length.out = 10)
odds <- round(probability / (1 - probability), 2)
odds_tbl <- tibble(probability = probability, odds = paste0(odds, " : 1"))
kable(odds_tbl)
probability odds
0.0100000 0.01 : 1
0.1188889 0.13 : 1
0.2277778 0.29 : 1
0.3366667 0.51 : 1
0.4455556 0.8 : 1
0.5544444 1.24 : 1
0.6633333 1.97 : 1
0.7722222 3.39 : 1
0.8811111 7.41 : 1
0.9900000 99 : 1

A plot shows the transformation better.

probability <- seq(0.01, 0.99, length.out = 100)
odds <- round(probability / (1 - probability), 2)
odds_2_tbl <- tibble(probability = probability, odds)
p <- ggplot(odds_2_tbl, aes(x = probability, y = odds)) +
  geom_line(color = "blue", size = 1.25)
ggplotly(p)

The odds curve bends sharply upward once the probability exceeds about 0.75.

Now for the coefficients of the logistic regression. We go back to the definition of the logistic curve that transforms a binomial sequence of 0’s and 1’s into odds. Here again π is the probability of a favorable outcome. The odds relate to the scoring model z through exp(z).

$$\frac{\pi}{1-\pi} = \exp(z).$$

If we take the natural logarithm of both sides we can isolate the coefficients in $z = \beta_0 + \beta_1 x_1 + \beta_2 x_2$:

$$\log\left(\frac{\pi}{1-\pi}\right) = z = \beta_0 + \beta_1 x_1 + \beta_2 x_2.$$

Let’s hold $x_2$ constant; $\beta_0$ doesn’t change in any case, so $\beta_2 x_2$ is also a constant. We are left with the term in $x_1$ by itself, ceteris paribus, all else held equal. The change in the log odds for a unit change in $x_1$ is thus just $\beta_1$.

This means that the coefficients of the logistic regression are expressed on the log odds scale. To recover the odds multiplier we exponentiate the coefficient, so that $\exp(\beta_1) = \exp(2.0469) \approx 7.74$: a one unit increase in $x_1$ multiplies the odds by about 7.74. We are not yet done: to express this as a percentage change in the odds we subtract 1 from the multiplier, so a one unit change in $x_1$ increases the odds by roughly 674%. The $x_1$ variate is quite an influential factor!
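In R we can let the fitted model object do this arithmetic for all of the coefficients at once (using fit_logistic estimated above):

exp(coef(fit_logistic))                  # multiplicative change in the odds per unit change
(exp(coef(fit_logistic)) - 1) * 100      # percentage change in the odds per unit change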

Predict now …

We begin a so-called prediction with some new data we think might occur for the values of x1 and x2. Suppose we think a reasonable scenario is the 0.75 quantile of x1 and the 0.25 quantile of x2.

(data_scenario <- yx1x2_df %>% summarize(x1 = quantile(x1, 0.75), x2 = quantile(x2, 0.25)))
## # A tibble: 1 x 2
##      x1     x2
##   <dbl>  <dbl>
## 1 0.639 -0.709

In specifying the data for the scenario we must use the same variable names that we used to estimate the logistic regression.

(data_scenario$x1_prob <- predict(fit_logistic, newdata = data_scenario, type = "response"))
##         1 
## 0.5665145

This gives us a single forecast of a 57% probability that the favorable event will occur.
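We can also restate this scenario forecast as odds, converting the probability we just computed:

scenario_prob <- as.numeric(data_scenario$x1_prob)    # the predicted probability from above
scenario_odds <- scenario_prob / (1 - scenario_prob)  # convert probability to odds
scenario_odds                                          # roughly 1.3 to 1 in favor of the event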

Let’s visualize this by generating a grid of values for x1 and plotting predicted probabilities against those values.

data_scenario_plot <- yx1x2_df %>% mutate(x1 = seq(from = quantile(x1, 0.50), to = max(x1), length.out = 1000), x2 = mean(x2))
data_scenario_pred <- predict(fit_logistic, newdata = data_scenario_plot , type = "link", se = TRUE)
data_scenario_pred <- cbind(data_scenario_plot, data_scenario_pred)
data_scenario_CI <- within(data_scenario_pred, {
    prob_pred <- plogis(fit)
    LL <- plogis(fit - (1.96 * se.fit))
    UL <- plogis(fit + (1.96 * se.fit))
})
p <- ggplot(data_scenario_CI, aes(x = x1, y = prob_pred)) +
  geom_ribbon(aes(ymin = LL, ymax = UL), alpha = 0.6) +
  geom_line(color = "blue", size = 1.25)
ggplotly(p)

This procedure produces our forecast: a confidence interval of forecasted probabilities of success conditional on levels of x1.

A deposit

The coup de grâce: we next deposit the results of this analysis into a goal programming model.

References

McFadden

Hogg and Craig

Current goal programming review