  • What is Logistic Risk Analysis?
  • An expository simulation
  • Enter the discrete decision maker
  • The odds are …
  • Predict now …
  • A deposit
  • References

What is Logistic Risk Analysis?

In many applications we want to predict the odds of an outcome, for example default, product acceptance, machine failure, or fraud. In risk management we can use the predicted odds to quantify the likelihood of a risk event occurring.

This simulation illustrates the idea:

  • Create two risk factors that drive the occurrence of a risk event, for example vendor inability to deliver, borrower inability to pay, or customer refusal to buy.

  • Combine these risk factors into a score

library(tidyverse)            # tibble, dplyr, and ggplot2
library(plotly)               # interactive plots

n <- 1000
set.seed(555)
x1 <- rnorm(n)                # some continuous variables we can call risk factors
x2 <- rnorm(n)
z <- 1 + 2*x1 + 3*x2          # the "risk score" is a linear combination with a bias
xz_df <- tibble(x1 = x1, x2 = x2, z = z)
p <- ggplot(xz_df, aes(x = 1:length(z), y = z)) +
  geom_line(color = "blue", size = 1.00) +
  geom_point(color = "red", size = 1.25)
ggplotly(p)

The odds that an event occurs are defined as the ratio of the probability that the event occurs, π, to the probability that the event does not occur, 1 - π. We observe several vendor, credit, or product acceptance accounts. When an account is not accepted as a vendor, or a customer defaults, or a product is refused, we assign a 1; otherwise we assign a 0. The observed odds end up being the count of 1's divided by the count of 0's across accounts.
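As a small sketch of this counting, with a made-up indicator vector standing in for observed accounts:

# hypothetical outcomes across ten accounts: 1 = event occurred, 0 = it did not
outcomes <- c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0)
p_hat <- mean(outcomes)                              # empirical probability of the event
odds_hat <- sum(outcomes == 1) / sum(outcomes == 0)  # count of 1's over count of 0's
c(probability = p_hat, odds = odds_hat)              # odds_hat equals p_hat / (1 - p_hat)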

We hypothesize that several factors might help systematically explain the choices behind, for example, default, acceptance, or refusal. Suppose all of the reasons why a default, acceptance, or refusal might occur are summarized in a score we will call z. To examine the hypothesis that z explains behavior, we set the behaviorally observed odds equal to a predictor model:

$$\frac{\pi}{1-\pi} = \exp(z),$$

and solve for π to get

$$\pi = \frac{1}{1 + \exp(-z)}.$$

This is the traditional statistical derivation behind the use of a logistic representation of a decision maker's choice. This paper further asks how a discrete choice derivation of the logistic representation can improve our interpretation of the statistics of a decision maker's binary choice.
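As a quick numerical check of this inversion in R (the score values below are arbitrary):

z_check <- c(-2, 0, 1.5)                # arbitrary score values
p_check <- 1/(1 + exp(-z_check))        # probability implied by each score
odds_check <- p_check / (1 - p_check)   # recover the odds from the probability
all.equal(odds_check, exp(z_check))     # TRUE: the odds equal exp(z)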

An expository simulation

Let’s translate the mathematical expression for the logistic probability π directly into R. Then let’s simulate Bernoulli coin-toss trials. A Bernoulli coin flip is a single trial in which the event either occurs (e.g., “heads = default”, indicated by a 1) or does not occur (indicated by a 0). We then plot the results.

prob <- 1/(1 + exp(-z))       # pass the score through the inverse-logit function:
                              # the probability that the event (a 1) occurs
y <- rbinom(n, 1, prob)       # Bernoulli (0/1) response variable
y_df <- tibble(y = y, z = z, prob = prob)
p <- ggplot(y_df, aes(x = z, y = prob )) +
  geom_line(color = "blue", size = 1.00) +
  geom_point(color = "red", size = 1.25)
ggplotly(p)                # plot logistic curve

The plot of probability versus the z score traces out the logistic S curve, with a cloud of probable occurrences between 0 and 1. The next scatter plot shows the indicators themselves gathered at 1 and 0.

p <- ggplot(y_df, aes(x = z, y = y)) +
  geom_point(color = "blue", size = 1.25)
ggplotly(p)

More insight can be gained by looking at the cross scatter plots and histograms of the event indicator y and the predictor variables x1 and x2.

# Load the GGally library to view pairwise relationships and histograms
# install.packages("GGally") # if necessary
library(GGally)
yx1x2_df <- tibble(y = y, x1 = x1, x2 = x2) # collect the variables in one tibble
ggpairs(yx1x2_df)

Let’s build the predictor model using the generalized linear model (GLM) rather than ordinary least squares (OLS). GLM assumes that the indicator variable is binomially distributed, like a (loaded) coin toss; OLS assumes that the outcomes are normally distributed.

fit_logistic <- glm(y ~ x1 + x2, data = yx1x2_df, family = "binomial")
summary(fit_logistic)
## 
## Call:
## glm(formula = y ~ x1 + x2, family = "binomial", data = yx1x2_df)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.81788  -0.31398   0.07909   0.43924   2.25308  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   1.1599     0.1198   9.684   <2e-16 ***
## x1            2.0469     0.1627  12.583   <2e-16 ***
## x2            3.1007     0.2118  14.638   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1341.0  on 999  degrees of freedom
## Residual deviance:  594.7  on 997  degrees of freedom
## AIC: 600.7
## 
## Number of Fisher Scoring iterations: 6
#fit_ols <- lm( y~x1+x2,data = yx1x2_df)
#summary(fit_ols)

Let’s attempt an interpretation of the coefficients of the logistic regression. To do that we must first discuss what the odds are.

Enter the discrete decision maker

The logistic-binomial factor model derives from an agent’s discrete (that is, binary) choices under constraint. When modeling risk, we consider unsystematic deviations from systematic expectations. Decision makers have preferences for risk relative to performance, and these preferences can be expressed in terms of von Neumann-Morgenstern utility functions.

Let i index decision makers and j alternatives, so that $x_{ij}$ is a vector of features of alternative j for individual decision maker i, and $s_i$ is a vector of characteristics of decision maker i. Together $z_{ij} = z_{ij}(x_{ij}, s_i)$ describes the features and characteristics for each decision maker i and alternative j.

Each decision maker has an indirect utility $u(z_{ij})$ based on quantities and qualities of chosen goods and services. In turn each decision maker forms expectations $V$ about utility. In this way indirect utility decomposes into

$$u(z_{ij}) = V(z_{ij}) + \varepsilon_{ij},$$

where $\varepsilon_{ij}$ is a mean zero, finite variance random term.

Decision maker i will choose alternative j over k, for $j \neq k$, when $u(z_{ij}) > u(z_{ik})$, so that

$$V(z_{ij}) + \varepsilon_{ij} > V(z_{ik}) + \varepsilon_{ik}.$$

Collecting like terms gives

$$\varepsilon_{ik} - \varepsilon_{ij} < V(z_{ij}) - V(z_{ik}),$$

and similarly with the roles of j and k transposed. The probability that decision maker i will choose alternative j over k is then the cumulative probability $Q_{ij}$

$$Q_{ij} = \Pr[\varepsilon_{ik} - \varepsilon_{ij} < V(z_{ij}) - V(z_{ik})].$$

In the binary case with two alternatives, $j \in \{1, 2\}$, and a single decision maker, there is

$$Q_1 = \Pr[\varepsilon_2 - \varepsilon_1 < V(z_1) - V(z_2)].$$

When indirect utility is linear in $z_{ij}$, there is

$$V(z_{ij}) = \beta^{T} z_{ij}.$$

Further assuming that the $\varepsilon_{ij}$ are independent of $z_{ij}$ and statistically independent of one another, there remains the question of which distributions of $\varepsilon$ give rise to a logistic representation of discrete choice.

Suppose we assume a simple one-parameter version of the Weibull function. In its multi-parameter form this function is widely used in reliability analysis owing to the richness of potential shapes of its distribution function. For a very naive one-parameter version, with Weibull realizations $w_j$ for alternative j, we suppose that $w_j = \alpha + \varepsilon_j$. Then the cumulative distribution function (CDF) is

$$\Pr[\varepsilon_j < \varepsilon] = \exp\left(-e^{-(\alpha + \varepsilon)}\right)$$

Differentiating with respect to ε we get the probability density function (PDF)

$$\Pr[\varepsilon] = e^{-(\alpha + \varepsilon)} \exp\left(-e^{-(\alpha + \varepsilon)}\right)$$

Interpreting deviations $\varepsilon_j$ from anticipated indirect utility $V_j$ as Weibull distributed random variables yields:

  1. Logistic derivation for unanticipated movements in indirect utility (see the sketch after this list)

  2. Sampling properties

  3. Interpretations of odds as marginal utilities of risk due to decision factors
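As a pointer for the first item, here is a brief sketch of the standard McFadden-style result (stated without proof; it is not the full derivation this paper intends to develop). If $\varepsilon_1$ and $\varepsilon_2$ are independent draws from the extreme-value CDF above, their difference $\varepsilon_2 - \varepsilon_1$ follows a logistic distribution, so that

$$Q_1 = \Pr[\varepsilon_2 - \varepsilon_1 < V(z_1) - V(z_2)] = \frac{1}{1 + \exp\left(-(V(z_1) - V(z_2))\right)} = \frac{\exp(V(z_1))}{\exp(V(z_1)) + \exp(V(z_2))}.$$

With linear indirect utility $V(z_j) = \beta^T z_j$, the log odds of choosing alternative 1 over alternative 2 are $\beta^T(z_1 - z_2)$, which is exactly the logistic form fit in the regression above.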

See the McFadden and the Hogg and Craig references below for details.

The odds are …

Everything with odds starts with the concept of probability. Probability in a decision context expresses preferences for risk and return. At one stage in a decision these preferences act as prioritization weights. Such weights can further be modelled in a linear goal programming model.

Let’s say that the probability of success of some event is 0.8. Then the probability of failure is 1 - 0.8 = 0.2. The odds of success are defined as the ratio of the probability of success to the probability of failure.

In our example, the odds of success are 0.8/0.2 = 4. That is to say, the odds of success are 4 to 1. If the probability of success is 0.5, i.e., a 50-50 chance, then the odds of success are 1 to 1.

The transformation from probability to odds is monotonic: the odds increase as the probability increases and vice versa. Probability ranges from 0 to 1. Odds range from 0 to positive infinity. Below is a table of the transformation from probability to odds, where the odds in favor of an event are read as x favorable to 1 unfavorable.

library(knitr)                # for kable tables
probability <- seq(0.01, 0.99, length.out = 10)
odds <- round(probability / (1 - probability), 2)
odds_tbl <- tibble(probability = probability, odds = paste0(odds, " : 1"))
kable(odds_tbl)
probability odds
0.0100000 0.01 : 1
0.1188889 0.13 : 1
0.2277778 0.29 : 1
0.3366667 0.51 : 1
0.4455556 0.8 : 1
0.5544444 1.24 : 1
0.6633333 1.97 : 1
0.7722222 3.39 : 1
0.8811111 7.41 : 1
0.9900000 99 : 1

A plot shows the transformation better.

probability <- seq(0.01, 0.99, length.out = 100)
odds <- round(probability / (1 - probability), 2)
odds_2_tbl <- tibble(probability = probability, odds)
p <- ggplot(odds_2_tbl, aes(x = probability, y = odds)) +
  geom_line(color = "blue", size = 1.25)
ggplotly(p)

The odds curve bends sharply upward once the probability exceeds about 0.75.

Now for the coefficients of the logistic regression. We go back to the definition of the logistic curve that transforms a binomial sequence of 0’s and 1’s into odds. Here again π is the probability of a favorable outcome. The odds relate to the scoring model z through exp(z).

$$\frac{\pi}{1-\pi} = \exp(z).$$

If we take the natural logarithm of both sides we can isolate the coefficients in $z = \beta_0 + \beta_1 x_1 + \beta_2 x_2$:

$$\log\left(\frac{\pi}{1-\pi}\right) = z = \beta_0 + \beta_1 x_1 + \beta_2 x_2.$$

Let’s hold $x_2$ constant; $\beta_0$ doesn’t change in any case, so $\beta_2 x_2$ is also a constant. We are left with the term in $x_1$ by itself, ceteris paribus, all else held equal. The change in the log odds for a unit change in $x_1$ is thus just $\beta_1$.

This means that the coefficients of the logistic regression are expressed on the log odds scale. To recover the odds multiplier we exponentiate the coefficient, so that $\exp(\beta_1) = \exp(2.0469) \approx 7.74$: a one unit increase in $x_1$ multiplies the odds by about 7.74. We are not yet done: to express this as a percentage change in the odds we subtract 1 from the multiplier, so a one unit change in $x_1$ increases the odds by roughly 674%. The $x_1$ variate is quite an influential factor!
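In R we can let the fitted model object do this arithmetic for all of the coefficients at once (using fit_logistic estimated above):

exp(coef(fit_logistic))                  # multiplicative change in the odds per unit change
(exp(coef(fit_logistic)) - 1) * 100      # percentage change in the odds per unit change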

Predict now …

We begin a so-called prediction with some new data we think might occur for the values of x1 and x2. Suppose we think a reasonable scenario is the 0.75 quantile of x1 and the 0.25 quantile of x2.

(data_scenario <- yx1x2_df %>% summarize(x1 = quantile(x1, 0.75), x2 = quantile(x2, 0.25)))
## # A tibble: 1 x 2
##      x1     x2
##   <dbl>  <dbl>
## 1 0.639 -0.709

In specifying the data for the scenario we must use the same variable names that we used to estimate the logistic regression.

(data_scenario$x1_prob <- predict(fit_logistic, newdata = data_scenario, type = "response"))
##         1 
## 0.5665145

This gives us a single forecast of a 57% probability that the favorable event will occur.
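We can also restate this scenario forecast as odds, converting the probability we just computed:

scenario_prob <- as.numeric(data_scenario$x1_prob)    # the predicted probability from above
scenario_odds <- scenario_prob / (1 - scenario_prob)  # convert probability to odds
scenario_odds                                          # roughly 1.3 to 1 in favor of the event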

Let’s visualize this by generating a grid of values for x1 and plotting predicted probabilities against those values.

data_scenario_plot <- yx1x2_df %>% mutate(x1 = seq(from = quantile(x1, 0.50), to = max(x1), length.out = 1000), x2 = mean(x2))
data_scenario_pred <- predict(fit_logistic, newdata = data_scenario_plot , type = "link", se = TRUE)
data_scenario_pred <- cbind(data_scenario_plot, data_scenario_pred)
data_scenario_CI <- within(data_scenario_pred, {
    prob_pred <- plogis(fit)
    LL <- plogis(fit - (1.96 * se.fit))
    UL <- plogis(fit + (1.96 * se.fit))
})
p <- ggplot(data_scenario_CI, aes(x = x1, y = prob_pred)) +
  geom_ribbon(aes(ymin = LL, ymax = UL), alpha = 0.6) +
  geom_line(color = "blue", size = 1.25)
ggplotly(p)

This procedure produces our forecast: a confidence interval of forecasted probabilities of success conditional on levels of x1.

A deposit

The coup de grâce: we next deposit the results of this analysis into a goal programming model.

References

McFadden

Hogg and Craig

Current goal programming review