Logistic Regression


Logistic regression is used to model the probability of an event occurring by estimating its log odds. If we assume a linear relationship between the log odds and the \(j\) independent variables, then we can model the probability \(p\) of the event occurring as:

$$\log (\frac{p}{1-p}) = \beta_0 + \beta_1 x_1 + ... + \beta_j x_j$$

You might notice that the logarithm base is not specified. The base actually doesn't matter -- recall that we can change the base \(b\) to any new base \(k\) if we multiply by the known value \(\log_k b\), so switching bases only rescales the coefficients by a constant. By convention, logistic regression uses the natural logarithm (base \(e\)), which is what we assume in the derivation below. Of course, the choice of base affects the coefficient values and how we interpret them.
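
As a quick sanity check, here is a small Python snippet (standard library only) verifying the change-of-base identity \(\log_k x = \log_b x \cdot \log_k b\) with \(b = e\) and \(k = 10\):

```python
import math

x = 7.0

# log10(x) equals ln(x) scaled by the constant log10(e);
# changing base only rescales the model's coefficients
print(math.log10(x))                     # 0.8451
print(math.log(x) * math.log10(math.e))  # 0.8451
```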

Isolating Probability

If we have estimates for the coefficients, it is easy to solve for \(p\). Note that \(\frac{p}{1-p}\) represents the odds of the event occurring.

$$\begin{aligned} \ln (\frac{p}{1-p}) &= \beta_0 + \beta_1 x_1 + ... + \beta_j x_j \\ \frac{p}{1-p} &= e^{\beta_0 + \beta_1 x_1 + ... + \beta_j x_j} \\ p &= (1-p) e^{\beta_0 + \beta_1 x_1 + ... + \beta_j x_j} \\ p + p e^{\beta_0 + \beta_1 x_1 + ... + \beta_j x_j} &= e^{\beta_0 + \beta_1 x_1 + ... + \beta_j x_j} \\ p (1 + e^{\beta_0 + \beta_1 x_1 + ... + \beta_j x_j}) &= e^{\beta_0 + \beta_1 x_1 + ... + \beta_j x_j} \\ p &= \frac{e^{\beta_0 + \beta_1 x_1 + ... + \beta_j x_j}}{1 + e^{\beta_0 + \beta_1 x_1 + ... + \beta_j x_j}} \end{aligned}$$
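
In code, this closed form is the logistic (sigmoid) function applied to the linear predictor. Below is a minimal NumPy sketch; the helper name `predict_proba` and its argument layout are our own illustrative choices, not any particular library's API:

```python
import numpy as np

def predict_proba(X, beta):
    """Probability of the event for each row of X.

    X    : (n, j) array of independent variables
    beta : length j + 1 vector of coefficients, intercept first
    """
    log_odds = beta[0] + X @ beta[1:]
    # Same closed form as the derivation above: p = e^z / (1 + e^z)
    return np.exp(log_odds) / (1.0 + np.exp(log_odds))
```

Note that numerically stable implementations usually rewrite this as \(1 / (1 + e^{-z})\), or branch on the sign of \(z\), to avoid overflow for large log odds.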


Model Interpretation

We will illustrate interpretation with another example. Jim, the real estate agent, trains a logistic regression model to predict someone's likelihood of making an offer on a house. He keeps his model simple by using two explanatory variables:

  • \(x_1\): the number of times prospective buyers visited the house
  • \(x_2\): the asking price of the house in thousands of dollars

After using a program to crunch the numbers, Jim derives these coefficients for his model:

$$\ln (\frac{p}{1-p}) = -5 + 2 x_1 - 0.002 x_2$$

Jim's model tells us that:

  • for every additional time prospective buyers visit the house, on average the natural log of the odds increases by 2
  • for every additional $1,000 in asking price, on average the natural log of the odds decreases by 0.002

That ... sounds like a mouthful, and it's incredibly hard to follow. We can improve interpretation with one simple trick.

$$\frac{p}{1-p} = e^{-5 + 2 x_1 - 0.002 x_2} = e^{-5} e^{2 x_1} e^{-0.002 x_2}$$

\(e^2\) is about 7.39 and \(e^{-0.002}\) is about 0.998. In other words:

  • for every additional time prospective buyers visit the house, on average the odds of making an offer are multiplied by about 7.39
  • for every additional $1,000 in asking price, on average the odds of making an offer are multiplied by about 0.998

If Jim's client, Sue, visits a house priced at $100,000 one time, then we can estimate the probability of her making an offer using the formula derived above.

$$p = \frac{e^{-5 + 2 - 0.2}}{1 + e^{-5 + 2 - 0.2}} \approx 0.039$$

This tells us that Sue has about a 4% chance of making an offer on the home.
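
Putting these numbers together in Python (a minimal sketch of the arithmetic above, using Jim's coefficients):

```python
import math

def sigmoid(z):
    """Convert log odds z into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

b0, b1, b2 = -5.0, 2.0, -0.002  # Jim's coefficients

# Odds multipliers per unit change in each variable
print(math.exp(b1))  # ~7.39 per additional visit
print(math.exp(b2))  # ~0.998 per additional $1,000 of asking price

# Sue: one visit (x1 = 1) to a $100,000 house (x2 = 100)
p_sue = sigmoid(b0 + b1 * 1 + b2 * 100)
print(p_sue)  # ~0.039
```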


As a Classifier

Despite being a regression model, logistic regression is frequently used for classification. The predicted probability always lies between 0 and 1, so we can choose a threshold and assign each observation to a class based on which side of the threshold its probability falls.

If Jim sets his threshold at 0.5, then he can use his model to predict whether someone will fall under 1) class A: make an offer or 2) class B: not make an offer. In this case, Sue's estimated probability of 0.039 falls below the threshold, so she will be classified as class B.
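
Continuing the sketch above, the threshold rule is a one-liner:

```python
p_sue = 0.039  # Sue's estimated probability from above

threshold = 0.5
label = "class A: make an offer" if p_sue >= threshold else "class B: not make an offer"
print(label)  # class B: not make an offer, since 0.039 < 0.5
```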

We can also extend logistic regression to handle more than 2 classes, making it a multiclass classifier. One way is the one-versus-all (one-vs-rest) approach: we train as many logistic regression models as there are classes, each predicting the log odds of one class against the rest, and at inference time we adopt the class whose model yields the highest probability, as in the sketch below.
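
As one possible realization, here is a minimal one-vs-rest sketch using scikit-learn (assuming it is installed; note that scikit-learn's `LogisticRegression` can also handle multiclass problems directly):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # 3 classes of iris flowers

# One logistic regression model per class, each trained class-vs-rest
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(clf.predict_proba(X[:2]))  # one probability column per class
print(clf.predict(X[:2]))        # class with the highest probability
```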



