Churn Prediction (Synthetic Dataset)

Customer churn, also known as attrition, occurs when a customer stops doing business with a company. Understanding and detecting churn is the first step toward retaining these customers and improving the company's offerings. IBM created a dataset for a fictional telecommunications company, which can be downloaded here. Conveniently, the dataset contains a churn column that indicates whether each customer has churned.

For this problem, you should put yourself in the position of a Data Scientist working at Telco. Your goal is to build a model to predict a customer's likelihood of churn and provide recommendations on how Telco can reduce churn. The solution should include a detailed guide (including code) that makes it possible to replicate the model and recommendations.

In addition to the code and replication guidelines, please also detail:

  • performance metrics and data used to obtain these evaluations
  • recommendations for how Telco can use this model to retain customers
  • findings on the drivers behind churn and how Telco can use these findings to improve their business offerings
  • reasons behind the algorithm you chose and why you are confident that this is the ideal algorithm for this problem
  • ideas for future improvement
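
One way to approach the performance-metrics item above is to report more than accuracy. If churners are a minority of customers, a model that never predicts churn can still score well on accuracy alone. The following is a minimal sketch using only the Python standard library. The function name churn_metrics and the toy labels are hypothetical and invented for illustration; they do not come from the IBM dataset.

```python
from typing import Dict, Sequence


def churn_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, and recall for binary labels (1 = churned)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # churners correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # loyal customers flagged as churners
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # churners the model missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # loyal customers correctly cleared

    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy held-out labels and predictions, made up purely for illustration.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
metrics = churn_metrics(y_true, y_pred)
print(metrics)
```

Whatever metrics are reported, they should be computed on customers the model did not see during training, and the write-up should state which subset of the data that was.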


  • Sample Solution

    Note: sample solutions are intended for demonstration purposes (they are not intended to be very detailed). We encourage anyone attempting these problems to dive deeper to answer the questions outlined above.

  • part 1
  • part 2