Building your Skills and Portfolio in Data Science

Photo by Danielle MacInnes on Unsplash

Knowing which type of Data Scientist you want to be

There are many different types of Data Science roles. The first step is to determine which role(s) are well suited to your interests and skill set. While all of these roles can be grouped under Data Scientist, some may appear under other job titles.

The Analytics Scientist

For a lot of companies, Data Scientists are synonymous with Data Analysts. They perform ad-hoc analysis through SQL or Excel and use business intelligence tools to produce dashboards and visualizations for company reporting. These reporting duties can range from aggregating metrics on company performance to running A/B tests to determine product direction.

While not the flashiest job, these Data Scientists have a strong footing in the business-product world and the opportunity to inform important decisions for the company. These roles are particularly well suited for candidates transitioning from a less technical background.

Tip: Get comfortable with communicating to non-technical stakeholders and drawing insight from data. A deeper understanding of mathematics/Machine Learning/programming is nice to have, but not imperative to this role.

The Developer Scientist

On the other hand, there are Data Scientists whose duties are more aligned with Data Engineers and Software Engineers. These Data Scientists are responsible for building and maintaining the data infrastructure to support the rest of the company. Their responsibilities include monitoring the data pipelines, improving the data warehouse and maintaining API endpoints for serving model predictions.

Typically, these Data Scientists are the first hires for budding data teams. Strong programming skills are more important for this role than being an expert in mathematics and Machine Learning.

Tip: Get familiar with dev-ops and data-ops practices. The hiring process for these roles is very similar to the one for Software Engineers: practice LeetCode problems and algorithms.

The Machine Learning Scientist

Companies with more established data teams, or companies that incorporate Machine Learning as a part of their core product, will hire scientists for the sole purpose of maintaining, improving and building new models and AI systems. These positions typically require a graduate degree and/or prior research experience in Machine Learning.

For smaller companies whose core product is based in Machine Learning, these scientists are also expected to write code in production.

Tip: It is very hard to get these roles without the required education and/or research background. To get your foot in the door, look for Machine Learning roles in smaller companies, which tend to be more lenient about these criteria.

Breaking down Skills by Role

In addition to these three types of Data Scientists, some companies look exclusively for Data Generalists: people who are familiar with all of the above. Data Generalists are typically hired to provide general support to other company functions by building dashboards, maintaining the data warehouse and occasionally building models to improve operational processes.

The required skills are very different based on which of these three roles you pursue.

Analytics Data Scientist

  • Data Analysis
  • Data Visualization
  • Recommended Project: analyze a dataset by relating it to a tangible business problem, providing visualizations and thorough explanations of causal effects and how these findings can be leveraged.

    Recommended Course(s): Applied Data Science with Python Specialization

    This 5-course specialization covers inferential statistical analysis, practical data visualizations and how to use graphs and networks to visualize and analyze data.

    The Developer Scientist

  • Software Engineering skills
  • Working with data warehouses and building data pipelines
  • Recommended Project: make data more palatable by writing API wrappers or set up your own workflow management platform to orchestrate simple jobs and processes.
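As a rough illustration of the "API wrapper" idea in the project suggestion above, here is a minimal sketch that turns a nested JSON payload into flat, analysis-ready rows. Everything specific in it is hypothetical: the `WeatherClient` class, the `https://api.example.com` base URL, the `/v1/forecast` path and the payload field names are invented for the example and do not refer to a real service. The transport function is injectable so the wrapper can be exercised offline.

```python
import json
from typing import Callable, Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen


def http_get_json(url: str) -> dict:
    """Default transport: fetch a URL and decode its JSON body."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


class WeatherClient:
    """Hypothetical wrapper; the URL, path and field names are assumptions."""

    def __init__(
        self,
        base_url: str = "https://api.example.com",
        fetch: Callable[[str], dict] = http_get_json,
    ) -> None:
        self.base_url = base_url.rstrip("/")
        self.fetch = fetch  # injectable, so tests never touch the network

    def daily_temperatures(self, city: str, days: int = 7) -> List[Dict[str, object]]:
        """Return one flat row per day instead of the nested raw payload."""
        query = urlencode({"city": city, "days": days})
        payload = self.fetch(f"{self.base_url}/v1/forecast?{query}")
        return [
            {"city": city, "date": item["date"], "temp_c": float(item["temp"]["c"])}
            for item in payload.get("daily", [])
        ]


def fake_fetch(url: str) -> dict:
    """Stand-in for the network call, returning a made-up payload."""
    return {"daily": [{"date": "2024-01-01", "temp": {"c": "3.5"}}]}


client = WeatherClient(fetch=fake_fetch)
rows = client.daily_temperatures("Oslo", days=1)
print(rows)
```

Passing the transport in as a parameter is one possible design choice: it keeps the parsing logic testable and lets a README show a working example without credentials.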

    Recommended Course(s): Specialization: Python for Everybody

    This 5-course specialization introduces fundamental programming concepts entirely in Python. The capstone project involves retrieving, processing and visualizing data using Python.

    Machine Learning Scientist

  • Statistics, Calculus and Linear Algebra
  • Deep understanding of state of the art algorithms and Machine Learning methodologies
  • Recommended Project: dissect and reproduce results from research papers in your field of interest.

    Recommended Course(s): Machine Learning Specialization

    Checklist for General Data Science Skills

    In addition to these specialized skills, here's a common checklist for all aspiring Data Scientists, regardless of which role they want to pursue.

    Familiarity with at least one programming language

    Python is the de facto language for Data Science at the moment, although R remains a popular choice for more analytics and statistics heavy roles.

    Experience working with Data

    Regardless of the role, expect to get up close and personal with data. Data in the real world is messy and prone to errors, which makes data wrangling an important skill to have.
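To make "data wrangling" concrete, here is a small sketch using only the Python standard library. The sample records are invented for illustration; the cleaning rules (trim whitespace, normalize capitalization, treat unparseable ages as missing, drop exact duplicates) are just one reasonable set of assumptions, not a universal recipe.

```python
import csv
import io
from typing import Dict, List, Optional

# Made-up messy input: stray spaces, inconsistent case, a blank age,
# a duplicate record and a non-numeric age.
RAW_CSV = """name,city,age
 Alice ,new york,34
BOB,New York,
alice,New York,34
Carol,  boston,forty
"""


def parse_age(value: str) -> Optional[int]:
    """Return the age as an int, or None when it is blank or not a number."""
    value = value.strip()
    return int(value) if value.isdigit() else None


def clean_rows(raw: str) -> List[Dict[str, object]]:
    """Normalize each record and keep only the first copy of duplicates."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw)):
        record = {
            "name": row["name"].strip().title(),
            "city": row["city"].strip().title(),
            "age": parse_age(row["age"]),
        }
        key = (record["name"], record["city"], record["age"])
        if key in seen:
            continue  # same person already recorded after normalization
        seen.add(key)
        cleaned.append(record)
    return cleaned


rows = clean_rows(RAW_CSV)
for row in rows:
    print(row)
```

In practice a library such as pandas would usually handle this, but the steps are the same: normalize, validate, mark missing values and deduplicate.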

    Business Acumen and Communication Skills

    These skills will come in handy regardless of how theoretical or research-heavy your job is. Understanding the revenue drivers of the company and the Key Performance Indicators (KPIs) of each stakeholder helps Data Scientists stand out in the company and deliver value.

    Mathematics and Statistics

    You don't need to be a linear algebra or multivariate calculus expert for most Data Science roles. You should, however, have at least a basic grasp of these areas to understand commonly used algorithms and how to interpret their outputs.

    A Scientist's Curiosity for Uncovering the Truth

    Never settle for assumptions. Don't let your own biases and the biases of others influence the results of your work. Build systems on ground truths and always set proper expectations on what these systems can accomplish.

    Portfolio Building

    If you have never worked in a data-related role before, prioritize side projects and contributions to highlight your skills, technical aptitude and passion for this field. Don't forget to emphasize any relevant skills that you demonstrated in previous roles, even if these roles are unrelated.

    For each project description, don't forget to include:

  • a brief, human-readable description
  • the tech stack used to complete this project (programming languages, libraries and frameworks) and/or the algorithms being applied
  • a summary of findings with the most important metrics
  • (if applicable) a link to explore the project in more detail

    For one of our practice guides, we used text reviews to predict wine sentiment. Here's a sample description of what this project might look like on a resume.

    Predicting Wine Sentiment using Text Reviews [link to Jupyter Notebook]

    Tech stack: Python, Keras, Pandas

  • trained a Recurrent Neural Network with GRU units over GloVe Embeddings to predict wine sentiment using text reviews
  • achieved XYZ% accuracy and XYZ% F1-score on the test set across 5 classes
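For readers unsure how the two metrics in the sample description are computed for a multi-class problem, here is a minimal pure-Python sketch. The labels below are toy values invented for the example, not results from the wine project, and macro averaging (an unweighted mean of per-class F1) is an assumption, since the description does not say which averaging was used.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], labels: Sequence[int]) -> float:
    """Unweighted mean of per-class F1, where F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class that never appears in truth or predictions scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with 5 classes (made-up labels for illustration only).
y_true = [1, 2, 3, 4, 5, 5]
y_pred = [1, 2, 3, 4, 5, 4]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, labels=[1, 2, 3, 4, 5])
print(f"accuracy={acc:.3f} macro_f1={f1:.3f}")
```

Whichever averaging you report, state it next to the number so readers can compare it with other results.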

    The easiest way to host projects is on GitHub or GitLab. Make sure that there is a detailed README to inform readers about the contents of the project with instructions on how they can reproduce the results.

    For Python projects, this should include:

  • a requirements.txt file with all of the necessary dependencies
  • the Python version and OS used to develop the project (better yet: dockerize the project with build instructions)
  • any necessary setup commands / auxiliary dependencies
  • expected results and high-level findings
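One low-effort way to capture the Python version and OS for a README is to print them from the development machine. The sketch below uses only the standard-library `platform` module; the `environment_summary` helper name and the output layout are arbitrary choices made for this example.

```python
import platform


def environment_summary() -> str:
    """Build a short, README-friendly description of the current environment."""
    lines = [
        f"Python: {platform.python_version()}",
        f"Implementation: {platform.python_implementation()}",
        f"OS: {platform.system()} {platform.release()}",
        f"Machine: {platform.machine()}",
    ]
    return "\n".join(lines)


summary = environment_summary()
print(summary)
```

The dependency list itself is commonly generated with `pip freeze > requirements.txt`, run inside the project's virtual environment so that unrelated packages are not included.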

    GitHub also offers free web hosting through GitHub Pages.
