Cohort analysis

A cohort is a group of people who share a common characteristic over a certain period of time.

For example, let's look at a group of students. All of these students graduated in 2010. This group of students is a cohort. All of the students graduated in the same year, and this is their commonality.

Class of 2010

Cohort analysis is a study that focuses on the activities of a particular cohort. If we were to calculate the average income of these students over the course of a five-year period following their graduation, we would be conducting a cohort analysis.

Class of 2010 average income

Cohort analysis gets more interesting when we compare cohorts over a period of time. Imagine another cohort of students who graduated in 2011.

Two cohorts

Cohort analysis allows us to identify relationships between the characteristics of a population and that population's behavior. Comparing the average income of the 2010 students over the five years after graduation with the income of the 2011 students over the same interval allows for an apples-to-apples comparison of these groups. In this case, there appears to be a relationship between a student's year of graduation and their income.

Average income of cohorts compared

Here, we can see that both graduating classes increase in their average income per year. However, by the third year out, the 2011 grads make more on average than their 2010 counterparts (by an increasing margin).

Cohort analysis for business

Imagine that instead of graduating students, we were studying your customers. We could group them by how they were originally referred to your business and track how much money they spent over time.

Customer spending by referral source

Here we see that customers referred by the blog deliver strong, consistent long-term spending. Search engines and other channels, however, refer customers who spend a decreasing amount over time.

Want to learn about setting the data strategy for your organization?

Sign up for a free 30-day course to learn how to succeed with data. We've helped more than 3,000 companies of all sizes build their data infrastructure, run analytics, and make data-driven decisions. Learn how the data landscape has changed and what that means for your company.

Get the Course →

Perhaps the most popular cohort analysis is one that groups customers based on their "join date," or the date when they made their first purchase. Studying the spending trends of cohorts from different periods in time can indicate whether the quality of the average customer being acquired is increasing or decreasing over time.

Average cumulative spending

In the chart above, the average customer in newer cohorts is spending less as time goes on. This would be a red flag for many investors or acquirers because it implies that the value of recently acquired customers is less than that of those acquired in the past.

Perform your own cohort analysis

Tip: Tools like Stitch can consolidate your data for cohort analysis.

Step 1: Pull the raw data

Typically, the data required to conduct cohort analysis lives inside a database of some kind and needs to be exported into spreadsheet software. In this example, we use MySQL and Microsoft Excel.

If you're studying customer purchase behavior, you want to end up with a table of data that includes one record per customer purchase. Each record contains the customer's ID (typically either a unique number or an email address), the date and time of the purchase, the amount of the purchase, and the customer's "cohort date" (typically the date of the customer's first purchase). In a typical "orders" database table, a MySQL query can pull this information by selecting each order's customer ID, date, and amount and joining it to the date of that customer's earliest order.
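A minimal sketch of such a query, assuming a hypothetical `orders` table with columns `customer_id`, `order_date`, and `amount` (these names and the sample rows are invented for illustration). It runs against an in-memory SQLite database through Python's standard `sqlite3` module so that it is self-contained; the SELECT statement uses plain SQL that should need little or no change for MySQL.

```python
import sqlite3

# Hypothetical schema and sample rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2012-01-10 09:30:00", 20.0),
        (1, "2012-03-15 14:00:00", 35.0),
        (2, "2012-02-02 11:15:00", 50.0),
    ],
)

# One row per purchase, plus the customer's earliest purchase as cohort_date.
QUERY = """
SELECT o.customer_id,
       o.order_date,
       o.amount,
       f.cohort_date
FROM orders AS o
JOIN (
    SELECT customer_id, MIN(order_date) AS cohort_date
    FROM orders
    GROUP BY customer_id
) AS f ON f.customer_id = o.customer_id
ORDER BY o.customer_id, o.order_date
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The subquery finds each customer's first order date once, and the join attaches it to every order from that customer.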

Ideally, however, you would want to include additional attributes such as the customer's referral source, the first product they purchased, geographic and demographic information, and more. The more information about the customer you have, the more ways you'll be able to segment your cohorts. Each of these additional attributes may require additional database joins. Tools like Stitch make all attributes accessible in the same database for you automatically.

Step 2: Create cohort identifiers

Open the data you've pulled into Excel. Since we pulled the "cohort date" attribute in the example above, we'll conduct the popular cohort analysis in which we compare groups of customers based on when they made their first purchase. Assuming we want to group our cohorts based on the month in which they made their first purchase, we'll need to translate each "cohort date" value into a "bucket" that represents the year and month of their first purchase. Assuming cohort date is in column D, the following Excel formula does the trick:

=TEXT(D2, "yyyy-mm")
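If you prefer to build the bucket outside of Excel, the same idea can be sketched in Python. The function name `cohort_bucket` is hypothetical; the sketch zero-pads the month so that the labels sort chronologically when treated as text.

```python
from datetime import date


def cohort_bucket(cohort_date: date) -> str:
    """Return a zero-padded year-month label such as '2012-01'."""
    return f"{cohort_date.year:04d}-{cohort_date.month:02d}"


print(cohort_bucket(date(2012, 1, 10)))
```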

Step 3: Calculate lifecycle stages

Once we know the cohort that each customer belongs to, we also need to determine the "lifecycle stage" at which each event happened for that cohort member. For example, if a customer made their first purchase on January 10, 2012, and their second purchase on March 15, 2012, they would be in the "January 2012" cohort, their first purchase would be in the "Month 1" lifecycle stage, and their second purchase would be in their "Month 3" lifecycle stage, because it happened in their third month after becoming a customer. To calculate lifecycle stage, we need to determine the amount of time between the customer's first purchase and the purchase in question. Assuming transaction date is in column C and cohort date is in column D, a function like the one below will do the trick:

=INT((C2-D2)/30)+1
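The same calculation can be sketched in Python. The function name `lifecycle_stage` and the `period_days` parameter are assumptions made for this example. The sketch treats a month as a fixed 30-day period and uses floor division, so a purchase made 0 to 29 days after the first one lands in Month 1.

```python
from datetime import date


def lifecycle_stage(purchase_date: date, cohort_date: date, period_days: int = 30) -> int:
    """Return the 1-based lifecycle stage of a purchase, using fixed-length periods."""
    days_since_first = (purchase_date - cohort_date).days
    return days_since_first // period_days + 1


# The example from the text: first purchase January 10, 2012,
# second purchase March 15, 2012 (65 days later).
print(lifecycle_stage(date(2012, 1, 10), date(2012, 1, 10)))
print(lifecycle_stage(date(2012, 3, 15), date(2012, 1, 10)))
```

With these inputs the first purchase falls in stage 1 and the second in stage 3, matching the "Month 1" and "Month 3" labels described above.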

When you're done, you should have a table in Excel that looks like this.

Excel table

Step 4: Create a pivot table and graph

Pivot tables allow you to calculate an aggregation such as a sum or average across multiple dimensions of your data. The pivot table we'd like to create sums the transaction amount and shows one row per cohort and one column per relative time period (the lifecycle stage). Its data can be visualized on a basic Excel line graph.
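To make the aggregation concrete, here is a rough Python sketch of what such a pivot table computes. The `transactions` rows and the `pivot_sum` helper are hypothetical and exist only for illustration; Excel does the equivalent work for you.

```python
from collections import defaultdict

# Hypothetical rows: (cohort label, lifecycle stage, transaction amount).
transactions = [
    ("2012-01", 1, 20.0),
    ("2012-01", 3, 35.0),
    ("2012-01", 1, 10.0),
    ("2012-02", 1, 50.0),
]


def pivot_sum(rows):
    """Sum amounts into a nested mapping: cohort -> lifecycle stage -> total."""
    table = defaultdict(lambda: defaultdict(float))
    for cohort, stage, amount in rows:
        table[cohort][stage] += amount
    # Convert to plain dicts so the result is easy to print and compare.
    return {cohort: dict(stages) for cohort, stages in table.items()}


pivot = pivot_sum(transactions)
for cohort in sorted(pivot):
    print(cohort, pivot[cohort])
```

Each outer key corresponds to a pivot table row (a cohort) and each inner key to a column (a lifecycle stage).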

Excel graph

There you have it: an extremely basic cohort analysis built from the ground up. There are hundreds of variations on cohort analysis that you can run based on your needs.

Bonus step: data perspectives

The chart we created is a cohort analysis, but it isn't easy to interpret in this format. Another way to look at this chart would be to view each cohort's spending as a cumulative value over time. This effectively builds a curve that allows you to watch total customer lifetime spending grow over time per cohort.
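As a sketch of the cumulative view, assuming one cohort's per-stage totals are held in a dictionary keyed by lifecycle stage (the `cumulative_by_stage` helper and the sample figures are hypothetical):

```python
from itertools import accumulate


def cumulative_by_stage(stage_totals, max_stage):
    """Return running totals for stages 1..max_stage, treating missing stages as 0."""
    per_stage = [stage_totals.get(stage, 0.0) for stage in range(1, max_stage + 1)]
    return list(accumulate(per_stage))


# Hypothetical cohort with spending in stages 1 and 3 only.
print(cumulative_by_stage({1: 30.0, 3: 35.0}, max_stage=4))
```

Stages with no purchases contribute zero, so the curve stays flat there instead of dropping.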

Even more helpful is to normalize this data by the size of the cohort: divide each data point for a cohort by the number of members in that cohort. That way, you can view the average value per cohort member side by side without a bias from the size of the cohort. In Excel, this means creating a second pivot table that calculates cohort size and then dividing one table by the other.
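The normalization can be sketched in Python as follows. The `purchases` rows and the `average_per_member` helper are hypothetical; the sketch counts cohort size as the number of distinct customer IDs seen in each cohort.

```python
from collections import defaultdict

# Hypothetical rows: (customer id, cohort label, lifecycle stage, amount).
purchases = [
    (1, "2012-01", 1, 20.0),
    (1, "2012-01", 3, 35.0),
    (3, "2012-01", 1, 10.0),
    (2, "2012-02", 1, 50.0),
]


def average_per_member(rows):
    """Divide each cohort's per-stage total by the number of distinct customers in it."""
    totals = defaultdict(lambda: defaultdict(float))
    members = defaultdict(set)
    for customer_id, cohort, stage, amount in rows:
        totals[cohort][stage] += amount
        members[cohort].add(customer_id)
    return {
        cohort: {stage: total / len(members[cohort]) for stage, total in stages.items()}
        for cohort, stages in totals.items()
    }


averages = average_per_member(purchases)
for cohort in sorted(averages):
    print(cohort, averages[cohort])
```

Dividing by the full cohort size, not by the number of customers who bought in a given stage, keeps inactive members in the denominator, which is what makes cohorts of different sizes comparable.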

Try it out using Stitch

Stitch offers a free 14-day trial, during which you can import your historical data to a data warehouse and build and explore your cohorts in SQL or using a business intelligence tool. Give it a try today!