Congratulations! You've enrolled in our free data strategy course. Check your email for more info
A cohort is a group of people who share a common characteristic over a certain period of time.
For example, let's look at a group of students. All of these students graduated in 2010. This group of students is a cohort. All of the students graduated in the same year, and this is their commonality.
Cohort analysis is a study that focuses on the activities of a particular cohort. If we were to calculate the average income of these students over the course of a five year period following their graduation, we would be conducting a cohort analysis.
Cohort analysis gets more interesting when we compare cohorts over a period of time. Imagine that there is another cohort of students that graduated in 2011.
Cohort analysis allows us to identify relationships between the characteristics of a population and that population's behavior. Looking at the average income over the five years after graduation in comparison to the income of the 2011 students over the same relative interval allows for a unique apples-to-apples comparison of these groups. In this case, there appears to be a relationship between a student's year of graduation and their income.
Here, we can see that both graduating classes increase in their average income per year. However, by the third year out, the 2011 grads make more on average than their 2010 counterparts (by an increasing margin).
Cohort Analysis for Business
Imagine that instead of graduating students, we were studying your customers. We could group them by how they were originally referred to your business and track how much money they spent over time.
Above, we see that customers referred by the blog deliver strong, consistent long-term spending. Search engines and other channels, however, refer customers who spend a decreasing amount over time.
Perhaps the most popular cohort analysis is one that groups customers based on their "join date," or the date when they made their first purchase. Studying the spending trends of cohorts from different periods in time can indicate if the quality of the average customer being acquired is increasing or decreasing in over time.
In the chart above, the average customer in newer cohorts is spending less as time goes on. This would be a red flag for many investors or acquirers because it implies that the value of recently-acquired customers is less than those acquired in the past.
Perform Your Own Cohort Analysis
Tip: Most professionals use tools like Stitch to consolidate their data for cohort analysis.
Step 1: Pull the Raw Data
Typically, the data required to conduct cohort analysis lives inside of a database of some kind and needs to be exported into spreadsheet software. In this example, we use MySQL and Microsoft Excel.
If you're studying customer purchase behavior, you want to end up with a table of data that includes one record per customer purchase. Each record contains the customer's ID (typically either a unique number or an e-mail address), the date and time of the purchase, the amount of the purchase, and the customer's "cohort date" (this is typically the date of the customer's first purchase). In a typical "orders" database table, the MySQL query to pull such information might look something like this:
Ideally, however, you would want to include additional attributes such as the customer's referral source, the first product they purchased, geographic and demographic information, and more. The more information about the customer you have, the more ways you'll be able to segment your cohorts. However, each of these additional attributes may require additional database joins. Tools like Stitch make these attributes all accessible in the same database for you automatically.
Step 2: Create Cohort Identifiers
Open the data you've pulled into Excel. Since we pulled the "cohort date" attribute in the example above, we'll conduct the popular cohort analysis in which we compare groups of customers based on when they made their first purchase. Assuming we want to group our cohorts based on the month in which they made their first purchase, we'll need to translate each "cohort date" value into a "bucket" that represents the year and month of their first purchase. Assuming cohort date is in "Column D," the following Excel formula does the trick:
=YEAR(D2) & "-" & MONTH(D2)
Step 3: Calculate Lifecycle Stages
Once we know the cohort that each customer belongs to, we also need to determine the "lifecycle stage" at which each event happened for that cohort member. For example, if a customer made their first purchase on January 10th, 2012, and their second purchase on March 15th, 2012, they would be in the "January 2012" cohort, their first purchase would be in the "Month 1" lifecycle stage, and their second purchase would be in their "Month 3" lifecycle stage, because it happened in their third month after becoming a customer. To calculate lifecycle stage, we'll need to determine the amount of time between the customer's first purchase and the purchase in question. Assuming transaction date is in "Column C" and cohort date is in "Column D," a function like the one below will do the trick:
When you're done, you should have a table in Excel that looks like the one below.
Step 4: Create a Pivot Table and Graph
Pivot tables allow you to calculate an aggregation such as a sum or average across multiple dimensions of your data. The pivot table we'd like to create here is one that conducts a SUM of transaction amount, shows one row per cohort and one column per relative time period. Its data can be visualized on a basic Excel line graph.
There you have it: an extremely basic cohort analysis built from the ground up. There are hundreds of variations on cohort analysis that you can run based on your needs, a few of which are described below.
Bonus Step: Data Perspectives
The chart we've created is a cohort analysis, but it isn't very easy to interpret in this format. Another way to look at this chart would be to view each cohort's spending as a cumulative value over time. This will effectively build a curve that allows you to watch total customer lifetime spending grow over time per cohort.
Even more helpful is to normalize this data by the size of the cohort. In order to do this, each data point for a cohort must be divided by the number of members in that cohort. That way, you can view the average value per cohort member side-by-side without a bias from the size of the cohort. To do this, you'll have to create a second pivot table to calculate cohort size and then divide one by the other.
These data perspectives are another area where an automated tool like RJMetrics can be extremely valuable. Data perspectives such as "cumulative average per cohort member" can be applied with just a few clicks.
Try it out using Stitch
Stitch is currently offering a free 14 day trial for visitors of CohortAnalysis.com. In this 14 day period, your historical data will be imported, and you'll be able to build and explore your cohorts in SQL or using the tools of one of our business intelligence partners. Give it a try today!
Run cohort analysis across all your data sources.
14-day free trial | setup in minutes | no ETL scripts necessary