Productionizing A Marketing Mix Model

Apr 22, 2021

Photo by Stephen Phillips - Hostreviews.co.uk on Unsplash

Introduction

In this post, I discuss marketing mix models, a PyStan implementation of said models, and how I am helping to productionize one by automating the collection of economic data from Statistics Canada. Take a look at what I’ve done here :)

Marketing mix models (MMM) are used by advertisers to understand how their advertising spending affects a certain KPI, for example, sales or revenue. This allows them to allocate their future media budget more effectively. To this end, Return on Ad Spend (ROAS) and marginal Return on Ad Spend (mROAS) are the most important metrics. If a certain media channel, say TV advertising, has a high ROAS, then the money spent on that channel has generated a lot of sales per dollar. In contrast, mROAS measures the incremental sales that result from an incremental increase in ad spend on that channel, which is the more relevant number when deciding where to add budget.
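To make the distinction concrete, here is a minimal sketch. The response curve `sales_from_spend` and its parameter values are hypothetical and are not taken from the model discussed in this post. The sketch assumes ROAS is measured against zero spend and mROAS against a 1% increase in spend.

```python
def sales_from_spend(spend):
    """Hypothetical saturating response curve, for illustration only."""
    half_saturation = 100.0  # assumed spend at which half of max sales is reached
    max_sales = 1000.0       # assumed ceiling on sales driven by this channel
    return max_sales * spend / (spend + half_saturation)


def roas(spend):
    """Average return: sales attributable to the channel per dollar spent."""
    return (sales_from_spend(spend) - sales_from_spend(0.0)) / spend


def mroas(spend, bump=0.01):
    """Marginal return: extra sales per extra dollar after a small spend increase."""
    extra_spend = bump * spend
    return (sales_from_spend(spend + extra_spend) - sales_from_spend(spend)) / extra_spend


if __name__ == "__main__":
    for spend in (50.0, 100.0, 400.0):
        print(f"spend={spend:6.1f}  ROAS={roas(spend):.2f}  mROAS={mroas(spend):.2f}")
```

On this toy curve both metrics fall as spend grows, and mROAS is always the smaller of the two, so a channel can show a healthy ROAS while an extra dollar buys very little.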

The effect of advertising spend is not immediate: there is a lag between allocating the funds and seeing an increase in sales (the carryover effect). Spending also has diminishing returns: beyond a certain point, additional ad spend has little to no effect on sales (the shape effect). Plain linear regression does not capture these effects well. The paper by Jin et al. (see References) uses Bayesian methods with flexible functional forms, including the Hill function from pharmacology, to account for the lag and shape effects of ad spend on sales.
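The two effects can be sketched in a few lines of plain Python. This is an illustrative sketch, not the PyStan implementation: the function names, the geometric decay weights, and the parameter values are assumptions chosen for the example, and weeks before the start of the series are treated as zero spend.

```python
def geometric_adstock(spend, decay, max_lag):
    """Carryover: weighted average of current and past spend, weights decay**lag."""
    weights = [decay ** lag for lag in range(max_lag)]
    total_weight = sum(weights)
    adstocked = []
    for t in range(len(spend)):
        carried = sum(
            weights[lag] * spend[t - lag]
            for lag in range(max_lag)
            if t - lag >= 0  # weeks before the series starts count as zero spend
        )
        adstocked.append(carried / total_weight)
    return adstocked


def hill(x, half_saturation, slope):
    """Shape: 0 at x=0, 0.5 at x=half_saturation, approaching 1 as x grows."""
    if x <= 0:
        return 0.0
    return 1.0 / (1.0 + (x / half_saturation) ** (-slope))


if __name__ == "__main__":
    weekly_spend = [100.0, 0.0, 0.0, 0.0]  # a single burst of spend in week 0
    carried_over = geometric_adstock(weekly_spend, decay=0.5, max_lag=3)
    response = [hill(x, half_saturation=50.0, slope=2.0) for x in carried_over]
    print(carried_over)
    print(response)
```

A single burst of spend keeps contributing for `max_lag` weeks with shrinking weight, and the Hill curve then flattens the response at high spend levels.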

A PyStan implementation of this model is available in the mmm_stan repository (see References), together with an application to a more complicated dataset incorporating 13 media channels and 46 control variables. The goal of my project is to help productionize this model by automating the collection of the additional data it uses.

Productionizing a Model

Model development is an important part of the data science life cycle. However, models need to be put into production to be useful. To quote this article, “… a machine learning model can only do so when it is in production and actively in use by consumers. As such, model deployment is as important as model building.”

To assist with putting the model into production, a function called make_dataframe was added to the MMMModule (the class that houses the PyStan marketing mix model) to accomplish the following with a single function call:

Creation of a dataframe with user-supplied weekly ad spend and sales data
If desired, appending Canadian economic data, in particular, monthly GDP, consumer price index (CPI), and unemployment rates to the created dataframe

My function is defined here:

# Imports needed at module level
import pandas as pd
from stats_can import StatsCan

def make_dataframe(self, start_date, end_date, user_data_filepath, user_data_date_column, include_econ_indicators=True):
    # Load the user's weekly ad spend and sales data and parse its date column
    user_df = pd.read_csv(user_data_filepath)
    user_df[user_data_date_column] = pd.to_datetime(user_df[user_data_date_column])
    # (year, month) key used to match each weekly row to the monthly indicators
    user_df['ref_yr_mth'] = list(zip(user_df[user_data_date_column].dt.year, user_df[user_data_date_column].dt.month))

    # Weekly (Sunday) calendar between the two dates; built here but not used in the returned frame
    start_date_ts = pd.Timestamp(start_date)
    end_date_ts = pd.Timestamp(end_date)
    df = pd.DataFrame()
    df['date'] = pd.date_range(start_date_ts, end_date_ts, freq='W-SUN')
    df['ref_date'] = list(zip(df['date'].dt.year, df['date'].dt.month))

    if include_econ_indicators:
        # Fetch the last 360 periods of each monthly series from Statistics Canada
        sc = StatsCan()
        unem_df = sc.vectors_to_df_remote('v2062815', periods=360)
        unem_df.columns = ['unemployment_rate']
        unem_df = unem_df.reset_index()

        gdp_df = sc.vectors_to_df_remote('v65201210', periods=360)
        gdp_df.columns = ['monthly_gdp']
        gdp_df = gdp_df.reset_index()

        cpi_df = sc.vectors_to_df_remote('v41690973', periods=360)
        cpi_df.columns = ['monthly_cpi']
        cpi_df = cpi_df.reset_index()

        # Combine the three series on their reference period and add the (year, month) key
        econ_df = pd.merge(unem_df, gdp_df, on='refPer', how='inner')
        econ_df = econ_df.merge(cpi_df, on='refPer')
        econ_df['refPer_yr_mth'] = list(zip(econ_df['refPer'].dt.year, econ_df['refPer'].dt.month))

        # Left join keeps every weekly row, then drop the helper columns
        df_merge_orig_gdp = pd.merge(user_df, econ_df, how='left', left_on='ref_yr_mth', right_on='refPer_yr_mth')
        return df_merge_orig_gdp.drop(['refPer_yr_mth', 'ref_yr_mth', 'refPer'], axis=1)
    else:
        return user_df

Creating the dataframe is straightforward: the user calls the function with the path to the CSV file containing their weekly sales and ad spend data, along with the name of its date column, and make_dataframe returns a dataframe with the supplied dataset. Appending the economic data is more involved. Because the StatsCan series are monthly while the user data is weekly, each weekly row is matched to the indicators for its calendar month. Incorporating economic data into the Bayesian model is helpful because metrics like GDP, CPI, and the unemployment rate could influence sales.

The stats_can Python library is used to read economic data from Statistics Canada (StatsCan) tables. Its vectors_to_df_remote function takes a series' vector ID and returns a dataframe containing the corresponding data. For this project, the monthly unemployment rate, GDP, and CPI have vector IDs v2062815, v65201210, and v41690973, respectively.
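Once the monthly series are in hand, attaching them to weekly rows comes down to a join on a shared (year, month) key. The sketch below shows that step in isolation. It uses small made-up dataframes instead of a live StatsCan request, and the column names `week`, `sales`, and `yr_mth` are illustrative assumptions.

```python
import pandas as pd

# Made-up weekly user data (the dates are Sundays); values are illustrative only.
weekly = pd.DataFrame({
    "week": pd.to_datetime(["2021-01-24", "2021-01-31", "2021-02-07"]),
    "sales": [120.0, 135.0, 128.0],
})

# Made-up monthly indicator with one row per month, dated on the first of the month.
monthly = pd.DataFrame({
    "refPer": pd.to_datetime(["2021-01-01", "2021-02-01"]),
    "unemployment_rate": [9.0, 8.5],
})

# Build the same (year, month) key on both sides.
weekly["yr_mth"] = list(zip(weekly["week"].dt.year, weekly["week"].dt.month))
monthly["yr_mth"] = list(zip(monthly["refPer"].dt.year, monthly["refPer"].dt.month))

# A left join keeps every weekly row, even if no indicator exists for its month.
merged = weekly.merge(monthly, on="yr_mth", how="left").drop(columns=["yr_mth", "refPer"])
print(merged)
```

Every week in a given month receives the same indicator value, so the monthly figure is repeated rather than interpolated.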

Next Steps

Economic indicators aren’t the only things that could influence sales. Without a doubt, holiday seasons affect sales. Currently, the marketing mix model takes holidays into consideration, but this has to be done manually by the user. Just as there is an API to pull various indicators from StatsCan, there is also a Python library called holidays that determines whether a specific date is a holiday. I plan to use it to flag whether the upcoming week contains a holiday.
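A minimal sketch of that weekly flag is shown below, using only the standard library. The helper `week_has_holiday` is a hypothetical name, and a hand-written set of dates stands in for the holiday calendar so the example runs without any extra packages.

```python
from datetime import date, timedelta


def week_has_holiday(week_start, holiday_dates):
    """Return True if any of the 7 days starting at week_start is a holiday.

    holiday_dates can be any container that supports `day in holiday_dates`.
    """
    return any(
        week_start + timedelta(days=offset) in holiday_dates
        for offset in range(7)
    )


# Hand-written stand-in for a holiday calendar: Canada Day and Christmas Day, 2021.
sample_holidays = {date(2021, 7, 1), date(2021, 12, 25)}

print(week_has_holiday(date(2021, 6, 27), sample_holidays))  # week of Jun 27 to Jul 3
print(week_has_holiday(date(2021, 7, 4), sample_holidays))   # week of Jul 4 to Jul 10
```

The calendar objects in the holidays library are dict-like and support date membership tests, so one could presumably be passed in place of `sample_holidays`. I have not wired that up here, so treat it as an assumption.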

General interest in the product in question, or in things related to it, could also drive sales. One way of gauging interest is to look at the number of Google searches for a certain keyword. The pytrends library can help with that.

Training the MMM automatically returns plots. One additional goal is to make these plots interactive. The plotly library will be used for this purpose.

References

Bayesian Methods for Media Mix Modeling with Carryover and Shape Effects. Yuxue Jin, Yueqing Wang, Yunting Sun, David Chan, Jim Koehler.

Python/STAN Implementation of Multiplicative Marketing Mix Model. https://github.com/sibylhe/mmm_stan

An Introduction to Bayesian Inference in PyStan. https://towardsdatascience.com/an-introduction-to-bayesian-inference-in-pystan-c27078e58d53

Written by Gabriel John Dusing