Look for this trio of email-related organizations, MessageGears, Validity, and Vidi-Emi, to become trailblazers in the RealTimeML email landscape.
This article will examine the nascent concept of utilizing RealTimeML to optimize for open rates and ultimately engagement rates by using a SendTime Email Optimization model. The model will infer RealTimeML predictions for optimal engagement rates, which provides enormous value throughout your current supply chain. Additionally, we will share ongoing time trials to serve the model efficiently.
Feedback from the email industry allowed us to build, develop, test and validate the eighth email recommendation model in our portfolio. Serving these models and predictions inside the workflow of an email campaign confirms that RealTimeML produces higher engagement rates while lifting the revenue per email and revenue per campaign metrics.
What is paramount in this particular model request from Email Service Provider (ESP) MessageGears and its longtime CTO, Mr. Craig Pohan, is efficiency: they asked us to build an efficient model that examines the sending pattern and open rate of an email campaign and optimizes the engagement rate within a 15-minute window. Essentially, Mr. Pohan wanted predictions, accurate to within a 15-minute window, of when the end user or subscriber would open the email.
Per his request, this 15-minute window tells marketers the window of time in which a subscriber is expected to open the email, although the window can be shorter or longer depending on the specifics of the campaign and what the end client desires. While subtly shuffling a deck of cards during our Zoom conference, Mr. Pohan deliberated on why this is beneficial, what the output of a send-time optimization model would look like, how it could be inserted seamlessly into the UI by merely clicking a checkbox, and the genuine benefits for campaign builders who use such a model. Upon running the model, the campaign builder is informed in real time, before sending, about predicted open rates, click-through rates, and other outputs derived from additional datasets. In other words, when a campaign builder runs this model, she can deploy at the times the end user is most likely to open her email.
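To make the 15-minute window concrete, here is a minimal sketch of how a timestamp could be mapped to its window. The function names and the sample timestamp are illustrative assumptions, not part of the MessageGears request or our production code.

```python
from datetime import datetime, timedelta
from typing import Tuple

WINDOW_MINUTES = 15  # window size from the article; adjustable per campaign
WINDOWS_PER_DAY = 24 * 60 // WINDOW_MINUTES  # 96 windows when the size is 15


def window_index(ts: datetime, minutes: int = WINDOW_MINUTES) -> int:
    """Return the index, within its day, of the window that contains ts."""
    return (ts.hour * 60 + ts.minute) // minutes


def window_bounds(ts: datetime, minutes: int = WINDOW_MINUTES) -> Tuple[datetime, datetime]:
    """Return the [start, end) bounds of the window that contains ts."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    start = midnight + timedelta(minutes=window_index(ts, minutes) * minutes)
    return start, start + timedelta(minutes=minutes)


opened_at = datetime(2021, 12, 20, 9, 37, 12)  # made-up open timestamp
print(window_index(opened_at))   # 38
print(window_bounds(opened_at))  # 09:30 to 09:45 on the same day
```

Changing `WINDOW_MINUTES` gives the shorter or longer windows mentioned above without touching the rest of the logic.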
The model serves predictions in milliseconds, optimizing for the highest engagement rate on her selected target variable. For example, if your target variable is open rate, the model will optimize open rates for a specific demographic group, assuming we introduce demographic data in the data prep and model development phase. Let’s examine a few reasons why this is critical during this particular holiday season.
Supply Chain Bottlenecks
As holiday shopping ramps up and supply-chain issues persist, forward-looking campaigners are already doing things differently. Maintaining customer loyalty has been challenging during the pandemic as out-of-stock items generate anxiety. Further, with container ships clogging up ports worldwide, people are eager to snag toys, electronics, and other gifts off the shelves. With ports clogged and the workforce available to unload containers waning, an email campaign builder needs every imaginable edge to maintain brand loyalty during these uncertain times. This is the first wave of value creation with the SendTime Optimization Model.
According to a recent McKinsey survey, among the 60% of consumers who faced out-of-stock products in the last three months, only 13% say they waited for the item to come back in stock. About 70% switched retailers or brands instead. The send time optimization model can help prevent this initial departure and maintain brand loyalty.
The send time optimization model provides RealTimeML predictions of when subscribers will open the email and click through to the website, which brings more efficiency to supply chain management; adding demographic data can optimize the supply chain further. The model’s inputs may include age, gender, region and time sent.
Time Window
The time window is essential on several levels. Suitable time window options can alleviate supply-chain bottlenecks, recalibrate inventory levels and prepare in-store pick-up staff for items that need to be available. Determining the send time with the highest engagement rate is also very valuable for retailers toward the end of the holiday season, when online shoppers can scramble for goods and services.
For example, let’s say Home Depot has 40M email subscribers and wants to know the optimal send time for an email campaign in a specific region. Home Depot can rely on predictive analytics to serve this region and stock accordingly, for instance to drive a campaign in the Deep South. Besides inventory optimization, the model can help achieve efficiencies based on log activity and prepare for spikes in site traffic during these specific time frames, so that mission-critical supplies are available for in-store pick-up. The pressure on brand loyalty, and the roughly 70% of consumers who switch retailers or brands when an item is out of stock, suggest that brands need to have items in stock and shipped on time. A well-prepared campaign builder like Home Depot will have these predictions well before launching the campaign, alerting the supply chain well before the campaign is deployed. In that sense, the model can also be described as a supply chain optimization model.
Datasets Used for this Model
The dataset used in this analysis is from the GitHub repository "Optimizing Email Marketing". It contains 100K rows of historical email campaign data and over two dozen variables, including the day of the week, the hour of the day, country and email open rate, as well as others such as past purchases and campaign type. In the data processing notebook, for demo purposes, we created additional synthetic variables for many other features that would be typical of an email campaign. As seen in the figure above, we also created an extra input layer to drill down on exact demographics for each prediction. Introducing other datasets for richer outputs, including but not limited to eCommerce data, is also relatively standard for this model. For the send time optimization model, you may also consider historical datasets by region, for example if Home Depot needs to get supplies to Kentucky following the recent tragic events there.
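As a rough illustration of how variables such as day of the week and hour of the day relate to open rate, the sketch below aggregates an open rate per send slot. The rows are toy values invented for this example, and the column layout is an assumption rather than the schema of the actual dataset.

```python
from collections import defaultdict

# Toy rows, invented for illustration:
# (day_of_week, hour_of_day, country, opened) with opened as 1 or 0.
rows = [
    ("Mon", 9, "US", 1),
    ("Mon", 9, "US", 0),
    ("Mon", 9, "UK", 1),
    ("Tue", 14, "US", 0),
    ("Tue", 14, "US", 1),
    ("Tue", 14, "UK", 0),
    ("Tue", 14, "UK", 0),
]


def open_rate_by_slot(records):
    """Aggregate the open rate for each (day_of_week, hour_of_day) slot."""
    sent = defaultdict(int)
    opened = defaultdict(int)
    for day, hour, _country, was_opened in records:
        sent[(day, hour)] += 1
        opened[(day, hour)] += was_opened
    return {slot: opened[slot] / sent[slot] for slot in sent}


rates = open_rate_by_slot(rows)
best_slot = max(rates, key=rates.get)  # slot with the highest historical open rate
print(rates)
print(best_slot)  # ('Mon', 9)
```

A trained model goes further than this kind of lookup because it can combine many variables at once, but the aggregation shows the basic signal the historical data carries.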
The Model
The algorithm used in this model is a Random Forest Classifier, but other recommended models include Gradient Boosted Classifiers and Neural Networks. The random forest classifier builds a set of decision trees, each from a randomly selected subset of the training set, and then collects the votes from the individual trees to decide the final prediction. These exact times are served in milliseconds before the campaign is sent. Input tweaks are necessary to tune the model for optimized performance.
Bootstrap sampling and feature sub-sampling are the two critical mechanisms used to improve predictive accuracy and control overfitting. First, bootstrap sampling repeatedly samples data with replacement from the original training set, which reduces the variance of the predictions and thus improves predictive performance. The model randomly selects a fixed percentage of the whole training set with replacement as a bootstrap sample and grows a decision tree from that sample. Second, feature sub-sampling randomly selects the subset of features considered when splitting nodes in each decision tree. Dimensionality reduction can sometimes take place at this time. The model randomly selects features without replacement at each node and then splits the node using the feature that provides the best split according to the objective function, for instance, by maximizing the information gain. In the end, the model aggregates the predictions of all trees and assigns the class label by majority vote. Feature sub-sampling helps reduce overfitting because it keeps the individual trees from relying on the same few features.
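The three mechanics described above (bootstrap sampling with replacement, feature sub-sampling without replacement, and majority voting) can be sketched with the standard library alone. This is a simplified illustration, not our training code: the helper names, the 80% sample fraction, and the example votes are all assumptions, and tree growing itself is left out.

```python
import random
from collections import Counter


def bootstrap_sample(rows, fraction, rng):
    """Draw a bootstrap sample: rows are chosen WITH replacement."""
    size = max(1, int(len(rows) * fraction))
    return rng.choices(rows, k=size)


def feature_subset(feature_names, k, rng):
    """Pick k candidate features WITHOUT replacement for one node split."""
    return rng.sample(feature_names, k)


def majority_vote(tree_predictions):
    """Aggregate per-tree class labels into the forest's final label."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
training_rows = list(range(10))  # stand-ins for training examples
features = ["age", "gender", "region", "time_sent"]  # inputs named in the article

sample = bootstrap_sample(training_rows, fraction=0.8, rng=rng)
candidates = feature_subset(features, k=2, rng=rng)
label = majority_vote(["morning", "evening", "morning"])  # hypothetical tree votes

print(sample)      # 8 rows; duplicates are possible
print(candidates)  # 2 distinct feature names
print(label)       # morning
```

In a full implementation, each tree would be grown on its own bootstrap sample and would draw a fresh feature subset at every node before choosing the best split.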
Validation
The SendTime Optimization Model provides predictions and recommendations in two ways:
- Based on an input “selected user group,” the model recommends the best email send time, including the best weekday and the best time of day. Given supporting data, the model can also predict the region, gender, age, and LTV associated with an email.
- Based on the “target variable” the user selects to improve, the model recommends user groups and the corresponding best weekdays and times to send the email.
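The two modes above can be pictured as two different queries over the same set of predictions. The sketch below is only an illustration: the group names, send times, and scores are invented, and the function names are hypothetical rather than part of our API.

```python
# Hypothetical predicted scores, invented for illustration:
# (user_group, weekday, send_time) -> {target_variable: predicted value}
predictions = {
    ("women_25_34_south", "Tue", "09:30"): {"open_rate": 0.31, "ctr": 0.040},
    ("women_25_34_south", "Thu", "18:15"): {"open_rate": 0.27, "ctr": 0.052},
    ("men_35_44_west", "Mon", "07:45"): {"open_rate": 0.22, "ctr": 0.031},
    ("men_35_44_west", "Sat", "10:00"): {"open_rate": 0.35, "ctr": 0.028},
}


def best_send_time(user_group, target="open_rate"):
    """Mode 1: for a selected user group, return the best (weekday, time)."""
    options = {k: v[target] for k, v in predictions.items() if k[0] == user_group}
    _, weekday, send_time = max(options, key=options.get)
    return weekday, send_time


def best_groups(target, top_n=2):
    """Mode 2: for a target variable, rank (group, weekday, time) options."""
    ranked = sorted(predictions, key=lambda k: predictions[k][target], reverse=True)
    return ranked[:top_n]


print(best_send_time("women_25_34_south"))  # ('Tue', '09:30')
print(best_groups("ctr"))
```

Note that the best slot depends on the target variable: in this toy data the same group has a different best send time for open rate than for CTR.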
For increased accuracy, we experimented with several ensembling techniques. Once again, this model is built only for send-time optimization. That said, blending two or more email recommendation models for hyper-personalization and customization may yield a higher engagement rate. For example, you might consider combining a sentiment analysis model with the SendTime optimization model to maximize the engagement rate. Our sentiment analysis model can derive the optimal sentiment for a particular email campaign, and striking a more urgent tone during the holidays could perhaps lift engagement rates. You might also consider blending in incentive-based models.
Essentially, the SendTime (ST) model uses RealTimeML to predict the optimal send time for the highest engagement rate for an email. The target variable can be set to Open Rate, CTR or any other variable you seek to optimize. After these inputs have been entered within the workflow of a campaign build, we serve the model predictions in milliseconds via an API, including predicted engagement rates in real time before the campaign is sent.
Time Trials
Our CTO is conducting time trials to determine how quickly these predictions can be served. These ESP models are designed to interact with email campaign builders. The email campaign builder used by an ESP will prepare the request.
Immediate time implications
Death by papercut. If an operation takes one millisecond, then 40 million events take roughly 11 hours (40,000 seconds). The minimum time to process a list of 40M email addresses is the time it takes to complete the process once multiplied by the length of the list; in other words, O(n) time complexity. Serving RealTimeML predictions promptly at this scale is the challenge we are solving, and milliseconds are the metric we are working with. Delivering a send time optimization model to the ESP therefore has some complexities, and we are working on ways for the ESP to heavily cache the predictions.
We are attempting to solve this issue by running the models weekly and caching the results. This is the equivalent of doing all the heavy lifting in advance and keeping the final product ready on the shelf for the customer to request via API. In this way, we can provide send times for a list of 40M+ emails not in hours but in milliseconds.
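The arithmetic and the caching idea can be sketched as follows. At one millisecond per prediction, 40 million predictions work out to about 11.1 hours, the same order as the half-day figure quoted above. The `slow_predict`, `build_cache`, and `serve` functions are hypothetical stand-ins: the real model, storage layer, and API are not shown in this article.

```python
LIST_SIZE = 40_000_000          # subscribers, from the article
SECONDS_PER_PREDICTION = 0.001  # one millisecond per on-demand prediction

# O(n) cost of predicting one address at a time, on demand.
total_hours = LIST_SIZE * SECONDS_PER_PREDICTION / 3600
print(f"{total_hours:.1f} hours")  # 11.1 hours


def slow_predict(email):
    """Stand-in for a model call; returns a made-up send time."""
    return "Tue 09:30"


def build_cache(emails):
    """Offline batch job (for example weekly): precompute one send time per address."""
    return {email: slow_predict(email) for email in emails}


def serve(cache, email, default=None):
    """Request time: a dictionary lookup instead of a model call."""
    return cache.get(email, default)


cache = build_cache(["a@example.com", "b@example.com"])
print(serve(cache, "a@example.com"))    # Tue 09:30
print(serve(cache, "new@example.com"))  # None: not yet in the cache
```

The design trade-off is freshness: a lookup is fast because the work was done ahead of time, so addresses added since the last batch run need a fallback until the next run.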
Data growth mindset
A quote from our CTO, Mr. Ara Baghdassarian: “The email addresses that this model houses will continuously increase in size, detail, and accuracy. It is the best weapon we have in the war on time when serving predictions, and we must hone it well. One could say, and the irony is striking, that only time will solve the problem of time.” Because this model is iterative, once you have deployed your first SendTime optimization model and know the available time of that email, the model will become increasingly accurate, and we can in turn run models on optimal send times and provide predictions based on the number of batches.
Increasingly, we will be ingesting streaming data to retrain the models and serve inferences continuously. RealTimeML is still a very nascent industry, and while historical data is required to deliver RealTimeML accurately, predictive models are undoubtedly where the industry is headed.
Special thanks to Chen Song, Mateo Martinez, Craig Pohan, and Ara Baghdassarian.