Retailer Predicts Inventory Demand with Azure Databricks

Retail success requires expertly balancing supply and demand. Mastering that balance keeps this retailer’s thousands of nationwide storefronts and dozens of distribution centers running smoothly. When it began grappling with too many products at some locations and not enough at others, the corporation set out to gain a deeper understanding of its customers’ evolving needs and reduce disruptions in sales. To better predict future supply needs, the company enlisted BlueGranite to help it uncover the underlying factors influencing sales.

The retailer had been using a three-week rolling average to predict the next four weeks of sales, an approach that was contributing to the organization’s supply issues. While this projection gave moderately accurate results, it did not account for external factors such as seasonality, weather, or economic conditions. The company also struggled to quickly integrate data from the disparate systems across its many storefronts. And even once it had the data in hand, processing it was difficult because of its sheer volume and complexity.
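To make the limitation concrete, here is a minimal sketch of a three-week rolling-average forecast. It is not the retailer’s actual code: the function name, the sample figures, and the choice to repeat one flat value across all four forecast weeks are assumptions made for illustration.

```python
from statistics import mean


def rolling_average_forecast(weekly_sales, window=3, horizon=4):
    """Forecast `horizon` future weeks as the mean of the last `window` weeks.

    Assumption: the same average is repeated for every forecast week.
    """
    if len(weekly_sales) < window:
        raise ValueError("need at least `window` weeks of history")
    baseline = mean(weekly_sales[-window:])
    # Every future week gets the same value, so the method cannot
    # anticipate seasonal peaks, weather effects, or economic shifts.
    return [baseline] * horizon


# Hypothetical weekly unit sales; the final week is a one-off holiday spike.
history = [100, 110, 120, 400]
forecast = rolling_average_forecast(history)
print(forecast)
```

With these hypothetical numbers, the single spike lifts the forecast to 210 units for each of the next four weeks, roughly double a normal week, which is the kind of overstock the retailer wanted to avoid.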

Partnering with the retailer, BlueGranite used Azure Databricks and Azure Data Lake Store to engineer a cloud-based analytics platform with virtually limitless storage capacity and fast data processing.


Our initial use case used more than three years of a single city’s sales data to forecast weekly sales up to four weeks out. Azure Data Lake Store ingests the retailer’s massive volumes of sales and inventory data. Apache Spark in Azure Databricks Notebooks removes inaccurate records from the data and transforms it for use. Through Notebooks, users can explore and visualize the data, identify trends, and uncover candidate products to add to or remove from stores.
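The cleaning and transformation step can be sketched in plain Python. This is an illustration of the logic only, not the Spark notebook code from the project: the field names (`product_id`, `sale_date`, `quantity`) and the rules for what counts as an inaccurate record are assumptions. In Spark, the same idea would typically be expressed as a DataFrame filter followed by a grouped aggregation.

```python
from collections import defaultdict
from datetime import date


def clean_records(records):
    """Drop records with missing fields or negative quantities (assumed rules)."""
    cleaned = []
    for record in records:
        if record.get("product_id") is None or record.get("sale_date") is None:
            continue
        quantity = record.get("quantity")
        if quantity is None or quantity < 0:
            continue
        cleaned.append(record)
    return cleaned


def weekly_totals(records):
    """Sum quantity per (product_id, ISO year, ISO week)."""
    totals = defaultdict(int)
    for record in records:
        iso = record["sale_date"].isocalendar()
        totals[(record["product_id"], iso[0], iso[1])] += record["quantity"]
    return dict(totals)


# Hypothetical raw sales rows, including two invalid ones.
raw = [
    {"product_id": "A", "sale_date": date(2018, 1, 1), "quantity": 5},
    {"product_id": "A", "sale_date": date(2018, 1, 3), "quantity": 2},
    {"product_id": "A", "sale_date": date(2018, 1, 8), "quantity": 4},
    {"product_id": "A", "sale_date": date(2018, 1, 9), "quantity": -1},
    {"product_id": None, "sale_date": date(2018, 1, 9), "quantity": 3},
]
weekly = weekly_totals(clean_records(raw))
print(weekly)
```

Aggregating to one row per product per week gives the model a series at the same weekly grain as the forecast it is asked to produce.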


We built the machine learning models with the Spark MLlib library in Azure Databricks, using decision forest (random forest) regression to predict four weeks of sales for each product. For most products with continuous, consistent sales data, the model predicts with better than 80% accuracy.
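The article does not say how accuracy was measured for these sales forecasts. One common convention, assumed here purely for illustration, is to report one minus the mean absolute percentage error (MAPE). The function name and sample values below are hypothetical.

```python
def forecast_accuracy(actual, predicted):
    """Return 1 - MAPE, floored at 0 (an assumed definition of accuracy).

    Weeks with zero actual sales are skipped because the percentage
    error is undefined for them.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no weeks with non-zero actual sales")
    mape = sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)
    return max(0.0, 1.0 - mape)


# Hypothetical four-week actuals and predictions for one product.
actual = [100, 120, 80, 100]
predicted = [90, 126, 88, 100]
accuracy = forecast_accuracy(actual, predicted)
print(round(accuracy, 2))
```

For these sample values the result is roughly 0.94. A percentage-based metric like this is only stable when a product sells every week, which is one plausible reason a figure such as 80% would be quoted for products with continuous, consistent sales data.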

Finally, Power BI is used to help visualize predictions and the model’s accuracy.

[Figure: model results]

In summary, we designed a solution that captures and ingests the retailer’s multi-sourced data into a common repository. It scales storage and compute resources easily to accommodate the retailer’s substantial nationwide data and processing needs. The solution also provides advanced analytics and machine learning capabilities, allowing for the creation of more complex models.

This new approach to supply and demand builds on the retailer’s existing skill sets, scales with its data, and forecasts more accurately than previous methods. It reduces inventory costs and makes product distribution more efficient, keeping this national retailer at the forefront of its trade.

For more information on Azure Databricks, see BlueGranite’s free resources.

How We Did It

  • Implemented common data lake storage that overcomes previous storage capacity issues by scaling across all regions and locations.
  • Provided cluster-based Platform as a Service (PaaS) computing to process significantly more data in drastically less time.
  • Instituted new advanced analytics and machine learning capabilities in a single unified platform that allow for more complex modeling than previously possible.
  • Created a platform that makes use of the retailer’s existing skill sets.
  • Delivered an initial use case that speeds time to market, reduces inventory costs, and makes product distribution more efficient.

