Comparative Forecasting of Pharmaceutical Point-of-Sale Trends for Inventory Optimization Using ARIMA and Prophet¶
Sasha Nabila Fortuna
Background Problem¶
The pharmaceutical industry faces significant challenges in achieving accurate demand forecasting, which is crucial for effective inventory management, cost control, and ensuring timely product availability. While pharmacies and distributors heavily utilize point-of-sale (POS) data to monitor pharmaceutical product movement, issues such as stockouts, overstocking, and the accumulation of expired inventory remain prevalent. These problems often stem from the complex and volatile nature of pharmaceutical demand, which is influenced by a multitude of factors including seasonal disease patterns, public health campaigns, regulatory changes, and sudden market disruptions like epidemics.
Traditional forecasting methods, such as ARIMA, have been employed to address these challenges, but their effectiveness can be limited by the non-stationarity of sales data and the need for careful parameter tuning. Moreover, these methods may struggle to adequately capture the impact of promotional activities or other external factors that significantly influence demand. As Abolghasemi et al. (2021) highlight, the choice between using POS data and order data for forecasting is not straightforward and depends on various factors, including the characteristics of the time series itself. Factors such as the mean, variance, non-linearity, and entropy of POS data, as well as the presence of promotions, can significantly affect forecasting accuracy.
Therefore, there is a need to explore and evaluate alternative forecasting approaches, such as Facebook Prophet, which is designed to handle data with strong seasonality and irregular events, and to systematically investigate the impact of time series characteristics on the performance of different forecasting models in the pharmaceutical context. A more nuanced understanding of these factors can lead to improved forecasting accuracy and better-informed inventory and supply chain decisions.
Objectives¶
This study aims to:
- Analyze POS data of pharmaceutical product sales to identify temporal patterns and seasonal trends.
- Implement and evaluate ARIMA and Prophet models for forecasting pharmaceutical sales.
- Compare the forecasting performance of both models using appropriate evaluation metrics (e.g., RMSE, MAE, MAPE).
- Identify practical advantages and limitations of ARIMA and Prophet in forecasting pharmaceutical sales data.
- Provide actionable insights for inventory and supply chain decision-making based on model forecasts.
Dataset Description¶
The dataset used in this study originates from a comprehensive collection of transactional records, initially comprising approximately 600,000 entries spanning a six-year period from 2014 to 2019. These records were extracted from a Point-of-Sale (POS) system implemented in an individual pharmacy and include essential information such as the exact date and time of each transaction, the brand name of the pharmaceutical product sold, and the quantity dispensed.
From the broader dataset, a subset of 57 pharmaceutical drugs was selected for focused analysis. These drugs were grouped and categorized according to the Anatomical Therapeutic Chemical (ATC) Classification System, covering the following therapeutic classes:
- M01AB – Non-steroidal anti-inflammatory and antirheumatic products (acetic acid derivatives and related compounds)
- M01AE – Non-steroidal anti-inflammatory and antirheumatic products (propionic acid derivatives)
- N02BA – Other analgesics and antipyretics (salicylic acid and derivatives)
- N02BE/B – Other analgesics and antipyretics (pyrazolones and anilides)
- N05B – Psycholeptics (anxiolytic drugs)
- N05C – Psycholeptics (hypnotics and sedatives)
- R03 – Medications for obstructive airway diseases
- R06 – Systemic antihistamines
To facilitate time series analysis, the sales data were aggregated into multiple temporal granularities, including hourly, daily, weekly, and monthly levels. Prior to analysis, the dataset underwent a preprocessing phase that addressed data quality issues. This included the detection and treatment of outliers as well as the imputation of missing values, ensuring the dataset's suitability for robust modeling and forecasting.
Methodology¶
Import Libraries¶
import datetime
import os
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
import plotly.graph_objects as go
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
from sklearn.metrics import mean_absolute_error
from prophet import Prophet
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_ljungbox
import warnings
warnings.filterwarnings("ignore")
# Format the date as a string
formatted_date = datetime.date.today().strftime("%B %d, %Y")
text_block = f"""
This notebook was executed on: {formatted_date}.
Any results or analysis reflect the data and conditions as of this date.
"""
print(text_block)
Load Dataset¶
We try to load the dataset by hourly, daily, weekly, and monthly approaches.
hourly_df = pd.read_csv("pharma-sales-data/saleshourly.csv")
hourly_df.head()
hourly_df.info()
daily_df = pd.read_csv("pharma-sales-data/salesdaily.csv")
daily_df.head()
weekly_df = pd.read_csv("pharma-sales-data/salesweekly.csv")
weekly_df.head()
Exploratory Data Analysis¶
Comparison Drug Products Over Time¶
In this section, we will look at the comparison of total sales over time, based on the time frame to be observed, be it per hour, per day, per week, to per month.
# Convert datatype of datum
hourly_df['datum'] = pd.to_datetime(hourly_df['datum'])
daily_df['datum'] = pd.to_datetime(daily_df['datum'])
weekly_df['datum'] = pd.to_datetime(weekly_df['datum'])
hourly_df.set_index('datum', inplace=True)
# Create figure
fig_hourly = go.Figure()
# Add a line trace for each drug category
categories = ['M01AB', 'M01AE', 'N02BA', 'N02BE', 'N05B', 'N05C', 'R03', 'R06']
for col in categories:
fig_hourly.add_trace(go.Scatter(
x=hourly_df.index,
y=hourly_df[col],
mode='lines',
name=col
))
# Update layout
fig_hourly.update_layout(
title='Interactive Hourly Drug Sales by Category',
xaxis_title='Datetime (Hour)',
yaxis_title='Sales',
hovermode='x unified',
template='plotly_white',
width=1200,
height=600
)
# Show plot
fig_hourly.show()