Fig. 7.
Results for the joint replenishment problem (i.e., $K=75$). The top panel displays the cost performance of the trained DRL models using a continuous action representation and of the $P(s,S)$-policy for the different problem settings (number of products). Cost performance is expressed as the relative gap in cost per period with respect to the lower bound from Viswanathan (2007). Confidence intervals are omitted because they are negligibly small due to the long simulation runs. Although the $P(s,S)$-policy outperforms our DRL approach in all settings, the performance of our DRL approach is often comparable to this well-performing heuristic. The bottom panel displays the learning curves, expressed as the relative gap with the lower bound. We observe that the models are able to learn, even for larger problem instances.
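For reference, the relative gap reported in both panels is presumably the standard percentage deviation of a policy's simulated average cost per period from the Viswanathan (2007) lower bound; the notation below ($C_{\pi}$, $C_{\mathrm{LB}}$) is introduced here for illustration and is not taken from the figure itself:

$$\text{gap}(\pi) = \frac{C_{\pi} - C_{\mathrm{LB}}}{C_{\mathrm{LB}}} \times 100\%,$$

where $C_{\pi}$ denotes the long-run average cost per period of the evaluated policy $\pi$ (the DRL policy or the $P(s,S)$ heuristic) estimated from the simulation runs, and $C_{\mathrm{LB}}$ denotes the lower bound on the optimal average cost per period.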
