Approximate Dynamic Programming for Stochastic Resource Allocation Problems

Ali Forootani; Raffaele Iervolino; Massimo Tipaldi; Joshua Neilson

doi:10.1109/JAS.2020.1003231

Volume 7 Issue 4

Jun. 2020

IEEE/CAA Journal of Automatica Sinica

Ali Forootani, Raffaele Iervolino, Massimo Tipaldi and Joshua Neilson, "Approximate Dynamic Programming for Stochastic Resource Allocation Problems," IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 975-990, July 2020. doi: 10.1109/JAS.2020.1003231

Author Bio:
Ali Forootani (M’19) received the M.Sc. degree in electrical engineering and automatic control systems from the Power and Water University of Technology, Iran, in 2011, and the Ph.D. degree in automatic control systems from the Department of Engineering, University of Sannio, Italy, in 2019. From 2011 to 2012, he worked on the monitoring of high-voltage circuit breakers and wind turbine control systems at Niroo Research Institute, Iran. He worked as a Control and Instrumentation Engineer at the Ministry of Power and Energy (Water and Sewage Khuzestan Engineering Company) from 2012 to 2015. He worked as a Research Fellow at the Measurement and Instrumentation Laboratory, University of Sannio, and as a Postdoctoral Research Fellow at the University of Salerno from Dec. 2018 to Jan. 2020. He is currently conducting research at the Hamilton Institute, Maynooth University, Ireland. His current research interests include Markov decision processes, approximate dynamic programming, reinforcement learning in optimal control, and learning in network control systems. He is a Member of the IEEE Control Systems Society and a reviewer for several important journals.

Raffaele Iervolino (M’19) received the laurea degree cum laude in aerospace engineering from the University of Naples, Italy, in 1996, where he also obtained the Ph.D. degree in electronic and computer science engineering in 2002. Since 2003 he has been an Assistant Professor of automatic control at the University of Naples. Since 2005 he has also been an Adjunct Professor of automatic control with the Department of Electrical Engineering and Information Technology at the same university. He is a Member of the IEEE Control Systems Society. His research interests include piecewise affine systems, opinion dynamics and consensus in social networks, and human telemetry systems.

Massimo Tipaldi received the master's degree in computer science engineering and the Ph.D. degree in information technology from the University of Sannio, Italy, in 1998 and 2017, respectively. He has more than 20 years of industrial experience in the managerial and technical coordination of ESA/ASI/CNES space projects (satellite systems, experimental equipment for the International Space Station, and ground segments). He holds one patent and has co-authored more than 40 papers published in proceedings of international conferences or international archival journals. His research interests include space systems engineering, critical software systems, reinforcement learning, approximate dynamic programming, stochastic systems, and advanced system control techniques.

Joshua Neilson is a Ph.D. candidate in information technology at the University of Sannio, Italy, where he studies optical coatings for gravitational wave interferometers. He received the master's degree in physics from California State University, USA, in 2017. He has been a Member of the LIGO Scientific Collaboration since 2016 and has co-authored several peer-reviewed papers and more than 40 LIGO collaboration papers.
Corresponding author: A. Forootani is with the Hamilton Institute, Maynooth University, Co. Kildare W23 F2K8, Ireland (e-mail: Aliforootani@gmail.com)
Received Date: 2020-03-12
Accepted Date: 2020-04-13

Available Online: 2020-04-30

Abstract

A stochastic resource allocation model based on the principles of Markov decision processes (MDPs) is proposed in this paper. In particular, a general-purpose framework is developed that takes into account resource requests for both instant and future needs. The framework can handle two types of reservations (i.e., reservation requests with specified and unspecified time intervals) and can implement an overbooking business strategy to further increase business revenues. The resulting dynamic pricing problems can be regarded as sequential decision-making problems under uncertainty, which are solved by means of stochastic dynamic programming (DP) based algorithms. In this regard, Bellman’s backward principle of optimality is exploited to provide all the implementation mechanisms for the proposed reservation pricing algorithm. The curse of dimensionality, an inevitable issue of DP, arises for both instant resource requests and future resource reservations. To address this scalability issue, an approximate dynamic programming (ADP) technique based on linear function approximations is applied. Several examples are provided to show the effectiveness of the proposed approach.
- Approximate dynamic programming (ADP)
- Dynamic programming (DP)
- Markov decision processes (MDPs)
- Resource allocation problem
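
The backward DP recursion mentioned in the abstract can be illustrated with a minimal sketch. The toy model below is an assumption made for illustration and is not the paper's model: a pool of identical resource units, one request per stage, a finite set of candidate prices, and a hypothetical `accept_prob` table giving the probability that a request is accepted at each price. Reservations, overbooking, and the release of resources are left out. The function name `backward_dp` is likewise hypothetical.

```python
def backward_dp(capacity, horizon, accept_prob):
    """Finite-horizon backward induction for a toy pricing MDP.

    State s: number of free resource units (0..capacity).
    Action: a posted price taken from the keys of accept_prob.
    accept_prob[price]: assumed probability that the single request
    arriving in a stage is accepted at that price.

    Returns (J, policy), where J[t][s] is the optimal expected
    revenue-to-go and policy[t][s] is a maximising price
    (None when no unit is free).
    """
    # Terminal values J[horizon][s] are zero; J[t][0] stays zero as well.
    J = [[0.0] * (capacity + 1) for _ in range(horizon + 1)]
    policy = [[None] * (capacity + 1) for _ in range(horizon)]

    for t in range(horizon - 1, -1, -1):      # sweep backward in time
        for s in range(1, capacity + 1):      # states with a free unit
            best_value, best_price = float("-inf"), None
            for price, q in accept_prob.items():
                # Bellman backup: sale with prob. q, no sale otherwise.
                value = q * (price + J[t + 1][s - 1]) + (1 - q) * J[t + 1][s]
                if value > best_value:
                    best_value, best_price = value, price
            J[t][s] = best_value
            policy[t][s] = best_price
    return J, policy


if __name__ == "__main__":
    # Hypothetical demand model: a lower price is accepted more often.
    J, policy = backward_dp(capacity=3, horizon=5,
                            accept_prob={1.0: 0.9, 2.0: 0.5})
    print(J[0], policy[0])
```

The table `J` has `(horizon + 1) * (capacity + 1)` entries here, but with several resource types or reservation queues the state becomes a vector and the table grows exponentially with its dimension, which is the scalability issue that motivates ADP.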

Figures(8) / Tables(5)

Highlights

- An MDP-based resource allocation problem is proposed.
- Model predictive control (MPC) is considered within the MDP framework.
- Algorithms suitable for computer implementation are proposed.
- Compressive sampling is considered for ADP.
- A linear architecture is considered for ADP.
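
To give a feel for what a linear architecture means in ADP, the sketch below replaces the value table of a toy pricing problem with the two-parameter approximation J_t(s) ≈ r0 + r1·s and fits the parameters by least squares on a few sampled states. This is a plain fitted backward induction chosen for brevity, not the algorithm of the paper. The model (one request per stage, a hypothetical `accept_prob` price-to-probability table), the feature choice [1, s], and the names `fit_line` and `approximate_dp` are all assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ~ r0 + r1 * x (closed form).

    Requires at least two distinct values in xs.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r1 = sxy / sxx
    return mean_y - r1 * mean_x, r1


def approximate_dp(horizon, accept_prob, sample_states):
    """Fitted backward induction with features [1, s].

    sample_states: states (numbers of free units, each >= 1) at which
    the Bellman backup is evaluated; only these states are visited.
    Returns weights, where weights[t] = (r0, r1) parameterises the
    approximate revenue-to-go at stage t.
    """
    weights = [(0.0, 0.0)] * (horizon + 1)    # terminal value is zero

    for t in range(horizon - 1, -1, -1):
        r0, r1 = weights[t + 1]

        def j_next(s):
            # An empty pool earns nothing; otherwise use the linear fit.
            return 0.0 if s <= 0 else r0 + r1 * s

        targets = []
        for s in sample_states:
            # Bellman backup using the approximate next-stage values.
            targets.append(max(
                q * (price + j_next(s - 1)) + (1 - q) * j_next(s)
                for price, q in accept_prob.items()
            ))
        weights[t] = fit_line(sample_states, targets)
    return weights


if __name__ == "__main__":
    w = approximate_dp(horizon=5, accept_prob={1.0: 0.9, 2.0: 0.5},
                       sample_states=[1, 5, 10, 20])
    print(w[0])
```

Each stage stores two numbers regardless of how many states exist, and the backup is computed only at the sampled states. The price of this compression is approximation error, so the quality of the result depends on the chosen features and samples.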
