A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 8, Issue 8, Aug. 2021

IEEE/CAA Journal of Automatica Sinica

• JCR Impact Factor: 7.847, Top 10% (SCI Q1)
• CiteScore: 13.0, Top 5% (Q1)
• Google Scholar h5-index: 51, Top 8
Akshay Agrawal, Shane Barratt and Stephen Boyd, "Learning Convex Optimization Models," IEEE/CAA J. Autom. Sinica, vol. 8, no. 8, pp. 1355-1364, Aug. 2021. doi: 10.1109/JAS.2021.1004075

# Learning Convex Optimization Models

##### doi: 10.1109/JAS.2021.1004075
• A convex optimization model predicts an output from an input by solving a convex optimization problem. The class of convex optimization models is large, and includes as special cases many well-known models, such as linear and logistic regression. We propose a heuristic for learning the parameters in a convex optimization model given a dataset of input-output pairs, using recently developed methods for differentiating the solution of a convex optimization problem with respect to its parameters. We describe three general classes of convex optimization models: maximum a posteriori (MAP) models, utility maximization models, and agent models, and present a numerical experiment for each.
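As a small concrete illustration (a sketch, not code from the paper), isotonic regression is a convex optimization model: the prediction for an input vector z is the solution of the convex problem minimize ||y − z||² subject to y nondecreasing. For this particular problem the solution can be computed exactly with the classic pool adjacent violators algorithm:

```python
def isotonic_fit(z):
    """Solve min_y ||y - z||^2 s.t. y[0] <= y[1] <= ... (pool adjacent violators)."""
    blocks = []  # each block is [sum, count] of a constant segment of the solution
    for v in z:
        blocks.append([float(v), 1])
        # merge blocks while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out

print(isotonic_fit([1, 3, 2, 4]))  # the violating pair (3, 2) is averaged to 2.5
# -> [1.0, 2.5, 2.5, 4.0]
```

Here the "model" has no learned parameters; in the paper's setting the objective (and hence the prediction) depends on parameters that are fit to data.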





## Highlights

• Convex optimization models predict outputs by solving a convex optimization problem
• We show that many well-known models are special cases of convex optimization models
• We give a general technique for learning the parameters of convex optimization models
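The learning technique in the highlights can be sketched in a toy one-parameter case (the model, numbers, and names below are illustrative, not from the paper). When the inner convex problem has a closed-form solution, differentiating the solution map with respect to the parameter is explicit, and the parameter can be fit by gradient descent on the prediction loss; in general one would use differentiable convex optimization layers instead of a closed form:

```python
# Toy convex optimization model: yhat(x; theta) = argmin_y (y - theta*x)**2 + lam*y**2,
# whose minimizer has the closed form theta * x / (1 + lam).
lam = 0.5  # fixed regularization weight (illustrative choice)

def predict(theta, x):
    # analytic solution of the inner convex problem
    return theta * x / (1 + lam)

# synthetic input-output pairs generated by the model with theta = 2
theta_true = 2.0
data = [(x, predict(theta_true, x)) for x in [1.0, 2.0, 3.0]]

# gradient descent on the squared prediction loss; the chain rule uses
# d yhat / d theta = x / (1 + lam), the derivative of the solution map
theta = 0.0
for _ in range(200):
    grad = sum(2.0 * (predict(theta, x) - y) * (x / (1 + lam)) for x, y in data)
    theta -= 0.05 * grad

print(round(theta, 4))  # recovers a value close to theta_true = 2.0
```

The same pattern, with the analytic derivative replaced by automatic differentiation through a convex solver, is what makes the approach applicable to models without closed-form solutions.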
