Isaac Scientific Publishing

Journal of Advances in Applied Mathematics

On Sparsity of Soft Margin Support Vector Machines

Download PDF (457.4 KB) PP. 109 - 114 Pub. Date: July 31, 2017

DOI: 10.22606/jaam.2017.23001


  • Jochen Merker*
    Faculty of Computer Science, Mathematics and Natural Sciences, University of Applied Sciences Leipzig, Germany


In supervised machine learning, a support vector machine (SVM) constructs from binary classified training data a linear classifier by solving a linearly constrained convex optimization problem. Depending on the number N of training data and the dimension D of the feature space, it either is advantageous to solve the primal problem or the dual problem. In this article, the case D >> N is discussed where D is so large that even a calculation of the dot product of fully occupied vectors in dimension D is too slow for the desired (e.g. real-time) application. Then a way to speed up the classification is to use an SVM which constructs a sparse linear classifier by solving an optimization problem involving the 1-norm, i.e. many components of the classifying vector are zero so that much less than D multiplications are neccessary to calculate the dot product. For a soft-margin SVM, in this article a theorem on the number of non-zero components is shown.


Support vector machines, SVM, sparse classifier, L1 regularization, LASSO, big data, machine learning.


[1] V. N. Vapnik, The Nature of Statistical Learning Theory. Springer, 1995.

[2] P. Foremski, “On different ways to classify internet traffic: a short review of selected publications,” Theor. Appl. Inf., vol. 25, pp. 119–136, 2013.

[3] H. Kim, K. Clay, M. Fomenkov, D. Barman, M. Faloutsos, and K. Lee, “Internet traffic classification demystified: Myths, caveats, and the best practices,” in Proceedings of the 2008 ACM CoNEXT conference. ACM, 2008, pp. 11–14.

[4] P. Velan, M. Cermák, P. Celeda, and M. Dra?ar, “A survey of methods for encrypted traffic classification and analysis,” Int. J. Network Mgmt., pp. 1–24, 2014.

[5] A. Khakpour and A. Liu, “An information-theoretical approach to high-speed flow nature identification,” IEEE/ACM Transactions on Networking, vol. 21, pp. 1076–1089, 2013.

[6] B. Sch?lkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2001.

[7] J. Fehr, K. Arreola, and H. Burkhardt, “Fast support vector machine classification of very large datasets,” in Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006.

[8] E. Carrizosa, “Support vector machines and distance minimization,” in Data mining and mathematical programming / Panos M. Pardalos, Pierre Hansen, editors. CRM proceedings & lecture notes, ISSN 1065-8580; Volume 45. AMS, 2008, pp. 1–13.

[9] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning. Springer, 2009, 2013.

[10] S. Rosset and J. Zhu, “Piecewise linear regularized solution paths,” The Annals of Statistics, vol. 35, no. 3, pp. 1012–1030, 2007.

[11] J. Zhu, S. Rosset, T. Hastie, and R. Tibshirani, “1-norm support vector machines,” in Advances in Neural Information Processing Systems 16: Proceedings of the 2003 Conference, 2004, pp. 49–63.

[12] V. Gómez-Verdejo, M. M. Ramón, J. Arenas-García, and H. Molina-Bulla, “Support vector machines with constraints for sparsity in the primal parameters,” IEEE Transactions on neural networks, vol. 22, no. 8, pp. 1269–1283, 2011.