The support vector machine (SVM) is a widely used machine learning algorithm because of salient features such as margin maximization and kernel substitution, which support classification and regression of data in a high-dimensional feature space. However, SVMs still have difficulty handling large datasets, because training requires solving a quadratic programming problem that becomes costly as the number of training vectors grows. Motivated by this difficulty, particularly for large datasets with nonlinear kernels, the presented algorithm preselects a subset of data points and solves a smaller optimization problem to obtain the support vectors. It extracts the data points lying close to the cluster boundaries of a large dataset, which form a much reduced but critical set for classification and regression. The data vectors used to train the SVM are reduced by a recursive, segmented analysis of their data structure. Because the method is independent of the SVM and precedes its training stage, it mitigates the problem suffered by most data reduction methods, which select data through repeated training of SVMs. Experiments using line spectral frequency (LSF) data vectors for a voice conversion application show that the presented algorithm reduces both the number of data vectors and the SVM training time while maintaining good accuracy in terms of objective evaluation. The subjective evaluation results of the proposed voice conversion system are compared with those of a state-of-the-art method based on neural networks (NNs). The results show that the proposed method may be used as an alternative to the existing method for voice conversion.
Key words: Support vector machine, clustering based support vector machine, Mahalanobis distance, Ward’s linkage
Copyright © 2022 Author(s) retain the copyright of this article.
This article is published under the terms of the Creative Commons Attribution License 4.0
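The preselection idea summarized in the abstract (cluster the data, then keep only the points near cluster boundaries before any SVM training) can be illustrated with a minimal sketch. It is not the authors' algorithm: the function name `boundary_subset`, the parameters `n_clusters` and `keep_ratio`, and the selection rule (smallest gap between the Mahalanobis distance to the nearest foreign cluster and the distance to the point's own cluster) are assumptions made here for illustration, and the recursive, segmented analysis described in the abstract is not reproduced. The sketch also assumes at least two features and at least two points per cluster.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def boundary_subset(X, n_clusters=2, keep_ratio=0.3):
    """Return (indices, labels): indices of points nearest a cluster boundary.

    Hypothetical rule, not taken from the paper: a point counts as close to
    a boundary when its Mahalanobis distance to the nearest foreign cluster
    is only slightly larger than the distance to its own cluster.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape

    # Ward's linkage builds the hierarchy; cut it into at most n_clusters groups.
    labels = fcluster(linkage(X, method="ward"), t=n_clusters, criterion="maxclust")
    ids = np.unique(labels)

    # Mahalanobis distance from every point to every cluster centre.
    dist = np.empty((n, ids.size))
    for j, c in enumerate(ids):
        members = X[labels == c]
        mean = members.mean(axis=0)
        # A small ridge keeps the covariance invertible for small clusters.
        cov = np.cov(members, rowvar=False) + 1e-6 * np.eye(d)
        diff = X - mean
        dist[:, j] = np.sqrt(np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff))

    rows = np.arange(n)
    own_col = np.searchsorted(ids, labels)   # column of each point's own cluster
    own = dist[rows, own_col].copy()
    dist[rows, own_col] = np.inf             # mask own cluster
    other = dist.min(axis=1)                 # nearest foreign cluster

    # Smallest gap first: these points sit closest to a neighbouring cluster.
    n_keep = max(1, int(round(keep_ratio * n)))
    keep = np.sort(np.argsort(other - own)[:n_keep])
    return keep, labels
```

In such a setup, only the rows `X[keep]` and their targets would be passed to the SVM trainer, so the quadratic programming problem is solved over `keep_ratio` of the original vectors. How much accuracy is retained depends on the data and on the selection rule, which this sketch does not evaluate.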