
Evaluation of parallel and sequential deep learning models for music subgenre classification

The second author is supported by a grant from the Natural Sciences and Engineering Research Council of Canada (NSERC)

Abstract
  • In this paper, we evaluate two deep learning models that integrate convolutional and recurrent neural networks, implementing both sequential and parallel architectures for fine-grained musical subgenre classification. Because our low-level mel-spectrogram dataset has an exceptionally low signal-to-noise ratio (SNR), sensitive yet robust learning models are required to produce meaningful results. We investigate the effects of three commonly used optimizers, dropout, batch normalization, and sensitivity to the choice of initialization distribution. The results show that the sequential model specifically requires the RMSprop optimizer, whereas the parallel model trained with the Adam optimizer yielded encouraging and stable results, achieving an average F1 score of $ 0.63 $. When all factors are considered, the optimized hybrid parallel model outperformed the sequential model in both classification accuracy and system stability.

    Mathematics Subject Classification: Primary: 68T07; Secondary: 68T10.
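The headline metric above is a macro-averaged F1 score: the unweighted mean of per-class F1 scores, so every subgenre counts equally regardless of class size. A minimal sketch in plain Python (the toy labels are illustrative, not drawn from the paper's dataset):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)
```

Because the average is unweighted, a model that ignores rare subgenres is penalized more heavily than under accuracy or micro-averaged F1.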


  • Figure 1.  Baseline CNN model

    Figure 2.  CRNN sequential architecture

    Figure 3.  Parallel CNN-RNN architecture

    Figure 4.  Visualization of one song from our dataset

    Figure 5.  RMSprop learning process on two axes

    Figure 6.  Classification accuracy across 50 epochs

    Table 1.  F1 scores for optimizer evaluation

    Optimizer   CNN    CRNN   CNN-RNN
    Adam        0.45   0.32   0.63
    Adadelta    0.30   0.31   0.35
    RMSprop     0.41   0.54   0.60
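RMSprop, the optimizer the sequential CRNN favoured, divides each parameter's step by a running root-mean-square of its recent gradients, so noisy coordinates take smaller steps. A minimal numpy sketch of the update rule (the learning rate and decay values are common defaults, not the paper's settings):

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=0.001, rho=0.9, eps=1e-8):
    """One RMSprop update: scale the step by a running RMS of gradients."""
    cache = rho * cache + (1 - rho) * grad ** 2   # moving average of grad^2
    w = w - lr * grad / (np.sqrt(cache) + eps)    # per-parameter adaptive step
    return w, cache

# Minimize f(w) = ||w||^2 for a few steps as a sanity check
w = np.array([1.0, -2.0])
cache = np.zeros_like(w)
for _ in range(100):
    grad = 2 * w
    w, cache = rmsprop_step(w, grad, cache, lr=0.05)
```

Adam extends this rule with a momentum term and bias correction, which is one plausible reason it behaves differently on the two architectures compared here.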

    Table 2.  Optimal classification accuracy per model

    Model       CNN    CRNN      CNN-RNN
    Optimizer   Adam   RMSprop   Adam
    Accuracy    0.31   0.57      0.64

    Table 3.  Macro F1 scores for the effect of regularization

    Model     Data         Dropout   Batch Normalization   Dropout + Batch Normalization
    CRNN      Train        0.67      1.00                  0.98
              Validation   0.65      0.58                  0.60
              Test         0.62      0.57                  0.41
    CNN-RNN   Train        0.65      1.00                  0.90
              Validation   0.65      0.58                  0.60
              Test         0.63      0.61                  0.63
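Dropout, the regularizer with the most stable test scores above, zeroes a random fraction of activations at training time; the inverted variant rescales the survivors so the expected activation is unchanged at inference. A minimal numpy sketch (the rate is illustrative):

```python
import numpy as np

def dropout(x, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero a fraction `rate` of units,
    rescale survivors by 1/(1-rate) so E[output] == x."""
    if not training or rate == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= rate    # keep each unit with prob. 1 - rate
    return x * mask / (1.0 - rate)
```

The gap between train and test scores in Table 3 is the overfitting that dropout is meant to close: batch normalization alone drives the train score to 1.00 while the test score drops.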

    Table 4.  Average F1 scores for the effect of initialization method

    Initialization   CNN    CRNN   CNN-RNN
    Glorot Normal    0.31   0.63   0.63
    Glorot Uniform   0.34   0.60   0.59
    Random Normal    0.33   0.45   0.53
    Random Uniform   0.33   0.37   0.57
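Glorot (Xavier) initialization, the strongest performer for the recurrent models above, scales the initial weight variance by the layer's fan-in and fan-out so that activation variance is roughly preserved across layers; plain random initialization lacks this scaling. A minimal numpy sketch of the two Glorot variants:

```python
import numpy as np

def glorot_normal(fan_in, fan_out, rng=None):
    """Weights ~ N(0, 2/(fan_in + fan_out))."""
    rng = rng or np.random.default_rng()
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def glorot_uniform(fan_in, fan_out, rng=None):
    """Weights ~ U(-limit, limit), limit = sqrt(6/(fan_in + fan_out))."""
    rng = rng or np.random.default_rng()
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))
```

Both variants have the same variance, 2/(fan_in + fan_out), which matches the near-identical Glorot scores for the CNN-RNN in Table 4.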



