Two-level optimization approach with accelerated proximal gradient for objective measures in sparse speech reconstruction

    Compressive speech enhancement exploits the sparseness of speech and the non-sparseness of noise in the time-frequency representation to perform speech enhancement. However, the sparsest reconstruction does not necessarily yield good enhanced speech, because it risks introducing speech distortion. This paper proposes a two-level optimization approach that incorporates objective quality measures into compressive speech enhancement. The proposed method combines the accelerated proximal gradient approach with a global one-dimensional optimization method to solve the sparse reconstruction problem. By incorporating objective quality measures into the optimization process, the reconstructed output is not only sparse but also attains the highest achievable objective quality score. In other words, sparse speech reconstruction becomes quality-driven sparse speech reconstruction. Experimental results in compressive speech enhancement consistently show improved objective scores in different noisy environments compared with the non-optimized method, which uses a fixed parameter value. In addition, the proposed optimization converges faster and with lower computational complexity than the existing proximal gradient and interior point methods.
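The two-level structure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the inner problem is assumed to be the standard $ \ell_1 $-regularized least-squares problem $ \min_x \frac{1}{2}\|Ax-b\|_2^2 + \lambda\|x\|_1 $, solved with a FISTA-style accelerated proximal gradient iteration, and the outer level is a one-dimensional search over $ \lambda $. Golden-section search is swapped in here as a simple stand-in for the paper's global one-dimensional method; it is only guaranteed to find the maximum of a unimodal score. The names `apg_lasso`, `golden_section_max`, `two_level` and the `score` callable (a hypothetical stand-in for PESQ or STOI, both of which need a clean reference signal) are illustrative.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||x||_1: element-wise shrinkage towards zero."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def apg_lasso(A, b, lam, n_iter=200):
    """Inner level (assumed form): FISTA-style accelerated proximal gradient for
    min_x 0.5 * ||A x - b||^2 + lam * ||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        x_new = soft_threshold(y - step * grad, step * lam)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum (extrapolation) step
        x, t = x_new, t_new
    return x


def golden_section_max(f, lo, hi, tol=1e-3):
    """Outer level (stand-in): maximise a unimodal scalar function f on [lo, hi]."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def two_level(A, b, score, lo, hi):
    """Pick lam by maximising score(reconstruction); return lam and its reconstruction."""
    lam_best = golden_section_max(lambda lam: score(apg_lasso(A, b, lam)), lo, hi)
    return lam_best, apg_lasso(A, b, lam_best)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((64, 128))
    x_true = np.zeros(128)
    x_true[rng.choice(128, 8, replace=False)] = rng.standard_normal(8)
    b = A @ x_true + 0.05 * rng.standard_normal(64)
    # Hypothetical quality score: negative squared error against the clean signal.
    score = lambda x: -np.sum((x - x_true) ** 2)
    lam, x_hat = two_level(A, b, score, 0.0, 5.0)
    print(f"selected lambda = {lam:.3f}, score = {score(x_hat):.4f}")
```

Re-running the whole inner solver for every trial value of $ \lambda $ keeps the sketch simple; a practical version would likely warm-start each inner solve from the previous reconstruction.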

    Mathematics Subject Classification: Primary: 65K10, 90C26, 92C55; Secondary: 94A12.

    Figure 1.  Convergence of the proximal gradient, the accelerated proximal gradient, and the interior point methods for babble noise at 0 dB SNR and $ L = 256 $

    Figure 2.  Convergence of the proximal gradient, the accelerated proximal gradient, and the interior point methods for babble noise at 0 dB SNR and $ L = 512 $

    Figure 3.  Convergence of the proximal gradient, the accelerated proximal gradient, and the interior point methods for destroyer noise at 0 dB SNR and $ L = 512 $
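The kind of convergence comparison shown in the figures can be illustrated on a small synthetic problem. The sketch below assumes the sparse reconstruction is the $ \ell_1 $-regularized least-squares problem $ \min_x \frac{1}{2}\|Ax-b\|_2^2 + \lambda\|x\|_1 $ and uses random data rather than speech; the interior point method is not reproduced, and the function name `objective_history`, the problem size and $ \lambda = 0.1 $ are illustrative choices. It records the objective after every iteration of plain proximal gradient (ISTA) and accelerated proximal gradient (FISTA), both with the fixed step $ 1/\|A\|_2^2 $.

```python
import numpy as np


def soft_threshold(v, tau):
    """Element-wise shrinkage: the proximal operator of tau * ||x||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def lasso_objective(A, b, lam, x):
    r = A @ x - b
    return 0.5 * (r @ r) + lam * np.abs(x).sum()


def objective_history(A, b, lam, n_iter, accelerated):
    """Objective value after each iteration of proximal gradient (ISTA)
    or accelerated proximal gradient (FISTA), both with fixed step 1/L."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    history = []
    for _ in range(n_iter):
        point = y if accelerated else x  # APG takes the gradient step at the extrapolated point
        x_new = soft_threshold(point - step * (A.T @ (A @ point - b)), step * lam)
        if accelerated:
            t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
            y = x_new + ((t - 1.0) / t_new) * (x_new - x)
            t = t_new
        x = x_new
        history.append(lasso_objective(A, b, lam, x))
    return np.array(history)


rng = np.random.default_rng(1)
A = rng.standard_normal((64, 256))
x_true = np.zeros(256)
x_true[rng.choice(256, 10, replace=False)] = rng.standard_normal(10)
b = A @ x_true + 0.01 * rng.standard_normal(64)

pg = objective_history(A, b, 0.1, 300, accelerated=False)
apg = objective_history(A, b, 0.1, 300, accelerated=True)
best = min(pg.min(), apg.min())  # best objective observed by either method
for k in (10, 50, 100, 300):
    print(f"iter {k:3d}: PG gap {pg[k - 1] - best:.2e}, APG gap {apg[k - 1] - best:.2e}")
```

With step $ 1/L $ the plain proximal gradient objective never increases, whereas the accelerated iteration may oscillate slightly but typically reaches a given accuracy in far fewer iterations.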

    Table 1.  Complexity comparison between the proximal gradient, the accelerated proximal gradient, and the interior point methods for babble noise, destroyer noise and white noise with window length $ L = 256 $ (run times in seconds)

    Noise type        SNR     Accelerated proximal gradient   Proximal gradient   Interior point method
    Babble noise      0 dB    3.1307                          3.3139              7.2640
                      5 dB    2.7209                          2.8722              7.0281
                      10 dB   2.3887                          2.4476              6.8677
                      15 dB   2.2449                          2.4057              6.8233
                      20 dB   2.0481                          2.1050              6.6695
    Destroyer noise   0 dB    2.8322                          2.9187              6.9363
                      5 dB    2.4799                          2.5386              6.8301
                      10 dB   2.2675                          2.4413              6.7417
                      15 dB   2.1390                          2.2119              6.7070
                      20 dB   1.8859                          1.9688              6.4216
    White noise       0 dB    3.4491                          3.5234              6.6548
                      5 dB    2.8229                          2.9723              6.9340
                      10 dB   2.5765                          2.6288              7.2333
                      15 dB   2.3393                          2.4726              7.0217
                      20 dB   1.9912                          2.0732              6.5130

    Table 2.  Complexity comparison between the accelerated proximal gradient, the proximal gradient and the interior point methods for babble noise, destroyer noise and white noise with window length $ L = 512 $ (run times in seconds)

    Noise type        SNR     Accelerated proximal gradient   Proximal gradient   Interior point method
    Babble noise      0 dB    0.8681                          0.9342              12.7778
                      5 dB    0.7779                          0.8346              12.5931
                      10 dB   0.7119                          0.7730              12.2826
                      15 dB   0.6637                          0.7199              12.0663
                      20 dB   0.6138                          0.6703              11.7910
    Destroyer noise   0 dB    0.8096                          0.8760              12.4143
                      5 dB    0.7330                          0.7863              12.4028
                      10 dB   0.6709                          0.7329              12.0540
                      15 dB   0.6263                          0.6908              11.9282
                      20 dB   0.5950                          0.6550              11.8206
    White noise       0 dB    0.9592                          1.0401              11.9704
                      5 dB    0.8137                          0.8761              12.5119
                      10 dB   0.7049                          0.7656              12.8533
                      15 dB   0.6503                          0.7136              12.3004
                      20 dB   0.6193                          0.6818              11.9545
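The run times in Tables 1 and 2 can be summarised with a few lines of arithmetic. The sketch below only transcribes the published values; `TIMES` and `summarise` are illustrative names, and averaging over all noise types and SNRs is a choice made here rather than a figure reported in the tables.

```python
# Run times in seconds transcribed from Tables 1 (L = 256) and 2 (L = 512).
# Each row is (accelerated proximal gradient, proximal gradient, interior point method);
# rows run babble, destroyer, white noise, each at 0, 5, 10, 15 and 20 dB SNR.
TIMES = {
    256: [
        (3.1307, 3.3139, 7.2640), (2.7209, 2.8722, 7.0281), (2.3887, 2.4476, 6.8677),
        (2.2449, 2.4057, 6.8233), (2.0481, 2.1050, 6.6695),
        (2.8322, 2.9187, 6.9363), (2.4799, 2.5386, 6.8301), (2.2675, 2.4413, 6.7417),
        (2.1390, 2.2119, 6.7070), (1.8859, 1.9688, 6.4216),
        (3.4491, 3.5234, 6.6548), (2.8229, 2.9723, 6.9340), (2.5765, 2.6288, 7.2333),
        (2.3393, 2.4726, 7.0217), (1.9912, 2.0732, 6.5130),
    ],
    512: [
        (0.8681, 0.9342, 12.7778), (0.7779, 0.8346, 12.5931), (0.7119, 0.7730, 12.2826),
        (0.6637, 0.7199, 12.0663), (0.6138, 0.6703, 11.7910),
        (0.8096, 0.8760, 12.4143), (0.7330, 0.7863, 12.4028), (0.6709, 0.7329, 12.0540),
        (0.6263, 0.6908, 11.9282), (0.5950, 0.6550, 11.8206),
        (0.9592, 1.0401, 11.9704), (0.8137, 0.8761, 12.5119), (0.7049, 0.7656, 12.8533),
        (0.6503, 0.7136, 12.3004), (0.6193, 0.6818, 11.9545),
    ],
}


def summarise(rows):
    """Mean run time per method and the ratios of mean times relative to APG."""
    n = len(rows)
    apg, pg, ipm = (sum(col) / n for col in zip(*rows))
    return {"apg": apg, "pg": pg, "ipm": ipm,
            "pg_over_apg": pg / apg, "ipm_over_apg": ipm / apg}


for L, rows in TIMES.items():
    s = summarise(rows)
    print(f"L = {L}: mean APG {s['apg']:.3f} s, PG {s['pg']:.3f} s, IPM {s['ipm']:.3f} s; "
          f"PG/APG = {s['pg_over_apg']:.2f}, IPM/APG = {s['ipm_over_apg']:.2f}")
```

Averaged this way, the interior point method's mean run time rises from about 6.8 s at $ L = 256 $ to about 12.2 s at $ L = 512 $, while the accelerated proximal gradient method's mean falls from about 2.49 s to about 0.72 s.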

    Table 3.  PESQ and STOI performance for different SNR with babble noise and $ L = 256 $

    SNR     Method         $ \lambda $   PESQ     STOI
    0 dB    Optimized      0.9549        2.0328   0.7147
            Fixed value    0.8           2.0073   0.7032
            Fixed value    0.9           2.0241   0.7103
            Unprocessed    --            1.8938   0.7145
    5 dB    Optimized      0.9449        2.4100   0.8200
            Fixed value    0.8           2.3896   0.8107
            Fixed value    0.9           2.3996   0.8170
            Unprocessed    --            2.2203   0.8130
    10 dB   Optimized      0.9549        2.7702   0.8999
            Fixed value    0.8           2.7522   0.8918
            Fixed value    0.9           2.7639   0.8974
            Unprocessed    --            2.5434   0.8899
    15 dB   Optimized      0.9525        3.1247   0.9504
            Fixed value    0.8           3.0937   0.9455
            Fixed value    0.9           3.1144   0.9489
            Unprocessed    --            2.8556   0.9423
    20 dB   Optimized      0.9549        3.4425   0.9767
            Fixed value    0.8           3.3898   0.9731
            Fixed value    0.9           3.4317   0.9757
            Unprocessed    --            3.1674   0.9734
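The size of the improvement in Table 3 can be checked directly. The sketch below transcribes the Table 3 values and, for each SNR, subtracts the better of the two fixed values of $ \lambda $ (chosen separately for PESQ and for STOI) and the unprocessed score from the optimized score; `TABLE3` and `gains` are illustrative names.

```python
# PESQ and STOI values transcribed from Table 3 (babble noise, L = 256).
# Each SNR (dB) maps to {method: (PESQ, STOI)}.
TABLE3 = {
    0: {"optimized": (2.0328, 0.7147), "fixed_0.8": (2.0073, 0.7032),
        "fixed_0.9": (2.0241, 0.7103), "unprocessed": (1.8938, 0.7145)},
    5: {"optimized": (2.4100, 0.8200), "fixed_0.8": (2.3896, 0.8107),
        "fixed_0.9": (2.3996, 0.8170), "unprocessed": (2.2203, 0.8130)},
    10: {"optimized": (2.7702, 0.8999), "fixed_0.8": (2.7522, 0.8918),
         "fixed_0.9": (2.7639, 0.8974), "unprocessed": (2.5434, 0.8899)},
    15: {"optimized": (3.1247, 0.9504), "fixed_0.8": (3.0937, 0.9455),
         "fixed_0.9": (3.1144, 0.9489), "unprocessed": (2.8556, 0.9423)},
    20: {"optimized": (3.4425, 0.9767), "fixed_0.8": (3.3898, 0.9731),
         "fixed_0.9": (3.4317, 0.9757), "unprocessed": (3.1674, 0.9734)},
}


def gains(row):
    """(PESQ, STOI) gains of the optimized setting over the best fixed value
    (picked per metric) and over the unprocessed input."""
    opt = row["optimized"]
    best_fixed = [max(row["fixed_0.8"][i], row["fixed_0.9"][i]) for i in range(2)]
    over_fixed = tuple(opt[i] - best_fixed[i] for i in range(2))
    over_unprocessed = tuple(opt[i] - row["unprocessed"][i] for i in range(2))
    return over_fixed, over_unprocessed


for snr, row in TABLE3.items():
    (dp_f, ds_f), (dp_u, ds_u) = gains(row)
    print(f"{snr:2d} dB: vs best fixed dPESQ={dp_f:+.4f} dSTOI={ds_f:+.4f} | "
          f"vs unprocessed dPESQ={dp_u:+.4f} dSTOI={ds_u:+.4f}")
```

In this table the optimized setting beats both fixed values at every SNR, but by margins of roughly 0.01 PESQ; the gain over the unprocessed signal is larger for PESQ, while at 0 dB its STOI gain over the unprocessed signal is only 0.0002.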

    Table 4.  PESQ and STOI performance for different SNR with destroyer noise and $ L = 256 $

    SNR     Method         $ \lambda $   PESQ     STOI
    0 dB    Optimized      0.8949        2.1629   0.7532
            Fixed value    0.8           2.1543   0.7448
            Fixed value    0.9           2.1456   0.7497
            Unprocessed    --            1.9271   0.7524
    5 dB    Optimized      0.8951        2.5370   0.8337
            Fixed value    0.8           2.5186   0.8267
            Fixed value    0.9           2.5283   0.8325
            Unprocessed    --            2.2955   0.8281
    10 dB   Optimized      0.8749        2.8704   0.9001
            Fixed value    0.8           2.8543   0.8933
            Fixed value    0.9           2.8677   0.8985
            Unprocessed    --            2.6132   0.8902
    15 dB   Optimized      0.8949        3.1914   0.9468
            Fixed value    0.8           3.1611   0.9412
            Fixed value    0.9           3.1876   0.9455
            Unprocessed    --            2.9256   0.9382
    20 dB   Optimized      0.9451        3.4868   0.9737
            Fixed value    0.8           3.4427   0.9696
            Fixed value    0.9           3.4722   0.9726
            Unprocessed    --            3.2468   0.9697

    Table 5.  PESQ and STOI performance for different SNR with white noise and $ L = 256 $

    SNR     Method         $ \lambda $   PESQ     STOI
    0 dB    Optimized      0.9331        2.0119   0.7661
            Fixed value    0.8           1.9895   0.7519
            Fixed value    0.9           2.0042   0.7619
            Unprocessed    --            1.6665   0.7377
    5 dB    Optimized      0.9451        2.3972   0.8615
            Fixed value    0.8           2.3716   0.8492
            Fixed value    0.9           2.3913   0.8580
            Unprocessed    --            1.9615   0.8387
    10 dB   Optimized      0.9451        2.8102   0.9275
            Fixed value    0.8           2.7735   0.9183
            Fixed value    0.9           2.7976   0.9246
            Unprocessed    --            2.2989   0.9146
    15 dB   Optimized      0.9349        3.1973   0.9652
            Fixed value    0.8           3.1472   0.9594
            Fixed value    0.9           3.1844   0.9636
            Unprocessed    --            2.6442   0.9613
    20 dB   Optimized      0.9501        3.5007   0.9858
            Fixed value    0.8           3.4286   0.9797
            Fixed value    0.9           3.4796   0.9826
            Unprocessed    --            2.9839   0.9845

    Table 6.  PESQ and STOI performance for different SNR with babble noise and $ L = 512 $

    SNR     Method         $ \lambda $   PESQ     STOI
    0 dB    Optimized      0.9601        2.0699   0.7234
            Fixed value    0.8           2.0525   0.7129
            Fixed value    0.9           2.0634   0.7212
            Unprocessed    --            1.8938   0.7145
    5 dB    Optimized      0.9601        2.4185   0.8282
            Fixed value    0.8           2.4084   0.8195
            Fixed value    0.9           2.4150   0.8258
            Unprocessed    --            2.2203   0.8130
    10 dB   Optimized      0.9079        2.7672   0.9064
            Fixed value    0.8           2.7529   0.8996
            Fixed value    0.9           2.7586   0.9045
            Unprocessed    --            2.5434   0.8899
    15 dB   Optimized      0.9077        3.1187   0.9540
            Fixed value    0.8           3.0736   0.9507
            Fixed value    0.9           3.0790   0.9530
            Unprocessed    --            2.8556   0.9423
    20 dB   Optimized      0.9601        3.3898   0.9785
            Fixed value    0.8           3.3703   0.9760
            Fixed value    0.9           3.3822   0.9775
            Unprocessed    --            3.1674   0.9734

    Table 7.  PESQ and STOI performance for different SNR with destroyer noise and $ L = 512 $

    SNR     Method         $ \lambda $   PESQ     STOI
    0 dB    Optimized      0.7601        2.2328   0.7629
            Fixed value    0.8           2.2256   0.7602
            Fixed value    0.9           2.2078   0.7622
            Unprocessed    --            1.9271   0.7524
    5 dB    Optimized      0.7700        2.5651   0.8441
            Fixed value    0.8           2.5589   0.8414
            Fixed value    0.9           2.5569   0.8436
            Unprocessed    --            2.2955   0.8281
    10 dB   Optimized      0.7700        2.8773   0.9084
            Fixed value    0.8           2.8699   0.9056
            Fixed value    0.9           2.8742   0.9081
            Unprocessed    --            2.6132   0.8902
    15 dB   Optimized      0.8301        3.1775   0.9530
            Fixed value    0.8           3.1710   0.9509
            Fixed value    0.9           3.1732   0.9529
            Unprocessed    --            2.9256   0.9382
    20 dB   Optimized      0.9270        3.4819   0.9768
            Fixed value    0.8           3.4375   0.9750
            Fixed value    0.9           3.4469   0.9766
            Unprocessed    --            3.2468   0.9697

    Table 8.  PESQ and STOI performance for different SNR with white noise and $ L = 512 $

    SNR     Method         $ \lambda $   PESQ     STOI
    0 dB    Optimized      0.8599        2.0454   0.7829
            Fixed value    0.8           2.0395   0.7721
            Fixed value    0.9           2.0403   0.7775
            Unprocessed    --            1.6665   0.7377
    5 dB    Optimized      0.8801        2.4148   0.8735
            Fixed value    0.8           2.4129   0.8638
            Fixed value    0.9           2.4101   0.8695
            Unprocessed    --            1.9615   0.8387
    10 dB   Optimized      0.9496        2.8009   0.9338
            Fixed value    0.8           2.7937   0.9261
            Fixed value    0.9           2.7948   0.9308
            Unprocessed    --            2.2989   0.9146
    15 dB   Optimized      0.9550        3.1967   0.9673
            Fixed value    0.8           3.1421   0.9623
            Fixed value    0.9           3.1514   0.9652
            Unprocessed    --            2.6442   0.9613
    20 dB   Optimized      0.9601        3.4393   0.9864
            Fixed value    0.8           3.3930   0.9807
            Fixed value    0.9           3.4175   0.9823
            Unprocessed    --            2.9839   0.9845