A Framework for Music-Speech Segregation Using Music Fingerprinting and the Acoustic Echo Cancellation Principle
Abstract
Background interference degrades voice intelligibility for the listener. This research work treats background music as the interfering source for smartphone communication in environments with loud background music. The paper proposes a novel framework for segregating background music from human speech using music fingerprinting and acoustic echo cancellation: the background music is first identified by searching a fingerprint database, and the identified track is then registered as a reference and segregated using acoustic echo cancellation. The proposed approach produces higher-quality music-speech segregation than existing algorithms, which succeed only at segregating single instruments; the proposed framework segregates the background music in its entirety.
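As a minimal sketch of the cancellation stage, the following Python fragment cancels an identified reference track from a mixed recording with a normalized least-mean-squares (NLMS) adaptive filter, the same principle that underlies acoustic echo cancellation. It assumes the fingerprinting step has already retrieved and roughly time-aligned the clean music track; the function name nlms_cancel and the parameters filter_len and mu are illustrative choices, not the paper's implementation.

    import numpy as np

    def nlms_cancel(mix, reference, filter_len=256, mu=0.5, eps=1e-8):
        # mix:       microphone signal (speech + acoustically filtered music)
        # reference: clean copy of the track identified by fingerprinting;
        #            assumed at least as long as mix and roughly aligned to it
        # Returns the residual after cancellation, i.e. the speech estimate.
        n = len(mix)
        w = np.zeros(filter_len)                      # adaptive FIR weights
        ref = np.concatenate([np.zeros(filter_len - 1), reference[:n]])
        speech = np.zeros(n)
        for i in range(n):
            x = ref[i:i + filter_len][::-1]           # newest sample first
            y = w @ x                                 # estimated music component
            e = mix[i] - y                            # residual = speech estimate
            w += (mu / (x @ x + eps)) * e * x         # NLMS weight update
            speech[i] = e
        return speech

In this formulation the adaptive filter learns the acoustic path from the clean track to the microphone, so the residual converges toward the speech even when the music arrives delayed and filtered by the room.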