A Framework for Music-Speech Segregation using Music Fingerprinting and Acoustic Echo Cancellation Principle

Authors

  • F. Hussain, Department of Computer Engineering, University of Engineering and Technology Taxila, Taxila, Pakistan
  • H. A. Habib, Department of Computer Engineering, University of Engineering and Technology Taxila, Taxila, Pakistan
  • M. J. Khan, Department of Computer Engineering, University of Engineering and Technology Taxila, Taxila, Pakistan

Abstract

Background interference degrades voice intelligibility for the listener. This work treats background music as interference in smartphone communication from areas with loud background music. The paper proposes a novel framework for segregating background music from human speech using music fingerprinting and the acoustic echo cancellation principle. First, the background music is identified by a fingerprint search against a music database. The identified track is then registered (time-aligned) with the recording and removed using acoustic echo cancellation. The proposed approach yields higher-quality music-speech segregation than existing algorithms, and it segregates the background music completely, whereas existing approaches succeed only at segregating single instruments.
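The pipeline described above has two stages: identify the playing track by fingerprint lookup, then treat the identified track as a known "far-end" signal and cancel it from the microphone mixture, the way an acoustic echo canceller removes loudspeaker echo. The sketch below is a minimal illustration of that idea under stated assumptions, not the authors' implementation: the fingerprint lookup itself (an Echoprint-style database search) is assumed to have already returned the clean track, the alignment method and filter parameters are illustrative, and a basic sample-domain NLMS filter stands in for whatever AEC variant the paper employs.

    import numpy as np

    def align(reference, mixture):
        # Register the identified track against the mixture: estimate its
        # lag by cross-correlation and shift it onto the mixture's timeline.
        # (For long recordings an FFT-based correlation would be preferable.)
        corr = np.correlate(mixture, reference, mode="full")
        lag = int(np.argmax(corr)) - (len(reference) - 1)
        return np.concatenate([np.zeros(lag), reference]) if lag >= 0 else reference[-lag:]

    def nlms_cancel(reference, mixture, taps=128, mu=0.5, eps=1e-8):
        # Normalized LMS adaptive filter: learn the channel that maps the
        # clean track to its image in the microphone signal, subtract that
        # image, and keep the residual (error signal) as the speech estimate.
        w = np.zeros(taps)                  # adaptive filter weights
        x = np.zeros(taps)                  # delay line of reference samples
        out = np.empty(len(mixture))
        for n in range(len(mixture)):
            x[1:] = x[:-1]                  # shift in the newest sample
            x[0] = reference[n] if n < len(reference) else 0.0
            y = w @ x                       # estimated music component
            e = mixture[n] - y              # residual = speech estimate
            w += (mu / (eps + x @ x)) * e * x
            out[n] = e
        return out

    # Toy demo: a tone as the "identified track", passed through a short
    # synthetic channel and mixed with a speech-like burst.
    fs = 8000
    t = np.arange(fs) / fs
    music = 0.5 * np.sin(2 * np.pi * 440 * t)
    speech = 0.3 * np.sin(2 * np.pi * 180 * t) * (t % 0.4 < 0.2)
    mix = speech + np.convolve(music, [0.7, 0.2, 0.1])[:len(t)]
    cleaned = nlms_cancel(align(music, mix), mix, taps=32)

The reason an echo-cancellation formulation fits here, where plain waveform subtraction would not, is that the music reaching the microphone is a filtered, delayed copy of the database track (shaped by the playback equipment and room acoustics); the adaptive filter learns that unknown channel online, so the track can be removed even though it never appears in the mixture verbatim.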


Published

25-03-2015

How to Cite

[1] F. Hussain, H. A. Habib, and M. J. Khan, “A Framework for Music-Speech Segregation using Music Fingerprinting and Acoustic Echo Cancellation Principle”, The Nucleus, vol. 52, no. 1, pp. 29–39, Mar. 2015.
