A Framework for Music-Speech Segregation Using Music Fingerprinting and the Acoustic Echo Cancellation Principle
Abstract
Background interference degrades voice intelligibility for the listener. This research work treats background music as the interfering source for smartphone communication in environments with loud background music. The paper proposes a novel framework for segregating background music from human speech using music fingerprinting and acoustic echo cancellation: the background music is first identified by searching a fingerprint database, and the identified track is then registered as a reference and segregated using acoustic echo cancellation. The proposed approach produces higher-quality music-speech segregation than existing algorithms, which succeed only at segregating single instruments; the proposed framework segregates the background music in its entirety.
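As a minimal sketch of the cancellation stage, the following Python fragment cancels an identified reference track from a mixed recording with a normalized least-mean-squares (NLMS) adaptive filter, the same principle that underlies acoustic echo cancellation. It assumes the fingerprinting step has already retrieved and roughly time-aligned the clean music track; the function name nlms_cancel and the parameters filter_len and mu are illustrative choices, not the paper's implementation.

    import numpy as np

    def nlms_cancel(mix, reference, filter_len=256, mu=0.5, eps=1e-8):
        # mix:       microphone signal (speech + acoustically filtered music)
        # reference: clean copy of the track identified by fingerprinting;
        #            assumed at least as long as mix and roughly aligned to it
        # Returns the residual after cancellation, i.e. the speech estimate.
        n = len(mix)
        w = np.zeros(filter_len)                      # adaptive FIR weights
        ref = np.concatenate([np.zeros(filter_len - 1), reference[:n]])
        speech = np.zeros(n)
        for i in range(n):
            x = ref[i:i + filter_len][::-1]           # newest sample first
            y = w @ x                                 # estimated music component
            e = mix[i] - y                            # residual = speech estimate
            w += (mu / (x @ x + eps)) * e * x         # NLMS weight update
            speech[i] = e
        return speech

In this formulation the adaptive filter learns the acoustic path from the clean track to the microphone, so the residual converges toward the speech even when the music arrives delayed and filtered by the room.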