- Sparse and Shift-Invariant Feature Extraction from Non-Negative Data.
P Smaragdis, B Raj, MVS Shashanka.
IEEE Intl Conf on Acoustics, Speech and Signal Processing, Las Vegas, Nevada, Apr 2008. [ DOI ] [ pdf ]
Abstract:
In this paper we describe a technique that allows the extraction of
multiple local shift-invariant features from analysis of non-negative
data of arbitrary dimensionality. Our approach employs a probabilistic
latent variable model with sparsity constraints. We demonstrate
its utility by performing feature extraction in a variety of domains
ranging from audio to images and video.
- Sparse Overcomplete Latent Variable Decomposition of Counts Data.
MVS Shashanka, B Raj, P Smaragdis.
Neural Information Processing Systems Conference (NIPS), Vancouver, Canada, Dec 2007. [ pdf ] [ supplement ] [ Fig.1 Data ]
Abstract:
An important problem in many fields is the analysis of counts data to extract meaningful latent components. Methods like Probabilistic Latent Semantic Analysis
(PLSA) and Latent Dirichlet Allocation (LDA) have been proposed for this purpose. However, they are limited in the number of components they can extract and
lack an explicit provision to control the expressiveness of the extracted components. In this paper, we present a learning formulation to address these limitations by employing the notion of sparsity. We start with the PLSA framework and
use an entropic prior in a maximum a posteriori formulation to enforce sparsity. We show that this allows the extraction of overcomplete sets of latent components which better characterize the data. We present experimental evidence of the utility of such representations.
- Privacy-Preserving Musical Database Matching.
MVS Shashanka, P Smaragdis.
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, New York, Oct 2007. [ DOI ] [ pdf ]
Abstract:
In this paper we present an illustrative process which allows privacy-preserving transactions in the context of musical databases. In particular, we address the problem of matching a piece of music audio to a service database in such a way that the database provider will not directly observe the query or its result, thereby preserving the privacy of the inquirer. We formulate this process within the field of secure multiparty computation and show how such a transaction can be achieved once we derive secure versions of basic signal processing operations.
- Supervised and Semi-Supervised Separation of Sounds from Single-Channel Mixtures.
P Smaragdis, B Raj, MVS Shashanka.
Intl Conf on Independent Component Analysis, London, UK, Sep 2007. [ DOI ] [ pdf ]
Abstract:
In this paper we describe a methodology for model-based
single channel separation of sounds. We present a sparse latent variable
model that can learn sounds based on their distribution of time/frequency
energy. This model can then be used to extract known types of sounds
from mixtures in two scenarios: one where all sound types
in the mixture are known, and another where only
the target or the interference models are known. The model we propose
has close ties to non-negative decompositions and latent variable models
commonly used for semantic analysis.
- Sparse Overcomplete Decomposition for Single Channel Speaker Separation.
MVS Shashanka, B Raj, P Smaragdis.
IEEE Intl Conf on Acoustics, Speech and Signal Processing, Honolulu, Hawaii, April 2007. [ DOI ] [ pdf ] [ Examples ]
Abstract:
We present an algorithm for separating multiple speakers from a mixed single channel recording. The algorithm is based on a model proposed by Raj and Smaragdis (2005). The idea is to extract certain characteristic spectro-temporal basis functions from training data for individual speakers and decompose the mixed signals as linear combinations of these learned bases. In other words, their model extracts a compact code of basis functions that can explain the space spanned by spectral vectors of a speaker. In our model, we generate a sparse-distributed code where we have more basis functions than the dimensionality of the space. We propose a probabilistic framework to achieve sparsity. Experiments show that the resulting sparse code better captures the structure in data and hence leads to better separation.
- A Framework for Secure Speech Recognition.
P Smaragdis, MVS Shashanka.
IEEE Intl Conf on Acoustics, Speech and Signal Processing, Honolulu, Hawaii, April 2007. [ DOI ] [ pdf ]
Abstract:
We present an algorithm that enables privacy-preserving speech recognition transactions between multiple parties. We assume two commonplace scenarios. In the first, one of two parties has private speech data to be transcribed and the other party has private models for speech recognition. In the second, one party has a speech model to be trained using the private data of multiple other parties. In both cases, data privacy is desired by both the data and the model owners. In this paper we show how such collaborations can be performed using secure multiparty computation while ensuring that no private data leaks. In neither case will any party obtain information on other parties' data. The protocols described herein can be used to construct rudimentary speech recognition systems and can be easily extended for arbitrary audio and speech processing.
- Bandwidth Expansion with a Polya Urn Model.
B Raj, R Singh, MVS Shashanka, P Smaragdis.
IEEE Intl Conf on Acoustics, Speech and Signal Processing, Honolulu, Hawaii, April 2007. [ DOI ] [ pdf ] [ Examples ]
Abstract:
We present a new statistical technique for the estimation of the high frequency components (4-8 kHz) of speech signals from narrow-band (0-4 kHz) signals. The magnitude spectra of broadband speech are modelled as the outcome of a Polya Urn process that represents the spectra as the histogram of the outcome of several draws from a mixture multinomial distribution over frequency indices. The multinomial distributions that compose this process are learnt from a corpus of broadband (0-8 kHz) speech. To estimate high-frequency components of narrow-band speech, its spectra are also modelled as the outcome of draws from a mixture-multinomial process that is composed of the learnt multinomials, where the counts of the indices of higher frequencies have been obscured. The obscured high-frequency components are then estimated as the expected number of draws of their indices from the mixture-multinomial. Experiments conducted on bandlimited signals derived from the WSJ corpus show that the proposed procedure is able to accurately estimate the high frequency components of these signals.
- Separating a Foreground Singer from Background Music.
B Raj, P Smaragdis, MVS Shashanka, R Singh.
Intl Symposium on Frontiers of Research on Speech and Music (FRSM), Mysore, India, Jan
2007. [ pdf ][ Examples ]
Abstract:
In this paper we present an algorithm for separating singing voices
from background music in popular songs. The algorithm is derived
by modelling the magnitude spectrogram of audio signals as the outcome
of draws from a discrete bi-variate random process that generates
time-frequency pairs. The spectrogram of a song is assumed
to have been obtained through draws from the distributions underlying
the music and the vocals, respectively. The parameters of the
underlying distributions are learnt from the observed spectrogram of
the song. The spectrogram of the separated vocals is then derived by
estimating the fraction of draws that were obtained from its distribution.
In the paper we present the algorithm within a framework that
allows personalization of popular songs, by separating out the vocals,
processing them appropriately to one’s own tastes, and remixing
them. Our experiments reveal that we are effectively able to
separate out the vocals in a song and personalize them to our tastes.
- A Probabilistic Latent Variable Model for Acoustic Modeling.
P Smaragdis, B Raj, MVS Shashanka.
Workshop on Advances in Models for Acoustic Processing, NIPS
2006. [ pdf ]
Abstract:
In this paper we describe a model developed for the analysis of acoustic spectra. Unlike decomposition techniques that can produce difficult-to-interpret results, this model explicitly models spectra as distributions and extracts sets of additive and semantically useful components that facilitate a variety of applications, including source separation, denoising, music transcription and sound recognition. This model is probabilistic in nature and is easily extended to produce sparse codes and to discover transform-invariant components, which can be optimized for particular applications.
- Secure Sound Classification: Gaussian Mixture Models.*
MVS Shashanka, P Smaragdis.
IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Toulouse, France, May
2006. [ DOI ] [ pdf ]
*Finalist in the Student Paper Contest.
Abstract:
We propose secure protocols for Gaussian mixture-based sound recognition. The protocols we describe allow varying levels of security between two collaborating parties. The case we examine consists of one party (Alice) providing data and the other party (Bob) providing a recognition algorithm. We show that it is possible to have Bob apply his algorithm on Alice's data in such a way that the data and the recognition results will not be revealed to Bob, thereby guaranteeing Alice's data privacy. Likewise, we show that it is possible to organize the collaboration so that Alice cannot reverse engineer Bob's recognition algorithm. We show how Gaussian mixtures can be implemented in a secure manner using secure computation primitives that implement simple numerical operations, and we demonstrate the process by showing how it can yield results identical to a non-secure computation while maintaining privacy.
- Latent Dirichlet Decomposition for Single Channel Speaker Separation.
B Raj, MVS Shashanka, P Smaragdis.
IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Toulouse, France, May
2006. [ DOI ] [ pdf ] [ Examples ]
Abstract:
We present an algorithm for the separation of multiple speakers from mixed single-channel recordings by latent variable decomposition of the speech spectrogram. We model each magnitude spectral vector in the short-time Fourier transform of a speech signal as the outcome of a discrete random process that generates frequency bin indices. The distribution of the process is modeled as a mixture of multinomial distributions, such that the mixture weights of the component multinomials vary from analysis window to analysis window. The component multinomials are assumed to be speaker specific and are learned from training signals for each speaker. We model the prior distribution of the mixture weights for each speaker as a Dirichlet distribution. The distributions representing magnitude spectral vectors for the mixed signal are decomposed into mixtures of the multinomials for all component speakers. The frequency distribution, i.e., the spectrum for each speaker, is reconstructed from this decomposition.
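As a rough illustration of the decomposition step described in this abstract, the sketch below estimates per-frame mixture weights for a set of fixed, already-learned multinomials by expectation-maximization and then splits the frame's spectrum by speaker. It is a simplified maximum-likelihood version: the Dirichlet prior on the mixture weights is omitted, and the names `separate_frame`, `bases`, and `owner` are hypothetical, not taken from the paper.

```python
def separate_frame(counts, bases, owner, iters=50):
    """Split one magnitude spectral vector among speakers.

    counts: non-negative magnitudes, one per frequency bin.
    bases:  list of multinomials P(f | z), each summing to 1 over bins.
    owner:  speaker label for each basis (same length as bases).
    Returns (per-speaker spectra, final mixture weights).
    Assumption: plain maximum-likelihood EM, no Dirichlet prior.
    """
    n_f, n_z = len(counts), len(bases)
    total = float(sum(counts))
    if total <= 0:
        raise ValueError("counts must contain some positive energy")

    def e_step(weights):
        # Posterior P(z | f) for every bin under the current weights.
        post = []
        for f in range(n_f):
            joint = [weights[z] * bases[z][f] for z in range(n_z)]
            norm = sum(joint)
            # A bin that no basis can explain gets an all-zero posterior.
            post.append([j / norm if norm > 0 else 0.0 for j in joint])
        return post

    weights = [1.0 / n_z] * n_z
    for _ in range(iters):
        post = e_step(weights)
        # M-step: each weight is the share of energy attributed to its basis.
        weights = [
            sum(counts[f] * post[f][z] for f in range(n_f)) / total
            for z in range(n_z)
        ]

    # Final posteriors decide how each bin's energy is divided by speaker.
    post = e_step(weights)
    spectra = {s: [0.0] * n_f for s in set(owner)}
    for f in range(n_f):
        for z in range(n_z):
            spectra[owner[z]][f] += counts[f] * post[f][z]
    return spectra, weights
```

For example, `separate_frame([3, 3, 1, 1], [[0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5]], ["A", "B"])` assigns the first two bins to speaker A and the last two to speaker B, because the two bases have disjoint support.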
- Optimal Multi-Channel Data Allocation
with Flat Broadcast per Channel.*
AA Bertossi, MC Pinotti, S Ramaprasad,
R Rizzi, MVS Shashanka.
Intl. Parallel and Distributed Processing Symposium, Santa Fe, USA, Apr
2004. [ DOI ]
[ pdf ] [ BibTeX ]
*Authors listed in alphabetical order.
Abstract:
Broadcast is an efficient and scalable way of transmitting data to an unlimited number of clients that are listening to a channel. Cyclically broadcasting data over the channel is a basic scheduling technique, which is known as flat scheduling. When multiple channels are available, partitioning data among channels in an unbalanced way, depending on data popularities, is an allocation technique known as skewed allocation. In this paper, the problem of data broadcasting over multiple channels is considered assuming skewed data allocation to channels and flat data scheduling per channel, with the objective of minimizing the average waiting time of the clients. Several algorithms, based on dynamic programming, are presented which provide optimal solutions for N data items and K channels. Specifically, for data items with uniform lengths, an O(NKlogN) time algorithm is proposed, which improves over the previously known O(N^2 K) time algorithm. When K ≤ 4, faster O(N) time algorithms are exhibited. Moreover, for data items with nonuniform lengths, it is shown that the problem is NP-hard when K = 2, and strongly NP-hard for arbitrary K. In the former case, a pseudo-polynomial algorithm is discussed, whose time is O(NZ) where Z is the sum of the data lengths.
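To make the uniform-length problem in this abstract concrete, here is a minimal sketch of a straightforward quadratic-time dynamic program; it is not the faster algorithm the paper proposes. It rests on two assumptions not stated in the abstract: a client tuned to a channel that cycles through n unit-length items waits n/2 time units on average, and an optimal allocation places items in contiguous groups once they are sorted by decreasing popularity. The function name `allocate_flat_broadcast` is hypothetical.

```python
from itertools import accumulate


def allocate_flat_broadcast(probs, k):
    """Partition unit-length items over k channels, each broadcast flat.

    probs: access probability (popularity) of each item.
    Returns (minimum average waiting time, list of group sizes), where
    groups are contiguous in decreasing-popularity order (an assumption).
    """
    p = sorted(probs, reverse=True)
    n = len(p)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of items")
    prefix = [0.0] + list(accumulate(p))

    def cost(i, j):
        # Items p[i:j] share one channel: cycle length j - i, so the
        # expected wait is (j - i) / 2, weighted by the group's popularity.
        return (j - i) / 2.0 * (prefix[j] - prefix[i])

    inf = float("inf")
    # best[c][j]: minimum cost of placing the first j items on c channels.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = best[c - 1][i] + cost(i, j)
                if cand < best[c][j]:
                    best[c][j] = cand
                    cut[c][j] = i

    # Walk the stored cut points backwards to recover the group sizes.
    sizes = []
    j = n
    for c in range(k, 0, -1):
        i = cut[c][j]
        sizes.append(j - i)
        j = i
    sizes.reverse()
    return best[k][n], sizes
```

With popularities `[0.4, 0.3, 0.2, 0.1]` and two channels, the sketch splits the items into groups of sizes `[2, 2]`, for an average wait of about one time unit under the stated cost model.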
- A Characterisation of Optimal Channel Assignments for Wireless
Networks Modelled as Cellular and Square Grids.
MVS Shashanka, A Pati, AM Shende.
Intl. Parallel and Distributed Processing Symposium, Nice, France,
Apr 2003. [ DOI ]
[ pdf ] [ BibTeX ]
Abstract:
In this paper we first present a uniformity property that characterises optimal channel assignments for networks arranged as cellular or square grids. Then, we present optimal channel assignments for cellular and square grids; these assignments exhibit a high value for δ_1 - the separation between channels assigned to adjacent stations. Based on empirical evidence, we conjecture that the value our assignments exhibit is an upper bound on δ_1.
- Channel Assignment for Wireless
Networks Modelled as d-Dimensional Square Grids.
A Dubhashi, MVS Shashanka, A Pati, S Ramaprasad, AM Shende.
Intl. Workshop on Distributed Computing, Kolkata, India, Dec
2002. [ pdf ] [ BibTeX ]
Abstract:
In this paper, we study the problem of channel assignment for wireless networks modelled as d-dimensional grids. In particular, for d-dimensional square grids, we present optimal assignments that achieve a channel separation of 2 for adjacent stations where the reuse distance is 3 or 4. We also introduce the notion of a colouring schema for d-dimensional square grids, and present an algorithm that assigns colours to the vertices of the grid satisfying the schema constraints.