Category: Audio Analysis

Audio Analysis Machine Learning Publications Signal Processing Video Analysis

Artificial Intelligence and Video Mining: Audio Event Detection Using SVM

In this paper we present a method for analyzing the content of an audio signal using an artificial intelligence technique: Support Vector Machines (SVM). The objective is to detect the different events occurring in an unknown audio signal for information retrieval purposes. In particular, we present the detection of violent events in a video.

There are two types of data mining, depending on whether the aim is to describe or to predict. In the specific case of audio data mining, the descriptive method consists of grouping a set of audio signals into clusters of signals that are perceptually similar. This is unsupervised classification. The predictive method, on the other hand, consists of building a model from a training database, so that any new audio signal can be classified automatically on the basis of that model. This is supervised classification, which is the subject of the present paper.

There are various supervised classification algorithms, such as decision trees, neural networks, etc. However, we chose the Support Vector Machine (SVM), which, according to the literature, gives good results in real-world applications.

First, we describe the database, or corpus. In the second section, we present the features used to describe the stimuli of the corpus. The third part of the paper is devoted to a brief overview of the theory behind the SVM algorithm. Finally, we present the results of our study before drawing conclusions from this work.

Audio Analysis Publications Signal Processing Speech Processing

Signal Processing Applied To Video Mining: Video Boundaries Detection

Scene change detection is a technique that aims to automatically identify scene changes in a video. Assuming that a scene is defined by its audio and video signals, we present here scene change detection techniques based on both. For the audio signal, the techniques rely on abrupt variations in its frequency- and time-based features. For the video signal, the usual algorithms rely on variations in the Sum of Absolute Differences (SAD).
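
As a rough illustration of the SAD idea for video, the sketch below compares consecutive grayscale frames and flags a boundary when the SAD jumps above a threshold. The function names, the toy 2x2 frames, and the fixed threshold are assumptions made for this example only; they are not taken from the paper.

```python
def sum_abs_diff(frame_a, frame_b):
    """Sum of Absolute Differences between two equally sized grayscale frames."""
    return sum(
        abs(pa - pb)
        for row_a, row_b in zip(frame_a, frame_b)
        for pa, pb in zip(row_a, row_b)
    )


def detect_cuts(frames, threshold):
    """Return every index i where SAD(frame i-1, frame i) exceeds the threshold."""
    cuts = []
    for i in range(1, len(frames)):
        if sum_abs_diff(frames[i - 1], frames[i]) > threshold:
            cuts.append(i)
    return cuts


# Toy 2x2 "video" (assumed data): two dark frames followed by two bright frames.
dark = [[10, 12], [11, 10]]
dark2 = [[11, 12], [10, 10]]
bright = [[200, 210], [205, 199]]
bright2 = [[201, 209], [205, 200]]
frames = [dark, dark2, bright, bright2]

cuts = detect_cuts(frames, threshold=100)
print(cuts)  # [2]: the boundary sits between the second and third frame
```

A fixed threshold is the simplest possible choice; in practice it would likely need tuning to the frame size and content.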

Audio Analysis Machine Learning Publications Signal Processing Video Analysis

Random Forest Classifier and Bag of Audio Words concept applied to audio scene recognition

Bag of Audio Words (BoAW) is a concept inspired by the text mining research area. The idea is to represent any audio signal as a document of words. In this analogy, each word corresponds to an acoustic feature. The concept was successfully applied to image processing, where the bag of visual words is generated using an unsupervised classifier such as k-means. Here we describe how to design a Bag of Words for the speech/audio signal case. Since the final goal is to build an audio/speech pattern recognition system, we use the Random Forest (RF) classifier as the supervised classifier; it is well suited to large data sets with a very high number of features. Moreover, it has good robustness properties that guard against overfitting.
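
To make the "document of words" analogy concrete, here is a minimal sketch of the codebook step: frame-level feature vectors are clustered with k-means, and each signal is then summarized as a normalized histogram of its nearest codewords. Everything in it is an assumption for illustration (the toy 2-D features, the helper names, and the deterministic centroid initialization). The Random Forest stage that would consume the histograms is left out.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, codebook):
    """Index of the codeword closest to vec."""
    return min(range(len(codebook)), key=lambda k: sq_dist(vec, codebook[k]))


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means; the k centroids form the audio 'vocabulary'.

    Centroids start from evenly spaced input vectors (a simplification chosen
    here so that the example is deterministic).
    """
    centroids = [list(vectors[i * len(vectors) // k]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(v, centroids)].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bag_of_audio_words(frames, codebook):
    """Normalized histogram of codeword occurrences for one signal."""
    counts = [0] * len(codebook)
    for f in frames:
        counts[nearest(f, codebook)] += 1
    total = sum(counts) or 1  # avoid dividing by zero for an empty signal
    return [c / total for c in counts]


# Toy frame-level features (assumed data): two signals with distinct acoustics.
sig_a = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.1]]
sig_b = [[5.0, 5.1], [4.9, 5.0], [5.1, 4.9]]
sig_mix = [[0.1, 0.0], [5.0, 5.0]]

codebook = kmeans(sig_a + sig_b, k=2)
hist_a = bag_of_audio_words(sig_a, codebook)
hist_b = bag_of_audio_words(sig_b, codebook)
hist_mix = bag_of_audio_words(sig_mix, codebook)
print(hist_a, hist_b, hist_mix)  # [1.0, 0.0] [0.0, 1.0] [0.5, 0.5]
```

Each histogram has a fixed length equal to the codebook size, regardless of how many frames the signal contains, which is what makes it usable as input to a supervised classifier.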

Audio Analysis Information Retrieval Machine Learning Publications Signal Processing Speech Processing Video Analysis

Automatic Emotion Recognition system using Deep Belief Network

Mood is a subjective term describing the emotional state of a human being. It can be expressed in textual form (e.g. Twitter …). Recall that this topic is already addressed in our paper on sentiment analysis. On the other hand, mood can be recognized by analyzing facial expressions and/or the nature of the voice. The speech-based Automatic Emotion Recognition (AER) systems discussed here have several types of application, such as emotion detection in call centers, where being able to detect the emotion can help in taking appropriate decisions. In the case of online video advertising, predicting the emotion from speech signals in a video can be useful for fine-tuning user targeting. Obviously, emotion detected from speech can be combined with facial expressions and textual information to improve accuracy. Here we focus on Automatic Emotion Recognition based solely on an analysis of human speech. The system presented is based on a recent machine learning technique: the Deep Belief Network (DBN). It is an improvement on classical neural networks. We describe the DBN and the database of emotional speech used to build such an AER system.

Audio Analysis Information Retrieval Machine Learning Publications Speech Processing

Gaussian Mixture Model Supervectors

Gaussian Mixture Model (GMM) supervectors (GSV) are generally used in speaker recognition tasks. However, they can also be used for the classification of audio events, especially when the training dataset is very limited. This is the case for the recognition of some types of sound, such as “gunshots”, where the variation from one sample to another is small (so the number of distinct stimuli of these types can be limited). Thus, in a supervised classification, rather than using the feature vectors directly as the classifier input, they are first transformed into GSV. This transformation aims to compensate for the limited variability of the stimuli in the training database. In the following, we introduce Gaussian Mixture Models, the core idea behind GSV, and then present the GSV concept.
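
One common way to build a supervector, sketched below, is to adapt the means of a background GMM to the frames of a single signal (MAP adaptation of the means) and then stack the adapted means into one long vector. This is an assumption about the construction, not a description of the method used in the paper. The background model parameters, the relevance factor of 16, and the function names are hypothetical values chosen for the example.

```python
import math


def gauss_diag(x, mean, var):
    """Density of a diagonal-covariance Gaussian evaluated at x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def gmm_supervector(frames, weights, means, variances, relevance=16.0):
    """MAP-adapt the background means to `frames` and stack them into one vector."""
    k, dim = len(means), len(means[0])
    n = [0.0] * k                              # soft frame count per component
    first = [[0.0] * dim for _ in range(k)]    # posterior-weighted sums of frames
    for x in frames:
        likes = [w * gauss_diag(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        if total == 0.0:                       # frame too far from every component
            continue
        for j in range(k):
            post = likes[j] / total            # responsibility of component j
            n[j] += post
            for d in range(dim):
                first[j][d] += post * x[d]
    supervector = []
    for j in range(k):
        alpha = n[j] / (n[j] + relevance)      # more data means more adaptation
        for d in range(dim):
            data_mean = first[j][d] / n[j] if n[j] > 0 else means[j][d]
            supervector.append(alpha * data_mean + (1 - alpha) * means[j][d])
    return supervector


# Hypothetical 2-component background model over 2-D features.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [10.0, 10.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

# Sixteen identical frames close to the first component (assumed data).
frames = [[1.0, 1.0]] * 16
sv = gmm_supervector(frames, weights, means, variances)
print(len(sv), sv)
```

With these toy values the first component's mean moves halfway toward the data, while the second component, which explains almost none of the frames, stays essentially at its background value. The supervector length is the number of components times the feature dimension, independent of the signal's duration.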

Audio Analysis Audio Fingerprint Information Retrieval Publications

Audio Fingerprint

An audio fingerprint is a compact representation of an audio signal. It makes it possible to identify an unknown audio signal by matching its signature against the signatures of the complete audio signals stored in a database. Such a system can be useful for detecting pirated audio signals (watermarking) or for identifying unknown music broadcast by a radio station, for example. Another purpose of the audio fingerprint is integrity verification.
