Category : Machine Learning

Audio Analysis Machine Learning Publications Signal Processing Video Analysis

Artificial Intelligence and Video Mining: Audio Event Detection Using SVM

In this paper we present a method for analyzing the content of an audio signal using an artificial intelligence technique: Support Vector Machines (SVM). The objective is to detect the different events occurring in an unknown audio signal for information retrieval purposes. In particular, we present the detection of violent events in a video.

There are two types of data mining, depending on whether the aim is to describe or to predict. In the specific case of audio data mining, the descriptive method consists of grouping a set of audio signals into clusters of signals that are perceptually similar. This is unsupervised classification. The predictive method, on the other hand, consists of building a model from a training database, so that any new audio signal can be classified automatically on the basis of that model. This is supervised classification, which is the subject of the present paper.

There are various supervised classification algorithms, such as decision trees, neural networks, etc. However, we chose the Support Vector Machine (SVM), which, according to the literature, gives good results in real-world applications.

First, we describe the database, or corpus. In the second section, we present the features used to describe the stimuli of the corpus. The third part of the paper is devoted to a brief theory of the SVM algorithm. Finally, we present the results of our study before drawing conclusions from this work.
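
As a rough illustration of the supervised setting described above, the sketch below trains a linear SVM by subgradient descent on the regularized hinge loss and then classifies new, unseen feature vectors. It is not the method of the paper: the two features, the toy values, the function names and the hyperparameters (`lam`, `lr`, `epochs`) are all assumptions chosen for illustration, and a real system would likely use a kernel SVM from an established library.

```python
def train_linear_svm(samples, labels, lam=0.01, lr=0.05, epochs=500):
    """Linear SVM trained by subgradient descent on the regularized hinge loss.

    labels must be +1 or -1. Returns the weight vector w and the bias b.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # Gradient step on the L2 regularizer: shrink the weights slightly.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Gradient step on the hinge loss, only if the margin is violated.
            if margin < 1.0:
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical training database: each row is (feature_1, feature_2) for one
# audio segment; +1 marks a "violent" event and -1 any other event.
X_train = [
    [0.90, 0.80], [0.80, 0.90], [1.00, 0.70], [0.85, 0.95],
    [0.10, 0.20], [0.20, 0.10], [0.15, 0.30], [0.30, 0.15],
]
y_train = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(X_train, y_train)
label_loud = predict(w, b, [0.95, 0.90])   # new, unseen segment
label_quiet = predict(w, b, [0.10, 0.10])  # new, unseen segment
```

The model is built once from the labeled database and then reused for any new signal, which is the defining property of the predictive approach.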

Information Retrieval Machine Learning Publications

Recommender Systems

Recommender Systems (RS) provide a user with recommendations about items as part of a given service. They are a subclass of Information Filtering. They are applied in a variety of domains: movies, travel, financial services (insurance, loans), advertising, etc. Using an RS brings several benefits, including increased user loyalty and satisfaction, increased revenue, and improved system efficiency.

Below, we present a general overview of RS: an introduction to the different types of RS and their core concepts. In the last section, we look at dimensionality reduction for Collaborative Filtering (CF).

Audio Analysis Machine Learning Publications Signal Processing Video Analysis

Random Forest Classifier and Bag of Audio Words concept applied to audio scene recognition

Bag of Audio Words (BoAW) is a concept inspired by the text mining research area. The idea is to represent any audio signal as a document of words. In this analogy, each word corresponds to an acoustic feature. The concept was successfully applied to image processing, where the bag of visual words is generated using an unsupervised classifier such as k-means. Here we describe how to design a Bag of Words for the speech/audio signal case. Since the final goal is to build an audio/speech pattern recognition system, we will use the Random Forest (RF) classifier as the supervised classifier, since it is well suited to large data sets with a very high number of features. Moreover, it is fairly robust against overfitting.
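
The BoAW idea can be sketched in a few lines: a k-means codebook plays the role of the vocabulary, and each audio clip becomes a normalized histogram of its nearest "audio words". This is only a minimal sketch under stated assumptions: the 2-dimensional frame features, the toy values, the codebook size and all function names are hypothetical, and the k-means below is a bare-bones version written for readability.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(frame, codebook):
    """Index of the codeword (audio word) closest to a frame."""
    return min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))


def kmeans(frames, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the centroids form the audio-word codebook."""
    rng = random.Random(seed)
    codebook = [list(f) for f in rng.sample(frames, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in frames:
            groups[nearest(f, codebook)].append(f)
        for j, group in enumerate(groups):
            if group:  # keep the previous centroid if a cluster is empty
                codebook[j] = [sum(col) / len(group) for col in zip(*group)]
    return codebook


def bag_of_audio_words(frames, codebook):
    """Normalized histogram of audio-word occurrences for one clip."""
    counts = [0] * len(codebook)
    for f in frames:
        counts[nearest(f, codebook)] += 1
    return [c / len(frames) for c in counts]


# Hypothetical per-frame acoustic features for two clips.
clip_a = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.15, 0.25]]
clip_b = [[0.8, 0.9], [0.9, 0.9], [0.85, 0.8], [0.1, 0.1]]

codebook = kmeans(clip_a + clip_b, k=2)
hist_a = bag_of_audio_words(clip_a, codebook)
hist_b = bag_of_audio_words(clip_b, codebook)
```

Each clip is now a fixed-length vector regardless of its duration, which is what allows a supervised classifier such as a Random Forest to be trained on the histograms.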

Audio Analysis Information Retrieval Machine Learning Publications Signal Processing Speech Processing Video Analysis

Automatic Emotion Recognition system using Deep Belief Network

Mood is a subjective term describing the emotional state of a human being. It can be expressed in textual form (e.g. on Twitter), a topic already addressed in our paper on sentiment analysis. Mood can also be recognized by analyzing facial expressions and/or the nature of the voice. The speech-based Automatic Emotion Recognition (AER) systems discussed here have several types of application, such as emotion detection in call centers, where being able to detect the emotion can help in taking appropriate decisions. In the case of online video advertising, predicting the emotion from speech signals in a video can be useful for fine-tuning user targeting. Naturally, emotion detected from speech can be combined with facial expressions and textual information to improve accuracy. Here we focus on Automatic Emotion Recognition based solely on an analysis of human speech. The system presented is based on a recent machine learning technique: the Deep Belief Network (DBN), an improvement on classical neural networks. We describe the DBN and the database of emotional speech used to build such an AER system.

Information Retrieval Machine Learning Publications Text Mining

Sentiment Analysis

Customers’ sentiments and opinions are key information in marketing. Feedback about the items customers have bought can be used to optimize production. In a political context, the population’s opinions about a given law are crucial to its adoption. In the field of advertising, knowledge about customers’ sentiments can be used to refine campaign parameters and achieve better targeting. A tool that can measure this “sentiment” is therefore an invaluable asset.

Sentiment analysis, as its name suggests, automatically identifies the sentiment expressed in a multimedia (audio, video, text) document. The fields of application are very diverse: commercial products, laws, events, etc.

The large amount of rich textual information available on the internet, thanks to numerous websites (social networks, e-commerce sites, news sites, etc.), is a tremendous aid in designing a sentiment analysis system.

We will focus on sentiment analysis of textual documents. Even if the document that contains the opinions is not textual, an Automatic Speech Recognition (ASR) system allows the opinion to be obtained in textual format. Here we present a sentiment analysis system based on a Naive Bayes classifier.
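
To give an idea of how a Naive Bayes classifier assigns a sentiment to a text, here is a minimal sketch of a multinomial Naive Bayes model with add-one (Laplace) smoothing. The tiny training set, the whitespace tokenization, the label names and the function names are assumptions made for illustration only; they are not taken from the paper.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Count words per class and documents per class."""
    classes = sorted(set(labels))
    word_counts = {c: Counter() for c in classes}
    doc_counts = Counter(labels)
    for text, label in zip(docs, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for c in classes for w in word_counts[c]}
    return classes, word_counts, doc_counts, vocab


def predict_sentiment(model, text):
    """Return the class with the highest log posterior for the text."""
    classes, word_counts, doc_counts, vocab = model
    n_docs = sum(doc_counts.values())
    best_class, best_score = None, -math.inf
    for c in classes:
        total_words = sum(word_counts[c].values())
        score = math.log(doc_counts[c] / n_docs)  # log prior
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing avoids zero probabilities.
                likelihood = (word_counts[c][word] + 1) / (total_words + len(vocab))
                score += math.log(likelihood)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical labeled reviews.
train_docs = [
    "great product works well",
    "love it excellent quality",
    "terrible product broke quickly",
    "awful quality waste of money",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = train_naive_bayes(train_docs, train_labels)
first_prediction = predict_sentiment(model, "excellent product love it")
second_prediction = predict_sentiment(model, "terrible waste of money")
```

The "naive" part is the assumption that words are independent given the class, which is why the score is a simple sum of per-word log likelihoods.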

Machine Learning Publications

Behavioral Targeting

Online advertising is becoming more and more competitive. Hence, to maximize their revenue by optimizing their KPIs (Key Performance Indicators), such as the CTR (Click-Through Rate), advertisers use various targeting techniques. On the one hand, targeting can be based on the contextual information of the webpage currently visited by the user; on the other hand, it can be performed by profiling the user’s historical set of queries. The latter technique is known as Behavioral Targeting (BT). This approach is used to predict whether a user would be more or less sensitive to advertising. We present here a BT system based on logistic regression. However, we begin this paper by presenting some technical concepts used in digital advertising.
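
The logistic regression idea can be illustrated with a small sketch that maps a user's behavioral profile to a click probability. Everything specific here is an assumption: the two profile features, the toy values, the learning rate and the function names are hypothetical, and the model is fitted with plain batch gradient descent rather than whatever optimizer a production system would use.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def click_probability(w, b, x):
    """Predicted probability that the user clicks on the ad."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(samples, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on the average log loss."""
    n, dim = len(samples), len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(samples, labels):
            error = click_probability(w, b, x) - y
            grad_w = [g + error * xj for g, xj in zip(grad_w, x)]
            grad_b += error
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical user profiles built from historical queries, scaled to [0, 1]:
# (share of past queries related to the ad category, past click rate).
profiles = [
    [0.9, 0.8], [0.8, 0.6], [0.7, 0.9],
    [0.2, 0.1], [0.1, 0.3], [0.3, 0.2],
]
clicked = [1, 1, 1, 0, 0, 0]

w, b = train_logistic_regression(profiles, clicked)
p_engaged = click_probability(w, b, [0.85, 0.85])
p_unengaged = click_probability(w, b, [0.10, 0.10])
```

Because the output is a probability rather than a hard label, users can be ranked by predicted sensitivity to advertising and a targeting threshold chosen afterwards.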

Information Retrieval Machine Learning Publications

Relations Extraction in textual information

These days, a large amount of textual data is available on the web, driven in part by the proliferation of social networks and other web pages. This textual information is a “goldmine”, because many applications can be built on the knowledge it provides. For example, it is possible to analyze the sentiment and/or opinion of customers. This kind of information can be used to improve the accuracy of customer targeting. One can also predict or prevent customer churn more accurately. Another kind of application is the Semantic Web (SW, Web 3.0). The core idea behind the SW is to link information according to relations, so that computers are able to “understand the information by themselves”. In the present paper, we first present an overview of the architecture of a generic Information Extraction (IE) system. This is followed by a short description of each component of this architecture, and we end by presenting the principle of a Relations Extraction system.

Audio Analysis Information Retrieval Machine Learning Publications Speech Processing

Gaussian Mixture Model Supervectors

Gaussian Mixture Model (GMM) supervectors (GSV) are generally used in speaker recognition tasks. However, they can also be used for the classification of audio events, especially when the training dataset is very limited. This is the case for the recognition of some types of sound, such as “gunshots”, where the variation from one sample to another is small (so the number of distinct stimuli of these types can be limited). Thus, in a supervised classification, rather than using the feature vectors directly as the classifier input, they are first transformed into GSV. This transformation aims to compensate for the limited variability of the stimuli in the training database. In the following, we introduce Gaussian Mixture Models, on which GSV are built, and then present the GSV concept.
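
The transformation from feature vectors to a supervector can be sketched as follows, assuming the common construction in which the means of a background GMM are MAP-adapted to the frames of one clip and then concatenated. The paper may use a different variant: the background model parameters, the relevance factor of 16, the toy frames and the function names below are all assumptions for illustration.

```python
import math


def gaussian_diag(x, mean, var):
    """Density of a diagonal-covariance Gaussian evaluated at x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def gmm_supervector(frames, weights, means, variances, relevance=16.0):
    """MAP-adapt the GMM means to the frames and concatenate them."""
    k, dim = len(means), len(means[0])
    n = [0.0] * k                             # soft frame count per component
    first = [[0.0] * dim for _ in range(k)]   # posterior-weighted frame sums
    for x in frames:
        likes = [w * gaussian_diag(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        for j in range(k):
            gamma = likes[j] / total          # posterior of component j
            n[j] += gamma
            for d in range(dim):
                first[j][d] += gamma * x[d]
    supervector = []
    for j in range(k):
        alpha = n[j] / (n[j] + relevance)     # how much to trust the clip data
        for d in range(dim):
            data_mean = first[j][d] / n[j] if n[j] > 0 else means[j][d]
            supervector.append(alpha * data_mean + (1.0 - alpha) * means[j][d])
    return supervector


# Hypothetical 2-component background model over 2-D features.
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0, 0.0], [5.0, 5.0]]
ubm_vars = [[1.0, 1.0], [1.0, 1.0]]

# Hypothetical frames of one short clip, all close to the first component.
clip_frames = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2]]

gsv = gmm_supervector(clip_frames, ubm_weights, ubm_means, ubm_vars)
```

With only three frames, the adapted means stay close to the background model, which is how the prior compensates for scarce data; the resulting fixed-length vector (number of components times feature dimension) is then passed to the classifier.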

Information Retrieval Machine Learning Publications Speech Processing Speech Recognition

HMM-based ASR

Automatic Speech Recognition (ASR) is a system whose purpose is to convert speech into text. Several types of ASR have been designed by speech processing researchers; however, those based on the Hidden Markov Model (HMM) algorithm are the most accurate. Here, we focus on the principle of HMM.

Information Retrieval Machine Learning Publications Speech Processing

Spoken Language Recognition

The objective of Spoken Language Identification (LID) is to automatically recognize the language spoken in an unknown speech signal. LID has several applications, such as Speech-To-Speech Machine Translation Systems and telephone-based services. In the case of Automatic Speech Recognition (ASR), LID makes it possible to select the appropriate parameters for the ASR system. There are two main types of LID system: those based on spectral features and those based on tokens.
