Category : Information Retrieval

Information Retrieval Machine Learning Publications

Recommender Systems

Recommender Systems (RS) provide a user with recommendations about items as part of any given service. They are a subclass of Information Filtering. They have a variety of domains of application: movies, travel, financial services (insurance, loans), advertising… Using an RS presents several benefits: increased user loyalty and satisfaction, increased revenues and system efficiency…

Below, we present a general overview of RS, namely an introduction to the different types of RS and their core concepts. In the last section, we look at dimensionality reduction for collaborative filtering (CF).

Audio Analysis Information Retrieval Machine Learning Publications Signal Processing Speech Processing Video Analysis

Automatic Emotion Recognition system using Deep Belief Network

Mood is a subjective term describing the emotional state of a human being. It can be expressed in textual form (e.g. on Twitter), a topic already addressed in our paper about sentiment analysis. Mood can also be recognized by analyzing facial expressions and/or the nature of the voice. The speech-based Automatic Emotion Recognition (AER) systems discussed here have several types of application, such as emotion detection in call centers, where detecting the caller's emotion can help in taking appropriate decisions. In online video advertising, predicting the emotion from the speech signal of a video can be useful for fine-tuning user targeting. Emotion detected from speech can of course be combined with facial expressions and textual information to improve accuracy. Here we focus on Automatic Emotion Recognition based solely on the analysis of human speech. The system presented is based on a recent machine learning technique, the Deep Belief Network (DBN), which is an improvement on classical neural networks. We describe the DBN and the database of emotional speech used to build such an AER system.

Information Retrieval Machine Learning Publications Text Mining

Sentiment Analysis

Customers’ sentiments and opinions are key information in marketing. Feedback about the items customers have bought can be used to optimize production. In a political context, the population’s opinions about a given law are crucial for its adoption. In the field of advertising, knowledge about customers’ sentiments can be used to refine campaign parameters and achieve better targeting. A tool that can measure this “sentiment” is therefore an invaluable asset.

Sentiment analysis, as its name suggests, automatically identifies the sentiment expressed in a multimedia (audio, video, text) document. The fields of application are very diverse: commercial products, political laws, events, etc.

The large amount of rich textual information available on the internet, thanks to numerous websites (social networks, e-commerce sites, news sites, etc.), is a tremendous aid in designing a sentiment analysis system.

We will focus on sentiment analysis of textual documents. Even if the document that contains the opinions is not textual, an automatic speech recognition (ASR) system allows the opinion to be obtained in textual format. Here we present a sentiment analysis system based on a Naive Bayes classifier.
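As an illustration of the Naive Bayes approach mentioned above, here is a minimal sketch of a bag-of-words classifier with add-one (Laplace) smoothing. The toy training sentences, the whitespace tokenization and the function names (`train_naive_bayes`, `classify`) are assumptions made for this example, not the system described in the paper.

```python
import math
from collections import Counter


def train_naive_bayes(docs):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(text, label_counts, word_counts, vocab):
    """Return the label with the highest log posterior for the given text."""
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        counts = word_counts[label]
        total_words = sum(counts.values())
        # Log prior of the class.
        score = math.log(n_docs / total_docs)
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-one smoothing avoids zero probabilities.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus, for illustration only.
TOY_DOCS = [
    ("great product love it", "positive"),
    ("excellent quality great value", "positive"),
    ("terrible product hate it", "negative"),
    ("poor quality awful value", "negative"),
]
MODEL = train_naive_bayes(TOY_DOCS)
```

Calling `classify("great value", *MODEL)` scores the sentence under each class and returns the most probable label. A real system would need a much larger labeled corpus and proper tokenization.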

Information Retrieval Machine Learning Publications

Relations Extraction in textual information

These days we find a large amount of textual data on the web, driven in part by the proliferation of social networks and other web pages. This textual information is a “goldmine”, because many applications can be drawn from the knowledge it provides. For example, it is possible to analyze the sentiment and/or opinion of customers. This kind of information can be used to improve the accuracy of customer targeting. One can also predict or prevent customer churn more accurately. Another kind of application is the Semantic Web (SW, Web 3.0). The core idea behind the SW is to link information according to relations, so that computers will be able to “understand the information by themselves”. In the present paper, we first present an overview of the architecture of a generic Information Extraction (IE) system. This is followed by a short description of each component of this architecture, and we end by presenting the principle of a Relations Extraction system.

Information Retrieval Publications Speech Processing Video Analysis

UBM-GMM based Text-Independent Speaker Recognition

Speaker recognition systems are often used in the field of security; a common example is client voice authentication for secured applications. Another application of speaker recognition is the segmentation of speech into homogeneous parts, where each segment corresponds to one speaker’s speech. This process can also be very useful for improving the accuracy of speech recognition systems. Speaker recognition can also be used in the field of audio indexing: recognizing the identity of speakers in a multi-speaker audio stream provides usable knowledge about its content. Two types of speaker recognition system exist: text-dependent and text-independent. In text-dependent systems, the verification text is the same as the text recorded during the enrollment phase. Since the sentences are a priori unknown in online video indexing, we will focus here on text-independent systems. This paper is organized as follows: first we present the GMM classifier, and then the principle of LLR (Log-Likelihood Ratio) detection, which is used to make a decision from the score of a tested utterance.
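To make the LLR decision concrete, the following sketch scores an utterance against a speaker GMM and a universal background model (UBM) and accepts the claimed identity when the difference of average log-likelihoods exceeds a threshold. It assumes one-dimensional features and hand-picked model parameters (`UBM`, `SPEAKER`) purely for illustration; real systems use multi-dimensional cepstral features and trained models.

```python
import math


def gmm_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of 1-D frames under a GMM."""
    total = 0.0
    for x in frames:
        density = 0.0
        for w, m, v in zip(weights, means, variances):
            # Weighted Gaussian density of component (m, v) at x.
            density += w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
        total += math.log(density)
    return total / len(frames)


def llr_decision(frames, speaker_gmm, ubm, threshold=0.0):
    """Return (llr, accepted): accept when the speaker model beats the UBM."""
    llr = gmm_log_likelihood(frames, *speaker_gmm) - gmm_log_likelihood(frames, *ubm)
    return llr, llr > threshold


# Hypothetical models as (weights, means, variances); not trained on real data.
UBM = ([0.5, 0.5], [-2.0, 2.0], [4.0, 4.0])
SPEAKER = ([0.5, 0.5], [1.0, 3.0], [1.0, 1.0])
```

For example, `llr_decision([1.1, 2.9, 1.0, 3.2], SPEAKER, UBM)` yields a positive LLR because the frames lie close to the speaker model's means. The threshold of 0.0 is an arbitrary default; in practice it is tuned on held-out data.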

Audio Analysis Information Retrieval Machine Learning Publications Speech Processing

Gaussian Mixture Model Supervectors

Gaussian Mixture Model (GMM) supervectors (GSV) are generally used in speaker recognition tasks. However, they can also be used for the classification of audio events, especially when the training dataset is very limited. This is the case for the recognition of some types of sound, such as “gunshots”, where the variation from one sample to another is small (so the number of distinct stimuli of these types can be limited). Thus, in a supervised classification, rather than directly using the feature vectors as the classifier input, they are transformed into GSV beforehand. This transformation aims to compensate for the limited variability of the stimuli in the training database. In the following, we present an introduction to Gaussian Mixture Models, which are at the core of GSV, and then the GSV concept itself.

Information Retrieval Publications Speech Processing Speech Recognition

PNCC features for ASR robustness enhancement

The acoustic features traditionally used in Speech and Audio Processing are MFCC and PLP. However, an important requirement in designing an acoustic signal fingerprint is that the feature be robust. Consequently, several techniques aim to enhance MFCC and PLP by using, for example, mean and variance normalization, variance normalization alone, or, in the particular case of PLP, RASTA filtering combined with variance normalization. Here, we present a new type of acoustic feature which directly implements a noise reduction algorithm: Power-Normalized Cepstral Coefficients (PNCC), introduced by Chanwoo Kim [1]. This feature is more robust against background noise than the traditional PLP and MFCC features.
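The mean and variance normalization mentioned above can be sketched as follows: each coefficient is shifted to zero mean and scaled to unit variance across the frames of an utterance. This is a generic illustration of that enhancement step, not the PNCC algorithm itself; the function name and the list-of-lists feature layout are assumptions of this example.

```python
import math


def mean_variance_normalize(features):
    """Normalize each coefficient to zero mean and unit variance across frames.

    features: list of frames, each frame a list of coefficients.
    """
    n_frames = len(features)
    n_coeffs = len(features[0])
    normalized = [[0.0] * n_coeffs for _ in range(n_frames)]
    for j in range(n_coeffs):
        column = [frame[j] for frame in features]
        mean = sum(column) / n_frames
        variance = sum((x - mean) ** 2 for x in column) / n_frames
        # Guard against constant coefficients (zero standard deviation).
        std = math.sqrt(variance) or 1.0
        for i in range(n_frames):
            normalized[i][j] = (column[i] - mean) / std
    return normalized
```

Because the statistics are computed per utterance, a constant offset or gain applied to a coefficient over the whole utterance is removed by this normalization.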

Computer Vision Image Processing Information Retrieval Publications Video Analysis

Image Segmentation

Image segmentation aims to split an image into partitions, each of which should usually represent some real part of the overall image. This technique is used for object identification (face recognition, or retrieval of relevant information) in digital images. There are many different ways to perform image segmentation, such as image thresholding, region-based segmentation and the Hough transform.
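Of the techniques listed, image thresholding is the simplest to illustrate. The sketch below assumes a grayscale image stored as a list of rows of intensities and a manually chosen global threshold; both the representation and the function name are choices made for this example.

```python
def threshold_image(image, threshold):
    """Binarize a grayscale image: 1 where the pixel exceeds the threshold, else 0."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


# Hypothetical 3x4 image with a bright object on a dark background.
SAMPLE_IMAGE = [
    [10, 12, 200, 210],
    [11, 13, 205, 220],
    [9, 10, 15, 14],
]
SEGMENTED = threshold_image(SAMPLE_IMAGE, 128)
```

The result is a binary mask that separates the bright region from the background. Choosing the threshold automatically (for example from the image histogram) is a separate problem not covered by this sketch.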

Information Retrieval Publications Video Analysis

Edge Detection

Edge detection is an essential step in any Computer Vision (CV) system. It is also one of the principal steps of human vision: the Human Visual System (HVS) has cells dedicated to contour detection. This step reduces the amount of information to be retained, keeping only what is essential. An edge can be seen as an abrupt change in intensity at a given location of the image. In CV, edge detection is used for image segmentation or for the identification of an object in an image.
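Since an edge corresponds to an abrupt change in intensity, a common way to detect it is to estimate the image gradient. The sketch below uses the Sobel operator as one possible choice (the text does not name a specific detector) on a grayscale image stored as a list of rows; border pixels are left at zero for simplicity.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Return the gradient magnitude of a grayscale image (borders left at 0)."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            out[y][x] = math.hypot(gx, gy)
    return out


# Hypothetical image with a vertical step edge between columns 2 and 3.
STEP_IMAGE = [[0, 0, 0, 10, 10, 10] for _ in range(4)]
EDGES = sobel_magnitude(STEP_IMAGE)
```

The magnitude is large only at the pixels next to the step and zero in the flat regions, which is the reduction of information described above. Thresholding the magnitude would then give a binary edge map.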

Information Retrieval Machine Learning Publications Speech Processing Speech Recognition

HMM-based ASR

Automatic Speech Recognition (ASR) is a system whose purpose is to convert speech into text. Several types of ASR have been designed by speech processing researchers; those based on Hidden Markov Models (HMM) have long been among the most accurate. Here, we will focus on the principle of HMM.
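As a small illustration of the HMM principle, the sketch below computes the likelihood of an observation sequence with the forward algorithm. It assumes a discrete-observation HMM with made-up parameters; HMM-based ASR typically uses continuous emission densities over acoustic features, which this example does not model.

```python
def forward_likelihood(observations, initial, transition, emission):
    """Probability of an observation sequence under a discrete HMM.

    initial[s]       : probability of starting in state s
    transition[p][s] : probability of moving from state p to state s
    emission[s][o]   : probability of emitting symbol o in state s
    """
    n_states = len(initial)
    # alpha[s] = P(observations so far, current state = s)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[p] * transition[p][s] for p in range(n_states)) * emission[s][obs]
            for s in range(n_states)
        ]
    return sum(alpha)


# Hypothetical 2-state, 2-symbol model, for illustration only.
INITIAL = [0.6, 0.4]
TRANSITION = [[0.7, 0.3], [0.4, 0.6]]
EMISSION = [[0.5, 0.5], [0.1, 0.9]]
```

With these parameters, `forward_likelihood([0, 1], INITIAL, TRANSITION, EMISSION)` sums over all hidden state paths in time linear in the sequence length, rather than enumerating every path explicitly.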
