Audio Analysis Machine Learning Publications Signal Processing Video Analysis

Artificial Intelligence and Video Mining: Audio Event Detection Using SVM

In this paper we present a method for analyzing the content of an audio signal using an artificial intelligence technique: Support Vector Machines (SVM). The objective is to detect the different events occurring in an unknown audio signal for information retrieval purposes. In particular, we present the detection of violent events in a video.

There are two types of data mining, depending on whether the aim is to describe or to predict. In the specific case of audio data mining, the descriptive method consists of grouping a set of audio signals into clusters of signals that are perceptually similar. This is unsupervised classification. The predictive method, on the other hand, consists of building a model from a training database, so that any new audio signal can be classified automatically on the basis of that model. This is supervised classification, which is the subject of the present paper.

There are various supervised classification algorithms, such as decision trees, neural networks, etc. However, we chose the Support Vector Machine (SVM) which, according to the literature, gives good results for real-world applications.

First, we will describe the database, or corpus. In the second section, we will present the features used to describe the stimuli of the corpus. The third part of the paper will be devoted to a brief overview of the theory behind the SVM algorithm. Finally, we will present the results of our study before drawing conclusions from this work.

Audio Analysis Publications Signal Processing Speech Processing

Signal Processing Applied To Video Mining: Video Boundaries Detection

Scene change detection is a technique which aims to automatically identify scene changes in a video. Assuming that a scene is defined by its audio and video signals, we present here scene change techniques based on both kinds of signal. For the audio signal, the different techniques are based on abrupt variations of its frequency- and time-based features. For the video signal, the usual algorithms are based on the variation of the Sum of Absolute Differences (SAD).
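
As an illustration of the SAD idea, here is a minimal sketch that flags a cut whenever the SAD between two consecutive grayscale frames exceeds a threshold. The frame representation (nested lists of pixel intensities), the function names and the threshold value are assumptions made for this example, not details taken from the paper.

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale image stored as rows of pixel intensities.
Frame = Sequence[Sequence[int]]


def sad(frame_a: Frame, frame_b: Frame) -> int:
    """Sum of Absolute Differences between two equally sized frames."""
    return sum(
        abs(pixel_a - pixel_b)
        for row_a, row_b in zip(frame_a, frame_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def detect_cuts(frames: List[Frame], threshold: int) -> List[int]:
    """Return every index i where SAD(frame[i-1], frame[i]) exceeds the threshold."""
    return [
        i for i in range(1, len(frames))
        if sad(frames[i - 1], frames[i]) > threshold
    ]


# Toy sequence: two similar dark frames followed by two identical bright frames.
dark_1 = [[10, 10], [10, 10]]
dark_2 = [[11, 10], [10, 12]]
bright = [[200, 210], [205, 199]]
frames = [dark_1, dark_2, bright, bright]

cuts = detect_cuts(frames, threshold=100)  # hypothetical threshold
print(cuts)
```

In practice the threshold would have to be tuned to the frame size and content; a fixed value is used here only to keep the sketch short.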

Information Retrieval Machine Learning Publications

Recommender Systems

Recommender Systems (RS) provide a user with recommendations about items as part of any given service. They are a subclass of Information Filtering. They have a variety of domains of application: movies, travel, financial services (insurance, loans), advertising… Using an RS presents several benefits: increased user loyalty and satisfaction, increased revenues and system efficiency…

Below, we present a general overview of RS, namely an introduction to the different types of RS and their core concepts. In the last section, we will look at dimensionality reduction for Collaborative Filtering (CF).

Audio Analysis Machine Learning Publications Signal Processing Video Analysis

Random Forest Classifier and Bag of Audio Words concept applied to audio scene recognition

Bag of Audio Words (BoAW) is a concept inspired by the text mining research area. The idea is to represent any audio signal as a document of words. In this analogy, each word corresponds to an acoustic feature. The concept was successfully applied to image processing, where the bag of visual words is generated using an unsupervised classifier such as k-means. Here we will describe how to design a Bag of Words for the speech/audio signal case. Since the final goal is to build an audio/speech pattern recognition system, we will use the Random Forest (RF) classifier as the supervised classifier. It is well suited to large data sets with a very high number of features, and it has good robustness properties that guard against overfitting.
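
The following minimal sketch shows one common way to turn a signal into a bag of audio words: each frame-level feature vector is mapped to its nearest codeword, and the signal is summarized by the normalized histogram of codewords. It assumes the codebook has already been learned (for example with k-means); the codebook values, the two-dimensional features and the function names are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_codeword(feature: Vector, codebook: List[Vector]) -> int:
    """Index of the codeword closest to the feature (squared Euclidean distance)."""
    def squared_distance(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)),
               key=lambda k: squared_distance(feature, codebook[k]))


def bag_of_audio_words(frames: List[Vector], codebook: List[Vector]) -> List[float]:
    """Normalized histogram of codeword occurrences over all frames of a signal."""
    counts = [0] * len(codebook)
    for feature in frames:
        counts[nearest_codeword(feature, codebook)] += 1
    total = len(frames)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Hypothetical codebook of three "audio words" (assumed learned beforehand).
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
# Hypothetical frame-level acoustic features of one audio signal.
frames = [[0.1, -0.2], [0.9, 1.1], [1.2, 0.8], [4.7, 5.3]]

histogram = bag_of_audio_words(frames, codebook)
print(histogram)
```

A fixed-length histogram of this kind is what a supervised classifier such as a Random Forest would take as its input vector, one histogram per audio signal.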

Audio Analysis Information Retrieval Machine Learning Publications Signal Processing Speech Processing Video Analysis

Automatic Emotion Recognition system using Deep Belief Network

Mood is a subjective term describing the emotional state of a human being. It can be expressed in textual form (e.g. Twitter …), a topic already addressed in our paper about sentiment analysis. Mood can also be recognized by analyzing facial expressions and/or the nature of the voice. The speech-based Automatic Emotion Recognition (AER) systems discussed here have several types of application, such as emotion detection in call centers, where being able to detect the emotion can help in taking appropriate decisions. In the case of online video advertising, predicting the emotion from speech signals in a video can be useful to fine-tune user targeting. Obviously, emotion detected from speech can be combined with facial expressions and textual information to improve accuracy. Here we will focus on Automatic Emotion Recognition based solely on an analysis of human speech. The system that will be presented is based on a recent machine learning technique: the Deep Belief Network (DBN). It is an improvement on classical neural networks. We will describe the DBN and the database of emotional speech used to build such an AER system.


Auction Theory

In economic theory, an auction is a set of trading rules, and the procedure itself is very old. As long ago as A.D. 193, the Praetorian Guard sold the Roman Empire by auction. Auctions can take place with the physical presence of buyers and sellers, or electronically. Nowadays, auctions are used in various fields: the sale of companies, houses, cars, radio spectrum licenses, assets, etc. More recently, this process has been used in the field of online advertising through Real Time Bidding (RTB) systems. Here, we present some of the main principles of auctions, focusing on the particular case of Independent Private Value (IPV) auctions, which will be explained.

Information Retrieval Machine Learning Publications Text Mining

Sentiment Analysis

Customers’ sentiments and opinions are key information in marketing. Feedback about the items customers have bought can be used to optimize production. In a political context, the population’s opinions about a given law are crucial for its adoption. In the field of advertising, knowledge about customers’ sentiments can be used to refine the parameters of campaigns and achieve better targeting. A tool which can measure this “sentiment” is therefore an invaluable asset.

Sentiment analysis, as its name suggests, is the automatic identification of the sentiment expressed in a multimedia (audio, video, text) document. The fields of application are very diverse: commercial products, legislation, events, etc.

The large amount of rich textual information available on the internet, thanks to numerous websites (social networks, e-commerce sites, news sites, etc.), is a tremendous aid in designing a sentiment analysis system.

We will focus on sentiment analysis of textual documents. Even if the document that contains the opinions is not textual, an Automatic Speech Recognition (ASR) system allows the opinion to be obtained in textual format. Here we present a sentiment analysis system based on a Naive Bayes classifier.
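
To make the idea concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. The whitespace tokenization, the smoothing choice, the tiny training set and the function names are all assumptions made for this example; the paper may use a different variant.

```python
import math
from collections import Counter, defaultdict


def train(documents):
    """Count classes and per-class word occurrences from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in documents:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def predict(text, model):
    """Return the label with the highest log-posterior for the given text."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training are ignored
            # Add-one smoothing avoids zero probabilities for unseen pairs.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set.
training_set = [
    ("great product love it", "pos"),
    ("really love this great phone", "pos"),
    ("terrible product hate it", "neg"),
    ("really hate this awful phone", "neg"),
]
model = train(training_set)
print(predict("love this great product", model))
print(predict("awful terrible phone", model))
```

Working in log space keeps the product of many small probabilities numerically stable, which matters once documents are longer than a few words.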

Computer Vision Image Processing Publications Video Analysis

Video Quality Assessment (VQA)

The growing ubiquity of audiovisual content on the Internet has increased the importance of online advertising. With technological improvements, the quality of digital video rendering keeps improving as well, which in turn makes users’ requirements stricter and stricter. Video quality is therefore an important element to consider in online video advertising: a high quality video is more likely to interest users than a low quality one, so it is crucial to be able to quantify a video’s quality. Nevertheless, the multiplicity of video formats and the various types of communication networks (wireless, fiber, xDSL networks…) make video quality assessment complex. Since the “end receiver” of video is human, the most accurate VQA is subjective (performed by humans). However, subjective assessment is time-consuming, and it depends on the person who performs the evaluation (mood, culture…). Thus, researchers have considered building objective assessment methods to model subjective ones. The advantage of objective VQA methods is that they can operate in real time. We will focus here on objective assessment processes for the quality of video signals.
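
As a simple illustration of what an objective metric looks like, the sketch below computes the Peak Signal-to-Noise Ratio (PSNR) between a reference frame and a distorted frame. PSNR is a widely used full-reference measure, but it is chosen here only as an example and is not necessarily the method discussed in the paper; the frame representation and the 8-bit peak value of 255 are assumptions.

```python
import math
from typing import Sequence

# Assumption: a frame is a grayscale image stored as rows of pixel intensities.
Frame = Sequence[Sequence[int]]


def mean_squared_error(reference: Frame, distorted: Frame) -> float:
    """Average squared pixel difference between two equally sized frames."""
    squared_errors = [
        (r - d) ** 2
        for row_r, row_d in zip(reference, distorted)
        for r, d in zip(row_r, row_d)
    ]
    return sum(squared_errors) / len(squared_errors)


def psnr(reference: Frame, distorted: Frame, max_value: int = 255) -> float:
    """PSNR in decibels; infinite when the two frames are identical."""
    mse = mean_squared_error(reference, distorted)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 frames: the distorted frame differs in two pixels.
reference = [[100, 100], [100, 100]]
distorted = [[110, 90], [100, 100]]

score = psnr(reference, distorted)
print(round(score, 2))
```

A higher PSNR means the distorted frame is closer to the reference. Pixel-wise measures like this one are cheap to compute but do not always agree with human judgment.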

Machine Learning Publications

Behavioral Targeting

Online advertising is becoming more and more competitive. Hence, to maximize their revenue by optimizing KPIs (Key Performance Indicators) such as the CTR (Click-Through Rate), advertisers use various targeting techniques. On the one hand, targeting can be based on the contextual information of the web page currently visited by the user; on the other hand, it can be performed by profiling the user’s historical set of queries. The latter technique is known as Behavioral Targeting (BT). This approach is used to predict whether a user would be more or less sensitive to advertising. We present here a BT system based on logistic regression. However, we shall begin this paper by presenting some technical concepts used in digital advertising.
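
The sketch below illustrates the general principle with a logistic regression trained by batch gradient descent to estimate a click probability from behavioral features. The two features (shares of a user's past queries about sports and about finance), the labels, the learning rate and the number of epochs are hypothetical values chosen for illustration, not the setup used in the paper.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_samples, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, label in zip(X, y):
            prediction = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = prediction - label
            for j in range(n_features):
                grad_w[j] += error * features[j]
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def click_probability(features, weights, bias) -> float:
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


# Hypothetical user profiles: [share of sports queries, share of finance queries].
X = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.0, 1.0], [0.2, 0.7]]
# Hypothetical labels: 1 if the user clicked on a sports ad, 0 otherwise.
y = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic_regression(X, y)
p_sports_fan = click_probability([0.95, 0.05], weights, bias)
p_finance_fan = click_probability([0.05, 0.95], weights, bias)
print(p_sports_fan > 0.5, p_finance_fan < 0.5)
```

The output of the model is a probability, so it can be thresholded to decide whether to show an ad or used directly as an estimate of the expected CTR for a given user profile.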

Information Retrieval Machine Learning Publications

Relations Extraction in textual information

These days we find a large amount of textual data on the web, driven in part by the proliferation of social networks and other web pages. This textual information is a “goldmine”, because many applications can be built on the knowledge it provides. For example, it is possible to analyze the sentiment and/or opinion of customers. This kind of information can be used to improve the accuracy of customer targeting. One can also predict or prevent customer churn more accurately. Another kind of application is the Semantic Web (SW, Web 3.0). The core idea behind the SW is to link information according to relations, so that computers will be able to “understand the information by themselves”. In the present paper, we first present an overview of the architecture of a generic Information Extraction (IE) system. This is followed by a short description of each component of this architecture, and we end by presenting the principle of a Relations Extraction system.
