Sentiment Analysis

Customers’ sentiments and opinions are key information in marketing. Feedback about the items customers have bought can be used to optimize production. In a political context, the population’s opinions about a given law are crucial for its adoption. In advertising, knowledge about customers’ sentiments can be used to refine campaign parameters and achieve better targeting. A tool that can measure this “sentiment” is therefore an invaluable asset.

Sentiment analysis, as its name suggests, automatically identifies the sentiment expressed in a multimedia document (audio, video or text). Its fields of application are very diverse: commercial products, legislation, events, etc.

The large amount of rich textual information available on the internet, thanks to numerous websites (social networks, e-commerce sites, news sites, etc.), is a tremendous aid in designing a sentiment analysis system.

We will focus on sentiment analysis of textual documents. Even if the document that contains the opinions is not textual, an automatic speech recognition (ASR) system can convert the opinion to text. Here we present a sentiment analysis system based on a Naive Bayes classifier.

Opinion Modelling

Sentiment analysis is also known as Opinion Mining. How can we define an Opinion?

An opinion depends on five parameters and can be modelled as a quintuple $(e, a, s, h, t)$ [1].

The first parameter $e$ is a target entity (object): a product, a person, a topic, etc.

The second parameter $a$ is an aspect/attribute (feature) of the entity $e$.

The parameter $h$ is an opinion holder (opinion source).

The variable $t$ denotes the time when $h$ expressed the opinion. Finally, the value of the opinion is $s$, which can be more or less granular (e.g. positive, neutral, negative, or a scale from 0 to 5).

Example: Jean wrote, on 15th September 2014: “The Video Mining technology of DynAdmic is excellent”. The quintuple format is: (DynAdmic, Video Mining technology, excellent, Jean, 15/09/2014).
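The quintuple can be sketched as a small data structure. This is only an illustration: the class name `Opinion` and its field names are assumptions chosen for readability, and the date is the one stated in the example sentence.

```python
from datetime import date
from typing import NamedTuple


class Opinion(NamedTuple):
    """Opinion quintuple: entity, aspect, opinion value, holder, time."""
    entity: str
    aspect: str
    sentiment: str
    holder: str
    time: date


# The example sentence written by Jean, encoded as a quintuple.
example = Opinion(
    entity="DynAdmic",
    aspect="Video Mining technology",
    sentiment="excellent",
    holder="Jean",
    time=date(2014, 9, 15),
)

print(example)
```

In a real system the `sentiment` field would more likely hold a label (positive, neutral, negative) or a score than the raw word used in the sentence.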

Sentiment analysis based on Naive Bayes (NB) Classifier

Classification systems are either supervised or unsupervised. Classifying opinions using the textual opinions available on the internet is a supervised classification problem. There are several supervised techniques, such as SVM, Maximum Entropy, etc. Here, we use Naive Bayes, which is commonly used for text classification.

Several types of feature (different from “entity features”) can be used to feed the classifier: unigrams (or N-grams), POS (Part-Of-Speech) tags, etc.
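As a rough sketch of what unigram and N-gram features look like, the snippet below splits a sentence into tokens and builds contiguous N-grams. The tokenization rule and the function names `tokenize` and `ngrams` are assumptions made for this illustration; a real system would likely use a more careful tokenizer.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Return the contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenize("The Video Mining technology is excellent")
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)

print(unigrams)
print(bigrams)
```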

We describe how to design a classifier to automatically classify the sentiment orientation of a tweet. We use a set of $n$ features (e.g. unigrams) denoted $f_1, \dots, f_n$, and we assume that there are $K$ classes denoted $c_1, \dots, c_K$ (e.g. $K = 3$: positive, neutral, negative).

The training database is built using a set of $N$ labelled tweets denoted $d_1, \dots, d_N$.

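Given a new textual document described by its features $f_1, \dots, f_n$, we must find its most likely class $\hat{c}$. This problem can be solved using a Maximum A Posteriori estimation, which leads to the following problem:

$$\hat{c} = \arg\max_{c_k} P(c_k \mid f_1, \dots, f_n)$$

According to Bayes’ rule:

$$P(c_k \mid f_1, \dots, f_n) = \frac{P(f_1, \dots, f_n \mid c_k)\, P(c_k)}{P(f_1, \dots, f_n)}$$

Consequently

$$\hat{c} = \arg\max_{c_k} \frac{P(f_1, \dots, f_n \mid c_k)\, P(c_k)}{P(f_1, \dots, f_n)}$$

Let us denote $Z = P(f_1, \dots, f_n)$. We can see that $Z$ does not depend on $c_k$ and therefore does not affect the search for $\hat{c}$. Hence:

$$\hat{c} = \arg\max_{c_k} P(f_1, \dots, f_n \mid c_k)\, P(c_k)$$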

The distribution $P(f_1, \dots, f_n \mid c_k)$ is estimated by considering the Naive Bayes principle, which assumes that the features are conditionally independent given the class $c_k$. This is why we speak about “Naive” Bayes. This assumption is not very realistic, since the features are not always independent. However, NB provides good results compared to more complex classifiers (such as SVM) and its training phase has a low computational complexity. The independence assumption allows the preceding equation to be rewritten as follows:

$$\hat{c} = \arg\max_{c_k} P(c_k) \prod_{i=1}^{n} P(f_i \mid c_k)$$

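To avoid multiplying many small probabilities, the logarithm (which is monotonically increasing and turns products into sums) can be applied to the previous equation to obtain the following equation:

$$\hat{c} = \arg\max_{c_k} \left[ \log P(c_k) + \sum_{i=1}^{n} \log P(f_i \mid c_k) \right]$$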

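The practical benefit of the logarithm can be seen in a short numerical sketch. The likelihood values below are made up for the illustration: with a few hundred features, the direct product falls below the smallest representable floating-point number, while the sum of logarithms remains well behaved.

```python
import math

# Hypothetical likelihoods: 400 features, each with probability 0.01 under some class.
likelihoods = [0.01] * 400

# Direct product: mathematically 1e-800, far below the smallest positive float.
product = 1.0
for p in likelihoods:
    product *= p

# Sum of logarithms: same ranking information, but in a comfortable numeric range.
log_sum = sum(math.log(p) for p in likelihoods)

print(product)  # underflows to 0.0
print(log_sum)  # roughly -1842.07
```

Because the logarithm is monotonic, comparing `log_sum` across classes selects the same class as comparing the products would, if the products could be represented.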
There are different types of Naive Bayes depending on the nature of the features’ likelihood $P(f_i \mid c_k)$. For example:

– Gaussian Naive Bayes (features are normally distributed)
– Multinomial Naive Bayes (features are multinomially distributed)
– Bernoulli Naive Bayes (features have binary values: $\{0, 1\}$)

Let us focus on the second type: Multinomial Naive Bayes, which is widely used for text classification. In Multinomial NB, the likelihood $P(f_i \mid c_k)$ is estimated using a feature smoothing technique as follows:

$$P(f_i \mid c_k) = \frac{N_{ik} + \alpha}{\sum_{j=1}^{n} N_{jk} + \alpha n}$$

The smoothing process aims to minimize the impact of the statistical variability of the training data set [2]. In the case of Laplace smoothing, the parameter $\alpha = 1$.
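A small worked example may help. The counts are made up for the illustration: assume $n = 5$ features, a class $c_k$ whose training documents contain 10 feature occurrences in total, and a feature $f_i$ that never occurs in that class (count 0). With Laplace smoothing ($\alpha = 1$), the smoothed likelihood is:

```latex
P(f_i \mid c_k) = \frac{0 + 1}{10 + 1 \times 5} = \frac{1}{15} \approx 0.067
```

Without smoothing ($\alpha = 0$) the estimate would be $0/10 = 0$, so a single unseen feature would set the whole product for that class to zero and make its logarithm undefined.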

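The variable $N_{ik}$ is the number of occurrences of the feature $f_i$ in class $c_k$ in the training dataset. In addition, the prior distribution can be estimated simply by $P(c_k) = N_k / N$ (where $N_k$ is the number of training documents belonging to class $c_k$ and $N$ is the total number of training documents). Finally:

$$\hat{c} = \arg\max_{c_k} \left[ \log \frac{N_k}{N} + \sum_{i=1}^{n} \log \frac{N_{ik} + \alpha}{\sum_{j=1}^{n} N_{jk} + \alpha n} \right]$$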

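The whole procedure can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `train` and `predict`, the toy training set, and the choice to ignore features never seen during training are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def train(docs: List[Tuple[List[str], str]]):
    """Collect N_k (documents per class) and N_ik (feature occurrences per class)."""
    class_doc_counts: Counter = Counter()
    feature_counts: Dict[str, Counter] = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in docs:
        class_doc_counts[label] += 1
        feature_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_doc_counts, feature_counts, vocabulary


def predict(tokens, class_doc_counts, feature_counts, vocabulary, alpha=1.0):
    """Return the class maximizing log P(c_k) + sum_i log P(f_i | c_k)."""
    n_docs = sum(class_doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_k in class_doc_counts.items():
        # Smoothed denominator: total feature count in the class + alpha * vocabulary size.
        denom = sum(feature_counts[label].values()) + alpha * len(vocabulary)
        score = math.log(n_k / n_docs)  # log prior
        for tok in tokens:
            if tok not in vocabulary:
                continue  # assumption: features unseen in training are ignored
            score += math.log((feature_counts[label][tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training set (not taken from the source).
training = [
    ("excellent great technology".split(), "positive"),
    ("great product excellent service".split(), "positive"),
    ("terrible slow product".split(), "negative"),
    ("bad terrible service".split(), "negative"),
]
model = train(training)

print(predict("excellent service".split(), *model))  # positive
print(predict("terrible slow".split(), *model))      # negative
```

The default `alpha=1.0` corresponds to Laplace smoothing; other values give other smoothing strengths.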
References

[1] Bing Liu and J. Cheng in Handbook of Natural Language Processing, Second Edition. March, 2010.

[2] Vilar, D., Ney, H., Juan, A., & Vidal, E. (2004). Effect of feature smoothing methods in text classification tasks. Proc. of PRIS, 4, 108-117.
