Recommender Systems (RS) provide users with recommendations about items as part of a given service. They are a subclass of Information Filtering systems and are applied in a variety of domains: movies, travel, financial services (insurance, loans), advertising, and more. Using an RS presents several benefits, such as increased user loyalty and satisfaction, increased revenue, and better system efficiency.
Below, we present a general overview of RS: an introduction to the different types of RS and their core concepts. In the last section, we look at CF based on dimensionality reduction.
In a Recommender System we distinguish users and items, each described by their attributes. Consider the example of movie recommendations (e.g., Netflix). In this specific case, the users are customers and the items are movies. The users' attributes are, for example, sociodemographic information; the items' attributes are the movies' genre (action, comedy, etc.), producer, actors, and so on.
We distinguish four types of recommender systems. In the remainder, we will focus on two of them: Content-based and Collaborative Filtering (CF) RS.
Content-based RS aim to model the preferences of users by modeling the items through their relevant features. Three main steps describe this type of RS: (1) represent each item by its relevant features (content analysis); (2) build a profile of the user's preferences from the items she has rated (profile learning); (3) match the user profile against candidate items to produce recommendations (filtering), as sketched below.
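Here is a minimal sketch of these three steps, assuming a toy movie catalogue and TF-IDF features from scikit-learn; the item descriptions and the liked list are made up for illustration.

```python
# Hypothetical content-based sketch: items are described by text features
# (e.g. a movie's genre keywords); the user profile is the mean TF-IDF
# vector of the items she liked; recommendations are the most similar items.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

items = {
    "Movie A": "action thriller heist",
    "Movie B": "romantic comedy wedding",
    "Movie C": "action spy chase thriller",
}
titles = list(items)

# Step 1: represent each item by its relevant features (here, TF-IDF).
tfidf = TfidfVectorizer()
item_vectors = tfidf.fit_transform(items.values())

# Step 2: build the user profile from the items she rated positively.
liked = ["Movie A"]
profile = np.asarray(item_vectors[[titles.index(t) for t in liked]].mean(axis=0))

# Step 3: recommend the unseen items most similar to the profile.
scores = cosine_similarity(profile, item_vectors).ravel()
for title, score in sorted(zip(titles, scores), key=lambda p: -p[1]):
    if title not in liked:
        print(title, round(score, 3))
```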
The principle behind CF RS is to predict the rating a particular user will give an item from the ratings that the users most similar to her have given that item (user-user approach).
The usual similarity measures are Cosine Similarity and Pearson Correlation [2].
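For two users $u$ and $v$, writing $r_{ui}$ for the rating of user $u$ on item $i$, $\bar{r}_u$ for her mean rating, and $I_{uv}$ for the set of items both have rated, these measures read:

$$\text{sim}_{\cos}(u, v) = \frac{\sum_{i \in I_{uv}} r_{ui} \, r_{vi}}{\sqrt{\sum_{i \in I_{uv}} r_{ui}^2} \sqrt{\sum_{i \in I_{uv}} r_{vi}^2}} \qquad \text{sim}_{\text{Pearson}}(u, v) = \frac{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)(r_{vi} - \bar{r}_v)}{\sqrt{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)^2} \sqrt{\sum_{i \in I_{uv}} (r_{vi} - \bar{r}_v)^2}}$$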
Scalability problems can affect user-user CF when the number of users grows considerably. Hence one can alternatively use an item-item approach, where the rating a specific user would give a particular item depends on the ratings she gave the items most similar to the item under consideration.
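As an illustration, here is a small user-user prediction sketch; the toy rating matrix, the neighborhood size `k`, and the fallback to the item mean are all illustrative choices, with 0 marking a missing rating.

```python
# User-user CF: predict user 0's rating of item 2 from the ratings of the
# k most similar users, weighted by cosine similarity on co-rated items.
import numpy as np

R = np.array([[5, 3, 0, 1],
              [4, 0, 4, 1],
              [1, 1, 5, 4],
              [5, 4, 1, 1]], dtype=float)

def cosine_sim(a, b):
    mask = (a > 0) & (b > 0)          # compare co-rated items only
    if not mask.any():
        return 0.0
    return a[mask] @ b[mask] / (np.linalg.norm(a[mask]) * np.linalg.norm(b[mask]))

def predict(R, user, item, k=2):
    sims = np.array([cosine_sim(R[user], R[v]) if v != user else 0.0
                     for v in range(R.shape[0])])
    sims[R[:, item] == 0] = 0.0       # neighbors must have rated the item
    top = np.argsort(sims)[::-1][:k]
    if sims[top].sum() == 0:
        return R[R[:, item] > 0, item].mean()   # fall back to the item mean
    return sims[top] @ R[top, item] / sims[top].sum()

print(predict(R, user=0, item=2))
```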
Another family of CF, named Model-based CF, learns a predictive model from the ratings matrix.
Although neighborhood-based (memory-based) CF provides good results, it faces problems such as sparsity and scalability. Sparsity occurs when users rate only a few items, and the scalability problem comes from the fact that, in the real world, the user-item matrix can grow very large very quickly. Hence Matrix Factorization (MF) becomes useful. A straightforward approach to this type of CF uses a truncated Singular Value Decomposition (SVD).
The objective of such a CF is to approximate the initial rating matrix $R$ as follows:

$$R \approx U_k \Sigma_k V_k^T$$

where $U_k$ and $V_k$ are orthogonal matrices ($U_k^T U_k = V_k^T V_k = I$) and $\Sigma_k$ is the diagonal matrix of the $k$ largest singular values.
We can rewrite $R \approx P Q^T$, where $P = U_k \Sigma_k^{1/2}$ and $Q = V_k \Sigma_k^{1/2}$.
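A small numpy sketch of this factorization on a toy matrix (missing ratings are naively imputed with 0, which plain truncated SVD requires; the ALS approach below handles missing entries directly):

```python
# Truncated SVD of a toy rating matrix, split into P and Q as above.
import numpy as np

R = np.array([[5, 3, 0, 1],
              [4, 0, 4, 1],
              [1, 1, 5, 4],
              [5, 4, 1, 1]], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(R, full_matrices=False)
P = U[:, :k] * np.sqrt(s[:k])        # P = U_k * Sigma_k^{1/2}
Q = Vt[:k, :].T * np.sqrt(s[:k])     # Q = V_k * Sigma_k^{1/2}
R_hat = P @ Q.T                      # rank-k approximation of R
print(np.round(R_hat, 2))
```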
The objective is to find the optimal $P$ and $Q$ by minimizing the loss function:

$$\min_{P, Q} \sum_{(u, i) \in \mathcal{K}} \left( r_{ui} - p_u^T q_i \right)^2 + \lambda \left( \lVert p_u \rVert^2 + \lVert q_i \rVert^2 \right)$$

where $\mathcal{K}$ is the set of (user, item) pairs with known ratings.
The second term of the above equation, weighted by the regularization parameter $\lambda$, is used to prevent overfitting. An Alternating Least Squares (ALS) technique can find $P$ and $Q$. MF using SVD is also known as Latent Semantic Analysis (LSA). Hofmann introduced a probabilistic approach to LSA [3] to handle polysemy.
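A compact ALS sketch for the regularized loss above; the toy matrix, the rank `k`, the regularization `lam`, and the iteration count are illustrative choices. Each half-iteration fixes one factor matrix and solves a ridge regression for the other, using only the observed ratings:

```python
# Alternating Least Squares on a toy rating matrix (0 = missing rating).
import numpy as np

R = np.array([[5, 3, 0, 1],
              [4, 0, 4, 1],
              [1, 1, 5, 4],
              [5, 4, 1, 1]], dtype=float)
mask = R > 0
n_users, n_items = R.shape
k, lam, n_iters = 2, 0.1, 20

rng = np.random.default_rng(0)
P = rng.standard_normal((n_users, k))
Q = rng.standard_normal((n_items, k))

for _ in range(n_iters):
    # Fix Q, solve a ridge regression for each user's factors p_u ...
    for u in range(n_users):
        Qu = Q[mask[u]]
        P[u] = np.linalg.solve(Qu.T @ Qu + lam * np.eye(k), Qu.T @ R[u, mask[u]])
    # ... then fix P and solve for each item's factors q_i.
    for i in range(n_items):
        Pi = P[mask[:, i]]
        Q[i] = np.linalg.solve(Pi.T @ Pi + lam * np.eye(k), Pi.T @ R[mask[:, i], i])

print(np.round(P @ Q.T, 2))   # predicted ratings, including missing cells
```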
Hence the data is stored in a term-document (co-occurrence) matrix $N$, where $n(d, w)$ is the number of times term $w$ occurred in document $d$.
Fig.1 Term-document matrix
PLSA is a generative model based on the following algorithm (a toy sketch of the process follows the list):

1. Select a document $d$ with probability $P(d)$;
2. Pick a latent topic $z$ with probability $P(z \mid d)$;
3. Generate a word $w$ with probability $P(w \mid z)$.
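In the sketch below, the documents, topics, vocabulary, and probability tables are all made up for illustration:

```python
# Sampling (document, word) pairs from the PLSA generative process.
import numpy as np

rng = np.random.default_rng(0)
docs = ["d1", "d2"]
topics = ["sports", "finance"]
vocab = ["ball", "goal", "stock", "bank"]

P_d = np.array([0.5, 0.5])                        # P(d)
P_z_given_d = np.array([[0.9, 0.1],               # P(z|d), one row per doc
                        [0.2, 0.8]])
P_w_given_z = np.array([[0.5, 0.4, 0.05, 0.05],   # P(w|z), one row per topic
                        [0.05, 0.05, 0.5, 0.4]])

for _ in range(5):
    d = rng.choice(len(docs), p=P_d)              # 1. select a document d
    z = rng.choice(len(topics), p=P_z_given_d[d]) # 2. pick a latent topic z
    w = rng.choice(len(vocab), p=P_w_given_z[z])  # 3. generate a word w
    print(docs[d], vocab[w])
```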
Given a collection of documents, the objective of the training phase is to maximize the likelihood:

$$L = \prod_{d} \prod_{w} P(d, w)^{n(d, w)}$$

where $n(d, w)$ is the term frequency of $w$ in $d$. Considering a pair of word and document $(w, d)$, the joint probability is:

$$P(d, w) = P(d) \sum_{z} P(w \mid z) P(z \mid d)$$

since $w$ and $d$ are independent conditioned on the state of the associated latent variable $z$. The log of the likelihood can then be rewritten as follows:

$$\log L = \sum_{d} \sum_{w} n(d, w) \log \left[ P(d) \sum_{z} P(w \mid z) P(z \mid d) \right]$$
The estimation of the parameters that maximize the likelihood uses an Expectation-Maximization (EM) algorithm. After initializing the parameters, the EM algorithm repeats two steps until convergence. The first step, the Expectation (E) step, computes the posterior probabilities of the latent variables given the parameter values from the previous iteration:

$$P(z \mid d, w) = \frac{P(z \mid d) \, P(w \mid z)}{\sum_{z'} P(z' \mid d) \, P(w \mid z')}$$
The Maximization (M) step updates the parameters based on the posterior probabilities computed in the E step, using the method of Lagrange multipliers to enforce normalization:

$$P(w \mid z) = \frac{\sum_{d} n(d, w) \, P(z \mid d, w)}{\sum_{d, w'} n(d, w') \, P(z \mid d, w')} \qquad P(z \mid d) = \frac{\sum_{w} n(d, w) \, P(z \mid d, w)}{n(d)}$$

where $n(d) = \sum_{w} n(d, w)$ is the length of document $d$.
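The two steps above translate almost directly into code. Here is a compact EM sketch on a toy count matrix; the number of topics and the iteration count are illustrative choices:

```python
# EM for PLSA on a toy term-document count matrix N.
import numpy as np

N = np.array([[4, 2, 0, 0],     # n(d, w): rows are documents, cols are terms
              [3, 3, 1, 0],
              [0, 1, 5, 4]], dtype=float)
n_docs, n_words = N.shape
k, n_iters = 2, 50

rng = np.random.default_rng(0)
P_z_d = rng.random((n_docs, k));  P_z_d /= P_z_d.sum(1, keepdims=True)  # P(z|d)
P_w_z = rng.random((k, n_words)); P_w_z /= P_w_z.sum(1, keepdims=True)  # P(w|z)

for _ in range(n_iters):
    # E-step: posterior P(z|d,w) from the current parameters.
    post = P_z_d[:, :, None] * P_w_z[None, :, :]          # shape (d, z, w)
    post /= post.sum(axis=1, keepdims=True) + 1e-12
    # M-step: re-estimate P(w|z) and P(z|d) from expected counts.
    counts = N[:, None, :] * post                         # n(d,w) P(z|d,w)
    P_w_z = counts.sum(axis=0)
    P_w_z /= P_w_z.sum(axis=1, keepdims=True)
    P_z_d = counts.sum(axis=2) / N.sum(axis=1, keepdims=True)

print(np.round(P_w_z, 2))   # learned topic-word distributions
```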
Once the model is built, a query document $q$ is represented in the latent topic space using a folding-in process that calculates $P(z \mid q)$. This process uses the EM algorithm, where the E step is identical to that of the training phase; the M step only recalculates $P(z \mid q)$ while $P(w \mid z)$ remains constant.
Let us note that a modified version of EM, called Tempered EM (TEM), can prevent overfitting.
The main drawback of PLSA is that there is no direct way of modeling new documents; moreover, the number of parameters grows linearly with the number of documents. Blei et al. introduced Latent Dirichlet Allocation (LDA) [4] to overcome these issues. LDA is similar to PLSA, but the topic mixtures are given a Dirichlet prior.
Cold Start issue
We must note that CF is one of the most common types of RS; however, it faces the "cold start" issue. This problem occurs when there is no information about an item or a user (e.g., a new item or user). Several approaches can mitigate it, such as falling back on content-based or demographic information for new items and users, or eliciting initial ratings from new users.
[1] F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor. "Recommender Systems Handbook." Springer, 2011.
[2] A. Singhal. "Modern Information Retrieval: A Brief Overview." Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2001.
[3] T. Hofmann. "Probabilistic Latent Semantic Indexing." SIGIR, 1999.
[4] D. M. Blei, A. Y. Ng, and M. I. Jordan. "Latent Dirichlet Allocation." Journal of Machine Learning Research 3 (2003): 993–1022.