NLP Analysis and Recommendation System for Yelp
A thesis submitted in partial satisfaction of the requirements for the degree Master of Science in Applied Statistics
By Jiancong Sun, 2020
Abstract
Yelp is a platform that provides a large amount of restaurant information. More precisely, it shares information organized into categories such as home services, auto services, and more. The research focuses on recommending suitable restaurants to Yelp users. Based on several features and preferences extracted from restaurant reviews, the paper proposes four practical recommendation models: location-based, content-based, collaborative filtering, and a combined model.
Review
NLP recommendation models are mostly derived from ranking models. Once an algorithm has ranked a set of entities, you can recommend the highest-ranked entity (a restaurant, in this case) to the customer.
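Here is a minimal sketch of that last step. It assumes the ranking model has already produced one rank score per restaurant, and the restaurant names and scores below are made up for illustration. Under that assumption, recommending the top entries is just a sort:

```python
from typing import Dict, List


def recommend_top_k(scores: Dict[str, float], k: int = 1) -> List[str]:
    """Return the k entities with the highest rank scores, best first."""
    # Sort entity names by their score in descending order and keep the first k.
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


# Hypothetical rank scores that some ranking model assigned to three restaurants.
rank_scores = {"Taco Stand": 0.72, "Noodle Bar": 0.91, "Pizza Place": 0.55}
print(recommend_top_k(rank_scores, k=2))  # ['Noodle Bar', 'Taco Stand']
```

All the real work happens in the model that produces the scores. The recommendation itself only reads off the top of the ranking.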
TensorFlow provides a library called TensorFlow Ranking (TF-Ranking). It is built around learning-to-rank and supports three approaches to ranking: pointwise, pairwise, and listwise. Learning-to-rank is a family of machine learning methods that require data in a particular form, and the pointwise, pairwise, and listwise approaches differ in the form of data they need. With higher-quality data, you can use the listwise approach to get higher accuracy. In practice, however, it is almost impossible to obtain data that meets the requirements of the listwise approach. During my internship in 2021, I used a pointwise model and still achieved satisfactory accuracy.
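To make the distinction concrete, the sketch below computes one loss of each kind for a single user's list of candidate restaurants. It is plain Python, not the TF-Ranking API, which ships its own loss implementations. The specific losses are assumptions chosen for illustration: squared error, pairwise logistic, and softmax cross-entropy are common representatives of each family. The data requirement shows up in the code. The pointwise loss treats every item on its own, while the pairwise and listwise losses only make sense when labels are comparable across items in the same list.

```python
import math
from typing import Sequence


def pointwise_loss(scores: Sequence[float], labels: Sequence[float]) -> float:
    """Mean squared error: each restaurant's score is fit to its label independently."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)


def pairwise_loss(scores: Sequence[float], labels: Sequence[float]) -> float:
    """Average logistic loss over pairs (i, j) where item i is more relevant than item j."""
    total, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                # Small when scores[i] is well above scores[j], large when the pair is inverted.
                total += math.log1p(math.exp(-(scores[i] - scores[j])))
                pairs += 1
    return total / pairs if pairs else 0.0


def listwise_loss(scores: Sequence[float], labels: Sequence[float]) -> float:
    """Cross-entropy between the normalized labels and the softmax over the whole score list."""
    label_sum = sum(labels)
    if label_sum == 0:
        return 0.0  # no relevant item in this list, so there is nothing to learn from
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))  # numerically stable log-sum-exp
    return -sum((y / label_sum) * (s - log_z) for s, y in zip(scores, labels))


# One hypothetical user's candidate list: model scores and relevance labels.
scores = [2.0, 1.0, 0.5]
labels = [1.0, 0.0, 0.0]
print(pointwise_loss(scores, labels))  # 0.75
print(pairwise_loss(scores, labels))
print(listwise_loss(scores, labels))
```

If you swap the scores so that the relevant restaurant is ranked last, both the pairwise and listwise losses grow. Both losses are defined in terms of the ordering within the list.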
What I did not like about this paper is that it does not report a specific metric for the recommendation models, so the paper lacks an evaluation. A model requires data and extracts numerous features from it, and these features should have been explained clearly in the paper. Based on those features, the model learns (supervised learning, in this case) from the training data; the data is usually split into training data and validation data. Once the model is trained, the validation data is used to tune the hyperparameters and to measure the performance of the predictive model. Here, the ranking produced by the so-called rank score would be the main criterion for judging performance.
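As an illustration of the kind of metric that could have been reported, here is a sketch of NDCG@k (normalized discounted cumulative gain), a commonly used ranking metric. NDCG is my choice here and is not something the thesis uses. The gain formula 2^rel - 1 is one common variant. The scores and relevance labels stand in for hypothetical held-out validation data.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k items, in the given order."""
    # Position 0 is discounted by log2(2) = 1, position 1 by log2(3), and so on.
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(scores: Sequence[float], relevances: Sequence[float], k: int) -> float:
    """DCG of the model's ranking divided by the DCG of the ideal ranking (1.0 = perfect)."""
    # Order items by predicted rank score, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [relevances[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical validation data for one user: predicted rank scores and held-out relevance (0-3).
print(ndcg_at_k([0.9, 0.3, 0.6], [3, 0, 2], k=3))  # 1.0
print(ndcg_at_k([0.1, 0.9, 0.5], [3, 0, 2], k=3))
```

Averaging this value over all users in the validation set gives a single number. That number could be used to compare the four recommendation models or to tune hyperparameters.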
Further Work
The paper suggests three directions for further work:
1) Extension of the recommendation system
2) Sentiment analysis on user reviews
3) Word vector recommendation model
There is one point I would like to discuss. NLP models chronically suffer when there is not much data. Take sentiment analysis on user reviews: we cannot judge whether a place is good from only a few Yelp reviews, especially if the restaurant is in a sparsely populated region. Even if the model learns from reviews in Las Vegas, it is hard to apply it to predict the rank scores of restaurants in less popular cities. Fortunately, there are some recent papers on improving a model's performance despite a lack of data. That's it for today.
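The few-reviews problem shows up even before any NLP model is involved. On a raw average, a restaurant with a single five-star review looks better than one with hundreds of strong reviews. One simple, commonly used remedy is a damped (Bayesian-style) average that pulls sparse averages toward a global prior. This remedy is not from the thesis and is not one of the recent papers mentioned above. In the sketch below, the prior mean and prior weight are assumptions chosen for illustration.

```python
from typing import Sequence


def damped_mean(ratings: Sequence[float], prior_mean: float, prior_weight: float) -> float:
    """Mean rating shrunk toward prior_mean; prior_weight acts like that many pseudo-reviews."""
    return (sum(ratings) + prior_weight * prior_mean) / (len(ratings) + prior_weight)


# Hypothetical numbers: a city-wide mean of 3.7 stars, weighted like 10 reviews.
one_review = damped_mean([5.0], prior_mean=3.7, prior_weight=10)
many_reviews = damped_mean([4.6] * 200, prior_mean=3.7, prior_weight=10)
print(round(one_review, 2), round(many_reviews, 2))  # 3.82 4.56
```

With this adjustment, the single five-star review no longer outranks the well-reviewed restaurant. The adjustment does not add information about the sparse restaurant, though. It only admits that the estimate is uncertain.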