Natural Language Processing for Sentiment Analysis

Marcelo Guarido

Continuing from Learning Lab 24, this lab focuses on several Natural Language Processing (NLP) strategies: removing stop words, building n-grams, selecting important words with the TF-IDF (Term Frequency - Inverse Document Frequency) algorithm, and converting a sentence into numerical features.
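
As a rough illustration of stop-word removal and n-grams, the sketch below uses only the Python standard library. The stop-word list, the example sentence, and the helper names `tokenize` and `ngrams` are made up for this example and are not taken from the lab material.

```python
import string

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "was", "is", "and", "of", "to"}

def tokenize(sentence):
    """Lowercase, strip punctuation, split on whitespace, drop stop words."""
    cleaned = sentence.lower().translate(str.maketrans("", "", string.punctuation))
    return [w for w in cleaned.split() if w not in STOP_WORDS]

def ngrams(tokens, n):
    """Return the n-grams (as tuples) formed by n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = tokenize("The food was not good, and the service was slow!")
bigrams = ngrams(tokens, 2)
print(tokens)
print(bigrams)
```

Bigrams such as `("not", "good")` keep a negation next to the word it modifies, which single words alone would lose. This is also why "not" is deliberately left out of the toy stop-word list.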

For this experiment, we will use a dataset of Yelp restaurant reviews, which contains sentences (the reviews) and the sentiment of each review (whether it is positive or not). As part of the pre-processing, overly common words (the stop words) and punctuation are removed from the sentences, and the words in each sentence are converted to numerical features with the TF-IDF algorithm, which counts how many times a word appears in a sentence and down-weights it according to how many different documents it appears in.
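
The TF-IDF weighting described above can be sketched in a few lines of plain Python. This is the textbook form (term frequency times the log of the inverse document frequency); libraries typically use smoothed or normalized variants, so the numbers will not match theirs exactly. The three toy "reviews" and the function name `tf_idf` are assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {word: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # Term frequency in this document times inverse document frequency.
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return weights

# Made-up, already tokenized reviews (stop words and punctuation removed).
docs = [
    ["food", "great", "service", "great"],
    ["food", "cold", "service", "slow"],
    ["great", "place"],
]
weights = tf_idf(docs)
for w in weights:
    print(w)
```

In the third review, "place" occurs in only one document and therefore receives a higher weight than "great", which occurs in two.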

Finally, we will try different classification algorithms, such as Logistic Regression, Random Forest, and Neural Networks, to classify each sentence as positive or negative.
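
To give a feel for the classification step, here is a minimal sketch of Logistic Regression trained with plain gradient descent on bag-of-words counts, written without external libraries. The training sentences are made up (they are not the Yelp reviews), the learning rate and number of passes are arbitrary choices, and Random Forest and Neural Networks are not shown.

```python
import math

# Made-up toy data (not the Yelp reviews): token lists with labels, 1 = positive.
train = [
    (["great", "food"], 1),
    (["loved", "place"], 1),
    (["great", "service"], 1),
    (["bad", "food"], 0),
    (["slow", "service"], 0),
    (["bad", "place"], 0),
]

vocab = sorted({w for tokens, _ in train for w in tokens})
index = {w: i for i, w in enumerate(vocab)}

def featurize(tokens):
    """Bag-of-words counts over the training vocabulary; unknown words are ignored."""
    x = [0.0] * len(vocab)
    for w in tokens:
        if w in index:
            x[index[w]] += 1.0
    return x

def predict_proba(weights, bias, x):
    """Sigmoid of the linear score: estimated probability of the positive class."""
    z = bias + sum(wi * xi for wi, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

# Stochastic gradient descent on the log-loss (arbitrary learning rate and passes).
weights, bias, lr = [0.0] * len(vocab), 0.0, 0.5
for _ in range(200):
    for tokens, y in train:
        x = featurize(tokens)
        err = predict_proba(weights, bias, x) - y
        weights = [wi - lr * err * xi for wi, xi in zip(weights, x)]
        bias -= lr * err

p_pos = predict_proba(weights, bias, featurize(["great", "place"]))
p_neg = predict_proba(weights, bias, featurize(["bad", "service"]))
print(p_pos, p_neg)
```

A sentence is labeled positive when the predicted probability exceeds 0.5. In practice the count features would be replaced by TF-IDF weights and the model by a library implementation.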

We will show all of this step by step in a live coding demonstration in Python.