I don’t understand this Computer Science question and need help to study.
- Chapter 1 outlines the tidy text format and the unnest_tokens() function. It also introduces the gutenbergr and janeaustenr packages, which provide useful literary text datasets that we’ll use throughout this book.
- Chapter 2 shows how to perform sentiment analysis on a tidy text dataset, using the sentiments dataset from tidytext and
- Chapter 3 describes the tf-idf statistic (term frequency times inverse document frequency), a quantity used for identifying terms that are especially important to a particular document.
- Chapter 4 introduces n-grams and how to analyze word networks in text using the widyr and ggraph packages.
- Chapter 5 introduces methods for tidying document-term matrices and corpus objects from the tm and quanteda packages, as well as for casting tidy text datasets into those formats.
- Chapter 6 explores the concept of topic modeling, and uses the tidy() method to interpret and visualize the output of the topicmodels package.
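
For the tf-idf statistic mentioned under Chapter 3, it may help to see the quantity written out. The sketch below uses one common formulation with a natural-log inverse document frequency; exact weighting schemes vary between implementations, and the symbols $n_{t,d}$, $N$, and $n_t$ are notation introduced here for illustration, not taken from the outline above.

$$
\mathrm{tf}(t, d) = \frac{n_{t,d}}{\sum_{t'} n_{t',d}}, \qquad
\mathrm{idf}(t) = \ln\!\left(\frac{N}{n_t}\right), \qquad
\text{tf-idf}(t, d) = \mathrm{tf}(t, d) \times \mathrm{idf}(t)
$$

Here $n_{t,d}$ is the number of times term $t$ occurs in document $d$, $N$ is the number of documents in the collection, and $n_t$ is the number of documents that contain $t$. As a made-up example: a term that appears 3 times in a 100-word document, and in only 1 of 4 documents, has $\mathrm{tf} = 0.03$ and $\mathrm{idf} = \ln 4 \approx 1.386$, so its tf-idf is about $0.0416$. A term that appears in every document has $\mathrm{idf} = \ln 1 = 0$, which is why very common words score zero no matter how often they occur.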