Using topicmodels package for analysis of topics in texts

Hi fellow students and Kirsty,

My vignette is about text mining and analysis. It uses the tm and topicmodels packages in R, together with Latent Dirichlet Allocation (LDA), to work out what a set of documents is about without having to read them all!

The vignette shows how to create a document-term matrix, then uses LDA to identify the key themes (topics) in a body of documents (called a corpus) and to give each document a probability of belonging to each topic.
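To give a flavour of those steps before you open the vignette, here is a minimal sketch. It is not the code from the vignette itself: the six toy sentences, the choice of two topics, and the seed value are all assumptions made up for illustration, and a real corpus would need more careful cleaning.

```r
library(tm)
library(topicmodels)

# Toy corpus: hypothetical documents invented for this illustration
docs <- c(
  "The cat sat on the mat and the cat purred",
  "Dogs and cats are popular pets",
  "The stock market fell as investors sold shares",
  "Investors bought shares when the market rose",
  "My dog chased the cat around the garden",
  "Shares in the bank rose on the stock market"
)

# Build the corpus and apply some basic cleaning
corpus <- VCorpus(VectorSource(docs))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Document-term matrix: one row per document, one column per term
dtm <- DocumentTermMatrix(corpus)

# LDA cannot handle empty documents, so drop any rows with no terms left
dtm <- dtm[slam::row_sums(dtm) > 0, ]

# Fit LDA with k = 2 topics (k and the seed are assumptions for this toy data)
lda_fit <- LDA(dtm, k = 2, control = list(seed = 1234))

# Most probable terms in each topic
terms(lda_fit, 5)

# Per-document topic probabilities: one row per document, one column per topic
doc_topics <- posterior(lda_fit)$topics
round(doc_topics, 3)

# Single most likely topic for each document
topics(lda_fit)
```

Each row of `doc_topics` sums to 1, which is what "varying probabilities for each topic" means in practice: a document is not forced into a single topic but is described as a mixture of them. With a corpus this small the topics may not split cleanly, so treat the output as a demonstration of the workflow rather than a meaningful result.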

This tool can help a user find a relevant document without searching for it by name, or even knowing in advance what it was written about!

Anyway, here is the link to my vignette:

http://rpubs.com/benjibex/266565

I hope you find it useful.

Tracy

3 thoughts on “Using topicmodels package for analysis of topics in texts”

  1. Hi Tracy,
    Interesting post, and it covers an area I have looked at before. I wonder whether it is possible to do this with PDF files. I faced this issue at work and resorted to VBA to get through thousands of documents, although I was pulling out numerical data rather than text. Using R may well have been another option, and one long rainy day I might give it a try.

  2. Hi Tracy,
    I found your vignette a great summary for getting started with topic modelling. I am still getting my head around this, but your post cleared up some confusion for me about “why” and then “how” we do certain things to the data to arrive at the topic models.
    I’m still not quite making the connection between the probability calculations and what they mean, but I know practice will solve this.
    Good Topic! – pun intended.
