There are many packages in R for text processing.
With them it is possible to analyze a text, extract its most common words, and visualize that set of words as a cloud, which is where this type of visualization gets its name: the word cloud.
The R code for the word cloud visualization needs two libraries. Install them once, then load them:
install.packages(c("tm", "wordcloud"))
library(tm)
library(wordcloud)
What about the text we are going to visualize? It is the Wikipedia page of a country. We will find out which country from the word cloud itself. I copied that page into a .txt file. So here is the code. It is well commented in order to explain what each line does.
# read the text file, line by line
page = readLines("italy.txt")
# produce a corpus of the text
corpus = Corpus(VectorSource(page))
# convert all of the text to lower case (standard practice for text processing)
# tolower is a plain R function, so wrap it in content_transformer() to keep the documents as text documents
corpus = tm_map(corpus, content_transformer(tolower))
# remove any kind of punctuation
corpus = tm_map(corpus, removePunctuation)
# remove all the numbers
corpus = tm_map(corpus, removeNumbers)
# remove English stop words
corpus = tm_map(corpus, removeWords, stopwords("english"))
# create a term-document matrix
dtm = TermDocumentMatrix(corpus)
# with a bare tm_map(corpus, tolower), some versions of tm stop here with
# Error: inherits(doc, "TextDocument") is not TRUE
# if you still see that error, reconfigure the corpus as plain text documents and build the matrix again
# corpus = tm_map(corpus, PlainTextDocument)
# dtm = TermDocumentMatrix(corpus)
# convert the term-document matrix to a standard matrix for the next step
m = as.matrix(dtm)
# sum each word's counts and sort them so the most frequent word comes first (and is drawn biggest)
v = sort(rowSums(m), decreasing = TRUE)
# finally produce the word cloud, keeping only words that appear at least 10 times
wordcloud(names(v), v, min.freq = 10)
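To see what the last three steps do, here is a rough sketch on a made-up matrix. The terms, the counts, and the names m_demo, v_demo and kept are invented for illustration; they do not come from the Italy page.

```r
# Toy term-document matrix: rows are terms, columns are documents
# (in the code above, each line of the text file is one document).
m_demo <- matrix(
  c(3, 0, 2,
    1, 1, 0,
    0, 4, 8),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("rome", "alps", "italy"), c("line1", "line2", "line3"))
)

# Total count of each term across all documents, most frequent first.
v_demo <- sort(rowSums(m_demo), decreasing = TRUE)
print(v_demo)

# Terms frequent enough to pass min.freq = 10 in the wordcloud() call.
kept <- names(v_demo)[v_demo >= 10]
print(kept)
```

Running head(v, 10) on the real data in the same way is a quick check of which words will dominate the cloud before plotting it.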
Go ahead and try the example. Depending on your version of tm, the code may complain about the TextDocument error shown above, but with the conversion step in place it runs to the end, so no problem.
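If the tm steps give you trouble, the word counts can be cross-checked with base R alone. The sketch below is only an approximation of the cleaning steps, not what tm does internally: word_freq is a hypothetical helper, its short stop word list is a stand-in for stopwords("english"), and punctuation is replaced by spaces rather than deleted.

```r
# Hypothetical helper (not part of tm): count word frequencies with base R only.
word_freq <- function(lines, stop_words = c("the", "of", "and", "is", "in", "a")) {
  text  <- tolower(paste(lines, collapse = " "))  # lower case
  text  <- gsub("[[:punct:]]", " ", text)         # replace punctuation with spaces
  text  <- gsub("[[:digit:]]", " ", text)         # replace digits with spaces
  words <- strsplit(text, "[[:space:]]+")[[1]]    # split on whitespace
  words <- words[nchar(words) > 0 & !(words %in% stop_words)]
  counts <- table(words)
  freq <- setNames(as.integer(counts), names(counts))
  sort(freq, decreasing = TRUE)                   # most frequent word first
}

# Made-up sample text, standing in for readLines("italy.txt").
sample_lines <- c("Italy is a country in Europe.",
                  "Rome is the capital of Italy, and Italy has 20 regions.")
freq_demo <- word_freq(sample_lines)
print(head(freq_demo, 3))
```

The result has the same shape as v in the code above (a named vector of counts), so it could be passed to wordcloud(names(freq_demo), freq_demo) in the same way.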
Here is the source of this article: http://www.datatreemap.com/vis4r/wordcloud_in_R.php
For more examples of data collection, analysis, and visualization, see http://www.datatreemap.com
P.S. Did you find out which country the Wikipedia page is about? Italy, of course ;-)