I struggled with and enjoyed Voyant in equal measure this week. As you may be able to tell from this late posting, it was a bit of a week for me personally, and I would have liked to spend even more time with Voyant than I did. Unfortunately, most of the time I did spend went to frustration and troubleshooting. As a few other people have mentioned, the corpus was rather limiting: it only allowed one round of uploading files from the computer before it went into its analyzing mode, and after that it would only accept URL or text additions. Voyant did mention a limit on the number of bytes the corpus could take, but it didn't seem to give an exact number, so I was left guessing. Even with my files zipped I was only able to analyze 7 texts at a time (granted, they are rather large texts, but I wanted to do a few more than I ended up with). To be fair, a few of my own files gave me a hard time as well. I think it was because they were online textbooks, but a few produced strange frequently repeated words, like http and a www. website address, so I had to replace those files with ones that gave me more on-topic analyses.
I chose to analyze some of my old physics textbooks. I have to admit, I mostly decided on these because I already had electronic versions on my computer, but when I was first experimenting with Voyant I did find the analysis I got back interesting.
The textbooks are from a variety of disciplines, so I thought it was interesting to see which words (and in a few cases symbols) the texts had in common, or at least which were most frequent. For instance, when you look at the tool above, the word energy is clearly the most used word across all the texts. When you examine each text individually, energy is still in each text's top 25 words, but it varies from being the most frequent word to just barely making the cut. Perhaps we can conclude that energy is one of the most important concepts in physics.
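For anyone who wants to reproduce this kind of comparison outside of Voyant, here is a minimal Python sketch of a top-25 word count with a rank lookup. It is not Voyant's implementation; the tokenizer is a simple regular expression, and the `STOPWORDS` set, the function names `top_words` and `rank_of`, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, very short stopword list (an assumption for this sketch).
# It also drops URL residue such as "http" and "www".
STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "that", "http", "www"}


def top_words(text, n=25, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword tokens as (word, count) pairs."""
    # Lowercase the text and keep runs of letters only.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)


def rank_of(word, text, n=25):
    """Return the 1-based rank of word within the top n, or None if it misses the cut."""
    for position, (candidate, _count) in enumerate(top_words(text, n), start=1):
        if candidate == word:
            return position
    return None


# Made-up sample text, not taken from any of the textbooks.
sample = "Energy is conserved. The energy of heat and the energy of work. Heat flows."
print(top_words(sample, n=2))
print(rank_of("energy", sample))
```

Running `rank_of` on each text separately would show whether a word like energy sits at the top of the list or only just makes the top 25.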
This is probably a good time to clarify some of my texts' names. I am not sure why, but when the files were uploaded to the corpus their names were composed entirely of numbers, and I couldn't figure out how to change that. So, the text starting with 019 is a thermal physics text, 978 is solid state physics, and 032 is electrodynamics.
I thought it was interesting to take words that seem strongly related to each other (particle, wave, and light, or electron, proton, neutron, and atom) and see how each text used them. I would have liked to spend more time here, using these tools to examine my corpus, since this is really not information I could glean myself just by reading these texts.
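A rough way to make that comparison concrete is to count each word in a group per text and scale by text length, so long and short books can be compared fairly. The Python sketch below is an assumption-laden illustration, not how Voyant computes its trends: the function names `relative_frequencies` and `compare`, the per-10,000-token scale, and the tiny sample corpus are all made up for the example.

```python
import re
from collections import Counter


def relative_frequencies(text, terms):
    """Return occurrences of each term per 10,000 tokens of text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on an empty text
    return {term: 10000 * counts[term] / total for term in terms}


def compare(corpus, terms):
    """Map each text name to the relative frequencies of the given terms."""
    return {name: relative_frequencies(text, terms) for name, text in corpus.items()}


# Made-up stand-ins for the real textbooks (illustrative only).
corpus = {
    "019 (thermal physics)": "heat moves each particle and no wave appears",
    "032 (electrodynamics)": "a wave of light is a wave and light is a wave",
}

for name, freqs in compare(corpus, ["particle", "wave", "light"]).items():
    print(name, freqs)
```

Swapping in a different group, such as electron, proton, neutron, and atom, only requires changing the list of terms.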