Text Analysis
Text Analysis can be described as a process for drawing information out of data in order to track, identify, and compare patterns and trends on the way to a conclusion. Text Analysis is not always technological or static; rather, it establishes “webs of meaning,” to quote Clifford Geertz.
Before conducting any type of text analysis, you need to start with a set of data. From there, you examine each piece of data in terms of its relationship to other data. The great thing about technology is the range of possibilities it opens up for your analysis, because such a large quantity of data is now available.
Due to my background in English, I am drawn to think of text analysis in relation to novels and other types of literature. In that case, the data you gather would be literature from a particular time period or group of people, with the texts compared against each other. This may tell us something about the social aspects of that group. Text-mining would offer “provocations, surfacing evidence, suggesting patterns and structures, or adumbrating trends” (Clement et al.).
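To make the idea of comparing texts against each other a little more concrete, here is a minimal Python sketch. It counts how often each word appears in two groups of texts and lists the words whose relative frequency differs most between them. The function names (tokenize, relative_frequencies, compare_groups) and the sample sentences are my own placeholders, invented for illustration; this is not the method used by Clement et al. or by any particular text-mining tool.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(texts):
    """Return each word's share of all tokens across a list of texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def compare_groups(group_a, group_b, top_n=5):
    """List the words whose relative frequency differs most between two groups.

    A positive difference means the word is more common in group_a,
    a negative one means it is more common in group_b.
    """
    freq_a = relative_frequencies(group_a)
    freq_b = relative_frequencies(group_b)
    vocabulary = set(freq_a) | set(freq_b)
    differences = {
        word: freq_a.get(word, 0.0) - freq_b.get(word, 0.0)
        for word in vocabulary
    }
    ranked = sorted(differences.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    # Placeholder snippets, invented for illustration only.
    group_a = ["The duty of the family was plain.", "Duty and family came first."]
    group_b = ["The city never slept.", "The city and the machine moved on."]
    for word, difference in compare_groups(group_a, group_b):
        print(f"{word}: {difference:+.3f}")
```

A list like this only surfaces a pattern. Deciding what the pattern says about the group that produced the texts is still interpretive work.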
Currently, many companies are working to digitize books, but I do have concerns about using their published data. I worry about the reliability of the data because, specifically with Google Books, the content is not 100% available, which could potentially affect results. Further, someone somewhere selects which texts get published digitally, which means someone is also choosing what not to publish. Those unpublished texts may be fundamental to your research and could potentially alter your findings. This issue seems to be the only downside of conducting text analysis.
While looking at Sara Steger’s research, as discussed in “How Not to Read a Million Books”, one thing that stood out for me as a benefit of conducting text analysis is the ability to perform distant readings instead of close readings, because the tools used for distant reading prevent researchers from projecting meaning onto something. Data does not lie. Even if you go into a research project assuming a particular outcome, the data will be able to tell you whether or not your assumption is correct, and you will not be drawn to search for the particular pieces of evidence that fit your desired outcome. In this sense, text-mining tools seem to provide a barrier against projecting assumptions onto texts. However, there are disciplinary fears associated with Text Analysis, including the fear of losing one’s job because a computer can do it instead (just look at what has happened at places like Scotiabank and McDonald’s, where computers have replaced clerks and cashiers). Technological replacement of humans is real, but it also creates other job opportunities, as someone needs to code, build, and repair these programs.
With this in mind, there are advantages and disadvantages to both close and distant reading. With close reading, you can focus on a small amount of text, do the work that a computer cannot, be more subjective, and reach a deeper comprehension of language. With distant reading, you can focus on a large amount of text, do the work that a human cannot, be more objective, and work with a semantic comprehension of language.
One thing that excites me about the tools available for text-mining is that they all seem to be well thought out. One example is MorphAdorner, which “deal[s] with orthographic and morphological variance” (Clement et al.). I had previously worried about the variation of spelling across time periods and how it could affect research results. It is encouraging to know that people are developing tools that respond to those types of problems and to the needs of scholars. These tools bring the benefit of transparency to your research, though certain tools are better suited to some types of research than to others. Tools also have the potential to “enhance” or “constrain” results.
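As a rough illustration of the spelling problem, the following Python sketch replaces older variant spellings with a single standard form before any counting is done. It is only a toy lookup table. The VARIANTS mapping and the normalize function are hypothetical names I made up for this example, and the sketch is not a description of how MorphAdorner itself works.

```python
import re

# Hypothetical variant-to-standard mapping, invented for illustration.
VARIANTS = {
    "loue": "love",
    "vertue": "virtue",
    "publick": "public",
    "musick": "music",
}


def normalize(text, variants=VARIANTS):
    """Replace known variant spellings with a standard form, word by word."""

    def swap(match):
        word = match.group(0)
        standard = variants.get(word.lower(), word)
        # Keep an initial capital when a capitalized variant was replaced.
        if standard != word and word[0].isupper():
            return standard.capitalize()
        return standard

    return re.sub(r"[A-Za-z]+", swap, text)


if __name__ == "__main__":
    print(normalize("Vertue and publick musick"))
```

Words that are not in the table pass through unchanged, so the quality of the result depends entirely on how complete the table is. This is a small example of how a tool can either “enhance” or “constrain” results.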