Carleton PhD student ‘text-mining’ Jane Austen for new information

Jane Austen

In this recent interview, ARTSFILE asked Carleton PhD student Jenna Herdman about her project to ‘text-mine’ Jane Austen novels. The ability of computers to sort large amounts of information turns out to be a useful tool when it comes to seeing patterns in literature. This creates a way of examining a book through what is called distant reading. Let’s call it describing the forest using the trees. A new world of reading awaits.

Q. Tell me a little bit about yourself.

A. I am a PhD student in English at Carleton University. I grew up in Ottawa, and moved to Halifax for a BA in English and Political Science, with a minor in Journalism. I ended up back in Ottawa for my MA at Carleton, which narrowed my academic focus to English, with a specialization in Digital Humanities. My primary area of expertise is in 19th-century British literature and culture, specifically the London print culture and mid-19th century writing about urban poverty. The Jane Austen project falls a little outside of my regular research.

Q. Are you an Austen fan? 

A. Definitely. I’ve read several of her novels: Pride and Prejudice is inevitably a favourite and I quite enjoyed Northanger Abbey. I also have a soft spot for Persuasion, as it was the first Austen that I read in a scholarly context. Anne Elliot (the heroine) is such a wonderful character. Along with the original novels, I (am) a fan and consumer of the many texts that fall under ‘Austenmania.’ I’d highly recommend the novel Longbourn by Jo Baker (which retells Pride and Prejudice from the perspectives of the Bennet family servants) and Pride and Prejudice and Zombies by Seth Grahame-Smith. Other fun adaptations include the YouTube web-series The Lizzie Bennet Diaries, which translates the issues of courting and coming-of-age in Regency England into the concerns of 21st century college grads. For example, in the Diaries Lydia’s scandalous and dangerous elopement with George Wickham is modernized into a sex tape that Wickham threatens to circulate online.

Jenna Herdman

Q. Why does her work appeal to you?

A. Austen created these fantastic, relatable characters who make mistakes, misinterpret social situations and have to face the consequences of their words and actions. They change their minds, grapple with difficult choices and develop strong friendships and relationships. Her characters have a lot to teach their readers about social relationships, personal development and how to negotiate our emotions and fears. Furthermore, I’ve always enjoyed Austen’s narrators, who comment in often sassy ways on the characters. From a more critical perspective, Austen’s contribution to the form of the novel in the early 19th century is important and interesting: she was a female writer, taking the problems and emotional development of female characters and female readers seriously. 

Q. Why do you think her works work today? 

A. I think that there is something enduringly appealing about several of Austen’s themes: the relationships between sisters and female friends, for example, which translates well into contemporary discourse and popular culture. Modern readers also enjoy some good, juicy romantic angst and love triangles, all of which abound in Austen. Furthermore, I think that there is also a huge market for period dramas and for the idealistic picture of provincial England in the Regency era that the novels offer. The small communities of characters and their complex social relationships offer a lot to play with for 21st century audiences.

Q. Can you explain your research idea? Where does it come from?

A. The Jane Austen project was an experiment in text-mining, which involves the use of machines to look for patterns and trends in texts. It’s been used in linguistics for a while, but can also be a useful method for literary studies. Another term which fits the project is ‘distant reading,’ which involves looking at large amounts of data. Distant reading has been positioned in opposition to the traditional close reading of literary studies, though the general consensus is that distant reading should supplement, not replace, close reading.

The project actually came about as part of a course that I took in the first year of my PhD. At the time, I was the TA for a class about Jane Austen and popular culture, which looked at Austen’s novels and Austenmania. I thought it might be worth seeing whether any interesting patterns would emerge across the novels. For the project, I worked with a program called Voyant. I also did a bit of playing around with the Google Ngram Viewer, which lets you search a very large corpus of digitized books published over the last two centuries and look for term frequencies across that corpus.

I had experimented with distant reading before, including looking at Dickens novels for recurrences of minor characters. Dickens’s novels … are populated with huge casts of characters, drawn from all social classes. Because the novels were published serially over long periods of time, the readers might forget about a minor character between their appearances. … For the project, I thought it would be interesting to trace the impact of these minor characters on the novel based on the frequencies of their names throughout the text.

For the Jane Austen project, I focused on the frequencies of the love interests. Austen novels were published in the early part of the 19th century, before the prominence of the Victorian novel, and are quite different in form. Austen novels generally feature a female protagonist (and sometimes her sisters/female family members) who is moving through the domestic marriage plot. Crucially, she finds a marriage that offers romantic love and financial and social security. To fulfill the plot, the heroine has to reject the ‘wrong choice’ – a romantic rival to her future husband, who threatens to ruin her if he is chosen. For example, Elizabeth Bennet’s marriage to Darcy is defined by her rejection of Wickham. In Sense and Sensibility, Marianne’s eventual union with the steady Colonel Brandon is defined by her escape from Willoughby. 

My initial approach was to look at the name frequencies of the romantic rivals in each novel – for instance, how often Darcy is named compared to Wickham. I should stress that this project is quite simple and very prototypical, and that I find it more useful as a pedagogical tool than as authoritative scholarship.
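As a rough illustration of the kind of count described here, a minimal Python sketch might split a novel’s text into equal-sized segments and tally how often each name appears in each one. The function name `name_trend`, the segment count and the sample sentence are all assumptions made for illustration; the project itself used Voyant, not this code.

```python
import re

def name_trend(text, names, segments=10):
    """Count mentions of each name in equal-sized segments of the text."""
    # Split on letters only, so "Darcy's" still counts as a mention of "Darcy".
    words = re.findall(r"[A-Za-z]+", text)
    # Words per segment, rounded up so every word falls in some segment.
    size = max(1, -(-len(words) // segments))
    trend = {name: [0] * segments for name in names}
    for i, word in enumerate(words):
        if word in trend:
            trend[word][min(i // size, segments - 1)] += 1
    return trend

# Hypothetical sample text standing in for a full novel.
sample = ("Darcy spoke. Wickham smiled. Wickham left. "
          "Darcy stayed. Darcy wrote. Darcy returned.")
print(name_trend(sample, ["Darcy", "Wickham"], segments=2))
# {'Darcy': [1, 3], 'Wickham': [2, 0]}
```

Plotting each list against segment number gives a simple trend line per character. A count like this only matches single-word names exactly as written, so "Colonel Brandon" would need to be searched as "Brandon", and pronouns or titles referring to a character are missed.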

Q. What did you hope to accomplish? 

A. The project relies on ‘peeling open’ the text of the novel and encountering it in a new way. These days, we encounter novels in all sorts of different ways, from adaptations and non-textual media to Wikipedia or SparkNotes summaries. Text-mining allows the reader to open up the novel according to their interests and terms and offers new ways of entering it that might support a critical reading. A distant reading approach might let us visualize and explore the novel’s text, its language or its themes to supplement or precede our close reading of that text. Thus, a student working with these tools might use their results as a way to find and answer critical questions.

I am very interested in how we might teach and analyze digital literacy in English courses. These tools … can trace the evolution of language, such as the popularity of certain words across certain periods. It’s a pretty neat tool.

I am passionate about digital literacy, and about taking creative and critical approaches to texts and to digital tools. I envision using this methodology to introduce students to 19th-century novels and to interrogate the novels’ language and structure. These tools are very accessible for a general public audience, and distant reading projects could be tailored to fit different users’ individual interests and passions.

Q. What have you found? 

A. In my graph of Pride and Prejudice, I mapped the frequency of Mr. Darcy against mentions of Wickham.

A second graph looks at the frequency trends of Willoughby and Colonel Brandon, the two contenders for Marianne Dashwood’s heart in Sense and Sensibility. … 

Both graphs show the triumph of the chosen husband at the end (unsurprisingly). Though Wickham’s importance is more consistent than Willoughby’s, the contrast between Willoughby’s and the Colonel’s trends in the Sense and Sensibility graph is far more dramatic than the contrast between Darcy and Wickham in the Pride and Prejudice graph. Indeed, though Brandon is the one who marries Marianne, Darcy’s and Willoughby’s trends are similar. Does this pattern suggest that Willoughby is the true romantic hero of Sense and Sensibility, despite his flaws and his betrayal of Marianne? Should my future analyses treat Willoughby more like Darcy, the romantic hero, than like Wickham, the immoral antagonist? In this sense, the graphs contradicted my initial assumption that the husband’s trend would prove more dominant.

In comparison to the patterns in Pride and Prejudice and Sense and Sensibility, the graphs of romantic rivalries in Northanger Abbey and Mansfield Park show a more linear and less fluctuating trend for the romantic suitors in each of the novels. Instead of rapidly moving from high to low frequencies, they are more steady and consistent throughout the novels. 

Q. How can text mining advance scholarship?

A. Text mining projects might include linguistic analysis, or looking for patterns in individual texts. It could be used to settle an authorship debate: if there is a text whose author is unknown, we might use text mining tools to compare it to texts by known authors, and see if one style is similar enough to determine authorship. Some projects have used linguistic machine reading to look at authors like Agatha Christie and determine whether signs of Alzheimer’s disease emerge in the author’s writing.
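To give a sense of how such an authorship comparison might work, here is a minimal Python sketch. It assumes one common stylometric approach, which the interview does not specify: each text is reduced to the relative frequencies of a few very common "function words", and the profiles are compared with cosine similarity. The word list, the function names and the toy texts are hypothetical, and a ranking like this suggests stylistic closeness rather than proving who wrote a text.

```python
import math
import re
from collections import Counter

# A tiny, hypothetical list of function words; real studies use far longer lists.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "but", "not"]

def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    """Cosine similarity between two equal-length frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def closest_author(unknown, known):
    """Return the most similar known author and the score for every author."""
    target = profile(unknown)
    scores = {author: cosine(target, profile(text)) for author, text in known.items()}
    return max(scores, key=scores.get), scores

# Toy texts standing in for full novels by two hypothetical authors.
known = {"Author A": "the the the and of", "Author B": "but not but not it"}
best, scores = closest_author("the and the of the", known)
print(best)
```

With real novels the scores would all be fairly high, since every English writer leans on the same common words, so the differences between candidates matter more than any single score.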

Furthermore, text mining projects might be used to map or visualize texts, to create analytic tools or even pieces of art based on language patterns. Or, we might use these tools to compare all of the novels published in Britain in the 19th century and to find trends and make observations about popular reading practices.   

I am cautious about the claim that text mining, or other statistical or scientific ways of analyzing literary texts, will actually provoke a real or radical change in literary studies. Instead, we might use these tools to supplement research, and to find new ways of entering individual texts as well as larger archives.

Written by

Peter Robb began his connection with the arts community in Ottawa in the mid-1980s when he was the administrator and public relations director of the Great Canadian Theatre Company. After a long career in journalism with the Ottawa Citizen, where he served in a number of different posts, he returned to the arts when he became the Citizen's arts editor.