Tuesday, 12 November 2013

Researchers' final reports (3)

This is the third in our series of final reports by the AADDA project researchers, posted with their permission. This one is by Dr Carole Taylor, a researcher at the House of Lords:

I. Research Background and Methodology 

My historical expertise lies in early Georgian music, art and politics, which was not an obvious fit for the Domain Dark Archive's focus on UK websites extant between 1996 and 2010. However, my work as Research and Parliamentary Assistant to peers in the House of Lords seemed more promising. I discussed this with colleagues in the Lords, who immediately recognised the potential value of the web archive for MPs and peers with “a range of policy interests which will map onto those of academic researchers”. With the particular encouragement and advice of Dr Elizabeth Hallam Smith (Director of Information Services and Librarian, House of Lords Library), I identified political engagement as a subject of obvious interest to parliamentarians. It was also among the categories that Peter Webster, at the 13 June 2012 seminar, noted as lending themselves well to web archive research of this kind. I then wrote up a proposal.

I undertook an intensive period of research to familiarise myself with the present state of serious research on political engagement in the UK, in order to identify a manageable research exercise to take to the AADDA interface. I was advised by several academic colleagues, particularly two PhD students in the Department of Government at the University of Essex, one of the two main centres of political engagement studies in the UK (the other being the University of Lancaster). I was also assisted in this information-gathering exercise by a Senior Researcher at the House of Lords Library, where serious efforts are made to understand how parliamentarians listen to and engage with the public.

In advance of our access to the AADDA, I presented a scaled-down version of my research proposal to the IHR/BL team in March. I suggested a focus on social media forums used by parliamentarians, particularly the House of Lords blog, launched in 2008. The House of Lords was the first parliamentary chamber in the world to set up a bipartisan blog, which makes it a compelling example in the history of political engagement. Disappointingly, in the months leading up to our encounter with the AADDA dataset, I learned that social media sites, with the exception of those on .co.uk, would not be included in the dataset, which meant my topic was no longer viable. I rethought the proposal and settled on a very narrow, entirely new subject that felt manageable within the parameters of the consultation: Heathrow’s Third Runway. In February 2013 I met Jane Winters and Jonathan Blaney at the IHR to confirm that this third version of the research proposal was acceptable (it was).

Thus I was on track for the purposes of the consultation. However, with such limited access to social media sites, the value of this exercise for serious researchers at Parliament was considerably eroded. Even the significance of the results below on the “Third Runway” was questioned, albeit sympathetically, by parliamentarians, who cautioned that I appeared to be retrieving information already well known to parliamentary researchers. Their interest, naturally, is in what this resource can offer over and above what they already know. It may be that topics which received less public airing than the Third Runway did will deliver better results.

II. Research Results

1st session: March 2013

I questioned the interface three times.

  1. “third runway” – 171 items; 
  2. “third runway” AND “parliament” – 71 items; 
  3. “third runway” AND “heathrow” AND “parliament” – 69 items

Observations:
  • a lot of travel companies;
  • .gov.uk (5 items) – entirely predictable;
  • public suffixes are important for investigating engagement, but I didn’t readily grasp how to link the left- and right-hand sides of the search results page usefully.

Questions arising:
How many of the 100 items dropped between the first two searches might have included useful information? In this respect, I agree with GM at the 21 March 2013 meeting, who said “there needs to be a ‘search within’ option, for when there are many thousands of results.” PW’s response that “in such cases adding more search terms should have the same effect” is helpful for reducing “many thousands” to a couple of hundred; however, at that point I might not want to lose potentially useful information by adding a new search term.

What about people who are undecided or don’t express their views? This is an obvious but important qualitative question for historical researchers.

It would be a great help if a preview screen were available to the right of each item. Through all my searches (in March and September), I clicked on countless items that were duplicates of what I’d clicked on two or three items earlier. (Titles of items often differ, so titles alone are not a dependable indicator.)

At the March meeting I asked Peter and Andrew how to turn the search data into ngrams; the answer was that the AADDA will have a “click to create ngram” function. It is not there yet, but it would be a great help.

2nd session: September 2013

I questioned the interface 10 or 15 times.

  1. “third runway” AND “parliament” – 990 items, but the breakdown of these results (crawl year, content type, etc.) proved both manageable and useful;
  2. “third runway” AND “soley” – 122 items. Lord Soley was Chairman of “Future Heathrow”, the pro-expansion group. Among the 122 items was an interview (helpful, though repeated twice); the first 21 items were all the same, and most were inaccessible or gobbledygook (cooking recipes); several items, e.g. travel sites, had no mention of Soley or the third runway at all;
  3. “third runway” AND “house of lords” – 206 items; and “third runway” AND “aviation” – 2000+. For both of these, I checked the two extreme ends of the sentiment analysis (“very positive” and “very negative”). Many of these items failed to connect the two search terms in any way. Nearly all of the 206 items in the first set were BBC pages; this was not only a problem of repetition (though there was plenty of that), but these are also widely public documents of little use to parliamentarians, who are already well equipped with knowledge at this level;
  4. I checked “third runway” AND “Howard Davies” on the off chance that he was mentioned in this connection before he became Chairman of the Airports Commission in 2012 – eight items, all identical (a PDF report from the Association of British Insurers with no mention of “third runway” or Davies). Disappointing!
  5. I also checked “third runway” AND “future of aviation”; “third runway” AND “environment”; and “third runway” AND “economy” – no new observations.

It would be a great help if we could print the page with search results (or somehow export this material).

Questions and Concerns:
Clearly, in this September round of questioning the dataset, I was encountering problems with the Boolean AND search that hadn’t arisen in March. At best, the searches seemed in September to be returning OR rather than AND results; at worst, there was no connection to either search term. I corresponded with Richard Deswarte about this; he could not see where the problem lay, and I have no idea what it was either.
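The difference between the two operators can be made concrete with a minimal Python sketch over a few toy documents. The documents and the helper names `matches_all` and `matches_any` are hypothetical illustrations, not a description of how the AADDA interface works; they only show what an AND search and an OR search would be expected to return.

```python
# Toy documents standing in for archived pages (hypothetical examples).
DOCS = {
    "a": "Lord Soley argued for a third runway at Heathrow.",
    "b": "A third runway would increase aviation noise.",
    "c": "Travel deals: cheap flights from Heathrow.",
    "d": "Cooking recipes for the weekend.",
}

def contains(text: str, phrase: str) -> bool:
    """Case-insensitive phrase match."""
    return phrase.lower() in text.lower()

def matches_all(text: str, phrases: list[str]) -> bool:
    """Boolean AND: every phrase must occur in the text."""
    return all(contains(text, p) for p in phrases)

def matches_any(text: str, phrases: list[str]) -> bool:
    """Boolean OR: at least one phrase must occur in the text."""
    return any(contains(text, p) for p in phrases)

terms = ["third runway", "soley"]
and_hits = sorted(k for k, t in DOCS.items() if matches_all(t, terms))
or_hits = sorted(k for k, t in DOCS.items() if matches_any(t, terms))
print("AND:", and_hits)  # AND: ['a']
print("OR:", or_hits)    # OR: ['a', 'b']
```

An AND result set is always a subset of the OR result set for the same terms, and pages like "c" and "d", which contain neither term, should be excluded by both operators.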

Sentiment Analysis, where it hits items to do with the search term(s), was at least consistent and might therefore be of interest in early stages of research.

Repetition: This is my biggest concern about keyword searching: does the repetition of material from one crawl to the next render the totals listed on the search results page meaningless for the historian? And will this problem be multiplied by 200 when the entire dataset is available? Peter cautioned users against taking the numbers of results for 2009 and 2010 as evidence of patterns relative to previous years; does repetition compound this problem for all years?
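One way to gauge the repetition problem is to collapse repeated captures before counting. The Python sketch below assumes each hit comes with a crawl year, a URL and the page text (hypothetical sample data, not AADDA output). It fingerprints each page's normalised text with SHA-256 and keeps only the earliest capture of each distinct text, so that raw and de-duplicated counts per crawl year can be compared.

```python
import hashlib
from collections import Counter

# Hypothetical search hits: (crawl_year, url, page_text).
# The same page captured in several crawls appears several times.
HITS = [
    (2008, "http://example.co.uk/news/1", "Third runway approved"),
    (2009, "http://example.co.uk/news/1", "Third runway approved"),
    (2010, "http://example.co.uk/news/1", "Third runway  approved "),
    (2009, "http://example.co.uk/news/2", "Third runway opposed by residents"),
]

def fingerprint(text: str) -> str:
    """Hash the text after lower-casing and collapsing whitespace."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def first_captures(hits):
    """Keep only the earliest capture of each distinct (normalised) text."""
    seen = {}
    for year, url, text in sorted(hits):  # sorted by crawl year first
        seen.setdefault(fingerprint(text), (year, url, text))
    return list(seen.values())

unique = first_captures(HITS)
raw_by_year = Counter(year for year, _, _ in HITS)
unique_by_year = Counter(year for year, _, _ in unique)
print(len(HITS), len(unique))  # 4 2
print(dict(raw_by_year))       # {2008: 1, 2009: 2, 2010: 1}
print(dict(unique_by_year))    # {2008: 1, 2009: 1}
```

Because the fingerprint ignores the URL, identical copies held at different addresses also collapse to one. Exact hashing will not catch near-duplicates, such as pages that differ only in a date stamp or navigation menu; those would need a fuzzier comparison.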

III. Concluding Remarks

The Digital History and Archives seminar presented by Peter Webster and Richard Deswarte at the IHR on 23 September 2013 was an invaluable guide to my second round of searches on the interface: http://historyspot.org.uk/podcasts/digital-history/web-archives-new-class-pr (click on “Web Archives: A New Class of Primary Source for Historians?”). I’d particularly like to highlight Peter’s observation that the traditional separation between historian and keeper of archives no longer holds in digitised systems of this kind. During the Q&A, Tim Hitchcock expanded on this point, remarking that models of society now being digitised, such as newspapers, were of course not digitised at the time. These changes demand a new skill set, now being shaped by and for C21 historians. To this I would add that scholars will have questions both about subjects they know well and about subjects they are addressing for the first time, and this too needs to be built into the process of curating datasets of this kind, particularly in the present, pioneering stage of digital research.