Wednesday 19 December 2012

Exploring and Uncovering British Euroscepticism in the Dark Archive


Here is another in our series of guest posts by those researchers who plan to use the archive for topics of particular interest to them:

Richard Deswarte - 'Exploring and Uncovering British Euroscepticism in the Dark Archive'

Britain's relationship to and subsequent engagement in the process of European integration is one of the most important political, economic and social developments of the last 50 years. This relationship was controversial even before the UK joined the EEC, as it then was, in 1973, and has remained controversial ever since. The views and arguments of those individuals and groups who have opposed British membership, commonly referred to over the last twenty years as 'Euroscepticism', have been one of the enduring elements of British political and media debate. In the previous 15 years - exactly the period of the Web Domain dataset - much of this debate has been conducted on the Web, with many pro- and anti-European groups setting up webpages and engaging in debates via blogs and other postings. To date there has been no dedicated research based on these online sites and debates. In conjunction with more traditional archival research that I am undertaking on British Euroscepticism, my AADDA project will take the opportunity to uncover and analyse the phenomenon of Euroscepticism on the Web.

In doing this research the following tools and digital research methods will be utilised. In the first instance I will engage in some Google-style Ngram searching based on such key terms as Euroscepticism, EU, UKIP, Euroreferendum, etc. This should produce some interesting aggregate and qualitative results, and patterns relating to the volume, timing and variety of websites and references. Following this I will undertake some proximity searching of related terms to see if this brings up different results and patterns. In addition I am keen to see what searching under images, as one can do in the current UK Web Archive, brings in terms of results, given what I suspect will be a large number of images on these webpages. If time allows, it will also be interesting to see whether sentiment analysis can be applied to gauge the degree of negativity of webpages and websites, and how successfully it can do so. Finally I will undertake some filtering of the results based on such elements as domain type and medium type to see what, if any, interesting patterns emerge. At the same time I will be open to trying out some of the other tools and methods that the other researchers are finding particularly successful in their case studies.

Tuesday 27 November 2012

Sentiment Analysis and the Reception of the Liverpool Poets

This is the latest in our series of guest posts from the AADDA researchers who are proposing ways in which the archive can inform their research. This post is from Helen Taylor of Royal Holloway:



I am currently writing a doctoral thesis entitled ‘Adrian Henri and Merseybeat poetry: performance, poetry, and public in the Liverpool Scene of the 1960s’ (at Royal Holloway, with Professor Robert Hampson). My work uses much archival research and oral memory, particularly in relation to the live event and oral poetry in Liverpool at the time.

Sentiment analysis of the Domain Dark Archive would be useful in relation to my work on the Liverpool Poets and their reception by not only the mainstream media but also by those who experienced their work at the time (in the form of memoir, via fan pages, forums, and the like), and as such could provide me with another area of information to consider alongside newspapers, interviews, and archival material.

My main proposal for the AADDA is for a small, self-contained project involving proximity search. I have found in my research that a variety of labels have been attached to the poets, and I think it would be most interesting to see how Adrian Henri, Roger McGough, and Brian Patten are referred to in forums and similar (informal) internet sites. Henri is often referred to in academic material as a poet/painter, but I want to find out how ordinary people, for want of a better word, labelled him – and I will then combine and compare this data with searches for the same terms in newspapers and published works, as there is a marked difference between academic and popular attitudes to the poets.

Subsequent to this, I would like to run geo-indexing analysis, to see where (as well as who and when) these results are coming from. I would expect results within Liverpool, but it would be interesting to see where else is recorded. It would be particularly interesting to see if the Liverpool 8 postcode (which is where the poets were living and working) would be an area of memorialisation.

This project could be important for my research because I am approaching the literary movement from a multi-, inter-, and cross- media perspective, to present Merseybeat poetry as ‘total art’. In the archives in Liverpool there are flyers for events with a variety of labels for the poets (many of which were written by the poets themselves for events and tours), but I want to be able to provide evidence for how the people experiencing the work have categorised the poets and I think that proximity search will help me prove my thesis.


Helen Taylor

Monday 12 November 2012

London French Geo-Indexing and Image Tagging


This is the third in our series of guest posts from researchers with proposals for how the domain dark archive can be interrogated. Saskia Huc-Hepher of the University of Westminster writes:


Calculating the precise number of French people living in the capital and specifying where they live within the sprawling city has to this day never been achieved. The French Embassy itself admits to its ignorance in this respect, stating that there are approximately 120 000 individuals registered at the French Consulate in London, but that they estimate the true number of French Londoners to be somewhere between 300 000 and 400 000. I have devised several strategies to try to determine with more certainty an accurate figure, from scrutinising the number of French-native speakers in London's state schools (by borough) to examining the quantities of French citizens registering for UK National Insurance cards (by year), and my next tactic is to consult the electoral rolls of each London constituency, pending the publication of the 2011 census data (which now includes questions on identity and language). Whilst I am aware of the limitations of a geo-indexing study, that is, that it will not provide a 'hard' figure for the specified period, my hope is that targeted searches might serve to triangulate my current findings. My aim is therefore to use the geo-indexing tool to map out the areas of London with the greatest concentrations of French inhabitants on the basis of the post-codes associated with 'French' web sites / spaces. This data would have the potential to confirm either the unexpected findings of the Francophone-schoolchildren investigation mentioned above (unexpected in that the borough with the highest number of French speakers was Lambeth, not Kensington and Chelsea as the stereotype might suggest) or, on the contrary, reinforce the stereotype, as depicted in a map reproduced by the Think London (A. Wlores) report which identified Kensington & Chelsea, Westminster, Hammersmith & Fulham and Wandsworth as having the largest concentrations of French residents. 
It would also have historical value in that it would ascertain whether or not there was any relationship between the areas most associated with the London French today and the areas favoured in previous waves of migration to the capital. The findings could then be used in the multi-layered e-resources referred to in the context of the aforementioned AHRC bid.
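The postcode-based mapping described above could be sketched roughly as follows. This is only an illustration, not the geo-indexing tool itself: the regular expression is a simplified approximation of UK postcode syntax, the function names are hypothetical, and the sample pages and addresses are invented for the example.

```python
import re
from collections import Counter

# Simplified UK postcode pattern: outward code (e.g. "SW7"), optional space,
# inward code (e.g. "2DT"). Assumption: a loose approximation, not a validator.
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s?(\d[A-Z]{2})\b")


def outward_codes(text):
    """Return the outward code (postal district) of every postcode found in text."""
    return [match.group(1) for match in POSTCODE_RE.finditer(text.upper())]


def count_districts(pages):
    """Count how many pages mention each postal district at least once."""
    counts = Counter()
    for page in pages:
        # A set is used so that a district repeated on one page counts once.
        counts.update(set(outward_codes(page)))
    return counts


# Invented sample pages, standing in for archived 'French' websites.
sample_pages = [
    "Librairie francaise, Bute Street, London SW7 3EX",
    "Cours de francais, London SW7 2DT. Also at SW7 2DT.",
    "Cafe in Brixton, London SW9 8PS",
]

district_counts = count_districts(sample_pages)
for district, n_pages in district_counts.most_common():
    print(district, n_pages)
```

Counting each district once per page is a design choice made here to stop a single site with a repeated footer address from dominating the totals; counting every occurrence would be an equally defensible alternative.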

A study of this kind, focused on the French community in London, would be unprecedented and therefore make an entirely novel and original contribution to both academic and political spheres.

In addition to the 'physical' demographic mapping process I describe above, my doctoral research will also involve a multi-modal analysis of the French community websites selected for the Special Collection. Given the inherent and increasing multi-modality of the Internet, an ethnosemiotic approach to the examination of the London French web content would seem to be the most appropriate. My intention is to depict the visual landscape constructed by the French community websites and, using semiotic theory, attempt to infer meaning from the images and draw ethnographic conclusions regarding the community's sense of belonging; how they perceive and conceive London and its inhabitants; how they (re)present and define their own identity through images; what elements of France and Frenchness they portray and promote, etc. In order to give this visual study greater temporal contextualisation and depth, I intend to conduct a parallel micro-study on the Domain Dark Archive visual data using some kind of image-tagging analytical tool which would allow a word, or combination of words, such as 'French' and 'London', to search for photographs or images only that have been uploaded onto the (London French) websites contained in the archive. This study could also serve to triangulate the findings of the geo-indexing investigation in that the images and spaces associated with key words such as 'London', or specific areas within London, may overlap with the places and spaces that were identified as being particularly French through the geo-indexing process and/or historically. This investigation would therefore be binary in its objectives: visual data for both ethnosemiotic analysis and triangulation of geo-indexing data.

Further investigative mechanisms, comparable to the image-tagging search and analysis tool described above, could also be envisaged with the focus being on, by way of example, video or soundtracks. They were deemed, however, within the framework of the Domain Dark Archive, to be of reduced pertinence given that the earlier websites would undoubtedly contain less meaningful and more restricted data as a result of the technical constraints of the era. It is worthwhile considering such studies, nevertheless, for future scholarly research or AADDA pilots.

Saskia Huc-Hepher

Tuesday 30 October 2012

PISA Rankings and public discourse

This is a guest post by another of the AADDA researchers, Gemma Moss:


PISA Rankings and public discourse: Using the web domain dataset to explore how comparative statistical data have been used to set an agenda for educational change in the UK

The Programme for International Student Assessment (PISA) is a way of comparing educational performance in different countries, by testing students at age 15 when they are preparing to leave schooling for work. Conducted at three-yearly intervals by the Organisation for Economic Co-operation and Development (OECD) since 2000, the latest round in 2012 involved 64 countries including all 34 OECD members. Since their inception the rank orderings of countries’ performance have acted as a major spur to educational reform in many jurisdictions, particularly countries which collect little performance data of their own. The findings are treated in national media as international league tables, with coverage in the UK focusing on our relative position (near to the mean) and whether we have risen or fallen in the rankings. This information often enters political discourse.

This project will use the potential of the web domain dataset to explore how reports of the first four cycles of assessment in the PISA series (2000; 2003; 2006; 2009) were covered on the net. In particular the research aims are to identify:
  • the kinds of institutions that gave most prominence to the PISA findings,
  • how the findings were interpreted, and
  • the extent to which they led to calls for system reform.

In addition, this project will explore whether the analytic tools offered for analysing the web domain dataset enhance or hinder this form of enquiry.

The research questions are:
1.  Can the analytic tools suggested for use with the web domain archive help establish:
  • Which kinds of institutions were most likely to comment on PISA data? (Newspapers; government agencies; universities; think-tanks; individuals in the blogosphere)
  • How the data were represented and interpreted?
  • What the data led to in terms of ideas for system change in the UK?

2.  Do the analytic tools employed to answer 1.  offer efficiencies of research time and scale in understanding the uptake and recontextualisation of research knowledge about PISA via the web and the knowledge communities it represents?

Gemma Moss, Institute of Education

Thursday 18 October 2012

The Decline of Parliamentary Political Engagement, 2004-2010: implications for 2012 and beyond

This is a guest post by Carole Taylor, one of the researchers investigating the Domain Dark Archive as part of the AADDA project:


I am investigating the decline of Parliamentary political engagement in the UK since 2004, a trend documented in the Hansard Society’s annual Audit[s] of Political Engagement. Public attitudes to the political process have “hardened” in recent years; for example the number of people certain that they will vote in a national election has dropped to an all-time low of 48%. My particular interest is in the impact of the work of MPs and peers in the Westminster Parliament on public opinion; I want to be clearer about the links between political engagement and what Parliament does.

In my research proposal to this consultation, I suggested four questions that the Domain Dark Archive might address:

One: could we identify websites addressing some or all of the core indicators of political engagement (ie, knowledge and interest, action and participation, and efficacy and satisfaction)?
Two: could comparison searches be done to give parliamentarians an insight into changing public perceptions of the parliamentary process?
Three: can social media forums used by parliamentarians be identified in a time-sensitive way that highlights political themes commented on from one year to the next?
And four: could we examine the House of Lords blog, say, to analyse how politicians – peers in this case – engaged with the spontaneous, seldom thought-through but increasingly influential eruptions of public opinion expressed in tweets and blogs?

Given the limited amount of time we will have with the dataset this spring, I plan to focus on the last two questions, using the House of Lords as a case study not least because the Lords was the first parliamentary chamber in the world to set up a bipartisan blog (in 2008). Many peers comment on other blogs as well, and it will be interesting to chart how a discrete group of peers and public have interacted online during a period of decline in so-called political engagement. Between now and the spring I will interview peers with an interest in social media in order to identify why they got involved in blogging in the first place. This research will give me relevant key words and phrases to submit to the DDA consultation for search and analysis.


Dr Carole Taylor BSc, MA, PhD
taylorcm@parliament.uk

Friday 25 May 2012

Workshop 3: for humanities and social scientists

Bookings for this event are now closed.

Could you imagine what research questions you might be able to answer using a comprehensive archive of UK websites for the period 1996 to 2010 ? If so, this workshop may be for you, and bookings are now open. It offers an introduction to the Domain Dark Archive, a unique new research dataset, purchased by the JISC from the Internet Archive, in the keeping of the British Library, and not yet publicly available.

The workshop affords a unique opportunity to learn about the DDA, and to help shape the development of the new user interface for the data. The results of these workshops will directly influence the development of the search, analysis and display tools for the new service.

Where:  British Library (St Pancras, London)
When:  Wednesday 13 June,  10.50am - 3.15pm

Sessions include:
  • Introducing the UK Web Archive at the BL, and the Domain Dark Archive in particular;
  • What is analytical access anyway, and what could it do for me ? 
  • Case study: How one scholar might use the Domain Dark Archive  [see earlier post for a preview]
To book a place, contact the project manager, Dr Peter Webster, at Peter.Webster@sas.ac.uk, with a brief statement of your research interests.
Booking is free, but places are very limited.

The event will be most suitable for scholars at doctoral level or higher, but should not be viewed as introductory research training.
A sandwich lunch will be served, and we will also be able to reimburse reasonable travel expenses within the UK.

Workshop 2: arts and humanities

Bookings for this event are now closed. 


Could you imagine what research questions you might be able to answer using a comprehensive archive of UK websites for the period 1996 to 2010 ? If so, this workshop may be for you, and bookings are now open. It offers an introduction to the Domain Dark Archive, a unique new research dataset, purchased by the JISC from the Internet Archive, in the keeping of the British Library, and not yet publicly available.

The workshop affords a unique opportunity to learn about the DDA, and to help shape the development of the new user interface for the data. The results of these workshops will directly influence the development of the search, analysis and display tools for the new service.

Where:  British Library (St Pancras, London)
When:  Tuesday 12 June,  10.50am - 3.00pm

Sessions include:
  • Introducing the UK Web Archive at the BL, and the Domain Dark Archive in particular;
  • What is analytical access anyway, and what could it do for me ? 
  • Case study: How one scholar might use the Domain Dark Archive  [see earlier post for a preview]
To book a place, contact the project manager, Dr Peter Webster, at Peter.Webster@sas.ac.uk, with a brief statement of your research interests.
Booking is free, but places are very limited.

The event will be most suitable for scholars at doctoral level or higher, but should not be viewed as introductory research training.
A sandwich lunch will be served, and we will also be able to reimburse reasonable travel expenses within the UK.

Tuesday 17 April 2012

Workshop 1: for historians

Bookings for this event are now closed.


Could you imagine what historical questions you might be able to answer using a comprehensive archive of UK websites for the period 1996 to 2010 ? If so, this workshop may be for you, and bookings are now open.

The workshop affords a unique opportunity to learn about, and shape the development of, a unique new dataset, purchased by the JISC from the Internet Archive, and in the keeping of the British Library.

Where:  British Library (St Pancras, London)
When:  Thursday 24 May,  11am - 3.15pm

Sessions include:
  • Introducing the UK Web Archive 
  • How one historian might use the Domain Dark Archive  [see earlier post for a preview]
  • What is analytical access anyway, and what could it do for me ? 
To book a place, contact the project manager, Dr Peter Webster, at Peter.Webster@sas.ac.uk [or, from May 7, jane.winters@sas.ac.uk], with a brief statement of your research interests.
Booking is free, but places are very limited. Bookings will close at 12 noon on  May 10th, and applicants will hear whether they have secured a place soon afterwards. The event will be most suitable for scholars at doctoral level or higher.
Lunch will be served, and we will also be able to reimburse reasonable travel expenses within the UK.

Saturday 14 April 2012

What on earth would I do with this data ?

We’re busy arranging a series of workshops in May and June. Their purpose is to gather humanities and social science scholars together to think collectively about the kind of purposes to which they might put a near-comprehensive dataset of the UK web domain 1996-2010.

The exercise is going to involve the use of the imagination to an extent, since part of this project is to help the British Library to design a user interface for the new dataset; so there isn’t yet anything ‘to play with’, as it were. In order to help fire scholars’ imaginations, I’ve started to sketch how I myself, as an historian of contemporary British Christianity, might start to use the dataset, and what questions I would like to ask of it.

I come to this with a research interest in the forms of words in which religion (broadly defined) is discussed, and how those modes of discourse change over time. This can usefully be thought of using the following scheme:
(i) there are some perennial issues, in relation to what we might call constitutional Christianity, taking such questions as the position of the bishops in the House of Lords, and the establishment of the Church of England
(ii) there are older issues that have been ‘reactivated’ in recent years. For instance, denominational church schools were an issue as far back as the 1906 general election. After a period of calm about the issue in public discussion, the last decade or so has seen the issue come back to prominence - except, of course, that they are now known as faith schools.
(iii) there are also new issues, the obvious one being the perception of a threat from radical Islamism; an issue that was simply absent until relatively recently.

I personally am particularly interested in the domain dark archive, since the period 1996-2010 frames many of these issues perfectly. So, what might I ask of the archive, and which tools might I use ?

Basic visualisation: the Ngram

At a most basic level, I might want to look at the incidence of particular terms, and look for periods in which a particular term is employed more often. For this, there is the Ngram; a visualisation tool that is already employed by Google, and on the existing UK Web Archive. Consider the following case:  in February 2008, the archbishop of Canterbury Rowan Williams gave a lecture to an audience of lawyers which reflected on the scope for the incorporation of sharia law into UK law. For some details of the media storm that followed, see here. An Ngram of the incidence of the word 'sharia' in the existing selective web archive looks like this:
As we might expect, there is a big spike in the incidence of the term at the time of the lecture, and then heightened activity for much of the following year. I had expected the former, certainly, but not the latter to the same extent; and so I now know to look more at the content indicated by those subsequent spikes in activity.
If one then looks for both of the terms 'sharia' and 'archbishop', the result looks like this:

The spike in the terms happens at roughly the same point; but the incidence of 'archbishop' is higher, due perhaps to the wider speculation about Dr Williams' position as a result of the controversy. Also, the repeated peaks visible for 'sharia' aren't present for 'archbishop', suggesting that the debate about the former outlasted the particular instance of the lecture.
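For readers curious about what lies behind a graph of this kind, the calculation can be sketched in a few lines. This is a minimal illustration and not the code used by Google or the UK Web Archive: it assumes that each archived page is available as a (year, text) pair, the function name is hypothetical, and the three sample documents are invented.

```python
import re
from collections import Counter, defaultdict


def term_frequency_by_year(docs, terms):
    """Return {term: {year: share of all words in that year}}.

    docs is an iterable of (year, text) pairs; terms are single words.
    """
    total_words = Counter()          # all words seen per year
    term_hits = defaultdict(Counter)  # occurrences of each term per year
    wanted = {term.lower() for term in terms}

    for year, text in docs:
        tokens = re.findall(r"[a-z']+", text.lower())
        total_words[year] += len(tokens)
        for token in tokens:
            if token in wanted:
                term_hits[token][year] += 1

    return {
        term: {
            year: term_hits[term][year] / total_words[year]
            for year in sorted(total_words)
            if total_words[year]
        }
        for term in wanted
    }


# Invented sample documents, for illustration only.
docs = [
    (2007, "the archbishop spoke on schools"),
    (2008, "sharia lecture by the archbishop on sharia law"),
    (2008, "debate on sharia continues"),
]

freq = term_frequency_by_year(docs, ["sharia", "archbishop"])
for term, by_year in freq.items():
    print(term, by_year)
```

Dividing by the total number of words per year matters because the archive holds far more material for some years than others; raw counts alone would mostly reflect the growth of the web.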

Proximity searching and sentiment analysis

One might, of course, want to go further than this, and by means that aren't yet possible within the UK Web Archive. One means might be using a proximity search - looking for terms occurring within a certain number of characters' distance of each other in the same source.  The graph above only shows the instances of the two terms, but (crucially) not necessarily occurring together in the same source. A proximity search would make the connection that is suggested by the graphs above much more secure.
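A proximity search of the sort just described might work along these lines. It is a sketch under stated assumptions rather than a description of any tool the project will build: the function names and the 50-character threshold are arbitrary choices for illustration, and the two sample pages are invented.

```python
import re


def find_positions(text, term):
    """Character offsets at which term occurs as a whole word (case-insensitive)."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return [m.start() for m in re.finditer(pattern, text, flags=re.IGNORECASE)]


def within_proximity(text, term_a, term_b, max_chars):
    """True if some occurrence of term_a starts within max_chars of term_b."""
    positions_a = find_positions(text, term_a)
    positions_b = find_positions(text, term_b)
    return any(
        abs(a - b) <= max_chars
        for a in positions_a
        for b in positions_b
    )


# Invented sample pages: in the first the terms sit in one sentence,
# in the second they are separated by a long stretch of unrelated text.
page_together = "The archbishop said that aspects of sharia might be recognised."
page_apart = (
    "The archbishop visited a school. "
    + "Filler text. " * 20
    + "A separate item mentions sharia."
)

print(within_proximity(page_together, "archbishop", "sharia", 50))
print(within_proximity(page_apart, "archbishop", "sharia", 50))
```

Both sample pages contain both words, so a simple count of each term would treat them alike; only the distance test separates the page that connects the two from the page that merely mentions them.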

Even more interesting would be sentiment analysis: gauging the attitude of the writer of a webpage towards the term employed, using various techniques including natural language processing to find terms denoting approval or disapproval occurring in connection with the search term. The present archbishop, when he retires at the end of the year, may look back on a very particular relationship with the media during his time in office. I would be interested to see whether 'archbishop' appeared more often in the data with negative connotations after the sharia controversy.
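The simplest form of this idea, counting approving and disapproving words near the search term, can be sketched as below. This is a deliberately crude illustration: real sentiment analysis uses far larger lexicons or trained models, the two word lists here are tiny and hand-picked for the example, and the function name, window size and sample sentences are all hypothetical.

```python
import re

# Tiny hand-made word lists, assumed purely for illustration.
POSITIVE = {"welcomed", "praised", "thoughtful", "respected"}
NEGATIVE = {"criticised", "condemned", "outrage", "resign"}


def sentiment_near(text, term, window=5):
    """Score the words within `window` tokens either side of each use of term.

    Each positive word adds 1 and each negative word subtracts 1. If the term
    occurs twice close together, overlapping context is counted twice.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    target = term.lower()
    score = 0
    for i, token in enumerate(tokens):
        if token == target:
            context = tokens[max(0, i - window): i + window + 1]
            score += sum(word in POSITIVE for word in context)
            score -= sum(word in NEGATIVE for word in context)
    return score


# Invented sample sentences.
negative_page = "The archbishop was condemned amid public outrage."
positive_page = "Many praised the archbishop for a thoughtful lecture."

print(sentiment_near(negative_page, "archbishop"))
print(sentiment_near(positive_page, "archbishop"))
```

A word-list approach like this cannot handle negation or irony ("hardly praised"), which is one reason the results of any such analysis would need to be checked against a close reading of the pages themselves.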

These, of course, are only some attempts to imagine what might be possible using the Domain Dark Archive. I shall be blogging more as the project progresses, and the possibilities become clearer.



Wednesday 14 March 2012

AADDA's 'elevator pitch'

I'm just on my way home from a very productive meeting with all the projects in this JISC sustainability strand. One of the activities during the day was building a one-minute "elevator pitch" for the project, using the Pitch Builder from Harvard Business School. And so - here it is:

"AADDA is a joint venture between the IHR, the British Library and the University of Cambridge. It aims to transform the way in which humanities and social science researchers interact with the single most important archive of .uk web materials. It will develop innovative tools for analytical access to 40TB of primary data from UK webspace (1996-2010) and as a result will allow scholars to ask hitherto impossible questions of a singularly significant dataset. The archival record of contemporary Britain has increasingly migrated to a digital-only environment. The sheer volume of the record now requires new tools to render it accessible to scholars, and to unlock this unique and largely unexplored resource. In the next year, we will draw on a committed group of researchers to guide the British Library in the specification and development of tools for the analysis of the domain archive. Use cases arising from the project will be integral to the Library's sustainability strategy for the archive."

Thursday 2 February 2012

The AADDA project


AADDA is an 18-month project to enhance the sustainability of a substantial dark archive of UK domain websites collected between 1996 and 2010 by the Internet Archive, copies of which were recently acquired by the JISC and are stored at the British Library on their behalf. The project team will work with researchers in contemporary history in particular, and digital humanities in general, to obtain feedback on the feasibility of using web archives at an analytical level. The project will build on this feedback and on the existing UK Web Archive interface in order to develop new forms of analytical access to this collection, thereby enabling researchers to carry out unique and hitherto impossible research queries. This will make a significant contribution to the global understanding of the research value of web archives, particularly for collections that span over a decade and more. Proven clarification of the utility of web archives for scholarly research will significantly enhance the long-term sustainability of the collection and provide valuable data about use cases for justification of ongoing funding.
The project will assess and aim to increase the acknowledged value of domain web archives for scholarly research. Following a survey of current perceptions and consultation with researchers, it will develop prototype tools for the exploitation of domain web archives, raise awareness of the material and services available, promote discussion and debate among key stakeholders, and inform future scholarly access arrangements at a domain level.
The project is led by Dr Jane Winters (IHR), and managed by Dr Peter Webster (IHR), working closely with Maureen Pennock and colleagues at the British Library and Dr Anne Alexander (University of Cambridge). Evaluation will be undertaken by Simon Tanner (King's College London).