Thursday, 13 June 2013

A page, but not as we know it



James Baker, Digital Curator, British Library

It is commonplace to describe something new in relation to something that is known: think 'motion picture', 'spaceship', 'email' or 'smartphone'. The word 'webpage' is no different. And indeed in a sense many webpages are similar to the pages found in books or newspapers: they hold static media (text, image); core elements of them read from top to bottom; their headers, footers, cut-aways and advertisements orientate, guide and entice the reader; and in URLs they possess a (relatively) unique system of identifiers. It is hard to think of another name these digital objects could have been given.

It is also commonplace for the new thing to - linguistically speaking - replace the old thing: think 'motion picture' and 'the pictures', 'spaceship' and 'ship', 'email' and 'mail', or 'smartphone' and 'phone'. The same goes for 'webpage' and 'page'. Here by virtue of this act of redefinition, the 'page' absorbs features of the webpage not (or less) possible in book or newspaper pages: features such as dynamic content, user interaction, and direct links to other pages (or, more precisely, other pages that are not part of a sequence defined by the author whose work is the main content held by the page).

All of this makes the webpage-cum-page appear both familiar and unsettling, conservative and disruptive, old and new. These elements of lineage are crucial, for they have allowed us (among other things) to think of preserving the webpage as akin to preserving the page. Yes, the challenges of novelty and disruption are discussed and debated (on which I'm not qualified to comment), but at the most basic level the webpage stuff that is being collected by Internet Archive or the UK Web Archive is page level stuff. (This is not to say I don't think page level stuff should be archived. Far from it: the fragility of webpages is well known (see Rosenzweig, 2003) and without these efforts valuable data on our society would be lost.)

But what are these pages and how can historians use them? A seminar jointly hosted by the Digital History seminar and the Archives and Society seminar at the Institute of Historical Research sought last night to tackle this very problem, asking quite simply 'Is this a new class of primary source for historians?'. After a presentation on the UK Web Archive and the Analytical Access to the Domain Dark Archive project both the speakers and the audience were largely in agreement that yes, the web archive is a new class of primary source, of historical stuff.

Does this make our nomenclature for what this stuff is problematic? For to call a webpage a page is potentially to place it in a category for which it is ill-suited, and to put the techniques for investigating that category under huge strain. Take a normal news article from the Guardian website as an example. The page contains a story, framing, context and advertisements: all very page-like. But those adverts are dynamic as opposed to static, their content quite possibly targeted depending on the IP address accessing the URL and different each time the page is refreshed. The page also contains moderated comments, ranked as default by oldest first but malleable to user preferences. In short, when you visit the website it is unlikely to be the same as when I visit the website, so an archived version can only be one possible version of a webpage at a particular historical moment. Not very page-like behaviour. Of course we might (quite rightly for the most part) say that the 'core' of the page, the textual content that historians are likely to be interested in, will remain the same regardless of these peripheral changes. And yet as the growth of mainstream live blogs demonstrates (such as those covering the Taksim Square protests), the web is moving toward dynamic content over static content as default: embedded video, maps and text content streams are now commonplace, and are likely to become more so as the web develops.

The webpage then is a rapidly evolving beast whose capacity to change whilst still being called a 'page' complicates how we do research using webpages and how we preserve the internet. It is a page but not a page as we knew it, a semantic shift worth keeping in mind as we prepare for an era of born-digital historical scholarship.

This post was first published on the British Library's Digital Scholarship blog.


Friday, 15 February 2013

Public Health in Local Government, 2001-2012


This is another in our series where researchers planning to use the archive outline their projects. This guest post is by Martin Gorsky of the London School of Hygiene and Tropical Medicine.

Public Health in Local Government, 2001-2012: web representations and practices


Background The Health and Social Care Act of 2012 has introduced a major restructuring of the National Health Service (NHS).  As part of this process public health duties have been removed from NHS bodies to become part of local government. The Department of Health (DH) describes the initiative as reviving 'a long and proud history' by 'returning public health home'. That history includes not only the great achievements of Victorian sanitary reform, but also the early NHS, because in Bevan's original design public health departments were based in local authorities, where they remained until the NHS reorganisation of 1974. Policy-makers see exciting opportunities for public health to further its goals within the local government setting. They hope that by being more closely connected to local people's needs and living environments, health professionals will be able to develop better programmes and to tackle inequalities. This, they believe, should work better than 'one size fits all' policies.

Historiography  What might the recent history of public health in local government tell us about the opportunities and challenges ahead?  There is already a limited body of work examining the changes of 1974 from a national perspective, which argues that various problems had arisen.  Post-war public health 'lost its way', lacking a clear philosophy and rationale at a time of rapidly changing health needs.  It was also sidelined by other professional groups, delivering patchy services and failing to link effectively with the NHS.  There is also a more recent history which is as yet unexamined.  Since 2001, and gaining official force in 2006, the DH has championed the joint appointments of Directors of Public Health (DPH), to straddle both NHS primary care trusts and local authorities.  The activities of these new appointees may give some interesting clues as to whether the structural and philosophical challenges of the earlier period retain their force or can be overcome.

Aim  The project will therefore aim to identify the web presence of these joint appointment DPHs during the period from 2001 up until the passage of the recent Act.  By reading these texts it will ask:
a. whether a coherent rationale for public health in a local government setting is discernible
b. what practices of joint working between NHS and local government are reported, and how the benefits of integration are represented.

Martin Gorsky
Centre for History in Public Health, LSHTM

Wednesday, 19 December 2012

Exploring and Uncovering British Euroscepticism in the Dark Archive


Here is another in our series of guest posts by those researchers who plan to use the archive for topics of particular interest to them:

Richard Deswarte - 'Exploring and Uncovering British Euroscepticism in the Dark Archive'

Britain's relationship to and subsequent engagement in the process of European integration is one of the most important political, economic and social developments of the last 50 years. This relationship was controversial even before the UK joined the EEC, as it then was, in 1973, and has certainly remained controversial ever since. The views and arguments of those individuals and groups who have opposed British membership, commonly referred to over the last twenty years as 'Euroscepticism', have been one of the enduring elements of British political and media debate. In the previous 15 years - exactly the period of the Web Domain dataset - much of this debate has been undertaken on the Web, with many pro- and anti-European groups setting up webpages and engaging in debates over the Web via blogs and other postings. To date there has been no dedicated research based on these online sites and debates. In conjunction with more traditional archival research that I am undertaking on British Euroscepticism, my AADDA project will take the opportunity to uncover and analyse the phenomenon of Euroscepticism on the Web.

In doing this research the following tools and digital research methods will be utilised. In the first instance I will engage in some Google-style Ngram searching based on such key terms as Euroscepticism, EU, UKIP, Euroreferendum, etc. This should produce some interesting aggregate and qualitative results, and patterns relating to volume, timing and variety of Websites and references. Following this I will undertake some proximity searching of related terms to see if this brings up different results and patterns. In addition I am keen to see what searching under images, as one can do in the current UK Web Archive, brings in terms of results, given what I suspect will be a large number of images on these webpages. If time allows, it will also be interesting to see if sentiment analysis can be applied to gauge the degrees of negativity of webpages/websites and how successfully it can do so. Finally I will finish by undertaking some filtering of the results based on such elements as domain type and medium type to see what, if any, interesting patterns emerge. At the same time I will be open to trying out some of the other tools and methods that the other researchers are finding particularly successful in their case studies.
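As a rough illustration of the Ngram-style searching described above, the following Python sketch counts how often each key term appears per year across a set of dated page texts. The record format (year, text), the sample pages and the function name `term_counts_by_year` are assumptions made for illustration only; they are not the AADDA project's actual tools or data.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample records: (crawl year, extracted page text).
# These are invented for illustration, not drawn from the archive.
SAMPLE_PAGES = [
    (2004, "UKIP campaigns on Euroscepticism and the EU constitution."),
    (2004, "A pro-European group replies to the EU debate."),
    (2009, "Euroscepticism grows; UKIP gains seats in the EU elections."),
]


def term_counts_by_year(pages, terms):
    """Count case-insensitive whole-word matches of each term, grouped by year."""
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    counts = defaultdict(Counter)
    for year, text in pages:
        for term, pattern in patterns.items():
            counts[year][term] += len(pattern.findall(text))
    return counts


counts = term_counts_by_year(SAMPLE_PAGES, ["Euroscepticism", "EU", "UKIP"])
for year in sorted(counts):
    print(year, dict(counts[year]))
```

Whole-word matching is used so that 'EU' is not counted inside 'European' or 'Euroscepticism'; a real study would also need to decide how to treat spelling variants and duplicate captures of the same page.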

Tuesday, 27 November 2012

Sentiment Analysis and the Reception of the Liverpool Poets

This is the latest in our series of guest posts from the AADDA researchers who are proposing ways in which the archive can inform their research. This post is from Helen Taylor of Royal Holloway:



I am currently writing a doctoral thesis entitled ‘Adrian Henri and Merseybeat poetry: performance, poetry, and public in the Liverpool Scene of the 1960s’ (at Royal Holloway, with Professor Robert Hampson). My work uses much archival research and oral memory, particularly in relation to the live event and oral poetry in Liverpool at the time.

Sentiment analysis of the Domain Dark Archive would be useful in relation to my work on the Liverpool Poets and their reception by not only the mainstream media but also by those who experienced their work at the time (in the form of memoir, via fan pages, forums, and the like), and as such could provide me with another area of information to consider alongside newspapers, interviews, and archival material.

My main proposal for the AADDA is for a small, self-contained, project involving proximity search. I have found in my research that a variety of labels have been attached to the poets, and I think it would be most interesting to see how Adrian Henri, Roger McGough, and Brian Patten are referred to in forums and similar (informal) internet sites. Henri is often referred to in academic material as a poet/painter, but I want to find out how ordinary people, for want of a better word, labelled him – and I will then combine and compare this data with searches for the same terms from newspaper and published works, as there is a marked difference in academic and popular attitudes to the poets.
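To make the idea of a proximity search concrete, here is a minimal Python sketch that collects the words appearing within a few tokens of a poet's name. The function name `words_near`, the window size and the sample sentence are all hypothetical, chosen for illustration; the tools offered for the archive may work quite differently.

```python
import re
from collections import Counter


def words_near(text, target, window=3):
    """Return a Counter of words found within `window` tokens of `target`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    target_tokens = target.lower().split()
    n = len(target_tokens)
    nearby = Counter()
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target_tokens:
            start = max(0, i - window)
            end = min(len(tokens), i + n + window)
            # Collect the tokens before and after the match, not the name itself.
            nearby.update(tokens[start:i] + tokens[i + n:end])
    return nearby


# Invented forum-style sentence, for illustration only.
sample = "I saw the poet and painter Adrian Henri read in Liverpool in 1967."
print(words_near(sample, "Adrian Henri", window=3))
```

Run over many pages, counts of this kind could be compared with the same searches in newspapers and published works, although common words such as 'and' or 'in' would need filtering before any labels stand out.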

Subsequent to this, I would like to run geo-indexing analysis, to see where (as well as who and when) these results are coming from. I would expect results within Liverpool, but it would be interesting to see where else is recorded. It would be particularly interesting to see if the Liverpool 8 postcode (which is where the poets were living and working) would be an area of memorialisation.

This project could be important for my research because I am approaching the literary movement from a multi-, inter-, and cross- media perspective, to present Merseybeat poetry as ‘total art’. In the archives in Liverpool there are flyers for events with a variety of labels for the poets (many of which were written by the poets themselves for events and tours), but I want to be able to provide evidence for how the people experiencing the work have categorised the poets and I think that proximity search will help me prove my thesis.


Helen Taylor

Monday, 12 November 2012

London French Geo-Indexing and Image Tagging


This is the third in our series of guest posts from researchers with proposals for how the domain dark archive can be interrogated. Saskia Huc-Hepher of the University of Westminster writes:


Calculating the precise number of French people living in the capital and specifying where they live within the sprawling city has to this day never been achieved. The French Embassy itself admits to its ignorance in this respect, stating that there are approximately 120 000 individuals registered at the French Consulate in London, but that they estimate the true number of French Londoners to be somewhere between 300 000 and 400 000. I have devised several strategies to try to determine with more certainty an accurate figure, from scrutinising the number of French-native speakers in London's state schools (by borough) to examining the quantities of French citizens registering for UK National Insurance cards (by year), and my next tactic is to consult the electoral rolls of each London constituency, pending the publication of the 2011 census data (which now includes questions on identity and language). Whilst I am aware of the limitations of a geo-indexing study, that is, that it will not provide a 'hard' figure for the specified period, my hope is that targeted searches might serve to triangulate my current findings. My aim is therefore to use the geo-indexing tool to map out the areas of London with the greatest concentrations of French inhabitants on the basis of the post-codes associated with 'French' web sites / spaces. This data would have the potential to confirm either the unexpected findings of the Francophone-schoolchildren investigation mentioned above (unexpected in that the borough with the highest number of French speakers was Lambeth, not Kensington and Chelsea as the stereotype might suggest) or, on the contrary, reinforce the stereotype, as depicted in a map reproduced by the Think London (A. Wlores) report which identified Kensington & Chelsea, Westminster, Hammersmith & Fulham and Wandsworth as having the largest concentrations of French residents. 
It would also have historical value in that it would ascertain whether or not there was any relationship between the areas most associated with the London French today and the areas favoured in previous waves of migration to the capital. The findings could then be used in the multi-layered e-resources referred to in the context of the aforementioned AHRC bid.
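One simple way to approximate the post-code mapping described above is sketched below in Python: it pulls postcode-like strings out of page text and counts pages per postcode district. The regular expression is a simplified approximation of the UK postcode format, and the function name `district_counts` and the sample snippets are invented for illustration; this is not the archive's geo-indexing tool.

```python
import re
from collections import Counter

# Simplified pattern for full UK postcodes (e.g. "SW7 2DG"). It is an
# approximation and does not check that a postcode actually exists.
POSTCODE = re.compile(r"\b([A-Z]{1,2}[0-9][0-9A-Z]?)\s*([0-9][A-Z]{2})\b")


def district_counts(page_texts):
    """Count pages per postcode district (the outward code, e.g. 'SW7')."""
    counts = Counter()
    for text in page_texts:
        # Use a set so that a page repeating one address is counted once.
        districts = {m.group(1) for m in POSTCODE.finditer(text.upper())}
        counts.update(districts)
    return counts


# Made-up page snippets, for illustration only.
pages = [
    "Librairie, 1 Example Road, London SW7 2DG. Also at SW7 2DG.",
    "Boulangerie, 5 Sample Street, London SE11 5QY",
    "Cafe, 9 Test Lane, London sw7 3ab",
]
print(district_counts(pages).most_common())
```

Counting each page once per district is a design choice that stops a single site with a repeated footer address from dominating the totals; district codes would still need to be matched to boroughs before any comparison with the schools or National Insurance data.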

A study of this kind, focused on the French community in London, would be unprecedented and therefore make an entirely novel and original contribution to both academic and political spheres.

In addition to the 'physical' demographic mapping process I describe above, my doctoral research will also involve a multi-modal analysis of the French community websites selected for the Special Collection. Given the inherent and increasing multi-modality of the Internet, an ethnosemiotic approach to the examination of the London French web content would seem to be the most appropriate. My intention is to depict the visual landscape constructed by the French community websites and, using semiotic theory, attempt to infer meaning from the images and draw ethnographic conclusions regarding the community's sense of belonging; how they perceive and conceive London and its inhabitants; how they (re)present and define their own identity through images; what elements of France and Frenchness they portray and promote, etc. In order to give this visual study greater temporal contextualisation and depth, I intend to conduct a parallel micro-study on the Domain Dark Archive visual data using some kind of image-tagging analytical tool which would allow a word, or combination of words, such as 'French' and 'London', to search for photographs or images only that have been uploaded onto the (London French) websites contained in the archive. This study could also serve to triangulate the findings of the geo-indexing investigation in that the images and spaces associated with key words such as 'London', or specific areas within London, may overlap with the places and spaces that were identified as being particularly French through the geo-indexing process and/or historically. This investigation would therefore be binary in its objectives: visual data for both ethnosemiotic analysis and triangulation of geo-indexing data.

Further investigative mechanisms, comparable to the image-tagging search and analysis tool described above, could also be envisaged with the focus being on, by way of example, video or soundtracks. They were deemed, however, within the framework of the Domain Dark Archive, to be of reduced pertinence given that the earlier websites would undoubtedly contain less meaningful and more restricted data as a result of the technical constraints of the era. It is worthwhile considering such studies, nevertheless, for future scholarly research or AADDA pilots.

Saskia Huc-Hepher

Tuesday, 30 October 2012

PISA Rankings and public discourse

This is a guest post by another of the AADDA researchers, Gemma Moss:


PISA Rankings and public discourse: Using the web domain dataset to explore how comparative statistical data have been used to set an agenda for educational change in the UK

The Programme for International Student Assessment (PISA) is a way of comparing educational performance in different countries, by testing students at age 15 when they are preparing to leave schooling for work.  Conducted at three-yearly intervals by the Organisation for Economic Co-operation and Development (OECD) since 2000, the latest round in 2012 involved 64 countries including all 34 OECD members.  Since their inception the rank orderings of countries’ performance have acted as a major spur to educational reform in many jurisdictions, particularly countries which collect little performance data of their own.  The findings are treated in national media as international league tables, with coverage in the UK focusing on our relative position (near to the mean) and whether we have risen or fallen in the rankings.  This information often enters political discourse.

This project will use the potential of the web domain dataset to explore how reports of the first four cycles of assessment in the PISA series (2000; 2003; 2006; 2009) were covered on the net.  In particular the research aims are to identify:
the kinds of institutions that gave most prominence to the PISA findings,
how the findings were interpreted, and
the extent to which they led to calls for system reform.

In addition, this project will explore whether the analytic tools offered for analysing the web domain dataset enhance or hinder this form of enquiry.

The research questions are:
1.  Can the analytic tools suggested for use with the web domain archive help establish:
Which kinds of institutions were most likely to comment on PISA data? (Newspapers; government agencies; universities; think-tanks; individuals in the blogosphere)
How the data were represented and interpreted?
What the data led to in terms of ideas for system change in the UK?

2.  Do the analytic tools employed to answer 1.  offer efficiencies of research time and scale in understanding the uptake and recontextualisation of research knowledge about PISA via the web and the knowledge communities it represents?

Gemma Moss, Institute of Education

Thursday, 18 October 2012

The Decline of Parliamentary Political Engagement, 2004-2010: implications for 2012 and beyond

This is a guest post by Carole Taylor, one of the researchers investigating the Domain Dark Archive as part of the AADDA project:


I am investigating the decline of Parliamentary political engagement in the UK since 2004, a trend documented in the Hansard Society’s annual Audit[s] of Political Engagement. Public attitudes to the political process have “hardened” in recent years; for example the number of people certain that they will vote in a national election has dropped to an all-time low of 48%. My particular interest is in the impact on public opinion of the work of MPs and peers in the Westminster Parliament; I want to be clearer about the links between political engagement and what Parliament does.

In my research proposal to this consultation, I suggested four questions that the Domain Dark Archive might address:

One: could we identify websites addressing some or all of the core indicators of political engagement (ie, knowledge and interest, action and participation, and efficacy and satisfaction)?
Two: could comparison searches be done to give parliamentarians an insight into changing public perceptions of the parliamentary process?
Three: can social media forums used by parliamentarians be identified in a time-sensitive way that highlights political themes commented on from one year to the next?
And four: could we examine the House of Lords blog, say, to analyse how politicians – peers in this case – engaged with the spontaneous, seldom thought-through but increasingly influential eruptions of public opinion expressed in tweets and blogs?

Given the limited amount of time we will have with the dataset this spring, I plan to focus on the last two questions, using the House of Lords as a case study not least because the Lords was the first parliamentary chamber in the world to set up a bipartisan blog (in 2008). Many peers comment on other blogs as well, and it will be interesting to chart how a discrete group of peers and public have interacted online during a period of decline in so-called political engagement. Between now and the spring I will interview peers with an interest in social media in order to identify why they got involved in blogging in the first place. This research will give me relevant key words and phrases to submit to the DDA consultation for search and analysis.


Dr Carole Taylor BSc, MA, PhD
taylorcm@parliament.uk