
The UK’s political archives move to the cloud

Moving 22 years and 120 terabytes’ worth of government data, documents, archives and web content is no easy task. In just over two decades, the government’s GOV.UK website has accumulated a wealth of material. Following a project to move this archive into the cloud, the data is now fully searchable: a search for “Brexit”, for example, yields 19,043 results.

That’s a comparatively recent topic, of course, and some areas have even more results. “Climate change”, for example, returned 1,141,844 exact matches in documents dated up to 2016. The earliest appeared when the site first went online, in an Environment Agency press release noting that “the results of research into the impacts of climate change are being studied by the Agency to assess their impact on water resource yields”.

The sheer volume of data was one of the biggest factors behind the decision to move the archive to the cloud in the first place.

Sheer size of the archive made it difficult to manage
Philip Clegg, the Chief Technical Officer at Manchester-based archiving experts MirrorWeb, who were chosen to head up the project, wrote: “As an archive grows, it becomes less and less sustainable for an organisation like The National Archives – or its archiving partner – to keep investing in new infrastructure to accommodate this growth.

“However, when you source virtual infrastructure from the likes of Amazon Web Services (AWS) – the cloud platform used by massive web brands such as Netflix and Airbnb – it becomes trivial to add new storage whenever you need it. (That’s not to say cloud providers don’t rely on physical hardware – they simply have the economies of scale to offer almost unlimited capacity.)”

He added that cloud-based infrastructure also tended to be faster and more reliable, with more layers of redundancy built in. Essentially, he said, cloud infrastructure was less likely to suffer a complete failure or overload.

Moving the data was a huge challenge
Before the move, the archive was stored on 72 USB-3 hard drives physically located in a data centre in Paris. After considering the options, MirrorWeb decided to use devices called AWS Snowballs, which connect to a local network, encrypt the data and copy it onto their internal hard drives, and are then sent to an AWS data centre where the data is loaded into the cloud. The company also used two custom-built PCs, which allowed them to transfer data from 16 of the original USB-3 hard drives at a time.
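To give a rough sense of the scale involved, the figures above can be turned into a quick back-of-the-envelope calculation. The short Python sketch below uses only the numbers quoted in the article (120 terabytes, 72 drives, 16 drives at a time); the function name `plan_batches` is made up for illustration, and the per-drive average assumes the data was spread evenly across the drives, which the article does not state.

```python
# Figures quoted in the article.
TOTAL_TB = 120          # approximate size of the archive, in terabytes
DRIVE_COUNT = 72        # USB hard drives holding the archive
DRIVES_PER_BATCH = 16   # drives that could be read at a time

def plan_batches(drive_count: int, per_batch: int) -> list[int]:
    """Return how many drives are read in each successive batch."""
    full, remainder = divmod(drive_count, per_batch)
    return [per_batch] * full + ([remainder] if remainder else [])

batches = plan_batches(DRIVE_COUNT, DRIVES_PER_BATCH)

# Assumption (not from the article): data is spread evenly across drives.
avg_tb_per_drive = TOTAL_TB / DRIVE_COUNT

print(f"{len(batches)} batches: {batches}")
print(f"about {avg_tb_per_drive:.2f} TB per drive on average")
```

On these figures, the drives would need to be read in five rounds, four of 16 drives and a final one of eight.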

They also had to design a searchable platform where researchers and members of the public could find and access the data in its original form.

All the documents and data in the new archive were previously available to the public, so you’re not going to turn up any juicy state secrets. It is an important historical, social and political record of life and the political process in the UK over the past 20-odd years, however, and it’s now safe – and searchable – in the cloud.

