What is it?

Parli-N-Grams is an N-Gram viewer. The N-Grams it displays are extracted from Hansard, the archive of parliamentary debates, and more specifically from the House of Commons debates.

How does it work?

Parli-N-Grams is based on a collection of scripts.

There is a harvesting component that collects the debate files, a parsing component that extracts the actual N-Grams and builds a database model, and a data visualization component that creates the charts.

If you want to try it, just type some words in the field and click on search. You can add multiple N-Grams using the "+" button.

The N-Grams are extracted from the original text using word2phrase and word2vec. The viewer supports 1- through 8-grams.
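
To give a rough idea of what a 1- through 8-gram count looks like, here is a minimal Python sketch. It is not the code Parli-N-Grams runs: it uses a plain sliding window rather than word2phrase, and the function names (tokenize, count_ngrams) and the tokenization rule are assumptions made up for this illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens.

    Assumption: a token is a run of letters, digits or apostrophes.
    The real parser may tokenize differently.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def count_ngrams(text, max_n=8):
    """Count every 1- to max_n-gram in the text with a sliding window."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the token list.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Hypothetical sample sentence, not taken from Hansard.
counts = count_ngrams("The honourable member for the honourable member")
print(counts["honourable member"])
```

In a viewer like this one, counts of this kind would presumably be computed per day or per year of debates, so that each N-Gram can be charted over time.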

Where does the data come from?

All data comes from the XML archive at TheyWorkForYou, made by the awesome people at mySociety.

How often is it updated?

The data is refreshed once a day, in the early morning, which is when new data becomes available.