Beyond the Black Box

Beyond the Black Box is a programme of advanced digital humanities workshops at the University of Edinburgh, designed to foster statistical, algorithmic and quantitative literacy. It is directed by Anouk Lang, administered by Robyn Pritzker and funded by a grant from the British Academy.



All Edinburgh staff and postgraduates are eligible to apply for a place: booking links for each workshop are given below, and places are limited to 15 per workshop so early booking is advised. Unless otherwise noted, participants will need to bring their own laptops to each workshop.



Databases: MariaDB and Navicat – Wed 25 January 2017, 3-5pm

Learning to use relational databases is an important skill for managing and structuring data for digital projects so as to maximise analytical possibilities. This workshop will demonstrate how to use MariaDB, a free, open-source database server, and Navicat, a graphical user interface through which MariaDB can be administered. (Navicat is not open-source, but participants can make use of its free trial period in order to complete the workshop.)
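To give a flavour of what relational structure buys you: the workshop itself uses MariaDB and Navicat, but the same SQL ideas can be sketched with Python's built-in sqlite3 module. The table and author names below are invented for illustration.

```python
import sqlite3

# An in-memory database; MariaDB would run the same SQL over a server connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two related tables: authors and their works, linked by a foreign key.
cur.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE works (id INTEGER PRIMARY KEY, title TEXT, "
            "author_id INTEGER REFERENCES authors(id))")
cur.execute("INSERT INTO authors VALUES (1, 'Woolf'), (2, 'Joyce')")
cur.executemany("INSERT INTO works VALUES (?, ?, ?)",
                [(1, 'Orlando', 1), (2, 'Ulysses', 2), (3, 'The Waves', 1)])

# A JOIN recombines the structured data for analysis.
rows = cur.execute("SELECT a.name, w.title FROM works w "
                   "JOIN authors a ON w.author_id = a.id "
                   "ORDER BY a.name, w.title").fetchall()
for name, title in rows:
    print(name, '-', title)
```

Splitting authors and works into separate tables, rather than repeating author details in every row, is what makes queries like "all works by one author" cheap and unambiguous.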

Bridget Moynihan, University of Edinburgh | book at



Stylometric Analysis with Stylo – Fri 27 January 2017, 1-4pm

Stylo is a script package for R which allows users to perform statistical analysis on text files to explore stylistic similarities between texts in the service of, for instance, authorship attribution. Stylo comes with a graphical user interface and, for more experienced users, can also be used directly from an R prompt. This workshop will show users how to set up textual corpora for use in Stylo, how to perform basic operations and how to interpret the graphical results.
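Stylo itself runs in R, but the core idea behind it can be sketched in a few lines of Python: represent each text by the relative frequencies of very common function words, then measure the distance between those frequency profiles. The two sample "texts" and the feature list below are invented; Stylo's Classic Delta also standardises each feature across the corpus, which this sketch omits.

```python
from collections import Counter

def rel_freqs(text, features):
    # Relative frequency of each feature word in the text.
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    return [counts[f] / total for f in features]

# Tiny invented samples; real stylometry works on whole novels.
a = "the cat sat on the mat and the dog sat by the door"
b = "a cat and a dog sat in a room by a fire"

# Stylometry typically compares frequencies of common function words,
# which authors use unconsciously and consistently.
features = ["the", "a", "and", "on", "by", "in"]
fa, fb = rel_freqs(a, features), rel_freqs(b, features)

# A simple Manhattan distance between the two frequency profiles.
distance = sum(abs(x - y) for x, y in zip(fa, fb))
print(round(distance, 3))
```

Texts by the same author tend to produce small distances on such profiles, which is why clustering these distances can suggest an attribution.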

Dr Maciej Eder, Institute of Polish Language (Polish Academy of Sciences) | book at



Topic Modelling – Mon 30 January 2017, 2-5pm

Topic modelling is a method of textual analysis which has been in use within computer science for some time, but which has in recent years begun to generate interest among humanities scholars for its ability to identify “topics”, or thematic clusters, in textual corpora by analysing the way terms are distributed across documents. This workshop will explain the mathematical basis on which topic modelling algorithms operate, and will show participants how to begin creating topic models of their own.
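The raw material a topic-modelling algorithm works from is simply a record of how terms are distributed across documents. The sketch below (with four invented one-line "documents") builds that document-term matrix with the Python standard library; an algorithm such as LDA, as implemented in tools like MALLET or gensim, then groups co-occurring terms into topics.

```python
from collections import Counter

# Tiny invented "documents"; real topic modelling needs far larger corpora.
docs = [
    "ships sailed the sea and the sea was calm",
    "the harbour held ships and sailors",
    "parliament passed the bill and the bill became law",
    "the law courts and parliament",
]

# Build the document-term matrix: one row per document, one column per term.
vocab = sorted({w for d in docs for w in d.split()})
matrix = [[Counter(d.split())[w] for w in vocab] for d in docs]

# Terms that co-occur in the same documents (sea/ships vs. bill/law) are the
# signal that a topic-modelling algorithm clusters into "topics".
for word in ("sea", "law"):
    col = vocab.index(word)
    print(word, [row[col] for row in matrix])
```

Even at this toy scale the column patterns hint at two thematic clusters: maritime terms concentrate in the first two documents, legal terms in the last two.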

Dr Christof Schöch, University of Würzburg | book at



Image Recognition with Pastec – Wed 8 February 2017, 2-5pm

Pastec is an open-source index and search engine for image recognition which can find duplicate and near-duplicate images in a corpus of digital images. It can find not only exact replicas but also, for instance, reuses of the same woodblock on a different page, printed at different angles, with different inking, or with subtle degradations in the printing block. This workshop will take participants through the process of installing Pastec on a virtual machine, setting it up, and running it over image data.
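Pastec's own matching is considerably more sophisticated than this, but a much simpler technique, the average hash, conveys the general idea of near-duplicate detection: reduce each image to a compact fingerprint, then compare fingerprints. The 4x4 greyscale grids below are invented stand-ins for downscaled images.

```python
def average_hash(pixels):
    # pixels: a small grid of greyscale values (real systems first
    # downscale the image, e.g. to 8x8, before hashing).
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    return [1 if v > mean else 0 for v in flat]

def hamming(h1, h2):
    # Number of positions at which two fingerprints differ.
    return sum(a != b for a, b in zip(h1, h2))

# Three invented 4x4 "images": the second is the first with slightly
# different "inking" (overall brightness); the third is unrelated.
original = [[200, 200, 30, 30]] * 2 + [[30, 30, 200, 200]] * 2
reinked  = [[180, 185, 40, 35]] * 2 + [[45, 40, 190, 180]] * 2
other    = [[10, 250, 10, 250]] * 4

h0, h1, h2 = (average_hash(p) for p in (original, reinked, other))
print(hamming(h0, h1), hamming(h0, h2))
```

Because the hash records only whether each region is brighter or darker than the image's own mean, a change in inking leaves the fingerprint intact, while an unrelated image diverges sharply.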

Dr James Baker, University of Sussex | book at



Transforming XML with XSLT – Fri 10 February 2017, 12.30-3.30pm

So you’ve poured blood, sweat and tears into marking up text in XML, and agonised over a perfectly customised TEI schema, but how to show the fruits of all that labour to the world? This workshop demystifies XSLT, the missing ingredient which allows scholars to take the information and editorial choices encoded in XML documents and choose the best ways of representing them on screen. 
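XSLT expresses transformations declaratively, as templates that match elements and emit output; as a rough illustration of the kind of element-to-element mapping involved, the Python sketch below turns an invented TEI-flavoured snippet into HTML imperatively (a real TEI document would also carry the TEI namespace, omitted here for brevity).

```python
import xml.etree.ElementTree as ET

# An invented TEI-flavoured snippet.
xml = """<text><body>
  <p>A line of <hi rend="italic">emphasised</hi> prose.</p>
</body></text>"""

root = ET.fromstring(xml)

# In XSLT this mapping would be a set of declarative templates, e.g. one
# matching hi[@rend='italic'] that outputs <em>; here it is written by hand.
def to_html(el):
    tag = {"p": "p", "hi": "em"}.get(el.tag, "span")
    inner = (el.text or "") + "".join(to_html(c) + (c.tail or "") for c in el)
    return f"<{tag}>{inner}</{tag}>"

html = "".join(to_html(p) for p in root.find("body"))
print(html)
```

The advantage of doing this in XSLT rather than by hand is exactly what the workshop covers: the mapping lives in reusable, pattern-matched templates instead of bespoke code.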

Dr Melodee Beals, Loughborough University | book at



Data Visualisation – Wed 15 February 2017, 2-5pm

As the datasets used by humanists become ever larger and more readily accessible, the ability to render and interpret overwhelmingly large amounts of information in graphically literate ways has become an increasingly important part of the researcher’s skillset. In this workshop, participants will be introduced to the core principles of scholarly data visualisation and shown how to use a variety of visualisation tools.

Dr Mia Ridge, British Library | book at



Regular Expressions – Thurs 2 March 2017, 10am-1pm

Regular expressions offer a way of manipulating and editing large amounts of text very quickly, somewhat like a word processor’s ‘find and replace’ function but with greater sophistication and flexibility. They are particularly useful for those working with large texts or corpora who may need to make similar, though not identical, changes across multiple documents (e.g. correcting recurrent OCR errors). This workshop will teach participants the basics of regex scripting and demonstrate some of the ways it is useful in scholarly contexts.
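The OCR-correction case can be sketched in a few lines: early-modern texts set with the long s (ſ) are often OCRed with f in its place, and a regular expression can repair the error selectively rather than with blanket find-and-replace. The sample line below is invented, and a real correction pass would be more conservative than this single pattern.

```python
import re

# An invented line with a recurrent OCR error: long s (ſ) read as f.
page = "The Moft Excellent Hiftorie of the Merchant of Venice"

# Replace f with s only when flanked by lowercase letters, using lookbehind
# and lookahead so that legitimate words like "of" are left untouched.
fixed = re.sub(r"(?<=[a-z])f(?=[a-z])", "s", page)
print(fixed)
```

The lookaround assertions are what lift this above simple find-and-replace: the same one-line substitution can then be run unchanged across every document in a corpus.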

Professor Martin Eve, Birkbeck, University of London | book at



Drupal – Fri 24 March 2017, 11.30am-2pm

Drupal is an open-source, highly customisable content management system, which makes it a compelling choice for scholarly websites that require features other CMSs such as WordPress cannot supply. This workshop will walk participants through the basics of setting up a Drupal site for themselves, and demonstrate how to add and modify modules.

Jim Benstead, University of Edinburgh | book at



Word Vectors with Word2vec – Fri 31 March 2017, 2-5pm

Word vectors, developed by computer scientists in the field of machine learning, represent the relationships between terms in a corpus in ways that provide “a spatial analogy to relationships between words” (Schmidt) and that suggest new ways to understand the formation and operation of discourses. This workshop uses one model for generating word vectors, word2vec, to explain how distance between terms is calculated by these models, and will show participants how to create, and how to begin to interpret, their own word embedding models.
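The "distance between terms" these models calculate is typically cosine similarity: terms whose vectors point in similar directions score close to 1, unrelated terms close to 0. The three-dimensional toy vectors below are invented for illustration; real word2vec embeddings have hundreds of dimensions learned from corpus co-occurrence.

```python
import math

# Invented toy vectors; word2vec would learn these from a corpus.
vectors = {
    "king":    [0.9, 0.8, 0.1],
    "queen":   [0.8, 0.9, 0.2],
    "cabbage": [0.1, -0.2, 0.9],
}

def cosine(u, v):
    # Cosine of the angle between two vectors: dot product over
    # the product of their lengths.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(round(cosine(vectors["king"], vectors["queen"]), 3))
print(round(cosine(vectors["king"], vectors["cabbage"]), 3))
```

Because cosine similarity depends on direction rather than magnitude, it captures the "spatial analogy" in the Schmidt quotation: semantically related terms occupy nearby directions in the embedding space.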

Ryan Heuser, Stanford University | book at



Working with Digital Facsimiles – Thurs 4 May 2017, 2-4pm

Researchers increasingly use digital facsimiles of primary sources to conduct their research. But how do these images shape, limit, and expand what we can do with the works we study? This workshop will focus on exploring the affordances of various digital facsimile resources (including subscription services and open access platforms), examining not only how digital representations of objects differ from their analog counterparts, but also how the structures of metadata and licensing affect their discoverability and usage. We will use EEBO (Early English Books Online) and ECCO (Eighteenth Century Collections Online) as exemplars for this sort of work, but this general approach to working with digital facsimiles will be relevant for other periods as well.

Dr Sarah Werner, Independent Scholar | book at