Monday 27 March 2017

A model for the production and publication of scholarly source editions

The production and publication of editions is an old branch of scholarly activity. In earlier days almost all sources were published as books, often as a series of books, but over the past quarter of a century the digital edition (on CD or DVD, or online) has not only become established but has become the normal way of publishing sources. In this blog post I present a model for scholarly source editions that was developed for both paper and digital editions.

The model distinguishes between different layers that may (optionally) be involved in publishing an edition of historical sources. Turning a source into a published edition involves a number of distinct actions, which the model treats as layers. Sometimes it will be difficult to separate the layers clearly from each other, but for analytical purposes it is important to distinguish them, because each involves a different step in the processing of a source. A few things should be noted beforehand:
  • although the steps follow a rough sequence, they do not depend on each other. For instance, a transcription may be made from the original document or from a scan, and structuring the data from a text does not depend on the availability of the text itself. Some of the layers could be subdivided further
  • an actual edition may contain all of the layers or any selection of them
  • which layers to use depends on the requirements and constraints of the context, that is, on methodological and economic considerations
  • layers are usually complementary rather than alternatives to each other
  • some elaborations could fit in more than one layer, although one layer is usually better suited to a given elaboration than another. For example, visual features are better suited to visual treatment
  • all steps involve selection of material and of features
  • in each layer there are choices that may exclude each other. For instance, scanning or photographing requires decisions about resolution, file format and compression. For transcription, there are a number of different and mutually exclusive options.
  • all layers contain elaboration steps that inevitably involve interpretation, even when they try to stay close to the original. This is true for both manual and automatic elaborations. Apart from practical considerations, most automatic and manual elaborations are interchangeable.

The layers:

  • Archive: the availability of source material, which is the point of departure. ‘Archive’ also stands for any other means of preserving sources: in libraries, in shoeboxes in an attic, or in born-digital form. This layer concerns which material is available and which sources will be elaborated (and in what way). It should also involve source criticism: an assessment of the provenance and nature of the sources and of their relation to previously existing sources (which may or may not have been lost), such as their place in the archive or in a larger corpus they belong to.
  • Selection: which parts of the available source material will be part of the edition (and in what form). Selection in any form is inevitable and should be accounted for, preferably in relation to the larger corpus.
  • Digitization: transfer of analog media to a digital form. This is usually done by scanning or photographing and also involves selection and choice of technical parameters.
  • Description: identifying the sources and assigning essential metadata to them. This should at least comprise the provenance data and an identifier, which may take any form from an id number to a URN. Identification matters within the context of the edition; trying to make it transcend into other domains (for instance by encoding the original archival signature into the edition id) makes ids vulnerable to changes beyond the editor's control. The relation to other existing ids of the same source should therefore preferably be made explicit. If there is a digital version of the source, the description should also record the relation of the digital form to the original source.
  • Transcription: transfer of the textual content of an object to textual form, following a set of rules. As in the other layers, choices between transcription methods are usually mutually exclusive.
  • Text structuring: any structure added to the ‘plain’ text to enrich it with structural elements for any purpose, including pagination, chapter division and many forms of XML structuring.
  • Annotation: additions that can usually be seen as formal or content annotation, mainly for scholarly purposes. It is often hard to draw a definite line between structuring and annotation. Annotation may be manual or algorithmic, including all sorts of tagging (such as Named Entity Recognition and Part-of-Speech tagging).
  • Structured data: structuring elements of the text for the purpose of information gathering. This involves normalization, identification and contextualization. Data structuring often targets named entities, but may extend to any type of information, such as emotion words or events.
  • Publication: publishing any mix of the layers mentioned above in print or digital form or on any other media or platform.
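To make the layer model a little more concrete, here is a minimal Python sketch of how an edition item might record which layers it contains. It is an illustration, not part of the model itself: the names `Layer`, `SourceDescription`, `EditionItem` and `external_ids`, as well as all sample values, are hypothetical. It shows two points from the text: layers are optional and independent, so an item stores only the layers actually produced; and the edition's own id is kept separate from the archival signature, which is recorded as an explicit relation instead.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class Layer(Enum):
    """Layers of the model, listed in their rough (non-binding) order."""
    ARCHIVE = "archive"
    SELECTION = "selection"
    DIGITIZATION = "digitization"
    DESCRIPTION = "description"
    TRANSCRIPTION = "transcription"
    TEXT_STRUCTURING = "text structuring"
    ANNOTATION = "annotation"
    STRUCTURED_DATA = "structured data"
    PUBLICATION = "publication"


@dataclass
class SourceDescription:
    """Hypothetical description-layer record for one source."""
    edition_id: str   # opaque id controlled by the editor
    provenance: str
    # Other ids of the same source (e.g. an archival signature) are stored as
    # explicit relations rather than encoded in edition_id, so a renumbering
    # by the archive does not change the edition's own id.
    external_ids: Dict[str, str] = field(default_factory=dict)


@dataclass
class EditionItem:
    """Hypothetical edition item in which any subset of layers may be present."""
    # The description is kept in its own field because it carries the item's id.
    description: SourceDescription
    # All other layers that have actually been produced for this item.
    layers: Dict[Layer, Any] = field(default_factory=dict)

    def add_layer(self, layer: Layer, content: Any) -> None:
        # Layers are independent: adding one does not require any other.
        self.layers[layer] = content

    def present_layers(self) -> List[Layer]:
        # Return the stored layers in the model's rough order.
        return [layer for layer in Layer if layer in self.layers]


# Illustrative usage with made-up values.
item = EditionItem(SourceDescription(
    edition_id="ed-0001",
    provenance="hypothetical municipal archive",
    external_ids={"archive_signature": "inv. 123"},
))
# A transcription may be made from the original, without (or before) a scan.
item.add_layer(Layer.TRANSCRIPTION, "[transcribed text]")
item.add_layer(Layer.DIGITIZATION, {"file": "ed-0001.tif", "dpi": 400})
print([layer.value for layer in item.present_layers()])
# ['digitization', 'transcription']

# The archive renumbers the source: only the explicit relation changes.
item.description.external_ids["archive_signature"] = "inv. 123a"
print(item.description.edition_id)
# ed-0001
```

Keeping external identifiers in a separate mapping, rather than deriving the edition id from them, is one way to follow the advice in the description layer; a real edition would more likely express the same relations in its metadata format of choice.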
