Early Chinese Periodicals Online (ECPO)

ECPO brings together several important digital collections of the early Chinese press. It builds on and significantly enhances earlier initiatives and integrates them into a single overarching framework.

ECPO frontend
Early Chinese Periodicals Online (ECPO).

Phase 1

Chinese Entertainment Newspapers database frontend.
Chinese Entertainment Newspapers.

The first approach to the early periodicals focused on Chinese entertainment newspapers, or xiaobao 小報. Starting in 2006, a database was conceptualized and implemented. The database Chinese Entertainment Newspapers "is designed to present information about the particular profile of a given entertainment paper, and to highlight the kinds of information that might be found there. In other words, it is not a comprehensive title-author-subject database. It should be regarded as a guide for scholars to find the kinds of information that will allow them to select those xiaobao which might be most pertinent for their research."

Within ECPO, this description of the particular features of a publication is part of what the project calls an extensive approach. The web presentation offers free open access to 22 publications, most of which were analyzed by student editors.

Phase 2

Chinese Women's Magazines database frontend.
Chinese Women's Magazines.

Another approach was chosen by an international research group studying women's magazines. The group focused on "four seminal women's or gendered journals—a key genre of the new media—published between 1904 and 1937," and conceptualized a data structure to record every individual page, every item (like article, image or advertisement), every agent (like author, photographer, or depicted person), and added an analytical layer by assigning subject headers and keywords. Within ECPO this detailed recording of every item on a single page is part of what the project calls an intensive approach.

During the second phase, a web-based database called Chinese Women's Magazines in the Late Qing and Early Republican Periods was created. The frontend provides bilingual access to the publications, which can be read online from front to back, searched, or browsed.

ECPO

Rapid editing interface.
Ingest data interface.
Person search.

Because of the success of the Entertainment Newspapers and the Women's Magazines databases, a follow-up project was established, again together with a large international research group. The Early Chinese Periodicals Online (ECPO) project combines the extensive and intensive approaches. It was initiated through a research cooperation with the Academia Sinica, Taipei, with additional funding from the Chiang Ching-kuo Foundation.

To allow larger sets of material to be edited within short periods of time, a rapid metadata editing workflow was implemented in ECPO. In addition, various other workflows were developed to accommodate multiple publication types, e.g. renaming of scans, ingest of digital assets, and generation of database records from the file-name structure.
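
The generation of database records from file names can be illustrated with a minimal sketch. The naming pattern `<publication>_<year>_<issue>_<page>.<ext>`, the function name `record_from_filename`, and the field names are all assumptions made for this example; the actual ECPO file-name convention is not described here.

```python
import re
from typing import Optional

# Hypothetical naming convention (an assumption, not ECPO's documented scheme):
#   <publication>_<year>_<issue>_<page>.<ext>, e.g. "pubx_1920_001_0012.tif"
SCAN_NAME = re.compile(
    r"^(?P<publication>[a-z0-9]+)_(?P<year>\d{4})_(?P<issue>\d{3})_(?P<page>\d{4})"
    r"\.(?:tif|jpg)$"
)


def record_from_filename(filename: str) -> Optional[dict]:
    """Turn one scan file name into a page record, or None if it does not match."""
    match = SCAN_NAME.match(filename)
    if match is None:
        return None
    return {
        "publication": match["publication"],
        "year": int(match["year"]),
        "issue": int(match["issue"]),
        "page": int(match["page"]),
        "filename": filename,
    }
```

Returning `None` for non-matching names lets a batch ingest collect problem files for manual review instead of aborting on the first irregular name.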

A project group started to restructure the subjects created for content analysis by mapping terms to the Chinese translation of the Getty Art and Architecture Thesaurus (AAT), which is coordinated by the Academia Sinica. This effort makes it possible to re-use annotations from the project and also offers hierarchical terms to researchers.

In phase 2, an API was developed to combine access to the different SQL databases that are part of the ECPO project. This API enables server access to the data by providing machine-readable metadata in the MODS XML format. All MODS records were made available in Tamboti, which also offers advanced search functionality and allows the data to be transformed further.
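
As an illustration of how a client might consume such machine-readable metadata, the following sketch reads a few fields from a MODS record using Python's standard library. The sample record is invented for this example and is not actual ECPO output; only the MODS namespace and element names come from the MODS schema itself. The function name `summarize_mods` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of MODS version 3, as defined by the Library of Congress schema.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Invented sample record for illustration only; not real ECPO data.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Example Periodical</title></titleInfo>
  <originInfo>
    <place><placeTerm type="text">Shanghai</placeTerm></place>
    <dateIssued>1920</dateIssued>
  </originInfo>
</mods>"""


def summarize_mods(xml_text: str) -> dict:
    """Extract title, place, and issue date from a single MODS record."""
    root = ET.fromstring(xml_text)

    def text(path: str):
        # Return the element's text, or None when the element is absent.
        node = root.find(path, MODS_NS)
        return node.text if node is not None else None

    return {
        "title": text("mods:titleInfo/mods:title"),
        "place": text("mods:originInfo/mods:place/mods:placeTerm"),
        "date_issued": text("mods:originInfo/mods:dateIssued"),
    }
```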

Phase 3

Manually analyzed and grouped page segments.

As of April 2019, ECPO provides the research community with open access to more than 280 publications from the Early Republican period, comprising over 280,000 pages of print. During the current phase, ECPO focuses on four major aspects of development.

Broadening ECPO's scope

We are adding a selection of political, literary, art and women’s magazines, e.g. Tian yi 天義 (Tien yee), and Ban yue 半月 (The Half Moon Journal). With additional funding from the Heidelberg Centre for Transcultural Studies, the ECPO materials base is being expanded by a set of Western-language periodicals published in China, e.g. The Canton Press.

Opening data for re-use

To further increase the impact of ECPO and to sustain the information, ECPO has begun to provide its data for re-use as open data.

We are expanding the MODS XML API to provide bibliographic information for each publication using Digital Object Identifiers (DOI). This will make the discovery of ECPO publications possible within larger library catalogue infrastructures. It also makes it easier to cite a publication, or (in the future) any annotated item in the database. In addition, we installed a IIIF image service for all page scans. This allows us to easily zoom deeper into the images and potentially enables the project to share its visual resources.
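
To show what an IIIF image service makes possible, the sketch below builds request URLs following the IIIF Image API pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The base URL and the identifier are placeholders, not ECPO's actual endpoints, and the size keyword `max` assumes a server implementing Image API 2.1 or later.

```python
from urllib.parse import quote


def iiif_region(x: int, y: int, width: int, height: int) -> str:
    """Format a pixel region as the IIIF 'x,y,w,h' region parameter."""
    return f"{x},{y},{width},{height}"


def iiif_image_url(
    base: str,
    identifier: str,
    region: str = "full",
    size: str = "max",  # assumes Image API 2.1 or later
    rotation: str = "0",
    quality: str = "default",
    fmt: str = "jpg",
) -> str:
    """Build an IIIF Image API request URL for a whole image or one region."""
    # Identifiers must be percent-encoded, including any slashes they contain.
    encoded = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Placeholder server and identifier, for illustration only.
detail_url = iiif_image_url(
    "https://iiif.example.org/image",
    "pub1/page 0012",
    region=iiif_region(100, 200, 300, 400),
)
```

Because the region is part of the URL, a zoomed detail of a page scan can be cited or shared as a plain link.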

We are working on the approximately 47,000 names recorded within the WoMag and ECPO databases. We set up a cross-database agent service that distinguishes the various kinds of names occurring in the data from the actual persons, groups, or corporations they refer to. While some agents may have multiple names, some names may refer to different agents. The agent service allows us to: a) manage names across databases (e.g. merge or split), b) identify agents and assign names to them, and c) link agent records to authority data (GND, VIAF, Wikidata). Besides creating a curated list of agents occurring in the publications, we also aim to add missing persons to authority files, using the German national authority file (GND). In 2019 we started a cooperation with Erlangen University to further expand the functionality of the agent service.
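
The relationship between names and agents can be sketched as a small in-memory model. This is only an illustration of the many-to-many idea (one agent with several names, one name shared by several agents) and of merge and split operations; the class names, fields, and methods are hypothetical and do not describe the actual agent service.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    agent_id: int
    kind: str  # "person", "group", or "corporation"
    names: set = field(default_factory=set)
    # Links to authority data, keyed by authority, e.g. "GND", "VIAF", "Wikidata".
    authority_ids: dict = field(default_factory=dict)


class AgentRegistry:
    """Hypothetical registry illustrating merge, split, and name lookup."""

    def __init__(self):
        self._agents = {}
        self._next_id = 1

    def create(self, kind, names):
        agent = Agent(self._next_id, kind, set(names))
        self._agents[agent.agent_id] = agent
        self._next_id += 1
        return agent

    def merge(self, keep_id, drop_id):
        """Fold one agent into another when both turn out to be the same entity."""
        if keep_id == drop_id:
            return self._agents[keep_id]
        keep = self._agents[keep_id]
        drop = self._agents.pop(drop_id)
        keep.names |= drop.names
        # Identifiers already on the kept agent take precedence.
        keep.authority_ids = {**drop.authority_ids, **keep.authority_ids}
        return keep

    def split(self, agent_id, names, kind=None):
        """Move some names to a new agent when one record conflated two entities."""
        source = self._agents[agent_id]
        moved = set(names) & source.names
        source.names -= moved
        return self.create(kind or source.kind, moved)

    def agents_for_name(self, name):
        """A name may legitimately resolve to more than one agent."""
        return [a for a in self._agents.values() if name in a.names]
```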

Investigating document layout recognition

ECPO aims to make its material available as full text. However, the complex layout of the publications, especially of the newspapers (xiaobao), remains a challenge that current OCR systems (e.g. ABBYY, OCRopus, or Tesseract) do not meet. In preparation for further processing, pages need to be split into segments. At the end of phase 2, in a pilot with a local commercial partner (Pallas Ludens), we ran first experiments on the use of crowds for page segmentation and grouping of segments. The outcome was very successful and forms the basis for further research. Together with our partner eXist-solutions, we are developing a workflow to manually segment pages, group semantic units into larger clusters, and store these annotations in a separate database using the Web Annotation standard.
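
A page segment stored as a web annotation might look roughly like the following sketch, which assumes the W3C Web Annotation Data Model with a media-fragment selector (`xywh=`) for the rectangular area. The function name, the scan URL, and the use of a tagging body for the segment label are illustrative assumptions, not the project's actual annotation schema.

```python
import json


def segment_annotation(page_url: str, x: int, y: int, w: int, h: int, label: str) -> dict:
    """Describe one rectangular page segment as a W3C Web Annotation."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "tagging",
        # Hypothetical choice: carry the segment type as a textual tag.
        "body": {"type": "TextualBody", "value": label, "purpose": "tagging"},
        "target": {
            "source": page_url,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                # Pixel coordinates of the segment on the scan.
                "value": f"xywh={x},{y},{w},{h}",
            },
        },
    }


# Placeholder scan URL, for illustration only.
annotation = segment_annotation(
    "https://example.org/scans/page-0012.jpg", 10, 20, 300, 400, "article"
)
annotation_json = json.dumps(annotation, ensure_ascii=False, indent=2)
```

Keeping the coordinates in the annotation rather than in the periodical database is what allows the segments to live in a separate store while still pointing at the same scans.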

The goals are the development of automatic (or semi-automatic) page segmentation workflows, the linking of database items to their respective area coordinates on the scan, the implementation of OCR routines for segments, and the production of full text. Together with potential partners, e.g. from the OCR-D project or from computer vision labs, we are preparing a larger project to process the image scans automatically or semi-automatically.

Expanding data structure for text and mark-up

ECPO already contains some full-text passages. To make these discoverable and to allow for enriched content using textual mark-up (e.g. TEI XML), the data structure will be expanded.

Publications

Hockx, Michel, Joan Judge, and Barbara Mittler, eds. Women and the Periodical Press in China’s Long Twentieth Century: A Space of Their Own? Cambridge: Cambridge University Press, 2018. doi:10.1017/9781108304085.

Sung, Doris, Liying Sun, and Matthias Arnold. “The Birth of a Database of Historical Periodicals: Chinese Women’s Magazines in the Late Qing and Early Republican Period.” Tulsa Studies in Women's Literature 33, no. 2 (2014): 227-37. doi:10.1353/tsw.2014.0004.

Sun, Liying, and Matthias Arnold. “TS Tools: How to design a database of historical periodicals.” TS: Tijdschrift voor Tijdschriftstudies (Periodical for Periodical Studies), July 2013: 73-78. Online Version.

Early Chinese Periodicals Online (ECPO). Video interview. Heidelberger Forum Edition, 2015-06-29, 23 Min.