Early Chinese Periodicals Online (ECPO)

ECPO brings together several important digital collections of the early Chinese press. It builds on and significantly enhances earlier initiatives, placing them in a single overarching framework.

ECPO frontend
Early Chinese Periodicals Online (ECPO).

Phase 1

Chinese Entertainment Newspapers database frontend.
Chinese Entertainment Newspapers.

The first approach to the early periodicals focused on Chinese entertainment newspapers, or xiaobao 小報. Starting in 2006, a database was conceptualized and implemented. The database Chinese Entertainment Newspapers "is designed to present information about the particular profile of a given entertainment paper, and to highlight the kinds of information that might be found there. In other words, it is not a comprehensive title-author-subject database. It should be regarded as a guide for scholars to find the kinds of information that will allow them to select those xiaobao which might be most pertinent for their research."

Within ECPO, this description of the particular features of a publication is part of what the project calls an extensive approach. The web presentation offers free, open access to 22 publications, most of which were analyzed by student editors.

Phase 2

Chinese Women's Magazines database frontend.
Chinese Women's Magazines.

Another approach was chosen by an international research group studying women's magazines. The group focused on "four seminal women's or gendered journals—a key genre of the new media—published between 1904 and 1937," and conceptualized a data structure to record every individual page, every item (like article, image or advertisement), every agent (like author, photographer, or depicted person), and added an analytical layer by assigning subject headers and keywords. Within ECPO this detailed recording of every item on a single page is part of what the project calls an intensive approach.
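The intensive data structure described above can be sketched as a small set of nested record types. This is only an illustrative sketch: the class and field names (Page, Item, Agent, subject_headers, and so on) and the sample values are assumptions made for this example, not the actual schema of the database.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    """A person or body connected to an item (hypothetical fields)."""
    name: str
    role: str  # e.g. "author", "photographer", "depicted person"


@dataclass
class Item:
    """One unit on a page: an article, an image, or an advertisement."""
    title: str
    kind: str
    agents: List[Agent] = field(default_factory=list)
    # Analytical layer: subject headers and keywords assigned by editors.
    subject_headers: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)


@dataclass
class Page:
    """A single recorded page holding any number of items."""
    magazine: str
    issue: str
    page_number: int
    items: List[Item] = field(default_factory=list)


def items_by_kind(pages, kind):
    """Collect all items of one kind across a list of pages."""
    return [item for page in pages for item in page.items if item.kind == kind]


# Placeholder data, invented for illustration only.
example_page = Page(
    magazine="Example Magazine",
    issue="1",
    page_number=5,
    items=[
        Item("Example essay", "article",
             agents=[Agent("Example Author", "author")],
             subject_headers=["education"], keywords=["school"]),
        Item("Example portrait", "image",
             agents=[Agent("Example Photographer", "photographer")]),
    ],
)
```

Keeping agents and subject terms attached to the item rather than to the page mirrors the idea that every item on a page is recorded individually.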

During the second phase, a web-based database called Chinese Women's Magazines in the Late Qing and Early Republican Periods was created. The frontend provides bilingual access to the publications, which can be read online from front to back, searched, or browsed.

ECPO

Rapid editing interface.
Ingest data interface.
Person search.

Because of the success of the Entertainment Newspapers and the Women's Magazines databases, a follow-up project was established, again together with a large international research group. The Early Chinese Periodicals Online (ECPO) project combines the extensive and intensive approaches. It was initiated through a research cooperation with the Academia Sinica, Taipei, with additional funding from the Chiang Ching-kuo Foundation.

To allow larger sets of material to be edited within short periods of time, a rapid metadata editing workflow was implemented in ECPO. In addition, various other workflows were developed to accommodate working with multiple publication types, e.g. renaming of scans, ingest of digital assets, and generation of database records from the file-name structure.
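Generating database records from a file-name structure can be illustrated with a short sketch. The naming pattern used here (publication_year_issue_page.tif) is purely hypothetical and is not ECPO's actual convention; the function and field names are likewise assumptions.

```python
import re

# Hypothetical naming scheme: <publication>_<year>_<issue>_<page>.<tif|jpg>
FILENAME_PATTERN = re.compile(
    r"^(?P<publication>[a-z0-9]+)_(?P<year>\d{4})_(?P<issue>\d{3})_(?P<page>\d{4})\.(?:tif|jpg)$"
)


def record_from_filename(filename):
    """Turn one scan file name into a record dict, or None if it does not match."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return None
    return {
        "publication": match["publication"],
        "year": int(match["year"]),
        "issue": int(match["issue"]),
        "page": int(match["page"]),
        "file": filename,
    }


def records_from_filenames(filenames):
    """Build records for all matching names and report the ones that were skipped."""
    records, skipped = [], []
    for name in filenames:
        record = record_from_filename(name)
        if record is None:
            skipped.append(name)
        else:
            records.append(record)
    return records, skipped
```

Returning the skipped names separately makes it easy to spot scans that still need renaming before ingest.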

A project group started to restructure the subjects created for content analysis by mapping terms to the Chinese translation of the Getty Art and Architecture Thesaurus (AAT), which is coordinated by the Academia Sinica. This effort makes it possible to re-use annotations from the project and offers researchers hierarchical terms.

In phase 2, an API was developed to combine access to the different SQL databases that are part of the ECPO project. This API makes server access to the data possible by providing machine-readable metadata in the MODS XML format. All MODS records were made available in Tamboti, which also offers advanced search functionality and allows the data to be transformed further.
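To illustrate what machine-readable MODS metadata allows, the following sketch reads a few fields from a MODS record using Python's standard library. The sample record is invented for this example and does not come from the ECPO API; only the MODS namespace and element names follow the published MODS schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the MODS schema (version 3).
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Invented sample record, for illustration only.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Example Periodical</title></titleInfo>
  <originInfo><dateIssued>1926</dateIssued></originInfo>
  <language><languageTerm type="code" authority="iso639-2b">chi</languageTerm></language>
</mods>"""


def summarize_mods(xml_text):
    """Extract title, issue date, and language code from one MODS record."""
    root = ET.fromstring(xml_text)

    def text_of(path):
        node = root.find(path, MODS_NS)
        if node is None or node.text is None:
            return None
        return node.text.strip()

    return {
        "title": text_of("mods:titleInfo/mods:title"),
        "date_issued": text_of("mods:originInfo/mods:dateIssued"),
        "language": text_of("mods:language/mods:languageTerm"),
    }
```

Calling summarize_mods(SAMPLE_MODS) returns a plain dictionary, which could then be written to a table or compared with catalogue data.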

Phase 3

Manually analyzed and grouped page segments.

As of January 2020, ECPO provides the research community with open access to more than 300 publications from the Early Republican period, comprising over 300,000 pages of print. During the current phase, ECPO focuses on four major aspects of development.

Broadening ECPO's scope

We are adding a selection of political, literary, art and women’s magazines, e.g. Tian yi 天義 (Tien yee) and Ban yue 半月 (The Half Moon Journal). With additional funding from the Heidelberg Centre for Transcultural Studies, the ECPO materials base is being expanded by a set of Western-language periodicals published in China, e.g. The Canton Press (Rudolf G. Wagner Collection).

Opening data for re-use

To further increase the impact of ECPO and to sustain the information, we have begun enabling the system to provide its data for re-use as open data.

Together with the CATS Library, we are expanding the MODS XML API to provide bibliographic information for each publication using Digital Object Identifiers (DOI). This will make the discovery of ECPO publications possible within larger library catalogue infrastructures. It also makes it easier to cite a publication, or (in the future) any annotated item in the database. In addition, we installed a IIIF image service for all page scans. This allows users to zoom deeper into the images and opens up the possibility of sharing ECPO's visual resources.
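How a IIIF image service supports zooming and sharing can be shown by building request URLs that follow the IIIF Image API pattern {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. The server address and image identifier below are placeholders, not ECPO's actual endpoints.

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API request URL for one image."""
    # The identifier must be percent-encoded, including any slashes.
    encoded_id = quote(identifier, safe="")
    return (f"{base_url.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


# Placeholder server and identifier, for illustration only.
BASE = "https://iiif.example.org/image"

# The whole page at the largest size the server offers.
full_page = iiif_image_url(BASE, "page 001")

# A detail: the 800x600 pixel region starting at x=1000, y=500,
# delivered at a width of 400 pixels.
detail = iiif_image_url(BASE, "page 001", region="1000,500,800,600", size="400,")
```

Because every region of a scan has its own stable URL, a detail of a page can be cited or embedded elsewhere without copying the image file.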

We are working on the approximately 47,000 names recorded within the WoMag and ECPO databases. We set up a cross-database agent service that distinguishes the names occurring in the data from the actual persons, groups, or corporations they refer to. While some agents may have multiple names, some names may refer to different agents. The agent service allows us to: a) manage names across databases (e.g. merge or split), b) identify agents and assign names to them, and c) link agent records to authority data (GND, VIAF, Wikidata, DBpedia, and Baidu). Besides creating a curated list of agents occurring in the publications, we also aim to add missing persons to authority files, using the German National Authority File (GND). In 2019 we started a cooperation with Erlangen University to further expand the functionality of the agent service.
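The many-to-many relation between names and agents, together with the merge, split, and linking operations listed above, can be sketched with a small in-memory registry. This is a simplified illustration under assumed names (AgentRegistry and its methods); it does not reflect how the actual agent service is implemented.

```python
class AgentRegistry:
    """Toy registry: each agent has a set of names and optional authority links."""

    def __init__(self):
        self._names = {}   # agent id -> set of names
        self._links = {}   # agent id -> {authority: identifier}
        self._next_id = 1

    def create_agent(self, *names):
        agent_id = self._next_id
        self._next_id += 1
        self._names[agent_id] = set(names)
        self._links[agent_id] = {}
        return agent_id

    def names_for_agent(self, agent_id):
        return sorted(self._names[agent_id])

    def agents_for_name(self, name):
        """One name may point to several agents."""
        return sorted(a for a, names in self._names.items() if name in names)

    def merge(self, keep_id, drop_id):
        """Fold one agent into another, keeping all names and links."""
        self._names[keep_id] |= self._names.pop(drop_id)
        dropped_links = self._links.pop(drop_id)
        for authority, identifier in dropped_links.items():
            self._links[keep_id].setdefault(authority, identifier)

    def split(self, agent_id, names_to_move):
        """Move some names of an agent to a newly created agent."""
        moved = set(names_to_move) & self._names[agent_id]
        self._names[agent_id] -= moved
        return self.create_agent(*moved)

    def link_authority(self, agent_id, authority, identifier):
        """Attach an authority identifier, e.g. from GND, VIAF, or Wikidata."""
        self._links[agent_id][authority] = identifier

    def links_for_agent(self, agent_id):
        return dict(self._links[agent_id])
```

In this sketch, merging keeps the surviving agent's own authority links when both records carry an identifier from the same authority; a real service would need an editorial decision at that point.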

Expanding data structure for text and mark-up

ECPO already contains some full-text passages. To make these discoverable, we have expanded the database structure and added full text to the search functionality. This was made possible by additional funding from the Field of Focus 3.

Investigating document layout recognition

ECPO aims to make its material available in full text. However, the complex layout, especially of the newspapers (xiaobao), remains a challenge that current OCR systems (e.g. ABBYY, OCRopus, or Tesseract) do not meet. In preparation for further processing, pages need to be split into segments. At the end of phase 2, in a pilot with a local commercial partner (Pallas Ludens), we ran first experiments on the use of crowds for page segmentation and the grouping of segments. The outcome was very successful and forms the basis for further research. Together with our partner eXist-Solutions, we are developing an eXist-db app to manually segment pages, group semantic units into larger clusters, and store these annotations using the Web Annotation standard.
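A page segment stored with the Web Annotation standard could look roughly like the record built below. The structure follows the W3C Web Annotation Data Model (an annotation with a body and a target whose selector gives the region as a media fragment); the identifiers, image address, and label are placeholders, and the actual annotations produced by the app may be modelled differently.

```python
import json


def segment_annotation(annotation_id, image_uri, x, y, width, height, label):
    """Describe one rectangular page segment as a Web Annotation (JSON-LD dict)."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": annotation_id,
        "type": "Annotation",
        "motivation": "tagging",
        "body": {"type": "TextualBody", "value": label, "purpose": "tagging"},
        "target": {
            "source": image_uri,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                # Pixel rectangle on the scan: x, y, width, height.
                "value": f"xywh={x},{y},{width},{height}",
            },
        },
    }


# Placeholder values, for illustration only.
annotation = segment_annotation(
    "https://example.org/annotations/1",
    "https://example.org/scans/page-001.jpg",
    120, 340, 800, 260,
    "advertisement",
)
annotation_json = json.dumps(annotation, ensure_ascii=False, indent=2)
```

Grouping segments into larger semantic units could then be expressed by further annotations that refer to the segment annotations, although how the app models groups is not specified here.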

In the second half of 2019, we received additional funding from the Field of Focus 3 (part of Heidelberg's Excellence Strategy) to create a ground truth for the early Chinese press. The results are published on GitHub. Further goals are the development of automatic (or semi-automatic) page segmentation workflows, the linking of database items to their respective area coordinates on the scan, the implementation of OCR routines for segments, and the production of full text. We are preparing a larger project to process the image scans by adopting the dhSegment infrastructure, which was developed as part of READ (H2020) in Lausanne. There, we aim to expand the models to process non-Latin scripts and left-to-right reading directions.

Publications

Open source code repository https://github.com/exc-asia-and-europe/ecpo

Arnold, Matthias and Lena Hessel. “Transforming data silos into knowledge: Early Chinese Periodicals Online (ECPO).” In: E-Science-Tage 2019: Data to Knowledge. Ed. by Fabian Gebhard and Vincent Heuveline. Heidelberg Univ. Press, 2020. (in press) Pre-print doi: 10.11588/heidok.00027325.

Hockx, Michel, Joan Judge, and Barbara Mittler, eds. Women and the Periodical Press in China’s Long Twentieth Century: A Space of Their Own? Cambridge: Cambridge University Press, 2018. doi: 10.1017/9781108304085.

Sung, Doris, Liying Sun and Matthias Arnold. “The Birth of a Database of Historical Periodicals: Chinese Women’s Magazines in the Late Qing and Early Republican Period.” In Tulsa Studies in Women's Literature 33, no. 2 (2014): pp. 227-37. doi:10.1353/tsw.2014.0004.

Sun, Liying and Matthias Arnold. “TS Tools: How to design a database of historical periodicals.” TS: Tijdschrift voor Tijdschriftstudies (Periodical for Periodical Studies), July 2013: pp. 73-78. Online Version.

Early Chinese Periodicals Online (ECPO). Video interview. Heidelberger Forum Edition, 2015-06-29, 23 min.