
New Version of SARIT Website

Jul 07, 2015

The SARIT project has released a new version of the SARIT web application. SARIT – short for “Search and Retrieval of Indic Texts”, but also meaning “river” in Sanskrit – offers electronic texts in Sanskrit and other Indian languages. The project is headed by Profs. Sheldon Pollock, Columbia University, and Birgit Kellner, HCTS Professor of Buddhist Studies, and is funded within the DFG/NEH Bilateral Digital Humanities Programme.

The new SARIT website offers searching and browsing features designed specifically for texts in Sanskrit and other Indian languages. It provides full Unicode support, searching with both Devanāgarī and transliterated search terms, and an NGram index to handle texts in which word boundaries are not clearly and consistently marked. Search results are returned in a Key Word in Context (KWIC) display. All texts are available for free download in PDF and TEI-XML formats. Programming and development have been carried out by Wolfgang Meier (eXist Solutions), Jens Petersen and Claudius Teodorescu, under the umbrella of the Heidelberg Research Architecture. With its heterogeneous corpus of texts, SARIT offers an ideal test case for developers working towards a more general, open-source framework that can be reused for other TEI-based corpora.
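For readers unfamiliar with the two search concepts named above, the following is a minimal illustrative sketch in Python. It is not SARIT's implementation; the function names, the trigram size, and the sample string (a transliterated Sanskrit phrase written without spaces) are assumptions chosen only for illustration. It shows how an index of character n-grams can locate a term inside text that has no marked word boundaries, and how the hits can be arranged as keyword-in-context rows.

```python
import unicodedata
from collections import defaultdict


def build_ngram_index(text, n=3):
    """Map every n-character substring to the offsets where it starts."""
    index = defaultdict(list)
    for i in range(len(text) - n + 1):
        index[text[i:i + n]].append(i)
    return index


def search(text, index, query, n=3):
    """Return the start offsets of `query` in `text`.

    The first n-gram of the query selects candidate offsets from the
    index; each candidate is then verified against the full query.
    """
    if len(query) < n:
        # Query is shorter than one n-gram: fall back to a plain scan.
        return [i for i in range(len(text) - len(query) + 1)
                if text.startswith(query, i)]
    candidates = index.get(query[:n], [])
    return [i for i in candidates if text.startswith(query, i)]


def kwic(text, offsets, query, width=5):
    """Return (left context, keyword, right context) for each hit."""
    rows = []
    for i in offsets:
        left = text[max(0, i - width):i]
        right = text[i + len(query):i + len(query) + width]
        rows.append((left, query, right))
    return rows


# Hypothetical sample: two words run together without a space.
# NFC normalization keeps each accented letter as a single code point.
TEXT = unicodedata.normalize("NFC", "dharmakṣetrekurukṣetre")
QUERY = unicodedata.normalize("NFC", "kṣetre")
INDEX = build_ngram_index(TEXT)

for left, keyword, right in kwic(TEXT, search(TEXT, INDEX, QUERY), QUERY):
    print(f"{left:>5} [{keyword}] {right}")
```

Because the index is built over characters rather than words, the query is found in both positions even though the sample contains no spaces. A production system would additionally need to map between Devanāgarī and transliterated input, which this sketch does not attempt.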

The website has been designed for a growing corpus of texts encoded according to the standards of the Text Encoding Initiative (TEI). Aiming to foster the adoption of TEI among scholars and students working with Indic texts, we have also provided detailed guidelines for adding TEI encoding to texts in Indian languages, and we hope that the public will contribute texts to this initiative.

SARIT's corpus of texts is available on GitHub, where it is easy for anyone to make additions, changes, and suggestions. To date, the corpus consists of 28 texts in Sanskrit and Prakrit, some of them voluminous. We plan to add support for other South Asian languages and scripts, including Tamil, Kannada, and Sinhala, to the SARIT web application in the near future.

SARIT is and always will be free and open-source. All of the texts are made available under a Creative Commons license.

The DFG/NEH project includes teams at Columbia University and the University of Heidelberg, directed by Profs. Sheldon Pollock and Birgit Kellner respectively.

Further Links:

The SARIT application

The SARIT text collection

The SARIT encoding guidelines

To suggest improvements to the search application

To add new texts

Information on the DFG/NEH project

