Description
|
The Texas German Sample Corpus (TGSC) is a collection of annotated transcripts of spoken Texas German (~13.5 hours, 75,000+ tokens). It was created to implement and test the language-tagging and normalization guidelines proposed in Blevins (2022).

Texas German is a set of mixed-language contact varieties of German "spoken in Texas which have descended from the dialects of German brought to Texas in the 19th century" by German-speaking immigrants (Boas 2009: 34).

The TGSC is based on audio recordings from the Texas German Dialect Archive (TGDA, tgdp.org/dialect-archive). It has the following annotation layers: the original TGDA literary transcription, tokenization, language tags, normalization, a standard German translation of each utterance, and the original TGDA word-for-word English translation.

By using the TGSC database, you agree to the "User Rights and Responsibilities" as specified at https://tgdp.org/dialect-archive/.

Please cite the following works:
- For the TGSC: Blevins (2022). The language-tagging & orthographic normalization of spoken mixed-language data, with a focus on Texas German (https://hdl.handle.net/2152/116703).
- For the TGDA/TGDP (the source of the TGSC material): Boas, Hans C., Marc Pierce, Karen Roesch, Guido Halder, and Hunter Weilbacher (2010). The Texas German Dialect Archive: A Multimedia Resource for Research, Teaching, and Outreach. Journal of Germanic Linguistics, 22(3), 277-296.

(2022-08-02)
|