Cross-linguistic corpora for the study of translations, by Erich Steiner, Silvia Hansen-Schirra and Stella Neumann

The book describes a corpus architecture, together with annotation and querying recommendations, and its implementation. The corpus architecture is designed for empirical studies of translations and, beyond these, for the study of texts that are inter-lingually comparable, particularly texts of similar registers. The compiled corpus, CroCo, is a resource for research and is, with some copyright restrictions, available to other research projects. Most of the research was undertaken as part of a DFG project into linguistic properties of translations. Essentially, this research project was a corpus-based investigation of the language pair English-German. The long-term goal is a contribution to the study of translation as a contact variety and, beyond this, to language comparison and language contact more generally, with English and German as our object languages. This goal implies a strong interest in possible specific properties of translations and, beyond this, in an empirical theory of translation. The methodology developed is not restricted to the traditional, exclusively system-based comparison of earlier days, where real-text excerpts or constructed examples are used as mere illustrations of assumptions and claims. Instead, it implements an empirical research procedure involving structured data (the sub-corpora and their relationships to one another, annotated and aligned on several theoretically motivated levels of representation), the formation of hypotheses and their operationalizations, statistics on the data, critical examination of their significance, and interpretation against the background of system-based comparisons and other independent sources of explanation for the phenomena observed. Further applic

Spielte.¹¹ As described above, the pos value for this token is retrieved by searching the tag annotation files for the one with the same xml:base value. The matching tag, in this case “vvfin”, is linked to the same XPointer, “t1750”.

4 Chunk layer segmentation and annotation

Moving up from the token unit to the chunk unit, chunks first have to be indexed again before they can be annotated. On the basis of the manual chunk segmentation, the chunk index file assigns an id attribute to each chunk within

¹¹ The multiple annotations created by the morphology tool are disambiguated automatically by selecting the first annotation.
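The token-level lookup described above can be sketched in Python as follows. This is a minimal illustration, not the CroCo file format: the element and attribute names (`tokens`, `tag`, `href`, `morph`, `analysis`), the file names, the token id “t1749”, the tags “pper” and “vvd” and the morphological labels are all hypothetical. Only the id “t1750”, the tag “vvfin”, the matching on xml:base and the first-annotation rule of footnote 11 come from the text, and XPointers are simplified to plain `#id` references.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Hypothetical token layer: every token carries an id that other layers point to.
TOKENS = """<tokens xml:base="de_text01.txt">
  <token id="t1749">Er</token>
  <token id="t1750">spielte</token>
</tokens>"""

# Hypothetical tag layers for two texts; token ids are only unique within a text.
TAG_FILES = [
    """<tags xml:base="en_text01.txt"><tag href="#t1750">vvd</tag></tags>""",
    """<tags xml:base="de_text01.txt">
  <tag href="#t1749">pper</tag>
  <tag href="#t1750">vvfin</tag>
</tags>""",
]

# Hypothetical morphology layer with two competing analyses for one token.
MORPH = """<morphology xml:base="de_text01.txt">
  <morph href="#t1750">
    <analysis>spielen 3.Sg.Past</analysis>
    <analysis>spielen 1.Sg.Past</analysis>
  </morph>
</morphology>"""


def find_layer(files, base):
    """Return the parsed annotation file whose xml:base equals `base`."""
    for text in files:
        root = ET.fromstring(text)
        if root.get(XML_BASE) == base:
            return root
    raise LookupError(f"no annotation layer for {base}")


def pos_of(token_id, tokens_xml, tag_files):
    """Look up the POS tag of `token_id` via xml:base and a simplified XPointer."""
    base = ET.fromstring(tokens_xml).get(XML_BASE)
    for tag in find_layer(tag_files, base).iter("tag"):
        if tag.get("href") == "#" + token_id:
            return tag.text
    return None


def morph_of(token_id, morph_xml):
    """Return the first morphological analysis, mirroring footnote 11."""
    for morph in ET.fromstring(morph_xml).iter("morph"):
        if morph.get("href") == "#" + token_id:
            analyses = [a.text for a in morph.iter("analysis")]
            return analyses[0] if analyses else None
    return None


print(pos_of("t1750", TOKENS, TAG_FILES))  # vvfin
print(morph_of("t1750", MORPH))            # spielen 3.Sg.Past
```

The English tag file deliberately reuses the id “t1750”: without the xml:base check, the lookup could pick up the wrong text's annotation.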

Figure 13: Query options for multi-dimensional annotation and alignment

A query tool that can handle multi-dimensional annotation and alignment is the IMS Corpus Workbench (CWB, Christ 1994). As discussed above, several annotation layers can be imported into the Corpus Query Processor (CQP) of this Workbench, allowing combined queries for strings, tags, and combinations of tags in aligned text segments. The output is then displayed as a concordance list. Figure 14 shows the CQP output of a query for prepositional adverbs, which are typical of German, and their translations into English.
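The kind of combined query described above can be approximated in a few lines of Python. This is not CWB or CQP; it is a hypothetical sketch over invented, already aligned sentence pairs, assuming the lowercase STTS-style tag “pav” for German prepositional adverbs. The optional regular expression must match the whole token, which is roughly how CQP treats a query such as `[word="dar.*" & pos="pav"]`.

```python
import re

# Invented German-English sentence pairs, tokenised and tagged with
# hypothetical lowercase STTS-style tags ("pav" = prepositional adverb).
ALIGNED = [
    ([("Wir", "pper"), ("rechnen", "vvfin"), ("damit", "pav"), (".", "$.")],
     "We are counting on it."),
    ([("Sie", "pper"), ("wartet", "vvfin"), ("darauf", "pav"), (".", "$.")],
     "She is waiting for it."),
    ([("Das", "pds"), ("ist", "vafin"), ("gut", "adjd"), (".", "$.")],
     "That is good."),
]


def concordance(pairs, pos, word_regex=None, width=2):
    """Return KWIC lines for German tokens tagged `pos`, each followed by the
    aligned English segment. `word_regex`, if given, must match the whole token."""
    lines = []
    for de_tokens, en_segment in pairs:
        words = [word for word, _ in de_tokens]
        for i, (word, tag) in enumerate(de_tokens):
            if tag != pos:
                continue
            if word_regex is not None and not re.fullmatch(word_regex, word):
                continue
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} <{word}> {right} || {en_segment}")
    return lines


for line in concordance(ALIGNED, "pav"):
    print(line)
# Wir rechnen <damit> . || We are counting on it.
# Sie wartet <darauf> . || She is waiting for it.
```

Calling `concordance(ALIGNED, "pav", word_regex="dar.*")` keeps only the “darauf” line, combining a string constraint with a tag constraint in the same way the text describes.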

The XPointer links the annotation of each function to the chunk id in the chunk index file. From this file, in turn, the string can be retrieved from the token annotation. For example, the German chunk “ch556” (viele Möglichkeiten) carries the grammatical function of direct object (“dobj”). It is identified as “np” in the phrase structure annotation by comparing the xml:base attribute values of the two files and the XPointers.

5 Alignment

In the examples shown so far, the different annotation layers linked to each other all belonged to the same language.
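The chain of links described for the chunk “ch556”, in which all layers belong to the same German text, can be sketched as follows. The layers are modelled as already parsed Python dictionaries rather than XML files; the token ids “t2210” and “t2211”, the file names and the English phrase layer are invented for illustration. Only the chunk id, its string, “dobj” and “np” come from the example above.

```python
# Hypothetical, already parsed stand-off layers. Each records the xml:base of the
# text it annotates; ids such as "t2210" are invented for this sketch.
TOKEN_LAYER = {"xml:base": "de_text01.txt",
               "tokens": {"t2210": "viele", "t2211": "Möglichkeiten"}}
CHUNK_INDEX = {"xml:base": "de_text01.txt",
               "chunks": {"ch556": ["t2210", "t2211"]}}
FUNCTION_LAYER = {"xml:base": "de_text01.txt", "links": {"ch556": "dobj"}}
PHRASE_LAYERS = [
    # An annotation of a different (English) text that happens to reuse the id.
    {"xml:base": "en_text01.txt", "links": {"ch556": "pp"}},
    {"xml:base": "de_text01.txt", "links": {"ch556": "np"}},
]


def describe_chunk(chunk_id, tokens, chunks, functions, phrase_layers):
    """Return (string, grammatical function, phrase category) for one chunk."""
    base = chunks["xml:base"]
    if tokens["xml:base"] != base or functions["xml:base"] != base:
        raise ValueError("layers do not annotate the same text")
    # Function layer -> chunk id in the chunk index -> token ids -> strings.
    function = functions["links"][chunk_id]
    string = " ".join(tokens["tokens"][t] for t in chunks["chunks"][chunk_id])
    # Choose the phrase layer with the same xml:base, then follow the same pointer.
    phrase_layer = next(p for p in phrase_layers if p["xml:base"] == base)
    return string, function, phrase_layer["links"][chunk_id]


print(describe_chunk("ch556", TOKEN_LAYER, CHUNK_INDEX, FUNCTION_LAYER, PHRASE_LAYERS))
# -> ('viele Möglichkeiten', 'dobj', 'np')
```

Because every lookup is keyed on a single xml:base, this sketch only covers the monolingual case; linking layers across languages requires the alignment information introduced in this section.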

