CSL-WIENER: Corpora and Corpus-based linguistic information




Corpora and Corpus-based Computational Linguistics


There are about 1600 exhaustively annotated files. Descriptions are, as a rule, taken directly from the sites they refer to and are only slightly adapted or translated into English. Particular emphasis is usually placed on the availability of the resources (i.e. conditions or costs, for example). Besides providing institutional and general references, these pages aim principally to gather information on specific languages, especially the "exotic" and lesser-known ones.

British National Corpus


The British National Corpus (BNC) is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent as wide a cross-section of modern British English as possible.

The written part (90%) includes, for example, extracts from regional and national newspapers, specialist periodicals and journals for all ages and interests, academic books and popular fiction, published and unpublished letters and memoranda, and school and university essays, among many other kinds of text. The spoken part (10%) includes a large amount of unscripted informal conversation, recorded by volunteers selected from different ages, regions and social classes in a demographically balanced way, together with spoken language collected in all kinds of different contexts, ranging from formal business or government meetings to radio shows and phone-ins.

Talkbank

Talkbank.org is a key reference in many studies of both human and animal communication. It contains written corpora as well as video and sound recordings. The goal of TalkBank is to foster fundamental research in the study of human and animal communication. It will construct sample databases within each of the subfields studying communication. It will use these databases to advance the development of standards and tools for creating, sharing, searching, and commenting upon primary materials via networked computers.

CHILDES Database

CHILDES is part of the TalkBank web. The CHILDES system provides tools for studying conversational interactions. These tools include a database of transcripts, programs for computer analysis of transcripts, methods for linguistic coding, and systems for linking transcripts to digitized audio and video.

Corpus Scriptorum Latinorum


A digital library of Latin texts. This project seeks to catalogue the entire body of Latin literature, spanning from the earliest epigraphic remains to the Neo-Latinists of the eighteenth century. That alone amounts to tens of thousands of Latin texts, and when you add the translations and secondary materials that are also available through this site, there is quite a lot to sort through.


Bibliotheca Augustana

A very complete library of electronic resources involving Latin texts. It includes several sections, namely Bibliotheca Latina, Bibliotheca Graeca, Bibliotheca Germanica, Bibliotheca Anglica, Bibliotheca Gallica, Bibliotheca Italica, Bibliotheca Hispanica, Bibliotheca Polonica, Bibliotheca Russica, and Bibliotheca Iiddica. Texts of different types and sizes, belonging to very different periods, can be downloaded. The site is very well organized.


Labyrinth Library

The Labyrinth provides free, organized access to electronic resources in medieval studies through a World Wide Web server at Georgetown University. The Labyrinth's easy-to-use menus and links provide connections to databases, services, texts, and images on other servers around the world. Each user will be able to find an Ariadne's thread through the maze of information on the Web. This project not only provides an organizational structure for electronic resources in medieval studies, but also serves as a model for similar, collaborative projects in other fields of study. The Labyrinth project is open-ended and is designed to grow and change with new developments in technology and in medieval studies.


Internet Medieval Sourcebook

The Internet Medieval Sourcebook is located at the Fordham University Center for Medieval Studies. It is organized as three main index pages, with a number of supplementary documents.

The

