Corpora and Corpus-based Computational Linguistics
There are about 1600 exhaustively annotated files.
As a rule, descriptions are taken directly from the sites they refer to
and are only slightly adapted or translated into English. Particular
emphasis is usually placed on the availability of the resources (e.g.
conditions of access or costs). Besides providing institutional and
general references, these pages aim principally to gather information
on specific languages, especially "exotic" and lesser-known ones.
British National Corpus
The British National Corpus (BNC) is a 100-million-word collection of
samples of written and spoken language from a wide range of sources,
designed to represent as wide a cross-section of modern British English,
both spoken and written, as possible.
The written part (90%) includes, for example, extracts from regional
and national newspapers, specialist periodicals and journals for all
ages and interests, academic books and popular fiction, published and
unpublished letters and memoranda, school and university essays,
among many other kinds of text. The spoken part (10%) includes a
large amount of unscripted informal conversation, recorded by volunteers
of different ages, regions and social classes selected in a
demographically balanced way, together with spoken language collected
in all kinds of contexts, ranging from formal business or government
meetings to radio shows and phone-ins.
TalkBank
TalkBank.org is a key reference in many studies of both human and animal
communication. It contains written corpora as well as video and audio
recordings.
The goal of TalkBank is to foster fundamental research in the study of human
and animal communication. It will construct sample databases within each of
the subfields studying communication. It will use these databases to advance
the development of standards and tools for creating, sharing, searching,
and commenting upon primary materials via networked computers.
CHILDES Database
It is part of the TalkBank system.
The CHILDES system provides tools for studying conversational
interactions. These tools include a database of transcripts,
programs for computer analysis of transcripts, methods for
linguistic coding, and systems for linking transcripts to
digitized audio and video.
Corpus Scriptorum Latinorum
A digital library of Latin texts. This project seeks to catalogue the
entire body of Latin literature, from the earliest epigraphic remains
to the Neo-Latinists of the eighteenth century. That alone amounts to
tens of thousands of Latin texts, and with the translations and
secondary materials also available through this site, there is a great
deal to sort through.
Bibliotheca Augustana
A very comprehensive library of electronic texts in Latin and other
languages. It
includes several sections, namely Bibliotheca Latina, Bibliotheca
Graeca, Bibliotheca Germanica, Bibliotheca Anglica, Bibliotheca
Gallica, Bibliotheca Italica, Bibliotheca Hispanica, Bibliotheca
Polonica, Bibliotheca Russica, Bibliotheca Iiddica. Texts of
different types and sizes belonging to very different periods can be
downloaded. The site is very well organized.
Labyrinth Library
The Labyrinth provides free, organized access to electronic resources in
medieval studies through a World Wide Web server at Georgetown
University. The Labyrinth's easy-to-use menus and links provide
connections to databases, services, texts, and images on other servers
around the world. Each user will be able to find an Ariadne's thread
through the maze of information on the Web. This project not only
provides an organizational structure for electronic resources in
medieval studies, but also serves as a model for similar, collaborative
projects in other fields of study. The Labyrinth project is open-ended
and is designed to grow and change with new developments in technology
and in medieval studies.
Internet Medieval Sourcebook
The Internet Medieval Sourcebook is located at the
Fordham University Center for Medieval Studies. It is organized
as three main index pages, with a number of supplementary documents.
The