In addition to accumulating thousands of paper books and photographing thousands of pages from national libraries, I also make frequent use of the following websites and online tools, which I'm sure will help many readers, especially those actively involved in linguistics.
African (Sub-Saharan) Language Families
- Bantu Basic Vocabulary Database.
- Pronouns in 500 African languages: Les marques personnelles dans les langues africaines (Personal markers in African languages).
- Reference Lexicon of the Languages of Africa (RefLex): the RefLex project aims to provide the scientific community with a reference lexical corpus for the languages of Africa, together with processing and analysis tools adapted to this corpus.
- Glossika Courses: Swahili Fluency Training
- Virtual Zulu: be warned, the website starts playing a song as soon as you open it.
Altaic Language Family
- Handbook of Japanese Verbs Get the 1000 most frequent Japanese Language Proficiency Test (JLPT) verbs in reverse lookup, grouped by transitive / intransitive, to assist the student in acquiring Kanji and vocabulary.
- Glossika Courses:
  - Japanese Fluency Training
  - Korean Fluency Training
Afroasiatic Language Family
- Arabic Learner Corpus.
- Arabic - Quranic Corpus.
- Glossika Courses: Modern Standard Arabic Fluency Training
American (North, Central, South) Language Families
- Corpus in Caddoan Languages (Indiana University).
- South American Indigenous Language Structures (SAILS).
- Línguas e culturas indígenas sul-americanas (South American indigenous languages and cultures): links to resources for over a hundred South American languages.
- South American Phonological Inventory Database at the University of California, Berkeley: it shares data on the phonological inventories of South American indigenous languages for linguistic research and education. Phonological inventories of specific languages can be accessed using the map browse function, the sortable name table, or the phoneme search function. It currently includes inventories for 363 languages and varieties.
Australian Language Families
Austronesian Language Family
- Austronesian Basic Vocabulary Database includes Swadesh lists for about 1000 languages (some transcriptions can be misleading).
- Austronesian Comparative Dictionary.
Formosan Branch Languages
Philippine Branch Languages
Malayo-Polynesian Branch Languages
- Balinese Dictionary Resources.
- Indonesian Dictionary Resources.
- Glossika Courses: Indonesian Fluency Training
- Javanese Dictionary Resources.
- Malay Dictionary Resources.
- Vanuatu and Oceanic Language Resources: Alexandre François' excellent contributions cover many of his fieldwork stories and a large number of downloadable resources on the languages.
Oceanic & Polynesian Languages
Indo-European Language Family
- Early Indo-European Online (University of Texas at Austin): lessons on the grammars of many older Indo-European languages.
- European Parliament Parallel Corpus of European languages.
- Indo-European Lexical Cognacy Database.
- PIE: The Primary Phoneme Inventory and Sound Law System for Proto-Indo-European.
- PIE Lexikon.
- Glossika Courses: Armenian Fluency Training
Balto-Slavic Branch Languages
- False Friends of the Slavist.
- Glossika Courses:
  - Belarusian Fluency Training
  - Czech Fluency Training
  - Czech Business Intro Module Fluency Training
  - Latvian Fluency Training
  - Lithuanian Fluency Training
  - Polish Fluency Training
  - Polish Business Intro Fluency Training
  - Russian Fluency Training
  - Serbian Fluency Training
  - Slovak Fluency Training
  - Ukrainian Fluency Training
- Introduction to Lithuanian Ebook.
- Polish Frequency Database based on 101 million words from subtitles.
- Po polsku: wolne książki (free books), "audiobooki" (audiobooks): over 4,000 books available, including a few in Lithuanian (search: lietuvių).
- Slovak Parallel Corpus (includes Czech).
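- Словарь трудных слов из богослужения: Церковнославяно-русские паронимы (Dictionary of Difficult Words from the Divine Service: Church Slavonic-Russian Paronyms).
- Толковый словарь русского языка (Explanatory Dictionary of the Russian Language).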
Germanic Branch Languages
- Dutch Word Frequencies based on 44 million words from film and television subtitles.
- Glossika Courses:
  - German Fluency Training
  - German Business Intro Fluency Training
  - Icelandic Fluency Training
- WordNet: a lexical database for English (Princeton University).
Hellenic Branch Languages
- Glossika Courses: Greek (Modern) Fluency Training
- Greek (Modern) Frequency Database based on 6000 subtitled films.
Indo-Iranian Branch Languages
- Glossika Courses: Hindi Fluency Training
- Romani Morpho-Syntax Database.
- Sanskrit Corpus.
Italic/Romance Branch Languages
- French: Casual Spoken contains 35 hours of high-quality recordings featuring 46 French speakers conversing among friends.
- Glossika Courses:
  - Castilian Spanish Fluency Training
  - Catalan Fluency Training
  - French Fluency Training
  - French Business Intro Module Fluency Training
  - Italian Fluency Training
  - Mexican Spanish Fluency Training
- Latin Library.
- About World Languages: introductory material on individual languages and language families. Love the layout and data here.
- Ethnologue contains information on 7,097 known living languages, including alternate names, population, location, language maps, language status, classification, dialect names, language use, language development and writing systems. It links to research, and each language page also points to OLAC (Open Language Archives Community) resources and research articles.
- Glottolog: a comprehensive catalogue of the world's language families, languages and dialects.
- Language Gulper: a blog about language families.
- Linguasphere: Le Répertoire de la linguasphère (the Linguasphere Register) contains the geolinguistic classification, coding and alphabetical index of all the world's languages, varieties and language groups.
- MultiTree: a searchable database of hypotheses on language relationships.
- Apertium includes: Asturian, Aragonese, Breton, Northern Sami, Occitan, Tatar.
- Bing includes: Hmong Daw, Klingon, Otomi, Yucatec Maya.
- Google Translate recently added: Amharic, Corsican, Frisian, Kyrgyz, Hawaiian, Kurmanji, Luxembourgish, Samoan, Scots Gaelic, Shona, Sindhi, Pashto, Xhosa.
- PROMPT: very good for Russian.
Maps, Media, Fun
- Books and novels in many languages: perfect for extensive and intensive language practice.
- Langscape is a gateway to language diversity: a resource that allows users with a wide range of interests, from recreational to academic, to discover the world's languages through interactive tools and access to established research. Sample and listen to over 3,000 languages here.
- Sound Comparisons: compare recordings of key vocabulary within a language branch (Romance, Slavic, etc.) spread out on an interactive map.
- Language Landscape: language samples and recordings from around the world on an interactive map. The site aims to help raise the profile of minority and endangered languages.
- Subtitles in multiple languages: open.
- Subtitles in multiple languages: yify.
- Watch TV in any language.
- Generate a name in any language.
Mon-Khmer Language Family
- Hán Việt Từ Điển Trích Dẫn (Sino-Vietnamese Dictionary with Citations).
- Mon Dictionary Resources.
- Mon-Khmer Languages Project. See also Huffman Papers which includes Huffman's Outline of Cambodian Grammar.
- Vietnamese Dictionary Resources.
- Glossika Courses: Vietnamese (Northern) Fluency Training
- Derivational Phonology: MA Thesis
- Glossika Phonics Videos: features a video for each of the IPA symbols.
- Lyon-Albuquerque Phonological Systems Database.
- PHOIBLE: Repository of cross-linguistic phonological inventory data. The 2014 edition includes 2155 inventories that contain 2160 segment types found in 1672 distinct languages.
- Stress and Accent Patterns: a typological database of stress and accent patterns in 750 languages.
- Tonal Database.
- UCLA Phonological Segment Inventory Data (UPSID) Contains phonological inventories for 451 languages.
- UD Phonology Lab Stress Pattern Database Dominant stress patterns of the world's languages.
- World Phonotactics Database at Australian National University: a searchable database containing information about phonotactic restrictions of languages of the world. Using it, you can compare and contrast phonotactic patterns in different languages, group languages by features, investigate the frequencies of different settings for different features, and view the areal distribution of such patterns through the use of the interactive map. Phonotactic data on over 2000 languages.
- Метод чтения Ильи Франка (Ilya Frank's Reading Method): readers in over 50 languages, glossed in Russian.
Pidgins, Creoles, Other Languages
- The Atlas of Pidgin and Creole Language Structures Online.
- Endangered Languages Archives.
- Teaching resources for less commonly taught languages.
Semantics, Corpus, Etymology
- Affix Borrowing Database A database of 101 languages where affixes have been borrowed.
- Automated Similarity Judgment Program (ASJP): 40-item word lists for most of the world's languages. A lexical distance between two languages can be obtained by comparing their word lists, which is useful, for instance, for classifying a language group and for inferring its age of divergence.
- Concepticon Links 9611 concepts from 51 different concept lists to 2206 different concept sets, 243 relations between concepts are defined.
- Cross-Linguistic Colexification Database Gives polysemy information for 221 different languages covering 64 families (more than 300000 words and 10000 concepts).
- DFG project "Algorithmic corpus-based approaches to typological comparison": large-scale linguistic typological comparison. Its Bible corpus contains 1,169 unique translations, which have been assigned 906 different ISO 639-3 codes.
- Global Lexicostatistical Database: George Starostin is carrying on the work of his late father, Sergey Starostin, a famous proponent of macrofamilies and deep etymological work. The database contains a lot of data on Sub-Saharan languages (including Bantu and Khoisan), Nilo-Saharan, Caucasian, and Amerind languages. However, many of the deep etymologies are not widely accepted by the scientific community.
- Google N-Gram Viewer.
- Korean corpus.
- List of dictionaries available online: links to over 6,300 dictionary resources.
- NLTK Corpora.
- Numerals in 4000 languages.
- Open Parallel Corpus.
- Personal Pronoun System database.
- Reduplication Database.
- Semantic shifts in the languages of the world (database): thousands of semantic connections in the world's languages (polysemy, semantic changes).
- Unicode's Universal Declaration of Human Rights in multiple languages.
- Wordbank: data on children's vocabulary development and acquisition.
- World Loanword Database (WOLD).
- Wortschatz (Universität Leipzig): search 246 corpus-based monolingual dictionaries in 222 languages.
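The ASJP entry mentions obtaining a lexical distance by comparing word lists. As a rough illustration, here is a minimal Python sketch of one common approach: a normalized Levenshtein distance (LDN), where the edit distance between two words for the same concept is divided by the length of the longer word, plus the further LDND normalization used in ASJP-related work, which divides by the average distance between words with different meanings to offset chance similarity. The dict-of-strings input format and the toy word lists are assumptions for illustration only; real ASJP lists use their own transcription system (ASJPcode).

```python
# Sketch of ASJP-style lexical distance between two word lists.
# Assumption: each word list is a dict mapping a concept label to one
# transcription string (the toy data below is NOT real ASJPcode).

def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ldn(a: str, b: str) -> float:
    """Levenshtein distance divided by the length of the longer word (0.0 to 1.0)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def mean_ldn(list1: dict[str, str], list2: dict[str, str]) -> float:
    """Average LDN over concepts attested in both lists (same-meaning pairs)."""
    shared = sorted(set(list1) & set(list2))
    if not shared:
        raise ValueError("the word lists share no concepts")
    return sum(ldn(list1[c], list2[c]) for c in shared) / len(shared)


def ldnd(list1: dict[str, str], list2: dict[str, str]) -> float:
    """Same-meaning LDN divided by the mean LDN of different-meaning pairs."""
    shared = sorted(set(list1) & set(list2))
    if len(shared) < 2:
        raise ValueError("need at least two shared concepts")
    diff = [ldn(list1[c], list2[d]) for c in shared for d in shared if c != d]
    baseline = sum(diff) / len(diff)
    if baseline == 0:
        raise ValueError("all words are identical; distance is undefined")
    return mean_ldn(list1, list2) / baseline


if __name__ == "__main__":
    # Hypothetical, simplified transcriptions for three concepts.
    english = {"hand": "hand", "water": "water", "fish": "fish"}
    german = {"hand": "hant", "water": "vaser", "fish": "fish"}
    print(f"mean LDN: {mean_ldn(english, german):.3f}")
    print(f"LDND:     {ldnd(english, german):.3f}")
```

With only three concepts the numbers are meaningless for classification; the point is the shape of the computation. With 40-item lists for two languages, a lower LDND suggests a closer relationship, which is the kind of signal ASJP uses for grouping languages.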
Sino-Tibetan Language Family
- Burmese Dictionary Resources.
- Chinese Word Frequency based on film subtitles (download frequency lists).
- Glossika Courses:
  - Beijing Mandarin Fluency Training
  - Beijing Mandarin Business Intro Module Fluency Training
  - Cantonese Fluency Training
  - Hakka Fluency Training
  - Taiwan Mandarin Fluency Training
  - Taiwan Mandarin Business Intro Module Fluency Training
  - Taiwan Mandarin Daily Life Module Fluency Training
  - Taiwan Mandarin Travel Module Fluency Training
  - Taiwanese Hokkien Fluency Training
  - Wenzhounese Dialect Fluency Training
- Phonemica: listen to hundreds of Chinese dialects on an interactive map to which individuals from around China upload their own recordings. Some phonetic transcriptions and glosses are available.
- Sgaw Karen Dictionary Resources.
- Sinica Treebank.
- Taiwanese and Hakka Dictionary.
- Thesaurus Linguae Sericae An Historical and Comparative Encyclopaedia of Chinese Conceptual Schemes.
- Tibeto-Burman languages of Assam.
- Anaphora Typology Database by Utrecht Institute of Linguistics.
- Irvine Phonotactic Online Dictionary (IPhOD).
- Get Started in Role and Reference Grammar
- Syntactic Structures of the World's Languages.
- ValPal (Valency Patterns Leipzig) online database: based on a questionnaire covering a selected sample of 80 verbs. These verbs are conceived of as representative of the verbal lexicon and have been reported in the literature to show distinctive syntactic behaviour both within and across languages.
Tai-Kadai Language Family
- Lao Dictionary Resources.
- Shan Dictionary Resources.
- Glossika Courses: Thai Fluency Training
- Thai Language Dictionary.
- Add stress marks to German
- Add stress marks to Russian
- Unicode Character Table
Trans-New Guinea Language Family
- Database of the languages of New Guinea: The Trans-New Guinea language family currently occupies most of the interior of New Guinea. This family is possibly the third largest in the world with 400 languages and is tentatively thought to have originated with root-crop agriculture around 10,000 years ago.
- Language Universals Archive at Universität Konstanz.
- Pangloss Database of audio materials from several of the world's languages.
- Rarities among languages: Das Grammatische Raritätenkabinett (The Grammatical Rarity Cabinet) at Universität Konstanz.
- Reciprocal Markers Database.
- Typological Database.
- Typological Database of Intensifiers and Reflexives.
- World Atlas of Language Structures (WALS).
- WALS Sunburst Explorer shows the values for all WALS features by combining the geolocation of the respective languages with their genealogy in a sunburst visualization: to help users distinguish between cases of language contact and genealogical inheritance.
Uralic Language Family