Language/Multiple-languages/Culture/Licensed-Free-Databases
Hi polyglots! 😀
➡ This page lists free databases related to languages.
Databases already mentioned at https://polyglotclub.com/wiki/Language/Multiple-languages/Culture/Internet-Dictionaries are not repeated here.
The items listed are raw data, so this page may not be of much help if you don't know programming.
Main
Multiple languages
https://www.ethnologue.com/codes/download-code-tables
LanguageCodes.tab lists the 7,400+ distinct language identifiers used in the current Ethnologue database.
https://dumps.wikimedia.org/
License: https://dumps.wikimedia.org/legal.html
Some of its users: https://www.wikimedia.org/
Dumps of the content of Wikimedia projects.
https://tatoeba.org/eng/downloads/
License: https://tatoeba.org/eng/downloads/
Some of its users: https://tatoeba.org/, http://www.listeningpractice.org/, https://jisho.org/
Parallel corpora. In plain terms, collections of sentences with their translations in different languages.
https://wiki.documentfoundation.org/Language_support_of_LibreOffice
License: https://wiki.documentfoundation.org/Language_support_of_LibreOffice
Some of its users: https://www.libreoffice.org/
Here you can find the “Spell check dictionaries” and other useful resources.
http://www.gutenberg.org/wiki/Gutenberg:Information_About_Robot_Access_to_our_Pages
License: http://www.gutenberg.org/wiki/Gutenberg:Terms_of_Use
Some of its users: http://www.gutenberg.org/, https://librivox.org/
Ebooks.
https://librivox.org/pages/about-librivox/
License: https://librivox.org/pages/about-librivox/
Some of its users: https://librivox.org/, http://www.listeningpractice.org/
Audio books.
http://www.omegawiki.org/Help:Downloading_the_data
License: http://www.omegawiki.org/Meta:Main_Page
Some of its users: http://www.omegawiki.org/Meta:Main_Page, http://dictionarymid.sourceforge.net/
Dictionaries.
https://ltrc.iiit.ac.in/onlineServices/Dictionaries/Dict_Frame.html
License: https://ltrc.iiit.ac.in/onlineServices/Dictionaries/GPLHelp.html
Dictionaries for South Asian languages and English.
http://compling.hss.ntu.edu.sg/omw/
License: http://compling.hss.ntu.edu.sg/omw/
Some of its users: http://compling.hss.ntu.edu.sg/omw/cgi-bin/wn-gridx.cgi?gridmode=grid
Wordnets.
http://www.dicto.org.ru/xdxf.html
License: http://dicto.org.ru/license.html
Some of its users: http://dicto.org.ru/
Repository of dictionaries collected from other sources.
http://shtooka.net/download.php
License: http://shtooka.net/
Collections of audio.
https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists
License: https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists
Frequency lists.
https://lego.linguistlist.org/about#contact
License: https://lego.linguistlist.org/about#copyright
Some of its users: https://lego.linguistlist.org/
Lexicon. No download link on the website.
https://panlex.org/source-list/
License: https://panlex.org/license/
Some of its users: https://glosbe.com
Lexical database links.
https://github.com/cburgmer/cjklib
License: https://github.com/cburgmer/cjklib/blob/master/COPYING
Some of its users: https://www.skishore.me/makemeahanzi/
Data about Han script.
https://www.radio-browser.info/gui/#!/
License: https://www.radio-browser.info/gui/#!/
Some of its users: https://github.com/segler-alex/RadioDroid
Database of radio stations.
https://help.archive.org/hc/en-us/articles/360017781111-How-to-download-files-
License: https://www.archive.org/about/terms.php
Some of its users: https://www.archive.org/
Archived Internet content.
https://www.fandom.com/
License: https://www.fandom.com/licensing
Fan-made wikis.
Chinese
http://lingua.mtsu.edu/chinese-computing/
License: http://lingua.mtsu.edu/chinese-computing/copyright.html
Frequency lists.
https://www.tanos.co.uk/hsk/
License: https://www.tanos.co.uk/jlpt/sharing/
HSK data.
http://www.hskhsk.com/resources.html
License: http://www.hskhsk.com/resources.html
HSK data.
https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8
License: https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8
Frequent characters.
https://zh.wikisource.org/wiki/%E5%B8%B8%E7%94%A8%E5%9C%8B%E5%AD%97%E6%A8%99%E6%BA%96%E5%AD%97%E9%AB%94%E8%A1%A8
Frequent characters.
http://input.foruto.com/ccc/gongbiu/index.htm
License:
Frequent characters.
English
http://gcide.gnu.org.ua/download
License: http://gcide.gnu.org.ua/license
Some of its users: http://gcide.gnu.org.ua/
Dictionary of definitions.
https://foldoc.org/source.html
License: https://foldoc.org/Free+On-line+Dictionary
Some of its users: https://foldoc.org/
Dictionary of computing terms.
https://github.com/tony-mak/Eng-Chi-Dictionary/tree/master/app/src/main/assets/databases
License: https://github.com/tony-mak/Eng-Chi-Dictionary/blob/master/LICENSE
Dictionary.
https://github.com/linuxkathirvel/eng2tamildictionary/blob/master/dictionary.json
License: https://github.com/linuxkathirvel/eng2tamildictionary/blob/master/License.txt
Dictionary.
https://github.com/derekchuank/high-frequency-vocabulary
License: https://github.com/derekchuank/high-frequency-vocabulary/blob/master/LICENSE
Dictionary.
https://github.com/kujirahand/EJDict/tree/master/src
License: https://github.com/kujirahand/EJDict/blob/master/LICENSE
Dictionary.
Hindi
http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/downloaderInfo.php
License: http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/index.php
Some of its users: http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/dict_search_user.php
Dictionary. Application is required.
Icelandic
https://www.ling.upenn.edu/~kurisuto/germanic/oi_cleasbyvigfusson_about.html
License: http://lexicon.ff.cuni.cz/txt/oi_cleasbyvigfusson.txt
Dictionary.
Interlingue
https://github.com/Carmina16/hunspell-ie
License: https://github.com/Carmina16/hunspell-ie/blob/master/LICENSE
Spell checker with dictionary.
Japanese
https://github.com/KanjiVG/kanjivg/releases/
License: http://kanjivg.tagaini.net/
Some of its users: https://www.tagaini.net/, https://jisho.org/
Kanji strokes.
https://github.com/mifunetoshiro/kanjium
License: https://github.com/mifunetoshiro/kanjium/blob/master/LICENSE.txt
Kanji data.
https://www.tanos.co.uk/jlpt/
License: https://www.tanos.co.uk/jlpt/sharing/
JLPT data.
https://ja.wiktionary.org/wiki/%E4%BB%98%E9%8C%B2:%E5%B8%B8%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7
License: http://www.bunka.go.jp/bunkacho_homepage/index.html
Frequent characters.
https://ja.wiktionary.org/wiki/Wiktionary:%E4%BA%BA%E5%90%8D%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7
License: http://www.moj.go.jp/term.html
Frequent characters for names.
https://ja.wikipedia.org/wiki/%E5%AD%A6%E5%B9%B4%E5%88%A5%E6%BC%A2%E5%AD%97%E9%85%8D%E5%BD%93%E8%A1%A8
License: http://www.mext.go.jp/b_menu/about_link.htm
Frequent characters according to school grades.
Korean
https://github.com/libhangul/libhangul/blob/master/data/hanja/hanja.txt
License: https://github.com/libhangul/libhangul/blob/master/data/hanja/hanja.txt
Words in Hangul and Hanja.
An introduction page is available: https://wiki.kldp.org/wiki.php/libhangul
https://ko.wiktionary.org/wiki/%EB%B6%80%EB%A1%9D:%ED%95%9C%EB%AC%B8_%EA%B5%90%EC%9C%A1%EC%9A%A9_%EA%B8%B0%EC%B4%88_%ED%95%9C%EC%9E%90_1800
License: http://www.suneung.re.kr/sub/info.do?m=0601&s=suneung
http://www.suneung.re.kr/boardCnts/fileDown.do?fileSeq=59692112e521efa80d2af27916704082 in an easy-to-copy form.
License: https://www.topik.go.kr/usr/cmm/subLocation.do?menuSeq=2110702
Word list of TOPIK.
Lithuanian
https://github.com/ispell-lt/ispell-lt
License: https://github.com/ispell-lt/ispell-lt/blob/master/COPYING
Spell checker with dictionary.
Sanskrit
https://github.com/hemanth/sanskrit-dict/blob/master/dict.js
License: https://github.com/hemanth/sanskrit-dict/blob/master/license
Dictionary.
Slovak
http://sk-spell.sk.cx/hunspell-sk
License: http://sk-spell.sk.cx/hunspell-sk
Spell checker with dictionary.
Vietnamese
https://github.com/duyetdev/vietnamese-wordlist
License: https://github.com/duyetdev/vietnamese-wordlist/blob/master/LICENSE
Word list.
https://github.com/duyetdev/vietnamese-namedb
License: https://github.com/duyetdev/vietnamese-namedb/blob/master/LICENSE
Name list.
Non-language
https://unicode.org/ucd/
License: https://www.unicode.org/copyright.html
Some of its users: https://wiki.gnome.org/action/show/Apps/Gucharmap, http://www.decodeunicode.org/, https://unicode-table.com/en/, https://www.fontspace.com/
Unicode.
https://www.cia.gov/library/publications/download/
License: https://www.cia.gov/library/publications/the-world-factbook/docs/contributor_copyright.html
Some of its users: https://www.cia.gov/library/publications/resources/the-world-factbook/
General facts about countries and regions.
https://www.geonames.org/
License: https://www.geonames.org/
Free gazetteer and postal code data.
https://iso639-3.sil.org/code_tables/download_tables/
License: https://iso639-3.sil.org/code_tables/download_tables/
Some of its users: https://iso639-3.sil.org/code_tables/639/data, https://polyglotclub.com/
ISO 639-3 tables. The standard assigns a code to each language and is updated every year.
https://www.unicode.org/iso15924/codelists.html
License: https://www.unicode.org/copyright.html
Some of its users: http://www.unicode.org/iso15924/codelists.html
ISO 15924 lists. Codes for scripts.
https://www.unece.org/cefact/locode/welcome.html
License: https://www.unece.org/cefact/locode/locode_since1981.html
UN/LOCODE, an alternative to ISO 3166-2. It is updated twice a year.
http://www.nationalanthems.info/
License: http://www.nationalanthems.info/
National anthems.
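Several of the code tables in this section, such as the ISO 639-3 tables, are distributed as tab-separated text with a header row (check this against the file you actually download). A minimal Python sketch for loading such a table and indexing it by one column might look like the following. The column names `Id` and `Ref_Name` and the two sample rows are illustrative assumptions, not a copy of the real file, and `read_code_table` and `index_by` are hypothetical helper names.

```python
import csv
import io


def read_code_table(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    # QUOTE_NONE: treat quote characters as ordinary data, not as field quoting.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def index_by(rows, key):
    """Build a lookup dict from the value of one column to the whole row."""
    return {row[key]: row for row in rows}


# Illustrative sample only; the header names are assumed.
sample = "Id\tRef_Name\nlit\tLithuanian\nslk\tSlovak\n"
rows = read_code_table(sample)
names = index_by(rows, "Id")
print(names["lit"]["Ref_Name"])
```

For a real file, the text would come from something like `open(path, encoding="utf-8").read()`, with the header names adjusted to whatever the downloaded table uses.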
Formats
Sheet
database name with link | file name | field separator | field 1 | field 2 | field 3 | field 4 | field 5 | field 6 | field 7 | field 8 | field 9 | field 10 | field 11 | field 12 | field 13 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
dictionary | | | | | | | | | | | | | | | |
An ordered and extended TOCFL word-list | tocfl.tsv | <tab> | Word | Pinyin | OtherPinyin | Level | First Translation | Other Translation | | | | | | | |
CC-Canto | cccanto-webdist.txt | <space> | Traditional | Simplified | [pin1 yin1] | {jyut6 ping3} | /English equivalent 1/equivalent 2/ | | | | | | | | |
CC-CEDICT | cedict_ts.u8 | <space> | Traditional | Simplified | [pin1 yin1] | /English equivalent 1/equivalent 2/ | | | | | | | | | |
CFDICT | CFDICT.u8 | <space> | Traditionnel | Simplifié | [pin1 yin1] | /traduction 1/traduction 2/ | | | | | | | | | |
CHDICT | CHDICT.u8 | <space> | Tradicionális | Egyszerűsített | [pin1 yin1] | /magyar egyenérték 1/egyenérték 2/ | | | | | | | | | |
ECDICT | ecdict.csv | , | word | phonetic | definition | translation | pos | collins | oxford | tag | bnc | frq | exchange | detail | audio |
English Persian Word Database | EnglishPersianWordDatabase.xlsx | | EnglishWord | PersianWord | | | | | | | | | | | |
ESPDIC | espdict.txt | : | Esperanto | English | | | | | | | | | | | |
HanDeDict | handedict.u8 | <space> | Traditionell | Vereinfacht | [pin1 yin1] | /deutsche Entsprechung 1/Entsprechung 2/ | | | | | | | | | |
libhangul | hanja.txt | : | Hangul | Hanja | note | | | | | | | | | | |
IEDICT | iedict.txt | : | Interlingua | English | | | | | | | | | | | |
Inglise-eesti sõnaraamat | eestiinglise.txt | <tab> | eesti | inglise | | | | | | | | | | | |
JLPT Vocabulary | VocabList.N1.doc, VocabList.N2.doc, VocabList.N3.doc, VocabList.N4.doc, VocabList.N5.doc | | Kanji | Hiragana | English | | | | | | | | | | |
kengdic | kengdic_2011.tsv | <tab> | wordid | word | ? | def | ? | ? | submitter | doe | ? | hanja | ? | ? | |
The Maryknoll Taiwanese-English Dictionary & English-Taiwanese Dictionary 2013 edition | Mkdictionary.xls | | Sort | Taiwanese | Chinese | English | | | | | | | | | |
VNEDICT | vnedict.txt | : | Vietnamese | English | | | | | | | | | | | |
word list | | | | | | | | | | | | | | | |
한국어능력시험 어휘목록 | 토픽 어휘 목록_공개 목록.xlsx | | 수준 | 어휘 | 길잡이말 | 품사 | | | | | | | | | |
古汉语单字字频: Character frequency list of Classical Chinese | CharFreq-Classical.xls | | Serial number; 序号 | Character; 汉字 | | | | | | | | | | | |
现代汉语单字字频: Character frequency list of Modern Chinese | CharFreq.txt | <tab> | Serial number; 序号 | Character; 汉字 | Individual raw frequency; 频率 | Cumulative frequency in percentile; 累计频率 | Pinyin; 拼音 | English translation; 英文翻译 | | | | | | | |
通用规范汉字表 | | | 编号 | 字形 | | | | | | | | | | | |
常用國字標準字體表 | | | 流水序 | 教育部字號 | Unicode | 常用字 | | | | | | | | | |
新汉语水平考试(HSK)词汇(2012年修订版) | HSK-2012.xls | | 单词(等级) | | | | | | | | | | | | |
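The CC-CEDICT row above describes lines of the form `Traditional Simplified [pin1 yin1] /equivalent 1/equivalent 2/`. As a rough illustration, such a line could be split into fields with a regular expression, as in the Python sketch below. The names `Entry`, `LINE_RE` and `parse_cedict_line` are hypothetical, the treatment of lines starting with `#` as comments is an assumption, and files with an extra field (such as the `{jyut6 ping3}` column of CC-Canto) would need an adjusted pattern.

```python
import re
from typing import NamedTuple


class Entry(NamedTuple):
    traditional: str
    simplified: str
    pinyin: str
    glosses: list


# Two space-separated headwords, pinyin in [...], then /gloss/gloss/.
LINE_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


def parse_cedict_line(line):
    """Return an Entry for a dictionary line, or None for comments and non-matching lines."""
    if line.startswith("#"):  # assumed comment marker
        return None
    match = LINE_RE.match(line)
    if match is None:
        return None
    traditional, simplified, pinyin, glosses = match.groups()
    return Entry(traditional, simplified, pinyin, glosses.split("/"))


entry = parse_cedict_line("中國 中国 [Zhong1 guo2] /China/Middle Kingdom/")
print(entry)
```

Returning `None` for anything that does not match keeps the parser tolerant of header comments and blank lines, at the cost of silently skipping malformed entries.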
Manually convert to TSV
file name | process (on Linux) |
---|---|
cccanto-webdist.txt | |
cedict_ts.u8 | |
CharFreq.txt | |
CharFreq-Classical.xls | |
CHDICT.u8 | |
ecdict.csv | |
eestiinglise.txt | |
EnglishPersianWordDatabase.xlsx | |
espdict.txt | |
handedict.u8 | |
hanja.txt | |
HSK-2012.xls | |
iedict.txt | |
kengdic_2011.tsv | |
Mkdictionary.xls | |
tocfl.tsv | |
vnedict.txt | |
토픽 어휘 목록_공개 목록.xlsx | |
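For the plain-text files whose field separator is a single character such as `:` (for example espdict.txt, hanja.txt or vnedict.txt in the Sheet table), one possible conversion to TSV is sketched below in Python. This is not the procedure originally given on this page. The function name `delimited_to_tsv` is hypothetical, and skipping lines that start with `#` is an assumption about how comments are marked.

```python
def delimited_to_tsv(lines, sep=":", maxsplit=-1):
    """Convert sep-delimited lines to TSV text.

    Blank lines and lines starting with '#' are skipped (assumed comment style).
    Tabs inside a field are replaced by spaces so they cannot break the TSV.
    """
    out = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = [f.strip().replace("\t", " ") for f in line.split(sep, maxsplit)]
        out.append("\t".join(fields))
    return "".join(row + "\n" for row in out)


sample = ["# header comment", "hundo : dog", "", "가:可:옳을 가"]
tsv_text = delimited_to_tsv(sample)
print(tsv_text, end="")
```

Passing `maxsplit=1` keeps everything after the first separator in one field, which may be useful when definitions themselves contain the separator character.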
Others
database name with link | format |
---|---|
FreeDict | slob |
Free Vietnamese Dictionary Project | dict.dz |
XOBDO.ORG | db |