Language/Multiple-languages/Culture/Licensed-Free-Databases
Revision as of 16:48, 1 January 2021
On this page we have listed free language databases (organized collection of data related to languages).
The listed items are data sources, not software that uses this data (such as database management systems). Therefore, if you don't know programming, this page might not be of much help to you.
Main
Multiple languages
https://www.ethnologue.com/codes/download-code-tables
LanguageCodes.tab lists the 7,400+ distinct language identifiers used in the current Ethnologue database.
https://dumps.wikimedia.org/
License: https://dumps.wikimedia.org/legal.html
Some of its users: https://www.wikimedia.org/
Database dumps of Wikimedia projects such as Wikipedia and Wiktionary.
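If you want to read a dump with Python, here is a minimal sketch that streams page titles and wikitext without loading the whole file into memory. It assumes the usual XML export layout (`<page>` elements holding a `<title>` and a `<revision><text>`); `iter_pages` and `local_name` are names made up for this example.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Tuple


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source: IO[bytes]) -> Iterator[Tuple[str, str]]:
    """Stream (title, wikitext) pairs from a MediaWiki XML dump.

    The namespace URI is ignored because its version number changes
    between dump releases. If a page holds several revisions, the text
    of the last one is returned.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # free most of the memory held by the finished <page>


if __name__ == "__main__":
    # Tiny inline dump used only for demonstration.
    demo = io.BytesIO(
        b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
        b"<page><title>cat</title><revision><text>feline</text></revision></page>"
        b"</mediawiki>"
    )
    for title, text in iter_pages(demo):
        print(title, len(text))
```

For a compressed `.xml.bz2` dump, pass the file object returned by `bz2.open(path, "rb")` instead of the in-memory demo.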
https://iate.europa.eu/download-iate/
License: https://iate.europa.eu/download-iate/
Some of its users: https://iate.europa.eu/download-iate/
Terminology dictionary of the EU.
https://tatoeba.org/eng/downloads/
License: https://tatoeba.org/eng/downloads/
Some of its users: https://tatoeba.org/, http://www.listeningpractice.org/, https://jisho.org/
Parallel corpora; in plain terms, collections of sentences together with their translations into other languages.
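Tatoeba's downloads come as tab-separated files. Below is a minimal sketch of pairing sentences with their translations, assuming `sentences.csv` lines have the form `id<TAB>lang<TAB>text` and `links.csv` lines the form `id<TAB>id` (check the field descriptions on the download page before relying on this). The function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple


def load_sentences(lines: Iterable[str], langs: Set[str]) -> Dict[str, Tuple[str, str]]:
    """Map sentence id -> (language, text), keeping only the wanted languages.

    Each line is expected to be 'id<TAB>lang<TAB>text'.
    """
    sentences: Dict[str, Tuple[str, str]] = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) == 3 and parts[1] in langs:
            sid, lang, text = parts
            sentences[sid] = (lang, text)
    return sentences


def iter_pairs(
    sentences: Dict[str, Tuple[str, str]],
    link_lines: Iterable[str],
    source: str,
    target: str,
) -> Iterator[Tuple[str, str]]:
    """Yield (source text, target text) pairs from 'id<TAB>id' link lines.

    A link may be listed in either direction (or both), so each unordered
    pair of ids is reported at most once.
    """
    seen: Set[Tuple[str, ...]] = set()
    for line in link_lines:
        ids = line.split()
        if len(ids) != 2 or ids[0] not in sentences or ids[1] not in sentences:
            continue  # malformed line or a sentence in an unwanted language
        key = tuple(sorted(ids))
        if key in seen:
            continue
        (lang_a, text_a), (lang_b, text_b) = sentences[ids[0]], sentences[ids[1]]
        if (lang_a, lang_b) == (source, target):
            seen.add(key)
            yield text_a, text_b
        elif (lang_a, lang_b) == (target, source):
            seen.add(key)
            yield text_b, text_a
```

For example, `load_sentences(f, {"eng", "jpn"})` on an opened `sentences.csv` keeps only English and Japanese sentences; Tatoeba labels languages with three-letter codes such as `eng` and `jpn`.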
https://wiki.documentfoundation.org/Language_support_of_LibreOffice
License: https://wiki.documentfoundation.org/Language_support_of_LibreOffice
Some of its users: https://www.libreoffice.org/
Spell-check dictionaries and other useful resources.
http://www.gutenberg.org/wiki/Gutenberg:Information_About_Robot_Access_to_our_Pages
License: http://www.gutenberg.org/wiki/Gutenberg:Terms_of_Use
Some of its users: http://www.gutenberg.org/, https://librivox.org/
Ebooks.
https://librivox.org/pages/about-librivox/
License: https://librivox.org/pages/about-librivox/
Some of its users: https://librivox.org/, http://www.listeningpractice.org/
Audio books.
https://freedict.org/downloads/
License: https://freedict.org/about/
Some of its users: http://aarddict.org/
Dictionaries.
http://www.omegawiki.org/Help:Downloading_the_data
License: http://www.omegawiki.org/Meta:Main_Page
Some of its users: http://www.omegawiki.org/Meta:Main_Page, http://dictionarymid.sourceforge.net/
Dictionaries.
http://www.xobdo.org/downloads/
License: http://www.xobdo.org/downloads/
Some of its users: http://www.xobdo.org/
Dictionaries for South Asian languages and English.
https://ltrc.iiit.ac.in/onlineServices/Dictionaries/Dict_Frame.html
License: https://ltrc.iiit.ac.in/onlineServices/Dictionaries/GPLHelp.html
Dictionaries for South Asian languages and English.
http://compling.hss.ntu.edu.sg/omw/
License: http://compling.hss.ntu.edu.sg/omw/
Some of its users: http://compling.hss.ntu.edu.sg/omw/cgi-bin/wn-gridx.cgi?gridmode=grid
Wordnets.
http://www.dicto.org.ru/xdxf.html
License: http://dicto.org.ru/license.html
Some of its users: http://dicto.org.ru/
Repository of dictionaries collected from other sources.
http://shtooka.net/download.php
License: http://shtooka.net/
Collections of audio recordings of words.
https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists
License: https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists
Frequency lists.
https://lego.linguistlist.org/about#contact
License: https://lego.linguistlist.org/about#copyright
Some of its users: https://lego.linguistlist.org/
Lexicon. No download link on the website.
https://panlex.org/source-list/
License: https://panlex.org/license/
Some of its users: https://glosbe.com
Lexical database links.
https://github.com/cburgmer/cjklib
License: https://github.com/cburgmer/cjklib/blob/master/COPYING
Some of its users: https://www.skishore.me/makemeahanzi/
Data about Han script.
https://www.radio-browser.info/gui/#!/
License: https://www.radio-browser.info/gui/#!/
Some of its users: https://github.com/segler-alex/RadioDroid
Database of radio stations.
https://help.archive.org/hc/en-us/articles/360017781111-How-to-download-files-
License: https://www.archive.org/about/terms.php
Some of its users: https://www.archive.org/
Archived Internet content.
https://www.fandom.com/
License: https://www.fandom.com/licensing
Fan-made wiki.
American Sign Language
http://www.asl-lex.org/
License: http://www.asl-lex.org/
Lexicon.
Burmese
https://github.com/saturngod/ornagai-V2
License: https://github.com/saturngod/ornagai-V2/blob/master/License
Some of its users: https://www.ornagai.com/#/
Dictionary.
Catalan
http://www.catalandictionary.org/en/search/
License: http://www.catalandictionary.org/en/search/
Dictionary. The license text is printed in a very small font.
Chinese
https://resources.publicense.moe.edu.tw/index.html
License: https://resources.publicense.moe.edu.tw/index.html
Some of its users: https://resources.publicense.moe.edu.tw/index.html, https://www.moedict.tw/
Monolingual dictionaries of ROC (Taiwan) Mandarin Chinese.
https://cc-cedict.org/editor/editor.php
License: https://cc-cedict.org/wiki/
Some of its users: https://www.mdbg.net/chinese/dictionary, https://www.pleco.com/
Mandarin-English dictionary.
https://chine.in/mandarin/dictionnaire/CFDICT/
License: https://chine.in/mandarin/dictionnaire/CFDICT/
Some of its users: https://chine.in/, https://www.pleco.com/
Mandarin-French dictionary.
https://handedict.zydeo.net/de/download
License: https://handedict.zydeo.net/de/download
Some of its users: https://www.pleco.com/
Mandarin-German dictionary.
https://chdict.zydeo.net/en/download/
License: https://chdict.zydeo.net/en/download/
Some of its users: https://chdict.zydeo.net/hu/
Mandarin-Hungarian dictionary.
http://cantonese.org/download.html
License: http://cantonese.org/download.html
Some of its users: http://cantonese.org/, https://www.pleco.com/
Cantonese-English dictionary.
https://twblg.dict.edu.tw/holodict_new/compile1_6_1.jsp
License: https://twblg.dict.edu.tw/holodict_new/compile1_6_1.jsp
Some of its users: https://twblg.dict.edu.tw/holodict_new/default.jsp, https://www.moedict.tw/
Taiwanese-English dictionary. The data can be requested by email.
http://www.taiwanesedictionary.org/
License: http://www.taiwanesedictionary.org/
Taiwanese-English dictionary.
http://lingua.mtsu.edu/chinese-computing/
License: http://lingua.mtsu.edu/chinese-computing/copyright.html
Frequency lists.
https://www.tanos.co.uk/hsk/
License: https://www.tanos.co.uk/jlpt/sharing/
HSK data.
http://www.hskhsk.com/resources.html
License: http://www.hskhsk.com/resources.html
HSK data.
https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8
License: https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8
Frequent characters.
https://zh.wikisource.org/wiki/%E5%B8%B8%E7%94%A8%E5%9C%8B%E5%AD%97%E6%A8%99%E6%BA%96%E5%AD%97%E9%AB%94%E8%A1%A8
Frequent characters.
http://input.foruto.com/ccc/gongbiu/index.htm
License:
Frequent characters.
Esperanto
http://reta-vortaro.de/tgz/index.html
License: http://reta-vortaro.de/tgz/index.html
Some of its users: http://reta-vortaro.de/, http://www.busydoingnothing.co.uk/prevo/
Dictionary in several languages.
http://www.denisowski.org/Esperanto/ESPDIC/espdic_readme.html
License: http://www.denisowski.org/Esperanto/ESPDIC/espdic_readme.html
Some of its users: http://www.denisowski.org/Esperanto/ESPDIC/espdic_readme.html
Dictionary.
https://komputeko.net/elsxutejo-en.php
License: https://komputeko.net/index_en.php
Some of its users: https://komputeko.net/index_en.php
Computer terminology dictionary.
German Sign Language
https://signdict.org/
License: https://signdict.org/about
Some of its users: https://signdict.org/
Dictionary.
English
http://gcide.gnu.org.ua/download
License: http://gcide.gnu.org.ua/license
Some of its users: http://gcide.gnu.org.ua/
Dictionary of definitions.
https://foldoc.org/source.html
License: https://foldoc.org/Free+On-line+Dictionary
Some of its users: https://foldoc.org/
Dictionary of computing terms.
https://github.com/skywind3000/ECDICT
License: https://github.com/skywind3000/ECDICT/blob/master/LICENSE
Some of its users: https://github.com/program-in-chinese/webextension_english_chinese_dictionary
Dictionary.
https://github.com/tony-mak/Eng-Chi-Dictionary/tree/master/app/src/main/assets/databases
License: https://github.com/tony-mak/Eng-Chi-Dictionary/blob/master/LICENSE
Dictionary.
https://github.com/linuxkathirvel/eng2tamildictionary/blob/master/dictionary.json
License: https://github.com/linuxkathirvel/eng2tamildictionary/blob/master/License.txt
Dictionary.
https://github.com/derekchuank/high-frequency-vocabulary
License: https://github.com/derekchuank/high-frequency-vocabulary/blob/master/LICENSE
Dictionary.
https://github.com/kujirahand/EJDict/tree/master/src
License: https://github.com/kujirahand/EJDict/blob/master/LICENSE
Dictionary.
Estonian
https://www.eki.ee/litsents/
License: https://www.eki.ee/litsents/
Some of its users: http://portaal.eki.ee/sonaraamatud.html
Dictionaries. Only two of them are actually available.
German
https://www.openthesaurus.de/about/download/
License: https://www.openthesaurus.de/about/download/
Some of its users: https://www.openthesaurus.de/about/download/
Thesaurus.
Hindi
http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/downloaderInfo.php
License: http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/index.php
Some of its users: http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/dict_search_user.php
Dictionary. Downloading requires submitting an application.
Icelandic
https://www.ling.upenn.edu/~kurisuto/germanic/oi_cleasbyvigfusson_about.html
License: http://lexicon.ff.cuni.cz/txt/oi_cleasbyvigfusson.txt
Dictionary.
Interlingua
http://www.denisowski.org/Interlingua/IEDICT/iedict_readme.html
License: http://www.denisowski.org/Interlingua/IEDICT/iedict_readme.html
Some of its users: http://www.denisowski.org/Interlingua/IEDICT/iedict_readme.html
Dictionary.
Interlingue
https://github.com/Carmina16/hunspell-ie
License: https://github.com/Carmina16/hunspell-ie/blob/master/LICENSE
Spell checker with dictionary.
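Hunspell dictionaries such as this one ship a `.dic` word list next to an `.aff` affix file. Here is a rough sketch for pulling the bare words out of a `.dic` file, assuming the common layout (entry count on the first line, then `word/FLAGS` per line, optionally followed by morphological fields). Escaped slashes and multi-word entries are not handled, and `dic_words` is a hypothetical name.

```python
from typing import Iterable, List


def dic_words(lines: Iterable[str]) -> List[str]:
    """Return the bare words from the lines of a Hunspell .dic file.

    Assumed layout: an optional entry count on the first line, then one
    entry per line as 'word/FLAGS', possibly followed by whitespace and
    morphological fields.
    """
    words: List[str] = []
    for i, raw in enumerate(lines):
        line = raw.strip()
        if not line or (i == 0 and line.isdigit()):
            continue  # blank line or the leading entry count
        entry = line.split()[0]        # drop morphological fields
        word = entry.split("/", 1)[0]  # drop affix flags after '/'
        if word:
            words.append(word)
    return words
```

This only gives base forms; expanding them with the affix rules needs the `.aff` file or a Hunspell library. Open the `.dic` file with the encoding named on the `SET` line of the matching `.aff` file.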
Iranian Persian
https://github.com/amirshnll/English-Persian-Word-Database
License: https://github.com/amirshnll/English-Persian-Word-Database/blob/master/LICENSE
Dictionary.
Japanese
http://www.edrdg.org/wiki/index.php/Main_Page
License: https://www.edrdg.org/edrdg/licence.html
Some of its users: https://jisho.org/, https://www.tagaini.net/
Japanese dictionaries.
https://github.com/KanjiVG/kanjivg/releases/
License: http://kanjivg.tagaini.net/
Some of its users: https://www.tagaini.net/, https://jisho.org/
Kanji strokes.
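KanjiVG stores each character as an SVG file in which every stroke is a separate `<path>`. Below is a minimal sketch that counts strokes in one such file, assuming stroke paths carry ids ending in `-s<number>` (as in `kvg:04e00-s1`). It has not been checked against every release, and `count_strokes` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET
from typing import Union

# Stroke paths carry ids such as "kvg:04e00-s1" (character code, then stroke number).
STROKE_ID = re.compile(r"-s\d+$")


def count_strokes(svg: Union[str, bytes]) -> int:
    """Count the stroke <path> elements in one KanjiVG character file."""
    root = ET.fromstring(svg)
    return sum(
        1
        for el in root.iter()
        # Compare the local tag name so the SVG namespace does not matter.
        if el.tag.rsplit("}", 1)[-1] == "path" and STROKE_ID.search(el.get("id", ""))
    )
```

Read the file in binary mode and pass the bytes, for example `count_strokes(open(path, "rb").read())`, so the parser can honor the encoding declared in the file.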
http://dico.fj.free.fr/dico.php
License: http://dico.fj.free.fr/copyright.php
Some of its users: http://dico.fj.free.fr/traduction/index.php
Japanese-French dictionary.
https://github.com/mifunetoshiro/kanjium
License: https://github.com/mifunetoshiro/kanjium/blob/master/LICENSE.txt
Kanji data.
https://www.tanos.co.uk/jlpt/
License: https://www.tanos.co.uk/jlpt/sharing/
JLPT data.
https://ja.wiktionary.org/wiki/%E4%BB%98%E9%8C%B2:%E5%B8%B8%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7
License: http://www.bunka.go.jp/bunkacho_homepage/index.html
Frequent characters.
https://ja.wiktionary.org/wiki/Wiktionary:%E4%BA%BA%E5%90%8D%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7
License: http://www.moj.go.jp/term.html
Frequent characters for names.
https://ja.wikipedia.org/wiki/%E5%AD%A6%E5%B9%B4%E5%88%A5%E6%BC%A2%E5%AD%97%E9%85%8D%E5%BD%93%E8%A1%A8
License: http://www.mext.go.jp/b_menu/about_link.htm
Characters taught in each school grade.
Jeju
https://jeju.go.kr/culture/dialect/dictionary.htm
License: https://jeju.go.kr/help/policy/copyright.htm
Some of its users: https://jeju.go.kr/culture/dialect/dictionary.htm
Dictionary.
Klingon
http://klingonska.org/dict/dict.zdb
License: http://klingonska.org/dict/
Some of its users: http://klingonska.org/dict/
Dictionary.
Korean
https://krdict.korean.go.kr/mainAction
License: https://krdict.korean.go.kr/kboardPolicy/copyRightTermsInfo
Some of its users: https://krdict.korean.go.kr/mainAction
Dictionary. Download link is unknown.
https://opendict.korean.go.kr/main
License: https://opendict.korean.go.kr/service/copyrightPolicy
Some of its users: https://opendict.korean.go.kr/main
Dictionary. Download link is unknown.
https://stdict.korean.go.kr/main/main.do
License: https://stdict.korean.go.kr/join/copyrightPolicy.do
Some of its users: https://stdict.korean.go.kr/main/main.do
Dictionary. Download link is unknown.
https://github.com/garfieldnate/kengdic
License: https://github.com/garfieldnate/kengdic
Some of its users: http://www.toktogi.com/
Dictionary.
https://github.com/libhangul/libhangul/blob/master/data/hanja/hanja.txt
License: https://github.com/libhangul/libhangul/blob/master/data/hanja/hanja.txt
Words in Hangul and Hanja.
An introduction page is available at https://wiki.kldp.org/wiki.php/libhangul.
https://ko.wiktionary.org/wiki/%EB%B6%80%EB%A1%9D:%ED%95%9C%EB%AC%B8_%EA%B5%90%EC%9C%A1%EC%9A%A9_%EA%B8%B0%EC%B4%88_%ED%95%9C%EC%9E%90_1800
License: http://www.suneung.re.kr/sub/info.do?m=0601&s=suneung
http://www.suneung.re.kr/boardCnts/fileDown.do?fileSeq=59692112e521efa80d2af27916704082 in an easy-to-copy form.
License: https://www.topik.go.kr/usr/cmm/subLocation.do?menuSeq=2110702
Word list of TOPIK.
https://github.com/mhagiwara/cc-kedict
License: https://github.com/mhagiwara/cc-kedict
Dictionary.
Lithuanian
https://github.com/ispell-lt/ispell-lt
License: https://github.com/ispell-lt/ispell-lt/blob/master/COPYING
Spell checker with dictionary.
Nepali
https://github.com/nirooj56/Nepdict
License: https://github.com/nirooj56/Nepdict/blob/master/LICENSE
Dictionary.
Russian
https://en.openrussian.org/dictionary
License: https://en.openrussian.org/dictionary
Some of its users: https://en.openrussian.org/
Dictionary.
Sanskrit
https://github.com/hemanth/sanskrit-dict/blob/master/dict.js
License: https://github.com/hemanth/sanskrit-dict/blob/master/license
Dictionary.
Slovak
http://sk-spell.sk.cx/hunspell-sk
License: http://sk-spell.sk.cx/hunspell-sk
Spell checker with dictionary.
Vietnamese
http://www.informatik.uni-leipzig.de/~duc/Dict/install.html
License: http://www.informatik.uni-leipzig.de/~duc/Dict/install.html
Some of its users: https://www.informatik.uni-leipzig.de/~duc/Dict/
Dictionaries in several languages.
An introduction page is available at https://vi.wiktionary.org/wiki/Wiktionary:Ngu%E1%BB%93n_g%E1%BB%91c/FVDP
http://www.denisowski.org/Vietnamese/vnedict_readme.htm
License: http://www.denisowski.org/Vietnamese/vnedict_readme.htm
Some of its users: http://www.denisowski.org/Vietnamese/vnedict_readme.htm
Dictionary.
https://github.com/duyetdev/vietnamese-wordlist
License: https://github.com/duyetdev/vietnamese-wordlist/blob/master/LICENSE
Word list.
https://github.com/duyetdev/vietnamese-namedb
License: https://github.com/duyetdev/vietnamese-namedb/blob/master/LICENSE
Name list.
Non-language
https://unicode.org/ucd/
License: https://www.unicode.org/copyright.html
Some of its users: https://wiki.gnome.org/action/show/Apps/Gucharmap, http://www.decodeunicode.org/, https://unicode-table.com/en/, https://www.fontspace.com/
Unicode Character Database.
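UnicodeData.txt, the core file of the database, has one code point per line with 15 semicolon-separated fields (code point in hex, name, general category, and so on). Large blocks such as the CJK ideographs are stored as a `<..., First>` and `<..., Last>` pair of lines rather than one line per character. Here is a sketch of a lookup table built from those lines; the class name `UnicodeData` is made up for this example.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class UnicodeData:
    """Name and general-category lookup built from UnicodeData.txt lines."""

    def __init__(self, lines: Iterable[str]) -> None:
        self.singles: Dict[int, Tuple[str, str]] = {}
        self.ranges: List[Tuple[int, int, str, str]] = []
        first: Optional[int] = None
        for line in lines:
            line = line.strip()
            if not line:
                continue
            fields = line.split(";")
            code, name, category = int(fields[0], 16), fields[1], fields[2]
            if name.endswith(", First>"):
                first = code  # start of a compressed range
            elif name.endswith(", Last>") and first is not None:
                label = name[1:-len(", Last>")]  # "<CJK Ideograph, Last>" -> "CJK Ideograph"
                self.ranges.append((first, code, label, category))
                first = None
            else:
                self.singles[code] = (name, category)

    def lookup(self, code: int) -> Optional[Tuple[str, str]]:
        """Return (name or range label, general category), or None if unassigned."""
        if code in self.singles:
            return self.singles[code]
        for start, end, label, category in self.ranges:
            if start <= code <= end:
                return label, category
        return None
```

Python's standard `unicodedata` module already answers name and category queries for the Unicode version bundled with the interpreter; reading the file yourself is mainly useful for a newer version or for fields the module does not expose.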
https://www.cia.gov/library/publications/download/
License: https://www.cia.gov/library/publications/the-world-factbook/docs/contributor_copyright.html
Some of its users: https://www.cia.gov/library/publications/resources/the-world-factbook/
General facts about countries and regions.
https://www.geonames.org/
License: https://www.geonames.org/
Free gazetteer and postal code data.
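The main GeoNames tables (for example `allCountries.txt`) are tab-separated text files with 19 columns per place: geonameid, name, asciiname, alternatenames, latitude, longitude, feature class, feature code, country code, and so on, with population in the 15th column. Here is a sketch that keeps only a few of them; `Place` and `iter_places` are names chosen for this example, and the column order should be checked against the readme that comes with the download.

```python
from typing import Iterable, Iterator, NamedTuple


class Place(NamedTuple):
    geonameid: int
    name: str
    latitude: float
    longitude: float
    country_code: str
    population: int


def iter_places(lines: Iterable[str]) -> Iterator[Place]:
    """Yield selected columns from GeoNames main-table lines (19 tab-separated columns)."""
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 19:
            continue  # skip malformed or truncated lines
        yield Place(
            geonameid=int(cols[0]),
            name=cols[1],
            latitude=float(cols[4]),
            longitude=float(cols[5]),
            country_code=cols[8],
            population=int(cols[14] or 0),
        )
```

Because the function takes any iterable of lines, it can filter a multi-gigabyte file lazily, e.g. `[p for p in iter_places(f) if p.population > 1_000_000]` on a file opened with `encoding="utf-8"`.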
https://iso639-3.sil.org/code_tables/download_tables/
License: https://iso639-3.sil.org/code_tables/download_tables/
Some of its users: https://iso639-3.sil.org/code_tables/639/data, https://polyglotclub.com/
ISO 639-3 code tables. The standard assigns a three-letter code to each language, and the tables are updated every year.
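Here is a sketch for mapping two-letter ISO 639-1 codes to ISO 639-3 codes from the SIL code table. It assumes the tab-delimited file with a header row that includes `Id` and `Part1` columns (check the header of the file you download); `part1_to_part3` is a hypothetical name.

```python
import csv
from typing import Dict, Iterable


def part1_to_part3(lines: Iterable[str]) -> Dict[str, str]:
    """Map two-letter ISO 639-1 codes to three-letter ISO 639-3 codes.

    Assumes a tab-delimited table whose header row names the columns,
    including "Id" (the 639-3 code) and "Part1" (the 639-1 code, if any).
    """
    mapping: Dict[str, str] = {}
    for row in csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if row.get("Part1"):  # many languages have no 639-1 code
            mapping[row["Part1"]] = row["Id"]
    return mapping
```

Opening the file with `encoding="utf-8-sig"` is a safe choice: it reads plain UTF-8 and also strips a byte-order mark, which would otherwise end up in the first column name.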
https://www.unicode.org/iso15924/codelists.html
License: https://www.unicode.org/copyright.html
Some of its users: http://www.unicode.org/iso15924/codelists.html
ISO 15924 code lists: codes for scripts (writing systems).
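A sketch for reading the plain-text code list, assuming it is semicolon-separated with `#` comment lines and one script per line in the order four-letter code, numeric code, English name, French name, followed by further fields. `parse_iso15924` is a hypothetical name.

```python
from typing import Dict, Iterable, Tuple


def parse_iso15924(lines: Iterable[str]) -> Dict[str, Tuple[int, str]]:
    """Map four-letter script code -> (numeric code, English name)."""
    scripts: Dict[str, Tuple[int, str]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        fields = line.split(";")
        if len(fields) < 3:
            continue  # not a data line
        code, number, english = fields[0], fields[1], fields[2]
        scripts[code] = (int(number), english)
    return scripts
```

As with the ISO 639-3 table, `encoding="utf-8-sig"` avoids surprises if the file starts with a byte-order mark.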
https://www.unece.org/cefact/locode/welcome.html
License: https://www.unece.org/cefact/locode/locode_since1981.html
UN/LOCODE, an alternative to ISO 3166-2. It is updated twice a year.
http://www.nationalanthems.info/
License: http://www.nationalanthems.info/
National anthems.
Formats
Sheet
database name with link | file name | field separator | field 1 | field 2 | field 3 | field 4 | field 5 | field 6 | field 7 | field 8 | field 9 | field 10 | field 11 | field 12 | field 13 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
dictionary |
An ordered and extended TOCFL word-list | tocfl.tsv | <tab> | Word | Pinyin | OtherPinyin | Level | First Translation | Other Translation |
CC-Canto | cccanto-webdist.txt | <space> | Traditional | Simplified | [pin1 yin1] | {jyut6 ping3} | /English equivalent 1/equivalent 2/ |
CC-CEDICT | cedict_ts.u8 | <space> | Traditional | Simplified | [pin1 yin1] | /English equivalent 1/equivalent 2/ |
CFDICT | CFDICT.u8 | <space> | Traditionnel | Simplifié | [pin1 yin1] | /traduction 1/traduction 2/ |
CHDICT | CHDICT.u8 | <space> | Tradicionális | Egyszerűsített | [pin1 yin1] | /magyar egyenérték 1/egyenérték 2/ |
ECDICT | ecdict.csv | , | word | phonetic | definition | translation | pos | collins | oxford | tag | bnc | frq | exchange | detail | audio |
English Persian Word Database | EnglishPersianWordDatabase.xlsx | | EnglishWord | PersianWord |
ESPDIC | espdict.txt | : | Esperanto | English |
HanDeDict | handedict.u8 | <space> | Traditionell | Vereinfacht | [pin1 yin1] | /deutsche Entsprechung 1/Entsprechung 2/ |
libhangul | hanja.txt | : | Hangul | Hanja | note |
IEDICT | iedict.txt | : | Interlingua | English |
Inglise-eesti sõnaraamat: English-Estonian dictionary | eestiinglise.txt | <tab> | eesti (Estonian) | inglise (English) |
JLPT Vocabulary | VocabList.N1.doc, VocabList.N2.doc, VocabList.N3.doc, VocabList.N4.doc, VocabList.N5.doc | | Kanji | Hiragana | English |
kengdic | kengdic_2011.tsv | <tab> | wordid | word | ? | def | ? | ? | submitter | doe | ? | hanja | ? | ? |
The Maryknoll Taiwanese-English Dictionary & English-Taiwanese Dictionary 2013 edition | Mkdictionary.xls | | Sort | Taiwanese | Chinese | English |
VNEDICT | vnedict.txt | : | Vietnamese | English |
word list |
한국어능력시험 어휘목록: TOPIK vocabulary list | 토픽 어휘 목록_공개 목록.xlsx | | Level; 수준 | Vocabulary; 어휘 | Guide word; 길잡이말 | Part of speech; 품사 |
古汉语单字字频: Character frequency list of Classical Chinese | CharFreq-Classical.xls | | Serial number; 序号 | Character; 汉字 |
现代汉语单字字频: Character frequency list of Modern Chinese | CharFreq.txt | <tab> | Serial number; 序号 | Character; 汉字 | Individual raw frequency; 频率 | Cumulative frequency in percentile; 累计频率 | Pinyin; 拼音 | English translation; 英文翻译 |
通用规范汉字表: Table of General Standard Chinese Characters | | | Number; 编号 | Character form; 字形 |
常用國字標準字體表: Chart of Standard Forms of Common National Characters | | | Serial number; 流水序 | Ministry of Education character number; 教育部字號 | Unicode | Common character; 常用字 |
新汉语水平考试(HSK)词汇(2012年修订版): New HSK vocabulary (2012 revised edition) | HSK-2012.xls | | Word (level); 单词(等级) |
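CC-CEDICT, CFDICT, HanDeDict and CHDICT share one line layout: traditional form, simplified form, pinyin in square brackets, then slash-delimited glosses. CC-Canto adds Jyutping in curly braces after the pinyin, and CC-CEDICT starts with comment lines beginning with `#`. Here is a regex-based sketch that reads all of these layouts as described in the table above; `parse_cedict` and `Entry` are hypothetical names, and lines that do not fit the pattern are skipped.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Optional

# Traditional Simplified [pin1 yin1] {jyut6 ping3} /gloss 1/gloss 2/
# The {...} Jyutping part is optional (it appears in cccanto-webdist.txt).
LINE_RE = re.compile(
    r"^(?P<trad>\S+) (?P<simp>\S+) \[(?P<pinyin>[^\]]*)\]"
    r"(?: \{(?P<jyutping>[^}]*)\})? /(?P<glosses>.*)/\s*$"
)


class Entry(NamedTuple):
    traditional: str
    simplified: str
    pinyin: str
    jyutping: Optional[str]
    glosses: List[str]


def parse_cedict(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one Entry per dictionary line in the CEDICT-style layout."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # header comments and blank lines
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not follow the layout
        yield Entry(
            m["trad"], m["simp"], m["pinyin"], m["jyutping"],
            m["glosses"].split("/"),
        )
```

Writing `"\t".join([e.traditional, e.simplified, e.pinyin, "/".join(e.glosses)])` for each entry is one way to turn these files into TSV.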
Manually convert to TSV
file name | process (on Linux) |
---|---|
cccanto-webdist.txt | |
cedict_ts.u8 | |
CharFreq.txt | |
CharFreq-Classical.xls | |
CHDICT.u8 | |
ecdict.csv | |
eestiinglise.txt | |
EnglishPersianWordDatabase.xlsx | |
espdict.txt | |
handedict.u8 | |
hanja.txt | |
HSK-2012.xls | |
iedict.txt | |
kengdic_2011.tsv | |
Mkdictionary.xls | |
tocfl.tsv | |
vnedict.txt | |
토픽 어휘 목록_공개 목록.xlsx | |
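As one possible way to do the conversion, here is a Python sketch for two of the simpler cases: colon-separated files such as espdict.txt, iedict.txt, vnedict.txt and hanja.txt, and comma-separated files such as ecdict.csv. It assumes UTF-8 input, and the function names are hypothetical.

```python
import csv
import io
from typing import Iterable, List, TextIO


def clean(field: str) -> str:
    """TSV has no quoting, so tabs and line breaks inside a field become spaces."""
    return " ".join(field.replace("\t", " ").splitlines()).strip()


def split_colon_line(line: str, n_fields: int) -> List[str]:
    """Split 'a : b' or 'a:b:c' lines into n_fields cleaned fields.

    Only the first n_fields - 1 colons count as separators, so colons
    inside the last field (usually the gloss) are kept.
    """
    return [clean(part) for part in line.split(":", n_fields - 1)]


def colon_to_tsv(lines: Iterable[str], n_fields: int, out: TextIO) -> None:
    """Write colon-separated dictionary lines as TSV rows."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and '#' comments
        out.write("\t".join(split_colon_line(line, n_fields)) + "\n")


def csv_to_tsv(src: TextIO, out: TextIO) -> None:
    """Re-emit a comma-separated file such as ecdict.csv as TSV."""
    for row in csv.reader(src):
        out.write("\t".join(clean(field) for field in row) + "\n")


if __name__ == "__main__":
    # In-memory demonstration; real files would be opened with open(..., encoding="utf-8").
    demo_out = io.StringIO()
    colon_to_tsv(["abako : abacus\n"], 2, demo_out)
    print(demo_out.getvalue(), end="")
```

For example, use `colon_to_tsv(src, 2, dst)` for espdict.txt and `colon_to_tsv(src, 3, dst)` for hanja.txt (Hangul, Hanja, note). Open CSV input with `newline=""` so that line breaks inside quoted fields are read correctly.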
Others
database name with link | format |
---|---|
FreeDict | slob |
Free Vietnamese Dictionary Project | dict.dz |
XOBDO.ORG | db |