<div class="pg_page_title">Licensed-Free Databases Around Languages</div>
[[File:best-licensed-free-databases-polyglotclub.jpg|thumb]]
Hi polyglots! 😀
➡ On this page we have listed free databases related to languages.
* Those mentioned on [[Language/Multiple-languages/Culture/Internet-Dictionaries|Internet Dictionaries]] will not be mentioned again here.
* The listed items are data, so if you don't know programming, this page might not be of much help to you.
== Main ==
=== Multiple languages ===
====https://www.ethnologue.com/codes/download-code-tables<nowiki/>====
LanguageCodes.tab lists the 7,400+ distinct language identifiers used in the current Ethnologue database.
==== https://dumps.wikimedia.org/ ====
License: https://dumps.wikimedia.org/legal.html
Some of its users: https://www.wikimedia.org/
Wikimedia.
==== https://tatoeba.org/eng/downloads/ ====
License: https://tatoeba.org/eng/downloads/
Some of its users: https://tatoeba.org/, http://www.listeningpractice.org/, https://jisho.org/
Parallel corpora: in plain terms, collections of the same sentence in different languages.
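A minimal sketch of pairing sentences with their translations. It assumes the exports are tab-separated: a sentences file with id, language code and text, and a links file with pairs of sentence ids. The sample rows and the function names are illustrative only and are not taken from the real files.

```python
import csv
import io

# Illustrative rows only; the real exports are assumed to share this layout.
SENTENCES = "1\teng\tHello.\n2\tfra\tBonjour.\n3\tdeu\tHallo.\n"
LINKS = "1\t2\n2\t1\n1\t3\n3\t1\n"


def read_sentences(handle):
    """Map sentence id -> (language code, text)."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {int(sid): (lang, text) for sid, lang, text in reader}


def sentence_pairs(sentences, links_handle, source_lang, target_lang):
    """Collect (source text, target text) pairs for one language direction."""
    pairs = []
    reader = csv.reader(links_handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for source_id, target_id in reader:
        source = sentences.get(int(source_id))
        target = sentences.get(int(target_id))
        if source and target and source[0] == source_lang and target[0] == target_lang:
            pairs.append((source[1], target[1]))
    return pairs


sentences = read_sentences(io.StringIO(SENTENCES))
pairs = sentence_pairs(sentences, io.StringIO(LINKS), "eng", "fra")
print(pairs)
```

With real downloads, the two `io.StringIO` objects would be replaced by files opened with `encoding="utf-8"`.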
==== https://wiki.documentfoundation.org/Language_support_of_LibreOffice ====
License: https://wiki.documentfoundation.org/Language_support_of_LibreOffice
Some of its users: https://www.libreoffice.org/
You can find the “Spell check dictionaries” and other useful things.
==== http://www.gutenberg.org/wiki/Gutenberg:Information_About_Robot_Access_to_our_Pages ====
License: http://www.gutenberg.org/wiki/Gutenberg:Terms_of_Use
Some of its users: http://www.gutenberg.org/, https://librivox.org/ LibriVox
Ebooks.
==== https://librivox.org/pages/about-librivox/ ====
License: https://librivox.org/pages/about-librivox/
Some of its users: https://librivox.org/, http://www.listeningpractice.org/
Audio books.
==== http://www.omegawiki.org/Help:Downloading_the_data ====
License: http://www.omegawiki.org/Meta:Main_Page
Some of its users: http://www.omegawiki.org/Meta:Main_Page, http://dictionarymid.sourceforge.net/
Dictionaries.
==== https://ltrc.iiit.ac.in/onlineServices/Dictionaries/Dict_Frame.html ====
License: https://ltrc.iiit.ac.in/onlineServices/Dictionaries/GPLHelp.html
Dictionaries for South Asian languages and English.
==== http://compling.hss.ntu.edu.sg/omw/ ====
License: http://compling.hss.ntu.edu.sg/omw/
Some of its users: http://compling.hss.ntu.edu.sg/omw/cgi-bin/wn-gridx.cgi?gridmode=grid
Wordnets.
==== http://www.dicto.org.ru/xdxf.html ====
License: http://dicto.org.ru/license.html
Some of its users: http://dicto.org.ru/
Repository of dictionaries (from elsewhere).
==== http://shtooka.net/download.php ====
License: http://shtooka.net/
Collections of audio.
==== https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists ====
License: https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists
Frequency lists.
==== https://lego.linguistlist.org/about#contact ====
License: https://lego.linguistlist.org/about#copyright
Some of its users: https://lego.linguistlist.org/
Lexicon. No download link on the website.
==== https://panlex.org/source-list/ ====
License: https://panlex.org/license/
Some of its users: https://glosbe.com
Lexical database links.
==== https://github.com/cburgmer/cjklib ====
License: https://github.com/cburgmer/cjklib/blob/master/COPYING
Some of its users: https://www.skishore.me/makemeahanzi/
Data about Han script.
==== https://www.radio-browser.info/gui/#!/ ====
License: https://www.radio-browser.info/gui/#!/
Some of its users: https://github.com/segler-alex/RadioDroid
Database of radio stations.
==== https://help.archive.org/hc/en-us/articles/360017781111-How-to-download-files- ====
License: https://www.archive.org/about/terms.php
Some of its users: https://www.archive.org/
Archived Internet content.
==== https://www.fandom.com/ ====
License: https://www.fandom.com/licensing
Fan-made wiki.
=== Chinese ===
==== http://lingua.mtsu.edu/chinese-computing/ ====
License: http://lingua.mtsu.edu/chinese-computing/copyright.html
Character frequency lists.
==== https://github.com/gwinterstein/Cifu ====
License: https://github.com/gwinterstein/Cifu/blob/master/LICENSE
Word frequency list for Yue Chinese.
==== https://www.tanos.co.uk/hsk/ ====
License: https://www.tanos.co.uk/jlpt/sharing/
HSK data.
==== http://www.hskhsk.com/resources.html ====
License: http://www.hskhsk.com/resources.html
HSK data.
==== https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8 ====
License: https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8
Frequent characters.
==== https://zh.wikisource.org/wiki/%E5%B8%B8%E7%94%A8%E5%9C%8B%E5%AD%97%E6%A8%99%E6%BA%96%E5%AD%97%E9%AB%94%E8%A1%A8 ====
License: https://zh.wikisource.org/wiki/%E5%B8%B8%E7%94%A8%E5%9C%8B%E5%AD%97%E6%A8%99%E6%BA%96%E5%AD%97%E9%AB%94%E8%A1%A8
Frequent characters.
==== http://input.foruto.com/ccc/gongbiu/index.htm ====
License:
Frequent characters.
=== English ===
==== http://gcide.gnu.org.ua/download ====
License: http://gcide.gnu.org.ua/license
Some of its users: http://gcide.gnu.org.ua/
Dictionary of definitions.
==== https://foldoc.org/source.html ====
License: https://foldoc.org/Free+On-line+Dictionary
Some of its users: https://foldoc.org/
Dictionary about computing.
==== https://github.com/tony-mak/Eng-Chi-Dictionary/tree/master/app/src/main/assets/databases ====
License: https://github.com/tony-mak/Eng-Chi-Dictionary/blob/master/LICENSE
Dictionary.
==== https://github.com/linuxkathirvel/eng2tamildictionary/blob/master/dictionary.json ====
License: https://github.com/linuxkathirvel/eng2tamildictionary/blob/master/License.txt
Dictionary.
==== https://github.com/derekchuank/high-frequency-vocabulary ====
License: https://github.com/derekchuank/high-frequency-vocabulary/blob/master/LICENSE
Dictionary.
==== https://github.com/kujirahand/EJDict/tree/master/src ====
License: https://github.com/kujirahand/EJDict/blob/master/LICENSE
Dictionary.
=== Hindi ===
==== http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/downloaderInfo.php ====
License: http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/index.php
Some of its users: http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/dict_search_user.php
Dictionary. Application is required.
=== Icelandic ===
==== https://www.ling.upenn.edu/~kurisuto/germanic/oi_cleasbyvigfusson_about.html ====
License: http://lexicon.ff.cuni.cz/txt/oi_cleasbyvigfusson.txt
Dictionary.
=== Interlingue ===
==== https://github.com/Carmina16/hunspell-ie ====
License: https://github.com/Carmina16/hunspell-ie/blob/master/LICENSE
Spell checker with dictionary.
=== Japanese ===
==== https://github.com/KanjiVG/kanjivg/releases/ ====
License: http://kanjivg.tagaini.net/
Some of its users: https://www.tagaini.net/, https://jisho.org/
Kanji strokes.
==== https://github.com/mifunetoshiro/kanjium ====
License: https://github.com/mifunetoshiro/kanjium/blob/master/LICENSE.txt
Kanji data.
==== https://www.tanos.co.uk/jlpt/ ====
License: https://www.tanos.co.uk/jlpt/sharing/
JLPT data.
==== https://ja.wiktionary.org/wiki/%E4%BB%98%E9%8C%B2:%E5%B8%B8%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7 ====
License: http://www.bunka.go.jp/bunkacho_homepage/index.html
Frequent characters.
==== https://ja.wiktionary.org/wiki/Wiktionary:%E4%BA%BA%E5%90%8D%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7 ====
License: http://www.moj.go.jp/term.html
Frequent characters for names.
==== https://ja.wikipedia.org/wiki/%E5%AD%A6%E5%B9%B4%E5%88%A5%E6%BC%A2%E5%AD%97%E9%85%8D%E5%BD%93%E8%A1%A8 ====
License: http://www.mext.go.jp/b_menu/about_link.htm
Frequent characters according to school grades.
=== Korean ===
==== https://github.com/libhangul/libhangul/blob/master/data/hanja/hanja.txt ====
License: https://github.com/libhangul/libhangul/blob/master/data/hanja/hanja.txt
Words in Hangul and Hanja.
There is a page of introduction: https://wiki.kldp.org/wiki.php/libhangul.
==== https://ko.wiktionary.org/wiki/%EB%B6%80%EB%A1%9D:%ED%95%9C%EB%AC%B8_%EA%B5%90%EC%9C%A1%EC%9A%A9_%EA%B8%B0%EC%B4%88_%ED%95%9C%EC%9E%90_1800 ====
License: http://www.suneung.re.kr/sub/info.do?m=0601&s=suneung
http://www.suneung.re.kr/boardCnts/fileDown.do?fileSeq=59692112e521efa80d2af27916704082 in an easy-to-copy form.
==== https://www.topik.go.kr/usr/cmm/subLocation.do?menuSeq=2110503&boardSeq=64217 ====
License: https://www.topik.go.kr/usr/cmm/subLocation.do?menuSeq=2110702
Word list of TOPIK.
=== Lithuanian ===
==== https://github.com/ispell-lt/ispell-lt ====
License: https://github.com/ispell-lt/ispell-lt/blob/master/COPYING
Spell checker with dictionary.
=== Sanskrit ===
==== https://github.com/hemanth/sanskrit-dict/blob/master/dict.js ====
License: https://github.com/hemanth/sanskrit-dict/blob/master/license
Dictionary.
=== Slovak ===
==== http://sk-spell.sk.cx/hunspell-sk ====
License: http://sk-spell.sk.cx/hunspell-sk
Spell checker with dictionary.
=== Vietnamese ===
==== https://github.com/duyetdev/vietnamese-wordlist ====
License: https://github.com/duyetdev/vietnamese-wordlist/blob/master/LICENSE
Word list.
==== https://github.com/duyetdev/vietnamese-namedb ====
License: https://github.com/duyetdev/vietnamese-namedb/blob/master/LICENSE
Name list.
=== Non-language ===
==== https://unicode.org/ucd/ ====
License: https://www.unicode.org/copyright.html
Some of its users: https://wiki.gnome.org/action/show/Apps/Gucharmap, http://www.decodeunicode.org/, https://unicode-table.com/en/, https://www.fontspace.com/
Unicode.
==== https://www.cia.gov/library/publications/download/ ====
License: https://www.cia.gov/library/publications/the-world-factbook/docs/contributor_copyright.html
Some of its users: https://www.cia.gov/library/publications/resources/the-world-factbook/
General facts about countries and regions.
==== https://www.geonames.org/ ====
License: https://www.geonames.org/
Gazetteer and postal code data for free.
==== https://iso639-3.sil.org/code_tables/download_tables/ ====
License: https://iso639-3.sil.org/code_tables/download_tables/
Some of its users: https://iso639-3.sil.org/code_tables/639/data, https://polyglotclub.com/
ISO 639-3 tables. It assigns each language a code and is updated every year.
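A minimal sketch of looking up the three-letter code for a two-letter one. It assumes the main code table is tab-separated with a header row containing the column names used below (`Id`, `Part1`, and so on); check the header of the file you download, since the names here are an assumption. The sample rows are a small hand-written excerpt in that layout.

```python
import csv
import io

# Hand-written excerpt in the assumed layout of the main code table.
SAMPLE = (
    "Id\tPart2b\tPart2t\tPart1\tScope\tLanguage_Type\tRef_Name\tComment\n"
    "eng\teng\teng\ten\tI\tL\tEnglish\t\n"
    "yue\t\t\t\tI\tL\tYue Chinese\t\n"
    "zho\tchi\tzho\tzh\tM\tL\tChinese\t\n"
)


def two_letter_to_three_letter(handle):
    """Map two-letter codes to three-letter codes, skipping rows without one."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Part1"]: row["Id"] for row in reader if row["Part1"]}


mapping = two_letter_to_three_letter(io.StringIO(SAMPLE))
print(mapping)
```

Most languages have no two-letter code, which is why rows with an empty `Part1` cell are skipped rather than stored under an empty key.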
==== https://www.unicode.org/iso15924/codelists.html ====
License: https://www.unicode.org/copyright.html
Some of its users: http://www.unicode.org/iso15924/codelists.html
ISO 15924 lists. Codes for scripts.
==== https://www.unece.org/cefact/locode/welcome.html ====
License: https://www.unece.org/cefact/locode/locode_since1981.html
UN/LOCODE, an alternative to ISO 3166-2. It is updated twice a year.
==== http://www.nationalanthems.info/ ====
License: http://www.nationalanthems.info/
National anthems.
== Formats ==
=== Sheet ===
{| class="wikitable"
!database name with link
!file name
!field separator
!
!field 1
!field 2
!field 3
!field 4
!field 5
!field 6
!field 7
!field 8
!field 9
!field 10
!field 11
!field 12
!field 13
|-
! colspan="15" |dictionary
!
!
|-
|[https://github.com/tomcumming/tocfl-word-list An ordered and extended TOCFL word-list]
|tocfl.tsv
|<tab>
|
|Word
|Pinyin
|OtherPinyin
|Level
|First Translation
|Other Translation
|
|
|
|
|
|
|
|-
|[https://cantonese.org/download.html CC-Canto]
|cccanto-webdist.txt
|<space>
|
|Traditional
|Simplified
|[pin1 yin1]
|{jyut6 ping3}
|/English equivalent 1/equivalent 2/
|
|
|
|
|
|
|
|
|-
|[https://cc-cedict.org/editor/editor.php?handler=Download CC-CEDICT]
|cedict_ts.u8
|<space>
|
|Traditional
|Simplified
|[pin1 yin1]
|/English equivalent 1/equivalent 2/
|
|
|
|
|
|
|
|
|
|-
|[https://chine.in/mandarin/dictionnaire/CFDICT/ CFDICT]
|CFDICT.u8
|<space>
|
|Traditionnel
|Simplifié
|[pin1 yin1]
|/traduction 1/traduction2/
|
|
|
|
|
|
|
|
|
|-
|[https://chdict.zydeo.net/en/download/ CHDICT]
|CHDICT.u8
|<space>
|
|Tradicionális
|Egyszerűsített
|[pin1 yin1]
|/magyar egyenérték 1/ egyenérték 2
|
|
|
|
|
|
|
|
|
|-
|[https://github.com/skywind3000/ECDICT ECDICT]
|ecdict.csv
|,
|
|word
|phonetic
|definition
|translation
|pos
|collins
|oxford
|tag
|bnc
|frq
|exchange
|detail
|audio
|-
|[https://github.com/amirshnll/English-Persian-Word-Database English Persian Word Database]
|EnglishPersianWordDatabase.xlsx
|
|
|EnglishWord
|PersianWord
|
|
|
|
|
|
|
|
|
|
|
|-
|[http://www.denisowski.org/Esperanto/ESPDIC/espdic_readme.html ESPDIC]
|espdict.txt
| :
|
|Esperanto
|English
|
|
|
|
|
|
|
|
|
|
|
|-
|[http://www.handedict.de/chinesisch_deutsch.php?mode=dl&sid=d80e36eefdb05750bd130ae1f322ca09 HanDeDict]
|handedict.u8
|<space>
|
|Traditionel
|Vereinfacht
|[pin1 yin1]
|/deutsche Entsprechung 1 /Entsprechung 2/
|
|
|
|
|
|
|
|
|
|-
|[https://github.com/libhangul/libhangul/tree/master/data/hanja libhangul]
|hanja.txt
|<nowiki>:</nowiki>
|
|Hangul
|Hanja
|note
|
|
|
|
|
|
|
|
|
|
|-
|[http://www.denisowski.org/Interlingua/IEDICT/iedict_readme.html IEDICT]
|iedict.txt
| :
|
|Interlingua
|English
|
|
|
|
|
|
|
|
|
|
|
|-
|[https://www.eki.ee/litsents/vaba/dl.cgi?D=ies Inglise-eesti sõnaraamat]
|eestiinglise.txt
|<tab>
|
|eeste
|inglise
|
|
|
|
|
|
|
|
|
|
|
|-
|[https://www.tanos.co.uk/jlpt/skills/vocab/ JLPT Vocabulary]
|VocabList.N1.doc
VocabList.N2.doc
VocabList.N3.doc
VocabList.N4.doc
VocabList.N5.doc
|
|
|Kanji
|Hiragana
|English
|
|
|
|
|
|
|
|
|
|
|-
|[https://github.com/garfieldnate/kengdic kengdic]
|kengdic_2011.tsv
|<tab>
|
|wordid
|word
|?
|def
|?
|?
|submitter
|doe
|?
|hanja
|?
|?
|
|-
|[http://www.taiwanesedictionary.org/ The Maryknoll Taiwanese-English Dictionary & English-Taiwanese Dictionary 2013 edition]
|Mkdictionary.xls
|
|
|Sort
|Taiwanese
|Chinese
|English
|
|
|
|
|
|
|
|
|
|-
|[http://www.denisowski.org/Vietnamese/vnedict_readme.htm VNEDICT]
|vnedict.txt
| :
|
|Vietnamese
|English
|
|
|
|
|
|
|
|
|
|
|
|-
! colspan="15" |word list
|
|
|-
|[https://www.topik.go.kr/usr/cmm/subLocation.do?menuSeq=2110503&boardSeq=64217 한국어능력시험 어휘목록]
|토픽 어휘 목록_공개 목록.xlsx
|
|
|수준
|어휘
|길잡이말
|품사
|
|
|
|
|
|
|
|
|
|-
|[https://lingua.mtsu.edu/chinese-computing/statistics/index.html 古汉语单字字频: Character frequency list of Classical Chinese]
|CharFreq-Classical.xls
|
|
|Serial number; 序号
|Character; 汉字
|
|
|
|
|
|
|
|
|
|
|
|-
|[https://lingua.mtsu.edu/chinese-computing/statistics/index.html 现代汉语单字字频: Character frequency list of Modern Chinese]
|CharFreq.txt
|<tab>
|
|Serial number; 序号
|Character; 汉字
|Individual raw frequency; 频率
|Cumulative frequency in percentile; 累计频率
|Pinyin; 拼音
|English translation; 英文翻译
|
|
|
|
|
|
|
|-
|[https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8 通用规范汉字表]
|
|
|
|编号
|字形
|
|
|
|
|
|
|
|
|
|
|
|-
|[https://zh.wikisource.org/wiki/%E5%B8%B8%E7%94%A8%E5%9C%8B%E5%AD%97%E6%A8%99%E6%BA%96%E5%AD%97%E9%AB%94%E8%A1%A8 常用國字標準字體表]
|
|
|
|流水序
|教育部字號
|Unicode
|常用字
|
|
|
|
|
|
|
|
|
|-
|[http://www.chinesetest.cn/godownload.do 新汉语水平考试(HSK)词汇(2012年修订版)]
|HSK-2012.xls
|
|
|单词(等级)
|
|
|
|
|
|
|
|
|
|
|
|
|}
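As an illustration of the CC-CEDICT row in the table above, here is a minimal sketch that splits one line into its four fields. It assumes each entry has the shape `Traditional Simplified [pin1 yin1] /equivalent 1/equivalent 2/`. The regular expression, the function name and the sample line are illustrative and not part of any of the listed projects. CC-Canto lines carry an extra `{jyut6 ping3}` field and would need a different pattern.

```python
import re

# Traditional, Simplified, [pinyin], then glosses delimited by slashes.
ENTRY = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


def parse_cedict_line(line):
    """Return (traditional, simplified, pinyin, [glosses]), or None if the
    line is a comment, is blank, or does not follow the assumed layout."""
    if line.startswith("#") or not line.strip():
        return None
    match = ENTRY.match(line)
    if match is None:
        return None
    traditional, simplified, pinyin, glosses = match.groups()
    return traditional, simplified, pinyin, glosses.split("/")


# Illustrative line written in the layout described in the table.
sample = "中國 中国 [Zhong1 guo2] /China/Middle Kingdom/"
print(parse_cedict_line(sample))
```

Returning `None` for lines that do not match keeps the caller in control of whether to skip or report unexpected input.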
==== Manually convert to TSV ====
{| class="wikitable"
!file name
!process (on Linux)
|-
|cccanto-webdist.txt
|
# [https://stackoverflow.com/questions/8206280/delete-all-lines-beginning-with-a-from-a-file Delete lines starting with '#'];
# [https://stackoverflow.com/questions/47010412/replace-first-space-on-each-line-by-a-tab Replace the first ' ' in each line with '\t'];
# Replace the first ' [' in each line with '\t';
# Replace '] {' with '\t';
# Replace '} /' with '\t';
# Replace ' # adapted from cc-cedict' with <nowiki>''</nowiki>;
# Replace '/\n' with '\n';
# Add 'Traditional\tSimplified\tpin1 yin1\tjyut6 ping3\tEnglish equivalent 1/equivalent 2\n' at the beginning;
|-
|cedict_ts.u8
|
# Delete lines starting with '#';
# Replace the first ' ' in each line with '\t';
# Replace the first ' [' in each line with '\t';
# Replace '] /' with '\t';
# Replace '/\n' with '\n';
# Add 'Traditional\tSimplified\tpin1 yin1\tEnglish equivalent 1/equivalent 2\n' at the beginning;
|-
|CharFreq.txt
|
# Delete lines starting with '/';
# [https://stackoverflow.com/questions/15361632/delete-a-column-with-awk-or-sed Delete fields 3, 4];
# Add '序列号\t汉字\t拼音\t英文翻译' at the beginning;
|-
|CharFreq-Classical.xls
|
# Delete the first row;
# Delete fields 3, 4;
# Save as TSV file or save as CSV file and select '<tab>' as field separator;
|-
|CHDICT.u8
|
# Delete lines starting with '#';
# Replace '\n\n' with '\n';
# Replace the first ' ' in each line with '\t';
# Replace the first ' [' in each line with '\t';
# Replace '] /' with '\t';
# Replace '/\n' with '\n';
# Add 'Tradicionális\tEgyszerűsített\tpin1 yin1\tmagyar egyenérték 1/ egyenérték 2\n' at the beginning;
|-
|ecdict.csv
|
# Open with a spreadsheet program;
# Save as TSV file or save as CSV file and select '<tab>' as field separator;
|-
|eestiinglise.txt
|
# Add 'eeste\tinglise\n' at the beginning;
|-
|EnglishPersianWordDatabase.xlsx
|
# Open with a spreadsheet program;
# Save as TSV file or save as CSV file and select '<tab>' as field separator;
|-
|espdict.txt
|
# Delete the line starting with '#';
# Replace ' : ' with '\t';
# Add 'Esperanto\tEnglish\n' at the beginning;
|-
|handedict.u8
|
# Delete lines starting with '#';
# Replace '\n\n' with '\n';
# Replace the first ' ' in each line with '\t';
# Replace the first ' [' in each line with '\t';
# Replace '] /' with '\t';
# Replace '/\n' with '\n';
# Add 'Traditionel\tVereinfacht\tpin1 yin1\tdeutsche Entsprechung 1/Entsprechung 2\n' at the beginning;
|-
|hanja.txt
|
# Delete lines starting with ' #';
# Replace '<nowiki>:</nowiki>' with '\t';
# Add 'Hangul\tHanja\tnote' at the beginning;
|-
|HSK-2012.xls
|
# Open with a spreadsheet program;
# Save as TSV file or save as CSV file and select '<tab>' as field separator;
# Open the new file;
# Replace '(' with '\t';
# Replace ')' with <nowiki>''</nowiki>;
# Add '单词\t等级\n' at the beginning;
|-
|iedict.txt
|
# Delete the line starting with ' #';
# Replace ' : ' with '\t';
# Add 'Interlingua\tEnglish\n' at the beginning;
|-
|kengdic_2011.tsv
|
# Delete fields 1, 3, 5, 6, 7, 8, 9, 11, 12;
# Add 'word\tdef\thanja\n' at the beginning;
|-
|Mkdictionary.xls
|
# Open with a spreadsheet program;
# Save as TSV file or save as CSV file and select '<tab>' as field separator;
|-
|tocfl.tsv
|
# Replace '"\t"' with '\t';
# Replace '"\n"' with '\n';
# Replace the first '"' with <nowiki>''</nowiki>;
# Replace the last '"' with <nowiki>''</nowiki>;
|-
|vnedict.txt
|
# Delete the line starting with '#';
# Replace ' : ' with '\t';
# Add 'Vietnamese\tEnglish\n' at the beginning;
|-
|토픽 어휘 목록_공개 목록.xlsx
|
# Open with a spreadsheet program;
# Save as TSV file or save as CSV file and select '<tab>' as field separator;
# Click on the other tab of sheet;
# Save as TSV file or save as CSV file and select '<tab>' as field separator;
|}
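Two of the simpler recipes in the table can be scripted instead of done by hand. The sketch below is one possible way to do it, not the method used by the page's authors. The function names and sample data are hypothetical. The first function covers the ' : ' separated files (espdict.txt, iedict.txt, vnedict.txt) and the second covers the field deletion described for kengdic_2011.tsv.

```python
def colon_dict_to_tsv(text, header):
    """Drop '#' comment lines, turn ' : ' into a tab, and prepend a header.

    Only the first ' : ' on each line is replaced, so a translation that
    itself contains ' : ' stays in one column (a deliberate deviation from
    replacing every occurrence).
    """
    rows = [
        line.replace(" : ", "\t", 1)
        for line in text.splitlines()
        if line and not line.lstrip().startswith("#")
    ]
    return "\n".join([header] + rows) + "\n"


def keep_fields(text, keep, header):
    """Keep only the 1-based tab-separated field numbers listed in `keep`.

    Deleting fields 1, 3, 5, 6, 7, 8, 9, 11, 12 of a 12-field row is the
    same as keeping fields 2, 4 and 10. Missing cells become empty strings.
    """
    rows = []
    for line in text.splitlines():
        cells = line.split("\t")
        picked = [cells[i - 1] if i - 1 < len(cells) else "" for i in keep]
        rows.append("\t".join(picked))
    return "\n".join([header] + rows) + "\n"


# Hypothetical sample data in the layouts described in the tables.
colon_sample = "# illustrative header line\nhundo : dog\nkato : cat\n"
field_sample = "1\t학교\tx\tschool\tx\tx\tsomeone\t2011\tx\t學校\tx\tx\n"

print(colon_dict_to_tsv(colon_sample, "Esperanto\tEnglish"))
print(keep_fields(field_sample, (2, 4, 10), "word\tdef\thanja"))
```

For real files, read the text with `open(path, encoding="utf-8").read()` and write the returned string to a new `.tsv` file.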
=== Others ===
{| class="wikitable"
!database name with link
!format
|-
|[https://freedict.org/downloads/ FreeDict]
|slob
|-
|[http://www.informatik.uni-leipzig.de/~duc/Dict/install.html Free Vietnamese Dictionary Project]
|dict.dz
|-
|[http://www.xobdo.org/downloads/ XOBDO.ORG]
|db
|}
[[Category:Free-Resources]]
[[Category:Computer-Knowledge]]
==Other Lessons==
* [[Language/Multiple-languages/Culture/Wiki-Notice-Board|Wiki Notice Board]]
* [[Language/Multiple-languages/Culture/Cultural-differences-by-country|Cultural differences by country]]
* [[Language/Multiple-languages/Culture/Most-Famous-Non–Contemporary-Artists|Most Famous Non–Contemporary Artists]]
* [[Language/Multiple-languages/Culture/IRFP-in-brief|IRFP in brief]]
* [[Language/Multiple-languages/Culture/Introduction-to-Sci–Tech-Index|Introduction to Sci–Tech Index]]
* [[Language/Multiple-languages/Culture/Online-Specialized-Dictionaries|Online Specialized Dictionaries]]
* [[Language/Multiple-languages/Culture/How-to-contribute-to-wiki-lessons-(FAQ)|How to contribute to wiki lessons (FAQ)]]
* [[Language/Multiple-languages/Culture/Cities-with-the-best-quality-of-life|Cities with the best quality of life]]
* [[Language/Multiple-languages/Culture/Techniques-for-learning-languages|Techniques for learning languages]]
* [[Language/Multiple-languages/Culture/Countries-and-Flag-Emoji-by-Languages|Countries and Flag Emoji by Languages]]
<span links></span>
Latest revision as of 23:17, 26 March 2023
Hi polyglots! 😀
➡ On this page we have listed free databases related to languages.
- Those mentioned on Internet Dictionaries will not be mentioned again here.
- The listed items are data, so if you don't know programming, this page might not be of much help to you.
Main[edit | edit source]
Multiple languages[edit | edit source]
https://www.ethnologue.com/codes/download-code-tables[edit | edit source]
LanguageCodes.tab lists the 7,400+ distinct language identifiers used in the current Ethnologue database.
https://dumps.wikimedia.org/[edit | edit source]
License: https://dumps.wikimedia.org/legal.html
Some of its users: https://www.wikimedia.org/
Wikimedia.
https://tatoeba.org/eng/downloads/[edit | edit source]
License: https://tatoeba.org/eng/downloads/
Some of its users: https://tatoeba.org/, http://www.listeningpractice.org/, https://jisho.org/
Parallel corpora. In common words, collections about a sentence in different languages.
https://wiki.documentfoundation.org/Language_support_of_LibreOffice[edit | edit source]
License: https://wiki.documentfoundation.org/Language_support_of_LibreOffice
Some of its users: https://www.libreoffice.org/
You can find the “Spell check dictionaries” and other useful things.
http://www.gutenberg.org/wiki/Gutenberg:Information_About_Robot_Access_to_our_Pages[edit | edit source]
License: http://www.gutenberg.org/wiki/Gutenberg:Terms_of_Use
Some of its users: http://www.gutenberg.org/, https://librivox.org/ LibriVox
Ebooks.
https://librivox.org/pages/about-librivox/[edit | edit source]
License: https://librivox.org/pages/about-librivox/
Some of its users: https://librivox.org/, http://www.listeningpractice.org/
Audio books.
http://www.omegawiki.org/Help:Downloading_the_data[edit | edit source]
License: http://www.omegawiki.org/Meta:Main_Page
Some of its users: http://www.omegawiki.org/Meta:Main_Page, http://dictionarymid.sourceforge.net/
Dictionaries.
https://ltrc.iiit.ac.in/onlineServices/Dictionaries/Dict_Frame.html[edit | edit source]
License: https://ltrc.iiit.ac.in/onlineServices/Dictionaries/GPLHelp.html
Dictionaries for South Asian languages and English.
http://compling.hss.ntu.edu.sg/omw/[edit | edit source]
License: http://compling.hss.ntu.edu.sg/omw/
Some of its users: http://compling.hss.ntu.edu.sg/omw/cgi-bin/wn-gridx.cgi?gridmode=grid
Wordnets.
http://www.dicto.org.ru/xdxf.html[edit | edit source]
License: http://dicto.org.ru/license.html
Some of its users: http://dicto.org.ru/
Repository of dictionaries (from elsewhere).
http://shtooka.net/download.php[edit | edit source]
License: http://shtooka.net/
Collections of audio.
https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists[edit | edit source]
License: https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists
Frequency lists.
https://lego.linguistlist.org/about#contact[edit | edit source]
License: https://lego.linguistlist.org/about#copyright
Some of its users: https://lego.linguistlist.org/
Lexicon. No download link on the website.
https://panlex.org/source-list/[edit | edit source]
License: https://panlex.org/license/
Some of its users: https://glosbe.com
Lexical database links.
https://github.com/cburgmer/cjklib[edit | edit source]
License: https://github.com/cburgmer/cjklib/blob/master/COPYING
Some of its users: https://www.skishore.me/makemeahanzi/
Data about Han script.
https://www.radio-browser.info/gui/#!/[edit | edit source]
License: https://www.radio-browser.info/gui/#!/
Some of its users: https://github.com/segler-alex/RadioDroid
Database of radio stations.
https://help.archive.org/hc/en-us/articles/360017781111-How-to-download-files-[edit | edit source]
License: https://www.archive.org/about/terms.php
Some of its users: https://www.archive.org/
Archived Internet content.
https://www.fandom.com/[edit | edit source]
License: https://www.fandom.com/licensing
Fan-made wiki.
Chinese[edit | edit source]
http://lingua.mtsu.edu/chinese-computing/[edit | edit source]
License: http://lingua.mtsu.edu/chinese-computing/copyright.html
Character frequency lists.
https://github.com/gwinterstein/Cifu[edit | edit source]
License: https://github.com/gwinterstein/Cifu/blob/master/LICENSE
Word frequency list for Yue Chinese.
https://www.tanos.co.uk/hsk/[edit | edit source]
License: https://www.tanos.co.uk/jlpt/sharing/
HSK data.
http://www.hskhsk.com/resources.html[edit | edit source]
License: http://www.hskhsk.com/resources.html
HSK data.
https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8[edit | edit source]
License: https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8
Frequent characters.
https://zh.wikisource.org/wiki/%E5%B8%B8%E7%94%A8%E5%9C%8B%E5%AD%97%E6%A8%99%E6%BA%96%E5%AD%97%E9%AB%94%E8%A1%A8[edit | edit source]
Frequent characters.
http://input.foruto.com/ccc/gongbiu/index.htm[edit | edit source]
License:
Frequent characters.
English[edit | edit source]
http://gcide.gnu.org.ua/download[edit | edit source]
License: http://gcide.gnu.org.ua/license
Some of its users: http://gcide.gnu.org.ua/
Dictionary of definition.
https://foldoc.org/source.html[edit | edit source]
License: https://foldoc.org/Free+On-line+Dictionary
Some of its users: https://foldoc.org/
Dictionary about computing.
https://github.com/tony-mak/Eng-Chi-Dictionary/tree/master/app/src/main/assets/databases[edit | edit source]
License: https://github.com/tony-mak/Eng-Chi-Dictionary/blob/master/LICENSE
Dictionary.
https://github.com/linuxkathirvel/eng2tamildictionary/blob/master/dictionary.json[edit | edit source]
License: https://github.com/linuxkathirvel/eng2tamildictionary/blob/master/License.txt
Dictionary.
https://github.com/derekchuank/high-frequency-vocabulary[edit | edit source]
License: https://github.com/derekchuank/high-frequency-vocabulary/blob/master/LICENSE
Dictionary.
https://github.com/kujirahand/EJDict/tree/master/src[edit | edit source]
License: https://github.com/kujirahand/EJDict/blob/master/LICENSE
Dictionary.
Hindi[edit | edit source]
http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/downloaderInfo.php[edit | edit source]
License: http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/index.php
Some of its users: http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/dict_search_user.php
Dictionary. Application is required.
Icelandic[edit | edit source]
https://www.ling.upenn.edu/~kurisuto/germanic/oi_cleasbyvigfusson_about.html[edit | edit source]
License: http://lexicon.ff.cuni.cz/txt/oi_cleasbyvigfusson.txt
Dictionary.
Interlingue
https://github.com/Carmina16/hunspell-ie
License: https://github.com/Carmina16/hunspell-ie/blob/master/LICENSE
Spell checker with dictionary.
Japanese
https://github.com/KanjiVG/kanjivg/releases/
License: http://kanjivg.tagaini.net/
Some of its users: https://www.tagaini.net/, https://jisho.org/
Kanji strokes.
https://github.com/mifunetoshiro/kanjium
License: https://github.com/mifunetoshiro/kanjium/blob/master/LICENSE.txt
Kanji data.
https://www.tanos.co.uk/jlpt/
License: https://www.tanos.co.uk/jlpt/sharing/
JLPT data.
https://ja.wiktionary.org/wiki/%E4%BB%98%E9%8C%B2:%E5%B8%B8%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7
License: http://www.bunka.go.jp/bunkacho_homepage/index.html
Frequent characters.
https://ja.wiktionary.org/wiki/Wiktionary:%E4%BA%BA%E5%90%8D%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7
License: http://www.moj.go.jp/term.html
Frequent characters for names.
https://ja.wikipedia.org/wiki/%E5%AD%A6%E5%B9%B4%E5%88%A5%E6%BC%A2%E5%AD%97%E9%85%8D%E5%BD%93%E8%A1%A8
License: http://www.mext.go.jp/b_menu/about_link.htm
Frequent characters according to school grades.
Korean
https://github.com/libhangul/libhangul/blob/master/data/hanja/hanja.txt
License: https://github.com/libhangul/libhangul/blob/master/data/hanja/hanja.txt
Words in Hangul and Hanja.
An introduction page is available at https://wiki.kldp.org/wiki.php/libhangul.
https://ko.wiktionary.org/wiki/%EB%B6%80%EB%A1%9D:%ED%95%9C%EB%AC%B8_%EA%B5%90%EC%9C%A1%EC%9A%A9_%EA%B8%B0%EC%B4%88_%ED%95%9C%EC%9E%90_1800
License: http://www.suneung.re.kr/sub/info.do?m=0601&s=suneung
The content of http://www.suneung.re.kr/boardCnts/fileDown.do?fileSeq=59692112e521efa80d2af27916704082 in an easy-to-copy form.
[edit | edit source]
License: https://www.topik.go.kr/usr/cmm/subLocation.do?menuSeq=2110702
Word list of TOPIK.
Lithuanian
https://github.com/ispell-lt/ispell-lt
License: https://github.com/ispell-lt/ispell-lt/blob/master/COPYING
Spell checker with dictionary.
Sanskrit
https://github.com/hemanth/sanskrit-dict/blob/master/dict.js
License: https://github.com/hemanth/sanskrit-dict/blob/master/license
Dictionary.
Slovak
http://sk-spell.sk.cx/hunspell-sk
License: http://sk-spell.sk.cx/hunspell-sk
Spell checker with dictionary.
Vietnamese
https://github.com/duyetdev/vietnamese-wordlist
License: https://github.com/duyetdev/vietnamese-wordlist/blob/master/LICENSE
Word list.
https://github.com/duyetdev/vietnamese-namedb
License: https://github.com/duyetdev/vietnamese-namedb/blob/master/LICENSE
Name list.
Non-language
https://unicode.org/ucd/
License: https://www.unicode.org/copyright.html
Some of its users: https://wiki.gnome.org/action/show/Apps/Gucharmap, http://www.decodeunicode.org/, https://unicode-table.com/en/, https://www.fontspace.com/
Unicode Character Database.
https://www.cia.gov/library/publications/download/
License: https://www.cia.gov/library/publications/the-world-factbook/docs/contributor_copyright.html
Some of its users: https://www.cia.gov/library/publications/resources/the-world-factbook/
General facts about countries and regions.
https://www.geonames.org/
License: https://www.geonames.org/
Free gazetteer and postal code data.
https://iso639-3.sil.org/code_tables/download_tables/
License: https://iso639-3.sil.org/code_tables/download_tables/
Some of its users: https://iso639-3.sil.org/code_tables/639/data, https://polyglotclub.com/
ISO 639-3 tables. The standard assigns each language a three-letter code and is updated every year.
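As a rough illustration of working with such a code table, the sketch below reads tab-separated text and maps each code to a language name. The column names `Id`, `Part1` and `Ref_Name`, the function name `load_language_names` and the three sample rows are assumptions made for this example; check the header row of the file you actually download.

```python
import csv
import io

# Illustrative sample only: the real table is much larger, and its header
# row may not match the column names assumed here.
SAMPLE = (
    "Id\tPart1\tRef_Name\n"
    "eng\ten\tEnglish\n"
    "epo\teo\tEsperanto\n"
    "vie\tvi\tVietnamese\n"
)


def load_language_names(text, code_column="Id", name_column="Ref_Name"):
    """Map each language code to its reference name."""
    # QUOTE_NONE keeps any double quotes in the data as literal characters.
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return {row[code_column]: row[name_column] for row in reader}


names = load_language_names(SAMPLE)
print(names["epo"])
```

For a downloaded file, the same function can be given the file's contents, for example via `open(path, encoding="utf-8").read()`.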
https://www.unicode.org/iso15924/codelists.html
License: https://www.unicode.org/copyright.html
Some of its users: http://www.unicode.org/iso15924/codelists.html
ISO 15924 lists. Codes for scripts.
https://www.unece.org/cefact/locode/welcome.html
License: https://www.unece.org/cefact/locode/locode_since1981.html
UN/LOCODE, an alternative to ISO 3166-2. It is updated twice a year.
http://www.nationalanthems.info/
License: http://www.nationalanthems.info/
National anthems.
Formats
Sheet
database name with link | file name | field separator | field 1 | field 2 | field 3 | field 4 | field 5 | field 6 | field 7 | field 8 | field 9 | field 10 | field 11 | field 12 | field 13 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
dictionary | | | | | | | | | | | | | | | |
An ordered and extended TOCFL word-list | tocfl.tsv | <tab> | Word | Pinyin | OtherPinyin | Level | First Translation | Other Translation | | | | | | | |
CC-Canto | cccanto-webdist.txt | <space> | Traditional | Simplified | [pin1 yin1] | {jyut6 ping3} | /English equivalent 1/equivalent 2/ | | | | | | | | |
CC-CEDICT | cedict_ts.u8 | <space> | Traditional | Simplified | [pin1 yin1] | /English equivalent 1/equivalent 2/ | | | | | | | | | |
CFDICT | CFDICT.u8 | <space> | Traditionnel | Simplifié | [pin1 yin1] | /traduction 1/traduction 2/ | | | | | | | | | |
CHDICT | CHDICT.u8 | <space> | Tradicionális | Egyszerűsített | [pin1 yin1] | /magyar egyenérték 1/egyenérték 2/ | | | | | | | | | |
ECDICT | ecdict.csv | , | word | phonetic | definition | translation | pos | collins | oxford | tag | bnc | frq | exchange | detail | audio |
English Persian Word Database | EnglishPersianWordDatabase.xlsx | | EnglishWord | PersianWord | | | | | | | | | | | |
ESPDIC | espdict.txt | : | Esperanto | English | | | | | | | | | | | |
HanDeDict | handedict.u8 | <space> | Traditionell | Vereinfacht | [pin1 yin1] | /deutsche Entsprechung 1/Entsprechung 2/ | | | | | | | | | |
libhangul | hanja.txt | : | Hangul | Hanja | note | | | | | | | | | | |
IEDICT | iedict.txt | : | Interlingua | English | | | | | | | | | | | |
Inglise-eesti sõnaraamat | eestiinglise.txt | <tab> | eesti | inglise | | | | | | | | | | | |
JLPT Vocabulary | VocabList.N1.doc VocabList.N2.doc VocabList.N3.doc VocabList.N4.doc VocabList.N5.doc | | Kanji | Hiragana | English | | | | | | | | | | |
kengdic | kengdic_2011.tsv | <tab> | wordid | word | ? | def | ? | ? | submitter | doe | ? | hanja | ? | ? | |
The Maryknoll Taiwanese-English Dictionary & English-Taiwanese Dictionary 2013 edition | Mkdictionary.xls | | Sort | Taiwanese | Chinese | English | | | | | | | | | |
VNEDICT | vnedict.txt | : | Vietnamese | English | | | | | | | | | | | |
word list | | | | | | | | | | | | | | | |
한국어능력시험 어휘목록 | 토픽 어휘 목록_공개 목록.xlsx | | 수준 | 어휘 | 길잡이말 | 품사 | | | | | | | | | |
古汉语单字字频: Character frequency list of Classical Chinese | CharFreq-Classical.xls | | Serial number; 序号 | Character; 汉字 | | | | | | | | | | | |
现代汉语单字字频: Character frequency list of Modern Chinese | CharFreq.txt | <tab> | Serial number; 序号 | Character; 汉字 | Individual raw frequency; 频率 | Cumulative frequency in percentile; 累计频率 | Pinyin; 拼音 | English translation; 英文翻译 | | | | | | | |
通用规范汉字表 | | | 编号 | 字形 | | | | | | | | | | | |
常用國字標準字體表 | | | 流水序 | 教育部字號 | Unicode | 常用字 | | | | | | | | | |
新汉语水平考试(HSK)词汇(2012年修订版) | HSK-2012.xls | | 单词(等级) | | | | | | | | | | | | |
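The CC-CEDICT row above describes each line as `Traditional Simplified [pin1 yin1] /English equivalent 1/equivalent 2/`. A minimal parsing sketch for that layout follows. The regular expression, the function name `parse_entry`, the sample line and the treatment of lines starting with `#` as comments are assumptions for illustration, not code from any of the listed projects. The extra `{jyut6 ping3}` field of CC-Canto is not handled here.

```python
import re

# One entry line: Traditional Simplified [pinyin] /definition 1/definition 2/
ENTRY = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


def parse_entry(line):
    """Return a dict for one entry line, or None if the line is not an entry."""
    if not line.strip() or line.startswith("#"):
        return None  # blank line or (assumed) comment line
    match = ENTRY.match(line)
    if match is None:
        return None  # line does not follow the expected layout
    traditional, simplified, pinyin, definitions = match.groups()
    return {
        "traditional": traditional,
        "simplified": simplified,
        "pinyin": pinyin,
        "definitions": definitions.split("/"),
    }


# Hypothetical sample line written in the layout described by the table.
sample = "中國 中国 [Zhong1 guo2] /China/Middle Kingdom/"
entry = parse_entry(sample)
print(entry["simplified"], entry["pinyin"], entry["definitions"])
```

Returning `None` for anything that does not match keeps the caller's loop simple: header comments and malformed lines can be skipped with a single check.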
Manually convert to TSV
file name | process (on Linux) |
---|---|
cccanto-webdist.txt | |
cedict_ts.u8 | |
CharFreq.txt | |
CharFreq-Classical.xls | |
CHDICT.u8 | |
ecdict.csv | |
eestiinglise.txt | |
EnglishPersianWordDatabase.xlsx | |
espdict.txt | |
handedict.u8 | |
hanja.txt | |
HSK-2012.xls | |
iedict.txt | |
kengdic_2011.tsv | |
Mkdictionary.xls | |
tocfl.tsv | |
vnedict.txt | |
토픽 어휘 목록_공개 목록.xlsx | |
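One possible way to do such a conversion for the comma- and colon-separated files, sketched with Python's standard `csv` module rather than Linux shell commands. The function name `delimited_to_tsv` and both sample inputs are hypothetical, and real files may need extra handling, for example when a definition itself contains the separator character.

```python
import csv
import io


def delimited_to_tsv(text, delimiter):
    """Re-emit delimiter-separated text as tab-separated text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        if row:  # skip blank lines
            # Trim the spaces that surround separators such as " : ".
            writer.writerow([field.strip() for field in row])
    return out.getvalue()


# Comma-separated input with a quoted field that contains a comma.
print(delimited_to_tsv('word,definition\nhello,"a greeting, informal"\n', ","))

# Colon-separated input in the style of the "Esperanto : English" rows.
print(delimited_to_tsv("saluton : hello\n", ":"))
```

Using the `csv` reader instead of a plain `split` means quoted fields in a CSV file are kept in one piece when they contain the delimiter.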
Others
database name with link | format |
---|---|
FreeDict | slob |
Free Vietnamese Dictionary Project | dict.dz |
XOBDO.ORG | db |