Difference between revisions of "Language/Multiple-languages/Culture/Licensed-Free-Databases"

From Polyglot Club WIKI

Latest revision as of 23:17, 26 March 2023

Licensed-Free Databases Around Languages

Hi polyglots! 😀

➡ On this page we have listed free databases related to languages.

  • The listed items are data, so if you don't know programming, this page might not be of much help to you.

Main

Multiple languages

https://www.ethnologue.com/codes/download-code-tables

LanguageCodes.tab lists the 7,400+ distinct language identifiers used in the current Ethnologue database.

https://dumps.wikimedia.org/

License: https://dumps.wikimedia.org/legal.html

Some of its users: https://www.wikimedia.org/

Wikimedia.

https://tatoeba.org/eng/downloads/

License: https://tatoeba.org/eng/downloads/

Some of its users: https://tatoeba.org/, http://www.listeningpractice.org/, https://jisho.org/

Parallel corpora: in plain terms, collections of the same sentence translated into different languages.

https://wiki.documentfoundation.org/Language_support_of_LibreOffice

License: https://wiki.documentfoundation.org/Language_support_of_LibreOffice

Some of its users: https://www.libreoffice.org/

Here you can find the “Spell check dictionaries” and other useful language resources.

http://www.gutenberg.org/wiki/Gutenberg:Information_About_Robot_Access_to_our_Pages

License: http://www.gutenberg.org/wiki/Gutenberg:Terms_of_Use

Some of its users: http://www.gutenberg.org/, https://librivox.org/

Ebooks.

https://librivox.org/pages/about-librivox/

License: https://librivox.org/pages/about-librivox/

Some of its users: https://librivox.org/, http://www.listeningpractice.org/

Audio books.

http://www.omegawiki.org/Help:Downloading_the_data

License: http://www.omegawiki.org/Meta:Main_Page

Some of its users: http://www.omegawiki.org/Meta:Main_Page, http://dictionarymid.sourceforge.net/

Dictionaries.

https://ltrc.iiit.ac.in/onlineServices/Dictionaries/Dict_Frame.html

License: https://ltrc.iiit.ac.in/onlineServices/Dictionaries/GPLHelp.html

Dictionaries for South Asian languages and English.

http://compling.hss.ntu.edu.sg/omw/

License: http://compling.hss.ntu.edu.sg/omw/

Some of its users: http://compling.hss.ntu.edu.sg/omw/cgi-bin/wn-gridx.cgi?gridmode=grid

Wordnets.

http://www.dicto.org.ru/xdxf.html

License: http://dicto.org.ru/license.html

Some of its users: http://dicto.org.ru/

Repository of dictionaries (from elsewhere).

http://shtooka.net/download.php

License: http://shtooka.net/

Collections of audio.

https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists

License: https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists

Frequency lists.

https://lego.linguistlist.org/about#contact

License: https://lego.linguistlist.org/about#copyright

Some of its users: https://lego.linguistlist.org/

Lexicon. No download link on the website.

https://panlex.org/source-list/

License: https://panlex.org/license/

Some of its users: https://glosbe.com

Lexical database links.

https://github.com/cburgmer/cjklib

License: https://github.com/cburgmer/cjklib/blob/master/COPYING

Some of its users: https://www.skishore.me/makemeahanzi/

Data about Han script.

https://www.radio-browser.info/gui/#!/

License: https://www.radio-browser.info/gui/#!/

Some of its users: https://github.com/segler-alex/RadioDroid

Database of radio stations.

https://help.archive.org/hc/en-us/articles/360017781111-How-to-download-files-

License: https://www.archive.org/about/terms.php

Some of its users: https://www.archive.org/

Archived Internet content.

https://www.fandom.com/

License: https://www.fandom.com/licensing

Fan-made wiki.

Chinese

http://lingua.mtsu.edu/chinese-computing/

License: http://lingua.mtsu.edu/chinese-computing/copyright.html

Character frequency lists.

https://github.com/gwinterstein/Cifu

License: https://github.com/gwinterstein/Cifu/blob/master/LICENSE

Word frequency list for Yue Chinese.

https://www.tanos.co.uk/hsk/

License: https://www.tanos.co.uk/jlpt/sharing/

HSK data.

http://www.hskhsk.com/resources.html

License: http://www.hskhsk.com/resources.html

HSK data.

https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8

License: https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8

Frequent characters.

https://zh.wikisource.org/wiki/%E5%B8%B8%E7%94%A8%E5%9C%8B%E5%AD%97%E6%A8%99%E6%BA%96%E5%AD%97%E9%AB%94%E8%A1%A8

License: https://zh.wikisource.org/wiki/%E5%B8%B8%E7%94%A8%E5%9C%8B%E5%AD%97%E6%A8%99%E6%BA%96%E5%AD%97%E9%AB%94%E8%A1%A8

Frequent characters.

http://input.foruto.com/ccc/gongbiu/index.htm

License:

Frequent characters.

English

http://gcide.gnu.org.ua/download

License: http://gcide.gnu.org.ua/license

Some of its users: http://gcide.gnu.org.ua/

Dictionary of definitions.

https://foldoc.org/source.html

License: https://foldoc.org/Free+On-line+Dictionary

Some of its users: https://foldoc.org/

Dictionary about computing.

https://github.com/tony-mak/Eng-Chi-Dictionary/tree/master/app/src/main/assets/databases

License: https://github.com/tony-mak/Eng-Chi-Dictionary/blob/master/LICENSE

Dictionary.

https://github.com/linuxkathirvel/eng2tamildictionary/blob/master/dictionary.json

License: https://github.com/linuxkathirvel/eng2tamildictionary/blob/master/License.txt

Dictionary.

https://github.com/derekchuank/high-frequency-vocabulary

License: https://github.com/derekchuank/high-frequency-vocabulary/blob/master/LICENSE

Dictionary.

https://github.com/kujirahand/EJDict/tree/master/src

License: https://github.com/kujirahand/EJDict/blob/master/LICENSE

Dictionary.

Hindi

http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/downloaderInfo.php

License: http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/index.php

Some of its users: http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/dict_search_user.php

Dictionary. Application is required.

Icelandic

https://www.ling.upenn.edu/~kurisuto/germanic/oi_cleasbyvigfusson_about.html

License: http://lexicon.ff.cuni.cz/txt/oi_cleasbyvigfusson.txt

Dictionary.

Interlingue

https://github.com/Carmina16/hunspell-ie

License: https://github.com/Carmina16/hunspell-ie/blob/master/LICENSE

Spell checker with dictionary.

Japanese

https://github.com/KanjiVG/kanjivg/releases/

License: http://kanjivg.tagaini.net/

Some of its users: https://www.tagaini.net/, https://jisho.org/

Kanji strokes.

https://github.com/mifunetoshiro/kanjium

License: https://github.com/mifunetoshiro/kanjium/blob/master/LICENSE.txt

Kanji data.

https://www.tanos.co.uk/jlpt/

License: https://www.tanos.co.uk/jlpt/sharing/

JLPT data.

https://ja.wiktionary.org/wiki/%E4%BB%98%E9%8C%B2:%E5%B8%B8%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7

License: http://www.bunka.go.jp/bunkacho_homepage/index.html

Frequent characters.

https://ja.wiktionary.org/wiki/Wiktionary:%E4%BA%BA%E5%90%8D%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7

License: http://www.moj.go.jp/term.html

Frequent characters for names.

https://ja.wikipedia.org/wiki/%E5%AD%A6%E5%B9%B4%E5%88%A5%E6%BC%A2%E5%AD%97%E9%85%8D%E5%BD%93%E8%A1%A8

License: http://www.mext.go.jp/b_menu/about_link.htm

Frequent characters according to school grades.

Korean

https://github.com/libhangul/libhangul/blob/master/data/hanja/hanja.txt

License: https://github.com/libhangul/libhangul/blob/master/data/hanja/hanja.txt

Words in Hangul and Hanja.

There is a page of introduction: https://wiki.kldp.org/wiki.php/libhangul.

https://ko.wiktionary.org/wiki/%EB%B6%80%EB%A1%9D:%ED%95%9C%EB%AC%B8_%EA%B5%90%EC%9C%A1%EC%9A%A9_%EA%B8%B0%EC%B4%88_%ED%95%9C%EC%9E%90_1800

License: http://www.suneung.re.kr/sub/info.do?m=0601&s=suneung

The list from http://www.suneung.re.kr/boardCnts/fileDown.do?fileSeq=59692112e521efa80d2af27916704082 in an easy-to-copy form.

https://www.topik.go.kr/usr/cmm/subLocation.do?menuSeq=2110503&boardSeq=64217

License: https://www.topik.go.kr/usr/cmm/subLocation.do?menuSeq=2110702

Word list of TOPIK.

Lithuanian

https://github.com/ispell-lt/ispell-lt

License: https://github.com/ispell-lt/ispell-lt/blob/master/COPYING

Spell checker with dictionary.

Sanskrit

https://github.com/hemanth/sanskrit-dict/blob/master/dict.js

License: https://github.com/hemanth/sanskrit-dict/blob/master/license

Dictionary.

Slovak

http://sk-spell.sk.cx/hunspell-sk

License: http://sk-spell.sk.cx/hunspell-sk

Spell checker with dictionary.

Vietnamese

https://github.com/duyetdev/vietnamese-wordlist

License: https://github.com/duyetdev/vietnamese-wordlist/blob/master/LICENSE

Word list.

https://github.com/duyetdev/vietnamese-namedb

License: https://github.com/duyetdev/vietnamese-namedb/blob/master/LICENSE

Name list.

Non-language

https://unicode.org/ucd/

License: https://www.unicode.org/copyright.html

Some of its users: https://wiki.gnome.org/action/show/Apps/Gucharmap, http://www.decodeunicode.org/, https://unicode-table.com/en/, https://www.fontspace.com/

Unicode.

https://www.cia.gov/library/publications/download/

License: https://www.cia.gov/library/publications/the-world-factbook/docs/contributor_copyright.html

Some of its users: https://www.cia.gov/library/publications/resources/the-world-factbook/

General facts about countries and regions.

https://www.geonames.org/

License: https://www.geonames.org/

Free gazetteer and postal code data.

https://iso639-3.sil.org/code_tables/download_tables/

License: https://iso639-3.sil.org/code_tables/download_tables/

Some of its users: https://iso639-3.sil.org/code_tables/639/data, https://polyglotclub.com/

ISO 639-3 tables. The standard assigns each language a code and is updated every year.

https://www.unicode.org/iso15924/codelists.html

License: https://www.unicode.org/copyright.html

Some of its users: http://www.unicode.org/iso15924/codelists.html

ISO 15924 lists. Codes for scripts.

https://www.unece.org/cefact/locode/welcome.html

License: https://www.unece.org/cefact/locode/locode_since1981.html

UN/LOCODE, an alternative to ISO 3166-2. It is updated twice a year.

http://www.nationalanthems.info/

License: http://www.nationalanthems.info/

National anthems.

Formats

Sheet

database name with link | file name | field separator | field 1 | field 2 | field 3 | field 4 | field 5 | field 6 | field 7 | field 8 | field 9 | field 10 | field 11 | field 12 | field 13
dictionary
An ordered and extended TOCFL word-list | tocfl.tsv | <tab> | Word | Pinyin | OtherPinyin | Level | First Translation | Other Translation
CC-Canto | cccanto-webdist.txt | <space> | Traditional | Simplified | [pin1 yin1] | {jyut6 ping3} | /English equivalent 1/equivalent 2/
CC-CEDICT | cedict_ts.u8 | <space> | Traditional | Simplified | [pin1 yin1] | /English equivalent 1/equivalent 2/
CFDICT | CFDICT.u8 | <space> | Traditionnel | Simplifié | [pin1 yin1] | /traduction 1/traduction2/
CHDICT | CHDICT.u8 | <space> | Tradicionális | Egyszerűsített | [pin1 yin1] | /magyar egyenérték 1/ egyenérték 2
ECDICT | ecdict.csv | , | word | phonetic | definition | translation | pos | collins | oxford | tag | bnc | frq | exchange | detail | audio
English Persian Word Database | EnglishPersianWordDatabase.xlsx | | EnglishWord | PersianWord
ESPDIC | espdict.txt | : | Esperanto | English
HanDeDict | handedict.u8 | <space> | Traditionel | Vereinfacht | [pin1 yin1] | /deutsche Entsprechung 1 /Entsprechung 2/
libhangul | hanja.txt | : | Hangul | Hanja | note
IEDICT | iedict.txt | : | Interlingua | English
Inglise-eesti sõnaraamat | eestiinglise.txt | <tab> | eeste | inglise
JLPT Vocabulary | VocabList.N1.doc, VocabList.N2.doc, VocabList.N3.doc, VocabList.N4.doc, VocabList.N5.doc | | Kanji | Hiragana | English
kengdic | kengdic_2011.tsv | <tab> | wordid | word | ? | def | ? | ? | submitter | doe | ? | hanja | ? | ?
The Maryknoll Taiwanese-English Dictionary & English-Taiwanese Dictionary 2013 edition | Mkdictionary.xls | | Sort | Taiwanese | Chinese | English
VNEDICT | vnedict.txt | : | Vietnamese | English
word list
한국어능력시험 어휘목록 | 토픽 어휘 목록_공개 목록.xlsx | | 수준 | 어휘 | 길잡이말 | 품사
古汉语单字字频: Character frequency list of Classical Chinese | CharFreq-Classical.xls | | Serial number; 序号 | Character; 汉字
现代汉语单字字频: Character frequency list of Modern Chinese | CharFreq.txt | <tab> | Serial number; 序号 | Character; 汉字 | Individual raw frequency; 频率 | Cumulative frequency in percentile; 累计频率 | Pinyin; 拼音 | English translation; 英文翻译
通用规范汉字表 | | | 编号 | 字形
常用國字標準字體表 | | | 流水序 | 教育部字號 | Unicode | 常用字
新汉语水平考试(HSK)词汇(2012年修订版) | HSK-2012.xls | | 单词(等级)

Manually convert to TSV

file name process (on Linux)
cccanto-webdist.txt
  1. Delete lines starting with '#';
  2. Replace the first ' ' in each line with '\t';
  3. Replace the first ' [' in each line with '\t';
  4. Replace '] {' with '\t';
  5. Replace '} /' with '\t';
  6. Replace ' # adapted from cc-cedict' with '';
  7. Replace '/\n' with '\n';
  8. Add 'Traditional\tSimplified\tpin1 yin1\tjyut6 ping3\tEnglish equivalent 1/equivalent 2\n' at the beginning;
cedict_ts.u8
  1. Delete lines starting with '#';
  2. Replace the first ' ' in each line with '\t';
  3. Replace the first ' [' in each line with '\t';
  4. Replace '] /' with '\t';
  5. Replace '/\n' with '\n';
  6. Add 'Traditional\tSimplified\tpin1 yin1\tEnglish equivalent 1/equivalent 2\n' at the beginning;
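The cedict_ts.u8 steps above can also be scripted. The following is a minimal sketch, not the page author's method: it swaps the sequential search-and-replace steps for a single regular expression that assumes every entry follows the `Traditional Simplified [pin1 yin1] /equivalent 1/equivalent 2/` layout. The function name `cedict_to_tsv` and the sample lines are made up for illustration.

```python
import re

# One entry line, assuming the layout:
# Traditional Simplified [pin1 yin1] /equivalent 1/equivalent 2/
ENTRY = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")

HEADER = "Traditional\tSimplified\tpin1 yin1\tEnglish equivalent 1/equivalent 2"


def cedict_to_tsv(lines):
    """Yield TSV rows for CC-CEDICT style lines, skipping '#' comments."""
    yield HEADER
    for line in lines:
        if line.startswith("#"):
            continue  # step 1: drop comment lines
        match = ENTRY.match(line)
        if match is None:
            continue  # not an entry line; skipping it is an assumption
        # The four captured groups become the four tab-separated columns.
        yield "\t".join(match.groups())


# Made-up excerpt for illustration only.
sample = [
    "# CC-CEDICT sample\n",
    "中國 中国 [Zhong1 guo2] /China/Middle Kingdom/\n",
]
rows = list(cedict_to_tsv(sample))
```

For a real file, the lines would come from something like `open("cedict_ts.u8", encoding="utf-8")`; the UTF-8 encoding is an assumption based on the `.u8` extension.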
CharFreq.txt
  1. Delete lines starting with '/';
  2. Delete fields 3, 4;
  3. Add '序列号\t汉字\t拼音\t英文翻译' at the beginning;
CharFreq-Classical.xls
  1. Delete the first row;
  2. Delete fields 3, 4;
  3. Save as TSV file or save as CSV file and select '<tab>' as field separator;
CHDICT.u8
  1. Delete lines starting with '#';
  2. Replace '\n\n' with '\n';
  3. Replace the first ' ' in each line with '\t';
  4. Replace the first ' [' in each line with '\t';
  5. Replace '] /' with '\t';
  6. Replace '/\n' with '\n';
  7. Add 'Tradicionális\tEgyszerűsített\tpin1 yin1\tmagyar egyenérték 1/ egyenérték 2\n' at the beginning;
ecdict.csv
  1. Open with a spreadsheet program;
  2. Save as TSV file or save as CSV file and select '<tab>' as field separator;
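If no spreadsheet program is at hand, the same comma-to-tab conversion can be sketched with Python's standard `csv` module. This is only an illustration under assumptions: the helper name `csv_to_tsv` and the two-line sample are made up, and the real ecdict.csv has the columns listed in the Sheet table.

```python
import csv
import io


def csv_to_tsv(src, dst):
    """Copy rows from a comma-separated stream to a tab-separated one."""
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in csv.reader(src):
        writer.writerow(row)


# Made-up sample; note the quoted field that contains a comma.
src = io.StringIO('word,definition\nhello,"a greeting, informal"\n')
dst = io.StringIO()
csv_to_tsv(src, dst)
tsv_text = dst.getvalue()
```

With real files, both would be opened with `newline=""` as the `csv` documentation recommends. A field that itself contains a tab or a line break is still written in quotes, so the output stays parseable.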
eestiinglise.txt
  1. Add 'eeste\tinglise\n' at the beginning;
EnglishPersianWordDatabase.xlsx
  1. Open with a spreadsheet program;
  2. Save as TSV file or save as CSV file and select '<tab>' as field separator;
espdict.txt
  1. Delete the line starting with '#';
  2. Replace ' : ' with '\t';
  3. Add 'Esperanto\tEnglish\n' at the beginning;
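The three steps above are short enough to script. A minimal sketch follows, assuming each entry has the form `headword : translation`; the function name `colon_dict_to_tsv` and the sample entries are hypothetical. Unlike the listed step 2, it splits only at the first ' : ', so a later ' : ' inside a translation stays in the second column.

```python
def colon_dict_to_tsv(text, header):
    """Convert 'headword : translation' lines to TSV with a header row."""
    rows = [header]
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # step 1: drop the comment line
        headword, sep, gloss = line.partition(" : ")
        if sep:
            rows.append(headword + "\t" + gloss)  # step 2, first ' : ' only
        else:
            rows.append(line)  # no separator found; kept unchanged
    return "\n".join(rows) + "\n"


# Made-up excerpt for illustration only.
sample = "# ESPDIC sample\nhundo : dog\nkato : cat\n"
tsv = colon_dict_to_tsv(sample, "Esperanto\tEnglish")
```

The same function should work for the other ' : ' separated files on this page by passing a different header, for example `"Vietnamese\tEnglish"`.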
handedict.u8
  1. Delete lines starting with '#';
  2. Replace '\n\n' with '\n';
  3. Replace the first ' ' in each line with '\t';
  4. Replace the first ' [' in each line with '\t';
  5. Replace '] /' with '\t';
  6. Replace '/\n' with '\n';
  7. Add 'Traditionel\tVereinfacht\tpin1 yin1\tdeutsche Entsprechung 1/Entsprechung 2\n' at the beginning;
hanja.txt
  1. Delete lines starting with '#';
  2. Replace ':' with '\t';
  3. Add 'Hangul\tHanja\tnote' at the beginning;
HSK-2012.xls
  1. Open with a spreadsheet program;
  2. Save as TSV file or save as CSV file and select '<tab>' as field separator;
  3. Open the new file;
  4. Replace '(' with '\t';
  5. Replace ')' with '';
  6. Add '单词\t等级\n' at the beginning;
iedict.txt
  1. Delete the line starting with '#';
  2. Replace ' : ' with '\t';
  3. Add 'Interlingua\tEnglish\n' at the beginning;
kengdic_2011.tsv
  1. Delete fields 1, 3, 5, 6, 7, 8, 9, 11, 12;
  2. Add 'word\tdef\thanja\n' at the beginning;
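Deleting fields 1, 3, 5, 6, 7, 8, 9, 11 and 12 amounts to keeping fields 2, 4 and 10. A minimal sketch of that column selection follows, assuming every data line has at least ten tab-separated fields; the function name `kengdic_columns` and the sample row are made up for illustration.

```python
KEEP = (1, 3, 9)  # zero-based positions of fields 2 (word), 4 (def), 10 (hanja)


def kengdic_columns(lines):
    """Return TSV rows holding only the word, def and hanja fields."""
    rows = ["word\tdef\thanja"]
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # too short to contain field 10; skipped (an assumption)
        rows.append("\t".join(fields[i] for i in KEEP))
    return rows


# Made-up 12-field row for illustration only.
sample = ["1\t학교\t-\tschool\t-\t-\tanon\t2011\t-\t學校\t-\t-\n"]
rows = kengdic_columns(sample)
```

The sketch does not check whether the file starts with its own header row; if it does, that row would need to be dropped first.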
Mkdictionary.xls
  1. Open with a spreadsheet program;
  2. Save as TSV file or save as CSV file and select '<tab>' as field separator;
tocfl.tsv
  1. Replace '"\t"' with '\t';
  2. Replace '"\n"' with '\n';
  3. Replace the first '"' with '';
  4. Replace the last '"' with '';
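The four replacements above strip the double quotes that wrap every field. As a hedged alternative, Python's `csv` reader can do the unquoting; this is a swapped-in technique rather than the steps as written, and it also collapses any doubled quotes inside a field. The function name `unquote_tsv` and the sample rows are made up.

```python
import csv
import io


def unquote_tsv(src):
    """Read quoted, tab-separated rows and return them without the quotes."""
    reader = csv.reader(src, delimiter="\t", quotechar='"')
    return ["\t".join(row) for row in reader]


# Made-up excerpt for illustration only.
sample = io.StringIO('"Word"\t"Pinyin"\t"Level"\n"你好"\t"nǐ hǎo"\t"1"\n')
rows = unquote_tsv(sample)
```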
vnedict.txt
  1. Delete the line starting with '#';
  2. Replace ' : ' with '\t';
  3. Add 'Vietnamese\tEnglish\n' at the beginning;
토픽 어휘 목록_공개 목록.xlsx
  1. Open with a spreadsheet program;
  2. Save as TSV file or save as CSV file and select '<tab>' as field separator;
  3. Switch to the other sheet tab;
  4. Save as TSV file or save as CSV file and select '<tab>' as field separator;

Others

database name with link | format
FreeDict | slob
Free Vietnamese Dictionary Project | dict.dz
XOBDO.ORG | db
