Difference between revisions of "Language/Multiple-languages/Culture/Internet-Vocabularies"

From Polyglot Club WIKI
Line 9: Line 9:
== Common word/character list ==
== Common word/character list ==
Multiple languages https://en.wiktionary.org/wiki/Appendix:Swadesh_lists
Multiple languages https://en.wiktionary.org/wiki/Appendix:Swadesh_lists
Chinese
* https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8
* https://zh.wikisource.org/wiki/%E5%B8%B8%E7%94%A8%E5%9C%8B%E5%AD%97%E6%A8%99%E6%BA%96%E5%AD%97%E9%AB%94%E8%A1%A8


English https://github.com/HK-SHAO/English-Dictionary/blob/master/word/words.txt
English https://github.com/HK-SHAO/English-Dictionary/blob/master/word/words.txt
Line 17: Line 21:


Korean https://www.topik.go.kr/usr/cmm/subLocation.do?menuSeq=2110503&boardSeq=64217
Korean https://www.topik.go.kr/usr/cmm/subLocation.do?menuSeq=2110503&boardSeq=64217
Mandarin Chinese
* https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8
* https://zh.wikisource.org/wiki/%E5%B8%B8%E7%94%A8%E5%9C%8B%E5%AD%97%E6%A8%99%E6%BA%96%E5%AD%97%E9%AB%94%E8%A1%A8


Thai https://github.com/nv23/thai-wordlist
Thai https://github.com/nv23/thai-wordlist
Line 29: Line 29:
Multiple languages https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists
Multiple languages https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists


Chinese https://humanum.arts.cuhk.edu.hk/Lexis/chifreq/
Chinese
 
* https://humanum.arts.cuhk.edu.hk/Lexis/chifreq/
Kannada https://github.com/kakashi/kannada_IN_dictionary
* https://humanum.arts.cuhk.edu.hk/Lexis/lexi-can/faq.php
 
Mandarin Chinese
* https://lingua.mtsu.edu/chinese-computing/statistics/index.html
* https://lingua.mtsu.edu/chinese-computing/statistics/index.html
* http://technology.chtsai.org/charfreq/
* http://technology.chtsai.org/charfreq/


Yue Chinese https://humanum.arts.cuhk.edu.hk/Lexis/lexi-can/faq.php
Kannada https://github.com/kakashi/kannada_IN_dictionary


== Graded list ==
== Graded list ==
Chinese
* http://www.chinesetest.cn/godownload.do#list_1
* http://www.tw.org/tocfl/
Japanese https://ja.wikipedia.org/wiki/%E5%AD%A6%E5%B9%B4%E5%88%A5%E6%BC%A2%E5%AD%97%E9%85%8D%E5%BD%93%E8%A1%A8
Japanese https://ja.wikipedia.org/wiki/%E5%AD%A6%E5%B9%B4%E5%88%A5%E6%BC%A2%E5%AD%97%E9%85%8D%E5%BD%93%E8%A1%A8


Korean https://ko.wiktionary.org/wiki/%EB%B6%80%EB%A1%9D:%ED%95%9C%EB%AC%B8_%EA%B5%90%EC%9C%A1%EC%9A%A9_%EA%B8%B0%EC%B4%88_%ED%95%9C%EC%9E%90_1800
Korean https://ko.wiktionary.org/wiki/%EB%B6%80%EB%A1%9D:%ED%95%9C%EB%AC%B8_%EA%B5%90%EC%9C%A1%EC%9A%A9_%EA%B8%B0%EC%B4%88_%ED%95%9C%EC%9E%90_1800
Mandarin Chinese
* http://www.chinesetest.cn/godownload.do#list_1
* http://www.tw.org/tocfl/


== Spell checker ==
== Spell checker ==

Revision as of 12:10, 17 April 2021


This page collects vocabularies to memorise. It is not to be confused with Language/Multiple-languages/Culture/Internet-Dictionaries: the word lists here have no translations, definitions or pronunciations. Programs that apply such word lists are also listed.

To make use of these lists, merge them with meaning data from a dictionary. This may require web scraping.
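As a rough illustration of the merging step, here is a minimal Python sketch. It assumes the word list is already loaded as a list of strings and the dictionary data as a plain mapping from word to meaning. The function name `merge_word_list` and the sample data are hypothetical, and obtaining the dictionary data (for example by scraping) is not covered.

```python
def merge_word_list(words, meanings):
    """Pair each word with its meaning, keeping the word-list order.

    `words` is a list of strings from a word list; `meanings` is a
    mapping from word to meaning taken from some dictionary source.
    Both shapes are assumptions made for this sketch.
    """
    merged = []   # (word, meaning) pairs, in word-list order
    missing = []  # words the dictionary data does not cover
    for word in words:
        key = word.strip()
        if not key:
            continue  # skip blank lines in the word list
        if key in meanings:
            merged.append((key, meanings[key]))
        else:
            missing.append(key)
    return merged, missing


# Hypothetical sample data, for illustration only.
sample_words = ["water", "fire", "", "stone"]
sample_meanings = {"water": "a clear liquid", "stone": "a piece of rock"}

merged, missing = merge_word_list(sample_words, sample_meanings)
```

Keeping the unmatched words in a separate list shows which entries still need meaning data from another source.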

In progress.

Common word/character list

Multiple languages https://en.wiktionary.org/wiki/Appendix:Swadesh_lists

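Chinese
* https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8
* https://zh.wikisource.org/wiki/%E5%B8%B8%E7%94%A8%E5%9C%8B%E5%AD%97%E6%A8%99%E6%BA%96%E5%AD%97%E9%AB%94%E8%A1%A8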

English https://github.com/HK-SHAO/English-Dictionary/blob/master/word/words.txt

Japanese

Korean https://www.topik.go.kr/usr/cmm/subLocation.do?menuSeq=2110503&boardSeq=64217

Thai https://github.com/nv23/thai-wordlist

Vietnamese https://www.chunom.org/

Frequency list

Multiple languages https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists

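Chinese
* https://humanum.arts.cuhk.edu.hk/Lexis/chifreq/
* https://humanum.arts.cuhk.edu.hk/Lexis/lexi-can/faq.php
* https://lingua.mtsu.edu/chinese-computing/statistics/index.html
* http://technology.chtsai.org/charfreq/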

Kannada https://github.com/kakashi/kannada_IN_dictionary

Graded list

Chinese
* http://www.chinesetest.cn/godownload.do#list_1
* http://www.tw.org/tocfl/

Japanese https://ja.wikipedia.org/wiki/%E5%AD%A6%E5%B9%B4%E5%88%A5%E6%BC%A2%E5%AD%97%E9%85%8D%E5%BD%93%E8%A1%A8

Korean https://ko.wiktionary.org/wiki/%EB%B6%80%EB%A1%9D:%ED%95%9C%EB%AC%B8_%EA%B5%90%EC%9C%A1%EC%9A%A9_%EA%B8%B0%EC%B4%88_%ED%95%9C%EC%9E%90_1800

Spell checker

Some of these require familiarity with GNU Aspell and Hunspell. The word lists are found among the spell checkers' source files (a .txt file for Aspell, a .dic file for Hunspell).

Multiple languages

Croatian https://github.com/spideyfusion/elasticsearch-croatian

Indonesian https://github.com/shuLhan/hunspell-id

Kazakh https://github.com/taem/hunspell-kk
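
To pull a plain word list out of a Hunspell .dic file such as those in the repositories above, a sketch along the following lines may help. It relies on the usual .dic layout: an entry count on the first line, then one entry per line, optionally followed by a slash and affix flags. The function name `words_from_hunspell_dic` and the sample entries and flags are hypothetical. This is a simplification that ignores escaped slashes and does not expand affix rules, so it returns only the stems listed in the file.

```python
def words_from_hunspell_dic(text):
    """Return the bare stems listed in the text of a Hunspell .dic file."""
    lines = text.splitlines()
    # The first line is conventionally an approximate entry count.
    if lines and lines[0].strip().isdigit():
        lines = lines[1:]
    words = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue
        # Drop tab-separated morphological fields, if any.
        entry = entry.split("\t", 1)[0]
        # Drop affix flags after the first slash (escaped slashes
        # are not handled in this sketch).
        word = entry.split("/", 1)[0].strip()
        if word:
            words.append(word)
    return words


# Hypothetical .dic content, for illustration only.
sample_dic = "3\nrumah/A\nmakan\nbuku/BC\n"
stems = words_from_hunspell_dic(sample_dic)
```

Because affix rules from the matching .aff file are not applied, inflected or derived forms that Hunspell would accept do not appear in the result.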