Language/Multiple-languages/Culture/Internet-Vocabularies
Revision as of 12:10, 17 April 2021
On this page you will find vocabulary lists to memorise. This page is not to be confused with Language/Multiple-languages/Culture/Internet-Dictionaries: the word lists here have no translations, definitions or pronunciations. The page also lists programs that use such word lists.
To make these lists useful for study, merge them with meaning data from a dictionary. Collecting that data may require web scraping.
In progress.
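The merging step described in the introduction can be sketched in Python as shown below. This is a minimal sketch with two assumptions. First, the word list is plain text with one entry per line, as in the English words.txt linked below. Second, the dictionary data has already been scraped into hypothetical tab-separated `word<TAB>definition` rows. The function names are illustrative and are not part of any project listed on this page.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def load_word_list(lines: Iterable[str]) -> List[str]:
    """Read a plain word list: one entry per line.

    Blank lines are skipped. Duplicates are dropped, but the first-seen order is kept.
    """
    seen = set()
    words = []
    for line in lines:
        word = line.strip()
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return words


def load_definitions(lines: Iterable[str]) -> Dict[str, str]:
    """Parse hypothetical 'word<TAB>definition' rows, e.g. output of a scraper.

    Rows without a tab are ignored. The first definition seen for a word wins.
    """
    defs: Dict[str, str] = {}
    for line in lines:
        if "\t" in line:
            word, definition = line.rstrip("\n").split("\t", 1)
            defs.setdefault(word, definition)
    return defs


def merge(words: List[str], defs: Dict[str, str]) -> List[Tuple[str, Optional[str]]]:
    """Pair each word with its definition, or None if the dictionary lacks it."""
    return [(word, defs.get(word)) for word in words]
```

Words that come back paired with `None` are the gaps still to be filled, for example by another scraping pass or a second dictionary.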
Common word/character list
Multiple languages https://en.wiktionary.org/wiki/Appendix:Swadesh_lists
Chinese
- https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8
- https://zh.wikisource.org/wiki/%E5%B8%B8%E7%94%A8%E5%9C%8B%E5%AD%97%E6%A8%99%E6%BA%96%E5%AD%97%E9%AB%94%E8%A1%A8
English https://github.com/HK-SHAO/English-Dictionary/blob/master/word/words.txt
Japanese
- https://ja.wiktionary.org/wiki/%E4%BB%98%E9%8C%B2:%E5%B8%B8%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7
- https://ja.wiktionary.org/wiki/Wiktionary:%E4%BA%BA%E5%90%8D%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7
Korean https://www.topik.go.kr/usr/cmm/subLocation.do?menuSeq=2110503&boardSeq=64217
Thai https://github.com/nv23/thai-wordlist
Vietnamese https://www.chunom.org/
Frequency list
Multiple languages https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists
Chinese
- https://humanum.arts.cuhk.edu.hk/Lexis/chifreq/
- https://humanum.arts.cuhk.edu.hk/Lexis/lexi-can/faq.php
- https://lingua.mtsu.edu/chinese-computing/statistics/index.html
- http://technology.chtsai.org/charfreq/
Kannada https://github.com/kakashi/kannada_IN_dictionary
Graded list
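Chinese
- http://www.chinesetest.cn/godownload.do#list_1
- http://www.tw.org/tocfl/
Japanese https://ja.wikipedia.org/wiki/%E5%AD%A6%E5%B9%B4%E5%88%A5%E6%BC%A2%E5%AD%97%E9%85%8D%E5%BD%93%E8%A1%A8
Korean https://ko.wiktionary.org/wiki/%EB%B6%80%EB%A1%9D:%ED%95%9C%EB%AC%B8_%EA%B5%90%EC%9C%A1%EC%9A%A9_%EA%B8%B0%EC%B4%88_%ED%95%9C%EC%9E%90_1800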
Spell checker
Some of these resources require familiarity with GNU Aspell or Hunspell. Their word lists are found in the spell checkers' dictionary source files (a .txt file for Aspell, a .dic file for Hunspell).
Multiple languages
- https://addons.mozilla.org/en-US/firefox/language-tools/
- https://ftp.gnu.org/gnu/aspell/dict/0index.html
- https://wiki.documentfoundation.org/Language_support_of_LibreOffice
Croatian https://github.com/spideyfusion/elasticsearch-croatian
Indonesian https://github.com/shuLhan/hunspell-id
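The sketch below pulls a plain word list out of a Hunspell .dic file. In a .dic file, the first line usually gives an approximate entry count. Each following line holds one word, optionally followed by "/" and affix flags, and possibly by extra morphological fields. The function name and the example filename are hypothetical, and the sketch makes three assumptions. First, the text has already been decoded using the encoding named on the SET line of the matching .aff file. Second, extra fields are separated from the word by whitespace. Third, a literal slash inside a word is written as "\/". The result contains only the listed stems. Expanding them into inflected forms requires applying the affix rules in the .aff file, which this sketch does not do.

```python
import re
from typing import Iterable, List

# First "/" not escaped as "\/": separates the word from its affix flags.
_FLAG_SEP = re.compile(r"(?<!\\)/")


def read_hunspell_stems(lines: Iterable[str]) -> List[str]:
    """Return the stems listed in a Hunspell .dic file, without affix flags."""
    stems = []
    for index, raw in enumerate(lines):
        line = raw.strip()
        if not line:
            continue
        # The first line normally holds the approximate entry count; skip it.
        if index == 0 and line.isdigit():
            continue
        # Assumption: optional morphological fields follow the word after whitespace.
        entry = line.split()[0]
        # Drop "/FLAGS", then turn any escaped "\/" back into a plain "/".
        stem = _FLAG_SEP.split(entry, maxsplit=1)[0].replace("\\/", "/")
        stems.append(stem)
    return stems
```

Usage would look like `with open("hr_HR.dic", encoding="utf-8") as f: stems = read_hunspell_stems(f)`. The filename and encoding are placeholders; use the file and the SET encoding of the dictionary you downloaded.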