Difference between revisions of "Language/Multiple-languages/Culture/Internet-Vocabularies"

From Polyglot Club WIKI
 
(35 intermediate revisions by 4 users not shown)
Line 1: Line 1:
[[Category:Free-Resources]]
[[Category:Free-Resources]]
{{Multiple-languages-flag}}
On this page you will find vocabularies to memorise. This page is not to be confused with [[Language/Multiple-languages/Culture/Internet-Dictionaries]]. The word lists here meet at least one of the following criteria:
* they contain frequency or grading information
* they have no translations, definitions or pronunciations


On this page you will find vocabularies to memorise. This page is not to be confused with [[Language/Multiple-languages/Culture/Internet-Dictionaries]]. Here are word lists that do not have translations, definitions or pronunciations, and programs that apply such word lists.
Computer programs are included, too.
 
To make use of these lists, [https://polyglotclub.com/wiki/Language/Multiple-languages/Culture/How-to-make-a-TSV-file#How_to_combine_data_with_same_column_from_two_spreadsheets merge them with dictionary meaning data]. Obtaining that data may require [https://polyglotclub.com/wiki/Language/Multiple-languages/Culture/Producing-dictionaries-with-web-scraping web scraping].


In progress.
In progress.


== Common word list ==
Visit https://codeberg.org/GrimPixel/standard-character-lists to download standard lists in TSV format.
 
== Common word/character list ==
Multiple languages https://en.wiktionary.org/wiki/Appendix:Swadesh_lists
Multiple languages https://en.wiktionary.org/wiki/Appendix:Swadesh_lists
Chinese
* 通用规范汉字表
** https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8
** https://zh.wiktionary.org/zh/Appendix:%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8
* 常用國字標準字體表 https://zh.wikisource.org/wiki/%E5%B8%B8%E7%94%A8%E5%9C%8B%E5%AD%97%E6%A8%99%E6%BA%96%E5%AD%97%E9%AB%94%E8%A1%A8


English https://github.com/HK-SHAO/English-Dictionary/blob/master/word/words.txt
English https://github.com/HK-SHAO/English-Dictionary/blob/master/word/words.txt


Mandarin Chinese https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8
Japanese
 
* 常用漢字
Mandarin Chinese https://zh.wikisource.org/wiki/%E5%B8%B8%E7%94%A8%E5%9C%8B%E5%AD%97%E6%A8%99%E6%BA%96%E5%AD%97%E9%AB%94%E8%A1%A8
** https://ja.wiktionary.org/wiki/%E4%BB%98%E9%8C%B2:%E5%B8%B8%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7
 
** https://joyokanji.info/list.html
Japanese https://ja.wiktionary.org/wiki/%E4%BB%98%E9%8C%B2:%E5%B8%B8%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7
** https://kanji.jitenon.jp/cat/joyo.html
 
** https://kanjitisiki.com/zyouyou/
Japanese https://ja.wiktionary.org/wiki/Wiktionary:%E4%BA%BA%E5%90%8D%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7
* 人名用漢字
** https://ja.wiktionary.org/wiki/Wiktionary:%E4%BA%BA%E5%90%8D%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7
** https://joyokanji.info/jinmei.html
** https://kanji.jitenon.jp/cat/jimmei.html
** https://kanjitisiki.com/zinmei/


Thai https://github.com/nv23/thai-wordlist
Thai https://github.com/nv23/thai-wordlist
Line 25: Line 43:
Multiple languages https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists
Multiple languages https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists


Chinese https://humanum.arts.cuhk.edu.hk/Lexis/chifreq/
Chinese
* https://humanum.arts.cuhk.edu.hk/Lexis/chifreq/
* https://humanum.arts.cuhk.edu.hk/Lexis/lexi-can/faq.php
* https://lingua.mtsu.edu/chinese-computing/statistics/index.html
* http://technology.chtsai.org/charfreq/


Kannada https://github.com/kakashi/kannada_IN_dictionary
Kannada https://github.com/kakashi/kannada_IN_dictionary


Mandarin Chinese https://lingua.mtsu.edu/chinese-computing/statistics/index.html
Korean 자주 쓰이는 한국어 낱말 https://ko.wiktionary.org/wiki/%EB%B6%80%EB%A1%9D:%EC%9E%90%EC%A3%BC_%EC%93%B0%EC%9D%B4%EB%8A%94_%ED%95%9C%EA%B5%AD%EC%96%B4_%EB%82%B1%EB%A7%90_5800
 
Mandarin Chinese http://technology.chtsai.org/charfreq/
 
Yue Chinese https://humanum.arts.cuhk.edu.hk/Lexis/lexi-can/faq.php
 
== Graded word list ==
Japanese https://ja.wikipedia.org/wiki/%E5%AD%A6%E5%B9%B4%E5%88%A5%E6%BC%A2%E5%AD%97%E9%85%8D%E5%BD%93%E8%A1%A8


Korean https://ko.wiktionary.org/wiki/%EB%B6%80%EB%A1%9D:%ED%95%9C%EB%AC%B8_%EA%B5%90%EC%9C%A1%EC%9A%A9_%EA%B8%B0%EC%B4%88_%ED%95%9C%EC%9E%90_1800
== Graded list ==
Chinese
* T.O.C.F.L. Word lists https://www.tw.org/tocfl/
* 新汉语水平考试(HSK)词汇(2012年修订版) http://www.chinesetest.cn/godownload.do#list_1


Korean https://www.topik.go.kr/usr/cmm/subLocation.do?menuSeq=2110503&boardSeq=64217
Japanese
* 学年別漢字配当表 https://ja.wikipedia.org/wiki/%E5%AD%A6%E5%B9%B4%E5%88%A5%E6%BC%A2%E5%AD%97%E9%85%8D%E5%BD%93%E8%A1%A8
* JLPT Vocabulary https://www.tanos.co.uk/jlpt/skills/vocab/


Mandarin Chinese http://www.chinesetest.cn/godownload.do#list_1
Korean
* 한문 교육용 기초 한자 https://ko.wiktionary.org/wiki/%EB%B6%80%EB%A1%9D:%ED%95%9C%EB%AC%B8_%EA%B5%90%EC%9C%A1%EC%9A%A9_%EA%B8%B0%EC%B4%88_%ED%95%9C%EC%9E%90_1800
* Korean Frequency List -Top 6000 Words https://www.topikguide.com/korean-frequency-list-top-6000-words/


Mandarin Chinese http://www.tw.org/tocfl/
Russian https://en.openrussian.org/vocab/A1


== Spell checker ==
== Spell checker ==
''Some require knowledge of [http://aspell.net/ GNU Aspell] and [https://hunspell.github.io/ Hunspell]. You can find the list of words in the spell checker's source code.''
''The word lists are found in the spell checkers' source code: GNU Aspell uses CWL files, which can be opened with [https://www.texstudio.org/ TeXstudio], and Hunspell uses DIC files, which can be opened with a text editor.''
 
Multiple languages https://addons.mozilla.org/en-US/firefox/language-tools/


Multiple languages https://ftp.gnu.org/gnu/aspell/dict/0index.html
Multiple languages
 
* https://addons.mozilla.org/en-US/firefox/language-tools/
Multiple languages https://wiki.documentfoundation.org/Language_support_of_LibreOffice
* https://ftp.gnu.org/gnu/aspell/dict/0index.html
* https://wiki.documentfoundation.org/Language_support_of_LibreOffice


Croatian https://github.com/spideyfusion/elasticsearch-croatian
Croatian https://github.com/spideyfusion/elasticsearch-croatian
Line 60: Line 81:


Kazakh https://github.com/taem/hunspell-kk
Kazakh https://github.com/taem/hunspell-kk
==Other Lessons==
* [[Language/Multiple-languages/Culture/Different-ways-to-greet-in-the-world|Different ways to greet in the world]]
* [[Language/Multiple-languages/Culture/Texts-and-Audios-under-a-Public-License|Texts and Audios under a Public License]]
* [[Language/Multiple-languages/Culture/Producing-dictionaries-with-web-scraping|Producing dictionaries with web scraping]]
* [[Language/Multiple-languages/Culture/Websites-with-Multilingual-Articles|Websites with Multilingual Articles]]
* [[Language/Multiple-languages/Culture/How-to-locate-the-origin-of-a-video-or-a-photo|How to locate the origin of a video or a photo]]
* [[Language/Multiple-languages/Culture/Elements-of-Traditional-Architectures:-Eastern-Asia|Elements of Traditional Architectures: Eastern Asia]]
* [[Language/Multiple-languages/Culture/Good-Memories|Good Memories]]
* [[Language/Multiple-languages/Culture/Cities-with-the-best-quality-of-life|Cities with the best quality of life]]
* [[Language/Multiple-languages/Culture/The-Polyglot-Club-Team|The Polyglot Club Team]]
* [[Language/Multiple-languages/Culture/Important-Technologies|Important Technologies]]
<span links></span>

Latest revision as of 17:24, 22 May 2023


On this page you will find vocabularies to memorise. This page is not to be confused with Language/Multiple-languages/Culture/Internet-Dictionaries. The word lists here meet at least one of the following criteria:

  • they contain frequency or grading information
  • they have no translations, definitions or pronunciations

Computer programs are included, too.

To make use of these lists, merge them with dictionary meaning data. Obtaining that data may require web scraping.
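
The merge mentioned above can be sketched in Python as a lookup on a shared column. This is only an illustration under assumptions: the column names `word` and `meaning`, the function name `merge_word_list`, and the sample data are all hypothetical, and real dictionary data will have its own layout.

```python
import csv
import io


def merge_word_list(words, dictionary_rows, key="word"):
    """Attach a meaning to each word, keeping the order of the word list.

    The column names "word" and "meaning" are assumptions for this sketch.
    Words missing from the dictionary get an empty meaning.
    """
    lookup = {row[key]: row for row in dictionary_rows}
    merged = []
    for word in words:
        entry = lookup.get(word, {})
        merged.append({key: word, "meaning": entry.get("meaning", "")})
    return merged


# Hypothetical sample data: a plain word list and a dictionary in TSV form.
WORD_LIST = "water\nfire\nstone\n"
DICTIONARY_TSV = (
    "word\tmeaning\n"
    "fire\tcombustion giving light and heat\n"
    "water\tclear liquid\n"
)

words = [line.strip() for line in WORD_LIST.splitlines() if line.strip()]
rows = list(csv.DictReader(io.StringIO(DICTIONARY_TSV), delimiter="\t"))
merged = merge_word_list(words, rows)

# Print the merged result as TSV, one word per line.
for row in merged:
    print(row["word"], row["meaning"], sep="\t")
```

Words that have no dictionary entry are kept with an empty meaning, so the gaps that still need to be filled, for example by web scraping, remain visible.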

In progress.

Visit https://codeberg.org/GrimPixel/standard-character-lists to download standard lists in TSV format.

Common word/character list

Multiple languages https://en.wiktionary.org/wiki/Appendix:Swadesh_lists

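Chinese

  • 通用规范汉字表
      • https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8
      • https://zh.wiktionary.org/zh/Appendix:%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8
  • 常用國字標準字體表 https://zh.wikisource.org/wiki/%E5%B8%B8%E7%94%A8%E5%9C%8B%E5%AD%97%E6%A8%99%E6%BA%96%E5%AD%97%E9%AB%94%E8%A1%A8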

English https://github.com/HK-SHAO/English-Dictionary/blob/master/word/words.txt

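Japanese

  • 常用漢字
      • https://ja.wiktionary.org/wiki/%E4%BB%98%E9%8C%B2:%E5%B8%B8%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7
      • https://joyokanji.info/list.html
      • https://kanji.jitenon.jp/cat/joyo.html
      • https://kanjitisiki.com/zyouyou/
  • 人名用漢字
      • https://ja.wiktionary.org/wiki/Wiktionary:%E4%BA%BA%E5%90%8D%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7
      • https://joyokanji.info/jinmei.html
      • https://kanji.jitenon.jp/cat/jimmei.html
      • https://kanjitisiki.com/zinmei/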

Thai https://github.com/nv23/thai-wordlist

Vietnamese https://www.chunom.org/

Frequency list

Multiple languages https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists

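Chinese

  • https://humanum.arts.cuhk.edu.hk/Lexis/chifreq/
  • https://humanum.arts.cuhk.edu.hk/Lexis/lexi-can/faq.php
  • https://lingua.mtsu.edu/chinese-computing/statistics/index.html
  • http://technology.chtsai.org/charfreq/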

Kannada https://github.com/kakashi/kannada_IN_dictionary

Korean 자주 쓰이는 한국어 낱말 https://ko.wiktionary.org/wiki/%EB%B6%80%EB%A1%9D:%EC%9E%90%EC%A3%BC_%EC%93%B0%EC%9D%B4%EB%8A%94_%ED%95%9C%EA%B5%AD%EC%96%B4_%EB%82%B1%EB%A7%90_5800

Graded list

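Chinese

  • T.O.C.F.L. Word lists https://www.tw.org/tocfl/
  • 新汉语水平考试(HSK)词汇(2012年修订版) http://www.chinesetest.cn/godownload.do#list_1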

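Japanese

  • 学年別漢字配当表 https://ja.wikipedia.org/wiki/%E5%AD%A6%E5%B9%B4%E5%88%A5%E6%BC%A2%E5%AD%97%E9%85%8D%E5%BD%93%E8%A1%A8
  • JLPT Vocabulary https://www.tanos.co.uk/jlpt/skills/vocab/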

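Korean

  • 한문 교육용 기초 한자 https://ko.wiktionary.org/wiki/%EB%B6%80%EB%A1%9D:%ED%95%9C%EB%AC%B8_%EA%B5%90%EC%9C%A1%EC%9A%A9_%EA%B8%B0%EC%B4%88_%ED%95%9C%EC%9E%90_1800
  • Korean Frequency List -Top 6000 Words https://www.topikguide.com/korean-frequency-list-top-6000-words/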

Russian https://en.openrussian.org/vocab/A1

Spell checker

The word lists are found in the spell checkers' source code: GNU Aspell uses CWL files, which can be opened with TeXstudio, and Hunspell uses DIC files, which can be opened with a text editor.
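
Because a Hunspell DIC file is plain text, its words can also be pulled out with a short script. The sketch below assumes the usual layout: an entry count on the first line, then one entry per line, optionally followed by `/` and affix flags. The function name `words_from_dic` and the sample entries are hypothetical, and escaped slashes inside words are not handled.

```python
def words_from_dic(text):
    """Return the bare words from the text of a Hunspell DIC file.

    Assumes the first line holds the (approximate) entry count and that
    affix flags follow the first slash on each line. Escaped slashes
    inside a word are not handled in this sketch.
    """
    lines = text.splitlines()
    # Skip the leading entry-count line when it is present.
    if lines and lines[0].strip().isdigit():
        lines = lines[1:]
    words = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Drop any tab-separated morphological fields after the entry.
        entry = line.split("\t", 1)[0]
        # Everything after the first slash is affix flags, not the word.
        word = entry.split("/", 1)[0]
        if word:
            words.append(word)
    return words


# Hypothetical sample in the assumed DIC layout.
SAMPLE_DIC = "3\nhello\nwork/DGS\ntry/B\tpo:verb\n"

word_list = words_from_dic(SAMPLE_DIC)
print("\n".join(word_list))
```

For a real dictionary, read the file with the character encoding named on the `SET` line of the accompanying AFF file, then pass its text to the function. The result is a list of base forms only: inflected forms that Hunspell derives from the affix flags are not expanded.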

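Multiple languages

  • https://addons.mozilla.org/en-US/firefox/language-tools/
  • https://ftp.gnu.org/gnu/aspell/dict/0index.html
  • https://wiki.documentfoundation.org/Language_support_of_LibreOffice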

Croatian https://github.com/spideyfusion/elasticsearch-croatian

Indonesian https://github.com/shuLhan/hunspell-id

Kazakh https://github.com/taem/hunspell-kk

Other Lessons