Difference between revisions of "Language/Multiple-languages/Culture/Internet-Vocabularies"

From Polyglot Club WIKI
 
(39 intermediate revisions by 4 users not shown)
Line 1: Line 1:
[[Category:Free-Resources]]
[[Category:Free-Resources]]
{{Multiple-languages-flag}}
On this page you will find vocabularies to memorise. This page is not to be confused with [[Language/Multiple-languages/Culture/Internet-Dictionaries]]. Here are word lists that possess one of the following features:
* contain frequency or grading information
* no translations, definitions or pronunciations


On this page you will find vocabularies to memorise. This page is not to be confused with [[Language/Multiple-languages/Culture/Internet-Dictionaries]]. Here are word lists that do not have translations, definitions or pronunciations, and programs that apply such word lists.
Computer programs are included, too.
 
They can be made use of by [https://polyglotclub.com/wiki/Language/Multiple-languages/Culture/How-to-make-a-TSV-file#How_to_combine_data_with_same_column_from_two_spreadsheets merging with dictionary meaning data]. This may require [https://polyglotclub.com/wiki/Language/Multiple-languages/Culture/Producing-dictionaries-with-web-scraping web scraping].


In progress.
In progress.


== Common word list ==
Visit https://codeberg.org/GrimPixel/standard-character-lists to download standard lists in TSV format.
 
== Common word/character list ==
Multiple languages https://en.wiktionary.org/wiki/Appendix:Swadesh_lists
Multiple languages https://en.wiktionary.org/wiki/Appendix:Swadesh_lists


Mandarin Chinese https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8
Chinese
* 通用规范汉字表
** https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8
** https://zh.wiktionary.org/zh/Appendix:%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8
* 常用國字標準字體表 https://zh.wikisource.org/wiki/%E5%B8%B8%E7%94%A8%E5%9C%8B%E5%AD%97%E6%A8%99%E6%BA%96%E5%AD%97%E9%AB%94%E8%A1%A8


Mandarin Chinese https://zh.wikisource.org/wiki/%E5%B8%B8%E7%94%A8%E5%9C%8B%E5%AD%97%E6%A8%99%E6%BA%96%E5%AD%97%E9%AB%94%E8%A1%A8
English https://github.com/HK-SHAO/English-Dictionary/blob/master/word/words.txt


Japanese https://ja.wiktionary.org/wiki/%E4%BB%98%E9%8C%B2:%E5%B8%B8%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7
Japanese
* 常用漢字
** https://ja.wiktionary.org/wiki/%E4%BB%98%E9%8C%B2:%E5%B8%B8%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7
** https://joyokanji.info/list.html
** https://kanji.jitenon.jp/cat/joyo.html
** https://kanjitisiki.com/zyouyou/
* 人名用漢字
** https://ja.wiktionary.org/wiki/Wiktionary:%E4%BA%BA%E5%90%8D%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7
** https://joyokanji.info/jinmei.html
** https://kanji.jitenon.jp/cat/jimmei.html
** https://kanjitisiki.com/zinmei/


Japanese https://ja.wiktionary.org/wiki/Wiktionary:%E4%BA%BA%E5%90%8D%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7
Thai https://github.com/nv23/thai-wordlist


Vietnamese https://www.chunom.org/
Vietnamese https://www.chunom.org/
Line 21: Line 43:
Multiple languages https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists
Multiple languages https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists


Chinese https://humanum.arts.cuhk.edu.hk/Lexis/chifreq/
Chinese
* https://humanum.arts.cuhk.edu.hk/Lexis/chifreq/
* https://humanum.arts.cuhk.edu.hk/Lexis/lexi-can/faq.php
* https://lingua.mtsu.edu/chinese-computing/statistics/index.html
* http://technology.chtsai.org/charfreq/


Kannada https://github.com/kakashi/kannada_IN_dictionary
Kannada https://github.com/kakashi/kannada_IN_dictionary


Mandarin Chinese https://lingua.mtsu.edu/chinese-computing/statistics/index.html
Korean 자주 쓰이는 한국어 낱말 https://ko.wiktionary.org/wiki/%EB%B6%80%EB%A1%9D:%EC%9E%90%EC%A3%BC_%EC%93%B0%EC%9D%B4%EB%8A%94_%ED%95%9C%EA%B5%AD%EC%96%B4_%EB%82%B1%EB%A7%90_5800


Mandarin Chinese http://technology.chtsai.org/charfreq/
== Graded list ==
Chinese
* T.O.C.F.L. Word lists https://www.tw.org/tocfl/
* 新汉语水平考试(HSK)词汇(2012年修订版) http://www.chinesetest.cn/godownload.do#list_1


Yue Chinese https://humanum.arts.cuhk.edu.hk/Lexis/lexi-can/faq.php
Japanese
* 学年別漢字配当表 https://ja.wikipedia.org/wiki/%E5%AD%A6%E5%B9%B4%E5%88%A5%E6%BC%A2%E5%AD%97%E9%85%8D%E5%BD%93%E8%A1%A8
* JLPT Vocabulary https://www.tanos.co.uk/jlpt/skills/vocab/


== Graded word list ==
Korean
Japanese https://ja.wikipedia.org/wiki/%E5%AD%A6%E5%B9%B4%E5%88%A5%E6%BC%A2%E5%AD%97%E9%85%8D%E5%BD%93%E8%A1%A8
* 한문 교육용 기초 한자 https://ko.wiktionary.org/wiki/%EB%B6%80%EB%A1%9D:%ED%95%9C%EB%AC%B8_%EA%B5%90%EC%9C%A1%EC%9A%A9_%EA%B8%B0%EC%B4%88_%ED%95%9C%EC%9E%90_1800
* Korean Frequency List -Top 6000 Words https://www.topikguide.com/korean-frequency-list-top-6000-words/


Korean https://ko.wiktionary.org/wiki/%EB%B6%80%EB%A1%9D:%ED%95%9C%EB%AC%B8_%EA%B5%90%EC%9C%A1%EC%9A%A9_%EA%B8%B0%EC%B4%88_%ED%95%9C%EC%9E%90_1800
Russian https://en.openrussian.org/vocab/A1
 
Korean https://www.topik.go.kr/usr/cmm/subLocation.do?menuSeq=2110503&boardSeq=64217
 
Mandarin Chinese http://www.chinesetest.cn/godownload.do
 
Mandarin Chinese http://www.tw.org/tocfl/


== Spell checker ==
== Spell checker ==
''Some require knowledge about [http://aspell.net/ GNU Aspell] and [https://hunspell.github.io/ Hunspell]. You can find the list of word in the spell checker's source code.''
''The word lists are in the spell checkers' source code: CWL files in GNU Aspell, can be opened with [https://www.texstudio.org/ TeXstudio]; DIC files in Hunspell, can be opened with a text editor.''


Multiple languages https://addons.mozilla.org/en-US/firefox/language-tools/
Multiple languages
 
* https://addons.mozilla.org/en-US/firefox/language-tools/
Multiple languages https://ftp.gnu.org/gnu/aspell/dict/0index.html
* https://ftp.gnu.org/gnu/aspell/dict/0index.html
 
* https://wiki.documentfoundation.org/Language_support_of_LibreOffice
Multiple languages https://wiki.documentfoundation.org/Language_support_of_LibreOffice


Croatian https://github.com/spideyfusion/elasticsearch-croatian
Croatian https://github.com/spideyfusion/elasticsearch-croatian
Line 57: Line 82:
Kazakh https://github.com/taem/hunspell-kk
Kazakh https://github.com/taem/hunspell-kk


== Undefined-type word list ==
==Other Lessons==
English https://github.com/HK-SHAO/English-Dictionary/blob/master/word/words.txt
* [[Language/Multiple-languages/Culture/Different-ways-to-greet-in-the-world|Different ways to greet in the world]]
 
* [[Language/Multiple-languages/Culture/Texts-and-Audios-under-a-Public-License|Texts and Audios under a Public License]]
Shwe Palaung https://github.com/aungkoman/shwe_dictionary
* [[Language/Multiple-languages/Culture/Producing-dictionaries-with-web-scraping|Producing dictionaries with web scraping]]
 
* [[Language/Multiple-languages/Culture/Websites-with-Multilingual-Articles|Websites with Multilingual Articles]]
Thai https://github.com/nv23/thai-wordlist
* [[Language/Multiple-languages/Culture/How-to-locate-the-origin-of-a-video-or-a-photo|How to locate the origin of a video or a photo]]
* [[Language/Multiple-languages/Culture/Elements-of-Traditional-Architectures:-Eastern-Asia|Elements of Traditional Architectures: Eastern Asia]]
* [[Language/Multiple-languages/Culture/Good-Memories|Good Memories]]
* [[Language/Multiple-languages/Culture/Cities-with-the-best-quality-of-life|Cities with the best quality of life]]
* [[Language/Multiple-languages/Culture/The-Polyglot-Club-Team|The Polyglot Club Team]]
* [[Language/Multiple-languages/Culture/Important-Technologies|Important Technologies]]
<span links></span>

Latest revision as of 17:24, 22 May 2023


This page collects vocabularies to memorise. It is not to be confused with Language/Multiple-languages/Culture/Internet-Dictionaries. The word lists here have at least one of the following features:

  • they contain frequency or grading information
  • they have no translations, definitions or pronunciations

Computer programs that apply such word lists are included, too.

To make use of these lists, merge them with dictionary meaning data. This may require web scraping.
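
As a rough illustration of merging a word list with dictionary meaning data, the sketch below joins two small TSV tables on a shared column. The column names (`word`, `rank`, `meaning`), the sample rows and the helper `merge_on_column` are assumptions made for the example; they are not taken from any of the lists linked on this page.

```python
import csv
import io


def merge_on_column(wordlist_tsv, dictionary_tsv, key="word"):
    """Attach dictionary columns to each word-list row that shares the key column.

    Both arguments are TSV text with a header row. Words missing from the
    dictionary are kept, just without the extra columns.
    """
    dict_rows = {
        row[key]: row
        for row in csv.DictReader(io.StringIO(dictionary_tsv), delimiter="\t")
    }
    merged = []
    for row in csv.DictReader(io.StringIO(wordlist_tsv), delimiter="\t"):
        extra = dict_rows.get(row[key], {})
        # Word-list values win if both tables have a column of the same name.
        merged.append({**extra, **row})
    return merged


# Hypothetical sample data: a frequency list and a tiny dictionary.
wordlist = "word\trank\nthe\t1\nof\t2\nzyzzyva\t3\n"
dictionary = "word\tmeaning\nthe\tdefinite article\nof\tpreposition\n"

rows = merge_on_column(wordlist, dictionary)
for row in rows:
    print(row)
```

Keeping unmatched words in the result makes it easy to see which entries still need meaning data, for example from web scraping.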

In progress.

Visit https://codeberg.org/GrimPixel/standard-character-lists to download standard lists in TSV format.

Common word/character list

Multiple languages https://en.wiktionary.org/wiki/Appendix:Swadesh_lists

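Chinese

  • 通用规范汉字表
      • https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8
      • https://zh.wiktionary.org/zh/Appendix:%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8
  • 常用國字標準字體表 https://zh.wikisource.org/wiki/%E5%B8%B8%E7%94%A8%E5%9C%8B%E5%AD%97%E6%A8%99%E6%BA%96%E5%AD%97%E9%AB%94%E8%A1%A8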

English https://github.com/HK-SHAO/English-Dictionary/blob/master/word/words.txt

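Japanese

  • 常用漢字
      • https://ja.wiktionary.org/wiki/%E4%BB%98%E9%8C%B2:%E5%B8%B8%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7
      • https://joyokanji.info/list.html
      • https://kanji.jitenon.jp/cat/joyo.html
      • https://kanjitisiki.com/zyouyou/
  • 人名用漢字
      • https://ja.wiktionary.org/wiki/Wiktionary:%E4%BA%BA%E5%90%8D%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7
      • https://joyokanji.info/jinmei.html
      • https://kanji.jitenon.jp/cat/jimmei.html
      • https://kanjitisiki.com/zinmei/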

Thai https://github.com/nv23/thai-wordlist

Vietnamese https://www.chunom.org/

Frequency list

Multiple languages https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists

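Chinese

  • https://humanum.arts.cuhk.edu.hk/Lexis/chifreq/
  • https://humanum.arts.cuhk.edu.hk/Lexis/lexi-can/faq.php
  • https://lingua.mtsu.edu/chinese-computing/statistics/index.html
  • http://technology.chtsai.org/charfreq/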

Kannada https://github.com/kakashi/kannada_IN_dictionary

Korean 자주 쓰이는 한국어 낱말 https://ko.wiktionary.org/wiki/%EB%B6%80%EB%A1%9D:%EC%9E%90%EC%A3%BC_%EC%93%B0%EC%9D%B4%EB%8A%94_%ED%95%9C%EA%B5%AD%EC%96%B4_%EB%82%B1%EB%A7%90_5800

Graded list

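Chinese

  • T.O.C.F.L. Word lists https://www.tw.org/tocfl/
  • 新汉语水平考试(HSK)词汇(2012年修订版) http://www.chinesetest.cn/godownload.do#list_1

Japanese

  • 学年別漢字配当表 https://ja.wikipedia.org/wiki/%E5%AD%A6%E5%B9%B4%E5%88%A5%E6%BC%A2%E5%AD%97%E9%85%8D%E5%BD%93%E8%A1%A8
  • JLPT Vocabulary https://www.tanos.co.uk/jlpt/skills/vocab/

Korean

  • 한문 교육용 기초 한자 https://ko.wiktionary.org/wiki/%EB%B6%80%EB%A1%9D:%ED%95%9C%EB%AC%B8_%EA%B5%90%EC%9C%A1%EC%9A%A9_%EA%B8%B0%EC%B4%88_%ED%95%9C%EC%9E%90_1800
  • Korean Frequency List -Top 6000 Words https://www.topikguide.com/korean-frequency-list-top-6000-words/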

Russian https://en.openrussian.org/vocab/A1

Spell checker

The word lists are in the spell checkers' source code: CWL files in GNU Aspell, which can be opened with TeXstudio, and DIC files in Hunspell, which can be opened with a text editor.
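
Since a Hunspell DIC file is plain text, its words can also be pulled out with a short script. The sketch below assumes the usual layout: an entry count on the first line, then one entry per line, optionally followed by `/` and affix flags and by tab-separated morphological fields. The helper name `words_from_dic` and the sample entries are made up for illustration; words containing an escaped slash are not handled, and a real file should be read with the encoding declared in the matching AFF file.

```python
def words_from_dic(text):
    """Return the bare words from the text of a Hunspell .dic file."""
    lines = text.splitlines()
    # The first line is normally an approximate entry count; skip it if numeric.
    if lines and lines[0].strip().isdigit():
        lines = lines[1:]
    words = []
    for line in lines:
        # Drop tab-separated morphological fields, if any.
        entry = line.split("\t", 1)[0].strip()
        if not entry:
            continue
        # Everything after the first "/" is affix flags, not part of the word.
        words.append(entry.split("/", 1)[0])
    return words


# Hypothetical sample in the assumed layout.
sample = "3\nhello\nwork/DGS\nrun/S\tpo:verb\n"
print(words_from_dic(sample))  # prints ['hello', 'work', 'run']
```

The result contains stems only; inflected forms are generated by the affix rules in the AFF file and are not expanded here.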

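Multiple languages

  • https://addons.mozilla.org/en-US/firefox/language-tools/
  • https://ftp.gnu.org/gnu/aspell/dict/0index.html
  • https://wiki.documentfoundation.org/Language_support_of_LibreOffice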

Croatian https://github.com/spideyfusion/elasticsearch-croatian

Indonesian https://github.com/shuLhan/hunspell-id

Kazakh https://github.com/taem/hunspell-kk

Other Lessons