Difference between revisions of "Language/Multiple-languages/Culture/Licensed-Free-Databases"

From Polyglot Club WIKI
Revision as of 13:13, 19 July 2020

On this page we have listed free language databases (organized collections of data related to languages).

The listed items are data sources, not software that can use this data (such as database-management systems). Therefore, if you don't know programming, this page might not be of much help to you.

Main

Multiple languages

https://www.ethnologue.com/codes/download-code-tables

LanguageCodes.tab lists the 7,400+ distinct language identifiers used in the current Ethnologue database.

https://dumps.wikimedia.org/

License: https://dumps.wikimedia.org/legal.html

Some of its users: https://www.wikimedia.org/

Wikimedia.

https://iate.europa.eu/download-iate/

License: https://iate.europa.eu/download-iate/

Some of its users: https://iate.europa.eu/download-iate/

Terminology dictionary of the EU.

https://tatoeba.org/eng/downloads/

License: https://tatoeba.org/eng/downloads/

Some of its users: https://tatoeba.org/, http://www.listeningpractice.org/, https://jisho.org/

Parallel corpora. In plain terms, collections of the same sentences in different languages.

https://wiki.documentfoundation.org/Language_support_of_LibreOffice

License: https://wiki.documentfoundation.org/Language_support_of_LibreOffice

Some of its users: https://www.libreoffice.org/

You can find the “Spell check dictionaries” and other useful things.

http://www.gutenberg.org/wiki/Gutenberg:Information_About_Robot_Access_to_our_Pages

License: http://www.gutenberg.org/wiki/Gutenberg:Terms_of_Use

Some of its users: http://www.gutenberg.org/, https://librivox.org/

Ebooks.

https://librivox.org/pages/about-librivox/

License: https://librivox.org/pages/about-librivox/

Some of its users: https://librivox.org/, http://www.listeningpractice.org/

Audio books.

https://freedict.org/downloads/

License: https://freedict.org/about/

Some of its users: http://aarddict.org/

Dictionaries.

http://www.omegawiki.org/Help:Downloading_the_data

License: http://www.omegawiki.org/Meta:Main_Page

Some of its users: http://www.omegawiki.org/Meta:Main_Page, http://dictionarymid.sourceforge.net/

Dictionaries.

http://www.xobdo.org/downloads/

License: http://www.xobdo.org/downloads/

Some of its users: http://www.xobdo.org/

Dictionaries for South Asian languages and English.

https://ltrc.iiit.ac.in/onlineServices/Dictionaries/Dict_Frame.html

License: https://ltrc.iiit.ac.in/onlineServices/Dictionaries/GPLHelp.html

Dictionaries for South Asian languages and English.

http://compling.hss.ntu.edu.sg/omw/

License: http://compling.hss.ntu.edu.sg/omw/

Some of its users: http://compling.hss.ntu.edu.sg/omw/cgi-bin/wn-gridx.cgi?gridmode=grid

Wordnets.

http://www.dicto.org.ru/xdxf.html

License: http://dicto.org.ru/license.html

Some of its users: http://dicto.org.ru/

Repository of dictionaries (from elsewhere).

http://shtooka.net/download.php

License: http://shtooka.net/

Collections of audio.

https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists

License: https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists

Frequency lists.

https://lego.linguistlist.org/about#contact

License: https://lego.linguistlist.org/about#copyright

Some of its users: https://lego.linguistlist.org/

Lexicon. No download link on the website.

https://panlex.org/source-list/

License: https://panlex.org/license/

Some of its users: https://glosbe.com

Lexical database links.

https://github.com/cburgmer/cjklib

License: https://github.com/cburgmer/cjklib/blob/master/COPYING

Some of its users: https://www.skishore.me/makemeahanzi/

Data about Han script.

https://www.radio-browser.info/gui/#!/

License: https://www.radio-browser.info/gui/#!/

Some of its users: https://github.com/segler-alex/RadioDroid

Database of radio stations.

https://help.archive.org/hc/en-us/articles/360017781111-How-to-download-files-

License: https://www.archive.org/about/terms.php

Some of its users: https://www.archive.org/

Archived Internet content.

https://www.fandom.com/

License: https://www.fandom.com/licensing

Fan-made wiki.

American Sign Language

http://www.asl-lex.org/

License: http://www.asl-lex.org/

Lexicon.

Burmese

https://github.com/saturngod/ornagai-V2

License: https://github.com/saturngod/ornagai-V2/blob/master/License

Some of its users: https://www.ornagai.com/#/

Dictionary.

Catalan

http://www.catalandictionary.org/en/search/

License: http://www.catalandictionary.org/en/search/

Dictionary. The license text is shown in a very small font.

Chinese

https://resources.publicense.moe.edu.tw/index.html

License: https://resources.publicense.moe.edu.tw/index.html

Some of its users: https://resources.publicense.moe.edu.tw/index.html, https://www.moedict.tw/

Dictionaries of ROC Mandarin Chinese written in ROC Mandarin Chinese.

https://cc-cedict.org/editor/editor.php

License: https://cc-cedict.org/wiki/

Some of its users: https://www.mdbg.net/chinese/dictionary, https://www.pleco.com/

Mandarin-English dictionary.

https://chine.in/mandarin/dictionnaire/CFDICT/

License: https://chine.in/mandarin/dictionnaire/CFDICT/

Some of its users: https://chine.in/, https://www.pleco.com/

Mandarin-French dictionary.

https://handedict.zydeo.net/de/download

License: https://handedict.zydeo.net/de/download

Some of its users: https://www.pleco.com/

Mandarin-German dictionary.

https://chdict.zydeo.net/en/download/

License: https://chdict.zydeo.net/en/download/

Some of its users: https://chdict.zydeo.net/hu/

Mandarin-Hungarian dictionary.

http://cantonese.org/download.html

License: http://cantonese.org/download.html

Some of its users: http://cantonese.org/, https://www.pleco.com/

Cantonese-English dictionary.

https://twblg.dict.edu.tw/holodict_new/compile1_6_1.jsp

License: https://twblg.dict.edu.tw/holodict_new/compile1_6_1.jsp

Some of its users: https://twblg.dict.edu.tw/holodict_new/default.jsp, https://www.moedict.tw/

Taiwanese-English dictionary. It can be requested through email.

http://www.taiwanesedictionary.org/

License: http://www.taiwanesedictionary.org/

Taiwanese-English dictionary.

http://lingua.mtsu.edu/chinese-computing/

License: http://lingua.mtsu.edu/chinese-computing/copyright.html

Frequency lists.

https://www.tanos.co.uk/hsk/

License: https://www.tanos.co.uk/jlpt/sharing/

HSK data.

http://www.hskhsk.com/resources.html

License: http://www.hskhsk.com/resources.html

HSK data.

https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8

License: https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8

Frequent characters.

https://zh.wikisource.org/wiki/%E5%B8%B8%E7%94%A8%E5%9C%8B%E5%AD%97%E6%A8%99%E6%BA%96%E5%AD%97%E9%AB%94%E8%A1%A8

License: https://zh.wikisource.org/wiki/%E5%B8%B8%E7%94%A8%E5%9C%8B%E5%AD%97%E6%A8%99%E6%BA%96%E5%AD%97%E9%AB%94%E8%A1%A8

Frequent characters.

http://input.foruto.com/ccc/gongbiu/index.htm

License:

Frequent characters.

Esperanto

http://reta-vortaro.de/tgz/index.html

License: http://reta-vortaro.de/tgz/index.html

Some of its users: http://reta-vortaro.de/, http://www.busydoingnothing.co.uk/prevo/

Dictionary in several languages.

http://www.denisowski.org/Esperanto/ESPDIC/espdic_readme.html

License: http://www.denisowski.org/Esperanto/ESPDIC/espdic_readme.html

Some of its users: http://www.denisowski.org/Esperanto/ESPDIC/espdic_readme.html

Dictionary.

https://komputeko.net/elsxutejo-en.php

License: https://komputeko.net/index_en.php

Some of its users: https://komputeko.net/index_en.php

Computer terminology dictionary.

German Sign Language

https://signdict.org/

License: https://signdict.org/about

Some of its users: https://signdict.org/

Dictionary.

English

http://gcide.gnu.org.ua/download

License: http://gcide.gnu.org.ua/license

Some of its users: http://gcide.gnu.org.ua/

Dictionary of definitions.

https://foldoc.org/source.html

License: https://foldoc.org/Free+On-line+Dictionary

Some of its users: https://foldoc.org/

Dictionary about computing.

https://github.com/skywind3000/ECDICT

License: https://github.com/skywind3000/ECDICT/blob/master/LICENSE

Some of its users: https://github.com/program-in-chinese/webextension_english_chinese_dictionary

Dictionary.

https://github.com/tony-mak/Eng-Chi-Dictionary/tree/master/app/src/main/assets/databases

License: https://github.com/tony-mak/Eng-Chi-Dictionary/blob/master/LICENSE

Dictionary.

https://github.com/linuxkathirvel/eng2tamildictionary/blob/master/dictionary.json

License: https://github.com/linuxkathirvel/eng2tamildictionary/blob/master/License.txt

Dictionary.

https://github.com/derekchuank/high-frequency-vocabulary

License: https://github.com/derekchuank/high-frequency-vocabulary/blob/master/LICENSE

Dictionary.

https://github.com/kujirahand/EJDict/tree/master/src

License: https://github.com/kujirahand/EJDict/blob/master/LICENSE

Dictionary.

Estonian

https://www.eki.ee/litsents/

License: https://www.eki.ee/litsents/

Some of its users: http://portaal.eki.ee/sonaraamatud.html

Dictionaries. Only two are actually available.

German

https://www.openthesaurus.de/about/download/

License: https://www.openthesaurus.de/about/download/

Some of its users: https://www.openthesaurus.de/about/download/

Thesaurus.

Hindi

http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/downloaderInfo.php

License: http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/index.php

Some of its users: http://www.cfilt.iitb.ac.in/~hdict/webinterface_user/dict_search_user.php

Dictionary. Application is required.

Icelandic

https://www.ling.upenn.edu/~kurisuto/germanic/oi_cleasbyvigfusson_about.html

License: http://lexicon.ff.cuni.cz/txt/oi_cleasbyvigfusson.txt

Dictionary.

Interlingua

http://www.denisowski.org/Interlingua/IEDICT/iedict_readme.html

License: http://www.denisowski.org/Interlingua/IEDICT/iedict_readme.html

Some of its users: http://www.denisowski.org/Interlingua/IEDICT/iedict_readme.html

Dictionary.

Interlingue

https://github.com/Carmina16/hunspell-ie

License: https://github.com/Carmina16/hunspell-ie/blob/master/LICENSE

Spell checker with dictionary.

Iranian Persian

https://github.com/amirshnll/English-Persian-Word-Database

License: https://github.com/amirshnll/English-Persian-Word-Database/blob/master/LICENSE

Dictionary.

Japanese

http://www.edrdg.org/wiki/index.php/Main_Page

License: https://www.edrdg.org/edrdg/licence.html

Some of its users: https://jisho.org/, https://www.tagaini.net/

Japanese dictionaries.

https://github.com/KanjiVG/kanjivg/releases/

License: http://kanjivg.tagaini.net/

Some of its users: https://www.tagaini.net/, https://jisho.org/

Kanji strokes.

http://dico.fj.free.fr/dico.php

License: http://dico.fj.free.fr/copyright.php

Some of its users: http://dico.fj.free.fr/traduction/index.php

Japanese-French dictionary.

https://github.com/mifunetoshiro/kanjium

License: https://github.com/mifunetoshiro/kanjium/blob/master/LICENSE.txt

Kanji data.

https://www.tanos.co.uk/jlpt/

License: https://www.tanos.co.uk/jlpt/sharing/

JLPT data.

https://ja.wiktionary.org/wiki/%E4%BB%98%E9%8C%B2:%E5%B8%B8%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7

License: http://www.bunka.go.jp/bunkacho_homepage/index.html

Frequent characters.

https://ja.wiktionary.org/wiki/Wiktionary:%E4%BA%BA%E5%90%8D%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7

License: http://www.moj.go.jp/term.html

Frequent characters for names.

https://ja.wikipedia.org/wiki/%E5%AD%A6%E5%B9%B4%E5%88%A5%E6%BC%A2%E5%AD%97%E9%85%8D%E5%BD%93%E8%A1%A8

License: http://www.mext.go.jp/b_menu/about_link.htm

Frequent characters according to school grades.

Jeju

https://jeju.go.kr/culture/dialect/dictionary.htm

License: https://jeju.go.kr/help/policy/copyright.htm

Some of its users: https://jeju.go.kr/culture/dialect/dictionary.htm

Dictionary.

Klingon

http://klingonska.org/dict/dict.zdb

License: http://klingonska.org/dict/

Some of its users: http://klingonska.org/dict/

Dictionary.

Korean

https://krdict.korean.go.kr/mainAction

License: https://krdict.korean.go.kr/kboardPolicy/copyRightTermsInfo

Some of its users: https://krdict.korean.go.kr/mainAction

Dictionary. Download link is unknown.

https://opendict.korean.go.kr/main

License: https://opendict.korean.go.kr/service/copyrightPolicy

Some of its users: https://opendict.korean.go.kr/main

Dictionary. Download link is unknown.

https://stdict.korean.go.kr/main/main.do

License: https://stdict.korean.go.kr/join/copyrightPolicy.do

Some of its users: https://stdict.korean.go.kr/main/main.do

Dictionary. Download link is unknown.

https://github.com/garfieldnate/kengdic

License: https://github.com/garfieldnate/kengdic

Some of its users: http://www.toktogi.com/

Dictionary.

https://github.com/libhangul/libhangul/blob/master/data/hanja/hanja.txt

License: https://github.com/libhangul/libhangul/blob/master/data/hanja/hanja.txt

Words in Hangul and Hanja.

There is a page of introduction: https://wiki.kldp.org/wiki.php/libhangul.

https://ko.wiktionary.org/wiki/%EB%B6%80%EB%A1%9D:%ED%95%9C%EB%AC%B8_%EA%B5%90%EC%9C%A1%EC%9A%A9_%EA%B8%B0%EC%B4%88_%ED%95%9C%EC%9E%90_1800

License: http://www.suneung.re.kr/sub/info.do?m=0601&s=suneung

The list is also available in an easy-to-copy form: http://www.suneung.re.kr/boardCnts/fileDown.do?fileSeq=59692112e521efa80d2af27916704082

https://www.topik.go.kr/usr/cmm/subLocation.do?menuSeq=2110503&boardSeq=64217

License: https://www.topik.go.kr/usr/cmm/subLocation.do?menuSeq=2110702

Word list of TOPIK.

https://github.com/mhagiwara/cc-kedict

License: https://github.com/mhagiwara/cc-kedict

Dictionary.

Lithuanian

https://github.com/ispell-lt/ispell-lt

License: https://github.com/ispell-lt/ispell-lt/blob/master/COPYING

Spell checker with dictionary.

Nepali

https://github.com/nirooj56/Nepdict

License: https://github.com/nirooj56/Nepdict/blob/master/LICENSE

Dictionary.

Russian

https://en.openrussian.org/dictionary

License: https://en.openrussian.org/dictionary

Some of its users: https://en.openrussian.org/

Dictionary.

Sanskrit

https://github.com/hemanth/sanskrit-dict/blob/master/dict.js

License: https://github.com/hemanth/sanskrit-dict/blob/master/license

Dictionary.

Slovak

http://sk-spell.sk.cx/hunspell-sk

License: http://sk-spell.sk.cx/hunspell-sk

Spell checker with dictionary.

Vietnamese

http://www.informatik.uni-leipzig.de/~duc/Dict/install.html

License: http://www.informatik.uni-leipzig.de/~duc/Dict/install.html

Some of its users: https://www.informatik.uni-leipzig.de/~duc/Dict/

Dictionaries in several languages.

There is a page of introduction: https://vi.wiktionary.org/wiki/Wiktionary:Ngu%E1%BB%93n_g%E1%BB%91c/FVDP

http://www.denisowski.org/Vietnamese/vnedict_readme.htm

License: http://www.denisowski.org/Vietnamese/vnedict_readme.htm

Some of its users: http://www.denisowski.org/Vietnamese/vnedict_readme.htm

Dictionary.

https://github.com/duyetdev/vietnamese-wordlist

License: https://github.com/duyetdev/vietnamese-wordlist/blob/master/LICENSE

Word list.

https://github.com/duyetdev/vietnamese-namedb

License: https://github.com/duyetdev/vietnamese-namedb/blob/master/LICENSE

Name list.

Non-language

https://unicode.org/ucd/

License: https://www.unicode.org/copyright.html

Some of its users: https://wiki.gnome.org/action/show/Apps/Gucharmap, http://www.decodeunicode.org/, https://unicode-table.com/en/, https://www.fontspace.com/

Unicode.

https://www.cia.gov/library/publications/download/

License: https://www.cia.gov/library/publications/the-world-factbook/docs/contributor_copyright.html

Some of its users: https://www.cia.gov/library/publications/resources/the-world-factbook/

General facts about countries and regions.

https://www.geonames.org/

License: https://www.geonames.org/

Gazetteer and postal code data for free.

https://iso639-3.sil.org/code_tables/download_tables/

License: https://iso639-3.sil.org/code_tables/download_tables/

Some of its users: https://iso639-3.sil.org/code_tables/639/data, https://polyglotclub.com/

ISO 639-3 tables. It assigns each language a code and is updated every year.

https://www.unicode.org/iso15924/codelists.html

License: https://www.unicode.org/copyright.html

Some of its users: http://www.unicode.org/iso15924/codelists.html

ISO 15924 lists. Codes for scripts.

https://www.unece.org/cefact/locode/welcome.html

License: https://www.unece.org/cefact/locode/locode_since1981.html

UN/LOCODE, an alternative to ISO 3166-2. It is updated twice a year.

http://www.nationalanthems.info/

License: http://www.nationalanthems.info/

National anthems.

Formats

Sheet

database name with link | file name | field separator | fields (field 1 to field 13, in order)

dictionary
An ordered and extended TOCFL word-list | tocfl.tsv | <tab> | Word, Pinyin, OtherPinyin, Level, First Translation, Other Translation
CC-Canto | cccanto-webdist.txt | <space> | Traditional, Simplified, [pin1 yin1], {jyut6 ping3}, /English equivalent 1/equivalent 2/
CC-CEDICT | cedict_ts.u8 | <space> | Traditional, Simplified, [pin1 yin1], /English equivalent 1/equivalent 2/
CFDICT | CFDICT.u8 | <space> | Traditionnel, Simplifié, [pin1 yin1], /traduction 1/traduction2/
CHDICT | CHDICT.u8 | <space> | Tradicionális, Egyszerűsített, [pin1 yin1], /magyar egyenérték 1/ egyenérték 2
ECDICT | ecdict.csv | , | word, phonetic, definition, translation, pos, collins, oxford, tag, bnc, frq, exchange, detail, audio
English Persian Word Database | EnglishPersianWordDatabase.xlsx | | EnglishWord, PersianWord
ESPDIC | espdict.txt | : | Esperanto, English
HanDeDict | handedict.u8 | <space> | Traditionel, Vereinfacht, [pin1 yin1], /deutsche Entsprechung 1 /Entsprechung 2/
libhangul | hanja.txt | : | Hangul, Hanja, note
IEDICT | iedict.txt | : | Interlingua, English
Inglise-eesti sõnaraamat | eestiinglise.txt | <tab> | eeste, inglise
JLPT Vocabulary | VocabList.N1.doc, VocabList.N2.doc, VocabList.N3.doc, VocabList.N4.doc, VocabList.N5.doc | | Kanji, Hiragana, English
kengdic | kengdic_2011.tsv | <tab> | wordid, word, ?, def, ?, ?, submitter, doe, ?, hanja, ?, ?
The Maryknoll Taiwanese-English Dictionary & English-Taiwanese Dictionary 2013 edition | Mkdictionary.xls | | Sort, Taiwanese, Chinese, English
VNEDICT | vnedict.txt | : | Vietnamese, English

word list
한국어능력시험 어휘목록 | 토픽 어휘 목록_공개 목록.xlsx | | 수준, 어휘, 길잡이말, 품사
古汉语单字字频: Character frequency list of Classical Chinese | CharFreq-Classical.xls | | Serial number; 序号, Character; 汉字
现代汉语单字字频: Character frequency list of Modern Chinese | CharFreq.txt | <tab> | Serial number; 序号, Character; 汉字, Individual raw frequency; 频率, Cumulative frequency in percentile; 累计频率, Pinyin; 拼音, English translation; 英文翻译
通用规范汉字表 | | | 编号, 字形
常用國字標準字體表 | | | 流水序, 教育部字號, Unicode, 常用字
新汉语水平考试(HSK)词汇(2012年修订版) | HSK-2012.xls | | 单词(等级)

Manually convert to TSV

file name process (on Linux)
cccanto-webdist.txt
  1. Delete lines starting with '#';
  2. Replace the first ' ' in each line with '\t';
  3. Replace the first ' [' in each line with '\t';
  4. Replace '] {' with '\t';
  5. Replace '} /' with '\t';
  6. Replace ' # adapted from cc-cedict' with '';
  7. Replace '/\n' with '\n';
  8. Add 'Traditional\tSimplified\tpin1 yin1\tjyut6 ping3\tEnglish equivalent 1/equivalent 2\n' at the beginning;
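The steps for cccanto-webdist.txt can also be scripted. The following is a minimal Python sketch rather than the page author's method. It assumes every entry has the shape `Traditional Simplified [pin1 yin1] {jyut6 ping3} /equivalent 1/equivalent 2/`, optionally followed by a trailing `#` comment. The names `ENTRY`, `HEADER` and `cccanto_to_tsv` are hypothetical.

```python
import re

# Assumed entry shape:
#   TRAD SIMP [pin1 yin1] {jyut6 ping3} /gloss 1/gloss 2/ # optional comment
ENTRY = re.compile(
    r"^(\S+) (\S+) \[([^\]]*)\] \{([^}]*)\} /(.*?)/\s*(?:#.*)?$"
)

HEADER = ["Traditional", "Simplified", "pin1 yin1", "jyut6 ping3",
          "English equivalent 1/equivalent 2"]


def cccanto_to_tsv(lines):
    """Return TSV lines (header first) for an iterable of CC-Canto lines."""
    out = ["\t".join(HEADER)]
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        match = ENTRY.match(line)
        if match:
            # Brackets, braces and the outer slashes are dropped by the groups.
            out.append("\t".join(match.groups()))
    return out
```

Lines that do not match the assumed shape are skipped silently, so it is worth comparing the number of output rows with the number of entries in the input before relying on the result.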
cedict_ts.u8
  1. Delete lines starting with '#';
  2. Replace the first ' ' in each line with '\t';
  3. Replace the first ' [' in each line with '\t';
  4. Replace '] /' with '\t';
  5. Replace '/\n' with '\n';
  6. Add 'Traditional\tSimplified\tpin1 yin1\tEnglish equivalent 1/equivalent 2\n' at the beginning;
CharFreq.txt
  1. Delete lines starting with '/';
  2. Delete fields 3, 4;
  3. Add '序列号\t汉字\t拼音\t英文翻译' at the beginning;
CharFreq-Classical.xls
  1. Delete the first row;
  2. Delete fields 3, 4;
  3. Save as TSV file or save as CSV file and select '<tab>' as field separator;
CHDICT.u8
  1. Delete lines starting with '#';
  2. Replace '\n\n' with '\n';
  3. Replace the first ' ' in each line with '\t';
  4. Replace the first ' [' in each line with '\t';
  5. Replace '] /' with '\t';
  6. Replace '/\n' with '\n';
  7. Add 'Tradicionális\tEgyszerűsített\tpin1 yin1\tmagyar egyenérték 1/ egyenérték 2\n' at the beginning;
ecdict.csv
  1. Open with a spreadsheet program;
  2. Save as TSV file or save as CSV file and select '<tab>' as field separator;
eestiinglise.txt
  1. Add 'eeste\tinglise\n' at the beginning;
EnglishPersianWordDatabase.xlsx
  1. Open with a spreadsheet program;
  2. Save as TSV file or save as CSV file and select '<tab>' as field separator;
espdict.txt
  1. Delete the line starting with '#';
  2. Replace ' : ' with '\t';
  3. Add 'Esperanto\tEnglish\n' at the beginning;
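For the ' : ' separated files (espdict.txt here, and the same steps listed for iedict.txt and vnedict.txt), a short Python sketch of the conversion could look like this. It is an illustration under assumptions: comment lines are taken to start with `#`, and the function name `colon_dict_to_tsv` is hypothetical.

```python
def colon_dict_to_tsv(lines, header):
    """Convert 'headword : gloss' lines to TSV lines, header first."""
    out = ["\t".join(header)]
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.lstrip().startswith("#"):
            continue  # skip blank lines and the '#' comment line
        # Split on the first ' : ' only, so a gloss that itself contains
        # ' : ' stays in one column.
        headword, sep, gloss = line.partition(" : ")
        if sep:
            out.append(headword + "\t" + gloss)
    return out
```

Splitting on the first ' : ' only is a deliberate difference from a plain search-and-replace, which would create extra columns whenever a definition contains the separator.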
handedict.u8
  1. Delete lines starting with '#';
  2. Replace '\n\n' with '\n';
  3. Replace the first ' ' in each line with '\t';
  4. Replace the first ' [' in each line with '\t';
  5. Replace '] /' with '\t';
  6. Replace '/\n' with '\n';
  7. Add 'Traditionel\tVereinfacht\tpin1 yin1\tdeutsche Entsprechung 1/Entsprechung 2\n' at the beginning;
hanja.txt
  1. Delete lines starting with ' #';
  2. Replace ':' with '\t';
  3. Add 'Hangul\tHanja\tnote' at the beginning;
HSK-2012.xls
  1. Open with a spreadsheet program;
  2. Save as TSV file or save as CSV file and select '<tab>' as field separator;
  3. Open the new file;
  4. Replace '(' with '\t';
  5. Replace ')' with '';
  6. Add '单词\t等级\n' at the beginning;
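Steps 4 to 6 for the exported HSK file can be sketched in Python as follows. This assumes each exported cell looks like `word(level)` with ASCII parentheses, as in the field name 单词(等级). The function name `hsk_to_tsv` is hypothetical.

```python
def hsk_to_tsv(cells):
    """Split 'word(level)' cells into two tab-separated columns."""
    out = ["单词\t等级"]
    for cell in cells:
        cell = cell.strip()
        if not cell:
            continue  # skip empty cells
        # Split on the last '(' so a word containing '(' is kept whole.
        word, sep, level = cell.rpartition("(")
        if not sep:
            word, level = cell, ""  # no level given in this cell
        out.append(word + "\t" + level.rstrip(")"))
    return out
```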
iedict.txt
  1. Delete the line starting with ' #';
  2. Replace ' : ' with '\t';
  3. Add 'Interlingua\tEnglish\n' at the beginning;
kengdic_2011.tsv
  1. Delete fields 1, 3, 5, 6, 7, 8, 9, 11, 12;
  2. Add 'word\tdef\thanja\n' at the beginning;
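Deleting fields 1, 3, 5, 6, 7, 8, 9, 11 and 12 is the same as keeping fields 2, 4 and 10 (word, def, hanja). A minimal Python sketch of that selection follows. It assumes the file has no header row of its own and that fields are separated by single tabs. `KEEP` and `kengdic_select` are hypothetical names.

```python
KEEP = (1, 3, 9)  # zero-based positions of fields 2, 4 and 10


def kengdic_select(lines):
    """Keep only word, def and hanja from tab-separated kengdic lines."""
    out = ["word\tdef\thanja"]
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        # Pad short rows so that a missing trailing field becomes empty
        # instead of raising IndexError.
        fields += [""] * (max(KEEP) + 1 - len(fields))
        out.append("\t".join(fields[i] for i in KEEP))
    return out
```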
Mkdictionary.xls
  1. Open with a spreadsheet program;
  2. Save as TSV file or save as CSV file and select '<tab>' as field separator;
tocfl.tsv
  1. Replace '"\t"' with '\t';
  2. Replace '"\n"' with '\n';
  3. Replace the first '"' with '';
  4. Replace the last '"' with '';
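The four replacements for tocfl.tsv strip the double quotes around every field. Python's standard `csv` module can do the same when told the delimiter is a tab. The sketch below assumes every field is wrapped in `"` as the steps imply, and the function name `unquote_tsv` is hypothetical.

```python
import csv
import io


def unquote_tsv(text):
    """Return the rows of a quoted TSV text as unquoted tab-joined lines."""
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter="\t", quotechar='"')
    return ["\t".join(row) for row in reader]
```

Unlike the manual replacements, the `csv` reader also turns a doubled quote inside a field into a single one. A field that contains a tab or a line break would still corrupt the rejoined output, so such fields need separate handling.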
vnedict.txt
  1. Delete the line starting with '#';
  2. Replace ' : ' with '\t';
  3. Add 'Vietnamese\tEnglish\n' at the beginning;
토픽 어휘 목록_공개 목록.xlsx
  1. Open with a spreadsheet program;
  2. Save as TSV file or save as CSV file and select '<tab>' as field separator;
  3. Click on the other tab of sheet;
  4. Save as TSV file or save as CSV file and select '<tab>' as field separator;

Others

database name with link | format
FreeDict | slob
Free Vietnamese Dictionary Project | dict.dz
XOBDO.ORG | db
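Of these formats, dict.dz (dictzip) is gzip-compatible, so the raw text can be decompressed with Python's standard `gzip` module. The sketch below only decompresses the file. Looking up individual entries would additionally need the accompanying index file, which is not covered here. The UTF-8 encoding and the function name `read_dictzip` are assumptions.

```python
import gzip


def read_dictzip(path, encoding="utf-8"):
    """Decompress a .dict.dz file and return its whole content as text."""
    # The encoding is an assumption; adjust it if the text looks garbled.
    with gzip.open(path, "rt", encoding=encoding) as handle:
        return handle.read()
```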