Language/Multiple-languages/Culture/How-to-make-a-TSV-file
(under construction)
== Introduction to TSV ==
You may have visited [[Language/Multiple-languages/Culture/Internet-Dictionaries]] and want to use some of the downloadable material to create flashcards in Anki or Mnemosyne. But it takes a lot of effort if you copy and paste entry by entry. If we could use a spreadsheet, things would be much easier. Can we do that?
You may have noticed that both programs have a “File -> Import...” option. But they don't support XLS or XLSX files. What should you do?
If you open a spreadsheet program (e.g. [https://www.libreoffice.org/ LibreOffice] Calc, [https://www.openoffice.org/ Apache OpenOffice] Calc, [https://www.onlyoffice.com/ ONLYOFFICE] Spreadsheet Editor, [https://www.office.com/ Microsoft Office] Excel) and click on “File -> Save As...”, you will see some other formats to choose from, one of which is “CSV”.
“[[wikipedia:Comma-separated_values|CSV]]” means “comma-separated values”. It uses commas to separate columns. If a field contains a comma, the field is enclosed in quotation marks, so the comma won't be counted as a column separator. If a field itself contains quotation marks, each of them is written twice inside the quoted field. This is an example: https://github.com/skywind3000/ECDICT/blob/master/ecdict.mini.csv.
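
The quoting rules described above can be seen in a small sketch that uses Python's standard <code>csv</code> module. The flashcard rows are made up for illustration and are not taken from the example file.

```python
import csv
import io

# Hypothetical flashcard rows: one field contains a comma,
# another contains both a comma and quotation marks.
rows = [
    ["word", "meaning"],
    ["hello", "a greeting, used when meeting someone"],
    ["quote", 'to repeat words, e.g. "to be or not to be"'],
]

# Write the rows as CSV into an in-memory text buffer.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the original fields.
parsed = list(csv.reader(io.StringIO(csv_text, newline="")))
```

In the printed text, the field with a comma is wrapped in quotation marks, and the quotation marks inside the last field are doubled.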
You may have realised that a CSV file doesn't store any styling data. If you save as a CSV file, all the information about fonts, colours, hyperlinks, etc. will be lost. CSV files are lightweight, so when you just need pure data, this format is ideal. Do Anki and Mnemosyne support it?
No, but its sibling TSV is supported. In Anki, it is called “Text separated by tabs or semicolons”; in Mnemosyne, it is called “Tab-separated text files”. What is it?
“[[TSV]]” means “tab-separated values”. It is similar to CSV and has an advantage over it: it uses “tabs” to separate columns, so there is no need for quotation marks to mark commas as text instead of column separators. Both “TSV” and “CSV” belong to “[[wikipedia:Delimiter-separated_values|DSV]]”, delimiter-separated values.
You may wonder what a “tab” is. [[wikipedia:Tab_key|The Tab key]] (short for “tabulator key”) is the key above the “Caps Lock” key on most keyboards. It was introduced on typewriters to make typing tables easier and was inherited by computers. When you use a spreadsheet program, you can press the Tab key to move to the next column or the Enter key to move to the next row. TSV files use the characters produced by these two keys, the tab character and the line break, to separate columns and rows. Because tabs rarely occur inside ordinary text, this is often more convenient than CSV. This is an example: https://www.eki.ee/litsents/vaba/ies/eestiinglise.txt.
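
As a minimal sketch of this layout, the following Python lines build and read back a TSV text by hand. The rows are invented for illustration, and the sketch assumes that no field contains a tab or a line break itself.

```python
# Hypothetical rows; the fields contain commas and quotation marks,
# but (by assumption) no tabs or line breaks.
rows = [
    ["tere", "hello, hi"],
    ["jah", 'yes, "yeah"'],
]

# Join the fields of each row with a tab and the rows with a line break.
tsv_text = "\n".join("\t".join(fields) for fields in rows) + "\n"
print(tsv_text)

# Splitting on the same two characters recovers the fields unchanged.
parsed = [line.split("\t") for line in tsv_text.splitlines()]
```

No quoting is needed here: the commas and quotation marks pass through as ordinary text.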
How do you save a file as TSV? This is a bit confusing. In the “Save As...” dialogue, you need to select CSV, then in the dialogue box that follows, choose {Tab} as the “Field delimiter” and ignore the “String delimiter”. This is because TSV is not as well known as CSV. The file you save has “CSV” as its file extension, but it's actually a TSV file.
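
If you already have a comma-separated file, the conversion can also be scripted. The following is only a sketch using Python's standard <code>csv</code> module; the function name <code>csv_to_tsv</code> and the sample text are made up for illustration, and replacing tabs and line breaks inside fields with spaces is an assumption of this sketch, not a rule of the format.

```python
import csv
import io


def csv_to_tsv(csv_text: str) -> str:
    """Convert comma-separated text to tab-separated text."""
    lines = []
    for fields in csv.reader(io.StringIO(csv_text, newline="")):
        # Tabs and line breaks inside a field would break the TSV layout,
        # so this sketch replaces them with spaces (an assumption).
        cleaned = [
            field.replace("\t", " ").replace("\r", " ").replace("\n", " ")
            for field in fields
        ]
        lines.append("\t".join(cleaned))
    return "\n".join(lines) + "\n"


# Hypothetical sample: the second row has a quoted field with a comma.
sample = 'word,meaning\nhello,"a greeting, used when meeting someone"\n'
converted = csv_to_tsv(sample)
print(converted)
```

To convert a real file, you could read its text with <code>open(..., encoding="utf-8", newline="")</code>, pass it to the function and write the result to a new file.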
== How to convert to TSV ==
=== Spreadsheet formats ===
If the file is in XLS or another spreadsheet format, you just need to open it and save it as TSV.
Example: https://github.com/amirshnll/English-Persian-Word-Database
=== Sheets in other document formats ===
If it is a table in a DOC, PDF or other document format, you can select the first few characters of the table, scroll to the bottom of the table, hold down the Shift key and click after the last character of the table. Then copy the selection, paste it into a spreadsheet program and save it as TSV.
But this sometimes causes a problem: all the content is stuffed into the first cell. In this case, you can use the open-source tools [https://tabula.technology/ Tabula] or [https://excalibur-py.readthedocs.io/en/master/ Excalibur] to do this work. But they are not always reliable. If they fail, you can try online services. Some online services have page limits, in which case you need an open-source tool such as [https://pdfsam.org/ PDFsam], [http://angusj.com/pdftkb/ PDFTK Builder] or [https://sourceforge.net/projects/pdfshuffler/ PDF-Shuffler] to split the PDF file first.
Example:
=== Custom sheet format ===
=== DB format ===
Some people use the DB format. You need an open-source tool such as [https://sqlitebrowser.org/ DB Browser for SQLite] or [https://sqlitestudio.pl/ SQLiteStudio] to open it. If you use DB Browser for SQLite, open the DB file, see which tables it contains, select “File -> Export -> Table(s) as CSV file”, select the tables you want to export, and make sure that “Field separator” is set to “Tab”.
Example: http://www.xobdo.org/downloads/
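
The same export can be sketched with Python's standard <code>sqlite3</code> module, assuming the DB file is an SQLite database. The helper name <code>table_to_tsv</code> and the <code>words</code> table are hypothetical; the demonstration uses a throwaway in-memory database rather than a real download.

```python
import sqlite3


def table_to_tsv(connection, table_name):
    """Return one table as tab-separated text with a header row."""
    # Table names cannot be passed as SQL parameters, so the name is quoted
    # as an identifier; only use names read from the database itself.
    quoted = '"' + table_name.replace('"', '""') + '"'
    cursor = connection.execute(f"SELECT * FROM {quoted}")
    header = [column[0] for column in cursor.description]
    lines = ["\t".join(header)]
    for row in cursor:
        # NULL becomes an empty field; tabs and line breaks inside a value
        # are replaced with spaces (an assumption of this sketch).
        cells = ["" if value is None else str(value) for value in row]
        cells = [
            cell.replace("\t", " ").replace("\r", " ").replace("\n", " ")
            for cell in cells
        ]
        lines.append("\t".join(cells))
    return "\n".join(lines) + "\n"


# Demonstration with an in-memory database and a hypothetical "words" table.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE words (word TEXT, meaning TEXT)")
connection.execute(
    "INSERT INTO words VALUES (?, ?)",
    ("hello", "a greeting, used when meeting someone"),
)
connection.commit()

# See which tables the database contains, then export one of them.
tables = [
    name
    for (name,) in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
]
tsv_text = table_to_tsv(connection, tables[0])
print(tsv_text)
```

For a real file, you would replace <code>":memory:"</code> with the path to the DB file and skip the lines that create and fill the demonstration table.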