Difference between revisions of "Language/Multiple-languages/Culture/How-to-make-a-TSV-file"

From Polyglot Club WIKI

Revision as of 18:18, 26 June 2020

(under construction)

Introduction to TSV

You may have visited Language/Multiple-languages/Culture/Internet-Dictionaries and want to use some of the downloadable material to create flashcards in Anki or Mnemosyne. But copying and pasting entry by entry takes a lot of effort. If we could use a spreadsheet, things would be much easier. Can we do that?

You may have noticed that both programs have a “File -> Import...” option. But they don't support XLS or XLSX files. What should you do?

If you open a spreadsheet program (e.g. LibreOffice Calc, Apache OpenOffice Calc, ONLYOFFICE Spreadsheet Editor, Microsoft Office Excel) and click on “File -> Save As...”, you will see some other formats to choose from, one of which is “CSV”.

“CSV” means “Comma-separated values”. It uses commas to separate columns. If a field contains a comma, the whole field is enclosed in quotation marks, so the comma won't be counted as a column separator. If the field itself contains quotation marks, each of them is doubled inside the quoted field. This is an example: https://github.com/skywind3000/ECDICT/blob/master/ecdict.mini.csv.
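To make the quoting rules concrete, here is a minimal sketch using Python's standard `csv` module. The two rows are made-up sample data, not taken from the linked dictionary.

```python
import csv
import io

# Made-up sample rows: the first field of row 1 contains a comma,
# the first field of row 2 contains quotation marks.
rows = [
    ["hello, world", "greeting"],
    ['say "hi"', "phrase"],
]

# Write the rows as CSV text into an in-memory buffer.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the original fields.
parsed = list(csv.reader(io.StringIO(csv_text)))
```

The first line is written as `"hello, world",greeting` and the second as `"say ""hi""",phrase`: the field with a comma is quoted, and the quotation marks inside a quoted field are doubled.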

You may have realised that a CSV file doesn't store any styling data. If you save as a CSV file, all the information about fonts, colours, hyperlinks, etc. will be lost. CSV files are lightweight, so when you just need pure data, this format is ideal. Do Anki and Mnemosyne support it?

No, but its sibling TSV is supported. In Anki, it is called “Text separated by tabs or semicolons”; in Mnemosyne, it is called “Tab-separated text files”. What is it?

“TSV” means “Tab-separated values”. It is similar to CSV and has an advantage over it: it uses tabs to separate columns, so commas in the text need no quotation marks to keep them from being read as column separators.

You may wonder what a “tab” is. The Tab (tabulator) key is the key above the “Caps Lock” key on your keyboard (in most cases). It was introduced on typewriters to make typing tables easier and was inherited by computers. When you use a spreadsheet program, you can press the Tab key to move to the next column or the Enter key to move to the next row. TSV files use the characters produced by these two keys to separate columns and rows. Since ordinary text rarely contains tab characters, this is simpler than CSV. This is an example: https://www.eki.ee/litsents/vaba/ies/eestiinglise.txt.
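As a rough illustration, the sketch below reads a small piece of tab-separated text with Python's standard `csv` module. The two lines of sample data are invented for this example and are not copied from the linked file.

```python
import csv
import io

# Invented sample data: "\t" is the tab character between columns,
# "\n" is the line break between rows. The comma needs no quoting.
tsv_text = "kass\tcat, a small domestic animal\nkoer\tdog\n"

# delimiter="\t" splits columns on tabs; QUOTE_NONE treats quotation
# marks as ordinary characters instead of field wrappers.
reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
cards = [row for row in reader]

for front, back in cards:
    print(front, "->", back)
```

Each row comes back as a list with two items, one per column, and the comma in the first definition stays part of the text.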

How do you save a file as TSV? This is a bit confusing. In the “Save As...” dialogue, you need to select CSV; then, in the dialogue box that follows, choose {Tab} as “Field delimiter” and ignore “String delimiter”. This is because TSV is not as well known as CSV. The file you save has “CSV” as its file extension, but it's actually a TSV file.
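If a file has already been saved with commas, the conversion can also be scripted. The following is only a sketch using Python's standard `csv` module. The function name `csv_to_tsv` is made up for this example, and replacing tabs and line breaks inside a field with spaces is an assumption about what is acceptable for flashcards.

```python
import csv
import io


def csv_to_tsv(csv_text: str) -> str:
    """Convert comma-separated text to tab-separated text."""
    lines = []
    for row in csv.reader(io.StringIO(csv_text)):
        cleaned = []
        for field in row:
            # Tabs and line breaks inside a field would break the TSV
            # layout, so this sketch replaces them with spaces.
            for bad in ("\r\n", "\n", "\r", "\t"):
                field = field.replace(bad, " ")
            cleaned.append(field)
        lines.append("\t".join(cleaned))
    return "".join(line + "\n" for line in lines)


sample = '"hello, world",greeting\n"say ""hi""",phrase\n'
print(csv_to_tsv(sample))
```

For real files, the text can be read with `open(path, encoding="utf-8", newline="")` and the result written to a new file in the same way.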

How to convert to TSV

Spreadsheet formats

If the file is in XLS or another spreadsheet format, you just need to open it and save it as TSV.

Sheets in other document formats

If it is a table in a DOC, PDF or other document format, you can select the first few characters in the table, scroll to the bottom of the table, hold down a Shift key, then click on the last character in the table. Copy the selection, paste it into a spreadsheet program and save as TSV.

But this sometimes causes a problem: all the content ends up in the first cell. In this case, you can use the open-source tools Tabula or Excalibur to extract the table. But they may not always be reliable. If they fail, you can try online services. Some online services have page limits; in that case you need an open-source tool such as PDFsam, PDFTK Builder or PDF-Shuffler to split the PDF file.

DB format

Some people use DB format. You need an open-source tool such as DB Browser for SQLite or SQLiteStudio to open it. If you use DB Browser for SQLite, open the DB file, see which tables it contains, select “File -> Export -> Table(s) as CSV file”, select the tables you want to export, and make sure “Field separator” is “Tab”.
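The same export can be sketched with Python's standard `sqlite3` and `csv` modules, assuming the DB file is an SQLite database. The helper names `list_tables` and `table_to_tsv` and the `words` table are made up for this example. An in-memory database stands in for a real file, which would be opened with `sqlite3.connect("your-file.db")`.

```python
import csv
import io
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def table_to_tsv(conn, table):
    """Return one table as tab-separated text, with a header row."""
    if table not in list_tables(conn):
        raise ValueError(f"no such table: {table}")
    # Quote the table name so unusual names are still valid SQL.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(f"SELECT * FROM {quoted}")
    out = io.StringIO()
    # With the default settings, csv.writer still quotes a field if it
    # contains a tab, a line break or a quotation mark.
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
    return out.getvalue()


# Hypothetical demo database with a single table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT, meaning TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?, ?)",
    [("kass", "cat"), ("koer", "dog, hound")],
)

print(list_tables(conn))
tsv_text = table_to_tsv(conn, "words")
print(tsv_text)
```

The first row of the output holds the column names. If the flashcard program should not import it as a card, delete that line or skip the `writer.writerow(...)` call.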