Difference between revisions of "Language/Multiple-languages/Culture/Numeration-Tutorial"

From Polyglot Club WIKI





Revision as of 11:50, 5 May 2022

Finally, the first version of the Numeration tool is released! The current version lets you practise Chinese, English, Esperanto, French, German, Japanese and Lojban numerals. More languages to come...

Numeration20220306.png

If you are interested in providing hard-to-find information, please leave a comment.

What is the Numeration tool?

Numeration is a project, written in Python and YAML, for practising the conversion of numbers from symbols into the way they are written out in each language.


Address: https://codeberg.org/GrimPixel/Numeration


NB: It would be ridiculous to post this tutorial on a website like Codeberg, because the tutorial's target readers are non-programmers and it is far too detailed for programmers.

What are Python and YAML and why choose them?

Python

Python is one of the most popular programming languages.

Why Python rather than Rust or TypeScript? Because Python code is very easy to maintain, and Python has some handy features such as f-strings and negative indexing. In addition, Python has a lot of scientific libraries and is more relevant to my main area of study.
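For readers who have never seen these two features, here is a minimal sketch. The variable names are made up for illustration and are not taken from Numeration's source.

```python
# A number kept as text, so that single digits can be picked out.
digits = "987654321"

# Negative indexing counts from the end: -1 is the last character.
last_digit = digits[-1]
# A negative slice start takes the last three characters.
last_group = digits[-3:]

# An f-string inserts the values of variables directly into the text.
message = f"{digits} ends in {last_digit}; its last group is {last_group}"
print(message)  # 987654321 ends in 1; its last group is 321
```

Picking digits from the right-hand end is exactly what a program that reads numbers aloud has to do all the time.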

YAML

YAML is one of the most widely used data serialization formats.

And why YAML rather than JSON or TOML? Because YAML is highly readable and very easy to maintain; at the very least, there is no need to type so many quotation marks.
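To make the point about quotation marks concrete, here is a small sketch that writes the same settings in JSON and in YAML and counts the quotation marks. Only “mode: practising” comes from this tutorial; the other keys and values are invented for illustration and may not match the real 'setting.yaml'.

```python
import json

# Hypothetical settings: only "mode" is mentioned in this tutorial,
# the other two keys are assumptions made for the comparison.
settings = {"language": "fra", "mode": "practising", "digit_grouping": 3}

# JSON as produced by Python's standard library.
as_json = json.dumps(settings, indent=2)

# The same data written by hand in YAML.
as_yaml = "language: fra\nmode: practising\ndigit_grouping: 3\n"

print(as_json)
print(as_yaml)
print(as_json.count('"'), "quotation marks in JSON,",
      as_yaml.count('"'), "in YAML")
```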

By the way, XML is not as horrible as the linked page describes. That page is a joke about escape characters.

How to get started with Numeration?

  1. Download the latest Python (in theory, any Python 3 version will do) and install it, so that you can run Python programs. Linux users can install the latest version by following this article.
  2. Download Python's YAML parser, so that the Python interpreter can understand YAML. If you are using Windows, press the Windows key, type "cmd" and press Enter to open the command line. Copy the line starting with "pip" from that webpage, paste it into the command line window and press Enter. Users of other operating systems presumably already know how to open the command line. Sakura is a recommended terminal emulator because it supports Unicode characters well and is lightweight.
  3. Download required packages for text-to-speech:
    • Online Google Translate TTS: gTTS
    • Offline TTS: pyttsx3 (yet to be implemented in Numeration)
    • Audio player: playsound2

They are all open-source.


If a newer version of Python becomes available and you want to use it, you need to install packages such as ruamel.yaml again for that version. This is how Python works: each installed version keeps its own packages and settings.
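If you are not sure whether the packages are installed for the Python version you are running, a small check like the following may help. It is only a sketch: it assumes Python 3.8 or newer (for `importlib.metadata`) and assumes the packages are published under exactly the names listed above. The function name `missing_packages` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names as given in this tutorial (assumed to match the names on PyPI).
REQUIRED = ["ruamel.yaml", "gTTS", "pyttsx3", "playsound2"]


def missing_packages(names):
    """Return the names that are not installed for this Python version."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if not installed
        except PackageNotFoundError:
            missing.append(name)
    return missing


missing = missing_packages(REQUIRED)
print("Missing:", ", ".join(missing) if missing else "nothing")
```

Run it with the same "python" command that you use for Numeration, otherwise it checks a different installation.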

After doing this, go to the Address above (a little punishment for those who don't read from the beginning). Download the source code and extract the compressed file.

Use the command line to run 'numerate.py': for Windows, there is an article about this; users of other operating systems presumably already know how to do it. Type 'python numerate.py' and the program will run in the command line.

There are two setting files:

  • 'setting.yaml': settings for the program itself, including language selection; change them by following the comments in the file.
  • 'rule/setting/0.yaml': settings for how numbers are written in each language; change them by following the comments in the file.

If you make a copy of 'rule/setting/0.yaml' under a new name, such as “qiu.yaml”, and change the settings in it as you wish, you can then run the program and enter a language setting such as “yueqiu”; the Yue Chinese settings in 'rule/setting/qiu.yaml' will be applied.
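One way to picture how an input like “yueqiu” could be taken apart is sketched below. This is a guess based only on the example above: it assumes the first three letters are the language code (“yue”) and the rest is the name of the settings file (“qiu”). The function `split_language_setting` is hypothetical and is not Numeration's own code.

```python
from pathlib import PurePosixPath


def split_language_setting(text, code_length=3):
    """Split e.g. 'yueqiu' into ('yue', rule/setting/qiu.yaml).

    Assumption: a fixed-length language code followed by a file name.
    """
    code, name = text[:code_length], text[code_length:]
    if len(code) < code_length or not name:
        raise ValueError(f"cannot split language setting: {text!r}")
    return code, PurePosixPath("rule/setting") / f"{name}.yaml"


code, path = split_language_setting("yueqiu")
print(code, path)  # yue rule/setting/qiu.yaml
```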

Numeration20220306.png

“nano” is a text editor for Linux; I used it to edit the settings.

You can practise numbers with Google Translate TTS: in 'setting.yaml', change “mode” to “practising”, then run the program and select a language setting. You will hear the number spoken if TTS is supported for that language; press Enter to see the number in Western Arabic numerals, and press Enter again to see it written out.

Why choose Apache 2.0?

I want to let people use this even commercially, e.g. in language-learning games like Influent, so I need a permissive license. You may ask: why not the public domain? In some countries people cannot put their work in the public domain, and none of the current three public domain licenses is friendly to commercial use.

MIT is good, but there is a problem: it doesn't say how patents and trademarks are to be handled. This allows patent trolls to sue people who use the code commercially without knowing that it is covered by a patent. When the MIT license came into being, there were not many software patent issues, so the license can be called outdated.

A guy named Lawrence E. Rosen created the Academic Free License to deal with this issue and this idea was incorporated into Apache 2.0, a better alternative.

Explain the code?

The main program is “numerate.py”. It reads your selected language codes, reads the language settings (rule/library/setting.yaml), reads the number of the subject you select (mathematical numeral, ordinal number, nominal number, date and time), reads the notation and converts it to Western Arabic numerals, then calls the language's “.py” file and shows the result. Finally it asks you what to do next (repeat from a given step, or end). This is going to change: everything before inputting the numeral will be settled in 'setting.yaml' first.

In the language “.py” files, everything is inside a function named “do”, so that it can easily be called from another file (numerate.py). The function “do” takes the notation and the language setting as input, then reads the corresponding lexicon YAML file. The notation is then processed according to its type.
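The idea of one “do” function per language can be sketched as follows. This is a toy, not the real code: the two “languages” here only read a number digit by digit, their word lists are written directly in the code instead of being read from a lexicon YAML file, and the names `LANGUAGES`, `numerate` and the “separator” setting are all invented for illustration.

```python
from types import SimpleNamespace


def _english_do(notation, setting):
    # Toy stand-in for a language file's "do": reads digit by digit.
    words = ["zero", "one", "two", "three", "four",
             "five", "six", "seven", "eight", "nine"]
    return setting.get("separator", " ").join(words[int(ch)] for ch in notation)


def _esperanto_do(notation, setting):
    words = ["nul", "unu", "du", "tri", "kvar",
             "kvin", "ses", "sep", "ok", "naŭ"]
    return setting.get("separator", " ").join(words[int(ch)] for ch in notation)


# Each entry plays the role of one language ".py" file exposing "do".
LANGUAGES = {
    "eng": SimpleNamespace(do=_english_do),
    "epo": SimpleNamespace(do=_esperanto_do),
}


def numerate(language_code, notation, setting):
    """Pick the language by its code and call its "do" function."""
    return LANGUAGES[language_code].do(notation, setting)


print(numerate("eng", "42", {}))                  # four two
print(numerate("epo", "42", {"separator": "-"}))  # kvar-du
```

Because every language offers the same function name with the same inputs, the main program does not need to know anything else about a language in order to use it.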


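Two things need explaining here: the magnitude and the archmagnitude. As the examples below show, the magnitude is a digit's position within its group and the archmagnitude is the index of that group, both counted from the right starting at 0.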

Take the cardinal number 987654321 as an example:

  • digit grouping: 3 (European)
    digit          9 8 7 6 5 4 3 2 1
    magnitude      2 1 0 2 1 0 2 1 0
    archmagnitude  2 2 2 1 1 1 0 0 0
  • digit grouping: 4 (East Asian)
    digit          9 8 7 6 5 4 3 2 1
    magnitude      0 3 2 1 0 3 2 1 0
    archmagnitude  2 1 1 1 1 0 0 0 0
  • digit grouping: [3, 2] (South Asian)
    digit          9 8 7 6 5 4 3 2 1
    magnitude      1 0 1 0 1 0 2 1 0
    archmagnitude  3 3 2 2 1 1 0 0 0
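The three examples above can be reproduced with a short function. This is a sketch written for this tutorial, not the code used in Numeration; the name `place_values` is made up, and a grouping such as [3, 2] is read as “the first group from the right has 3 digits, every later group has 2”.

```python
def place_values(number, grouping):
    """Return (digit, magnitude, archmagnitude) for each digit, left to right.

    grouping is a group size (3 or 4) or a list such as [3, 2], where the
    last size in the list is repeated for all further groups.
    """
    if number < 0:
        raise ValueError("only non-negative integers are handled")
    sizes = [grouping] if isinstance(grouping, int) else list(grouping)

    rows = []
    magnitude = 0       # position inside the current group
    archmagnitude = 0   # index of the current group
    size_index = 0      # which entry of sizes applies to the current group
    for ch in reversed(str(number)):      # walk from the rightmost digit
        rows.append((int(ch), magnitude, archmagnitude))
        magnitude += 1
        if magnitude == sizes[size_index]:  # group is full, start the next
            magnitude = 0
            archmagnitude += 1
            size_index = min(size_index + 1, len(sizes) - 1)
    rows.reverse()
    return rows


for grouping in (3, 4, [3, 2]):
    rows = place_values(987654321, grouping)
    print("digit grouping:", grouping)
    print("magnitude    ", *(m for _, m, _ in rows))
    print("archmagnitude", *(a for _, _, a in rows))
```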

What are the differences from num2words?

  • Purpose: num2words is simply a tool for outputting text; Numeration is for educational use and has many configuration options, so you can see the different ways of saying a number in a language.
  • Structure: num2words was written by many people, so the code style differs from one language to another; Numeration has merely a single (unmarried) contributor, so everything is in one style: lots of little functions with self-explanatory names and few or no comments.
  • Content: num2words covers cardinal numbers, ordinal numbers, years and currency, and has supported 32 languages (not including Chinese or Esperanto) since 2019; Numeration handles integers, fractions, mixed numbers, decimals, percentages, scientific notation, ordinal numbers, nominal numbers, dates and times, supports the languages mentioned above (and counting), and offers TTS for practice.
  • Activity: num2words releases have stalled in spite of some recent commits, and many problems in its “issue” section are left unresolved because the contributor for the language in question has gone inactive; Numeration is unfinished and updated irregularly, and it used to have serious errors, but as far as I know it is free from bugs now.

If I want to add a language, what should I know?

There are several places you need to visit to make sure it works:

  • 'numerate.py'
  • 'setting.yaml'
  • 'rule/setting/0.yaml'
  • 'rule/library/language_decode.yaml'
  • 'rule/library/language_encode.yaml'
  • 'about/language.yaml'

Just pick a language that is similar to the one you want to add, copy and paste its entries, and modify them. This will suffice.

Why are the instructions in the program all imperative, instead of being polite?

The imperative is great for simplicity. What's more, there is no sincerity in automatic messages, so I decided not to be courteous there. My sincerity is elsewhere: Apache 2.0.

What does that mosaic mean?

It's the ideograph 𠬞, graphically meaning “two hands”. Details at CUHK.

What's the future plan?

I need to

  • add support for more numeral systems (enough has been done for now)
  • add support for more languages, giving priority to the most learned and most used languages, then to official languages.
  • create a GUI with BeeWare. For iOS users, it will not be available in the App Store; Xcode will be required to run it, and I can't guarantee that it works.

I need to accumulate experience through this project, then I can create other ones.

I had a debate with myself: should the large archmagnitudes of Conway-Guy and the long scale in Chinese be supported? At first I thought that, since written rules for them exist, they should be supported. Then I thought that, as almost no one uses them, they practically do not exist. I chose the latter. Maybe I will add support for them later, when I can no longer add languages for lack of references.

Planned next languages to be supported: Spanish, Russian, Italian, Standard Arabic.


AUTHOR

GrimPixel