Language/Multiple-languages/Culture/Numeration-Tutorial

From Polyglot Club WIKI

Numeration20220514.png

Finally, the first version of the Numeration tool is released! The current version lets you practise Chinese, English, Esperanto, French, German, Japanese and Spanish numerals. More languages to come...

If you are interested in providing hard-to-find information, please leave a comment.

What is the Numeration tool?

The Numeration project, written in Python and YAML, is for practising the conversion of numbers from numeric symbols into the words of different languages and writing systems.

Address: https://codeberg.org/GrimPixel/Numeration

NB: It would be ridiculous to post this tutorial on that website, because the tutorial's target readers are non-programmers, which makes it too detailed for programmers.

What are Python and YAML, and why choose them?

Python

Python is one of the most popular programming languages.

Why Python rather than Rust or TypeScript? Because Python code is very easy to maintain, and Python has some convenient features, including f-strings and negative indexing. In addition, Python has a lot of scientific libraries and is more relevant to my main area of study.
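For readers who have not met these two features, here is a minimal sketch of what they look like. The variable names are made up for this illustration and are not taken from Numeration's source code.

```python
digits = "987654321"

# Negative indexing counts from the end of the string.
last_digit = digits[-1]    # the final character
last_group = digits[-3:]   # the final three characters

# An f-string inserts the values of variables directly into text.
message = f"{digits} ends with {last_digit}; its last group of three is {last_group}"
print(message)
```

Counting from the right-hand end is handy for numerals, because place values are also counted from the right.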

YAML

YAML is one of the most useful data serialization formats.

And why YAML rather than JSON or TOML? Because YAML is highly readable and very easy to maintain; at the very least, there is no need to type so many quotation marks.

By the way, XML is not as horrible as that linked page describes. The page is a joke about escape characters.
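The point about quotation marks can be checked with a small sketch. It writes the same two settings as JSON (using Python's built-in json module) and as hand-written YAML text, then counts the quotation marks. The key "language" and the value "eo" are assumptions made for this illustration; the YAML text is typed by hand here, not produced by a YAML library.

```python
import json

# Two example settings; "language" and "eo" are hypothetical.
settings = {"mode": "practising", "language": "eo"}

json_text = json.dumps(settings, indent=2)
yaml_text = "mode: practising\nlanguage: eo\n"  # the same data, written as YAML by hand

json_quotes = json_text.count('"')
yaml_quotes = yaml_text.count('"')

print(json_text)
print(yaml_text)
print(f"JSON uses {json_quotes} quotation marks, YAML uses {yaml_quotes}")
```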

How to get started with Numeration?

  1. Download the latest Python (or theoretically any Python 3 version) and install it, so you can run Python programs. For Linux users, the latest version can be installed following this article.
  2. Download Python's YAML parser, so that the Python interpreter can understand YAML. If you are using Windows, press the Windows key, type "cmd" and press Enter to open the command line. Copy the line starting with "pip" from that webpage, paste it into the command line window and press Enter. Users of other operating systems probably already know how to open the command line of their system. For Linux users, Sakura is recommended because it supports Unicode characters well and is lightweight; for Windows users, ConEmu is recommended.
  3. Download required packages for text-to-speech:
    • Online Google Translate TTS: gTTS
    • Offline TTS: pyttsx3 (yet to be implemented in Numeration)
    • Audio player: playsound2

They are all open-source.

If a newer version of Python is available, and you want to use that version, you need to install packages like ruamel.yaml for the new version again. This is how Python works: users can have different versions with different packages and settings.

After doing this, go to the address given above (a little punishment for those who don't read from the beginning). Download the source code and extract the compressed file.

Use the command line to open 'numerate.py'. For Windows, there is an article about this; users of other operating systems probably already know how to do this. Type 'python numerate.py' and the program will run in the command line.

There are two setting files:

  • 'setting.yaml': settings of the program itself, including language selection; change them by following the comments in the file.
  • 'rule/setting/0.yaml': settings for how numbers are written in each language; change them by following the comments in the file.

You can also keep your own variant of the language settings. Make a copy of 'rule/setting/0.yaml', rename it (for example to “qiu.yaml”) and change the settings as you wish. Then run the program and enter a language setting such as “yueqiu”: the Yue Chinese settings in 'rule/setting/qiu.yaml' will be applied.
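How an input like “yueqiu” could map to a settings file can be sketched as follows. This is only a guess at the idea, not Numeration's actual code: it assumes the language code is always the first three letters, and that an input without a suffix falls back to '0.yaml'. The function name is made up.

```python
from pathlib import PurePosixPath


def split_language_setting(text, code_length=3, default_setting="0"):
    """Split e.g. 'yueqiu' into a language code and a settings file path.

    Assumptions (not confirmed by the project): the code has a fixed
    length of three letters, and a missing suffix means '0.yaml'.
    """
    code = text[:code_length]
    setting_name = text[code_length:] or default_setting
    path = PurePosixPath("rule/setting") / f"{setting_name}.yaml"
    return code, str(path)


print(split_language_setting("yueqiu"))
print(split_language_setting("yue"))
```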

Numeration20220514.png

Geany is a text editor available for Linux; I used it to edit the settings.

You can practise numbers with Google Translate TTS: in 'setting.yaml', change “mode” to “practising”, then run the program and select a language setting. You will hear the number spoken if TTS is supported for that language; press Enter to see the number in Western Arabic numerals, then press Enter again to see it written out.
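The order of events in one practice round can be sketched like this. It is a simplified illustration under my own assumptions, not the program's real code: the function name and its parameters are hypothetical, and the three actions (speaking, waiting for Enter, showing text) are passed in as functions so that the sketch runs without any TTS package.

```python
def practise(number, writing, speak, wait_for_enter, show):
    """Run one practice round: listen first, then reveal the answer in two steps."""
    speak(writing)        # 1. hear the number
    wait_for_enter()
    show(str(number))     # 2. reveal the Western Arabic numerals
    wait_for_enter()
    show(writing)         # 3. reveal the written form


# Demonstration with stand-in actions that only record what happens.
events = []
practise(
    987,
    "nine hundred eighty-seven",
    speak=lambda text: events.append(("speak", text)),
    wait_for_enter=lambda: events.append(("enter",)),
    show=lambda text: events.append(("show", text)),
)
for event in events:
    print(event)
```

In a real session, wait_for_enter could simply call input(), show could be print, and speak would hand the text to a TTS package.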

Why choose Apache 2.0?

I want to let people use this even commercially, e.g. in language-learning games like Influent. Thus, I need a permissive license. You may ask why not the public domain. There are some countries where people cannot put their work in the public domain, and none of the three current public-domain licenses is friendly to commercial use.

MIT is good, but there is a problem: it doesn't explain how to deal with patents and trademarks. This allows patent trolls to sue those who use the code commercially without knowing that it contains a patent. When the MIT license came into being, there were not many software patent issues, so the license can be called outdated.

A guy named Lawrence E. Rosen created the Academic Free License to deal with this issue and this idea was incorporated into Apache 2.0, a better alternative.

Explain the code?

The main program is “numerate.py”. It reads your selected language codes, reads the language settings (rule/library/setting.yaml), then reads your subject selection number (mathematical numeral, ordinal number, nominal number, or date and time), then reads the notation and converts it to Western Arabic numerals, then calls the language's “.py” file and shows the result, and finally asks you what to do next (repeat from a given step, or end). This is going to change: everything before the input of the numeral will be settled in 'setting.yaml' first.

In the language “.py” files, everything is in a function “do”, so it can be called from another file (numerate.py) easily. Each “do” function receives the notation and the language setting as part of its input, then reads the corresponding lexicon YAML file. The notation is then processed according to its notation type.
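The benefit of giving every language the same “do” entry point can be sketched as follows. This is a toy illustration, not the real code: the two functions stand in for the “do” of two language files, the tiny lexicons are written inline instead of being read from YAML, the "separator" setting is hypothetical, and the digits are simply read one by one.

```python
def do_esperanto(notation, setting):
    """Stand-in for the do() of a hypothetical Esperanto language file."""
    lexicon = {"1": "unu", "2": "du", "3": "tri"}  # a real file would load this from YAML
    return setting["separator"].join(lexicon[digit] for digit in notation)


def do_english(notation, setting):
    """Stand-in for the do() of a hypothetical English language file."""
    lexicon = {"1": "one", "2": "two", "3": "three"}
    return setting["separator"].join(lexicon[digit] for digit in notation)


# The main program only needs to know which do() belongs to which language code.
LANGUAGES = {"eo": do_esperanto, "en": do_english}


def convert(language_code, notation, setting):
    """Call the selected language's do() with the same arguments every time."""
    return LANGUAGES[language_code](notation, setting)


print(convert("eo", "213", {"separator": " "}))
print(convert("en", "213", {"separator": "-"}))
```

Because every language is called the same way, adding a language does not require changing how the main program makes the call.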

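Magnitude and Archmagnitude

Two concepts need explaining: magnitude and archmagnitude. The magnitude of a digit is its position inside its digit group, counted from the right and starting at 0. The archmagnitude is the index of the group itself, also counted from the right and starting at 0.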

Take the cardinal number 987654321 as an example:

  • digit grouping: 3 (European)
    digit          9 8 7 6 5 4 3 2 1
    magnitude      2 1 0 2 1 0 2 1 0
    archmagnitude  2 2 2 1 1 1 0 0 0
  • digit grouping: 3 (European double digit group)
    digit          9 8 7 6 5 4 3 2 1
    magnitude      2 1 0 5 4 3 2 1 0
    archmagnitude  1 1 1 0 0 0 0 0 0
  • digit grouping: 4 (East Asian)
    digit          9 8 7 6 5 4 3 2 1
    magnitude      0 3 2 1 0 3 2 1 0
    archmagnitude  2 1 1 1 1 0 0 0 0
  • digit grouping: [3, 2] (South Asian)
    digit          9 8 7 6 5 4 3 2 1
    magnitude      1 0 1 0 1 0 2 1 0
    archmagnitude  3 3 2 2 1 1 0 0 0
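The values listed for 987654321 can be reproduced with a short sketch. It is written for this tutorial and is not Numeration's own code; the function name is made up. A grouping is either one group size (3, 4, or 6 for the European double digit group) or a list such as [3, 2], where the last size repeats for all higher groups.

```python
def place_values(number, grouping):
    """Return (digit, magnitude, archmagnitude) for each digit, left to right."""
    sizes = [grouping] if isinstance(grouping, int) else list(grouping)
    result = []
    magnitude = 0       # position inside the current group
    archmagnitude = 0   # index of the current group
    for char in reversed(str(number)):  # walk from the rightmost digit
        # Use the size listed for this group; the last size repeats.
        size = sizes[min(archmagnitude, len(sizes) - 1)]
        result.append((int(char), magnitude, archmagnitude))
        magnitude += 1
        if magnitude == size:  # group is full: start the next one
            magnitude = 0
            archmagnitude += 1
    result.reverse()
    return result


for grouping in (3, 6, 4, [3, 2]):
    rows = place_values(987654321, grouping)
    print("grouping:", grouping)
    print("  magnitude    ", *[m for _, m, _ in rows])
    print("  archmagnitude", *[a for _, _, a in rows])
```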

Construction of a natural number

There are two ways:

  • write the digit and magnitude for each place; write the archmagnitude
  • write each place or a combination of two places separately; write the archmagnitude

The first one is for Esperanto, Chinese, Japanese written in mixed Kanji and Kana script, etc.

The second one is for English, French, German, Japanese in Kana, etc.
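The two ways can be compared with a small sketch for the numbers 1 to 999, that is, a single digit group, so no archmagnitude word is needed. The code is written for this tutorial and is not Numeration's own; the function names are made up. The Esperanto function writes the digit and the magnitude for each place, while the English function treats the tens and ones as a combination of two places. The English output follows the style without “and”.

```python
ESPERANTO_DIGITS = ["", "unu", "du", "tri", "kvar", "kvin", "ses", "sep", "ok", "naŭ"]
ESPERANTO_MAGNITUDES = ["", "dek", "cent"]  # magnitude 0, 1, 2

ENGLISH_UNDER_20 = [
    "", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
    "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
    "seventeen", "eighteen", "nineteen",
]
ENGLISH_TENS = ["", "", "twenty", "thirty", "forty", "fifty",
                "sixty", "seventy", "eighty", "ninety"]


def esperanto_under_1000(number):
    """First way: digit word plus magnitude word for each place."""
    text = str(number)
    words = []
    for index, char in enumerate(text):
        digit = int(char)
        magnitude = len(text) - 1 - index
        if digit == 0:
            continue  # empty places are not spoken
        # "unu" is dropped in front of "dek" and "cent"
        digit_word = "" if digit == 1 and magnitude > 0 else ESPERANTO_DIGITS[digit]
        words.append(digit_word + ESPERANTO_MAGNITUDES[magnitude])
    return " ".join(words)


def english_under_1000(number):
    """Second way: hundreds separately, tens and ones as one combination."""
    hundreds, rest = divmod(number, 100)
    words = []
    if hundreds:
        words.append(ENGLISH_UNDER_20[hundreds] + " hundred")
    if 0 < rest < 20:
        words.append(ENGLISH_UNDER_20[rest])  # one word covers two places
    elif rest >= 20:
        tens, ones = divmod(rest, 10)
        words.append(ENGLISH_TENS[tens] + ("-" + ENGLISH_UNDER_20[ones] if ones else ""))
    return " ".join(words)


for n in (987, 915, 110):
    print(n, "|", esperanto_under_1000(n), "|", english_under_1000(n))
```

Note how 915 shows the difference: Esperanto still builds the tens and ones place by place, while English needs the single word “fifteen” for both places.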

What are the differences from num2words?

  • Purpose: num2words is simply a tool for outputting the text; Numeration is for educational use with a lot of configurations, so you can see different ways of speaking the number in a language.
  • Structure: num2words is written by a bunch of people, so the code styles differ from one language to another; Numeration has merely a single (unmarried) contributor, so everything is in one style: lots of little functions with self-explaining names and little to no comments.
  • Content: num2words handles cardinal numbers, ordinal numbers, years and currency, and has supported 32 languages (no Chinese or Esperanto) since 2019; Numeration handles integers, fractions, mixed numbers, decimals, percentages, scientific notation, ordinal numbers, nominal numbers, and dates and times, supports the languages mentioned above with more on the way, and offers TTS for practice.
  • Activity: num2words releases have stalled in spite of some recent commits, and in its “issue” section many problems are left unresolved when the contributor of a language goes inactive; Numeration's development is unfinished and updated irregularly; it has had serious errors, but is free from bugs now, as far as I know.

If I want to add a language, what should I know?

There are several places you need to visit to make sure it works:

  • 'numerate.py'
  • 'setting.yaml'
  • 'rule/setting/0.yaml'
  • 'rule/library/language_decode.yaml'
  • 'rule/library/language_encode.yaml'
  • 'about/language.yaml'

Just pick a language that is similar to the one you want to add, copy and paste its entries, and modify them. This will suffice.

Why are the instructions in the program all imperative, instead of being polite?

The imperative is great for simplicity. What's more, there is no sincerity in automatic messages, so I decided not to be courteous there. My sincerity is elsewhere: Apache 2.0.

What's the future plan?

I need to

  • simplify the recognition of notation type (from manual to automatic)
  • add support for more numeral systems
  • add support for more languages, giving priority to the most learned and most used languages, then to official languages
  • [on hold] create a GUI with Neutralinojs and add Flask support.

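The GUI needs to have four output areas, corresponding to four text directions:

  • horizontal, right to left: Arabic, Hebrew, etc.
  • horizontal, left to right: Latin, Cyrillic, Greek, etc.
  • vertical, columns running right to left: (traditional) Han, Hangul, Japanese, etc.
  • vertical, columns running left to right: Mongolian, etc.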

See https://en.wikipedia.org/wiki/Writing_system#Directionality for details of directions.

I need to accumulate experience through this project, then I can create other ones.

I had a debate with myself: should the large archmagnitudes of the Conway-Guy system and the long scale in Chinese be supported? At first I thought that, since written rules for them exist, they should be supported. Then I thought that, as almost no one uses them, they practically do not exist. I chose the latter view. Maybe I will add support for them later, when I can no longer add more languages for lack of references.

Planned next languages to be supported: Russian, Italian, Standard Arabic.

AUTHOR

GrimPixel

Contributors

GrimPixel, Vincent and Maintenance script

