Difference between revisions of "Features/Language-List-REQUEST"
Latest revision as of 10:24, 16 September 2017
Main Features
- Where on the site should 'macrolanguage' be used in the language list?
- Find Friends page: we must be able to search by 'individual' language or by 'macrolanguage', but in the search box, results must show the 'macrolanguage' first in BOLD (with the number of 'individual' languages inside the macrolanguage), then the 'individual' languages. It must be clear that the individual languages belong to the macrolanguage. Example: for 'Arabic', there are 40 lines!
- Corrections, profile, translations pages: only individual language
- Lessons, videos, questions, quizzes : macrolanguage AND individual
- Type: all (easiest to keep them all)
- For "Language you can TEACH" and "Language you want to LEARN", there will be an explanation telling users that if they cannot find their language, they can look up alternative names on Wikipedia, find the ISO 639-3 code and type it. If they still cannot find it, they can choose the "special" ISO 639-3 code "mis" and type the name. There are four "special" codes in that file, for example, Pinghua Chinese. The definition of usage is here: https://en.wikipedia.org/wiki/ISO_639-3#Special_codes There can be a list of known missing languages, and users can choose only from it. There can also be a function to "suggest a language".
- When creating wiki lessons, the user's "Language you can TEACH" and "Language you want to LEARN" can be shown first. Vincent: OK
- Create a special page to allow an administrator to edit the 'new_lib' and the 'new_lib2' fields. Only create a page in /wiki/Features: a table with the columns 'iso', 'new_lib' and 'new_lib2', and I'll use this table to update the main table. Only for the main 500 languages. Vincent: I started a new list with autonyms and alternative English names here: https://polyglotclub.com/wiki/Features/Language-List-autonym
- Flag icons for languages will be removed. Forvo, Glosbe, HiNative, Wikipedia. Italki uses flags only to indicate location. Where there are only a handful of languages, flags are used. Tatoeba is an exception: it made up flags to ensure every language has one there. Vincent: In the first version, flags will still be there because there are too many changes to make. I will decide later whether we need to remove them. Remember that 99% of users will select the most common languages. Tatoeba has more than 300 languages; if we can have flags for the main 300 or 500, we will cover maybe 99.9% of users. See the table below, it is interesting. Almost no other website/app has the full language list, that's why I want to do it. But I am doing all that for a very small minority. Anyway, the majority is not always right... GrimPixel: Yes, it is seizing the initiative that matters. We will be the only stronghold for the minority. Vincent: True :) !
Translations
GrimPixel: For Chinese, there should be Traditional and Simplified. Also, Cyrillic and Latin for Serbian, and so on for some others.
GrimPixel: Yeah, a new table for scripts. But there is a problem - Japanese uses both Kana and Chinese characters, and Korean uses Hangul and sometimes Chinese characters. Also, both Traditional and Simplified Chinese belong to the Chinese script. So it's not strictly according to script systems.
GrimPixel: A professional site: http://scriptsource.org/cms/scripts/page.php It will require ISO 15924: https://en.wikipedia.org/wiki/ISO_15924 But it's still not good enough, because it is exceedingly precise and not practical. Some entries should be hidden, such as Hiragana, Katakana and Japanese Syllables, because there is a "Japanese" entry which includes them all.
GrimPixel: I think it's a difficult problem. It seems that dividing by script is not enough, because people in different areas have different customs. There are differences between Portuguese (Brazil) and Portuguese (Portugal), English (USA) and English (UK), and so on. But it would be too much content if we divided by region, as Windows 10 does under "Regions & languages - add a language".
GrimPixel: I think it will finally be according to scripts, though there are different customs in different areas within a script.
GrimPixel: I think it makes - Bulgarian can be Cyrillic, Russian can also be Cyrillic. In each entry, there is Writing systems that use this script, which tells which languages use the script.
Language | ISO 639-3 | ISO 15924 Code | ISO 15924 Name | URL name
---|---|---|---|---
Bulgarian | bul | Cyrl | Cyrillic | bulgarian
Russian | rus | Cyrl | Cyrillic | russian
Mandarin Chinese | cmn | Hans | Han (Simplified variant) | mandarin-chinese-simplified
Mandarin Chinese | cmn | Hant | Han (Traditional variant) | mandarin-chinese-traditional
GrimPixel: Combined names are according to https://tech.lds.org/wiki/ISO_Language_Codes. I think Combined Names can be URL names.
Vincent: Yes, for URLs we could use /zho-Hant instead of /mandarin-chinese-traditional, but I chose an understandable URL because it is better for SEO (http://digitalverge.net/seo/url-optimization-best-practices-to-create-seo-friendly-url-structure/). I will create the URL by combining the "English script name" (http://unicode.org/iso15924/iso15924-codes.html) + "Language name". Changing the URL https://polyglotclub.com/index/translate-russian to https://polyglotclub.com/index/translate-rus-cyrl would mean changing about 1,000,000 indexed pages! For Russian there is only one script, so the URL does not need to change.
GrimPixel: It's understandable. But I have found an interesting fact - people in Romania and Serbia use Latin for Bulgarian.
GrimPixel: How would other translations be named? Such as "translate-central-khmer" or "translate-central-khmer-khmer"?
Vincent: The URL name must be UNIQUE and SIMPLE, so "translate-central-khmer".
GrimPixel: Will translation teams be available for each language, or only for the most popular ones?
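The URL naming rule discussed above (a plain language name when there is only one script, a script or variant word added only when needed) could be sketched roughly as follows. This is only an illustration: the function names, and the short variant labels such as "Simplified" taken from the table's URL names, are assumptions and not the site's actual code.

```python
import re
from typing import Optional


def slugify(text: str) -> str:
    """Lowercase the text and join its alphanumeric runs with single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def translation_url_name(language: str, variant: Optional[str] = None) -> str:
    """Build a 'translate-...' URL name (hypothetical helper).

    `variant` is passed only for languages that need more than one
    translation version (e.g. "Simplified" / "Traditional"); languages
    with a single script keep their existing, shorter URL name.
    """
    parts = [language]
    if variant:
        parts.append(variant)
    return "translate-" + slugify(" ".join(parts))


print(translation_url_name("Russian"))
print(translation_url_name("Central Khmer"))
print(translation_url_name("Mandarin Chinese", "Traditional"))
```

Keeping the variant optional means existing single-script URLs such as translate-russian stay unchanged, which is the point Vincent makes about already indexed pages.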
GrimPixel: You can see it after clicking on "Cyrillic". It's listed in "Writing systems that use this script". Let me check if there is a ready-made sheet.
Vincent: In case you don't find one, it's not too much work to extract it by hand from each page, as here for Cyrillic: http://scriptsource.org/cms/scripts/page.php?item_id=script_detail&key=Cyrl.
Vincent: Oh my God! Here is the full Ethnologue list with the fields I needed: "Alternate names" and "Is mainly used in". I can update our full list with this. [COMPLETED]
GrimPixel: Some autonyms can be found here, relatively reliable: http://www.omniglot.com/language/names.htm
Vincent: The idea would be to find all the autonyms from Ethnologue, but your list https://polyglotclub.com/wiki/Features/Language-List-autonym is already very good.
GrimPixel: I don't know how many autonyms Ethnologue has, but I guess it's no more than Omniglot.
Vincent: OMG! :-O To translate the website, we must follow Google's rules (there is no point in translating the site if it cannot be seen by Google): https://support.google.com/webmasters/answer/189077?hl=en. Inside the source of each page I have to add special tags linking to the same page translated into all the other languages. For each translation, this code: <link rel="alternate" hreflang="es" href="https://polyglotclub.com/index/translate-spanish" />. This will allow Google to display the right page to the right user according to the language they speak. For the hreflang attribute, the ISO 639-3 code cannot be used. Only ISO 639-1 (+ ISO 3166-1, the country code) (+ ISO 15924, the script code). Examples: en-GB, zh-Hans, zh-Hans-TW. In my main list, I have ISO 639-1 (185 languages), so only languages with an existing ISO 639-1 code, or ISO 639-1 + ISO 15924, can be used!
GrimPixel: I think it means there will be a bit more than 185 translation teams?
Vincent: 206 lines here: https://polyglotclub.com/wiki/Features/Language-List-SCRIPT I'm not sure it's OK. I have to check.
GrimPixel: It's sure that it's far from OK. I think there can be more than 206 translation teams - those not included in ISO 639-1 will not be detected by Google, but it doesn't matter for them to exist.
Vincent: If I don't filter by ISO 639-1 but by languages having a script, it makes 1105 languages. You must take web constraints into account: each time a new translation goes online, it can create about 1,000,000 new indexed pages, with a lot of duplicate content, because the dynamic content (content posted by users) is not always translated. Web crawlers increase server load while indexing. Also, a language can only be put online when something like 60% of the content has been translated. At most 100-200 languages can be translated to 60%; the others will never be online. Yes, there can be a lot of translation teams although only a few versions will be online.
GrimPixel: I understand. It's still good enough on the whole Internet.
Vincent: "Who can do more, can do less". Let's open all languages to the translation teams. It will take time before many new language versions of the site are online anyway (especially for rare languages).
GrimPixel: Yeah! It will be fun!
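As a rough sketch of the hreflang tags described above, the snippet below builds the attribute value from an ISO 639-1 code plus an optional ISO 15924 script and ISO 3166-1 country code, then renders one tag per translated page. The function names and the URLs in the example list are assumptions made for illustration, not the site's real code or page list.

```python
from html import escape
from typing import Optional


def hreflang_value(iso639_1: str, script: Optional[str] = None,
                   region: Optional[str] = None) -> str:
    """Join language, optional script and optional region, e.g. 'zh-Hans-TW'."""
    parts = [iso639_1.lower()]
    if script:
        parts.append(script.title())   # ISO 15924 codes are written like 'Hans'
    if region:
        parts.append(region.upper())   # ISO 3166-1 codes are written like 'GB'
    return "-".join(parts)


def alternate_link(href: str, iso639_1: str, script: Optional[str] = None,
                   region: Optional[str] = None) -> str:
    """Render one <link rel="alternate"> tag for a translated page."""
    value = hreflang_value(iso639_1, script, region)
    return '<link rel="alternate" hreflang="%s" href="%s" />' % (
        escape(value, quote=True), escape(href, quote=True))


# Hypothetical list of translated versions of one page.
versions = [
    ("https://polyglotclub.com/index/translate-spanish", "es", None, None),
    ("https://polyglotclub.com/index/translate-mandarin-chinese-simplified", "zh", "Hans", None),
]
for href, lang, script, region in versions:
    print(alternate_link(href, lang, script, region))
```

A language that has no ISO 639-1 code simply would not get a tag under this scheme, which matches the constraint Vincent describes.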
Search box
Here are examples:
GrimPixel: It's very clear. I have noticed that it's better to have a <hr/> between Andalusian Arabic and Judeo-Yemeni Arabic.
Vincent: OK, I'll add a separator line at the end of the family members.
GrimPixel: Why can't I see ads?
Vincent: VIPs don't have ads: https://polyglotclub.com/trust
GrimPixel: Then why can you? :^)
Vincent: Except me ;)
GrimPixel: I have just realized that the word "family" is a linguistic term, and it is not appropriate here.
Vincent: 99.9% of our users are not linguists, and the word "family" seemed understandable to ordinary people. "Macrolanguage" does not mean anything to people. Do you know any other simple word? "Group"?
GrimPixel: It's nearly impossible to replace "macrolanguage", because a macrolanguage is both one language and many languages. I think the word "family" or "group" should be omitted, because it changes "Arabic" from a noun to an adjective.
Vincent: I'll keep 'family' for now even if it's wrong from the linguist's point of view. It will bother only 0.01% of people, whereas if I write 'macrolanguage' it will bother 99.9% of people.
GrimPixel: I still think attaching no word is best. If you still want to use a word, then "cluster" is the best one, introduced in Handbook of African Languages. "Language cluster" seems to be the only alternative to "macrolanguage".
Vincent: I did not get that you were saying 'no word'. OK, I will not use any word then.
Vincent: You say "a macrolanguage is both one language and many languages", so does it mean a member can add a macrolanguage as "Language you can teach" or "Language you want to learn"?
GrimPixel: A macrolanguage can be considered a language, because its dialects are very similar in writing, and one written dialect can be understood by people of other dialects. It is used that way in ISO 639-2. But in ISO 639-3, a macrolanguage is considered to be many languages (instead of dialects), the reason being that the native speakers of those languages cannot understand each other (when speaking, and in many cases even when writing). So a member shouldn't select a macrolanguage as a "Language you can teach" or "Language you want to learn".
Vincent: Here we are using ISO 639-3, so a macrolanguage is not a language. For me, it's simply a group/category of languages.
GrimPixel: I think there are too many unnecessary "Arabic" labels. So that "Arabic" can stick to the top of the list when scrolling down, until it meets the separator line. And then those "Arabic" labels ("Arabic family" on the image) after each item can be removed.
Vincent: OK
GrimPixel: When typing an individual language, the macrolanguage it belongs to (if there is one) and a separator line should also be displayed.
Vincent: OK
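The search box behaviour agreed on above (macrolanguage shown first with its member count, then the matching individual languages, then a separator line) might be sketched like this. The data list is a tiny illustrative sample and the row format is an assumption; the real list and rendering are not shown on this page.

```python
from collections import defaultdict

# Illustrative sample only: (individual language, macrolanguage or None).
LANGUAGES = [
    ("Egyptian Arabic", "Arabic"),
    ("Moroccan Arabic", "Arabic"),
    ("French", None),
]


def search(query, languages=LANGUAGES):
    """Return display rows for the search box (hypothetical row format)."""
    q = query.lower()

    # Total number of individual languages inside each macrolanguage.
    totals = defaultdict(int)
    for _, macro in languages:
        if macro:
            totals[macro] += 1

    # A language matches on its own name or on its macrolanguage's name.
    grouped = defaultdict(list)
    standalone = []
    for name, macro in languages:
        if q in name.lower() or (macro and q in macro.lower()):
            if macro:
                grouped[macro].append(name)
            else:
                standalone.append(name)

    rows = []
    for macro in sorted(grouped):
        rows.append(("macro", macro, totals[macro]))   # shown in bold with count
        rows.extend(("individual", n) for n in sorted(grouped[macro]))
        rows.append(("separator",))                    # line closing the group
    rows.extend(("individual", n) for n in sorted(standalone))
    return rows


for row in search("egypt"):
    print(row)
```

Typing an individual language still brings up its macrolanguage header and the closing separator, as GrimPixel suggested in the last exchange.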
Page Name Changes
When a language is 'renamed' or 'merged', many URLs will be updated and a redirection created. Old URLs will be automatically redirected to the new ones (for example when the old page still exists in Google search results, which will be the case for several months), so the user doesn't get a 404 error page.
I'm listing here the types of pages to test when the new version is online.
Automatic update
- https://polyglotclub.com/index/translate-chinese-mandarin will be redirected to /index/translate-chinese-simplified (translate language list)
- https://polyglotclub.com/find/language-Chinese,_Mandarin will be redirected to /find/language-mandarin-chinese (main list)
- https://polyglotclub.com/language/chinese-mandarin/question will be redirected to /language/mandarin-chinese/question (main list)
- https://polyglotclub.com/language/chinese-mandarin/note (...)
- https://polyglotclub.com/language/chinese-mandarin/video
- https://polyglotclub.com/language/chinese-mandarin/post
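The automatic redirections listed above could be expressed as a small lookup, sketched here under stated assumptions: the table and function names are hypothetical, and only the paths quoted in the list are used. Exact matches cover the one-off pages, while a rename table handles every /language/<name>/... page at once.

```python
import re
from typing import Optional

# Exact old path -> new path (taken from the list above).
EXACT_REDIRECTS = {
    "/index/translate-chinese-mandarin": "/index/translate-chinese-simplified",
    "/find/language-Chinese,_Mandarin": "/find/language-mandarin-chinese",
}

# Old language URL name -> new language URL name, for /language/<name>/... pages.
LANGUAGE_RENAMES = {
    "chinese-mandarin": "mandarin-chinese",
}


def resolve_redirect(path: str) -> Optional[str]:
    """Return the new path for an old URL path, or None if nothing changed."""
    if path in EXACT_REDIRECTS:
        return EXACT_REDIRECTS[path]
    match = re.match(r"^/language/([^/]+)(/.*)?$", path)
    if match and match.group(1) in LANGUAGE_RENAMES:
        return "/language/" + LANGUAGE_RENAMES[match.group(1)] + (match.group(2) or "")
    return None


for old in ("/language/chinese-mandarin/question", "/language/chinese-mandarin/video"):
    print(old, "->", resolve_redirect(old))
```

A web server would typically answer such a match with a permanent redirect so that search engines update their index, but that choice is an assumption here and is not stated on this page.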
Manual update
- https://polyglotclub.com/wiki/Language/Chinese-mandarin/Pronunciation/Accents will be redirected to /wiki/Language/Mandarin-chinese/Pronunciation/Accents (main list)
The wiki will need to be updated page by page using the MediaWiki 'redirect' button (for admins only). It cannot be done automatically. There are not many pages to change, so it will be OK.