i18n: typos, keys, plurals and samples

2025-08-16 10:43:58 +02:00 · 2016-02-23 03:19:45 -03:00
parent 26b5607328
commit 844594e8cc
1 changed files with 191 additions and 14 deletions
--- a/_posts/05-06-01-Internationalization-and-Localization.md
+++ b/_posts/05-06-01-Internationalization-and-Localization.md
@@ -15,11 +15,11 @@ First of all, we need to define those two similar concepts and other related thi
 without refactors. This is usually done once - preferably, in the beginning of the project, or else you'll probably
 need some huge changes in the source!
 - **Localization** happens when you adapt the interface (mainly) by translating contents, based on the i18n work done
-before. It usually us done every time a new language or region needs support, and is updated when new interface pieces
+before. It usually is done every time a new language or region needs support, and is updated when new interface pieces
 are added, as they need to be available in all supported languages.
 - **Pluralization** defines the rules needed between different languages to interoperate strings containing numbers and 
 counters. For instance, in English when you have only one item, it's singular, and anything different from that is 
-called plural; plural is this language is indicated by adding an S after some words, and sometimes changes parts of it.
+called plural; plural in this language is indicated by adding an S after some words, and sometimes changes parts of it.
 In other languages such as Russian or Serbian there are two plural forms plus the singular one - you may even find
 languages with a total of four, five or six forms, such as Slovenian, Irish or Arabic.

@@ -41,9 +41,6 @@ running, while it still sports powerful supporting tools. It's about Gettext we
 not get messy over the command-line, we will be presenting a great GUI application that can be used to easily update
 your l10n source files.

-### Discussion on l10n keys
-> TODO: talk about static keys versus text keys, as in https://lingohub.com/blog/2013/07/php-internationalization-with-gettext-tutorial/#What_form_of_msgids_should_be_used
-
 ## Gettext

 ### Installation
@@ -51,6 +48,10 @@ You might need to install Gettext and the related PHP library by using your pack
 After installed, enable it by adding `extension=gettext.so` (Linux/Unix) or `extension=php_gettext.dll` (Windows) to
 your `php.ini`.

+Here we will also be using [Poedit] to create translation files. You will probably find it in your system's package
+manager; it's available for Unix, Mac and Windows, and can be [downloaded for free from their website][poedit_download]
+as well.
+
 ### Structure

 #### Types of files
@@ -65,7 +66,7 @@ You'll always have one pair of PO/MO files per language and region, but only one
 There are some cases, in big projects, where you might need to separate translations when the same words convey 
 different meaning given a context. In those cases you split them into different _domains_. They're basically named
 groups of POT/PO/MO files, where the filename is the said _translation domain_. Small and medium-sized projects usually,
-for simplicity, use only one domain; it's name is arbitrary, but we will be using "main" for our code samples.
+for simplicity, use only one domain; its name is arbitrary, but we will be using "main" for our code samples.

 #### Locale code
 A locale is simple code that identifies a version of a language. It's defined following [ISO 639-1][639-1] and 
@@ -106,16 +107,181 @@ root for your l10n files in your source repository. Inside it you'll have a fold
 {% endhighlight %}

 ### Plural forms
-> TODO
+As we said in the introduction, different languages might sport different plural rules. However, gettext saves us from
+this trouble once again. When creating a new .po file, you'll have to declare the [plural rules][plural] for that
+language, and translated pieces that are plural-sensitive will have a different form for each of those rules. When
+calling Gettext in code, you'll have to specify the number related to the sentence, and it will work out the correct
+form to use - even using string substitution if needed.
+
+Plural rules include the number of plural forms available and an expression on `n` that determines which form the
+given number falls into (counting from 0). For example:
+
+- Japanese: `nplurals=1; plural=0;` - only one rule
+- English: `nplurals=2; plural=(n != 1);` - two rules, first if N is one, second rule otherwise
+- Brazilian Portuguese: `nplurals=2; plural=(n > 1);` - two rules, second if N is bigger than one, first otherwise
+
+Now that you understand the basics of how plural rules work - and if you don't, please look at a deeper explanation
+in the [LingoHub tutorial][lingohub] -, you might want to copy the ones you need from a [list][plural] instead of
+writing them by hand.
+
+When calling Gettext to localize sentences that include counters, you'll have to pass it the related number as well.
+Gettext will work out which rule should be in effect and use the correct localized version. You will need to include
+in the .po file a different sentence for each plural rule present in the language file.
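+
+To make the rules listed above more concrete, here is a minimal sketch of how a number maps to a form index.
+The `plural_index()` helper is hypothetical and exists only for illustration: Gettext evaluates the `Plural-Forms`
+expression by itself, so you would never write this in a real project.
+
+{% highlight php %}
+<?php
+// Hypothetical helper mirroring the Plural-Forms expressions listed above.
+// Gettext does this internally; this only shows the arithmetic.
+function plural_index($locale, $n) {
+    switch ($locale) {
+        case 'ja':    return 0;                 // nplurals=1; plural=0;
+        case 'pt_BR': return ($n > 1) ? 1 : 0;  // nplurals=2; plural=(n > 1);
+        default:      return ($n != 1) ? 1 : 0; // English: nplurals=2; plural=(n != 1);
+    }
+}
+
+// prints the chosen form index for each sample number
+foreach ([0, 1, 2] as $n) {
+    printf("n=%d en=%d pt_BR=%d ja=%d\n", $n,
+        plural_index('en', $n), plural_index('pt_BR', $n), plural_index('ja', $n));
+}
+{% endhighlight %}
+
+Note how zero is treated differently: the English rule picks the second form (index 1) for `n = 0`, while the
+Brazilian Portuguese rule picks the first one (index 0).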

 ### Sample implementation
-> TODO: Add sample code implementing i18n using gettext.
+After all that theory, let's get a little practical. Here's an excerpt of a .po file - don't mind its format, focus on
+the overall content instead; you'll learn how to edit it easily later:
+
+{% highlight po %}
+msgid ""
+msgstr ""
+"Language: pt_BR\n"
+"Content-Type: text/plain; charset=UTF-8\n"
+"Plural-Forms: nplurals=2; plural=(n > 1);\n"
+
+msgid "We're now translating some strings"
+msgstr "Nós estamos traduzindo algumas strings agora"
+
+msgid "Hello %1$s! Your last visit was on %2$s"
+msgstr "Olá %1$s! Sua última visita foi em %2$s"
+
+msgid "Only one unread message"
+msgid_plural "%d unread messages"
+msgstr[0] "Só uma mensagem não lida"
+msgstr[1] "%d mensagens não lidas"
+{% endhighlight %}
+
+The first section works like a header, with an empty `msgid`; its `msgstr` holds the metadata lines. It describes the
+file encoding, plural forms and other things that are less relevant. The second section translates a simple string
+from English to Brazilian Portuguese, and the third does the same, but leveraging string replacement from
+[`sprintf`][sprintf] so the translation may contain the user name and visit date. The last section is a sample of
+pluralization forms, displaying the singular and plural versions as `msgid` in English and their corresponding
+translations as `msgstr` 0 and 1 (following the number given by the plural rule). There, string replacement is used as
+well, so the number can be seen directly in the sentence by using `%d`. The plural forms always have two `msgid`
+(singular and plural), so it's advised not to use a language with complex plurals as the source of translation.
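+
+As a small sketch of the string replacement described above, this is how the placeholders from the third section
+get filled in. The `$name` and `$lastVisit` values are made up for illustration, and the translated string is
+hardcoded here, while in a real page it would come from a `gettext()` call.
+
+{% highlight php %}
+<?php
+// Hypothetical values, for illustration only
+$name      = 'Maria';
+$lastVisit = '2016-02-20';
+
+// Single quotes keep PHP from treating "$s" as a variable to interpolate
+$translated = 'Olá %1$s! Sua última visita foi em %2$s';
+
+// %1$s and %2$s are positional: a translator may reorder them in the sentence
+// and each one still receives the right argument
+echo sprintf($translated, $name, $lastVisit);
+{% endhighlight %}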
+
+### Discussion on l10n keys
+As you might have noticed, we're using as source ID the actual sentence in English. That `msgid` is the same used
+throughout all your `.po` files, meaning other languages will have the same format and the same `msgid` fields, but
+translated `msgstr` lines.
+
+Talking about translation keys, there are two main "schools" here:
+
+1. `msgid` as a real sentence. The main advantage here is that, if there are pieces of the software untranslated in any
+given language, they will still be displayed in a meaningful-ish way. If you happen to translate by heart from English
+to Spanish but need help to translate to French, you might publish the new page with missing French sentences, and
+parts of the website would be displayed in English instead. Another point is that it's much easier for the translator
+to understand what's going on and make a proper translation based on the `msgid`. It also gives you "free" l10n for a
+language - the source one. However, if you need to change the actual text, you would need to replace the same `msgid`
+across several language files.
+2. `msgid` as a unique, structured key. It would describe the sentence role in the application in a structured way,
+including the template or part where the string is located instead of its content. It's a great way to have the code
+organized, but would bring problems to the translator, who would miss the context. A source translation file would be
+needed as a basis for other translations - so the developer would ideally have an `en.po` file, that translators would
+then read to understand what to write in `fr.po` for instance. This is also both good and bad, as missing translations
+would display meaningless keys on screen (`TOP_MENU_WELCOME` instead of `Hello there, User!` on the given French
+untranslated page), forcing translation to be complete before publishing - while translation errors would be really
+awful in the interface.
+
+The [Gettext manual][manual] favors the first approach, as in general it's easier for translators and users in
+case of trouble. That's how we will be working here as well.

 ### Everyday usage
-> TODO: Explain what's the l10n routine for a project with existing i18n in place, using Poedit (and maybe command line as seen
-in the LingoHub file).
+In a common application, you would use some Gettext functions while writing static text in your pages. Those sentences
+would then appear in `.po` files, get translated, compiled into `.mo` files and then used by Gettext when rendering
+the actual interface. Given that, let's tie together what we have discussed so far in a step-by-step example:

-#### Tips & Tricks
+#### 1. A sample template file, including some different gettext calls
+{% highlight php %}
+<?php include 'i18n_setup.php' ?>
+<div id="header">
+    <h1><?=sprintf(gettext('Welcome, %s!'), $name)?></h1>
+    <!-- code indented this way only for legibility here -->
+    <?php if ($unread): ?>
+        <h2><?=sprintf(
+            ngettext('Only one unread message',
+                     '%d unread messages',
+                     $unread),
+            $unread)?>
+        </h2>
+    <?php endif ?>
+</div>
+
+<h1><?=gettext('Introduction')?></h1>
+<p><?=gettext('We\'re now translating some strings')?></p>
+{% endhighlight %}
+
+- [`gettext()`][func] simply translates a `msgid` into its corresponding `msgstr` for a given language. There's also
+the shorthand function `_()` that works the same way;
+- [`ngettext()`][n_func] does the same but with plural rules;
+- there are also [`dgettext()`][d_func] and [`dngettext()`][dn_func], which allow you to override the domain for a single
+call. More on domain configuration in the next example.
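+
+The template above does not use the shorthand or the domain-aware variants, so here is a short sketch of both.
+It assumes the Gettext extension is loaded; the `forum` domain name and the strings are made up for illustration.
+When no translation is found, these functions simply return the original text.
+
+{% highlight php %}
+<?php
+// Hypothetical counter, for illustration only
+$replies = 3;
+
+// _() is an alias of gettext()
+echo _('Introduction');
+
+// plural-aware lookup in the 'forum' domain instead of the default one;
+// the number is passed twice: once to pick the form, once to fill in %d
+echo sprintf(
+    dngettext('forum', 'One new reply', '%d new replies', $replies),
+    $replies
+);
+{% endhighlight %}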
+
+#### 2. A sample setup file (`i18n_setup.php` as used above), selecting the correct locale and configuring Gettext
+{% highlight php %}
+<?php
+/**
+ * Verifies if the given $locale is supported in the project
+ * @param string $locale
+ * @return bool
+ */
+function valid($locale) {
+   return in_array($locale, ['en_US', 'en', 'pt_BR', 'pt', 'es_ES', 'es']);
+}
+
+//setting the source/default locale, for informational purposes
+$lang = 'en_US';
+
+if (isset($_GET['lang']) && valid($_GET['lang'])) {
+    // the locale can be changed through the query-string
+    $lang = $_GET['lang'];    //you should sanitize this!
+    setcookie('lang', $lang); //it's stored in a cookie so it can be reused
+} elseif (isset($_COOKIE['lang']) && valid($_COOKIE['lang'])) {
+    // if the cookie is present instead, let's just keep it
+    $lang = $_COOKIE['lang']; //you should sanitize this!
+} elseif (isset($_SERVER['HTTP_ACCEPT_LANGUAGE'])) {
+    // last resort: look for the languages the browser says the user accepts
+    $langs = explode(',', $_SERVER['HTTP_ACCEPT_LANGUAGE']);
+    array_walk($langs, function (&$lang) { $lang = strtr(trim(strtok($lang, ';')), ['-' => '_']); });
+    foreach ($langs as $browser_lang) {
+        if (valid($browser_lang)) {
+            $lang = $browser_lang;
+            break;
+        }
+    }
+}
+
+// here we define the global system locale given the found language
+putenv("LANG=$lang");
+
+// this might be useful for date functions (LC_TIME) or money formatting (LC_MONETARY), for instance
+setlocale(LC_ALL, $lang);
+
+// this will make Gettext look for ../locales/<lang>/LC_MESSAGES/main.mo
+bindtextdomain('main', '../locales');
+
+// indicates in what encoding the file should be read
+bind_textdomain_codeset('main', 'UTF-8');
+
+// if your application has additional domains, as cited before, you should bind them here as well
+bindtextdomain('forum', '../locales');
+bind_textdomain_codeset('forum', 'UTF-8');
+
+// here we indicate the default domain the gettext() calls will respond to
+textdomain('main');
+
+// this would look for the string in forum.mo instead of main.mo
+// echo dgettext('forum', 'Welcome back!');
+?>
+{% endhighlight %}
+
+#### 3. Preparing translation for the first run
+> TODO: explain how to install Poedit and how to set it up
+
+#### 4. Translating strings
+> TODO: overall view on how to use Poedit for translation
+
+### Tips & Tricks
 > TODO: Talk about possible issue with caching.
 > TODO: Suggest creation of helper functions.

@@ -123,11 +289,22 @@ in the LingoHub file).

 * [Wikipedia: i18n and l10n](https://en.wikipedia.org/wiki/Internationalization_and_localization)
 * [Wikipedia: Gettext](https://en.wikipedia.org/wiki/Gettext)
-* [LingoHub: PHP internationalization with gettext tutorial](https://lingohub.com/blog/2013/07/php-internationalization-with-gettext-tutorial/)
-* [PHP Manual: Gettext](http://br2.php.net/manual/en/book.gettext.php)
-* [Gettext Manual](http://www.gnu.org/software/gettext/manual/gettext.html)
+* [LingoHub: PHP internationalization with gettext tutorial][lingohub]
+* [PHP Manual: Gettext](http://php.net/manual/en/book.gettext.php)
+* [Gettext Manual][manual]

+[Poedit]: https://poedit.net/
+[poedit_download]: https://poedit.net/download
+[lingohub]: https://lingohub.com/blog/2013/07/php-internationalization-with-gettext-tutorial/#Plurals
+[plural]: http://docs.translatehouse.org/projects/localization-guide/en/latest/l10n/pluralforms.html
 [gettext]: https://en.wikipedia.org/wiki/Gettext
+[manual]: http://www.gnu.org/software/gettext/manual/gettext.html
 [639-1]: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
 [3166-1]: http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2
 [rare]: http://www.gnu.org/software/gettext/manual/gettext.html#Rare-Language-Codes
+
+[sprintf]: http://php.net/manual/en/function.sprintf.php
+[func]: http://php.net/manual/en/function.gettext.php
+[n_func]: http://php.net/manual/en/function.ngettext.php
+[d_func]: http://php.net/manual/en/function.dgettext.php
+[dn_func]: http://php.net/manual/en/function.dngettext.php