From 2032a643468ad37ccc0a019e29c44c776728b263 Mon Sep 17 00:00:00 2001 From: Igor Santos Date: Mon, 15 Feb 2016 03:36:09 -0200 Subject: [PATCH 1/7] Adding basic explanation on i18n and section TODOs - what's i18n, l10n and pluralization forms for - common methods of implementation - added the rest of the sub-sections, with TODOs --- ...1-Internationalization-and-Localization.md | 75 +++++++++++++++++++ 1 file changed, 75 insertions(+) create mode 100644 _posts/05-06-01-Internationalization-and-Localization.md diff --git a/_posts/05-06-01-Internationalization-and-Localization.md b/_posts/05-06-01-Internationalization-and-Localization.md new file mode 100644 index 0000000..eadf568 --- /dev/null +++ b/_posts/05-06-01-Internationalization-and-Localization.md @@ -0,0 +1,75 @@ +--- +title: Internationalization and Localization +isChild: true +anchor: i18n_l10n +--- + +## Internationalization (i18n) and Localization (l10n) {#i18n_l10n_title} + +_Disclaimer for newcomers: i18n and l10n are numeronyms, a kind of abbreviation where numbers are used to shorten +words - in our case, internationalization becomes i18n and localization, l10n._ + +First of all, we need to define those two similar concepts and other related things: + +- **Internationalization** is when you organize your code so it can be adapted to different languages or regions +without refactors. This is usually done once - preferably, in the beginning of the project, or else you'll probably +need some huge changes in the source! +- **Localization** happens when you adapt the interface (mainly) by translating contents, based on the i18n work done +before. It usually us done every time a new language or region needs support, and is updated when new interface pieces +are added, as they need to be available in all supported languages. +- **Pluralization** defines the rules needed between different languages to interoperate strings containing numbers and +counters. 
For instance, in English when you have only one item, it's singular, and anything different from that is +called plural; plural is this language is indicated by adding an S after some words, and sometimes changes parts of it. +In other languages such as Russian or Serbian there are two plural forms plus the singular one - you may even find +languages with a total of four, five or six forms, such as Slovenian, Irish or Arabic. + +## Common ways to implement + +The easiest way to internationalize PHP software is by using array files and using those strings in templates, such as +`

`. This is, however, hardly a recommended way for serious projects, as it poses +some maintenance issues along the road - some might appear in the very beginning, such as pluralization. So, please, +don't try this if your project will contain more than a couple of pages. + +Some frameworks will sport their own i18n packages. Those usually are a more powerful version of the above approach, +but including features needed for real localization, such as plural forms and string replacement. You're free to use +those if you feel like, but you might find bothering to edit array source files, having to deal with pure code issues +(such as string scaping and so on). The main pro here is integration with the environment you're using - the framework +is called _full-stack_ for a reason, right? + +However, the most classic way and often taken as reference for i18n and l10n is a [UNIX tool called `gettext`][gettext]. +It dates back to 1995 and is still the most complete implementation for translating software. It is pretty easy to get +running, while it still sports powerful supporting tools. It's about Gettext we will be talking here. Also, to help you +not get messy over the command-line, we will be presenting a great GUI application that can be used to easily update +your l10n source files. + +### Discussion on l10n keys +TODO: talk about static keys versus text keys, as in https://lingohub.com/blog/2013/07/php-internationalization-with-gettext-tutorial/#What_form_of_msgids_should_be_used + +## Gettext + +### Installation +TODO: You might need to install Gettext and the related PHP library by using your package manager, like `apt-get` or `yum`. + +### Structure +TODO: Talk about POT/PO/MO files and Poedit. Explain details about plural forms, directory structures and domains. + +### Sample implementation +TODO: Add sample code implementing i18n using gettext. 
+ +### Everyday usage +TODO: Explain what's the l10n routine for a project with existing i18n in place, using Poedit (and maybe command line as seen +in the LingoHub file). + +#### Tips & Tricks +TODO: Talk about possible issue with caching. +TODO: Suggest creation of helper functions. + +### References + +* [Wikipedia: i18n and l10n](https://en.wikipedia.org/wiki/Internationalization_and_localization) +* [Wikipedia: Gettext](https://en.wikipedia.org/wiki/Gettext) +* [LingoHub: PHP internationalization with gettext tutorial](https://lingohub.com/blog/2013/07/php-internationalization-with-gettext-tutorial/) +* [PHP Manual: Gettext](http://br2.php.net/manual/en/book.gettext.php) + + +[gettext]: https://en.wikipedia.org/wiki/Gettext From 26b56073286852fa4541b9b4601632283f5a48e2 Mon Sep 17 00:00:00 2001 From: Igor Santos Date: Tue, 16 Feb 2016 04:24:16 -0200 Subject: [PATCH 2/7] i18n: Adding info on extensions, files and folders --- ...1-Internationalization-and-Localization.md | 78 ++++++++++++++++--- 1 file changed, 68 insertions(+), 10 deletions(-) diff --git a/_posts/05-06-01-Internationalization-and-Localization.md b/_posts/05-06-01-Internationalization-and-Localization.md index eadf568..3a297cb 100644 --- a/_posts/05-06-01-Internationalization-and-Localization.md +++ b/_posts/05-06-01-Internationalization-and-Localization.md @@ -24,7 +24,6 @@ In other languages such as Russian or Serbian there are two plural forms plus th languages with a total of four, five or six forms, such as Slovenian, Irish or Arabic. ## Common ways to implement - The easiest way to internationalize PHP software is by using array files and using those strings in templates, such as `

`. This is, however, hardly a recommended way for serious projects, as it poses some maintenance issues along the road - some might appear in the very beginning, such as pluralization. So, please, @@ -36,33 +35,89 @@ those if you feel like, but you might find bothering to edit array source files, (such as string scaping and so on). The main pro here is integration with the environment you're using - the framework is called _full-stack_ for a reason, right? -However, the most classic way and often taken as reference for i18n and l10n is a [UNIX tool called `gettext`][gettext]. +However, the most classic way and often taken as reference for i18n and l10n is a [Unix tool called `gettext`][gettext]. It dates back to 1995 and is still the most complete implementation for translating software. It is pretty easy to get running, while it still sports powerful supporting tools. It's about Gettext we will be talking here. Also, to help you not get messy over the command-line, we will be presenting a great GUI application that can be used to easily update your l10n source files. ### Discussion on l10n keys -TODO: talk about static keys versus text keys, as in https://lingohub.com/blog/2013/07/php-internationalization-with-gettext-tutorial/#What_form_of_msgids_should_be_used +> TODO: talk about static keys versus text keys, as in https://lingohub.com/blog/2013/07/php-internationalization-with-gettext-tutorial/#What_form_of_msgids_should_be_used ## Gettext ### Installation -TODO: You might need to install Gettext and the related PHP library by using your package manager, like `apt-get` or `yum`. +You might need to install Gettext and the related PHP library by using your package manager, like `apt-get` or `yum`. +After installed, enable it by adding `extension=gettext.so` (Linux/Unix) or `extension=php_gettext.dll` (Windows) to +your `php.ini`. ### Structure -TODO: Talk about POT/PO/MO files and Poedit. Explain details about plural forms, directory structures and domains. 
+ +#### Types of files +There are three files you usually deal with while working with gettext. The main ones are PO (Portable Object) and +MO (Machine Object) files, the first being a list of readable "translated objects" and the second, the corresponding +binary to be interpreted by gettext when doing localization. There's also a POT (Template) file, that simply contains +all existing keys from your source files, and can be used as a guide to generate and update all PO files. Those template +files are not mandatory: depending on the tool you're using to do l10n, you can go just fine with only PO/MO files. +You'll always have one pair of PO/MO files per language and region, but only one POT per domain. + +### Domains +There are some cases, in big projects, where you might need to separate translations when the same words convey +different meaning given a context. In those cases you split them into different _domains_. They're basically named +groups of POT/PO/MO files, where the filename is the said _translation domain_. Small and medium-sized projects usually, +for simplicity, use only one domain; it's name is arbitrary, but we will be using "main" for our code samples. + +#### Locale code +A locale is simple code that identifies a version of a language. It's defined following [ISO 639-1][639-1] and +[ISO 3166-1 alpha-2][3166-1] specs: two lower-case letters for the language, optionally followed by an underline and two +upper-case letters identifying the country or regional code. For [rare languages][rare], three letters are used. + +For some speakers, the country part may seem redundant; but in fact, some languages have dialects in different +countries, such as Austrian German (`de_AT`) or Brazilian Portuguese (`pt_BR`). The second part is used to distinguish +between those dialects - when it's not present, it's taken as a "generic" or "hybrid" version of the language. 
+ +### Directory structure +To use Gettext, we will need to adhere to a specific structure of folders. First, you'll need to select an arbitrary +root for your l10n files in your source repository. Inside it you'll have a folder for each needed locale, and a fixed +`LC_MESSAGES` folder that will contain all your PO/MO pairs. Example: + +{% highlight console %} + + ├─ src/ + ├─ templates/ + └─ locales/ + ├─ forum.pot + ├─ site.pot + ├─ de/ + │ └─ LC_MESSAGES/ + │ ├─ forum.mo + │ ├─ forum.po + │ ├─ site.mo + │ └─ site.po + ├─ es_ES/ + │ └─ LC_MESSAGES/ + │ └─ ... + ├─ fr/ + │ └─ ... + ├─ pt_BR/ + │ └─ ... + └─ pt_PT/ + └─ ... +{% endhighlight %} + +### Plural forms +> TODO ### Sample implementation -TODO: Add sample code implementing i18n using gettext. +> TODO: Add sample code implementing i18n using gettext. ### Everyday usage -TODO: Explain what's the l10n routine for a project with existing i18n in place, using Poedit (and maybe command line as seen +> TODO: Explain what's the l10n routine for a project with existing i18n in place, using Poedit (and maybe command line as seen in the LingoHub file). #### Tips & Tricks -TODO: Talk about possible issue with caching. -TODO: Suggest creation of helper functions. +> TODO: Talk about possible issue with caching. +> TODO: Suggest creation of helper functions. ### References @@ -70,6 +125,9 @@ TODO: Suggest creation of helper functions. 
* [Wikipedia: Gettext](https://en.wikipedia.org/wiki/Gettext) * [LingoHub: PHP internationalization with gettext tutorial](https://lingohub.com/blog/2013/07/php-internationalization-with-gettext-tutorial/) * [PHP Manual: Gettext](http://br2.php.net/manual/en/book.gettext.php) - +* [Gettext Manual](http://www.gnu.org/software/gettext/manual/gettext.html) [gettext]: https://en.wikipedia.org/wiki/Gettext +[639-1]: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes +[3166-1]: http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2 +[rare]: http://www.gnu.org/software/gettext/manual/gettext.html#Rare-Language-Codes From 844594e8cc89b637d439679cda91e997c98e1be3 Mon Sep 17 00:00:00 2001 From: Igor Santos Date: Tue, 23 Feb 2016 03:19:45 -0300 Subject: [PATCH 3/7] i18n: typos, keys, plurals and samples --- ...1-Internationalization-and-Localization.md | 205 ++++++++++++++++-- 1 file changed, 191 insertions(+), 14 deletions(-) diff --git a/_posts/05-06-01-Internationalization-and-Localization.md b/_posts/05-06-01-Internationalization-and-Localization.md index 3a297cb..cf09ca1 100644 --- a/_posts/05-06-01-Internationalization-and-Localization.md +++ b/_posts/05-06-01-Internationalization-and-Localization.md @@ -15,11 +15,11 @@ First of all, we need to define those two similar concepts and other related thi without refactors. This is usually done once - preferably, in the beginning of the project, or else you'll probably need some huge changes in the source! - **Localization** happens when you adapt the interface (mainly) by translating contents, based on the i18n work done -before. It usually us done every time a new language or region needs support, and is updated when new interface pieces +before. It usually is done every time a new language or region needs support, and is updated when new interface pieces are added, as they need to be available in all supported languages. 
- **Pluralization** defines the rules needed between different languages to interoperate strings containing numbers and counters. For instance, in English when you have only one item, it's singular, and anything different from that is -called plural; plural is this language is indicated by adding an S after some words, and sometimes changes parts of it. +called plural; plural in this language is indicated by adding an S after some words, and sometimes changes parts of it. In other languages such as Russian or Serbian there are two plural forms plus the singular one - you may even find languages with a total of four, five or six forms, such as Slovenian, Irish or Arabic. @@ -41,9 +41,6 @@ running, while it still sports powerful supporting tools. It's about Gettext we not get messy over the command-line, we will be presenting a great GUI application that can be used to easily update your l10n source files. -### Discussion on l10n keys -> TODO: talk about static keys versus text keys, as in https://lingohub.com/blog/2013/07/php-internationalization-with-gettext-tutorial/#What_form_of_msgids_should_be_used - ## Gettext ### Installation @@ -51,6 +48,10 @@ You might need to install Gettext and the related PHP library by using your pack After installed, enable it by adding `extension=gettext.so` (Linux/Unix) or `extension=php_gettext.dll` (Windows) to your `php.ini`. +Here we will also be using [Poedit] to create translation files. You will probably find it in your system's package +manager; it's available for Unix, Mac and Windows, and can be [downloaded for free in their website][poedit_download] +as well. + ### Structure #### Types of files @@ -65,7 +66,7 @@ You'll always have one pair of PO/MO files per language and region, but only one There are some cases, in big projects, where you might need to separate translations when the same words convey different meaning given a context. In those cases you split them into different _domains_. 
They're basically named groups of POT/PO/MO files, where the filename is the said _translation domain_. Small and medium-sized projects usually, -for simplicity, use only one domain; it's name is arbitrary, but we will be using "main" for our code samples. +for simplicity, use only one domain; its name is arbitrary, but we will be using "main" for our code samples. #### Locale code A locale is simple code that identifies a version of a language. It's defined following [ISO 639-1][639-1] and @@ -106,16 +107,181 @@ root for your l10n files in your source repository. Inside it you'll have a fold {% endhighlight %} ### Plural forms -> TODO +As we said in the introduction, different languages might sport different plural rules. However, gettext saves us from +this trouble once again. When creating a new .po file, you'll have to declare the [plural rules][plural] for that +language, and translated pieces that are plural-sensitive will have a different form for each of those rules. When +calling Gettext in code, you'll have to specify the number related to the sentence, and it will work out the correct +form to use - even using string substitution if needed. + +Plural rules include the number of plurals available and a boolean test with `n` that would define in which rule the +given number falls (starting the count with 0). For example: + +- Japanese: `nplurals=1; plural=0` - only one rule +- English: `nplurals=2; plural=(n != 1);` - two rules, first if N is one, second rule otherwise +- Brazilian Portuguese: `nplurals=2; plural=(n > 1);` - two rules, second if N is bigger than one, first otherwise + +Now that you understood the basis of how plural rules works - and if you didn't, please look at a deeper explanation +on the [LingoHub tutorial](lingohub) -, you might want to copy the ones you need from a [list][plural] instead of +writing them by hand. 
+ +When calling out Gettext to do the localization of sentences that include counters, you'll have to pass to it the +related number as well. Gettext will work out what rule should be in effect and use the correct localized version. +You will need to include in the .po file a different sentence for each plural rule present in the language file. ### Sample implementation -> TODO: Add sample code implementing i18n using gettext. +After all that theory, let's get a little practical. Here's an excerpt of a .po file - don't mind with its format, +but instead the overall content, you'll learn how to edit it easily later: + +{% highlight po %} +msgid "" +msgstr "" +"Language: pt_BR\n" +"Content-Type: text/plain; charset=UTF-8\n" +"Plural-Forms: nplurals=2; plural=(n > 1);\n" + +msgid "We're now translating some strings" +msgstr "Nós estamos traduzindo algumas strings agora" + +msgid "Hello %1$s! Your last visit was on %2$s" +msgstr "Olá %1$s! Sua última visita foi em %2$s" + +msgid "Only one unread message" +msgid_plural "%d unread messages" +msgstr[0] "Só uma mensagem não lida" +msgstr[1] "%d mensagens não lidas" +{% endhighlight %} + +The first section works like a header, having the `msgid` and `msgstr` specially empty. It describes the file encoding, +plural forms and other things that are less relevant. The second section translates a simple string from English to +Brazilian Portuguese, and the third does the same, but leveraging string replacement from [`sprintf`](sprintf) so the +translation may contain the user name and visit date. The last section is a sample of pluralization forms, displaying +the singular and plural version as `msgid` in English and their corresponding translations as `msgstr` 0 and 1 +(following the number given by the plural rule). There, string replacement is used as well so the number can be seen +directly in the sentence, by using `%d`. 
The plural forms always have two `msgid` (singular and plural), so it's +advised to not use a complex language as source of translation. + +### Discussion on l10n keys +As you might have noticed, we're using as source ID the actual sentence in English. That `msgid` is the same used +throughout all your `.po` files, meaning other languages will have the same format and the same `msgid` fields, but +translated `msgstr` lines. + +Talking about translation keys, there are two main "schools" here: + +1. `msgid` as a real sentence. The main advantage here is that, if there's pieces of the software untranslated in any +given language, it will be displaying in a meaningful-ish way. If you happen to translate by heart from English to +Spanish but needs help to translate to French, you might publish the new page with missing French sentences, and parts +of the website would be displayed in English instead. Another point is that it's much easier for the translator to +understand what's going on and make a proper translation based on the `msgid`. It also gives you "free" l10n for a +language - the source one. However, if you need to change the actual text, you would need to replace the same `msgid` +across several language files. +2. `msgid` as a unique, structured key. It would describe the sentence role in the application in a structured way, +including the template or part where the string is located instead of its content. It's a great way to have the code +organized, but would bring problems to the translator that would miss the context. A source translation file would be +needed as a basis for other translations - so the developer would ideally have an `en.po` file, that translators would +then read to understand what to write in `fr.po` for instance. 
This is also both good and bad, as missing translations +would display meaningless keys on screen (`TOP_MENU_WELCOME` instead of `Hello there, User!` on the given French +untranslated page), forcing translation to be complete before publishing - while translation errors would be really +awful in the interface. + +The [Gettext manual][manual] favors the first approach, as in general it's easier for translators and users in +case of trouble. That's how we will be working here as well. ### Everyday usage -> TODO: Explain what's the l10n routine for a project with existing i18n in place, using Poedit (and maybe command line as seen -in the LingoHub file). +In a common application, you would use some Gettext functions while writing static text in your pages. Those sentences +would then appear in `.po` files, get translated, compiled into `.mo` files and then, used by Gettext when rendering +the actual interface. Given that, let's tie together what we have discussed so far in a a step-by-step example: -#### Tips & Tricks +#### 1. A sample template file, including some different gettext calls +{% highlight php %} + + + +

+

+{% endhighlight %} + +- [`gettext()`][func] simply translates a `msgid` into it's corresponding `msgstr` for a given language. There's also +the shorthand function `_()` that works the same way; +- [`ngettext()`][n_func] does the same but with plural rules; +- there's also [`dgettext()`][d_func] and [`dngettext()`][dn_func], that allows you to override the domain for a single +call. More on domain configuration in the next example. + +#### 2. A sample setup file (`i18n_setup.php` as used above), selecting the correct locale and configuring Gettext +{% highlight php %} + '_']); }); + foreach ($langs as $browser_lang) { + if (valid($browser_lang)) { + $lang = $browser_lang; + break; + } + } +} + +// here we define the global system locale given the found language +putenv("LANG=$lang"); + +// this might be useful for date functions (LC_TIME) or money formatting (LC_MONETARY), for instance +setlocale(LC_ALL, $lang); + +// this will make Gettext look for ../locales//LC_MESSAGES/main.mo +bindtextdomain('main', '../locales'); + +// indicates in what encoding the file should be read +bind_textdomain_codeset('main', 'UTF-8'); + +// if your application has additional domains, as cited before, you should bind them here as well +bindtextdomain('forum', '../locales'); +bind_textdomain_codeset('forum', 'UTF-8'); + +// here we indicate the default domain the gettext() calls will respond to +textdomain('main'); + +// this would look for the string in forum.mo instead of main.mo +// echo dgettext('forum', 'Welcome back!'); +?> +{% endhighlight %} + +#### 3. Preparing translation for the first run +> TODO: explain how to install Poedit and how to setup it + +#### 4. Translating strings +> TODO: overall view on how to use Poedit for translation + +### Tips & Tricks > TODO: Talk about possible issue with caching. > TODO: Suggest creation of helper functions. @@ -123,11 +289,22 @@ in the LingoHub file). 
* [Wikipedia: i18n and l10n](https://en.wikipedia.org/wiki/Internationalization_and_localization) * [Wikipedia: Gettext](https://en.wikipedia.org/wiki/Gettext) -* [LingoHub: PHP internationalization with gettext tutorial](https://lingohub.com/blog/2013/07/php-internationalization-with-gettext-tutorial/) -* [PHP Manual: Gettext](http://br2.php.net/manual/en/book.gettext.php) -* [Gettext Manual](http://www.gnu.org/software/gettext/manual/gettext.html) +* [LingoHub: PHP internationalization with gettext tutorial](lingohub) +* [PHP Manual: Gettext](http://php.net/manual/en/book.gettext.php) +* [Gettext Manual][manual] +[Poedit]: https://poedit.net/ +[poedit_download]: https://poedit.net/download +[lingohub]: https://lingohub.com/blog/2013/07/php-internationalization-with-gettext-tutorial/#Plurals +[plural]: http://docs.translatehouse.org/projects/localization-guide/en/latest/l10n/pluralforms.html [gettext]: https://en.wikipedia.org/wiki/Gettext +[manual]: (http://www.gnu.org/software/gettext/manual/gettext.html) [639-1]: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes [3166-1]: http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2 [rare]: http://www.gnu.org/software/gettext/manual/gettext.html#Rare-Language-Codes + +[sprintf]: http://php.net/manual/en/function.sprintf.php +[func]: http://php.net/manual/en/function.gettext.php +[n_func]: http://php.net/manual/en/function.ngettext.php +[d_func]: http://php.net/manual/en/function.dgettext.php +[dn_func]: http://php.net/manual/en/function.dngettext.php From ca90db9b1a2489813a83a34d4940d2095583a661 Mon Sep 17 00:00:00 2001 From: Igor Santos Date: Sat, 27 Feb 2016 14:10:07 -0300 Subject: [PATCH 4/7] i18n: review and finishing post - part about key types has been reviewed - included the pieces about using Poedit, caching issues and usual helper functions --- ...1-Internationalization-and-Localization.md | 94 +++++++++++++------ 1 file changed, 66 insertions(+), 28 deletions(-) diff --git 
a/_posts/05-06-01-Internationalization-and-Localization.md b/_posts/05-06-01-Internationalization-and-Localization.md index cf09ca1..95b4ea2 100644 --- a/_posts/05-06-01-Internationalization-and-Localization.md +++ b/_posts/05-06-01-Internationalization-and-Localization.md @@ -108,7 +108,7 @@ root for your l10n files in your source repository. Inside it you'll have a fold ### Plural forms As we said in the introduction, different languages might sport different plural rules. However, gettext saves us from -this trouble once again. When creating a new .po file, you'll have to declare the [plural rules][plural] for that +this trouble once again. When creating a new `.po` file, you'll have to declare the [plural rules][plural] for that language, and translated pieces that are plural-sensitive will have a different form for each of those rules. When calling Gettext in code, you'll have to specify the number related to the sentence, and it will work out the correct form to use - even using string substitution if needed. @@ -121,15 +121,15 @@ given number falls (starting the count with 0). For example: - Brazilian Portuguese: `nplurals=2; plural=(n > 1);` - two rules, second if N is bigger than one, first otherwise Now that you understood the basis of how plural rules works - and if you didn't, please look at a deeper explanation -on the [LingoHub tutorial](lingohub) -, you might want to copy the ones you need from a [list][plural] instead of +on the [LingoHub tutorial](lingohub_plurals) -, you might want to copy the ones you need from a [list][plural] instead of writing them by hand. -When calling out Gettext to do the localization of sentences that include counters, you'll have to pass to it the +When calling out Gettext to do localization on sentences with counters, you'll have to give him the related number as well. Gettext will work out what rule should be in effect and use the correct localized version. 
-You will need to include in the .po file a different sentence for each plural rule present in the language file. +You will need to include in the `.po` file a different sentence for each plural rule defined. ### Sample implementation -After all that theory, let's get a little practical. Here's an excerpt of a .po file - don't mind with its format, +After all that theory, let's get a little practical. Here's an excerpt of a `.po` file - don't mind with its format, but instead the overall content, you'll learn how to edit it easily later: {% highlight po %} @@ -152,9 +152,11 @@ msgstr[1] "%d mensagens não lidas" {% endhighlight %} The first section works like a header, having the `msgid` and `msgstr` specially empty. It describes the file encoding, -plural forms and other things that are less relevant. The second section translates a simple string from English to +plural forms and other things that are less relevant. +The second section translates a simple string from English to Brazilian Portuguese, and the third does the same, but leveraging string replacement from [`sprintf`](sprintf) so the -translation may contain the user name and visit date. The last section is a sample of pluralization forms, displaying +translation may contain the user name and visit date. +The last section is a sample of pluralization forms, displaying the singular and plural version as `msgid` in English and their corresponding translations as `msgstr` 0 and 1 (following the number given by the plural rule). There, string replacement is used as well so the number can be seen directly in the sentence, by using `%d`. The plural forms always have two `msgid` (singular and plural), so it's @@ -167,21 +169,27 @@ translated `msgstr` lines. Talking about translation keys, there are two main "schools" here: -1. `msgid` as a real sentence. The main advantage here is that, if there's pieces of the software untranslated in any -given language, it will be displaying in a meaningful-ish way. 
If you happen to translate by heart from English to -Spanish but needs help to translate to French, you might publish the new page with missing French sentences, and parts -of the website would be displayed in English instead. Another point is that it's much easier for the translator to -understand what's going on and make a proper translation based on the `msgid`. It also gives you "free" l10n for a -language - the source one. However, if you need to change the actual text, you would need to replace the same `msgid` -across several language files. -2. `msgid` as a unique, structured key. It would describe the sentence role in the application in a structured way, -including the template or part where the string is located instead of its content. It's a great way to have the code -organized, but would bring problems to the translator that would miss the context. A source translation file would be -needed as a basis for other translations - so the developer would ideally have an `en.po` file, that translators would -then read to understand what to write in `fr.po` for instance. This is also both good and bad, as missing translations -would display meaningless keys on screen (`TOP_MENU_WELCOME` instead of `Hello there, User!` on the given French -untranslated page), forcing translation to be complete before publishing - while translation errors would be really -awful in the interface. +1. _`msgid` as a real sentence_. + The main advantage are: + - if there's pieces of the software untranslated in any given language, the key displayed will still maintain some + meaning. 
Example: if you happen to translate by heart from English to Spanish but need help to translate to French, + you might publish the new page with missing French sentences, and parts of the website would be displayed in English + instead; + - it's much easier for the translator to understand what's going on and make a proper translation based on the + `msgid`; + - it gives you "free" l10n for one language - the source one; + - the only disadvantage: if you need to change the actual text, you would need to replace the same `msgid` + across several language files. + +2. _`msgid` as a unique, structured key_. It would describe the sentence role in the application in a structured way, +including the template or part where the string is located instead of its content. + - it's a great way to have the code organized, separating the text content from the template logic. + - however, that could bring problems to the translator, who would miss the context. A source language file would be + needed as a basis for other translations. Example: the developer would ideally have an `en.po` file, which + translators would read to understand what to write in `fr.po` for instance. + - missing translations would display meaningless keys on screen (`TOP_MENU_WELCOME` instead of `Hello there, User!` + on the said untranslated French page). That's good as it would force translation to be complete before publishing - + but bad as translation issues would be really awful in the interface. The [Gettext manual][manual] favors the first approach, as in general it's easier for translators and users in case of trouble. That's how we will be working here as well. @@ -276,14 +284,42 @@ textdomain('main'); {% endhighlight %} #### 3. Preparing translation for the first run -> TODO: explain how to install Poedit and how to setup it +What makes matters easier - and one of the powerful advantages Gettext has over custom framework i18n packages - is its custom file type. 
"Oh man, that's quite hard to understand and edit by hand, a simple array would be easier!" Make no mistake, applications like [Poedit] are here to help - _a lot_. You can get the program from [their website][poedit_download]; it's free and available for all platforms. It's a pretty easy tool to get used to, and a very powerful one at the same time, making responsible use of the powers Gettext gives it. + +In the first run, you should select "File > New Catalog" from the menu. There you'll have a small screen where we will prepare the ground so everything else runs smoothly. You'll be able to find those settings later through "Catalog > Properties": + +- Project name and version, Translation Team and email address: useful information that goes in the `.po` file header; +- Language: here you should use that format we mentioned before, such as `en_US` or `pt_BR`; +- Charsets: UTF-8, preferably; +- Source charset: set here the charset used by your PHP files - probably UTF-8 as well, right? +- Plural forms: here go those rules we mentioned before - there's a link in there with samples as well; +- Source paths: here you must include all folders from the project where `gettext()` (and siblings) are called - this is usually your templates folder(s); +- Source keywords: this last part is filled by default, but you might need to alter it later - and is one of the powerful points of Gettext. The underlying software knows what the `gettext()` calls look like in several programming languages, but you might as well create your own translation forms. This will be discussed later in the "Tips" section. + +After setting those points you'll be prompted to save the file - using that directory structure we mentioned as well, and then it will run a scan through your source files to find the localization calls. They'll be fed empty into the translation table, and you'll start typing in the localized versions of those strings. 
Save it and a `.mo` file will be (re)compiled into the same folder and ta-dah: your project is internationalized. #### 4. Translating strings -> TODO: overall view on how to use Poedit for translation +As you may have noticed before, there are two main types of localized strings: simple ones and the ones with plural forms. The first ones have simply two boxes: source and localized string. The source string can't be modified as Gettext/Poedit do not include the powers to alter your source files - you should change the source itself and rescan the files. Tip: you may right-click a translation line and it will hint you with the source files and lines where that string is being. +On the other hand, plural form strings include two boxes to show the two source strings, and tabs so you can configure the different final forms. + +Whenever you change your sources and need to update the translations, just hit Refresh and Poedit will rescan the code, removing non-existent entries, merging the ones that changed and adding new ones. It may also try to guess some translations, based on other ones you did. Those guesses and the changed entries will receive a "Fuzzy" marker, indicating it needs review, being highlighted in the list. It's also useful if you have a translation team and someone tries to write something they're not sure about: just mark Fuzzy and someone else will review later. + +Finally, it's advised to leave "View > Untranslated entries first" marked, as it will help you _a lot_ to not forget any entry. From that menu you can also open parts of the UI that allow you to leave contextual information for translators, if needed. ### Tips & Tricks -> TODO: Talk about possible issue with caching. -> TODO: Suggest creation of helper functions. + +#### Possible caching issues +If you're running PHP as a module on Apache (`mod_php`), you might face issues with the `.mo` file being cached. 
It happens the first time it's read, and then, to update it, you might need to restart the server. On Nginx and PHP5 it usually takes only a couple of page refreshes to refresh the translation cache, and on PHP7 it is rarely needed. + +#### Additional helper functions +As preferred by many people, it's easier to use `_()` instead of `gettext()`. Many custom i18n libraries from frameworks use something similar to `t()` as well, to make translated code shorter. However, that's the only function that sports a shortcut. You might want to add in your project some others, such as `__()` or `_n()` for `ngettext()`, or maybe a fancy `_r()` that would join `gettext()` and `sprintf()` calls. + +In those cases, you'll need to instruct the Gettext utility on how to extract the strings from those new functions. Don't be afraid, it's something very easy. It's just a field in the `.po` file, or a Settings screen on Poedit - remember when we mentioned it before? In the editor that option is inside "Catalog > Properties > Source keywords". You need to include there the specifications of those new functions, following [a specific format](func_format): + +- if you create something like `t()` that simply returns the translation for a string, you can specify it as `t`. Gettext will know the only function argument is the string to be translated; +- if the function has more than one argument, you can specify in which one the first string is - and if needed, the plural form as well. For instance, if we call our function like this: `__('one user', '%d users', $number)`, the specification would be `__:1,2`, meaning the first form is the first argument, and the second form is the second argument. If your number comes as first argument instead, the spec would be `__:2,3`, indicating the first form is the second argument, and so on. + +After including those new rules in the `.po` file, a new scan will bring your new strings just as easy as before. 
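As a rough illustration of the helper functions suggested above, here is a minimal sketch. The names `_n()` and `_r()` are just the hypothetical shortcuts mentioned in the text, not part of PHP or Gettext, and the sketch assumes the `gettext` extension is loaded.

```php
<?php
// Hypothetical helpers: the names follow the suggestions in the text and are
// not provided by PHP or Gettext. Requires the gettext extension.

/** Shortcut for ngettext(): returns the singular or plural form for $n. */
function _n(string $singular, string $plural, int $n): string
{
    return ngettext($singular, $plural, $n);
}

/** Translates $msgid, then fills its sprintf() placeholders with $args. */
function _r(string $msgid, ...$args): string
{
    return vsprintf(gettext($msgid), $args);
}

// Usage: with no catalog loaded, Gettext returns the source strings unchanged.
echo _r('Hello, %s!', 'Ana'), PHP_EOL;
echo sprintf(_n('%d unread message', '%d unread messages', 3), 3), PHP_EOL;
```

For the extractor to pick these up, the matching "Source keywords" entries would presumably be `_n:1,2` (first and second arguments hold the two forms) and `_r` (only the first argument is translated), following the format described above.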
### References @@ -295,13 +331,15 @@ textdomain('main'); [Poedit]: https://poedit.net/ [poedit_download]: https://poedit.net/download -[lingohub]: https://lingohub.com/blog/2013/07/php-internationalization-with-gettext-tutorial/#Plurals +[lingohub]: https://lingohub.com/blog/2013/07/php-internationalization-with-gettext-tutorial/ +[lingohub_plurals]: https://lingohub.com/blog/2013/07/php-internationalization-with-gettext-tutorial/#Plurals [plural]: http://docs.translatehouse.org/projects/localization-guide/en/latest/l10n/pluralforms.html [gettext]: https://en.wikipedia.org/wiki/Gettext -[manual]: (http://www.gnu.org/software/gettext/manual/gettext.html) +[manual]: http://www.gnu.org/software/gettext/manual/gettext.html [639-1]: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes [3166-1]: http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2 [rare]: http://www.gnu.org/software/gettext/manual/gettext.html#Rare-Language-Codes +[func_format]: https://www.gnu.org/software/gettext/manual/gettext.html#Language-specific-options [sprintf]: http://php.net/manual/en/function.sprintf.php [func]: http://php.net/manual/en/function.gettext.php From 56b01c71d6b616f671b21f2e3175503079e7198a Mon Sep 17 00:00:00 2001 From: Igor Santos Date: Sat, 27 Feb 2016 22:52:59 -0300 Subject: [PATCH 5/7] i18n: fixing typos and right column margin --- ...1-Internationalization-and-Localization.md | 88 +++++++++++++------ 1 file changed, 62 insertions(+), 26 deletions(-) diff --git a/_posts/05-06-01-Internationalization-and-Localization.md b/_posts/05-06-01-Internationalization-and-Localization.md index 95b4ea2..0307942 100644 --- a/_posts/05-06-01-Internationalization-and-Localization.md +++ b/_posts/05-06-01-Internationalization-and-Localization.md @@ -121,8 +121,8 @@ given number falls (starting the count with 0). 
For example: - Brazilian Portuguese: `nplurals=2; plural=(n > 1);` - two rules, second if N is bigger than one, first otherwise Now that you understood the basis of how plural rules works - and if you didn't, please look at a deeper explanation -on the [LingoHub tutorial](lingohub_plurals) -, you might want to copy the ones you need from a [list][plural] instead of -writing them by hand. +on the [LingoHub tutorial](lingohub_plurals) -, you might want to copy the ones you need from a [list][plural] instead +of writing them by hand. When calling out Gettext to do localization on sentences with counters, you'll have to give him the related number as well. Gettext will work out what rule should be in effect and use the correct localized version. @@ -170,9 +170,9 @@ translated `msgstr` lines. Talking about translation keys, there are two main "schools" here: 1. _`msgid` as a real sentence_. - The main advantage are: - - if there's pieces of the software untranslated in any given language, the key displayed will still maintain some - meaning. Example: if you happen to translate by heart from English to Spanish but needs help to translate to French, + The main advantages are: + - if there are pieces of the software untranslated in any given language, the key displayed will still maintain some + meaning. Example: if you happen to translate by heart from English to Spanish but need help to translate to French, you might publish the new page with missing French sentences, and parts of the website would be displayed in English instead; - it's much easier for the translator to understand what's going on and make a proper translation based on the @@ -188,7 +188,7 @@ including the template or part where the string is located instead of its conten needed as a basis for other translations. Example: the developer would ideally have an `en.po` file, that translators would read to understand what to write in `fr.po` for instance. 
- missing translations would display meaningless keys on screen (`TOP_MENU_WELCOME` instead of `Hello there, User!` - on the said untranslated French page). That's good as would force translation to be complete before publishing - + on the said untranslated French page). That's good it as would force translation to be complete before publishing - but bad as translation issues would be really awful in the interface. The [Gettext manual][manual] favors the first approach, as in general it's easier for translators and users in @@ -197,7 +197,7 @@ case of trouble. That's how we will be working here as well. ### Everyday usage In a common application, you would use some Gettext functions while writing static text in your pages. Those sentences would then appear in `.po` files, get translated, compiled into `.mo` files and then, used by Gettext when rendering -the actual interface. Given that, let's tie together what we have discussed so far in a a step-by-step example: +the actual interface. Given that, let's tie together what we have discussed so far in a step-by-step example: #### 1. A sample template file, including some different gettext calls {% highlight php %} @@ -212,14 +212,14 @@ the actual interface. Given that, let's tie together what we have discussed so f $unread), $unread)?> - +

{% endhighlight %} -- [`gettext()`][func] simply translates a `msgid` into it's corresponding `msgstr` for a given language. There's also +- [`gettext()`][func] simply translates a `msgid` into its corresponding `msgstr` for a given language. There's also the shorthand function `_()` that works the same way; - [`ngettext()`][n_func] does the same but with plural rules; - there's also [`dgettext()`][d_func] and [`dngettext()`][dn_func], that allows you to override the domain for a single @@ -248,7 +248,7 @@ if (isset($_GET['lang']) && valid($_GET['lang'])) { // if the cookie is present instead, let's just keep it $lang = $_COOKIE['lang']; //you should sanitize this! } elseif (isset($_SERVER['HTTP_ACCEPT_LANGUAGE'])) { - // default resort: look for the languages the browser says the user accepts + // default: look for the languages the browser says the user accepts $langs = explode(',', $_SERVER['HTTP_ACCEPT_LANGUAGE']); array_walk($langs, function (&$lang) { $lang = strtr(strtok($lang, ';'), ['-' => '_']); }); foreach ($langs as $browser_lang) { @@ -284,42 +284,78 @@ textdomain('main'); {% endhighlight %} #### 3. Preparing translation for the first run -To make matters easier - and one of the powerful advantages Gettext has over custom framework i18n packages - is it's custom file type. "Oh man, that's quite hard to understand and edit by hand, a simple array would be easier!" Make no mistake, applications like [Poedit] are here to help - _a lot_. You can get the program from [their website], it's free and available for all platforms. It's a pretty easy tool to get used to, and a very powerful one at the same time - using with responsability the powers Gettext gave it. +To make matters easier - and one of the powerful advantages Gettext has over custom framework i18n packages - is its +custom file type. "Oh man, that's quite hard to understand and edit by hand, a simple array would be easier!" Make no +mistake, applications like [Poedit] are here to help - _a lot_. 
You can get the program from +[their website](poedit_download), it's free and available for all platforms. It's a pretty easy tool to get used to, +and a very powerful one at the same time - using all powerful features Gettext has available. -In the first run, you should select "File > New Catalog" from the menu. There you'll have a small screen where we will set the terrain so everything else runs smoothly. You'll be able to find those settings later through "Catalog > Properties": +In the first run, you should select "File > New Catalog" from the menu. There you'll have a small screen where we will +set the terrain so everything else runs smoothly. You'll be able to find those settings later through +"Catalog > Properties": - Project name and version, Translation Team and email address: useful information that goes in the `.po` file header; - Language: here you should use that format we mentioned before, such as `en_US` or `pt_BR`; - Charsets: UTF-8, preferably; -- Source charset: set here the charset used by your PHP files - probably UTF-8 AS well, right? +- Source charset: set here the charset used by your PHP files - probably UTF-8 as well, right? - plural forms: here goes those rules we mentioned before - there's a link in there with samples as well; -- Source paths: here you must include all folders from the project where `gettext()` (and siblings) will happen - this is usually your templates folder(s) -- Source keywords: this last part is filled by default, but you might need to alter it later - and is one of the powerful points of Gettext. The underlying software knows how the `gettext()` calls look like in several programming languages, but you might as well create your own translation forms. This will be discussed later in the "Tips" section. 
+- Source paths: here you must include all folders from the project where `gettext()` (and siblings) will happen - this +is usually your templates folder(s) +- Source keywords: this last part is filled by default, but you might need to alter it later - and is one of the +powerful points of Gettext. The underlying software knows how the `gettext()` calls look like in several programming +languages, but you might as well create your own translation forms. This will be discussed later in the "Tips" section. -After setting those points you'll be prompted to save the file - using that directory structure we mentioned as well, and then it will run a scan through your source files to find the localization calls. They'll be fed empty into the translation table, and you'll start typing in the localized versions of those strings. Save it and a `.mo` file will be (re)compiled into the same folder and ta-dah: your project is internationalized. +After setting those points you'll be prompted to save the file - using that directory structure we mentioned as well, +and then it will run a scan through your source files to find the localization calls. They'll be fed empty into the +translation table, and you'll start typing in the localized versions of those strings. Save it and a `.mo` file will be +(re)compiled into the same folder and ta-dah: your project is internationalized. #### 4. Translating strings -As you may have noticed before, there are two main types of localized strings: simple ones and the ones with plural forms. The first ones have simply two boxes: source and localized string. The source string can't be modified as Gettext/Poedit do not include the powers to alter your source files - you should change the source itself and rescan the files. Tip: you may right-click a translation line and it will hint you with the source files and lines where that string is being. 
-On the other hand, plural form strings include two boxes to show the two source strings, and tabs so you can configure the different final forms. +As you may have noticed before, there are two main types of localized strings: simple ones and the ones with plural +forms. The first ones have simply two boxes: source and localized string. The source string can't be modified as +Gettext/Poedit do not include the powers to alter your source files - you should change the source itself and rescan +the files. Tip: you may right-click a translation line and it will hint you with the source files and lines where that +string is being used. +On the other hand, plural form strings include two boxes to show the two source strings, and tabs so you can configure +the different final forms. -Whenever you change your sources and need to update the translations, just hit Refresh and Poedit will rescan the code, removing non-existent entries, merging the ones that changed and adding new ones. It may also try to guess some translations, based on other ones you did. Those guesses and the changed entries will receive a "Fuzzy" marker, indicating it needs review, being highlighted in the list. It's also useful if you have a translation team and someone tries to write something they're not sure about: just mark Fuzzy and someone else will review later. +Whenever you change your sources and need to update the translations, just hit Refresh and Poedit will rescan the code, +removing non-existent entries, merging the ones that changed and adding new ones. It may also try to guess some +translations, based on other ones you did. Those guesses and the changed entries will receive a "Fuzzy" marker, +indicating it needs review, being highlighted in the list. It's also useful if you have a translation team and someone +tries to write something they're not sure about: just mark Fuzzy and someone else will review later. 
-Finally, it's advised to leave "View > Untranslated entries first" marked, as it will help you _a lot_ to not forget any entry. From that menu you can also open parts of the UI that allow you to leave contextual information for translators, if needed. +Finally, it's advised to leave "View > Untranslated entries first" marked, as it will help you _a lot_ to not forget +any entry. From that menu you can also open parts of the UI that allow you to leave contextual information for +translators, if needed. ### Tips & Tricks #### Possible caching issues -If you're running PHP as a module on Apache (`mod_php`), you might face issues with the `.mo` file being cached. It happens the first time it's read, and then, to update it, you might need to restart the server. On Nginx and PHP5 it usually takes only a couple of page refreshes to refresh the translation cache, and on PHP7 it is rarely needed. +If you're running PHP as a module on Apache (`mod_php`), you might face issues with the `.mo` file being cached. It +happens the first time it's read, and then, to update it, you might need to restart the server. On Nginx and PHP5 it +usually takes only a couple of page refreshes to refresh the translation cache, and on PHP7 it is rarely needed. #### Additional helper functions -As preferred by many people, it's easier to use `_()` instead of `gettext()`. Many custom i18n libraries from frameworks use something similar to `t()` as well, to make translated code shorter. However, that's the only function that sports a shortcut. You might want to add in your project some others, such as `__()` or `_n()` for `ngettext()`, or maybe a fancy `_r()` that would join `gettext()` and `sprintf()` calls. +As preferred by many people, it's easier to use `_()` instead of `gettext()`. Many custom i18n libraries from +frameworks use something similar to `t()` as well, to make translated code shorter. However, that's the only function +that sports a shortcut. 
You might want to add in your project some others, such as `__()` or `_n()` for `ngettext()`, +or maybe a fancy `_r()` that would join `gettext()` and `sprintf()` calls. -In those cases, you'll need to instruct the Gettext utility on how to extract the strings from those new functions. Don't be afraid, it's something very easy. It's just a field in the `.po` file, or a Settings screen on Poedit - remember when we mentioned it before? In the editor that option is inside "Catalog > Properties > Source keywords". You need to include there the specifications of those new functions, following [a specific format](func_format): +In those cases, you'll need to instruct the Gettext utility on how to extract the strings from those new functions. +Don't be afraid, it's very easy. It's just a field in the `.po` file, or a Settings screen on Poedit. In the editor +that option is inside "Catalog > Properties > Source keywords". You need to include there the specifications of those +new functions, following [a specific format](func_format): -- if you create something like `t()` that simply returns the translation for a string, you can specify it as `t`. Gettext will know the only function argument is the string to be translated; -- if the function has more than one argument, you can specify in which one the first string is - and if needed, the plural form as well. For instance, if we call our function like this: `__('one user', '%d users', $number)`, the specification would be `__:1,2`, meaning the first form is the first argument, and the second form is the second argument. If your number comes as first argument instead, the spec would be `__:2,3`, indicating the first form is the second argument, and so on. +- if you create something like `t()` that simply returns the translation for a string, you can specify it as `t`. 
+Gettext will know the only function argument is the string to be translated; +- if the function has more than one argument, you can specify in which one the first string is - and if needed, the +plural form as well. For instance, if we call our function like this: `__('one user', '%d users', $number)`, the +specification would be `__:1,2`, meaning the first form is the first argument, and the second form is the second +argument. If your number comes as first argument instead, the spec would be `__:2,3`, indicating the first form is the +second argument, and so on. -After including those new rules in the `.po` file, a new scan will bring your new strings just as easy as before. +After including those new rules in the `.po` file, a new scan will bring in your new strings just as easy as before. ### References From ade9c6e5714765065a8cbe97734ab9d18fa491bf Mon Sep 17 00:00:00 2001 From: Igor Santos Date: Sun, 13 Nov 2016 18:17:47 -0200 Subject: [PATCH 6/7] Adding information on other i18n libraries Cleaning up framework references, as they're listed below --- ...1-Internationalization-and-Localization.md | 70 ++++++++++++++----- 1 file changed, 51 insertions(+), 19 deletions(-) diff --git a/_posts/05-06-01-Internationalization-and-Localization.md b/_posts/05-06-01-Internationalization-and-Localization.md index 0307942..2cc26ff 100644 --- a/_posts/05-06-01-Internationalization-and-Localization.md +++ b/_posts/05-06-01-Internationalization-and-Localization.md @@ -29,17 +29,36 @@ The easiest way to internationalize PHP software is by using array files and usi some maintenance issues along the road - some might appear in the very beginning, such as pluralization. So, please, don't try this if your project will contain more than a couple of pages. -Some frameworks will sport their own i18n packages. Those usually are a more powerful version of the above approach, -but including features needed for real localization, such as plural forms and string replacement. 
You're free to use -those if you feel like, but you might find bothering to edit array source files, having to deal with pure code issues -(such as string scaping and so on). The main pro here is integration with the environment you're using - the framework -is called _full-stack_ for a reason, right? +The most classic way and often taken as reference for i18n and l10n is a [Unix tool called `gettext`][gettext]. It dates +back to 1995 and is still a complete implementation for translating software. It is pretty easy to get running, while +it still sports powerful supporting tools. It's about Gettext we will be talking here. Also, to help you not get messy +over the command-line, we will be presenting a great GUI application that can be used to easily update your l10n source +files. -However, the most classic way and often taken as reference for i18n and l10n is a [Unix tool called `gettext`][gettext]. -It dates back to 1995 and is still the most complete implementation for translating software. It is pretty easy to get -running, while it still sports powerful supporting tools. It's about Gettext we will be talking here. Also, to help you -not get messy over the command-line, we will be presenting a great GUI application that can be used to easily update -your l10n source files. +### Other tools + +There are common libraries used that support Gettext, and other implementations of i18n. Some of them may seem easier to +install, or sport additional features or i18n file formats. In this document we focus on the tools provided with the +PHP core, but here we list others for completeness: + +- [oscarotero/Gettext][oscarotero]: Gettext support with an OO interface; includes improved helper functions, powerful +extractors for several file formats (some of them not supported natively by the `gettext` command), and can also export +to other formats besides `.mo/.po` files.
Can be useful if you need to integrate your translation files into other parts +of the system, like a JavaScript interface. +- [symfony/translation][symfony]: supports a lot of different formats, but recommends using verbose XLIFF files. Doesn't +include helper functions or a built-in extractor, but supports placeholders using `strtr()` internally. +- [zend/i18n][zend]: supports array and INI files, or Gettext formats. Implements a caching layer to save you from +reading the filesystem every time. Also includes view helpers, and locale-aware input filters and validators. However, +it has no message extractor. + +Other frameworks also include i18n modules, but those are not available outside of their codebases: +- [Laravel] supports basic array files, has no automatic extractor but includes a `@lang` helper for template files. +- [Yii] supports array, Gettext and database-based translation, and includes a messages extractor. It is backed by the +[`Intl`][intl] extension, available since PHP 5.3, and based on the [ICU project]; this enables Yii to run powerful +replacements, like spelling out numbers, formatting dates, times, intervals, currency and ordinals. + +If you decide to go for one of the libraries that provide no extractors, you may want to use the gettext formats, so +you can use the original gettext toolchain (including Poedit) as described in the rest of the chapter. ## Gettext @@ -66,7 +85,8 @@ You'll always have one pair of PO/MO files per language and region, but only one There are some cases, in big projects, where you might need to separate translations when the same words convey different meaning given a context. In those cases you split them into different _domains_. They're basically named groups of POT/PO/MO files, where the filename is the said _translation domain_. Small and medium-sized projects usually, -for simplicity, use only one domain; its name is arbitrary, but we will be using "main" for our code samples.
+for simplicity, use only one domain; its name is arbitrary, but we will be using "main" for our code samples. +In [Symfony] projects, for example, domains are used to separate the translation for validation messages. #### Locale code A locale is simple code that identifies a version of a language. It's defined following [ISO 639-1][639-1] and @@ -181,18 +201,21 @@ Talking about translation keys, there are two main "schools" here: - The only disadvantage: if you need to change the actual text, you would need to replace the same `msgid` across several language files. -2. _`msgid` as a unique, structured key_. It would describe the sentence role in the application in a structured way, -including the template or part where the string is located instead of its content. +2. _`msgid` as a unique, structured key_. +It would describe the sentence role in the application in a structured way, including the template or part where the +string is located instead of its content. - it's a great way to have the code organized, separating the text content from the template logic. - however, that could bring problems to the translator that would miss the context. A source language file would be needed as a basis for other translations. Example: the developer would ideally have an `en.po` file, that translators would read to understand what to write in `fr.po` for instance. - - missing translations would display meaningless keys on screen (`TOP_MENU_WELCOME` instead of `Hello there, User!` + - missing translations would display meaningless keys on screen (`top_menu.welcome` instead of `Hello there, User!` on the said untranslated French page). That's good it as would force translation to be complete before publishing - - but bad as translation issues would be really awful in the interface. + but bad as translation issues would be really awful in the interface. 
Some libraries, though, include an option to + specify a given language as "fallback", having a similar behavior as the other approach. The [Gettext manual][manual] favors the first approach, as in general it's easier for translators and users in -case of trouble. That's how we will be working here as well. +case of trouble. That's how we will be working here as well. However, the [Symfony documentation][symfony-keys] favors +keyword-based translation, to allow for independent changes of all translations without affecting templates as well. ### Everyday usage In a common application, you would use some Gettext functions while writing static text in your pages. Those sentences @@ -204,7 +227,7 @@ the actual interface. Given that, let's tie together what we have discussed so f