mirror of
https://github.com/codeguy/php-the-right-way.git
synced 2025-08-17 19:16:20 +02:00
i18n: typos, keys, plurals and samples
@@ -15,11 +15,11 @@ First of all, we need to define those two similar concepts and other related thi
without refactors. This is usually done once, preferably at the beginning of the project; otherwise you'll probably
need some huge changes in the source!
- **Localization** happens when you adapt the interface (mainly) by translating contents, based on the i18n work done
before. It is usually done every time a new language or region needs support, and is updated when new interface pieces
are added, as they need to be available in all supported languages.
- **Pluralization** defines the rules that different languages apply to strings containing numbers and
counters. For instance, in English when you have only one item, it's singular, and anything different from that is
called plural; the plural in this language is usually indicated by adding an S to the word, though some words change
in other ways.
In other languages such as Russian or Serbian there are two plural forms plus the singular one; you may even find
languages with a total of four, five or six forms, such as Slovenian, Irish or Arabic.
@@ -41,9 +41,6 @@ running, while it still sports powerful supporting tools. It's about Gettext we
not get messy over the command-line, we will be presenting a great GUI application that can be used to easily update
your l10n source files.

## Gettext

### Installation
@@ -51,6 +48,10 @@ You might need to install Gettext and the related PHP library by using your pack
After installation, enable it by adding `extension=gettext.so` (Linux/Unix) or `extension=php_gettext.dll` (Windows) to
your `php.ini`.

Here we will also be using [Poedit] to create translation files. You will probably find it in your system's package
manager; it's available for Unix, Mac and Windows, and can be [downloaded for free on their website][poedit_download]
as well.
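After editing `php.ini`, it's worth confirming that PHP actually loaded the extension. The minimal sketch below uses the built-in `extension_loaded()` and `php_ini_loaded_file()` functions. Command-line PHP and the web server's PHP may read different `php.ini` files, so run the check under the same setup that serves your pages, or compare with the module list printed by `php -m`.

```php
<?php
// Minimal check that the gettext extension enabled in php.ini is actually loaded.
$gettextLoaded = extension_loaded('gettext');

// php_ini_loaded_file() returns the path of the php.ini in use, or false if none was loaded.
$iniFile = php_ini_loaded_file();

echo 'php.ini in use: ', ($iniFile !== false ? $iniFile : '(none)'), PHP_EOL;
echo $gettextLoaded
    ? 'gettext is enabled' . PHP_EOL
    : 'gettext is NOT enabled; check the extension line in php.ini' . PHP_EOL;
```

If the second line reports the extension as missing, the first line tells you which `php.ini` you should have edited.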
### Structure

#### Types of files
@@ -65,7 +66,7 @@ You'll always have one pair of PO/MO files per language and region, but only one
There are some cases, in big projects, where you might need to separate translations when the same words convey
different meanings depending on context. In those cases you split them into different _domains_. They're basically
named groups of POT/PO/MO files, where the filename is the _translation domain_. Small and medium-sized projects usually,
for simplicity, use only one domain; its name is arbitrary, but we will be using "main" for our code samples.
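Each domain is its own catalog, so the domain name becomes part of the file name Gettext searches for. The sketch below only builds those paths. It assumes the conventional `<root>/<locale>/LC_MESSAGES/<domain>.mo` layout that `bindtextdomain()` relies on. `moFilePath()` is a hypothetical helper, and `forum` is just an example of a second domain.

```php
<?php
/**
 * Hypothetical helper: builds the path where Gettext looks for a compiled catalog,
 * following the <root>/<locale>/LC_MESSAGES/<domain>.mo convention.
 */
function moFilePath(string $root, string $locale, string $domain): string
{
    return sprintf('%s/%s/LC_MESSAGES/%s.mo', rtrim($root, '/'), $locale, $domain);
}

// One project, two domains: each one gets its own .mo file per locale.
foreach (['main', 'forum'] as $domain) {
    echo moFilePath('../locales', 'pt_BR', $domain), PHP_EOL;
}
```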
#### Locale code
A locale is a simple code that identifies a version of a language. It's defined following [ISO 639-1][639-1] and
@@ -106,16 +107,181 @@ root for your l10n files in your source repository. Inside it you'll have a fold
{% endhighlight %}

### Plural forms

As we said in the introduction, different languages might have different plural rules. However, Gettext saves us from
this trouble once again. When creating a new .po file, you'll have to declare the [plural rules][plural] for that
language, and translated pieces that are plural-sensitive will have a different form for each of those rules. When
calling Gettext in code, you'll have to specify the number related to the sentence, and it will work out the correct
form to use; string substitution (with `sprintf`, for example) can then insert the number itself into the sentence.

Plural rules include the number of plural forms available (`nplurals`) and an expression in `n` that evaluates to the
index of the form the given number falls into (counting from 0). For example:

- Japanese: `nplurals=1; plural=0;` - only one form
- English: `nplurals=2; plural=(n != 1);` - two forms: the first if `n` is one, the second otherwise
- Brazilian Portuguese: `nplurals=2; plural=(n > 1);` - two forms: the second if `n` is greater than one, the first otherwise

Now that you understand the basics of how plural rules work (if you don't, please look at the deeper explanation in
the [LingoHub tutorial][lingohub]), you might want to copy the ones you need from a [list][plural] instead of writing
them by hand.

When calling Gettext to localize sentences that include counters, you'll have to pass it the related number as well.
Gettext will work out which rule is in effect and use the correct localized version. You will need to include in the
.po file a different sentence for each plural form defined for that language.
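The sketch below makes those mechanics concrete. It hand-translates the `plural=` expressions listed above into PHP closures, plus the three-form Russian rule as given in the Gettext manual. It then mimics what Gettext does with them: evaluate the rule for `n`, pick the matching `msgstr[index]`, and substitute the number. The `$rules` array and `pluralize()` are hypothetical illustrations, not part of the Gettext API. In a real application `ngettext()` performs this lookup for you from the compiled `.mo` file. The arrow functions assume PHP 7.4 or later.

```php
<?php
// Hand-translated Plural-Forms expressions; each closure returns the msgstr index for $n.
$rules = [
    'ja'    => fn(int $n): int => 0,                // nplurals=1; plural=0;
    'en'    => fn(int $n): int => (int) ($n != 1),  // nplurals=2; plural=(n != 1);
    'pt_BR' => fn(int $n): int => (int) ($n > 1),   // nplurals=2; plural=(n > 1);
    // nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
    'ru'    => function (int $n): int {
        if ($n % 10 === 1 && $n % 100 !== 11) {
            return 0;
        }
        if ($n % 10 >= 2 && $n % 10 <= 4 && ($n % 100 < 10 || $n % 100 >= 20)) {
            return 1;
        }
        return 2;
    },
];

/**
 * Hypothetical helper mimicking ngettext() + sprintf(): picks the form chosen
 * by the rule and substitutes the number into it.
 *
 * @param string[] $forms msgstr[0], msgstr[1], ... in order
 */
function pluralize(array $forms, callable $rule, int $n): string
{
    return sprintf($forms[$rule($n)], $n);
}

// The same numbers fall into different forms depending on the language's rule.
foreach ([0, 1, 2, 5, 11, 21] as $n) {
    printf("n=%d en=%d pt_BR=%d ru=%d\n", $n, $rules['en']($n), $rules['pt_BR']($n), $rules['ru']($n));
}

// msgstr[0] and msgstr[1] from a Brazilian Portuguese plural entry.
$ptForms = ['Só uma mensagem não lida', '%d mensagens não lidas'];
echo pluralize($ptForms, $rules['pt_BR'], 5), PHP_EOL;
```

Note how zero is plural in English but singular in Brazilian Portuguese. This is why the rule has to come from the `.po` header rather than from the calling code.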
### Sample implementation

After all that theory, let's get a little practical. Here's an excerpt of a .po file. Don't mind its format, but rather
its overall content; you'll learn how to edit it easily later:

{% highlight po %}
msgid ""
msgstr ""
"Language: pt_BR\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Plural-Forms: nplurals=2; plural=(n > 1);\n"

msgid "We're now translating some strings"
msgstr "Nós estamos traduzindo algumas strings agora"

msgid "Hello %1$s! Your last visit was on %2$s"
msgstr "Olá %1$s! Sua última visita foi em %2$s"

msgid "Only one unread message"
msgid_plural "%d unread messages"
msgstr[0] "Só uma mensagem não lida"
msgstr[1] "%d mensagens não lidas"
{% endhighlight %}

The first section works like a header, with the `msgid` and `msgstr` intentionally empty. It describes the file
encoding, plural forms and other things that are less relevant. The second section translates a simple string from
English to Brazilian Portuguese. The third does the same but leverages string replacement from [`sprintf`][sprintf], so
the translation may contain the user name and visit date. The last section is a sample of pluralization forms. It
displays the singular and plural versions in English as `msgid` and `msgid_plural`, and their corresponding
translations as `msgstr[0]` and `msgstr[1]` (following the index given by the plural rule). There, string replacement
is used as well so the number can be seen directly in the sentence, by using `%d`. A plural entry always has exactly
two source strings (singular and plural), so it's advisable to use a language with simple plural rules, such as
English, as the source of translation.
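The numbered placeholders (`%1$s`, `%2$s`) let a translation reorder its arguments without any change to the PHP code. In the application the call would look like `sprintf(gettext('Hello %1$s! Your last visit was on %2$s'), $name, $date)`. The minimal sketch below uses plain `sprintf()` with format strings typed in directly. The user name, the date and the reordered wording are made-up values for illustration.

```php
<?php
// msgstr from the .po excerpt above: placeholders are referenced by position.
$ptBr = 'Olá %1$s! Sua última visita foi em %2$s';

// Hypothetical alternative wording that needs the arguments in the opposite order.
$reordered = 'Your last visit was on %2$s, %1$s!';

$name = 'Maria';
$lastVisit = '2016-01-20';

// The calling code is identical for both; only the format string differs.
echo sprintf($ptBr, $name, $lastVisit), PHP_EOL;
echo sprintf($reordered, $name, $lastVisit), PHP_EOL;
```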
### Discussion on l10n keys

As you might have noticed, we're using the actual English sentence as the source ID. That `msgid` is the same
throughout all your `.po` files, meaning other languages will have the same format and the same `msgid` fields, but
translated `msgstr` lines.

Talking about translation keys, there are two main "schools" here:

1. `msgid` as a real sentence. The main advantage here is that, if there are pieces of the software untranslated in any
given language, they will still be displayed in a meaningful-ish way. If you happen to translate by heart from English
to Spanish but need help to translate to French, you might publish the new page with missing French sentences, and
parts of the website would be displayed in English instead. Another point is that it's much easier for the translator
to understand what's going on and make a proper translation based on the `msgid`. It also gives you "free" l10n for
the source language. However, if you need to change the actual text, you would need to replace the same `msgid`
across several language files.
2. `msgid` as a unique, structured key. It would describe the sentence's role in the application in a structured way,
including the template or part where the string is located instead of its content. It's a great way to have the code
organized, but it causes problems for translators, who would miss the context. A source translation file would be
needed as a basis for other translations, so the developer would ideally have an `en.po` file that translators would
then read to understand what to write in `fr.po`, for instance. This is both good and bad: missing translations
would display meaningless keys on screen (`TOP_MENU_WELCOME` instead of `Hello there, User!` on the given French
untranslated page), which forces translation to be complete before publishing, since any gap would look really awful
in the interface.

The [Gettext manual][manual] favors the first approach, as, in general, it's easier for translators and users in
case of trouble. That's how we will be working here as well.
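The practical difference between the two schools shows up when a translation is missing. When `gettext()` finds no translation, it returns the `msgid` itself. A sentence `msgid` therefore degrades to readable source text, while a structured key leaks onto the page. The sketch below imitates that fallback with a plain array standing in for a compiled catalog. `translate()` and the sample French catalog are hypothetical, for illustration only.

```php
<?php
/**
 * Hypothetical stand-in for gettext(): looks the msgid up in an in-memory
 * catalog and, like Gettext, falls back to the msgid itself when it is missing.
 *
 * @param array<string, string> $catalog msgid => msgstr
 */
function translate(array $catalog, string $msgid): string
{
    return $catalog[$msgid] ?? $msgid;
}

// An incomplete French catalog: only one string has been translated so far.
$fr = [
    "We're now translating some strings" => 'Nous traduisons maintenant quelques chaînes',
];

echo translate($fr, "We're now translating some strings"), PHP_EOL;

// School 1: sentence as msgid. The untranslated string is still readable.
echo translate($fr, 'Hello there, User!'), PHP_EOL;

// School 2: structured key as msgid. The raw key shows up in the interface.
echo translate($fr, 'TOP_MENU_WELCOME'), PHP_EOL;
```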
### Everyday usage

In a common application, you would use some Gettext functions while writing static text in your pages. Those sentences
would then appear in `.po` files, get translated, get compiled into `.mo` files and then be used by Gettext when
rendering the actual interface. Given that, let's tie together what we have discussed so far in a step-by-step example:

#### 1. A sample template file, including some different gettext calls
{% highlight php %}
<?php include 'i18n_setup.php' ?>
<div id="header">
    <h1><?=sprintf(gettext('Welcome, %s!'), $name)?></h1>
    <!-- code indented this way only for legibility here -->
    <?php if ($unread): ?>
        <h2><?=sprintf(
            ngettext('Only one unread message',
                     '%d unread messages',
                     $unread),
            $unread)?>
        </h2>
    <?php endif ?>
</div>

<h1><?=gettext('Introduction')?></h1>
<p><?=gettext('We\'re now translating some strings')?></p>
{% endhighlight %}

- [`gettext()`][func] simply translates a `msgid` into its corresponding `msgstr` for a given language. There's also
the shorthand function `_()` that works the same way;
- [`ngettext()`][n_func] does the same but with plural rules;
- there's also [`dgettext()`][d_func] and [`dngettext()`][dn_func], which allow you to override the domain for a single
call. More on domain configuration in the next example.
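Templates tend to wrap `gettext()` and `ngettext()` in `sprintf()` over and over, so small helper functions can cut the noise. The sketch below uses hypothetical names, `t()` and `tn()`. It formats the translated string with `vsprintf()`. As an assumption for environments without the extension, it falls back to the untranslated `msgid` and to an English-style `n != 1` plural choice. That also matches what Gettext itself returns when it finds no translation.

```php
<?php
/**
 * Hypothetical helper: translates $msgid and fills in sprintf-style placeholders.
 * Falls back to the untranslated msgid if the gettext extension is not loaded.
 */
function t(string $msgid, ...$args): string
{
    $text = function_exists('gettext') ? gettext($msgid) : $msgid;
    return $args ? vsprintf($text, $args) : $text;
}

/**
 * Hypothetical plural-aware helper: picks the form with ngettext() and always
 * passes $n as the first placeholder value, so "%d" works without repeating it.
 */
function tn(string $singular, string $plural, int $n, ...$args): string
{
    $text = function_exists('ngettext')
        ? ngettext($singular, $plural, $n)
        : ($n === 1 ? $singular : $plural); // English-style fallback: anything but 1 is plural
    return vsprintf($text, array_merge([$n], $args));
}

// Equivalent to the sprintf(gettext(...)) and sprintf(ngettext(...)) calls in the template:
echo t('Welcome, %s!', 'Ana'), PHP_EOL;
echo tn('Only one unread message', '%d unread messages', 3), PHP_EOL;
```

`t()` only runs `vsprintf()` when arguments are given, so a plain string containing a literal `%` is returned untouched.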
#### 2. A sample setup file (`i18n_setup.php` as used above), selecting the correct locale and configuring Gettext
{% highlight php %}
<?php
/**
 * Verifies if the given $locale is supported in the project
 * @param string $locale
 * @return bool
 */
function valid($locale) {
    return in_array($locale, ['en_US', 'en', 'pt_BR', 'pt', 'es_ES', 'es'], true);
}

//setting the source/default locale, for informational purposes
$lang = 'en_US';

if (isset($_GET['lang']) && valid($_GET['lang'])) {
    // the locale can be changed through the query-string
    $lang = $_GET['lang'];    // valid() already restricted this to the whitelist above
    setcookie('lang', $lang); // it's stored in a cookie so it can be reused
} elseif (isset($_COOKIE['lang']) && valid($_COOKIE['lang'])) {
    // if the cookie is present instead, let's just keep it
    $lang = $_COOKIE['lang']; // also checked by valid(), as cookies are user input too
} elseif (isset($_SERVER['HTTP_ACCEPT_LANGUAGE'])) {
    // default resort: look for the languages the browser says the user accepts
    $langs = explode(',', $_SERVER['HTTP_ACCEPT_LANGUAGE']);
    // "pt-BR;q=0.8" -> "pt_BR": drop the weight, trim spaces, use gettext's underscore format
    array_walk($langs, function (&$tag) { $tag = strtr(trim(strtok($tag, ';')), ['-' => '_']); });
    foreach ($langs as $browser_lang) {
        if (valid($browser_lang)) {
            $lang = $browser_lang;
            break;
        }
    }
}

// here we define the global system locale given the found language
putenv("LANG=$lang");

// this might be useful for date functions (LC_TIME) or money formatting (LC_MONETARY), for instance
setlocale(LC_ALL, $lang);

// this will make Gettext look for ../locales/<lang>/LC_MESSAGES/main.mo
bindtextdomain('main', '../locales');

// indicates in what encoding the file should be read
bind_textdomain_codeset('main', 'UTF-8');

// if your application has additional domains, as cited before, you should bind them here as well
bindtextdomain('forum', '../locales');
bind_textdomain_codeset('forum', 'UTF-8');

// here we indicate the default domain the gettext() calls will respond to
textdomain('main');

// this would look for the string in forum.mo instead of main.mo
// echo dgettext('forum', 'Welcome back!');
?>
{% endhighlight %}
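The browser branch above takes languages in the order they appear in the header and ignores their `q` weights. To honour those weights, or simply to unit-test the negotiation, it can be pulled into a pure function. This is a sketch under several assumptions:

- `negotiateLocale()` is a hypothetical name.
- The supported list mirrors the one in `valid()`.
- Tags are matched case-sensitively.
- Only the common `tag;q=value` form is handled, not every corner of the HTTP specification.

```php
<?php
/**
 * Hypothetical helper: picks the best supported locale from an Accept-Language header.
 *
 * @param string   $header    e.g. "pt-BR,pt;q=0.9,en;q=0.8"
 * @param string[] $supported locales in gettext format, e.g. ['en_US', 'pt_BR']
 */
function negotiateLocale(string $header, array $supported, string $default): string
{
    $candidates = [];
    foreach (explode(',', $header) as $position => $part) {
        $pieces = explode(';', trim($part));
        $tag = strtr(trim($pieces[0]), ['-' => '_']); // "pt-BR" -> "pt_BR"
        $q = 1.0;                                     // a missing weight means q=1
        foreach (array_slice($pieces, 1) as $param) {
            if (preg_match('/^\s*q\s*=\s*([0-9.]+)\s*$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        if ($tag !== '' && $q > 0) { // q=0 means "not acceptable"
            $candidates[] = [$tag, $q, $position];
        }
    }

    // Highest weight first; on ties, keep the order the browser sent.
    usort($candidates, function (array $a, array $b): int {
        return [$b[1], $a[2]] <=> [$a[1], $b[2]];
    });

    foreach ($candidates as [$tag]) {
        if (in_array($tag, $supported, true)) {
            return $tag;
        }
    }
    return $default;
}

$supported = ['en_US', 'en', 'pt_BR', 'pt', 'es_ES', 'es'];
echo negotiateLocale('fr-FR,es;q=0.5,pt-BR;q=0.8', $supported, 'en_US'), PHP_EOL;
```

In the setup file, the `elseif` branch could then shrink to `$lang = negotiateLocale($_SERVER['HTTP_ACCEPT_LANGUAGE'], $supported, $lang);`. With the sample header above, the weights favour `pt_BR` over `es`, even though `es` is listed first among the supported tags.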
#### 3. Preparing translation for the first run

> TODO: explain how to install Poedit and how to set it up

#### 4. Translating strings

> TODO: overall view on how to use Poedit for translation

### Tips & Tricks

> TODO: Talk about possible issue with caching.
> TODO: Suggest creation of helper functions.
@@ -123,11 +289,22 @@ in the LingoHub file).
* [Wikipedia: i18n and l10n](https://en.wikipedia.org/wiki/Internationalization_and_localization)
* [Wikipedia: Gettext](https://en.wikipedia.org/wiki/Gettext)
* [LingoHub: PHP internationalization with gettext tutorial](https://lingohub.com/blog/2013/07/php-internationalization-with-gettext-tutorial/)
* [PHP Manual: Gettext](http://php.net/manual/en/book.gettext.php)
* [Gettext Manual][manual]

[Poedit]: https://poedit.net/
[poedit_download]: https://poedit.net/download
[lingohub]: https://lingohub.com/blog/2013/07/php-internationalization-with-gettext-tutorial/#Plurals
[plural]: http://docs.translatehouse.org/projects/localization-guide/en/latest/l10n/pluralforms.html
[gettext]: https://en.wikipedia.org/wiki/Gettext
[manual]: http://www.gnu.org/software/gettext/manual/gettext.html
[639-1]: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
[3166-1]: http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2
[rare]: http://www.gnu.org/software/gettext/manual/gettext.html#Rare-Language-Codes

[sprintf]: http://php.net/manual/en/function.sprintf.php
[func]: http://php.net/manual/en/function.gettext.php
[n_func]: http://php.net/manual/en/function.ngettext.php
[d_func]: http://php.net/manual/en/function.dgettext.php
[dn_func]: http://php.net/manual/en/function.dngettext.php