mirror of
https://github.com/codeguy/php-the-right-way.git
synced 2025-08-09 15:36:36 +02:00
Merge pull request #712 from igorsantos07/i18n
Proof-writing the i18n chapter
@@ -12,28 +12,27 @@ words - in our case, internationalization becomes i18n and localization, l10n._
 First of all, we need to define those two similar concepts and other related things:

 - **Internationalization** is when you organize your code so it can be adapted to different languages or regions
-without refactorings. This is usually done once - preferably, in the beginning of the project, or else you'll probably
-need some huge changes in the source!
+without refactorings. This action is usually done once - preferably, at the beginning of the project, or else you will
+probably need some huge changes in the source!
 - **Localization** happens when you adapt the interface (mainly) by translating contents, based on the i18n work done
 before. It usually is done every time a new language or region needs support and is updated when new interface pieces
 are added, as they need to be available in all supported languages.
-- **Pluralization** defines the rules needed between different languages to interoperate strings containing numbers and
-counters. For instance, in English when you have only one item, it's singular, and anything different from that is
+- **Pluralization** defines the rules required between distinct languages to interoperate strings containing numbers and
+counters. For instance, in English when you have only one item, it is singular, and anything different from that is
 called plural; plural in this language is indicated by adding an S after some words, and sometimes changes parts of it.
 In other languages, such as Russian or Serbian, there are two plural forms in addition to the singular - you may even
 find languages with a total of four, five or six forms, such as Slovenian, Irish or Arabic.

 ## Common ways to implement
 The easiest way to internationalize PHP software is by using array files and using those strings in templates, such as
-`<h1><?=$TRANS['title_about_page']?></h1>`. This is, however, hardly a recommended way for serious projects, as it poses
+`<h1><?=$TRANS['title_about_page']?></h1>`. This way is, however, hardly recommended for serious projects, as it poses
 some maintenance issues along the road - some might appear in the very beginning, such as pluralization. So, please,
 don't try this if your project will contain more than a couple of pages.

 The most classic way and often taken as reference for i18n and l10n is a [Unix tool called `gettext`][gettext]. It dates
-back to 1995 and is still a complete implementation for translating software. It is pretty easy to get running, while
-it still sports powerful supporting tools. We will talk about Gettext in more detail below. If you would prefer to not
-have to get your hands dirty with the command line, we will also be presenting a great GUI application that can be used
-to easily update your l10n source files.
+back to 1995 and is still a complete implementation for translating software. It is easy enough to get running, while
+still sporting powerful supporting tools. It is about Gettext we will be talking here. Also, to help you not get messy
+over the command-line, we will be presenting a great GUI application that can be used to easily update your l10n source

 ### Other tools

@@ -46,8 +45,8 @@ translation. It uses array formats for message. Does not provide a message extra
 message formatting via the `intl` extension (including pluralized messages).
 - [oscarotero/Gettext][oscarotero]: Gettext support with an OO interface; includes improved helper functions, powerful
 extractors for several file formats (some of them not supported natively by the `gettext` command), and can also export
-to other formats besides `.mo/.po` files. Can be useful if you need to integrate your translation files into other parts
-of the system, like a JavaScript interface.
+to other formats besides `.mo/.po` files. Can be useful if you need to integrate your translation files into other
+parts of the system, like a JavaScript interface.
 - [symfony/translation][symfony]: supports a lot of different formats, but recommends using verbose XLIFF's. Doesn't
 include helper functions nor a built-in extractor, but supports placeholders using `strtr()` internally.
 - [zend/i18n][zend]: supports array and INI files, or Gettext formats. Implements a caching layer to save you from
@@ -55,6 +54,7 @@ reading the filesystem every time. It also includes view helpers, and locale-awa
 However, it has no message extractor.

 Other frameworks also include i18n modules, but those are not available outside of their codebases:
+
 - [Laravel] supports basic array files, has no automatic extractor but includes a `@lang` helper for template files.
 - [Yii] supports array, Gettext, and database-based translation, and includes a messages extractor. It is backed by the
 [`Intl`][intl] extension, available since PHP 5.3, and based on the [ICU project]; this enables Yii to run powerful
@@ -71,7 +71,7 @@ After installed, enable it by adding `extension=gettext.so` (Linux/Unix) or `ext
 your `php.ini`.

 Here we will also be using [Poedit] to create translation files. You will probably find it in your system's package
-manager; it's available for Unix, Mac, and Windows, and can be [downloaded for free on their website][poedit_download]
+manager; it is available for Unix, Mac, and Windows, and can be [downloaded for free on their website][poedit_download]
 as well.

 ### Structure
@@ -79,31 +79,31 @@ as well.
 #### Types of files
 There are three files you usually deal with while working with gettext. The main ones are PO (Portable Object) and
 MO (Machine Object) files, the first being a list of readable "translated objects" and the second, the corresponding
-binary to be interpreted by gettext when doing localization. There's also a POT (Template) file, that simply contains
+binary to be interpreted by gettext when doing localization. There's also a POT (Template) file, which simply contains
 all existing keys from your source files, and can be used as a guide to generate and update all PO files. Those template
-files are not mandatory: depending on the tool you're using to do l10n, you can go just fine with only PO/MO files.
-You'll always have one pair of PO/MO files per language and region, but only one POT per domain.
+files are not mandatory: depending on the tool you are using to do l10n, you can go just fine with only PO/MO files.
+You will always have one pair of PO/MO files per language and region, but only one POT per domain.

 ### Domains
-There are some cases, in big projects, where you might need to separate translations when the same words convey
-different meaning given a context. In those cases, you split them into different _domains_. They're basically named
+There are some cases, in big projects, where you might need to separate translations when the same words convey
+different meaning given a context. In those cases, you split them into different _domains_. They are, basically, named
 groups of POT/PO/MO files, where the filename is the said _translation domain_. Small and medium-sized projects usually,
 for simplicity, use only one domain; its name is arbitrary, but we will be using "main" for our code samples.
 In [Symfony] projects, for example, domains are used to separate the translation for validation messages.

 #### Locale code
-A locale is simply a code that identifies one version of a language. It's defined following the [ISO 639-1][639-1] and
+A locale is simply a code that identifies one version of a language. It is defined following the [ISO 639-1][639-1] and
 [ISO 3166-1 alpha-2][3166-1] specs: two lower-case letters for the language, optionally followed by an underline and two
 upper-case letters identifying the country or regional code. For [rare languages][rare], three letters are used.

 For some speakers, the country part may seem redundant. In fact, some languages have dialects in different
 countries, such as Austrian German (`de_AT`) or Brazilian Portuguese (`pt_BR`). The second part is used to distinguish
-between those dialects - when it's not present, it's taken as a "generic" or "hybrid" version of the language.
+between those dialects - when it is not present, it is taken as a "generic" or "hybrid" version of the language.

 ### Directory structure
-To use Gettext, we will need to adhere to a specific structure of folders. First, you'll need to select an arbitrary
-root for your l10n files in your source repository. Inside it, you'll have a folder for each needed locale, and a fixed
-`LC_MESSAGES` folder that will contain all your PO/MO pairs. Example:
+To use Gettext, we will need to adhere to a specific structure of folders. First, you will need to select an arbitrary
+root for your l10n files in your source repository. Inside it, you will have a folder for each needed locale, and a
+fixed `LC_MESSAGES` folder that will contain all your PO/MO pairs. Example:

 {% highlight console %}
 <project root>
@@ -131,9 +131,9 @@ root for your l10n files in your source repository. Inside it, you'll have a fol

 ### Plural forms
 As we said in the introduction, different languages might sport different plural rules. However, gettext saves us from
-this trouble once again. When creating a new `.po` file, you'll have to declare the [plural rules][plural] for that
+this trouble once again. When creating a new `.po` file, you will have to declare the [plural rules][plural] for that
 language, and translated pieces that are plural-sensitive will have a different form for each of those rules. When
-calling Gettext in code, you'll have to specify the number related to the sentence, and it will work out the correct
+calling Gettext in code, you will have to specify the number related to the sentence, and it will work out the correct
 form to use - even using string substitution if needed.

 Plural rules include the number of plurals available and a boolean test with `n` that would define in which rule the
@@ -147,13 +147,13 @@ Now that you understood the basis of how plural rules works - and if you didn't,
 on the [LingoHub tutorial][lingohub_plurals] -, you might want to copy the ones you need from a [list][plural] instead
 of writing them by hand.

-When calling out Gettext to do localization on sentences with counters, you'll have to give him the
+When calling out Gettext to do localization on sentences with counters, you will have to give him the
 related number as well. Gettext will work out what rule should be in effect and use the correct localized version.
 You will need to include in the `.po` file a different sentence for each plural rule defined.

 ### Sample implementation
 After all that theory, let's get a little practical. Here's an excerpt of a `.po` file - don't mind with its format,
-but instead the overall content, you'll learn how to edit it easily later:
+but with the overall content instead; you will learn how to edit it easily later:

 {% highlight po %}
 msgid ""
@@ -162,7 +162,7 @@ msgstr ""
 "Content-Type: text/plain; charset=UTF-8\n"
 "Plural-Forms: nplurals=2; plural=(n > 1);\n"

-msgid "We're now translating some strings"
+msgid "We are now translating some strings"
 msgstr "Nós estamos traduzindo algumas strings agora"

 msgid "Hello %1$s! Your last visit was on %2$s"
@@ -182,11 +182,11 @@ translation may contain the user name and visit date.
 The last section is a sample of pluralization forms, displaying
 the singular and plural version as `msgid` in English and their corresponding translations as `msgstr` 0 and 1
 (following the number given by the plural rule). There, string replacement is used as well so the number can be seen
-directly in the sentence, by using `%d`. The plural forms always have two `msgid` (singular and plural), so it's
-advised to not use a complex language as the source of translation.
+directly in the sentence, by using `%d`. The plural forms always have two `msgid` (singular and plural), so it is
+advised not to use a complex language as the source of translation.

 ### Discussion on l10n keys
-As you might have noticed, we're using as source ID the actual sentence in English. That `msgid` is the same used
+As you might have noticed, we are using as source ID the actual sentence in English. That `msgid` is the same used
 throughout all your `.po` files, meaning other languages will have the same format and the same `msgid` fields but
 translated `msgstr` lines.

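To make the `msgstr` 0 and 1 selection described in this hunk more concrete, here is a minimal sketch in plain PHP. It does not call Gettext at all: `pluralIndex()` and `formatPlural()` are hypothetical helpers that imitate the sample header rule `nplurals=2; plural=(n > 1);`, and the two Portuguese sentences are invented for illustration rather than taken from the sample file.

```php
<?php
// Sketch only: a hand-rolled stand-in for the form selection Gettext performs.
// The rule mirrors the sample header "nplurals=2; plural=(n > 1);".

/** Returns the msgstr index chosen by the rule plural=(n > 1). */
function pluralIndex(int $n): int
{
    return $n > 1 ? 1 : 0;
}

/**
 * Picks the translated form for $n and substitutes the counter through %d.
 * $forms is a hypothetical array holding msgstr[0] and msgstr[1].
 */
function formatPlural(array $forms, int $n): string
{
    return sprintf($forms[pluralIndex($n)], $n);
}

// Invented translations, used only to exercise the helpers.
$forms = [
    0 => 'Apenas %d mensagem não lida',
    1 => '%d mensagens não lidas',
];

echo formatPlural($forms, 1), PHP_EOL; // Apenas 1 mensagem não lida
echo formatPlural($forms, 5), PHP_EOL; // 5 mensagens não lidas
```

With this particular rule a count of zero also selects form 0, which is why the rule has to come from the target language and not from the English source.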
@@ -198,7 +198,7 @@ Talking about translation keys, there are two main "schools" here:
 meaning. Example: if you happen to translate by heart from English to Spanish but need help to translate to French,
 you might publish the new page with missing French sentences, and parts of the website would be displayed in English
 instead;
-- it's much easier for the translator to understand what's going on and make a proper translation based on the
+- it is much easier for the translator to understand what's going on and do a proper translation based on the
 `msgid`;
 - it gives you "free" l10n for one language - the source one;
 - The only disadvantage: if you need to change the actual text, you would need to replace the same `msgid`
@@ -207,21 +207,21 @@ Talking about translation keys, there are two main "schools" here:
 2. _`msgid` as a unique, structured key_.
 It would describe the sentence role in the application in a structured way, including the template or part where the
 string is located instead of its content.
-- it's a great way to have the code organized, separating the text content from the template logic.
+- it is a great way to have the code organized, separating the text content from the template logic.
 - however, that could bring problems to the translator that would miss the context. A source language file would be
 needed as a basis for other translations. Example: the developer would ideally have an `en.po` file, that
 translators would read to understand what to write in `fr.po` for instance.
 - missing translations would display meaningless keys on screen (`top_menu.welcome` instead of `Hello there, User!`
-on the said untranslated French page). That's good it as would force translation to be complete before publishing -
-but bad as translation issues would be really awful in the interface. Some libraries, though, include an option to
-specify a given language as "fallback", having a similar behavior as the other approach.
+on the said untranslated French page). That is good it as would force translation to be complete before publishing -
+however, bad as translation issues would be remarkably awful in the interface. Some libraries, though, include an
+option to specify a given language as "fallback", having a similar behavior as the other approach.

-The [Gettext manual][manual] favors the first approach as, in general, it's easier for translators and users in
-case of trouble. That's how we will be working here as well. However, the [Symfony documentation][symfony-keys] favors
+The [Gettext manual][manual] favors the first approach as, in general, it is easier for translators and users in
+case of trouble. That is how we will be working here as well. However, the [Symfony documentation][symfony-keys] favors
 keyword-based translation, to allow for independent changes of all translations without affecting templates as well.

 ### Everyday usage
-In a common application, you would use some Gettext functions while writing static text in your pages. Those sentences
+In a typical application, you would use some Gettext functions while writing static text in your pages. Those sentences
 would then appear in `.po` files, get translated, compiled into `.mo` files and then, used by Gettext when rendering
 the actual interface. Given that, let's tie together what we have discussed so far in a step-by-step example:

@@ -310,35 +310,41 @@ textdomain('main');
 {% endhighlight %}

 #### 3. Preparing translation for the first run
-To make matters easier - and one of the powerful advantages Gettext has over custom framework i18n packages - is its
-custom file type. "Oh man, that's quite hard to understand and edit by hand, a simple array would be easier!" Make no
-mistake, applications like [Poedit] are here to help - _a lot_. You can get the program from
-[their website][poedit_download], it's free and available for all platforms. It's a pretty easy tool to get used to,
-and a very powerful one at the same time - using all powerful features Gettext has available.
+One of the great advantages Gettext has over custom framework i18n packages is its extensive and powerful file format.
+"Oh man, that’s quite hard to understand and edit by hand, a simple array would be easier!" Make no mistake,
+applications like [Poedit] are here to help - _a lot_. You can get the program from [their website][poedit_download],
+it’s free and available for all platforms. It’s a pretty easy tool to get used to, and a very powerful one at the same
+time - using all features Gettext has available. This guide is based on PoEdit 1.8.

-In the first run, you should select "File > New Catalog" from the menu. There you'll have a small screen where we will
-set the terrain so everything else runs smoothly. You'll be able to find those settings later through
-"Catalog > Properties":
+In the first run, you should select “File > New...” from the menu. You’ll be asked straight ahead for the language:
+here you can select/filter the language you want to translate to, or use that format we mentioned before, such as
+`en_US` or `pt_BR`.

-- Project name and version, Translation Team and email address: useful information that goes in the `.po` file header;
-- Language: here you should use that format we mentioned before, such as `en_US` or `pt_BR`;
-- Charsets: UTF-8, preferably;
-- Source charset: set here the charset used by your PHP files - probably UTF-8 as well, right?
-- plural forms: here go those rules we mentioned before - there's a link in there with samples as well;
-- Source paths: here you must include all folders from the project where `gettext()` (and siblings) will happen - this
-is usually your templates folder(s)
-- Source keywords: this last part is filled by default, but you might need to alter it later - and is one of the
-powerful points of Gettext. The underlying software knows how the `gettext()` calls look like in several programming
-languages, but you might as well create your own translation forms. This will be discussed later in the "Tips" section.
+Now, save the file - using that directory structure we mentioned as well. Then you should click “Extract from sources”,
+and here you’ll configure various settings for the extraction and translation tasks. You’ll be able to find all those
+later through “Catalog > Properties”:

-After setting those points you'll be prompted to save the file - using that directory structure we mentioned as well,
-and then it will run a scan through your source files to find the localization calls. They'll be fed empty into the
-translation table, and you'll start typing in the localized versions of those strings. Save it and a `.mo` file will be
-(re)compiled into the same folder and ta-dah: your project is internationalized.
+- Source paths: here you must include all folders from the project where `gettext()` (and siblings) are called - this
+is usually your templates/views folder(s). This is the only mandatory setting;
+- Translation properties:
+- Project name and version, Team and Team’s email address: useful information that goes in the .po file header;
+- Plural forms: here go those rules we mentioned before - there’s a link in there with samples as well. You can
+leave it with the default option most of the time, as PoEdit already includes a handy database of plural rules for
+many languages.
+- Charsets: UTF-8, preferably;
+- Source code charset: set here the charset used by your codebase - probably UTF-8 as well, right?
+- Source keywords: The underlying software knows how `gettext()` and similar function calls look like in several
+programming languages, but you might as well create your own translation functions. It will be here you’ll add those
+other methods. This will be discussed later in the “Tips” section.
+
+After setting those points it will run a scan through your source files to find all the localization calls. After every
+scan PoEdit will display a summary of what was found and what was removed from the source files. New entries will fed
+empty into the translation table, and you’ll start typing in the localized versions of those strings. Save it and a .mo
+file will be (re)compiled into the same folder and ta-dah: your project is internationalized.

 #### 4. Translating strings
-As you may have noticed before, there are two main types of localized strings: simple ones and the ones with plural
-forms. The first ones have simply two boxes: source and localized string. The source string can't be modified as
+As you may have noticed before, there are two main types of localized strings: simple ones and those with plural
+forms. The first ones have simply two boxes: source and localized string. The source string cannot be modified as
 Gettext/Poedit do not include the powers to alter your source files - you should change the source itself and rescan
 the files. Tip: you may right-click a translation line and it will hint you with the source files and lines where that
 string is being used.
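As a side note on where step 3 expects the saved catalog to live, the sketch below derives the file path from the directory structure described earlier in the chapter (locale folder, fixed `LC_MESSAGES` folder, file named after the domain). `catalogPath()` is a hypothetical helper and the `locales` root is an assumption made for this example; neither is provided by Gettext or Poedit. The pattern only checks the locale shape the chapter describes and does not verify that the code is a registered ISO value.

```php
<?php
// Sketch only: catalogPath() is a hypothetical helper, not part of Gettext or Poedit.

/**
 * Builds the expected location of a catalog file:
 *   <root>/<locale>/LC_MESSAGES/<domain>.<extension>
 */
function catalogPath(string $root, string $locale, string $domain, string $extension = 'po'): string
{
    // Two or three lower-case letters, optionally followed by "_" and two upper-case letters.
    if (!preg_match('/\A[a-z]{2,3}(_[A-Z]{2})?\z/', $locale)) {
        throw new InvalidArgumentException("Unexpected locale code: $locale");
    }

    return rtrim($root, '/') . "/$locale/LC_MESSAGES/$domain.$extension";
}

echo catalogPath('locales', 'pt_BR', 'main'), PHP_EOL;       // locales/pt_BR/LC_MESSAGES/main.po
echo catalogPath('locales', 'pt_BR', 'main', 'mo'), PHP_EOL; // locales/pt_BR/LC_MESSAGES/main.mo
```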
@@ -348,30 +354,31 @@ the different final forms.
 Whenever you change your sources and need to update the translations, just hit Refresh and Poedit will rescan the code,
 removing non-existent entries, merging the ones that changed and adding new ones. It may also try to guess some
 translations, based on other ones you did. Those guesses and the changed entries will receive a "Fuzzy" marker,
-indicating it needs review, being highlighted in the list. It's also useful if you have a translation team and someone
-tries to write something they're not sure about: just mark Fuzzy and someone else will review later.
+indicating it needs review, appearing golden in the list. It is also useful if you have a translation team and someone
+tries to write something they are not sure about: just mark Fuzzy, and someone else will review later.

-Finally, it's advised to leave "View > Untranslated entries first" marked, as it will help you _a lot_ to not forget
+Finally, it is advised to leave "View > Untranslated entries first" marked, as it will help you _a lot_ to not forget
 any entry. From that menu, you can also open parts of the UI that allow you to leave contextual information for
 translators if needed.

 ### Tips & Tricks

 #### Possible caching issues
-If you're running PHP as a module on Apache (`mod_php`), you might face issues with the `.mo` file being cached. It
-happens the first time it's read, and then, to update it, you might need to restart the server. On Nginx and PHP5 it
+If you are running PHP as a module on Apache (`mod_php`), you might face issues with the `.mo` file being cached. It
+happens the first time it is read, and then, to update it, you might need to restart the server. On Nginx and PHP5 it
 usually takes only a couple of page refreshes to refresh the translation cache, and on PHP7 it is rarely needed.

 #### Additional helper functions
-As preferred by many people, it's easier to use `_()` instead of `gettext()`. Many custom i18n libraries from
-frameworks use something similar to `t()` as well, to make translated code shorter. However, that's the only function
+As preferred by many people, it is easier to use `_()` instead of `gettext()`. Many custom i18n libraries from
+frameworks use something similar to `t()` as well, to make translated code shorter. However, that is the only function
 that sports a shortcut. You might want to add in your project some others, such as `__()` or `_n()` for `ngettext()`,
 or maybe a fancy `_r()` that would join `gettext()` and `sprintf()` calls. Other libraries, such as
 [oscarotero's Gettext][oscarotero] also provide helper functions like these.

 In those cases, you'll need to instruct the Gettext utility on how to extract the strings from those new functions.
-Don't be afraid, it's very easy. It's just a field in the `.po` file, or a Settings screen on Poedit. In the editor,
-that option is inside "Catalog > Properties > Source keywords". You need to include there the specifications of those
+Don't be afraid; it is very easy. It is just a field in the `.po` file, or a Settings screen on Poedit. In the editor,
+that option is inside "Catalog > Properties > Source keywords". Remember: Gettext already knows the default functions
+for many languages, so don’t be afraid if that list seems empty. You need to include there the specifications of those
 new functions, following [a specific format][func_format]:

 - if you create something like `t()` that simply returns the translation for a string, you can specify it as `t`.
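For readers wondering what the `_n()` and `_r()` helpers mentioned in this hunk could look like, here is one possible sketch. The names come from the text, but the signatures and the fallback branches are assumptions made for this example: when the gettext extension is not loaded the helpers simply return the untranslated source string, so the snippet runs either way.

```php
<?php
// Sketch only: helper names follow the text; signatures and fallbacks are assumptions.

/** Shortcut for ngettext(): picks the singular or plural form for $n. */
function _n(string $singular, string $plural, int $n): string
{
    // Without the gettext extension, fall back to the English-style rule.
    return function_exists('ngettext')
        ? ngettext($singular, $plural, $n)
        : ($n === 1 ? $singular : $plural);
}

/** Joins gettext() and sprintf(): translate first, then fill the placeholders. */
function _r(string $message, ...$args): string
{
    $translated = function_exists('gettext') ? gettext($message) : $message;

    return sprintf($translated, ...$args);
}

// The first msgid comes from the sample .po excerpt; the argument values are made up.
echo _r('Hello %1$s! Your last visit was on %2$s', 'Ana', '2025-08-09'), PHP_EOL;
echo sprintf(_n('%d item', '%d items', 3), 3), PHP_EOL;
```

Any helper added this way still has to be listed under "Source keywords" as described above, otherwise the extractor will not pick up the strings passed to it.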