Merge pull request #712 from igorsantos07/i18n

Proof-writing the i18n chapter
2025-08-09 15:36:36 +02:00 · 2018-06-21 07:35:47 -04:00
parent 9f88b73edf 43f8b81d35
commit a51d99baa3
1 changed files with 78 additions and 71 deletions
--- a/_posts/05-06-01-Internationalization-and-Localization.md
+++ b/_posts/05-06-01-Internationalization-and-Localization.md
@@ -12,28 +12,27 @@ words - in our case, internationalization becomes i18n and localization, l10n._
 First of all, we need to define those two similar concepts and other related things:

 - **Internationalization** is when you organize your code so it can be adapted to different languages or regions
-without refactorings. This is usually done once - preferably, in the beginning of the project, or else you'll probably
-need some huge changes in the source!
+without refactorings. This action is usually done once - preferably, at the beginning of the project, or else you will
+probably need some huge changes in the source!
 - **Localization** happens when you adapt the interface (mainly) by translating contents, based on the i18n work done
 before. It usually is done every time a new language or region needs support and is updated when new interface pieces
 are added, as they need to be available in all supported languages.
-- **Pluralization** defines the rules needed between different languages to interoperate strings containing numbers and
-counters. For instance, in English when you have only one item, it's singular, and anything different from that is
+- **Pluralization** defines the rules required between distinct languages to interoperate strings containing numbers and 
+counters. For instance, in English when you have only one item, it is singular, and anything different from that is 
 called plural; plural in this language is indicated by adding an S after some words, and sometimes changes parts of it.
 In other languages, such as Russian or Serbian, there are two plural forms in addition to the singular - you may even
 find languages with a total of four, five or six forms, such as Slovenian, Irish or Arabic.

 ## Common ways to implement
 The easiest way to internationalize PHP software is by using array files and using those strings in templates, such as
-`<h1><?=$TRANS['title_about_page']?></h1>`. This is, however, hardly a recommended way for serious projects, as it poses
+`<h1><?=$TRANS['title_about_page']?></h1>`. This way is, however, hardly recommended for serious projects, as it poses
 some maintenance issues along the road - some might appear in the very beginning, such as pluralization. So, please,
 don't try this if your project will contain more than a couple of pages.
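A minimal sketch of the array-file approach described above, assuming a hypothetical `$TRANS` array that would normally be returned by a per-language file (the `items_one`/`items_many` keys and the `itemsLabel()` helper are illustrative, not from the original). It also hints at why pluralization becomes a maintenance issue: the plural choice ends up hard-coded next to the template.

```php
<?php
// Hypothetical translation array; in a real project this would typically be
// returned by a per-language file and loaded with `require`.
$TRANS = [
    'title_about_page' => 'Sobre nós',
    'items_one'        => '%d item',
    'items_many'       => '%d itens',
];

// Simple lookup, as in the template snippet above.
echo '<h1>' . htmlspecialchars($TRANS['title_about_page']) . '</h1>' . PHP_EOL;

// Pluralization has to be decided by hand, and this two-form test is only
// valid for some languages.
function itemsLabel(array $trans, int $count): string
{
    $key = ($count === 1) ? 'items_one' : 'items_many';
    return sprintf($trans[$key], $count);
}

echo itemsLabel($TRANS, 1) . PHP_EOL;
echo itemsLabel($TRANS, 3) . PHP_EOL;
```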

 The most classic way and often taken as reference for i18n and l10n is a [Unix tool called `gettext`][gettext]. It dates
-back to 1995 and is still a complete implementation for translating software. It is pretty easy to get running, while
-it still sports powerful supporting tools. We will talk about Gettext in more detail below. If you would prefer to not
-have to get your hands dirty with the command line, we will also be presenting a great GUI application that can be used
-to easily update your l10n source files.
+back to 1995 and is still a complete implementation for translating software. It is easy enough to get running, while
+still sporting powerful supporting tools. Gettext is what we will be covering here. Also, to save you from wrestling
+with the command line, we will present a great GUI application that can be used to easily update your l10n source
+files.

 ### Other tools

@@ -46,8 +45,8 @@ translation. It uses array formats for message. Does not provide a message extra
 message formatting via the `intl` extension (including pluralized messages).
 - [oscarotero/Gettext][oscarotero]: Gettext support with an OO interface; includes improved helper functions, powerful
 extractors for several file formats (some of them not supported natively by the `gettext` command), and can also export
-to other formats besides `.mo/.po` files. Can be useful if you need to integrate your translation files into other parts
-of the system, like a JavaScript interface.
+to other formats besides `.mo/.po` files. Can be useful if you need to integrate your translation files into other
+parts of the system, like a JavaScript interface.
 - [symfony/translation][symfony]: supports a lot of different formats, but recommends using verbose XLIFF's. Doesn't
 include helper functions nor a built-in extractor, but supports placeholders using `strtr()` internally.
 - [zend/i18n][zend]: supports array and INI files, or Gettext formats. Implements a caching layer to save you from
@@ -55,6 +54,7 @@ reading the filesystem every time. It also includes view helpers, and locale-awa
 However, it has no message extractor.

 Other frameworks also include i18n modules, but those are not available outside of their codebases:
+
 - [Laravel] supports basic array files, has no automatic extractor but includes a `@lang` helper for template files.
 - [Yii] supports array, Gettext, and database-based translation, and includes a messages extractor. It is backed by the
 [`Intl`][intl] extension, available since PHP 5.3, and based on the [ICU project]; this enables Yii to run powerful
@@ -71,7 +71,7 @@ After installed, enable it by adding `extension=gettext.so` (Linux/Unix) or `ext
 your `php.ini`.
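A quick way to confirm the extension is really active is to ask PHP itself. This is only a sketch using standard PHP functions; the messages printed are illustrative.

```php
<?php
// Report which php.ini is in use, so you know which file to edit.
$ini = php_ini_loaded_file();
echo 'Loaded php.ini: ' . ($ini !== false ? $ini : '(none)') . PHP_EOL;

// Check that the gettext extension is enabled for this PHP binary.
if (extension_loaded('gettext')) {
    echo 'gettext is enabled' . PHP_EOL;
} else {
    echo 'gettext is missing - check the extension line in php.ini' . PHP_EOL;
}
```

Keep in mind that the command-line PHP and the web server PHP may load different `php.ini` files, so it can be worth running the check in both.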

 Here we will also be using [Poedit] to create translation files. You will probably find it in your system's package
-manager; it's available for Unix, Mac, and Windows, and can be [downloaded for free on their website][poedit_download]
+manager; it is available for Unix, Mac, and Windows, and can be [downloaded for free on their website][poedit_download]
 as well.

 ### Structure
@@ -79,31 +79,31 @@ as well.
 #### Types of files
 There are three files you usually deal with while working with gettext. The main ones are PO (Portable Object) and
 MO (Machine Object) files, the first being a list of readable "translated objects" and the second, the corresponding
-binary to be interpreted by gettext when doing localization. There's also a POT (Template) file, that simply contains
+binary to be interpreted by gettext when doing localization. There's also a POT (Template) file, which simply contains
 all existing keys from your source files, and can be used as a guide to generate and update all PO files. Those template
-files are not mandatory: depending on the tool you're using to do l10n, you can go just fine with only PO/MO files.
-You'll always have one pair of PO/MO files per language and region, but only one POT per domain.
+files are not mandatory: depending on the tool you are using to do l10n, you can go just fine with only PO/MO files.
+You will always have one pair of PO/MO files per language and region, but only one POT per domain.

 ### Domains
-There are some cases, in big projects, where you might need to separate translations when the same words convey
-different meaning given a context. In those cases, you split them into different _domains_. They're basically named
+There are some cases, in big projects, where you might need to separate translations when the same words convey 
+different meaning given a context. In those cases, you split them into different _domains_. They are, basically, named
 groups of POT/PO/MO files, where the filename is the said _translation domain_. Small and medium-sized projects usually,
 for simplicity, use only one domain; its name is arbitrary, but we will be using "main" for our code samples.
 In [Symfony] projects, for example, domains are used to separate the translation for validation messages.

 #### Locale code
-A locale is simply a code that identifies one version of a language. It's defined following the [ISO 639-1][639-1] and
+A locale is simply a code that identifies one version of a language. It is defined following the [ISO 639-1][639-1] and 
 [ISO 3166-1 alpha-2][3166-1] specs: two lower-case letters for the language, optionally followed by an underscore and two
 upper-case letters identifying the country or regional code. For [rare languages][rare], three letters are used.

 For some speakers, the country part may seem redundant. In fact, some languages have dialects in different
 countries, such as Austrian German (`de_AT`) or Brazilian Portuguese (`pt_BR`). The second part is used to distinguish
-between those dialects - when it's not present, it's taken as a "generic" or "hybrid" version of the language.
+between those dialects - when it is not present, it is taken as a "generic" or "hybrid" version of the language.
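The shape of a locale code described above can be checked with a small helper. This is a sketch only: `parseLocale()` is a hypothetical function, and it covers just the `language[_REGION]` form discussed here, not suffixes such as charsets that some systems append.

```php
<?php
/**
 * Splits a locale code such as "pt_BR" into language and region parts.
 * Returns null when the code does not match the expected shape.
 */
function parseLocale(string $locale): ?array
{
    // 2-3 lower-case letters, optionally "_" plus 2 upper-case letters.
    if (!preg_match('/^([a-z]{2,3})(?:_([A-Z]{2}))?\z/', $locale, $m)) {
        return null;
    }
    return [
        'language' => $m[1],
        'region'   => $m[2] ?? null, // null means the "generic" variant
    ];
}

var_dump(parseLocale('pt_BR')); // language "pt", region "BR"
var_dump(parseLocale('de'));    // language "de", no region
var_dump(parseLocale('PT-br')); // rejected: wrong case and separator
```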

 ### Directory structure
-To use Gettext, we will need to adhere to a specific structure of folders. First, you'll need to select an arbitrary
-root for your l10n files in your source repository. Inside it, you'll have a folder for each needed locale, and a fixed
-`LC_MESSAGES` folder that will contain all your PO/MO pairs. Example:
+To use Gettext, we will need to adhere to a specific structure of folders. First, you will need to select an arbitrary
+root for your l10n files in your source repository. Inside it, you will have a folder for each needed locale, and a
+fixed `LC_MESSAGES` folder that will contain all your PO/MO pairs. Example:

 {% highlight console %}
 <project root>
@@ -131,9 +131,9 @@ root for your l10n files in your source repository. Inside it, you'll have a fol

 ### Plural forms
 As we said in the introduction, different languages might sport different plural rules. However, gettext saves us from
-this trouble once again. When creating a new `.po` file, you'll have to declare the [plural rules][plural] for that
+this trouble once again. When creating a new `.po` file, you will have to declare the [plural rules][plural] for that
 language, and translated pieces that are plural-sensitive will have a different form for each of those rules. When
-calling Gettext in code, you'll have to specify the number related to the sentence, and it will work out the correct
+calling Gettext in code, you will have to specify the number related to the sentence, and it will work out the correct
 form to use - even using string substitution if needed.

 Plural rules include the number of plurals available and a boolean test with `n` that would define in which rule the
@@ -147,13 +147,13 @@ Now that you understood the basis of how plural rules works - and if you didn't,
 on the [LingoHub tutorial][lingohub_plurals] -, you might want to copy the ones you need from a [list][plural] instead
 of writing them by hand.

-When calling out Gettext to do localization on sentences with counters, you'll have to give him the
+When calling Gettext to do localization on sentences with counters, you will have to give it the
 related number as well. Gettext will work out what rule should be in effect and use the correct localized version.
 You will need to include in the `.po` file a different sentence for each plural rule defined.
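To make the "boolean test with `n`" idea concrete, here is a sketch of what evaluating a plural rule amounts to. The function names are hypothetical. The first mirrors a `nplurals=2; plural=(n > 1)` rule, and the second follows the three-form rule commonly published for Russian. Gettext does this internally based on the `Plural-Forms` header, so you would not normally write it yourself.

```php
<?php
// Rule "nplurals=2; plural=(n > 1)": form 0 for 0 and 1, form 1 otherwise.
function pluralIndexTwoForms(int $n): int
{
    return (int) ($n > 1);
}

// Three-form rule as commonly published for Russian:
// plural=n%10==1 && n%100!=11 ? 0
//      : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2
function pluralIndexRussian(int $n): int
{
    if ($n % 10 === 1 && $n % 100 !== 11) {
        return 0;
    }
    if ($n % 10 >= 2 && $n % 10 <= 4 && ($n % 100 < 10 || $n % 100 >= 20)) {
        return 1;
    }
    return 2;
}

foreach ([1, 2, 5, 11, 21, 22] as $n) {
    printf("%d -> two-form: %d, Russian: %d\n", $n, pluralIndexTwoForms($n), pluralIndexRussian($n));
}
```

The index returned is the number of the `msgstr[N]` entry that would be picked for that count.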

 ### Sample implementation
 After all that theory, let's get a little practical. Here's an excerpt of a `.po` file - don't worry about its format,
-but instead the overall content, you'll learn how to edit it easily later:
+just focus on the overall content; you will learn how to edit it easily later:

 {% highlight po %}
 msgid ""
@@ -162,7 +162,7 @@ msgstr ""
 "Content-Type: text/plain; charset=UTF-8\n"
 "Plural-Forms: nplurals=2; plural=(n > 1);\n"

-msgid "We're now translating some strings"
+msgid "We are now translating some strings"
 msgstr "Nós estamos traduzindo algumas strings agora"

 msgid "Hello %1$s! Your last visit was on %2$s"
@@ -182,11 +182,11 @@ translation may contain the user name and visit date.
 The last section is a sample of pluralization forms, displaying
 the singular and plural version as `msgid` in English and their corresponding translations as `msgstr` 0 and 1
 (following the number given by the plural rule). There, string replacement is used as well so the number can be seen
-directly in the sentence, by using `%d`. The plural forms always have two `msgid` (singular and plural), so it's
-advised to not use a complex language as the source of translation.
+directly in the sentence, by using `%d`. The plural forms always have two `msgid` (singular and plural), so it is
+advised not to use a complex language as the source of translation.
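In PHP, a plural-aware call could look like the sketch below. It assumes the gettext extension is loaded; `unreadLabel()` and the message strings are illustrative. With no catalog bound, `ngettext()` falls back to the English-style behavior of returning the singular for 1 and the plural otherwise.

```php
<?php
// Assumes the gettext extension is enabled.
function unreadLabel(int $count): string
{
    // ngettext() picks the form for $count; sprintf() then fills in %d.
    return sprintf(ngettext('%d unread message', '%d unread messages', $count), $count);
}

echo unreadLabel(1) . PHP_EOL;
echo unreadLabel(4) . PHP_EOL;
```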

 ### Discussion on l10n keys
-As you might have noticed, we're using as source ID the actual sentence in English. That `msgid` is the same used
+As you might have noticed, we are using as source ID the actual sentence in English. That `msgid` is the same used
 throughout all your `.po` files, meaning other languages will have the same format and the same `msgid` fields but
 translated `msgstr` lines.

@@ -198,7 +198,7 @@ Talking about translation keys, there are two main "schools" here:
    meaning. Example: if you happen to translate by heart from English to Spanish but need help to translate to French,
    you might publish the new page with missing French sentences, and parts of the website would be displayed in English
    instead;
-    - it's much easier for the translator to understand what's going on and make a proper translation based on the
+    - it is much easier for the translator to understand what's going on and make a proper translation based on the
    `msgid`;
    - it gives you "free" l10n for one language - the source one;
    - The only disadvantage: if you need to change the actual text, you would need to replace the same `msgid`
@@ -207,21 +207,21 @@ Talking about translation keys, there are two main "schools" here:
 2. _`msgid` as a unique, structured key_.
 It would describe the sentence role in the application in a structured way, including the template or part where the
 string is located instead of its content.
-    - it's a great way to have the code organized, separating the text content from the template logic.
+    - it is a great way to have the code organized, separating the text content from the template logic.
     - however, that could bring problems to the translator, who would miss the context. A source language file would be
    needed as a basis for other translations. Example: the developer would ideally have an `en.po` file, that
    translators would read to understand what to write in `fr.po` for instance.
    - missing translations would display meaningless keys on screen (`top_menu.welcome` instead of `Hello there, User!`
-    on the said untranslated French page). That's good it as would force translation to be complete before publishing -
-    but bad as translation issues would be really awful in the interface. Some libraries, though, include an option to
-    specify a given language as "fallback", having a similar behavior as the other approach.
+    on the said untranslated French page). That is good, as it would force translation to be complete before publishing -
+    but bad, as translation issues would look remarkably awful in the interface. Some libraries, though, include an
+    option to specify a given language as "fallback", behaving similarly to the other approach.

-The [Gettext manual][manual] favors the first approach as, in general, it's easier for translators and users in
-case of trouble. That's how we will be working here as well. However, the [Symfony documentation][symfony-keys] favors
+The [Gettext manual][manual] favors the first approach as, in general, it is easier for translators and users in
+case of trouble. That is how we will be working here as well. However, the [Symfony documentation][symfony-keys] favors
 keyword-based translation, to allow for independent changes of all translations without affecting templates as well.
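The behavior of structured keys, and of the "fallback" option some libraries offer, can be illustrated with plain arrays. This is a hypothetical sketch, not Gettext's or any library's API: `translate()` and the catalogs are made up for the example.

```php
<?php
// Hypothetical catalogs keyed by structured IDs (the second school).
$catalogs = [
    'en' => ['top_menu.welcome' => 'Hello there, User!'],
    'fr' => [], // French translation not written yet
];

// Looks up $key in $locale, then in the fallback locale, and finally
// returns the raw key, which is what would leak into the interface.
function translate(array $catalogs, string $key, string $locale, ?string $fallback = null): string
{
    if (isset($catalogs[$locale][$key])) {
        return $catalogs[$locale][$key];
    }
    if ($fallback !== null && isset($catalogs[$fallback][$key])) {
        return $catalogs[$fallback][$key];
    }
    return $key;
}

echo translate($catalogs, 'top_menu.welcome', 'fr') . PHP_EOL;       // raw key
echo translate($catalogs, 'top_menu.welcome', 'fr', 'en') . PHP_EOL; // English text
```

With sentence-based `msgid`s the last step already yields readable source text, which is why that approach gets a fallback "for free".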

 ### Everyday usage
-In a common application, you would use some Gettext functions while writing static text in your pages. Those sentences
+In a typical application, you would use some Gettext functions while writing static text in your pages. Those sentences
 would then appear in `.po` files, get translated, compiled into `.mo` files and then used by Gettext when rendering
 the actual interface. Given that, let's tie together what we have discussed so far in a step-by-step example:

@@ -310,35 +310,41 @@ textdomain('main');
 {% endhighlight %}

 #### 3. Preparing translation for the first run
-To make matters easier - and one of the powerful advantages Gettext has over custom framework i18n packages - is its
-custom file type. "Oh man, that's quite hard to understand and edit by hand, a simple array would be easier!" Make no
-mistake, applications like [Poedit] are here to help - _a lot_. You can get the program from
-[their website][poedit_download], it's free and available for all platforms. It's a pretty easy tool to get used to,
-and a very powerful one at the same time - using all powerful features Gettext has available.
+One of the great advantages Gettext has over custom framework i18n packages is its extensive and powerful file format.
+"Oh man, that’s quite hard to understand and edit by hand, a simple array would be easier!" Make no mistake,
+applications like [Poedit] are here to help - _a lot_. You can get the program from [their website][poedit_download];
+it’s free and available for all platforms. It’s a pretty easy tool to get used to, and a very powerful one at the same
+time - using all features Gettext has available. This guide is based on Poedit 1.8.

-In the first run, you should select "File > New Catalog" from the menu. There you'll have a small screen where we will
-set the terrain so everything else runs smoothly. You'll be able to find those settings later through
-"Catalog > Properties":
+In the first run, you should select “File > New...” from the menu. You’ll be asked right away for the language:
+here you can select/filter the language you want to translate to, or use that format we mentioned before, such as
+`en_US` or `pt_BR`.

-- Project name and version, Translation Team and email address: useful information that goes in the `.po` file header;
-- Language: here you should use that format we mentioned before, such as `en_US` or `pt_BR`;
-- Charsets: UTF-8, preferably;
-- Source charset: set here the charset used by your PHP files - probably UTF-8 as well, right?
-- plural forms: here go those rules we mentioned before - there's a link in there with samples as well;
-- Source paths: here you must include all folders from the project where `gettext()` (and siblings) will happen - this
-is usually your templates folder(s)
-- Source keywords: this last part is filled by default, but you might need to alter it later - and is one of the
-powerful points of Gettext. The underlying software knows how the `gettext()` calls look like in several programming
-languages, but you might as well create your own translation forms. This will be discussed later in the "Tips" section.
+Now, save the file - using that directory structure we mentioned as well. Then you should click “Extract from sources”,
+and here you’ll configure various settings for the extraction and translation tasks. You’ll be able to find all those
+later through “Catalog > Properties”:

-After setting those points you'll be prompted to save the file - using that directory structure we mentioned as well,
-and then it will run a scan through your source files to find the localization calls. They'll be fed empty into the
-translation table, and you'll start typing in the localized versions of those strings. Save it and a `.mo` file will be
-(re)compiled into the same folder and ta-dah: your project is internationalized.
+- Source paths: here you must include all folders from the project where `gettext()` (and siblings) are called - this
+is usually your templates/views folder(s). This is the only mandatory setting;
+- Translation properties:
+    - Project name and version, Team and Team’s email address: useful information that goes in the `.po` file header;
+    - Plural forms: here go those rules we mentioned before - there’s a link in there with samples as well. You can
+    leave it with the default option most of the time, as Poedit already includes a handy database of plural rules for
+    many languages.
+    - Charsets: UTF-8, preferably;
+    - Source code charset: set here the charset used by your codebase - probably UTF-8 as well, right?
+- Source keywords: The underlying software knows what `gettext()` and similar function calls look like in several
+programming languages, but you might as well create your own translation functions. This is where you’ll add those
+other functions. This will be discussed later in the “Tips” section.
+
+After setting those points it will run a scan through your source files to find all the localization calls. After every
+scan Poedit will display a summary of what was found and what was removed from the source files. New entries will be fed
+empty into the translation table, and you’ll start typing in the localized versions of those strings. Save it and a `.mo`
+file will be (re)compiled into the same folder and ta-dah: your project is internationalized.

 #### 4. Translating strings
-As you may have noticed before, there are two main types of localized strings: simple ones and the ones with plural
-forms. The first ones have simply two boxes: source and localized string. The source string can't be modified as
+As you may have noticed before, there are two main types of localized strings: simple ones and those with plural
+forms. The first ones have simply two boxes: source and localized string. The source string cannot be modified as
 Gettext/Poedit do not include the powers to alter your source files - you should change the source itself and rescan
 the files. Tip: you may right-click a translation line and it will show you the source files and lines where that
 string is being used.
@@ -348,30 +354,31 @@ the different final forms.
 Whenever you change your sources and need to update the translations, just hit Refresh and Poedit will rescan the code,
 removing non-existent entries, merging the ones that changed and adding new ones. It may also try to guess some
 translations, based on other ones you did. Those guesses and the changed entries will receive a "Fuzzy" marker,
-indicating it needs review, being highlighted in the list. It's also useful if you have a translation team and someone
-tries to write something they're not sure about: just mark Fuzzy and someone else will review later.
+indicating they need review, and appear golden in the list. The marker is also useful if you have a translation team and someone
+tries to write something they are not sure about: just mark Fuzzy, and someone else will review later.

-Finally, it's advised to leave "View > Untranslated entries first" marked, as it will help you _a lot_ to not forget
+Finally, it is advised to leave "View > Untranslated entries first" marked, as it will help you _a lot_ to not forget
 any entry. From that menu, you can also open parts of the UI that allow you to leave contextual information for
 translators if needed.

 ### Tips & Tricks

 #### Possible caching issues
-If you're running PHP as a module on Apache (`mod_php`), you might face issues with the `.mo` file being cached. It
-happens the first time it's read, and then, to update it, you might need to restart the server. On Nginx and PHP5 it
+If you are running PHP as a module on Apache (`mod_php`), you might face issues with the `.mo` file being cached. It
+happens the first time it is read, and then, to update it, you might need to restart the server. On Nginx and PHP5 it
 usually takes only a couple of page refreshes to refresh the translation cache, and on PHP7 it is rarely needed.

 #### Additional helper functions
-As preferred by many people, it's easier to use `_()` instead of `gettext()`. Many custom i18n libraries from
-frameworks use something similar to `t()` as well, to make translated code shorter. However, that's the only function
+As preferred by many people, it is easier to use `_()` instead of `gettext()`. Many custom i18n libraries from
+frameworks use something similar to `t()` as well, to make translated code shorter. However, that is the only function
 that sports a shortcut. You might want to add in your project some others, such as `__()` or `_n()` for `ngettext()`,
 or maybe a fancy `_r()` that would join `gettext()` and `sprintf()` calls. Other libraries, such as
 [oscarotero's Gettext][oscarotero] also provide helper functions like these.
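As an illustration of the shortcuts mentioned above, here is one possible shape for `_n()` and `_r()`. These are hypothetical helpers, not part of PHP or Gettext, and the sketch assumes the gettext extension is enabled; the names and values passed in are made up.

```php
<?php
// Hypothetical shortcuts; they assume the gettext extension is enabled.

// _n(): shorter alias for ngettext().
function _n(string $singular, string $plural, int $count): string
{
    return ngettext($singular, $plural, $count);
}

// _r(): translate, then run sprintf-style replacement on the result.
function _r(string $msgid, ...$args): string
{
    return vsprintf(_($msgid), $args);
}

echo _r('Hello %1$s! Your last visit was on %2$s', 'Ana', '2016-10-11') . PHP_EOL;
echo sprintf(_n('%d unread message', '%d unread messages', 3), 3) . PHP_EOL;
```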

 In those cases, you'll need to instruct the Gettext utility on how to extract the strings from those new functions.
-Don't be afraid, it's very easy. It's just a field in the `.po` file, or a Settings screen on Poedit. In the editor,
-that option is inside "Catalog > Properties > Source keywords". You need to include there the specifications of those
+Don't be afraid; it is very easy. It is just a field in the `.po` file, or a Settings screen on Poedit. In the editor,
+that option is inside "Catalog > Properties > Source keywords". Remember: Gettext already knows the default functions
+for many languages, so don’t be afraid if that list seems empty. You need to include there the specifications of those
 new functions, following [a specific format][func_format]:

 - if you create something like `t()` that simply returns the translation for a string, you can specify it as `t`.