Fix line wrapping consistency - round 1

Comply with the 'wrap at 120 chars' style rule stated in the contributing.md document
This commit is contained in:
jrfnl
2014-12-07 23:07:49 +01:00
parent ac95e06b40
commit 82ee7ad76a
26 changed files with 348 additions and 268 deletions

View File

@@ -11,60 +11,63 @@ _This section was originally written by [Alex Cabal](https://alexcabal.com/) ove
### There's no one-liner. Be careful, detailed, and consistent.
Right now PHP does not support Unicode at a low level. There are ways to ensure that UTF-8 strings are processed OK,
but it's not easy, and it requires digging in to almost all levels of the web app, from HTML to SQL to PHP. We'll aim
Right now PHP does not support Unicode at a low level. There are ways to ensure that UTF-8 strings are processed OK,
but it's not easy, and it requires digging in to almost all levels of the web app, from HTML to SQL to PHP. We'll aim
for a brief, practical summary.
### UTF-8 at the PHP level
The basic string operations, like concatenating two strings and assigning strings to variables, don't need anything
special for UTF-8. However most string functions, like `strpos()` and `strlen()`, do need special consideration. These
functions often have an `mb_*` counterpart: for example, `mb_strpos()` and `mb_strlen()`. These `mb_*` strings are made
The basic string operations, like concatenating two strings and assigning strings to variables, don't need anything
special for UTF-8. However most string functions, like `strpos()` and `strlen()`, do need special consideration. These
functions often have an `mb_*` counterpart: for example, `mb_strpos()` and `mb_strlen()`. These `mb_*` strings are made
available to you via the [Multibyte String Extension], and are specifically designed to operate on Unicode strings.
You must use the `mb_*` functions whenever you operate on a Unicode string. For example, if you use `substr()` on a
UTF-8 string, there's a good chance the result will include some garbled half-characters. The correct function to use
You must use the `mb_*` functions whenever you operate on a Unicode string. For example, if you use `substr()` on a
UTF-8 string, there's a good chance the result will include some garbled half-characters. The correct function to use
would be the multibyte counterpart, `mb_substr()`.
The hard part is remembering to use the `mb_*` functions at all times. If you forget even just once, your Unicode
The hard part is remembering to use the `mb_*` functions at all times. If you forget even just once, your Unicode
string has a chance of being garbled during further processing.
Not all string functions have an `mb_*` counterpart. If there isn't one for what you want to do, then you might be out
Not all string functions have an `mb_*` counterpart. If there isn't one for what you want to do, then you might be out
of luck.
You should use the `mb_internal_encoding()` function at the top of every PHP script you write (or at the
top of your global include script), and the `mb_http_output()` function right after it if your script is outputting to
a browser. Explicitly defining the encoding of your strings in every script will save you a lot of headaches down the
road.
You should use the `mb_internal_encoding()` function at the top of every PHP script you write (or at the top of your
global include script), and the `mb_http_output()` function right after it if your script is outputting to a browser.
Explicitly defining the encoding of your strings in every script will save you a lot of headaches down the road.
Additionally, many PHP functions that operate on strings have an optional parameter letting you specify the character
encoding. You should always explicitly indicate UTF-8 when given the option. For example, `htmlentities()` has an
option for character encoding, and you should always specify UTF-8 if dealing with such strings. Note that as of PHP 5.4.0, UTF-8 is the default encoding for `htmlentities()` and `htmlspecialchars()`.
Additionally, many PHP functions that operate on strings have an optional parameter letting you specify the character
encoding. You should always explicitly indicate UTF-8 when given the option. For example, `htmlentities()` has an
option for character encoding, and you should always specify UTF-8 if dealing with such strings. Note that as of PHP 5.
4.0, UTF-8 is the default encoding for `htmlentities()` and `htmlspecialchars()`.
Finally, If you are building an distributed application and cannot be certain that the `mbstring` extension will be
enabled, then consider using the [patchwork/utf8] Composer package. This
will use `mbstring` if it is available, and fall back to non UTF-8 functions if not.
enabled, then consider using the [patchwork/utf8] Composer package. This will use `mbstring` if it is available, and
fall back to non UTF-8 functions if not.
[Multibyte String Extension]: http://php.net/book.mbstring
[patchwork/utf8]: https://packagist.org/packages/patchwork/utf8
### UTF-8 at the Database level
If your PHP script accesses MySQL, there's a chance your strings could be stored as non-UTF-8 strings in the database
If your PHP script accesses MySQL, there's a chance your strings could be stored as non-UTF-8 strings in the database
even if you follow all of the precautions above.
To make sure your strings go from PHP to MySQL as UTF-8, make sure your database and tables are all set to the
`utf8mb4` character set and collation, and that you use the `utf8mb4` character set in the PDO connection string. See
To make sure your strings go from PHP to MySQL as UTF-8, make sure your database and tables are all set to the
`utf8mb4` character set and collation, and that you use the `utf8mb4` character set in the PDO connection string. See
example code below. This is _critically important_.
Note that you must use the `utf8mb4` character set for complete UTF-8 support, not the `utf8` character set! See
Note that you must use the `utf8mb4` character set for complete UTF-8 support, not the `utf8` character set! See
Further Reading for why.
### UTF-8 at the browser level
Use the `mb_http_output()` function to ensure that your PHP script outputs UTF-8 strings to your browser.
Use the `mb_http_output()` function to ensure that your PHP script outputs UTF-8 strings to your browser.
The browser will then need to be told by the HTTP response that this page should be considered as UTF-8. The historic approach to doing that was to include the [charset `<meta>` tag](http://htmlpurifier.org/docs/enduser-utf8.html) in your page's `<head>` tag. This approach is perfectly valid, but setting the charset in the `Content-Type` header is actually [much faster](https://developers.google.com/speed/docs/best-practices/rendering#SpecifyCharsetEarly).
The browser will then need to be told by the HTTP response that this page should be considered as UTF-8. The historic
approach to doing that was to include the [charset `<meta>` tag](http://htmlpurifier.org/docs/enduser-utf8.html) in
your page's `<head>` tag. This approach is perfectly valid, but setting the charset in the `Content-Type` header is
actually [much faster](https://developers.google.com/speed/docs/best-practices/rendering#SpecifyCharsetEarly).
{% highlight php %}
<?php