mirror of https://github.com/nbeaver/why-linux-is-better.git synced 2025-08-26 23:34:34 +02:00

Rearrange and reformat.

This commit is contained in:
Nathaniel Beaver
2015-09-11 19:29:18 -05:00
parent a532149236
commit 478352c8fa

@@ -692,26 +692,34 @@ and its various forks are not as mature as the Linux implementation.
UTF-16, not UTF-8.
------------------

`UTF-8 has many practical advantages`_ over UTF-16.

- It is a superset of ASCII,
  so it is backwards-compatible with existing text files.
- `Zero bytes do not appear`_ at any point in a valid UTF-8 representation,
  so ``strcpy()`` still works.
- It is `self-synchronizing`_, i.e. it is possible to resynchronize
  after a lost or corrupted code point without re-reading the entire string.
- It is more portable because it does not require a `byte-order mark`_
  and is less likely to be mistaken for other encodings.
- Internet Explorer has been known to have `security issues with UTF-16`_.

.. _UTF-8 has many practical advantages: https://annevankesteren.nl/2009/09/utf-8-reasons
.. _Zero bytes do not appear: https://docs.python.org/2/howto/unicode.html#encodings
.. _self-synchronizing: http://research.swtch.com/utf8
.. _byte-order mark: http://www.unicode.org/faq/utf_bom.html
.. _security issues with UTF-16: http://permalink.gmane.org/gmane.ietf.charsets/372

If the Windows API were designed today,
it would probably use `UTF-8`_.
The Unicode Consortium primarily `recommends UTF-16`_
for compatibility with Java and the Windows API.

.. _UTF-8: http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt
.. _recommends UTF-16: http://www.unicode.org/faq/programming.html#2

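Three of the UTF-8 properties listed above (ASCII compatibility, the absence of zero bytes, and self-synchronization) can be checked with a short script. This is only a sketch: the sample strings and the ``resync`` helper are illustrative assumptions, not part of any library. Note that the zero-byte property holds for text that does not itself contain the NUL character (U+0000).

```python
# Illustrative check of three UTF-8 properties; sample strings are arbitrary.
text = "na\u00efve caf\u00e9 \u2615 \U0001D11E"  # includes one non-BMP character

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

# 1. Superset of ASCII: pure-ASCII text has identical bytes in both encodings.
ascii_text = "plain old text"
ascii_is_unchanged = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# 2. Zero bytes: UTF-8 has none here, while UTF-16 pads every ASCII
#    character with a zero byte, which would truncate a C string.
utf8_has_zero = b"\x00" in utf8
utf16_has_zero = b"\x00" in utf16


# 3. Self-synchronizing: continuation bytes always match 0b10xxxxxx,
#    so a reader dropped at an arbitrary offset can skip them to reach
#    the start of the next code point.
def resync(data, offset):
    """Hypothetical helper: return the next code point boundary at or after offset."""
    while offset < len(data) and (data[offset] & 0xC0) == 0x80:
        offset += 1
    return offset


start = resync(utf8, 3)  # offset 3 lands in the middle of the two-byte "ï"
tail = utf8[start:].decode("utf-8")
```

Decoding succeeds from every offset once ``resync`` has been applied, which is the property that lets a reader recover after a lost or corrupted byte.
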
In principle, UTF-16 would have the advantage
of constant-time addressing of single characters,
at least within the Basic Multilingual Plane
(code points beyond it require surrogate pairs),
but in practice most programming languages do not provide data types for this,
with the `exception of Go and Rust`_.
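
Constant-time addressing depends on every character occupying the same number of code units. The sketch below, using an arbitrary sample string and two hypothetical helper functions, compares fixed-offset lookups in UTF-16 and UTF-32 for a string containing one character outside the Basic Multilingual Plane.

```python
# 'a', MUSICAL SYMBOL G CLEF (U+1D11E, outside the BMP), 'b'
text = "a\U0001D11Eb"

utf16 = text.encode("utf-16-le")
utf32 = text.encode("utf-32-le")

# Three code points, but UTF-16 needs a surrogate pair for the clef.
utf16_units = len(utf16) // 2
utf32_units = len(utf32) // 4


def utf32_char_at(data, index):
    """Hypothetical helper: code point `index` always starts at byte 4 * index."""
    return data[4 * index:4 * index + 4].decode("utf-32-le")


def utf16_unit_at(data, index):
    """Hypothetical helper: fixed-offset lookup yields a 16-bit code unit,
    which may be half of a surrogate pair rather than a whole character."""
    return int.from_bytes(data[2 * index:2 * index + 2], "little")


third_char = utf32_char_at(utf32, 2)   # the third character, 'b'
unit = utf16_unit_at(utf16, 2)         # the second half of the clef
is_surrogate = 0xD800 <= unit <= 0xDFFF
```

The same index reaches the third character in UTF-32 but only a lone surrogate in UTF-16, so fixed-offset indexing in UTF-16 is reliable only for text confined to the Basic Multilingual Plane.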