<p>In order to make this as simple as possible, we will be using tabs, not spaces. We enforce 4 (four) spaces for one tab - therefore you need to set your tab width within your editor to 4 spaces. Make sure that when you <strong>save</strong> the file, it's saving tabs and not spaces. This way, we can each have the code be displayed the way we like it, without breaking the layout of the actual files.</p>
<p>Tabs in front of lines are no problem, but having them within the text can be a problem if you do not set it to the amount of spaces every one of us uses. Here is a short example of how it should look like:</p>
<p>Ensure that your editor is saving files in the UNIX (LF) line ending format. This means that lines are terminated with a newline, not with Windows Line endings (CR/LF combo) as they are on Win32 or Classic Mac (CR) Line endings. Any decent editor should be able to do this, but it might not always be the default setting. Know your editor. If you want advice for an editor for your Operating System, just ask one of the developers. Some of them do their editing on Win32.</p>
<p>For those files you have to put an empty comment directly after the header to prevent the documentor assigning the header to the first code element found.</p>
<p>Do not forget to comment the functions (especially the first function following the header). Each function should have at least a comment of what this function does. For more complex functions it is recommended to document the parameters too.</p>
<p>Do not forget to comment the class. Classes need a separate @package definition, it is the same as the header package name. Apart from this special case the above statement for files containing only functions needs to be applied to classes and it's methods too.</p>
<p>Functions used by more than one page should be placed in functions.php, functions specific to one page should be placed on that page (at the bottom) or within the relevant sections functions file. Some files in <code>/includes</code> are holding functions responsible for special sections, for example uploading files, displaying "things", user related functions and so forth.</p>
<p>The following packages are defined, and related new features/functions should be placed within the mentioned files/locations, as well as specifying the correct package name. The package names are bold within this list:</p>
<li><strong>acp</strong><br/><code>/adm</code>, <code>/includes/acp</code>, <code>/includes/functions_admin.php</code><br/>Administration Control Panel</li>
<li><strong>dbal</strong><br/><code>/includes/db</code><br/>Database Abstraction Layer.<br/>Base class is <code>dbal</code>
PHPBB_ACM_MEMCACHE_PORT (overwrite memcached port, default is 11211)
PHPBB_ACM_MEMCACHE_COMPRESS (overwrite memcached compress setting, default is disabled)
PHPBB_ACM_MEMCACHE_HOST (overwrite memcached host name, default is localhost)
PHPBB_QA (Set board to QA-Mode, which means the updater also checks for RC-releases)
</pre></div>
<h4>PHPBB_USE_BOARD_URL_PATH</h4>
<p>If the <code>PHPBB_USE_BOARD_URL_PATH</code> constant is set to true, phpBB uses generate_board_url() (this will return the boards url with the script path included) on all instances where web-accessible images are loaded. The exact locations are:</p>
<p>We will not be using any form of hungarian notation in our naming conventions. Many of us believe that hungarian naming is one of the primary code obfuscation techniques currently in use.</p>
<p>Variable names should be in all lowercase, with words separated by an underscore, example:</p>
<divclass="indent">
<p><code>$current_user</code> is right, but <code>$currentuser</code> and <code> $currentUser</code> are not.</p>
</div>
<p>Names should be descriptive, but concise. We don't want huge sentences as our variable names, but typing an extra couple of characters is always better than wondering what exactly a certain variable is for. </p>
<p>The <em>only</em> situation where a one-character variable name is allowed is when it's the index for some looping construct. In this case, the index of the outer loop should always be $i. If there's a loop inside that loop, its index should be $j, followed by $k, and so on. If the loop is being indexed by some already-existing variable with a meaningful name, this guideline does not apply, example:</p>
<p>Functions should also be named descriptively. We're not programming in C here, we don't want to write functions called things like "stristr()". Again, all lower-case names with words separated by a single underscore character. Function names should preferably have a verb in them somewhere. Good function names are <code>print_login_status()</code>, <code>get_user_data()</code>, etc. </p>
<p>Arguments are subject to the same guidelines as variable names. We don't want a bunch of functions like: <code>do_stuff($a, $b, $c)</code>. In most cases, we'd like to be able to tell how to use a function by just looking at its declaration. </p>
<p>The basic philosophy here is to not hurt code clarity for the sake of laziness. This has to be balanced by a little bit of common sense, though; <code>print_login_status_for_a_given_user()</code> goes too far, for example -- that function would be better named <code>print_user_login_status()</code>, or just <code>print_login_status()</code>.</p>
<p>This is another case of being too lazy to type 2 extra characters causing problems with code clarity. Even if the body of some construct is only one line long, do <em>not</em> drop the braces. Just don't, examples:</p>
<p>This one is a bit of a holy war, but we're going to use a style that can be summed up in one sentence: Braces always go on their own line. The closing brace should also always be at the same column as the corresponding opening brace, examples:</p>
<p>This is another simple, easy step that helps keep code readable without much effort. Whenever you write an assignment, expression, etc.. Always leave <em>one</em> space between the tokens. Basically, write code as if it was English. Put spaces between variable names and operators. Don't put spaces just after an opening bracket or before a closing bracket. Don't put spaces just before a comma or a semicolon. This is best shown with a few examples, examples:</p>
<p>// Each pair shows the wrong way followed by the right way. </p>
<p>Do you know the exact precedence of all the operators in PHP? Neither do I. Don't guess. Always make it obvious by using brackets to force the precedence of an equation so you know what it does. Remember to not over-use this, as it may harden the readability. Basically, do not enclose single expressions. Examples:</p>
<pclass="bad">// what's the result? who knows. </p>
<p>There are two different ways to quote strings in PHP - either with single quotes or with double quotes. The main difference is that the parser does variable interpolation in double-quoted strings, but not in single quoted strings. Because of this, you should <em>always</em> use single quotes <em>unless</em> you specifically need variable interpolation to be done on that string. This way, we can save the parser the trouble of parsing a bunch of strings where no interpolation needs to be done.</p>
<p>Also, if you are using a string variable as part of a function call, you do not need to enclose that variable in quotes. Again, this will just make unnecessary work for the parser. Note, however, that nearly all of the escape sequences that exist for double-quoted strings will not work with single-quoted strings. Be careful, and feel free to break this guideline if it's making your code easier to read, examples:</p>
<p>In SQL Statements mixing single and double quotes is partly allowed (following the guidelines listed here about SQL Formatting), else it should be tryed to only use one method - mostly single quotes.</p>
<p>In PHP, it's legal to use a literal string as a key to an associative array without quoting that string. We don't want to do this -- the string should always be quoted to avoid confusion. Note that this is only when we're using a literal, not when we're using a variable, examples:</p>
<p>Each complex function should be preceded by a comment that tells a programmer everything they need to know to use that function. The meaning of every parameter, the expected input, and the output are required as a minimal comment. The function's behaviour in error conditions (and what those error conditions are) should also be present - but mostly included within the comment about the output.<br/><br/>Especially important to document are any assumptions the code makes, or preconditions for its proper operation. Any one of the developers should be able to look at any part of the application and figure out what's going on in a reasonable amount of time.<br/><br/>Avoid using <code>/* */</code> comment blocks for one-line comments, <code>//</code> should be used for one/two-liners.</p>
<p>Don't use them. Use named constants for any literal value other than obvious special cases. Basically, it's ok to check if an array has 0 elements by using the literal 0. It's not ok to assign some special meaning to a number and then use it everywhere as a literal. This hurts readability AND maintainability. The constants <code>true</code> and <code>false</code> should be used in place of the literals 1 and 0 -- even though they have the same values (but not type!), it's more obvious what the actual logic is when you use the named constants. Typecast variables where it is needed, do not rely on the correct variable type (PHP is currently very loose on typecasting which can lead to security problems if a developer does not have a very close eye to it).</p>
<p>The only shortcut operators that cause readability problems are the shortcut increment <code>$i++</code> and decrement <code>$j--</code> operators. These operators should not be used as part of an expression. They can, however, be used on their own line. Using them in expressions is just not worth the headaches when debugging, examples:</p>
<p>Inline conditionals should only be used to do very simple things. Preferably, they will only be used to do assignments, and not for function calls or anything complex at all. They can be harmful to readability if used incorrectly, so don't fall in love with saving typing by using them, examples:</p>
<p>For phpBB3, we intend to use a higher level of run-time error reporting. This will mean that the use of an uninitialized variable will be reported as a warning. These warnings can be avoided by using the built-in isset() function to check whether a variable has been set - but preferably the variable is always existing. For checking if an array has a key set this can come in handy though, examples:</p>
<p>The <code>empty()</code> function is useful if you want to check if a variable is not set or being empty (an empty string, 0 as an integer or string, NULL, false, an empty array or a variable declared, but without a value in a class). Therefore empty should be used in favor of <code>isset($array) && sizeof($array) > 0</code> - this can be written in a shorter way as <code>!empty($array)</code>.</p>
<p>Switch/case code blocks can get a bit long sometimes. To have some level of notice and being in-line with the opening/closing brace requirement (where they are on the same line for better readability), this also applies to switch/case code blocks and the breaks. An example:</p>
<p>All SQL should be cross-DB compatible, if DB specific SQL is used alternatives must be provided which work on all supported DB's (MySQL3/4/5, MSSQL (7.0 and 2000), PostgreSQL (7.0+), Firebird, SQLite, Oracle8, ODBC (generalised if possible)).</p>
<p>SQL Statements are often unreadable without some formatting, since they tend to be big at times. Though the formatting of sql statements adds a lot to the readability of code. SQL statements should be formatted in the following way, basically writing keywords: </p>
<p>In other words use single quotes where no variable substitution is required or where the variable involved shouldn't appear within double quotes. Otherwise use double quotes.</p>
<p>Always use <code>$db->sql_escape()</code> if you need to check for a string within an SQL statement (even if you are sure the variable cannot contain single quotes - never trust your input), for example:</p>
<p>We do not add limit statements to the sql query, but instead use <code>$db->sql_query_limit()</code>. You basically pass the query, the total number of lines to retrieve and the offset.</p>
<p><strong>Note: </strong> Since Oracle handles limits differently and because of how we implemented this handling you need to take special care if you use <code>sql_query_limit</code> with an sql query retrieving data from more than one table.</p>
<p>Make sure when using something like "SELECT x.*, y.jars" that there is not a column named jars in x; make sure that there is no overlap between an implicit column and the explicit columns.</p>
<p>If you need to UPDATE or INSERT data, make use of the <code>$db->sql_build_array()</code> function. This function already escapes strings and checks other types, so there is no need to do this here. The data to be inserted should go into an array - <code>$sql_ary</code> - or directly within the statement if one or two variables needs to be inserted/updated. An example of an insert statement would be:</p>
<p>The <code>$db->sql_build_array()</code> function supports the following modes: <code>INSERT</code> (example above), <code>INSERT_SELECT</code> (building query for <code>INSERT INTO table (...) SELECT value, column ...</code> statements), <code>UPDATE</code> (example above) and <code>SELECT</code> (for building WHERE statement [AND logic]).</p>
<h4>sql_multi_insert():</h4>
<p>If you want to insert multiple statements at once, please use the separate <code>sql_multi_insert()</code> method. An example:</p>
<p>The <code>$db->sql_in_set()</code> function should be used for building <code>IN ()</code> and <code>NOT IN ()</code> constructs. Since (specifically) MySQL tend to be faster if for one value to be compared the <code>=</code> and <code><></code> operator is used, we let the DBAL decide what to do. A typical example of doing a positive match against a number of values would be:</p>
<p>The <code>$db->sql_build_query()</code> function is responsible for building sql statements for select and select distinct queries if you need to JOIN on more than one table or retrieving data from more than one table while doing a JOIN. This needs to be used to make sure the resulting statement is working on all supported db's. Instead of explaining every possible combination, i will give a short example:</p>
<p>The possible first parameter for sql_build_query() is SELECT or SELECT_DISTINCT. As you can see, the logic is pretty self-explaining. For the LEFT_JOIN key, just add another array if you want to join on to tables for example. The added benefit of using this construct is that you are able to easily build the query statement based on conditions - for example the above LEFT_JOIN is only necessary if server side topic tracking is enabled; a slight adjustement would be:</p>
<p>Always try to optimize your loops if operations are going on at the comparing part, since this part is executed every time the loop is parsed through. For assignments a descriptive name should be chosen. Example:</p>
<pclass="bad">// On every iteration the sizeof function is called</p>
<p>Try to avoid using in_array() on huge arrays, and try to not place them into loops if the array to check consist of more than 20 entries. in_array() can be very time consuming and uses a lot of cpu processing time. For little checks it is not noticable, but if checked against a huge array within a loop those checks alone can be a bunch of seconds. If you need this functionality, try using isset() on the arrays keys instead, actually shifting the values into keys and vice versa. A call to <code>isset($array[$var])</code> is a lot faster than <code>in_array($var, array_keys($array))</code> for example.</p>
<p>Never trust user input (this also applies to server variables as well as cookies).</p>
<p>Try to sanitize values returned from a function.</p>
<p>Try to sanitize given function variables within your function.</p>
<p>The auth class should be used for all authorisation checking.</p>
<p>No attempt should be made to remove any copyright information (either contained within the source or displayed interactively when the source is run/compiled), neither should the copyright information be altered in any way (it may be added to).</p>
<p>Make use of the <code>request_var()</code> function for anything except for submit or single checking params. </p>
<p>The request_var function determines the type to set from the second parameter (which determines the default value too). If you need to get a scalar variable type, you need to tell this the request_var function explicitly. Examples:</p>
<p>The <code>login_box()</code> function can have a redirect as the first parameter. As a thumb of rule, specify an empty string if you want to redirect to the users current location, else do not add the <code>$SID</code> to the redirect string (for example within the ucp/login we redirect to the board index because else the user would be redirected to the login screen).</p>
<p>For sensitive operations always let the user confirm the action. For the confirmation screens, make use of the <code>confirm_box()</code> function.</p>
<p>For operations altering the state of the database, for instance posting, always verify the form token, unless you are already using <code>confirm_box()</code>. To do so, make use of the <code>add_form_key()</code> and <code>check_form_key()</code> functions. </p>
<p>The string passed to <code>add_form_key()</code> needs to match the string passed to <code>check_form_key()</code>. Another requirement for this to work correctly is that all forms include the <code>{S_FORM_TOKEN}</code> template variable.</p>
<p>All urls pointing to internal files need to be prepended by the <code>$phpbb_root_path</code> variable. Within the administration control panel all urls pointing to internal files need to be prepended by the <code>$phpbb_admin_path</code> variable. This makes sure the path is always correct and users being able to just rename the admin folder and the acp still working as intended (though some links will fail and the code need to be slightly adjusted).</p>
<p>The <code>append_sid()</code> function from 2.0.x is available too, though does not handle url alterations automatically. Please have a look at the code documentation if you want to get more details on how to use append_sid(). A sample call to append_sid() can look like this:</p>
<p>Some of these functions are only chosen over others because of personal preference and having no other benefit than to be consistant over the code.</p>
<p>Your page should either call <code>page_footer()</code> in the end to trigger output through the template engine and terminate the script, or alternatively at least call the <code>exit_handler()</code>. That call is necessary because it provides a method for external applications embedding phpBB to be called at the end of the script.</p>
<p>Style cfg files are simple name-value lists with the information necessary for installing a style. Similar cfg files exist for templates, themes and imagesets. These follow the same principle and will not be introduced individually. Styles can use installed components by using the required_theme/required_template/required_imageset entries. The important part of the style configuration file is assigning an unique name.</p>
<p>Templates should be produced in a consistent manner. Where appropriate they should be based off an existing copy, e.g. index, viewforum or viewtopic (the combination of which implement a range of conditional and variable forms). Please also note that the intendation and coding guidelines also apply to templates where possible.</p>
<p>The outer table class <code>forumline</code> has gone and is replaced with <code>tablebg</code>.</p>
<p>When writing <code><table></code> the order <code><table class="" cellspacing="" cellpadding="" border="" align=""></code> creates consistency and allows everyone to easily see which table produces which "look". The same applies to most other tags for which additional parameters can be set, consistency is the major aim here.</p>
<p>Each block level element should be indented by one tab, same for tabular elements, e.g. <code><tr></code><code><td></code> etc., whereby the intendiation of <code><table></code> and the following/ending <code><tr></code> should be on the same line. This applies not to div elements of course.</p>
<p>Don't use <code><span></code> more than is essential ... the CSS is such that text sizes are dependent on the parent class. So writing <code><span class="gensmall"><span class="gensmall">TEST</span></span></code> will result in very very small text. Similarly don't use span at all if another element can contain the class definition, e.g.</p>
<p>Try to match text class types with existing useage, e.g. don't use the nav class where viewtopic uses gensmall for example.</p>
<p>Row colours/classes are now defined by the template, use an <code>IF S_ROW_COUNT</code> switch, see viewtopic or viewforum for an example.</p>
<p>Remember block level ordering is important ... while not all pages validate as XHTML 1.0 Strict compliant it is something we're trying to work too.</p>
<p>Use a standard cellpadding of 2 and cellspacing of 0 on outer tables. Inner tables can vary from 0 to 3 or even 4 depending on the need.</p>
<p>The separate catXXXX and thXXX classes are gone. When defining a header cell just use <code><th></code> rather than <code><th class="thHead"></code> etc. Similarly for cat, don't use <code><td class="catLeft"></code> use <code><td class="cat"></code> etc.</p>
<p>Try to retain consistency of basic layout and class useage, i.e. _EXPLAIN text should generally be placed below the title it explains, e.g. <code>{L_POST_USERNAME}<br /><span class="gensmall">{L_POST_USERNAME_EXPLAIN}</span></code> is the typical way of handling this ... there may be exceptions and this isn't a hard and fast rule.</p>
<p>Firstly templates now take the suffix ".html" rather than ".tpl". This was done simply to make the lifes of some people easier wrt syntax highlighting, etc.</p>
<p>All template variables should be named appropriately (using underscores for spaces), language entries should be prefixed with L_, system data with S_, urls with U_, javascript urls with UA_, language to be put in javascript statements with LA_, all other variables should be presented 'as is'.</p>
<p>L_* template variables are automatically tried to be mapped to the corresponding language entry if the code does not set (and therefore overwrite) this variable specifically. For example <code>{L_USERNAME}</code> maps to <code>$user->lang['USERNAME']</code>. The LA_* template variables are handled within the same way, but properly escaped to be put in javascript code. This should reduce the need to assign loads of new lang vars in Modifications.
<p>Something that existed in 2.0.x which no longer exists in 3.0.x is the ability to assign a template to a variable. This was used (for example) to output the jumpbox. Instead (perhaps better, perhaps not but certainly more flexible) we now have INCLUDE. This takes the simple form:</p>
<p>You will note in the 3.0 templates the major sources start with <code><!-- INCLUDE overall_header.html --></code> or <code><!-- INCLUDE simple_header.html --></code>, etc. In 2.0.x control of "which" header to use was defined entirely within the code. In 3.0.x the template designer can output what they like. Note that you can introduce new templates (i.e. other than those in the default set) using this system and include them as you wish ... perhaps useful for a common "menu" bar or some such. No need to modify loads of files as with 2.0.x.</p>
<p>Added in <strong>3.0.6</strong> is the ability to include a file using a template variable to specify the file, this functionality only works for root variables (i.e. not block variables).</p>
<divclass="codebox"><pre>
<spanclass="comment"><!-- INCLUDE {FILE_VAR} --></span>
<p>A contentious decision has seen the ability to include PHP within the template introduced. This is achieved by enclosing the PHP within relevant tags:</p>
<p>it will be included and executed inline.<br/><br/>A note, it is very much encouraged that template designers do not include PHP. The ability to include raw PHP was introduced primarily to allow end users to include banner code, etc. without modifying multiple files (as with 2.0.x). It was not intended for general use ... hence <!-- w --><ahref="http://www.phpbb.com">www.phpbb.com</a><!-- w --> will <strong>not</strong> make available template sets which include PHP. And by default templates will have PHP disabled (the admin will need to specifically activate PHP for a template).</p>
<p>The most significant addition to 3.0.x are conditions or control structures, "if something then do this else do that". The system deployed is very similar to Smarty. This may confuse some people at first but it offers great potential and great flexibility with a little imagination. In their most simple form these constructs take the form:</p>
<p>This will output the markup if the S_ROW_COUNT variable in the current iteration of loop is an even value (i.e. the expr is TRUE). You can use various comparison methods (standard as well as equivalent textual versions noted in square brackets) including (<code>not, or, and, eq, neq, is</code> should be used if possible for better readability):</p>
<p>Each statement will be tested in turn and the relevant output generated when a match (if a match) is found. It is not necessary to always use ELSEIF, ELSE can be used alone to match "everything else".<br/><br/>So what can you do with all this? Well take for example the colouration of rows in viewforum. In 2.0.x row colours were predefined within the source as either row color1, row color2 or row class1, row class2. In 3.0.x this is moved to the template, it may look a little daunting at first but remember control flows from top to bottom and it's not too difficult:</p>
<p>This will cause the row cell to be output using class row1 when the row count is even, and class row2 otherwise. The S_ROW_COUNT parameter gets assigned to loops by default. Another example would be the following: </p>
<p>This will output the row cell in purple for the first two rows, blue for rows 2 to 5, green for rows 5 to 10 and red for remainder. So, you could produce a "nice" gradient effect, for example.<br/><br/>What else can you do? Well, you could use IF to do common checks on for example the login state of a user:</p>
<p>This will cause the markup between <code>BEGINELSE</code> and <code>END</code> to be output if the loop contains no values. This is useful for forums with no topics (for example) ... in some ways it replaces "bits of" the existing "switch_" type control (the rest being replaced by conditionals).</p>
<p>Another way of checking if a loop contains values is by prefixing the loops name with a dot:</p>
<p>Sometimes it is necessary to break out of nested loops to be able to call another loop within the current iteration. This sounds a little bit confusing and it is not used very often. The following (rather complex) example shows this quite good - it also shows how you test for the first and last row in a loop (i will explain the example in detail further down):</p>
<p>Here we open the loop l_block1 and doing some things if the value S_SELECTED within the current loop iteration is true, else we write the blocks link and title. Here, you see <code>{l_block1.L_TITLE}</code> referenced - you remember that L_* variables get automatically assigned the corresponding language entry? This is true, but not within loops. The L_TITLE variable within the loop l_block1 is assigned within the code itself.</p>
<p>The <code><!-- IF S_PRIVMSGS --></code> statement clearly checks a global variable and not one within the loop, since the loop is not given here. So, if S_PRIVMSGS is true we execute the shown markup. Now, you see the <code><!-- BEGIN !folder --></code> statement. The exclamation mark is responsible for instructing the template engine to iterate through the main loop folder. So, we are now within the loop folder - with <code><!-- BEGIN folder --></code> we would have been within the loop <code>l_block1.folder</code> automatically as is the case with l_block2:</p>
<p>You may have wondered what the comparison to S_FIRST_ROW and S_LAST_ROW is about. If you haven't guessed already - it is checking for the first iteration of the loop with <code>S_FIRST_ROW</code> and the last iteration with <code>S_LAST_ROW</code>. This can come in handy quite often if you want to open or close design elements, like the above list. Let us imagine a folder loop build with three iterations, it would go this way:</p>
<p>As you can see, all three elements are written down as well as the markup for the first iteration and the last one. Sometimes you want to omit writing the general markup - for example:</p>
<p>If a form is used for a non-trivial operation (i.e. more than a jumpbox), then it should include the <code>{S_FORM_TOKEN}</code> template variable.</p>
<p>When basing a new template on an existing one, it is not necessary to provide all template files. By declaring the template to be "<strong>inheriting</strong>" in the template configuration file.</p>
<p>The effect of doing so is that the template engine will use the files in the new template where they exist, but fall back to files in the base template otherwise. Declaring a style to be inheriting also causes it to use some of the configuration settings of the base style, notably database storage.</p>
<p>The <ahref="http://en.wikipedia.org/wiki/Universal_Character_Set">Universal Character Set (UCS)</a> described in ISO/IEC 10646 consists of a large amount of characters. Each of them has a unique name and a code point which is an integer number. <ahref="http://en.wikipedia.org/wiki/Unicode">Unicode</a> - which is an industry standard - complements the Universal Character Set with further information about the characters' properties and alternative character encodings. More information on Unicode can be found on the <ahref="http://www.unicode.org/">Unicode Consortium's website</a>. One of the Unicode encodings is the <ahref="http://en.wikipedia.org/wiki/UTF-8">8-bit Unicode Transformation Format (UTF-8)</a>. It encodes characters with up to four bytes aiming for maximum compatibility with the <ahref="http://en.wikipedia.org/wiki/ASCII">American Standard Code for Information Interchange</a> which is a 7-bit encoding of a relatively small subset of the UCS.</p>
<p>Unfortunately PHP does not faciliate the use of Unicode prior to version 6. Most functions simply treat strings as sequences of bytes assuming that each character takes up exactly one byte. This behaviour still allows for storing UTF-8 encoded text in PHP strings but many operations on strings have unexpected results. To circumvent this problem we have created some alternative functions to PHP's native string operations which use code points instead of bytes. These functions can be found in <code>/includes/utf/utf_tools.php</code>. They are also covered in the <ahref="http://area51.phpbb.com/docs/code/">phpBB3 Sourcecode Documentation</a>. A lot of native PHP functions still work with UTF-8 as long as you stick to certain restrictions. For example <code>explode</code> still works as long as the first and the last character of the delimiter string are ASCII characters.</p>
<p>phpBB only uses the ASCII and the UTF-8 character encodings. Still all Strings are UTF-8 encoded because ASCII is a subset of UTF-8. The only exceptions to this rule are code sections which deal with external systems which use other encodings and character sets. Such external data should be converted to UTF-8 using the <code>utf8_recode()</code> function supplied with phpBB. It supports a variety of other character sets and encodings, a full list can be found below.</p>
<p>With <code>request_var()</code> you can either allow all UCS characters in user input or restrict user input to ASCII characters. This feature is controlled by the function's third parameter called <code>$multibyte</code>. You should allow multibyte characters in posts, PMs, topic titles, forum names, etc. but it's not necessary for internal uses like a <code>$mode</code> variable which should only hold a predefined list of ASCII strings anyway.</p>
<p>If you retrieve user input with multibyte characters you should additionally normalize the string using <code>utf8_normalize_nfc()</code> before you work with it. This is necessary to make sure that equal characters can only occur in one particular binary representation. For example the character Å can be represented either as <code>U+00C5</code> (LATIN CAPITAL LETTER A WITH RING ABOVE) or as <code>U+212B</code> (ANGSTROM SIGN). phpBB uses Normalization Form Canonical Composition (NFC) for all text. So the correct version of the above example would look like this:</p>
<p>Case insensitive comparison of strings is no longer possible with <code>strtolower</code> or <code>strtoupper</code> as some characters have multiple lower case or multiple upper case forms depending on their position in a word. The <code>utf8_strtolower</code> and the <code>utf8_strtoupper</code> functions suffer from the same problem so they can only be used to display upper/lower case versions of a string but they cannot be used for case insensitive comparisons either. So instead you should use case folding which gives you a case insensitive version of the string which can be used for case insensitive comparisons. An NFC normalized string can be case folded using <code>utf8_case_fold_nfc()</code>.</p>
<p>phpBB offers a special method <code>utf8_clean_string</code> which can be used to make sure string identifiers are unique. This method uses Normalization Form Compatibility Composition (NFKC) instead of NFC and replaces similarly looking characters with a particular representative of the equivalence class. This method is currently used for usernames and group names to avoid confusion with similarly looking names.</p>
<p>phpBB is one of the most translated open-source projects, with the current stable version being available in over 60 localisations. Whilst the ad hoc approach to the naming of language packs has worked, for phpBB3 and beyond we hope to make this process saner which will allow for better interoperation with current and future web browsers.</p>
<p>With phpBB3, the output encoding for the forum in now UTF-8, a Universal Character Encoding by the Unicode Consortium that is by design a superset to US-ASCII and ISO-8859-1. By using one character set which simultaenously supports all scripts which previously would have required different encodings (eg: ISO-8859-1 to ISO-8859-15 (Latin, Greek, Cyrillic, Thai, Hebrew, Arabic); GB2312 (Simplified Chinese); Big5 (Traditional Chinese), EUC-JP (Japanese), EUC-KR (Korean), VISCII (Vietnamese); et cetera), this removes the need to convert between encodings and improves the accessibility of multilingual forums.</p>
<p>The impact is that the language files for phpBB must now also be encoded as UTF-8, with a caveat that the files must <strong>not contain</strong> a <acronymtitle="Byte-Order-Mark">BOM</acronym> for compatibility reasons with non-Unicode aware versions of PHP. For those with forums using the Latin character set (ie: most European languages), this change is transparent since UTF-8 is superset to US-ASCII and ISO-8859-1.</p>
<p>The <abbrtitle="Internet Engineering Task Force">IETF</abbr> recently published <ahref="http://tools.ietf.org/html/rfc4646">RFC 4646</a> for tags used to identify languages, which in combination with <ahref="http://tools.ietf.org/html/rfc4647">RFC 4647</a> obseletes the older <ahref="http://tools.ietf.org/html/rfc3066">RFC 3006</a> and older-still <ahref="http://tools.ietf.org/html/rfc1766">RFC 1766</a>. <ahref="http://tools.ietf.org/html/rfc4646">RFC 4646</a> uses <ahref="http://www.loc.gov/standards/iso639-2/php/English_list.php">ISO 639-1/ISO 639-2</a>, <ahref="http://www.iso.ch/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html">ISO 3166-1 alpha-2</a>, <ahref="http://www.unicode.org/iso15924/iso15924-codes.html">ISO 15924</a> and <ahref="http://unstats.un.org/unsd/methods/m49/m49.htm">UN M.49</a> to define a language tag. Each complete tag is composed of subtags which are not case sensitive and can also be empty.</p>
<p>Ordering of the subtags in the case that they are all non-empty is: <code>language</code>-<code>script</code>-<code>region</code>-<code>variant</code>-<code>extension</code>-<code>privateuse</code>. Should any subtag be empty, its corresponding hyphen would also be ommited. Thus, the language tag for English will be <code>en</code><strong>and not</strong><code>en-----</code>.</p>
<p>Most language tags consist of a two- or three-letter language subtag (from <ahref="http://www.loc.gov/standards/iso639-2/php/English_list.php">ISO 639-1/ISO 639-2</a>). Sometimes, this is followed by a two-letter or three-digit region subtag (from <ahref="http://www.iso.ch/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html">ISO 3166-1 alpha-2</a> or <ahref="http://unstats.un.org/unsd/methods/m49/m49.htm">UN M.49</a>). Some examples are:</p>
<tablesummary="Examples of various possible language tags as described by RFC 4646 and RFC 4647">
<p>The ultimate aim of a language tag is to convey the needed <strong>useful distingushing information</strong>, whilst keeping it as <strong>short as possible</strong>. So for example, use <code>en</code>, <code>fr</code> and <code>ja</code> as opposed to <code>en-GB</code>, <code>fr-FR</code> and <code>ja-JP</code>, since we know English, French and Japanese are the native language of Great Britain, France and Japan respectively.</p>
<p>Next is the <ahref="http://www.unicode.org/iso15924/iso15924-codes.html">ISO 15924</a> language script code and when one should or shouldn't use it. For example, whilst <code>en-Latn</code> is syntaxically correct for describing English written with Latin script, real world English writing is <strong>more-or-less exclusively in the Latin script</strong>. For such languages like English that are written in a single script, the <ahref="http://www.iana.org/assignments/language-subtag-registry"><abbrtitle="Internet Assigned Numbers Authority">IANA</abbr> Language Subtag Registry</a> has a "Suppress-Script" field meaning the script code <strong>should be ommitted</strong> unless a specific language tag requires a specific script code. Some languages are <strong>written in more than one script</strong> and in such cases, the script code <strong>is encouraged</strong> since an end-user may be able to read their language in one script, but not the other. Some examples are:</p>
<tablesummary="Examples of using a language subtag in combination with a script subtag">
<p>Usage of the three-digit <ahref="http://unstats.un.org/unsd/methods/m49/m49.htm">UN M.49</a> code over the two-letter <ahref="http://www.iso.ch/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html">ISO 3166-1 alpha-2</a> code should hapen if a macro-geographical entity is required and/or the <ahref="http://www.iso.ch/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html">ISO 3166-1 alpha-2</a> is ambiguous.</p>
<p>Examples of English using marco-geographical regions:</p>
<tablesummary="Examples for English of ISO 3166-1 alpha-2 vs. UN M.49 code">
<caption>Coding for English using macro-geographical regions</caption>
<thead>
<tr>
<thscope="col">ISO 639-1/ISO 639-2 + ISO 3166-1 alpha-2</th>
<thscope="col"colspan="2">ISO 639-1/ISO 639-2 + UN M.49 (Example macro regions)</th>
</tr>
</thead>
<tbody>
<tr>
<td><dl><dt><code>en-AU</code></dt><dd>English as used in <strong>Australia</strong></dd></dl></td>
<tdrowspan="2"><dl><dt><code>en-053</code></dt><dd>English as used in <strong>Australia & New Zealand</strong></dd></dl></td>
<tdrowspan="3"><dl><dt><code>en-009</code></dt><dd>English as used in <strong>Oceania</strong></dd></dl></td>
</tr>
<tr>
<td><dl><dt><code>en-NZ</code></dt><dd>English as used in <strong>New Zealand</strong></dd></dl></td>
</tr>
<tr>
<td><dl><dt><code>en-FJ</code></dt><dd>English as used in <strong>Fiji</strong></dd></dl></td>
<td><dl><dt><code>en-054 </code></dt><dd>English as used in <strong>Melanesia</strong></dd></dl></td>
</tr>
</tbody>
</table>
<p>Examples of Spanish using marco-geographical regions:</p>
<tablesummary="Examples for Spanish of ISO 3166-1 alpha-2 vs. UN M.49 code">
<caption>Coding for Spanish macro-geographical regions</caption>
<thead>
<tr>
<thscope="col">ISO 639-1/ISO 639-2 + ISO 3166-1 alpha-2</th>
<thscope="col"colspan="2">ISO 639-1/ISO 639-2 + UN M.49 (Example macro regions)</th>
</tr>
</thead>
<tbody>
<tr>
<td><dl><dt><code>es-PR</code></dt><dd>Spanish as used in <strong>Puerto Rico</strong></dd></dl></td>
<tdrowspan="3"><dl><dt><code>es-419</code></dt><dd>Spanish as used in <strong>Latin America & the Caribbean</strong></dd></dl></td>
<tdrowspan="4"><dl><dt><code>es-019</code></dt><dd>Spanish as used in <strong>the Americas</strong></dd></dl></td>
</tr>
<tr>
<td><dl><dt><code>es-HN</code></dt><dd>Spanish as used in <strong>Honduras</strong></dd></dl></td>
</tr>
<tr>
<td><dl><dt><code>es-AR</code></dt><dd>Spanish as used in <strong>Argentina</strong></dd></dl></td>
</tr>
<tr>
<td><dl><dt><code>es-US</code></dt><dd>Spanish as used in <strong>United States of America</strong></dd></dl></td>
<td><dl><dt><code>es-021</code></dt><dd>Spanish as used in <strong>North America</strong></dd></dl></td>
</tr>
</tbody>
</table>
<p>Example of where the <ahref="http://www.iso.ch/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html">ISO 3166-1 alpha-2</a> is ambiguous and why <ahref="http://unstats.un.org/unsd/methods/m49/m49.htm">UN M.49</a> might be preferred:</p>
<tablesummary="Example where the ISO 3166-1 alpha-2 is ambiguous">
<caption>Coding for ambiguous ISO 3166-1 alpha-2 regions</caption>
<p><ahref="http://tools.ietf.org/html/rfc4646">RFC 4646</a> anticipates features which shall be available in (currently draft) <ahref="http://www.sil.org/iso639-3/">ISO 639-3</a> which aims to provide as complete enumeration of languages as possible, including living, extinct, ancient and constructed languages, whether majour, minor or unwritten. A new feature of <ahref="http://www.sil.org/iso639-3/">ISO 639-3</a> compared to the previous two revisions is the concept of <ahref="http://www.sil.org/iso639-3/macrolanguages.asp">macrolanguages</a> where Arabic and Chinese are two such examples. In such cases, their respective codes of <code>ar</code> and <code>zh</code> is very vague as to which dialect/topolect is used or perhaps some terse classical variant which may be difficult for all but very educated users. For such macrolanguages, it is recommended that the sub-language tag is used as a suffix to the macrolanguage tag, eg:</p>
<tablesummary="Examples of macrolanguages used with sub-language subtags">
<p>For phpBB, the language tags are <strong>not</strong> used in their raw form and instead converted to all lower-case and have the hyphen <code>-</code> replaced with an underscore <code>_</code> where appropriate, with some examples below:</p>
<p><code>iso.txt</code> is automatically generated by the language pack submission system on phpBB.com. You don't have to create this file yourself if you plan on releasing your language pack on phpBB.com, but do keep in mind that phpBB itself does require this file to be present.</p>
<p>Because language tags themselves are meant to be machine read, they can be rather obtuse to humans and why descriptive strings as provided by <code>iso.txt</code> are needed. Whilst <code>en-US</code> could be fairly easily deduced to be "English as used in the United States", <code>de-CH</code> is more difficult less one happens to know that <code>de</code> is from "<spanlang="de">Deutsch</span>", German for "German" and <code>CH</code> is the abbreviation of the official Latin name for Switzerland, "<spanlang="la">Confoederatio Helvetica</span>".</p>
<p>For the English language description, the language name is always first and any additional attributes required to describe the subtags within the language code are then listed in order separated with commas and enclosed within parentheses, eg:</p>
<tablesummary="English language description examples of iso.txt for usage in phpBB">
<caption>English language description examples for iso.txt</caption>
<thead>
<tr>
<thscope="col">Raw language tag</th>
<thscope="col">English description within <code>iso.txt</code></th>
</tr>
</thead>
<tbody>
<tr>
<td><code>en</code></td>
<td>British English</td>
</tr>
<tr>
<td><code>en-US</code></td>
<td>English (United States)</td>
</tr>
<tr>
<td><code>en-053</code></td>
<td>English (Australia & New Zealand)</td>
</tr>
<tr>
<td><code>de</code></td>
<td>German</td>
</tr>
<tr>
<td><code>de-CH-1996</code></td>
<td>German (Switzerland, 1996 orthography)</td>
</tr>
<tr>
<td><code>gws-1996</code></td>
<td>Swiss German (1996 orthography)</td>
</tr>
<tr>
<td><code>zh-cmn-Hans-CN</code></td>
<td>Mandarin Chinese (Simplified, Mainland China)</td>
</tr>
<tr>
<td><code>zh-yue-Hant-HK</code></td>
<td>Cantonese Chinese (Traditional, Hong Kong)</td>
<p>For the localised language description, just translate the English version though use whatever appropriate punctuation typical for your own locale, assuming the language uses punctuation at all.</p>
<p>Because phpBB is now UTF-8, all translators must take into account that certain strings may be shown when the directionality of the document is either opposite to normal or is ambiguous.</p>
<p>For <code>iso.txt</code>, the directionality of the text can be explicitly set using special Unicode characters via any of the three methods provided by left-to-right/right-to-left markers/embeds/overrides, as without them, the ordering of characters will be incorrect, eg:</p>
<p>In choosing which of the three methods to use, in the majority of cases, the <code>LRM</code> or <code>RLM</code> to put a "strong" character to fully enclose an ambiguous punctuation character and thus make it inherit the correct directionality is sufficient.</p>
<p>Within some cases, there may be mixed scripts of a left-to-right and right-to-left direction, so using <code>LRE</code>&<code>RLE</code> with <code>PDF</code> may be more appropriate. Lastly, in very rare instances where directionality must be forced, then use <code>LRO</code>&<code>RLO</code> with <code>PDF</code>.</p>
<p>For further information on authoring techniques of bi-directional text, please see the W3C tutorial on <ahref="http://www.w3.org/International/tutorials/bidi-xhtml/">authoring techniques for XHTML pages with bi-directional text</a>.</p>
<p>As phpBB is translated into languages with different ordering rules to that of English, it is possible to show specific values in any order deemed appropriate. Take for example the extremely simple "Page <em>X</em> of <em>Y</em>", whilst in English this could just be coded as:</p>
<p>As the language files are PHP files, where the various strings for phpBB are stored within an array which in turn are used for display within an HTML page, rules of syntax for both must be considered. Potentially problematic characters are: <code>'</code> (straight quote/apostrophe), <code>"</code> (straight double quote), <code><</code> (less-than sign), <code>></code> (greater-than sign) and <code>&</code> (ampersand).</p>
<p>However, because phpBB3 now uses UTF-8 as its sole encoding, we can actually use this to our advantage and not have to remember to escape a straight quote when we don't have to:</p>
<p>The <code>"</code> (straight double quote), <code><</code> (less-than sign) and <code>></code> (greater-than sign) characters can all be used as displayed glyphs or as part of HTML markup, for example:</p>
<p>As for how these charcters are entered depends very much on choice of Operating System, current language locale/keyboard configuration and native abilities of the text editor used to edit phpBB language files. Please see <ahref="http://en.wikipedia.org/wiki/Unicode#Input_methods">http://en.wikipedia.org/wiki/Unicode#Input_methods</a> for more information.</p>
<p>The default language pack bundled with phpBB is <strong>British English</strong> using <ahref="http://www.cambridge.org/">Cambridge University Press</a> spelling and is assigned the language code <code>en</code>. The style and tone of writing tends towards formal and translations <strong>should</strong> emulate this style, at least for the variant using the most compact language code. Less formal translations or those with colloquialisms <strong>must</strong> be denoted as such via either an <code>extension</code> or <code>privateuse</code> tag within its language code.</p>
<p>The version control system for phpBB3 is subversion. The repository is available at <ahref="http://code.phpbb.com/svn/phpbb"title="repository">http://code.phpbb.com/svn/phpbb</a>.</p>
<li><strong>trunk</strong><br/>The latest unstable development version with new features etc. Contains the actual board in <code>/trunk/phpBB</code></li>
<li><strong>branches</strong><br/>Development branches of stable phpBB releases. Copied from <code>/trunk</code> at the time of release.
<ul>
<li><strong>phpBB3.0</strong><code>/branches/phpBB-3_0_0/phpBB</code><br/>Development branch of the stable 3.0 line. Bug fixes are applied here.</li>
<li><strong>phpBB2</strong><code>/branches/phpBB-2_0_0/phpBB</code><br/>Old phpBB2 development branch.</li>
</ul>
</li>
<li><strong>tags</strong><br/>Released versions. Copies of trunk or the respective branch, made at the time of release.
<ul>
<li><code>/tags/release_3_0_BX</code><br/>Beta release X of the 3.0 line.</li>
<li><code>/tags/release_3_0_RCX</code><br/>Release candidate X of the 3.0 line.</li>
<li><code>/tags/release_3_0_X-RCY</code><br/>Release candidate Y of the stable 3.0.X release.</li>
<p>The commit message should contain a brief explanation of all changes made within the commit. Often identical to the changelog entry. A bug ticket can be referenced by specifying the ticket ID with a hash, e.g. #12345. A reference to another revision should simply be prefixed with r, e.g. r12345.</p>
<p>Junior Developers need to have their patches approved by a development team member first. The commit message must end in a line with the following format:</p>
<li>Added <ahref="#translation">6. Translation (<abbrtitle="Internationalisation">i18n</abbr>/<abbrtitle="Localisation">L10n</abbr>) Guidelines</a> section to explain expected format and authoring considerations for language packs that are to be created for phpBB.</li>
<p>This application is opensource software released under the <ahref="http://opensource.org/licenses/gpl-license.php">GPL</a>. Please see source code and the docs directory for more details. This package and its contents are Copyright (c) 2000, 2002, 2005, 2007 <ahref="http://www.phpbb.com/">phpBB Group</a>, All Rights Reserved.</p>