
Use Unicode in string values

The PDF format allows you to use Unicode characters in certain strings that contain information intended to be human-readable. Examples of such strings are document information (e.g. SetaPDF_Merger::setTitle()), bookmark titles/names (e.g. SetaPDF_Merger::addFile()) and form field values (e.g. SetaPDF_FormField::setValue(), if certain other requirements are met).

In PDF, Unicode strings have to be encoded in UTF-16BE (big-endian). The first 2 bytes of the passed string must be 254 followed by 255 (0xFE 0xFF). These 2 bytes represent the Unicode byte order mark (BOM).
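
As a minimal sketch, the BOM can be built from those two byte values and checked at the start of a string. The function name hasUtf16BeBom() is hypothetical and used only for illustration; it is not part of the SetaPDF API.

```php
<?php
// The UTF-16BE byte order mark: byte 254 (0xFE) followed by byte 255 (0xFF).
$bom = chr(254) . chr(255);

// Hypothetical helper (illustration only): true if $str starts with the BOM.
function hasUtf16BeBom(string $str): bool
{
    return strncmp($str, "\xFE\xFF", 2) === 0;
}

echo bin2hex($bom), PHP_EOL; // feff
```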

PHP offers some functions/extensions to convert a string from one character encoding to another, e.g. iconv() from the iconv extension and mb_convert_encoding() from the mbstring extension.

There are also several pure PHP scripts available that can convert e.g. UTF-8 to UTF-16BE.
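
To illustrate what such a pure PHP converter does, here is a minimal sketch (not one of the scripts referred to above) that decodes UTF-8 by hand and emits UTF-16BE, using surrogate pairs for code points above U+FFFF. It assumes valid UTF-8 input and performs no error checking; the function name utf8ToUtf16Be() is hypothetical.

```php
<?php
// Hypothetical pure-PHP converter: UTF-8 in, UTF-16BE out (without BOM).
// Assumes valid UTF-8 input; malformed sequences are not detected.
function utf8ToUtf16Be(string $utf8): string
{
    $out = '';
    $len = strlen($utf8);
    $i = 0;
    while ($i < $len) {
        $b = ord($utf8[$i]);
        if ($b < 0x80) {            // 1 byte:  0xxxxxxx
            $cp = $b;
            $n = 1;
        } elseif ($b < 0xE0) {      // 2 bytes: 110xxxxx 10xxxxxx
            $cp = $b & 0x1F;
            $n = 2;
        } elseif ($b < 0xF0) {      // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            $cp = $b & 0x0F;
            $n = 3;
        } else {                    // 4 bytes: 11110xxx + 3 continuation bytes
            $cp = $b & 0x07;
            $n = 4;
        }
        // Append the 6 payload bits of each continuation byte.
        for ($k = 1; $k < $n; $k++) {
            $cp = ($cp << 6) | (ord($utf8[$i + $k]) & 0x3F);
        }
        $i += $n;

        if ($cp < 0x10000) {
            // Basic Multilingual Plane: one 16-bit code unit, big-endian ('n').
            $out .= pack('n', $cp);
        } else {
            // Supplementary planes: encode as a high/low surrogate pair.
            $cp -= 0x10000;
            $out .= pack('n', 0xD800 | ($cp >> 10));
            $out .= pack('n', 0xDC00 | ($cp & 0x3FF));
        }
    }
    return $out;
}

echo bin2hex(utf8ToUtf16Be("A\u{20AC}")), PHP_EOL; // 004120ac
```

In practice the iconv or mbstring functions are preferable when available; a hand-written converter like this is mainly useful where neither extension is installed.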

The following examples always use UTF-8 as the input encoding. To use a different input encoding, just change the in_charset parameter of iconv() or the from_encoding parameter of mb_convert_encoding().
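
For example, if the input were ISO-8859-1 (Latin-1) instead of UTF-8, only the source-encoding argument changes. This sketch assumes the iconv and mbstring extensions are both installed; the sample string is illustrative.

```php
<?php
// "Müller" encoded in ISO-8859-1: ü is the single byte 0xFC.
$latin1 = "M\xFCller";

// iconv(in_charset, out_charset, string): the FIRST argument is the input encoding.
$a = iconv('ISO-8859-1', 'UTF-16BE', $latin1);

// mb_convert_encoding(string, to_encoding, from_encoding): the THIRD argument is the input encoding.
$b = mb_convert_encoding($latin1, 'UTF-16BE', 'ISO-8859-1');

echo bin2hex($a), PHP_EOL; // 004d00fc006c006c00650072
```

Note that the two functions take the source and target encodings in a different order, which is an easy place to make mistakes when switching between them.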

The general process is the same with both functions: convert the string to UTF-16BE, then prepend the BOM (chr(254) . chr(255)).
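
A minimal sketch of those two steps with each function, assuming a UTF-8 input string and that both the iconv and mbstring extensions are available:

```php
<?php
// UTF-8 source string "Grüße €" (written with escapes to avoid file-encoding issues).
$utf8 = "Gr\u{00FC}\u{00DF}e \u{20AC}";

// UTF-16BE byte order mark: bytes 254 and 255.
$bom = chr(254) . chr(255);

// Variant 1: iconv extension, then prepend the BOM.
$viaIconv = $bom . iconv('UTF-8', 'UTF-16BE', $utf8);

// Variant 2: mbstring extension, then prepend the BOM.
$viaMb = $bom . mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');

// Both variants produce the same bytes.
var_dump($viaIconv === $viaMb); // bool(true)
```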

Here's an example with the SetaPDF-Merger API which sets some document information and bookmark titles in Unicode:

The resulting document looks like this:

Of course, you can write a helper function or method that does the conversion and prepends the BOM for you. We didn't do so here, in order to show each step explicitly.
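
One possible shape for such a helper, as a sketch only: the name toPdfUnicode() and its parameters are hypothetical, and it assumes the iconv extension is available.

```php
<?php
// Hypothetical helper: converts $str to a PDF Unicode string
// (UTF-16BE with a leading BOM). The input encoding defaults to UTF-8.
function toPdfUnicode(string $str, string $fromEncoding = 'UTF-8'): string
{
    $converted = iconv($fromEncoding, 'UTF-16BE', $str);
    if ($converted === false) {
        // iconv() returns false on invalid input for the given encoding.
        throw new InvalidArgumentException('Cannot convert string from ' . $fromEncoding);
    }
    // Prepend the byte order mark: bytes 254 and 255.
    return chr(254) . chr(255) . $converted;
}

echo bin2hex(toPdfUnicode('A')), PHP_EOL; // feff0041
```

The returned string is what would be passed to methods such as SetaPDF_Merger::setTitle(); their exact parameter lists are not reproduced here.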

Further information about "Text Strings" and Unicode can be found in section 3.8.1 of the PDF Reference.