Parsing and handling PDF documents can be expensive in terms of CPU time.
To avoid repeating the same work for a single document, the parser class offers a caching mechanism that reduces this overhead and avoids reparsing the same PDF document several times.
The parser simply saves serialized data in the filesystem and loads it back when needed. This data can be used with any SetaPDF API. For example, if the SetaPDF-Merger API creates the cache data, the SetaPDF-Stamper API can benefit from it.
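The save-and-load idea can be sketched as follows. This is a minimal illustration, not SetaPDF code: the function names `cachePathFor` and `loadOrParse`, the cache key, and the `.cache` file extension are all assumptions made for the example.

```php
<?php
// Hypothetical helpers, not part of SetaPDF: a file-based cache that stores
// serialized parser data under a key derived from the PDF file.

function cachePathFor(string $pdfFile, string $cacheDir): string
{
    // Key on path, size and modification time so a changed file misses the cache.
    $key = md5($pdfFile . '|' . filesize($pdfFile) . '|' . filemtime($pdfFile));
    return rtrim($cacheDir, '/\\') . DIRECTORY_SEPARATOR . $key . '.cache';
}

function loadOrParse(string $pdfFile, string $cacheDir, callable $parse)
{
    $path = cachePathFor($pdfFile, $cacheDir);
    if (is_file($path)) {
        // The cached data is plain arrays here, so no classes are allowed.
        $data = unserialize(file_get_contents($path), ['allowed_classes' => false]);
        if ($data !== false) {
            return $data; // cache hit: the expensive parse is skipped
        }
    }
    $data = $parse($pdfFile); // expensive step, runs only on a cache miss
    file_put_contents($path, serialize($data), LOCK_EX);
    return $data;
}
```

Because the cache file depends only on the PDF file and not on the caller, any code that uses the same cache directory can reuse the data, which mirrors how one SetaPDF API can benefit from data written by another.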
For this reason, the cache mechanism is handled through static methods of the SetaPDF_Parser class. Calls to these methods change static variables in their method contexts, so the changes do not depend on an object instance but apply to all instances of the parser. (Static variables were used for compatibility with PHP4.)
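The effect of a static variable inside a method can be shown with a small sketch. The class `DemoParser` and its method `cacheDir` are hypothetical and are not the SetaPDF_Parser API; they only demonstrate that the stored value is shared by every instance.

```php
<?php
// Hypothetical class, not the SetaPDF_Parser API: a static variable inside
// a method holds one setting for every instance.
class DemoParser
{
    // Passing a value stores it; passing null only reads the current value.
    public static function cacheDir($dir = null)
    {
        static $current = null; // lives in the method, not in any object
        if ($dir !== null) {
            $current = $dir;
        }
        return $current;
    }

    public function describe()
    {
        return 'cache dir: ' . self::cacheDir();
    }
}

DemoParser::cacheDir('/tmp/pdf-cache'); // configure once, before any instance exists
$a = new DemoParser();
$b = new DemoParser();
echo $a->describe(), "\n"; // cache dir: /tmp/pdf-cache
echo $b->describe(), "\n"; // cache dir: /tmp/pdf-cache
```

Both instances report the same directory because neither holds the setting itself.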
There are two parts that the parser can cache: the xref table and the objects it points to.
The xref table is a kind of table of contents of a PDF document. It includes information about all objects in a document and their byte-offset positions in the document. Documents often include several hundred or several thousand entries in that table. Furthermore, a PDF document can include more than one xref table, which results from several updates of the document (incremental updates). All of these tables have to be processed to get the final state of the document. By caching that data, the parser doesn't have to reparse the xref tables from the document.
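Why all tables must be processed can be seen in a simplified model. The sketch below is not SetaPDF code: it assumes each xref table is already available as an array of object number to byte offset, ordered oldest first, and it ignores free entries and generation numbers. The function name `mergeXrefSections` and the sample offsets are made up for the example.

```php
<?php
// Simplified model, not SetaPDF code: each xref section maps
// object number => byte offset; sections are listed oldest first.
function mergeXrefSections(array $sections): array
{
    $final = [];
    foreach ($sections as $section) {
        foreach ($section as $objectNumber => $offset) {
            $final[$objectNumber] = $offset; // newer sections override older entries
        }
    }
    return $final;
}

$original = [1 => 15, 2 => 120, 3 => 480];
$update   = [2 => 910, 4 => 1045]; // object 2 was rewritten, object 4 was added

$merged = mergeXrefSections([$original, $update]);

// A cache would store the serialized result so this walk is skipped next time.
$restored = unserialize(serialize($merged));
print_r($restored);
```

Object 2 ends up at the offset from the update, which is the state a reader of the document has to see.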
Each entry in the xref table described above points to an object representing specific data, such as images, fonts or pages. To read such an object, the parser has to go to the byte-offset position in the document, known from the xref table, and parse the object token by token. This process needs several string comparisons and runs recursively until the object is read completely.
The parser can cache the objects it has read and use the cached versions the next time they are needed. No byte-position change and no string parsing is done; the object is simply unserialized from the cached data.
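The object cache can be pictured like this. It is a hypothetical in-memory sketch, not SetaPDF code: the class `ObjectCache`, its `get` method and the stub parser callback are assumptions, and the real parser stores its data in the filesystem.

```php
<?php
// Hypothetical sketch, not SetaPDF code: cache parsed objects by object number
// so that a repeated read skips seeking and tokenizing.
class ObjectCache
{
    private $serialized = []; // object number => serialized parse result
    public $parseCount = 0;   // counts how often the expensive parse ran

    public function get(int $objectNumber, int $offset, callable $parseAtOffset)
    {
        if (isset($this->serialized[$objectNumber])) {
            // Cache hit: no seek, no tokenizing, only unserialize.
            return unserialize($this->serialized[$objectNumber]);
        }
        $this->parseCount++;
        $object = $parseAtOffset($offset); // seek to the offset and parse token by token
        $this->serialized[$objectNumber] = serialize($object);
        return $object;
    }
}

// Stub standing in for the real token-wise parser.
$parseStub = function (int $offset) {
    return ['type' => 'Page', 'offset' => $offset];
};

$cache  = new ObjectCache();
$first  = $cache->get(3, 480, $parseStub);
$second = $cache->get(3, 480, $parseStub); // served from the cache
```

After both calls the stub has run only once, while the caller receives the same data each time.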
As already mentioned, the cache functionality is handled by static methods of the SetaPDF_Parser class.
You can call these static methods right after including the desired API, such as the SetaPDF-Merger API: