Commit e429a2e17053d16efc5b9bcb61c22221e5075765

Authored by Jay Berkenbilt
1 parent 30380b64

Describe content normalization edge cases in manual

Showing 1 changed file with 34 additions and 1 deletions
manual/qpdf-manual.xml
... ... @@ -1050,7 +1050,10 @@ outfile.pdf</option>
1050 1050 <term><option>--normalize-content=[yn]</option></term>
1051 1051 <listitem>
1052 1052 <para>
1053   - Enables or disables normalization of content streams.
  1053 + Enables or disables normalization of content streams. Content
  1054 + normalization is enabled by default in QDF mode. Please see
  1055 + <xref linkend="ref.qdf"/> for additional discussion of QDF
  1056 + mode.
1054 1057 </para>
1055 1058 </listitem>
1056 1059 </varlistentry>
... ... @@ -1206,6 +1209,36 @@ outfile.pdf</option>
1206 1209 You should not use this for &ldquo;production&rdquo; PDF files.
1207 1210 </para>
1208 1211 <para>
  1212 + This paragraph discusses edge cases of content normalization that
  1213 + are not of concern to most users and are not relevant when content
  1214 + normalization is not enabled. When normalizing content, if qpdf
  1215 + runs into any lexical errors, it will print a warning indicating
  1216 + that content may be damaged. The only situation in which qpdf is
  1217 + known to cause damage during content normalization is when a
  1218 + page's contents are split across multiple streams and streams are
  1219 + split in the middle of a lexical token such as a string, name, or
  1220 + inline image. There may be some pathological cases in which qpdf
  1221 + could damage content without noticing this, such as if the partial
  1222 + tokens at the end of one stream and the beginning of the next
  1223 + stream are both valid, but usually qpdf will be able to detect
  1224 + this case. For slightly increased safety, you can specify
  1225 + <option>--coalesce-contents</option> in addition to
  1226 + <option>--normalize-content</option> or <option>--qdf</option>.
  1227 + This will cause qpdf to combine all the content streams into one,
  1228 + thus recombining any split tokens. However, doing this will prevent
  1229 + you from being able to see the original layout of the content
  1230 + streams. If you must inspect the original content streams in an
  1231 + uncompressed format, you can always run with <option>--qdf
  1232 + --normalize-content=n</option> for a QDF file without content
  1233 + normalization, or alternatively
  1234 + <option>--stream-data=uncompress</option> for a regular non-QDF
  1235 + mode file with uncompressed streams. These will both uncompress
  1236 + all the streams but will not attempt to normalize content. Please
  1237 + note that if you are using content normalization or QDF mode for
  1238 + the purpose of manually inspecting files, you don't have to care
  1239 + about this.
  1240 + </para>
  1241 + <para>
1209 1242 Object streams, also known as compressed objects, were introduced
1210 1243 into the PDF specification at version 1.5, corresponding to
1211 1244 Acrobat 6. Some older PDF viewers may not support files with
... ...
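
The option combinations named in the added paragraph can be sketched as a small helper. This is not part of the commit: the function name `qpdf_command`, the mode labels, and the file names `in.pdf` and `out.pdf` are hypothetical, and the sketch assumes the usual `qpdf [options] infile outfile` argument order. It only builds the command lines and does not run qpdf.

```python
import shlex


def qpdf_command(infile, outfile, mode):
    """Build an argv list for one of the invocations described in the manual text.

    The mode labels are illustrative names, not qpdf terminology.
    """
    options = {
        # QDF mode (content normalization on by default), with all of a
        # page's content streams combined so split tokens are rejoined.
        "qdf-coalesced": ["--qdf", "--coalesce-contents"],
        # QDF mode with content normalization turned off, preserving the
        # original layout of the content streams.
        "qdf-raw": ["--qdf", "--normalize-content=n"],
        # Regular non-QDF output with uncompressed streams.
        "uncompressed": ["--stream-data=uncompress"],
    }
    return ["qpdf", *options[mode], infile, outfile]


if __name__ == "__main__":
    for mode in ("qdf-coalesced", "qdf-raw", "uncompressed"):
        # shlex.join quotes arguments so the line can be pasted into a shell.
        print(shlex.join(qpdf_command("in.pdf", "out.pdf", mode)))
```

Returning an argv list rather than a single string means the result could be passed directly to `subprocess.run` without shell quoting concerns, if running qpdf is actually wanted.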