Commit e429a2e17053d16efc5b9bcb61c22221e5075765
1 parent 30380b64
Describe content normalization edge cases in manual
Showing 1 changed file with 34 additions and 1 deletion
manual/qpdf-manual.xml
| @@ -1050,7 +1050,10 @@ outfile.pdf</option> | @@ -1050,7 +1050,10 @@ outfile.pdf</option> | ||
| 1050 | <term><option>--normalize-content=[yn]</option></term> | 1050 | <term><option>--normalize-content=[yn]</option></term> |
| 1051 | <listitem> | 1051 | <listitem> |
| 1052 | <para> | 1052 | <para> |
| 1053 | - Enables or disables normalization of content streams. | 1053 | + Enables or disables normalization of content streams. Content |
| 1054 | + normalization is enabled by default in QDF mode. Please see | ||
| 1055 | + <xref linkend="ref.qdf"/> for additional discussion of QDF | ||
| 1056 | + mode. | ||
| 1054 | </para> | 1057 | </para> |
| 1055 | </listitem> | 1058 | </listitem> |
| 1056 | </varlistentry> | 1059 | </varlistentry> |
| @@ -1206,6 +1209,36 @@ outfile.pdf</option> | @@ -1206,6 +1209,36 @@ outfile.pdf</option> | ||
| 1206 | You should not use this for “production” PDF files. | 1209 | You should not use this for “production” PDF files. |
| 1207 | </para> | 1210 | </para> |
| 1208 | <para> | 1211 | <para> |
| 1212 | + This paragraph discusses edge cases of content normalization that | ||
| 1213 | + are not of concern to most users and are not relevant when content | ||
| 1214 | + normalization is not enabled. When normalizing content, if qpdf | ||
| 1215 | + runs into any lexical errors, it will print a warning indicating | ||
| 1216 | + that content may be damaged. The only situation in which qpdf is | ||
| 1217 | + known to cause damage during content normalization is when a | ||
| 1218 | + page's contents are split across multiple streams and streams are | ||
| 1219 | + split in the middle of a lexical token such as a string, name, or | ||
| 1220 | + inline image. There may be some pathological cases in which qpdf | ||
| 1221 | + could damage content without noticing this, such as if the partial | ||
| 1222 | + tokens at the end of one stream and the beginning of the next | ||
| 1223 | + stream are both valid, but usually qpdf will be able to detect | ||
| 1224 | + this case. For slightly increased safety, you can specify | ||
| 1225 | + <option>--coalesce-contents</option> in addition to | ||
| 1226 | + <option>--normalize-content</option> or <option>--qdf</option>. | ||
| 1227 | + This will cause qpdf to combine all the content streams into one, | ||
| 1228 | + thus recombining any split tokens. However, doing this will prevent | ||
| 1229 | + you from being able to see the original layout of the content | ||
| 1230 | + streams. If you must inspect the original content streams in an | ||
| 1231 | + uncompressed format, you can always run with <option>--qdf | ||
| 1232 | + --normalize-content=n</option> for a QDF file without content | ||
| 1233 | + normalization, or alternatively | ||
| 1234 | + <option>--stream-data=uncompress</option> for a regular non-QDF | ||
| 1235 | + mode file with uncompressed streams. These will both uncompress | ||
| 1236 | + all the streams but will not attempt to normalize content. Please | ||
| 1237 | + note that if you are using content normalization or QDF mode for | ||
| 1238 | + the purpose of manually inspecting files, you don't have to care | ||
| 1239 | + about this. | ||
| 1240 | + </para> | ||
| 1241 | + <para> | ||
| 1209 | Object streams, also known as compressed objects, were introduced | 1242 | Object streams, also known as compressed objects, were introduced |
| 1210 | into the PDF specification at version 1.5, corresponding to | 1243 | into the PDF specification at version 1.5, corresponding to |
| 1211 | Acrobat 6. Some older PDF viewers may not support files with | 1244 | Acrobat 6. Some older PDF viewers may not support files with |
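
The split-token edge case described in the added paragraph can be illustrated with a small sketch. This is a toy check, not qpdf's lexer: the function name `has_unterminated_string`, the sample content, and the use of plain byte concatenation to stand in for combining streams are all assumptions made for illustration. It only looks at PDF literal strings in parentheses and ignores names, inline images, and every other token type.

```python
def has_unterminated_string(stream: bytes) -> bool:
    """Toy check: return True if a literal string '(' ... ')' is still open
    when the stream ends. This is NOT how qpdf tokenizes content."""
    depth = 0
    i = 0
    while i < len(stream):
        c = stream[i:i + 1]
        if c == b"\\" and depth > 0:
            i += 2  # skip the escaped character inside a string
            continue
        if c == b"(":
            depth += 1
        elif c == b")" and depth > 0:
            depth -= 1
        i += 1
    return depth > 0


# Hypothetical page content split across two streams in the middle of a string.
part1 = b"BT /F1 12 Tf (Hello, "
part2 = b"world) Tj ET"

# Examined on its own, the first stream ends inside a string token.
print(has_unterminated_string(part1))

# Examined as one combined stream, the string is complete again.
print(has_unterminated_string(part1 + part2))
```

Looking at each stream in isolation leaves the first one ending inside a token, while the combined bytes form a complete string. That is the kind of situation the paragraph says combining the content streams is meant to guard against.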