Commit 6fca27995ea2d4da06bab85e5d7911e01618ddda

Authored by Jay Berkenbilt
1 parent cc2e8853

Update casting policy in the documentation

README-maintainer
... ... @@ -119,7 +119,7 @@ RELEASE PREPARATION
119 119 * If any interfaces were added or changed, check C API to see whether
120 120 changes are appropriate there as well. If necessary, review the
121 121 casting policy in the manual, and ensure that integer types are
122   - properly handled.
  122 + properly handled with QIntC or the appropriate cast.
123 123  
124 124 * Increment shared library version information as needed (`LT_*` in
125 125 `configure.ac`)
... ...
manual/qpdf-manual.xml
... ... @@ -3340,139 +3340,68 @@ outfile.pdf</option>
3340 3340 and C++ code.
3341 3341 </para>
3342 3342 <para>
3343   - The casting policy explicitly prohibits casting between integer
3344   - sizes for no purpose other than to quiet a compiler warning when
3345   - there is no reasonable chance of a problem resulting. The reason
3346   - for this exclusion is that the practice of adding these additional
3347   - casts precludes future use of additional compiler warnings as a
3348   - tool for making future improvements to this aspect of the code,
3349   - and it also damages the readability of the code.
3350   - </para>
3351   - <para>
3352   - There are a few significant areas where casting is common in the
3353   - qpdf sources or where casting would be required to quiet higher
3354   - levels of compiler warnings but is omitted at present:
3355   - <itemizedlist>
3356   - <listitem>
3357   - <para>
3358   - <type>char</type> vs. <type>unsigned char</type>. For
3359   - historical reasons, there are a lot of places in qpdf's
3360   - internals that deal with <type>unsigned char</type>, which
3361   - means that a lot of casting is required to interoperate with
3362   - standard library calls and <type>std::string</type>. In
3363   - retrospect, qpdf should have probably used regular (signed)
3364   - <type>char</type> and <type>char*</type> everywhere and just
3365   - cast to <type>unsigned char</type> when needed, but it's too
3366   - late to make that change now. There are
3367   - <function>reinterpret_cast</function> calls to go between
3368   - <type>char*</type> and <type>unsigned char*</type>, and there
3369   - are <function>static_cast</function> calls to go between
3370   - <type>char</type> and <type>unsigned char</type>. These should
3371   - always be safe.
3372   - </para>
3373   - </listitem>
3374   - <listitem>
3375   - <para>
3376   - Non-const <type>unsigned char*</type> used in the
3377   - <type>Pipeline</type> interface. The pipeline interface has a
3378   - <function>write</function> call that uses <type>unsigned
3379   - char*</type> without a <type>const</type> qualifier. The main
3380   - reason for this is to support pipelines that make calls to
3381   - third-party libraries, such as zlib, that don't include
3382   - <type>const</type> in their interfaces. Unfortunately, there
3383   - are many places in the code where it is desirable to have
3384   - <type>const char*</type> with pipelines. None of the pipeline
3385   - implementations in qpdf currently modify the data passed to
3386   - write, and doing so would be counter to the intent of
3387   - <type>Pipeline</type>, but there is nothing in the code to
3388   - prevent this from being done. There are places in the code
3389   - where <function>const_cast</function> is used to remove the
3390   - const-ness of pointers going into <type>Pipeline</type>s. This
3391   - could theoretically be unsafe, but there is adequate testing to
3392   - assert that it is safe and will remain safe in qpdf's code.
3393   - </para>
3394   - </listitem>
3395   - <listitem>
3396   - <para>
3397   - <type>size_t</type> vs. <type>qpdf_offset_t</type>. This is
3398   - pretty much unavoidable since sizes are unsigned types and
3399   - offsets are signed types. Whenever it is necessary to seek by
3400   - an amount given by a <type>size_t</type>, it becomes necessary
3401   - to mix and match between <type>size_t</type> and
3402   - <type>qpdf_offset_t</type>. Additionally, qpdf sometimes
3403   - treats memory buffers like files (as with
3404   - <type>BufferInputSource</type>, and those seek interfaces have
3405   - to be consistent with file-based input sources. Neither gcc
3406   - nor MSVC give warnings for this case by default, but both have
3407   - warning flags that can enable this. (MSVC:
3408   - <option>/W14267</option> or <option>/W3</option>, which also
3409   - enables some additional warnings that we ignore; gcc:
3410   - <option>-Wconversion -Wsign-conversion</option>). This could
3411   - matter for files whose sizes are larger than
3412   - 2<superscript>63</superscript> bytes, but it is reasonable to
3413   - expect that a world where such files are common would also have
3414   - larger <type>size_t</type> and <type>qpdf_offset_t</type> types
3415   - in it. On most 64-bit systems at the time of this writing (the
3416   - release of version 4.1.0 of qpdf), both <type>size_t</type> and
3417   - <type>qpdf_offset_t</type> are 64-bit integer types, while on
3418   - many current 32-bit systems, <type>size_t</type> is a 32-bit
3419   - type while <type>qpdf_offset_t</type> is a 64-bit type. I am
3420   - not aware of any cases where 32-bit systems that have
3421   - <type>size_t</type> smaller than <type>qpdf_offset_t</type>
3422   - could run into problems. Although I can't conclusively rule
3423   - out the possibility of such problems existing, I suspect any
3424   - cases would be pretty contrived. In the event that someone
3425   - should produce a file that qpdf can't handle because of what is
3426   - suspected to be issues involving the handling of
3427   - <type>size_t</type> vs. <type>qpdf_offset_t</type> (such files
3428   - may behave properly on 64-bit systems but not on 32-bit systems
3429   - because they have very large embedded files or streams, for
3430   - example), the above mentioned warning flags could be enabled
3431   - and all those implicit conversions could be carefully
3432   - scrutinized. (I have already gone through that exercise once
3433   - in adding support for files larger than 4&nbsp;GB in size.) I
3434   - continue to be committed to supporting large files on 32-bit
3435   - systems, but I would not go to any lengths to support corner
3436   - cases involving large embedded files or large streams that work
3437   - on 64-bit systems but not on 32-bit systems because of
3438   - <type>size_t</type> being too small. It is reasonable to
3439   - assume that anyone working with such files would be using a
3440   - 64-bit system anyway since many 32-bit applications would have
3441   - similar difficulties.
3442   - </para>
3443   - </listitem>
3444   - <listitem>
3445   - <para>
3446   - <type>size_t</type> vs. <type>int</type> or <type>long</type>.
3447   - There are some cases where <type>size_t</type> and
3448   - <type>int</type> or <type>long</type> or <type>size_t</type>
3449   - and <type>unsigned int</type> or <type>unsigned long</type> are
3450   - used interchangeably. These cases occur when working with very
3451   - small amounts of memory, such as with the bit readers (where
3452   - we're working with just a few bytes at a time), some cases of
3453   - <function>strlen</function>, and a few other cases. I have
3454   - scrutinized all of these cases and determined them to be safe,
3455   - but there is no mechanism in the code to ensure that new unsafe
3456   - conversions between <type>int</type> and <type>size_t</type>
3457   - aren't introduced short of good testing and strong awareness of
3458   - the issues. Again, if any such bugs are suspected in the
3459   - future, enabling the additional warning flags and scrutinizing
3460   - the warnings would be in order.
3461   - </para>
3462   - </listitem>
3463   - </itemizedlist>
3464   - </para>
3465   - <para>
3466   - To be clear, I believe qpdf to be well-behaved with respect to
3467   - sizes and offsets, and qpdf's test suite includes actual
3468   - generation and full processing of files larger than 4&nbsp;GB in
3469   - size. The issues raised here are largely academic and should not
3470   - in any way be interpreted to mean that qpdf has practical problems
3471   - involving sloppiness with integer types. I also believe that
3472   - appropriate measures have been taken in the code to avoid problems
3473   - with signed vs. unsigned integers from resulting in memory
3474   - overwrites or other issues with potential security implications,
3475   - though there are never any absolute guarantees.
  3343 + The <classname>QIntC</classname> namespace, provided by
  3344 + <filename>include/qpdf/QIntC.hh</filename>, implements safe
  3345 + functions for converting between integer types. These functions do
  3346 + range checking and throw a <type>std::range_error</type>, which is
  3347 + subclass of <type>std::runtime_error</type>, if conversion from one
  3348 + integer type to another results in loss of information. There are
  3349 + many cases in which we have to move between different integer
  3350 + types because of incompatible integer types used in interoperable
  3351 + interfaces. Some are unavoidable, such as moving between sizes and
  3352 + offsets, and others are there because of old code that is too in
  3353 + entrenched to be fixable without breaking source compatibility and
  3354 + causing pain for users. QPDF is compiled with extra warnings to
  3355 + detect conversions with potential data loss, and all such cases
  3356 + should be fixed by either using a function from
  3357 + <classname>QIntC</classname> or a
  3358 + <function>static_cast</function>.
  3359 + </para>
  3360 + <para>
  3361 + When the intention is just to switch the type because of
  3362 + exchanging data between incompatible interfaces, use
  3363 + <classname>QIntC</classname>. This is the usual case. However,
  3364 + there are some cases in which we are explicitly intending to use
  3365 + the exact same bit pattern with a different type. This is most
  3366 + common when switching between signed and unsigned characters. A
  3367 + lot of qpdf's code uses unsigned characters internally, but
  3368 + <type>std::string</type> and <type>char</type> are signed. Using
  3369 + <function>QIntC::to_char</function> would be wrong for converting
  3370 + from unsigned to signed characters because a negative
  3371 + <type>char</type> value and the corresponding <type>unsigned
  3372 + char</type> value greater than 127 <emphasis>mean the same
  3373 + thing</emphasis>. There are also cases in which we use
  3374 + <function>static_cast</function> when working with bit fields
  3375 + where we are not representing a numerical value but rather a bunch
  3376 + of bits packed together in some integer type. Also note that
  3377 + <type>size_t</type> and <type>long</type> both typically differ
  3378 + between 32-bit and 64-bit environments, so sometimes an explicit
  3379 + cast may not be needed to avoid warnings on one platform but may
  3380 + be needed on another. A conversion with
  3381 + <classname>QIntC</classname> should always be used when the types
  3382 + are different even if the underlying size is the same. QPDF's CI
  3383 + build builds on 32-bit and 64-bit platforms, and the test suite is
  3384 + very thorough, so it is hard to make any of the potential errors
  3385 + here without being caught in build or test.
  3386 + </para>
  3387 + <para>
  3388 + Non-const <type>unsigned char*</type> is used in the
  3389 + <type>Pipeline</type> interface. The pipeline interface has a
  3390 + <function>write</function> call that uses <type>unsigned
  3391 + char*</type> without a <type>const</type> qualifier. The main
  3392 + reason for this is to support pipelines that make calls to
  3393 + third-party libraries, such as zlib, that don't include
  3394 + <type>const</type> in their interfaces. Unfortunately, there are
  3395 + many places in the code where it is desirable to have <type>const
  3396 + char*</type> with pipelines. None of the pipeline implementations
  3397 + in qpdf currently modify the data passed to write, and doing so
  3398 + would be counter to the intent of <type>Pipeline</type>, but there
  3399 + is nothing in the code to prevent this from being done. There are
  3400 + places in the code where <function>const_cast</function> is used
  3401 + to remove the const-ness of pointers going into
  3402 + <type>Pipeline</type>s. This could theoretically be unsafe, but
  3403 + there is adequate testing to assert that it is safe and will
  3404 + remain safe in qpdf's code.
3476 3405 </para>
3477 3406 </sect1>
3478 3407 <sect1 id="ref.encryption">
... ...