Commit 6fca27995ea2d4da06bab85e5d7911e01618ddda

Authored by Jay Berkenbilt
1 parent cc2e8853

Update casting policy in the documentation

README-maintainer
@@ -119,7 +119,7 @@ RELEASE PREPARATION @@ -119,7 +119,7 @@ RELEASE PREPARATION
119 * If any interfaces were added or changed, check C API to see whether 119 * If any interfaces were added or changed, check C API to see whether
120 changes are appropriate there as well. If necessary, review the 120 changes are appropriate there as well. If necessary, review the
121 casting policy in the manual, and ensure that integer types are 121 casting policy in the manual, and ensure that integer types are
122 - properly handled. 122 + properly handled with QIntC or the appropriate cast.
123 123
124 * Increment shared library version information as needed (`LT_*` in 124 * Increment shared library version information as needed (`LT_*` in
125 `configure.ac`) 125 `configure.ac`)
manual/qpdf-manual.xml
@@ -3340,139 +3340,68 @@ outfile.pdf</option> @@ -3340,139 +3340,68 @@ outfile.pdf</option>
3340 and C++ code. 3340 and C++ code.
3341 </para> 3341 </para>
3342 <para> 3342 <para>
3343 - The casting policy explicitly prohibits casting between integer  
3344 - sizes for no purpose other than to quiet a compiler warning when  
3345 - there is no reasonable chance of a problem resulting. The reason  
3346 - for this exclusion is that the practice of adding these additional  
3347 - casts precludes future use of additional compiler warnings as a  
3348 - tool for making future improvements to this aspect of the code,  
3349 - and it also damages the readability of the code.  
3350 - </para>  
3351 - <para>  
3352 - There are a few significant areas where casting is common in the  
3353 - qpdf sources or where casting would be required to quiet higher  
3354 - levels of compiler warnings but is omitted at present:  
3355 - <itemizedlist>  
3356 - <listitem>  
3357 - <para>  
3358 - <type>char</type> vs. <type>unsigned char</type>. For  
3359 - historical reasons, there are a lot of places in qpdf's  
3360 - internals that deal with <type>unsigned char</type>, which  
3361 - means that a lot of casting is required to interoperate with  
3362 - standard library calls and <type>std::string</type>. In  
3363 - retrospect, qpdf should have probably used regular (signed)  
3364 - <type>char</type> and <type>char*</type> everywhere and just  
3365 - cast to <type>unsigned char</type> when needed, but it's too  
3366 - late to make that change now. There are  
3367 - <function>reinterpret_cast</function> calls to go between  
3368 - <type>char*</type> and <type>unsigned char*</type>, and there  
3369 - are <function>static_cast</function> calls to go between  
3370 - <type>char</type> and <type>unsigned char</type>. These should  
3371 - always be safe.  
3372 - </para>  
3373 - </listitem>  
3374 - <listitem>  
3375 - <para>  
3376 - Non-const <type>unsigned char*</type> used in the  
3377 - <type>Pipeline</type> interface. The pipeline interface has a  
3378 - <function>write</function> call that uses <type>unsigned  
3379 - char*</type> without a <type>const</type> qualifier. The main  
3380 - reason for this is to support pipelines that make calls to  
3381 - third-party libraries, such as zlib, that don't include  
3382 - <type>const</type> in their interfaces. Unfortunately, there  
3383 - are many places in the code where it is desirable to have  
3384 - <type>const char*</type> with pipelines. None of the pipeline  
3385 - implementations in qpdf currently modify the data passed to  
3386 - write, and doing so would be counter to the intent of  
3387 - <type>Pipeline</type>, but there is nothing in the code to  
3388 - prevent this from being done. There are places in the code  
3389 - where <function>const_cast</function> is used to remove the  
3390 - const-ness of pointers going into <type>Pipeline</type>s. This  
3391 - could theoretically be unsafe, but there is adequate testing to  
3392 - assert that it is safe and will remain safe in qpdf's code.  
3393 - </para>  
3394 - </listitem>  
3395 - <listitem>  
3396 - <para>  
3397 - <type>size_t</type> vs. <type>qpdf_offset_t</type>. This is  
3398 - pretty much unavoidable since sizes are unsigned types and  
3399 - offsets are signed types. Whenever it is necessary to seek by  
3400 - an amount given by a <type>size_t</type>, it becomes necessary  
3401 - to mix and match between <type>size_t</type> and  
3402 - <type>qpdf_offset_t</type>. Additionally, qpdf sometimes  
3403 - treats memory buffers like files (as with  
3404 - <type>BufferInputSource</type>, and those seek interfaces have  
3405 - to be consistent with file-based input sources. Neither gcc  
3406 - nor MSVC give warnings for this case by default, but both have  
3407 - warning flags that can enable this. (MSVC:  
3408 - <option>/W14267</option> or <option>/W3</option>, which also  
3409 - enables some additional warnings that we ignore; gcc:  
3410 - <option>-Wconversion -Wsign-conversion</option>). This could  
3411 - matter for files whose sizes are larger than  
3412 - 2<superscript>63</superscript> bytes, but it is reasonable to  
3413 - expect that a world where such files are common would also have  
3414 - larger <type>size_t</type> and <type>qpdf_offset_t</type> types  
3415 - in it. On most 64-bit systems at the time of this writing (the  
3416 - release of version 4.1.0 of qpdf), both <type>size_t</type> and  
3417 - <type>qpdf_offset_t</type> are 64-bit integer types, while on  
3418 - many current 32-bit systems, <type>size_t</type> is a 32-bit  
3419 - type while <type>qpdf_offset_t</type> is a 64-bit type. I am  
3420 - not aware of any cases where 32-bit systems that have  
3421 - <type>size_t</type> smaller than <type>qpdf_offset_t</type>  
3422 - could run into problems. Although I can't conclusively rule  
3423 - out the possibility of such problems existing, I suspect any  
3424 - cases would be pretty contrived. In the event that someone  
3425 - should produce a file that qpdf can't handle because of what is  
3426 - suspected to be issues involving the handling of  
3427 - <type>size_t</type> vs. <type>qpdf_offset_t</type> (such files  
3428 - may behave properly on 64-bit systems but not on 32-bit systems  
3429 - because they have very large embedded files or streams, for  
3430 - example), the above mentioned warning flags could be enabled  
3431 - and all those implicit conversions could be carefully  
3432 - scrutinized. (I have already gone through that exercise once  
3433 - in adding support for files larger than 4&nbsp;GB in size.) I  
3434 - continue to be committed to supporting large files on 32-bit  
3435 - systems, but I would not go to any lengths to support corner  
3436 - cases involving large embedded files or large streams that work  
3437 - on 64-bit systems but not on 32-bit systems because of  
3438 - <type>size_t</type> being too small. It is reasonable to  
3439 - assume that anyone working with such files would be using a  
3440 - 64-bit system anyway since many 32-bit applications would have  
3441 - similar difficulties.  
3442 - </para>  
3443 - </listitem>  
3444 - <listitem>  
3445 - <para>  
3446 - <type>size_t</type> vs. <type>int</type> or <type>long</type>.  
3447 - There are some cases where <type>size_t</type> and  
3448 - <type>int</type> or <type>long</type> or <type>size_t</type>  
3449 - and <type>unsigned int</type> or <type>unsigned long</type> are  
3450 - used interchangeably. These cases occur when working with very  
3451 - small amounts of memory, such as with the bit readers (where  
3452 - we're working with just a few bytes at a time), some cases of  
3453 - <function>strlen</function>, and a few other cases. I have  
3454 - scrutinized all of these cases and determined them to be safe,  
3455 - but there is no mechanism in the code to ensure that new unsafe  
3456 - conversions between <type>int</type> and <type>size_t</type>  
3457 - aren't introduced short of good testing and strong awareness of  
3458 - the issues. Again, if any such bugs are suspected in the  
3459 - future, enabling the additional warning flags and scrutinizing  
3460 - the warnings would be in order.  
3461 - </para>  
3462 - </listitem>  
3463 - </itemizedlist>  
3464 - </para>  
3465 - <para>  
3466 - To be clear, I believe qpdf to be well-behaved with respect to  
3467 - sizes and offsets, and qpdf's test suite includes actual  
3468 - generation and full processing of files larger than 4&nbsp;GB in  
3469 - size. The issues raised here are largely academic and should not  
3470 - in any way be interpreted to mean that qpdf has practical problems  
3471 - involving sloppiness with integer types. I also believe that  
3472 - appropriate measures have been taken in the code to avoid problems  
3473 - with signed vs. unsigned integers from resulting in memory  
3474 - overwrites or other issues with potential security implications,  
3475 - though there are never any absolute guarantees. 3343 + The <classname>QIntC</classname> namespace, provided by
  3344 + <filename>include/qpdf/QIntC.hh</filename>, implements safe
  3345 + functions for converting between integer types. These functions do
  3346 + range checking and throw a <type>std::range_error</type>, which is
  3347 + subclass of <type>std::runtime_error</type>, if conversion from one
  3348 + integer type to another results in loss of information. There are
  3349 + many cases in which we have to move between different integer
  3350 + types because of incompatible integer types used in interoperable
  3351 + interfaces. Some are unavoidable, such as moving between sizes and
  3352 + offsets, and others are there because of old code that is too in
  3353 + entrenched to be fixable without breaking source compatibility and
  3354 + causing pain for users. QPDF is compiled with extra warnings to
  3355 + detect conversions with potential data loss, and all such cases
  3356 + should be fixed by either using a function from
  3357 + <classname>QIntC</classname> or a
  3358 + <function>static_cast</function>.
  3359 + </para>
  3360 + <para>
  3361 + When the intention is just to switch the type because of
  3362 + exchanging data between incompatible interfaces, use
  3363 + <classname>QIntC</classname>. This is the usual case. However,
  3364 + there are some cases in which we are explicitly intending to use
  3365 + the exact same bit pattern with a different type. This is most
  3366 + common when switching between signed and unsigned characters. A
  3367 + lot of qpdf's code uses unsigned characters internally, but
  3368 + <type>std::string</type> and <type>char</type> are signed. Using
  3369 + <function>QIntC::to_char</function> would be wrong for converting
  3370 + from unsigned to signed characters because a negative
  3371 + <type>char</type> value and the corresponding <type>unsigned
  3372 + char</type> value greater than 127 <emphasis>mean the same
  3373 + thing</emphasis>. There are also cases in which we use
  3374 + <function>static_cast</function> when working with bit fields
  3375 + where we are not representing a numerical value but rather a bunch
  3376 + of bits packed together in some integer type. Also note that
  3377 + <type>size_t</type> and <type>long</type> both typically differ
  3378 + between 32-bit and 64-bit environments, so sometimes an explicit
  3379 + cast may not be needed to avoid warnings on one platform but may
  3380 + be needed on another. A conversion with
  3381 + <classname>QIntC</classname> should always be used when the types
  3382 + are different even if the underlying size is the same. QPDF's CI
  3383 + build builds on 32-bit and 64-bit platforms, and the test suite is
  3384 + very thorough, so it is hard to make any of the potential errors
  3385 + here without being caught in build or test.
  3386 + </para>
  3387 + <para>
  3388 + Non-const <type>unsigned char*</type> is used in the
  3389 + <type>Pipeline</type> interface. The pipeline interface has a
  3390 + <function>write</function> call that uses <type>unsigned
  3391 + char*</type> without a <type>const</type> qualifier. The main
  3392 + reason for this is to support pipelines that make calls to
  3393 + third-party libraries, such as zlib, that don't include
  3394 + <type>const</type> in their interfaces. Unfortunately, there are
  3395 + many places in the code where it is desirable to have <type>const
  3396 + char*</type> with pipelines. None of the pipeline implementations
  3397 + in qpdf currently modify the data passed to write, and doing so
  3398 + would be counter to the intent of <type>Pipeline</type>, but there
  3399 + is nothing in the code to prevent this from being done. There are
  3400 + places in the code where <function>const_cast</function> is used
  3401 + to remove the const-ness of pointers going into
  3402 + <type>Pipeline</type>s. This could theoretically be unsafe, but
  3403 + there is adequate testing to assert that it is safe and will
  3404 + remain safe in qpdf's code.
3476 </para> 3405 </para>
3477 </sect1> 3406 </sect1>
3478 <sect1 id="ref.encryption"> 3407 <sect1 id="ref.encryption">