Commit 6fca27995ea2d4da06bab85e5d7911e01618ddda
1 parent
cc2e8853
Update casting policy in the documentation
Showing
2 changed files
with
63 additions
and
134 deletions
README-maintainer
| ... | ... | @@ -119,7 +119,7 @@ RELEASE PREPARATION |
| 119 | 119 | * If any interfaces were added or changed, check C API to see whether |
| 120 | 120 | changes are appropriate there as well. If necessary, review the |
| 121 | 121 | casting policy in the manual, and ensure that integer types are |
| 122 | - properly handled. | |
| 122 | + properly handled with QIntC or the appropriate cast. | |
| 123 | 123 | |
| 124 | 124 | * Increment shared library version information as needed (`LT_*` in |
| 125 | 125 | `configure.ac`) | ... | ... |
manual/qpdf-manual.xml
| ... | ... | @@ -3340,139 +3340,68 @@ outfile.pdf</option> |
| 3340 | 3340 | and C++ code. |
| 3341 | 3341 | </para> |
| 3342 | 3342 | <para> |
| 3343 | - The casting policy explicitly prohibits casting between integer | |
| 3344 | - sizes for no purpose other than to quiet a compiler warning when | |
| 3345 | - there is no reasonable chance of a problem resulting. The reason | |
| 3346 | - for this exclusion is that the practice of adding these additional | |
| 3347 | - casts precludes future use of additional compiler warnings as a | |
| 3348 | - tool for making future improvements to this aspect of the code, | |
| 3349 | - and it also damages the readability of the code. | |
| 3350 | - </para> | |
| 3351 | - <para> | |
| 3352 | - There are a few significant areas where casting is common in the | |
| 3353 | - qpdf sources or where casting would be required to quiet higher | |
| 3354 | - levels of compiler warnings but is omitted at present: | |
| 3355 | - <itemizedlist> | |
| 3356 | - <listitem> | |
| 3357 | - <para> | |
| 3358 | - <type>char</type> vs. <type>unsigned char</type>. For | |
| 3359 | - historical reasons, there are a lot of places in qpdf's | |
| 3360 | - internals that deal with <type>unsigned char</type>, which | |
| 3361 | - means that a lot of casting is required to interoperate with | |
| 3362 | - standard library calls and <type>std::string</type>. In | |
| 3363 | - retrospect, qpdf should have probably used regular (signed) | |
| 3364 | - <type>char</type> and <type>char*</type> everywhere and just | |
| 3365 | - cast to <type>unsigned char</type> when needed, but it's too | |
| 3366 | - late to make that change now. There are | |
| 3367 | - <function>reinterpret_cast</function> calls to go between | |
| 3368 | - <type>char*</type> and <type>unsigned char*</type>, and there | |
| 3369 | - are <function>static_cast</function> calls to go between | |
| 3370 | - <type>char</type> and <type>unsigned char</type>. These should | |
| 3371 | - always be safe. | |
| 3372 | - </para> | |
| 3373 | - </listitem> | |
| 3374 | - <listitem> | |
| 3375 | - <para> | |
| 3376 | - Non-const <type>unsigned char*</type> used in the | |
| 3377 | - <type>Pipeline</type> interface. The pipeline interface has a | |
| 3378 | - <function>write</function> call that uses <type>unsigned | |
| 3379 | - char*</type> without a <type>const</type> qualifier. The main | |
| 3380 | - reason for this is to support pipelines that make calls to | |
| 3381 | - third-party libraries, such as zlib, that don't include | |
| 3382 | - <type>const</type> in their interfaces. Unfortunately, there | |
| 3383 | - are many places in the code where it is desirable to have | |
| 3384 | - <type>const char*</type> with pipelines. None of the pipeline | |
| 3385 | - implementations in qpdf currently modify the data passed to | |
| 3386 | - write, and doing so would be counter to the intent of | |
| 3387 | - <type>Pipeline</type>, but there is nothing in the code to | |
| 3388 | - prevent this from being done. There are places in the code | |
| 3389 | - where <function>const_cast</function> is used to remove the | |
| 3390 | - const-ness of pointers going into <type>Pipeline</type>s. This | |
| 3391 | - could theoretically be unsafe, but there is adequate testing to | |
| 3392 | - assert that it is safe and will remain safe in qpdf's code. | |
| 3393 | - </para> | |
| 3394 | - </listitem> | |
| 3395 | - <listitem> | |
| 3396 | - <para> | |
| 3397 | - <type>size_t</type> vs. <type>qpdf_offset_t</type>. This is | |
| 3398 | - pretty much unavoidable since sizes are unsigned types and | |
| 3399 | - offsets are signed types. Whenever it is necessary to seek by | |
| 3400 | - an amount given by a <type>size_t</type>, it becomes necessary | |
| 3401 | - to mix and match between <type>size_t</type> and | |
| 3402 | - <type>qpdf_offset_t</type>. Additionally, qpdf sometimes | |
| 3403 | - treats memory buffers like files (as with | |
| 3404 | - <type>BufferInputSource</type>, and those seek interfaces have | |
| 3405 | - to be consistent with file-based input sources. Neither gcc | |
| 3406 | - nor MSVC give warnings for this case by default, but both have | |
| 3407 | - warning flags that can enable this. (MSVC: | |
| 3408 | - <option>/W14267</option> or <option>/W3</option>, which also | |
| 3409 | - enables some additional warnings that we ignore; gcc: | |
| 3410 | - <option>-Wconversion -Wsign-conversion</option>). This could | |
| 3411 | - matter for files whose sizes are larger than | |
| 3412 | - 2<superscript>63</superscript> bytes, but it is reasonable to | |
| 3413 | - expect that a world where such files are common would also have | |
| 3414 | - larger <type>size_t</type> and <type>qpdf_offset_t</type> types | |
| 3415 | - in it. On most 64-bit systems at the time of this writing (the | |
| 3416 | - release of version 4.1.0 of qpdf), both <type>size_t</type> and | |
| 3417 | - <type>qpdf_offset_t</type> are 64-bit integer types, while on | |
| 3418 | - many current 32-bit systems, <type>size_t</type> is a 32-bit | |
| 3419 | - type while <type>qpdf_offset_t</type> is a 64-bit type. I am | |
| 3420 | - not aware of any cases where 32-bit systems that have | |
| 3421 | - <type>size_t</type> smaller than <type>qpdf_offset_t</type> | |
| 3422 | - could run into problems. Although I can't conclusively rule | |
| 3423 | - out the possibility of such problems existing, I suspect any | |
| 3424 | - cases would be pretty contrived. In the event that someone | |
| 3425 | - should produce a file that qpdf can't handle because of what is | |
| 3426 | - suspected to be issues involving the handling of | |
| 3427 | - <type>size_t</type> vs. <type>qpdf_offset_t</type> (such files | |
| 3428 | - may behave properly on 64-bit systems but not on 32-bit systems | |
| 3429 | - because they have very large embedded files or streams, for | |
| 3430 | - example), the above mentioned warning flags could be enabled | |
| 3431 | - and all those implicit conversions could be carefully | |
| 3432 | - scrutinized. (I have already gone through that exercise once | |
| 3433 | - in adding support for files larger than 4 GB in size.) I | |
| 3434 | - continue to be committed to supporting large files on 32-bit | |
| 3435 | - systems, but I would not go to any lengths to support corner | |
| 3436 | - cases involving large embedded files or large streams that work | |
| 3437 | - on 64-bit systems but not on 32-bit systems because of | |
| 3438 | - <type>size_t</type> being too small. It is reasonable to | |
| 3439 | - assume that anyone working with such files would be using a | |
| 3440 | - 64-bit system anyway since many 32-bit applications would have | |
| 3441 | - similar difficulties. | |
| 3442 | - </para> | |
| 3443 | - </listitem> | |
| 3444 | - <listitem> | |
| 3445 | - <para> | |
| 3446 | - <type>size_t</type> vs. <type>int</type> or <type>long</type>. | |
| 3447 | - There are some cases where <type>size_t</type> and | |
| 3448 | - <type>int</type> or <type>long</type> or <type>size_t</type> | |
| 3449 | - and <type>unsigned int</type> or <type>unsigned long</type> are | |
| 3450 | - used interchangeably. These cases occur when working with very | |
| 3451 | - small amounts of memory, such as with the bit readers (where | |
| 3452 | - we're working with just a few bytes at a time), some cases of | |
| 3453 | - <function>strlen</function>, and a few other cases. I have | |
| 3454 | - scrutinized all of these cases and determined them to be safe, | |
| 3455 | - but there is no mechanism in the code to ensure that new unsafe | |
| 3456 | - conversions between <type>int</type> and <type>size_t</type> | |
| 3457 | - aren't introduced short of good testing and strong awareness of | |
| 3458 | - the issues. Again, if any such bugs are suspected in the | |
| 3459 | - future, enabling the additional warning flags and scrutinizing | |
| 3460 | - the warnings would be in order. | |
| 3461 | - </para> | |
| 3462 | - </listitem> | |
| 3463 | - </itemizedlist> | |
| 3464 | - </para> | |
| 3465 | - <para> | |
| 3466 | - To be clear, I believe qpdf to be well-behaved with respect to | |
| 3467 | - sizes and offsets, and qpdf's test suite includes actual | |
| 3468 | - generation and full processing of files larger than 4 GB in | |
| 3469 | - size. The issues raised here are largely academic and should not | |
| 3470 | - in any way be interpreted to mean that qpdf has practical problems | |
| 3471 | - involving sloppiness with integer types. I also believe that | |
| 3472 | - appropriate measures have been taken in the code to avoid problems | |
| 3473 | - with signed vs. unsigned integers from resulting in memory | |
| 3474 | - overwrites or other issues with potential security implications, | |
| 3475 | - though there are never any absolute guarantees. | |
| 3343 | + The <classname>QIntC</classname> namespace, provided by | |
| 3344 | + <filename>include/qpdf/QIntC.hh</filename>, implements safe | |
| 3345 | + functions for converting between integer types. These functions do | |
| 3346 | + range checking and throw a <type>std::range_error</type>, which is | |
| 3347 | + subclass of <type>std::runtime_error</type>, if conversion from one | |
| 3348 | + integer type to another results in loss of information. There are | |
| 3349 | + many cases in which we have to move between different integer | |
| 3350 | + types because of incompatible integer types used in interoperable | |
| 3351 | + interfaces. Some are unavoidable, such as moving between sizes and | |
| 3352 | + offsets, and others are there because of old code that is too in | |
| 3353 | + entrenched to be fixable without breaking source compatibility and | |
| 3354 | + causing pain for users. QPDF is compiled with extra warnings to | |
| 3355 | + detect conversions with potential data loss, and all such cases | |
| 3356 | + should be fixed by either using a function from | |
| 3357 | + <classname>QIntC</classname> or a | |
| 3358 | + <function>static_cast</function>. | |
| 3359 | + </para> | |
| 3360 | + <para> | |
| 3361 | + When the intention is just to switch the type because of | |
| 3362 | + exchanging data between incompatible interfaces, use | |
| 3363 | + <classname>QIntC</classname>. This is the usual case. However, | |
| 3364 | + there are some cases in which we are explicitly intending to use | |
| 3365 | + the exact same bit pattern with a different type. This is most | |
| 3366 | + common when switching between signed and unsigned characters. A | |
| 3367 | + lot of qpdf's code uses unsigned characters internally, but | |
| 3368 | + <type>std::string</type> and <type>char</type> are signed. Using | |
| 3369 | + <function>QIntC::to_char</function> would be wrong for converting | |
| 3370 | + from unsigned to signed characters because a negative | |
| 3371 | + <type>char</type> value and the corresponding <type>unsigned | |
| 3372 | + char</type> value greater than 127 <emphasis>mean the same | |
| 3373 | + thing</emphasis>. There are also cases in which we use | |
| 3374 | + <function>static_cast</function> when working with bit fields | |
| 3375 | + where we are not representing a numerical value but rather a bunch | |
| 3376 | + of bits packed together in some integer type. Also note that | |
| 3377 | + <type>size_t</type> and <type>long</type> both typically differ | |
| 3378 | + between 32-bit and 64-bit environments, so sometimes an explicit | |
| 3379 | + cast may not be needed to avoid warnings on one platform but may | |
| 3380 | + be needed on another. A conversion with | |
| 3381 | + <classname>QIntC</classname> should always be used when the types | |
| 3382 | + are different even if the underlying size is the same. QPDF's CI | |
| 3383 | + build builds on 32-bit and 64-bit platforms, and the test suite is | |
| 3384 | + very thorough, so it is hard to make any of the potential errors | |
| 3385 | + here without being caught in build or test. | |
| 3386 | + </para> | |
| 3387 | + <para> | |
| 3388 | + Non-const <type>unsigned char*</type> is used in the | |
| 3389 | + <type>Pipeline</type> interface. The pipeline interface has a | |
| 3390 | + <function>write</function> call that uses <type>unsigned | |
| 3391 | + char*</type> without a <type>const</type> qualifier. The main | |
| 3392 | + reason for this is to support pipelines that make calls to | |
| 3393 | + third-party libraries, such as zlib, that don't include | |
| 3394 | + <type>const</type> in their interfaces. Unfortunately, there are | |
| 3395 | + many places in the code where it is desirable to have <type>const | |
| 3396 | + char*</type> with pipelines. None of the pipeline implementations | |
| 3397 | + in qpdf currently modify the data passed to write, and doing so | |
| 3398 | + would be counter to the intent of <type>Pipeline</type>, but there | |
| 3399 | + is nothing in the code to prevent this from being done. There are | |
| 3400 | + places in the code where <function>const_cast</function> is used | |
| 3401 | + to remove the const-ness of pointers going into | |
| 3402 | + <type>Pipeline</type>s. This could theoretically be unsafe, but | |
| 3403 | + there is adequate testing to assert that it is safe and will | |
| 3404 | + remain safe in qpdf's code. | |
| 3476 | 3405 | </para> |
| 3477 | 3406 | </sect1> |
| 3478 | 3407 | <sect1 id="ref.encryption"> | ... | ... |