Commit e8ddac89501e232205e1737a07ddb7d1c2425e4b
1 parent
1ec1b128
Document casting policy
Showing
4 changed files
with
174 additions
and
102 deletions
ChangeLog
README.maintainer
| ... | ... | @@ -31,7 +31,9 @@ Release Reminders |
| 31 | 31 | * Check all open issues in the sourceforge trackers and on github. |
| 32 | 32 | |
| 33 | 33 | * If any interfaces were added or changed, check C API to see whether |
| 34 | - changes are appropriate there as well. | |
| 34 | + changes are appropriate there as well. If necessary, review the | |
| 35 | + casting policy in the manual, and ensure that integer types are | |
| 36 | + properly handled. | |
| 35 | 37 | |
| 36 | 38 | * Increment shared library version information as needed |
| 37 | 39 | (libqpdf/build.mk) | ... | ... |
TODO
| 1 | +4.1.0 | |
| 2 | +===== | |
| 3 | + | |
| 4 | + * New public interfaces have been added. | |
| 5 | + | |
| 6 | + | |
| 1 | 7 | 4.2.0 |
| 2 | 8 | ===== |
| 3 | 9 | |
| ... | ... | @@ -38,107 +44,6 @@ |
| 38 | 44 | - See ../misc/broken-files |
| 39 | 45 | |
| 40 | 46 | |
| 41 | -4.1.0 | |
| 42 | -===== | |
| 43 | - | |
| 44 | - * Add to documentation, and mention this documentation in | |
| 45 | - README.maintainer: | |
| 46 | - | |
| 47 | - Casting policy. | |
| 48 | - | |
| 49 | - The C++ code in qpdf is free of old-style casts except where | |
| 50 | - unavoidable (e.g. where the old-style cast is in a macro provided | |
| 51 | - by a third-party header file). When there is a need for a cast, it | |
| 52 | - is handled, in order of preference by rewriting the code to avoid | |
| 53 | - the need for a cast, calling const_cast, calling static_cast, | |
| 54 | - calling reinterpret_cast, or calling some combination of the above. | |
| 55 | - The casting policy explicitly prohibits casting between sizes for | |
| 56 | - no purpose other than to quiet a compiler warning when there is no | |
| 57 | - reasonable chance of a problem resulting. The reason for this | |
| 58 | - exclusion is that it takes away enabling additional compiler | |
| 59 | - warnings as a tool for making future improvements to this aspect of | |
| 60 | - the code and also damages the readability of the code. As a last | |
| 61 | - resort, a compiler-specific pragma may be used to suppress a | |
| 62 | - warning that we don't want to fix. Examples may include | |
| 63 | - suppressing warnings about the use of old-style casts in code that | |
| 64 | - is shared between C and C++ code. | |
| 65 | - | |
| 66 | - There are a few significant areas where casting is common in the qpdf | |
| 67 | - sources or where casting would be required to quiet higher levels | |
| 68 | - of compiler warnings but is omitted at present: | |
| 69 | - | |
| 70 | - * signed vs. unsigned char. For historical reasons, there are a | |
| 71 | - lot of places in qpdf's internals that deal with unsigned char, | |
| 72 | - which means that a lot of casting is required to interoperate | |
| 73 | - with standard library calls and std::string. In retrospect, | |
| 74 | - qpdf should have probably used signed char everywhere and just | |
| 75 | - cast to unsigned char when needed. There are reinterpret_cast | |
| 76 | - calls to go between char* and unsigned char*, and there are | |
| 77 | - static_cast calls to go between char and unsigned char. These | |
| 78 | - should always be safe. | |
| 79 | - | |
| 80 | - * non-const unsigned char* used in Pipeline interface. The | |
| 81 | - pipeline interface has a write() call that uses unsigned char* | |
| 82 | - without a const qualifier. The main reason for this is to | |
| 83 | - support pipelines that make calls to third-party libraries, such | |
| 84 | - as zlib, that don't include const in their interfaces. | |
| 85 | - Unfortunately, there are many places in the code where it is | |
| 86 | - desirable to have const char* with pipelines. None of the | |
| 87 | - pipeline implementations in qpdf currently modify the data | |
| 88 | - passed to write, and doing so would be counter to the intent of | |
| 89 | - Pipeline. There are places in the code where const_cast is used | |
| 90 | - to remove the const-ness of pointers going into Pipelines. This | |
| 91 | - could be potentially unsafe, but there is adequate testing to | |
| 92 | - assert that it is safe in qpdf's code. | |
| 93 | - | |
| 94 | - * size_t vs. qpdf_offset_t. This is pretty much unavoidable since | |
| 95 | - offsets are signed types and sizes are unsigned types. Whenever | |
| 96 | - it is necessary to seek by an amount given by a size_t, it | |
| 97 | - becomes necessary to mix and match between size_t and | |
| 98 | - qpdf_offset_t. Additionally, qpdf sometimes treats memory | |
| 99 | - buffers like files, and those seek interfaces have to be | |
| 100 | - consistent with file-based input sources. Neither gcc nor MSVC | |
| 101 | - give warnings for this case by default, but both have warning | |
| 102 | - flags that can enable this. (MSVC: /W14267 or /W3 (which also | |
| 103 | - enables some additional warnings that we ignore); gcc: | |
| 104 | - -Wconversion -Wsign-conversion). This could matter for files | |
| 105 | - whose sizes are larger than 2^63 bytes, but it is reasonable to | |
| 106 | - expect that a world where such files are common would also have | |
| 107 | - larger size_t and qpdf_offset_t types in it. I am not aware of | |
| 108 | - any cases where 32-bit systems that have size_t smaller than | |
| 109 | - qpdf_offset_t could run into problems, though I can't | |
| 110 | - conclusively rule out the possibility. In the event that | |
| 111 | - someone should produce a file that qpdf can't handle because of | |
| 112 | - what is suspected to be issues involving the handling of size_t | |
| 113 | - vs. qpdf_offset_t (such files may behave properly on 64-bit | |
| 114 | - systems but not on 32-bit systems and may have very large | |
| 115 | - embedded files or streams, for example), the above mentioned | |
| 116 | - warning flags could be enabled and all those implicit | |
| 117 | - conversions could be carefully scrutinized. (I have already | |
| 118 | - gone through that exercise once in adding support for files > | |
| 119 | - 4GB in size.) I continue to be commited to supporting large | |
| 120 | - files on 32-bit systems, but I would not go to any lengths to | |
| 121 | - support corner cases involving large embedded files or large | |
| 122 | - streams that work on 64-bit systems but not on 32-bit systems | |
| 123 | - because of size_t being too small. It is reasonable to assume | |
| 124 | - that anyone working with such files would be using a 64-bit | |
| 125 | - system anyway. | |
| 126 | - | |
| 127 | - * size_t vs. int. There are some cases where size_t and int or | |
| 128 | - size_t and unsigned int are used interchangeably. These cases | |
| 129 | - occur when working with very small amounts of memory, such as | |
| 130 | - with the bit readers (where we're working with just a few bytes | |
| 131 | - at a time), some cases of strlen, and a few other cases. I have | |
| 132 | - scrutinized all of these cases and determined them to be safe, | |
| 133 | - but there is no mechanism in the code to ensure that new unsafe | |
| 134 | - conversions between int and size_t aren't introduced short of | |
| 135 | - good testing and strong awareness of the issues. Again, if any | |
| 136 | - such bugs are suspected in the future, enable the additional | |
| 137 | - warning flags and scrutinizing the warnings would be in order. | |
| 138 | - | |
| 139 | - * New public interfaces have been added. | |
| 140 | - | |
| 141 | - | |
| 142 | 47 | General |
| 143 | 48 | ======= |
| 144 | 49 | ... | ... |
manual/qpdf-manual.xml
| ... | ... | @@ -1623,6 +1623,166 @@ outfile.pdf</option> |
| 1623 | 1623 | </itemizedlist> |
| 1624 | 1624 | </para> |
| 1625 | 1625 | </sect1> |
| 1626 | + <sect1 id="ref.casting"> | |
| 1627 | + <title>Casting Policy</title> | |
| 1628 | + <para> | |
| 1629 | + This section describes the casting policy followed by qpdf's | |
| 1630 | + implementation. This is no concern to qpdf's end users and | |
| 1631 | + largely of no concern to people writing code that uses qpdf, but | |
| 1632 | + it could be of interest to people who are porting qpdf to a new | |
| 1633 | + platform or who are making modifications to the code. | |
| 1634 | + </para> | |
| 1635 | + <para> | |
| 1636 | + The C++ code in qpdf is free of old-style casts except where | |
| 1637 | + unavoidable (e.g. where the old-style cast is in a macro provided | |
| 1638 | + by a third-party header file). When there is a need for a cast, | |
| 1639 | + it is handled, in order of preference, by rewriting the code to | |
| 1640 | + avoid the need for a cast, calling | |
| 1641 | + <function>const_cast</function>, calling | |
| 1642 | + <function>static_cast</function>, calling | |
| 1643 | + <function>reinterpret_cast</function>, or calling some combination | |
| 1644 | + of the above. As a last resort, a compiler-specific | |
| 1645 | + <literal>#pragma</literal> may be used to suppress a warning that | |
| 1646 | + we don't want to fix. Examples may include suppressing warnings | |
| 1647 | + about the use of old-style casts in code that is shared between C | |
| 1648 | + and C++ code. | |
| 1649 | + </para> | |
| 1650 | + <para> | |
| 1651 | + The casting policy explicitly prohibits casting between integer | |
| 1652 | + sizes for no purpose other than to quiet a compiler warning when | |
| 1653 | + there is no reasonable chance of a problem resulting. The reason | |
| 1654 | + for this exclusion is that the practice of adding these additional | |
| 1655 | + casts precludes future use of additional compiler warnings as a | |
| 1656 | + tool for making future improvements to this aspect of the code, | |
| 1657 | + and it also damages the readability of the code. | |
| 1658 | + </para> | |
| 1659 | + <para> | |
| 1660 | + There are a few significant areas where casting is common in the | |
| 1661 | + qpdf sources or where casting would be required to quiet higher | |
| 1662 | + levels of compiler warnings but is omitted at present: | |
| 1663 | + <itemizedlist> | |
| 1664 | + <listitem> | |
| 1665 | + <para> | |
| 1666 | + <type>char</type> vs. <type>unsigned char</type>. For | |
| 1667 | + historical reasons, there are a lot of places in qpdf's | |
| 1668 | + internals that deal with <type>unsigned char</type>, which | |
| 1669 | + means that a lot of casting is required to interoperate with | |
| 1670 | + standard library calls and <type>std::string</type>. In | |
| 1671 | + retrospect, qpdf should have probably used regular (signed) | |
| 1672 | + <type>char</type> and <type>char*</type> everywhere and just | |
| 1673 | + cast to <type>unsigned char</type> when needed, but it's too | |
| 1674 | + late to make that change now. There are | |
| 1675 | + <function>reinterpret_cast</function> calls to go between | |
| 1676 | + <type>char*</type> and <type>unsigned char*</type>, and there | |
| 1677 | + are <function>static_cast</function> calls to go between | |
| 1678 | + <type>char</type> and <type>unsigned char</type>. These should | |
| 1679 | + always be safe. | |
| 1680 | + </para> | |
| 1681 | + </listitem> | |
| 1682 | + <listitem> | |
| 1683 | + <para> | |
| 1684 | + Non-const <type>unsigned char*</type> used in the | |
| 1685 | + <type>Pipeline</type> interface. The pipeline interface has a | |
| 1686 | + <function>write</function> call that uses <type>unsigned | |
| 1687 | + char*</type> without a <type>const</type> qualifier. The main | |
| 1688 | + reason for this is to support pipelines that make calls to | |
| 1689 | + third-party libraries, such as zlib, that don't include | |
| 1690 | + <type>const</type> in their interfaces. Unfortunately, there | |
| 1691 | + are many places in the code where it is desirable to have | |
| 1692 | + <type>const char*</type> with pipelines. None of the pipeline | |
| 1693 | + implementations in qpdf currently modify the data passed to | |
| 1694 | + write, and doing so would be counter to the intent of | |
| 1695 | + <type>Pipeline</type>, but there is nothing in the code to | |
| 1696 | + prevent this from being done. There are places in the code | |
| 1697 | + where <function>const_cast</function> is used to remove the | |
| 1698 | + const-ness of pointers going into <type>Pipeline</type>s. This | |
| 1699 | + could theoretically be unsafe, but there is adequate testing to | |
| 1700 | + assert that it is safe and will remain safe in qpdf's code. | |
| 1701 | + </para> | |
| 1702 | + </listitem> | |
| 1703 | + <listitem> | |
| 1704 | + <para> | |
| 1705 | + <type>size_t</type> vs. <type>qpdf_offset_t</type>. This is | |
| 1706 | + pretty much unavoidable since sizes are unsigned types and | |
| 1707 | + offsets are signed types. Whenever it is necessary to seek by | |
| 1708 | + an amount given by a <type>size_t</type>, it becomes necessary | |
| 1709 | + to mix and match between <type>size_t</type> and | |
| 1710 | + <type>qpdf_offset_t</type>. Additionally, qpdf sometimes | |
| 1711 | + treats memory buffers like files (as with | |
| 1712 | + <type>BufferInputSource</type>, and those seek interfaces have | |
| 1713 | + to be consistent with file-based input sources. Neither gcc | |
| 1714 | + nor MSVC give warnings for this case by default, but both have | |
| 1715 | + warning flags that can enable this. (MSVC: | |
| 1716 | + <option>/W14267</option> or <option>/W3</option>, which also | |
| 1717 | + enables some additional warnings that we ignore; gcc: | |
| 1718 | + <option>-Wconversion -Wsign-conversion</option>). This could | |
| 1719 | + matter for files whose sizes are larger than | |
| 1720 | + 2<superscript>63</superscript> bytes, but it is reasonable to | |
| 1721 | + expect that a world where such files are common would also have | |
| 1722 | + larger <type>size_t</type> and <type>qpdf_offset_t</type> types | |
| 1723 | + in it. On most 64-bit systems at the time of this writing (the | |
| 1724 | + release of version 4.1.0 of qpdf), both <type>size_t</type> and | |
| 1725 | + <type>qpdf_offset_t</type> are 64-bit integer types, while on | |
| 1726 | + many current 32-bit systems, <type>size_t</type> is a 32-bit | |
| 1727 | + type while <type>qpdf_offset_t</type> is a 64-bit type. I am | |
| 1728 | + not aware of any cases where 32-bit systems that have | |
| 1729 | + <type>size_t</type> smaller than <type>qpdf_offset_t</type> | |
| 1730 | + could run into problems. Although I can't conclusively rule | |
| 1731 | + out the possibility of such problems existing, I suspect any | |
| 1732 | + cases would be pretty contrived. In the event that someone | |
| 1733 | + should produce a file that qpdf can't handle because of what is | |
| 1734 | + suspected to be issues involving the handling of | |
| 1735 | + <type>size_t</type> vs. <type>qpdf_offset_t</type> (such files | |
| 1736 | + may behave properly on 64-bit systems but not on 32-bit systems | |
| 1737 | + because they have very large embedded files or streams, for | |
| 1738 | + example), the above mentioned warning flags could be enabled | |
| 1739 | + and all those implicit conversions could be carefully | |
| 1740 | + scrutinized. (I have already gone through that exercise once | |
| 1741 | + in adding support for files larger than 4 GB in size.) I | |
| 1742 | + continue to be commited to supporting large files on 32-bit | |
| 1743 | + systems, but I would not go to any lengths to support corner | |
| 1744 | + cases involving large embedded files or large streams that work | |
| 1745 | + on 64-bit systems but not on 32-bit systems because of | |
| 1746 | + <type>size_t</type> being too small. It is reasonable to | |
| 1747 | + assume that anyone working with such files would be using a | |
| 1748 | + 64-bit system anyway since many 32-bit applications would have | |
| 1749 | + similar difficulties. | |
| 1750 | + </para> | |
| 1751 | + </listitem> | |
| 1752 | + <listitem> | |
| 1753 | + <para> | |
| 1754 | + <type>size_t</type> vs. <type>int</type> or <type>long</type>. | |
| 1755 | + There are some cases where <type>size_t</type> and | |
| 1756 | + <type>int</type> or <type>long</type> or <type>size_t</type> | |
| 1757 | + and <type>unsigned int</type> or <type>unsigned long</type> are | |
| 1758 | + used interchangeably. These cases occur when working with very | |
| 1759 | + small amounts of memory, such as with the bit readers (where | |
| 1760 | + we're working with just a few bytes at a time), some cases of | |
| 1761 | + <function>strlen</function>, and a few other cases. I have | |
| 1762 | + scrutinized all of these cases and determined them to be safe, | |
| 1763 | + but there is no mechanism in the code to ensure that new unsafe | |
| 1764 | + conversions between <type>int</type> and <type>size_t</type> | |
| 1765 | + aren't introduced short of good testing and strong awareness of | |
| 1766 | + the issues. Again, if any such bugs are suspected in the | |
| 1767 | + future, enabling the additional warning flags and scrutinizing | |
| 1768 | + the warnings would be in order. | |
| 1769 | + </para> | |
| 1770 | + </listitem> | |
| 1771 | + </itemizedlist> | |
| 1772 | + </para> | |
| 1773 | + <para> | |
| 1774 | + To be clear, I believe qpdf to be well-behaved with respect to | |
| 1775 | + sizes and offsets, and qpdf's test suite includes actual | |
| 1776 | + generation and full processing of files larger than 4 GB in | |
| 1777 | + size. The issues raised here are largely academic and should not | |
| 1778 | + in any way be interpreted to mean that qpdf has practical problems | |
| 1779 | + involving sloppiness with integer types. I also believe that | |
| 1780 | + appropriate measures have been taken in the code to avoid problems | |
| 1781 | + with signed vs. unsigned integers from resulting in memory | |
| 1782 | + overwrites or other issues with potential security implications, | |
| 1783 | + though there are never any absolute guarantees. | |
| 1784 | + </para> | |
| 1785 | + </sect1> | |
| 1626 | 1786 | <sect1 id="ref.encryption"> |
| 1627 | 1787 | <title>Encryption</title> |
| 1628 | 1788 | <para> | ... | ... |