Commit e8ddac89501e232205e1737a07ddb7d1c2425e4b

Authored by Jay Berkenbilt
1 parent 1ec1b128

Document casting policy

ChangeLog
+2013-03-25 Jay Berkenbilt <ejb@ql.org>
+
+ * manual/qpdf-manual.xml: Document the casting policy that is
+ followed in qpdf's implementation.
+
 2013-03-11 Jay Berkenbilt <ejb@ql.org>
 
  * When creating Windows binary distributions, make sure to only
README.maintainer
@@ -31,7 +31,9 @@ Release Reminders
 * Check all open issues in the sourceforge trackers and on github.
 
 * If any interfaces were added or changed, check C API to see whether
- changes are appropriate there as well.
+ changes are appropriate there as well. If necessary, review the
+ casting policy in the manual, and ensure that integer types are
+ properly handled.
 
 * Increment shared library version information as needed
 (libqpdf/build.mk)
+4.1.0
+=====
+
+ * New public interfaces have been added.
+
+
 4.2.0
 =====
 
@@ -38,107 +44,6 @@
 - See ../misc/broken-files
 
 
-4.1.0
-=====
-
- * Add to documentation, and mention this documentation in
- README.maintainer:
-
- Casting policy.
-
- The C++ code in qpdf is free of old-style casts except where
- unavoidable (e.g. where the old-style cast is in a macro provided
- by a third-party header file). When there is a need for a cast, it
- is handled, in order of preference by rewriting the code to avoid
- the need for a cast, calling const_cast, calling static_cast,
- calling reinterpret_cast, or calling some combination of the above.
- The casting policy explicitly prohibits casting between sizes for
- no purpose other than to quiet a compiler warning when there is no
- reasonable chance of a problem resulting. The reason for this
- exclusion is that it takes away enabling additional compiler
- warnings as a tool for making future improvements to this aspect of
- the code and also damages the readability of the code. As a last
- resort, a compiler-specific pragma may be used to suppress a
- warning that we don't want to fix. Examples may include
- suppressing warnings about the use of old-style casts in code that
- is shared between C and C++ code.
-
- There are a few significant areas where casting is common in the qpdf
- sources or where casting would be required to quiet higher levels
- of compiler warnings but is omitted at present:
-
- * signed vs. unsigned char. For historical reasons, there are a
- lot of places in qpdf's internals that deal with unsigned char,
- which means that a lot of casting is required to interoperate
- with standard library calls and std::string. In retrospect,
- qpdf should have probably used signed char everywhere and just
- cast to unsigned char when needed. There are reinterpret_cast
- calls to go between char* and unsigned char*, and there are
- static_cast calls to go between char and unsigned char. These
- should always be safe.
-
- * non-const unsigned char* used in Pipeline interface. The
- pipeline interface has a write() call that uses unsigned char*
- without a const qualifier. The main reason for this is to
- support pipelines that make calls to third-party libraries, such
- as zlib, that don't include const in their interfaces.
- Unfortunately, there are many places in the code where it is
- desirable to have const char* with pipelines. None of the
- pipeline implementations in qpdf currently modify the data
- passed to write, and doing so would be counter to the intent of
- Pipeline. There are places in the code where const_cast is used
- to remove the const-ness of pointers going into Pipelines. This
- could be potentially unsafe, but there is adequate testing to
- assert that it is safe in qpdf's code.
-
- * size_t vs. qpdf_offset_t. This is pretty much unavoidable since
- offsets are signed types and sizes are unsigned types. Whenever
- it is necessary to seek by an amount given by a size_t, it
- becomes necessary to mix and match between size_t and
- qpdf_offset_t. Additionally, qpdf sometimes treats memory
- buffers like files, and those seek interfaces have to be
- consistent with file-based input sources. Neither gcc nor MSVC
- give warnings for this case by default, but both have warning
- flags that can enable this. (MSVC: /W14267 or /W3 (which also
- enables some additional warnings that we ignore); gcc:
- -Wconversion -Wsign-conversion). This could matter for files
- whose sizes are larger than 2^63 bytes, but it is reasonable to
- expect that a world where such files are common would also have
- larger size_t and qpdf_offset_t types in it. I am not aware of
- any cases where 32-bit systems that have size_t smaller than
- qpdf_offset_t could run into problems, though I can't
- conclusively rule out the possibility. In the event that
- someone should produce a file that qpdf can't handle because of
- what is suspected to be issues involving the handling of size_t
- vs. qpdf_offset_t (such files may behave properly on 64-bit
- systems but not on 32-bit systems and may have very large
- embedded files or streams, for example), the above mentioned
- warning flags could be enabled and all those implicit
- conversions could be carefully scrutinized. (I have already
- gone through that exercise once in adding support for files >
- 4GB in size.) I continue to be commited to supporting large
- files on 32-bit systems, but I would not go to any lengths to
- support corner cases involving large embedded files or large
- streams that work on 64-bit systems but not on 32-bit systems
- because of size_t being too small. It is reasonable to assume
- that anyone working with such files would be using a 64-bit
- system anyway.
-
- * size_t vs. int. There are some cases where size_t and int or
- size_t and unsigned int are used interchangeably. These cases
- occur when working with very small amounts of memory, such as
- with the bit readers (where we're working with just a few bytes
- at a time), some cases of strlen, and a few other cases. I have
- scrutinized all of these cases and determined them to be safe,
- but there is no mechanism in the code to ensure that new unsafe
- conversions between int and size_t aren't introduced short of
- good testing and strong awareness of the issues. Again, if any
- such bugs are suspected in the future, enable the additional
- warning flags and scrutinizing the warnings would be in order.
-
- * New public interfaces have been added.
-
-
 General
 =======
 
manual/qpdf-manual.xml
@@ -1623,6 +1623,166 @@ outfile.pdf&lt;/option&gt;
 </itemizedlist>
 </para>
 </sect1>
+ <sect1 id="ref.casting">
+ <title>Casting Policy</title>
+ <para>
+ This section describes the casting policy followed by qpdf's
+ implementation. This is of no concern to qpdf's end users and
+ largely of no concern to people writing code that uses qpdf, but
+ it could be of interest to people who are porting qpdf to a new
+ platform or who are making modifications to the code.
+ </para>
+ <para>
+ The C++ code in qpdf is free of old-style casts except where
+ unavoidable (e.g. where the old-style cast is in a macro provided
+ by a third-party header file). When there is a need for a cast,
+ it is handled, in order of preference, by rewriting the code to
+ avoid the need for a cast, using
+ <function>const_cast</function>, using
+ <function>static_cast</function>, using
+ <function>reinterpret_cast</function>, or using some combination
+ of the above. As a last resort, a compiler-specific
+ <literal>#pragma</literal> may be used to suppress a warning that
+ we don't want to fix. Examples may include suppressing warnings
+ about the use of old-style casts in code that is shared between C
+ and C++.
+ </para>
+ <para>
+ The casting policy explicitly prohibits casting between integer
+ sizes for no purpose other than to quiet a compiler warning when
+ there is no reasonable chance of a problem resulting. The reason
+ for this exclusion is that adding such casts would prevent
+ additional compiler warnings from being used later as a tool for
+ improving this aspect of the code, and the casts would also
+ damage the readability of the code.
+ </para>
+ <para>
+ There are a few significant areas where casting is common in the
+ qpdf sources or where casting would be required to quiet higher
+ levels of compiler warnings but is omitted at present:
+ <itemizedlist>
+ <listitem>
+ <para>
+ <type>char</type> vs. <type>unsigned char</type>. For
+ historical reasons, there are a lot of places in qpdf's
+ internals that deal with <type>unsigned char</type>, which
+ means that a lot of casting is required to interoperate with
+ standard library calls and <type>std::string</type>. In
+ retrospect, qpdf probably should have used plain
+ <type>char</type> and <type>char*</type> everywhere and just
+ cast to <type>unsigned char</type> when needed, but it's too
+ late to make that change now. There are
+ <function>reinterpret_cast</function> calls to go between
+ <type>char*</type> and <type>unsigned char*</type>, and there
+ are <function>static_cast</function> calls to go between
+ <type>char</type> and <type>unsigned char</type>. These should
+ always be safe.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Non-const <type>unsigned char*</type> used in the
+ <type>Pipeline</type> interface. The pipeline interface has a
+ <function>write</function> call that uses <type>unsigned
+ char*</type> without a <type>const</type> qualifier. The main
+ reason for this is to support pipelines that make calls to
+ third-party libraries, such as zlib, that don't include
+ <type>const</type> in their interfaces. Unfortunately, there
+ are many places in the code where it is desirable to have
+ <type>const char*</type> with pipelines. None of the pipeline
+ implementations in qpdf currently modifies the data passed to
+ <function>write</function>, and doing so would be counter to the
+ intent of <type>Pipeline</type>, but there is nothing in the code
+ to prevent this from being done. There are places in the code
+ where <function>const_cast</function> is used to remove the
+ const-ness of pointers going into <type>Pipeline</type>s. This
+ could theoretically be unsafe, but there is adequate testing to
+ confirm that it is safe and will remain safe in qpdf's code.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <type>size_t</type> vs. <type>qpdf_offset_t</type>. This is
+ pretty much unavoidable since sizes are unsigned types and
+ offsets are signed types. Whenever it is necessary to seek by
+ an amount given by a <type>size_t</type>, it becomes necessary
+ to mix and match between <type>size_t</type> and
+ <type>qpdf_offset_t</type>. Additionally, qpdf sometimes
+ treats memory buffers like files (as with
+ <type>BufferInputSource</type>), and those seek interfaces have
+ to be consistent with file-based input sources. Neither gcc
+ nor MSVC gives warnings for this case by default, but both have
+ warning flags that enable them. (MSVC:
+ <option>/W14267</option> or <option>/W3</option>, which also
+ enables some additional warnings that we ignore; gcc:
+ <option>-Wconversion -Wsign-conversion</option>). This could
+ matter for files whose sizes are larger than
+ 2<superscript>63</superscript> bytes, but it is reasonable to
+ expect that a world where such files are common would also have
+ larger <type>size_t</type> and <type>qpdf_offset_t</type> types
+ in it. On most 64-bit systems at the time of this writing (the
+ release of version 4.1.0 of qpdf), both <type>size_t</type> and
+ <type>qpdf_offset_t</type> are 64-bit integer types, while on
+ many current 32-bit systems, <type>size_t</type> is a 32-bit
+ type and <type>qpdf_offset_t</type> is a 64-bit type. I am
+ not aware of any cases where 32-bit systems that have
+ <type>size_t</type> smaller than <type>qpdf_offset_t</type>
+ could run into problems. Although I can't conclusively rule
+ out the possibility of such problems existing, I suspect any
+ cases would be pretty contrived. In the event that someone
+ should produce a file that qpdf can't handle because of what is
+ suspected to be issues involving the handling of
+ <type>size_t</type> vs. <type>qpdf_offset_t</type> (such files
+ may behave properly on 64-bit systems but not on 32-bit systems
+ because they have very large embedded files or streams, for
+ example), the above-mentioned warning flags could be enabled
+ and all those implicit conversions could be carefully
+ scrutinized. (I have already gone through that exercise once
+ in adding support for files larger than 4&nbsp;GB in size.) I
+ continue to be committed to supporting large files on 32-bit
+ systems, but I would not go to any lengths to support corner
+ cases involving large embedded files or large streams that work
+ on 64-bit systems but not on 32-bit systems because of
+ <type>size_t</type> being too small. It is reasonable to
+ assume that anyone working with such files would be using a
+ 64-bit system anyway since many 32-bit applications would have
+ similar difficulties.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <type>size_t</type> vs. <type>int</type> or <type>long</type>.
+ There are some cases where <type>size_t</type> is used
+ interchangeably with <type>int</type>, <type>long</type>,
+ <type>unsigned int</type>, or <type>unsigned long</type>.
+ These cases occur when working with very small amounts of
+ memory, such as with the bit readers (where we're working with
+ just a few bytes at a time), some cases of
+ <function>strlen</function>, and a few other cases. I have
+ scrutinized all of these cases and determined them to be safe,
+ but there is no mechanism in the code to ensure that new unsafe
+ conversions between <type>int</type> and <type>size_t</type>
+ aren't introduced short of good testing and strong awareness of
+ the issues. Again, if any such bugs are suspected in the
+ future, enabling the additional warning flags and scrutinizing
+ the warnings would be in order.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+ To be clear, I believe qpdf to be well-behaved with respect to
+ sizes and offsets, and qpdf's test suite includes actual
+ generation and full processing of files larger than 4&nbsp;GB in
+ size. The issues raised here are largely academic and should not
+ in any way be interpreted to mean that qpdf has practical problems
+ involving sloppiness with integer types. I also believe that
+ appropriate measures have been taken in the code to prevent
+ problems with signed vs. unsigned integers from resulting in
+ memory overwrites or other issues with potential security
+ implications, though there are never any absolute guarantees.
+ </para>
+ </sect1>
 <sect1 id="ref.encryption">
 <title>Encryption</title>
 <para>