Commit e8ddac89501e232205e1737a07ddb7d1c2425e4b

Authored by Jay Berkenbilt
1 parent 1ec1b128

Document casting policy

ChangeLog
  1 +2013-03-25 Jay Berkenbilt <ejb@ql.org>
  2 +
  3 + * manual/qpdf-manual.xml: Document the casting policy that is
  4 + followed in qpdf's implementation.
  5 +
1 6 2013-03-11 Jay Berkenbilt <ejb@ql.org>
2 7  
3 8 * When creating Windows binary distributions, make sure to only
... ...
README.maintainer
... ... @@ -31,7 +31,9 @@ Release Reminders
31 31 * Check all open issues in the sourceforge trackers and on github.
32 32  
33 33 * If any interfaces were added or changed, check C API to see whether
34   - changes are appropriate there as well.
  34 + changes are appropriate there as well. If necessary, review the
  35 + casting policy in the manual, and ensure that integer types are
  36 + properly handled.
35 37  
36 38 * Increment shared library version information as needed
37 39 (libqpdf/build.mk)
... ...
  1 +4.1.0
  2 +=====
  3 +
  4 + * New public interfaces have been added.
  5 +
  6 +
1 7 4.2.0
2 8 =====
3 9  
... ... @@ -38,107 +44,6 @@
38 44 - See ../misc/broken-files
39 45  
40 46  
41   -4.1.0
42   -=====
43   -
44   - * Add to documentation, and mention this documentation in
45   - README.maintainer:
46   -
47   - Casting policy.
48   -
49   - The C++ code in qpdf is free of old-style casts except where
50   - unavoidable (e.g. where the old-style cast is in a macro provided
51   - by a third-party header file). When there is a need for a cast, it
52   - is handled, in order of preference by rewriting the code to avoid
53   - the need for a cast, calling const_cast, calling static_cast,
54   - calling reinterpret_cast, or calling some combination of the above.
55   - The casting policy explicitly prohibits casting between sizes for
56   - no purpose other than to quiet a compiler warning when there is no
57   - reasonable chance of a problem resulting. The reason for this
58   - exclusion is that it takes away enabling additional compiler
59   - warnings as a tool for making future improvements to this aspect of
60   - the code and also damages the readability of the code. As a last
61   - resort, a compiler-specific pragma may be used to suppress a
62   - warning that we don't want to fix. Examples may include
63   - suppressing warnings about the use of old-style casts in code that
64   - is shared between C and C++ code.
65   -
66   - There are a few significant areas where casting is common in the qpdf
67   - sources or where casting would be required to quiet higher levels
68   - of compiler warnings but is omitted at present:
69   -
70   - * signed vs. unsigned char. For historical reasons, there are a
71   - lot of places in qpdf's internals that deal with unsigned char,
72   - which means that a lot of casting is required to interoperate
73   - with standard library calls and std::string. In retrospect,
74   - qpdf should have probably used signed char everywhere and just
75   - cast to unsigned char when needed. There are reinterpret_cast
76   - calls to go between char* and unsigned char*, and there are
77   - static_cast calls to go between char and unsigned char. These
78   - should always be safe.
79   -
80   - * non-const unsigned char* used in Pipeline interface. The
81   - pipeline interface has a write() call that uses unsigned char*
82   - without a const qualifier. The main reason for this is to
83   - support pipelines that make calls to third-party libraries, such
84   - as zlib, that don't include const in their interfaces.
85   - Unfortunately, there are many places in the code where it is
86   - desirable to have const char* with pipelines. None of the
87   - pipeline implementations in qpdf currently modify the data
88   - passed to write, and doing so would be counter to the intent of
89   - Pipeline. There are places in the code where const_cast is used
90   - to remove the const-ness of pointers going into Pipelines. This
91   - could be potentially unsafe, but there is adequate testing to
92   - assert that it is safe in qpdf's code.
93   -
94   - * size_t vs. qpdf_offset_t. This is pretty much unavoidable since
95   - offsets are signed types and sizes are unsigned types. Whenever
96   - it is necessary to seek by an amount given by a size_t, it
97   - becomes necessary to mix and match between size_t and
98   - qpdf_offset_t. Additionally, qpdf sometimes treats memory
99   - buffers like files, and those seek interfaces have to be
100   - consistent with file-based input sources. Neither gcc nor MSVC
101   - give warnings for this case by default, but both have warning
102   - flags that can enable this. (MSVC: /W14267 or /W3 (which also
103   - enables some additional warnings that we ignore); gcc:
104   - -Wconversion -Wsign-conversion). This could matter for files
105   - whose sizes are larger than 2^63 bytes, but it is reasonable to
106   - expect that a world where such files are common would also have
107   - larger size_t and qpdf_offset_t types in it. I am not aware of
108   - any cases where 32-bit systems that have size_t smaller than
109   - qpdf_offset_t could run into problems, though I can't
110   - conclusively rule out the possibility. In the event that
111   - someone should produce a file that qpdf can't handle because of
112   - what is suspected to be issues involving the handling of size_t
113   - vs. qpdf_offset_t (such files may behave properly on 64-bit
114   - systems but not on 32-bit systems and may have very large
115   - embedded files or streams, for example), the above mentioned
116   - warning flags could be enabled and all those implicit
117   - conversions could be carefully scrutinized. (I have already
118   - gone through that exercise once in adding support for files >
119   - 4GB in size.) I continue to be commited to supporting large
120   - files on 32-bit systems, but I would not go to any lengths to
121   - support corner cases involving large embedded files or large
122   - streams that work on 64-bit systems but not on 32-bit systems
123   - because of size_t being too small. It is reasonable to assume
124   - that anyone working with such files would be using a 64-bit
125   - system anyway.
126   -
127   - * size_t vs. int. There are some cases where size_t and int or
128   - size_t and unsigned int are used interchangeably. These cases
129   - occur when working with very small amounts of memory, such as
130   - with the bit readers (where we're working with just a few bytes
131   - at a time), some cases of strlen, and a few other cases. I have
132   - scrutinized all of these cases and determined them to be safe,
133   - but there is no mechanism in the code to ensure that new unsafe
134   - conversions between int and size_t aren't introduced short of
135   - good testing and strong awareness of the issues. Again, if any
136   - such bugs are suspected in the future, enable the additional
137   - warning flags and scrutinizing the warnings would be in order.
138   -
139   - * New public interfaces have been added.
140   -
141   -
142 47 General
143 48 =======
144 49  
... ...
manual/qpdf-manual.xml
... ... @@ -1623,6 +1623,166 @@ outfile.pdf&lt;/option&gt;
1623 1623 </itemizedlist>
1624 1624 </para>
1625 1625 </sect1>
  1626 + <sect1 id="ref.casting">
  1627 + <title>Casting Policy</title>
  1628 + <para>
  1629 + This section describes the casting policy followed by qpdf's
  1630 + implementation. This is no concern to qpdf's end users and
  1631 + largely of no concern to people writing code that uses qpdf, but
  1632 + it could be of interest to people who are porting qpdf to a new
  1633 + platform or who are making modifications to the code.
  1634 + </para>
  1635 + <para>
  1636 + The C++ code in qpdf is free of old-style casts except where
  1637 + unavoidable (e.g. where the old-style cast is in a macro provided
  1638 + by a third-party header file). When there is a need for a cast,
  1639 + it is handled, in order of preference, by rewriting the code to
  1640 + avoid the need for a cast, calling
  1641 + <function>const_cast</function>, calling
  1642 + <function>static_cast</function>, calling
  1643 + <function>reinterpret_cast</function>, or calling some combination
  1644 + of the above. As a last resort, a compiler-specific
  1645 + <literal>#pragma</literal> may be used to suppress a warning that
  1646 + we don't want to fix. Examples may include suppressing warnings
  1647 + about the use of old-style casts in code that is shared between C
  1648 + and C++ code.
  1649 + </para>
  1650 + <para>
  1651 + The casting policy explicitly prohibits casting between integer
  1652 + sizes for no purpose other than to quiet a compiler warning when
  1653 + there is no reasonable chance of a problem resulting. The reason
  1654 + for this exclusion is that the practice of adding these additional
  1655 + casts precludes future use of additional compiler warnings as a
  1656 + tool for making future improvements to this aspect of the code,
  1657 + and it also damages the readability of the code.
  1658 + </para>
  1659 + <para>
  1660 + There are a few significant areas where casting is common in the
  1661 + qpdf sources or where casting would be required to quiet higher
  1662 + levels of compiler warnings but is omitted at present:
  1663 + <itemizedlist>
  1664 + <listitem>
  1665 + <para>
  1666 + <type>char</type> vs. <type>unsigned char</type>. For
  1667 + historical reasons, there are a lot of places in qpdf's
  1668 + internals that deal with <type>unsigned char</type>, which
  1669 + means that a lot of casting is required to interoperate with
  1670 + standard library calls and <type>std::string</type>. In
  1671 + retrospect, qpdf should have probably used regular (signed)
  1672 + <type>char</type> and <type>char*</type> everywhere and just
  1673 + cast to <type>unsigned char</type> when needed, but it's too
  1674 + late to make that change now. There are
  1675 + <function>reinterpret_cast</function> calls to go between
  1676 + <type>char*</type> and <type>unsigned char*</type>, and there
  1677 + are <function>static_cast</function> calls to go between
  1678 + <type>char</type> and <type>unsigned char</type>. These should
  1679 + always be safe.
  1680 + </para>
  1681 + </listitem>
  1682 + <listitem>
  1683 + <para>
  1684 + Non-const <type>unsigned char*</type> used in the
  1685 + <type>Pipeline</type> interface. The pipeline interface has a
  1686 + <function>write</function> call that uses <type>unsigned
  1687 + char*</type> without a <type>const</type> qualifier. The main
  1688 + reason for this is to support pipelines that make calls to
  1689 + third-party libraries, such as zlib, that don't include
  1690 + <type>const</type> in their interfaces. Unfortunately, there
  1691 + are many places in the code where it is desirable to have
  1692 + <type>const char*</type> with pipelines. None of the pipeline
  1693 + implementations in qpdf currently modify the data passed to
  1694 + write, and doing so would be counter to the intent of
  1695 + <type>Pipeline</type>, but there is nothing in the code to
  1696 + prevent this from being done. There are places in the code
  1697 + where <function>const_cast</function> is used to remove the
  1698 + const-ness of pointers going into <type>Pipeline</type>s. This
  1699 + could theoretically be unsafe, but there is adequate testing to
  1700 + assert that it is safe and will remain safe in qpdf's code.
  1701 + </para>
  1702 + </listitem>
  1703 + <listitem>
  1704 + <para>
  1705 + <type>size_t</type> vs. <type>qpdf_offset_t</type>. This is
  1706 + pretty much unavoidable since sizes are unsigned types and
  1707 + offsets are signed types. Whenever it is necessary to seek by
  1708 + an amount given by a <type>size_t</type>, it becomes necessary
  1709 + to mix and match between <type>size_t</type> and
  1710 + <type>qpdf_offset_t</type>. Additionally, qpdf sometimes
  1711 + treats memory buffers like files (as with
  1712 + <type>BufferInputSource</type>, and those seek interfaces have
  1713 + to be consistent with file-based input sources. Neither gcc
  1714 + nor MSVC give warnings for this case by default, but both have
  1715 + warning flags that can enable this. (MSVC:
  1716 + <option>/W14267</option> or <option>/W3</option>, which also
  1717 + enables some additional warnings that we ignore; gcc:
  1718 + <option>-Wconversion -Wsign-conversion</option>). This could
  1719 + matter for files whose sizes are larger than
  1720 + 2<superscript>63</superscript> bytes, but it is reasonable to
  1721 + expect that a world where such files are common would also have
  1722 + larger <type>size_t</type> and <type>qpdf_offset_t</type> types
  1723 + in it. On most 64-bit systems at the time of this writing (the
  1724 + release of version 4.1.0 of qpdf), both <type>size_t</type> and
  1725 + <type>qpdf_offset_t</type> are 64-bit integer types, while on
  1726 + many current 32-bit systems, <type>size_t</type> is a 32-bit
  1727 + type while <type>qpdf_offset_t</type> is a 64-bit type. I am
  1728 + not aware of any cases where 32-bit systems that have
  1729 + <type>size_t</type> smaller than <type>qpdf_offset_t</type>
  1730 + could run into problems. Although I can't conclusively rule
  1731 + out the possibility of such problems existing, I suspect any
  1732 + cases would be pretty contrived. In the event that someone
  1733 + should produce a file that qpdf can't handle because of what is
  1734 + suspected to be issues involving the handling of
  1735 + <type>size_t</type> vs. <type>qpdf_offset_t</type> (such files
  1736 + may behave properly on 64-bit systems but not on 32-bit systems
  1737 + because they have very large embedded files or streams, for
  1738 + example), the above mentioned warning flags could be enabled
  1739 + and all those implicit conversions could be carefully
  1740 + scrutinized. (I have already gone through that exercise once
  1741 + in adding support for files larger than 4&nbsp;GB in size.) I
  1742 + continue to be commited to supporting large files on 32-bit
  1743 + systems, but I would not go to any lengths to support corner
  1744 + cases involving large embedded files or large streams that work
  1745 + on 64-bit systems but not on 32-bit systems because of
  1746 + <type>size_t</type> being too small. It is reasonable to
  1747 + assume that anyone working with such files would be using a
  1748 + 64-bit system anyway since many 32-bit applications would have
  1749 + similar difficulties.
  1750 + </para>
  1751 + </listitem>
  1752 + <listitem>
  1753 + <para>
  1754 + <type>size_t</type> vs. <type>int</type> or <type>long</type>.
  1755 + There are some cases where <type>size_t</type> and
  1756 + <type>int</type> or <type>long</type> or <type>size_t</type>
  1757 + and <type>unsigned int</type> or <type>unsigned long</type> are
  1758 + used interchangeably. These cases occur when working with very
  1759 + small amounts of memory, such as with the bit readers (where
  1760 + we're working with just a few bytes at a time), some cases of
  1761 + <function>strlen</function>, and a few other cases. I have
  1762 + scrutinized all of these cases and determined them to be safe,
  1763 + but there is no mechanism in the code to ensure that new unsafe
  1764 + conversions between <type>int</type> and <type>size_t</type>
  1765 + aren't introduced short of good testing and strong awareness of
  1766 + the issues. Again, if any such bugs are suspected in the
  1767 + future, enabling the additional warning flags and scrutinizing
  1768 + the warnings would be in order.
  1769 + </para>
  1770 + </listitem>
  1771 + </itemizedlist>
  1772 + </para>
  1773 + <para>
  1774 + To be clear, I believe qpdf to be well-behaved with respect to
  1775 + sizes and offsets, and qpdf's test suite includes actual
  1776 + generation and full processing of files larger than 4&nbsp;GB in
  1777 + size. The issues raised here are largely academic and should not
  1778 + in any way be interpreted to mean that qpdf has practical problems
  1779 + involving sloppiness with integer types. I also believe that
  1780 + appropriate measures have been taken in the code to avoid problems
  1781 + with signed vs. unsigned integers from resulting in memory
  1782 + overwrites or other issues with potential security implications,
  1783 + though there are never any absolute guarantees.
  1784 + </para>
  1785 + </sect1>
1626 1786 <sect1 id="ref.encryption">
1627 1787 <title>Encryption</title>
1628 1788 <para>
... ...