Commit 3d6615b2764da759c5b9354c40cd1a32b84ee039

Authored by Jay Berkenbilt
1 parent 48331b4b

Pl_Buffer: reduce memory growth (fixes #228)

Rather than keeping a list of buffers for every write, accumulate
bytes in a single buffer, doubling the size of the buffer when needed
to accommodate new data.

This is not the best possible implementation, but the change was
implemented in this way to avoid changing the shape of Pl_Buffer and
thus breaking backward compatibility.
ChangeLog
  1 +2018-08-12 Jay Berkenbilt <ejb@ql.org>
  2 +
  3 + * Rewrite the internals of Pl_Buffer to be much more efficient in
  4 + use of memory at a very slight performance cost. The old
  5 + implementation could cause memory usage to go out of control for
  6 + files with large images compressed using the TIFF predictor.
  7 + Fixes #228.
  8 +
1 9 2018-08-05 Jay Berkenbilt <ejb@ql.org>
2 10  
3 11 * Bug fix: end of line characters were not properly handled inside
... ...
... ... @@ -31,6 +31,16 @@ Soon
31 31 - See ../misc/broken-files
32 32  
33 33  
  34 +Next ABI
  35 +========
  36 +
  37 +Do these things next time we have to break binary compatibility
  38 +
  39 + * Pl_Buffer's internal structure is not right for what it does. It
  40 + was modified for greater efficiency, but it was done in a way that
  41 + preserved binary compatibility, so the implementation is a bit
  42 + convoluted.
  43 +
34 44 Lexical
35 45 =======
36 46  
... ... @@ -72,6 +82,8 @@ directory or that are otherwise not publicly accessible. This includes
72 82 things sent to me by email that are specifically not public. Even so,
73 83 I find it useful to make reference to them in this list
74 84  
  85 + * Pl_TIFFPredictor is pretty slow.
  86 +
75 87 * Some test cases on bad fails fail because qpdf is unable to find
76 88 the root dictionary when it fails to read the trailer. Recovery
77 89 could find the root dictionary and even the info dictionary in
... ...
libqpdf/Pl_Buffer.cc
... ... @@ -17,11 +17,32 @@ Pl_Buffer::~Pl_Buffer()
17 17 void
18 18 Pl_Buffer::write(unsigned char* buf, size_t len)
19 19 {
20   - Buffer* b = new Buffer(len);
21   - memcpy(b->getBuffer(), buf, len);
22   - this->data.push_back(b);
  20 + PointerHolder<Buffer> cur_buf;
  21 + size_t cur_size = 0;
  22 + if (! this->data.empty())
  23 + {
  24 + cur_buf = this->data.back();
  25 + cur_size = cur_buf->getSize();
  26 + }
  27 + size_t left = cur_size - this->total_size;
  28 + if (left < len)
  29 + {
  30 + size_t new_size = std::max(this->total_size + len, 2 * cur_size);
  31 + Buffer* b = new Buffer(new_size);
  32 + if (cur_buf.getPointer())
  33 + {
  34 + memcpy(b->getBuffer(), cur_buf->getBuffer(), this->total_size);
  35 + }
  36 + this->data.clear();
  37 + cur_buf = b;
  38 + this->data.push_back(cur_buf);
  39 + }
  40 + if (len)
  41 + {
  42 + memcpy(cur_buf->getBuffer() + this->total_size, buf, len);
  43 + this->total_size += len;
  44 + }
23 45 this->ready = false;
24   - this->total_size += len;
25 46  
26 47 if (getNext(true))
27 48 {
... ... @@ -49,17 +70,13 @@ Pl_Buffer::getBuffer()
49 70  
50 71 Buffer* b = new Buffer(this->total_size);
51 72 unsigned char* p = b->getBuffer();
52   - while (! this->data.empty())
  73 + if (! this->data.empty())
53 74 {
54   - PointerHolder<Buffer> bp = this->data.front();
55   - this->data.pop_front();
56   - size_t bytes = bp->getSize();
57   - memcpy(p, bp->getBuffer(), bytes);
58   - p += bytes;
59   - this->total_size -= bytes;
  75 + PointerHolder<Buffer> bp = this->data.back();
  76 + this->data.clear();
  77 + memcpy(p, bp->getBuffer(), this->total_size);
60 78 }
61   -
62   - assert(this->total_size == 0);
  79 + this->total_size = 0;
63 80 this->ready = false;
64 81  
65 82 return b;
... ...