Commit cc755e37f7b559038e2d23acb6359814fb998286

Authored by Jay Berkenbilt
1 parent e9eac2a2

Update TODO with notes on performance analysis

Showing 1 changed file with 61 additions and 0 deletions
@@ -154,6 +154,67 @@ test that fails if wildcard expansion is broken. In the absence of @@ -154,6 +154,67 @@ test that fails if wildcard expansion is broken. In the absence of
154 this, it will be necessary to test the behavior manually in both mingw 154 this, it will be necessary to test the behavior manually in both mingw
155 and msvc when run from cmd and from msys bash. 155 and msvc when run from cmd and from msys bash.
156 156
  157 +Performance
  158 +===========
  159 +
  160 +As described in https://github.com/qpdf/qpdf/issues/401, there was
  161 +great performance degradation between qpdf 7.1.1 and 9.1.1. Doing a
  162 +bisect between dac65a21fb4fa5f871e31c314280b75adde89a6c and
  163 +release-qpdf-7.1.1, I found several commits that damaged performance.
  164 +I fixed some of them to improve performance by about 70% (as measured
  165 +by saying that old times were 170% of new times). The remaining
  166 +commits that broke performance either can't be correct because they
  167 +would re-introduce an old bug or aren't worth correcting because of
  168 +the high value they offer relative to a relatively low penalty. For
  169 +historical reference, here are the commits. The numbers are the time
  170 +in seconds on the machine I happened to be using of splitting the
  171 +first 100 pages of PDF32000_2008.pdf 20 times and taking an average
  172 +duration.
  173 +
  174 +Commits that broke performance:
  175 +
  176 +* d0e99f195a987c483bbb6c5449cf39bee34e08a1 -- object description and
  177 + context: 0.39 -> 0.45
  178 +* a01359189b32c60c2d55b039f7aefd6c3ce0ebde (minus 313ba08) -- fix
  179 + dangling references: 0.55 -> 0.6
  180 +* e5f504b6c5dc34337cc0b316b4a7b1fca7e614b1 -- sparse array: 0.6 -> 0.62
  181 +
  182 +Other intermediate steps that were previously fixed:
  183 +
  184 +* 313ba081265f69ac9a0324f9fe87087c72918191 -- copy outlines into
  185 + split: 0.55 -> 4.0
  186 +* a01359189b32c60c2d55b039f7aefd6c3ce0ebde -- fix dangling references:
  187 + 4.0 -> 9.0
  188 +
  189 +This commit fixed the awful problem introduced in 313ba081:
  190 +
  191 +* a5a016cdd26a8e5c99e5f019bc30d1bdf6c050a2 -- revert outline
  192 + preservation: 9.0 -> 0.6
  193 +
  194 +Note that the fix dangling references commit had a much worse impact
  195 +prior to removing the outline preservation, so I also measured its
  196 +impact in isolation.
  197 +
  198 +A few important lessons:
  199 +
  200 +* Indirection through PointerHolder<Members> is expensive, and should
  201 + not be used for things that are created and destroyed frequently
  202 + such as QPDFObjectHandle and QPDFObject.
  203 +* Traversal of objects is expensive and should be avoided where
  204 + possible.
  205 +
  206 +Future ideas:
  207 +
  208 +* Look at places in the code where object traversal is being done and,
  209 + where possible, try to avoid it entirely or at least avoid ever
  210 + traversing the same objects multiple times.
  211 +* Avoid attaching too much metadata to objects and object handles
  212 + since those have to get copied around a lot.
  213 +
  214 +Also, it turns out that PointerHolder is more performant than
  215 +std::shared_ptr.
  216 +
  217 +
157 General 218 General
158 ======= 219 =======
159 220