Commit cc755e37f7b559038e2d23acb6359814fb998286
1 parent
e9eac2a2
Update TODO with notes on performance analysis
Showing
1 changed file
with
61 additions
and
0 deletions
TODO
| @@ -154,6 +154,67 @@ test that fails if wildcard expansion is broken. In the absence of | @@ -154,6 +154,67 @@ test that fails if wildcard expansion is broken. In the absence of | ||
| 154 | this, it will be necessary to test the behavior manually in both mingw | 154 | this, it will be necessary to test the behavior manually in both mingw |
| 155 | and msvc when run from cmd and from msys bash. | 155 | and msvc when run from cmd and from msys bash. |
| 156 | 156 | ||
| 157 | +Performance | ||
| 158 | +=========== | ||
| 159 | + | ||
| 160 | +As described in https://github.com/qpdf/qpdf/issues/401, there was | ||
| 161 | +great performance degradation between qpdf 7.1.1 and 9.1.1. Doing a | ||
| 162 | +bisect between dac65a21fb4fa5f871e31c314280b75adde89a6c and | ||
| 163 | +release-qpdf-7.1.1, I found several commits that damaged performance. | ||
| 164 | +I fixed some of them to improve performance by about 70% (as measured | ||
| 165 | +by saying that old times were 170% of new times). The remaining | ||
| 166 | +commits that broke performance either can't be correct because they | ||
| 167 | +would re-introduce an old bug or aren't worth correcting because of | ||
| 168 | +the high value they offer relative to a relatively low penalty. For | ||
| 169 | +historical reference, here are the commits. The numbers are the time | ||
| 170 | +in seconds on the machine I happened to be using of splitting the | ||
| 171 | +first 100 pages of PDF32000_2008.pdf 20 times and taking an average | ||
| 172 | +duration. | ||
| 173 | + | ||
| 174 | +Commits that broke performance: | ||
| 175 | + | ||
| 176 | +* d0e99f195a987c483bbb6c5449cf39bee34e08a1 -- object description and | ||
| 177 | + context: 0.39 -> 0.45 | ||
| 178 | +* a01359189b32c60c2d55b039f7aefd6c3ce0ebde (minus 313ba08) -- fix | ||
| 179 | + dangling references: 0.55 -> 0.6 | ||
| 180 | +* e5f504b6c5dc34337cc0b316b4a7b1fca7e614b1 -- sparse array: 0.6 -> 0.62 | ||
| 181 | + | ||
| 182 | +Other intermediate steps that were previously fixed: | ||
| 183 | + | ||
| 184 | +* 313ba081265f69ac9a0324f9fe87087c72918191 -- copy outlines into | ||
| 185 | + split: 0.55 -> 4.0 | ||
| 186 | +* a01359189b32c60c2d55b039f7aefd6c3ce0ebde -- fix dangling references: | ||
| 187 | + 4.0 -> 9.0 | ||
| 188 | + | ||
| 189 | +This commit fixed the awful problem introduced in 313ba081: | ||
| 190 | + | ||
| 191 | +* a5a016cdd26a8e5c99e5f019bc30d1bdf6c050a2 -- revert outline | ||
| 192 | + preservation: 9.0 -> 0.6 | ||
| 193 | + | ||
| 194 | +Note that the fix dangling references commit had a much worse impact | ||
| 195 | +prior to removing the outline preservation, so I also measured its | ||
| 196 | +impact in isolation. | ||
| 197 | + | ||
| 198 | +A few important lessons: | ||
| 199 | + | ||
| 200 | +* Indirection through PointerHolder<Members> is expensive, and should | ||
| 201 | + not be used for things that are created and destroyed frequently | ||
| 202 | + such as QPDFObjectHandle and QPDFObject. | ||
| 203 | +* Traversal of objects is expensive and should be avoided where | ||
| 204 | + possible. | ||
| 205 | + | ||
| 206 | +Future ideas: | ||
| 207 | + | ||
| 208 | +* Look at places in the code where object traversal is being done and, | ||
| 209 | + where possible, try to avoid it entirely or at least avoid ever | ||
| 210 | + traversing the same objects multiple times. | ||
| 211 | +* Avoid attaching too much metadata to objects and object handles | ||
| 212 | + since those have to get copied around a lot. | ||
| 213 | + | ||
| 214 | +Also, it turns out that PointerHolder is more performant than | ||
| 215 | +std::shared_ptr. | ||
| 216 | + | ||
| 217 | + | ||
| 157 | General | 218 | General |
| 158 | ======= | 219 | ======= |
| 159 | 220 |