Commit cc5485dac1f224f856ce48781278b357f61f74bd

Authored by Jay Berkenbilt
1 parent 5a7bb347

QPDFJob: documentation

README-maintainer
... ... @@ -124,14 +124,32 @@ CODING RULES
124 124  
125 125 HOW TO ADD A COMMAND-LINE ARGUMENT
126 126  
  127 +QPDFJob is documented in three places:
  128 +
  129 +* This section provides a quick reminder for how to add a command-line
  130 + argument
  131 +
  132 +* generate_auto_job has a detailed explanation about how QPDFJob and
  133 + generate_auto_job work together
  134 +
  135 +* The manual ("QPDFJob Design" in qpdf-job.rst) discusses the design
  136 + approach, rationale, and evolution of QPDFJob.
  137 +
127 138 Command-line arguments are closely coupled with QPDFJob. To add a new
128 139 command-line argument, add the option to the appropriate table in
129 140 job.yml. This will automatically declare a method in the private
130 141 ArgParser class in QPDFJob_argv.cc which you have to implement. The
131   -implementation should make calls to methods in QPDFJob. Then, add the
132   -same option to either the no-json section of job.yml if it is to be
133   -excluded from the job json structure, or add it under the json
134   -structure to the place where it should appear in the json structure.
  142 +implementation should make calls to methods in QPDFJob via its Config
  143 +classes. Then, add the same option to either the no-json section of
  144 +job.yml if it is to be excluded from the job json structure, or add it
  145 +under the json structure to the place where it should appear in the
  146 +json structure.
  147 +
  148 +In most cases, adding a new option will automatically declare and call
  149 +the appropriate Config method, which you then have to implement. If
  150 +you need a manual handler, you have to declare the option as manual in
  151 +job.yml and implement the handler yourself, though the automatically
  152 +generated code will declare it for you.
135 153  
136 154 The build will fail until the new option is documented in
137 155 manual/cli.rst. To do that, create documentation for the option by
... ... @@ -148,6 +166,10 @@ When done, the following should happen:
148 166 * qpdf --help=topic should list --new-option for the correct topic
149 167 * --new-option should appear in the manual
150 168 * --new-option should be in the command-line option index in the manual
  169 +* A Config method (in Config or one of the other Config classes in
  170 + QPDFJob) should exist that corresponds to the command-line flag
  171 +* The job JSON file should have a new key in the schema corresponding
  172 + to the new option
151 173  
152 174  
153 175 RELEASE PREPARATION
... ...
cSpell.json
... ... @@ -100,6 +100,7 @@
100 100 "encodable",
101 101 "encp",
102 102 "endianness",
  103 + "endl",
103 104 "endobj",
104 105 "endstream",
105 106 "enspliel",
... ... @@ -128,6 +129,7 @@
128 129 "fuzzer",
129 130 "fuzzers",
130 131 "fvisibility",
  132 + "iostream",
131 133 "gajic",
132 134 "gajić",
133 135 "gcurl",
... ...
examples/build.mk
... ... @@ -8,13 +8,13 @@ BINS_examples = \
8 8 pdf-filter-tokens \
9 9 pdf-invert-images \
10 10 pdf-mod-info \
11   - pdf-job \
12 11 pdf-name-number-tree \
13 12 pdf-npages \
14 13 pdf-overlay-page \
15 14 pdf-parse-content \
16 15 pdf-set-form-values \
17   - pdf-split-pages
  16 + pdf-split-pages \
  17 + qpdf-job
18 18 CBINS_examples = \
19 19 pdf-c-objects \
20 20 pdf-linearize
... ...
examples/pdf-job.cc renamed to examples/qpdf-job.cc
generate_auto_job
... ... @@ -9,6 +9,121 @@ import json
9 9 import filecmp
10 10 from contextlib import contextmanager
11 11  
  12 +# The purpose of this code is to automatically generate various parts
  13 +# of the QPDFJob class. It is fairly complicated and extremely
  14 +# bespoke, so understanding it is important if modifications are to be
  15 +# made.
  16 +
  17 +# Documentation of QPDFJob is divided among three places:
  18 +#
  19 +# * "HOW TO ADD A COMMAND-LINE ARGUMENT" in README-maintainer provides
  20 +# a quick reminder for how to add a command-line argument
  21 +#
  22 +# * This file has a detailed explanation about how QPDFJob and
  23 +# generate_auto_job work together
  24 +#
  25 +# * The manual ("QPDFJob Design" in qpdf-job.rst) discusses the design
  26 +# approach, rationale, and evolution of QPDFJob.
  27 +#
  28 +# QPDFJob solved the problem of moving extensive functionality that
  29 +# lived in qpdf.cc into the library. The QPDFJob class consists of
  30 +# four major sections:
  31 +#
  32 +# * The run() method and its subsidiaries are responsible for
  33 +# performing the actual operations on PDF files. This is implemented
  34 +# in QPDFJob.cc
  35 +#
  36 +# * The nested Config class and the other classes it creates provide
  37 +# an API for setting up a QPDFJob instance and correspond to the
  38 +# command-line arguments of the qpdf executable. This is implemented
  39 +# in QPDFJob_config.cc
  40 +#
  41 +# * The argument parsing code reads an argv array and calls
  42 +# configuration methods. This is implemented in QPDFJob_argv.cc. The
  43 +# argument parsing logic itself is implemented in the QPDFArgParser
  44 +# class.
  45 +#
  46 +# * The job JSON handling code, which reads a QPDFJob JSON file and
  47 +# calls configuration methods. This is implemented in
  48 +# QPDFJob_json.cc. The JSON parsing code is in the JSON class. A
  49 +# sax-like JSON handler class that calls callbacks in response to
  50 +# items in the JSON is implemented in the JSONHandler class.
  51 +#
  52 +# This code has the job of ensuring that configuration, command-line
  53 +# arguments, and JSON are all consistent and complete so that a
  54 +# developer or user can freely move among those different ways of
  55 +# interacting with QPDFJob in a predictable fashion. In addition, help
  56 +# information for each option appears in manual/cli.rst, and that
  57 +# information is used in creation of the job JSON schema and to supply
  58 +# help text to QPDFArgParser. This code also ensures that there is an
  59 +# exact match between options in job.yml and options in cli.rst.
  60 +#
  61 +# The job.yml file contains the data that drives this code. To
  62 +# understand job.yml, here are some important concepts.
  63 +#
  64 +# QPDFArgParser option table. There is support for positional
  65 +# arguments, options consisting of flags and optional parameters, and
  66 +# subparsers that start with a regular parameterless flag, have their
  67 +# own positional and option sections, and are terminated with -- by
  68 +# itself. Examples of this include --encrypt and --pages. An "option
  69 +# table" contains an optional positional argument handler and a list
  70 +# of valid options with specifications about their parameters. There
  71 +# are three kinds of option tables:
  72 +#
  73 +# * The built-in "help" option table contains help commands, like
  74 +# --help and --version, that are only valid when they appear as the
  75 +# single command-line argument.
  76 +#
  77 +# * The "main" option table contains the options that are valid
  78 +# starting at the beginning of argument parsing.
  79 +#
  80 +# * A named option table can be started manually by the argument
  81 +# parsing code to switch the argument parser's context. Switching
  82 +# the parser to a new option table is manual (via a call to
  83 +# selectOptionTable). Context reverts to the main option table
  84 +# automatically when -- is encountered.
  85 +#
  86 +# In QPDFJob.hh, there is a Config class for each option table except
  87 +# help.
  88 +#
  89 +# Option type: bare, required/optional parameter, required/optional
  90 +# choices. A bare argument is just a flag, like --qdf. A parameter
  91 +# option takes an arbitrary parameter, like --password. A choices
  92 +# option takes one of a fixed list of choices, like --object-streams.
  93 +# If a parameter or choices option's parameter is optional, the empty
  94 +# string may be specified as an option, such as --collate (or
  95 +# --collate=). For a bare option, --option= is always the same as just
  96 +# --option. This makes it possible to switch an option from bare to
  97 +# optional choice to optional parameter all without breaking
  98 +# compatibility.
  99 +#
  100 +# JSON "schema". This is a qpdf-specific "schema" for JSON. It is not
  101 +# related to any kind of standard JSON schema. It is described in
  102 +# JSON.hh and in the manual. QPDFJob uses the JSON "schema" in a mode
  103 +# in which keys in the schema are all optional in the JSON object.
  104 +#
  105 +# Here is the mapping between configuration, argv, and JSON.
  106 +#
  107 +# The help options table is implemented solely for argv processing and
  108 +# has no counterpart in configuration or JSON.
  109 +#
  110 +# The config() method returns a shared pointer to a Config object.
  111 +# Every command-line option in the main option table has a
  112 +# corresponding method in Config whose name is the option converted to
  113 +# camel case. For bare options and options with optional parameters, a
  114 +# version exists that takes no arguments. For others, a version exists
  115 +# that takes a char const*. For example, the --qdf flag implies a
  116 +# qdf() method in Config, and the --object-streams flag implies an
  117 +# objectStreams(char const*) method in Config. For flags in option
  118 +# tables, the method is declared inside a config class specific to the
  119 +# option table. The mapping between option tables and config classes
  120 +# is explicit in job.yml. Positional arguments are handled
  121 +# individually and manually -- see QPDFJob.hh in the CONFIGURATION
  122 +# section for details. See examples/qpdf-job.cc for an example.
  123 +#
  124 +# To understand the rest, start at main and follow comments in the
  125 +# code.
  126 +
12 127 whoami = os.path.basename(sys.argv[0])
13 128 BANNER = f'''//
14 129 // This file is automatically generated by {whoami}.
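The header comment above says each command-line option maps to a Config method whose name is the option converted to camel case. A minimal sketch of that conversion follows; `option_to_method_name` is a hypothetical helper written for illustration, not the function generate_auto_job actually uses.

```python
def option_to_method_name(option):
    """Convert a flag such as '--object-streams' to a camelCase name.

    Hypothetical helper: it only illustrates the naming rule described
    in the comment and assumes option names are lower-case words
    separated by hyphens.
    """
    words = option.lstrip('-').split('-')
    # The first word stays lower case; each later word is capitalized.
    return words[0] + ''.join(w.capitalize() for w in words[1:])


if __name__ == '__main__':
    for opt in ('--qdf', '--object-streams', '--some-option'):
        print(opt, '->', option_to_method_name(opt))
```

The same rule explains the advice in QPDFJob.hh: an error message that mentions `--some-option` corresponds to `someOption`, either as a Config method or as a job JSON key.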
... ... @@ -33,12 +148,18 @@ def write_file(filename):
33 148  
34 149  
35 150 class Main:
  151 + # SOURCES is a list of source files whose contents are used by
  152 + # this program. If they change, we are out of date.
36 153 SOURCES = [
37 154 whoami,
38 155 'manual/_ext/qpdf.py',
39 156 'job.yml',
40 157 'manual/cli.rst',
41 158 ]
  159 + # DESTS is a map to the output files this code generates. These
  160 + # generated files, as well as those added to DESTS later in the
  161 + # code, are included in various places by QPDFJob.hh or any of the
  162 + # implementing QPDFJob*.cc files.
42 163 DESTS = {
43 164 'decl': 'libqpdf/qpdf/auto_job_decl.hh',
44 165 'init': 'libqpdf/qpdf/auto_job_init.hh',
... ... @@ -48,6 +169,11 @@ class Main:
48 169 'json_init': 'libqpdf/qpdf/auto_job_json_init.hh',
49 170 # Others are added in top
50 171 }
  172 + # SUMS contains a checksum for each source and destination and is
  173 + # used to detect whether we're up to date without having to force
  174 + # recompilation all the time. This way the build can invoke this
  175 + # script unconditionally without causing stuff to rebuild every
  176 + # time.
51 177 SUMS = 'job.sums'
52 178  
53 179 def main(self, args=sys.argv[1:], prog=whoami):
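The checksum file described above lets the build call the generator unconditionally and still skip regeneration when nothing changed. The sketch below shows one way such a check could work. It assumes SHA-256 digests and a "name digest" line format like the one visible in job.sums; the function names are hypothetical and this is not the script's actual implementation.

```python
import hashlib
import os


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()


def read_sums(sums_path):
    """Parse 'name digest' lines, skipping blanks and '#' comments."""
    sums = {}
    try:
        with open(sums_path, 'r') as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith('#'):
                    continue
                name, digest = line.rsplit(' ', 1)
                sums[name] = digest
    except FileNotFoundError:
        pass  # no sums file yet means nothing is up to date
    return sums


def is_up_to_date(files, sums_path):
    """True only if every file exists and matches its recorded digest."""
    recorded = read_sums(sums_path)
    return all(
        os.path.exists(p) and recorded.get(p) == sha256_of(p)
        for p in files
    )


def write_sums(files, sums_path):
    """Record the current digest of every source and destination."""
    with open(sums_path, 'w') as f:
        print('# Generated by generate_auto_job', file=f)
        for p in sorted(files):
            print(f'{p} {sha256_of(p)}', file=f)
```

Checking both sources and destinations means a hand-edited generated file is detected as stale, just like a changed job.yml.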
... ... @@ -71,8 +197,17 @@ class Main:
71 197 def top(self, options):
72 198 with open('job.yml', 'r') as f:
73 199 data = yaml.safe_load(f.read())
  200 + # config_decls maps a config key from an option in "options"
  201 + # (from job.yml) to a list of declarations. A declaration is
  202 + # generated for each config method for that option table.
74 203 self.config_decls = {}
  204 + # Keep track of which configs we've declared since we can have
  205 + # option tables share a config class, as with the encryption
  206 + # tables.
75 207 self.declared_configs = set()
  208 +
  209 + # Update DESTS -- see above. This ensures that each config
  210 + # class's contents are included in job.sums.
76 211 for o in data['options']:
77 212 config = o.get('config', None)
78 213 if config is not None:
... ... @@ -257,12 +392,21 @@ class Main:
257 392 def generate(self, data):
258 393 warn(f'{whoami}: regenerating auto job files')
259 394 self.validate(data)
260   - # Add the built-in help options to tables that we populate as
261   - # we read job.yml since we won't encounter these in job.yml
  395 +
  396 + # Keep track of which options are help options since they are
  397 + # handled specially. Add the built-in help options to tables
  398 + # that we populate as we read job.yml since we won't encounter
  399 + # these in job.yml
262 400 self.help_options = set(
263 401 ['--completion-bash', '--completion-zsh', '--help']
264 402 )
  403 + # Keep track of which options we have encountered but haven't
  404 + # seen help text for. This enables us to report if any option
  405 + # is missing help.
265 406 self.options_without_help = set(self.help_options)
  407 +
  408 + # Compute the information needed for generated files and write
  409 + # the files.
266 410 self.prepare(data)
267 411 with write_file(self.DESTS['decl']) as f:
268 412 print(BANNER, file=f)
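The comments above describe tracking options that have been seen but not yet documented, so that the generator can insist on an exact match between job.yml and cli.rst. A minimal sketch of that bookkeeping, using a hypothetical `find_help_mismatches` helper and made-up option lists rather than the generator's real data structures:

```python
def find_help_mismatches(job_options, documented_options):
    """Compare options known to the job with options that have help text.

    Returns (missing_help, unknown_help): options that were declared
    but never documented, and documented options that were never
    declared. Both lists are sorted for stable error messages.
    """
    job_options = set(job_options)
    options_without_help = set(job_options)
    unknown_help = set()
    for opt in documented_options:
        if opt in job_options:
            # Help text found: the option is no longer "without help".
            options_without_help.discard(opt)
        else:
            unknown_help.add(opt)
    return sorted(options_without_help), sorted(unknown_help)


if __name__ == '__main__':
    missing, unknown = find_help_mismatches(
        ['--help', '--qdf', '--new-option'],
        ['--help', '--qdf', '--stale-option'],
    )
    print('missing help:', missing)
    print('documented but undeclared:', unknown)
```

Seeding the "without help" set with the built-in help options, as the generator does, means those options must be documented too even though they never appear in job.yml.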
... ... @@ -276,6 +420,11 @@ class Main:
276 420 with open('manual/cli.rst', 'r') as df:
277 421 print(BANNER, file=f)
278 422 self.generate_doc(df, f)
  423 +
  424 + # Compute the json files after the config and arg parsing
  425 + # files. We need to have full information about all the
  426 + # options before we can generate the schema. Generating the
  427 + # schema also generates the json header files.
279 428 self.generate_schema(data)
280 429 with write_file(self.DESTS['schema']) as f:
281 430 print('static constexpr char const* JOB_SCHEMA_DATA = R"(' +
... ... @@ -301,6 +450,9 @@ class Main:
301 450 # DON'T ADD CODE TO generate AFTER update_hashes
302 451  
303 452 def handle_trivial(self, i, identifier, cfg, prefix, kind, v):
  453 + # A "trivial" option is one whose handler does nothing other
  454 + # than to call the config method with the same name (switched
  455 + # to camelCase).
304 456 decl_arg = 1
305 457 decl_arg_optional = False
306 458 if kind == 'bare':
... ... @@ -341,11 +493,18 @@ class Main:
341 493 # strategy enables us to change an option from bare to
342 494 # optional_parameter or optional_choices without
343 495 # breaking binary compatibility. The overloaded
344   - # methods both have to be implemented manually.
  496 + # methods both have to be implemented manually. They
  497 + # are not automatically called, so if you forget,
  498 + # someone will get a link error if they try to call
  499 + # one.
345 500 self.config_decls[cfg].append(
346 501 f'QPDF_DLL {config_prefix}* {identifier}();')
347 502  
348 503 def handle_flag(self, i, identifier, kind, v):
  504 + # For flags that require manual handlers, declare the handler
  505 + # and register it. They have to be implemented manually in
  506 + # QPDFJob_argv.cc. You get compiler/linker errors for any
  507 + # missing methods.
349 508 if kind == 'bare':
350 509 self.decls.append(f'void {identifier}();')
351 510 self.init.append(f'this->ap.addBare("{i}", '
... ... @@ -371,14 +530,17 @@ class Main:
371 530 f', false, {v}_choices);')
372 531  
373 532 def prepare(self, data):
374   - self.decls = []
375   - self.init = []
376   - self.json_decls = []
377   - self.json_init = []
378   - self.jdata = {}
379   - self.by_table = {}
  533 + self.decls = [] # argv handler declarations
  534 + self.init = [] # initialize arg parsing code
  535 + self.json_decls = [] # json handler declarations
  536 + self.json_init = [] # initialize json handlers
  537 + self.jdata = {} # running data used for json generation
  538 + self.by_table = {} # table information by name for easy lookup
380 539  
381 540 def add_jdata(flag, table, details):
  541 + # Keep track of each flag and where it appears so we can
  542 + # check consistency between the json information and the
  543 + # options section.
382 544 nonlocal self
383 545 if table == 'help':
384 546 self.help_options.add(f'--{flag}')
... ... @@ -389,6 +551,7 @@ class Main:
389 551 'tables': {table: details},
390 552 }
391 553  
  554 + # helper functions
392 555 self.init.append('auto b = [this](void (ArgParser::*f)()) {')
393 556 self.init.append(' return QPDFArgParser::bindBare(f, this);')
394 557 self.init.append('};')
... ... @@ -396,6 +559,8 @@ class Main:
396 559 self.init.append(' return QPDFArgParser::bindParam(f, this);')
397 560 self.init.append('};')
398 561 self.init.append('')
  562 +
  563 + # static variables for each set of choices for choices options
399 564 for k, v in data['choices'].items():
400 565 s = f'static char const* {k}_choices[] = {{'
401 566 for i in v:
... ... @@ -406,6 +571,8 @@ class Main:
406 571 self.init.append('')
407 572 self.json_init.append('')
408 573  
  574 + # constants for the table names to reduce hard-coding strings
  575 + # in the handlers
409 576 for o in data['options']:
410 577 table = o['table']
411 578 if table in ('main', 'help'):
... ... @@ -413,6 +580,20 @@ class Main:
413 580 i = self.to_identifier(table, 'O', True)
414 581 self.decls.append(f'static constexpr char const* {i} = "{table}";')
415 582 self.decls.append('')
  583 +
  584 + # Walk through all the options adding declarations for the
  585 + # option handlers and initialization code to register the
  586 + # handlers in QPDFArgParser. For "trivial" cases,
  587 + # QPDFArgParser will call the corresponding config method
  588 + # automatically. Otherwise, it will declare a handler that you
  589 + # have to explicitly implement.
  590 +
  591 + # If you add a new option table, you have to set config to the
  592 + # name of a member variable that you declare in the ArgParser
  593 + # class in QPDFJob_argv.cc. Then there should be an option in
  594 + # the main table, also listed as manual in job.yml, that
  595 + # switches to it. See implementations of any of the existing
  596 + # options that do this for examples.
416 597 for o in data['options']:
417 598 table = o['table']
418 599 config = o.get('config', None)
... ... @@ -437,8 +618,8 @@ class Main:
437 618 self.decls.append(f'void {arg_prefix}Positional(char*);')
438 619 self.init.append('this->ap.addPositional('
439 620 f'p(&ArgParser::{arg_prefix}Positional));')
440   - flags = {}
441 621  
  622 + flags = {}
442 623 for i in o.get('bare', []):
443 624 flags[i] = ['bare', None]
444 625 for i, v in o.get('required_parameter', {}).items():
... ... @@ -462,6 +643,11 @@ class Main:
462 643 self.handle_trivial(
463 644 i, identifier, config, config_prefix, kind, v)
464 645  
  646 + # Subsidiary options tables need end methods to do any
  647 + # final checking within the option table. Final checking
  648 + # for the main option table is handled by
  649 + # checkConfiguration, which is called explicitly in the
  650 + # QPDFJob code.
465 651 if table not in ('main', 'help'):
466 652 identifier = self.to_identifier(table, 'argEnd', False)
467 653 self.decls.append(f'void {identifier}();')
... ... @@ -510,6 +696,19 @@ class Main:
510 696 return self.option_to_json_key(schema_key)
511 697  
512 698 def build_schema(self, j, path, flag, expected, options_seen):
  699 + # j: the part of data from "json" in job.yml as we traverse it
  700 + # path: a string representation of the path in the json
  701 + # flag: the command-line flag
  702 + # expected: a map of command-line options we expect to eventually see
  703 + # options_seen: which options we have seen so far
  704 +
  705 + # As described in job.yml, the json can have keys that don't
  706 + # map to options. This includes keys whose values are
  707 + # dictionaries as well as keys that correspond to positional
  708 + # arguments. These start with _ and get their help from
  709 + # job.yml. Things that correspond to options get their help
  710 + # from the help text we gathered from cli.rst.
  711 +
513 712 if flag in expected:
514 713 options_seen.add(flag)
515 714 elif isinstance(j, str):
... ... @@ -519,6 +718,19 @@ class Main:
519 718 elif not (flag == '' or flag.startswith('_')):
520 719 raise Exception(f'json: unknown key {flag}')
521 720  
  721 + # The logic here is subtle and makes sense if you understand
  722 + # how our JSON schemas work. They are described in JSON.hh,
  723 + # but basically, if you see a dictionary, the schema should
  724 + # have a dictionary with the same keys whose values are
  725 + # descriptive. If you see an array, the array should have
  726 + # single member that describes each element of the array. See
  727 + # JSON.hh for details.
  728 +
  729 + # See comments in QPDFJob_json.cc in the Handlers class
  730 + # declaration to understand how and why the methods called
  731 + # here work. The idea is that Handlers keeps a stack of
  732 + # JSONHandler shared pointers so that we can register our
  733 + # handlers in the right place as we go.
522 734 if isinstance(j, dict):
523 735 schema_value = {}
524 736 if flag:
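The schema convention summarized in the comment above (a dictionary mirrors a dictionary, and a one-element array describes every element) can be illustrated with a small checker. This is a simplified sketch, not the logic in the JSON class: it assumes all keys are optional, as QPDFJob uses the schema, and it treats any descriptive string in the schema as accepting any value. The schema keys in the usage example are illustrative only.

```python
def check_against_schema(value, schema, path=''):
    """Return a list of error strings; an empty list means a match."""
    where = path or 'top level'
    if isinstance(schema, dict):
        if not isinstance(value, dict):
            return [f'{where}: expected a dictionary']
        errors = []
        for key, item in value.items():
            if key not in schema:
                errors.append(f'{where}: unknown key "{key}"')
            else:
                errors.extend(
                    check_against_schema(item, schema[key], f'{path}.{key}'))
        # Keys present in the schema but absent from value are fine:
        # every key is optional in this mode.
        return errors
    if isinstance(schema, list):
        if not isinstance(value, list):
            return [f'{where}: expected an array']
        errors = []
        for n, item in enumerate(value):
            # The schema's single member describes each array element.
            errors.extend(
                check_against_schema(item, schema[0], f'{path}[{n}]'))
        return errors
    # Assumption: a descriptive string in the schema accepts anything.
    return []


if __name__ == '__main__':
    schema = {
        'qdf': 'description of qdf',
        'pages': [{'file': 'description', 'range': 'description'}],
    }
    print(check_against_schema({'pages': [{'file': 'a.pdf'}]}, schema))
    print(check_against_schema({'bogus': 1}, schema))
```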
... ... @@ -579,14 +791,20 @@ class Main:
579 791  
580 792 def generate_schema(self, data):
581 793 # Check to make sure that every command-line option is
582   - # represented in data['json'].
583   -
584   - # Build a list of options that we expect. If an option appears
585   - # once, we just expect to see it once. If it appears in more
586   - # than one options table, we need to see a separate version of
587   - # it for each option table. It is represented in job.yml
588   - # prepended with the table prefix. The table prefix is removed
589   - # in the schema.
  794 + # represented in data['json']. Build a list of options that we
  795 + # expect. If an option appears once, we just expect to see it
  796 + # once. If it appears in more than one options table, we need
  797 + # to see a separate version of it for each option table. It is
  798 + # represented in job.yml prepended with the table prefix. The
  799 + # table prefix is removed in the schema. Example: "password"
  800 + # appears multiple times, so the json section of job.yml has
  801 + # main.password, uo.password, etc. But most options appear
  802 + # only once, so we can just list them as they are. There is a
  803 + # nearly exact match between option tables and dictionaries in
  804 + # the job json schema, but it's not perfect because of how
  805 + # positional arguments are handled, so we have to do this
  806 + # extra work. Information about which tables a particular
  807 + # option appeared in is gathered up in prepare().
590 808 expected = {}
591 809 for k, v in self.jdata.items():
592 810 tables = v['tables']
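The comment above explains that an option appearing in several option tables is listed in the json section of job.yml once per table, prefixed with the table name, and that the prefix is dropped in the schema. The sketch below shows that idea under the assumption that `jdata` maps each flag to `{'tables': {table: details}}`, as `add_jdata` suggests. The helper names are hypothetical and the real method may differ.

```python
def expected_json_keys(jdata):
    """Build the set of keys the json section is expected to contain."""
    expected = {}
    for flag, info in jdata.items():
        tables = info['tables']
        if len(tables) == 1:
            # Unique option: listed under its plain name.
            expected[flag] = info
        else:
            # Shared name: one entry per table, e.g. main.password.
            for table in sorted(tables):
                expected[f'{table}.{flag}'] = info
    return expected


def schema_key(json_key):
    """Strip an optional 'table.' prefix to get the schema key."""
    return json_key.split('.', 1)[-1]


if __name__ == '__main__':
    jdata = {
        'qdf': {'tables': {'main': {}}},
        'password': {'tables': {'main': {}, 'uo': {}}},
    }
    for key in sorted(expected_json_keys(jdata)):
        print(key, '->', schema_key(key))
```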
... ... @@ -600,7 +818,11 @@ class Main:
600 818 # Walk through the json information building the schema as we
601 819 # go. This verifies consistency between command-line options
602 820 # and the json section of the data and builds up a schema by
603   - # populating with help information as available.
  821 + # populating with help information as available. In addition
  822 + # to generating the schema, we declare and register json
  823 + # handlers that correspond with it. That way, we can first
  824 + # check a job JSON file against the schema, and if it matches,
  825 + # we have fewer error opportunities while calling handlers.
604 826 self.schema = self.build_schema(
605 827 data['json'], '', '', expected, options_seen)
606 828 if options_seen != set(expected.keys()):
... ...
include/qpdf/QPDFJob.hh
... ... @@ -62,10 +62,10 @@ class QPDFJob
62 62 // the regular API. This is exposed in the C API, which makes it
63 63 // easier to get certain high-level qpdf functionality from other
64 64 // languages. If there are any command-line errors, this method
65   - // will throw QPDFArgParser::Usage which is derived from
66   - // std::runtime_error. Other exceptions may be thrown in some
67   - // cases. Note that argc, and argv should be UTF-8 encoded. If you
68   - // are calling this from a Windows Unicode-aware main (wmain), see
  65 + // will throw QPDFUsage which is derived from std::runtime_error.
  66 + // Other exceptions may be thrown in some cases. Note that argc,
  67 + // and argv should be UTF-8 encoded. If you are calling this from
  68 + // a Windows Unicode-aware main (wmain), see
69 69 // QUtil::call_main_from_wmain for information about converting
70 70 // arguments to UTF-8. This method will mutate arguments that are
71 71 // passed to it.
... ... @@ -76,7 +76,7 @@ class QPDFJob
76 76 // Initialize a QPDFJob from json. Passing partial = true prevents
77 77 // this method from doing the final checks (calling
78 78 // checkConfiguration) after processing the json file. This makes
79   - // it possible to initialze QPDFJob in stages using multiple json
  79 + // it possible to initialize QPDFJob in stages using multiple json
80 80 // files or to have a json file that can be processed from the CLI
81 81 // with --job-json-file and be combined with other arguments. For
82 82 // example, you might include only encryption parameters, leaving
... ... @@ -84,7 +84,11 @@ class QPDFJob
84 84 // input and output files. initializeFromJson is called with
85 85 // partial = true when invoked from the command line. To make sure
86 86 // that the json file is fully valid on its own, just don't
87   - // specify any other command-line flags.
  87 + // specify any other command-line flags. If there are any
  88 + // configuration errors, QPDFUsage is thrown. Some error messages
  89 + // may be CLI-centric. If an exception tells you to use the
  90 + // "--some-option" option, set the "someOption" key in the JSON
  91 + // object instead.
88 92 QPDF_DLL
89 93 void initializeFromJson(std::string const& json, bool partial = false);
90 94  
... ... @@ -160,7 +164,7 @@ class QPDFJob
160 164 // object. The Config object contains methods that correspond with
161 165 // qpdf command-line arguments. You can use a fluent interface to
162 166 // configure a QPDFJob object that would do exactly the same thing
163   - // as a specific qpdf command. The example pdf-job.cc contains an
  167 + // as a specific qpdf command. The example qpdf-job.cc contains an
164 168 // example of this usage. You can also use initializeFromJson or
165 169 // initializeFromArgv to initialize a QPDFJob object.
166 170  
... ... @@ -180,6 +184,10 @@ class QPDFJob
180 184 // with references. Returning pointers instead of references
181 185 // makes for a more uniform interface.
182 186  
  187 + // Maintainer documentation: see the section in README-maintainer
  188 + // called "HOW TO ADD A COMMAND-LINE ARGUMENT", which contains
  189 + // references to additional places in the documentation.
  190 +
183 191 class Config;
184 192  
185 193 class AttConfig
... ... @@ -330,7 +338,10 @@ class QPDFJob
330 338 // Return a top-level configuration item. See CONFIGURATION above
331 339 // for details. If an invalid configuration is created (such as
332 340 // supplying contradictory options, omitting an input file, etc.),
333   - // QPDFUsage is thrown.
  341 + // QPDFUsage is thrown. Note that error messages are CLI-centric,
  342 + // but you can map them into config calls. For example, if an
  343 + // exception tells you to use the --some-option flag, you should
  344 + // call config()->someOption() instead.
334 345 QPDF_DLL
335 346 std::shared_ptr<Config> config();
336 347  
... ...
job.sums
1 1 # Generated by generate_auto_job
2   -generate_auto_job 1fdb113412a444aad67b0232f3f6c4f50d9e2a5701691e5146fd1b559039ef2e
  2 +generate_auto_job 5d6ec1e4f0b94d8f73df665061d8a2188cbbe8f25ea42be78ec576547261d5ac
3 3 include/qpdf/auto_job_c_att.hh 7ad43bb374c1370ef32ebdcdcb7b73a61d281f7f4e3f12755585872ab30fb60e
4 4 include/qpdf/auto_job_c_copy_att.hh 32275d03cdc69b703dd7e02ba0bbe15756e714e9ad185484773a6178dc09e1ee
5 5 include/qpdf/auto_job_c_enc.hh 72e138c7b96ed5aacdce78c1dec04b1c20d361faec4f8faf52f64c1d6be99265
6 6 include/qpdf/auto_job_c_main.hh 69d5ea26098bcb6ec5b5e37ba0bca9e7d16a784d2618e0c05d635046848d5123
7 7 include/qpdf/auto_job_c_pages.hh 931840b329a36ca0e41401190e04537b47f2867671a6643bfd8da74014202671
8 8 include/qpdf/auto_job_c_uo.hh 0585b7de459fa479d9e51a45fa92de0ff6dee748efc9ec1cedd0dde6cee1ad50
9   -job.yml effc93a805fb74503be2213ad885238db21991ba3d084fbfeff01183c66cb002
  9 +job.yml 9544c6e046b25d3274731fbcd07ba25b300fd67055021ac4364ad8a91f77c6b6
10 10 libqpdf/qpdf/auto_job_decl.hh 9f79396ec459f191be4c5fe34cf88c265cf47355a1a945fa39169d1c94cf04f6
11   -libqpdf/qpdf/auto_job_help.hh 6002f503368f319a3d717484ac39d1558f34e67989d442f394791f6f6f5f0500
  11 +libqpdf/qpdf/auto_job_help.hh 43184f01816b5210bbc981de8de48446546fb94f4fd6e63cfc7f2fbac3578e6b
12 12 libqpdf/qpdf/auto_job_init.hh fd13b9f730e6275a39a15d193bd9af19cf37f4495699ec1886c2b208d7811ab1
13 13 libqpdf/qpdf/auto_job_json_decl.hh c5e3fd38a3b0c569eb0c6b4c60953a09cd6bc7d3361a357a81f64fe36af2b0cf
14 14 libqpdf/qpdf/auto_job_json_init.hh 3f86ce40931ca8f417d050fcd49104d73c1fa4e977ad19d54b372831a8ea17ed
15 15 libqpdf/qpdf/auto_job_schema.hh 18a3780671d95224cb9a27dcac627c421cae509d59f33a63e6bda0ab53cce923
16 16 manual/_ext/qpdf.py e9ac9d6c70642a3d29281ee5ad92ae2422dee8be9306fb8a0bc9dba0ed5e28f3
17   -manual/cli.rst 35289dbf593085016a62249f760cdcad50d5cce76d799ea4acf5dff58b78679a
  17 +manual/cli.rst 3746df6c4f115387cca0d921f25619a6b8407fc10b0e4c9dcf40b0b1656c6f8a
... ...
job.yml
1 1 # See "HOW TO ADD A COMMAND-LINE ARGUMENT" in README-maintainer.
  2 +
  3 +# REMEMBER: if you add an optional_choices or optional_parameter, you
  4 +# have to explicitly remember to implement the overloaded config
  5 +# method that takes no arguments. Since no generated code will call it
  6 +# automatically, there is no automated reminder to do this. If you
  7 +# forget, it will be a link error if someone tries to call it.
  8 +
2 9 choices:
3 10 yn:
4 11 - "y"
... ...
libqpdf/QPDFJob.cc
... ... @@ -646,7 +646,6 @@ QPDFJob::createsOutput() const
646 646 void
647 647 QPDFJob::checkConfiguration()
648 648 {
649   - // QXXXQ messages are CLI-centric
650 649 if (m->replace_input)
651 650 {
652 651 if (m->outfilename)
... ... @@ -722,7 +721,8 @@ QPDFJob::checkConfiguration()
722 721 {
723 722 QTC::TC("qpdf", "qpdf same file error");
724 723 usage("input file and output file are the same;"
725   - " use --replace-input to intentionally overwrite the input file");
  724 + " use --replace-input to intentionally"
  725 + " overwrite the input file");
726 726 }
727 727 }
728 728  
... ...
libqpdf/QPDFJob_config.cc
... ... @@ -28,7 +28,6 @@ QPDFJob::Config::emptyInput()
28 28 {
29 29 if (o.m->infilename == 0)
30 30 {
31   - // QXXXQ decide whether to fix this or just leave the comment:
32 31 // Various places in QPDFJob.cc know that the empty string for
33 32 // infile means empty. This means that passing "" as the
34 33 // argument to inputFile, or equivalently using "" as a
... ...
libqpdf/QPDFJob_json.cc
... ... @@ -29,6 +29,28 @@ namespace
29 29 typedef std::function<void(char const*)> param_handler_t;
30 30 typedef std::function<void(JSON)> json_handler_t;
31 31  
  32 + // The code that calls these methods is automatically
  33 + // generated by generate_auto_job. This describes how we
  34 + // implement what it does. We keep a stack of handlers in
  35 + // json_handlers. The top of the stack is the "current" json
  36 + // handler, initially for the top-level object. Whenever we
  37 + // encounter a scalar, we add a handler using addBare,
  38 + // addParameter, or addChoices. Whenever we encounter a
  39 + // dictionary, we first add the dictionary handlers. Then we
  40 + // walk into the dictionary and, for each key, we register a
  41 + // dict key handler and push it to the stack, then do the same
  42 + // process for the key's value. Then we pop the key handler
  43 + // off the stack. When we encounter an array, we add the array
  44 + // handlers, push an item handler to the stack, call
  45 + // recursively for the array's single item (as this is what is
  46 + // expected in a schema), and pop the item handler. Note that
  47 + // we don't pop dictionary start/end handlers. The dictionary
  48 + // handlers and the key handlers are at the same level in
  49 + // JSONHandler. This logic is subtle and took several tries to
  50 + // get right. It's best understood by carefully studying
  51 + // the behavior of JSONHandler, the JSON schema, and the code
  52 + // in generate_auto_job.
  53 +
32 54 void addBare(bare_handler_t);
33 55 void addParameter(param_handler_t);
34 56 void addChoices(char const** choices, bool required, param_handler_t);
... ...
libqpdf/qpdf/auto_job_help.hh
... ... @@ -812,7 +812,8 @@ This option is repeatable. If given, only specified objects will
812 812 be shown in the "objects" key of the JSON output. Otherwise, all
813 813 objects will be shown.
814 814 )");
815   -ap.addOptionHelp("--job-json-help", "json", "show format of job JSON", R"(Describe the format of the QPDFJob JSON input.
  815 +ap.addOptionHelp("--job-json-help", "json", "show format of job JSON", R"(Describe the format of the QPDFJob JSON input used by
  816 +--job-json-file.
816 817 )");
817 818 ap.addHelpTopic("testing", "options for testing or debugging", R"(The options below are useful when writing automated test code that
818 819 includes files created by qpdf or when testing qpdf itself.
... ...
manual/cli.rst
... ... @@ -167,9 +167,11 @@ Related Options
167 167 description of the JSON input file format.
168 168  
169 169 Specify the name of a file whose contents are expected to contain a
170   - QPDFJob JSON file. QXXXQ ref. This file is read and treated as if
171   - the equivalent command-line arguments were supplied. It can be
172   - mixed freely with other options.
  170 + QPDFJob JSON file. This file is read and treated as if the
  171 + equivalent command-line arguments were supplied. It can be repeated
  172 + and mixed freely with other options. Run ``qpdf`` with
  173 + :qpdf:ref:`--job-json-help` for a description of the job JSON input
  174 + file format. For more information, see :ref:`qpdf-job`.
173 175  
174 176 .. _exit-status:
175 177  
... ... @@ -3200,9 +3202,12 @@ Related Options
3200 3202  
3201 3203 .. help: show format of job JSON
3202 3204  
3203   - Describe the format of the QPDFJob JSON input.
  3205 + Describe the format of the QPDFJob JSON input used by
  3206 + --job-json-file.
3204 3207  
3205   - Describe the format of the QPDFJob JSON input. QXXXQ doc ref.
  3208 + Describe the format of the QPDFJob JSON input used by
  3209 + :qpdf:ref:`--job-json-file`. For more information about QPDFJob,
  3210 + see :ref:`qpdf-job`.
3206 3211  
3207 3212 .. _test-options:
3208 3213  
... ...
manual/index.rst
... ... @@ -28,6 +28,7 @@ documentation, please visit `https://qpdf.readthedocs.io
28 28 weak-crypto
29 29 json
30 30 design
  31 + qpdf-job
31 32 linearization
32 33 object-streams
33 34 encryption
... ...
manual/qpdf-job.rst 0 → 100644
  1 +
  2 +.. _qpdf-job:
  3 +
  4 +QPDFJob: a Job-Based Interface
  5 +==============================
  6 +
  7 +All of the functionality from the :command:`qpdf` command-line
  8 +executable is available from inside the C++ library using the
  9 +``QPDFJob`` class. There are several ways to access this functionality:
  10 +
  11 +- Command-line options
  12 +
  13 + - Run the :command:`qpdf` command line
  14 +
  15 + - Use from the C++ API with ``QPDFJob::initializeFromArgv``
  16 +
  17 + - Use from the C API with QXXXQ
  18 +
  19 +- The job JSON file format
  20 +
  21 + - Use from the CLI with the :qpdf:ref:`--job-json-file` parameter
  22 +
  23 + - Use from the C++ API with ``QPDFJob::initializeFromJson``
  24 +
  25 + - Use from the C API with QXXXQ
  26 +
  27 +- The ``QPDFJob`` C++ API
  28 +
  29 +If you can understand how to use the :command:`qpdf` CLI, you can
  30 +understand the ``QPDFJob`` class and the json file. qpdf guarantees
  31 +that all of the above methods are in sync. Here's how it works:
  32 +
  33 +.. list-table:: QPDFJob Interfaces
  34 + :widths: 30 30 30
  35 + :header-rows: 1
  36 +
  37 + - - CLI
  38 + - JSON
  39 + - C++
  40 +
  41 + - - ``--some-option``
  42 + - ``"someOption": ""``
  43 + - ``config()->someOption()``
  44 +
  45 + - - ``--some-option=value``
  46 + - ``"someOption": "value"``
  47 + - ``config()->someOption("value")``
  48 +
  49 + - - positional argument
  50 + - ``"otherOption": "value"``
  51 + - ``config()->otherOption("value")``
  52 +
  53 +In the JSON file, the JSON structure is an object (dictionary) whose
  54 +keys are command-line flags converted to camelCase. Positional
  55 +arguments have some corresponding key, which you can find by running
  56 +``qpdf`` with the :qpdf:ref:`--job-json-help` flag. For example, input
  57 +and output files are named by positional arguments on the CLI. In the
  58 +JSON, they are ``"inputFile"`` and ``"outputFile"``. The following are
  59 +equivalent:
  60 +
  61 +.. It would be nice to have an automated test that these are all the
  62 + same, but we have so few live examples that it's not worth it for
  63 + now.
  64 +
  65 +CLI:
  66 + ::
  67 +
  68 + qpdf infile.pdf outfile.pdf \
  69 + --pages . other.pdf --password=x 1-5 -- \
  70 + --encrypt user owner 256 --print=low -- \
  71 + --object-streams=generate
  72 +
  73 +Job JSON:
  74 + .. code-block:: json
  75 +
  76 + {
  77 + "inputFile": "infile.pdf",
  78 + "outputFile": "outfile.pdf",
  79 + "pages": [
  80 + {
  81 + "file": "."
  82 + },
  83 + {
  84 + "file": "other.pdf",
  85 + "password": "x",
  86 + "range": "1-5"
  87 + }
  88 + ],
  89 + "encrypt": {
  90 + "userPassword": "user",
  91 + "ownerPassword": "owner",
  92 + "256bit": {
  93 + "print": "low"
  94 + }
  95 + },
  96 + "objectStreams": "generate"
  97 + }
  98 +
  99 +C++ code:
  100 + .. code-block:: c++
  101 +
  102 + #include <qpdf/QPDFJob.hh>
  103 + #include <qpdf/QPDFUsage.hh>
  104 + #include <iostream>
  105 +
  106 + int main(int argc, char* argv[])
  107 + {
  108 + try
  109 + {
  110 + QPDFJob j;
  111 + j.config()
  112 + ->inputFile("infile.pdf")
  113 + ->outputFile("outfile.pdf")
  114 + ->pages()
  115 + ->pageSpec(".", "1-z")
  116 + ->pageSpec("other.pdf", "1-5", "x")
  117 + ->endPages()
  118 + ->encrypt(256, "user", "owner")
  119 + ->print("low")
  120 + ->endEncrypt()
  121 + ->objectStreams("generate")
  122 + ->checkConfiguration();
  123 + j.run();
  124 + }
  125 + catch (QPDFUsage& e)
  126 + {
  127 + std::cerr << "configuration error: " << e.what() << std::endl;
  128 + return 2;
  129 + }
  130 + catch (std::exception& e)
  131 + {
  132 + std::cerr << "other error: " << e.what() << std::endl;
  133 + return 2;
  134 + }
  135 + return 0;
  136 + }
  137 +
  138 +It is also possible to mix and match command-line options and json
  139 +from the CLI. For example, you could create a file called
  140 +:file:`my-options.json` containing the following:
  141 +
  142 +.. code-block:: json
  143 +
  144 + {
  145 + "encrypt": {
  146 + "userPassword": "",
  147 + "ownerPassword": "owner",
  148 + "256bit": {
  149 + }
  150 + },
  151 + "objectStreams": "generate"
  152 + }
  153 +
  154 +and use it with other options to create 256-bit encrypted (but
  155 +unrestricted) files with object streams while specifying other
  156 +parameters on the command line, such as
  157 +
  158 +::
  159 +
  160 + qpdf infile.pdf outfile.pdf --job-json-file=my-options.json
  161 +
  162 +See also :file:`examples/qpdf-job.cc` in the source distribution as
  163 +well as comments in ``QPDFJob.hh``.
  164 +
  165 +.. _qpdfjob-design:
  166 +
  167 +
  168 +QPDFJob Design
  169 +--------------
  170 +
  171 +This section describes some of the design rationale and history behind
  172 +``QPDFJob``.
  173 +
  174 +Documentation of ``QPDFJob`` is divided among three places:
  175 +
  176 +- "HOW TO ADD A COMMAND-LINE ARGUMENT" in :file:`README-maintainer`
  177 + provides a quick reminder for how to add a command-line argument
  178 +
  179 +- The source file :file:`generate_auto_job` has a detailed explanation
  180 + about how ``QPDFJob`` and ``generate_auto_job`` work together
  181 +
  182 +- This chapter of the manual has other details.
  183 +
  184 +Prior to qpdf version 10.6.0, the qpdf CLI executable had a lot of
  185 +functionality built into the executable that was not callable from the
  186 +library as such. This created a number of problems:
  187 +
  188 +- Some of the logic in :file:`qpdf.cc` was pretty complex, such as
  189 + image optimization, generating json output, and many of the page
  190 + manipulations. While those things could all be coded using the C++
  191 + API, there would be a lot of duplicated code.
  192 +
  193 +- Page splitting and merging will get more complicated over time as
  194 + qpdf supports a wider range of document-level options. It would be
  195 + nice to be able to expose this to library users instead of baking it
  196 + all into the CLI.
  197 +
  198 +- Users of other languages who just wanted an interface to do things
  199 + that the CLI could do didn't have a good way to do it, such as just
  200 + handing a library call a set of command-line options or an
  201 + equivalent JSON object that could be passed in as a string.
  202 +
  203 +- The qpdf CLI itself was almost 8,000 lines of code. It needed to be
  204 + refactored, cleaned up, and split.
  205 +
  206 +- Exposing a new feature via the command-line required making lots of
  207 + small edits to lots of small bits of code, and it was easy to forget
  208 + something. Adding a code generator, while complex in some ways,
  209 + greatly reduces the chances of error when extending qpdf.
  210 +
  211 +Here are a few notes on some design decisions about QPDFJob and its
  212 +various interfaces.
  213 +
  214 +- Bare command-line options (flags with no parameter) map to config
  215 + functions that take no options and to json keys whose values are
  216 + required to be the empty string. The rationale is that we can later
  217 + change these bare options to options that take an optional parameter
  218 + without breaking backward compatibility in the CLI or the JSON.
  219 + Options that take optional parameters generate two config functions:
  220 + one that has no arguments and one that has a ``char const*`` argument.
  221 + This means that adding an optional parameter to a previously bare
  222 + option also doesn't break binary compatibility.
  223 +
  224 +- Adding a new argument to :file:`job.yml` automatically triggers
  225 + almost everything by declaring and referencing things that you have
  226 + to implement. This way, once you get the code to compile and link,
  227 + you know you haven't forgotten anything. There are two tricky cases:
  228 +
  229 + - If an argument handler has to do something special, like call a
  230 + nested config method or select an option table, you have to
  231 + implement it manually. This is discussed in
  232 + :file:`generate_auto_job`.
  233 +
  234 + - When you add an option that has optional parameters or choices,
  235 + both of the handlers described above are declared, but only the
  236 + one that takes an argument is referenced. You have to remember to
  237 + implement the one that doesn't take an argument or else people
  238 + will get a linker error if they try to call it. The assumption is
  239 + that things with optional parameters started out as bare, so the
  240 + argument-less version is already there.
  241 +
  242 +- If you have to add a new option that requires its own option table,
  243 + you will have to do some extra work including adding a new nested
  244 + Config class, adding a config member variable to ``ArgParser`` in
  245 + :file:`QPDFJob_argv.cc` and ``Handlers`` in :file:`QPDFJob_json.cc`,
  246 + and making sure that manually implemented handlers are consistent
  247 + with each other. It is best in these cases to add explicit test
  248 + cases for all the various ways to get to the option.
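
The first design note in the list above (bare options versus options with an optional parameter) can be illustrated with a minimal sketch. This is not qpdf's generated code: DemoConfig, its members, and the option name someOption are hypothetical stand-ins chosen only to show why keeping both overloads lets a bare option gain an optional parameter later without removing the argument-less entry point.

```cpp
#include <string>

// Hypothetical stand-in for a generated Config class. It is NOT part of
// qpdf's API; it only mirrors the pattern described in the design notes.
class DemoConfig
{
  public:
    // Bare form, corresponding to "--some-option" on the CLI or
    // "someOption": "" in the job JSON. Existing callers keep using this
    // overload even after the option gains an optional parameter.
    DemoConfig* someOption()
    {
        return someOption("");
    }

    // Parameterized form, corresponding to "--some-option=value" or
    // "someOption": "value".
    DemoConfig* someOption(char const* parameter)
    {
        value = parameter;
        seen = true;
        return this; // return this so calls can be chained
    }

    std::string value;  // last parameter supplied; empty for the bare form
    bool seen = false;  // whether the option was given at all
};
```

With this shape, DemoConfig c; c.someOption()->someOption("value"); compiles and chains in the same fluent style as the config() example in the manual text, and the bare call is simply treated as the empty-string parameter, matching the JSON convention that bare options take the empty string as their value.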
... ...
manual/release-notes.rst
... ... @@ -2303,9 +2303,9 @@ For a detailed list of changes, please see the file
2303 2303 been added to the :command:`qpdf` command-line
2304 2304 tool. See :ref:`page-selection`.
2305 2305  
2306   - - Options have been added to the :command:`qpdf`
2307   - command-line tool for copying encryption parameters from another
2308   - file. (QXXXQ Link)
  2306 + - The :qpdf:ref:`--copy-encryption` option has been added to the
  2307 + :command:`qpdf` command-line tool for copying encryption
  2308 + parameters from another file.
2309 2309  
2310 2310 - New methods have been added to the ``QPDF`` object for adding and
2311 2311 removing pages. See :ref:`adding-and-remove-pages`.
... ...