Commit 4674c04cb80e0a50cde1b97464642e2778f9522f

Authored by Jay Berkenbilt
1 parent f8d1ab94

JSON schema: support multi-element array validation

ChangeLog
... ... @@ -8,7 +8,11 @@
8 8 allow QPDFJob JSON to start accepting multiple items where a
9 9 single item used to be expected without breaking backward
10 10 compatibility. Without this change, the earlier fix to
11   - removeAttachment would be a breaking change.
  11 + removeAttachment would be a breaking change. Also allow the schema
  12 + to contain a multi-element array, which means that the output has
  13 + to have an array of the same length in the corresponding location,
  14 + and each element is validated against the corresponding schema
  15 + element.
12 16  
13 17 * QPDFObjectHandle: for the methods insertItem, appendItem,
14 18 eraseItem, replaceKey, and removeKey, add a corresponding
... ...
... ... @@ -124,11 +124,6 @@ JSON v2 fixes
124 124 compatibility, but at least this gives people a namespace they can
125 125 know will never conflict with future keys.
126 126  
127   - * Change schema validation so that if the schema contains an array
128   - with more than one element, the output has to have an array with
129   - the same number of elements whose individual elements are
130   - validated according to the regular rules.
131   -
132 127 * When reading back in, we'll have to call
133 128 pushInheritedAttributesToPage or getAllPages based on the values
134 129 of the metadata.
... ...
include/qpdf/JSON.hh
... ... @@ -183,25 +183,30 @@ class JSON
183 183 //
184 184 // * The schema is a nested structure containing dictionaries,
185 185 // single-element arrays, and strings only.
186   - // * Recursively walk the schema.
187   - // * Whenever the schema has an array of length 1 and the object
188   - // does not have an array in the corresponding location,
189   - // validate the object against the array's single element.
190   - // This effectively enables a single element to appear in
191   - // place of an array and be treated as if it were an array of
192   - // one element. This makes it possible to decide later that
193   - // something that used to contain a single element now allows
194   - // an array without invalidating any old data.
195   - // * If the current value is a dictionary, this object must have
196   - // a dictionary in the same place with the same keys. If flags
197   - // contains f_optional, a key in the schema does not have to
198   - // be present in the object. Otherwise, all keys have to be
199   - // present. Any key in the object must be present in the
200   - // schema.
201   - // * If the current value is an array, this object must have an
202   - // array in the same place. The schema's array must contain a
203   - // single element, which is used as a schema to validate each
204   - // element of this object's corresponding array.
  186 + // * Recursively walk the schema. In the items below, "schema
  187 + // object" refers to an object in the schema, and "checked
  188 + // object" refers to the correspondingi part of the object
  189 + // being checked.
  190 + // * If the schema object is a dictionary, the checked object
  191 + // must have a dictionary in the same place with the same
  192 + // keys. If flags contains f_optional, a key in the schema
  193 + // does not have to be present in the object. Otherwise, all
  194 + // keys have to be present. Any key in the object must be
  195 + // present in the schema.
  196 + // * If the schema object is an array of length 1, the checked
  197 + // object may either be a single item or an array of items.
  198 + // The single item or each element of the checked object's
  199 + // array is validated against the single element of the
  200 + // schema's array. The rationale behind this logic is that a
  201 + // single element may appear wherever the schema allows a
  202 + // variable-length array. This makes it possible to start
  203 + // allowing an array in the future where a single element was
  204 + // previously required without breaking backward
  205 + // compatibility.
  206 + // * If the schema object is an array of length > 1, the checked
  207 + // object must be an array of the same length. In this case,
  208 + // each element of the checked object array is validated
  209 + // against the corresponding element of the schema array.
205 210 // * Otherwise, the value must be a string whose value is a
206 211 // description of the object's corresponding value, which may
207 212 // have any type.
... ...
libqpdf/JSON.cc
... ... @@ -538,27 +538,55 @@ JSON::checkSchemaInternal(
538 538 }
539 539 }
540 540 } else if (sch_arr) {
541   - if (sch_arr->elements.size() != 1) {
542   - QTC::TC("libtests", "JSON schema array error");
  541 + auto n_elements = sch_arr->elements.size();
  542 + if (n_elements == 1) {
  543 + // A single-element array in the schema allows a single
  544 + // element in the object or a variable-length array, each
  545 + // of whose items must conform to the single element of
  546 + // the schema array. This doesn't apply to arrays of
  547 + // arrays -- we fall back to the behavior of allowing a
  548 + // single item only when the object is not an array.
  549 + if (this_arr) {
  550 + int i = 0;
  551 + for (auto const& element: this_arr->elements) {
  552 + checkSchemaInternal(
  553 + element.get(),
  554 + sch_arr->elements.at(0).get(),
  555 + flags,
  556 + errors,
  557 + prefix + "." + QUtil::int_to_string(i));
  558 + ++i;
  559 + }
  560 + } else {
  561 + QTC::TC("libtests", "JSON schema array for single item");
  562 + checkSchemaInternal(
  563 + this_v,
  564 + sch_arr->elements.at(0).get(),
  565 + flags,
  566 + errors,
  567 + prefix);
  568 + }
  569 + } else if (!this_arr || (this_arr->elements.size() != n_elements)) {
  570 + QTC::TC("libtests", "JSON schema array length mismatch");
543 571 errors.push_back(
544   - err_prefix + " schema array contains other than one item");
  572 + err_prefix + " is supposed to be an array of length " +
  573 + QUtil::uint_to_string(n_elements));
545 574 return false;
546   - }
547   - if (this_arr) {
548   - int i = 0;
  575 + } else {
  576 + // A multi-element array in the schema must correspond to
  577 + // an element of the same length in the object. Each
  578 + // element in the object is validated against the
  579 + // corresponding element in the schema.
  580 + size_t i = 0;
549 581 for (auto const& element: this_arr->elements) {
550 582 checkSchemaInternal(
551 583 element.get(),
552   - sch_arr->elements.at(0).get(),
  584 + sch_arr->elements.at(i).get(),
553 585 flags,
554 586 errors,
555   - prefix + "." + QUtil::int_to_string(i));
  587 + prefix + "." + QUtil::uint_to_string(i));
556 588 ++i;
557 589 }
558   - } else {
559   - QTC::TC("libtests", "JSON schema array for single item");
560   - checkSchemaInternal(
561   - this_v, sch_arr->elements.at(0).get(), flags, errors, prefix);
562 590 }
563 591 } else if (!sch_str) {
564 592 QTC::TC("libtests", "JSON schema other type");
... ...
libtests/json.cc
... ... @@ -179,7 +179,11 @@ test_schema()
179 179 "z": "ebra",
180 180 "o": "ptional"
181 181 }
182   - }
  182 + },
  183 + "four": [
  184 + { "first": "first element" },
  185 + { "second": "second element" }
  186 + ]
183 187 }
184 188 )");
185 189  
... ... @@ -227,17 +231,28 @@ test_schema()
227 231 "else": {
228 232 "z": "okay"
229 233 }
230   - }
  234 + },
  235 + "four": [
  236 + {"first": "missing second"}
  237 + ]
231 238 }
232 239 )");
233 240  
234 241 check_schema(b, schema, 0, false, "missing items");
235   - check_schema(a, a, 0, false, "top-level schema array error");
236   - check_schema(b, b, 0, false, "lower-level schema array error");
237 242  
238 243 JSON bad_schema = JSON::parse(R"({"a": true, "b": "potato?"})");
239 244 check_schema(bad_schema, bad_schema, 0, false, "bad schema field type");
240 245  
  246 + JSON c = JSON::parse(R"(
  247 +{
  248 + "four": [
  249 + { "first": 1 },
  250 + { "oops": [2] }
  251 + ]
  252 +}
  253 +)");
  254 + check_schema(c, schema, JSON::f_optional, false, "array element mismatch");
  255 +
241 256 // "two" exercises the case of the JSON containing a single
242 257 // element where the schema has an array.
243 258 JSON good = JSON::parse(R"(
... ... @@ -262,7 +277,11 @@ test_schema()
262 277 "<objid>": {
263 278 "z": "ebra"
264 279 }
265   - }
  280 + },
  281 + "four": [
  282 + { "first": 1 },
  283 + { "second": [2] }
  284 + ]
266 285 }
267 286 )");
268 287 check_schema(good, schema, 0, false, "not optional");
... ...
libtests/libtests.testcov
... ... @@ -36,7 +36,6 @@ Pl_PNGFilter decodePaeth 0
36 36 Pl_TIFFPredictor processRow 1
37 37 JSON wanted dictionary 0
38 38 JSON key missing in object 0
39   -JSON schema array error 0
40 39 JSON key extra in object 0
41 40 QPDFArgParser read args from stdin 0
42 41 QPDFArgParser read args from file 0
... ... @@ -93,3 +92,4 @@ JSON 16 low not after high 0
93 92 JSON 16 dangling high 0
94 93 JSON parse duplicate key 0
95 94 JSON schema array for single item 0
  95 +JSON schema array length mismatch 0
... ...
libtests/qtest/json/json.out
... ... @@ -2,6 +2,7 @@
2 2 top-level object is supposed to be a dictionary
3 3 ---
4 4 --- missing items
  5 +json key ".four" is supposed to be an array of length 2
5 6 json key ".one.a": key "q" is present in schema but missing in object
6 7 json key ".one.a.r" is supposed to be a dictionary
7 8 json key ".one.a.s": key "ss" is present in schema but missing in object
... ... @@ -15,16 +16,12 @@ json key &quot;.two.1&quot;: key &quot;flarp&quot; is not present in schema but appears in object
15 16 json key ".two.2" is supposed to be a dictionary
16 17 json key ".two.3" is supposed to be a dictionary
17 18 ---
18   ---- top-level schema array error
19   -top-level object schema array contains other than one item
20   ----
21   ---- lower-level schema array error
22   -json key ".one.a.r" schema array contains other than one item
23   -json key ".two" schema array contains other than one item
24   ----
25 19 --- bad schema field type
26 20 json key ".a" schema value is not dictionary, array, or string
27 21 ---
  22 +--- array element mismatch
  23 +json key ".four.1": key "oops" is not present in schema but appears in object
  24 +---
28 25 --- not optional
29 26 json key ".three.<objid>": key "o" is present in schema but missing in object
30 27 ---
... ...
manual/json.rst
... ... @@ -595,11 +595,17 @@ Documentation
595 595 appears in the corresponding location of the actual output. The
596 596 corresponding output can have any value including ``null``.
597 597  
598   - - An array in the help output always contains a single element. It
599   - indicates that the corresponding location in the actual output is
600   - an array of any length, and that each element of the array has
601   - whatever format is implied by the single element of the help
602   - output's array.
  598 + - A single-element array in the help output indicates that the
  599 + corresponding location in the actual output is either a single
  600 + item or is an array of any length. The single item or each
  601 + element of the array has whatever format is implied by the single
  602 + element of the help output's array.
  603 +
  604 + - A multi-element array in the help output indicates that the
  605 + corresponding location in the actual output is an array of the
  606 + same length. Each element of the output array has whatever format
  607 + is implied by the corresponding element of the help output's
  608 + array.
603 609  
604 610 For example, the help output indicates includes a ``"pagelabels"``
605 611 key whose value is an array of one element. That element is a
... ...