Commit 4674c04cb80e0a50cde1b97464642e2778f9522f

Authored by Jay Berkenbilt
1 parent f8d1ab94

JSON schema: support multi-element array validation

ChangeLog
@@ -8,7 +8,11 @@ @@ -8,7 +8,11 @@
8 allow QPDFJob JSON to start accepting multiple items where a 8 allow QPDFJob JSON to start accepting multiple items where a
9 single item used to be expected without breaking backward 9 single item used to be expected without breaking backward
10 compatibility. Without this change, the earlier fix to 10 compatibility. Without this change, the earlier fix to
11 - removeAttachment would be a breaking change. 11 + removeAttachment would be a breaking change. Also allow the schema
  12 + to contain a multi-element array, which means that the output has
  13 + to have an array of the same length in the corresponding location,
  14 + and each element is validated against the corresponding schema
  15 + element.
12 16
13 * QPDFObjectHandle: for the methods insertItem, appendItem, 17 * QPDFObjectHandle: for the methods insertItem, appendItem,
14 eraseItem, replaceKey, and removeKey, add a corresponding 18 eraseItem, replaceKey, and removeKey, add a corresponding
@@ -124,11 +124,6 @@ JSON v2 fixes @@ -124,11 +124,6 @@ JSON v2 fixes
124 compatibility, but at least this gives people a namespace they can 124 compatibility, but at least this gives people a namespace they can
125 know will never conflict with future keys. 125 know will never conflict with future keys.
126 126
127 - * Change schema validation so that if the schema contains an array  
128 - with more than one element, the output has to have an array with  
129 - the same number of elements whose individual elements are  
130 - validated according to the regular rules.  
131 -  
132 * When reading back in, we'll have to call 127 * When reading back in, we'll have to call
133 pushInheritedAttributesToPage or getAllPages based on the values 128 pushInheritedAttributesToPage or getAllPages based on the values
134 of the metadata. 129 of the metadata.
include/qpdf/JSON.hh
@@ -183,25 +183,30 @@ class JSON @@ -183,25 +183,30 @@ class JSON
183 // 183 //
184 // * The schema is a nested structure containing dictionaries, 184 // * The schema is a nested structure containing dictionaries,
185 // single-element arrays, and strings only. 185 // single-element arrays, and strings only.
186 - // * Recursively walk the schema.  
187 - // * Whenever the schema has an array of length 1 and the object  
188 - // does not have an array in the corresponding location,  
189 - // validate the object against the array's single element.  
190 - // This effectively enables a single element to appear in  
191 - // place of an array and be treated as if it were an array of  
192 - // one element. This makes it possible to decide later that  
193 - // something that used to contain a single element now allows  
194 - // an array without invalidating any old data.  
195 - // * If the current value is a dictionary, this object must have  
196 - // a dictionary in the same place with the same keys. If flags  
197 - // contains f_optional, a key in the schema does not have to  
198 - // be present in the object. Otherwise, all keys have to be  
199 - // present. Any key in the object must be present in the  
200 - // schema.  
201 - // * If the current value is an array, this object must have an  
202 - // array in the same place. The schema's array must contain a  
203 - // single element, which is used as a schema to validate each  
204 - // element of this object's corresponding array. 186 + // * Recursively walk the schema. In the items below, "schema
  187 + // object" refers to an object in the schema, and "checked
  188 + // object" refers to the correspondingi part of the object
  189 + // being checked.
  190 + // * If the schema object is a dictionary, the checked object
  191 + // must have a dictionary in the same place with the same
  192 + // keys. If flags contains f_optional, a key in the schema
  193 + // does not have to be present in the object. Otherwise, all
  194 + // keys have to be present. Any key in the object must be
  195 + // present in the schema.
  196 + // * If the schema object is an array of length 1, the checked
  197 + // object may either be a single item or an array of items.
  198 + // The single item or each element of the checked object's
  199 + // array is validated against the single element of the
  200 + // schema's array. The rationale behind this logic is that a
  201 + // single element may appear wherever the schema allows a
  202 + // variable-length array. This makes it possible to start
  203 + // allowing an array in the future where a single element was
  204 + // previously required without breaking backward
  205 + // compatibility.
  206 + // * If the schema object is an array of length > 1, the checked
  207 + // object must be an array of the same length. In this case,
  208 + // each element of the checked object array is validated
  209 + // against the corresponding element of the schema array.
205 // * Otherwise, the value must be a string whose value is a 210 // * Otherwise, the value must be a string whose value is a
206 // description of the object's corresponding value, which may 211 // description of the object's corresponding value, which may
207 // have any type. 212 // have any type.
libqpdf/JSON.cc
@@ -538,27 +538,55 @@ JSON::checkSchemaInternal( @@ -538,27 +538,55 @@ JSON::checkSchemaInternal(
538 } 538 }
539 } 539 }
540 } else if (sch_arr) { 540 } else if (sch_arr) {
541 - if (sch_arr->elements.size() != 1) {  
542 - QTC::TC("libtests", "JSON schema array error"); 541 + auto n_elements = sch_arr->elements.size();
  542 + if (n_elements == 1) {
  543 + // A single-element array in the schema allows a single
  544 + // element in the object or a variable-length array, each
  545 + // of whose items must conform to the single element of
  546 + // the schema array. This doesn't apply to arrays of
  547 + // arrays -- we fall back to the behavior of allowing a
  548 + // single item only when the object is not an array.
  549 + if (this_arr) {
  550 + int i = 0;
  551 + for (auto const& element: this_arr->elements) {
  552 + checkSchemaInternal(
  553 + element.get(),
  554 + sch_arr->elements.at(0).get(),
  555 + flags,
  556 + errors,
  557 + prefix + "." + QUtil::int_to_string(i));
  558 + ++i;
  559 + }
  560 + } else {
  561 + QTC::TC("libtests", "JSON schema array for single item");
  562 + checkSchemaInternal(
  563 + this_v,
  564 + sch_arr->elements.at(0).get(),
  565 + flags,
  566 + errors,
  567 + prefix);
  568 + }
  569 + } else if (!this_arr || (this_arr->elements.size() != n_elements)) {
  570 + QTC::TC("libtests", "JSON schema array length mismatch");
543 errors.push_back( 571 errors.push_back(
544 - err_prefix + " schema array contains other than one item"); 572 + err_prefix + " is supposed to be an array of length " +
  573 + QUtil::uint_to_string(n_elements));
545 return false; 574 return false;
546 - }  
547 - if (this_arr) {  
548 - int i = 0; 575 + } else {
  576 + // A multi-element array in the schema must correspond to
  577 + // an element of the same length in the object. Each
  578 + // element in the object is validated against the
  579 + // corresponding element in the schema.
  580 + size_t i = 0;
549 for (auto const& element: this_arr->elements) { 581 for (auto const& element: this_arr->elements) {
550 checkSchemaInternal( 582 checkSchemaInternal(
551 element.get(), 583 element.get(),
552 - sch_arr->elements.at(0).get(), 584 + sch_arr->elements.at(i).get(),
553 flags, 585 flags,
554 errors, 586 errors,
555 - prefix + "." + QUtil::int_to_string(i)); 587 + prefix + "." + QUtil::uint_to_string(i));
556 ++i; 588 ++i;
557 } 589 }
558 - } else {  
559 - QTC::TC("libtests", "JSON schema array for single item");  
560 - checkSchemaInternal(  
561 - this_v, sch_arr->elements.at(0).get(), flags, errors, prefix);  
562 } 590 }
563 } else if (!sch_str) { 591 } else if (!sch_str) {
564 QTC::TC("libtests", "JSON schema other type"); 592 QTC::TC("libtests", "JSON schema other type");
libtests/json.cc
@@ -179,7 +179,11 @@ test_schema() @@ -179,7 +179,11 @@ test_schema()
179 "z": "ebra", 179 "z": "ebra",
180 "o": "ptional" 180 "o": "ptional"
181 } 181 }
182 - } 182 + },
  183 + "four": [
  184 + { "first": "first element" },
  185 + { "second": "second element" }
  186 + ]
183 } 187 }
184 )"); 188 )");
185 189
@@ -227,17 +231,28 @@ test_schema() @@ -227,17 +231,28 @@ test_schema()
227 "else": { 231 "else": {
228 "z": "okay" 232 "z": "okay"
229 } 233 }
230 - } 234 + },
  235 + "four": [
  236 + {"first": "missing second"}
  237 + ]
231 } 238 }
232 )"); 239 )");
233 240
234 check_schema(b, schema, 0, false, "missing items"); 241 check_schema(b, schema, 0, false, "missing items");
235 - check_schema(a, a, 0, false, "top-level schema array error");  
236 - check_schema(b, b, 0, false, "lower-level schema array error");  
237 242
238 JSON bad_schema = JSON::parse(R"({"a": true, "b": "potato?"})"); 243 JSON bad_schema = JSON::parse(R"({"a": true, "b": "potato?"})");
239 check_schema(bad_schema, bad_schema, 0, false, "bad schema field type"); 244 check_schema(bad_schema, bad_schema, 0, false, "bad schema field type");
240 245
  246 + JSON c = JSON::parse(R"(
  247 +{
  248 + "four": [
  249 + { "first": 1 },
  250 + { "oops": [2] }
  251 + ]
  252 +}
  253 +)");
  254 + check_schema(c, schema, JSON::f_optional, false, "array element mismatch");
  255 +
241 // "two" exercises the case of the JSON containing a single 256 // "two" exercises the case of the JSON containing a single
242 // element where the schema has an array. 257 // element where the schema has an array.
243 JSON good = JSON::parse(R"( 258 JSON good = JSON::parse(R"(
@@ -262,7 +277,11 @@ test_schema() @@ -262,7 +277,11 @@ test_schema()
262 "<objid>": { 277 "<objid>": {
263 "z": "ebra" 278 "z": "ebra"
264 } 279 }
265 - } 280 + },
  281 + "four": [
  282 + { "first": 1 },
  283 + { "second": [2] }
  284 + ]
266 } 285 }
267 )"); 286 )");
268 check_schema(good, schema, 0, false, "not optional"); 287 check_schema(good, schema, 0, false, "not optional");
libtests/libtests.testcov
@@ -36,7 +36,6 @@ Pl_PNGFilter decodePaeth 0 @@ -36,7 +36,6 @@ Pl_PNGFilter decodePaeth 0
36 Pl_TIFFPredictor processRow 1 36 Pl_TIFFPredictor processRow 1
37 JSON wanted dictionary 0 37 JSON wanted dictionary 0
38 JSON key missing in object 0 38 JSON key missing in object 0
39 -JSON schema array error 0  
40 JSON key extra in object 0 39 JSON key extra in object 0
41 QPDFArgParser read args from stdin 0 40 QPDFArgParser read args from stdin 0
42 QPDFArgParser read args from file 0 41 QPDFArgParser read args from file 0
@@ -93,3 +92,4 @@ JSON 16 low not after high 0 @@ -93,3 +92,4 @@ JSON 16 low not after high 0
93 JSON 16 dangling high 0 92 JSON 16 dangling high 0
94 JSON parse duplicate key 0 93 JSON parse duplicate key 0
95 JSON schema array for single item 0 94 JSON schema array for single item 0
  95 +JSON schema array length mismatch 0
libtests/qtest/json/json.out
@@ -2,6 +2,7 @@ @@ -2,6 +2,7 @@
2 top-level object is supposed to be a dictionary 2 top-level object is supposed to be a dictionary
3 --- 3 ---
4 --- missing items 4 --- missing items
  5 +json key ".four" is supposed to be an array of length 2
5 json key ".one.a": key "q" is present in schema but missing in object 6 json key ".one.a": key "q" is present in schema but missing in object
6 json key ".one.a.r" is supposed to be a dictionary 7 json key ".one.a.r" is supposed to be a dictionary
7 json key ".one.a.s": key "ss" is present in schema but missing in object 8 json key ".one.a.s": key "ss" is present in schema but missing in object
@@ -15,16 +16,12 @@ json key &quot;.two.1&quot;: key &quot;flarp&quot; is not present in schema but appears in object @@ -15,16 +16,12 @@ json key &quot;.two.1&quot;: key &quot;flarp&quot; is not present in schema but appears in object
15 json key ".two.2" is supposed to be a dictionary 16 json key ".two.2" is supposed to be a dictionary
16 json key ".two.3" is supposed to be a dictionary 17 json key ".two.3" is supposed to be a dictionary
17 --- 18 ---
18 ---- top-level schema array error  
19 -top-level object schema array contains other than one item  
20 ----  
21 ---- lower-level schema array error  
22 -json key ".one.a.r" schema array contains other than one item  
23 -json key ".two" schema array contains other than one item  
24 ----  
25 --- bad schema field type 19 --- bad schema field type
26 json key ".a" schema value is not dictionary, array, or string 20 json key ".a" schema value is not dictionary, array, or string
27 --- 21 ---
  22 +--- array element mismatch
  23 +json key ".four.1": key "oops" is not present in schema but appears in object
  24 +---
28 --- not optional 25 --- not optional
29 json key ".three.<objid>": key "o" is present in schema but missing in object 26 json key ".three.<objid>": key "o" is present in schema but missing in object
30 --- 27 ---
manual/json.rst
@@ -595,11 +595,17 @@ Documentation @@ -595,11 +595,17 @@ Documentation
595 appears in the corresponding location of the actual output. The 595 appears in the corresponding location of the actual output. The
596 corresponding output can have any value including ``null``. 596 corresponding output can have any value including ``null``.
597 597
598 - - An array in the help output always contains a single element. It  
599 - indicates that the corresponding location in the actual output is  
600 - an array of any length, and that each element of the array has  
601 - whatever format is implied by the single element of the help  
602 - output's array. 598 + - A single-element array in the help output indicates that the
  599 + corresponding location in the actual output is either a single
  600 + item or is an array of any length. The single item or each
  601 + element of the array has whatever format is implied by the single
  602 + element of the help output's array.
  603 +
  604 + - A multi-element array in the help output indicates that the
  605 + corresponding location in the actual output is an array of the
  606 + same length. Each element of the output array has whatever format
  607 + is implied by the corresponding element of the help output's
  608 + array.
603 609
604 For example, the help output indicates includes a ``"pagelabels"`` 610 For example, the help output indicates includes a ``"pagelabels"``
605 key whose value is an array of one element. That element is a 611 key whose value is an array of one element. That element is a