Commit 4674c04cb80e0a50cde1b97464642e2778f9522f
1 parent
f8d1ab94
JSON schema: support multi-element array validation
Showing
8 changed files
with
109 additions
and
55 deletions
ChangeLog
| ... | ... | @@ -8,7 +8,11 @@ |
| 8 | 8 | allow QPDFJob JSON to start accepting multiple items where a |
| 9 | 9 | single item used to be expected without breaking backward |
| 10 | 10 | compatibility. Without this change, the earlier fix to |
| 11 | - removeAttachment would be a breaking change. | |
| 11 | + removeAttachment would be a breaking change. Also allow the schema | |
| 12 | + to contain a multi-element array, which means that the output has | |
| 13 | + to have an array of the same length in the corresponding location, | |
| 14 | + and each element is validated against the corresponding schema | |
| 15 | + element. | |
| 12 | 16 | |
| 13 | 17 | * QPDFObjectHandle: for the methods insertItem, appendItem, |
| 14 | 18 | eraseItem, replaceKey, and removeKey, add a corresponding | ... | ... |
TODO
| ... | ... | @@ -124,11 +124,6 @@ JSON v2 fixes |
| 124 | 124 | compatibility, but at least this gives people a namespace they can |
| 125 | 125 | know will never conflict with future keys. |
| 126 | 126 | |
| 127 | - * Change schema validation so that if the schema contains an array | |
| 128 | - with more than one element, the output has to have an array with | |
| 129 | - the same number of elements whose individual elements are | |
| 130 | - validated according to the regular rules. | |
| 131 | - | |
| 132 | 127 | * When reading back in, we'll have to call |
| 133 | 128 | pushInheritedAttributesToPage or getAllPages based on the values |
| 134 | 129 | of the metadata. | ... | ... |
include/qpdf/JSON.hh
| ... | ... | @@ -183,25 +183,30 @@ class JSON |
| 183 | 183 | // |
| 184 | 184 | // * The schema is a nested structure containing dictionaries, |
| 185 | 185 | // single-element arrays, and strings only. |
| 186 | - // * Recursively walk the schema. | |
| 187 | - // * Whenever the schema has an array of length 1 and the object | |
| 188 | - // does not have an array in the corresponding location, | |
| 189 | - // validate the object against the array's single element. | |
| 190 | - // This effectively enables a single element to appear in | |
| 191 | - // place of an array and be treated as if it were an array of | |
| 192 | - // one element. This makes it possible to decide later that | |
| 193 | - // something that used to contain a single element now allows | |
| 194 | - // an array without invalidating any old data. | |
| 195 | - // * If the current value is a dictionary, this object must have | |
| 196 | - // a dictionary in the same place with the same keys. If flags | |
| 197 | - // contains f_optional, a key in the schema does not have to | |
| 198 | - // be present in the object. Otherwise, all keys have to be | |
| 199 | - // present. Any key in the object must be present in the | |
| 200 | - // schema. | |
| 201 | - // * If the current value is an array, this object must have an | |
| 202 | - // array in the same place. The schema's array must contain a | |
| 203 | - // single element, which is used as a schema to validate each | |
| 204 | - // element of this object's corresponding array. | |
| 186 | + // * Recursively walk the schema. In the items below, "schema | |
| 187 | + // object" refers to an object in the schema, and "checked | |
| 188 | + // object" refers to the correspondingi part of the object | |
| 189 | + // being checked. | |
| 190 | + // * If the schema object is a dictionary, the checked object | |
| 191 | + // must have a dictionary in the same place with the same | |
| 192 | + // keys. If flags contains f_optional, a key in the schema | |
| 193 | + // does not have to be present in the object. Otherwise, all | |
| 194 | + // keys have to be present. Any key in the object must be | |
| 195 | + // present in the schema. | |
| 196 | + // * If the schema object is an array of length 1, the checked | |
| 197 | + // object may either be a single item or an array of items. | |
| 198 | + // The single item or each element of the checked object's | |
| 199 | + // array is validated against the single element of the | |
| 200 | + // schema's array. The rationale behind this logic is that a | |
| 201 | + // single element may appear wherever the schema allows a | |
| 202 | + // variable-length array. This makes it possible to start | |
| 203 | + // allowing an array in the future where a single element was | |
| 204 | + // previously required without breaking backward | |
| 205 | + // compatibility. | |
| 206 | + // * If the schema object is an array of length > 1, the checked | |
| 207 | + // object must be an array of the same length. In this case, | |
| 208 | + // each element of the checked object array is validated | |
| 209 | + // against the corresponding element of the schema array. | |
| 205 | 210 | // * Otherwise, the value must be a string whose value is a |
| 206 | 211 | // description of the object's corresponding value, which may |
| 207 | 212 | // have any type. | ... | ... |
libqpdf/JSON.cc
| ... | ... | @@ -538,27 +538,55 @@ JSON::checkSchemaInternal( |
| 538 | 538 | } |
| 539 | 539 | } |
| 540 | 540 | } else if (sch_arr) { |
| 541 | - if (sch_arr->elements.size() != 1) { | |
| 542 | - QTC::TC("libtests", "JSON schema array error"); | |
| 541 | + auto n_elements = sch_arr->elements.size(); | |
| 542 | + if (n_elements == 1) { | |
| 543 | + // A single-element array in the schema allows a single | |
| 544 | + // element in the object or a variable-length array, each | |
| 545 | + // of whose items must conform to the single element of | |
| 546 | + // the schema array. This doesn't apply to arrays of | |
| 547 | + // arrays -- we fall back to the behavior of allowing a | |
| 548 | + // single item only when the object is not an array. | |
| 549 | + if (this_arr) { | |
| 550 | + int i = 0; | |
| 551 | + for (auto const& element: this_arr->elements) { | |
| 552 | + checkSchemaInternal( | |
| 553 | + element.get(), | |
| 554 | + sch_arr->elements.at(0).get(), | |
| 555 | + flags, | |
| 556 | + errors, | |
| 557 | + prefix + "." + QUtil::int_to_string(i)); | |
| 558 | + ++i; | |
| 559 | + } | |
| 560 | + } else { | |
| 561 | + QTC::TC("libtests", "JSON schema array for single item"); | |
| 562 | + checkSchemaInternal( | |
| 563 | + this_v, | |
| 564 | + sch_arr->elements.at(0).get(), | |
| 565 | + flags, | |
| 566 | + errors, | |
| 567 | + prefix); | |
| 568 | + } | |
| 569 | + } else if (!this_arr || (this_arr->elements.size() != n_elements)) { | |
| 570 | + QTC::TC("libtests", "JSON schema array length mismatch"); | |
| 543 | 571 | errors.push_back( |
| 544 | - err_prefix + " schema array contains other than one item"); | |
| 572 | + err_prefix + " is supposed to be an array of length " + | |
| 573 | + QUtil::uint_to_string(n_elements)); | |
| 545 | 574 | return false; |
| 546 | - } | |
| 547 | - if (this_arr) { | |
| 548 | - int i = 0; | |
| 575 | + } else { | |
| 576 | + // A multi-element array in the schema must correspond to | |
| 577 | + // an element of the same length in the object. Each | |
| 578 | + // element in the object is validated against the | |
| 579 | + // corresponding element in the schema. | |
| 580 | + size_t i = 0; | |
| 549 | 581 | for (auto const& element: this_arr->elements) { |
| 550 | 582 | checkSchemaInternal( |
| 551 | 583 | element.get(), |
| 552 | - sch_arr->elements.at(0).get(), | |
| 584 | + sch_arr->elements.at(i).get(), | |
| 553 | 585 | flags, |
| 554 | 586 | errors, |
| 555 | - prefix + "." + QUtil::int_to_string(i)); | |
| 587 | + prefix + "." + QUtil::uint_to_string(i)); | |
| 556 | 588 | ++i; |
| 557 | 589 | } |
| 558 | - } else { | |
| 559 | - QTC::TC("libtests", "JSON schema array for single item"); | |
| 560 | - checkSchemaInternal( | |
| 561 | - this_v, sch_arr->elements.at(0).get(), flags, errors, prefix); | |
| 562 | 590 | } |
| 563 | 591 | } else if (!sch_str) { |
| 564 | 592 | QTC::TC("libtests", "JSON schema other type"); | ... | ... |
libtests/json.cc
| ... | ... | @@ -179,7 +179,11 @@ test_schema() |
| 179 | 179 | "z": "ebra", |
| 180 | 180 | "o": "ptional" |
| 181 | 181 | } |
| 182 | - } | |
| 182 | + }, | |
| 183 | + "four": [ | |
| 184 | + { "first": "first element" }, | |
| 185 | + { "second": "second element" } | |
| 186 | + ] | |
| 183 | 187 | } |
| 184 | 188 | )"); |
| 185 | 189 | |
| ... | ... | @@ -227,17 +231,28 @@ test_schema() |
| 227 | 231 | "else": { |
| 228 | 232 | "z": "okay" |
| 229 | 233 | } |
| 230 | - } | |
| 234 | + }, | |
| 235 | + "four": [ | |
| 236 | + {"first": "missing second"} | |
| 237 | + ] | |
| 231 | 238 | } |
| 232 | 239 | )"); |
| 233 | 240 | |
| 234 | 241 | check_schema(b, schema, 0, false, "missing items"); |
| 235 | - check_schema(a, a, 0, false, "top-level schema array error"); | |
| 236 | - check_schema(b, b, 0, false, "lower-level schema array error"); | |
| 237 | 242 | |
| 238 | 243 | JSON bad_schema = JSON::parse(R"({"a": true, "b": "potato?"})"); |
| 239 | 244 | check_schema(bad_schema, bad_schema, 0, false, "bad schema field type"); |
| 240 | 245 | |
| 246 | + JSON c = JSON::parse(R"( | |
| 247 | +{ | |
| 248 | + "four": [ | |
| 249 | + { "first": 1 }, | |
| 250 | + { "oops": [2] } | |
| 251 | + ] | |
| 252 | +} | |
| 253 | +)"); | |
| 254 | + check_schema(c, schema, JSON::f_optional, false, "array element mismatch"); | |
| 255 | + | |
| 241 | 256 | // "two" exercises the case of the JSON containing a single |
| 242 | 257 | // element where the schema has an array. |
| 243 | 258 | JSON good = JSON::parse(R"( |
| ... | ... | @@ -262,7 +277,11 @@ test_schema() |
| 262 | 277 | "<objid>": { |
| 263 | 278 | "z": "ebra" |
| 264 | 279 | } |
| 265 | - } | |
| 280 | + }, | |
| 281 | + "four": [ | |
| 282 | + { "first": 1 }, | |
| 283 | + { "second": [2] } | |
| 284 | + ] | |
| 266 | 285 | } |
| 267 | 286 | )"); |
| 268 | 287 | check_schema(good, schema, 0, false, "not optional"); | ... | ... |
libtests/libtests.testcov
| ... | ... | @@ -36,7 +36,6 @@ Pl_PNGFilter decodePaeth 0 |
| 36 | 36 | Pl_TIFFPredictor processRow 1 |
| 37 | 37 | JSON wanted dictionary 0 |
| 38 | 38 | JSON key missing in object 0 |
| 39 | -JSON schema array error 0 | |
| 40 | 39 | JSON key extra in object 0 |
| 41 | 40 | QPDFArgParser read args from stdin 0 |
| 42 | 41 | QPDFArgParser read args from file 0 |
| ... | ... | @@ -93,3 +92,4 @@ JSON 16 low not after high 0 |
| 93 | 92 | JSON 16 dangling high 0 |
| 94 | 93 | JSON parse duplicate key 0 |
| 95 | 94 | JSON schema array for single item 0 |
| 95 | +JSON schema array length mismatch 0 | ... | ... |
libtests/qtest/json/json.out
| ... | ... | @@ -2,6 +2,7 @@ |
| 2 | 2 | top-level object is supposed to be a dictionary |
| 3 | 3 | --- |
| 4 | 4 | --- missing items |
| 5 | +json key ".four" is supposed to be an array of length 2 | |
| 5 | 6 | json key ".one.a": key "q" is present in schema but missing in object |
| 6 | 7 | json key ".one.a.r" is supposed to be a dictionary |
| 7 | 8 | json key ".one.a.s": key "ss" is present in schema but missing in object |
| ... | ... | @@ -15,16 +16,12 @@ json key ".two.1": key "flarp" is not present in schema but appears in object |
| 15 | 16 | json key ".two.2" is supposed to be a dictionary |
| 16 | 17 | json key ".two.3" is supposed to be a dictionary |
| 17 | 18 | --- |
| 18 | ---- top-level schema array error | |
| 19 | -top-level object schema array contains other than one item | |
| 20 | ---- | |
| 21 | ---- lower-level schema array error | |
| 22 | -json key ".one.a.r" schema array contains other than one item | |
| 23 | -json key ".two" schema array contains other than one item | |
| 24 | ---- | |
| 25 | 19 | --- bad schema field type |
| 26 | 20 | json key ".a" schema value is not dictionary, array, or string |
| 27 | 21 | --- |
| 22 | +--- array element mismatch | |
| 23 | +json key ".four.1": key "oops" is not present in schema but appears in object | |
| 24 | +--- | |
| 28 | 25 | --- not optional |
| 29 | 26 | json key ".three.<objid>": key "o" is present in schema but missing in object |
| 30 | 27 | --- | ... | ... |
manual/json.rst
| ... | ... | @@ -595,11 +595,17 @@ Documentation |
| 595 | 595 | appears in the corresponding location of the actual output. The |
| 596 | 596 | corresponding output can have any value including ``null``. |
| 597 | 597 | |
| 598 | - - An array in the help output always contains a single element. It | |
| 599 | - indicates that the corresponding location in the actual output is | |
| 600 | - an array of any length, and that each element of the array has | |
| 601 | - whatever format is implied by the single element of the help | |
| 602 | - output's array. | |
| 598 | + - A single-element array in the help output indicates that the | |
| 599 | + corresponding location in the actual output is either a single | |
| 600 | + item or is an array of any length. The single item or each | |
| 601 | + element of the array has whatever format is implied by the single | |
| 602 | + element of the help output's array. | |
| 603 | + | |
| 604 | + - A multi-element array in the help output indicates that the | |
| 605 | + corresponding location in the actual output is an array of the | |
| 606 | + same length. Each element of the output array has whatever format | |
| 607 | + is implied by the corresponding element of the help output's | |
| 608 | + array. | |
| 603 | 609 | |
| 604 | 610 | For example, the help output indicates includes a ``"pagelabels"`` |
| 605 | 611 | key whose value is an array of one element. That element is a | ... | ... |