Commit 4674c04cb80e0a50cde1b97464642e2778f9522f
1 parent
f8d1ab94
JSON schema: support multi-element array validation
Showing
8 changed files
with
109 additions
and
55 deletions
ChangeLog
| @@ -8,7 +8,11 @@ | @@ -8,7 +8,11 @@ | ||
| 8 | allow QPDFJob JSON to start accepting multiple items where a | 8 | allow QPDFJob JSON to start accepting multiple items where a |
| 9 | single item used to be expected without breaking backward | 9 | single item used to be expected without breaking backward |
| 10 | compatibility. Without this change, the earlier fix to | 10 | compatibility. Without this change, the earlier fix to |
| 11 | - removeAttachment would be a breaking change. | 11 | + removeAttachment would be a breaking change. Also allow the schema |
| 12 | + to contain a multi-element array, which means that the output has | ||
| 13 | + to have an array of the same length in the corresponding location, | ||
| 14 | + and each element is validated against the corresponding schema | ||
| 15 | + element. | ||
| 12 | 16 | ||
| 13 | * QPDFObjectHandle: for the methods insertItem, appendItem, | 17 | * QPDFObjectHandle: for the methods insertItem, appendItem, |
| 14 | eraseItem, replaceKey, and removeKey, add a corresponding | 18 | eraseItem, replaceKey, and removeKey, add a corresponding |
TODO
| @@ -124,11 +124,6 @@ JSON v2 fixes | @@ -124,11 +124,6 @@ JSON v2 fixes | ||
| 124 | compatibility, but at least this gives people a namespace they can | 124 | compatibility, but at least this gives people a namespace they can |
| 125 | know will never conflict with future keys. | 125 | know will never conflict with future keys. |
| 126 | 126 | ||
| 127 | - * Change schema validation so that if the schema contains an array | ||
| 128 | - with more than one element, the output has to have an array with | ||
| 129 | - the same number of elements whose individual elements are | ||
| 130 | - validated according to the regular rules. | ||
| 131 | - | ||
| 132 | * When reading back in, we'll have to call | 127 | * When reading back in, we'll have to call |
| 133 | pushInheritedAttributesToPage or getAllPages based on the values | 128 | pushInheritedAttributesToPage or getAllPages based on the values |
| 134 | of the metadata. | 129 | of the metadata. |
include/qpdf/JSON.hh
| @@ -183,25 +183,30 @@ class JSON | @@ -183,25 +183,30 @@ class JSON | ||
| 183 | // | 183 | // |
| 184 | // * The schema is a nested structure containing dictionaries, | 184 | // * The schema is a nested structure containing dictionaries, |
| 185 | // single-element arrays, and strings only. | 185 | // single-element arrays, and strings only. |
| 186 | - // * Recursively walk the schema. | ||
| 187 | - // * Whenever the schema has an array of length 1 and the object | ||
| 188 | - // does not have an array in the corresponding location, | ||
| 189 | - // validate the object against the array's single element. | ||
| 190 | - // This effectively enables a single element to appear in | ||
| 191 | - // place of an array and be treated as if it were an array of | ||
| 192 | - // one element. This makes it possible to decide later that | ||
| 193 | - // something that used to contain a single element now allows | ||
| 194 | - // an array without invalidating any old data. | ||
| 195 | - // * If the current value is a dictionary, this object must have | ||
| 196 | - // a dictionary in the same place with the same keys. If flags | ||
| 197 | - // contains f_optional, a key in the schema does not have to | ||
| 198 | - // be present in the object. Otherwise, all keys have to be | ||
| 199 | - // present. Any key in the object must be present in the | ||
| 200 | - // schema. | ||
| 201 | - // * If the current value is an array, this object must have an | ||
| 202 | - // array in the same place. The schema's array must contain a | ||
| 203 | - // single element, which is used as a schema to validate each | ||
| 204 | - // element of this object's corresponding array. | 186 | + // * Recursively walk the schema. In the items below, "schema |
| 187 | + // object" refers to an object in the schema, and "checked | ||
| 188 | + // object" refers to the correspondingi part of the object | ||
| 189 | + // being checked. | ||
| 190 | + // * If the schema object is a dictionary, the checked object | ||
| 191 | + // must have a dictionary in the same place with the same | ||
| 192 | + // keys. If flags contains f_optional, a key in the schema | ||
| 193 | + // does not have to be present in the object. Otherwise, all | ||
| 194 | + // keys have to be present. Any key in the object must be | ||
| 195 | + // present in the schema. | ||
| 196 | + // * If the schema object is an array of length 1, the checked | ||
| 197 | + // object may either be a single item or an array of items. | ||
| 198 | + // The single item or each element of the checked object's | ||
| 199 | + // array is validated against the single element of the | ||
| 200 | + // schema's array. The rationale behind this logic is that a | ||
| 201 | + // single element may appear wherever the schema allows a | ||
| 202 | + // variable-length array. This makes it possible to start | ||
| 203 | + // allowing an array in the future where a single element was | ||
| 204 | + // previously required without breaking backward | ||
| 205 | + // compatibility. | ||
| 206 | + // * If the schema object is an array of length > 1, the checked | ||
| 207 | + // object must be an array of the same length. In this case, | ||
| 208 | + // each element of the checked object array is validated | ||
| 209 | + // against the corresponding element of the schema array. | ||
| 205 | // * Otherwise, the value must be a string whose value is a | 210 | // * Otherwise, the value must be a string whose value is a |
| 206 | // description of the object's corresponding value, which may | 211 | // description of the object's corresponding value, which may |
| 207 | // have any type. | 212 | // have any type. |
libqpdf/JSON.cc
| @@ -538,27 +538,55 @@ JSON::checkSchemaInternal( | @@ -538,27 +538,55 @@ JSON::checkSchemaInternal( | ||
| 538 | } | 538 | } |
| 539 | } | 539 | } |
| 540 | } else if (sch_arr) { | 540 | } else if (sch_arr) { |
| 541 | - if (sch_arr->elements.size() != 1) { | ||
| 542 | - QTC::TC("libtests", "JSON schema array error"); | 541 | + auto n_elements = sch_arr->elements.size(); |
| 542 | + if (n_elements == 1) { | ||
| 543 | + // A single-element array in the schema allows a single | ||
| 544 | + // element in the object or a variable-length array, each | ||
| 545 | + // of whose items must conform to the single element of | ||
| 546 | + // the schema array. This doesn't apply to arrays of | ||
| 547 | + // arrays -- we fall back to the behavior of allowing a | ||
| 548 | + // single item only when the object is not an array. | ||
| 549 | + if (this_arr) { | ||
| 550 | + int i = 0; | ||
| 551 | + for (auto const& element: this_arr->elements) { | ||
| 552 | + checkSchemaInternal( | ||
| 553 | + element.get(), | ||
| 554 | + sch_arr->elements.at(0).get(), | ||
| 555 | + flags, | ||
| 556 | + errors, | ||
| 557 | + prefix + "." + QUtil::int_to_string(i)); | ||
| 558 | + ++i; | ||
| 559 | + } | ||
| 560 | + } else { | ||
| 561 | + QTC::TC("libtests", "JSON schema array for single item"); | ||
| 562 | + checkSchemaInternal( | ||
| 563 | + this_v, | ||
| 564 | + sch_arr->elements.at(0).get(), | ||
| 565 | + flags, | ||
| 566 | + errors, | ||
| 567 | + prefix); | ||
| 568 | + } | ||
| 569 | + } else if (!this_arr || (this_arr->elements.size() != n_elements)) { | ||
| 570 | + QTC::TC("libtests", "JSON schema array length mismatch"); | ||
| 543 | errors.push_back( | 571 | errors.push_back( |
| 544 | - err_prefix + " schema array contains other than one item"); | 572 | + err_prefix + " is supposed to be an array of length " + |
| 573 | + QUtil::uint_to_string(n_elements)); | ||
| 545 | return false; | 574 | return false; |
| 546 | - } | ||
| 547 | - if (this_arr) { | ||
| 548 | - int i = 0; | 575 | + } else { |
| 576 | + // A multi-element array in the schema must correspond to | ||
| 577 | + // an element of the same length in the object. Each | ||
| 578 | + // element in the object is validated against the | ||
| 579 | + // corresponding element in the schema. | ||
| 580 | + size_t i = 0; | ||
| 549 | for (auto const& element: this_arr->elements) { | 581 | for (auto const& element: this_arr->elements) { |
| 550 | checkSchemaInternal( | 582 | checkSchemaInternal( |
| 551 | element.get(), | 583 | element.get(), |
| 552 | - sch_arr->elements.at(0).get(), | 584 | + sch_arr->elements.at(i).get(), |
| 553 | flags, | 585 | flags, |
| 554 | errors, | 586 | errors, |
| 555 | - prefix + "." + QUtil::int_to_string(i)); | 587 | + prefix + "." + QUtil::uint_to_string(i)); |
| 556 | ++i; | 588 | ++i; |
| 557 | } | 589 | } |
| 558 | - } else { | ||
| 559 | - QTC::TC("libtests", "JSON schema array for single item"); | ||
| 560 | - checkSchemaInternal( | ||
| 561 | - this_v, sch_arr->elements.at(0).get(), flags, errors, prefix); | ||
| 562 | } | 590 | } |
| 563 | } else if (!sch_str) { | 591 | } else if (!sch_str) { |
| 564 | QTC::TC("libtests", "JSON schema other type"); | 592 | QTC::TC("libtests", "JSON schema other type"); |
libtests/json.cc
| @@ -179,7 +179,11 @@ test_schema() | @@ -179,7 +179,11 @@ test_schema() | ||
| 179 | "z": "ebra", | 179 | "z": "ebra", |
| 180 | "o": "ptional" | 180 | "o": "ptional" |
| 181 | } | 181 | } |
| 182 | - } | 182 | + }, |
| 183 | + "four": [ | ||
| 184 | + { "first": "first element" }, | ||
| 185 | + { "second": "second element" } | ||
| 186 | + ] | ||
| 183 | } | 187 | } |
| 184 | )"); | 188 | )"); |
| 185 | 189 | ||
| @@ -227,17 +231,28 @@ test_schema() | @@ -227,17 +231,28 @@ test_schema() | ||
| 227 | "else": { | 231 | "else": { |
| 228 | "z": "okay" | 232 | "z": "okay" |
| 229 | } | 233 | } |
| 230 | - } | 234 | + }, |
| 235 | + "four": [ | ||
| 236 | + {"first": "missing second"} | ||
| 237 | + ] | ||
| 231 | } | 238 | } |
| 232 | )"); | 239 | )"); |
| 233 | 240 | ||
| 234 | check_schema(b, schema, 0, false, "missing items"); | 241 | check_schema(b, schema, 0, false, "missing items"); |
| 235 | - check_schema(a, a, 0, false, "top-level schema array error"); | ||
| 236 | - check_schema(b, b, 0, false, "lower-level schema array error"); | ||
| 237 | 242 | ||
| 238 | JSON bad_schema = JSON::parse(R"({"a": true, "b": "potato?"})"); | 243 | JSON bad_schema = JSON::parse(R"({"a": true, "b": "potato?"})"); |
| 239 | check_schema(bad_schema, bad_schema, 0, false, "bad schema field type"); | 244 | check_schema(bad_schema, bad_schema, 0, false, "bad schema field type"); |
| 240 | 245 | ||
| 246 | + JSON c = JSON::parse(R"( | ||
| 247 | +{ | ||
| 248 | + "four": [ | ||
| 249 | + { "first": 1 }, | ||
| 250 | + { "oops": [2] } | ||
| 251 | + ] | ||
| 252 | +} | ||
| 253 | +)"); | ||
| 254 | + check_schema(c, schema, JSON::f_optional, false, "array element mismatch"); | ||
| 255 | + | ||
| 241 | // "two" exercises the case of the JSON containing a single | 256 | // "two" exercises the case of the JSON containing a single |
| 242 | // element where the schema has an array. | 257 | // element where the schema has an array. |
| 243 | JSON good = JSON::parse(R"( | 258 | JSON good = JSON::parse(R"( |
| @@ -262,7 +277,11 @@ test_schema() | @@ -262,7 +277,11 @@ test_schema() | ||
| 262 | "<objid>": { | 277 | "<objid>": { |
| 263 | "z": "ebra" | 278 | "z": "ebra" |
| 264 | } | 279 | } |
| 265 | - } | 280 | + }, |
| 281 | + "four": [ | ||
| 282 | + { "first": 1 }, | ||
| 283 | + { "second": [2] } | ||
| 284 | + ] | ||
| 266 | } | 285 | } |
| 267 | )"); | 286 | )"); |
| 268 | check_schema(good, schema, 0, false, "not optional"); | 287 | check_schema(good, schema, 0, false, "not optional"); |
libtests/libtests.testcov
| @@ -36,7 +36,6 @@ Pl_PNGFilter decodePaeth 0 | @@ -36,7 +36,6 @@ Pl_PNGFilter decodePaeth 0 | ||
| 36 | Pl_TIFFPredictor processRow 1 | 36 | Pl_TIFFPredictor processRow 1 |
| 37 | JSON wanted dictionary 0 | 37 | JSON wanted dictionary 0 |
| 38 | JSON key missing in object 0 | 38 | JSON key missing in object 0 |
| 39 | -JSON schema array error 0 | ||
| 40 | JSON key extra in object 0 | 39 | JSON key extra in object 0 |
| 41 | QPDFArgParser read args from stdin 0 | 40 | QPDFArgParser read args from stdin 0 |
| 42 | QPDFArgParser read args from file 0 | 41 | QPDFArgParser read args from file 0 |
| @@ -93,3 +92,4 @@ JSON 16 low not after high 0 | @@ -93,3 +92,4 @@ JSON 16 low not after high 0 | ||
| 93 | JSON 16 dangling high 0 | 92 | JSON 16 dangling high 0 |
| 94 | JSON parse duplicate key 0 | 93 | JSON parse duplicate key 0 |
| 95 | JSON schema array for single item 0 | 94 | JSON schema array for single item 0 |
| 95 | +JSON schema array length mismatch 0 |
libtests/qtest/json/json.out
| @@ -2,6 +2,7 @@ | @@ -2,6 +2,7 @@ | ||
| 2 | top-level object is supposed to be a dictionary | 2 | top-level object is supposed to be a dictionary |
| 3 | --- | 3 | --- |
| 4 | --- missing items | 4 | --- missing items |
| 5 | +json key ".four" is supposed to be an array of length 2 | ||
| 5 | json key ".one.a": key "q" is present in schema but missing in object | 6 | json key ".one.a": key "q" is present in schema but missing in object |
| 6 | json key ".one.a.r" is supposed to be a dictionary | 7 | json key ".one.a.r" is supposed to be a dictionary |
| 7 | json key ".one.a.s": key "ss" is present in schema but missing in object | 8 | json key ".one.a.s": key "ss" is present in schema but missing in object |
| @@ -15,16 +16,12 @@ json key ".two.1": key "flarp" is not present in schema but appears in object | @@ -15,16 +16,12 @@ json key ".two.1": key "flarp" is not present in schema but appears in object | ||
| 15 | json key ".two.2" is supposed to be a dictionary | 16 | json key ".two.2" is supposed to be a dictionary |
| 16 | json key ".two.3" is supposed to be a dictionary | 17 | json key ".two.3" is supposed to be a dictionary |
| 17 | --- | 18 | --- |
| 18 | ---- top-level schema array error | ||
| 19 | -top-level object schema array contains other than one item | ||
| 20 | ---- | ||
| 21 | ---- lower-level schema array error | ||
| 22 | -json key ".one.a.r" schema array contains other than one item | ||
| 23 | -json key ".two" schema array contains other than one item | ||
| 24 | ---- | ||
| 25 | --- bad schema field type | 19 | --- bad schema field type |
| 26 | json key ".a" schema value is not dictionary, array, or string | 20 | json key ".a" schema value is not dictionary, array, or string |
| 27 | --- | 21 | --- |
| 22 | +--- array element mismatch | ||
| 23 | +json key ".four.1": key "oops" is not present in schema but appears in object | ||
| 24 | +--- | ||
| 28 | --- not optional | 25 | --- not optional |
| 29 | json key ".three.<objid>": key "o" is present in schema but missing in object | 26 | json key ".three.<objid>": key "o" is present in schema but missing in object |
| 30 | --- | 27 | --- |
manual/json.rst
| @@ -595,11 +595,17 @@ Documentation | @@ -595,11 +595,17 @@ Documentation | ||
| 595 | appears in the corresponding location of the actual output. The | 595 | appears in the corresponding location of the actual output. The |
| 596 | corresponding output can have any value including ``null``. | 596 | corresponding output can have any value including ``null``. |
| 597 | 597 | ||
| 598 | - - An array in the help output always contains a single element. It | ||
| 599 | - indicates that the corresponding location in the actual output is | ||
| 600 | - an array of any length, and that each element of the array has | ||
| 601 | - whatever format is implied by the single element of the help | ||
| 602 | - output's array. | 598 | + - A single-element array in the help output indicates that the |
| 599 | + corresponding location in the actual output is either a single | ||
| 600 | + item or is an array of any length. The single item or each | ||
| 601 | + element of the array has whatever format is implied by the single | ||
| 602 | + element of the help output's array. | ||
| 603 | + | ||
| 604 | + - A multi-element array in the help output indicates that the | ||
| 605 | + corresponding location in the actual output is an array of the | ||
| 606 | + same length. Each element of the output array has whatever format | ||
| 607 | + is implied by the corresponding element of the help output's | ||
| 608 | + array. | ||
| 603 | 609 | ||
| 604 | For example, the help output indicates includes a ``"pagelabels"`` | 610 | For example, the help output indicates includes a ``"pagelabels"`` |
| 605 | key whose value is an array of one element. That element is a | 611 | key whose value is an array of one element. That element is a |