-
By default, all processed templates are collected and output at the end of the call to projectUpdate, but it is now possible to, e.g., simply discard all results if nothing more needs to be done with the processed templates.
-
Expand on Object::description to get a workable string that reflects the current state of the properties of Objects in memory. With this, we can conceive of serializing Transforms/Distances without prior knowledge of the string used to construct them, and of serializing them after their in-memory state has been altered. Also introduce an option to force complete serialization of a transform tree (i.e. bypass LoadStore blocks). This allows us to do things like serialize an algorithm in memory and transmit it to another process without requiring that the other process read part of the algorithm from disk. Refactor ProcessWrapper to directly transmit its current Transform to child processes, rather than requiring the child processes to read its state from disk. This allows for far more uniform treatment of multiProcess and non-multiProcess jobs in AlgorithmCore, leading to substantial simplification of AlgorithmCore::compare.
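The idea of building a description from live property values, rather than from the original construction string, can be illustrated with a minimal sketch. Everything here is hypothetical: DescribedObject, setProperty, and the Name(key=value,...) output format are stand-ins chosen for illustration and are not the project's actual Object API or serialization format.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for an Object with named properties.
class DescribedObject {
public:
    explicit DescribedObject(std::string name) : name_(std::move(name)) {}

    void setProperty(const std::string &key, const std::string &value) {
        assert(!key.empty());
        properties_[key] = value;
    }

    // Build the description from the current property values, so any
    // change made in memory after construction is reflected in the output.
    std::string description() const {
        std::string out = name_ + "(";
        bool first = true;
        for (const auto &kv : properties_) {
            if (!first) out += ",";
            out += kv.first + "=" + kv.second;
            first = false;
        }
        return out + ")";
    }

private:
    std::string name_;
    // std::map iterates in key order, which keeps the output deterministic.
    std::map<std::string, std::string> properties_;
};
```

Because the string is regenerated on every call, an object whose properties were altered after construction still serializes to its current state.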
-
Introduce PP5GalleryTransform, a transform that compares incoming templates against a fixed gallery (thereby avoiding the cost of repeatedly setting up pp5 galleries for the same templates). Modify AlgorithmCore to support supplying Transforms to the right of the algorithm string; in this case ! should be used instead of :.
-
Surely this will save many bytes of source code.
-
The basic idea is to read galleries incrementally, but there are some complications, especially related to progress counting: if we don't read a gallery, we don't know how many templates are stored in it, since gallery formats aren't nice enough to provide headers with that information. One solution to the progress counting problem is to measure progress based on the position of a file pointer in the gallery file (i.e. measure the current position in the gallery file, divide by the total size of the gallery file). This is supported by expanding the Gallery API to include a totalSize method indicating the total size of the gallery file (or the total number of templates if that is known); as templates are read, their position is stored in metadata (using the "p" key). Several galleries are updated to respect readBlockSize, and also to store position data in read templates. Support for filtering out already enrolled templates in read-mode was maintained by making the filtering an online process (part of the enrollment pipeline) rather than a batch process done before enrollment-proper starts.
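A minimal sketch of position-based progress, assuming a newline-delimited gallery held in memory. IncrementalReaderSketch and Record are hypothetical names, not the actual Gallery API; only totalSize, readBlockSize, and the idea of recording each template's position come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a template read from a gallery.
struct Record {
    std::string data;
    std::size_t position; // byte offset where the record starts (the "p" key)
};

// Hypothetical incremental reader over an in-memory, newline-delimited gallery.
class IncrementalReaderSketch {
public:
    explicit IncrementalReaderSketch(const std::string &bytes)
        : in_(bytes), totalSize_(bytes.size()), consumed_(0) {}

    std::size_t totalSize() const { return totalSize_; }

    // Read at most readBlockSize records; an empty result means end of gallery.
    std::vector<Record> readBlock(std::size_t readBlockSize) {
        std::vector<Record> block;
        std::string line;
        while (block.size() < readBlockSize && std::getline(in_, line)) {
            block.push_back(Record{line, consumed_});
            // Count the delimiter only if one was actually consumed.
            consumed_ += line.size() + (in_.eof() ? 0 : 1);
        }
        return block;
    }

    // Progress is the fraction of bytes consumed, not a count of templates.
    double progress() const {
        if (totalSize_ == 0) return 1.0;
        return static_cast<double>(consumed_) / static_cast<double>(totalSize_);
    }

private:
    std::istringstream in_;
    std::size_t totalSize_;
    std::size_t consumed_;
};
```

A caller would loop on readBlock until it returns an empty block, reporting progress() after each block without ever needing to know the template count in advance.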
-
This avoids warnings about project(file,file) shadowing project(template,template).
-
Introduce transforms which operate solely on metadata
-
Introduces MetadataTransform and UntrainableMetadataTransform, which provide a project(file,file) interface for defining operations that only affect metadata, and do not touch matrices. Modify a number of transforms to inherit from these interfaces instead of UntrainableTransform or UntrainableMetaTransform.
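A rough sketch of what a metadata-only interface could look like, using simplified stand-in types. File, Template, MetadataTransformSketch, and SetKey are hypothetical and much simpler than the real classes; the file-level method is named projectMetadata here purely as an assumption, so that it does not hide the template-level project overload.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified types; the real File and Template are richer.
struct File {
    std::string name;
    std::map<std::string, std::string> metadata;
};

struct Template {
    File file;
    std::vector<std::vector<float>> matrices;
};

// Subclasses describe a File -> File operation; matrices pass through untouched.
class MetadataTransformSketch {
public:
    virtual ~MetadataTransformSketch() = default;

    // The only method a metadata-only transform has to implement.
    virtual void projectMetadata(const File &src, File &dst) const = 0;

    // Template-level projection: copy the matrices, delegate the metadata.
    void project(const Template &src, Template &dst) const {
        dst.matrices = src.matrices;
        projectMetadata(src.file, dst.file);
    }
};

// Hypothetical example transform: set one metadata key to a fixed value.
class SetKey : public MetadataTransformSketch {
public:
    SetKey(std::string key, std::string value)
        : key_(std::move(key)), value_(std::move(value)) {
        assert(!key_.empty());
    }

    void projectMetadata(const File &src, File &dst) const override {
        dst = src;
        dst.metadata[key_] = value_;
    }

private:
    std::string key_;
    std::string value_;
};
```

The design point is that the subclass never sees the matrices, so it cannot modify them by accident.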
-
Move emptyRead to a higher visibility scope, and make it handle caching metadata in addition to reading galleries. Caching still leverages memGallery since re-implementing a cache just for FileLists would largely be redundant.
-
Add explicit multi-process support to enroll and compare. If -multiProcess true is specified, ProcessWrapper will be attached to the current algorithm in enrollment and comparison (causing the job to be distributed over multiple br processes instead of using multi-threading). This is desirable since putting ProcessWrapper directly on algorithms leads to some compositionality problems (it's very undesirable to have nested process wrappers). One drawback is that in multi-process mode, altcompare has to write the explicitly enrolled gallery to a file (instead of using a memGallery), since the child processes don't have a shared memory space. A better approach might be to have ProcessWrapper serialize and transmit its transform (instead of constructing it), but even that won't work directly, since transforms often rely on reconstructing data from other files even if serialized.
-
OutputTransform takes inputs suitable for an output (Output specification, target and query galleries), and expects to receive rows of a comparison matrix on incoming templates. OutputTransform supports receiving either rows one at a time, or columns one at a time (via the transposeMode flag).
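The row-versus-column behavior can be sketched as a small accumulator. ScoreMatrixSketch is a hypothetical class, not the actual OutputTransform; only the transposeMode flag comes from the description above, and the layout (rows indexed by query, columns by target) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical accumulator for a queries x targets comparison matrix.
class ScoreMatrixSketch {
public:
    ScoreMatrixSketch(std::size_t queries, std::size_t targets, bool transposeMode)
        : rows_(queries), cols_(targets), transposeMode_(transposeMode),
          next_(0), data_(queries * targets, 0.0f) {}

    // Accept the next row, or the next column when transposeMode is set.
    void accept(const std::vector<float> &scores) {
        const std::size_t length = transposeMode_ ? rows_ : cols_;
        const std::size_t limit = transposeMode_ ? cols_ : rows_;
        assert(scores.size() == length);
        assert(next_ < limit);
        for (std::size_t i = 0; i < length; ++i) {
            const std::size_t r = transposeMode_ ? i : next_;
            const std::size_t c = transposeMode_ ? next_ : i;
            data_[r * cols_ + c] = scores[i];
        }
        ++next_;
    }

    // True once every row (or column) has been received.
    bool complete() const { return next_ == (transposeMode_ ? cols_ : rows_); }

    float at(std::size_t r, std::size_t c) const { return data_[r * cols_ + c]; }

private:
    std::size_t rows_;
    std::size_t cols_;
    bool transposeMode_;
    std::size_t next_;
    std::vector<float> data_; // row-major storage
};
```

Either feeding order fills the same row-major storage, so downstream code that reads the finished matrix does not need to know which mode was used.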