-
The basic idea is to read galleries incrementally, but there are some complications, especially related to progress counting: if we don't read a gallery, we don't know how many templates are stored in it, since gallery formats aren't nice enough to provide headers with that information. One solution to the progress counting problem is to measure progress based on the position of a file pointer in the gallery file (i.e. divide the current position in the gallery file by the total size of the gallery file). This is supported by expanding the Gallery API to include a totalSize method indicating the total size of the gallery file (or the total number of templates, if that is known); as templates are read, their position is stored in metadata under the "p" key. Several galleries are updated to respect readBlockSize, and also to store position data in the templates they read. Support for filtering out already enrolled templates in read mode was maintained by making the filtering an online process (part of the enrollment pipeline) rather than a batch process done before enrollment proper starts.
-
We have no particular need to mess with internal whitespace to deal with this issue.
-
QTextStream is apparently somewhat better behaved about these things than QFile.
-
Incrementally write a file instead of accumulating all lines of a text file and outputting them in the destructor. This avoids maintaining a full copy of all metadata being written to disk. Also, respect readBlockSize in these galleries when reading incrementally.
-
Move emptyRead to a higher visibility scope, and make it handle caching metadata in addition to reading galleries. Caching still leverages memGallery since re-implementing a cache just for FileLists would largely be redundant.
-
For galleries, add a property to Gallery indicating the number of templates that will be read per block (for galleries that do incremental reads; really just galGallery and memGallery). For outputs, add two properties indicating the rows and columns of blocks to use (i.e. support non-square output blocks). For both classes, default these properties to Globals->blockSize.
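A small sketch of how per-class block sizes defaulting to a global value might look, and how separate row and column sizes yield non-square output blocks. `kGlobalBlockSize` stands in for Globals->blockSize, and its value of 1000 is arbitrary. The struct and member names are hypothetical, since the real property names are not given here.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for Globals->blockSize; the value is arbitrary.
constexpr std::size_t kGlobalBlockSize = 1000;

// Hypothetical property holders; both default to the global block size.
struct GalleryProperties {
    std::size_t readBlockSize = kGlobalBlockSize; // templates read per block
};

struct OutputProperties {
    std::size_t blockRows = kGlobalBlockSize; // rows per output block
    std::size_t blockCols = kGlobalBlockSize; // columns per output block
};

// Top-left (row, column) corner of every block covering a rows x cols output
// matrix, in row-major order. Edge blocks may be smaller than the block size.
std::vector<std::pair<std::size_t, std::size_t>>
blockOrigins(std::size_t rows, std::size_t cols, const OutputProperties &p)
{
    std::vector<std::pair<std::size_t, std::size_t>> origins;
    if (p.blockRows == 0 || p.blockCols == 0) return origins; // avoid an infinite loop
    for (std::size_t r = 0; r < rows; r += p.blockRows)
        for (std::size_t c = 0; c < cols; c += p.blockCols)
            origins.emplace_back(r, c);
    return origins;
}
```

With 3-row by 2-column blocks, a 5x4 output is covered by four blocks starting at (0,0), (0,2), (3,0) and (3,2).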
-
This change consolidates the previous 'read' and 'noDuplicates' flags into a single 'append' flag. If append is specified, an output gallery already exists, and the gallery format either supports read/write or has explicit append support, then enrollment will be restricted to those files in the input list not already present in the gallery, and the results will be appended to the existing gallery. append defaults to 'false', which is a departure from previous behavior. The .gal format has explicit append support. For other formats, if the gallery supports both read and write (less common than you might think), we support append by reading the existing gallery and writing it back out, along with the new results, to an overwriting file. It should be possible to add explicit append support to several other gallery types.
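A minimal sketch of the append filter, assuming that "already present" means an exact match on the file name string; the real comparison key may differ. `filesToEnroll` is a hypothetical helper, not part of openbr.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical append filter: with append set, keep only the input files not
// already present in the existing gallery. Matching on the file name string
// is an assumption made for illustration.
std::vector<std::string> filesToEnroll(const std::vector<std::string> &inputs,
                                       const std::vector<std::string> &alreadyEnrolled,
                                       bool append)
{
    if (!append) return inputs; // default: enroll everything and overwrite the output
    std::unordered_set<std::string> seen(alreadyEnrolled.begin(), alreadyEnrolled.end());
    std::vector<std::string> remaining;
    for (const std::string &f : inputs)
        if (seen.count(f) == 0) remaining.push_back(f);
    return remaining;
}
```

A hash set makes each membership check O(1) on average, so the filter stays cheap even when the existing gallery is large.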
-
Resolved conflicts: app/br/br.cpp openbr/core/bee.cpp openbr/plugins/output.cpp
-
Resolved conflicts: app/br/br.cpp openbr/core/bee.cpp openbr/core/classify.cpp openbr/core/cluster.cpp openbr/core/eval.h openbr/openbr.cpp openbr/openbr.h openbr/plugins/algorithms.cpp openbr/plugins/independent.cpp openbr/plugins/output.cpp openbr/plugins/svm.cpp
-
This, rather than always storing it as label. The name of the column can be set in the SQL query, so there is no particular need to change it in dbGallery.
-
Change the default label name from Subject to Label (since label is a more general term). Use different default variable names for classification (label), regression (regressor/regressand), and clustering (ClusterID). Update some (far from all) transforms to accept arguments specifying their input/output variables. Update eval classification to optionally take target variable names as arguments.
-
Remove special casing of label in cache, and getCSVElement
-
Remove the global label/subject lookup table. Consistently use "Subject" rather than "Label"; subject is assumed to be convertible to QString. When desirable, map discrete subject values to ints. For classifiers such as svm that require numeric labels, generate a string->int mapping for the training data and store it (local to the transform). Add utility functions for collecting all values of a given property (on a template list) and for mapping discrete property values to 0-based integers. Outstanding issues include the use of label/subject in matrix output.
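A sketch of the two utilities described above: collecting a property's values across a template list, and mapping discrete values to 0-based integers for a classifier that needs numeric labels. Templates are modeled as plain string maps, and indices are assigned in order of first appearance. Both of these choices, and all function names, are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a template's metadata.
using Properties = std::map<std::string, std::string>;

// Collect the values of `property` across a template list (templates that
// lack the property are skipped).
std::vector<std::string> collectValues(const std::vector<Properties> &templates,
                                       const std::string &property)
{
    std::vector<std::string> values;
    for (const Properties &t : templates) {
        auto it = t.find(property);
        if (it != t.end()) values.push_back(it->second);
    }
    return values;
}

// Map each distinct value to a 0-based integer in order of first appearance.
// emplace() leaves an existing key untouched, so repeats keep their first index.
std::map<std::string, int> indexDiscreteValues(const std::vector<std::string> &values)
{
    std::map<std::string, int> mapping;
    for (const std::string &v : values)
        mapping.emplace(v, static_cast<int>(mapping.size()));
    return mapping;
}

// Encode values with a previously built mapping (throws if a value is unknown).
std::vector<int> encodeValues(const std::vector<std::string> &values,
                              const std::map<std::string, int> &mapping)
{
    std::vector<int> codes;
    codes.reserve(values.size());
    for (const std::string &v : values) codes.push_back(mapping.at(v));
    return codes;
}
```

Keeping the mapping alongside the trained model (local to the transform, as the message describes) lets predicted integers be translated back to subject strings later.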
-
Remove subject/label methods from the API, and replace some methods with general methods taking a property name as an argument. This breaks quite a few things.
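A rough sketch of replacing dedicated subject()/label() accessors with a generic accessor keyed by property name. The `File` class here is a hypothetical, standard-library-only illustration and does not reproduce openbr's real class. `sameValue` shows how a caller that once hard-coded the label can take the property name as an argument instead.

```cpp
#include <map>
#include <string>

// Hypothetical metadata holder: one generic get/set pair instead of
// dedicated subject()/label() methods.
class File {
public:
    // Return the named property, or defaultValue if it is not set.
    std::string get(const std::string &name,
                    const std::string &defaultValue = std::string()) const
    {
        auto it = properties_.find(name);
        return it == properties_.end() ? defaultValue : it->second;
    }

    void set(const std::string &name, const std::string &value) { properties_[name] = value; }

    bool contains(const std::string &name) const { return properties_.count(name) != 0; }

private:
    std::map<std::string, std::string> properties_;
};

// Generic comparison: works for "Subject", "Label", "ClusterID", etc.
bool sameValue(const File &a, const File &b, const std::string &property)
{
    return a.contains(property) && b.contains(property) && a.get(property) == b.get(property);
}
```

Making the property name a parameter means new variables, such as regression targets or cluster IDs, need no new API methods. The cost is that every call site that used the old accessors must be updated.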