Nov 1, 2010 01:11 PM | Richard Reynolds
compatibility testing
Hi all,
Nick, that's a great start on the test plan document!
...
I wonder a bit about the goal of dataset duplication. For a package
to be compliant, one would expect that it could read and understand
a valid dataset, and that the datasets it produces are valid, as well.
Is it really necessary that they be able to essentially duplicate an
input dataset?
For example, consider the input storage method. A compliant package
should be able to read datasets encoded as ASCII, base64 binary, or
gzip-compressed base64 binary. But I would expect most packages to have
a default output format. Do we force everyone to offer an option to write the output
using the same storage mode as the input?
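As a concrete illustration of the three storage modes, here is a minimal
Python sketch (using numpy; the zlib-style compression for the gzip variant
is an assumption on my part) showing the same values round-tripping through
each encoding:

```python
import base64
import zlib
import numpy as np

# A small float array, standing in for one data block of a dataset.
data = np.arange(6, dtype=np.float32)
raw = data.tobytes()

ascii_enc = " ".join(f"{v:g}" for v in data)                       # plain ASCII text
b64_enc = base64.b64encode(raw).decode("ascii")                    # base64 binary
gz_enc = base64.b64encode(zlib.compress(raw)).decode("ascii")      # gzipped base64

# All three decode back to the same values.
from_b64 = np.frombuffer(base64.b64decode(b64_enc), dtype=np.float32)
from_gz = np.frombuffer(zlib.decompress(base64.b64decode(gz_enc)),
                        dtype=np.float32)
assert np.array_equal(from_b64, data) and np.array_equal(from_gz, data)
```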
That applies to data order (both row/column major and endian) and
data types as well. Does every package store 8, 16, 32-bit ints
and floats as read? I would expect many to promote to floats, for
example (though 32-bit ints do not "promote" losslessly to 32-bit floats). And I
cannot see most packages bothering to write in a non-native endian.
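A quick numpy sketch (the names here are illustrative, not from any actual
package) of the promotions described above, including why a 32-bit int does
not survive a trip through a 32-bit float:

```python
import numpy as np

# Hypothetical on-disk buffer: big-endian 16-bit ints.
on_disk = np.array([1, 2, 3], dtype=">i2")

native = on_disk.astype(on_disk.dtype.newbyteorder("="))  # swap to native endian
promoted = native.astype(np.float32)                      # promote to float on write

# float32 has a 24-bit mantissa, so int32 values past 2**24 lose precision:
big = 2**24 + 1                          # 16777217
assert float(np.float32(big)) == 2**24   # rounds down to 16777216.0
```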
My opinion is no, I do not see the need for such a requirement. The
written dataset should be able to appropriately represent the input
one, but we should not force packages to write exactly as was read.
One conceptual note here is the purpose of writing a dataset. There
is little need to copy a dataset, since the 'cp' command does that
well. What is written is determined by the design of the program being
applied, and that generally means altering a dataset for some purpose,
or writing a new one.
Relaxing this restriction makes compliance testing harder, but it
seems fair to me.
We could consider allowing the following to vary:
- encoding
- data order
- endian
- data types (allow conversion to float? signed int? unsigned?)
Note that if we allow these things to vary, maybe the I/O library
needs the ability to account for them, i.e. to have some option to
"promote to defaults". The datasets could then be compared after
that step (this is particularly necessary for data order and data
type; endian handling is already done).
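Something like the following (a hedged sketch; promote_to_defaults and
datasets_match are hypothetical names, not existing library calls) is what
such a comparison path might look like:

```python
import numpy as np

def promote_to_defaults(arr):
    """Normalize an array for comparison: native endian,
    row-major (C) order, values promoted to float64."""
    a = np.asarray(arr)
    a = a.astype(a.dtype.newbyteorder("="))  # normalize endian
    a = np.ascontiguousarray(a)              # normalize data order
    return a.astype(np.float64)              # normalize data type

def datasets_match(a, b, tol=0.0):
    """Compare two datasets after promoting both to defaults."""
    return np.allclose(promote_to_defaults(a),
                       promote_to_defaults(b), atol=tol)
```

With that in place, a compliance test could read the same input through two
packages, write with each package's default settings, and compare the
normalized results.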
Any thoughts on this?
I'll continue with thoughts on metadata separately.
- rick