Raw data may require dropping specific variables or merging equivalent disjoint variables (variables of which at most one holds a value in any given record). Tools that apply such transformations automatically would be beneficial; the raw data itself should stay read-only. At the same time, with the prospect of acquiring data from a FHIR server, the data tables have to be selected from all available data. At this very first step of the pipeline, the user should be able to acquire data and shape it according to their needs. A FHIR data extraction module should be in place as soon as we have access to a FHIR server. The user should be able to specify the wanted variables as a list of variable IDs (usually strings). In a further step, constraints on the set of patients are to be enforced; these could be specified directly as a list of FHIR queries (strings).
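The shaping step described here could look roughly like the following minimal sketch. It assumes the raw data arrives as a list of per-patient records (dictionaries keyed by variable ID); the names `select_variables` and `merge_disjoint` are hypothetical and not part of any existing module. Both functions build new records, so the raw data is never modified. FHIR access is deliberately left out.

```python
from typing import Dict, List, Optional, Sequence

# Assumption: one record per patient, keyed by variable ID (a string).
Row = Dict[str, Optional[str]]


def select_variables(rows: Sequence[Row], wanted: Sequence[str]) -> List[Row]:
    """Return new records that contain only the wanted variable IDs."""
    return [{var: row.get(var) for var in wanted} for row in rows]


def merge_disjoint(rows: Sequence[Row], sources: Sequence[str], target: str) -> List[Row]:
    """Merge equivalent variables into one target variable.

    The source variables are expected to be disjoint: in each record at
    most one of them holds a value. A record that violates this raises
    ValueError instead of silently picking one value.
    """
    merged: List[Row] = []
    for row in rows:
        values = [row[s] for s in sources if row.get(s) is not None]
        if len(values) > 1:
            raise ValueError(f"variables {list(sources)} are not disjoint in {row}")
        # Copy everything except the source variables, then add the target.
        new_row: Row = {k: v for k, v in row.items() if k not in sources}
        new_row[target] = values[0] if values else None
        merged.append(new_row)
    return merged
```

A caller would first merge, for example, `weight_kg` and `body_weight` into a single `weight` variable and then pass the user's list of variable IDs to `select_variables`.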
No due date
Metrics are applied at the end of the pipeline to determine the best-performing model for the input data. In our case, each metric is a function of the full real (input) data and the synthetic (output) data. There are a few methods that should be applied to assess
* statistical similarity,
* ML performance, and
* anonymity.
Since this list can grow, and it might not be adequate or desirable (run-time constraints, etc.) to apply all defined metrics, we define a hook variable to hold all metrics to be applied by the constructed pipeline. A hook is a variable holding a list of functions, combined with a function that executes all functions in that list at a well-specified point in the course of the program. The user has to specify the list of metric functions before running the program. The first metrics to be implemented are
* Chi-squared test and Kolmogorov-Smirnov test (univariate similarity),
* PCA and Spearman correlation comparisons (correlations),
* TRTR/TSTR using logistic regression (ML performance),
* mean pairwise distance and maximum cosine similarity.
All hooked metrics are to be applied to each of the model outputs. As mentioned above, metric functions are functions of the input and output data of the models; they return a single floating-point value, where lower values denote better results. In the end, the metric values will be summed for each synthesis model and compared. In a next step, we have to implement a normalization process to be able to compare the overall performance of the best model.
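The hook mechanism can be sketched as follows. This is only an illustration under simplifying assumptions: the data is reduced to a single numeric column, the names `metric_hook`, `run_metric_hook`, and `score_models` are hypothetical, and `mean_gap` is a placeholder metric that is not on the list above. The Kolmogorov-Smirnov entry is represented by the two-sample KS statistic (the maximum distance between the two empirical distribution functions), which is 0 for identical samples and therefore fits the lower-is-better convention.

```python
from bisect import bisect_right
from typing import Callable, Dict, List, Sequence

# A metric takes (real data, synthetic data) and returns one float;
# lower values denote better results.
Metric = Callable[[Sequence[float], Sequence[float]], float]

# The hook: a list of metric functions, filled by the user before the run.
metric_hook: List[Metric] = []


def ks_statistic(real: Sequence[float], synth: Sequence[float]) -> float:
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    real_sorted, synth_sorted = sorted(real), sorted(synth)
    points = sorted(set(real_sorted) | set(synth_sorted))
    return max(
        abs(
            bisect_right(real_sorted, x) / len(real_sorted)
            - bisect_right(synth_sorted, x) / len(synth_sorted)
        )
        for x in points
    )


def mean_gap(real: Sequence[float], synth: Sequence[float]) -> float:
    """Placeholder metric (assumption): absolute difference of the means."""
    return abs(sum(real) / len(real) - sum(synth) / len(synth))


def run_metric_hook(real: Sequence[float], synth: Sequence[float],
                    hook: Sequence[Metric]) -> float:
    """Execute every hooked metric on one model output and sum the values."""
    return sum(metric(real, synth) for metric in hook)


def score_models(real: Sequence[float], outputs: Dict[str, Sequence[float]],
                 hook: Sequence[Metric]) -> Dict[str, float]:
    """Apply the hook to each model output; the lowest total is the best."""
    return {name: run_metric_hook(real, synth, hook) for name, synth in outputs.items()}
```

The user would append the desired functions to `metric_hook` and pick the best model with `min(scores, key=scores.get)`. Note that the raw sum mixes metrics with different value ranges, which is why a normalization step is still needed.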
Overdue by 2 year(s)•Due by December 20, 2022•3/4 issues closed