This document specifies a draft profile of RO-Crate for the purpose of TRE-FX implementation of workflow execution in a distributed trusted research environment (TRE).

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

Note that all references to schema.org types/properties/instances use the prefix http://schema.org/ (not https) to correspond with the official JSON-LD context.

Overview

A Five Safes Crate represents a unit of computational access to sensitive information, which is managed in accordance with a set of principles conforming to the Five Safes framework.

Archive serialisation

A compliant Five Safes Crate SHOULD be stored and transferred as a ZIP archive containing a single BagIt directory (bag) of an arbitrary name, whose payload data/ contains the RO-Crate Metadata File ro-crate-metadata.json and any required data files (e.g. inputs).

Internally a processing TRE MAY choose to unpack a ZIP file to a local file store, taking necessary security and performance precautions (see Security considerations).

The BagIt payload manifest [RFC8493] MUST be present using sha-512 checksums, and the tag manifest SHOULD be included, also using sha-512 [FIPS 180-4]. Payload and tag manifests using other checksums MAY be included, taking care to exclude tagmanifest-* from their checksums.

BagIt expectations

The RO-Crate BagIt expectations for Adding RO-Crate to Bagit MUST be followed. The bag-info.txt MUST include a generated External-Identifier: field which SHOULD be a UUID URN [RFC4122], e.g.:

The External-Identifier represents this crate as a request and subsequent response, and SHOULD be freshly generated for each request. It is RECOMMENDED not to modify this identifier as the Five Safes Crate progresses through the distributed TRE processing, unless it is recognized as a previous execution.
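Non-normative sketch: one way to produce such a field is to generate a random (version 4) UUID per request and write it as a URN. The function name external_identifier_line is hypothetical and not defined by this profile; only Python's standard uuid module is assumed.

```python
import uuid


def external_identifier_line() -> str:
    """Return a bag-info.txt line carrying a freshly generated UUID URN."""
    # uuid4() is random, so every call (every request) gets a new identifier.
    return f"External-Identifier: {uuid.uuid4().urn}"
```

A TRE that follows the recommendation above would call this once when the request is created and then carry the same line through later phases.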

Note that because ro-crate-metadata.json establishes the payload directory data/ as the RO-Crate Root, it can only reference files and directories within that directory. The RO-Crate MUST NOT reference tag files like ../fetch.txt or other relative paths outside the Bag (see Security considerations).

Zip expectations

The ZIP archive MUST contain only a single top-level entry for the bag directory, identified by the bagit.txt marker. For interoperability in terms of ZIP features, implementations SHOULD follow guidance for an OCF ZIP Container (ignoring OCF Abstract Container requirements).
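Non-normative sketch: the single top-level entry rule could be checked as below with Python's standard zipfile module. The function name find_bag_directory is hypothetical, and the sketch does not attempt to verify the OCF ZIP Container guidance.

```python
import zipfile


def find_bag_directory(archive: zipfile.ZipFile) -> str:
    """Return the name of the single top-level bag directory, or raise ValueError."""
    names = archive.namelist()
    # The first path segment of every member must be the same directory.
    top_level = {name.split("/", 1)[0] for name in names}
    if len(top_level) != 1:
        raise ValueError(f"expected one top-level entry, found {sorted(top_level)}")
    bag = top_level.pop()
    # The bag directory is identified by its bagit.txt marker.
    if f"{bag}/bagit.txt" not in names:
        raise ValueError("top-level entry is not a bag: bagit.txt is missing")
    return bag
```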

Metadata file expectations

The RO-Crate Metadata File MUST conform to RO-Crate 1.2 (or later minor version). The compliant version MUST be declared in the metadata file descriptor:

{

The root data entity of a Five Safes Crate MUST have an @id equal to "./" (as it is stored within the BagIt ZIP archive). It MAY have an additional identifier.

Profile conformance

Crates conforming to this profile specification SHOULD indicate this on the Root Data Entity using conformsTo:

},
{
"@id": "https://w3id.org/ro/five-safes/0.1-DRAFT",
"@type": "Profile",
"name": "Five Safes Crate profile"
}

Note that license and datePublished are not required in a submitted crate, but SHOULD be included for a published crate (see Publishing phase).

Referencing a Workflow Crate

The metadata file MUST reference a Workflow RO-Crate Dataset as its mainEntity, indirectly indicating the workflow to execute.

The identifier SHOULD be a permalink or versioned URL (e.g. https://workflowhub.eu/workflows/289?version=1) or MAY be a nested directory within the BagIt payload directory (e.g. "workflow289.1/").

Finding the RO-Crate archive

If the identifier is a URI, a URL to the downloadable Workflow RO-Crate ZIP archive SHOULD be included with distribution. Otherwise, clients SHOULD use Signposting to find the link to the RO-Crate by looking for the link with rel="item" type="application/zip" profile="https://w3id.org/ro/crate", for instance:

Example:

"distribution": {
"@id": "https://workflowhub.eu/workflows/289/ro_crate?version=1"}

In the above example, the mainEntity points to a Dataset that conforms to the Workflow RO-Crate Profile and references the ZIP download URI using distribution, therefore the client can download the workflow directly without needing to follow Signposting headers.

Requested Workflow Run

The metadata file MUST include a CreateAction, which MUST be referenced from mentions of the root entity. The identifier SHOULD be based on a UUID (different from the BagIt External-Identifier).

Example:

"@type": "CreateAction",
"actionStatus": "http://schema.org/PotentialActionStatus",

Requesting Agent

The individual person who is requesting the run MUST be indicated as the agent of the CreateAction, and SHOULD have an affiliation to the organisation they are representing for access control purposes.

Note: The organisation under affiliation is typically the employing organisation, e.g. a university or hospital. Virtual organisations such as research projects MAY additionally be listed using memberOf (see also Responsible Project below).
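Non-normative sketch: a TRE could locate the requested run and its requesting agent by following mentions from the root data entity to a CreateAction, then agent, then affiliation. The function names as_list and requesting_agents are hypothetical, and the sketch assumes the @graph array of the metadata file has already been parsed and that references are JSON objects with an @id.

```python
def as_list(value):
    """JSON-LD values may be a single item or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def requesting_agents(graph):
    """Yield (action id, agent id, affiliation ids) for each mentioned CreateAction."""
    by_id = {entity["@id"]: entity for entity in graph}
    for ref in as_list(by_id["./"].get("mentions")):
        action = by_id.get(ref["@id"], {})
        if "CreateAction" not in as_list(action.get("@type")):
            continue  # other mentioned entities are not the requested run
        for agent_ref in as_list(action.get("agent")):
            agent = by_id.get(agent_ref["@id"], {})
            affiliations = [a["@id"] for a in as_list(agent.get("affiliation"))]
            yield action["@id"], agent_ref["@id"], affiliations
```

An empty affiliation list signals that the SHOULD above is not met, which an implementation may want to flag before access control decisions.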

Responsible Project

The project that the request is sent on behalf of, typically related to permission to use a TRE, MUST be indicated from the root dataset using sourceOrganization to a Project. The responsible project SHOULD be referenced from the requesting agent’s memberOf.

Note: The responsible project is not necessarily a ResearchProject corresponding to a funded grant, but may be a more specific study within a funded project. Different TREs may use different granularity and identifiers for responsible projects. A project Grant MAY be referenced using funding from the responsible project.

It is RECOMMENDED to include TRE-specific ids under identifier (which MAY be an array). If the identifier is not globally unique (e.g. an integer rather than a UUID or URI), it is RECOMMENDED to add a repository-specific identifier and provide the local identifier as value of a PropertyValue entity. Multiple repository-specific identifiers MAY be included for different TREs from a single Project entity.

The project MAY indicate the member organisations, in which case one of them SHOULD match the affiliation of the Requesting Agent with a memberOf to this project.

],
"funding": [
{"@id": "https://gtr.ukri.org/projects?ref=10038961"}
],
"member": [
{"@id": "https://ror.org/027m9bs27"},

"name": "tre72",
"value": "project81"
},
{ "@id": "https://gtr.ukri.org/projects?ref=10038961",
"@type": "Grant",
"name": "EOSC4Cancer"
}

Agreement Data

Inputs

Each input MUST have a corresponding data entity, which SHOULD have an exampleOfWork to a corresponding FormalParameter:

{ "@id": "input1.txt",
"@type": "File",
"name": "input1",
"exampleOfWork": {"@id": "#sequence"}
},
{ "@id": "#enableFastMode",
"@type": "PropertyValue",
"name": "--fast-mode",
"value": "True",
"exampleOfWork": {"@id": "#fast"}
},
{ "@id": "#sequence",
"@type": "FormalParameter",
"name": "input-sequence"
},

Tip: While the FormalParameter SHOULD match the definitions within the Workflow Crate referenced from mainEntity, the only requirement from this profile is that their name is programmatically recognized by the workflow engine for binding input parameters of the particular workflow.

Outputs

If the workflow has successfully executed, that is, the CreateAction has actionStatus set to CompletedActionStatus, the output data entities SHOULD be referenced from the result array.

Output entities MUST be described as in the Workflow Run Crate profile, and their type SHOULD be File, Dataset, Collection, DigitalDocument or PropertyValue.

Implementations MAY include the outputs within the Crate BagIt archive, in which case it is RECOMMENDED to use the folder outputs/ to avoid conflict with other files in the crate.

Tip: Implementations may need to inspect the FormalParameter of the Workflow Crate to propagate a human-readable name and encodingFormat file format of the inputs and outputs.

Sensitive data

Outputs MAY include references to sensitive data that is only accessible from within the TRE or through URIs that require authentication. The requirement for permission SHOULD be indicated by typing the data entity as a DigitalDocument that uses hasDigitalDocumentPermission to reference a DigitalDocumentPermission entity, typically assigning http://schema.org/ReadPermission with grantee set only to the Responsible Project.

{ "@id": "urn:uuid:07b81e0f-7ac4-5428-9940-878b241e2397",
"@type": "DigitalDocument",
"encodingFormat": "text/csv",
"name": "Patient measurement 07b81e0f-7ac4-5428-9940-878b241e2397",
"hasDigitalDocumentPermission": {"@id": "#permissions-07b81e0f"}
},
{ "@id": "#permissions-07b81e0f",
"@type": "DigitalDocumentPermission",
"permissionType": "http://schema.org/ReadPermission",
"grantee": { "@id": "#project-be6ffb55-4f5a-4c14-b60e-47e0951090c70"}
}

Review process

The Five Safes RO-Crate may face several reviews both before and after workflow execution, automated and manual. To record that such a review will take place or has taken place, a series of additional Action contextual entities SHOULD be related to the root data entity using mentions.

It is RECOMMENDED that the first step after authentication is a syntactic validation step that verifies the RO-Crate validity according to this profile and system expectations. This step SHOULD remove mentions references to any end-user-provided AssessAction (as defined in this profile) from the submitted crate, in order to ensure only assessment endorsements by the particular TRE are considered in the subsequent internal processing.
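Non-normative sketch: the removal of end-user-provided assessment references could look like the following. The function names as_list and strip_submitted_assessments are hypothetical; the sketch assumes the @graph array has been parsed, that references are JSON objects with an @id, and it modifies the root entity in place.

```python
def as_list(value):
    """JSON-LD values may be a single item or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def strip_submitted_assessments(graph):
    """Drop root 'mentions' references that point at any AssessAction."""
    by_id = {entity["@id"]: entity for entity in graph}
    root = by_id["./"]
    kept = []
    for ref in as_list(root.get("mentions")):
        entity = by_id.get(ref["@id"], {})
        if "AssessAction" not in as_list(entity.get("@type")):
            kept.append(ref)  # e.g. the requested CreateAction stays
    root["mentions"] = kept
    return graph
```

Only the references are removed here; whether the orphaned AssessAction entities are also deleted from the graph is left to the implementation.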

Assessment actions SHOULD indicate an actionStatus to reflect the outcome or pending nature of the assessment. Each assessment SHOULD have the root data entity (typically {"@id": "./"}) listed under object, and MAY include additional entities that were assessed.

The phase of the review process is indicated using subclasses of Action and more accurately with additionalType using terms from the Safe Haven Provenance (SHP) ontology.

The name of the action MUST provide a human-readable name of the type of check and its outcome, but SHOULD NOT be consulted by software for decision making (software should instead check actionStatus and additionalType).

Each completed action SHOULD have a timestamp using endTime that follows the ISO-8601 syntax of RFC 3339 (including timezone or Z). startTime MAY be included for active, failed and completed actions.
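Non-normative sketch: a timestamp check along these lines would reject values that lack a timezone. The function name parse_action_time is hypothetical. Note that datetime.fromisoformat is not a complete RFC 3339 validator, so this is only an approximation of the requirement.

```python
from datetime import datetime


def parse_action_time(value: str) -> datetime:
    """Parse a startTime/endTime value and insist on an explicit timezone."""
    # Before Python 3.11, fromisoformat() rejects a trailing "Z",
    # so normalise it to the equivalent "+00:00" offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp lacks a timezone: {value!r}")
    return parsed
```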

The main actor performing the assessment SHOULD be listed under agent and refer to either an Organization (e.g. a TRE helpdesk), a Person (manual check) or a SoftwareApplication (automated check). A SoftwareApplication acting on behalf of a TRE MUST include a reference to the TRE Organization using provider. There may be multiple actors appearing as agent for different actions, each of which should be listed as a contextual entity with at least a name.

{ "@id": "https://tre72.example.com/#crate-validator",
"@type": "SoftwareApplication",
"name": "RO-Crate validator at TRE72",
"provider": {"@id": "https://tre72.example.com/"}
},

{ "@id": "https://tre72.example.com/",
"@type": "Organization",
"name": "TRE 72 trusted research environment at The University of Manchester",
"parentOrganization": {"@id": "https://ror.org/027m9bs27"}
}

Check phase

Before any further processing, the content of a submitted crate SHOULD be checked for integrity and completeness against the BagIt payload manifest and tag manifest, considering at least the SHA-512 algorithm. This phase MAY also check any cryptographic signatures.
Example:

{ "@id": "#check-f33fe90c-0c22-4c72-b299-de509028410e",
"@type": "AssessAction",
"additionalType": {"@id": "https://w3id.org/shp#CheckValue"},
"name": "BagIt checksum of Crate: OK",
"endTime": "2023-04-18T12:11:45+01:00",
"object": {"@id": "./"},
"instrument": {
"@id": "https://www.iana.org/assignments/named-information#sha-512"},
"agent": {"@id": "#validator-a4a66c63-2fe0-4c57-830d-268a40718313"},
"actionStatus": "http://schema.org/CompletedActionStatus"
},

{ "@id": "https://www.iana.org/assignments/named-information#sha-512",
"@type": "DefinedTerm",
"name": "sha-512 algorithm"
}

Note that subsequent modifications to the submitted crate by the TRE will necessarily mean checksums become out of date. It is RECOMMENDED to update the BagIt manifest following crate modifications if further TRE phases require checksums (e.g. after network transfer); however, any subsequent internal checksum validations SHOULD NOT be recorded as an AssessAction. Checksums of the final crate MUST be updated by the Publishing phase and recorded accordingly.

The check phase MAY perform any additional file-level security checks required by the particular TRE, e.g. maximum file size of crate, valid characters in filenames or use of symbolic links.
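Non-normative sketch: an integrity and completeness check against the sha-512 payload manifest might be written as below. The function name verify_payload_manifest is hypothetical. The sketch assumes plain (not percent-encoded) file names, reads files fully into memory, does not check the tag manifest, and does not guard against manifest entries that point outside the bag.

```python
import hashlib
from pathlib import Path


def verify_payload_manifest(bag_dir: Path) -> list[str]:
    """Return a list of problems found when checking manifest-sha512.txt."""
    problems = []
    listed = set()
    manifest = bag_dir / "manifest-sha512.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, name = line.split(None, 1)  # "<checksum> <path in bag>"
        listed.add(name)
        target = bag_dir / name
        if not target.is_file():
            problems.append(f"missing: {name}")
        elif hashlib.sha512(target.read_bytes()).hexdigest() != expected.lower():
            problems.append(f"checksum mismatch: {name}")
    # Completeness: every payload file must also be listed in the manifest.
    for path in sorted((bag_dir / "data").rglob("*")):
        rel = path.relative_to(bag_dir).as_posix()
        if path.is_file() and rel not in listed:
            problems.append(f"not in manifest: {rel}")
    return problems
```

An empty list corresponds to the outcome that the Check phase would record with CompletedActionStatus.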

Validation phase

A crate that has been validated according to RO-Crate specifications and this profile SHOULD mention an AssessAction whose instrument refers to the profile entity, and an additionalType referring to https://w3id.org/shp#ValidationCheck. Example:

{ "@id": "#validate-1146f640-819e-4c86-b029-b763a0040896",
"@type": "AssessAction",
"additionalType": {"@id": "https://w3id.org/shp#ValidationCheck"},
"name": "Validation against Five Safes Crate profile: approved",
"startTime": "2023-04-18T12:11:46+01:00",
"endTime": "2023-04-18T12:11:49+01:00",
"object": {"@id": "./"},
"instrument": {"@id": "https://w3id.org/ro/five-safes/0.1-DRAFT"},
"agent": {"@id": "#validator-a4a66c63-2fe0-4c57-830d-268a40718313"},
"actionStatus": "http://schema.org/CompletedActionStatus"
}

The validation phase MAY perform any additional syntactic or semantic checks required by the particular TRE and workflow, e.g. correspondence between provided and expected input parameters, in which case this should be reflected by adding such entities to object (checked) and instrument (expected) as arrays.

Workflow retrieval phase

The referenced workflow crate may be retrieved by the TRE before Sign-off or Workflow Execution, potentially using a local proxy (see Finding the RO-Crate archive and Security considerations). In this case, the retrieval SHOULD be indicated by a DownloadAction with the Workflow RO-Crate’s distribution as object (indicating the URL that was downloaded from, potentially following Signposting).

Implementations MAY choose to unpack and add the Workflow Crate folder to the BagIt bag, in which case it SHOULD be indicated as an additional data entity referenced as result, which references the mainEntity using sameAs and the download location using distribution.

{ "@id": "#download-8b51bf57-6b29-44da-b24b-638c8df91639",
"@type": "DownloadAction",
"name": "Downloaded workflow RO-Crate via proxy",
"startTime": "2023-04-18T12:11:50+01:00",
"endTime": "2023-04-18T12:11:52+01:00",
"object": {"@id": "https://workflowhub.eu/workflows/289/ro_crate?version=1"},
"result": {"@id": "workflow/289/"},
"agent": {"@id": "http://proxy.example.com/"},
"actionStatus": "http://schema.org/CompletedActionStatus"
},
{
"@id": "workflow/289/",
"sameAs": { "@id": "https://workflowhub.eu/workflows/289?version=1" },
"@type": "Dataset",
"name": "CWL Protein MD Setup tutorial with mutations",
"conformsTo": {"@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0"},
"distribution": {
"@id": "https://workflowhub.eu/workflows/289/ro_crate?version=1"}
}

Sign-off phase

Before executing a Five Safes Crate, the TRE SHOULD check if the requesting agent is permitted to execute the particular workflow on behalf of the responsible project. This SHOULD include checks against the Agreement policy data maintained by the TRE. This may be a manual and/or automated check, as indicated by agent. The object SHOULD additionally reference the workflow and responsible project (unless these were not part of the sign-off checks).

{ "@id": "#signoff-3b741265-cfef-49ea-8138-a2fa149bf2f0",
"@type": "AssessAction",
"additionalType": {"@id": "https://w3id.org/shp#SignOff"},
"name": "Sign-off of execution according to Agreement policy: approved",
"endTime": "2023-04-19T17:15:12+01:00",
"object": [
{"@id": "./"},
{"@id": "https://workflowhub.eu/workflows/289?version=1"},
{"@id": "#project-be6ffb55-4f5a-4c14-b60e-47e0951090c70"}
],
"instrument": {"@id": "https://tre72.example.com/agreement-policy/81"},
"agent": {"@id": "https://orcid.org/0000-0002-1825-0097"},
"actionStatus": "http://schema.org/CompletedActionStatus"
},

{ "@id": "https://tre72.example.com/agreement-policy/81",
"@type": "CreativeWork",
"name": "Agreement policy for TRE72 for project 81"
}

Workflow execution phase

In this phase, the approved workflow execution is performed within the TRE. The CreateAction of the workflow execution will update its actionStatus and acquire a startTime and endTime.

When the execution is in CompletedActionStatus or FailedActionStatus, the crate SHOULD also follow the Provenance Crate profile, e.g. the workflow output data entities will be listed as result.

Execution states

The state of the Five Safes Crate is indicated by the actionStatus of this main action:

Example

"@type": "CreateAction",
"actionStatus": "http://schema.org/CompletedActionStatus",
"startTime": "2023-04-18T13:52:19+01:00",
"endTime": "2023-04-18T14:00:19+01:00",

Note that even if the workflow execution action is in CompletedActionStatus, two additional phases are required to pass before the Five Safes Crate can be considered “finished”; see the following subsections.

Disclosure phase

Before workflow results are returned from the TRE, a disclosure check SHOULD be performed, e.g. to verify the workflow execution has not revealed sensitive data. Depending on the workflow and TRE data this may be automated and/or manual as indicated by agent.

In the example below, a person has been assigned, while the actionStatus indicates the disclosure check is pending (startTime in this case indicating predicted waiting time into the future).

{ "@id": "#disclosure-b16c1f0a-ae7f-4582-9b28-7d9df3313e27",
"@type": "AssessAction",
"additionalType": {"@id": "https://w3id.org/shp#DisclosureCheck"},
"name": "Disclosure check of workflow results: pending (estimate: 1 week)",
"startTime": "2023-04-25T16:00:00+01:00",
"object": {"@id": "./"},
"agent": {"@id": "https://orcid.org/0000-0002-1825-0097"},
"actionStatus": "http://schema.org/PotentialActionStatus"
}

If a crate fails the disclosure phase, its content such as workflow results MUST NOT be included in the crate returned to the user. Likewise, its workflow execution CreateAction and corresponding output data entities SHOULD be removed from the metadata file.

Publishing phase

Before a disclosure-approved Five Safes Crate is published to the requesting user (or archived in a repository), some housekeeping tasks are to be completed.

The root data entity’s datePublished SHOULD be updated to the time the manifest is last written. The publisher SHOULD be updated to reflect the executing TRE. The mentions MUST be expanded to include all the AssessActions recorded by the TRE. hasPart MUST include (possibly through intermediate folders as Dataset) any result data entities now referred to from the workflow execution’s CreateAction.

The license SHOULD be included to describe the licence of the workflow output data, either an open licence such as Creative Commons, or restrictive (typically TRE-specific) conditions of access.

{
"@id": "./",
"@type": "Dataset",
"conformsTo": {"@id": "https://w3id.org/ro/five-safes/0.1-DRAFT"},
"datePublished": "2023-04-29T11:01:04+01:00",
"publisher": {"@id": "https://tre72.example.com/"},
"license": {"@id": "https://spdx.org/licenses/CC-BY-4.0"},
"hasPart": […],
"mainEntity": {"@id": "https://workflowhub.eu/workflows/289?version=1"},
"mentions": […],
"sourceOrganization":
{"@id": "#project-be6ffb55-4f5a-4c14-b60e-47e0951090c70"}
},
{ "@id": "https://spdx.org/licenses/CC-BY-4.0",
"@type": "CreativeWork",
"name": "Creative Commons Attribution 4.0 International",
"identifier": "CC-BY-4.0"
}

TRE implementations MAY additionally record changes to the RO-Crate as it has progressed through the execution, by associating a CreateAction and subsequent UpdateAction to the root data entity as object.

Following the final update of the RO-Crate Metadata file and content, the BagIt payload manifests and tag manifests MUST be updated (as a minimum because the ro-crate-metadata.json has been modified). The result entity is not recorded, as ../manifest-sha512.txt would have escaped the RO-Crate root. The agent SHOULD delete manifest files it cannot regenerate. After the checksum calculation, the TRE SHOULD NOT make any further changes to the crate or BagIt files.

{ "@id": "#bagit-ce785c0b-c988-4043-8cbd-1489dcebc14f",
"@type": "AssessAction",
"startTime": "2023-04-29T12:12:25+01:00",
"additionalType": {"@id": "https://w3id.org/shp#GenerateCheckValue"},
"name": "BagIt manifests of Crate updated",
"object": {"@id": "./"},
"instrument": {
"@id": "https://www.iana.org/assignments/named-information#sha-512"},
"agent": {"@id": "#validator-a4a66c63-2fe0-4c57-830d-268a40718313"},
"actionStatus": "http://schema.org/CompletedActionStatus"
},

{ "@id": "https://www.iana.org/assignments/named-information#sha-512",
"@type": "DefinedTerm",
"name": "sha-512 algorithm"
}

Note: This action must be written to the RO-Crate Metadata File before calculating the payload manifest, and therefore can’t include the correct endTime. The actionStatus should nevertheless reflect the status as if it has already completed. Likewise, the payload manifest must be calculated before updating the tag manifest, as it includes the checksum of the payload manifest.
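Non-normative sketch: the ordering constraint in the note above (metadata file first, then payload manifest, then tag manifest) could be implemented as follows. The function names _write_manifest and update_manifests are hypothetical. The sketch assumes only sha-512 manifests are kept, reads files fully into memory, and treats every top-level file of the bag as a tag file.

```python
import hashlib
from pathlib import Path


def _write_manifest(bag_dir: Path, manifest_name: str, files: list[Path]) -> None:
    """Write '<sha512>  <path relative to bag>' lines for the given files."""
    lines = [
        f"{hashlib.sha512(f.read_bytes()).hexdigest()}  {f.relative_to(bag_dir).as_posix()}"
        for f in sorted(files)
    ]
    (bag_dir / manifest_name).write_text("\n".join(lines) + "\n", encoding="utf-8")


def update_manifests(bag_dir: Path) -> None:
    """Regenerate manifests after ro-crate-metadata.json has had its final update."""
    # 1. Payload manifest first: it covers everything under data/.
    payload = [p for p in (bag_dir / "data").rglob("*") if p.is_file()]
    _write_manifest(bag_dir, "manifest-sha512.txt", payload)
    # 2. Tag manifest second: it covers the top-level tag files, including the
    #    payload manifest just written, but never any tagmanifest-* file.
    tags = [
        p for p in bag_dir.iterdir()
        if p.is_file() and not p.name.startswith("tagmanifest-")
    ]
    _write_manifest(bag_dir, "tagmanifest-sha512.txt", tags)
```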

Receiving phase

Clients receiving a Five Safes Crate SHOULD check the BagIt manifest checksums as in the Check phase, as well as the status of all the actions specified in this profile before further processing.

It is NOT sufficient for clients to check the publishing AssessAction, as TRE implementations are permitted to expose partial crates which have failed approval phases or which are in pending/execution state.
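Non-normative sketch: a receiving client could list every mentioned action that has not reached CompletedActionStatus, as below. The function names as_list and unfinished_actions are hypothetical. The sketch treats any mentioned entity whose type ends in "Action" as relevant, and it does not verify that every phase expected by this profile is actually present.

```python
COMPLETED = "http://schema.org/CompletedActionStatus"


def as_list(value):
    """JSON-LD values may be a single item or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def unfinished_actions(graph):
    """Return the @id of each mentioned action that is not completed."""
    by_id = {entity["@id"]: entity for entity in graph}
    pending = []
    for ref in as_list(by_id["./"].get("mentions")):
        entity = by_id.get(ref["@id"], {})
        is_action = any(t.endswith("Action") for t in as_list(entity.get("@type")))
        if is_action and entity.get("actionStatus") != COMPLETED:
            pending.append(ref["@id"])
    return pending
```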

Clients MAY add additional post-processing data and/or metadata not specified in this profile to the crate (e.g. ReceiveAction), in which case they SHOULD maintain the BagIt manifests accordingly. Manifest checksums can be used to detect accidental local changes in post-processing.

Security considerations

It is RECOMMENDED that implementers apply strong access control before accepting a Five Safes Crate.

Allowing execution of any Workflow Crate effectively allows execution of arbitrary code. It is RECOMMENDED to check against a list of pre-approved workflows (see Sign-off phase), e.g. using file checksums or cryptographic signatures.

Clients parsing and unpacking ZIP files, JSON metadata, workflow definitions and BagIt manifests SHOULD apply reasonable security measures to limit the possibility of an attacker consuming excessive disk, CPU or memory resources, or escaping any file directory or execution container jails. For instance, clients should check for invalid file path characters, relative paths or symbolic links escaping the crate, as well as guard against zip bomb attacks.
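Non-normative sketch: some of these checks can be made on the ZIP directory before anything is unpacked. The function name unsafe_members and the max_total_bytes parameter are hypothetical. Declared sizes in a ZIP file can be forged, so a limit enforced while streaming the data is still needed; this sketch only catches the obvious cases.

```python
import zipfile
from pathlib import PurePosixPath


def unsafe_members(archive: zipfile.ZipFile, max_total_bytes: int) -> list[str]:
    """Return reasons why the archive should not be unpacked (empty if none found)."""
    reasons = []
    total = 0
    for info in archive.infolist():
        path = PurePosixPath(info.filename)
        if path.is_absolute() or ".." in path.parts or "\\" in info.filename:
            reasons.append(f"path escapes the crate: {info.filename}")
        # Unix symbolic links are flagged in the upper 16 bits of external_attr.
        if (info.external_attr >> 16) & 0o170000 == 0o120000:
            reasons.append(f"symbolic link: {info.filename}")
        total += info.file_size  # declared uncompressed size
    if total > max_total_bytes:
        reasons.append(f"declared uncompressed size {total} exceeds limit")
    return reasons
```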

Pre-approved workflows could be exploited by a malicious attacker if they, the workflow engine or their tools themselves have security vulnerabilities, e.g. by using hand-crafted input parameters that bypass command line escapes. Workflow executions are not guaranteed to complete in a given timescale; sufficient timeouts and resource usage restrictions SHOULD therefore be applied by the workflow engine.

It is currently out of scope for this specification how to verify that a Five Safes Crate was requested by the given person, or how to verify if the person has access to a particular TRE according to their Agreement policies. It is therefore RECOMMENDED that implementers check authentication and authorization of a submitted query and use strong encryption. Implementers SHOULD check that the @id and affiliation of the Requesting Agent and Responsible Project correspond to the authentication, and MAY inject/overwrite client-submitted data.

Malicious clients submitting a Five Safes Crate may have included additional entities, properties and types, which may cause security concerns in an implementation. Implementers SHOULD sanity check inputs, including ensuring that all paths are relative within the bag or absolute URIs, and MUST remove references to any client-submitted AssessActions, as these could be used to bypass the TRE compliance process.

Malicious clients may attempt to reference URLs or IP addresses that are only accessible within a TRE. Implementers MUST perform any URL downloads (such as Workflow RO-Crates or container images) in a way that does not access the secured TRE network, e.g. from a Demilitarized Zone (DMZ) with a network firewall restricting access to the TRE, or through a proxying repository controlled by TRE administrators.
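Non-normative sketch: as a cheap first filter ahead of the network-level controls required above, a URL whose host is a literal non-public IP address could be rejected before any download is attempted. The function name is_internal_ip_literal is hypothetical. Host names that resolve to internal addresses are not caught here, so this cannot replace the DMZ or proxy arrangement.

```python
import ipaddress
from urllib.parse import urlsplit


def is_internal_ip_literal(url: str) -> bool:
    """True if the URL host is missing or is an IP literal in a non-public range."""
    host = urlsplit(url).hostname
    if host is None:
        return True  # no host at all: refuse rather than guess
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name; it must still be checked where it is resolved
    return not address.is_global
```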

As an executed Five Safes Crate may be intended for publishing (possibly following an embargo period), it SHOULD NOT include sensitive data or security tokens within the metadata file or the BagIt archive (e.g. in configuration or log files); TREs SHOULD verify this in the Disclosure phase. It is RECOMMENDED to use keychain services or time-limited security access tokens that can be assured to be expired before the Crate is published.

The crate MAY include references (e.g. S3 URIs) to sensitive data, in which case the implementation and executed workflow SHOULD protect against divulging sensitive information (directly or indirectly) in the File identifiers, e.g. by using UUID v5 hashing [RFC4122] to hide sensitive identifiers. Note: predictable identifiers like patient-456 would still be vulnerable to iteration attacks under such hashing.
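Non-normative sketch: the following illustrates both the UUID v5 hashing and the iteration attack mentioned in the note. The namespace value and the function names hashed_identifier and guess_local_id are hypothetical, and the attack assumes the namespace is known to the attacker.

```python
import uuid

# Hypothetical namespace; a TRE would generate and manage its own.
TRE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://tre72.example.com/")


def hashed_identifier(local_id: str) -> str:
    """Derive a stable UUID v5 URN from a local (sensitive) identifier."""
    return uuid.uuid5(TRE_NAMESPACE, local_id).urn


def guess_local_id(urn: str, candidates):
    """Iteration attack: hash predictable ids until one matches the URN."""
    for candidate in candidates:
        if hashed_identifier(candidate) == urn:
            return candidate
    return None
```

Because the hash is deterministic, enumerating patient-0 to patient-999 is enough to reverse the identifier for patient-456, which is why predictable local identifiers remain vulnerable.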

Media type and profiles

When transferring a Five Safes Crate using HTTP, implementations SHOULD use the following HTTP headers for content-type and profile:

HTML landing pages that reference a Five Safes Crate SHOULD include Signposting using HTTP Link headers that refer to the Crate’s ZIP download and the RO-Crate profile:

Link: <https://example.com/query-12389.zip>; rel="item"; type="application/zip"
Link: <https://w3id.org/ro/crate>; rel="profile"; type="application/zip";
anchor="https://example.com/query-12389.zip"

Implementations MAY also provide direct public access to the RO-Crate metadata file, in which case they SHOULD follow the RO-Crate media type recommendations for JSON-LD. It is then RECOMMENDED to convert the metadata file to a Detached RO-Crate by establishing a base URI based on the BagIt External-Identifier UUID (e.g. arcp://uuid,9796155a-fe44-4614-89b8-71945f718ffb/).

References

