BioDT RO-Crate Metadata Profiles Documentation

RO-Crate profiles for linking the different types of digital objects that compose a digital twin, such as datasets, models and workflows

Workflows might be one of the less intuitive FDOs and can be understood in different ways. Nonetheless, they play an important role in bringing the different components of a digital twin together in an automated way. The distinction between a software script and a workflow can be fuzzy, but it’s likely that scripts are more prevalent (even if they are fulfilling the role of a workflow). We follow RO-Crate’s guidelines for distinguishing between scripts and workflows:

Here are some indicators for when a script should be considered a workflow:

  • It performs a series of steps (pipeline)
  • The executed steps are mainly external tools or services
  • The main work is performed by the steps (script is not algorithmic)
  • The steps exchange data in a dataflow, typically file inputs/outputs
  • The script has well-defined inputs and outputs, e.g. file arguments

Here are some counter-indicators for when a script might not be a workflow:

  • The script contains mainly algorithms or logic
  • Data is exchanged out of band, e.g. through a SQL database
  • The script relies on a particular state of the system (e.g. appends existing files)
  • An interactive user interface that controls the actions
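
As a rough illustration of the indicators above, the following Python sketch has the shape of a workflow: it chains steps that run as separate processes, the steps exchange data through files, and the script itself contains no algorithmic logic. The two "tools" are stand-ins invented for this example (inline one-liners launched through the Python interpreter), not tools used in BioDT.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-ins for external tools: each reads argv[1] and writes argv[2].
TO_UPPER = (
    "import sys; "
    "open(sys.argv[2], 'w').write(open(sys.argv[1]).read().upper())"
)
COUNT_WORDS = (
    "import sys; "
    "open(sys.argv[2], 'w').write(str(len(open(sys.argv[1]).read().split())))"
)


def run_step(tool_code, src, dst):
    """Run one step as a separate process with a file input and a file output."""
    subprocess.run([sys.executable, "-c", tool_code, str(src), str(dst)], check=True)


def pipeline(input_file, output_file):
    """Chain the steps; the script only passes files from one step to the next."""
    with tempfile.TemporaryDirectory() as tmp:
        intermediate = Path(tmp) / "upper.txt"
        run_step(TO_UPPER, input_file, intermediate)
        run_step(COUNT_WORDS, intermediate, output_file)
```

If the word counting were implemented inside pipeline itself, or if the steps communicated through a shared database instead of files, the same script would lean towards the counter-indicators.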

Computational Workflows

Following RO-Crate’s guidelines, we use Bioschemas’ ComputationalWorkflow as the base profile for workflows. It includes mandatory, recommended and optional attributes. Some of the mandatory ones are already part of the Kernel Attributes. The remaining ones, which are mandatory but exclusive to workflows, are shown below.

RO-Crate requires alignment with ComputationalWorkflow, but it also extends it with some additional features in the Workflow Profile. These are mostly specific to how the RO-Crate should be structured. Lastly, there are some more specialised profiles, like those from Workflow Run RO-Crate. These go beyond the general RO-Crate guidelines and include three separate profiles for capturing the provenance of an execution of a computational workflow with increasing granularity:

  • Process Run Crate
  • Workflow Run Crate
  • Provenance Run Crate

Of these three profiles, Workflow Run Crate probably strikes the right balance. Process Run Crate doesn’t necessarily align with ComputationalWorkflow, while Provenance Run Crate adds so much detail that it might hinder implementation.

Moreover, the finer details of the workflow execution might need to be captured through the LEXIS platform, which relies on Apache Airflow. In order to achieve interoperability between the different workflow systems mentioned so far, we will use the Common Workflow Language (CWL) as a central translation point, since it is well supported in the RO-Crate ecosystem and also through different tools (see cwl-airflow for a connector from CWL to Airflow).
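
To make the role of CWL as a translation point more concrete, here is a minimal sketch of a CWL description for a single R script, built as a Python dictionary and serialised to JSON (CWL documents may be written in JSON as well as YAML). The input name, output name and output file name are hypothetical, and the sketch does not claim that ModGP.R accepts these arguments.

```python
import json

# Minimal CWL CommandLineTool description as a plain dictionary.
# "species", "distribution" and "distribution.tif" are hypothetical names.
cwl_tool = {
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "baseCommand": ["Rscript", "ModGP.R"],
    "inputs": {
        "species": {
            "type": "string",
            "inputBinding": {"position": 1},
        }
    },
    "outputs": {
        "distribution": {
            "type": "File",
            "outputBinding": {"glob": "distribution.tif"},
        }
    },
}


def to_cwl_json(document):
    """Serialise the CWL description to a JSON string."""
    return json.dumps(document, indent=2)


if __name__ == "__main__":
    print(to_cwl_json(cwl_tool))
```

A description along these lines could be packaged in the RO-Crate next to the script and handed to a CWL-aware runner or connector.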

Workflow Attributes

Format: the name of each metadata attribute includes a link to Schema.org (or another vocabulary to which the property belongs), and is followed by a definition in the “Description” line. “Type” indicates the expected values for each property, while “Cardinality” specifies how many values are allowed and whether the property is optional or mandatory. Lastly, additional remarks might be added as “Comments”, and an example is given in the final line.


input


output


version


programmingLanguage


url


sdPublisher


Apart from the mandatory attributes above, the ComputationalWorkflow profile from Bioschemas includes many other recommended and optional attributes that add further details. Likewise, the Workflow Run Crate provides a more fitting profile for capturing the execution of a workflow. We refer to the official profiles for those.
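
The workflow-specific mandatory attributes listed above can be checked mechanically. The following is a minimal sketch, assuming the workflow entity has already been loaded as a Python dictionary. The function name is hypothetical, and the check covers neither the Kernel Attributes nor the expected types of the values.

```python
# Mandatory attributes that are exclusive to workflows, as listed in this section.
MANDATORY_WORKFLOW_ATTRIBUTES = (
    "input",
    "output",
    "version",
    "programmingLanguage",
    "url",
    "sdPublisher",
)


def missing_workflow_attributes(entity):
    """Return the mandatory workflow attributes that are absent or empty."""
    return [name for name in MANDATORY_WORKFLOW_ATTRIBUTES if not entity.get(name)]


if __name__ == "__main__":
    # Hypothetical, incomplete workflow entity used only for illustration.
    workflow = {
        "@id": "ModGP.R",
        "input": [{"@id": "X - Functions-Data.R"}],
        "output": {"@id": "#Species distribution output"},
        "version": "0.1.0",
        "programmingLanguage": {"@id": "#R"},
        "url": "https://github.com/uio-mana/CWR-Hackathon/blob/main/ModGP",
    }
    print(missing_workflow_attributes(workflow))
```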

Example Metadata File (ro-crate-metadata.json)

{
    "@context": ["https://w3id.org/ro/crate/1.1/context"],
    "@graph": [
        {
            "@type": "CreativeWork",
            "@id": "ro-crate-metadata.json",
            "about": {
                "@id": "https://github.com/uio-mana/CWR-Hackathon/tree/main/ModGP"
            },
            "conformsTo": {
                "@id": "https://w3id.org/ro/crate/1.1"
            }
        },
        {
            "@id": "https://github.com/uio-mana/CWR-Hackathon/tree/main/ModGP",
            "@type": "Dataset",
            "hasPart": [{ "@id": "ModGP.R" }]
        },
        {
            "@id": "ModGP.R",
            "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
            "conformsTo": { "@id": "https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE" },
            "name": "ModGP.R",
            "description": "BioDT CWR - ModGP",
            "dateCreated": "2023-01-23",
            "license": {
                "@id": "https://creativecommons.org/licenses/by/4.0/"
            },
            "creator": [
                {
                    "@id": "https://orcid.org/0000-0002-4984-7646"
                },
                {
                    "@id": "https://orcid.org/0000-0002-8045-6950"
                }
            ],
            "keywords": "SDM, crop wild relatives, genomics",
            "url": "https://github.com/uio-mana/CWR-Hackathon/blob/main/ModGP",
            "programmingLanguage": { "@id": "#R" },
            "version": "0.1.0",
            "sdPublisher": { "@id": "#workflow-repo" },
            "input": [
                {
                    "@id": "X - Functions-Data.R"
                },
                {
                    "@id": "X - Functions-Outputs.R"
                },
                {
                    "@id": "X - Functions-SDM.R"
                }
            ],
            "output": { "@id": "#Species distribution output" }
        },
        {
            "@id": "X - Functions-Data.R",
            "@type": ["File", "SoftwareSourceCode"],
            "name": "GBIF data download functionality",
            "description": "Bioclimatic Variable Climatology creation for qsoil1 and qsoil2 combined",
            "programmingLanguage": { "@id": "#R" }
        },
        {
            "@id": "X - Functions-Outputs.R",
            "@type": ["File", "SoftwareSourceCode"],
            "name": "SDM Output",
            "description": "SDM Output Visualisation and posthoc summaries",
            "programmingLanguage": { "@id": "#R" }
        },
        {
            "@id": "X - Functions-SDM.R",
            "@type": ["File", "SoftwareSourceCode"],
            "name": "Species Distribution Model (SDM)",
            "description": "SDM Functionality: Data Preparation and Model Execution",
            "programmingLanguage": { "@id": "#R" }
        },
        {
            "@id": "#Species distribution output",
            "@type": "Dataset",
            "name": "Lathyrus aphaca distribution",
            "description": "ModGP output for Lathyrus aphaca",
            "studySubject": ["http://eurovoc.europa.eu/632"]
        },
        {
            "@id": "#workflow-repo",
            "@type": "Organization",
            "name": "WorkflowHub space for BioDT CWR pDT",
            "url": "https://workflowhub.eu/projects/133"
        },
        {
            "@type": "Person",
            "@id": "https://orcid.org/0000-0002-8045-6950",
            "name": "Desalegn Chala Gelete",
            "affiliation": { "@id": "https://ror.org/01xtthb56" }
        },
        {
            "@id": "https://orcid.org/0000-0002-4984-7646",
            "@type": "Person",
            "name": "Erik Kusch",
            "affiliation": { "@id": "https://ror.org/01xtthb56" }
        },
        {
            "@id": "https://ror.org/01xtthb56",
            "@type": "Organization",
            "name": "University of Oslo",
            "url": "http://www.uio.no/english/"
        },
        {
            "@id": "#R",
            "@type": "ProgrammingLanguage",
            "name": "R",
            "url": "https://www.r-project.org/about.html",
            "version": "4.3.2"
        },
        {
            "@id": "https://creativecommons.org/licenses/by/4.0/",
            "@type": "CreativeWork",
            "name": "Creative Commons Attribution 4.0 International",
            "description": "You are free to:\nShare — copy and redistribute the material in any medium or format for any purpose, even commercially.\nAdapt — remix, transform, and build upon the material for any purpose, even commercially.\nThe licensor cannot revoke these freedoms as long as you follow the license terms."
        }
    ]
}
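
A metadata file like the one above can be inspected with a few lines of standard-library Python. The sketch below is only an illustration under simple assumptions: the function names are hypothetical, the crate is treated as plain JSON rather than processed as JSON-LD, and references to external resources that are deliberately not described in the graph are reported alongside genuine omissions, so the result is a list to review rather than a list of errors.

```python
import json


def load_crate(path):
    """Load an RO-Crate metadata file as a plain dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def find_workflows(crate):
    """Return all entities whose @type includes ComputationalWorkflow."""
    workflows = []
    for entity in crate["@graph"]:
        types = entity.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if "ComputationalWorkflow" in types:
            workflows.append(entity)
    return workflows


def referenced_ids(value):
    """Yield every @id referenced from a property value (object or list)."""
    if isinstance(value, dict):
        if "@id" in value:
            yield value["@id"]
    elif isinstance(value, list):
        for item in value:
            yield from referenced_ids(item)


def unresolved_references(crate):
    """Return referenced @id values that are not described in the graph."""
    known = {entity["@id"] for entity in crate["@graph"]}
    missing = set()
    for entity in crate["@graph"]:
        for key, value in entity.items():
            if key == "@id":
                continue
            missing.update(ref for ref in referenced_ids(value) if ref not in known)
    return sorted(missing)
```

For example, calling load_crate("ro-crate-metadata.json") and passing the result to find_workflows would return the ModGP.R entity, while unresolved_references would list any identifiers that are referenced but have no entry of their own.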