Workflow Publishing and RO-Crate Export

LabID can publish workflows in standardized formats and to external platforms: it generates Research Object Crates (RO-Crates) and integrates with WorkflowHub for community sharing.

Publishing Overview

Core Publishing Concepts

LabID supports two primary publishing mechanisms:

  1. RO-Crate Export: Generate standardized workflow packages
  2. WorkflowHub Publishing: Direct publication to WorkflowHub platforms

Both mechanisms ensure workflows are packaged with complete metadata, provenance information, and all necessary files for reproducibility.

Publishing Prerequisites

Before publishing a workflow:

  • Committed Version: Only released (committed) WorkflowVersions, i.e. those with a commit_hash, can be published
  • MAIN File: Must have exactly one file designated as MAIN type
  • Computer Language: Workflow language must be specified
  • Complete Metadata: Proper workflow metadata and descriptions
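
The prerequisites above can be checked up front before attempting an export. The following is a minimal sketch, not LabID's implementation: VersionStub and publishing_problems are hypothetical names, and the stub only mirrors the four conditions listed above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VersionStub:
    """Hypothetical stand-in for a WorkflowVersion (not LabID's model)."""
    commit_hash: Optional[str] = None
    computer_language: Optional[str] = None
    description: str = ""
    file_types: List[str] = field(default_factory=list)  # e.g. ["MAIN", "README"]


def publishing_problems(version: VersionStub) -> List[str]:
    """Return a list of unmet publishing prerequisites (empty if none)."""
    problems = []
    if not version.commit_hash:
        problems.append("version is not committed")
    main_count = version.file_types.count("MAIN")
    if main_count != 1:
        problems.append(f"expected exactly one MAIN file, found {main_count}")
    if not version.computer_language:
        problems.append("computer language is not set")
    if not version.description.strip():
        problems.append("description is empty")
    return problems
```

Collecting every problem in one pass, rather than raising on the first, lets a user fix all issues before retrying.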

RO-Crate Generation

What is RO-Crate?

Research Object Crate (RO-Crate) is a community standard for packaging research data with metadata using JSON-LD and schema.org vocabulary. LabID generates RO-Crates that comply with the Workflow RO-Crate profile.

Export Types

LabID supports two RO-Crate export types:

Minimal Export (Default)

  • WorkflowFiles Only: Includes only files registered as WorkflowFiles
  • Curated Content: Only user-selected and annotated files
  • Smaller Size: Optimized for sharing essential workflow components
  • Use Case: Publishing core workflow logic and documentation

Full Export

  • Complete Repository: Includes all files in the Git repository
  • Comprehensive: Not limited to files registered as WorkflowFiles
  • Larger Size: May include development artifacts and auxiliary files
  • Use Case: Complete workflow preservation and archival
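
The difference between the two export types comes down to which files are selected. The sketch below illustrates that choice under stated assumptions: select_export_files is a hypothetical helper, and skipping the .git directory in a full export is an assumption rather than documented LabID behavior.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def select_export_files(repo_root: Path, registered: Iterable[str],
                        export_type: str = "minimal") -> List[str]:
    """Return sorted repository-relative paths to include in the crate."""
    if export_type == "minimal":
        # Only the curated, registered WorkflowFiles
        return sorted(set(registered))
    if export_type == "full":
        # Every file in the checkout; .git internals are skipped (assumption)
        return sorted(
            p.relative_to(repo_root).as_posix()
            for p in repo_root.rglob("*")
            if p.is_file() and ".git" not in p.relative_to(repo_root).parts
        )
    raise ValueError(f"unknown export type: {export_type!r}")


if __name__ == "__main__":
    # Small demonstration on a throwaway directory
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "main.nf").write_text("workflow {}")
        (root / "scratch").mkdir()
        (root / "scratch" / "notes.txt").write_text("draft")
        print(select_export_files(root, ["main.nf"]))
        print(select_export_files(root, ["main.nf"], "full"))
```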

RO-Crate Structure

Generated RO-Crates include:

workflow-rocrate.zip
├── ro-crate-metadata.json    # JSON-LD metadata
├── ro-crate-preview.html     # Human-readable preview
├── main_workflow_file        # Primary workflow (MAIN type)
├── config_files/             # Configuration files
├── test_files/               # Test files and examples
├── README.md                 # Documentation
├── LICENSE                   # License information
└── additional_files/         # Other workflow files

Metadata Included

Each RO-Crate contains comprehensive metadata:

{
  "@context": "https://w3id.org/workflowhub/workflow-ro-crate/1.0/context",
  "@graph": [
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "Workflow Name",
      "description": "Workflow description",
      "license": "MIT",
      "creator": {
        "@id": "#creator",
        "@type": "Person",
        "name": "Creator Name",
        "affiliation": "Institution"
      },
      "mainEntity": {
        "@id": "main_workflow.nf",
        "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
        "programmingLanguage": "Nextflow"
      }
    }
  ]
}
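
A consumer of an exported crate can locate the main workflow from this metadata. The sketch below assumes the metadata has already been loaded as a dictionary. find_main_entity is a hypothetical helper that handles both the inline form shown above and the flattened form, where mainEntity is only an {"@id": ...} reference to another entry in @graph.

```python
import json
from typing import Any, Dict, Optional


def find_main_entity(metadata: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the entity referenced by the root dataset's mainEntity, if any."""
    graph = metadata.get("@graph", [])
    by_id = {entity["@id"]: entity for entity in graph if "@id" in entity}
    root = by_id.get("./")
    if root is None:
        return None
    main = root.get("mainEntity")
    if main is None:
        return None
    # Merge the referenced @graph entry (if present) with any inline properties
    resolved = by_id.get(main.get("@id"), {})
    return {**resolved, **main}


if __name__ == "__main__":
    with open("ro-crate-metadata.json", encoding="utf-8") as fh:
        print(find_main_entity(json.load(fh)))
```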

API Endpoints

Export RO-Crate

GET /api/v2/workflows/workflowversions/{id}/export/?type=full

Parameters:

  • type: Export type (minimal or full); minimal is the default

Response: Binary RO-Crate ZIP file

Requirements:

  • WorkflowVersion must be committed (commit_hash present)
  • MAIN file must be designated
  • Computer language must be set
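
A client call to this endpoint could be prepared as in the sketch below. The host name, the build_export_request helper, and the "Token" authorization scheme are assumptions for illustration; only the path and the type parameter come from the endpoint description above.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request


def build_export_request(base_url: str, version_id: int,
                         export_type: str = "minimal", token: str = "") -> Request:
    """Prepare (but do not send) a GET request for the RO-Crate export."""
    if export_type not in ("minimal", "full"):
        raise ValueError(f"unknown export type: {export_type!r}")
    path = f"api/v2/workflows/workflowversions/{version_id}/export/"
    url = urljoin(base_url, path) + "?" + urlencode({"type": export_type})
    # Authorization scheme is an assumption; adapt to your LabID deployment
    headers = {"Authorization": f"Token {token}"} if token else {}
    return Request(url, headers=headers, method="GET")
```

Passing the returned object to urllib.request.urlopen would send the request; the response body is the binary ZIP and can be written to disk unchanged.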

Publish to WorkflowHub

PUT /api/v2/workflows/workflowversions/{id}/publish_to_workflowhub/
Content-Type: application/json

{
  "project": "project_id",
  "type": "full"
}

Parameters:

  • project: WorkflowHub project ID (required)
  • type: Export type (minimal or full)

Process:

  1. Validates user permissions for the specified project
  2. Generates RO-Crate
  3. Uploads to WorkflowHub
  4. Creates PublishedWorkflow record
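
The request body shown above can be assembled programmatically. This is a sketch with a hypothetical build_publish_request helper and host name; authentication is omitted because the source does not specify how the LabID API expects it.

```python
import json
from urllib.parse import urljoin
from urllib.request import Request


def build_publish_request(base_url: str, version_id: int,
                          project: str, export_type: str = "minimal") -> Request:
    """Prepare (but do not send) the PUT request that triggers publishing."""
    body = json.dumps({"project": project, "type": export_type}).encode("utf-8")
    path = f"api/v2/workflows/workflowversions/{version_id}/publish_to_workflowhub/"
    return Request(
        urljoin(base_url, path),
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```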

RO-Crate Generation Process

1. Validation Phase

def generate_research_object_crate(self, export_type="minimal"):
    # Validate prerequisites
    if not self.computer_language:
        raise ValidationError("Computer language not set")

    if not self.get_main_file():
        raise ValidationError("No main file found")

2. Repository Checkout

# Create temporary checkout at specific commit
tmpdir = self.workflow.repository.create_tmp_checkout(
    commit_hexsha=self.commit_hash
)

3. Language Detection

def _get_language_for_rocrate(self):
    """Map LabID workflow language to RO-Crate language object"""
    # Single lookup: use the built-in language object when one exists
    known_lang = LANG_MAP.get(self.computer_language.name)
    if known_lang:
        return known_lang

    # Create custom language definition for unsupported languages.
    # Note: lang_id, term and url are not defined in this excerpt.
    return {
        "identifier": f"#{lang_id}",
        "properties": {
            "name": term.name,
            "identifier": {"@id": url},
            "url": {"@id": url}
        }
    }

4. Crate Construction

def _build_rocrate(self, tmpdir, main_file, lang, license_text, export_type):
    crate = ROCrate()
    crate.name = self.label
    crate.description = self.description
    crate.license = license_text

    # Add creator information
    person = crate.add(Person(
        crate,
        self.owner.userprofile.orcid,
        properties={
            "name": self.owner.full_name,
            "affiliation": self.owner.get_affiliation()
        }
    ))

    # Add main workflow
    crate.add_workflow(
        source=abs_file_path,
        dest_path=main_file.name,
        cls=ComputationalWorkflow,
        lang=lang,
        properties=workflow_properties,
        main=True
    )

    # Add workflow files
    for wf_file in self.workflowfiles.all():
        crate.add_file(abs_file_path, dest_path=wf_file.file_path)

    # For full export, add all repository files
    if export_type == "full":
        for file_path in _get_all_files_recursively(tmpdir):
            crate.add_file(file_path, dest_path=rel_path)

5. Validation

def _validate_rocrate(self, rocrate_path):
    """Validate against RO-Crate standards"""
    rov_settings = rov_services.ValidationSettings(
        rocrate_uri=rocrate_path,
        profile_identifier="ro-crate-1.1",
        requirement_severity=rov_models.Severity.REQUIRED
    )
    rov_result = rov_services.validate(rov_settings)
    return not rov_result.has_issues()

WorkflowHub Integration

Authentication

WorkflowHub publishing requires user authentication:

user_token = request.user.get_workflowhub_token()

Users must configure their WorkflowHub API token in their LabID profile.

Project Validation

wfh_project_details = fetch_workflowhub_project(
    wfh_project, user_token, settings.DEFAULT_WORKFLOWHUB
)

The system validates that the user has write access to the specified WorkflowHub project.

Publication Process

  1. Generate RO-Crate: Create standardized workflow package
  2. Upload to WorkflowHub: POST RO-Crate to WorkflowHub API
  3. Create Record: Store PublishedWorkflow record in LabID
  4. Link Provenance: Maintain connection between LabID and WorkflowHub

PublishedWorkflow Model

class PublishedWorkflow(models.Model):
    name = models.CharField(max_length=200)
    url = models.URLField()  # WorkflowHub URL
    workflowrepository = models.ForeignKey(WorkflowRepository)
    workflowversion = models.ForeignKey(WorkflowVersion)
    created_by = models.ForeignKey(User)

This model tracks which WorkflowVersions have been published where.

Supported Workflow Languages

Built-in Language Support

LabID includes built-in support for major workflow languages:

LANG_MAP = {
    "CWL": cwl,
    "Galaxy": galaxy,
    "Nextflow": nextflow,
    "Snakemake": snakemake,
    "KNIME": knime,
    # ... additional languages
}

Custom Language Support

For unsupported languages, LabID creates custom language definitions:

# Custom language with proper metadata
lang_obj = ComputerLanguage(
    crate,
    identifier=lang["identifier"],
    properties=lang["properties"]
)

File Type Mapping

RO-Crate File Roles

WorkflowFiles are mapped to appropriate RO-Crate roles:

  • MAIN → Primary workflow entity (mainEntity)
  • README → Documentation (about: {"@id": "./"})
  • LICENSE → License information
  • CONFIG/PARAMETER → Configuration entities
  • TEST → Test entities
  • OTHER → General files

Special File Handling

# README files get special properties
if wf_file.data_type == README:
    properties = {"about": {"@id": "./"}}

# Main workflow gets comprehensive metadata
workflow_properties = {
    "name": self.label,
    "description": self.description,
    "creator": person,
    "dateCreated": current_time_as_string(),
    "programmingLanguage": self.workflow.workflow_manager,
    "url": convert_git_to_valid_url(self.workflow.remote_repository)
}

License Detection and Handling

Automatic License Detection

LabID automatically detects licenses using ScanCode:

from scancode.cli import run_scan

license_file = self.get_license()
if license_file:
    success, result = run_scan(license_file, license=True)
    detections = result.get("license_detections", []) if success else []
    if detections:  # avoid an IndexError when no license is detected
        self.license = detections[0]["license_expression_spdx"]

License Validation

def validate_license(license_value):
    """Validate and normalize license information"""
    # Maps various license formats to SPDX identifiers
    # Handles common license variations and aliases
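
The normalization that validate_license describes might look roughly like the sketch below. The alias table and the normalize_license name are illustrative assumptions, not LabID's actual mapping; the target values are standard SPDX identifiers.

```python
from typing import Optional

# Illustrative alias table (assumption); keys are lowercase, single-spaced
SPDX_ALIASES = {
    "mit": "MIT",
    "mit license": "MIT",
    "apache 2.0": "Apache-2.0",
    "apache-2.0": "Apache-2.0",
    "apache license 2.0": "Apache-2.0",
    "bsd 3-clause": "BSD-3-Clause",
    "bsd-3-clause": "BSD-3-Clause",
}


def normalize_license(value: str) -> Optional[str]:
    """Map a free-form license string to an SPDX identifier, or None if unknown."""
    key = " ".join(value.strip().lower().split())  # collapse case and whitespace
    return SPDX_ALIASES.get(key)
```

Returning None for unknown input leaves the caller free to reject the value or keep the original text.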

Error Handling and Troubleshooting

Common Publishing Errors

| Error Message | Cause | Solution |
|---|---|---|
| "Computer language not set" | WorkflowVersion missing computer_language | Set the workflow language in the version metadata |
| "No main file found" | No WorkflowFile designated as MAIN type | Designate exactly one file as MAIN type |
| "This version has not been released yet" | Attempting to publish uncommitted version | Commit the WorkflowVersion first |
| "Already published" | WorkflowVersion already published to target repository | Remove existing publication or publish to different repository |

Validation Failures

RO-Crate validation may fail due to:

  • Missing required metadata
  • Invalid file references
  • Non-compliant JSON-LD structure
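
Invalid file references can be diagnosed independently of the validator. The sketch below is a hypothetical helper, not part of LabID or rocrate-validator. It lists top-level File entities in ro-crate-metadata.json whose @id has no matching entry in the ZIP archive. Entities nested inline inside other entities and percent-encoded identifiers are not handled.

```python
import io
import json
import zipfile
from typing import List


def missing_file_references(crate_zip) -> List[str]:
    """List local File entities declared in the metadata but absent from the ZIP."""
    with zipfile.ZipFile(crate_zip) as zf:
        names = set(zf.namelist())
        metadata = json.loads(zf.read("ro-crate-metadata.json"))
    missing = []
    for entity in metadata.get("@graph", []):
        types = entity.get("@type", [])
        types = [types] if isinstance(types, str) else types
        entity_id = entity.get("@id", "")
        # Remote URLs and "#..." contextual entities are not expected in the archive
        is_local = "://" not in entity_id and not entity_id.startswith("#")
        if "File" in types and is_local and entity_id not in names:
            missing.append(entity_id)
    return missing


if __name__ == "__main__":
    # Build a tiny in-memory crate that declares README.md but does not ship it
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        graph = [{"@id": "main.nf", "@type": "File"}, {"@id": "README.md", "@type": "File"}]
        zf.writestr("ro-crate-metadata.json", json.dumps({"@graph": graph}))
        zf.writestr("main.nf", "workflow {}")
    buf.seek(0)
    print(missing_file_references(buf))
```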

Debug Information

# Check publishing prerequisites
workflowversion.is_committed  # Must be True
workflowversion.computer_language  # Must be set
workflowversion.get_main_file()  # Must exist

# Validate RO-Crate
rov_result = rov_services.validate(rov_settings)
if rov_result.has_issues():
    logger.critical("ROCrate validation issues: %s", rov_result)

Best Practices

Before Publishing

  1. Complete Metadata: Ensure all required fields are filled
  2. File Organization: Properly categorize all WorkflowFiles
  3. Documentation: Include README and LICENSE files
  4. Testing: Validate workflow functionality
  5. Version Release: Commit the WorkflowVersion

Export Strategy

  1. Minimal for Sharing: Use minimal export for community sharing
  2. Full for Archival: Use full export for complete preservation
  3. Metadata Quality: Ensure high-quality descriptions and metadata
  4. License Clarity: Use clear, standard license identifiers

WorkflowHub Publishing

  1. Project Selection: Choose appropriate WorkflowHub project
  2. Permissions: Verify write access to target project
  3. Avoid Duplicates: Check for existing publications
  4. Update Strategy: Plan for workflow updates and versioning

Integration with External Systems

WorkflowHub API

LabID integrates with WorkflowHub's REST API:

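import requests
from urllib.parse import urljoin

# Open the archive in a context manager so the handle is closed after the upload
with open(rocrate_zip, "rb") as crate_fh:
    response = requests.post(
        urljoin(base_url, "workflows"),
        files={
            "ro_crate": (rocrate_zip, crate_fh),
            "workflow[project_ids][]": (None, project_id)
        },
        headers={"authorization": f"Token {user_token}"}
    )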

Galaxy WorkflowInvocation Integration

LabID integrates with Galaxy for importing workflow invocations as WorkflowRuns using the bioblend library:

from urllib.parse import urljoin

import requests
from bioblend import galaxy

# User credentials stored in user profile
gi = galaxy.GalaxyInstance(url=galaxy_url, key=user_galaxy_settings[galaxy_url]["key"])

# Fetch invocation details
url = urljoin(gi.base_url, "api/invocations/")
params = {"job_id": job_id, "key": gi.key}
invocation_details = requests.get(url, params=params).json()

Authentication: Users must configure their Galaxy API tokens in their LabID profile settings. The system uses these stored credentials to authenticate with Galaxy instances.

Process: Galaxy WorkflowInvocations are imported as LabID WorkflowRuns. Inputs that already exist in LabID and carry matching Galaxy annotations can optionally be associated with their datasets automatically.

Standards Compliance

  • RO-Crate 1.1: Generated crates are validated against the ro-crate-1.1 profile
  • Workflow RO-Crate: Implements workflow-specific profile
  • Schema.org: Uses schema.org vocabulary for metadata
  • JSON-LD: Structured data in JSON-LD format