Workflow Publishing and RO-Crate Export¶
LabID can publish workflows in standardized formats and on external platforms. It generates Research Object Crates (RO-Crates) and integrates with WorkflowHub for community sharing.
Publishing Overview¶
Core Publishing Concepts¶
LabID supports two primary publishing mechanisms:
- RO-Crate Export: Generate standardized workflow packages
- WorkflowHub Publishing: Direct publication to WorkflowHub platforms
Both mechanisms ensure workflows are packaged with complete metadata, provenance information, and all necessary files for reproducibility.
Publishing Prerequisites¶
Before publishing a workflow:
- Committed Version: Only committed (released) WorkflowVersions, i.e. those with a commit hash, can be published
- MAIN File: Must have exactly one file designated as MAIN type
- Computer Language: Workflow language must be specified
- Complete Metadata: Proper workflow metadata and descriptions
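The prerequisites above can be checked in one pass before an export is attempted. The following is a minimal sketch, not LabID's implementation: `VersionInfo` and its field names are hypothetical stand-ins for the relevant WorkflowVersion attributes, and treating an empty description as "incomplete metadata" is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VersionInfo:
    """Hypothetical stand-in for the WorkflowVersion fields relevant to publishing."""
    commit_hash: Optional[str] = None
    computer_language: Optional[str] = None
    file_types: List[str] = field(default_factory=list)  # e.g. ["MAIN", "README"]
    description: str = ""


def publishing_problems(version: VersionInfo) -> List[str]:
    """Return a list of unmet prerequisites; an empty list means ready to publish."""
    problems = []
    if not version.commit_hash:
        problems.append("version is not committed")
    main_count = version.file_types.count("MAIN")
    if main_count != 1:
        problems.append(f"expected exactly one MAIN file, found {main_count}")
    if not version.computer_language:
        problems.append("computer language is not set")
    if not version.description.strip():
        problems.append("description is empty")
    return problems
```

Collecting every problem, rather than stopping at the first, lets a user fix all of them before retrying.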
RO-Crate Generation¶
What is RO-Crate?¶
Research Object Crate (RO-Crate) is a community standard for packaging research data with metadata using JSON-LD and schema.org vocabulary. LabID generates RO-Crates that comply with the Workflow RO-Crate profile.
Export Types¶
LabID supports two RO-Crate export types:
Minimal Export (Default)¶
- WorkflowFiles Only: Includes only files registered as WorkflowFiles
- Curated Content: Only user-selected and annotated files
- Smaller Size: Optimized for sharing essential workflow components
- Use Case: Publishing core workflow logic and documentation
Full Export¶
- Complete Repository: Includes all files in the Git repository
- Comprehensive: Not limited to files registered as WorkflowFiles
- Larger Size: May include development artifacts and auxiliary files
- Use Case: Complete workflow preservation and archival
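The difference between the two export types comes down to which files are selected. The sketch below illustrates that selection under stated assumptions: `select_export_files` is a hypothetical helper, not a LabID function, and skipping the `.git` directory in a full export is an assumption.

```python
import os
from typing import Iterable, List


def select_export_files(repo_dir: str, registered: Iterable[str],
                        export_type: str = "minimal") -> List[str]:
    """Return sorted repository-relative paths to include in the crate."""
    if export_type not in ("minimal", "full"):
        raise ValueError(f"unknown export type: {export_type!r}")
    # Minimal export: only the registered WorkflowFile paths.
    selected = {os.path.normpath(p) for p in registered}
    if export_type == "full":
        for root, dirs, files in os.walk(repo_dir):
            # Assumption: Git internals are never part of the export.
            dirs[:] = [d for d in dirs if d != ".git"]
            for name in files:
                selected.add(os.path.relpath(os.path.join(root, name), repo_dir))
    return sorted(selected)
```

Using a set means a file that is both registered and present in the repository is listed only once.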
RO-Crate Structure¶
Generated RO-Crates include:
workflow-rocrate.zip
├── ro-crate-metadata.json # JSON-LD metadata
├── ro-crate-preview.html # Human-readable preview
├── main_workflow_file # Primary workflow (MAIN type)
├── config_files/ # Configuration files
├── test_files/ # Test files and examples
├── README.md # Documentation
├── LICENSE # License information
└── additional_files/ # Other workflow files
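A downloaded archive can be sanity-checked with the standard library before it is shared. This sketch only checks for `ro-crate-metadata.json`, which the RO-Crate specification requires at the crate root; `missing_crate_entries` is a hypothetical helper name.

```python
import zipfile
from typing import List

# ro-crate-metadata.json must sit at the crate root; other entries are optional.
REQUIRED_ENTRIES = ("ro-crate-metadata.json",)


def missing_crate_entries(zip_path: str) -> List[str]:
    """List required top-level entries that are absent from the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        names = set(archive.namelist())
    return [entry for entry in REQUIRED_ENTRIES if entry not in names]
```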
Metadata Included¶
Each RO-Crate contains comprehensive metadata:
{
  "@context": "https://w3id.org/workflowhub/workflow-ro-crate/1.0/context",
  "@graph": [
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "Workflow Name",
      "description": "Workflow description",
      "license": "MIT",
      "creator": {
        "@id": "#creator",
        "@type": "Person",
        "name": "Creator Name",
        "affiliation": "Institution"
      },
      "mainEntity": {
        "@id": "main_workflow.nf",
        "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
        "programmingLanguage": "Nextflow"
      }
    }
  ]
}
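Consumers of a crate usually want the main workflow entity first. The following sketch reads the metadata and resolves `mainEntity` from the root dataset (`"@id": "./"`). It accepts both an inline object, as in the example above, and a bare `{"@id": ...}` reference to another graph entry. The function names are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def find_main_workflow(metadata: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the entity referenced by the root dataset's mainEntity, if any."""
    graph = metadata.get("@graph", [])
    by_id = {e["@id"]: e for e in graph if "@id" in e}
    root = by_id.get("./")
    if root is None:
        return None
    main = root.get("mainEntity")
    if not isinstance(main, dict):
        return None
    # Merge the referenced graph entry (if present) with any inline fields.
    return {**by_id.get(main.get("@id"), {}), **main}


def load_main_workflow(path: str) -> Optional[Dict[str, Any]]:
    """Read ro-crate-metadata.json from disk and return its main workflow entity."""
    with open(path, encoding="utf-8") as handle:
        return find_main_workflow(json.load(handle))
```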
API Endpoints¶
Export RO-Crate¶
Parameters:
- type: Export type (minimal or full)
Response: Binary RO-Crate ZIP file
Requirements:
- WorkflowVersion must be committed (commit_hash present)
- MAIN file must be designated
- Computer language must be set
Publish to WorkflowHub¶
PUT /api/v2/workflows/workflowversions/{id}/publish_to_workflowhub/
Content-Type: application/json
{
  "project": "project_id",
  "type": "full"
}
Parameters:
- project: WorkflowHub project ID (required)
- type: Export type (minimal or full)
Process:
- Validates user permissions for the specified project
- Generates RO-Crate
- Uploads to WorkflowHub
- Creates PublishedWorkflow record
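A client call to this endpoint might be prepared as follows. This is a sketch using only the standard library. The base URL and the `Token` authorization scheme are assumptions, since the source does not state how LabID API requests are authenticated, and `build_publish_request` is a hypothetical helper. The request is built but not sent.

```python
import json
import urllib.request


def build_publish_request(base_url: str, version_id: int, project: str,
                          export_type: str = "minimal",
                          token: str = "") -> urllib.request.Request:
    """Prepare (but do not send) the publish request described above."""
    if export_type not in ("minimal", "full"):
        raise ValueError("type must be 'minimal' or 'full'")
    url = (f"{base_url.rstrip('/')}/api/v2/workflows/workflowversions/"
           f"{version_id}/publish_to_workflowhub/")
    body = json.dumps({"project": project, "type": export_type}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: token-style Authorization header; check your LabID deployment.
        headers["Authorization"] = f"Token {token}"
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")
```

The prepared request can then be sent with `urllib.request.urlopen(request)`.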
RO-Crate Generation Process¶
1. Validation Phase¶
def generate_research_object_crate(self, export_type="minimal"):
    # Validate prerequisites
    if not self.computer_language:
        raise ValidationError("Computer language not set")
    if not self.get_main_file():
        raise ValidationError("No main file found")
2. Repository Checkout¶
# Create temporary checkout at specific commit
tmpdir = self.workflow.repository.create_tmp_checkout(
    commit_hexsha=self.commit_hash
)
3. Language Detection¶
def _get_language_for_rocrate(self):
    """Map LabID workflow language to RO-Crate language object"""
    if LANG_MAP.get(self.computer_language.name):
        return LANG_MAP[self.computer_language.name]
    # Create custom language definition for unsupported languages
    # (lang_id, term, and url are not defined in this excerpt)
    return {
        "identifier": f"#{lang_id}",
        "properties": {
            "name": term.name,
            "identifier": {"@id": url},
            "url": {"@id": url}
        }
    }
4. Crate Construction¶
def _build_rocrate(self, tmpdir, main_file, lang, license_text, export_type):
    crate = ROCrate()
    crate.name = self.label
    crate.description = self.description
    crate.license = license_text
    # Add creator information
    person = crate.add(Person(
        crate,
        self.owner.userprofile.orcid,
        properties={
            "name": self.owner.full_name,
            "affiliation": self.owner.get_affiliation()
        }
    ))
    # Add main workflow
    # (abs_file_path and rel_path are not defined in this excerpt)
    crate.add_workflow(
        source=abs_file_path,
        dest_path=main_file.name,
        cls=ComputationalWorkflow,
        lang=lang,
        properties=workflow_properties,
        main=True
    )
    # Add workflow files
    for wf_file in self.workflowfiles.all():
        crate.add_file(abs_file_path, dest_path=wf_file.file_path)
    # For full export, add all repository files
    if export_type == "full":
        for file_path in _get_all_files_recursively(tmpdir):
            crate.add_file(file_path, dest_path=rel_path)
5. Validation¶
def _validate_rocrate(self, rocrate_path):
    """Validate against RO-Crate standards"""
    rov_settings = rov_services.ValidationSettings(
        rocrate_uri=rocrate_path,
        profile_identifier="ro-crate-1.1",
        requirement_severity=rov_models.Severity.REQUIRED
    )
    rov_result = rov_services.validate(rov_settings)
    return not rov_result.has_issues()
WorkflowHub Integration¶
Authentication¶
WorkflowHub publishing requires user authentication: users must configure their WorkflowHub API token in their LabID profile.
Project Validation¶
wfh_project_details = fetch_workflowhub_project(
    wfh_project, user_token, settings.DEFAULT_WORKFLOWHUB
)
The system validates that the user has write access to the specified WorkflowHub project.
Publication Process¶
- Generate RO-Crate: Create standardized workflow package
- Upload to WorkflowHub: POST RO-Crate to WorkflowHub API
- Create Record: Store PublishedWorkflow record in LabID
- Link Provenance: Maintain connection between LabID and WorkflowHub
PublishedWorkflow Model¶
class PublishedWorkflow(models.Model):
    name = models.CharField(max_length=200)
    url = models.URLField()  # WorkflowHub URL
    workflowrepository = models.ForeignKey(WorkflowRepository)
    workflowversion = models.ForeignKey(WorkflowVersion)
    created_by = models.ForeignKey(User)
This model tracks which WorkflowVersions have been published where.
Supported Workflow Languages¶
Built-in Language Support¶
LabID includes built-in support for major workflow languages:
LANG_MAP = {
    "CWL": cwl,
    "Galaxy": galaxy,
    "Nextflow": nextflow,
    "Snakemake": snakemake,
    "KNIME": knime,
    # ... additional languages
}
Custom Language Support¶
For unsupported languages, LabID creates custom language definitions:
# Custom language with proper metadata
lang_obj = ComputerLanguage(
    crate,
    identifier=lang["identifier"],
    properties=lang["properties"]
)
File Type Mapping¶
RO-Crate File Roles¶
WorkflowFiles are mapped to appropriate RO-Crate roles:
- MAIN → Primary workflow entity (mainEntity)
- README → Documentation (about: {"@id": "./"})
- LICENSE → License information
- CONFIG/PARAMETER → Configuration entities
- TEST → Test entities
- OTHER → General files
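The mapping above can be pictured as a small function that turns a file path and its type into a JSON-LD entity. This sketch covers only the two roles whose properties are spelled out in this document (MAIN and README). All other types fall back to a plain `File` entity, which is an assumption, and `file_entity` is a hypothetical helper.

```python
from typing import Any, Dict


def file_entity(path: str, data_type: str) -> Dict[str, Any]:
    """Build a JSON-LD entity for a workflow file based on its LabID type."""
    entity: Dict[str, Any] = {"@id": path, "@type": "File"}
    if data_type == "MAIN":
        # Types used for the main workflow in the metadata example.
        entity["@type"] = ["File", "SoftwareSourceCode", "ComputationalWorkflow"]
    elif data_type == "README":
        # README documents the crate as a whole.
        entity["about"] = {"@id": "./"}
    return entity
```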
Special File Handling¶
# README files get special properties
if wf_file.data_type == README:
    properties = {"about": {"@id": "./"}}
# Main workflow gets comprehensive metadata
workflow_properties = {
    "name": self.label,
    "description": self.description,
    "creator": person,
    "dateCreated": current_time_as_string(),
    "programmingLanguage": self.workflow.workflow_manager,
    "url": convert_git_to_valid_url(self.workflow.remote_repository)
}
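The source does not show what `convert_git_to_valid_url` does. One plausible behavior is turning SSH-style Git remotes into browsable HTTPS URLs, and the sketch below illustrates that idea only. It is named `git_remote_to_https` to make clear that it is a hypothetical helper, not LabID's function.

```python
import re


def git_remote_to_https(remote: str) -> str:
    """Convert scp-style or ssh:// Git remotes to an https:// URL; pass others through."""
    remote = remote.strip()
    # scp-style remote: [user@]host:path (but not scheme://...)
    scp_style = re.match(r"^(?:[\w.-]+@)?([\w.-]+):(?!//)(.+)$", remote)
    if remote.startswith("ssh://"):
        m = re.match(r"^ssh://(?:[\w.-]+@)?([\w.-]+)(?::\d+)?/(.+)$", remote)
        if m:
            remote = f"https://{m.group(1)}/{m.group(2)}"
    elif scp_style:
        remote = f"https://{scp_style.group(1)}/{scp_style.group(2)}"
    # Drop the conventional ".git" suffix so the URL points at the project page.
    if remote.endswith(".git"):
        remote = remote[:-4]
    return remote
```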
License Detection and Handling¶
Automatic License Detection¶
LabID automatically detects licenses using ScanCode:
from scancode.cli import run_scan

license_file = self.get_license()
if license_file:
    success, result = run_scan(license_file, license=True)
    if success:
        self.license = result["license_detections"][0]["license_expression_spdx"]
License Validation¶
def validate_license(license_value):
    """Validate and normalize license information"""
    # Maps various license formats to SPDX identifiers
    # Handles common license variations and aliases
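Since the body of `validate_license` is not shown, here is one way alias normalization could work. This is a sketch under stated assumptions: the alias table is a small illustrative sample, `normalize_license` is a hypothetical name, and returning `None` for unknown values is a design choice of this example rather than documented LabID behavior.

```python
from typing import Optional

# Small illustrative alias table (hypothetical); keys are lower-cased input spellings.
SPDX_ALIASES = {
    "mit": "MIT",
    "mit license": "MIT",
    "apache 2.0": "Apache-2.0",
    "apache-2.0": "Apache-2.0",
    "apache license 2.0": "Apache-2.0",
    "bsd-3-clause": "BSD-3-Clause",
    "gpl-3.0-only": "GPL-3.0-only",
}


def normalize_license(value: Optional[str]) -> Optional[str]:
    """Return the SPDX identifier for a known alias, or None if unrecognized."""
    if not value:
        return None
    # Lower-case and collapse runs of whitespace before the lookup.
    key = " ".join(value.strip().lower().split())
    return SPDX_ALIASES.get(key)
```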
Error Handling and Troubleshooting¶
Common Publishing Errors¶
| Error Message | Cause | Solution |
|---|---|---|
| "Computer language not set" | WorkflowVersion missing computer_language | Set the workflow language in the version metadata |
| "No main file found" | No WorkflowFile designated as MAIN type | Designate exactly one file as MAIN type |
| "This version has not been released yet" | Attempting to publish uncommitted version | Commit the WorkflowVersion first |
| "Already published" | WorkflowVersion already published to target repository | Remove existing publication or publish to different repository |
Validation Failures¶
RO-Crate validation may fail due to:
- Missing required metadata
- Invalid file references
- Non-compliant JSON-LD structure
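Invalid file references are the easiest of these failures to detect locally. The sketch below walks the metadata graph of an unpacked crate and reports `File` entities whose `@id` does not exist on disk. It is a lightweight pre-check under the assumption that local files use relative paths as `@id`. It does not replace the validator, and `missing_file_references` is a hypothetical helper.

```python
import os
from typing import Any, Dict, List


def missing_file_references(crate_dir: str, metadata: Dict[str, Any]) -> List[str]:
    """Return @id values of File entities that do not exist under crate_dir."""
    missing = []
    for entity in metadata.get("@graph", []):
        types = entity.get("@type", [])
        if isinstance(types, str):
            types = [types]
        ident = entity.get("@id", "")
        # Skip non-file entities and remote (URL) references.
        if "File" not in types or "://" in ident:
            continue
        if not os.path.isfile(os.path.join(crate_dir, ident)):
            missing.append(ident)
    return missing
```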
Debug Information¶
# Check publishing prerequisites
workflowversion.is_committed # Must be True
workflowversion.computer_language # Must be set
workflowversion.get_main_file() # Must exist
# Validate RO-Crate
rov_result = rov_services.validate(rov_settings)
if rov_result.has_issues():
    logger.critical("ROCrate validation issues: %s", rov_result)
Best Practices¶
Before Publishing¶
- Complete Metadata: Ensure all required fields are filled
- File Organization: Properly categorize all WorkflowFiles
- Documentation: Include README and LICENSE files
- Testing: Validate workflow functionality
- Version Release: Commit the WorkflowVersion
Export Strategy¶
- Minimal for Sharing: Use minimal export for community sharing
- Full for Archival: Use full export for complete preservation
- Metadata Quality: Ensure high-quality descriptions and metadata
- License Clarity: Use clear, standard license identifiers
WorkflowHub Publishing¶
- Project Selection: Choose appropriate WorkflowHub project
- Permissions: Verify write access to target project
- Avoid Duplicates: Check for existing publications
- Update Strategy: Plan for workflow updates and versioning
Integration with External Systems¶
WorkflowHub API¶
LabID integrates with WorkflowHub's REST API:
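import requests
from urllib.parse import urljoin

# Open the archive in a context manager so the handle is closed after the upload
with open(rocrate_zip, "rb") as crate_file:
    response = requests.post(
        urljoin(base_url, "workflows"),
        files={
            "ro_crate": (rocrate_zip, crate_file),
            "workflow[project_ids][]": (None, project_id)
        },
        headers={"authorization": f"Token {user_token}"}
    )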
Galaxy WorkflowInvocation Integration¶
LabID integrates with Galaxy for importing workflow invocations as WorkflowRuns using the bioblend library:
from urllib.parse import urljoin

import requests
from bioblend import galaxy

# User credentials stored in user profile
gi = galaxy.GalaxyInstance(url=galaxy_url, key=user_galaxy_settings[galaxy_url]["key"])
# Fetch invocation details
url = urljoin(gi.base_url, "api/invocations/")
params = {"job_id": job_id, "key": gi.key}
invocation_details = requests.get(url, params=params).json()
Authentication: Users must configure their Galaxy API tokens in their LabID profile settings. The system uses these stored credentials to authenticate with Galaxy instances.
Process: Galaxy WorkflowInvocations are imported as LabID WorkflowRuns. Optionally, inputs that already exist in LabID and carry matching Galaxy annotations are automatically associated with their datasets.
Standards Compliance¶
- RO-Crate 1.1: Generated crates are validated against the ro-crate-1.1 profile at REQUIRED severity
- Workflow RO-Crate: Implements workflow-specific profile
- Schema.org: Uses schema.org vocabulary for metadata
- JSON-LD: Structured data in JSON-LD format