
Workflow System Administration

Administrative setup and configuration of LabID's workflow system.

System Requirements

Storage Requirements:

  • Git repositories (each workflow gets its own repository)
  • Temporary files for RO-Crate generation and import operations

System Dependencies:

  • SSH (for accessing privately hosted Git repositories)
  • Git 2.20+, plus the Python packages GitPython 3.1.40+, rocrate 0.9.0+, and rocrate-validator 0.5.1+ (all managed automatically via conda)

Core Configuration

Essential environment variables:

# Git Repository Storage
export DJANGO_GIT_REPOSITORY_LOCATION="/app/labid/storage/repositories"
export DJANGO_GIT_REPOSITORY_LOCATION_TMP="/app/labid/storage/repositories/tmp/"

# Git Command Configuration
export DJANGO_GIT_SSH_CONFIG_FILE="/path/to/settings/git_ssh_config"
export DJANGO_GIT_SSH_COMMAND="/path/to/scripts/utils/bash/ssh.sh -F /path/to/settings/git_ssh_config"
export DJANGO_GIT_CONFIG_GLOBAL="/path/to/settings/.gitconfig"
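Before starting LabID, it can help to confirm that these variables are set and point at usable locations. The function below is a minimal sketch: `check_labid_git_env` is a hypothetical helper, not part of LabID. It treats a missing directory as an error, which may be stricter than LabID itself requires, and it assumes that `DJANGO_GIT_SSH_COMMAND` starts with an absolute script path, as in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for LabID's Git settings (not part of LabID itself).
check_labid_git_env() {
  local status=0 var value
  # Directories that must exist and be writable
  for var in DJANGO_GIT_REPOSITORY_LOCATION DJANGO_GIT_REPOSITORY_LOCATION_TMP; do
    value="${!var:-}"   # indirect expansion: the value of the variable named in $var
    if [ -z "$value" ]; then
      echo "MISSING: $var is not set" >&2; status=1
    elif [ ! -d "$value" ] || [ ! -w "$value" ]; then
      echo "BAD: $var=$value is not a writable directory" >&2; status=1
    fi
  done
  # Configuration files that must already exist
  for var in DJANGO_GIT_SSH_CONFIG_FILE DJANGO_GIT_CONFIG_GLOBAL; do
    value="${!var:-}"
    if [ -z "$value" ]; then
      echo "MISSING: $var is not set" >&2; status=1
    elif [ ! -f "$value" ]; then
      echo "BAD: $var=$value does not exist" >&2; status=1
    fi
  done
  # Assumption: the first word of DJANGO_GIT_SSH_COMMAND is an absolute script path
  value="${DJANGO_GIT_SSH_COMMAND:-}"
  if [ -n "$value" ] && [ ! -x "${value%% *}" ]; then
    echo "BAD: ${value%% *} (from DJANGO_GIT_SSH_COMMAND) is not executable" >&2; status=1
  fi
  return "$status"
}
```

Running `check_labid_git_env && echo "Git settings look OK"` in the same environment LabID uses prints one line per problem on stderr and returns non-zero if anything is wrong.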

For SSH key setup, see SSH Keys Configuration.

Galaxy Integration

Configure Galaxy integration for workflow import:

Only one Galaxy instance possible

Currently only one Galaxy instance can be used: the primary instance configured by the GALAXY_URL setting (environment variable DJANGO_GALAXY_URL).

# Primary Galaxy instance
export DJANGO_GALAXY_URL="http://localhost:9090"

# Galaxy Endpoints Configuration
export DJANGO_GALAXY_ENDPOINTS='{
  "http://localhost:9090": {
    "url": "http://localhost:9090",
    "label": "Galaxy Local",
    "is_default": true,
    "data_library_sync": true,
    "import_workflows": true
  },
  "https://usegalaxy.eu": {
    "url": "https://usegalaxy.eu",
    "label": "Galaxy Europe",
    "is_default": false,
    "data_library_sync": false,
    "import_workflows": true
  }
}'

Endpoint Properties:

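  • url: Base URL of the Galaxy instance (the same value as the entry's key in the example)
  • label: Display name for the instance
  • is_default: Whether this is the default Galaxy instance (only one should be true)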
  • data_library_sync: Enable data library synchronization with this instance
  • import_workflows: Allow workflow import from this instance
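Because DJANGO_GALAXY_ENDPOINTS is a JSON string inside an environment variable, a stray comma or a second `is_default: true` is easy to miss. The sketch below is a hypothetical check, not part of LabID: it parses the value with python3 (available wherever LabID's Django code runs), requires exactly one default entry, and, as an assumption based on the example above, warns if DJANGO_GALAXY_URL is not one of the endpoint keys.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for DJANGO_GALAXY_ENDPOINTS (not part of LabID).
# python3 is used only to parse the JSON value.
check_galaxy_endpoints() {
  python3 - <<'EOF'
import json
import os
import sys

try:
    endpoints = json.loads(os.environ["DJANGO_GALAXY_ENDPOINTS"])
except KeyError:
    sys.exit("DJANGO_GALAXY_ENDPOINTS is not set")
except json.JSONDecodeError as exc:
    sys.exit(f"DJANGO_GALAXY_ENDPOINTS is not valid JSON: {exc}")
if not isinstance(endpoints, dict):
    sys.exit("DJANGO_GALAXY_ENDPOINTS must be a JSON object keyed by URL")

# Exactly one entry should be marked as the default instance
defaults = [key for key, cfg in endpoints.items()
            if isinstance(cfg, dict) and cfg.get("is_default") is True]
if len(defaults) != 1:
    sys.exit(f"expected exactly one is_default endpoint, found {len(defaults)}")

# Assumption: the primary Galaxy URL should also appear as an endpoint key
primary = os.environ.get("DJANGO_GALAXY_URL")
if primary and primary not in endpoints:
    print(f"warning: DJANGO_GALAXY_URL={primary} has no matching endpoint entry",
          file=sys.stderr)
print(f"default endpoint: {defaults[0]}")
EOF
}
```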

For technical implementation details, see the developer documentation.

WorkflowHub Integration

Configure WorkflowHub integration:

Only one WorkflowHub instance possible

Currently only one WorkflowHub instance can be used for publishing workflows.

# Primary WorkflowHub instance
export DJANGO_DEFAULT_WORKFLOWHUB="https://workflowhub.eu"

# WorkflowHub repositories configuration
export DJANGO_DEFAULT_PUBLIC_WORKFLOWREPOSITORIES='{
  "https://workflowhub.eu": {
    "label": "WorkflowHub",
    "url": "https://workflowhub.eu",
    "api_url": "https://workflowhub.eu",
    "trs": {
      "info": "https://workflowhub.eu/ga4gh/trs/v2/service-info",
      "search": "https://workflowhub.eu/ga4gh/trs/v2/tools"
    }
  }
}'
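Each configured repository lists GA4GH TRS endpoints, so a quick way to test connectivity is to request every `trs.info` URL. The helpers below are a hypothetical sketch (not part of LabID): `list_trs_info_urls` extracts the URLs from DJANGO_DEFAULT_PUBLIC_WORKFLOWREPOSITORIES with python3, and `check_trs_endpoints` probes each one with curl.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of LabID): print the TRS service-info URL of each
# repository in DJANGO_DEFAULT_PUBLIC_WORKFLOWREPOSITORIES, one per line.
list_trs_info_urls() {
  python3 - <<'EOF'
import json
import os
import sys

repos = json.loads(os.environ.get("DJANGO_DEFAULT_PUBLIC_WORKFLOWREPOSITORIES", "{}"))
for key, cfg in repos.items():
    info = cfg.get("trs", {}).get("info")
    if info:
        print(info)
    else:
        print(f"warning: {key} has no trs.info URL", file=sys.stderr)
EOF
}

# Probe every configured TRS endpoint; -f makes curl fail on HTTP error codes
check_trs_endpoints() {
  local url
  list_trs_info_urls | while read -r url; do
    if curl -fsS -o /dev/null --max-time 10 "$url"; then
      echo "OK   $url"
    else
      echo "FAIL $url"
    fi
  done
}
```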

User Credentials:

User credentials such as API keys are stored encrypted in UserProfile.attributes, using a cryptographic key that is created on the first server start.

Preserve Encryption Key

Back up this encryption key and keep it in place: removing it renders stored API keys unusable, since they can no longer be decrypted.

For technical implementation details, see the developer documentation.

RO-Crate Generation

LabID generates Research Object Crates (RO-Crates) synchronously when requested through the API.

Export Types:

  • Minimal: Includes only WorkflowFiles (main workflow file, README, LICENSE, etc.)
  • Full: Includes all files from the Git repository

API Usage:

GET /api/workflows/versions/{id}/export/?type=full
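For scripted exports, a small helper can build the request URL. This is a hypothetical sketch, not a LabID tool: `labid_export_url` is an illustrative name, `full` comes from the example above, and accepting `minimal` as the value for a minimal export is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of LabID): build the export URL for a workflow version.
# "full" is documented; "minimal" as the other accepted value is an assumption.
labid_export_url() {
  local base_url="$1" version_id="$2" export_type="${3:-full}"
  case "$export_type" in
    full|minimal) ;;
    *) echo "unsupported export type: $export_type" >&2; return 1 ;;
  esac
  # Strip one trailing slash from the base URL to avoid "//api"
  printf '%s/api/workflows/versions/%s/export/?type=%s\n' \
    "${base_url%/}" "$version_id" "$export_type"
}
```

For example, `curl -fsS -o crate.zip "$(labid_export_url https://labid.example.org 42 full)"` would download the crate; `labid.example.org` and `42` are placeholders, and you will need to add whatever authentication your deployment requires.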

Storage:

Temporary RO-Crates are stored at {STORAGE_ROOT}/rocrates/tmp

Process:

  1. Validates workflow version (computer language and main file required)
  2. Creates temporary repository checkout
  3. Processes files and creates metadata
  4. Validates RO-Crate against standard profile
  5. Returns ZIP file for download
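After downloading the ZIP from step 5, a quick structural check can catch truncated or empty downloads. Per the RO-Crate specification, the crate root must contain `ro-crate-metadata.json`; the sketch below assumes the crate root is the root of the ZIP. `check_rocrate_zip` is a hypothetical helper, not part of LabID, and does not replace the profile validation LabID performs in step 4.

```shell
#!/usr/bin/env bash
# Hypothetical post-download check (not part of LabID): the ZIP should carry
# ro-crate-metadata.json at its root, and that file must parse as JSON.
check_rocrate_zip() {
  python3 - "$1" <<'EOF'
import json
import sys
import zipfile

path = sys.argv[1]
if not zipfile.is_zipfile(path):
    sys.exit(f"{path}: not a ZIP file")
with zipfile.ZipFile(path) as zf:
    if "ro-crate-metadata.json" not in zf.namelist():
        sys.exit(f"{path}: ro-crate-metadata.json missing at the crate root")
    metadata = json.loads(zf.read("ro-crate-metadata.json"))
# RO-Crate metadata is JSON-LD; its entities live in the @graph array
graph = metadata.get("@graph", [])
print(f"{path}: {len(graph)} entities in @graph")
EOF
}
```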

Troubleshooting

Common Issues

SSH Key Issues: See SSH Keys Configuration for connectivity testing and troubleshooting.

Storage Space Issues:

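# Check repository sizes, largest last (path layout assumes the default location schema)
du -sh "${DJANGO_GIT_REPOSITORY_LOCATION:-/app/labid/storage/repositories}"/workflows/*/*/*/ | sort -h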

# Clean up temporary files
find /tmp -name "labid-*" -mtime +1 -delete
find /app/labid/storage/rocrates/tmp -name "*.zip" -mtime +1 -delete

WorkflowHub Connection Issues:

# Test API connectivity
curl -I https://workflowhub.eu/ga4gh/trs/v2/service-info

# Check DNS resolution
nslookup workflowhub.eu

RO-Crate Generation Issues:

# Check temporary storage
ls -la /app/labid/storage/rocrates/tmp/

# Check for validation errors in logs
grep -i "rocrate.*error" /app/labid/logs/workflows.log

Repository Management

Repository Location Schema (Advanced): The DJANGO_GIT_REPOSITORY_LOCATION_SCHEMA variable controls how repositories are organized into subdirectories below DJANGO_GIT_REPOSITORY_LOCATION:

# Default schema: {category}/#U#U/#U#U/
export DJANGO_GIT_REPOSITORY_LOCATION_SCHEMA="{category}/#U#U/#U#U/"

# Alternative schemas (uncomment one instead of the default above):
# export DJANGO_GIT_REPOSITORY_LOCATION_SCHEMA="{category}/#U/#U/#U/#U/"  # More directory levels
# export DJANGO_GIT_REPOSITORY_LOCATION_SCHEMA="{category}/#U#U#U#U/"     # Fewer directory levels

Variables:

  • {category}: workflow category (usually "workflows")
  • #U: one UUID segment (2 characters)
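To see what a schema produces, the sketch below expands one for a given UUID. It is an illustration only, not LabID's code, and it assumes that hyphens are dropped from the UUID and that each `#U` consumes the next two characters in order; `render_location_schema` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical illustration (not LabID code): expand a location schema for one UUID.
# Assumes hyphens are removed and each #U takes the next two characters in order.
render_location_schema() {
  local schema="$1" category="$2" uuid="$3"
  local hex="${uuid//-/}" tok='#U' cat_tok='{category}'
  # Quoted patterns are matched literally ("#" would otherwise anchor the match)
  local out="${schema//"$cat_tok"/$category}"
  while [[ "$out" == *"$tok"* ]]; do
    if [ "${#hex}" -lt 2 ]; then
      echo "UUID too short for schema: $schema" >&2
      return 1
    fi
    out="${out/"$tok"/${hex:0:2}}"   # replace the first remaining #U
    hex="${hex:2}"
  done
  printf '%s\n' "$out"
}
```

With the UUID `3f2a9c1e-0000-4000-8000-000000000000` and category `workflows`, the default schema gives `workflows/3f2a/9c1e/`, the "more directory levels" variant gives `workflows/3f/2a/9c/1e/`, and the "fewer directory levels" variant gives `workflows/3f2a9c1e/`. If the segments come from the hexadecimal UUID string, each `#U#U` level can hold up to 16^4 = 65,536 distinct names and each `#U` level up to 16^2 = 256, so the schema trades directory depth against entries per directory.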

Repository Backup Script:

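#!/bin/bash
# Stop on the first failed command so an incomplete copy is never archived
set -euo pipefail

BACKUP_DIR="/backup/labid-repositories"
REPO_DIR="${DJANGO_GIT_REPOSITORY_LOCATION:-/app/labid/storage/repositories}"
DATE=$(date +%Y%m%d_%H%M%S)

# Copy repositories to a staging directory, skipping Git's temporary object files
mkdir -p "$BACKUP_DIR/$DATE"
rsync -av --exclude='.git/objects/tmp_*' "$REPO_DIR/" "$BACKUP_DIR/$DATE/"

# Compress the staging copy, then remove it
tar -czf "$BACKUP_DIR/repositories_$DATE.tar.gz" -C "$BACKUP_DIR" "$DATE"
rm -rf "$BACKUP_DIR/$DATE"

# Delete archives older than 30 days
find "$BACKUP_DIR" -name "repositories_*.tar.gz" -mtime +30 -delete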
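Each archive created by the backup script above holds a single top-level directory named after the backup timestamp. A restore can therefore extract into a staging directory and copy that directory's contents into place. The function below is a hypothetical sketch, not part of LabID; it is probably safest to stop LabID while restoring so that no Git operation writes to the target at the same time.

```shell
#!/usr/bin/env bash
# Hypothetical restore counterpart to the backup script (not part of LabID).
restore_labid_backup() {
  local archive="$1" target="$2" stage top
  stage="$(mktemp -d)" || return 1
  if ! tar -xzf "$archive" -C "$stage"; then
    rm -rf "$stage"; return 1
  fi
  # Pick the single timestamp directory created by the backup script
  top="$(find "$stage" -mindepth 1 -maxdepth 1 -type d | head -n 1)"
  if [ -z "$top" ]; then
    echo "no backup directory found in $archive" >&2
    rm -rf "$stage"; return 1
  fi
  mkdir -p "$target"
  # Copy the contents, including hidden .git directories, into the target
  cp -a "$top/." "$target/"
  rm -rf "$stage"
}
```

For example, `restore_labid_backup /backup/labid-repositories/repositories_20240101_000000.tar.gz "$DJANGO_GIT_REPOSITORY_LOCATION"` restores one archive (the file name is illustrative).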

Repository Monitoring:

# Check for large repositories (.git directory > 100 MB)
REPO_DIR="${DJANGO_GIT_REPOSITORY_LOCATION:-/app/labid/storage/repositories}"
find "$REPO_DIR" -type d -name ".git" -exec du -sm {} \; | awk '$1 > 100 {print $2}' | sed 's|/.git$||'

# Check repository integrity
find "$REPO_DIR" -name ".git" -type d | while read -r repo; do
    worktree="$(dirname "$repo")"
    # git -C runs fsck in each working tree without changing the shell's directory
    if ! git -C "$worktree" fsck --quiet 2>/dev/null; then
        echo "Repository integrity issue: $worktree"
    fi
done

Log Analysis

# Check for Git errors
grep -i "git.*error" /app/labid/logs/workflows.log

# Check for SSH issues
grep -i "ssh.*failed" /app/labid/logs/workflows.log

# Check for storage issues
grep -iE "no space|disk full" /app/labid/logs/workflows.log

# Monitor workflow operations
tail -f /app/labid/logs/workflows.log | grep -E "(commit|export|import)"
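For a quick overview, the individual searches can be combined into one count per category. This is a hypothetical sketch, not part of LabID: it reuses the patterns above, which are guesses at the log wording rather than a documented log format, and defaults to the log path used throughout this page.

```shell
#!/usr/bin/env bash
# Hypothetical summary (not part of LabID): count log lines matching the patterns above.
summarize_workflow_log() {
  local log="${1:-/app/labid/logs/workflows.log}"
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 1
  fi
  # grep -c prints 0 (and exits 1) when nothing matches, so the counts are always numeric
  printf 'git errors:      %s\n' "$(grep -ciE 'git.*error' "$log")"
  printf 'ssh failures:    %s\n' "$(grep -ciE 'ssh.*failed' "$log")"
  printf 'storage errors:  %s\n' "$(grep -ciE 'no space|disk full' "$log")"
  printf 'rocrate errors:  %s\n' "$(grep -ciE 'rocrate.*error' "$log")"
}
```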

Management Commands

add_default_workflowrepositories:

python manage.py add_default_workflowrepositories

Adds the default WorkflowHub repositories configured via DJANGO_DEFAULT_PUBLIC_WORKFLOWREPOSITORIES (see WorkflowHub Integration).