Workflow System Administration¶
Administrative setup and configuration of LabID's workflow system.
System Requirements¶
Storage Requirements:
- Git repositories (each workflow gets its own repository)
- Temporary files for RO-Crate generation and import operations
System Dependencies:
- SSH (for accessing privately hosted Git repositories)
- Git 2.20+, GitPython 3.1.40+, rocrate 0.9.0+, rocrate-validator 0.5.1+ (automatically managed via conda)
Core Configuration¶
Essential environment variables:
# Git Repository Storage
export DJANGO_GIT_REPOSITORY_LOCATION="/app/labid/storage/repositories"
export DJANGO_GIT_REPOSITORY_LOCATION_TMP="/app/labid/storage/repositories/tmp/"
# Git Command Configuration
export DJANGO_GIT_SSH_CONFIG_FILE="/path/to/settings/git_ssh_config"
export DJANGO_GIT_SSH_COMMAND="/path/to/scripts/utils/bash/ssh.sh -F /path/to/settings/git_ssh_config"
export DJANGO_GIT_CONFIG_GLOBAL="/path/to/settings/.gitconfig"
For SSH key setup, see SSH Keys Configuration.
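A quick preflight check can catch a missing variable or an unwritable storage directory before the server starts. The following is a minimal sketch and not part of LabID: the function name check_labid_git_env is hypothetical, and it only tests that the five variables listed above are exported and that the two storage locations are writable directories.

```shell
# Hypothetical preflight check, not part of LabID.
check_labid_git_env() {
    local rc=0 name dir
    # Each variable from the configuration above must be exported and non-empty.
    for name in DJANGO_GIT_REPOSITORY_LOCATION DJANGO_GIT_REPOSITORY_LOCATION_TMP \
        DJANGO_GIT_SSH_CONFIG_FILE DJANGO_GIT_SSH_COMMAND DJANGO_GIT_CONFIG_GLOBAL; do
        if [ -z "$(printenv "$name")" ]; then
            echo "missing or not exported: $name" >&2
            rc=1
        fi
    done
    # Both storage locations must be existing, writable directories.
    for dir in "${DJANGO_GIT_REPOSITORY_LOCATION:-}" "${DJANGO_GIT_REPOSITORY_LOCATION_TMP:-}"; do
        if [ -n "$dir" ] && ! { [ -d "$dir" ] && [ -w "$dir" ]; }; then
            echo "not a writable directory: $dir" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage: check_labid_git_env && echo "Git configuration looks complete"
```

The function returns a non-zero status and prints one line per problem to standard error, so it can be placed in front of the server start command in a wrapper script.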
Galaxy Integration¶
Configure Galaxy integration for workflow import:
Only one Galaxy instance possible
Currently, only one Galaxy instance can be used: the primary instance configured by the GALAXY_URL setting (DJANGO_GALAXY_URL in the environment).
# Primary Galaxy instance
export DJANGO_GALAXY_URL="http://localhost:9090"
# Galaxy Endpoints Configuration
export DJANGO_GALAXY_ENDPOINTS='{
"http://localhost:9090": {
"url": "http://localhost:9090",
"label": "Galaxy Local",
"is_default": true,
"data_library_sync": true,
"import_workflows": true
},
"https://usegalaxy.eu": {
"url": "https://usegalaxy.eu",
"label": "Galaxy Europe",
"is_default": false,
"data_library_sync": false,
"import_workflows": true
}
}'
Endpoint Properties:
- is_default: Whether this is the default Galaxy instance (only one should be true)
- data_library_sync: Enable data library synchronization with this instance
- import_workflows: Allow workflow import from this instance
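Because only one endpoint should have is_default set to true, it can be worth checking the value of DJANGO_GALAXY_ENDPOINTS before deploying it. The sketch below is a rough text-based count rather than a real JSON parse; the function name count_default_galaxy_endpoints is hypothetical and not part of LabID.

```shell
# Hypothetical helper, not part of LabID.
# Prints how many endpoints in DJANGO_GALAXY_ENDPOINTS set "is_default": true.
# This is a plain-text match and does not validate the JSON itself.
count_default_galaxy_endpoints() {
    printf '%s\n' "${DJANGO_GALAXY_ENDPOINTS:-}" \
        | grep -o '"is_default"[[:space:]]*:[[:space:]]*true' \
        | wc -l \
        | tr -d '[:space:]'
}

# Usage: warn unless exactly one endpoint is marked as the default.
# [ "$(count_default_galaxy_endpoints)" = "1" ] || echo "expected exactly one default Galaxy endpoint" >&2
```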
For technical implementation details, see the developer documentation.
WorkflowHub Integration¶
Configure WorkflowHub integration:
Only one WorkflowHub instance possible
Currently only one WorkflowHub instance can be used for publishing workflows.
# Primary WorkflowHub instance
export DJANGO_DEFAULT_WORKFLOWHUB="https://workflowhub.eu"
# WorkflowHub repositories configuration
export DJANGO_DEFAULT_PUBLIC_WORKFLOWREPOSITORIES='{
"https://workflowhub.eu": {
"label": "WorkflowHub",
"url": "https://workflowhub.eu",
"api_url": "https://workflowhub.eu",
"trs": {
"info": "https://workflowhub.eu/ga4gh/trs/v2/service-info",
"search": "https://workflowhub.eu/ga4gh/trs/v2/tools"
}
}
}'
User Credentials:
User credentials are stored encrypted in UserProfile.attributes. The encryption key is created on first server start.
Preserve Encryption Key
This encryption key must be preserved: if it is removed, the stored API keys become unusable.
For technical implementation details, see the developer documentation.
RO-Crate Generation¶
LabID generates Research Object Crates (RO-Crates) synchronously when requested through the API.
Export Types:
- Minimal: Includes only WorkflowFiles (main workflow file, README, LICENSE, etc.)
- Full: Includes all files from the Git repository
API Usage:
Storage:
Temporary RO-Crates are stored at {STORAGE_ROOT}/rocrates/tmp
Process:
- Validates workflow version (computer language and main file required)
- Creates temporary repository checkout
- Processes files and creates metadata
- Validates RO-Crate against standard profile
- Returns ZIP file for download
Troubleshooting¶
Common Issues¶
SSH Key Issues: See SSH Keys Configuration for connectivity testing and troubleshooting.
Storage Space Issues:
# Check repository sizes
du -sh /app/labid/storage/repositories/workflows/*/*/*/
# Clean up temporary files
find /tmp -name "labid-*" -mtime +1 -delete
find /app/labid/storage/rocrates/tmp -name "*.zip" -mtime +1 -delete
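Since -delete is irreversible, it can be safer to list the matching files first. The following is a minimal sketch that uses the same age-based selection as the commands above; the function name list_stale_zips is hypothetical.

```shell
# Hypothetical helper: print ZIP files under a directory that match
# find's "-mtime +N" test, without deleting anything.
list_stale_zips() {
    # $1 = directory to scan, $2 = N for -mtime +N (defaults to 1)
    find "$1" -type f -name '*.zip' -mtime +"${2:-1}" -print
}

# Usage: review the list, then rerun the find command with -delete.
# list_stale_zips /app/labid/storage/rocrates/tmp 1
```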
WorkflowHub Connection Issues:
# Test API connectivity
curl -I https://workflowhub.eu/ga4gh/trs/v2/service-info
# Check DNS resolution
nslookup workflowhub.eu
RO-Crate Generation Issues:
# Check temporary storage
ls -la /app/labid/storage/rocrates/tmp/
# Check for validation errors in logs
grep -i "rocrate.*error" /app/labid/logs/workflows.log
Repository Management¶
Repository Location Schema (Advanced):
The DJANGO_GIT_REPOSITORY_LOCATION_SCHEMA variable controls how repositories are organized on disk:
# Default schema: {category}/#U#U/#U#U/
export DJANGO_GIT_REPOSITORY_LOCATION_SCHEMA="{category}/#U#U/#U#U/"
# Alternative schemas (set only one):
# export DJANGO_GIT_REPOSITORY_LOCATION_SCHEMA="{category}/#U/#U/#U/#U/" # More directories
# export DJANGO_GIT_REPOSITORY_LOCATION_SCHEMA="{category}/#U#U#U#U/" # Fewer directories
Variables:
- {category}: workflow category (usually "workflows")
- #U: UUID segment (2 characters)
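To make the effect of a schema concrete, the sketch below expands a schema string into a relative path for one UUID. It rests on an assumption that this page does not spell out: that each #U is filled, left to right, with the next two characters of the UUID after hyphens are removed. The function name expand_repo_schema is hypothetical and is not LabID's implementation.

```shell
# Hypothetical illustration, not LabID's implementation.
# ASSUMPTION: each "#U" takes the next two characters of the UUID (hyphens removed).
expand_repo_schema() {
    # $1 = schema, $2 = category, $3 = UUID
    local out hex
    hex=$(printf '%s' "$3" | tr -d '-')
    # Fill in the {category} placeholder.
    out=$(printf '%s' "$1" | sed "s|{category}|$2|")
    # Replace the leftmost "#U" until none remain.
    while [ "${out#*'#U'}" != "$out" ]; do
        out="${out%%'#U'*}$(printf '%s' "$hex" | cut -c1-2)${out#*'#U'}"
        hex=$(printf '%s' "$hex" | cut -c3-)
    done
    printf '%s\n' "$out"
}

# Usage (under the assumption above):
# expand_repo_schema '{category}/#U#U/#U#U/' workflows 3f2a9c1e-7b4d-4e6a-9f00-123456789abc
# prints: workflows/3f2a/9c1e/
```

Under that assumption, a schema with more #U groups per path component gives fewer, wider directory levels, while one #U per component gives more, narrower levels.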
Repository Backup Script:
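#!/bin/bash
# Abort on the first error so the staging copy is never removed after a failed rsync or tar
set -euo pipefail
BACKUP_DIR="/backup/labid-repositories"
REPO_DIR="/app/labid/storage/repositories" # Must match DJANGO_GIT_REPOSITORY_LOCATION
DATE=$(date +%Y%m%d_%H%M%S)
# Copy the repositories to a staging directory, skipping Git's temporary object files
mkdir -p "$BACKUP_DIR/$DATE"
rsync -av --exclude='.git/objects/tmp_*' "$REPO_DIR/" "$BACKUP_DIR/$DATE/"
# Compress the staging copy, then remove it
tar -czf "$BACKUP_DIR/repositories_$DATE.tar.gz" -C "$BACKUP_DIR" "$DATE"
rm -rf "$BACKUP_DIR/$DATE"
# Delete archives older than 30 days
find "$BACKUP_DIR" -name "repositories_*.tar.gz" -mtime +30 -delete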
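A backup is only useful if the archive can be read back. The sketch below checks that an archive produced by the script above can be listed by tar from start to end; the function name verify_backup_archive is hypothetical and not part of LabID.

```shell
# Hypothetical helper: succeed only if the gzip-compressed tar archive is readable.
verify_backup_archive() {
    # $1 = path to a repositories_*.tar.gz archive
    if tar -tzf "$1" >/dev/null 2>&1; then
        echo "OK: $1"
    else
        echo "UNREADABLE: $1" >&2
        return 1
    fi
}

# Usage: check every archive in the backup directory.
# for archive in /backup/labid-repositories/repositories_*.tar.gz; do
#     verify_backup_archive "$archive"
# done
```

Listing the archive only confirms that the compressed stream and the tar headers are intact. It does not replace an occasional test restore.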
Repository Monitoring:
# Check for large repositories (>100MB)
find /app/labid/storage/repositories -type d -name ".git" -exec du -sm {} \; | awk '$1 > 100 {print $2}' | sed 's|/\.git$||'
# Check repository integrity
find /app/labid/storage/repositories -name ".git" -type d | while IFS= read -r repo; do
    # git fsck exits non-zero when it finds problems; its output is discarded here
    if ! git -C "$(dirname "$repo")" fsck --no-progress >/dev/null 2>&1; then
        echo "Repository integrity issue: $repo"
    fi
done
Log Analysis¶
# Check for Git errors
grep -i "git.*error" /app/labid/logs/workflows.log
# Check for SSH issues
grep -i "ssh.*failed" /app/labid/logs/workflows.log
# Check for storage issues
grep -i "no space\|disk full" /app/labid/logs/workflows.log
# Monitor workflow operations
tail -f /app/labid/logs/workflows.log | grep -E "(commit|export|import)"
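For a quick overview, the same patterns can be counted per category. The following is a minimal sketch that reuses the patterns from the commands above; the function name summarize_workflow_log is hypothetical and not part of LabID.

```shell
# Hypothetical helper: count matching lines per problem category in a log file.
summarize_workflow_log() {
    # $1 = path to the log file
    # grep -c prints the number of matching lines (0 when nothing matches).
    printf 'git errors: %s\n' "$(grep -ci 'git.*error' "$1")"
    printf 'ssh failures: %s\n' "$(grep -ci 'ssh.*failed' "$1")"
    printf 'storage errors: %s\n' "$(grep -ciE 'no space|disk full' "$1")"
}

# Usage:
# summarize_workflow_log /app/labid/logs/workflows.log
```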
Management Commands¶
add_default_workflowrepositories:
Sets up the default WorkflowHub repositories configured in your environment settings.