Workflow System Architecture¶
This document provides detailed technical information about LabID's workflow system architecture, focusing on the local Git repository implementation and data storage patterns.
Local Git Repository Implementation¶
Repository Structure¶
Each workflow in LabID gets its own dedicated Git repository managed by the system. The repository structure follows this pattern:
{STORAGE_ROOT}/repositories/
└── workflows/
    └── {UUID_SEGMENT_1}{UUID_SEGMENT_2}/
        └── {UUID_SEGMENT_3}{UUID_SEGMENT_4}/
            └── {WORKFLOW_UUID}/
                ├── .git/                  # Git metadata
                ├── workflow_file_1.py     # Workflow files
                ├── config.yaml            # Configuration files
                ├── README.md              # Documentation
                └── tests/                 # Test files
                    └── test_workflow.py
Configuration Parameters¶
The repository location is controlled by these environment variables:
- DJANGO_GIT_REPOSITORY_LOCATION: Base directory (default: {STORAGE_ROOT}/repositories)
- DJANGO_GIT_REPOSITORY_LOCATION_SCHEMA: Organization schema (default: {category}/#U#U/#U#U/)
- DJANGO_GIT_REPOSITORY_LOCATION_TMP: Temporary operations directory
The #U placeholders are replaced with UUID segments, creating a hierarchical structure that distributes repositories across directories for better filesystem performance.
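The placeholder expansion described above can be sketched as follows. This is only an illustration: the helper name `expand_location_schema` is hypothetical, and the choice of consuming one hexadecimal character of the UUID per `#U` is an assumption, since the segment length is not specified here.

```python
import uuid
from pathlib import PurePosixPath


def expand_location_schema(schema: str, category: str, item_uuid: uuid.UUID,
                           segment_length: int = 1) -> PurePosixPath:
    """Expand a schema such as '{category}/#U#U/#U#U/' into a relative path.

    Assumption: each '#U' consumes the next `segment_length` hex characters
    of the UUID. The segment size LabID really uses is not documented here.
    """
    hex_chars = item_uuid.hex  # 32 hex characters, no dashes
    expanded = schema.replace("{category}", category)
    position = 0
    while "#U" in expanded:
        segment = hex_chars[position:position + segment_length]
        expanded = expanded.replace("#U", segment, 1)  # one placeholder at a time
        position += segment_length
    # The repository itself lives in a directory named after the full UUID.
    return PurePosixPath(expanded) / str(item_uuid)


example = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(expand_location_schema("{category}/#U#U/#U#U/", "workflows", example))
# prints workflows/12/34/12345678-1234-5678-1234-567812345678
```

Spreading repositories over two directory levels in this way keeps any single directory from accumulating every repository on the system.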
Git Operations¶
Repository Initialization¶
When a new workflow is created:
def initialize_git_repository(self):
    if self.repository:
        raise ValueError("Repository already exists for this workflow")
    repository = Repository.objects.initialize(self)
    self.repository = repository
    self.save()
Commit Process¶
When a WorkflowVersion is released:
- All files are staged in Git
- A commit is created with metadata
- The commit hash is stored in WorkflowVersion.commit_hash
- The version becomes immutable
commit_hash = workflow.repository.commit(f"Release version {workflowversion.name}")
workflowversion.commit_hash = commit_hash
workflowversion.is_committed = True
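The release steps above can be sketched without Django as follows. `InMemoryRepository`, `VersionRecord`, and `release_version` are hypothetical stand-ins, and the hash is derived with `hashlib` purely so the example runs; in LabID the hash comes from the Git commit itself. The guard against releasing twice is one possible way to express immutability, not necessarily how LabID enforces it.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


class InMemoryRepository:
    """Hypothetical stand-in for the Repository model; it does not call Git."""

    def __init__(self):
        self.messages = []

    def commit(self, message: str) -> str:
        # Pretend to stage everything and commit; return a hash-like string.
        self.messages.append(message)
        payload = "\n".join(self.messages).encode("utf-8")
        return hashlib.sha1(payload).hexdigest()


@dataclass
class VersionRecord:
    """Hypothetical stand-in for WorkflowVersion."""
    name: str
    commit_hash: Optional[str] = None
    is_committed: bool = False


def release_version(repository: InMemoryRepository, version: VersionRecord) -> str:
    """Commit once, record the hash, and refuse a second release."""
    if version.is_committed:
        raise ValueError(f"Version {version.name!r} is already released and immutable")
    version.commit_hash = repository.commit(f"Release version {version.name}")
    version.is_committed = True
    return version.commit_hash


repo = InMemoryRepository()
v1 = VersionRecord(name="1.0")
release_version(repo, v1)
print(v1.is_committed, len(v1.commit_hash))  # prints True 40
```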
Workflow vs WorkflowVersion vs WorkflowFile¶
Workflow Model¶
The top-level Workflow model represents a computational workflow project:
class Workflow(Attachmentable, SoftDeletable, Annotatable, Noteable, Typeable, Ownable, NameableAbstractBaseModel):
    remote_repository = CustomCharField(max_length=2000, null=True, blank=True)
    imported_from_rocrate = BaseBooleanField(default=False, editable=False)
    repository = CustomForeignKey("data_management.Repository", null=True, on_delete=models.PROTECT)
    workflow_manager = ChoiceCharField(max_length=200, choices=WORKFLOW_MANAGERS)
Key properties:
- remote_repository: URL of source repository (for imported workflows)
- repository: Link to local Git repository
- workflow_manager: Type of workflow (SNAKEMAKE, NEXTFLOW, CWL, etc.)
WorkflowVersion Model¶
Each WorkflowVersion represents a specific state of the workflow:
class WorkflowVersion(SoftDeletable, Annotatable, Noteable, Ownable, NameableAbstractBaseModel):
    workflow = CustomForeignKey(Workflow, related_name="workflowversions")
    commit_hash = CustomCharField(max_length=200, null=True, blank=True)
    is_latest = BaseBooleanField(default=True)
    is_committed = BaseBooleanField(default=False)
    license = CustomForeignKey("vocabularies.License", null=True, blank=True)
Key properties:
- commit_hash: Git commit hash (set when version is released)
- is_committed: Whether version is immutable
- is_latest: Whether this is the current development version
WorkflowFile Model¶
Individual files within a workflow version:
class WorkflowFile(SoftDeletable, Annotatable, Noteable, Ownable, NameableAbstractBaseModel):
    workflowversion = CustomForeignKey(WorkflowVersion, related_name="workflowfiles")
    file_path = CustomCharField(max_length=500)
    data_type = ChoiceCharField(max_length=200, choices=WORKFLOW_DATATYPES)
Key properties:
- file_path: Relative path within the Git repository
- data_type: Semantic type (MAIN, CONFIG, TEST, OTHER)
File Type System¶
Workflow Data Types¶
The system defines semantic types for workflow files:
MAIN = "main"
CONFIG = "config"
TEST = "test"
OTHER = "other"
WORKFLOW_DATATYPES = [
    (MAIN, "Main"),
    (CONFIG, "Config"),
    (TEST, "Test"),
    (OTHER, "Other"),
]
Type Significance¶
- MAIN: Required for publishing to WorkflowHub; represents the primary workflow entry point
- CONFIG: Configuration files, parameters, and settings
- TEST: Test files, validation scripts, and examples
- OTHER: Documentation, README files, and auxiliary content
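Because a MAIN file is required for publishing to WorkflowHub, a pre-publication check might look like the sketch below. The function name `has_main_file` and the representation of files as `(file_path, data_type)` pairs are assumptions for illustration. Whether exactly one MAIN file is required is not stated here, so the sketch only checks for at least one.

```python
from typing import Iterable, Tuple

MAIN = "main"  # same value as the MAIN constant defined above


def has_main_file(files: Iterable[Tuple[str, str]]) -> bool:
    """Return True if at least one (file_path, data_type) pair is typed MAIN."""
    return any(data_type == MAIN for _path, data_type in files)


version_files = [
    ("Snakefile", "main"),
    ("config.yaml", "config"),
    ("tests/test_workflow.py", "test"),
]
print(has_main_file(version_files))  # prints True
```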
Type Detection¶
The system includes automatic workflow type detection based on file patterns:
def get_workflow_type_and_files(repository, commit_hash):
    """Detect workflow type and suggest file types based on repository content"""
    # Implementation analyzes file extensions and names to suggest:
    # - Workflow manager type (Snakemake, Nextflow, etc.)
    # - Appropriate file type classifications
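A minimal sketch of pattern-based detection is shown below. The rules are assumptions built on common naming conventions (a file named `Snakefile` or ending in `.smk` for Snakemake, `.nf` for Nextflow, `.cwl` for CWL); they are not LabID's actual rules, and both function names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional


def guess_workflow_manager(paths: Iterable[str]) -> Optional[str]:
    """Return a workflow manager name based on the first recognizable file."""
    for raw_path in paths:
        path = PurePosixPath(raw_path)
        if path.name == "Snakefile" or path.suffix == ".smk":
            return "SNAKEMAKE"
        if path.suffix == ".nf":
            return "NEXTFLOW"
        if path.suffix == ".cwl":
            return "CWL"
    return None  # nothing recognizable; the user has to choose


def suggest_data_type(raw_path: str) -> str:
    """Suggest one of 'main', 'config', 'test', 'other' for a file path."""
    path = PurePosixPath(raw_path)
    # Anything under a tests/ directory or named test_* counts as a test file.
    if "tests" in path.parts[:-1] or path.name.startswith("test_"):
        return "test"
    if path.name in ("Snakefile", "main.nf") or path.suffix == ".cwl":
        return "main"
    if path.suffix in (".yaml", ".yml", ".json") or path.name == "nextflow.config":
        return "config"
    return "other"


repository_files = ["Snakefile", "config.yaml", "tests/test_workflow.py", "README.md"]
print(guess_workflow_manager(repository_files))
print({path: suggest_data_type(path) for path in repository_files})
```

The results are suggestions only; a user should still be able to override them, since name-based heuristics misclassify unconventional layouts.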
Imported vs Manual Workflows¶
Manual Workflows¶
Created within LabID by uploading existing workflow files:
- Repository Creation: New Git repository initialized
- File Upload: Users upload existing workflow files from their local computer through the web interface
- File Storage: All files stored as WorkflowFile objects
- Version Control: Full Git history managed by LabID
Use Case: An alternative to creating a GitHub repository. Users can upload workflow files directly to LabID instead.
Properties:
- workflow.remote_repository = None
- repository.is_remote = False
- All repository files are tracked as WorkflowFiles
Imported Workflows¶
Imported from external Git repositories:
- Repository Clone: External repository cloned locally
- File Selection: Users choose which files to track
- Selective Storage: Only selected files become WorkflowFiles
- Sync Capability: Can update from remote source
Properties:
- workflow.remote_repository = "https://github.com/user/repo.git"
- repository.is_remote = True
- Only selected files are tracked as WorkflowFiles
- Other repository files remain in Git but aren't managed
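The difference between tracked and unmanaged files in an imported workflow can be illustrated with a small sketch. The function name `split_tracked` is hypothetical; it takes the full repository listing and the user's selection and separates the two groups.

```python
from typing import Iterable, List, Tuple


def split_tracked(repository_paths: Iterable[str],
                  selected_paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (tracked, unmanaged) paths for an imported workflow.

    tracked:   paths that would become WorkflowFile objects
    unmanaged: paths that stay in Git only
    """
    all_paths = list(repository_paths)
    selected = set(selected_paths)
    tracked = sorted(path for path in all_paths if path in selected)
    unmanaged = sorted(path for path in all_paths if path not in selected)
    return tracked, unmanaged


tracked, unmanaged = split_tracked(
    ["Snakefile", "config.yaml", "docs/notes.md", ".gitignore"],
    ["Snakefile", "config.yaml"],
)
print(tracked)    # prints ['Snakefile', 'config.yaml']
print(unmanaged)  # prints ['.gitignore', 'docs/notes.md']
```

For a manual workflow the selection equals the full listing, so the unmanaged list is empty.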
Import Process¶
When importing from a remote repository:
@action(detail=False, methods=["post"], url_path="new-from-remote")
def new_from_remote(self, request, version, **kwargs):
    # 1. Validate remote repository access
    # 2. Clone repository to temporary location
    # 3. Analyze files and detect workflow type
    # 4. Create Workflow and WorkflowVersion objects
    # 5. Create WorkflowFile objects for selected files
    # 6. Commit initial version to local repository
    ...
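Steps 2 and 3 could be expressed with the Git command line as in the sketch below. This is an assumption about one possible approach; LabID may use a Git library instead, and the function names are hypothetical. The command builders are separated from the function that runs them so they can be inspected without network access.

```python
import subprocess
import tempfile
from typing import List


def clone_command(remote_url: str, target_dir: str) -> List[str]:
    """Build the command for step 2: clone the remote into a local directory."""
    return ["git", "clone", "--quiet", remote_url, target_dir]


def list_files_command(revision: str = "HEAD") -> List[str]:
    """Build the command for step 3: list every file path at a revision."""
    return ["git", "ls-tree", "-r", "--name-only", revision]


def clone_and_list(remote_url: str) -> List[str]:
    """Clone into a temporary directory and return the file paths it contains.

    Requires the git executable and access to the remote. A failed clone
    raises CalledProcessError, which also covers step 1 in this sketch.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        subprocess.run(clone_command(remote_url, tmp_dir), check=True)
        listing = subprocess.run(list_files_command(), cwd=tmp_dir, check=True,
                                 capture_output=True, text=True)
        return listing.stdout.splitlines()


print(clone_command("https://github.com/user/repo.git", "/tmp/import"))
print(list_files_command())
```

The resulting path list is the input a type-detection step would analyze before the user selects which files to track.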
Repository Management¶
Storage Backend Integration¶
Workflows integrate with LabID's storage backend system:
class Repository(SoftDeletable, Ownable, NameableAbstractBaseModel):
    repo_path = CustomCharField(max_length=500)
    is_remote = BaseBooleanField(default=False)
    storage = FileSystemStorage()  # Configurable storage backend
File Operations¶
The Repository model provides Git operations:
def commit(self, message):
    """Create a Git commit with the given message"""

def list_content(self, commit_hash, files_only=True):
    """List repository contents at specific commit"""

def get_file_content(self, file_path, commit_hash):
    """Retrieve file content at specific commit"""
Cleanup and Deletion¶
When workflows are deleted:
def true_delete(self, *args, **kwargs):
    """On true_delete, try to remove the associated git repository"""
    repository = self.repository
    ret = self.delete(soft=False, *args, **kwargs)
    try:
        repository.delete()
    except Exception:
        logger.exception("Could not delete the associated repository.")
    return ret
Storage Planning¶
Administrators should consider:
- Repository growth over time
- Git garbage collection schedules
- Backup strategies for Git repositories
- Disk space monitoring
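For disk space monitoring, a small script along these lines could report how much space the repository tree occupies and how much of the volume is still free. The function names are hypothetical, and the path to scan would be whatever DJANGO_GIT_REPOSITORY_LOCATION points to on a given installation.

```python
import os
import shutil
import tempfile


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files below root (symlinks are skipped)."""
    total = 0
    for current_dir, _subdirs, filenames in os.walk(root):
        for filename in filenames:
            file_path = os.path.join(current_dir, filename)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def free_space_fraction(path: str) -> float:
    """Return the fraction of the volume holding path that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


# Demonstration on a throwaway directory instead of a real repository root.
with tempfile.TemporaryDirectory() as demo_root:
    with open(os.path.join(demo_root, "example.bin"), "wb") as handle:
        handle.write(b"0123456789")
    print(directory_size_bytes(demo_root))  # prints 10
```

Running such a check on a schedule and comparing successive results gives a rough picture of repository growth over time.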