Workflow File Types and Data Organization¶
LabID uses a semantic file type system to organize and classify files within workflows. This system enables intelligent workflow processing, proper RO-Crate packaging, and clear organization of workflow components.
File Type Categories¶
Workflow File Types¶
Based on the actual implementation in workflows/utils/constants.py, LabID supports these file types:
MAIN - Primary Workflow Definition¶
- Purpose: The primary workflow definition file that serves as the entry point
- Requirements:
  - Only one MAIN file allowed per WorkflowVersion
  - Required for publishing to WorkflowHub
  - Must represent the executable workflow entry point
- Examples: Snakefile, main.nf, workflow.cwl, workflow.wdl
- Validation: The system enforces the single-MAIN-file constraint
# From the codebase - only one MAIN file allowed
if self.data_type == MAIN:
    if self._state.adding and self.workflowversion.workflowfiles.filter(data_type=MAIN).exists():
        raise ValueError(_("There can only be one main file."))
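The same constraint can be illustrated outside the Django model. The following is a minimal sketch, not LabID's implementation: the class `WorkflowVersionSketch`, its `add_file` method, and the string values of `MAIN` and `OTHER` are assumptions made for illustration only.

```python
from dataclasses import dataclass, field

# Assumed string values; the real constants live in workflows/utils/constants.py.
MAIN = "main"
OTHER = "other"


@dataclass
class WorkflowVersionSketch:
    """Hypothetical in-memory stand-in for a WorkflowVersion and its files."""

    files: dict = field(default_factory=dict)  # maps path -> file type

    def add_file(self, path, data_type=OTHER):
        # Mirror the model check: reject a second MAIN file when adding.
        if data_type == MAIN and MAIN in self.files.values():
            raise ValueError("There can only be one main file.")
        self.files[path] = data_type


version = WorkflowVersionSketch()
version.add_file("Snakefile", MAIN)
version.add_file("scripts/plot.py")  # falls back to OTHER

try:
    version.add_file("main.nf", MAIN)  # second MAIN file is rejected
except ValueError as exc:
    rejection = str(exc)
```

After the rejected call, the version still holds exactly one MAIN file, which is the state the publishing step relies on.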
CONFIG - Configuration Files¶
- Purpose: Configuration files and settings
- Usage: Runtime configuration, environment settings
- Examples: config.yaml, nextflow.config
- Dataset Association: Can also be linked to WorkflowRuns as configuration datasets
PARAMETER - Parameter Files¶
- Purpose: Specific parameter definitions and input specifications
- Usage: Input parameters, variable definitions, parameter schemas
- Examples: parameters.yml, params.json, params.txt
- Dataset Association: Can be linked to WorkflowRuns as parameter datasets
README - Documentation¶
- Purpose: Primary documentation and usage instructions
- Auto-detection: Automatically detected during import
- Examples: README.md, README.txt, README.rst
- Standards: Supports various documentation formats
LICENSE - License Information¶
- Purpose: License files and legal information
- Auto-detection: Automatically detected during import
- Examples: LICENSE, LICENSE.txt, COPYING
- Compliance: Important for workflow sharing and publication
TEST - Test Files¶
- Purpose: Test files, validation scripts, and examples
- Usage: Unit tests, integration tests, example runs
- Examples: test_workflow.py, test_data/, examples/
- Organization: Often organized in tests/ or examples/ directories
DAG_PNG - Workflow Diagrams¶
- Purpose: Visual representations of workflow structure
- Usage: Workflow diagrams, dependency graphs
- Examples: dag.png, workflow_diagram.svg
- Generation: Often auto-generated by workflow engines
OTHER - Miscellaneous Files¶
- Purpose: Files that don't fit other categories
- Usage: Default type for unclassified files
- Examples: Scripts, utilities, auxiliary files
- Flexibility: Catch-all category for diverse file types
Workflow Run Specific File Types¶
INPUT - Input datasets¶
- Purpose: Workflow execution input files, excluding configuration & parameter files
- Usage: Input data on which the workflow operates
- Examples: sample.fastq, image.tiff
- Mandatory: At least one input dataset is required for a valid WorkflowRun
- Dataset Association: Can be linked to WorkflowRuns as input datasets
CONFIG - Configuration Files¶
- Purpose: Configuration files and settings
- Usage: Runtime configuration, environment settings
- Examples: config.yaml, nextflow.config
- Dataset Association: Can be linked to WorkflowRuns as config datasets
PARAMETER - Parameter Files¶
- Purpose: Specific parameter definitions and input specifications
- Usage: Input parameters, variable definitions, parameter schemas
- Examples: parameters.yml, params.json, params.txt
- Dataset Association: Can be linked to WorkflowRuns as parameter datasets
OUTPUT - Output Specifications¶
- Purpose: Workflow execution output files, excluding reports and logs
- Usage: Output data generated by the workflow execution
- Mandatory: At least one output dataset is required for a valid WorkflowRun
- Examples: sample.bam, count_table.csv, images.ome.zarr
- Dataset Association: Can be linked to WorkflowRuns as output datasets
LOG - Log Files¶
- Purpose: Execution logs and runtime information
- Usage: Workflow execution logs, debug information
- Examples: workflow.log, execution.log
- Dataset Association: Can be linked to WorkflowRuns as log datasets
REPORT - Report Files¶
- Purpose: Analysis reports and summaries
- Usage: Generated reports, analysis summaries
- Examples: report.html, summary.pdf
- Dataset Association: Can be linked to WorkflowRuns as report datasets
File Type Detection and Assignment¶
Automatic Detection¶
During workflow import, LabID automatically detects and suggests file types:
def get_workflow_type_and_files(repository, commit_hash):
    """Detect workflow type and suggest file types based on repository content"""
    # Automatically detects:
    # - README files (README.md, README.txt, etc.)
    # - LICENSE files (LICENSE, COPYING, etc.)
    # - Workflow-specific patterns (Snakefile, main.nf, etc.)
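The body of the detection function is not shown here. As a rough illustration of name-based suggestion for README and LICENSE files, the sketch below matches on the file stem. The helper `suggest_file_type`, the string values of the constants, and the decision to match files in subdirectories are all assumptions, not LabID behaviour.

```python
from pathlib import PurePosixPath

# Assumed string values for the type constants.
README, LICENSE, OTHER = "readme", "license", "other"

# Stems taken from the LICENSE examples in this document.
LICENSE_STEMS = {"license", "copying"}


def suggest_file_type(path):
    """Suggest a file type from the file name alone (hypothetical helper)."""
    stem = PurePosixPath(path).stem.lower()
    if stem == "readme":
        return README
    if stem in LICENSE_STEMS:
        return LICENSE
    return OTHER  # default type for unclassified files
```

A suggestion of OTHER is only a starting point, since types can still be reassigned manually.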
Workflow-Specific Detection¶
The system includes detection logic for different workflow managers:
- Snakemake: Detects Snakefile, *.smk files
- Nextflow: Detects main.nf, nextflow.config
- CWL: Detects *.cwl files
- WDL: Detects *.wdl files
- Galaxy: Detects *.ga files
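The patterns listed above can be expressed as a small lookup. This is a sketch under assumptions: the function name `detect_workflow_manager` is hypothetical, and the priority order (first match in dictionary order wins) is a choice made for the example, since the actual precedence is not described here.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns from the list above; dict order is the (assumed) priority.
MANAGER_PATTERNS = {
    "Snakemake": ("Snakefile", "*.smk"),
    "Nextflow": ("main.nf", "nextflow.config"),
    "CWL": ("*.cwl",),
    "WDL": ("*.wdl",),
    "Galaxy": ("*.ga",),
}


def detect_workflow_manager(paths):
    """Return the first manager whose patterns match any file name, else None."""
    names = [PurePosixPath(p).name for p in paths]
    for manager, patterns in MANAGER_PATTERNS.items():
        # fnmatchcase keeps matching case-sensitive on every platform.
        if any(fnmatchcase(name, pat) for name in names for pat in patterns):
            return manager
    return None
```

Matching on the file name rather than the full path means a Snakefile inside a workflow/ subdirectory is still recognised.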
Manual Assignment¶
Users can manually assign or change file types through the web interface:
- File types can be changed regardless of WorkflowVersion commit status
- Type changes are immediately reflected in the workflow organization
- Validation ensures MAIN file constraints are maintained
Dataset Association Types¶
When linking datasets to WorkflowRuns, these types are used:
DATASET_TO_WORKFLOWRUN_TYPES = (
    (INPUT, INPUT_LABEL),          # Input datasets
    (OUTPUT, OUTPUT_LABEL),        # Output datasets
    (CONFIG, CONFIG_LABEL),        # Configuration datasets
    (REPORT, REPORT_LABEL),        # Report datasets
    (LOG, LOG_LABEL),              # Log datasets
    (PARAMETER, PARAMETER_LABEL),  # Parameter datasets
)
Type Categories for Dataset Association¶
- INPUT_TYPES: (INPUT, CONFIG, PARAMETER) - Data flowing into workflows
- OUTPUT_TYPES: (OUTPUT, REPORT, LOG) - Data produced by workflows
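The two categories, together with the rule that a valid WorkflowRun needs at least one input and one output dataset, can be sketched as follows. The string values of the constants, the function `check_run_links`, and the wording of the returned messages are assumptions for illustration.

```python
# Assumed string values for the association types.
INPUT, OUTPUT, CONFIG = "input", "output", "config"
REPORT, LOG, PARAMETER = "report", "log", "parameter"

INPUT_TYPES = (INPUT, CONFIG, PARAMETER)  # data flowing into workflows
OUTPUT_TYPES = (OUTPUT, REPORT, LOG)      # data produced by workflows


def check_run_links(links):
    """Check (dataset_name, link_type) pairs; return a list of problems."""
    types = {link_type for _, link_type in links}
    problems = []
    # CONFIG and PARAMETER are input-side types but do not count as INPUT data.
    if INPUT not in types:
        problems.append("At least one input dataset is required.")
    if OUTPUT not in types:
        problems.append("At least one output dataset is required.")
    return problems


complete = [("sample.fastq", INPUT), ("config.yaml", CONFIG), ("sample.bam", OUTPUT)]
incomplete = [("config.yaml", CONFIG), ("workflow.log", LOG)]
```

Here `check_run_links(complete)` returns an empty list, while `incomplete` fails both checks because configuration and log datasets do not substitute for INPUT and OUTPUT data.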
Publishing and Standards Compliance¶
WorkflowHub Publishing Requirements¶
For publishing to WorkflowHub:
- MAIN file is mandatory - Must have exactly one MAIN file
- RO-Crate compliance - File types map to RO-Crate metadata
- Metadata preservation - File types become part of workflow metadata
RO-Crate Mapping¶
File types are mapped to RO-Crate standards:
- MAIN → Primary workflow entity in RO-Crate
- CONFIG/PARAMETER → Configuration entities
- TEST → Test entities
- README/LICENSE → Documentation entities
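To make the MAIN mapping concrete, the sketch below builds a minimal metadata document in the style of the Workflow RO-Crate profile, where the root dataset's mainEntity points at the workflow file. It is an illustration only: `build_crate_metadata` is a hypothetical function, the type strings are assumed, and how LabID actually describes CONFIG, TEST, and documentation entities is not covered, so all non-MAIN files are emitted as plain File entities.

```python
MAIN = "main"  # assumed string value of the MAIN constant


def build_crate_metadata(files):
    """Build a minimal RO-Crate style dict from a {path: file_type} mapping."""
    main_paths = [path for path, ftype in files.items() if ftype == MAIN]
    if len(main_paths) != 1:
        raise ValueError("Exactly one MAIN file is required.")
    main_path = main_paths[0]

    graph = [
        {   # metadata descriptor, as defined by RO-Crate 1.1
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # root dataset; mainEntity marks the primary workflow
            "@id": "./",
            "@type": "Dataset",
            "mainEntity": {"@id": main_path},
            "hasPart": [{"@id": path} for path in sorted(files)],
        },
    ]
    for path in sorted(files):
        if path == main_path:
            types = ["File", "SoftwareSourceCode", "ComputationalWorkflow"]
        else:
            types = "File"
        graph.append({"@id": path, "@type": types})
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


crate = build_crate_metadata({"Snakefile": "main", "README.md": "readme"})
```

The exactly-one-MAIN check comes first because the root dataset can name only a single mainEntity.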
Best Practices¶
File Organization¶
- Single MAIN File: Always designate exactly one primary workflow file
- Logical Grouping: Use appropriate types to group related files
- Documentation: Include README and LICENSE files for shared workflows
- Testing: Include TEST files for workflow validation
Type Assignment Strategy¶
- Start with Auto-detection: Let the system suggest initial types
- Review and Refine: Manually adjust types as needed
- Maintain Consistency: Use consistent typing across workflow versions
- Consider Publishing: Ensure MAIN file is properly designated for sharing
Version Management¶
- Type Flexibility: File types can be changed between versions
- Path Consistency: Same file paths can have different types across versions
- Validation: System validates type constraints during operations
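Because the same path may carry different types in different versions, it can help to list what changed before publishing. The following sketch compares two versions represented as plain `{path: type}` dictionaries; the function `diff_file_types` and the type strings are hypothetical.

```python
def diff_file_types(old, new):
    """Return {path: (old_type, new_type)} for shared paths whose type changed."""
    return {
        path: (old[path], new[path])
        for path in sorted(old.keys() & new.keys())
        if old[path] != new[path]
    }


v1 = {"Snakefile": "main", "config.yaml": "config", "rules/align.smk": "other"}
v2 = {"Snakefile": "main", "config.yaml": "parameter", "dag.png": "dag_png"}

changes = diff_file_types(v1, v2)
```

Only paths present in both versions are compared, so added or removed files (such as dag.png here) do not appear in the result.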
API and Integration¶
File Type Validation¶
# Valid file types from constants
WORKFLOWFILE_DATA_TYPES_CHOICES = [
    (MAIN, "Main"),
    (CONFIG, "Config"),
    (LOG, "Log"),
    (PARAMETER, "Parameter"),
    (INPUT, "Input"),
    (TEST, "Test"),
    (OUTPUT, "Output"),
    (OTHER, "Other"),
    (DAG_PNG, "DAG PNG"),
    (README, "README"),
    (LICENSE, "License"),
]
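A client can check a type value against such a choices list before sending a request. This is a sketch only: the stored string values (for example "main" or "dag_png") are assumed, since only the constant names and labels are given here, and `validate_data_type` is a hypothetical helper whose error text is modelled on the message quoted under Troubleshooting.

```python
# Assumed stored values paired with the labels from the choices list.
CHOICES = [
    ("main", "Main"), ("config", "Config"), ("log", "Log"),
    ("parameter", "Parameter"), ("input", "Input"), ("test", "Test"),
    ("output", "Output"), ("other", "Other"), ("dag_png", "DAG PNG"),
    ("readme", "README"), ("license", "License"),
]
VALID_TYPES = {value for value, _label in CHOICES}


def validate_data_type(value):
    """Return the value unchanged if it is a known type, else raise ValueError."""
    if value not in VALID_TYPES:
        raise ValueError(f"{value!r} is not a valid argument")
    return value


try:
    validate_data_type("INVALID_TYPE")
except ValueError as exc:
    message = str(exc)
```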
API Usage¶
File types are specified in API calls.
CLI Integration¶
When registering workflows via the CLI, file types can be specified.
Troubleshooting¶
Common Issues¶
- Multiple MAIN Files: System prevents multiple MAIN files per version
- Missing MAIN File: Publishing requires exactly one MAIN file
- Type Validation: Invalid types are rejected by the API
- Empty Files: Empty files are not allowed for upload
Error Messages¶
- "There can only be one main file." - Attempting to add multiple MAIN files
- "'INVALID_TYPE' is not a valid argument" - Using a non-existent file type
- "Empty files are not allowed." - Attempting to upload empty files
Resolution Strategies¶
- Review File Types: Ensure proper type assignment before publishing
- Check Constraints: Verify MAIN file requirements are met
- Validate Content: Ensure files have content before upload
- Use Auto-detection: Leverage automatic type detection for initial setup
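The checks above can be combined into a single pre-publish pass. The sketch below is a hypothetical helper, not part of LabID: `preflight` takes `(path, data_type, size_in_bytes)` tuples, assumes "main" as the stored value of MAIN, and reuses the error messages quoted above where they exist. The "Missing MAIN file." text is invented for the example.

```python
def preflight(files):
    """Check (path, data_type, size_in_bytes) tuples; return a list of problems."""
    problems = []

    # Publishing requires exactly one MAIN file.
    main_count = sum(1 for _path, data_type, _size in files if data_type == "main")
    if main_count == 0:
        problems.append("Missing MAIN file.")
    elif main_count > 1:
        problems.append("There can only be one main file.")

    # Empty files are rejected on upload, so flag them early.
    for path, _data_type, size in files:
        if size == 0:
            problems.append(f"{path}: Empty files are not allowed.")
    return problems


ready = [("Snakefile", "main", 2048), ("README.md", "readme", 512)]
broken = [("Snakefile", "main", 2048), ("main.nf", "main", 1024), ("notes.txt", "other", 0)]
```

An empty result for `ready` means the version passes these particular checks; `broken` reports both the duplicate MAIN file and the empty upload.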