Registration strategies¶
We offer different registration strategies depending on use cases.
Registration using the data registration wizard¶
We first focus on use cases where the data and metadata have been produced by a variety of processes, and where this information has to be collected manually from the end user at registration time.
The following scenarios make use of LabID's data loader. The data loader is a multi-step wizard that guides the user through the registration process. Depending on the strategy, some additional steps are required before reaching the wizard (e.g. deciding where to place the data and which permissions to set on it).
While the data always remains readable by the user and their group, the chosen registration strategy dictates how strictly LabID protects the data (whether or not renaming and reorganising are allowed).
Common registration strategy¶
This strategy fits most use cases and is the safest regarding data integrity.
For a variety of supported platforms and technologies, we recommend putting the data in the user dropbox. Once registration with the loader is complete, the data is transferred (i.e. copied) from the user dropbox to its final location in the group data repository. During this process, LabID acquires the rights and ownership of the data files and folders (through the copy), which allows the data to be protected against unwanted user modifications. The data remains readable by the user (and their primary group).
Typically, it goes as follows:
- Move the data to your dropbox. Always create a sub-directory in your dropbox that contains all the data of a single assay; we call this directory the run directory. If your data already comes in a directory, you don't need this extra sub-directory.
- Using the data loader, select the run directory from the dropbox and follow the wizard.
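The first step above can be scripted. The following is a minimal sketch, assuming the dropbox is an ordinary directory you can write to. The function name `stage_run_directory` and its parameters are hypothetical and not part of LabID.

```python
import shutil
from pathlib import Path


def stage_run_directory(files, dropbox, run_name):
    """Move the files of ONE assay into a dedicated run directory in the dropbox.

    `dropbox` and `run_name` are illustrative parameters, not LabID settings.
    """
    run_dir = Path(dropbox) / run_name
    # Refuse to reuse an existing directory so two assays are never mixed.
    run_dir.mkdir(parents=True, exist_ok=False)
    for f in files:
        shutil.move(str(f), str(run_dir / Path(f).name))
    return run_dir
```

Once the files are staged, the run directory can be selected in the data loader as described in the second step.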
What is the run directory and what should it contain?
The run directory should be seen as the folder containing all the acquired data in a given session for a unique assay. In a sequencing assay, the run directory is usually created by the sequencer and used to store all the reads of a given run.
When manually operating a microscope, the data structure (and naming) is usually up to you. During a single session, one may acquire images with different settings for different projects, most likely corresponding to different LabID assays. Always try to come up with a consistent, per-assay organisation for the data.
Alternative strategy for big data and/or when users need to maintain modification rights¶
When using the common strategy, the data is copied from the dropbox to its final location by LabID. This is currently needed so that the necessary rights are acquired on the data in order to adequately protect it. However, in certain use cases:
- The data is big (hundreds of gigabytes or terabytes), so the intermediate copy step is too expensive in terms of time and resource usage.
- In some very specific use cases, the users need to keep modification rights over their data, i.e. when downstream data processing tools (e.g. Warp and M software) need to write the processed data next to the original data in the assay run directory.
In such cases, the LabID admin can allow your group to create a run directory, as well as the associated data acquisition folders, directly within the LabID group library. Datasets should then be moved into this folder, ensuring that (1) each run directory you create only contains data for a single assay and (2) the run directory follows the naming convention. The run directory is later registered from within the app.
How to do this in practice?
- First, contact your LabID admin so that we can adapt the permissions on your LabID group library folder appropriately. We highly recommend double-checking that such a strategy is really needed, since it defeats traceability, a fundamental expectation of data management.
- You will then be able to create a run directory, e.g. at Data/Assay/<technology>/2024/<run_directory>/. Run directory names should follow the pattern YYYY_MM_DD_HHMM_<username>_<custom_name>. For example, the full path for a light microscopy assay belonging to userX would look like:
<LabID_group_library>/Data/Assay/light_microscopy/2024/YYYY_MM_DD_HHMM_userX_my-name
- Then move the relevant dataset(s) to this directory and register them from the LabID UI.
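The naming pattern can be generated rather than typed by hand. Below is a minimal sketch in Python. The function names are hypothetical, and it assumes that the year folder (2024 in the example) is the year of the run timestamp, which the text does not state explicitly.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath


def build_run_directory_name(username, custom_name, when=None):
    """Return a name of the form YYYY_MM_DD_HHMM_<username>_<custom_name>."""
    when = when or datetime.now()
    return f"{when:%Y_%m_%d_%H%M}_{username}_{custom_name}"


def build_run_directory_path(group_library, technology, username, custom_name, when=None):
    """Return <group_library>/Data/Assay/<technology>/<year>/<run_directory>.

    Assumption: <year> is taken from the run timestamp.
    """
    when = when or datetime.now()
    name = build_run_directory_name(username, custom_name, when)
    return PurePosixPath(group_library) / "Data" / "Assay" / technology / f"{when:%Y}" / name


def looks_like_run_directory_name(name):
    """Check the timestamp prefix and that a username and a custom name follow."""
    parts = name.split("_", 5)
    if len(parts) < 6 or not all(parts):
        return False
    prefix = "_".join(parts[:4])
    if not re.fullmatch(r"\d{4}_\d{2}_\d{2}_\d{4}", prefix):
        return False
    try:
        datetime.strptime(prefix, "%Y_%m_%d_%H%M")  # rejects e.g. month 13
    except ValueError:
        return False
    return True
```

For instance, `build_run_directory_name("userX", "my-name", datetime(2024, 3, 15, 9, 30))` gives `2024_03_15_0930_userX_my-name`.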
Note that with this strategy, the directory is not transferred to its final location by LabID but by yourself. This means you keep ownership of the data and hold modification rights on it. The run directory remains writable and is internally flagged as unprotected (as are all the datasets it contains).
You may still adapt the permissions and remove edit permissions for your group, so that only you can edit the directory and its content; but always ensure the group keeps read permissions.
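On a Unix file system, this permission change could look like the following sketch. It is an illustration only, not a LabID tool: the function name `remove_group_write` is hypothetical, and it assumes standard POSIX permission bits (no ACLs) on the group library.

```python
import os
import stat


def remove_group_write(run_dir):
    """Drop group write permission under run_dir while keeping group read access."""
    changed = []
    for root, _dirs, files in os.walk(run_dir):
        # Each directory is handled when it is visited as `root`.
        for path in [root] + [os.path.join(root, name) for name in files]:
            if os.path.islink(path):
                continue  # os.chmod would follow the link; leave symlinks alone
            mode = stat.S_IMODE(os.stat(path).st_mode)
            new_mode = (mode & ~stat.S_IWGRP) | stat.S_IRGRP
            if os.path.isdir(path):
                new_mode |= stat.S_IXGRP  # directories need execute to be traversed
            if new_mode != mode:
                os.chmod(path, new_mode)
                changed.append(path)
    return changed
```

The function returns the paths whose mode was changed, which makes it easy to review what happened before moving on.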
Once the edit permissions are no longer needed, we strongly advise protecting the run directory. To do so, get in touch with your admin to see how to transfer ownership of the run directory to LabID (more specifically, to the LabID Unix system user).
Automated data transfer from trusted provider¶
Certain well-defined data ingestion scenarios have led us to establish controlled pipelines in order to speed up the registration process. Such scenarios typically involve a trusted data source (e.g. a core facility) that provides data and metadata in well-defined formats, which LabID can capture automatically.
Automating data ingestion with the CLI¶
Scientists in a lab work on a wide variety of subjects and techniques, generating very heterogeneous data that requires custom registration with the data import wizard. Projects involving the systematic processing of many samples with a stable workflow (fixed protocols, fixed equipment and assay setup) offer the possibility to automate the data and assay registration. In such cases, it is possible to write a custom sniffer and automate the data registration using the LabID command line interface combined with a task scheduler like cron.
A concrete example is available in the stories.
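To give a rough idea of what a sniffer does, here is a minimal sketch in Python. It is not the LabID implementation. The function names and the JSON state file are hypothetical, and the actual registration step is passed in as a `register` callable because the LabID CLI commands are not described here. A real sniffer would also need to check that a run directory is completely written before registering it.

```python
import json
from pathlib import Path


def find_new_run_directories(watch_dir, seen):
    """Return sub-directories of watch_dir whose names are not yet in `seen`."""
    candidates = sorted(p for p in Path(watch_dir).iterdir() if p.is_dir())
    return [p for p in candidates if p.name not in seen]


def sniff_and_register(watch_dir, state_file, register):
    """Call `register` once for every run directory not processed before.

    `register` is a placeholder for whatever invokes the LabID CLI.
    """
    state_path = Path(state_file)
    seen = set(json.loads(state_path.read_text())) if state_path.exists() else set()
    registered = []
    for run_dir in find_new_run_directories(watch_dir, seen):
        register(run_dir)
        seen.add(run_dir.name)
        registered.append(run_dir.name)
        # Persist after each success so a later failure does not lose progress.
        state_path.write_text(json.dumps(sorted(seen)))
    return registered
```

A script built around `sniff_and_register` could then be scheduled with a crontab entry such as `*/15 * * * * python /path/to/sniffer.py` (the script path is a placeholder), so that new run directories are picked up every 15 minutes.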