
Importing raw data

This page will walk through the dataset registration process. Each step is illustrated by screenshots depicting light microscopy dataset registration.

Raw datasets have to be linked to the data-generating assay and to their origin samples. Both the assay and the samples can either be created during the import, or be created beforehand and linked to the datasets during registration.

The dataset import wizard assists users throughout the dataset registration process.

The steps of the dataset import wizard

Importing datasets consists of the following steps:

  1. Start - Selecting the assay type
  2. Select data - Selecting datafiles (that will become datasets)
  3. Assay details - Providing information about the assay
  4. Build datasets - Assembling datafiles into datasets
  5. Verify - Double-checking the assay and datasets
  6. Assign samples - Associating each dataset with its origin sample

The wizard is accessible from the main menu on the left of the app, or from assay detail pages.

Screenshot of the import dataset wizard entry points. Left panel shows the left menu available on every page, the right panel shows the button to add data to a specific
Additional resources to learn about the registration process

The training material has two sections that explain how the registration wizard works. Section 101 deals with a simple example (similar to the example below). Section 102 deals with a more complex example (registering paired-end datasets).

Register Raw Datasets 101

Register Raw Datasets 102

Assay type selection

Currently, it is possible to import sequencing (Nanopore and Illumina) and imaging (light microscopy and electron microscopy) assays.

Read more about assays...

Creating an assay from a template saves time

Creating an assay from scratch every time datasets are registered can be cumbersome. The assay model is sometimes fairly advanced, as it is also designed to accommodate the most complex cases. To save time, one can choose a template from which important assay information is fetched and transferred to the new assay.

Templates should be established by groups and/or facilities that run standardised assays. If a template is not available, an already registered assay can be used as a template.

Screenshot of the Assay type selection page with a selected template (Standard Nuclei Imaging (Light Microscopy Assay))

Dataset selection

Datasets can be cumbersome to move around. To facilitate this process, a dropbox is exposed for every user. The dropbox is located on the same filesystem that users use to store experimental data, which avoids, for example, long upload times. Once moved into the dropbox, files become selectable on the select data page of the import wizard.

At this step, users are expected to select all datasets that should be recorded for the assay. Each selected item is either a final dataset or one part of a multi-dimensional dataset (see the later step "Build datasets").

Light Microscopy Imaging example

Screenshot of the file selector
At the top (orange arrow) is the dropbox selector, used to choose among the different user dropboxes, e.g. if the user belongs to more than one group. On the left side (1), the assay run folder is selected and its content appears on the right side (2).

How to select the right datasets?

In the above example:

  • The datasets are the folders (ImageAcquisitionN) containing .czi files. Each dataset has to be manually selected by the user.
  • The image files (.czi) are selected as part of each dataset. They are automatically selected when the parent folder is selected.
  • Option: Transfer the whole run directory. When selected (default), the whole directory is transferred, including the files that are not selected as datasets (like the image-acquisition-metadata.json at top level).
    • When the whole run directory is not transferred, the user has to clean up the dropbox manually afterwards to delete the files that were not transferred.
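
The selection rules above can be sketched in a few lines of Python. This is only an illustration of the logic, not the wizard's implementation: the function name `plan_transfer`, the `transfer_whole_run` flag, and the `.czi` file names are assumptions made for the example.

```python
from pathlib import PurePosixPath


def plan_transfer(run_files, selected_datasets, transfer_whole_run=True):
    """Split the files of a run directory into transferred and left-behind files.

    run_files: file paths relative to the run directory (hypothetical layout).
    selected_datasets: dataset folders picked by the user.
    """
    selected = {PurePosixPath(d) for d in selected_datasets}
    transferred, left_behind = [], []
    for name in run_files:
        path = PurePosixPath(name)
        # A file is auto-selected when one of its parent folders is a selected dataset.
        in_dataset = any(parent in selected for parent in path.parents)
        if in_dataset or transfer_whole_run:
            transferred.append(name)
        else:
            left_behind.append(name)
    return transferred, left_behind


run_files = [
    "image-acquisition-metadata.json",
    "ImageAcquisition1/image1.czi",
    "ImageAcquisition2/image2.czi",
]
datasets = ["ImageAcquisition1", "ImageAcquisition2"]

# Default: everything is transferred, including the top-level metadata file.
print(plan_transfer(run_files, datasets))
# Option disabled: the metadata file stays in the dropbox and needs manual cleanup.
print(plan_transfer(run_files, datasets, transfer_whole_run=False))
```

With the option disabled, the second list returned by the function corresponds to the files the user would have to delete from the dropbox by hand.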

Assay Creation

This page is where the user provides the assay information details. The form depends on the assay type. When a template was selected at a previous step, this form is initialised with the template information.

Light Microscopy Imaging example

Screenshot of the assay creation form
In our example, the form is already pre-filled with information from the template selected at the beginning of the wizard.

Dataset builder: Assemble datafiles into datasets and datasets into collections

Datasets

A dataset is, in the most common case, a single datafile or folder. However, in more complex cases, a dataset can be multi-dimensional, that is, a combination of two or more datasets selected at the previous step.

A common example of a two-dimensional dataset is a paired-end sequencing dataset, which has two components: read 1 (r1) and read 2 (r2). Isolated from each other, r1 and r2 are useless, which is why the builder provides a way to combine them.
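
To make the idea of a two-component dataset concrete, here is a minimal Python sketch that groups r1 and r2 files by a shared name. It is not how the wizard works internally; the `_R1`/`_R2` file naming convention, the pattern, and the function name `pair_reads` are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Assumed naming convention: "<name>_R1.fastq.gz" and "<name>_R2.fastq.gz".
READ_PATTERN = re.compile(r"^(?P<name>.+)_R(?P<read>[12])\.fastq\.gz$")


def pair_reads(filenames):
    """Group read 1 and read 2 files into two-component datasets."""
    components = defaultdict(dict)
    for filename in filenames:
        match = READ_PATTERN.match(filename)
        if match is None:
            raise ValueError(f"unexpected file name: {filename}")
        components[match["name"]][f"r{match['read']}"] = filename
    # A paired-end dataset is only complete when both reads are present.
    incomplete = sorted(n for n, c in components.items() if set(c) != {"r1", "r2"})
    if incomplete:
        raise ValueError(f"missing r1 or r2 for: {incomplete}")
    return dict(components)


files = ["libA_R1.fastq.gz", "libA_R2.fastq.gz", "libB_R2.fastq.gz", "libB_R1.fastq.gz"]
print(pair_reads(files))
```

Raising an error for an unpaired file mirrors the point made above: a lone r1 or r2 is not a usable dataset.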

Read more about datasets...

Collections

Scientific instruments typically produce the raw data, but in some cases they also perform basic data processing (e.g. quality filters, alignment to a reference, etc.), and therefore output two sets of datasets of a different nature. In such cases, the different sets should be separated into different collections (i.e. logical groups of datasets).

Each collection needs a name, can be flagged as raw data, and has its own set of controls within the interface to e.g. use regex to extract a good dataset name based on file names.
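
As an illustration of regex-based name extraction, the following Python sketch derives a dataset name from a file name with a named capture group. The regex syntax accepted by the wizard is not described here, so this uses Python's `re` module as a stand-in; the naming convention, the pattern, and the function name `dataset_name` are hypothetical.

```python
import re

# Hypothetical naming convention: "<date>_<dataset name>_raw.czi".
NAME_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}_(?P<name>.+)_raw\.czi$")


def dataset_name(filename, pattern=NAME_PATTERN):
    """Return the captured name, or the file name unchanged when nothing matches."""
    match = pattern.match(filename)
    return match.group("name") if match else filename


print(dataset_name("2023-05-12_ImageAcquisition1_raw.czi"))  # ImageAcquisition1
print(dataset_name("notes.txt"))  # notes.txt
```

Falling back to the original file name keeps every dataset named even when a file does not follow the expected convention.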

Staging

The staging step allows for more complex interactions with datasets. The builder allows datasets to be associated with each other, or better names to be extracted automatically using advanced controls. However, this is not always necessary, and in the simplest cases (such as the one displayed in the example below), files can be staged immediately.

Before staging

Light Microscopy Imaging example - Dataset Builder

Screenshot of the dataset builder with 3 candidate datasets (unstaged)
This screenshot displays the dataset builder with a single collection of 3 raw light microscopy candidate datasets. The collection information appears at the top (orange square 1), and the dataset table at the bottom (orange square 2). The dataset name is displayed in the first column (arrow a), and the name of the datafile in the second column (arrow b). Datasets are staged using the buttons in the last columns (arrow c).

Notes:

  • Advanced controls are not shown here, but the advanced control panel can be opened by clicking on the gear button (arrow d).
  • As no dataset has been staged yet, the page displays an error message at the bottom, and it is impossible to continue.

After staging

When datasets are ready within a collection, they have to be staged. Upon staging, a name is assigned to the dataset, and the dataset is transferred to the staged panel at the bottom. If the autogenerated name is not explicit enough, it can be changed at this point. It will become the name of the registered dataset, and cannot be modified later.

Light Microscopy Imaging example - Staged datasets

Screenshot of the dataset builder with 3 staged datasets

Verify

The verify page summarises all the information entered so far:

  1. The assay information (hidden by default)
  2. The list of collections and datasets

It also exposes a few controls to:

  1. Assign a study to collections
  2. Decide whether samples should be associated with datasets (mandatory for raw data)
  3. Choose, when needed, the type of samples to be associated with datasets

Light Microscopy Imaging example - Verify page

Screenshot of the import wizard verify page

Associate samples

The final step assigns a sample to each dataset. Samples can either be loaded from the database - and will be flagged as such in the dataset column - or generated at import time (providing minimal information, e.g. name, and barcode when needed), and tagged as new.

To select existing samples, they first have to be loaded on the page (use the load samples button at the top or the hyperlink below the sample name input). Once loaded, they can be selected in the sample column.

The same sample can be selected on more than one row when the same sample was used to produce different datasets.
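
The association table can be thought of as a list of rows, one per dataset. The Python sketch below checks the rule that raw datasets must have a sample, and shows the same sample used on two rows. The row structure, its keys, and the function name `missing_samples` are assumptions made for illustration, not the application's data model.

```python
def missing_samples(rows):
    """Return the names of raw datasets that have no sample assigned.

    rows: list of dicts with the hypothetical keys "dataset", "sample" and "raw".
    """
    return [row["dataset"] for row in rows if row["raw"] and not row["sample"]]


rows = [
    {"dataset": "ImageAcquisition1", "sample": "LM sample 1", "raw": True},
    # The same sample may be selected on more than one row.
    {"dataset": "ImageAcquisition2", "sample": "LM sample 1", "raw": True},
    # This raw dataset has no sample yet, so it is reported.
    {"dataset": "ImageAcquisition3", "sample": None, "raw": True},
]
print(missing_samples(rows))  # ['ImageAcquisition3']
```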

How to deal with multiplexed assays?

Multiple samples can be associated with the same dataset via sample groups. Use the sample group controls (in the toolbar button at the top) to create a group of existing or new samples.

⚠️ When creating new samples inside a group, it is currently not possible to associate the same new sample with another dataset on this page. To overcome this limitation, either create the samples beforehand and load existing samples into a group, or merge the duplicated samples later on.

Light Microscopy Imaging example - Assign sample page

Screenshot of the dataset-sample association table
The screenshot displays the dataset-sample association table. On line 1, an existing sample "LM sample 1" has been loaded and assigned to the dataset "ImageAcquisition1". On lines 2 and 3, two new samples are assigned to "ImageAcquisition2" and "ImageAcquisition3" respectively. New samples can be identified by the new tag next to the input name. They will be created upon submission and linked to the child dataset.

If a dataset belongs to another user in the group, the username in the owner column should be adjusted accordingly.

Register and track the import progress

Upon registration, the wizard redirects to a success page.

Light Microscopy Imaging example - Successful registration

Screenshot of the successful registration

Registration is an asynchronous task (i.e. its execution is decoupled from your submit action), so a task is spawned and appears as such on the user task list page.

An email will be sent when your data has been ingested and is available for further actions.