
Hands-on - Publish a Sequencing Study

  • 60 min
  • Moderate

This hands-on follows a complete RNA-seq experimental story from biological setup to submission-ready exports.

Overview

The study compares Brpf1 knockout (KO) versus wild-type (WT) neurons derived from BALB/c mouse brain tissue, with two biological replicates per condition (4 libraries in total).

The experimental work can be summarized as follows:

  • brain tissue is dissected from four BALB/c mice
  • neurons are isolated and cultured
  • Brpf1 knockout is introduced in the KO condition
  • RNA is extracted
  • mRNA libraries are prepared
  • paired-end sequencing is performed on Illumina HiSeq 4000.

In this training, you will represent this experimental flow in LabID so that the final metadata, annotations, protocols and exports are coherent, reusable, and compatible with repository requirements.

Learning Objectives

In this tutorial you will learn how to:

  • Register managed sequencing datasets and connect them to the correct KO/WT study context
  • Capture publication-grade study metadata (title, narrative description, design terms, contributors)
  • Define experimental factors and annotations that reflect biological variables and sample provenance
  • Annotate sequencing libraries in batch with Excel templates and verify consistency
  • Link mandatory and recommended protocols to the sequencing libraries in experimental order
  • Export and review MAGE-TAB validation outputs, then generate ENA tables and submit XML

Walkthrough

Before you start, pick one trainee account in the range trainee1 to trainee40 and keep it for the whole tutorial (accounts over 40 are reserved for trainers).

Pick one trainee and keep it for all steps

Add your name in front of your trainee username in the shared trainee allocation table.

Link placeholder (to be replaced): Trainee allocation table

In this tutorial, replace XX with your trainee number to customize sample and study names. This makes your items easy to locate in drop-down lists. For example, if you picked trainee49 (as shown throughout this tutorial), use the study name Wt vs KO RNA-seq Study [49] (pattern: Wt vs KO RNA-seq Study [XX]).

Step 1. Register the datasets

This step follows Hands-on - Registration of Managed Data, with two important differences.

  • In the study creation step (step 1), use the study name: Wt vs KO RNA-seq Study [XX]
Create the study for this tutorial

Create the study from the managed registration flow and use the RNA-seq training naming convention (here using 49 as a custom key)

  • Intermediate steps are unchanged
Verify the registration payload

Make sure to select the study we just created

  • In the sample assignment step (step 5), rename the samples as follows, replacing the XX placeholder with your trainee number (here 49):
    • XX_KO_Rep1 -> 49_KO_Rep1
    • XX_KO_Rep2 -> 49_KO_Rep2
    • XX_WT_Rep1 -> 49_WT_Rep1
    • XX_WT_Rep2 -> 49_WT_Rep2
Assign and rename samples

Use explicit sample names (XX_KO_Rep1, XX_KO_Rep2, XX_WT_Rep1, XX_WT_Rep2) customized with your trainee number (here 49)
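The naming convention above can be sketched in a few lines of Python. This is only an illustration for checking your own names; the helper names `sample_names` and `study_name` are hypothetical and not part of LabID.

```python
def sample_names(trainee_number: int) -> list[str]:
    """Return the four library names for one trainee, e.g. 49_KO_Rep1."""
    conditions = ("KO", "WT")   # knockout and wild-type
    replicates = (1, 2)         # two biological replicates per condition
    return [
        f"{trainee_number}_{condition}_Rep{replicate}"
        for condition in conditions
        for replicate in replicates
    ]


def study_name(trainee_number: int) -> str:
    """Return the study name following the pattern Wt vs KO RNA-seq Study [XX]."""
    return f"Wt vs KO RNA-seq Study [{trainee_number}]"


if __name__ == "__main__":
    print(study_name(49))
    for name in sample_names(49):
        print(name)
```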

You have now been redirected to the Assay Detail Page of the registered assay, where you can see the four registered datasets connected to the study. The assay overview table is a good place to quickly check that all datasets are present and correctly linked to the study.

Registered Assay Overview

Resulting assay overview after successful registration

Step 2. Edit the study metadata

Open your study and update the key publication fields:

  • From the assay page, horizontally scroll the Output Datasets table to locate the Studies column and click on the study name; or
  • Navigate to the study list page to locate your study (Dataset Management Menu > Studies sub-menu)

2.1 Name, description & design terms

  • Set the name to a more descriptive title, as this will become the public name of your study once published:

RNA-seq analysis of Brpf1 knockout versus wild-type neurons derived from BALB/c mouse brain [XX]

  • For the same reason, make sure the description accurately explains how the datasets were generated, summarizing the main steps from sample collection to data acquisition:

This study investigates the transcriptional impact of targeted knockout (KO) of the Brpf1 gene in primary mouse neurons using RNA sequencing (RNA-seq).
Brain tissue was dissected from four BALB/c mice. Neuronal cells were isolated and cultured in vitro for two weeks prior to RNA extraction. A targeted knockout of Brpf1 was introduced in two samples, while two samples were maintained as wild-type controls, resulting in two biological replicates per condition.
Total RNA was extracted and mRNA libraries were prepared by the EMBL Genomics Core Facility. Sequencing was performed on the Illumina HiSeq 4000 platform using paired-end mode.
This dataset enables comparative transcriptomic analysis between Brpf1 knockout and wild-type neurons to assess the gene's role in neuronal gene regulation.

  • Add the biological replicate and genetic modification design terms
  • Save the study details before moving to the next step
Study title, description and design terms

Consolidate public-facing study metadata early

2.2 Experimental factors and annotations

We now configure experimental factors and suggested annotations.

What is an experimental factor?

An experimental factor is a variable that differentiates study conditions and can explain observed differences.

For this tutorial, the experimental factors are:

  • individual genetic characteristics (equivalent to genotype): with values either Brpf- or wild type
  • genetic modification: with values either gene knock out or none

In addition to factors, add key suggested annotations on SequencingLibrary (assay input) to support FAIR metadata and downstream validation. Given the description of the study, the following annotations are relevant, and most should be considered mandatory:

  • StrainOrLine describes the mouse strain from which the brain tissue was dissected, here BALB/c
  • OrganismPart describes the tissue dissected from the mouse, here brain
  • Age with relevant time unit (here 21 weeks) of the mouse when sacrificed, using birth as InitialTimePoint
  • DevelopmentalStage corresponding to the described Age, here adult
  • Sex of the sacrificed mouse, here all female
  • CellType the isolated cell type from the dissected brain tissue, here Neuron

This follows the recommendations in Data Preparation.
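As a rough illustration, the factor and annotation values listed above can be written down as one expected record per library. This is a sketch for reasoning about what each library should carry. The flat dictionary layout and the helper `expected_row` are assumptions made for this example and do not reflect how LabID stores annotations. The values themselves are copied from this tutorial.

```python
# Annotation values shared by all four libraries (copied from the list above).
SHARED_ANNOTATIONS = {
    "StrainOrLine": "BALB/c",
    "OrganismPart": "brain",
    "Age": "21 weeks",
    "InitialTimePoint": "birth",
    "DevelopmentalStage": "adult",
    "Sex": "female",
    "CellType": "Neuron",
}

# Experimental factor values per condition, as listed in this tutorial.
FACTOR_VALUES = {
    "KO": {
        "individual genetic characteristics": "Brpf-",
        "genetic modification": "gene knock out",
    },
    "WT": {
        "individual genetic characteristics": "wild type",
        "genetic modification": "none",
    },
}


def expected_row(library_name: str) -> dict[str, str]:
    """Build the expected record for a library named like 49_KO_Rep1."""
    # Names follow the XX_KO_Rep1 pattern, so the condition is the second token.
    condition = library_name.split("_")[1]
    return {"name": library_name, **FACTOR_VALUES[condition], **SHARED_ANNOTATIONS}


if __name__ == "__main__":
    for key, value in expected_row("49_KO_Rep1").items():
        print(f"{key}: {value}")
```

Only the two factor columns differ between the KO and WT libraries; every other annotation is constant across the study.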

Why add both factors and extra annotations?

  • Experimental factors define condition structure for analysis
  • Additional annotations improve reusability and repository compliance
  • These fields are checked during export validation (report file)

How to decide what to add

  • Use ENA checklist expectations as guidance for required sample metadata
  • At EMBL, you can refine this list with MODIS during a consultation session

Let's now add the factors and annotations to the study. The Object field is always set to SequencingLibrary.

  • Edit the study details and navigate to the Experimental factors section
  • Add the two factors (genetic modification and individual genetic characteristics) for the SequencingLibrary object, and check the Mandatory and Factor Value boxes
  • Add the suggested annotations as listed above (also for the SequencingLibrary object), and check the Mandatory box for all of them
Experimental factors and suggested annotations

Position factors and annotations on the correct object type (SequencingLibrary)

2.3 Contributors

Add contributors, including at least:

  • Yourself with submitter role
  • In the context of this training, use a fake email like trainee49@fake.org
  • Optionally add additional data contributors
Study contributors

Add contributor roles for publication and contactability

Adding or removing datasets from the study

If your study contains datasets you do not want to submit, or is missing datasets:

  • Use dataset selection from the study table
  • Use Batch Edit to add selected datasets to the target study
  • Or remove wrongly associated datasets

For detailed guidance, see Data Preparation - dataset content.

Step 3. Annotate sequencing libraries

In this step, we annotate the sequencing libraries connected to your study.

Duplicate annotations on assay input samples

Always ensure annotations are present directly on assay input samples (SequencingLibrary).

We are not (yet) crawling the lineage graph to infer missing values from upstream nodes. Explicit sample-level annotation prevents ambiguity and avoids conflict-resolution issues.

3.1 Set study context and download the Excel template for batch editing

  • Open the Sequencing Library list page available from the Biomaterials menu
  • Click the Context arrow button (top right of the page), and pick Set study context.
    This opens a modal displaying a list page where the study can be searched for. Select your RNA-seq analysis of Brpf1 knockout versus wild-type neurons derived from BALB/c mouse brain [XX] study by ticking its checkbox, then confirm by clicking the Set study Context button. This closes the modal and reloads the list page with the study context applied.
  • Select the 4 libraries of your RNA-seq study using the checkboxes on the left of the table, or use the top checkbox to select all libraries at once.

Then, download the template for batch edit:

Select libraries before export

Setting your study context ("1", red arrow) and downloading the template for batch edit ("2", red arrow)

  • In the export dialog, make sure to add all needed annotation types as defined on the study
Export options for Excel batch annotation

Pick export mode and include annotations for efficient editing

3.2 Excel-based batch annotation

  • Open the Excel template and remove all unnecessary columns
Empty template with relevant columns only

Start from the exported template, where only necessary columns are kept. The exact column selection is context dependent; here we display a minimal example

  • Fill all columns with the correct values according to the study setup (see screenshot below). You can also download this filled example file, Filled Excel template example, and copy its values into your own template. Importantly, do not copy the id and name columns: they identify your own libraries.
Filled template

Fill annotation columns according to the study setup

  • Re-import the filled template using the button above the table (to the right of the one you used to get the template) and click the Batch edit with spreadsheet link.
  • Drag and drop the Excel spreadsheet onto the drop zone. A green Successfully updated 4 items of type SEQUENCINGLIBRARY message should appear at the bottom of the page.
  • Then navigate back to the sequencing library list page to check that the annotations are correctly updated and consistent with the filled template. Your study context should still be active; if it is not, set it again as explained previously.
Handson

Confirm the updates after import: click the annotation control (as shown in the picture) to display all relevant annotation columns. You can also customize your table using the relevant controls
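The rule about copying values from the filled example without touching the id and name columns can be illustrated with a small sketch. It works on plain lists of dictionaries (one per spreadsheet row) rather than on a real Excel file, and it pairs rows by the part of the name after the trainee number (for example KO_Rep1). Both choices, and the helper `copy_annotations`, are assumptions made for this example.

```python
PROTECTED_COLUMNS = {"id", "name"}  # these identify your own libraries


def condition_key(name: str) -> str:
    """Strip the trainee prefix: 49_KO_Rep1 -> KO_Rep1."""
    return name.split("_", 1)[1]


def copy_annotations(own_rows: list[dict], example_rows: list[dict]) -> list[dict]:
    """Copy annotation values from example rows into own rows, keeping own id and name."""
    example_by_key = {condition_key(row["name"]): row for row in example_rows}
    merged = []
    for own in own_rows:
        example = example_by_key[condition_key(own["name"])]
        row = dict(own)  # start from your own row so id and name are preserved
        for column, value in example.items():
            if column not in PROTECTED_COLUMNS:
                row[column] = value
        merged.append(row)
    return merged


if __name__ == "__main__":
    own = [{"id": "a1", "name": "12_WT_Rep1", "Sex": ""}]
    example = [{"id": "x1", "name": "49_WT_Rep1", "Sex": "female"}]
    print(copy_annotations(own, example))
```

Pairing by condition and replicate rather than by row position avoids assigning KO values to a WT library when the two files are sorted differently.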

Step 4. Link protocols to the sequencing libraries

Link protocols directly to assay input samples

Protocols must be directly connected to the sequencing libraries used as assay input.

We are not (yet) crawling upstream/downstream relationships to infer missing protocol links.

For NGS submissions, each sequencing library should include at least:

  • Mandatory
    • Culture & Growth
    • NGS Library Preparation
  • Recommended
    • Extraction
    • Additional context protocols when relevant

Requirements can vary depending on the target repository profile (ENA, EGA, BioStudies).

4.2 Protocol summary vs description

When filling protocol details:

  • Summary: concise, export-ready method text (used in exports)
  • Description: extended details, rich text, optional deeper notes

4.3 Protocols used in this tutorial

The following protocols should already exist as public protocols in LabID (created before training):

  1. Culture & Growth: Neuron Isolation from Mouse Brain
  2. Culture & Growth: Primary Neuron Culture with Targeted Knockout of Brpf1
  3. Extraction: RNA Extraction with RNeasy Mini kit
  4. NGS Library Preparation: mRNA Library Preparation with NEBNext Ultra II Directional RNA Library Prep Kit for Illumina
  5. Sequencing: Illumina HiSeq 4000 Sequencing (Paired-End)
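To make the mandatory and recommended protocol types concrete, here is a minimal sketch that checks an ordered protocol list against them. The protocol types and names are copied from this tutorial. The function `check_protocols` and the (type, name) tuple layout are hypothetical and only serve the illustration; the actual check is performed by the export validation.

```python
MANDATORY_TYPES = {"Culture & Growth", "NGS Library Preparation"}
RECOMMENDED_TYPES = {"Extraction"}

# (type, name) pairs in experimental order. The sequencing protocol is left out
# because it is typically generated automatically in the export.
PROTOCOLS = [
    ("Culture & Growth", "Neuron Isolation from Mouse Brain"),
    ("Culture & Growth", "Primary Neuron Culture with Targeted Knockout of Brpf1"),
    ("Extraction", "RNA Extraction with RNeasy Mini kit"),
    (
        "NGS Library Preparation",
        "mRNA Library Preparation with NEBNext Ultra II Directional RNA "
        "Library Prep Kit for Illumina",
    ),
]


def check_protocols(protocols: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Report which mandatory or recommended protocol types are not covered."""
    linked_types = {protocol_type for protocol_type, _ in protocols}
    return {
        "missing_mandatory": sorted(MANDATORY_TYPES - linked_types),
        "missing_recommended": sorted(RECOMMENDED_TYPES - linked_types),
    }


if __name__ == "__main__":
    print(check_protocols(PROTOCOLS))
```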

About sequencing protocol creation

The sequencing protocol is typically standardized in export and may be generated automatically in the MAGE-TAB output. In practice, you usually do not need to manually create it during the training.

Create your own protocols

You may also download the protocol text used in this tutorial and create your own protocols. If you create your own protocols, make sure to add your trainee number XX in the protocol name to avoid confusion with the existing protocols used by other trainees, and find them easily in drop down lists.

  • Select your 4 libraries in the sequencing library list page using the checkboxes on the left of the table, and click the batch annotate icon available in the table tools.
Batch link protocols - step 1

Select protocol batch edit mode from the sequencing library list

  • Select the ProtocolList attributes, and start adding protocols one by one using the ...or add a protocol to the list field. Add protocols following the experimental order.
Batch link protocols - step 2

Adding protocols one by one following the experimental order.

  • Add the four protocols as shown below
Batch link protocols - step 3

Adding the four protocols following the experimental steps

  • Click Save to confirm the update and then exit the batch edit mode by clicking the Done button.
  • Congratulations, you have now linked the relevant protocols to all your sequencing libraries in one batch operation, and they are ready for export with the correct protocol information.
Protocol linkage result

All selected libraries now share the expected protocol set

Step 5. Export MAGE-TAB to validate your study

At this stage, your libraries should resemble the example below.

Fully annotated library example

Reference state before export

If this is the case, you are ready to export MAGE-TAB and review the validation report.

  • You most likely need to set up the CLI for your trainee user. If needed, first complete CLI - 101 Getting Started.
  • Run the MAGE-TAB export command from your working directory
labid export study -o <output_dir> -s <study_uuid>

Replace <study_uuid> with your study UUID copied from the UI (a UUID looks like 114c10ae-9302-4a3a-b6c9-86af435dac7f), and <output_dir> with the path to the directory where you want to save the exported files (for example . will use the current dir).
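If you script the export, it can help to validate the UUID before calling the CLI. The sketch below only builds the argument list for the command shown above, using the `-o` and `-s` options from this tutorial; it does not run anything. The helper `export_study_command` is hypothetical.

```python
import shlex
import uuid


def export_study_command(study_uuid: str, output_dir: str = ".") -> list[str]:
    """Build the MAGE-TAB export command after checking the UUID format."""
    # uuid.UUID raises ValueError when the string is not a well-formed UUID.
    parsed = uuid.UUID(study_uuid)
    return ["labid", "export", "study", "-o", output_dir, "-s", str(parsed)]


if __name__ == "__main__":
    command = export_study_command("114c10ae-9302-4a3a-b6c9-86af435dac7f")
    # shlex.join renders the list as a shell-safe command line.
    print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` once the CLI is configured for your trainee user.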

This creates two files in the export directory: magetab.txt (Download example), the MAGE-TAB table, and report_<study_uuid>.txt (Download example), the validation report (also shown below).

  • Take a minute to inspect the validation report. The report features four different sections (Study, Protocol, Data & Annotations), each with an overall Status. The status should be PASS if your study is correctly configured, otherwise error messages will guide you to the issues to fix in LabID before export.
Validation report (PNG view)

Export validation outcome reflects your study configuration (factors + annotations + protocols)

  • In any case, open magetab.txt in Excel (or equivalent) and review both the IDF section (study-level metadata, at the top of the file) and the SDRF section (sample/library/dataset rows, at the bottom of the file).

In particular, look for missing or incorrect values in the SDRF section; scanning the table is a practical way to catch inconsistencies. If you find issues:

  1. Correct them in LabID
  2. Re-export MAGE-TAB
  3. Re-check report and table until clean
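Scanning the SDRF section for empty cells can also be done programmatically. The sketch below is a rough aid, not a replacement for the validation report. It assumes the file is tab-delimited and that the SDRF header is the first row whose first cell is "Source Name" (the conventional first SDRF column); please check this against your own magetab.txt. The function `find_empty_sdrf_cells` is hypothetical.

```python
import csv
import io


def find_empty_sdrf_cells(magetab_text: str) -> list[tuple[str, str]]:
    """Return (row label, column name) pairs for empty cells in the SDRF section."""
    rows = list(csv.reader(io.StringIO(magetab_text), delimiter="\t"))
    # Assumption: the SDRF header row starts with "Source Name".
    start = next(
        (i for i, row in enumerate(rows) if row and row[0].strip() == "Source Name"),
        None,
    )
    if start is None:
        raise ValueError("no SDRF header row found")
    header = rows[start]
    problems = []
    for row in rows[start + 1:]:
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        # Pad short rows so trailing missing cells are reported too.
        padded = row + [""] * (len(header) - len(row))
        for column, value in zip(header, padded):
            if not value.strip():
                problems.append((padded[0], column))
    return problems


if __name__ == "__main__":
    with open("magetab.txt", encoding="utf-8") as handle:
        for label, column in find_empty_sdrf_cells(handle.read()):
            print(f"{label}: empty value in {column}")
```

An empty cell is not always an error, so treat the output as a list of places to review.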
Learning more about MAGE-TAB

The content and format of the MAGE-TAB file is quite intuitive, but if you want to learn more about it, you can refer to the following resources:

  • See the old tab2mage documentation (ancestor of MAGE-TAB) for details on the format and expected content of the IDF and SDRF sections
  • Any AI chatbot should also be able to help you understand the content of the MAGE-TAB file if needed

Step 6. Submit to ENA using XML format

Once MAGE-TAB review is satisfactory, you may proceed to submitting the data to the ENA. The ENA submission requires generating specific XML files and submitting them through the ENA Webin tool.

Here we use the open source ena-upload-cli tool, which accepts simple tabulated files, generates the XML for you and directly submits the generated XML to ENA Webin.

6.1 Export tables suitable for the ENA upload CLI tool

We first export the study as a set of tables suitable for the ENA upload CLI tool. This is achieved by running a command similar to the MAGE-TAB export, but with a different format option:

labid export study --study-type rnaseq -s <study_uuid> --format enatables -o <output_dir>

This command should generate four tabular files:

Take a minute to inspect these files.

Note that it is always possible to manually amend them if needed, but ideally they should be correct as is. If you find issues, it is better to fix them in LabID and re-export the tables rather than manually amending them, to avoid inconsistencies between the source of truth (LabID) and the exported files.

6.2 Submit XML to ENA

The XML generation and submission to ENA Webin can then be done in one step using the labid export ena command, which accepts the generated tables as input and directly submits the generated XML to ENA Webin using the provided credentials (-f enawebin.yaml). Note that YOUR_CENTER should be replaced by the center name provided by ENA Webin (for example "EMBL").

What is the format of enawebin.yaml file

The enawebin.yaml file is a simple text file containing the credentials for ENA Webin submission, including the username and password.

The format of the file is as follows:

username: Webin-XXXXX
password: your_webin_password
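Before submitting, you may want to confirm that the credentials file contains both keys. The following sketch reads the two-line format shown above with the standard library only. It is deliberately simplistic: it does not handle YAML quoting or nested structures, and the helper `read_webin_credentials` is hypothetical.

```python
def read_webin_credentials(text: str) -> dict[str, str]:
    """Parse simple 'key: value' lines and require username and password."""
    credentials = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        key, separator, value = line.partition(":")
        if separator:
            credentials[key.strip()] = value.strip()
    missing = {"username", "password"} - credentials.keys()
    if missing:
        raise ValueError(f"missing keys in credentials file: {sorted(missing)}")
    return credentials


if __name__ == "__main__":
    with open("enawebin.yaml", encoding="utf-8") as handle:
        credentials = read_webin_credentials(handle.read())
    # Never print the password; showing the username is enough to confirm parsing.
    print("Webin user:", credentials["username"])
```

Since the file holds a password in clear text, keep it out of shared folders and version control.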

The command would look like this:

labid export ena -f enawebin.yaml -c <YOUR_CENTER> -s <study_uuid> -d <output_dir> 
successful submission output example

Output of a successful submission to the ENA sandbox

When successfully executed, the command should generate the XML files, submit them to ENA Webin and return a submission receipt (receipt.xml) containing the different ENA accession numbers. The input tables are also duplicated and augmented with the ENA accession number, the submission status and date (please check the ena-upload-cli repository for details).
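To collect the accession numbers from receipt.xml, a short script is usually enough. The sketch below assumes the usual ENA receipt layout: a RECEIPT root element with a success attribute, child elements such as STUDY, SAMPLE, EXPERIMENT and RUN carrying an accession attribute, and optional ERROR entries under MESSAGES. Verify this against your own receipt. The function `summarize_receipt` is hypothetical.

```python
import xml.etree.ElementTree as ET


def summarize_receipt(receipt_xml: str) -> dict:
    """Extract the success flag, accessions per object type, and error messages."""
    root = ET.fromstring(receipt_xml)
    accessions: dict[str, list[str]] = {}
    for element in root:
        accession = element.get("accession")
        if accession:  # MESSAGES and ACTIONS carry no accession and are skipped
            accessions.setdefault(element.tag, []).append(accession)
    return {
        "success": root.get("success") == "true",
        "accessions": accessions,
        "errors": [error.text for error in root.findall("./MESSAGES/ERROR")],
    }


if __name__ == "__main__":
    with open("receipt.xml", encoding="utf-8") as handle:
        summary = summarize_receipt(handle.read())
    print("success:", summary["success"])
    for object_type, values in summary["accessions"].items():
        print(object_type, ", ".join(values))
    for message in summary["errors"]:
        print("ERROR:", message)
```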

The different XML documents, updated input tables and submission receipt are provided as examples in the following zip archive

For full details and prerequisites (Webin account, upload steps, test server recommendation), follow:

You now have a complete end-to-end workflow to prepare, validate and submit an RNA-seq study from LabID to ENA.