Hands-on - Publish a Sequencing Study¶
- 60 min
- Moderate
This hands-on follows a complete RNA-seq experimental story from biological setup to submission-ready exports.
Overview
The study compares Brpf1 knockout (KO) versus wild-type (WT) neurons derived from BALB/c mouse brain tissue, with two biological replicates per condition (4 libraries in total).
The experimental work can be summarized as follows:
- brain tissue is dissected from four BALB/c mice
- neurons are isolated and cultured
- Brpf1 knockout is introduced in the KO condition
- RNA is extracted
- mRNA libraries are prepared
- paired-end sequencing is performed on Illumina HiSeq 4000.
In this training, you will represent this experimental flow in LabID so that the final metadata, annotations, protocols and exports are coherent, reusable, and compatible with repository requirements.
Learning Objectives
In this tutorial you will learn how to:
- Register managed sequencing datasets and connect them to the correct KO/WT study context
- Capture publication-grade study metadata (title, narrative description, design terms, contributors)
- Define experimental factors and annotations that reflect biological variables and sample provenance
- Annotate sequencing libraries in batch with Excel templates and verify consistency
- Link mandatory and recommended protocols to the sequencing libraries in experimental order
- Export and review MAGE-TAB validation outputs, then generate ENA tables and submit XML
Walkthrough
Before you start, pick one trainee account in the range trainee1 to trainee40 and keep it for the whole tutorial (accounts over 40 are reserved for trainers).
Pick one trainee and keep it for all steps
Add your name next to your trainee username in the shared trainee allocation table.
Link placeholder (to be replaced): Trainee allocation table
In this tutorial, replace XX with your trainee number to customize sample and study names. This makes it easy to locate your items in drop-down lists. For example, with trainee49 (the account shown throughout this tutorial), the study name becomes Wt vs KO RNA-seq Study [49] (pattern: Wt vs KO RNA-seq Study [XX]).
Step 1. Register the datasets¶
This step follows Hands-on - Registration of Managed Data, with two important differences.
- In the study creation step (step 1), use the study name:
Wt vs KO RNA-seq Study [XX]
Create the study for this tutorial
- Intermediate steps are unchanged
- In the sample assignment step (step 5), rename samples as (replacing the XX placeholder):
  - XX_KO_Rep1 -> 49_KO_Rep1
  - XX_KO_Rep2 -> 49_KO_Rep2
  - XX_WT_Rep1 -> 49_WT_Rep1
  - XX_WT_Rep2 -> 49_WT_Rep2
Assign and rename samples
Sample names (XX_KO_Rep1, XX_KO_Rep2, XX_WT_Rep1, XX_WT_Rep2) customized with your trainee number (here 49)
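The naming pattern above can be sketched in a few lines of Python. This is only an illustration of the XX substitution; the helper name `customize` is made up for this example and is not part of LabID.

```python
# Illustrative helper (not part of LabID): fill the XX placeholder
# used in this tutorial's study and sample names.
STUDY_TEMPLATE = "Wt vs KO RNA-seq Study [XX]"
SAMPLE_TEMPLATES = ["XX_KO_Rep1", "XX_KO_Rep2", "XX_WT_Rep1", "XX_WT_Rep2"]


def customize(template: str, trainee_number: int) -> str:
    """Replace the XX placeholder with the trainee number."""
    return template.replace("XX", str(trainee_number))


trainee_number = 49  # the number shown in this tutorial; use your own
study_name = customize(STUDY_TEMPLATE, trainee_number)
sample_names = [customize(name, trainee_number) for name in SAMPLE_TEMPLATES]

print(study_name)
for name in sample_names:
    print(name)
```

Typing the names from such a list, rather than from memory, helps keep the four sample names consistent with the study name.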
You have now been redirected to the Assay Detail Page of the registered assay, where you can see the four registered datasets connected to the study. The assay overview table is a good place to quickly check that all datasets are present and correctly linked to the study.
Step 2. Edit the study metadata¶
Open your study and update the key publication fields:
- From the assay page, scroll the Output Datasets table horizontally to locate the Studies column and click on the study name; or
- Navigate to the study list page to locate your study (Dataset Management menu > Studies sub-menu)
2.1 Name, description & design terms¶
- Set the name to a more descriptive one, as this will become the public name of your study once published:
RNA-seq analysis of Brpf1 knockout versus wild-type neurons derived from BALB/c mouse brain [XX]
- For the same reason, make sure the description accurately explains how the datasets were generated, summarizing the main steps from sample collection to data acquisition:
This study investigates the transcriptional impact of targeted knockout (KO) of the Brpf1 gene in primary mouse neurons using RNA sequencing (RNA-seq).
Brain tissue was dissected from four BALB/c mice. Neuronal cells were isolated and cultured in vitro for two weeks prior to RNA extraction. A targeted knockout of Brpf1 was introduced in two samples, while two samples were maintained as wild-type controls, resulting in two biological replicates per condition.
Total RNA was extracted and mRNA libraries were prepared by the EMBL Genomics Core Facility. Sequencing was performed on the Illumina HiSeq 4000 platform using paired-end mode.
This dataset enables comparative transcriptomic analysis between Brpf1 knockout and wild-type neurons to assess the gene's role in neuronal gene regulation.
- Add the "biological replicate" and "genetic modification design" design terms
- Save the study details before moving to the next step
2.2 Experimental factors and annotations¶
We now configure experimental factors and suggested annotations.
What is an experimental factor?
An experimental factor is a variable that differentiates study conditions and can explain observed differences.
For this tutorial, the experimental factors are:
- individual genetic characteristics (equivalent to genotype): with values either Brpf- or wild type
- genetic modification: with values either gene knock out or none
In addition to factors, add key suggested annotations on SequencingLibrary (assay input) to support FAIR metadata and downstream validation. Given the description of the study the following annotations seem relevant and most should be considered mandatory:
- StrainOrLine describes the mouse strain from which the brain tissue was dissected, here BALB/c
- OrganismPart describes the tissue dissected from the mouse, here brain
- Age, with the relevant time unit, gives the age of the mouse when sacrificed (here 21 weeks), using birth as InitialTimePoint
- DevelopmentalStage corresponds to the described Age, here adult
- Sex of the sacrificed mice, here all female
- CellType is the cell type isolated from the dissected brain tissue, here Neuron
This follows the recommendations in Data Preparation.
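To make the target state concrete, the factors and annotations listed above can be written out as one record per library. The sketch below only restates the values from this tutorial in Python; the dictionary keys mirror the annotation names used here, but how LabID labels the columns in its Excel template, and how it stores Age together with its unit, are assumptions.

```python
# Values taken from this tutorial; the key names are assumptions about
# how the columns are labelled and may differ in the real template.
SHARED_ANNOTATIONS = {
    "StrainOrLine": "BALB/c",
    "OrganismPart": "brain",
    "Age": "21 weeks",
    "InitialTimePoint": "birth",
    "DevelopmentalStage": "adult",
    "Sex": "female",
    "CellType": "Neuron",
}

# Experimental factors: the only values that differ between conditions.
FACTOR_VALUES = {
    "KO": {
        "individual genetic characteristics": "Brpf-",
        "genetic modification": "gene knock out",
    },
    "WT": {
        "individual genetic characteristics": "wild type",
        "genetic modification": "none",
    },
}


def expected_rows(trainee_number, replicates=2):
    """Build one annotation record per library (condition x replicate)."""
    rows = []
    for condition, factors in FACTOR_VALUES.items():
        for replicate in range(1, replicates + 1):
            row = {"name": f"{trainee_number}_{condition}_Rep{replicate}"}
            row.update(factors)
            row.update(SHARED_ANNOTATIONS)
            rows.append(row)
    return rows


rows = expected_rows(49)
for row in rows:
    print(row["name"], row["genetic modification"], row["CellType"], sep="\t")
```

The split between `FACTOR_VALUES` and `SHARED_ANNOTATIONS` reflects the distinction made in this section: factors vary between KO and WT, while the remaining annotations are identical for all four libraries.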
Why add both factors and extra annotations?
- Experimental factors define condition structure for analysis
- Additional annotations improve reusability and repository compliance
- These fields are checked during export validation (report file)
How to decide what to add
- Use ENA checklist expectations as guidance for required sample metadata
- At EMBL, you can refine this list with MODIS during a consultation session
Let's now add the factors and annotations to the study. The Object field is always set to SequencingLibrary.
- Edit the study details and navigate to the Experimental factors section
- Add the two factors (genetic modification and individual genetic characteristics) for the SequencingLibrary object, and check the Mandatory and Factor Value boxes
- Add the suggested annotations listed above (also for the SequencingLibrary object), and check the Mandatory box for all of them
Experimental factors and suggested annotations (Object set to SequencingLibrary)
2.3 Contributors¶
Add contributors, including at least:
- Yourself with the submitter role
- In the context of this training, use a fake email like trainee49@fake.org
- Optionally add additional data contributors
Adding or removing datasets from the study
If your study contains datasets you do not want to submit, or misses datasets:
- Use dataset selection from the study table
- Use Batch Edit to add selected datasets to the target study
- Or remove wrongly associated datasets
For detailed guidance, see Data Preparation - dataset content.
Step 3. Annotate sequencing libraries¶
In this step, we annotate the sequencing libraries connected to your study.
Duplicate annotations on assay input samples
Always ensure annotations are present directly on assay input samples (SequencingLibrary).
We are not (yet) crawling the lineage graph to infer missing values from upstream nodes. Explicit sample-level annotation prevents ambiguity and avoids conflict-resolution issues.
3.1 Set study context and download the Excel template for batch editing¶
- Open the Sequencing Library list page available from the Biomaterials menu
- Click the arrow button (top right of the page), and pick Set study context.
- This opens a modal displaying a list page where you can search for the study. Select your RNA-seq analysis of Brpf1 knockout versus wild-type neurons derived from BALB/c mouse brain [XX] study (by clicking on its checkbox) and validate by clicking on the Set study Context button. This closes the modal and reloads the list page with the study context.
- Select the 4 libraries of your RNA-seq study using the checkboxes on the left of the table, or use the top checkbox to select all libraries at once.
Then, download the template for batch edit:
- Click the download template icon button (available at the top of the table) to open the dropdown
- Select the Download template for batch edit option
Select libraries before export
- In the export dialog, make sure to add all needed annotation types as defined on the study
Export options for Excel batch annotation
- Click the Download button at the bottom.
3.2 Excel-based batch annotation¶
- Open the Excel template and remove all unnecessary columns
Empty template with relevant columns only
- Fill all columns with the correct values according to the study setup (see screenshot below). You can also download this filled example file: Filled Excel template example, and copy its values into your own template. Importantly, do not copy the id and name columns.
- Re-import the filled template using the button above the table (to the right of the one you used to get the template) and click on the Batch edit with spreadsheet link.
- Drag and drop the Excel spreadsheet onto the drop zone. A green Successfully updated 4 items of type SEQUENCINGLIBRARY message should appear at the bottom of the page.
- Then navigate back to the sequencing library list page to check that the annotations are correctly updated and consistent with the filled template. Note that your study context should still be active; if not, set it again as previously explained.
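The consistency check described above can also be approximated programmatically before re-importing the template. The following is a minimal sketch that works on rows held as Python dictionaries; the function names and the column names are illustrative assumptions, and in practice the rows would first have to be read from the spreadsheet (for example with a library such as pandas or openpyxl).

```python
def cell(row, column):
    """Return a cell as stripped text, treating a missing value as empty."""
    value = row.get(column)
    return "" if value is None else str(value).strip()


def find_problems(rows, constant_columns, factor_columns):
    """List empty cells, shared annotations that differ, and factors that never vary."""
    problems = []
    for row in rows:
        for column in constant_columns + factor_columns:
            if not cell(row, column):
                problems.append(f"{cell(row, 'name') or '?'}: missing value for {column}")
    for column in constant_columns:
        values = sorted({cell(row, column) for row in rows})
        if len(values) > 1:
            problems.append(f"{column}: expected one shared value, found {values}")
    for column in factor_columns:
        if len({cell(row, column) for row in rows}) < 2:
            problems.append(f"{column}: factor has the same value for every library")
    return problems


# Made-up rows mimicking two columns of the filled template.
good_rows = [
    {"name": "49_KO_Rep1", "Sex": "female", "genetic modification": "gene knock out"},
    {"name": "49_KO_Rep2", "Sex": "female", "genetic modification": "gene knock out"},
    {"name": "49_WT_Rep1", "Sex": "female", "genetic modification": "none"},
    {"name": "49_WT_Rep2", "Sex": "female", "genetic modification": "none"},
]
bad_rows = [dict(row) for row in good_rows]
bad_rows[3]["Sex"] = ""  # simulate a forgotten cell

print(find_problems(good_rows, ["Sex"], ["genetic modification"]))
for problem in find_problems(bad_rows, ["Sex"], ["genetic modification"]):
    print(problem)
```

In this tutorial every annotation except the two experimental factors is expected to be identical across the four libraries, which is why the sketch treats the two groups of columns differently.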
Step 4. Link protocols¶
Link protocols directly to assay input samples
Protocols must be directly connected to the sequencing libraries used as assay input.
We are not (yet) crawling upstream/downstream relationships to infer missing protocol links.
4.1 Mandatory vs recommended protocols¶
For NGS submissions, each sequencing library should include at least:
- Mandatory
  - Culture & Growth
  - NGS Library Preparation
- Recommended
  - Extraction
  - Additional context protocols when relevant
Requirements can vary depending on the target repository profile (ENA, EGA, BioStudies).
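The mandatory and recommended rule above can be expressed as a small check. This is a sketch only: the type names are those listed in this section, while the function name and the idea of passing the protocol types linked to one library as a plain list are assumptions made for illustration.

```python
# Protocol types as listed in this section; the target repository
# profile may require a different set.
MANDATORY_TYPES = ["Culture & Growth", "NGS Library Preparation"]
RECOMMENDED_TYPES = ["Extraction"]


def check_protocol_types(linked_types):
    """Return (missing mandatory types, missing recommended types) for one library."""
    missing_mandatory = [t for t in MANDATORY_TYPES if t not in linked_types]
    missing_recommended = [t for t in RECOMMENDED_TYPES if t not in linked_types]
    return missing_mandatory, missing_recommended


# Types of the four protocols linked in this tutorial, in experimental order.
tutorial_types = [
    "Culture & Growth",
    "Culture & Growth",
    "Extraction",
    "NGS Library Preparation",
]
print(check_protocol_types(tutorial_types))
print(check_protocol_types(["Extraction"]))
```

A library with a missing mandatory type would be expected to show up in the validation report, whereas a missing recommended type is a prompt to consider adding more context.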
4.2 Protocol summary vs description¶
When filling protocol details:
- Summary: concise, export-ready method text (used in exports)
- Description: extended details, rich text, optional deeper notes
4.3 Protocols used in this tutorial¶
The following protocols should already exist as public protocols in LabID (created before training):
- Culture & Growth: Neuron Isolation from Mouse Brain
- Culture & Growth: Primary Neuron Culture with Targeted Knockout of Brpf1
- Extraction: RNA Extraction with RNeasy Mini kit
- NGS Library Preparation: mRNA Library Preparation with NEBNext Ultra II Directional RNA Library Prep Kit for Illumina
- Sequencing: Illumina HiSeq 4000 Sequencing (Paired-End)
About sequencing protocol creation
The sequencing protocol is typically standardized in export and may be generated automatically in the MAGE-TAB output. In practice, you usually do not need to manually create it during the training.
Create your own protocols
You may also download the protocol text used in this tutorial and create your own protocols. If you do, add your trainee number XX to each protocol name so that your protocols are not confused with those used by other trainees and are easy to find in drop-down lists.
- Select your 4 libraries in the sequencing library list page using the checkboxes on the left of the table, and click the batch annotate icon available in the table tools.
- Select the ProtocolList attribute, and start adding protocols one by one using the "... or add a protocol to the list" field. Add protocols following the experimental order.
- Add the four protocols as shown below
- Click Save to confirm the update and then exit the batch edit mode by clicking the Done button.
- Congratulations, you have now linked the relevant protocols to all your sequencing libraries in one batch operation, and they are ready for export with the correct protocol information.
Step 5. Export MAGE-TAB to validate your study¶
At this stage, your libraries should resemble the example below.
If this is the case, you are ready to export MAGE-TAB and review the validation report.
- You most likely need to set up the CLI for your trainee user. If needed, first complete CLI - 101 Getting Started.
- Run the MAGE-TAB export command from your working directory
Replace <study_uuid> with your study UUID copied from the UI (a UUID looks like 114c10ae-9302-4a3a-b6c9-86af435dac7f),
and <output_dir> with the path to the directory where you want to save the exported files (for example . will use the current dir).
This creates two files in the export directory: magetab.txt (Download example), the MAGE-TAB table, and report_<study_uuid>.txt (Download example), the validation report (also shown below).
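If the export does not find your study, one thing worth checking is whether the value pasted for <study_uuid> is a well-formed UUID. The sketch below uses Python's standard uuid module; the function name is illustrative and the check is independent of the LabID CLI.

```python
import uuid


def is_valid_uuid(text):
    """Return True if text is a UUID in the canonical hyphenated form."""
    candidate = text.strip().lower()
    try:
        # uuid.UUID also accepts other notations (for example without
        # hyphens), so compare against the canonical string form.
        return str(uuid.UUID(candidate)) == candidate
    except ValueError:
        return False


print(is_valid_uuid("114c10ae-9302-4a3a-b6c9-86af435dac7f"))
print(is_valid_uuid("<study_uuid>"))
```

The second call illustrates the most common slip, leaving the placeholder in the command instead of replacing it.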
- Take a minute to inspect the validation report. The report features four sections (Study, Protocol, Data and Annotations), each with an overall Status. The status should be PASS if your study is correctly configured; otherwise, error messages will point you to the issues to fix in LabID before exporting again.
Validation report (PNG view)
- In any case, open magetab.txt in Excel (or equivalent) and review both the IDF section (study-level metadata, at the top of the file) and the SDRF section (sample/library/dataset rows, at the bottom of the file).
In particular, scan the SDRF section for missing or incorrect values and for inconsistencies between rows. If you find issues:
- Correct them in LabID
- Re-export MAGE-TAB
- Re-check report and table until clean
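Scanning the SDRF section for empty cells can be partly automated. The sketch below assumes that magetab.txt is tab-separated and that the SDRF block starts at the row whose first column is Source Name, as in standard SDRF tables; the exact layout of the LabID export is an assumption, and the sample text is made up for the demonstration.

```python
import csv


def split_magetab(text):
    """Split the file into IDF lines and SDRF lines at the 'Source Name' header row."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line.startswith("Source Name"):
            return lines[:index], lines[index:]
    return lines, []


def empty_sdrf_cells(sdrf_lines):
    """Return (row number, column name) for every empty cell in the SDRF rows."""
    reader = csv.reader(sdrf_lines, delimiter="\t")
    header = next(reader, [])
    findings = []
    row_number = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        row_number += 1
        for position, column in enumerate(header):
            value = row[position] if position < len(row) else ""
            if not value.strip():
                findings.append((row_number, column))
    return findings


# Made-up miniature file, for illustration only.
sample = (
    "Investigation Title\tDemo study\n"
    "\n"
    "Source Name\tCharacteristics[sex]\tFactor Value[genetic modification]\n"
    "49_KO_Rep1\tfemale\tgene knock out\n"
    "49_WT_Rep1\t\tnone\n"
)
idf_lines, sdrf_lines = split_magetab(sample)
print(empty_sdrf_cells(sdrf_lines))
```

An empty cell is not always an error, so the output is best read as a list of places to look at rather than a verdict.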
Learning more about MAGE-TAB
The content and format of the MAGE-TAB file is quite intuitive, but if you want to learn more about it, you can refer to the following resources:
- See the old tab2mage documentation (ancestor of MAGE-TAB) for details on the format and expected content of the IDF and SDRF sections
- An AI chatbot should also be able to help you understand the content of the MAGE-TAB file if needed
Step 6. Submit to ENA using XML format¶
Once MAGE-TAB review is satisfactory, you may proceed to submitting the data to the ENA. The ENA submission requires generating specific XML files and submitting them through the ENA Webin tool.
Here we use the open-source ena-upload-cli tool, which accepts simple tabulated files, generates the XML for you and submits it directly to ENA Webin.
6.1 Export tables suitable for the ENA upload CLI tool¶
We first export the study as a set of tables suitable for the ENA upload CLI tool. This is achieved by running a command similar to the MAGE-TAB export, but with a different format option:
This command should generate four tabular files:
- ena_study_<study_uuid>.txt (Download example)
- ena_sample_<study_uuid>.txt (Download example)
- ena_experiment_<study_uuid>.txt (Download example)
- ena_run_<study_uuid>.txt (Download example)
Take a minute to inspect these files.
Note that it is always possible to manually amend them if needed, but ideally they should be correct as is. If you find issues, it is better to fix them in LabID and re-export the tables rather than manually amending them, to avoid inconsistencies between the source of truth (LabID) and the exported files.
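One inspection that is easy to script is checking that the four tables reference each other consistently. The sketch below assumes tab-separated files with the column names used by the ena-upload-cli templates (alias, sample_alias, experiment_alias); treat these names as assumptions and compare them with the header row of your own exports. The demonstration rows are made up.

```python
import csv


def read_table(path):
    """Load a tab-separated table as a list of dictionaries keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def dangling_references(child_rows, child_column, parent_rows, parent_column="alias"):
    """Return values of child_column that match no parent_column value."""
    known = {row[parent_column] for row in parent_rows}
    return sorted({row[child_column] for row in child_rows} - known)


# In-memory rows standing in for the exported sample, experiment and run tables.
samples = [{"alias": "49_KO_Rep1"}, {"alias": "49_WT_Rep1"}]
experiments = [
    {"alias": "exp_KO_1", "sample_alias": "49_KO_Rep1"},
    {"alias": "exp_WT_1", "sample_alias": "49_WT_Rep1"},
]
runs = [
    {"alias": "run_KO_1", "experiment_alias": "exp_KO_1"},
    {"alias": "run_WT_1", "experiment_alias": "exp_WT_2"},  # deliberate mismatch
]

print(dangling_references(experiments, "sample_alias", samples))
print(dangling_references(runs, "experiment_alias", experiments))
```

With real exports, the in-memory lists would be replaced by calls to `read_table` on the sample, experiment and run files. A `KeyError` would indicate that the assumed column names do not match the exported headers.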
6.2 Submit XML to ENA¶
The XML generation and submission to ENA Webin can then be done in one step using the labid export ena command, which accepts the generated tables as input and submits the generated XML directly to ENA Webin using the provided credentials (-f enawebin.yaml). Note that YOUR_CENTER should be replaced by the center name provided by ENA Webin (for example "EMBL").
What is the format of the enawebin.yaml file?
The enawebin.yaml file is a simple text file containing the credentials for ENA Webin submission, including the username and password.
The format of the file is as follows:
The command would look like this:
When successfully executed, the command should generate the XML files, submit them to ENA Webin and return a submission receipt (receipt.xml) containing the different ENA accession numbers. The input tables are also duplicated and augmented with the ENA accession number, the submission status and date (please check the ena-upload-cli repository for details).
The different XML documents, updated input tables and submission receipt are provided as examples in the following zip archive
For full details and prerequisites (Webin account, upload steps, test server recommendation), follow:
You now have a complete end-to-end workflow to prepare, validate and submit an RNA-seq study from LabID to ENA.