Publishing Sequencing Data to ENA

The procedure described below for publishing data to ENA uses the command line client and requires advanced Unix skills. It is the procedure followed by MODIS staff to publish your data.

Warning

ENA does not accept human data except commercially available cell lines. All human data must be submitted to the European Genome-phenome Archive (EGA) instead.

Important

You will need a valid Webin account at ENA before you can publish your data using the procedure described below. Please check the ENA documentation for more information on how to create a Webin account; at the time of writing, the Webin account creation page is here.

Once you have your Webin account, you can proceed with the submission process. We advise creating a file ena.yml that contains the account information in the following form, and storing it in a private location:

username: Webin-XXXXX
password: 123456
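
As a minimal sketch of keeping this file private, the helper below writes the two fields and restricts the file to its owner. The function name write_ena_credentials is hypothetical (it is not part of LabID), and the target path is whatever private location you choose.

```shell
# Sketch: write the Webin credentials to a file only the owner can read.
# write_ena_credentials is a hypothetical helper, not a LabID command.
write_ena_credentials() {
    # $1: target file, $2: Webin username, $3: Webin password
    (
        umask 077    # files created in this subshell get mode 600
        printf 'username: %s\npassword: %s\n' "$2" "$3" > "$1"
    )
    chmod 600 "$1"   # also tighten permissions if the file already existed
}
```

For example, write_ena_credentials "$HOME/private/ena.yml" Webin-XXXXX 123456 would create the file shown above. Note that a password typed on the command line may end up in your shell history; editing the file in a text editor and then running chmod 600 on it avoids this.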

Step 1: Review the study by exporting a MAGE-TAB document

After you have checked the information in LabID and made sure that everything looks correct according to the Data Preparation instructions, we advise exporting the study as a MAGE-TAB document. It is sometimes challenging to get a complete overview of the study information in LabID. The MAGE-TAB document contains all the information about the study, including sample descriptions, protocols, and fastq files, and is therefore a good way to review the study information and spot errors or missing information. It also gives a good idea of what the study will look like to other people.

The command line to export a MAGE-TAB document for a study is as follows:

$ labid export study -s <study_id> --format magetab -o <output_dir> -f magetab.txt

where:

  • <study_id> is the UUID of the study you want to export,
  • <output_dir> is the directory where you want to save the MAGE-TAB document.

Tip

We advise creating a folder for your submission, e.g. ena_submission, and using this folder as the parent of the <output_dir>. This way, you will have all the files related to your submission in one place. Assuming you created a folder called /home/username/ena_submission, the <output_dir> could be /home/username/ena_submission/magetab/.
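
The folder layout suggested in the tips of this page (magetab, fastq and enatables under one parent folder) can be prepared in one go. This is only a sketch; make_submission_dirs is a hypothetical helper name and the parent folder is your choice.

```shell
# Sketch: create the submission folder with one subfolder per step.
# make_submission_dirs is a hypothetical helper, not a LabID command.
make_submission_dirs() {
    # $1: parent folder, e.g. /home/username/ena_submission
    for sub in magetab fastq enatables; do
        mkdir -p "$1/$sub" || return 1
    done
}
```

Calling make_submission_dirs /home/username/ena_submission creates the three subfolders if they do not exist yet and leaves existing ones untouched.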

After a successful export, the exported MAGE-TAB document (named magetab.txt in the example) will be found in the <output_dir> directory. It is a plain tab-delimited text file and can be opened in a spreadsheet editor (e.g., Excel) for further inspection.
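
If you prefer a quick look from the terminal before opening a spreadsheet editor, a sketch like the following prints the first columns of the first lines of a tab-delimited file. The helper name preview_tsv and the limits (4 columns, 20 lines) are arbitrary choices for illustration.

```shell
# Sketch: preview the leading columns of a tab-delimited file.
# preview_tsv is a hypothetical helper, not a LabID command.
preview_tsv() {
    # $1: file to preview, $2: number of leading columns (default 4)
    # cut splits on tabs by default; head limits the output to 20 lines.
    cut -f "1-${2:-4}" "$1" | head -n 20
}
```

For example, preview_tsv <output_dir>/magetab.txt 3 shows the first three columns.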

Additionally, a report_<study_UUID>.txt file is generated in the <output_dir> directory. This file contains information about the export process, including any errors or warnings that were identified during the export. Please take a look at this report file when inspecting the MAGE-TAB document to ensure that everything is correct before proceeding with the data upload. If you spot mistakes, you can correct them in LabID and re-run the export command until you are satisfied with the result.
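
A quick way to scan the report is sketched below. It assumes that problems are flagged with the words "error" or "warning" somewhere on the line; we do not document the exact report format here, so treat the pattern as a starting point and still read the full report. The helper name scan_report is hypothetical.

```shell
# Sketch: list report lines that mention errors or warnings.
# Assumption: flagged lines contain "error" or "warning" (any case).
scan_report() {
    # $1: report file, e.g. <output_dir>/report_<study_UUID>.txt
    if [ ! -r "$1" ]; then
        echo "cannot read report: $1" >&2
        return 2
    fi
    # Print matching lines with their line numbers.
    if grep -n -i -E 'error|warning' "$1"; then
        return 1    # something was flagged
    fi
    return 0        # nothing matched
}
```

The function returns 0 when nothing matched, 1 when flagged lines were printed, and 2 when the file could not be read.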

Important

Submission to ENA using the default sample profile (which is used in LabID) requires the collection date and geographic location (country and/or sea) of each sample. These are collected from the LabID annotations named Collection Date and Country and/or Sea, respectively. When your samples do not carry these annotations in LabID, the ENA export procedure adds them with default values (not provided and not applicable, respectively), but these defaults will not be visible in the exported MAGE-TAB. If you have sensible values for these annotations, please make sure to annotate your samples with Collection Date and Country and/or Sea in LabID.
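
Because the defaults do not show up in the MAGE-TAB, one rough check is to search the exported document for the two annotation names. This sketch assumes that annotations set in LabID appear in the MAGE-TAB under their LabID names, which we have not verified here; annotation_in_magetab is a hypothetical helper.

```shell
# Sketch: test whether an annotation name occurs in the exported MAGE-TAB.
# Assumption: annotated samples show the annotation name in the document.
annotation_in_magetab() {
    # $1: MAGE-TAB file, $2: annotation name (matched literally, any case)
    grep -q -i -F -- "$2" "$1"
}
```

For example, annotation_in_magetab magetab.txt "Collection Date" || echo "Collection Date not found" prints a reminder when the name is absent, in which case the default value would be submitted.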

Step 2: Upload the FastQ files to your ENA Webin account

This requires you to have previously created a Webin account at ENA.

First, export links to the fastq files to a directory of your choice ("output_dir"):

$ labid export links -s <study_id> -t <output_dir>

where:

  • <study_id> is the UUID of the study you want to export (i.e., the same as in step 1),
  • <output_dir> is the directory where you want to create the symbolic links.

Tip

If you followed our advice to create a folder for your submission, e.g. /home/username/ena_submission, the <output_dir> would be /home/username/ena_submission/fastq/.
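
Since the export creates symbolic links rather than copies, it can be worth confirming that every link still points to an existing file before starting a long upload. The following is a sketch only; list_broken_links is a hypothetical helper name.

```shell
# Sketch: print symbolic links in a directory whose target no longer exists.
# list_broken_links is a hypothetical helper, not a LabID command.
list_broken_links() {
    # $1: directory containing the exported links
    # "test -e" follows the link, so it fails for dangling links.
    find "$1" -type l ! -exec test -e {} \; -print
}
```

An empty output from list_broken_links <output_dir> means all links resolve.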

Then upload the fastq files to your ENA Webin account over FTP. Here we use the lftp FTP client (note that at EMBL, one can use module load lftp on any server). As the FTP upload can take a long time, we advise using a screen session (or equivalent):

 # go to the directory where the symbolic links have been exported e.g. /home/username/ena_submission/fastq/
 $ cd <output_dir>
 # start a screen session
 $ screen -S ena-upload 
 # connect to the ENA FTP server with your Webin account
 $ lftp webin2.ebi.ac.uk -u Webin-xxxx
 # -> enter your password at the prompt
 # at the lftp prompt, upload all fastq files (assuming they are named xxx.fastq.gz)
 $ mput *gz
 # exit lftp
 $ bye

Step 3: Export the study as ENA tables

This step exports the same information as contained in the MAGE-TAB document, but as simple tables suitable for the ena-upload-cli tool. The tables are exported to a directory of your choice ("output_dir"):

$ labid export study --study-type <study_type> -s <study_id> --format enatables -o <output_dir>

where:

  • <study_id> is the UUID of the study you want to export (i.e., the same as in step 1),
  • <output_dir> is the directory where you want to export the tables,
  • <study_type> is the type of study you want to publish (e.g., RNA-seq, ChIP-seq, etc.); see the help for options (labid export study --help).

Tip

If you followed our advice to create a folder for your submission, e.g. /home/username/ena_submission, the <output_dir> would be /home/username/ena_submission/enatables/.

After successful export, four exported tables should be found in the <output_dir> directory, and named ena_study_<study_id>.txt, ena_sample_<study_id>.txt, ena_experiment_<study_id>.txt, and ena_run_<study_id>.txt.
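
Before moving on, the presence of the four tables can be checked with a small loop. This is a sketch; check_ena_tables is a hypothetical helper, and the file names follow the naming pattern given above.

```shell
# Sketch: verify that the four ENA tables exist and are not empty.
# check_ena_tables is a hypothetical helper, not a LabID command.
check_ena_tables() {
    # $1: directory holding the tables, $2: study UUID
    missing=0
    for kind in study sample experiment run; do
        f="$1/ena_${kind}_$2.txt"
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

The function returns 0 when all four files are present and non-empty, and 1 otherwise, listing each problem file on standard error.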

Step 4: Upload the ENA tables as XML documents to ENA

The following LabID command is used to:

  1. call the ena-upload-cli tool with the tables exported in the previous step. The ena-upload-cli tool will generate the XML documents and upload them to ENA.
  2. register the accession numbers found in the response (ENA Study, ENA Dataset & samples, as well as BioSamples accessions) back in LabID. This only happens when the XML submission is successful.

$ labid export ena -f enawebin.yaml -c <YOUR_CENTER> -s <study_id> -d <output_dir> --no-data-upload --execute

where:

  • <YOUR_CENTER> is the name of your center as associated with your Webin account,
  • <study_id> is the UUID of the study you want to publish,
  • <output_dir> is the directory where the ENA tables are located.
  • --no-data-upload is used to skip the data upload step, as this has already been done in step 2.
  • --execute submits the data to the ENA production server for real. Without this option, the submission goes to the ENA test server; we highly advise validating the submission against the test server before submitting to the production server.
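
The "test server first, then production" advice can be wrapped in a small helper, sketched below. The function name submit_to_ena and the confirmation prompt are our own illustration; the sketch also assumes that labid exits with a non-zero status when the submission fails, which you should confirm in your environment.

```shell
# Sketch: submit to the ENA test server first, then ask before going live.
# submit_to_ena is a hypothetical wrapper around the command shown above.
# Assumption: labid returns a non-zero exit status on failure.
submit_to_ena() {
    # $1: file passed to -f, $2: center name, $3: study UUID, $4: tables dir
    # First pass without --execute: goes to the ENA test server.
    labid export ena -f "$1" -c "$2" -s "$3" -d "$4" --no-data-upload || {
        echo "test submission failed; not submitting to production" >&2
        return 1
    }
    printf 'Test submission finished. Submit to production? [y/N] ' >&2
    read -r answer
    case "$answer" in
        y|Y) labid export ena -f "$1" -c "$2" -s "$3" -d "$4" --no-data-upload --execute ;;
        *)   echo "production submission skipped" >&2 ;;
    esac
}
```

For example, submit_to_ena enawebin.yaml <YOUR_CENTER> <study_id> <output_dir> runs the test submission, and only repeats the command with --execute after you answer y.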

Important

The command expects the ena-upload-cli tool to be in your $PATH. If it is not, or if you want to use a specific version of the tool, you can specify the path to the tool using the --ena-upload-cli option.
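
To find out which case applies to you, a check along these lines can be run before the submission. It is a sketch; locate_ena_upload_cli is a hypothetical helper name.

```shell
# Sketch: report where ena-upload-cli is found in $PATH, if anywhere.
# locate_ena_upload_cli is a hypothetical helper, not a LabID command.
locate_ena_upload_cli() {
    if path=$(command -v ena-upload-cli); then
        echo "found: $path"
    else
        echo "ena-upload-cli not found in PATH; pass its location with --ena-upload-cli" >&2
        return 1
    fi
}
```

When the tool is reported as missing, or the reported location is not the version you want, give the desired location to labid export ena through the --ena-upload-cli option.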