Frequently Asked Questions


The ProteomeXchange consortium has been set up to provide a single point of submission of MS proteomics data to the main existing proteomics repositories, and to encourage the data exchange between them for optimal data dissemination.

Current members accepting submissions are:

The PRIDE (PRoteomics IDEntifications) database at the European Bioinformatics Institute, focusing mainly on shotgun mass spectrometry proteomics data.

PeptideAtlas/PASSEL focusing on SRM/MRM datasets.

The default PRIDE submission consists of the deposition of MS/MS proteomics datasets according to the guidelines of the ProteomeXchange (PX) consortium. The figure below shows the overall submission process that submitters follow up to the point of uploading their datasets.

[Figure: new PRIDE/PX submission workflow]

Each submitted PX dataset will contain:

- peptide/protein identification files (called ‘RESULT’),

- mass spectrometer output files (called ‘RAW’), which are either machine raw files or not heavily processed files in an XML standard format such as mzXML or mzML,

- other files like peak list files (called ‘PEAK’), search engine output files (called ‘SEARCH’), quantification files (called 'QUANT') and different post-processing files, amongst others.

The current version of the pipeline does not explicitly support quantification results unless they are included in the PRIDE XML files. However, quantification result files can be submitted as accompanying (‘QUANT’) files.

There are two different submission workflows depending on whether peptide/protein identification results can be submitted in a format that can be handled by PRIDE (PRIDE XML at present, the data standard mzIdentML in the near future) or not. If PRIDE XML ‘RESULT’ files are provided a "Complete Submission" option is available. If PRIDE XML files are not available a "Partial Submission" can be done.

Before a submission is started it is necessary to have a PRIDE user account (register here). All submissions to ProteomeXchange are private by default, and the username and password are needed to access your data.

Yes. Private is the default option for ProteomeXchange submissions via PRIDE. The curators only make a dataset public once the submitter has requested it or the paper referencing the dataset has been published.

Two main submission types/workflows are available: Complete Submission or a Partial Submission. For all types of submissions to PX via PRIDE, the user will need to use the PX Submission tool (available at http://www.proteomexchange.org/submission).

The 2 subtypes of Complete Submissions are either mzIdentML- or PRIDE XML-based. Complete Submissions mixing the 2 types of ‘RESULT’ files are not allowed. 

An mzIdentML-based Complete Submission requires 3 types of files:

  • Result files: mzIdentML 1.1 files with identifications provided. In the submission tool they should be tagged as “RESULT”. It is also recommended to check your mzIdentML files before submission using the PRIDE Inspector tool (the version supporting mzIdentML will be out in early January, 2014). mzIdentML version 1.0 files are not supported.
  • Peak list files: Since the mzIdentML files themselves do not contain the spectra information, it is mandatory to provide the peak list files (e.g. mgf files) that were used for the original search and are referenced in the mzIdentML file. These are different from the mandatory raw files. In the submission tool they should be tagged as “PEAK”, and the submission tool will try to automatically map the peak files to the mzIdentML file where they are listed.
  • Raw files: the MS instrument output files, for instance Thermo RAW files. As an alternative, lightly processed mzML, mzXML, mzData files are acceptable if MS1 level spectra information is available and the different peak processing steps are known. In the submission tool they should be tagged as “RAW”.

Please check our Guide to generate mzIdentML files. You may already be using a pipeline/search engine that has mzIdentML among its native output formats. mzIdentML files can already be created/exported with numerous tools; please see a list here.

Besides the three mandatory file types above, there are optional and recommended file types that can be prepared and uploaded as well:

  • Search engine result files: The original output from your search engine or from the analysis pipeline you used for further post-processing, such as Mascot .dat files, Trans Proteomics Pipeline (TPP) pep.xml and/or prot.xml files among many others. If your search engine generated mzIdentML files by default, you have already provided them as "Result" files. The search engine files should contain peptide/protein identification results. In the submission tool they should be tagged as “SEARCH”.
  • Quantification result files: In many cases current mass spectrometry proteomics studies involve a quantitative analysis of the peptides/proteins present in the samples. Quantification related files reporting on peptide/protein quantitative values/ratios can be provided and tagged as "QUANT" in the submission tool.
  • Gel image files: In case two-dimensional gel electrophoresis has been used as a separation method the gel image files can be provided. In the submission tool they should be tagged as ‘GEL’.
  • Other files: Everything else that did not fit into the 6 categories above, for instance protein inference files generated by post-processing of the search engine results or R scripts used for data analysis. If you have used custom search databases you can provide those as well. In the submission tool they should be tagged as ‘OTHER’.

PRIDE XML-based Complete Submission requires 2 types of files:

  • Result files: PRIDE XML files with identifications provided, which are fully supported by PRIDE. In the submission tool they should be tagged as “RESULT”. It is also recommended to check your PRIDE XML files before submission using the PRIDE Inspector tool.
  • Raw files: the MS instrument output files, for instance Thermo RAW files. As an alternative, lightly processed mzML, mzXML, mzData files are acceptable if MS1 level spectra information is available and the different peak processing steps are known. In the submission tool they should be tagged as “RAW” and mapped to the corresponding "RESULT" files.

We recommend creating PRIDE XML files with the PRIDE Converter 2 tool. Please take a moment to review our Guide to generate PRIDE XML files concerning the input files you can use for PRIDE XML generation. Other tools, not maintained by the PRIDE team, can also produce PRIDE XML files, such as PeptideShaker, Waters PLGS, Proteios, EasyProt, MIAPE Extractor (ProteoRed), or the original PRIDE Converter (no longer developed).

Besides the two mandatory file types above, there are optional and recommended file types that can be prepared and uploaded as well:

  • Peak list files: It is strongly recommended to provide the peak list files (e.g. mgf files) that were used for the original search. These are different from the mandatory raw files. In the submission tool they should be tagged as “PEAK”.
  • Search engine result files: the original output from your search engine or your analysis pipeline, such as Mascot .dat files, Trans Proteomics Pipeline (TPP) pep.xml and/or prot.xml files or mzIdentML files, among many others. They should contain the peptide/protein identifications. In the submission tool they should be tagged as “SEARCH”.
  • Quantification result files: In many cases current mass spectrometry proteomics studies involve a quantitative analysis of the peptides/proteins present in the samples. Quantification related files reporting on peptide/protein quantitative values/ratios can be provided and tagged as "QUANT" in the submission tool.
  • Gel image files: In case two-dimensional gel electrophoresis has been used as a separation method the gel image files can be provided. In the submission tool they should be tagged as ‘GEL’.
  • Other files: Everything else that did not fit into the 6 categories above, for instance protein inference files generated by post-processing of the search engine results or R scripts used for data analysis. If you have used custom search databases you can provide those as well. In the submission tool they should be tagged as ‘OTHER’.

In case of a Complete Submission a DOI (Digital Object Identifier) will be assigned to your dataset and its transparency level will be higher. That is good for your data and good for the community.

Two main submission types/workflows are available: Complete Submission or a Partial Submission, as highlighted in Figure 1. For all types of submissions to PX via PRIDE, the user will need to use the PX Submission tool (available at http://www.proteomexchange.org/submission).

You should only choose this option if your search results cannot be converted/exported to PRIDE XML or mzIdentML v1.1 (plus the accompanying spectra). It is not the recommended option, since it will significantly reduce the reusability of your dataset. ‘RAW’ files need to be provided together with search engine output files (‘SEARCH’). Uploading peak list (‘PEAK’), quantification and other types of files (‘QUANT’, ‘GEL’ or ‘OTHER’) is possible but not enforced. As a result, you will be issued with a ProteomeXchange accession number but not with a DOI. Once it is made public, your dataset will be available to download via FTP.

Partial Submission requires 2 types of files:

  • Search engine result files (called ‘SEARCH’): the original output files from your search engine or your analysis pipeline, such as Trans-Proteomic Pipeline (TPP) pep.xml and/or prot.xml files, or MaxQuant text output files, among many others. They should contain the peptide/protein identifications. In the submission tool they should be tagged as ‘SEARCH’.
  • Raw files (called ‘RAW’): MS instrument binary output files, such as Thermo RAW files, Bruker .baf files or not heavily processed mzXML or mzML files. If your ‘RAW’ files are organized in directories instead of individual files, please compress each directory into one individual file (for instance a .zip) before upload. In the submission tool they should be tagged as ‘RAW’.
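The compression step for directory-based ‘RAW’ output can be scripted. The following is a minimal Python sketch, assuming a vendor directory such as an Agilent .d/ folder; the function name `zip_raw_directory` is hypothetical and not part of any PX tool.

```python
import shutil
from pathlib import Path


def zip_raw_directory(raw_dir: str) -> Path:
    """Compress a directory-style RAW dataset into one .zip placed next to it."""
    source = Path(raw_dir)
    if not source.is_dir():
        raise NotADirectoryError(f"{source} is not a directory")
    # Archive the directory itself (not only its contents) so that the
    # vendor folder layout is preserved when the archive is unpacked.
    archive = shutil.make_archive(
        base_name=str(source.parent / source.name),
        format="zip",
        root_dir=source.parent,
        base_dir=source.name,
    )
    return Path(archive)
```

Calling `zip_raw_directory("sample.d")` would produce `sample.d.zip` in the same parent folder, which can then be tagged as ‘RAW’ in the submission tool.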

Besides the two mandatory file types above, there are optional and recommended file types that can be prepared and uploaded as well:

 

  • Peak list files: It is strongly recommended to provide the peak list files (e.g. mgf files) that were used for the original search. These are different from the mandatory raw files. In the submission tool they should be tagged as 'PEAK'.
  • Quantification related files: In many cases current mass spectrometry proteomics studies involve a quantitative analysis of the peptides/proteins present in the samples. Quantification related files reporting on peptide/protein quantitative values/ratios can be provided and tagged as 'QUANT' in the submission tool.
  • Gel image files: In case two-dimensional gel electrophoresis has been used as a separation method the gel image files can be provided. In the submission tool they should be tagged as ‘GEL’.
  • Any other files: Everything else that did not fit into the 5 categories above, for instance protein inference files generated by post-processing of the search engine results or R scripts used for data analysis. If you have used custom search databases you can provide those as well. In the submission tool they should be tagged as ‘OTHER’.

Performing a Partial Submission means that a PX accession number will be assigned to your files but PRIDE experiment accession numbers won't be issued. Also, you won't have a DOI assigned to your dataset.
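The mandatory file types for the three workflows described in this FAQ can be summarised as a small pre-submission check. This is only an illustrative Python sketch: the workflow keys and the function name `missing_tags` are hypothetical, and the PX Submission tool performs its own validation.

```python
# Mandatory tags per workflow, as listed in this FAQ.
REQUIRED_TAGS = {
    "complete_mzidentml": {"RESULT", "PEAK", "RAW"},
    "complete_pride_xml": {"RESULT", "RAW"},
    "partial": {"SEARCH", "RAW"},
}


def missing_tags(submission_type: str, tagged_files: dict[str, str]) -> set[str]:
    """Return the mandatory tags that no file in the submission carries.

    tagged_files maps a file name to the tag chosen for it.
    """
    required = REQUIRED_TAGS[submission_type]
    present = {tag.upper() for tag in tagged_files.values()}
    return required - present
```

For example, `missing_tags("partial", {"run1.raw": "RAW"})` reports that a ‘SEARCH’ file is still needed.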

While the data is still private (during the manuscript review process) it is possible to resubmit the whole dataset while keeping the previously issued PX identifier. Data resubmissions consisting of a subset of the previous submission are not currently supported.

In order to perform a resubmission, you first need to install and launch the PX submission tool (available at http://www.proteomexchange.org/submission).

 Step 1: Click resubmission on the ‘Welcome’ page

Step 2: Enable resubmission and provide resubmission details

In the pop-up dialog box please provide your PRIDE log-in details and select the PX identifier of the dataset you want to resubmit.

 

After these two steps the resubmission follows the same steps described for a regular submission.

Thermo .RAW, ABSCIEX .wiff, .wiff.scan, Agilent .d/, Waters .raw/, imzML, Shimadzu .run/, Bruker .yep, Bruker .baf

As an alternative, lightly processed  mzML, mzXML, mzData files are acceptable if MS1 level spectra information is available and the different peak processing steps are known.

By default we recommend adding the following statement to your manuscript (typically in "Materials and Methods" or just before/in the Acknowledgements):

"The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository [1] with the dataset identifier <PXD000xxx>."

In case of Complete PX Submissions an extra DOI can be added to this:

"The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository [1] with the dataset identifier <PXD000xxx> and DOI 10.6019/<PXD000xxx>."

[1] and also for general PRIDE reference, please use: Vizcaino JA, Cote RG, Csordas A, Dianes JA, Fabregat A, Foster JM, Griss J, Alpi E, Birim M, Contell J, O'Kelly G, Schoenegger A, Ovelleiro D, Perez-Riverol Y, Reisinger F, Rios D, Wang R, Hermjakob H. The Proteomics Identifications (PRIDE) database and associated tools: status in 2013. Nucleic Acids Res. 2013 Jan 1;41(D1):D1063-9. doi: 10.1093/nar/gks1262. Epub 2012 Nov 29. PubMed PMID:23203882.

Additionally we'd like to ask you to also put this information in a much abridged form into the abstract itself, like this: "The data have been deposited to the ProteomeXchange with identifier <PXD000xxx>." See for example this Chromosome-Centric Human Proteome Project dataset and paper: http://www.ncbi.nlm.nih.gov/pubmed/?term=23312004 and other examples on PubMed. A PX Identifier in the abstract makes the dataset much more visible and accessible.
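If you manage several datasets, the statement can be filled in programmatically. The Python sketch below reproduces the wording given above; the function name is hypothetical, and the accession pattern (PXD followed by six digits) is an assumption inferred from the `<PXD000xxx>` placeholder.

```python
import re

PX_CENTRAL = "http://proteomecentral.proteomexchange.org"


def deposition_statement(accession: str, complete: bool = False) -> str:
    """Build the manuscript statement for a PX dataset.

    Set complete=True for a Complete Submission, which also has a DOI.
    """
    # Assumed accession shape: "PXD" plus six digits, e.g. PXD000001.
    if not re.fullmatch(r"PXD\d{6}", accession):
        raise ValueError(f"unexpected accession format: {accession!r}")
    text = (
        "The mass spectrometry proteomics data have been deposited to the "
        f"ProteomeXchange Consortium ({PX_CENTRAL}) via the PRIDE partner "
        f"repository [1] with the dataset identifier {accession}"
    )
    if complete:
        text += f" and DOI 10.6019/{accession}"
    return text + "."
```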

 

For some bulk submissions or for Aspera upload the summary.px file can still be generated and exported with the PX Submission Tool, although the files won't actually be uploaded with the tool itself.

When using the PX tool there is an "Export Summary" button that you can click after you have done the mappings and that will export the summary.px. It is necessary to have all the files available in a folder so this folder will be used to supply all the file names. But the data won't be actually uploaded, only the summary file exported. See screenshot.

You don't have to start the actual file upload that way. You can send that small file to the database curators via email and upload the files via Aspera.

 

 

 

In case of bulk submissions when Aspera is used and there are too many files to handle with the PX Submission Tool GUI, the summary files can be generated by scripting. Details of the tab-delimited PX Submission format can be found in the ProteomeXchange_Submission_Summary_File_Format.pdf that is distributed with the PX Submission Tool and can be downloaded from here. Example submission summary files (Complete and Partial) can be downloaded from here.
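As a starting point for such scripting, the Python sketch below guesses a tag for each file from its extension and writes a tab-delimited listing. It is not the official summary.px layout: the column names and the extension-to-tag mapping are assumptions made for illustration, and the real row structure must be taken from the format PDF and the example files.

```python
import csv
import io
from pathlib import Path

# Hypothetical mapping based on file types mentioned in this FAQ.
EXTENSION_TAGS = {
    ".raw": "RAW", ".wiff": "RAW", ".baf": "RAW", ".yep": "RAW",
    ".mzml": "RAW", ".mzxml": "RAW",
    ".mgf": "PEAK",
    ".dat": "SEARCH",
}


def guess_tag(file_name: str) -> str:
    """Guess a submission tag from the file name; unknown types become OTHER."""
    lower = file_name.lower()
    if lower.endswith(("pep.xml", "prot.xml")):
        return "SEARCH"
    if lower.endswith(".wiff.scan"):
        return "RAW"
    return EXTENSION_TAGS.get(Path(lower).suffix, "OTHER")


def file_listing(file_names: list[str]) -> str:
    """Return a tab-delimited table of (file_id, file_type, file_name) rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["file_id", "file_type", "file_name"])  # assumed columns
    for file_id, name in enumerate(sorted(file_names), start=1):
        writer.writerow([file_id, guess_tag(name), name])
    return buffer.getvalue()
```

The guessed tags should always be reviewed by hand, since, for example, an .xml file may be a ‘RESULT’ or a ‘SEARCH’ file depending on its content.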

No. PRIDE XML files without protein/peptide identifications unfortunately do not qualify for a Complete PX Submission. You will have to investigate the options to provide identifications as well.

If you are unable to produce PRIDE XML files with identifications included, you can still do a Partial/Unsupported PX Submission without any PRIDE XML files. You can provide your identification files as search files and map the raw files to them.

Alternatively, please check whether the software tools you are using can export mzIdentML 1.1; if so, we recommend uploading those files with the accompanying mgf peak list files that are referenced in them. Right now these can only be submitted as a Partial Submission, but soon we will be able to natively support mzIdentML 1.1 as a Complete Submission. So if you have uploaded your peak list files as well, there is a chance that those datasets can be turned into Complete Submissions later by adding result-file-level metadata information.

From version 1.0.4 and up, the proxy details can be set manually in the PX Submission Tool. The tool's working directory contains a 'config' folder with a text file 'config.props'. There the proxy host and port can be set by uncommenting and overwriting the proxy details below:

# px.proxy.host = localhost
# px.proxy.port = 8080

See also screenshot:

[Screenshot: proxy setup]
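For scripted installations the same edit can be automated. The Python sketch below uncomments and overwrites the two proxy lines in the text of config.props; the function name `set_proxy` and the example host are hypothetical, and the sketch assumes the lines look exactly like the commented entries shown above.

```python
import re


def set_proxy(config_text: str, host: str, port: int) -> str:
    """Uncomment and overwrite the px.proxy.* lines in config.props text."""
    values = {"px.proxy.host": host, "px.proxy.port": str(port)}
    updated = []
    for line in config_text.splitlines():
        # Match the key whether or not the line is still commented out.
        match = re.match(r"^\s*#?\s*(px\.proxy\.(?:host|port))\s*=", line)
        if match:
            key = match.group(1)
            line = f"{key} = {values[key]}"
        updated.append(line)
    return "\n".join(updated) + "\n"
```

All other lines of the file are passed through unchanged, so the result can be written back to 'config/config.props' after making a backup copy.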

MaxQuant output is not supported by PRIDE Converter 2, so a Partial ProteomeXchange Submission is recommended, where search/identification files should be uploaded alongside the machine raw files.

If you are using the latest version of MaxQuant (1.3.0.5), a txt folder is generated; by default you can just zip this txt folder and upload it as a search/identification file.

If this is complicated we would recommend uploading the following particular text output files:

parameters.txt

peptides.txt

modifiedPeptides.txt

proteinGroups.txt

and your Experimental Design Template file saved as a tab separated file.
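Collecting just the four recommended text files into one archive can be scripted. This is a minimal Python sketch with a hypothetical function name; the Experimental Design Template file is not included because its file name is chosen by the user.

```python
import zipfile
from pathlib import Path

RECOMMENDED_FILES = (
    "parameters.txt",
    "peptides.txt",
    "modifiedPeptides.txt",
    "proteinGroups.txt",
)


def zip_maxquant_txt(txt_dir: str, archive_path: str) -> list[str]:
    """Zip the recommended MaxQuant text files; return the names not found."""
    folder = Path(txt_dir)
    missing = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in RECOMMENDED_FILES:
            path = folder / name
            if path.is_file():
                # Store files at the top level of the archive.
                archive.write(path, arcname=name)
            else:
                missing.append(name)
    return missing
```

A non-empty return value signals that some of the recommended files were absent from the txt folder and should be checked before upload.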

Scaffold 4.0 can export mzIdentML 1.1 and we recommend uploading those files with the accompanying mgf peak list files that are referenced in them. This way an mzIdentML-based Complete Submission can be done.

For older Scaffold versions we recommend exporting the binary .sf3 result files as ProtXML files and using them for a Partial Submission as "search" files.

 

Technically PRIDE Converter 2 does support Proteome Discoverer .msf files, but only from version 1.2 and up. In case there are problems with the conversion we recommend doing a Partial Submission instead of a Complete Submission. For a Partial Submission, please export human readable pep.xml files from the binary .msf files (Proteome Discoverer can do this) and upload them as search files alongside the .msf files.

For ProteinPilot as search/identification files we strongly recommend providing human readable files besides the binary group file. Please export the group files into xml files using the following command line feature:

"Command Line Control and Open Results

To support users and third-party software vendors that want to integrate ProteinPilot™ Software, it is possible to script searches via command line and decrypt the .group file results into clear XML for full access to all the data it contains."

 Here is a howto on the conversion process from one of the PX submitters:

1. Create a txt file in Notepad entitled say "group2XML_Example.bat.txt" and save it in the ProteinPilot folder (where the group2xml.exe is located).

2. Rename "group2XML_Example.bat.txt" to "group2XML_Example.bat", giving it a Windows batch file extension.

3. Open this batch file in Notepad and type in the following command line instructions:

group2XML.exe XML <full path to the .group file to be converted> <full path to the .xml file the .group file will be converted into>

for instance

 group2XML.exe XML "C:\AB SCIEX\ProteinPilot Data\Results\Example.group" "C:\AB SCIEX\ProteinPilot Data\Results\Example.xml"

The command has the following argument structure: group2XML.exe <Type> <Result.group> <Output.file>

where:

- <Type> specifies the type of output

- <Result.group> is a .group file created by ProteinPilot Software

- <Output.file> is the name of the file to be created 

4. Save and close the file.

5. Double-click on the file to run the conversion.
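When many .group files need converting, the batch-file steps above can be replaced by a short script. The Python sketch below assembles the same `group2XML.exe XML <Result.group> <Output.file>` command; the function names are hypothetical, and it assumes the script is run on Windows from the ProteinPilot folder (or that the full path to group2XML.exe is passed as `exe`).

```python
import subprocess
from pathlib import PureWindowsPath


def group2xml_command(group_file: str, exe: str = "group2XML.exe") -> list[str]:
    """Build the group2XML command; the output sits next to the input as .xml."""
    output = str(PureWindowsPath(group_file).with_suffix(".xml"))
    return [exe, "XML", group_file, output]


def convert_group(group_file: str, exe: str = "group2XML.exe") -> None:
    """Run the conversion and raise an error if the tool exits with a failure."""
    # Passing a list avoids quoting problems with spaces in Windows paths.
    subprocess.run(group2xml_command(group_file, exe), check=True)
```

Looping `convert_group` over a list of .group paths converts them one after another.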

 

Increasingly, researchers are investigating samples with different omics methodologies, for instance RNA-seq for transcriptomics, mass spectrometry for proteomics, and mass spectrometry for metabolomics. In these cases it is recommended to upload the datasets to the particular assay databases and cross-reference the issued accession numbers in the particular database entries. To avoid a vicious circle, for PX Submissions it is not necessary to provide the database accessions of other omics datasets related to the same project/paper upon submission; those can be provided later, when the dataset is made public.