WP4: Data deposition

Workpackage No.: 
WP4
Workpackage Title: 
Data Deposition
Activity Type: 
COORD
Objectives: 
The first objective of this work package is to implement harmonised and compatible data deposition and annotation strategies across ProteomeXchange partners, providing data producers with a single point of submission. The data deposition and exchange workflow in the consortium will be formally defined, agreed, and documented. These efforts will focus on PRIDE, Tranche, and the external partner NCBI Peptidome, reflecting their key roles in the consortium as initial submission points and as the storage location for large raw binary data. The second objective is to ensure that the submitted data is properly, consistently, and sufficiently annotated. New ways of interaction between data repositories and scientific journals will be implemented to ensure the quality and consistency of data annotation, according to the standard required by each journal.
Description: 
Task 1: Definition and implementation of deposition data flow in the consortium
In close collaboration with all ProteomeXchange participants, and after consultation with stakeholders, we will define the ProteomeXchange data deposition workflow. Both PRIDE and Peptidome will be possible initial, single data deposition points, and they will co-ordinate the representation of results and metadata in a relational database/XML representation, with raw data stored in Tranche. We will define the joint ProteomeXchange data format and submission requirements, likely as a thin metadata wrapper around PSI data formats. On successful submission, a standard format file will be generated, containing a ProteomeXchange accession number, metadata, and a private data access option for use by the data owner and reviewers. The file will be sent to the data depositor, who can pass it on to the journal for review purposes. On publication of a manuscript, the associated dataset will be released, and an updated version of the metadata will be issued via the ProteomeXchange RSS notification system, allowing all interested parties, and in particular the “other” repository of the PRIDE/Peptidome pair, to access, process, and import the relevant data. A preliminary format has already been proposed by the PRIDE and PeptideAtlas teams, but it will need to be refined and probably extended to ensure that the relevant metadata can be propagated successfully. The challenges of this data flow are considerable, in particular because of the huge volumes of modern mass spectrometry data. Implementation of a first version of the ProteomeXchange data flow by all partners, planned for month 18, will therefore represent a major milestone of the project. While the initial development will focus on qualitative data, the second development iteration will implement the data flow for quantitative data, taking into account the results of WP2.
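The life cycle described for Task 1 (private on submission, released with updated metadata on publication) can be illustrated with a minimal Python sketch. It is not the ProteomeXchange format, which is still to be defined in this task: the record fields, the accession pattern, the reviewer token, and the function names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field, replace
from enum import Enum
import secrets


class Status(Enum):
    PRIVATE = "private"    # accessible only to the data owner and reviewers
    RELEASED = "released"  # public once the manuscript has been published


@dataclass(frozen=True)
class SubmissionRecord:
    """Hypothetical metadata wrapper issued after a successful submission."""
    accession: str             # placeholder pattern, not the real accession scheme
    receiving_repository: str  # initial submission point: "PRIDE" or "Peptidome"
    raw_data_location: str     # reference to the raw data, e.g. held in Tranche
    metadata: dict
    reviewer_token: str = field(default_factory=lambda: secrets.token_urlsafe(16))
    status: Status = Status.PRIVATE
    version: int = 1


def register_submission(counter, repository, raw_data_location, metadata):
    """Create the private record that is sent back to the data depositor."""
    if repository not in ("PRIDE", "Peptidome"):
        raise ValueError(f"unknown submission point: {repository}")
    accession = f"EXAMPLE-{counter:06d}"  # hypothetical accession format
    return SubmissionRecord(accession, repository, raw_data_location, dict(metadata))


def release_on_publication(record, publication_reference):
    """Return the released record and an announcement for the notification feed."""
    updated_metadata = dict(record.metadata, publication=publication_reference)
    released = replace(
        record,
        metadata=updated_metadata,
        status=Status.RELEASED,
        version=record.version + 1,
    )
    # Hypothetical announcement payload; the partner repository would use it
    # to locate and import the released dataset.
    announcement = {
        "accession": released.accession,
        "version": released.version,
        "status": released.status.value,
        "source": released.receiving_repository,
    }
    return released, announcement
```

In this sketch the original record is left unchanged and a new version is produced on release, which mirrors the idea of issuing an updated version of the metadata rather than editing the submitted file in place.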
This task involves contributions from all ProteomeXchange participants, reflecting the central role of the data deposition workflow for all partners involved.
Task 2: Implementation of MIAPE/journal validation system
The value of proteomics data without the proper biological and technical background is quite limited. It will therefore be necessary to ensure that all submitted data is annotated in a correct, consistent, and sufficient way. This task will develop tools to validate the compliance of submitted data with the standard PSI MIAPE guidelines (4) or with specific journal requirements. As a result of the validation process, a report on the guideline compliance of each submission will be generated automatically after ProteomeXchange data deposition. As in Task 1, this task will initially focus on qualitative data and then extend to the more complex domain of quantitative data. This task builds on previous experience in the automated generation of MIAPE compliance reports (5) (http://www.proteored.org/Nodes.asp?pmnodo=1) by the group of Juan Pablo Albar (CSIC), on the existing PSI validator jointly developed by several consortium partners (6), and in particular on collaboration with partner Wiley-VCH as the publisher of the journals PROTEOMICS and PROTEOMICS – Clinical Applications.
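The idea of an automatically generated compliance report can be sketched as follows. This is only an illustration in Python under stated assumptions: the rule sets and field names below are hypothetical placeholders, not the actual MIAPE modules or any journal's requirements, and the sketch does not represent the existing PSI validator.

```python
from dataclasses import dataclass

# Hypothetical rule sets: each maps a guideline name to the metadata fields
# it requires. Real MIAPE or journal requirements are far more detailed.
RULE_SETS = {
    "example-guideline": ["instrument", "sample_description", "search_engine"],
    "example-journal": [
        "instrument",
        "sample_description",
        "search_engine",
        "false_discovery_rate",
    ],
}


@dataclass
class ComplianceReport:
    rule_set: str
    present: list
    missing: list

    @property
    def compliant(self):
        # A submission complies only if no required field is missing.
        return not self.missing


def _is_filled(value):
    """Treat None and empty or whitespace-only strings as not provided."""
    return value is not None and str(value).strip() != ""


def check_compliance(metadata, rule_set):
    """Compare submitted metadata against one rule set and build a report."""
    required = RULE_SETS[rule_set]
    present = [name for name in required if _is_filled(metadata.get(name))]
    missing = [name for name in required if name not in present]
    return ComplianceReport(rule_set, present, missing)
```

Keeping the rule sets as data, separate from the checking logic, would allow the same submission to be checked against a general guideline and against a stricter journal-specific set without changing the code.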