WP2: Standards development
The work package will develop and maintain the standard formats required for capturing and sharing data from mass-spectrometry (MS) based proteomics studies, building upon the efforts of the HUPO PSI. At the moment, the PSI is developing four XML-based data formats, each of which captures a different component of the proteomics workflow: mzML (mass spectra, version 1.1), TraML (transitions from multiple/selected reaction monitoring experiments, version 0.9), mzIdentML (peptide and protein identifications, version 1.0) and mzQuantML (quantification of peptides and proteins from mass spectra, initial requirements drafted). The main aim of this work package is to develop and release mzQuantML to capture data from all main quantitative methods in mass spectrometry proteomics. The WP will also contribute to the ongoing maintenance of the released versions of mzML, TraML and mzIdentML, helping with bug fixes, updates to controlled vocabularies and documentation.
Description of work
Task 1: Development of mzQuantML
The WP will drive the development of the XML Schema (XSD) for mzQuantML in the context of the PSI Proteome Informatics (PSI-PI) work group (http://www.psidev.info/index.php?q=node/319). Developing a standard format for quantitative data is a considerable task, owing to the range of methods employed for quantifying peptides and proteins from mass spectra, for example differential labeling of mixed samples or label-free analysis of parallel runs (2). At the most recent PSI spring meeting (April 2009), it was decided that mzQuantML would be developed in a modular manner. First, a core format will be defined to capture overall relative or absolute values of peptides and proteins, and to relate these values back to the originating samples. Small modular extensions of the core will then be released to capture the required metadata describing how the values were calculated. It is generally accepted that standard formats are difficult to define until a particular technology or methodology has stabilized. As such, modules will be released first for well-established quantitative techniques (Level 1: spectral counting, differential isotope labeling). The technologies that are more challenging to represent (Level 2: label-free analysis based on extracted ion chromatograms) will be developed next. New methods for MS-based quantitation are reported on a monthly basis. Those methods that become well established in the coming years will be developed as Level 3 of mzQuantML.
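The modular design described above (a core that records values and links them to samples, plus optional modules that record how the values were calculated) can be illustrated with a minimal sketch. All class, field and function names below are hypothetical and chosen only for illustration; they are not taken from the mzQuantML schema, which had not been defined at the time of writing.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical names throughout: this is NOT the mzQuantML schema,
# only an illustration of the "core plus modular extensions" idea.


@dataclass
class QuantValue:
    """Core: one relative or absolute value tied to its originating sample."""
    entity_id: str    # peptide or protein identifier
    entity_type: str  # "peptide" or "protein"
    sample_id: str    # the sample this value relates back to
    value: float
    kind: str         # "relative" or "absolute"


@dataclass
class MethodModule:
    """Extension: metadata describing how the values were calculated."""
    level: int        # 1, 2 or 3, following the release levels in the text
    technique: str    # e.g. "spectral counting"
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class QuantDocument:
    """A core document; the method module is optional by design."""
    values: List[QuantValue] = field(default_factory=list)
    module: Optional[MethodModule] = None

    def values_for_sample(self, sample_id: str) -> List[QuantValue]:
        """Return every core value that originates from the given sample."""
        return [v for v in self.values if v.sample_id == sample_id]


def example_document() -> QuantDocument:
    """Build a small illustrative document with made-up identifiers."""
    return QuantDocument(
        values=[
            QuantValue("PROT_1", "protein", "sample_A", 12.0, "absolute"),
            QuantValue("PROT_1", "protein", "sample_B", 30.0, "absolute"),
        ],
        module=MethodModule(level=1, technique="spectral counting"),
    )
```

In this sketch a document remains valid without any module attached, which mirrors the intention that the core can be released and implemented before the technique-specific extensions are available.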
The main development activities for mzQuantML include: (i) interacting with bench scientists and reading the literature extensively to define accurate community requirements and a comprehensive set of use cases; (ii) building drafts of the schema, a supporting controlled vocabulary and example files to exercise each use case; (iii) coordinating regular conference calls with the other stakeholders (open-source software developers, commercial software vendors and proteome informatics researchers) to ensure that the PSI-PI works as a coherent development team; (iv) defining and adhering to a strict release schedule for modules to allow software developers to schedule implementation of mzQuantML; (v) presenting at conferences and development meetings to receive feedback from those not engaged in PSI, including laboratory scientists.
Task 2: Maintenance of mzML, TraML and mzIdentML
The work package will also contribute to the routine maintenance of the already released versions of the standards mzML, TraML and mzIdentML. In this context, regular updates to the PSI MS controlled vocabulary (http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MS) will be made to accommodate, for instance, new experimental technologies or new software platforms. Documentation will also need to be updated periodically. It is expected that the majority of the maintenance work will be devoted to mzIdentML, developed by the PSI-PI working group, since it is the most recently released standard. Updates to mzML and TraML, developed by the PSI mass spectrometry (PSI-MS) working group, should be less time-consuming since these formats are currently much more stable.
WP2 will be coordinated by Dr. Andy Jones, from the University of Liverpool, who chairs the PSI-PI working group. AJ will attend PSI-PI meetings and conference calls, ensuring that development milestones are met.
All participants contribute significant resources to this work package, to ensure that the developed standards meet the needs of the ProteomeXchange consortium and the wider community. The significant overall resource investment in this work package aims to overcome a limiting factor in standards development, namely the irregular progress caused by the voluntary nature of contributions, which dominated the first years of the PSI's existence. With dedicated funding support from the recently completed EU ProDaC grant, we managed to stratify the PSI standards development and deliver standards to a predefined timetable.