June 2004 Newsletter

Subsystem Development and the Project to Annotate 1000 Genomes

The major SEED development effort for the last two months has centered on annotation of subsystems. The basic goal is to develop and support the tools needed by an expert to annotate/analyze a specific metabolic or nonmetabolic subsystem over a large collection of genomes. We believe that development of these tools and the framework needed to effectively support comparative analysis is essential to straightening out the existing annotations.

The initial set of tools was released in March, and a small group of people began using them. The main objective was simply to make them reasonably reliable, to identify exactly what functionality was most needed, and to establish a framework for saving and exchanging results. By May, we had demonstrated the basic utility of the tools. In at least two cases, the subsystem annotations that were developed during the shakedown phase resulted in serious conjectures relating to "missing genes". One of these conjectures has evidently been proven wrong, but it did result in a rapid identification of the correct gene (this is a tale best told over beers).

On June 18, we had a 1-day tutorial on use of the SEED to annotate subsystems, and it was a remarkably pleasant event. We had close to 30 people attend. The main complaint was "so little time, so much to do... we need two days". Hence, in the future, subsystem/SEED tutorials will be 2-day affairs. The next one will occur on July 5-6 at the University of Bielefeld in Germany, and it will be held as part of the SEED Developers Meeting (which will be July 5-9). We are planning on having one at Los Alamos in August, if schedules permit, and one in Boston in the fall. Planning for these last two needs to progress, and we will try to keep everyone posted.

There is a great deal that could be said about the significance of the subsystem annotation effort. While many SEED users and developers are more interested in annotating single genomes or using the SEED for other purposes, we at FIG view the annotation of subsystems as the essential core of The Project to Annotate 1000 Genomes, and as such it is one of the most critical and exciting components of the SEED collaboration. We believe that this will become obvious and accepted during the next 4-6 months. Time will tell.

SEED Developers Meeting

The next major SEED event will be the meeting at Bielefeld on July 5-9. There are a number of goals for that gathering:

  1. We will hold the subsystems tutorial on the first 2 days, and it appears that a number of people will attend just that component of the gathering.
  2. Ross is supposed to teach a class for graduate students during the week. There will clearly be a major overlap in content between the subsystems tutorial and the class (which will, hopefully, include sections on searching for missing genes and where the technology is going).
  3. From Wednesday through Friday, we plan on focusing on issues relating to supporting the development and maintenance of a set of communicating systems. Initially, we will focus on GenDB (the system developed and supported by the team at Bielefeld) and the SEED (the system FIG/ANL/Burnham have been developing). Both systems are open source, both groups are deeply interested in creating a joint framework that supports effective use of a loose integration of the systems, and both groups are forging ahead as quickly as possible.

The Bielefeld team has developed and is now shaking down a web service that will call genes in prokaryotic genomes. We have found this extremely useful in our efforts to produce a new release of the SEED. Ralph Butler is working on a tool to help predict more accurate start locations for CDSs, and we are slowly improving our installed genomes. Of course, we are taking RefSeq from NCBI, and they are making substantial and constant progress in cleaning up their collection. The issues of how everything fits together, how users can install their own data on local copies of the SEED, and how exchange of data occurs will all be major topics at Bielefeld.

Funding of SEED Development

Until very recently, support for development of the SEED came from a few consulting contracts obtained by FIG and from a few subcontracts that were (and are) deeply appreciated. This got FIG through last year, and it got the SEED to the point where it has demonstrable utility.

The basic model for funding SEED development goes as follows:

  1. We hope to have many institutions participating. Each is responsible for getting its own funding.
  2. Any institution that wishes can base projects (and proposals) on the SEED. They may, or may not, include FIG as a participant or subcontractor. No one should feel any pressure to include FIG, since the SEED technology is completely available free of charge to anyone.
  3. FIG has played a leadership role until now because Ross Overbeek did the majority of design and implementation during the first year. Argonne National Lab has begun to play a major role, the University of Chicago is building a major project upon the SEED, and it seems likely that actual leadership of the effort will occur on a largely informal basis and be shared between a growing number of senior participants (or junior, for that matter, if they demonstrate they can do it).
  4. FIG and the Computational Institute at University of Chicago have been awarded a sizable five-year contract to construct a National Pathogen Database. This immediately solidifies the future of the SEED effort, since that project will certainly be based upon SEED technology. It will also ensure that rapid development of new capabilities (most notably inclusion of expression, structural, and variation data) will occur. We encourage other organizations to build upon the SEED (thus amortizing development costs over more projects).

That is all for now. We plan to write a more extensive newsletter after the Bielefeld meeting. There are a number of exciting things that are planned for that gathering, and we anticipate that there will be much to report.

