July 2004 Newsletter
- The SEED Developers Meeting in Bielefeld
- Subsystem Tutorials
- The Subsystems Forum
- Gene Calling
- Computational Servers
The SEED Developers Meeting in Bielefeld
We have just returned from the SEED Developers meeting held in Bielefeld, Germany on July 5-9. It was a remarkably good experience. There were three distinct events going on:
- On July 5-6 a tutorial was held on how to annotate subsystems using the SEED. Participants from Germany, France, Denmark, the Netherlands, Poland, and Russia were there. It was a productive exchange, and it seems very likely that a number of expert biologists will complete their versions of subsystems and publish them to the clearinghouse.
- On July 5-9, Ross taught a course in use of the SEED to a number of students and post-docs from Bielefeld. It has become apparent that the right way to teach a course about use of the SEED is to focus on development of subsystems. Before we had the subsystems-encoding component of the SEED operational, we taught several courses in use of the SEED for comparative analysis, and we sincerely believe they were quite successful. However, it is now clear that developing a 4-lecture course with assignments focusing on the topic of encoding subsystems is needed badly; this is clearly the emphasis that should exist in a "short module" that could be included in standard courses in molecular biology, biochemistry, and microbiology.
- Finally, on July 7-9 a major effort was initiated to link the SEED with GenDB. GenDB is a serious annotation system developed by the team at Bielefeld. A substantial amount of effort and resources has gone into a system that offers many of the features needed to carefully annotate prokaryotic genomes. Many of these features are missing in the SEED. On the other hand, the SEED offers many services in comparative analysis which are missing in GenDB. The systems complement one another in several ways. Hence, we decided to consider the question "Can they easily be coupled?" The team at Bielefeld had already spent time developing a detailed protocol for linking major components of their system (allowing almost independent development, with each component having access to the data maintained by other components). In short, they had thought about the general issues that inevitably come up in such exercises. Within the 3-day session:
- GenDB was installed at Argonne (the SEED was already running on the Bielefeld server under Solaris).
- The access protocols were rapidly installed by the Bielefeld team, which allowed GenDB to access data stored in the SEED. This allowed a user running GenDB to "import" one of the many genomes stored in the SEED into GenDB. It is hard to convey the level of enthusiasm that led to this rapid integration. Among other things, this will allow users to perform detailed editing of gene locations (adding missed genes, adjusting starts, deleting genes that are apparently not real, etc.). This type of detailed editing will be necessary for work with many types of features (e.g., transposons and regulatory sites), and the SEED simply lacks the capabilities to do it properly at this point.
- Detailed plans for modifying the SEED to accept updates relating to features, to smooth out the details of gene calling, and to get computational servers up and running (at both Bielefeld and Argonne) were all discussed. It seems very likely that we will be able to distribute DVDs containing both systems, which we believe will be a major step forward for support of annotation projects, within just a few months.
- Daily meetings over the internet using the Access Grid supported coordination among the Argonne, FIG, and Bielefeld teams. These will continue on a weekly basis.
A great deal was accomplished in a very short time. The only known failure related to Ross' promising to locate a "missing gene" in Corynebacterium glutamicum. He concedes that the group could not find one in time, but still insists that one will result from the exercises that people began. His optimism may exceed his judgement.
It was overall a truly magnificent meeting.
Subsystem Tutorials
The 2-day tutorial held in Bielefeld was completely different from the 1-day one held in Chicago. Although both events were fun and productive, the addition of the extra day allowed people to get much deeper into the topic. The plan called for people to start working on their subsystems of interest by noon of the first day, which allowed them to confront central issues more quickly. By the end of the second day, everyone was fairly far along, and a few had relatively large spreadsheets. While it is certainly true that it takes months (if not years) to accurately analyze most of these systems, it is also true that the existing functional assignments can be dramatically improved with just a day or two of effort by someone interested in participating in this project.
We now believe that annotation of subsystems needs to be the major focus of our efforts over the coming 4-6 months. We must extend the software, offer as many tutorials as possible, and actively seek experts who are willing to participate. The next subsystems tutorial will be held at Los Alamos on August 26-27. Two more are in the very loose planning stage, but we expect to be able to clarify those plans by the time we send the next newsletter.
The Subsystems Forum
Over beers, a number of us discussed the issue of recording the events that will be taking place over the next 2-3 years. If the genes implementing the central functional roles of cellular subsystems are actually worked out, it would be desirable to keep an electronic record of the key events. In addition, it would be nice to have a means of rapidly exchanging information on the status of conjectures, wet lab confirmations, and so forth. Hence, we funded the development of a "Subsystems Forum", which was started just before the Bielefeld meeting. It will be hosted at the University of Chicago. We are urging those people developing annotated subsystems to post both questions and "open problems". The URL for the forum is http://TheSEED.uchicago.edu/SubsystemsForum
Gene Calling
The subject of gene calling was discussed at the developers meeting. The Bielefeld team has put up a server which already calls protein-encoding genes and tRNAs. It can be accessed via https://www.cebitec.uni-bielefeld.de/software/gendb/cgi-bin/seed_upload.cgi
The general plan is as follows:
- We will develop the tools to construct a SEED-formatted organism from the output of the Bielefeld gene calling server, including the rRNA caller developed by Niels Larsen.
- GenDB offers a framework for examining and evaluating the relative merits of different gene calling algorithms. We will work on making it possible to easily compare the output from different systems.
- Ralph Butler and Ross will be working on a system to re-call starts for genes from a single subsystem.
- It is important that, if we succeed in improving calls substantially, we make the results available to NCBI so that the improvements are not lost.
While the existing gene calls are excellent for some organisms, they are truly awful for others. Hopefully we can make rapid progress in improving the situation over the coming months.
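To make the idea of comparing the output from different gene callers concrete, here is a minimal sketch in Python. It is not the GenDB framework or any SEED tool; the `GeneCall` record, the `compare_calls` function, and the example coordinates are all hypothetical and exist only for illustration. The sketch assumes that two calls on the same contig and strand that end at the same stop position refer to the same gene, so that a disagreement in the start shows up as a "different start" rather than as two unrelated genes.

```python
from typing import NamedTuple


class GeneCall(NamedTuple):
    """One called gene (hypothetical record, not a SEED or GenDB format)."""
    contig: str
    start: int   # first base of the gene, in the direction of translation
    stop: int    # last base of the gene, in the direction of translation
    strand: str  # "+" or "-"


def stop_key(call):
    # Assumption: same contig, strand, and stop position means "same gene",
    # even when the two callers disagree about the start.
    return (call.contig, call.strand, call.stop)


def compare_calls(calls_a, calls_b):
    """Sort the calls of two gene callers into four groups, keyed by stop."""
    by_stop_a = {stop_key(c): c for c in calls_a}
    by_stop_b = {stop_key(c): c for c in calls_b}
    shared = by_stop_a.keys() & by_stop_b.keys()
    return {
        "identical": sorted(
            k for k in shared if by_stop_a[k].start == by_stop_b[k].start
        ),
        "different_start": sorted(
            k for k in shared if by_stop_a[k].start != by_stop_b[k].start
        ),
        "only_a": sorted(by_stop_a.keys() - by_stop_b.keys()),
        "only_b": sorted(by_stop_b.keys() - by_stop_a.keys()),
    }


# Made-up calls from two hypothetical callers on one contig.
caller_a = [
    GeneCall("contig1", 100, 399, "+"),
    GeneCall("contig1", 500, 799, "+"),
    GeneCall("contig1", 1500, 1201, "-"),
]
caller_b = [
    GeneCall("contig1", 130, 399, "+"),   # same stop as caller A, later start
    GeneCall("contig1", 500, 799, "+"),
    GeneCall("contig1", 2000, 2299, "+"),
]

report = compare_calls(caller_a, caller_b)
for group, keys in report.items():
    print(group, len(keys))
```

Grouping by stop position is only one possible design choice. It keeps the comparison simple and separates start disagreements, which might be fixed by re-calling starts, from genes that one caller missed entirely.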
Computational Servers
ANL wrote a grant in which it was proposed to (among other things) supply computational servers to support adding new organisms to the SEED. It is not clear exactly what servers are needed to support things like gene calling, adding organisms to GenDB, and adding organisms to the SEED. We will work the details out over the coming few months, coordinate with efforts that already exist to provide such services, and hopefully have things running smoothly by the fall.