December 2003 Newsletter
- General Comments on the Schedule for Delivering the SEED
- An Increase in Programming Support
- Peer-to-Peer Updating
- FIG Development Meetings
- The "Annotate a Thousand Genomes" Effort
General Comments on the Schedule for Delivering the SEED
We promised timely reports on progress, so we thought that it would be a good idea to send out one last newsletter this year. Progress has been very rapid. The SEED was shown at the SC 2003 supercomputing conference, and everything went amazingly smoothly. Early versions of the peer-to-peer updating capability were shown, and a genome was added in real time. That is, we took a genome with just gene calls, added it to the SEED, computed similarities, made automated assignments, and displayed the results during the demo period. Many thanks to the brave souls who made it happen -- there were points where we wondered whether it could all be made solid quickly enough.
The alpha-test version of the SEED has been installed and operational at several sites for over a month now. We are learning a lot about its utility and what is needed to improve it. Natalia Maltsev is hosting a class on Jan 8-9 that will include two parts: it will present the tools developed by her team to support high-throughput annotation of genomes using grid-technology, and it will present the initial release of the SEED. It will be a small meeting to help make sure that a few sequencing/annotation projects that intend to use the Argonne/FIG annotation tools see what is involved.
We are now working on making a set of DVDs that contain both Mac and Linux versions of the SEED with an install script. We have experienced some difficulties in getting installations to go smoothly, and it is critical that we get the installation script working reliably before a large-scale distribution occurs. In addition, we are finishing the implementation of protocols for sharing annotations. This must all be done and ready to hand out at Natalia's workshop in January.
For the initial users of Mac OS X versions of the SEED, a number of minor problems arose when people converted from Jaguar to Panther. We believe that we understand exactly how to help people make the transition, but be aware that initial versions of the SEED may well stop functioning if you make the switch. Needless to say, we have learned a number of lessons from helping people make the switch, and the versions that will be released in January should run fine under either Jaguar or Panther.
Later in January, a class on looking for missing genes will be taught at Franklin and Marshall College in Pennsylvania.
An Increase in Programming Support
Most of the SEED development was done by Ross Overbeek with help from a number of friends. This month Terry Disz and Bob Olson from Argonne National Lab began putting in substantial amounts of effort, and next month FIG will hire one full-time and one part-time programmer. Part of this expansion is due to FIG receiving its first grant (from DOE). This is a big moment, and we anticipate that it will be just the first of several grants we receive this year.
This represents a huge increase in our ability to support and accelerate development of the SEED. Even so, it is quite likely that a majority of the major advances will come from collaborators over the coming year. This means that you can expect progress to rapidly accelerate (and, really, it has not been so slow up to now!).
Peer-to-Peer Updating
The concept of developing a system that supports peer-to-peer updating is really pretty interesting. The standard way of constructing and deploying systems is based on a central source of both the initial system and updates. The peer-to-peer model is one in which any two users can share and update either data or code.
The classical system lends itself to a hierarchical structure: the main source supplies systems to, say, sequencing/annotation projects. These projects install a single local system that collects and manages function assignments and annotations. Periodically, the project site synchronizes annotations with the main site, propagating work up and then back down through the hierarchy.
In the peer-to-peer model, everyone has a local version of the system, and everyone has the ability to exchange/synchronize with anyone else. Informal "hubs" may form at nodes attempting to maintain current, comprehensive collections of genomes, but everyone has and maintains their own research environment. The way this would work for the average sequencing project would be
Such an "anarchistic" model introduces numerous questions and potential conflicts, but it does completely remove dependencies on central sources; if you want a new genome immediately, you just add it to your copy (or get it from someone who has already added it). If you have 200 kb of new sequence, you are not faced with sending it to a central source and waiting for them to create an updated version that is returned to you; rather, you just add the data to your copy, do your analysis, and exchange it (or not) with whomever you wish.
This is basically the model we are proposing for the SEED, and the initial "launch" is very close.
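To make the idea of two users synchronizing their copies concrete, here is a minimal sketch in Python. It is not the SEED's actual protocol, which this newsletter does not describe. The names (Peer, Assignment, synchronize), the gene identifiers, and the rule that the newer assignment wins are all assumptions chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Assignment:
    function: str   # function assigned to a gene
    timestamp: int  # hypothetical logical clock; larger means newer
    author: str     # name of the peer that made the assignment


@dataclass
class Peer:
    name: str
    assignments: dict = field(default_factory=dict)  # gene id -> Assignment

    def assign(self, gene_id, function, timestamp):
        """Record a function assignment in this peer's local copy."""
        self.assignments[gene_id] = Assignment(function, timestamp, self.name)

    def merge_from(self, other):
        """Pull assignments from another peer, keeping the newer one per gene.

        Ties on timestamp are broken by author name so that both sides
        reach the same decision (an assumed rule, not the SEED's).
        Returns the sorted list of gene ids that changed locally.
        """
        updated = []
        for gene_id, theirs in other.assignments.items():
            mine = self.assignments.get(gene_id)
            if mine is None or (theirs.timestamp, theirs.author) > (
                mine.timestamp,
                mine.author,
            ):
                self.assignments[gene_id] = theirs
                updated.append(gene_id)
        return sorted(updated)


def synchronize(a, b):
    """Two-way exchange between any two peers; no central source involved."""
    return a.merge_from(b), b.merge_from(a)


# Two independent copies with overlapping, conflicting work (made-up data).
lab = Peer("lab")
hub = Peer("hub")
lab.assign("gene1", "hypothetical protein", 1)
hub.assign("gene1", "DNA polymerase III alpha subunit", 2)
lab.assign("gene2", "ribosomal protein L2", 3)

pulled_by_lab, pulled_by_hub = synchronize(lab, hub)
```

In this sketch the lab pulls the newer assignment for gene1 from the hub, and the hub picks up gene2 from the lab, so both copies end up identical. A real exchange would also need to handle conflicts that a simple "newest wins" rule cannot settle, which is one of the questions the anarchistic model raises.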
FIG Development Meetings
The FIG developers will meet at least three times per year (one-week meetings) to integrate code and prepare releases. We anticipate having two meetings per year in the US (one in Chicago) and one per year in Europe. Because there is so much to do initially, we are proposing to meet three times in the next 6-7 months. The first meeting will probably be in the second half of February, the second in Europe in April, and the third in Chicago in June. The locations of the first and second meetings are still not decided, although the second will probably be in Bielefeld, Germany. The first cannot be in Chicago for two reasons: the weather in February can be terribly unpredictable, and it takes 90 days to get clearances for foreign visitors. We plan on having these meetings be completely informal code integration efforts. People will focus on defining the future additions to the FIG software, integrating software from other independent efforts, and preparing new releases. We anticipate releasing new versions of the FIG software about a week after each meeting. If you have suggestions regarding schedule or location, comments would be welcome.
The "Annotate a Thousand Genomes" Effort
Developing the SEED is a major effort initiated by FIG, and we believe that it will be of utility for numerous projects. Initially, we believe that it will be used largely by sequencing/annotation teams and individual investigators wishing to perform comparative analysis using large collections of genomes. However, a number of FIG fellows wish to focus on the problem of developing reliable annotations, and they believe that they finally see exactly how to do it. We have discussed this briefly in past newsletters. The project will involve gathering experts in specific subsystems and supporting analysis of each subsystem across the entire collection of organisms. We would add whatever functionality is needed to the SEED to support these experts, and the explicit goal would be accurate annotations for the thousand genomes we believe will be available by the end of about three years.
This effort, too, is beginning to make progress. Most notably, a number of biologists with specific areas of expertise have expressed a desire to participate actively as soon as the project is a bit better defined and the tools are in place. We intend to officially launch things in February.
Our first step is to define precisely what "deliverables" we would want from each expert. This may sound like an insignificant issue, yet a number of us are now convinced that getting it right is the key to saving large amounts of effort. We are actively discussing the point, and we will offer an initial position in the next newsletter.
Once the initial release of the SEED occurs (hopefully within a very, very short time now), expect to see focus rapidly shift to defining and starting the annotations effort.