RNA World @ Home


Saenger
2nd March 2010, 06:29 PM
New project:
RNA World (beta) (http://www.rnaworld.de/rnaworld/):
RNA World project description (http://www.rechenkraft.net/wiki/index.php?title=RNA_World/Project_description/en)
RNA World is a distributed supercomputer that uses Internet-connected computers to advance RNA research. The system is dedicated to identifying, analyzing, structurally predicting and designing RNA molecules on the basis of established bioinformatics software, in a high-performance, high-throughput fashion.

In contrast to classical bioinformatics approaches, RNA World does not rely on individual desktop computers, web servers or supercomputers. Instead, it represents a continuously evolving cluster of world-wide distributed machines of any type. As such, RNA World is very heterogeneous and, depending on the sub-project, currently addresses Internet-connected computers running the Linux, Windows and OSX operating systems - your computer could be an important part of it. Because hardware and electricity costs are shared among the volunteer contributors, it becomes possible to perform interesting analyses that would often not be economically affordable otherwise. In return, RNA World is not for profit, exclusively uses open-source code and will make its results available to the public.

In its present form, RNA World runs a fully automated high-throughput analysis version of Infernal (http://infernal.janelia.org/), a program suite originally developed in Sean Eddy's laboratory (http://selab.janelia.org/) for the systematic identification of non-coding RNAs. The goal of this RNA World sub-project is to systematically identify all known RNA family members in all organisms known to date and make the results available to the public in a timely fashion. With your help, we also aim to supply established bioinformatics databases such as Rfam (http://rfam.sanger.ac.uk/) with our results, to help reduce their future maintenance costs.

In contrast to other distributed and grid computing projects, the RNA World developers are currently designing generalized user interfaces that, in parallel to the projects our own research team is pursuing, will allow non-associated individual scientists to submit their own projects in a manner similar to using a web server interface - free of charge, of course.

Stats (http://www.rnaworld.de/rnaworld/stats/) are exported but not yet counted here (they are counted on the other stats sites).

Rusty
4th March 2010, 01:32 AM
Looks good.. anyone running it?

cswchan
4th March 2010, 03:06 AM
Yep... started running today... seems to play nicely with the rest of the Boinc crowd on my quad...

fractal
4th March 2010, 04:35 AM
I ran it for a week on both Linux and Windows machines. Work units vary from small to huge, some taking 1.5-2.5 GB of RAM. Many finish in minutes, but some take 2 weeks with a 1-week deadline. Work lately has finished in seconds but requires minutes to upload and download. The website is annoyingly slow, but the admin is responsive.

I don't read German, but it uses the same forums as Yoyo and team Rechenkraft. Someone who does read German might want to figure out the relation between the three.

Saenger
4th March 2010, 04:58 AM
I don't read German, but it uses the same forums as Yoyo and team Rechenkraft. Someone who does read German might want to figure out the relation between the three.
Rechenkraft is a) a team in DC and b) a German non-profit association for the propagation and promotion of DC.

I'm a member (http://www.rechenkraft.net/wiki/index.php?title=Benutzer:Saenger) of b), but as a team I'm a member of SETI.Germany (http://www.setigermany.de). Yoyo (http://www.rechenkraft.net/wiki/index.php?title=Benutzer:Yoyo) is a member of both, as is the scientific head of RNA World, Michael Weber (http://www.rechenkraft.net/wiki/index.php?title=Benutzer:Michael_H.W._Weber). The association (http://www.rechenkraft.net/wiki/index.php?title=Verein:Rechenkraft.net_e.V./en) has existed since 2004.

Yoyo started yoyo@home to bring non-BOINC projects into the BOINC world, usually not the biggest ones - especially Evolution@Home, as some members of Rechenkraft knew its admin from the past. He refuses to incorporate Folding, as Folding's user base is already big enough.

Michael Weber, or in this case better Dr. Michael Weber ;), always wanted to start a DC project in his field of science. He especially wanted to do the bioinformatics with a real-world validation process for the results. And as chairman of the association he "used" it as his stepping stone, and as a good place to find collaborators from his own and other teams, mainly from Germany.

When the project went public, it soon became clear that the old yoyo@home server was not sufficient for both projects, so they are now still hosted together, but on a brand-new server. The forum and wiki are on another server, so they stay online even if the project server is down.

Xaverius
4th March 2010, 07:39 AM
The website is also available in English: http://www.rechenkraft.net/wiki/index.php?title=Willkommen_beim_Verein_Rechenkraft.net_e.V./en
I always had in mind that Rechenkraft was a mathematical (or something like that) institute in Germany; I guess I was a bit wrong. :)

Rusty
20th March 2010, 06:07 AM
So what are we looking at here? Project in or out? Is it up or down?

What's the deal, banana peel?

Ungelovende
20th March 2010, 03:44 PM
What's the deal, banana peel?
I need more RAM! :)
i7 running on 4 cores with 9 GB RAM -> swapping to HD. It's my problem that I only have crappy hardware - thumbs up from me!

fractal
5th April 2010, 05:58 PM
The project admin is traveling and thus not overly responsive at the moment. The project was still issuing work units that require 3-4 GB of RAM each, with BOINC unable to limit a project to a single active unit. The result is as Ungelovende said... it brings your machines to their knees, and they swap until long after the work-unit deadline has passed and you get around to aborting the units.

I do not believe this project will be ready until the admins can get a handle on how much memory each work unit needs.

Rusty
26th September 2010, 07:45 PM
Bumpity-Bump

Michael H.W. We
26th September 2010, 08:27 PM
OK, so maybe a brief update. A lot of progress has been made since the last postings shown above, which I only saw today.

RNA World (http://www.rnaworld.de/rnaworld/) has been active and open for public sign-up since around January/February this year. It now has its own server (which will be upgraded further soon), so the responsiveness issues described above have been resolved, as it is no longer shared with the Yoyo@home project (it is now a 12 GB Intel i7 920 machine with 2x 1.5 TB RAID and plenty of bandwidth).

The project has been presented at a number of conferences (BOINC Workshop @ Barcelona 2009, RNA2010 @ Seattle, GDNÄ @ Dresden 2010) and in seminar talks (MIT, Cambridge, 2010).

The project is currently updating its binaries and will soon also offer OSX support in addition to the current Linux and Windows clients. It should be said that this project still has somewhat higher demands than other DC projects in terms of runtimes, RAM requirements and data transfer. Our FAQ (http://www.rechenkraft.net/wiki/index.php?title=RNA_World/FAQ/en) gives details on all of these aspects. If something is missing, please drop by our forum and let us know, so that we can improve it further.

Because I have supported DC projects for around a decade now, I am very communicative in terms of project support and invest quite a bit of my time scooting around external forums to answer questions and help resolve problems (as you can see here as well). :D

Michael.

DigiK-oz
27th September 2010, 03:49 PM
Well, I appreciate the update. However, there are some points in the FAQ you mention that make me hesitate to join your project.

Memory usage up to 2.5 GB (or 1GB): this will bring most quad+ machines to their knees should all cores get such a unit.

No checkpointing: that's fine with small workunits, but the FAQ mentions "up to several days". This means a WU like that will simply never finish on a machine that's not online 24/7. Sure, sleeping the machine instead of shutting it down will alleviate this, but crashes (of your app, BOINC or the OS) or occasional reboots (patches etc.) will potentially send 10 days of CPU time down the drain :(

In my opinion, memory usage should be around 500 MB max (or the project should limit the number of workunits a system can get at any one time), and checkpointing is preferred for ANY workunit, but mandatory for anything running longer than half an hour or so.

Michael H.W. We
28th September 2010, 01:12 AM
Memory usage up to 2.5 GB (or 1GB): this will bring most quad+ machines to their knees should all cores get such a unit.
Well, there is a memory management system on both the server and the client side to counteract that situation - although it is not absolutely perfect. More importantly, however, most of our WUs have memory demands far below 500 MB, as stated in the FAQ. What you cited is the maximum. :)

No checkpointing: that's fine with small workunits, but the FAQ mentions "up to several days". This means a WU like that will simply never finish on a machine that's not online 24/7. Sure, sleeping the machine instead of shutting it down will alleviate this, but crashes (of your app, BOINC or the OS) or occasional reboots (patches etc.) will potentially send 10 days of CPU time down the drain :(
Yes, that is why we have the FAQ, so that users can check in advance whether or not their machines meet the project's system requirements. By the way: for Linux x32, checkpointing is available provided memory randomization is disabled in your kernel settings.

In my opinion, memory usage should be around 500 MB max (or the project should limit the number of workunits a system can get at any one time), and checkpointing is preferred for ANY workunit, but mandatory for anything running longer than half an hour or so.
Well, there is no such thing as maximum hardware demands in computing. :D We have server-side management that checks your machine's hardware capabilities and determines which subset of the available WUs is suitable for it. In this way, we can properly assign which machine gets what, the demanding work still gets done, and your machine won't be overloaded. :)

Michael.

P.S.: Maybe a brief word on checkpointing. I have participated in DC for more than 10 years now and I know how much of a "pain" it is when checkpointing is not available. However, checkpointing is available ONLY when an application has been designed from the start to offer that feature. In scientific software, writing out checkpoints is almost never seen, since from the developer's viewpoint it is mostly superfluous overhead (consuming time, electricity and compute power). That might sound strange at first, but it is a fact and makes sense if you think about it for a moment. Consequently, almost all scientific software lacks checkpointing, and this means that a project such as RNA World, which utilizes open-source scientific software, could only offer checkpointing if the code were entirely rewritten. Unfortunately, we cannot do that for technical and manpower/funding reasons, and it also makes little sense in general, since for each new software version the code would have to be reorganized again.

As a consequence, we are looking for a universal checkpointing solution that is integrated into BOINC in some way. One option is a virtual machine approach. Another is to write the relevant portion of RAM to disk at certain intervals - similar to what you know as sleep mode. A flavor of the latter is what we currently employ for Linux x32 systems.
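For those who want to see concretely what "designed from the start to offer that feature" means, here is a minimal sketch (in C++) of the kind of cooperative loop a BOINC application needs in order to checkpoint. The boinc_* calls are real functions from the BOINC API, but the work loop, the helper function and the state-file format are invented for this illustration; this is not RNA World or Infernal code.

#include <cstdio>
#include "boinc_api.h"  // BOINC API header; assumes the app is linked against the BOINC library

// Placeholder for the real science code; invented for this illustration.
static void do_one_unit_of_work(long i) { (void)i; }

int main() {
    boinc_init();

    const long n_total = 1000000;
    long start = 0;

    // On restart, resume from the position saved in the checkpoint file (if any).
    if (FILE* f = std::fopen("checkpoint_state.txt", "r")) {
        std::fscanf(f, "%ld", &start);
        std::fclose(f);
    }

    for (long i = start; i < n_total; ++i) {
        do_one_unit_of_work(i);

        // Write a checkpoint only when the BOINC core client asks for one.
        if (boinc_time_to_checkpoint()) {
            if (FILE* out = std::fopen("checkpoint_state.txt", "w")) {
                std::fprintf(out, "%ld\n", i + 1);  // enough state to resume from here
                std::fclose(out);
            }
            boinc_checkpoint_completed();
        }
        boinc_fraction_done((double)(i + 1) / n_total);
    }

    boinc_finish(0);  // reports success to the client; does not return
}

The hard part is the line that writes the state: in a counter-driven toy loop it is a single number, but in a legacy scientific code the resumable state is usually a large, scattered set of in-memory structures, which is exactly the rewrite described above.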

DigiK-oz
28th September 2010, 04:52 PM
Well, there is a memory management system on both the server and the client side to counteract that situation - although it is not absolutely perfect. More importantly, however, most of our WUs have memory demands far below 500 MB, as stated in the FAQ. What you cited is the maximum. :)


Yes, that is why we have the FAQ, so that users can check in advance whether or not their machines meet the project's system requirements. By the way: for Linux x32, checkpointing is available provided memory randomization is disabled in your kernel settings.


Well, there is no such thing as maximum hardware demands in computing. :D We have server-side management that checks your machine's hardware capabilities and determines which subset of the available WUs is suitable for it. In this way, we can properly assign which machine gets what, the demanding work still gets done, and your machine won't be overloaded. :)

Michael.



1: I know I cited the max. But any machine COULD get several of these "max" units simultaneously.

2: So you're on the right track :) But I think a lot of users here are windows users, hence no checkpointing :(

3: I don't care about overloading :) My main machine is an I7 920 with 12 GB. However, even THAT would basically die with 8 units IF they all happened to require 2.5 GB. Your server-side checks will not know what the hell I am running alongside your project.

As for the checkpointing, I hear what you are saying. But it is all a technical rundown of why things are currently the way they are. Believe me, I know what I am doing: I have been running DC for ages, including BOINC, have my own (test) BOINC server running, and have written (as a hobby project) my own BOINC project executables, including Nvidia CUDA, ATI Stream and OpenCL implementations. I can agree with your technical explanation, but I still think your project as it is is unsuitable to be "released into the wild" on everyone, since it has too many IFs (it has checkpointing IF Linux, IF randomization is off, IF...).

I'm not trying to be negative here; I simply see a lot of issues if an unsuspecting user attaches to your project and expects things to just work. I myself might attach shortly, just to get some points on the board :)

Michael H.W. We
28th September 2010, 06:17 PM
1: I know I cited the max. But any machine COULD get several of these "max" units simultaneously.
Getting them simultaneously does not mean that they are also computed simultaneously. BOINC checks how much RAM is free before starting a task from the queue. Still, you are correct that the memory management in BOINC is worth improving. But that is nothing we can deal with on our side, and you may also know that other projects have the same issue.
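(A side note on limiting this from the volunteer side: later BOINC clients in the 7.x series support an app_config.xml file in the project's data directory that can cap how many tasks of an application run at once. Something along the following lines would keep a memory-hungry application down to a single concurrent task; the application name below is a placeholder and must match the short name the project actually uses.)

<app_config>
  <app>
    <name>cmsearch_xxl</name>  <!-- placeholder; use the project's real application short name -->
    <max_concurrent>1</max_concurrent>
  </app>
</app_config>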

2: So you're on the right track :) But I think a lot of users here are windows users, hence no checkpointing :(
...except if you use a Linux VM, which works well with our project. :)

3: I don't care about overloading :) My main machine is an I7 920 with 12 GB. However, even THAT would basically die with 8 units IF they all happened to require 2.5 GB.
Well, see above and remember you have swap space, too.

Your server-side checks will not know what the hell I am running alongside your project.
...but free memory will be checked before a task starts (see above).

As for the checkpointing, I hear what you are saying. But it is all a technical rundown of why things are currently the way they are. Believe me, I know what I am doing: I have been running DC for ages, including BOINC, have my own (test) BOINC server running, and have written (as a hobby project) my own BOINC project executables, including Nvidia CUDA, ATI Stream and OpenCL implementations. I can agree with your technical explanation, but I still think your project as it is is unsuitable to be "released into the wild" on everyone, since it has too many IFs (it has checkpointing IF Linux, IF randomization is off, IF...).

I'm not trying to be negative here; I simply see a lot of issues if an unsuspecting user attaches to your project and expects things to just work. I myself might attach shortly, just to get some points on the board :)
I understand, but as a project leader I expect people to inform themselves BEFORE deciding to install software on their machines. I consider this the one and only minimum requirement in DC. :D And indeed, if people think there are too many IFs, then please do not participate. But in the meantime we are generating interesting results with those who do, and we will continue to improve our project so that it increasingly meets "failsafe expectations".

Michael.

P.S.: If you have that much experience with BOINC, you might consider helping us with the VM approach?

DigiK-oz
4th October 2010, 04:05 PM
You have explained the reasons why your project is the way it is, and I understand them fully. Running the project in a VM seems like a nice idea to counteract the lack of checkpointing, but it seems like barking up the wrong tree to me. It will add (memory) overhead to an already memory-hungry project, and it might cause trouble in the long run (what if, for instance, GPU support is added to the app you use?). I think your best bet would be to change the (open-source) app. Sure, any new release would require work, but the world is full of variations and forks of open-source apps, so it can be done. The effort now put into wrapping the thing in a VM would imho be better spent on changing the app to add checkpointing.

But we are going waaaaaaay off-topic here, so I will drop by your project forum and see if I can add anything useful in the topics there, simply because you made me curious about the project and the software/techniques behind it. I will start gathering stats for my own stats site, assuming there's no objection to that.

As for the subject of this topic, I still think your project has too many rough edges to have it added to DC-Vault, as I explained in earlier posts. But hey, that's my opinion, and as such worth exactly $0.02 :)

Michael H.W. We
5th October 2010, 05:57 PM
I think your best bet would be to change the (open-source) app. Sure, any new release would require work, but the world is full of variations and forks of open-source apps, so it can be done. The effort now put into wrapping the thing in a VM would imho be better spent on changing the app to add checkpointing.
Well, unfortunately that is not possible (I checked back on this with the original developer long before we started the DC project).

As for the subject of this topic, I still think your project has too many rough edges to have it added to DC-Vault, as I explained in earlier posts. But hey, that's my opinion, and as such worth exactly $0.02
The point is that there are clear rules about which requirements must be met for a project to be added to DC-Vault. And RNA World meets all of these. On top of that, there are other projects already included in DC-Vault that have rough edges as well...

Michael.

Michael H.W. We
14th October 2010, 08:09 PM
We just released an OSX client. :)

Michael.

DigiK-oz
18th October 2010, 04:28 PM
The point is that there are clear rules about which requirements must be met for a project to be added to DC-Vault. And RNA World meets all of these. On top of that, there are other projects already included in DC-Vault that have rough edges as well...
Michael.

Yes, RNA World meets the minimum requirements, but meeting those requirements only makes a project eligible for inclusion, which is quite different from being included. Other projects currently under discussion, like eon, quantumfire, primaboinca and probably a gazillion others, also meet the minimum requirements. It is the discussions in this part of the forum which serve as input for the final decision on inclusion, which is ultimately made by the administrators here. If this were not the case, DC-Vault would include basically each and every BOINC project.

And yes, even current projects have rough edges here and there, which means RNA can/will/might be included in the near or distant future :) In fact, your responsiveness in this very topic might give you the edge.

Mind you, I have nothing against your project's goal (or most others, for that matter). I am also not an administrator here. I am just pointing out the rough edges, as I have been doing in some of the other candidates' threads.

Michael H.W. We
13th November 2010, 07:09 PM
Given the current lack of checkpointing, we have adapted our crediting system to motivate participants to also go for the long-running WUs. Starting from 100 hrs of WU runtime, credit per WU is increased proportionally with runtime, until at 500 hrs the credit is effectively doubled.
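To put a number on it: taken as a linear scaling, the bonus works out as in the sketch below. This is an illustration of the scheme as described, not the exact server-side formula, and the behaviour beyond 500 hours is assumed here to stay capped at 2x.

#include <algorithm>
#include <cstdio>

// Illustrative reading of the bonus described above (not the official formula):
// factor 1.0 up to 100 h, then rising linearly so that a 500 h WU earns double credit.
// Behaviour beyond 500 h is not specified above; it is capped at 2.0 here.
double credit_bonus_factor(double hours) {
    if (hours <= 100.0) return 1.0;
    return 1.0 + (std::min(hours, 500.0) - 100.0) / 400.0;
}

int main() {
    const double samples[] = {50.0, 100.0, 300.0, 500.0};
    for (double h : samples) {
        std::printf("%5.0f h -> factor %.2f\n", h, credit_bonus_factor(h));
    }
    // Expected output: factors 1.00, 1.00, 1.50 and 2.00 respectively.
    return 0;
}

So under this reading, a 300-hour WU would earn roughly 1.5x the base credit.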

Michael.

Xaverius
13th November 2010, 10:36 PM
About that last post, just for my point of view.

I quit participating in Folding@home because of the extra points awarded under certain circumstances; in this way a few people can gain a very big advantage over people who don't know it is possible or don't have the chance to do just that.

If I remember it right, climateprediction.net made checkpoints so that the user already gets points for a WU that could last for dozens of days (was it a hundred?). This I find fair; in this way people get the points just by hitting a checkpoint.

This is how I see things and I posted this without exactly knowing what your project can do or can't do.

Michael H.W. We
20th November 2010, 04:06 PM
Well, we will stick to the extra reward as long as we have no checkpointing, simply because it makes sense to us and is a fair thing to do. As soon as we have checkpointing, this extra bonus will be disabled again from that point on.

I know that there is a whole bunch of different opinions regarding credit systems in DC. The point is that participants sometimes appear to overlook the fact that a DC project does not exist to hand out virtual credits but has a scientific goal to achieve. That goal, however, cannot be achieved when there is no support from the participating volunteers. As a consequence, a project lead will make changes to the crediting system if there is a chance of thereby increasing participation. In our case, we do not generally increase credits to draw participants away from other projects to ours. Instead, we just like to reward those who already participate but hesitate to go for the really (over)demanding work units.

In my opinion, the volunteer with a truly scientific interest in supporting a project will definitely not care about the credits at all, but just use them to monitor the computational throughput of his system. Please read my signature and you will know where I stand.

Michael.

yoyo
31st May 2011, 08:08 PM
What is the conclusion now? Can RNA World be included in the vaults?
yoyo

Ungelovende
2nd June 2011, 03:16 PM
What is the conclusion now? Can RNA World be included in the vaults?
yoyo
I have been testing RNA today. Max 200 MB RAM/WU so far. I can't see any reason why this project should not be on DC-Vault.

LanDroid
30th June 2011, 12:02 AM
The Knights Who Say Ni! are about to close a month-long run at RNA World. We shrubbed over 4 million points. SETI.Germany recently crunched over 900K in one day. So the stability, work units, and bandwidth are there... The large WUs are a problem for some, but it's easy to select only the small work units. The admin is among the most responsive we've seen, as shown even in this thread. I won't speak for the team, but DC-Vault should add RNA World.

Meanwhile, please check out the KWSN high budget recruitment video:
http://www.youtube.com/watch?v=0ZybzTYpezU

http://www.kwsnforum.com/images/smiles/knightsni.gif

DigiK-oz
30th June 2011, 03:05 PM
I still think that 500-hour WUs (or anything above an hour or so) without checkpointing are ridiculous. However, if these can be deselected easily, that issue is moot :)

Can you supply average runtimes per sub-project for RNA? I just attached, with the "cmsearch XXL" sub-project disabled. Is that enough to only get reasonably sized WUs? And are there sub-projects that do have checkpointing (or semi-checkpointing by bundling WUs, as I read on their forum)?

robegeor
1st July 2011, 01:23 AM
To answer your question: it is easy to include or exclude the large work units. Just as you said, deselect the XXL app in the project preferences. I have been crunching RNA World for about a month now with the Knights (Arthur_Lemming) and have to say it is one of the most stable projects I have participated in. I also second LanDroid - a very good admin on this one. First rate.

I too think it should be included in the Vault.

Ni!

For me, using a Core 2 E8400, runtimes range from 7 minutes to 1 hour 45 minutes with the XXL app disabled.

DigiK-oz
3rd July 2011, 04:16 PM
Well, we will stick to the extra reward as long as we have no checkpointing, simply because it makes sense to us and is a fair thing to do. As soon as we have checkpointing, this extra bonus will be disabled again from that point on.

I know that there is a whole bunch of different opinions regarding credit systems in DC. The point is that participants sometimes appear to overlook the fact that a DC project does not exist to hand out virtual credits but has a scientific goal to achieve. That goal, however, cannot be achieved when there is no support from the participating volunteers. As a consequence, a project lead will make changes to the crediting system if there is a chance of thereby increasing participation. In our case, we do not generally increase credits to draw participants away from other projects to ours. Instead, we just like to reward those who already participate but hesitate to go for the really (over)demanding work units.

In my opinion, the volunteer with a truly scientific interest in supporting a project will definitely not care about the credits at all, but just use them to monitor the computational throughput of his system. Please read my signature and you will know where I stand.

Michael.

Michael, the thing is that DC-Vault exists solely because of the credits :)

But having crunched a (small) number of WUs from your project without any glitches, I couldn't care less whether it's running in a VM or executed by virtual smurfs or elves. Since the huge units can be disabled easily (are they disabled by default, by the way?), and given your responsiveness to questions and problems, I agree with the people who have crunched much more RNA than I have: I too think RNA World is a project which should be included in DC-Vault.

Rusty
30th July 2011, 03:14 AM
Bump...Poll added

cswchan
30th July 2011, 03:47 AM
Voted YES... now make it so, Rusty...

:p

DigiK-oz
31st July 2011, 07:53 PM
Hell, even I voted YES! :)

VictordeHolland
31st July 2011, 10:46 PM
Yes!

Rusty
2nd August 2011, 03:55 AM
Well, I have added it, but it seems the stats file can't be opened... Working on getting it fixed...

DigiK-oz
3rd August 2011, 04:06 PM
Any news, Rusty? Should be standard BOINC-stats I think?

Rusty
5th August 2011, 10:05 AM
I think I stuffed up somewhere..

DigiK-oz
7th August 2011, 04:51 PM
Seems to work now - well done, Rusty!