Thursday, June 14, 2007

Data and Software Sharing Policy

This message is a request for feedback from the computational biology community.

In 2002, ISCB developed a policy statement on bioinformatics software availability, which defined 5 levels of software availability and made the following recommendations:

1. Given the variety of meanings of "open source", that people define what they mean when they use the term.

2. That government funding agencies encourage grant proposals to specify the availability of software using at least the ISCB-defined levels.

3. That government funding agencies not mandate that all software created with grant money be available via an open-source license.

4. That government funding agencies require that all software created with grant money be available at a minimum in binary form, and free to non-commercial users.

This policy was developed without sufficient input from ISCB members, and the Public Affairs committee is revisiting this topic. We will distribute relevant educational materials (see links below) and opinion pieces, hold a meeting at ISMB/ECCB 2007, gather input on this blog and via email, and otherwise gather feedback from the community. We hope to develop a revised policy statement, or guidelines, that will be useful to the community as well as to government funding agencies and scientific journals.

Our questions to you (please answer in the comments section or send your answers directly to public-affairs@iscb.org):

1. Is there a problem?
  • Is there a need to define software availability clearly?
  • Should we expand the scope from government funding agencies to publications? Or beyond? Should we expand the scope to include data sharing?
  • What should government agencies and journals require in terms of software availability? Should ISCB make a recommendation?
  • Should authors and grant-writers be required to clearly define the availability of their software?
  • Is there a problem currently with published articles, in that it is difficult to reproduce the results due to lack of access to data or software? Have you had personal experience with this?
  • Does it make sense to allow researchers at companies to be charged a fee for software but require that it be provided to academics at no charge?
  • If you have terabytes of data, how does that affect your ability to share it?
  • Are there privacy concerns with sharing of human genomic data?
  • What is really needed to allow results to be verified and built upon?
2. What should be done to address the problem?
  • Do you agree or disagree with the 2002 ISCB policy statement, and why?
  • When you publish a paper or develop software for a grant, how do you make your software and data available?
  • What should ISCB do in addition to, or instead of, releasing a policy statement? (Has the previous policy statement had any effect?)
  • What would YOU be willing to do to help ISCB address this issue?
Please respond in the comments section of this post, or via email to policy@iscb.org, or by attending the meeting on this topic at ISMB/ECCB 2007.

Reference material:

8 comments:

Barb said...

Comments made during the open discussion at the ISMB 2007 meeting on data and software sharing and licensing:
• It doesn’t make sense to have a policy about grants without one about publication.
• Personal tastes shouldn’t make policy.
• The goal should be to create reproducibility of scientific results, and not to focus on a quid pro quo.
• Commercially available reagents might make a good example for how to do software sharing.
• Open source should be a minimum standard.
• The details matter in publications: reproducibility is key.
• The policy should be that open source be mandated by research grants.
• ISCB membership is strongly in favor of open source.
• There are institutional barriers to producing open source software (for example, Steven Brenner’s hiring negotiation with UC): a lot of people in the Society who want to produce open source software are unable to. ISCB should (instead of having conservative, wimpy & odious policies…) help researchers be able to produce open source. (Institutions/organizations need to allow it.) To do this, we need strong policies from government and international organizations.
• BOSC had a keynote talk about reinventing the wheel in bioinformatics. Open source is one way to avoid this.
• Open source – big pharma is happy to use open source software.
• Data sharing is an important issue, but different from software sharing. ISCB should have a policy on this; it might be a separate discussion.
• Models are neither data nor code, but also worth considering.
• On the question of open source for academics vs commercial users: Ask membership their thoughts about the distinction between academic & commercial users.
• A researcher at a small university has fewer resources to pay for commercial software. The entire community doesn’t belong to wealthy institutions. ISCB has many researchers that do not have a lot of funding.
• Access to computational biology data: There is a valid comparison to Material Transfer Agreements in the biology community. Experiments are not reproducible; reagents often don’t work as publicized. Computational biology related data can be very hard to get; for example a recent attempt to get a microarray dataset published in Cell resulted in no response to an email to the author. The website has no data. Sometimes the author replies “let me track it down” and the data is messy. The situation on the ground is that data is often hard to get – there IS a problem. ISCB should come out with a strong statement about data sharing for grants and publications AND talk about how to enforce it.
• OSI has a clear definition of open source, with definite variations. We don’t need to reinvent that. Instead of making our own new scheme, use theirs.
• The definition of commercial use needs clarification because there are a number of different circumstances. Free for commercial use for developing a drug is quite different from free for commercial use resulting in a change/modification to software followed by selling the revised software for a large amount of money.
• The main thing is for the software to be out there. Various possible existing licenses are problematic; guidance on licensing would be appreciated.
• UC has a license for all software.

Daniel Dvorkin said...

Scientific software without access to the code isn't scientific software at all; it's pseudoscientific software, bearing the same relationship to credible, reproducible computational science that "intelligent design" does to evolutionary biology. The "Level 2 Availability" defined in the 2002 standard should be held up by ISCB as the absolute minimum standard, with open source as defined by OSI (roughly corresponding to "Level 4 Availability") being strongly encouraged.

Unknown said...

IMHO ISCB should abstain from deciding the rules of software commercialization - that should be decided by the software authors and the affiliated institutions.

That said, I agree with the previous comment that publishable peer-reviewed software should come in open-source format to everyone and be freely available to research institutions. Commercial institutions would still be able to download and evaluate the software but would be legally required to comply with the software package's user licence for production-mode usage.

Anonymous said...

* If I use Google in my research, will they have to make all of their software Open Source?

* If I use Excel in my research, will Microsoft have to release the source code? If not, will I be required to use the spreadsheet in OpenOffice (or some other politically correct program) if I want to publish my findings?

* If I put my 3 gigabytes of research data on my web site so people can download it, do I have to leave it there permanently (until the end of the world)? If not, when does the statute of limitations run out?

* Do we need to specify a fee modification for corporations that aren't big, well funded, and evil?

* If a big, evil corporation contracts with an educational institution to analyze some data, who (if anyone) pays whom for the use of software that's free for educational use only?

* If I want to publish research that I did on my own time without any grant money involved, will the policy still apply?

* Can we/should we/must we just agree to disagree? There may end up being irreconcilable differences between members who are capitalists vs the hard-core Open Source proponents. Should the ISCB choose one side knowing that members with the opposing philosophy may leave the Society?

Anonymous said...

Scientific results fully funded by public money (derived from evil companies or good individuals) should be available to the public (evil companies or good individuals) without the public having to pay for them a second time.

Whether the result is a paper, a book, a chemical substance, a lab animal, a mathematical algorithm or some software makes no real difference, although of course there are some practical considerations regarding pathogens, nuclear bombs and so on.

Leaving aside any considerations about altruistic money from e.g. your favored good country supporting the free-riding public of e.g. your favored evil country, this leaves the issue of the required form of availability. Having produced a new laboratory animal strain does not force you to make available the complete production environment; samples of the strain are fine. With software, are we satisfied by the fully functional binary form? Because it is practically possible, I tend to argue for the full source code, as opposed to the laboratory equipment, but this of course is a very simplified point of view. So on balance I find the considerations of ISCB regarding availability of grant-funded software rather convincing.

Availability of scientific publications and the materials reported therein is a completely different issue, though. If the published scientific results have not been funded by public money, there is no reason whatsoever why the publication or any material described therein should be freely available. Scientific research is the systematic application of the scientific method (and over the years there have been differing flavours thereof) to gain understanding of some process. Scientific publications report the result of this research. Part of the current flavour of the scientific method is that reported scientific results are reproducible. To this end, the tools used to create the result must be described in a way that allows other scientists (public or commercial) to recreate these tools and repeat the research. There is no requirement to make the tools themselves available, and especially not for free (although I would love to use the LHC for a week in 2010).

Therefore the ISCB should not try to extend its software availability guidelines into publishing, as the rules for spending of public money and reproducibility of scientific results are quite unconnected.
