Two's a Company -- My Dream Bioinformatics Workplace

The Bioinformatics Organisation of my Dreams*

This is the seed of ideas I have about how I would like a bioinformatics organisation to be run. Almost all of these ideas diverge from how things are currently done, which means it's incredibly unlikely that an existing organisation could be feasibly adapted to work in this way; it'll need to be started from scratch. Some people may have heard some of this previously, but this is the first time I've put finger to keyboard to put it all together as one semi-cohesive chunk.

I don't currently have time to start such an organisation due to existing work commitments (and fear of failure), but if someone else wants to take these concepts and run with them, please go right ahead.

At the very least, it perhaps demonstrates some of the concerns I have around how bioinformatics (and in particular data access) is currently carried out.

Comments / ideas / suggestions appreciated (e.g. via @gringene_bio).

* I was inspired to start writing this after waking up from a short mid-afternoon nap, so it's technically the organisation of my post-wake clarity.

Overarching themes

Whakapapa

We want to create an environment where there is always an answer to the question, "Where did this come from?" That answer may change over time.

Document the ancestry of everything, as deep as you want. We will always make sure that there is sufficient physical and electronic storage space available to create and preserve the record of whakapapa.

For clarification, consider the ancestry of ideas and creativity, as well as inheritance and genealogy.

Consent

We define consent as a continual, enthusiastic "yes" by everyone involved, such that actions can be carried out with a confident understanding that it is the right thing to do.

Informed consent must always be discussed, requested, and agreed during each significant part of a process, especially when owned information is likely to be passed on to other people.

Consent can also be revoked. When a person involved in a project no longer wants to participate in that project, all information and media associated with that project will be returned to its owners. Where data was made public (via a consensual process) before that consent was revoked, the information (and links to that information) will be removed from public-facing systems (bearing in mind that public access from other individuals cannot be reversed).

A record of consent will always be stored in electronic form. Consent (or the revoking of consent) can be provided verbally, in which case the nature of consent (or revocation) will be converted into electronic form in a way that is acceptable to the project owner.
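As a rough illustration of what an electronic consent record could look like, here is a minimal sketch of an append-only log of grant and revoke events. The class names, field names, and the "written" / "verbal" labels are all hypothetical; nothing here is an existing system, and the actual form of the record would need to be agreed with each project owner.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class ConsentEvent:
    """One grant or revocation of consent (hypothetical record layout)."""
    project: str
    person: str
    action: str       # "grant" or "revoke"
    scope: str        # what the consent covers, e.g. "share raw reads publicly"
    medium: str       # "written" or "verbal" (verbal consent is transcribed)
    recorded_at: str  # ISO 8601 timestamp, UTC


class ConsentLog:
    """Append-only list of consent events; earlier events are never edited."""

    def __init__(self):
        self.events = []

    def record(self, project, person, action, scope, medium="written"):
        if action not in ("grant", "revoke"):
            raise ValueError("action must be 'grant' or 'revoke'")
        event = ConsentEvent(project, person, action, scope, medium,
                             datetime.now(timezone.utc).isoformat())
        self.events.append(event)
        return event

    def has_consent(self, project, person, scope):
        """Return True only if the most recent matching event is a grant."""
        current = False  # no record means no consent
        for event in self.events:
            if (event.project, event.person, event.scope) == (project, person, scope):
                current = event.action == "grant"
        return current

    def to_json(self):
        """Serialise the whole log for electronic storage."""
        return json.dumps([asdict(event) for event in self.events], indent=2)
```

The default of "no record means no consent" is a design choice in this sketch: an action would only go ahead after an explicit grant, and a later revocation (including one given verbally and then transcribed) overrides it.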

Privacy

We define privacy as the consented and controlled flow of information from someone who owns information to someone who desires that information. This concept of privacy was presented at a conference on Genomics, Ethics, Law and Society that David Eccles attended in 2009.

Privacy is distinct from secrecy (i.e. preventing others from seeing owned information), but they are related concepts.

Openness*

* a more appropriate word is desired (whakatuwheratanga? manahuatanga?)

It is our belief that the free sharing of information with others can help to benefit society. Where possible and properly consented, information and processes should be shared widely and publicly. This allows other people to learn from our successes and mistakes, reducing costs for everyone.

We will make sure that all of our communal tools (e.g. processing scripts and applications) are made publicly available. We will also make sure that all non-project whakapapa records are made publicly available.

This openness extends to our entire system, including the business and (where possible) the machines. David Eccles attended a LinuxConf in 2019 in which Rory Aronson described FarmBot, and Dana Lewis described OpenAPS; these are good models to work from in terms of openness.

Openness will, for example, allow clients to carry out their own hosting of their own project data by setting up a web server on a desktop computer.

Organisation Essentials

Leadership

The organisation should have a central leadership team of at least two people, at least one of whom should be a woman, and at least one of whom should be Māori.

Money

Everyone in the organisation working at least fifteen hours a week should be paid the same flat salary. People working fewer than fifteen hours a week should be paid an equivalent hourly rate, assuming 30 hours per week at the full salary rate (i.e. if the salary is $78,000 per year, then the equivalent hourly rate would be $50 / hour).
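The arithmetic behind that example can be written out as a short sketch. The function and constant names are made up for illustration, and the 52 working weeks per year is an assumption implied by the $78,000 to $50 / hour example rather than something stated as policy.

```python
WEEKS_PER_YEAR = 52            # assumption implied by the $78,000 -> $50/hour example
FULL_RATE_HOURS_PER_WEEK = 30  # hours per week used to derive the hourly rate
SALARY_THRESHOLD_HOURS = 15    # at or above this, the flat salary applies


def equivalent_hourly_rate(annual_salary):
    """Hourly rate for people working fewer than the threshold hours."""
    return annual_salary / (FULL_RATE_HOURS_PER_WEEK * WEEKS_PER_YEAR)


def annual_pay(annual_salary, hours_per_week):
    """Flat salary at 15+ hours per week, otherwise hourly pay for hours worked."""
    if hours_per_week >= SALARY_THRESHOLD_HOURS:
        return annual_salary
    return equivalent_hourly_rate(annual_salary) * hours_per_week * WEEKS_PER_YEAR


print(equivalent_hourly_rate(78_000))  # 50.0
print(annual_pay(78_000, 10))          # 26000.0
print(annual_pay(78_000, 20))          # 78000
```

With these numbers, someone working 10 hours a week would earn $26,000 per year, while anyone at 15 hours or more receives the full $78,000.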

Work flexibility

Times of work, and place of work should be decided by the employee. Employees can work at any time of the day or night, wherever it is comfortable for them to do their work.

Separated work environments

All employees will be provisioned with their own well-ventilated and lockable office including at least a desk, a lockable drawer unit, two power sockets, a network port, and a whiteboard (or similar drawing surface). Additional hot-seat areas will be available for communal desktop work and meetings.

Data Storage

In general, each of the data categories described below (consent records, project data, whakapapa, and analysis tools) should be kept separate from the others. That means different physical storage media, and different virtual mount points on the network.

The current idea around data storage assumes a hot-swappable network-attached storage system (e.g. see here).

Electronic consent records will be stored separately from project data, and represent a permanent record of project consent.

Consent records will be backed up: per-project copies of the consent records will be made at regular intervals onto two separately-located physical storage media (e.g. SD cards).
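A minimal sketch of that backup step might look like the following, assuming the consent records for one project live in a single directory and the two backup media are already mounted at known paths. The function names and directory layout are hypothetical; the checksum comparison is an extra safeguard added for this sketch, not something the policy above requires.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Checksum of a file, read in chunks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up_consent_records(project_dir, backup_roots):
    """Copy one project's consent records onto each backup medium and verify them.

    project_dir  -- directory holding the consent records for a single project
    backup_roots -- mount points of the separately-located media (at least two)
    """
    project_dir = Path(project_dir)
    if len(backup_roots) < 2:
        raise ValueError("consent records need at least two backup media")
    copied = []
    for root in backup_roots:
        target_dir = Path(root) / project_dir.name
        for source in sorted(project_dir.rglob("*")):
            if not source.is_file():
                continue
            target = target_dir / source.relative_to(project_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)  # copies content and file metadata
            if sha256_of(source) != sha256_of(target):
                raise IOError(f"backup copy does not match original: {target}")
            copied.append(target)
    return copied
```

Keeping the copies per project (one subdirectory per project on each medium) is what would make it practical to hand the backup media back to a project owner later.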

If a project owner requests that their consent records are deleted, those records will be removed from the network storage media, and any backup storage media will be returned to the project owner.

Projects

The creation of each individual project will also involve the purchase or assignment of at least one project-specific physical media storage unit (e.g. a hard drive). This drive will be physically located within the network-attached storage system, with initial access permissions only for the initially assigned worker (who may grant consented access to other workers as the project progresses).

The physical media storage unit should be initially empty; this may not be the case for previously-used units provided by the project owner. The unit will first be checked to verify that it is empty and, if it is not, consent will be requested to wipe the existing data on the unit. Where consent is not provided, the unit will be returned to the project owner, and a new unit purchased for use with the project.
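The initial check could be as simple as listing what is already on the unit once it is mounted. This is only a sketch: the function names are hypothetical, and ignoring a `lost+found` directory assumes an ext-style file system. It also only sees files visible on the mounted file system, so it says nothing about deleted-but-recoverable data or other partitions.

```python
from pathlib import Path


def existing_entries(mount_point, ignore=("lost+found",)):
    """Names of top-level files and directories already present on the unit."""
    root = Path(mount_point)
    return sorted(entry.name for entry in root.iterdir() if entry.name not in ignore)


def unit_is_empty(mount_point):
    """True if nothing (apart from ignored entries) is on the mounted unit."""
    return not existing_entries(mount_point)
```

If `unit_is_empty` returns `False`, the list from `existing_entries` could be shown to the project owner when asking for consent to wipe the unit.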

Any requests for project data from other groups or individuals will be forwarded on to the project owners for consent.

At the conclusion of a project, all physical media storage units (including any additionally used during the course of a project) will be returned to the project owner. Consent will be requested again for the use of any project-related public scripts or whakapapa. Where consent is not provided, those scripts and whakapapa should be deleted.

Project storage units will not be backed up within our own file systems. This reduces the risk of data persistence after the conclusion of a project. However, project owners can bring their own storage media into the organisation office so that we can back up project information onto that media. [suggested by Jessica Eccles]

Whakapapa

Electronic whakapapa records should be kept separately from other data. Each project should have an associated linked whakapapa record in the public electronic storage system, indicating the owner of the project information (or raw data), and the initially assigned worker. Where consent is provided, additional whakapapa relating to the project may also be stored in the public system, or it may be stored separately on the project-specific system.

We will try to ensure that the public whakapapa record is not lost. Where possible, the records will be backed up: copies of the record will be made at regular intervals, and a revision history will be kept to help demonstrate the growth of the whakapapa record over time.

Project-specific public whakapapa records will be duplicated on project-specific storage units. They will not be otherwise backed up, but their revision history will be preserved. This makes it easier to remove those project-specific records where consent is revoked in the future (bearing in mind that public access from other individuals cannot be reversed).

Analysis

Common analysis programs and scripts will be available to all employees, and also made publicly available. The programs and scripts will be backed up: copies of the programs and scripts (tagged with any associated version information) will be made at regular intervals, and a revision history will be kept to help demonstrate the progression of analysis over time.

Where consent is provided, additional project-specific scripts may also be stored in the public system, or they may be stored separately on the project-specific system.

Project-specific scripts will be duplicated on project-specific storage units. They will not be otherwise backed up, but their revision history will be preserved. This makes it easier to remove those project-specific scripts where consent is revoked in the future (bearing in mind that public access from other individuals cannot be reversed).

Continuation*

* a more appropriate word is desired

We want project results to be used by project owners after the conclusion of their projects. Towards this end, project storage units will by default include a skeleton operating system to allow hosting of data and applications. This will allow, for example, project hard drives to be shipped overseas, dropped into a computer as its main drive, and used for exploring project data and results.

[Based on a discussion with Laura Boykin about microbial databases]