GridION X5 - The Sequel

A presentation by Clive Brown, CTO Oxford Nanopore Technologies; notes by David Eccles, 2017-Mar-15. See the presentation here.

Preface

Some people are watching this in Australia at 3am [and David Eccles in New Zealand at 4am]
This is Clive's once-yearly update, given shortly before the main conference in May. A few things will be saved for the meeting in London
Company mission: enable anybody to sequence anything anywhere
Publications / elsewhere demonstrate that people are already attempting to use the MinION technology in lots of places
The technology is designed to be used anywhere, with very simple workflows.
Main device (MinION) is portable, with minimal capital cost
System is effectively real-time, or very close to real time
System is demand driven -- can be used, put down and taken up again many times
Read lengths are intrinsically long, limited only by sample prep
Accuracy is pretty good, and improving
Sequencer is good at cDNA; this aspect has probably been undersold
This talk doesn't cover everything available. Clive Brown has given a few other previous talks
The last google hangout at end of March [2016] is a good place to start

How nanopore sequencing works

protein pore embedded in membrane
array of membranes embedded in electrical sensor
speed at which the sensors measure things is orders of magnitude faster than CCDs
after a second, have 1000b read, entirely available for analysis as the run proceeds
can chain bioinformatics pipelines onto the sequencing while it is still running
can make devices work together to a shared goal; methods continue
a piece of DNA will give you a reproducible signal that can be decoded into bases
currently running R9.4 on mem 10, motor E8 (phenomenally processive helicase)

decoding signals

originally done by HMM
explosion in past 2 years around neural networks
now good experience decoding signals using NNs
many methods in signal processing area that could be used
current methods can learn local context over a window

MinION device

At least ~4,000 MinIONs out now
Aluminium device
Inject liquid sample into system, pores devour sample in real time
512 channels running at once; can get 450b/s/channel, very high throughput
Mark Ib -- no significant improvements planned in MinION
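
As a rough check on the throughput figure above, the aggregate rate is just channels multiplied by per-channel speed. The sketch below assumes all 512 channels sequence continuously at 450 b/s, which real runs do not achieve, and a 48-hour run; the function names are illustrative only.

```python
def aggregate_rate(channels, bases_per_second):
    """Bases per second across the device if every channel is always sequencing."""
    return channels * bases_per_second

def yield_gigabases(channels, bases_per_second, hours):
    """Idealised run yield in gigabases (1 Gb = 1e9 bases)."""
    return aggregate_rate(channels, bases_per_second) * hours * 3600 / 1e9

minion_rate = aggregate_rate(512, 450)       # 230,400 bases/s
minion_48h = yield_gigabases(512, 450, 48)   # idealised 48 h yield (assumed run length)
print(f"{minion_rate / 1e3:.1f} kb/s, {minion_48h:.1f} Gb in 48 h")
```

This gives about 230 kb/s and just under 40 Gb for an idealised 48-hour run.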

Technology workflow

Clive doesn't like slide: implies linear workflow
Need DNA
Variety of kits; simplest/snappiest uses transposome complex. On long DNA, can add in adapters; 5-10 mins
Lengthiest preps (ligation) take about an hour; working to smooth out variation in sample prep
Can take sample out, can flush sample out, can put more in, can muck with sample while it's in the system
While running, run NN basecaller
Calling can be done on the fly, or post-run
Programmable feedback loop; feature to come back with a vengeance in future

PromethION; elephant in the room

Cake-tin sized benchtop sequencer
Different ASIC, a few thousand channels, in aggregate 144,000 channels
Laid out to be pipette friendly
Can put sample in flow cell offline and while running
Bottom is compute module, can write data out at full-pelt to external storage
Box was designed when running at 30b/s, probably can't handle a full run for real-time basecalling
Running at full-pelt can produce a very large 233Gb per flow cell in 48h, 11Tb assuming 100% bandwidth utilisation, 3x a NovaSeq; right up there in that category of high-throughput sequencers
No chance of falling behind; not going out of date before the box is delivered
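
The 233Gb and 11Tb figures above can be reproduced with simple arithmetic. This sketch assumes 48 flow cells (the number mentioned later in these notes), the 144,000 channels split evenly between them, and 450 b/s per channel sustained for the full 48 hours. The even split and the 450 b/s speed are inferred rather than stated on the slide.

```python
TOTAL_CHANNELS = 144_000   # aggregate channel count from the notes
FLOW_CELLS = 48            # assumed: full complement of flow cells
SPEED = 450                # assumed bases per second per channel
RUN_SECONDS = 48 * 3600    # 48 hour run

channels_per_cell = TOTAL_CHANNELS // FLOW_CELLS
gb_per_cell = channels_per_cell * SPEED * RUN_SECONDS / 1e9
tb_per_box = gb_per_cell * FLOW_CELLS / 1e3

print(f"{channels_per_cell} channels per flow cell")
print(f"{gb_per_cell:.1f} Gb per flow cell, {tb_per_box:.1f} Tb per instrument")
```

Under these assumptions each flow cell has 3,000 channels and yields about 233 Gb, or about 11 Tb across the instrument.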

On-demand sequencing on PromethION

All kinds of workflow tricks to optimise pipelines
No need to wait for sample, can run 3 samples, 1 sample, 48, or multiplex
Can deal with lumpy demand
Turn-around time basically limited by postage
Can be shipping data back while running
If not enough slots, can just buy more sequencers
When fully deployed, will provide significant competitive advantage

PromethION Flow cell performance

A few bad channels, but mostly green
Numbers are high, yield numbers high, above the threshold for shipping

PromethION scaling

Not novices for scaling; Gordon worked on Glucose blood strips
Working to produce more flow cells

PromethION performance and yield

Key problem to do with flow cell blocking
PromethION typically 10Gb in 6 hours, aim >50Gb per flow cell
Firmware updates planned for higher yield
Can probably run for up to 4 days
Software mature, run in house all the time, evolution of MinION and GridION software
Control in a similar way to MinION, in parallel

Instrument shipping

1-2 per week, a bit slower than expected
expect all backorder done by Q3 (original prediction was Q3 2016)
Putting in software that lets ONT do remote firmware updates
Ability to swap out hardware/software very quickly

PromethION Flow cells

First shipping 3rd April to 12 sites
a little bit of hand-holding, will ramp up rate of flow cell shipping after that
Haven't had a single dropout from waiting list
target headroom performance is so high that it will not go out of date

PromethION design change

Compute module will not be able to keep up with 1,000 bases per second
Need a bigger box, getting too tall
Decided to move all of the computing into a separate box
Compute module becomes a switch that lets us stream data to a compute room
People who want to run multiple promethIONs can cable up and have processing elsewhere
Most people put PromethIONs on UPSs anyway, might as well put a compute module there as well
Can add in up to 80 TFLOPS of computing in compute module; can handle 1,000 b/s on a fully-running sequencer
Will map consensus callers and assemblers in box
Can use idle computing for local bioinformatics; e.g. containerised workflows
Network requirements much simpler for a single compute box
To be rolled out; upgrades included in the cost of purchase

What's inside the compute box

Cramming in FPGAs on PCIe cards, easily 60Mb/s, 80 TFLOPs
Allows the box to fully keep up with real-time basecalling on PromethION
Can also implement own proper version of things like read-until

PromethION evolution

Currently 24 flow cells enabled, can dump data to external disk
Q4 add compute module to enable all 48 flow cells with local basecalling
2018 additional evolution, rolled into current purchasing

Base call acceleration

NNs are fascinating things
Engineers have been able to produce versions that run pretty efficiently on FPGAs
Tend to get higher uplift than CPU/GPU; power cost is lower
Possible to use stripped-down, mini FPGAs for smaller devices

Base call acceleration design

Typically 1.8 events per base, very high requirements
MinION at full output needs 200 GFLOPs
writing OpenCL versions of base callers, unlikely to be as optimal as VHDL

Base call benchmarking

Working closely with Intel
MinION about 240kb/s
PromethION about 65,000 kb/s
Can only utilise around 10% of available CPU
on GPUs, seem to only be able to use about 2% of GPUs; not a good performance payoff
on i7-type CPUs, can process 200,000 bases/s, still only using ~10%
on Intel Arria / Stratix, getting to 1M-7.5M called bases per second, using 60% of available processing power
pretty confident that this is the way to go
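
To put these benchmark numbers side by side, the sketch below estimates how many processors of each kind would be needed to basecall in real time. It takes the per-device rates quoted above at face value and assumes perfect scaling across devices, which is optimistic; the dictionary keys and function name are illustrative.

```python
import math

# Sequencer output in bases per second, from the figures in these notes.
SEQUENCER_OUTPUT = {"MinION": 240_000, "PromethION": 65_000_000}

# Basecalling rates in bases per second per device, as quoted.
CALLER_RATE = {
    "i7-type CPU": 200_000,
    "FPGA (low estimate)": 1_000_000,
    "FPGA (high estimate)": 7_500_000,
}

def devices_needed(output_rate, caller_rate):
    """Smallest whole number of devices whose combined rate covers the output."""
    return math.ceil(output_rate / caller_rate)

for sequencer, out in SEQUENCER_OUTPUT.items():
    for caller, rate in CALLER_RATE.items():
        print(f"{sequencer}: {devices_needed(out, rate)} x {caller}")
```

On these numbers a PromethION would need hundreds of CPUs, but somewhere between 9 and 65 FPGA devices.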

PromethION base call implementation

current generation 9Mb/s, almost a Tb per day currently
second generation of cards up to 4Tb of real-time calling per day
looked at accelerator for MinION... but you can do that
coding in the background a dongle, either separate or intercalated between MinION and computer that will do local basecalling, and stream out reads to the computer
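
The per-day figures for the PromethION cards follow from the per-second rates. A minimal conversion sketch, assuming continuous calling for 24 hours and decimal units (1 Tb = 1e12 bases):

```python
SECONDS_PER_DAY = 24 * 3600

def tb_per_day(megabases_per_second):
    """Convert a sustained calling rate in Mb/s to terabases per day."""
    return megabases_per_second * 1e6 * SECONDS_PER_DAY / 1e12

def mb_per_second(terabases_per_day):
    """Inverse conversion: the sustained Mb/s needed for a given Tb/day."""
    return terabases_per_day * 1e12 / SECONDS_PER_DAY / 1e6

current = tb_per_day(9)         # current generation at 9 Mb/s
second_gen = mb_per_second(4)   # rate implied by 4 Tb/day
print(f"{current:.2f} Tb/day; {second_gen:.1f} Mb/s")
```

So 9Mb/s is about 0.78 Tb per day ("almost a Tb"), and 4Tb per day implies a sustained rate of roughly 46 Mb/s.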

MinION compute requirements

Cloud base calling currently - Clive's fault that ONT did that
If you provide a safety net, people start to use it as a hammock
Cloud base calling will be discontinued on 21st March
MinKNOW now has integrated basecaller which will do it for you
A good high-performance laptop will be fine
Can just leave computer running, will keep basecalling after finish of run
Provide binary base caller
Writing to shared drive, another computer can do base calling
Most people getting 3-10Gb
Theoretical maximum 200kb/s, best internal about 100kb/s
laptop CPU can do about 40kb/s base calling
MinKNOW can deal with this
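
To see why leaving the computer running matters, compare how many bases a run produces with how fast a laptop can call them. The sketch assumes a constant 40 kb/s calling rate starting when the run starts, and a 48-hour run; the run length is an assumption, not a figure from this part of the talk.

```python
def basecalling_backlog_hours(yield_gb, run_hours, caller_kb_per_s):
    """Hours of basecalling still outstanding when the run ends.

    Returns 0.0 if the caller finishes within the run time.
    """
    total_bases = yield_gb * 1e9
    calling_hours = total_bases / (caller_kb_per_s * 1e3) / 3600
    return max(0.0, calling_hours - run_hours)

for yield_gb in (3, 10):
    lag = basecalling_backlog_hours(yield_gb, run_hours=48, caller_kb_per_s=40)
    print(f"{yield_gb} Gb run: {lag:.1f} h of calling left after the run")
```

On these assumptions a 3 Gb run is called within the run time, while a 10 Gb run leaves roughly 21 hours of calling to finish afterwards.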

Accuracy / chemistry / algorithm

On 1D, R9.4 base calling modal accuracy of just over 90%, maybe a bit higher
old 2D system at 250b/s

Consensus accuracy

Basic message: accuracy improving, data amenable to polishing
Most errors now falling in / adjacent to homopolymers
Will think about releasing optimised consensus callers in a reasonable timeframe

Homopolymers

ONT's unfinished business is held up by detractors; fixed by ONT, then detractors move onto the next unfinished business
Scrappie package, learnings are migrating into MinKNOW base calling

Novel base calling

Working from raw data (more about that later)

Homopolymer

Recently held up by competitor as a systematic flaw; just another obstacle to overcome (e.g. black knight in Monty Python)
Scrappie doing fairly well, consensus calling of homopolymers can be done using Scrappie output
not as mature as other callers, but methods will only improve

Base calling from raw signal

Clive hates event calling, has wanted to get rid of it
Current base callers just take raw signal, output base calls
Neural networks trained to optimally extract features from raw data, architecturally / conceptually better
Accuracy improves, scales to faster sequencing speeds
Can go straight from raw to FASTQ / SAM, recover about 80% of disk space
left with compressible integers in FAST5
Developer versions released Easter
base calling should just get better and better

Base caller landscape

Clive likes open development
Albacore is the production basecaller; can be run offline; fully-supported
Nanonet is a research base caller, available under open source, not supported
Scrappie is the New Kid on Block, limited support, available to everyone shortly
Standard workflow is MinKNOW + onward analysis
Can intercalate other basecallers; preferred by power users

Other accessory tools

Lamprey; file-watching wrapper for open-source base callers. Does what old cloud program did
If writing to shared drive, or external compute, can use Lamprey for local-cloud-type thing
A middle ground between basic and power users
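
The file-watching idea can be illustrated with a small polling loop. This is not Lamprey's implementation, which these notes do not describe; it is a generic sketch in which `call_bases` is a hypothetical stand-in for whatever basecaller is being wrapped, and the `.fast5` suffix is assumed as the read file extension.

```python
import time
from pathlib import Path

def watch_once(watch_dir, seen, call_bases, suffix=".fast5"):
    """Process files in watch_dir that have not been seen before.

    `seen` is a set of file names, updated in place; `call_bases` is any
    callable that takes a Path. Returns the list of newly processed paths.
    """
    new_files = sorted(
        p for p in Path(watch_dir).iterdir()
        if p.suffix == suffix and p.name not in seen
    )
    for path in new_files:
        call_bases(path)
        seen.add(path.name)
    return new_files

def watch_forever(watch_dir, call_bases, interval_seconds=5.0):
    """Poll watch_dir indefinitely, handing each new read file to call_bases."""
    seen = set()
    while True:
        watch_once(watch_dir, seen, call_bases)
        time.sleep(interval_seconds)
```

Pointing `watch_forever` at a shared drive that the sequencing computer writes to would give the local-cloud arrangement described above, with the caveat that a real wrapper would also need to handle files that are still being written.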

Throughput

Clive took his own blood (with help), sequenced it himself, gets 20G per flow cell
Other customers typically between 3-11G; would be nice to get everyone up to 20-30Gb
This still only represents about 20-30% of what is possible
Lots of reasons, many to do with extraction; people not knowing how much DNA they're putting in
Trying to reach into upstream workflow; focusing now on good sample prep

Software improvements to throughput

DNA complex would occasionally wedge on top of pore; takes a few seconds, once in, won't come out
If caught quick enough, can do a bit of read-until and flick complex back out of pore
This is no longer the limiting factor

Read length

Read length = fragment length
If a pore is presented with megabase sequences, it will produce megabase reads
Other systems will fail due to photodamage
If you can figure out how to get molecules into the system, can produce reads
Josh Quick / Nick Loman managed long reads (>750kb), largely by avoiding pipetting
have accomplished N50s of 60/70kb
Probably no limit; limit is what can be put in. Clive expects 7Mb sequence should be able to be done
Some nutcases at ONT think you can do whole chromosomes
Need to take what is being learned and make it easier for everybody
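
For readers unfamiliar with the N50 statistic quoted above: it is the read length such that reads of that length or longer contain at least half of all sequenced bases. A minimal sketch with made-up read lengths (the values are illustrative, not data from the talk):

```python
def n50(lengths):
    """Length L such that reads of length >= L hold at least half the bases."""
    if not lengths:
        raise ValueError("need at least one read length")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length

# Hypothetical read lengths in bases.
example_reads = [10_000, 20_000, 40_000, 60_000, 70_000, 80_000]
print(n50(example_reads))
```

Because N50 is weighted by bases rather than by read count, a handful of very long reads can raise it substantially.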

Upcoming improvements for throughput and sensitivity

MinKNOW upgrade, improved unblocking
Lifetime of flow cell improved by 50%; yield per flow cell goes up and cost per base comes down concomitantly
Working on releasing an official read-until
Working group is reviewing the samples people are running, to find the library prep that gives the best output

Improving / replacing 2D

Introduced in NY meeting; phasing out 2D sequencing, replaced with 1D^2
2D has always been a problem; strands covalently joined with hairpin
Accuracy plots have quite a different accuracy for template/complement
Bad structural effects that bugger-up the basecalled signal
can't get speed above 250b/s

1D^2

1D prep
As template strand is drawn in, other end gets closer and closer to pore
Finish first molecule, other molecule is sitting nearby on membrane
If second molecule hangs around long enough, it will be processed by the pore
This occurs naturally about 1% of the time, no joining between template/complement
Trick is to make the second molecule hang around longer

1D^2 consensus

Accuracy much sharper
2 strands look like and behave like individual molecules

Traces

Open pore current; drop as molecule goes into pore
When first molecule is traversed, current goes back up to open pore
Second molecule is complement of DNA
With some trickery, can make second molecule hang around for longer
60% of data comes in template/complement pairs; expect that ONT can get that higher
Can get very high 1D^2 yields at 450b/s
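
The trace description above suggests a simple way to picture strand detection: the current sits at the open-pore level, drops while a molecule occupies the pore, and returns afterwards. The sketch below finds those drops by thresholding. It is a toy illustration, not ONT's method; the threshold fraction, the current values and the function name are all assumptions.

```python
def find_strands(current, open_pore_level, drop_fraction=0.7):
    """Return (start, end) sample index pairs where the current is below
    drop_fraction * open_pore_level, i.e. a molecule is assumed to be in
    the pore. `end` is exclusive.
    """
    threshold = open_pore_level * drop_fraction
    strands, start = [], None
    for i, value in enumerate(current):
        if value < threshold and start is None:
            start = i                      # molecule enters the pore
        elif value >= threshold and start is not None:
            strands.append((start, i))     # current back at open-pore level
            start = None
    if start is not None:                  # trace ended mid-strand
        strands.append((start, len(current)))
    return strands

# Hypothetical trace in pA: open pore, template, brief open pore, complement.
trace = [200, 200, 90, 85, 95, 200, 88, 92, 90, 200, 200]
print(find_strands(trace, open_pore_level=200))
```

Two drops separated by only a brief return to open-pore current are what a template/complement pair would look like in this simplified picture.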

1D^2 accuracy

Modal accuracy of 97/98%, a proportion are above 99%
Long stretches of perfect data
Algorithm is not fully optimal
Would like modal accuracy to 99%
Base-caller was not Scrappie, so at least has homopolymer issues
Will need to change to R9.5 pore; better at capturing second signal
Can still generate 1D reads
consensus calling knows what the pairs are, which helps polishing
metagenomics may be more important to look at single molecules
expect this will be forwardly-compatible with 1000 bases/s
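
Accuracy percentages are often easier to compare on the Phred scale, where Q = -10 log10(error rate). The conversion below is the standard definition; applying it to the modal accuracies quoted in these notes is only an illustration of how large the step from 1D to 1D^2 is.

```python
import math

def phred(accuracy):
    """Phred quality score for a per-base accuracy between 0 and 1."""
    error = 1.0 - accuracy
    if error <= 0:
        raise ValueError("accuracy must be below 1.0 for a finite score")
    return -10 * math.log10(error)

for label, acc in [("1D modal", 0.90), ("1D^2 modal", 0.97), ("target", 0.99)]:
    print(f"{label}: {acc:.0%} accuracy is about Q{phred(acc):.1f}")
```

Each 10 points on this scale is a tenfold drop in error rate, so moving from 90% to 99% accuracy is a move from about Q10 to about Q20.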

1D^2 release

To developers 27th March, developer kit + base caller
general release to community 3rd May
2D kits discontinued on 5th May

New product; MinION well established

Over 4,000; just started pushing into China, India, Japan
workflow getting better, work to do on input material
Aim to make more runnable
MinION not licensed for service sequencing; makes sense to Clive & Spike, but not anybody else
MinION is your personal sequencer

Offerings

Huge performance gap between MinION and PromethION; bookending the space
PromethION might be too large (only 10/20 HiSeq 10)

GridION

Will make GridION X5 available
the sequel, because it follows on from both MinION / GridION

What is GridION

Original system that was proposed by Clive / ONT
Designed around loading membranes in the lab; tore up design in 2011 to change to loading at ONT
Concept is sound: large arrayable computers that can work together or individually on samples
For a long time, GridION wasn't taken off website

GridION X5

Bench-top format, a big MinION
5 individually-addressable flow cells
Inside, taken PromethION developments and shrunk down
FPGAs inside, real-time base calling for up to 1000b/s for 5 flow cells
Everything is in the box
Allows for small group-level or service sequencing

GridION Production

All mature, in build
Very highly-manufacturable design
In the zone

GridION Pricing

Two ways to buy: capital loaded, consumable loaded
Capital commitment of $125k, flow cells $300
Licensed for use as fee for service
At 10G per flow cell, $30 per Gb; at 20G/flow cell $15/Gb
Capital-free model $475 per flow cell with support fee
$47 per Gb at 10Gb per flow cell, $24 per Gb at 20Gb per flow cell
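
The per-gigabase prices above are simply the flow cell price divided by yield. A small sketch, ignoring the capital commitment, the support fee (whose amount is not given in these notes) and any discounts:

```python
CAPITAL_FLOW_CELL = 300      # USD per flow cell, capital-loaded model
CONSUMABLE_FLOW_CELL = 475   # USD per flow cell, capital-free model

def cost_per_gb(flow_cell_price, yield_gb):
    """Consumable cost per gigabase, ignoring capital and support fees."""
    return flow_cell_price / yield_gb

for yield_gb in (10, 20):
    capital = cost_per_gb(CAPITAL_FLOW_CELL, yield_gb)
    consumable = cost_per_gb(CONSUMABLE_FLOW_CELL, yield_gb)
    print(f"{yield_gb} Gb per flow cell: ${capital:.2f}/Gb vs ${consumable:.2f}/Gb")
```

The unrounded capital-free figures are $47.50 and $23.75 per Gb, which the notes give as $47 and $24.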

Also Nanopore service certified

Institute-wide service
will run training and QC certification process; contact support
enables you and customers to know that samples can be processed

Summary tables for 3 products

Should be product info on website

Shop opens for GridION next week

Expect to ship devices in May at the latest

Dates

March
1D^2 with developers
Lamprey developer release
Cloud base calling discontinued on 21st
April
Transducer (MinKNOW HP fix) on 20th
May
Broader 1D^2 release (3rd May)
GridION flow cells 15th
2D gone on the 5th

Lots of things Clive hasn't spoken about

A lot covered in more detail in London
Development on targeted CAS9
Massively improving array sensitivity
CliveOME (replaced, with ultra-long reads)
wants to see about enriching immunoglobulin regions
wants centromere-spanning reads
Zumbador looking really exciting
has to get from a drop of blood to a genome
Flongle / SmidgION
Metrichor / Epi2Me; becoming completely separate from Nanopore

Questions

Devices for service: yes, but not MinION
System suitable for 16s
Error rate improving
Guaranteed performance metrics: certification scheme will probably include this; should talk to Louisa
Scrappie available for developer licenses, some in MinKNOW
PromethION compute box can be made rack-mountable (part of roadmap, but need to see demand first)
Switch box / old compute module can have different output options
Optimal temperature for sequencing - temp rises, sequencer runs quickly
- probably no problem at 50 degrees, not sure about colder
Lifetime for flow cells: up to 4 days
How much DNA is needed to make 20Gb?
answer will be forthcoming
MinION is available now in China
Preparation of 1Mb DNA: look at Nick Loman's blog; just be gentle
MinION on 3rd April - can draw down latest pore or not
same base-callers work on both pores; just rate of capture
Consumable & Capital models are both available in parallel
No need to implement PromethION technology on MinION
expect to improve throughput to 1000 bases/s
in principle 40Gb/day (60Gb/day theoretical max)
complement distorts DNA signal, probably by stretching
segmentation - no methods to deal with this yet, but nothing scary
Plan for clinical MinION use? probably
How close to routine culture-free WGS? Zumbador problem; probably pretty close
can pop cells, tagment
problem is sensitivity
in principle, if they can get to the membrane, will see a lot of them
trying to improve array sensitivity to deal with low inputs
drop of blood / spit into tube, wait 10 mins
Conversion of ONT terminology
Clive used Molarity when he did molecular biology
significant stumbling block for people
PromethION service fee - chat with ONT service team
PromethION based on R9.5 pores? No one has run PromethION/GridION in field
they will only ever be 1D or 1D squared
MinION accuracy for short fragments
pore accuracy is independent of length
aiming to remove capture advantage for short reads