Amit Kumar

United Kingdom

Software Engineer at Quansight and creator of cirun.

Tackling Malaria with Python and distributed computing Talk

English language

In this talk I'll describe how we used tools from the PyData ecosystem to analyze genomics datasets containing over 100 million variants x 100 samples.

In large-scale genomics analysis, pairwise distance is a common technique used to reduce samples, but it is a very time-consuming calculation. Taking pairwise distance as an example, I will demonstrate how we solved the scaling problem in sgkit using Numba and Dask, with map-reduce algorithms running on CPU and GPU hardware.
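
As a rough illustration of the map-reduce idea for pairwise distance, here is a minimal sketch in plain Python. It is not sgkit's implementation and uses neither Numba nor Dask. The function names (`map_chunk`, `reduce_partials`, `pairwise_euclidean`), the choice of Euclidean distance, and the toy data are all assumptions made for this example. The variants axis is split into chunks, the map step computes partial squared differences for each chunk independently, and the reduce step sums the partial results and takes the square root.

```python
import math


def map_chunk(chunk):
    """Map step: partial squared Euclidean distances for one chunk.

    `chunk` is a list of sample rows, each holding the values for the
    same slice of variants.
    """
    n = len(chunk)
    partial = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = sum((a - b) ** 2 for a, b in zip(chunk[i], chunk[j]))
            partial[i][j] = partial[j][i] = float(s)
    return partial


def reduce_partials(partials):
    """Reduce step: sum the per-chunk partial matrices, then take sqrt."""
    n = len(partials[0])
    total = [[0.0] * n for _ in range(n)]
    for p in partials:
        for i in range(n):
            for j in range(n):
                total[i][j] += p[i][j]
    return [[math.sqrt(v) for v in row] for row in total]


def pairwise_euclidean(samples, chunk_size):
    """Split the variants axis into chunks, map each chunk, then reduce."""
    n_variants = len(samples[0])
    partials = [
        map_chunk([row[start:start + chunk_size] for row in samples])
        for start in range(0, n_variants, chunk_size)
    ]
    return reduce_partials(partials)


# Toy data (hypothetical): 3 samples x 4 variants.
samples = [
    [0, 1, 2, 0],
    [0, 1, 2, 0],
    [2, 1, 0, 0],
]
dist = pairwise_euclidean(samples, chunk_size=2)
```

Because each chunk is processed independently in the map step, the chunks could in principle be distributed across workers, with only the small samples x samples partial matrices combined at the end.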