Software Engineer at Quansight and creator of cirun.
Speaker: Amit Kumar
Language: English
In this talk, I’ll show how we leveraged tools in the PyData ecosystem to analyze genomics datasets containing over 100 million variants × 100 samples.
In large-scale genomics analysis, pairwise distance is a common technique for reducing samples, but it is a very time-consuming calculation. Taking pairwise distance as an example, I will demonstrate how we solve the scaling problem in sgkit using Numba and Dask, with map-reduce algorithms running on CPU and GPU hardware.
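To make the map-reduce idea concrete, here is a minimal sketch in plain Python. It is not sgkit's implementation and deliberately uses neither Numba nor Dask. It assumes a small matrix with one row per sample and one column per variant, and it assumes Euclidean distance as the metric. All function names (`chunk_columns`, `map_partial`, `reduce_partials`, `pairwise_euclidean`) are hypothetical. The map step computes partial sums of squared differences on each chunk of variants independently, and the reduce step adds those partial matrices together before the final square root.

```python
import math
from functools import reduce


def chunk_columns(matrix, chunk_size):
    """Split the matrix into blocks along the variant (column) axis."""
    n_cols = len(matrix[0])
    return [
        [row[start:start + chunk_size] for row in matrix]
        for start in range(0, n_cols, chunk_size)
    ]


def map_partial(chunk):
    """Map step: partial sums of squared differences for one chunk."""
    n = len(chunk)
    return [
        [sum((a - b) ** 2 for a, b in zip(chunk[i], chunk[j])) for j in range(n)]
        for i in range(n)
    ]


def reduce_partials(left, right):
    """Reduce step: add two partial matrices element by element."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(left, right)]


def pairwise_euclidean(matrix, chunk_size=2):
    """Combine per-chunk partial results into the full distance matrix."""
    partials = map(map_partial, chunk_columns(matrix, chunk_size))
    total = reduce(reduce_partials, partials)
    # The square root is applied only once, after all chunks are combined.
    return [[math.sqrt(v) for v in row] for row in total]


# Toy data (assumed layout): 3 samples (rows) x 5 variants (columns).
samples = [
    [0, 1, 2, 0, 1],
    [0, 1, 2, 0, 1],
    [3, 1, 2, 4, 1],
]
distances = pairwise_euclidean(samples, chunk_size=2)
```

Because each chunk is processed independently in the map step, the chunks could in principle be handed to separate workers, and no worker would need to hold the full variant axis in memory. The result does not depend on the chunk size, since addition of the partial sums is associative.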