A serverless variant caller: using lithops to port genomics pipelines to the cloud
As part of the EU project “Cloudbutton: Serverless Data Analytics Platform”, the potential of the lithops serverless computing framework to improve the performance of genomics pipelines in the cloud was investigated. Variant calling was chosen as the test case because it is a key step in genome analysis and has important applications in personalised medicine. A variant-calling bioinformatics pipeline was developed using a map-reduce approach: the FASTA file (reference genome) and FASTQ file(s) (sequencing reads) are split into smaller chunks that are processed in parallel by an alignment workflow based on the gem3-mapper. The aligned FASTQ chunks are then processed to remove spurious alignments that arise when the same read maps to different FASTA chunks. The corrected alignments are parsed by the map functions and passed to the reduce function, which calls variants using SiNPle. This approach reduces wall-clock time, provides scalability and elasticity when processing large genomic datasets in a cloud-native fashion, and is compatible with the serverless infrastructure of the main cloud providers (AWS, Google, IBM).
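
The chunk, align, correct and reduce flow described above can be illustrated with a minimal Python sketch. It is not the pipeline's actual code. A standard-library thread pool stands in for the serverless executor, a naive ungapped matcher stands in for the gem3-mapper, and a majority-base pileup stands in for SiNPle. All function names, parameters and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def split_reference(seq, chunk_size, overlap):
    """Cut the reference into (offset, subsequence) chunks; the overlap lets
    reads that span a chunk boundary still align inside a single chunk."""
    return [(start, seq[start:start + chunk_size + overlap])
            for start in range(0, len(seq), chunk_size)]


def split_reads(reads, per_chunk):
    """Cut the list of (read_id, sequence) pairs into fixed-size batches."""
    return [reads[i:i + per_chunk] for i in range(0, len(reads), per_chunk)]


def align_chunk_pair(task, max_mismatches=1):
    """Map step: toy ungapped alignment of one read batch against one
    reference chunk (an assumed stand-in for a real mapper)."""
    (offset, ref_chunk), read_batch = task
    hits = []
    for read_id, read in read_batch:
        for pos in range(len(ref_chunk) - len(read) + 1):
            window = ref_chunk[pos:pos + len(read)]
            mismatches = sum(a != b for a, b in zip(window, read))
            if mismatches <= max_mismatches:
                hits.append((read_id, offset + pos, mismatches, read))
    return hits


def keep_best_hits(hits):
    """Correction step: when a read aligned in several places (for example
    in different reference chunks), keep only its lowest-mismatch hit."""
    best = {}
    for read_id, pos, mismatches, read in hits:
        if read_id not in best or mismatches < best[read_id][1]:
            best[read_id] = (pos, mismatches, read)
    return best


def call_variants(best_hits, reference, min_depth=2):
    """Reduce step: pile up read bases per position and report positions
    where the most common base differs from the reference (toy caller)."""
    pileup = defaultdict(Counter)
    for pos, _, read in best_hits.values():
        for i, base in enumerate(read):
            pileup[pos + i][base] += 1
    variants = []
    for pos in sorted(pileup):
        base, count = pileup[pos].most_common(1)[0]
        if count >= min_depth and base != reference[pos]:
            variants.append((pos, reference[pos], base))
    return variants


def run_pipeline(reference, reads, chunk_size, read_len, reads_per_chunk):
    """Pair every reference chunk with every read batch, align the pairs in
    parallel, then correct the hits and call variants."""
    ref_chunks = split_reference(reference, chunk_size, overlap=read_len - 1)
    read_chunks = split_reads(reads, reads_per_chunk)
    tasks = list(product(ref_chunks, read_chunks))
    with ThreadPoolExecutor() as pool:  # stand-in for a serverless executor
        per_task_hits = list(pool.map(align_chunk_pair, tasks))
    all_hits = [hit for hits in per_task_hits for hit in hits]
    return call_variants(keep_best_hits(all_hits), reference)


# Toy data: the sample carries a G>T substitution at 0-based position 10.
reference = "ACGTTGCAAGGCTTACCGATGCAT"
reads = [("r1", "CAAGTCTT"), ("r2", "AGTCTTAC"),
         ("r3", "ACGTTGCA"), ("r4", "ACCGATGC")]
print(run_pipeline(reference, reads, chunk_size=12, read_len=8,
                   reads_per_chunk=2))  # [(10, 'G', 'T')]
```

Because every (reference chunk, read batch) pair is an independent task, the number of parallel invocations grows with the product of the two chunk counts. This is the property that makes the workload a natural fit for serverless functions. In a real deployment the thread pool would presumably be replaced by the framework's own executor, and the intermediate hits would be exchanged through object storage rather than held in memory.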