From learning to use genome curation tools for depositing sequences on the African and global Pathogen Data Sharing and Archive platform to understanding data quality standards and formats, African scientists are gaining new insights from training courses offered by Africa CDC.
In collaboration with the National Center for Biotechnology Information (NCBI), National Library of Medicine, USA, a workshop was held from June 18-20, 2024, drawing participants from public health institutions and laboratories in 20 African countries.
Integral to Africa CDC's African Pathogen Data Sharing and Archive Platform, also known as Agari, the workshop was the first offered by Africa CDC aimed at accelerating data verification and validation before sharing for public health use.
Agari, developed in 2023, is a continental platform intended for use by national public health institutions, national reference laboratories, and research and academic institutions across Africa to upload, manage, and share pathogen sequence data and associated metadata, supporting an effective, coordinated response to public health threats across African Union Member States.
Professor Alan Christoffels, director of the South African National Bioinformatics Institute and a senior advisor in genomics and bioinformatics to Africa CDC, said data curation is key to the successful setup and use of Agari as it ensures the accuracy and usability of public health data put into the system. Trained data curators and training in data curation are both hard to come by in Africa, he said. "Ideally, we should be growing a community of data curators in Africa to support reproducible data for public health use and benefit sharing," Christoffels added.
The workshop is the first step in building such a community and strengthening the collaboration with NCBI. Participants attended a series of lectures combined with extended hands-on sessions, explored established best practices in data curation and web-based data curation tools, and familiarized themselves with the step-by-step processes of cleaning, formatting, and sharing data with national, regional, and global repositories.
"It's incredible to know that I have been underutilizing those tools available on the platform," said Olusola Akanbi, an infectious diseases expert heading the genomics unit at Nigeria's Centre for Disease Control and Prevention in Abuja. For some years, even as an undergraduate, Olusola, who attended the workshop and works on genomic sequencing of pathogens of interest in Nigeria, has used the NCBI platform. "One take-home message for us is that we collect a lot of samples in the field, analyze and publish them, but forget the metadata, which renders our samples or efforts futile and we cannot link our results to where they're from," she said. The NCBI database has a lot of tools that can help reduce the turnaround time when it comes to data analysis and availability, she added.
Tholwana Pelokgosi, who handles biological data at the National Public Laboratory in Botswana, said the workshop equipped her with knowledge about NCBI databases such as GenBank and PubMed. "Understanding these resources will help me streamline data retrieval and analysis better," she said. "We were taught that it is vital to ensure that the data submitted is of good quality, meets the required standards and formats, and is easily accessible, ensuring consistency and reliability of good value to the scientific community," Pelokgosi said.
The creation of genomic data is very important for a small Genomic Laboratory research group coordinated by Ako Aristide Berenger, a researcher from the Institut Pasteur in Côte d'Ivoire. The Institut Pasteur has been involved in genomic sequencing since the COVID-19 period in 2021 when there was a lot of national funding. The government financed the setup of the high-throughput sequencing platform with four laboratories, including the one Berenger coordinates. "We have been able to carry out high-throughput sequencing, contribute to the analysis of certain variants, and now we are taking up the challenge of capitalizing on everything we did during COVID-19 for other pathogens, particularly bacterial meningitis and certain multi-bacterial microorganisms," Berenger said.
"The scientific community needs to know what we're capable of doing, and that's where the interest in this workshop comes in. We learned how to use genome curation tools to deposit sequences, whether they're Sanger-type sequences or genomic-type sequences," he said. "We have been taught a number of tools and databanks that can help us popularize our work, so I think the contribution of this workshop will be felt in Abidjan. Having done the sequencing, we are now moving on to data analysis, web tools, data sharing, and data submission to keep up with the scientific revolution that Africa is embarking on," Berenger added.
"This training was beneficial to us," said Dachel Eyenet, a scientist in Congo Brazzaville at the National Public Health Laboratory, where his work involves physiognomic surveillance of pathogens responsible for diseases with epidemic potential, emerging and re-emerging diseases. "When we do our sequencing, we generally submit it to GISAID. The training helped us understand how to create bio projects, put our data or samples on the NCBI platform, and submit the sequences to the database," said Eyenet.
Dr. Dominique Anderson, a senior researcher at the South African National Bioinformatics Institute involved in biobanking, informatics, data quality, and data management, has been using the NCBI platform throughout her scientific career. She noted many improvements in the NCBI database. "It's great to have a refresher and get up to date with some of the new features incorporated into the database," she said. Anderson gained insights into some of the background developments in international databases that she can apply when building her own databases. The training highlighted the importance of making quality data available in African scientific research and provided an opportunity for African scientists to network. "Hopefully, we are going to form some partnerships and work with metadata in terms of our needs," Anderson said.
The training follows recommendations from the Public Health Alliance for Genomic Epidemiology (PHA4GE) data curation technical working group to develop the initial quality controls for data to be deposited into Agari and other platforms. The working group included members from the South African National Bioinformatics Institute (SANBI), PHA4GE, the Mozambican Instituto Nacional de Saúde, Morocco's Institut Pasteur du Maroc, the National Institute of Public Health in Uganda, and Senegal's Institut Pasteur de Dakar. The group also developed a standard operating procedure (SOP) for standardizing metadata in Agari and other platforms. The SOP was based on work done by PHA4GE, a global coalition working to establish data standards. "This workshop is part of the Africa CDC - Africa PGI plan to train 100 data curators every year to accelerate and ensure genomic data quality in Africa," said Dr. Harris Onywera, a bioinformatics data scientist at Africa CDC.