
CANSSI Ontario is pleased to announce Jorge Alexander Rojas Vargas as the recipient of a CANSSI Ontario Postdoctoral Fellowship in Genome Data Science for his project, “Unveiling Microbial Dark Matter: A Novel Embedding Approach for Metagenome Analysis.”
Jorge Alexander Rojas Vargas is a Postdoctoral Associate in the Departments of Biology and Microbiology & Immunology at Western University. Working under the supervision of Drs. Art Poon, Jessica Prodger, and Vera Tai, he aims to combine microbiology, bioinformatics, and microbial ecology, with a general interest in the taxonomic and functional characterization of microbial communities in natural and human ecosystems.
Dr. Rojas Vargas holds an MSc in Sustainable Development and Environment from Universidad de Manizales (Colombia) and a PhD in Sciences (Biochemistry) from the National Autonomous University of Mexico (UNAM), where he designed a bacterial consortium for degrading crude oil in seawater. As part of his doctoral research, he also created HADEG, a publicly available, manually curated database of genes and enzymes involved in hydrocarbon degradation and biosurfactant production. HADEG has since become a valuable resource in environmental genomics. His PhD research received an honorable mention and was recognized as a finalist for best doctoral thesis by CINVESTAV (Mexico) in 2023.
He has published 13 peer‑reviewed articles, serving as first author on eight, including high‑impact studies on marine bacteria, genome mining, and microbial genomics. His postdoctoral work at Western University focuses on developing statistical and bioinformatic methods to analyze microbial “dark matter”: genetic sequences that match no known organism or function and that often comprise the majority of microbiome datasets. He also contributes to the study of the neovaginal microbiome in transfeminine individuals following gender‑affirming surgery.
Originally trained in Chemical Engineering and Philosophy, Dr. Rojas Vargas brings a unique interdisciplinary perspective to microbial research. As a first‑generation scientist, he is committed to equity in science and to mentoring the next generation of researchers. He has extensive teaching and supervision experience in Colombia and Mexico. His long‑term goal is to develop microbial and bioinformatic solutions for environmental and public health challenges.
Next-generation sequencing has dramatically expanded our ability to study microbial communities by enabling culture-independent exploration of genetic material from diverse environments. Despite rapid growth in sequencing technologies—particularly third-generation platforms like Pacific Biosciences and Oxford Nanopore—analytical methods have lagged behind, leaving much of the genomic data, especially genetic “dark matter,” unexplored. This dark matter refers to the large fraction of metagenomic sequences that cannot be annotated due to the absence of homologous reference genomes or known functions.
Current methods such as k-mer-based genomic signatures can partially address this gap but are limited by issues like data sparsity and overfitting. To overcome these limitations, Dr. Rojas Vargas proposes a novel analytical framework based on topic modeling—a technique rooted in natural language processing—to extract meaningful patterns from unassembled metagenomic data. Unlike standard methods (e.g. Latent Dirichlet Allocation, LDA), which struggle with biological noise and rigid assumptions, the proposed approach will be specifically adapted to handle the unique characteristics of metagenomic datasets.
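As a rough illustration of the k-mer signatures mentioned above, and of why they become sparse, the following minimal Python sketch counts overlapping k-mers in a few toy reads. It is not the method proposed in the project; the reads, the choice of k = 4, and the function names `kmer_counts` and `sparsity` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count overlapping k-mers in a DNA sequence.

    k-mers containing anything other than A, C, G, or T are skipped.
    """
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def sparsity(counts, k):
    """Fraction of the 4**k possible k-mers that were never observed."""
    return 1 - len(counts) / 4 ** k


# Toy reads (hypothetical data, not from any real metagenome).
reads = ["ACGTACGTGG", "TTGACGTACN"]
k = 4

# One k-mer count vector ("signature") per read.
signatures = [kmer_counts(read, k) for read in reads]

for read, signature in zip(reads, signatures):
    print(read, dict(signature), f"sparsity={sparsity(signature, k):.3f}")
```

Because the number of possible k-mers grows as 4 to the power k, most entries of such a count vector are zero for realistic values of k, which is the sparsity problem the paragraph above refers to. In a topic-modeling setting, each read or sample would play the role of a document and each k-mer the role of a word.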
Research Aims:
Method Development – Design and validate a topic modeling method to associate k-mers in raw metagenomic data with microbial community composition using simulated datasets.
Application to Real Data – Apply the method to published metagenomes from ocean, soil, and human gut samples to characterize the environmental distribution of genetic dark matter.
This work aims to advance our understanding of cryptic microbial diversity by uncovering latent genomic patterns and improving the ecological and functional interpretation of metagenomic data across ecosystems.
The CANSSI Ontario Postdoctoral Fellowship in Genome Data Science is designed to support the methodological work of an early-career investigator working in genomics and data science with an emphasis on new genomic technologies or multi-omic integration. The goal of the award is to attract and retain top-tier postdoctoral talent, both nationally and internationally.
The Fellowship offers two years of salary support of up to $50,000 CAD annually for postdoctoral fellows undertaking full-time research at a CANSSI Ontario partner university or one of its affiliated research institutes.
Candidates are responsible for selecting, contacting, and securing the commitment of two faculty members to jointly supervise their project. At least one supervisor must be a faculty member with a PhD in statistics, biostatistics, epidemiology, computational biology, genomics, or computer science. The second supervisor can be from any other field.
CANSSI Ontario is the Ontario Regional Centre of the Canadian Statistical Sciences Institute (CANSSI). Its goal is to strengthen and enhance research and training in data science by developing programs that promote interdisciplinary research and enable multidisciplinary collaborations.
CANSSI Ontario
10th Floor, Suite 10072
700 University Avenue
Toronto, ON M5G 1Z5