2.7. Identification and functional annotation of SNP outliers
To detect footprints of genomic diversification and potential selection,
outlier loci were identified using two independent approaches: PCAdapt
software v 4.3.5 (Luu et al., 2017) and BayeScan v 2.1 (Foll and
Gaggiotti, 2008). The PCAdapt package implemented in R software detects
outlier loci based on PCA by assuming that markers excessively related
to population structure are candidates for potential adaptation. The
number of PCs to retain was chosen after inspecting score plots for
population structuring; the analysis was run with maf=0.01 and the
distance set to ‘mahalanobis’.
The distribution of loadings (SNP contribution to each PC) was uniform,
indicating no relevant effect of linkage disequilibrium (LD). P-values
were corrected for false discovery rate (FDR), and loci with q < 0.01
were retained as outliers. BayeScan is designed to detect potential genetic loci under
selection by analyzing variations in allele frequencies among specific
groups with a Multinomial-Dirichlet model. The prior odds (PO) for
neutrality indicate the ratio of selected:neutral sites (e.g., 1:1000)
and provide a measure of uncertainty on the likelihood of the neutral
model compared to the selection model (Lotterhos and Whitlock, 2014).
The sensitivity of the analysis to the PO was evaluated using
alternative values (1:100, 1:10000). The final MCMC analysis was run with 20
short pilot runs, 5000 iterations, a burn-in of 50000, a thinning
interval of 10, and PO set to 100. Loci with a q-value < 0.01 were
retained as outliers.
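The FDR filtering of p-values described above for PCAdapt can be illustrated with a minimal Python sketch. It assumes the Benjamini-Hochberg procedure, which the text does not specify, and it does not reproduce how BayeScan derives its q-values from posterior probabilities. The function names and the toy p-values are hypothetical and are not taken from the analysis.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here only for illustration; the FDR procedure
    actually applied in the study is not stated in the text.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def retain_outliers(pvalues, cutoff=0.01):
    """Return the indices of loci whose adjusted p-value is below the cutoff."""
    adjusted = bh_adjust(pvalues)
    return [i for i, q in enumerate(adjusted) if q < cutoff]


# Hypothetical per-locus p-values (one value per SNP).
toy_pvalues = [0.00001, 0.2, 0.0004, 0.03, 0.9, 0.002]
outlier_idx = retain_outliers(toy_pvalues, cutoff=0.01)
print(outlier_idx)
```

With real data, the list of p-values would come from the PCAdapt output, and the returned indices would map back to SNP identifiers.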
PCAdapt and BayeScan results were finally intersected, retaining only shared outliers
with Fst > 0.8, thus providing a conservative set of
candidate loci. The above analyses were performed separately for each
dataset (GBS, RADseq and COMBINED data).
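The intersection step can be sketched as follows in Python. This is an illustration under stated assumptions: the locus identifiers and Fst values are invented, the function name is hypothetical, and the Fst values are assumed to come from the BayeScan output, which the text does not state explicitly.

```python
def conservative_candidates(pcadapt_ids, bayescan_fst, fst_min=0.8):
    """Return loci flagged by both methods whose Fst exceeds fst_min.

    pcadapt_ids  : iterable of locus IDs retained by PCAdapt (q < 0.01)
    bayescan_fst : dict mapping locus ID -> Fst for loci retained by
                   BayeScan (assumed source of the Fst values)
    """
    # Loci detected as outliers by both approaches.
    shared = set(pcadapt_ids) & set(bayescan_fst)
    # Keep only the strongly differentiated loci, in a stable order.
    return sorted(locus for locus in shared if bayescan_fst[locus] > fst_min)


# Hypothetical outlier lists for one dataset.
pcadapt_outliers = {"snp_1", "snp_2", "snp_5", "snp_9"}
bayescan_outliers = {"snp_2": 0.91, "snp_3": 0.95, "snp_5": 0.62, "snp_9": 0.85}

candidates = conservative_candidates(pcadapt_outliers, bayescan_outliers)
print(candidates)
```

The same function would be called once per dataset (GBS, RADseq and COMBINED), each with its own pair of outlier lists.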
The candidate SNP outliers were annotated by cross-referencing the SNP
position against the GFF file of the rPodLil1.1 genome assembly
(Gomez-Garrido et al., 2023) to assign gene IDs. For outliers
falling in protein-coding regions, a Gene Ontology (GO) annotation was
performed, followed by a functional enrichment analysis with g:GOSt in
g:Profiler (https://biit.cs.ut.ee/gprofiler/gost) on
individual datasets (GBS and RADseq). The Podarcis muralis genome
was used as a reference to determine the functional categories
(biological processes, BP; molecular functions, MF; and cellular
components, CC) that were significantly enriched (FDR < 0.05).
For the subset of outliers falling within coding regions (CDS),
we annotated the codon position and assessed whether the alternative
allele (ALT) resulted in a synonymous or nonsynonymous substitution
with respect to the reference allele in the genome (REF).
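The synonymous versus nonsynonymous classification can be illustrated with a short Python sketch. It assumes the standard nuclear genetic code and a reference codon already given on the coding strand (a gene on the minus strand would first need to be reverse-complemented). The function name and the example codons are hypothetical and do not correspond to loci from the study.

```python
from itertools import product

# Standard genetic code, built in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, codon_pos, alt_base):
    """Classify a SNP inside a codon as synonymous or nonsynonymous.

    ref_codon : reference codon on the coding strand (assumption)
    codon_pos : 1-based position of the SNP within the codon (1, 2 or 3)
    alt_base  : alternative allele (ALT) on the coding strand
    """
    if codon_pos not in (1, 2, 3):
        raise ValueError("codon_pos must be 1, 2 or 3")
    ref_codon = ref_codon.upper()
    # Substitute the ALT base at the SNP position to obtain the ALT codon.
    alt_codon = ref_codon[:codon_pos - 1] + alt_base.upper() + ref_codon[codon_pos:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    effect = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return alt_codon, ref_aa, alt_aa, effect


# Hypothetical examples: a third-position and a second-position change.
print(classify_substitution("GCT", 3, "C"))
print(classify_substitution("GAT", 2, "G"))
```

In practice, the reference codon and the codon position would be derived from the CDS coordinates and phase in the GFF file together with the reference sequence.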