Step 2 — Motif Search
Scans every promoter for transcription-factor binding motifs and computes a hypergeometric over-representation p-value per motif.
Inputs
- Target FASTA: usually the `promoters.fa` from Step 1.
- Motifs: any combination of:
  - Free-text IUPAC consensus, one per line as `NAME SEQ`
  - MEME file
  - Live import from PlantTFDB (157 species), AnimalTFDB (vertebrates + insects), JASPAR 2024, or HOCOMOCO v11
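The free-text consensus option can be illustrated with a minimal Python sketch. It assumes that an IUPAC consensus is matched exactly, with each ambiguity code expanded to a regular-expression character class, on both strands, and that reverse-strand hits are reported at their forward-strand start. The helper names (`parse_consensus_lines`, `scan`, and so on) are hypothetical; this is not Cis-GS's own scanner.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement of every IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def parse_consensus_lines(text):
    """Parse free-text 'NAME SEQ' lines into {name: consensus} (hypothetical helper)."""
    motifs = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if len(parts) < 2:
            raise ValueError(f"expected 'NAME SEQ', got: {line!r}")
        motifs[parts[0]] = parts[1].upper()
    return motifs


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def consensus_to_regex(consensus):
    return "".join(IUPAC[base] for base in consensus.upper())


def scan(sequence, consensus):
    """Return sorted (0-based start, strand) pairs for every exact consensus match."""
    seq = sequence.upper()
    hits = []
    for strand, motif in (("+", consensus), ("-", reverse_complement(consensus))):
        # The zero-width lookahead lets overlapping matches be reported.
        pattern = re.compile("(?=" + consensus_to_regex(motif) + ")")
        hits.extend((m.start(), strand) for m in pattern.finditer(seq))
    return sorted(hits)


motifs = parse_consensus_lines("G-box CACGTG\nW-box TTGACY\n")
print(scan("AACACGTGTT", motifs["G-box"]))  # [(2, '+'), (2, '-')]
```

Searching the forward sequence for the reverse complement of the consensus avoids building a second, reversed copy of every promoter. A palindromic motif such as CACGTG is therefore reported once per strand at the same position.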
Statistics
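For each motif, the over-representation p-value is the upper tail of the hypergeometric distribution, where \(X\) is the number of hit-bearing promoters in a random draw of \(n\) promoters:

\[
p = P(X \ge k) = \sum_{i=k}^{\min(n,K)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}
\]

where

| Symbol | Meaning |
|---|---|
| \(N\) | Total number of promoters |
| \(K\) | Number of promoters in which the motif occurs at least once |
| \(n\) | Number of query promoters (e.g. a K-means cluster from Step 6) |
| \(k\) | Number of query promoters with a hit |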
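The hypergeometric over-representation p-value defined by \(N\), \(K\), \(n\) and \(k\) can be sketched in plain Python. This assumes the p-value is the upper tail \(P(X \ge k)\) and that promoters are identified by gene IDs; `hypergeom_upper_tail` and `motif_enrichment` are hypothetical names, not part of the `cis_gs` package.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N total, K with a hit, n drawn).

    Exact integer binomials are summed and divided once, so intermediate
    terms lose no precision.
    """
    lo, hi = max(k, 0), min(n, K)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(lo, hi + 1))
    return numerator / comb(N, n)


def motif_enrichment(universe, promoters_with_hit, query):
    """Derive N, K, n, k from sets of promoter IDs and return them with the p-value."""
    universe = set(universe)
    hit = set(promoters_with_hit) & universe  # promoters with >= 1 motif occurrence
    query = set(query) & universe
    N, K, n, k = len(universe), len(hit), len(query), len(query & hit)
    return {"N": N, "K": K, "n": n, "k": k, "p": hypergeom_upper_tail(k, N, K, n)}


genes = [f"g{i}" for i in range(1, 11)]
result = motif_enrichment(genes, {"g1", "g2", "g3"}, {"g1", "g2", "g4", "g5"})
print(result["k"], round(result["p"], 4))  # 2 0.3333
```

With SciPy available, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` should return the same value; the `k - 1` is needed because `sf` gives \(P(X > x)\).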
Multiple-testing correction: Benjamini-Hochberg (cis_gs.enrichment.core.bh_fdr).
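For readers unfamiliar with the correction, here is a minimal sketch of the standard Benjamini-Hochberg step-up procedure. It is written independently and is not the source of `cis_gs.enrichment.core.bh_fdr`; the name `bh_adjust` is hypothetical, and which family of p-values Cis-GS corrects together is not specified here.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q_(r) = min over j >= r of p_(j) * m / j,
    # which keeps adjusted values monotone and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print([round(q, 4) for q in bh_adjust([0.01, 0.04, 0.03, 0.20])])
# [0.04, 0.0533, 0.0533, 0.2]
```

The result can be cross-checked against `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")`, whose second return value holds the adjusted p-values.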
Outputs
- `hits.csv`: one row per gene × motif, with hit position, strand, and raw and adjusted p-values.
- Significance Summary: collapsed table with one row per (gene × motif).
Gene-ID Resolution
Cis-GS adds three optional ID-mapping methods to bridge the common mismatch between NCBI LOC### identifiers and species-database IDs:
- Column swap: append `XM_`/`XP_` accessions to the exported CSV.
- Mapping CSV: a user-supplied two-column lookup.
- GFF3 Dbxref expansion: pull every synonym from the `Dbxref=` and `locus_tag=` attributes in the annotation.
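The mapping-CSV and GFF3 Dbxref options can be sketched as follows. This is a hypothetical illustration, not Cis-GS's code: it assumes the mapping CSV has no header row, and it assumes each synonym should resolve to the feature's first `Parent` (so an mRNA's `XM_` accession lands on its gene) or, for top-level features, to its own `ID`. Only the GFF3 conventions themselves (tab-separated columns, `;`-separated attributes, `,`-separated values, percent-encoding) come from the format.

```python
import csv
from urllib.parse import unquote


def parse_attributes(column9):
    """Split a GFF3 column-9 string into {tag: [values]}."""
    attrs = {}
    for field in column9.strip().split(";"):
        if not field:
            continue
        tag, _, value = field.partition("=")
        attrs[tag] = [unquote(v) for v in value.split(",")]  # GFF3 percent-decoding
    return attrs


def gff3_synonyms(lines):
    """Map every Dbxref and locus_tag synonym to a canonical feature ID.

    Assumption: the canonical ID is the first Parent or, failing that, the
    feature's own ID. Only one level of Parent is followed.
    """
    synonyms = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = parse_attributes(cols[8])
        canonical = (attrs.get("Parent") or attrs.get("ID") or [None])[0]
        if canonical is None:
            continue
        names = list(attrs.get("locus_tag", []))
        for xref in attrs.get("Dbxref", []):
            _db, _, accession = xref.partition(":")  # e.g. "Genbank:XM_0001.1"
            names.append(xref)
            if accession:
                names.append(accession)
        for name in names:
            synonyms.setdefault(name, canonical)  # first occurrence wins
    return synonyms


def load_mapping_csv(lines):
    """Two-column lookup (source ID, target ID); assumes no header row."""
    return {row[0].strip(): row[1].strip() for row in csv.reader(lines) if len(row) >= 2}
```

Both functions accept any iterable of lines, so an open file handle works directly, e.g. `gff3_synonyms(open(path))` or `load_mapping_csv(open(path, newline=""))`. Letting the first occurrence win keeps a gene-level mapping from being overwritten by a later transcript line that repeats the same `GeneID`.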