― Identification of Regulatory Factors Using WGI’s AI Text Mining and AI Analytical Technologies ―
◆ In the following descriptions, the letter following each “Task” (e.g., Task A) corresponds to the codes (A–Q) listed under “Outsourcing task(s)” in the Contact Us form.
Issues with Conventional Cis-Regulatory Element Prediction Methods
Conventionally, cis-regulatory elements have been predicted by identifying DNA motifs that occur frequently in the genome. However, because most of these frequent motifs are not actual cis-regulatory elements, most predictions made with conventional methods are false positives.
Furthermore, conventional methods commonly use sequence logos (see figure below) to represent cis-regulatory element sequences predicted from alignments of frequent DNA sequence patterns.
In the example shown in the figure below, the third site (A or T) and the fourth site (G or T) are polymorphic.
A sequence logo therefore implies four possible combinations (haplotypes) of these two polymorphic sites.
However, the logo does not retain which of these haplotypes are actually present in the aligned sequences.
This loss of sequence information hinders efficient genomic DNA design and transcription factor prediction, which makes sequence logos a suboptimal representation.
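The information loss described above can be illustrated with a minimal sketch. The aligned sequences here are invented for illustration (they are not taken from the figure or from WGI data); like the example, they are polymorphic at the third site (A or T) and the fourth site (G or T). The per-position base counts, which are the information a sequence logo is drawn from, are compatible with four haplotypes, although only two of them occur in the alignment.

```python
from collections import Counter
from itertools import product

# Hypothetical aligned sequences (illustrative only, not real binding sites).
aligned = ["ACAGTC", "ACAGTC", "ACTTTC", "ACTTTC"]


def column_counts(seqs):
    """Per-position base counts: the information a sequence logo is drawn from."""
    return [Counter(column) for column in zip(*seqs)]


def implied_haplotypes(counts):
    """Every sequence compatible with the per-position counts."""
    bases_per_site = [sorted(c) for c in counts]
    return {"".join(combo) for combo in product(*bases_per_site)}


counts = column_counts(aligned)
implied = implied_haplotypes(counts)   # what the logo alone allows
observed = set(aligned)                # what the alignment actually contains

print("implied by logo:", sorted(implied))
print("observed:       ", sorted(observed))
print("never observed: ", sorted(implied - observed))
```

In this toy alignment, A at the third site always co-occurs with G at the fourth site, and T with T. That linkage is visible in the aligned sequences but cannot be recovered from the per-position counts.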
High-Precision Prediction of Transcription Factors and Cis-Regulatory Elements Based on WGI’s Proprietary AI Technology (Tasks B, C, I)
At WGI, we compile comprehensive information on regulatory elements, including transcription factors and cis-regulatory elements.
Using our AI text mining (LA2K) technology, we collect knowledge-based information on regulatory factors, including the experimental methods used for their identification or inference (e.g., promoter deletion analysis, reporter assays, ChIP-seq), allowing users to assess the reliability of the information.
Furthermore, WGI applies AI-driven sequence analysis techniques to predict cis-regulatory element sequences and their corresponding transcription factors, although prediction accuracy may vary depending on the species.
◆Knowledge-Based Compilation of Regulatory Factors Using WGI’s Proprietary AI Text Mining (LA2K) Technology
At WGI, we collect and organize information on transcription factors and cis-regulatory elements that regulate gene expression, using our AI text mining platform (LA2K) (Task B) together with expert manual curation (Task C).
This information also includes details on experimental methods used for the identification or prediction of regulatory factors (such as promoter deletion analysis, reporter assays, and ChIP-seq), enabling evaluation of data reliability.
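As a rough sketch of how recording the experimental method alongside each entry supports reliability assessment, consider the example below. The record layout, field names, placeholder entries, and the grouping of methods are all assumptions made for illustration; they do not describe LA2K's actual data model or output.

```python
from dataclasses import dataclass, field

# Hypothetical record layout (not WGI's actual schema).
@dataclass
class RegulatoryRecord:
    transcription_factor: str
    cis_element: str
    target_gene: str
    methods: list = field(default_factory=list)  # how the relation was identified


# Assumed grouping: the experimental methods named in the text above.
EXPERIMENTAL_METHODS = {"promoter deletion analysis", "reporter assay", "ChIP-seq"}


def has_experimental_support(record):
    """True if at least one recorded method is an experimental one."""
    return any(method in EXPERIMENTAL_METHODS for method in record.methods)


# Placeholder entries with made-up names and sequences.
records = [
    RegulatoryRecord("TF1", "ACGTGG", "GENE1", ["reporter assay", "ChIP-seq"]),
    RegulatoryRecord("TF2", "TTGACC", "GENE2", ["sequence-based prediction"]),
]

supported = [r for r in records if has_experimental_support(r)]
for r in supported:
    print(r.transcription_factor, r.target_gene, r.methods)
```

Keeping the methods as a list per record lets a user apply their own reliability criteria, for example requiring two independent experimental methods, rather than relying on a single fixed score.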
◆Compilation of Regulatory Factor Information Using WGI’s Proprietary AI-Driven Sequence Analysis Technology
WGI employs proprietary AI-driven sequence analysis technology to collect information on transcription factors and cis-regulatory elements involved in gene expression regulation.
Prediction accuracy varies depending on the species.
◆Integration of Biological Relationships, Functions, and Intra-/Inter-Species Homolog Information for Numerous Genes
By utilizing gene networks (see figure below) and gene-compound (metabolite) networks, WGI enables rapid and easy visualization of relationships among genes and compounds, even at the genome-wide scale.