WellGreen-i Co., Ltd.

― Enhancing the Accuracy and Reliability of AI Text Mining (A2K/LA2K) Results through WGI’s Manual Curation ―

◆ In the following descriptions, the letter following each “Task” (e.g., Task A) corresponds to the codes (A–Q) listed under “Outsourcing task(s)” in the Contact Us form.

Since A2K/LA2K technologies are based on computational processing, they do not guarantee the same level of accuracy as manual literature reviews conducted by experts.

◆ Example of Misinterpretation by A2K

For example, a certain document (here) contains the following sentence:
“However, all CI chondrite samples show evidence of extensive aqueous alteration on their parent asteroid(s)10,11, and although the presence of extra-terrestrial organic molecules has been demonstrated in these meteorites12–14, the question of how much of this alteration may be due to terrestrial contamination and weathering has not been resolved15–17.”

When applying A2K analysis to this sentence, the following A2K Description is obtained:
Subject: "the presence of extra-terrestrial" » Action: "has been demonstrated" » Process: "in these meteorites12–14"

In this result, the Subject "the presence of extra-terrestrial" is truncated: it is missing "organic molecules."
In addition, the Process "in these meteorites12–14" includes the numerical string 12–14, which is a citation number from the source text. When numbers are attached directly to words, it becomes difficult to tell whether they are reference numbers or part of a proper noun such as a gene name.
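One possible way to handle the citation-number problem described above is a pattern-based pre-processing step. The following Python sketch is an illustration only and is not part of A2K. The function name `split_citation_suffixes` and the rule it applies (digits attached to a lowercase word of four or more letters are treated as citation markers) are assumptions. The rule would also strip the number from a lowercase name that legitimately ends in digits, so its output still needs a curator's check.

```python
import re

# Hypothetical heuristic: a lowercase word of 4+ letters (optionally ending in
# "(s)") followed directly by digits, digit ranges, or comma-separated digits.
CITATION_SUFFIX = re.compile(
    r"(?<![A-Za-z0-9])"                          # word must start here
    r"([a-z]{4,}(?:\(s\))?)"                     # group 1: the word itself
    r"(\d+(?:[–-]\d+)?(?:,\d+(?:[–-]\d+)?)*)"    # group 2: e.g. 12–14 or 10,11
    r"(?![A-Za-z])"                              # digits must not run into letters
)


def split_citation_suffixes(text):
    """Return (cleaned_text, citations) with suspected citation markers removed."""
    citations = []

    def _strip(match):
        citations.append(match.group(2))
        return match.group(1)

    cleaned = CITATION_SUFFIX.sub(_strip, text)
    return cleaned, citations


print(split_citation_suffixes("in these meteorites12–14"))
# → ('in these meteorites', ['12–14'])
print(split_citation_suffixes("CYP3A4 expression"))
# → ('CYP3A4 expression', [])
```

Mixed-case identifiers such as CYP3A4 are left untouched because the pattern only matches words written entirely in lowercase letters.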

  • Most A2K misinterpretations are easy to identify, for example when the Subject or Process is not a complete noun phrase.
  • The summary results provided by WGI include both A2K Descriptions and the corresponding original sentences.
  • By referencing the original sentences included in the output, users can quickly and easily access accurate key information (note: A2K/LA2K analysis is limited to publicly accessible text information).
  • Misinterpretations by A2K are expected to be reduced through improvements in training data and A2K engine refinements.
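Because the output pairs each A2K Description with its original sentence, a truncated Subject can be surfaced by comparing the two. The following Python sketch shows one such check under stated assumptions: the helper name `gap_between` is hypothetical, and the idea that leftover text between Subject and Action hints at a truncated Subject is a heuristic, not a description of how A2K or WGI's curation works.

```python
def gap_between(sentence, subject, action):
    """Return the text between Subject and Action in the original sentence.

    A non-empty result suggests the Subject may have been cut short.
    Returns None when either string cannot be located in the sentence.
    """
    start = sentence.find(subject)
    if start == -1:
        return None
    subject_end = start + len(subject)
    action_start = sentence.find(action, subject_end)
    if action_start == -1:
        return None
    return sentence[subject_end:action_start].strip()


# Excerpt of the example sentence, with the citation numbers omitted.
sentence = ("although the presence of extra-terrestrial organic molecules "
            "has been demonstrated in these meteorites")
print(gap_between(sentence, "the presence of extra-terrestrial",
                  "has been demonstrated"))
# → organic molecules
```

A non-empty gap is only a hint: auxiliary words such as "can" may legitimately sit between a correct Subject and its Action, so flagged items would still go to a curator.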
The above sentence is quoted and restructured from the following publication: Gregorio et al., 2024, Nature Communications. DOI: 10.1038/s41467-024-51731-w
This content is reused and structurally reconstructed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

◆ Improving Precision and Reliability of A2K/LA2K Results through Manual Curation (Task C)
◆ High-Quality Refinement of A2K Descriptions by WGI’s Skilled Curators

Errors in the output list of A2K Descriptions can be corrected and refined through manual curation by expert curators. WGI’s skilled curators combine manual correction with language processing on Linux to handle massive output lists quickly and accurately. Through these manual curation services, WGI can eliminate redundant outputs, correct errors more quickly and accurately than by manual work alone, and provide high-quality summarized information and statistical analysis results.
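As a rough illustration of how redundant outputs might be removed and counted by script, the Python sketch below collapses duplicate A2K Descriptions and reports how often each occurs. The tuple layout (Subject, Action, Process), the function names, and the whitespace-only normalization are assumptions made for this example and do not describe WGI's actual pipeline. Letter case is deliberately preserved, since case can be meaningful in gene symbols.

```python
from collections import Counter


def normalize(description):
    """Collapse runs of whitespace in each field of a (Subject, Action, Process) tuple."""
    return tuple(" ".join(field.split()) for field in description)


def count_unique(descriptions):
    """Return unique descriptions with their frequencies, most frequent first."""
    counts = Counter(normalize(d) for d in descriptions)
    return counts.most_common()


descriptions = [
    ("SiDHN gene", "promote", "drought tolerance"),
    ("SiDHN  gene", "promote", "drought tolerance "),  # same content, stray spaces
    ("we", "conclude", "that overexpression of SiDHN gene can promote"),
]
for description, frequency in count_unique(descriptions):
    print(frequency, description)
```

The frequency column doubles as a simple statistic: descriptions that recur across many sentences are natural candidates for a summary.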

◆ Example of High-Quality Refinement of A2K/LA2K Analysis Results by WGI’s Manual Curation

The following sentence, from the Abstract of a document (here), illustrates a misinterpretation by A2K analysis:

“Based on these observations, we conclude that overexpression of SiDHN gene can promote cold and drought tolerance of transgenic tomato plants by inhibiting cell membrane damage, protecting chloroplasts, and enhancing the reactive oxygen species scavenging capacity.”

In the currently released version of A2K, the segments labeled below are extracted from the sentence as the A2K Description. However, the actual knowledge-based information in the sentence is not in the extracted Subject and Action, but is contained within the Process.

“Based on these observations, Subject: we Action: conclude Process: that overexpression of SiDHN gene can promote cold and drought tolerance of transgenic tomato plants by inhibiting cell membrane damage, protecting chloroplasts, and enhancing the reactive oxygen species scavenging capacity.”

Through WGI’s manual curation services, such misinterpretations by A2K/LA2K are comprehensively detected by combining manual review with high-efficiency language processing on Linux, and corrected to the following accurate A2K Description:
“Based on these observations, we conclude that overexpression of Subject: SiDHN gene can Action: promote cold and Process: drought tolerance of transgenic tomato plants by inhibiting cell membrane damage, protecting chloroplasts, and enhancing the reactive oxygen species scavenging capacity.”

*Whether to include "cold and" within the Process, or to extract a separate A2K Description with "cold tolerance of transgenic tomato" as its Process, depends on the summarization policy.
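The misinterpretation in this example follows a recognizable pattern: a reporting phrase ("we conclude") is taken as Subject and Action, and the actual finding is pushed into a Process that begins with "that". A script could flag such descriptions for curator review. The Python sketch below is a minimal illustration; the word lists and the function name `is_reporting_frame` are assumptions chosen for this example, not a documented part of A2K or of WGI's workflow.

```python
# Hypothetical word lists; a real list would be tuned to the corpus.
REPORTING_SUBJECTS = {"we", "the authors", "this study", "our results"}
REPORTING_VERBS = {"conclude", "show", "suggest", "report",
                   "demonstrate", "find", "propose"}


def is_reporting_frame(subject, action, process):
    """Return True when the description only captures a reporting phrase.

    The pattern checked is: reporting Subject + reporting verb + a Process
    that starts with "that", which suggests the finding is inside the Process.
    """
    return (
        subject.strip().lower() in REPORTING_SUBJECTS
        and action.strip().lower() in REPORTING_VERBS
        and process.strip().lower().startswith("that ")
    )


print(is_reporting_frame(
    "we", "conclude",
    "that overexpression of SiDHN gene can promote cold and drought tolerance",
))
# → True
print(is_reporting_frame(
    "SiDHN gene", "promote", "drought tolerance of transgenic tomato plants",
))
# → False
```

Flagging only marks a description for review; deciding the corrected Subject, Action, and Process remains a curation decision that depends on the summarization policy.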

Applying editing by WGI’s experienced curators to A2K/LA2K analysis results that may contain misinterpretations yields highly reliable summaries of knowledge-based information. These summaries provide foundational information that can be used immediately on-site, contributing directly to rapid project advancement, superior outcomes, and stronger competitiveness.

— The above sentence is quoted from the abstract of the following publication:
Guo X, Zhang L, Wang X, Zhang M, Xi Y, Wang A, Zhu J. (2019)
Overexpression of Saussurea involucrata dehydrin gene SiDHN promotes cold and drought tolerance in transgenic tomato plants.
PLoS ONE 14(11): e0225090.
https://doi.org/10.1371/journal.pone.0225090

This quotation is from an open-access article provided under the Creative Commons Attribution 4.0 International License (CC BY 4.0).