◆ In the following descriptions, the letter following each “Task” (e.g., Task A) corresponds to the codes (A–Q) listed under “Outsourcing task(s)” in the Contact Us form.
For example, a certain document contains the following sentence:
“However, all CI chondrite samples show evidence of extensive aqueous alteration on their parent asteroid(s)10,11,
and although the presence of extra-terrestrial organic molecules has been demonstrated in these meteorites12–14,
the question of how much of this alteration may be due to terrestrial contamination and weathering has not been resolved15–17.”
When applying A2K analysis to this sentence, the following A2K Description is obtained:
Subject: "the presence of extra-terrestrial" » Action: "has been demonstrated" » Process: "in these meteorites12–14"
In this result, the Subject "the presence of extra-terrestrial" is truncated: the words "organic molecules" are missing.
Additionally, the Process "in these meteorites12–14" includes the numerical string 12–14, which is a set of citation numbers from the text.
When numbers are directly attached to words, it becomes difficult to distinguish whether they are reference numbers or part of proper nouns such as gene names.
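As a rough illustration of the ambiguity described above, the following Python sketch removes digits that are attached directly to a longer lowercase word and records what it removed so that a curator can review each case. The function name, the four-letter threshold, and the regular expression are assumptions made for this example; they are not part of A2K or of WGI's actual tooling.

```python
import re

# Hypothetical heuristic (an assumption for this sketch): digits attached to a
# lowercase word of four or more letters, optionally ending in "(s)", are
# treated as citation numbers. Ranges and lists such as 12–14 or 10,11 are
# included. Mixed-case or short tokens (for example "Sox2") are left untouched,
# because they may be gene names and need human judgment.
CITATION_SUFFIX = re.compile(r"\b([a-z]{4,}(?:\(s\))?)(\d+(?:[–\-,]\d+)*)")

def strip_citation_numbers(text):
    """Return the cleaned text and a list of (word, removed_numbers) pairs."""
    removed = []

    def _repl(match):
        removed.append((match.group(1), match.group(2)))
        return match.group(1)

    return CITATION_SUFFIX.sub(_repl, text), removed

cleaned, removed = strip_citation_numbers("in these meteorites12–14")
print(cleaned)
print(removed)
```

A lowercase identifier that ends in digits would also be stripped by this rule, which is why the removed pairs are returned for manual review rather than discarded silently.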
Errors included in the output list of A2K Descriptions can be corrected and refined through manual curation by expert curators. WGI’s skilled curators not only perform manual correction but also apply language processing on Linux to quickly and accurately handle massive output lists. Therefore, through WGI’s manual curation services, we can eliminate redundant outputs, correct errors more quickly and accurately than by manual work alone, and provide high-quality summarized information and statistical analysis results.
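The removal of redundant outputs and the production of simple statistics can be pictured with a short Python sketch. It assumes, purely for illustration, that each A2K Description is available as a (subject, action, process) tuple of strings; the function names and the normalization rule are hypothetical and do not describe WGI's actual processing.

```python
from collections import Counter

def normalize(part):
    # Collapse whitespace and case so trivially different copies compare equal.
    return " ".join(part.split()).lower()

def summarize(descriptions):
    """Count distinct (subject, action, process) triples, most frequent first."""
    counts = Counter(tuple(normalize(p) for p in d) for d in descriptions)
    return counts.most_common()

rows = [
    ("SiDHN gene", "promote", "drought tolerance"),
    ("SiDHN  gene", "Promote", "drought tolerance"),  # redundant copy
    ("we", "conclude", "that overexpression of SiDHN gene can promote"),
]
for triple, count in summarize(rows):
    print(count, triple)
```

Merging by normalized text only catches exact repeats; descriptions that differ in wording but carry the same meaning would still need a curator's judgment.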
As an example of misinterpretation by A2K analysis, the following sentence from the Abstract of a document is shown:
“Based on these observations, we conclude that overexpression of SiDHN gene can promote cold and drought tolerance of transgenic tomato plants by inhibiting cell membrane damage, protecting chloroplasts, and enhancing the reactive oxygen species scavenging capacity.”
In the currently released version of A2K, the labeled segments shown below are extracted from the sentence as the A2K Description.
However, the actual knowledge conveyed by the sentence lies not in the estimated Subject and Action but within the Process.
“Based on these observations,
Subject: we
Action: conclude
Process: that overexpression of SiDHN gene can promote cold and drought tolerance of transgenic tomato plants by inhibiting cell membrane damage, protecting chloroplasts, and enhancing the reactive oxygen species scavenging capacity.”
Through WGI’s manual curation services, such misinterpretations by A2K/LA2K are comprehensively detected by combining manual review with high-efficiency language processing on Linux, and corrected to the following accurate A2K Description:
“Based on these observations, we conclude that overexpression of
Subject: SiDHN gene
can
Action: promote
cold and
Process: drought tolerance of transgenic tomato plants by inhibiting cell membrane damage, protecting chloroplasts, and enhancing the reactive oxygen species scavenging capacity.”
*Whether to include "cold and" within the Process, or to extract a separate A2K Description with "cold tolerance of transgenic tomato" as its Process, depends on the summarization policy.
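One way such misinterpretations might be detected in bulk is to flag any A2K Description whose Subject and Action form a reporting frame (for example "we conclude") and whose Process begins with "that". The Python sketch below is only an illustration under that assumption; the word lists and the function name are hypothetical and are not A2K's or WGI's actual rules.

```python
# Hypothetical word lists, chosen for illustration only.
REPORTING_SUBJECTS = {"we", "i", "the authors", "this study"}
REPORTING_ACTIONS = {"conclude", "show", "report", "suggest", "demonstrate", "find"}

def needs_reanalysis(subject, action, process):
    """Return True when the triple looks like 'we conclude that ...'.

    In that case the knowledge is probably inside the Process, so the
    Description is a candidate for manual correction.
    """
    return (
        subject.strip().lower() in REPORTING_SUBJECTS
        and action.strip().lower() in REPORTING_ACTIONS
        and process.lstrip().lower().startswith("that ")
    )

flagged = needs_reanalysis(
    "we",
    "conclude",
    "that overexpression of SiDHN gene can promote cold and drought tolerance",
)
print(flagged)
```

A flag of this kind only nominates a Description for review; choosing the corrected Subject, Action, and Process remains a curation decision.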
— The above sentence is quoted from the abstract of the following publication:
Guo X, Zhang L, Wang X, Zhang M, Xi Y, Wang A, Zhu J. (2019)
Overexpression of Saussurea involucrata dehydrin gene SiDHN promotes cold and drought tolerance in transgenic tomato plants.
PLoS ONE 14(11): e0225090.
https://doi.org/10.1371/journal.pone.0225090
This quotation is taken from an open-access article distributed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).