Researchers at the UNM Cancer Research & Treatment Center are working with computational linguists at New Mexico State University on a "smart" computer system that will speed up the process of searching large volumes of data to help analyze new genetic research.
"We realize the importance of directly combining the data generated by new experiments with the knowledge stored in the published literature to speed up our ability to analyze experimental results," said Cheryl Willman, M.D., director of the UNM CRTC.
"Computer tools that can deeply understand these texts will enable searches of tremendous power for researchers," Willman continued. "In fact, access to these tools will increase the effectiveness of our senior researchers and enable junior researchers to perform at what we would consider to be very expert levels."
With a grant from Sandia National Laboratories, researchers at NMSU's Computing Research Laboratory have already developed a computer interface to help search vast databases of information relevant to cancer research. The next stage will be to make the system smarter by adding knowledge of biochemistry and oncology, and a language analysis component, so the system can search more intelligently.
Joint research between UNM and Sandia has shown how critical data mining of published literature is going to be, said George Davidson, a researcher at Sandia. Davidson said the next step, which leads from expression studies to mechanisms and drug targets, will have to make extensive use of prior research. Text mining will be the enabling tool.
"The collaboration with NMSU is already producing results, and we can see how it will lead to a real breakthrough, which will enable researchers to make rapid headway," Davidson said. "Powerful text mining will multiply the value of the Human Genome Project by a hundredfold."
A smarter system would be a tremendous help to UNM CRTC researchers who, for example, have analyzed DNA samples from children with leukemia and found that certain genes tend to be active in these children. To determine why the genes might be active, the researchers turn to huge databases of information, looking for connections and clues in the findings of other researchers.
Computers can be programmed to search databases rapidly, as anyone who has used an Internet search engine knows. But to be of much value in medical research, the computer program has to do more than search for keywords. It needs to be smart enough to make the right assumptions and recognize concepts and relationships in the text that a human with expert knowledge would recognize.
"A lot of research needs to be done to enable us to capture even a fraction of the knowledge and expertise of biochemists and oncologists," said CRL Director Jim Cowie. "We would like to build prototypes, each one of which is useful. We can add features in stages."
Already applying text analysis to areas such as information retrieval and machine translation of languages, the CRL wants to expand its technologies to the areas of disease and genetics.
The linguists will create computer programs that can conduct semi-autonomous investigations. The programs will also find and summarize information based on an individual researcher's interests and automatically create links to the literature. Programs also will be written to detect "events of interest" to the individual researcher and track changes in documents the researcher has used in the past.
New Mexico State University's Computing Research Laboratory is a non-profit, self-supporting research enterprise committed to basic research and software development in advanced computing applications. Among the core areas of research at CRL are artificial intelligence, computational linguistics and human-computer interaction. Information is available on the Web at http://crl.nmsu.edu/.
The UNM Cancer Research & Treatment Center, founded in 1972, is the only academic health care facility in the state dedicated to both cancer research and patient care. For more information, visit http://hsc.unm.edu/crtc/.
Contact: Lynn Melton, 272-3322