UMass professor aims to make data more trustworthy
Source: DAN LITTLE
Alexandra Meliou, left, an assistant professor in the University of Massachusetts College of Information and Computer Science, works through data management research with Ph.D. student Xiaolan Wang in Amherst.
Data is plentiful and easily shared, but that means errors and false information can be spread quickly, too.
Alexandra Meliou, assistant professor of computer science at the University of Massachusetts Amherst, recently received a $550,000 National Science Foundation grant to design and develop new technologies that will help those who work with data avoid those errors.
Some of them can have deadly consequences. Meliou gave the example of a person’s medical coverage running out because the patient’s date of birth is wrong or the wrong type of medication being prescribed because it is erroneously coded.
They can also cost money. Meliou said poor data quality costs the United States $600 billion per year.
“Data is critical in almost every aspect of society, including health care, education, economy and science,” she said. “However, because data is easily shared and reused, it has become less curated and less reliable. Data is often misused because its validity and origin are unclear, and mistakes easily propagate as data is often used to derive other data.”
Many researchers now looking at this problem make the mistake of looking at data out of context, according to Meliou, but her approach will take into account how the data is accumulated and shared.
One example is how dates are written differently in Europe and the United States. Algorithms that collect data may obtain dates in one format and interpret them in the other, thereby changing a date of 12 April 2015 (12/04/2015) to December 4, 2015.
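The date mix-up can be reproduced in a few lines of Python. This is only an illustrative sketch, not a description of Meliou's tools: the helper name `is_ambiguous` and the two format strings are assumptions chosen for the example. It shows how the same string, 12/04/2015, yields two different dates depending on whether it is read day-first or month-first, and how a simple check can flag values that parse both ways.

```python
from datetime import datetime

raw = "12/04/2015"  # written day-first, meaning 12 April 2015

# The same text parsed under the two conventions.
day_first = datetime.strptime(raw, "%d/%m/%Y").date()
month_first = datetime.strptime(raw, "%m/%d/%Y").date()

print(day_first.isoformat())    # 2015-04-12
print(month_first.isoformat())  # 2015-12-04


def is_ambiguous(text):
    """Hypothetical helper: True if the text parses under both
    conventions and the two readings give different dates."""
    readings = set()
    for fmt in ("%d/%m/%Y", "%m/%d/%Y"):
        try:
            readings.add(datetime.strptime(text, fmt).date())
        except ValueError:
            # Not a valid date under this convention (e.g. month 25).
            pass
    return len(readings) > 1


print(is_ambiguous("12/04/2015"))  # True
print(is_ambiguous("25/04/2015"))  # False
```

A value such as 25/04/2015 can only be read one way, so it is not flagged. The silent errors come from values where both the day and the month are 12 or lower.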
Meliou compared her work to a medical X-ray on data. “I don’t want to look at data at the surface; I want to see what the underlying problem might be,” she said.
As part of her research, Meliou is working with partners in industry, including Google.
Xin Luna Dong, a research scientist at Google, is one of those working with Meliou. She is helping construct the company’s Knowledge Graph, which she described as a knowledge base, a repository of data used by a computer system.
It shows up in Google searches in different ways, including as basic facts at the top of a search page or as a list of images there, for example, of museums in New York City or top jazz musicians.
“We need to ensure both that the knowledge is from authoritative sources, and that the data we gather is error free,” Dong wrote in an email.
Data quality is the most important issue in building the Knowledge Graph, and Meliou’s research addresses both finding good sources of data and diagnosing errors in the data, she wrote.
“The web allows people to freely share data, but meanwhile makes it challenging to separate the wheat from the chaff,” she wrote. “The knowledge bases built by Google and other companies aim at providing more precise fact knowledge and presumably will alleviate the situation.”
Data used to be more tightly guarded and controlled, and as a result could be more trusted, Meliou said. With more sources, there is more room for error, but trust is still being placed in data, she said.
As data is becoming increasingly available, there are more innovations that rely on accurate data, according to Meliou. One example is self-driving cars, which use map data to operate, she said.
Meliou, whose grant will last five years, said her work would not likely result in one system that would eliminate errors, but instead in a series of tools that data scientists could use to identify errors and have more faith in the data they use.
“The type of work that I do lies between theory and practice,” she said. “We always pick problems that are motivated by practical applications. I love to talk to people about their problems.”
Meliou joined the UMass Amherst College of Information and Computer Science in 2012 from a postdoctoral research position at the University of Washington. She received her Ph.D. in computer science from the University of California, Berkeley in 2009.
In 2013, she was awarded a Google Faculty Research Award.
Meliou admits that her task is a difficult one and that as data increases, it will remain a challenge to root out errors. At the same time, she said it is something that must be done.
She added that while humans can be the source of problems, it is important for humans to remain involved in data processing.
“I never want to take people out of the process; I never want to take people out of data, but I want to help people deal with their data and do tasks more efficiently,” she said. “I don’t expect the system to go and do everything by itself, but give the application developer clues so they don’t have to look at billions of data points.”