New computer method to disambiguate namesakes

Scientists have developed a novel machine-learning method that can differentiate between people with the same name.

All individuals are unique, but millions of people share names. How to distinguish between - or disambiguate - people with common names has long perplexed researchers.

This conundrum arises in a wide range of settings, from bibliographic records to law enforcement and other areas.

Now, scientists from Indiana University-Purdue University Indianapolis (IUPUI) in the US have developed a novel machine-learning method to provide better solutions to this problem.

The new method improves on existing approaches to name disambiguation because it works on streaming data, which enables it to identify previously unencountered names.

Existing methods can disambiguate an individual only if the person's records are present in the machine-learning training data. The new method, by contrast, performs non-exhaustive classification: it can detect that a new record appearing in streaming data actually belongs to a fourth person, even if the training data has records of only three different persons.
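The idea of non-exhaustive classification can be illustrated with a minimal Python sketch. It is not the researchers' model, whose details are not given here. It swaps in a much simpler technique, nearest-centroid matching with a distance cut-off, and every name in it (`StreamingDisambiguator`, `threshold`, the two-number feature vectors) is a hypothetical chosen for illustration. A record that is too far from every known person is treated as belonging to someone new.

```python
from math import dist


class StreamingDisambiguator:
    """Toy non-exhaustive classifier: nearest centroid plus a novelty cut-off.

    Illustration only; this is an assumed stand-in, not the IUPUI method.
    """

    def __init__(self, threshold):
        self.threshold = threshold  # hypothetical distance cut-off
        self.centroids = []         # mean feature vector per known person
        self.counts = []            # number of records seen per person

    def _add_person(self, features):
        self.centroids.append(list(features))
        self.counts.append(1)
        return len(self.centroids) - 1

    def _update(self, person, features):
        # Incremental mean, so the model keeps adapting as records stream in.
        self.counts[person] += 1
        n = self.counts[person]
        centroid = self.centroids[person]
        for i, x in enumerate(features):
            centroid[i] += (x - centroid[i]) / n

    def train(self, records_by_person):
        """Seed the model with one list of feature vectors per known person."""
        for records in records_by_person:
            person = self._add_person(records[0])
            for features in records[1:]:
                self._update(person, features)

    def assign(self, features):
        """Return the index of the matching person, creating one if needed."""
        if self.centroids:
            distances = [dist(c, features) for c in self.centroids]
            person = min(range(len(distances)), key=distances.__getitem__)
            if distances[person] <= self.threshold:
                self._update(person, features)
                return person
        # No known person is close enough: treat the record as a new person.
        return self._add_person(features)


# Training data covers only three people who share a name.
model = StreamingDisambiguator(threshold=1.0)
model.train([
    [(0.0, 0.0), (0.2, 0.1)],
    [(5.0, 5.0), (5.1, 4.9)],
    [(0.0, 5.0), (0.1, 5.2)],
])

known = model.assign((5.0, 5.1))      # close to the second person (index 1)
newcomer = model.assign((5.0, 0.0))   # far from everyone: a fourth person
repeat = model.assign((5.1, 0.1))     # now matches the newly added person
```

In this sketch the set of persons is never fixed in advance: the list of centroids simply grows whenever a record falls outside the cut-off, which is the behaviour the article describes as non-exhaustive.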

"Non-exhaustiveness" is a very important aspect of name disambiguation because training data can never be exhaustive: it is impossible to include records of all living individuals.

"We can teach the computer to recognise names and disambiguate information accumulated from a variety of sources - Facebook, Twitter and blog posts, public records, and other documents - by collecting features such as Facebook friends and keywords from people's posts using the identical algorithm," said IUPUI associate professor Mohammad al Hasan.

"Our proposed method is scalable and will be able to group records belonging to a unique person even if thousands of people have the same name, an extremely complicated task.

"Our innovative machine-learning model can perform name disambiguation in an online setting instantaneously and, importantly, in a non-exhaustive fashion," said Hasan, who led the study.

"Our method grows and changes when new persons appear, enabling us to recognise the ever-growing number of individuals whose records were not previously encountered," he said.

Some names are more common than others, so the number of individuals sharing such a name grows faster than it does for other names.

"While working in a non-exhaustive setting, our model automatically detects such names and adjusts the model parameters accordingly," the researchers said.