<p>Scientists have developed a novel machine-learning method that can differentiate between people with the same name.<br /><br /></p>
<p>All individuals are unique, but millions of people share names. How to distinguish between - or disambiguate - people with common names has long perplexed researchers.<br /><br />The problem arises in a wide range of settings, from bibliographic records to law enforcement.<br /><br />Now, scientists from Indiana University-Purdue University Indianapolis (IUPUI) in the US have developed a novel machine-learning method that offers a better solution.<br /><br />The new method improves on existing approaches to name disambiguation because it works on streaming data and can identify individuals it has not previously encountered.<br /><br />Existing methods can disambiguate an individual only if that person's records are present in the machine-learning training data. The new method performs non-exhaustive classification: it can detect that a new record appearing in streaming data belongs to a fourth person, even if the training data contains records of only three people.<br /><br />"Non-exhaustiveness" is a very important aspect of name disambiguation because training data can never be exhaustive, as it is impossible to include records of all living individuals.<br /><br />"We can teach the computer to recognise names and disambiguate information accumulated from a variety of sources - Facebook, Twitter and blog posts, public records, and other documents - by collecting features such as Facebook friends and keywords from people's posts using the identical algorithm," said IUPUI associate professor Mohammad al Hasan.<br /><br />"Our proposed method is scalable and will be able to group records belonging to a unique person even if thousands of people have the same name, an extremely complicated task.<br /><br />"Our innovative machine-learning model can perform name disambiguation in an online setting instantaneously and, importantly, in a non-exhaustive fashion," said Hasan, who led the study.<br /><br />"Our method grows and changes when new persons appear, enabling us to recognise the ever-growing number of individuals whose records were not previously encountered," he said.<br /><br />Some names are more common than others, so the number of individuals sharing them grows faster than for other names.<br /><br />"While working in a non-exhaustive setting, our model automatically detects such names and adjusts the model parameters accordingly," the researchers said.</p>