
Scouring Web to make new words 'lookupable'

Erin McKean's campaign seeks to unearth 1 million English words
Last Updated 11 October 2015, 18:37 IST
A couple of weeks ago, two of my New York Times colleagues each chronicled a digital culture trend so newish and niche-y that conventional English dictionaries don’t yet include a word for either of them.

In an article on September 20, Stephanie Rosenbloom, a travel columnist, reviewed flight apps that try to perfect “farecasting” – that is, she explained, the art of “predicting the best date to buy a ticket” to obtain the lowest fares.

That same day Jenna Wortham, a columnist for The Times Magazine, described a phenomenon she called “technomysticism,” in which Internet users embrace medieval beliefs, spells and charms.

These word coinages may be too fresh – and too little used for now – to be of immediate interest to major English dictionaries. But Erin McKean, a lexicographer with an egalitarian approach to language, thinks “madeupical” words such as these deserve to be documented.

McKean started a campaign last month on Kickstarter, the crowdfunding site, to unearth 1 million “missing” English words – words that are not found in traditional dictionaries. To locate the underdocumented expressions, she has engaged a pair of data scientists to scrape and analyse language used in online publications. McKean said she planned to incorporate the words they find into Wordnik.com, an online dictionary of which she is a co-founder.

“We really believe that every word should be lookupable,” McKean told me recently. “That doesn’t mean that every word should be used in every situation. But we think that people by and large are entirely capable of making that decision for themselves.”

Before her analytics project gets underway next month, McKean is crowdsourcing a list of missing words for possible inclusion in Wordnik. Candidates so far include procrastatweeting, dronevertising and roomnesia, a condition in which people forget why they walked into a room.

McKean, a former editor of the New Oxford American Dictionary, and two colleagues introduced the Wordnik site in 2009 with the aim of addressing some limitations they had encountered while working for dictionary publishers.

Traditional print dictionaries employ lexicographers to track and assess words, selecting the worthiest candidates to be included in published editions. But printed lexicons naturally have limited space. And with only periodic updates, they are not intended to keep pace with contemporary spoken language.

In a recent quarterly online update, the Oxford English Dictionary added the word “hoverboard” – 26 years after the floating skateboards were first mentioned in the movie “Back to the Future Part II.”

An editor’s note explained that the OED had decided to add “hoverboard” now because the dictionary’s word-monitoring system had recently detected an increased use of the term, most likely, the note says, related to a 2015 date that is an important plot element in the film. (It doesn’t always take decades to document a new word. The OED added “podcast” in 2008, just four years after the word emerged, according to the dictionary.)

With no space limitations or publication deadlines, Wordnik is able to incorporate a vast number of new words on a continuing basis. In addition to human contributors, the site uses automated online searches to locate sentences that contain certain words on blogs, social media, news and other sites.

When a person looks up a term on Wordnik, the site displays full-sentence examples of its usage, taken from sources like The Huffington Post and Boing Boing. If the word has an entry in certain more traditional dictionaries, the site also provides that definition.
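The article does not describe Wordnik’s search pipeline in detail. As a rough illustration only, here is a minimal Python sketch, assuming the source texts have already been collected, of how full-sentence examples for a word might be pulled from them. The function name `example_sentences`, the naive sentence splitter and the sample documents are hypothetical, not Wordnik’s actual code.

```python
import re

def example_sentences(word, documents, limit=3):
    """Return up to `limit` (source, sentence) pairs in which `word` appears as a whole word.

    Hypothetical helper: `documents` is assumed to be a list of (source_name, text) pairs
    that some crawler has already fetched.
    """
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    examples = []
    for source, text in documents:
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if pattern.search(sentence):
                examples.append((source, sentence.strip()))
                if len(examples) == limit:
                    return examples
    return examples

# Invented sample texts, not real Wordnik sources.
docs = [
    ("blog-a", "I had roomnesia again. Then I remembered."),
    ("blog-b", "Roomnesia strikes everyone. It is real."),
]
print(example_sentences("roomnesia", docs))
# [('blog-a', 'I had roomnesia again.'), ('blog-b', 'Roomnesia strikes everyone.')]
```

A real system would need a proper sentence tokenizer and source attribution; the regex split here simply keeps the sketch self-contained.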

McKean said Wordnik had accumulated some information on 8 million words, both old and new. Its inclusive approach makes the site more of a word welcomer than a winnower. “The question is no longer, ‘Is this a good word?’” McKean said. “The question is: ‘What is this word good for? Is this word good for what I need?’”

She now plans to expand Wordnik’s word-acquisition system by turning to data analytics to pinpoint emerging terms, like farecasting, that writers explained in passing when they mentioned them. McKean refers to these readily available explanations as “free-range definitions.” They are easy to locate, she said, because writers often use stock phrases, like “also known as” or “scientists term this,” to signal to their readers that they’re about to introduce a new or unfamiliar term.

To cast a wider net for her project, McKean has enlisted Summer.ai, a data analytics firm. The company plans to use computational techniques to analyse online publications for language structure and patterns – like quotation marks and dashes – that are likely to indicate new words accompanied by self-contained definitions.
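Neither McKean nor Summer.ai has said exactly how these cues will be detected. A minimal sketch, assuming a plain regular-expression approach in Python, shows how stock phrases like “also known as” and “scientists term this,” or a quoted term followed by a dash, might flag a free-range definition. The patterns, the function name `find_free_range_definitions` and the sample sentences are illustrative assumptions; a production system would need far more robust parsing.

```python
import re

# Illustrative cue patterns (assumptions, not Summer.ai's rules). The phrases
# "also known as" and "scientists term this" come from the article.
CUE_PATTERNS = [
    # Quoted term, then a dash, then "that is": \u201cfarecasting\u201d \u2013 that is, ...
    re.compile(r'[\u201c"](?P<term>[A-Za-z-]+)[\u201d"]\s*[\u2013\u2014-]+\s*that is,?\s*(?P<definition>[^.]+)'),
    # "<definition>, also known as <term>"
    re.compile(r'(?P<definition>[^.,]+),\s*also known as\s+[\u201c"]?(?P<term>[A-Za-z-]+)'),
    # "scientists term this <term>" (this cue names the term but captures no definition)
    re.compile(r'scientists term this\s+[\u201c"]?(?P<term>[A-Za-z-]+)'),
]

def find_free_range_definitions(text):
    """Return (term, definition) pairs; definition is None when the cue carries none."""
    found = []
    for pattern in CUE_PATTERNS:
        for match in pattern.finditer(text):
            groups = match.groupdict()
            definition = groups.get("definition")
            found.append((groups["term"].lower(), definition.strip() if definition else None))
    return found

sample = ("Flight apps try to perfect \u201cfarecasting\u201d \u2013 that is, "
          "the art of predicting the best date to buy a ticket.")
print(find_free_range_definitions(sample))
# [('farecasting', 'the art of predicting the best date to buy a ticket')]
```

Pattern matching like this is cheap but noisy, which is presumably why a larger project would add structural signals such as quotation marks and dashes rather than rely on phrases alone.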

Speed of adoption

Some lexicographers already track whether words are nearing the end of their useful life spans. But Manuel Ebert, a former neuroscientist who is the co-founder of Summer.ai, said the Wordnik research might help track the speed of new-word adoption.

“We can actually measure when words get adopted in mainstream lingo,” he said, by looking at when writers stop explaining neologisms like “infotainment” and start using them as if their meanings were commonly understood. “It will be interesting to see which words will very quickly get adopted and which words remain outsiders.”
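Ebert does not spell out the measurement. One plausible reading, sketched here in Python under stated assumptions, is to track, year by year, what share of a term’s mentions still come with an explanation, and to treat the term as adopted once that share drops below a chosen threshold. The cue list, the 0.2 threshold, the function names and the sample sentences are all invented for illustration; they are not Summer.ai’s data or method.

```python
from collections import defaultdict

# Hypothetical cues suggesting a writer is explaining a term rather than just using it.
EXPLANATION_CUES = ("that is", "also known as", "which means", "a term for")

def is_explained(sentence, term):
    """Treat a mention as 'explained' if the term is quoted or an explanation cue appears."""
    s, term = sentence.lower(), term.lower()
    quoted = f"\u201c{term}\u201d" in s or f'"{term}"' in s
    return quoted or any(cue in s for cue in EXPLANATION_CUES)

def explanation_rate_by_year(mentions, term):
    """mentions: iterable of (year, sentence). Returns {year: share of mentions that explain term}."""
    term = term.lower()
    explained, total = defaultdict(int), defaultdict(int)
    for year, sentence in mentions:
        if term in sentence.lower():
            total[year] += 1
            if is_explained(sentence, term):
                explained[year] += 1
    return {year: explained[year] / total[year] for year in sorted(total)}

def adoption_year(rates, threshold=0.2):
    """First year in which fewer than `threshold` of mentions still explain the term (assumed rule)."""
    for year, rate in rates.items():
        if rate < threshold:
            return year
    return None

# Made-up sentences for illustration only, not real corpus data.
mentions = [
    (1990, 'The show mixes news and fun, also known as "infotainment".'),
    (1990, 'Call it "infotainment" - that is, news as entertainment.'),
    (1995, 'Critics say "infotainment", which means news as show business, is spreading.'),
    (1995, "Cable channels are full of infotainment."),
    (2000, "Cable channels are full of infotainment."),
    (2000, "Morning infotainment keeps growing."),
]
rates = explanation_rate_by_year(mentions, "infotainment")
print(rates)                 # {1990: 1.0, 1995: 0.5, 2000: 0.0}
print(adoption_year(rates))  # 2000
```

The threshold is the main design choice: a lower value waits until almost nobody explains the word, while a higher one declares adoption earlier.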

Researchers like Paul Cook, an assistant professor of computer science at the University of New Brunswick in Canada, are using similar techniques to find other kinds of novel words.

Cook developed a programme several years ago to analyse posts on Twitter that included new lexical blends – like “jeggings,” a combination of jeans and leggings – and their definitions. Among other portmanteau words, his Twitter research turned up “awksome” (awkward plus awesome) and “hilazing” (hilarious plus amazing). He hopes eventually to use his programme to generate a blended-word lexicon.
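Cook’s programme is not described in detail here, so the following is only a simplified Python illustration of the underlying idea, not his method: test whether a candidate word can be split into the start of one known word followed by the end of another. The function name `blend_splits`, the `min_part` setting and the rule that both source words must be shortened are assumptions made for this sketch.

```python
def blend_splits(candidate, source_words, min_part=2):
    """Return (word_a, word_b, prefix, suffix) tuples where candidate == prefix + suffix,
    prefix is a shortened start of word_a and suffix is a shortened end of word_b.

    Assumed rule: both source words must be clipped, so plain compounds do not count.
    """
    candidate = candidate.lower()
    words = [w.lower() for w in source_words]
    results = []
    # Try every split point that leaves at least `min_part` letters on each side.
    for i in range(min_part, len(candidate) - min_part + 1):
        prefix, suffix = candidate[:i], candidate[i:]
        for a in words:
            # The first source word must start with the prefix but not equal it.
            if not (a.startswith(prefix) and a != prefix):
                continue
            for b in words:
                # The second source word must end with the suffix but not equal it.
                if b != a and b.endswith(suffix) and b != suffix:
                    results.append((a, b, prefix, suffix))
    return results

print(blend_splits("jeggings", ["jeans", "leggings"]))
# [('jeans', 'leggings', 'je', 'ggings')]
print(blend_splits("awksome", ["awkward", "awesome"]))
# [('awkward', 'awesome', 'awk', 'some')]
```

Some blends, such as hilazing, admit more than one split point (hil + azing or hila + zing); a fuller system would need a way to rank them.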

“We could have some sort of automatically generated blend dictionary,” Cook said. “If you had information like this, some dictionaries might be interested in providing this kind of information, as opposed to none.”

This more-words-the-merrier approach is one that lexicographers like McKean favour. “Every new word added to the expressiveness of English adds to the things that it’s possible to say,” she said. “English already has one of the world’s largest installed user bases. So why wouldn’t we want to add to it?”

(Published 11 October 2015, 17:31 IST)
