I have a number of documents from different sources. Many of them reference a company name, but may have stored the information slightly differently. The name is a field in the documents.
I’d like to be able to detect variations on the same name, something like:
- Ajax Company Incorporated
- Ajax Co. Inc.
- Ajax Company Inc.
- Ajax Company
- Ajax Company (formerly Ajax Unlimited)
- etc
Does MarkLogic have any facility to query for documents whose name is “similar”, as in the list above? I’m not sure if there’s a more technical term that I should be searching for. Preferably something I can use from either the Node client API or server-side JS.
There are several options you could try, or combine:
- Record the equivalent names as owl:sameAs triples.
- Expand the search terms with synonyms, kept either as triples or in a thesaurus; for the latter you could make use of the MarkLogic thsr library.
- Apply spell:double-metaphone to each token in the name at ingest, and also to the search terms, so that you search with the phonetic codes instead of the real name.
Search term expansion sounds like the most straightforward approach in this case, particularly since you are talking about mere spelling differences of terms like ‘Company’ and ‘Incorporated’.
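To make the term-expansion idea concrete, here is a minimal sketch in plain JavaScript. It does not call any MarkLogic API: the `CANONICAL` table and the `normalizeName` and `expandTerm` helpers are hypothetical names made up for illustration, and in practice the synonym data would more likely live in a thesaurus document or in triples.

```javascript
// Hypothetical abbreviation table (illustration only, not a MarkLogic API):
// maps each short variant to one canonical token.
const CANONICAL = new Map([
  ['co', 'company'],
  ['inc', 'incorporated'],
  ['corp', 'corporation'],
  ['ltd', 'limited'],
]);

// Lowercase the name, drop parenthesised remarks such as "(formerly ...)",
// strip punctuation, and map every token to its canonical form.
function normalizeName(name) {
  return name
    .toLowerCase()
    .replace(/\([^)]*\)/g, ' ')
    .replace(/[^a-z0-9\s]/g, ' ')
    .split(/\s+/)
    .filter(Boolean)
    .map((token) => CANONICAL.get(token) || token);
}

// For a single search term, list the canonical form plus every
// abbreviation that maps to it, so a query can match any of them.
function expandTerm(term) {
  const lower = term.toLowerCase();
  const canonical = CANONICAL.get(lower) || lower;
  const variants = new Set([canonical]);
  for (const [abbrev, full] of CANONICAL) {
    if (full === canonical) variants.add(abbrev);
  }
  return [...variants];
}

console.log(normalizeName('Ajax Co. Inc.'));
console.log(expandTerm('Inc'));
```

With this table, 'Ajax Co. Inc.' and 'Ajax Company Incorporated' normalize to the same token list, so you could either store the normalized form at ingest or pass the expanded variants of each search term to your query as alternatives. Note that 'Ajax Company' alone still normalizes to a shorter list, so deciding whether that counts as the same company is a separate matching rule.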
HTH!