Term Extraction from Parallel Texts

One way to specify a precise meaning is to use two words instead of one. ‘Hide’ might mean a skin or ‘to conceal’. ‘Skin’ might mean a piece of leather or ‘to flay’, that is, to remove the skin of an animal. But ‘hide/skin’ clearly means the outermost part of an animal or the leather coming from it, not ‘to conceal’ or ‘to flay’. Word pairs (and sometimes triples) are fundamental in some of the research reported elsewhere in these pages.
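The disambiguation above can be pictured as intersecting the sets of senses of the two words. The following is only a toy sketch: the sense inventory `SENSES` and the helper `shared_senses` are hypothetical names made up for illustration, and the senses listed are just the ones mentioned in the paragraph above.

```python
# Hypothetical, hand-written sense inventory (illustration only).
SENSES = {
    "hide": {"animal skin or leather", "to conceal"},
    "skin": {"animal skin or leather", "to flay"},
}


def shared_senses(*words):
    """Return the senses that every one of the given words can carry."""
    return set.intersection(*(SENSES[word] for word in words))


# Each word alone is ambiguous; the pair narrows the meaning.
pair_meaning = shared_senses("hide", "skin")
print(pair_meaning)
```

The same helper accepts three words, which corresponds to the occasional use of triples.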

Rather than drawing both words from one language, it is also effective to use pairs of words from two or more languages. To make that easier for a person without vast linguistic abilities, pairs of words in two languages can be extracted by comparing parallel texts. For this to succeed, non-bipartite weighted matching algorithms are needed.
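As a rough illustration of extracting word pairs from parallel texts, the sketch below scores every cross-language word pair by how often the two words turn up in the same aligned verse (a Dice coefficient) and then keeps the best-scoring pairs one-to-one. This is a deliberately simplified stand-in: the greedy selection is not the non-bipartite weighted matching mentioned above, and the function name `extract_pairs`, the threshold `min_score`, and the tiny English/German sample are all assumptions made for the example.

```python
from collections import Counter
from itertools import product


def extract_pairs(verses_a, verses_b, min_score=0.5):
    """Pair words across two verse-aligned texts by co-occurrence.

    verses_a[i] and verses_b[i] are assumed to be translations of
    the same verse. Returns a list of (word_a, word_b, score).
    """
    count_a, count_b, both = Counter(), Counter(), Counter()
    for text_a, text_b in zip(verses_a, verses_b):
        words_a = set(text_a.lower().split())
        words_b = set(text_b.lower().split())
        count_a.update(words_a)                   # verses containing each word
        count_b.update(words_b)
        both.update(product(words_a, words_b))    # verses containing both

    # Dice coefficient: 2 * joint count / (count of a + count of b).
    scored = sorted(
        ((2 * n / (count_a[a] + count_b[b]), a, b)
         for (a, b), n in both.items()),
        reverse=True,
    )

    # Greedy one-to-one selection, best score first (a simplification,
    # not a true maximum-weight matching).
    used_a, used_b, pairs = set(), set(), []
    for score, a, b in scored:
        if score < min_score:
            break
        if a in used_a or b in used_b:
            continue
        used_a.add(a)
        used_b.add(b)
        pairs.append((a, b, score))
    return pairs


# Toy aligned "verses" (made-up data, not real Biblical text).
english = ["light water", "water earth", "earth light"]
german = ["licht wasser", "wasser erde", "erde licht"]

for word_a, word_b, score in extract_pairs(english, german):
    print(word_a, word_b, round(score, 2))
```

On real texts the scores are far noisier than in this toy sample, which is one reason a proper weighted matching over the whole graph of candidate pairs is preferable to picking pairs greedily.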

In this work Biblical texts have been used because they are readily available in machine-readable form in many different languages.

Copyright © 2009   Douglas Pardoe Wilson
