
How To Match Overlapping Keywords With Regex

This example finds only sam. How can I make it find both sam and samwise?

var regex = /sam|samwise|merry|pippin/g;
var string = 'samwise gamgee';
var match = string.match(regex);
cons
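A minimal sketch of why the question's code returns only one keyword: JavaScript alternation tries the alternatives left to right and keeps the first one that matches, and with the g flag the next search resumes after the text already consumed. Reordering the alternatives longest-first (shown only for illustration, not part of the question) changes which keyword wins, but still yields one keyword per position.

```javascript
const text = 'samwise gamgee';

// Ordered alternation: "sam" is tried first and succeeds at index 0,
// so "samwise" is never attempted there; g resumes the search at index 3.
const shortFirst = text.match(/sam|samwise|merry|pippin/g);
console.log(shortFirst); // [ 'sam' ]

// Putting the longer keyword first only swaps which one wins.
const longFirst = text.match(/samwise|sam|merry|pippin/g);
console.log(longFirst); // [ 'samwise' ]
```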

Solution 1:

You can use lookaheads with capturing groups to get overlapping matches:

var regex = /(?=(sam))(?=(samwise))/;
var string = 'samwise';
var match = string.match( regex ).filter(Boolean);
//=> ["sam", "samwise"]
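  • Don't use the g (global) flag: with it, match() returns only the full matches (here empty strings, since lookaheads consume nothing) and drops the capturing groups.
  • filter(Boolean) removes the first element of the result, which is that empty full match.
  • The regex succeeds only where both keywords start at the same position; if the string contains no samwise, match() returns null and calling filter() on it throws a TypeError.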
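The regex above hard-codes the two keywords and needs them to start at the same position. A sketch that generalizes the same lookahead-plus-capture idea to any keyword list, assuming the hypothetical helpers escapeRegExp and findOverlapping and an engine with String.prototype.matchAll (ES2020): each keyword gets its own zero-width global pattern, so every occurrence is reported without consuming any text.

```javascript
// Hypothetical helper: escape regex metacharacters so keywords match literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: report every (possibly overlapping) keyword occurrence.
function findOverlapping(text, keywords) {
  const hits = [];
  for (const kw of keywords) {
    // (?=(...)) matches the empty string and captures the keyword, so no text
    // is consumed; matchAll steps past each empty match on its own.
    const re = new RegExp('(?=(' + escapeRegExp(kw) + '))', 'g');
    for (const m of text.matchAll(re)) {
      hits.push({ keyword: m[1], index: m.index });
    }
  }
  // Order by position, longer keywords first when they start at the same index.
  hits.sort((a, b) => a.index - b.index || b.keyword.length - a.keyword.length);
  return hits;
}

const hits = findOverlapping('samwise gamgee', ['sam', 'samwise', 'merry', 'pippin']);
console.log(hits.map(h => h.keyword)); // [ 'samwise', 'sam' ]
```

Running one pattern per keyword costs one scan per keyword, which is usually fine for a short list.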

Solution 2:

Why not just map indexOf() over the array substr:

var string = 'samwise gamgee';
var substr = ['sam', 'samwise', 'merry', 'pippin'];

var matches = substr.map(function(m) {
  return (string.indexOf(m) < 0 ? false : m);
}).filter(Boolean);

console.log(matches);

Array [ "sam", "samwise" ]
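The same substring check can be written more directly with filter() and String.prototype.includes() (ES2015); this is a sketch assuming a modern engine. Like indexOf(), includes() is case-sensitive and also finds a keyword inside another word.

```javascript
const text = 'samwise gamgee';
const keywords = ['sam', 'samwise', 'merry', 'pippin'];

// Keep every keyword that occurs anywhere in the text (plain substring test).
const found = keywords.filter(k => text.includes(k));
console.log(found); // [ 'sam', 'samwise' ]
```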

This is probably faster than using a regex. But if you need regex features, such as case-insensitive matching, word boundaries, or the matched text itself, use the exec method:

var matches = substr.map(function(v) {
  // \b anchors the keyword at the start of a word; "i" ignores case
  var re = new RegExp("\\b" + v, "i");
  var m = re.exec(string);
  return (m !== null ? m[0] : false);
}).filter(Boolean);

With the i (ignore case) flag, this returns the first match of each keyword that starts at a word boundary (\b).
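One caveat: the keywords are pasted into the RegExp source unescaped, so a keyword containing a metacharacter such as + or ( would change the pattern or throw a SyntaxError. A sketch of the same exec approach with escaping, assuming the hypothetical helpers escapeRegExp and findKeywords:

```javascript
// Hypothetical helper: escape regex metacharacters so keywords match literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: return the matched text (as it appears in `text`) for
// each keyword that occurs at the start of a word, ignoring case.
function findKeywords(text, keywords) {
  return keywords
    .map(k => new RegExp('\\b' + escapeRegExp(k), 'i').exec(text))
    .filter(m => m !== null)
    .map(m => m[0]);
}

console.log(findKeywords('Samwise Gamgee', ['sam', 'samwise', 'merry', 'pippin']));
// [ 'Sam', 'Samwise' ]
```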

Solution 3:

I can't think of a simple and elegant solution, but I've got something that uses a single regex:

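function quotemeta(s) {
    // Escape every non-word character so keywords are matched literally
    return s.replace(/\W/g, '\\$&');
}

let keywords = ['samwise', 'sam'];

// Map each keyword to the shorter keywords that are prefixes of it
let subsumed_by = {};
keywords.sort();
for (let i = keywords.length; i--; ) {
    let k = keywords[i];
    // A proper prefix always sorts before k, but not necessarily right before it,
    // so check every earlier entry instead of stopping at the first mismatch
    for (let j = i - 1; j >= 0; j--) {
        if (k.startsWith(keywords[j])) {
            (subsumed_by[k] = subsumed_by[k] || []).push(keywords[j]);
        }
    }
}

// Longest first, so the lookahead captures the longest keyword at each position
keywords.sort(function (a, b) { return b.length - a.length; });
let re = new RegExp('(?=(' + keywords.map(quotemeta).join('|') + '))[\\s\\S]', 'g');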

let string = 'samwise samgee';

let result = [];
let m;
while (m = re.exec(string)) {
    result.push(m[1]);
    result.push.apply(result, subsumed_by[m[1]] || []);
}

console.log(result); // ["samwise", "sam", "sam"]

Solution 4:

How about:

var re = /((sam)(?:wise)?)/;
var m = 'samwise'.match(re); // gives ["samwise", "samwise", "sam"]
var m = 'sam'.match(re);     // gives ["sam", "sam", "sam"]

You can then remove the duplicates, as described in "Unique values in an array".
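A sketch of that deduplication step, assuming ES2015 Set and spread syntax and a hypothetical helper samNames; match() returns null when there is no sam, so the helper falls back to an empty array:

```javascript
const re = /((sam)(?:wise)?)/;

// match() returns [full match, group 1, group 2]; a Set drops the repeats.
function samNames(text) {
  const m = text.match(re);
  return m ? [...new Set(m)] : [];
}

console.log(samNames('samwise')); // [ 'samwise', 'sam' ]
console.log(samNames('sam'));     // [ 'sam' ]
console.log(samNames('merry'));   // []
```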

Solution 5:

If you don't want to create special cases, and if order doesn't matter, why not first match only full names with:

\b(sam|samwise|merry|pippin)\b

and then, in a second pass, find the keywords that are followed by more word characters (that is, the ones that are part of a longer word), for example with:

(sam|samwise|merry|pippin)(?=\w+\b)

It is not a single elegant regex, but I suppose it is simpler than iterating through all the matches.
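Both passes together, as a sketch assuming the keyword list from the question; the || [] guards against match() returning null when a pass finds nothing:

```javascript
const text = 'samwise gamgee';

// Pass 1: keywords that form a whole word.
const whole = text.match(/\b(sam|samwise|merry|pippin)\b/g) || [];
// Pass 2: keywords followed by more word characters (part of a longer word).
const prefixes = text.match(/(sam|samwise|merry|pippin)(?=\w+\b)/g) || [];

const result = whole.concat(prefixes);
console.log(result); // [ 'samwise', 'sam' ]
```

Pass 2 has no leading \b, so it also reports a keyword found in the middle of a word, such as sam in balsamic.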
