How To Match Overlapping Keywords With Regex
Solution 1:
You can use a lookahead regex with capturing groups for this overlapping match:
var regex = /(?=(sam))(?=(samwise))/;
var string = 'samwise';
var match = string.match(regex).filter(Boolean);
//=> ["sam", "samwise"]
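- It is important not to use the g (global) flag in the regex: with g, match() returns only the full matches (here a single empty string, because lookaheads are zero-width) and discards the capture groups. filter(Boolean) is used to remove the first result, the empty full match, from the matched array.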
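To see why the g flag matters, the sketch below runs the same pattern with and without it. It also shows String.prototype.matchAll() (ES2020), which keeps the capture groups even with g; that part is an alternative not mentioned in the answer above, and the variable names are illustrative.

```javascript
// The lookahead pattern from above, compiled with and without the g flag.
const withoutG = /(?=(sam))(?=(samwise))/;
const withG = /(?=(sam))(?=(samwise))/g;
const text = 'samwise';

// Without g: match() returns [fullMatch, group1, group2]; fullMatch is "" (zero-width).
const grouped = text.match(withoutG).filter(Boolean);

// With g: match() returns only the full matches ([""]), so the captures are lost.
const fullOnly = text.match(withG).filter(Boolean);

// matchAll() requires g and yields full match objects, capture groups included.
const viaMatchAll = [...text.matchAll(withG)].flatMap(m => m.slice(1));

console.log(grouped);     // [ 'sam', 'samwise' ]
console.log(fullOnly);    // []
console.log(viaMatchAll); // [ 'sam', 'samwise' ]
```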
Solution 2:
Why not just map indexOf() over the substr array:
var string = 'samwise gamgee';
var substr = ['sam', 'samwise', 'merry', 'pippin'];
var matches = substr.map(function(m) {
return (string.indexOf(m) < 0 ? false : m);
}).filter(Boolean);
console.log(matches);
// => ["sam", "samwise"]
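In engines with ES2015 support, the map()/filter(Boolean) pair can be collapsed into a single filter() with String.prototype.includes(). This is a minimal sketch of the same plain substring test, not a change in approach:

```javascript
const text = 'samwise gamgee';
const keywords = ['sam', 'samwise', 'merry', 'pippin'];

// includes() is a boolean substring test; filter() keeps the keywords found anywhere in text.
const found = keywords.filter(k => text.includes(k));

console.log(found); // [ 'sam', 'samwise' ]
```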
This probably performs better than a regex. But if you need regex functionality, e.g. case-insensitive matching, word boundaries, or the actual matched text, use the exec method:
var matches = substr.map(function(v) {
var re = new RegExp("\\b" + v, "i");
var m = re.exec(string);
return (m !== null ? m[0] : false);
}).filter(Boolean);
With the i flag (ignore case), this returns the first match of each keyword that begins at a \b word boundary.
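Because each keyword is concatenated straight into new RegExp(), a keyword containing regex metacharacters (., +, ( and so on) would be interpreted as a pattern rather than matched literally. The sketch below assumes keywords may contain such characters and escapes them first with a common escaping idiom; the helper name escapeRegExp is hypothetical. It also shows that exec() returns the text as it appears in the input, which is the "returned matches" benefit mentioned above.

```javascript
// Hypothetical helper: backslash-escape regex metacharacters so s is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const text = 'Samwise Gamgee';
const keywords = ['sam', 'samwise', 'merry', 'pippin'];

const found = keywords
  .map(k => new RegExp('\\b' + escapeRegExp(k), 'i').exec(text)) // null when absent
  .filter(m => m !== null)
  .map(m => m[0]); // the matched text, with the input's original casing

console.log(found); // [ 'Sam', 'Samwise' ]
```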
Solution 3:
I can't think of a simple and elegant solution, but I've got something that uses a single regex:
function quotemeta(s) {
return s.replace(/\W/g, '\\$&');
}
let keywords = ['samwise', 'sam'];
let subsumed_by = {};
keywords.sort();
for (let i = keywords.length; i--; ) {
let k = keywords[i];
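for (let j = i - 1; j >= 0; j--) {
// Check every earlier keyword: prefixes need not be adjacent in sorted order ('a', 'abc', 'ac').
if (k.startsWith(keywords[j])) (subsumed_by[k] = subsumed_by[k] || []).push(keywords[j]);
}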
}
keywords.sort(function (a, b) { return b.length - a.length; });
let re = new RegExp('(?=(' + keywords.map(quotemeta).join('|') + '))[\\s\\S]', 'g');
let string = 'samwise samgee';
let result = [];
let m;
while (m = re.exec(string)) {
result.push(m[1]);
result.push.apply(result, subsumed_by[m[1]] || []);
}
console.log(result);
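The same single-regex idea can be packaged as a reusable function. This sketch (the name overlappingMatches is hypothetical) keeps the lookahead-plus-[\s\S] pattern and the quotemeta() escaping, but checks which shorter keywords are prefixes of the captured one at match time instead of precomputing subsumed_by.

```javascript
// Hypothetical helper: every keyword occurrence in text, including keywords
// that overlap because one is a prefix of another.
function overlappingMatches(text, keywords) {
  // Escape every non-word character, as quotemeta() does above.
  const quotemeta = s => s.replace(/\W/g, '\\$&');
  // Longest first, so the alternation captures the longest keyword at each position.
  const sorted = [...keywords].sort((a, b) => b.length - a.length);
  // The lookahead captures without consuming; [\s\S] consumes one character
  // so the next exec() call resumes at the following position.
  const re = new RegExp('(?=(' + sorted.map(quotemeta).join('|') + '))[\\s\\S]', 'g');
  const result = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    const longest = m[1];
    result.push(longest);
    // Any shorter keyword that is a prefix of the captured one also starts here.
    for (const k of sorted) {
      if (k.length < longest.length && longest.startsWith(k)) result.push(k);
    }
  }
  return result;
}

console.log(overlappingMatches('samwise samgee', ['samwise', 'sam']));
// [ 'samwise', 'sam', 'sam' ]
```

Checking all shorter keywords also covers lists such as ['a', 'abc', 'ac'], where 'a' is a prefix of 'ac' without being adjacent to it in sorted order.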
Solution 4:
How about:
var re = /((sam)(?:wise)?)/;
var m = 'samwise'.match(re); // gives ["samwise", "samwise", "sam"]
var m = 'sam'.match(re); // gives ["sam", "sam", "sam"]
You can then remove the duplicates, for example with the approach from "Unique values in an array".
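A minimal way to do that clean-up, assuming ES2015 Set is available: drop the full match at index 0, then de-duplicate the captured groups. The function name keywordsFound is hypothetical.

```javascript
const re = /((sam)(?:wise)?)/;

// Hypothetical helper: distinct keywords captured by re in text.
function keywordsFound(text) {
  const m = text.match(re);
  if (m === null) return [];
  // slice(1) skips the full match; filter(Boolean) drops unmatched (undefined) groups;
  // Set removes duplicates while keeping first-seen order.
  return [...new Set(m.slice(1).filter(Boolean))];
}

console.log(keywordsFound('samwise')); // [ 'samwise', 'sam' ]
console.log(keywordsFound('sam'));     // [ 'sam' ]
```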
Solution 5:
If you don't want to create special cases, and if order doesn't matter, why not first match only full names with:
\b(sam|samwise|merry|pippin)\b
and then check whether any of these matches contains a shorter keyword, for example with:
(sam|samwise|merry|pippin)(?=\w+\b)
It is not a single elegant regex, but I suppose it is simpler than iterating through all matches.
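One reading of this two-step idea, sketched in JavaScript with an assumed sample text: the first regex collects whole-word keywords, and the second is applied to each of those words to find shorter keywords followed by more word characters.

```javascript
const text = 'samwise gamgee and merry';

// Step 1: keywords that appear as whole words.
const whole = text.match(/\b(sam|samwise|merry|pippin)\b/g) || [];

// Step 2: inside each whole word, keywords followed by at least one more
// word character, i.e. a shorter keyword contained in a longer one.
const partial = whole.flatMap(w => w.match(/(sam|samwise|merry|pippin)(?=\w+\b)/g) || []);

console.log(whole);               // [ 'samwise', 'merry' ]
console.log(partial);             // [ 'sam' ]
console.log([...whole, ...partial]); // [ 'samwise', 'merry', 'sam' ]
```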