JavaScript regex test() not recognising Danish characters
I have made a search function, similar to Facebook's search field autocomplete, using JavaScript and regex. It works fine, but when I search for the Danish letters Æ, Ø or Å, the .test() function won't recognize them and nothing is returned.
This is how the search part works:
var regxsearch = "\\b" + sterm; // sterm is the value of the search field
var regx = new RegExp(regxsearch, "gi");
var namecheck = regx.test(users[i]["user"]["name"]);
Imagine the usernames are "asbjørn", "østergård" and "jason":
- If I search for "asbjør", "asb" or "ørn", it returns true.
- If I search for "øster" or "østergård", it returns false.
- If I search for "stergård", "ård" or "rd", it returns true.
- If I search for "j", "jas", "jaso" etc., it returns true.
- If I search for "ason" or "son", it returns false.
I found a fiddle that is able to search for æøå, but it only works when I search for the entire word. I'm not good enough to decode how it works, but maybe someone can use it to find a possible fix for my problem: http://jsfiddle.net/8y3cm/17/
Is this fixable, or do I need to switch to some kind of plugin for the search function?
Your problem is twofold.
First: \b matches a word break at that position. A word break matches when one side has a word character and the other side does not. Your regex starts with "\\b"+sterm, so it will fail for jason on \bason and \bson, but match on \bj, \bjas and \bjaso. If there is 'nothing' to the left of \b, that counts as 'not a word character' (there is no word character, see? :). In the fail cases there is a word character to the left, while in the match cases there is not.
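The \b behavior described above can be checked with a short sketch. The variable names userName, terms and results are only illustrative and are not part of the original code:

```javascript
// Check which search terms match "jason" when the pattern starts with \b.
var userName = "jason";
var terms = ["j", "jas", "jaso", "ason", "son"];
var results = {};
for (var i = 0; i < terms.length; i++) {
  // Within "jason", \b only matches at the very start and the very end,
  // because every other position has a word character on both sides.
  results[terms[i]] = new RegExp("\\b" + terms[i], "i").test(userName);
}
console.log(results);
```

Only the terms that begin at the start of the name should come out as true.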
Second: the characters ø and å are not considered "word characters" in JavaScript, as a simple test will show you:
alert ("østergård".match(/\w+/g));
Since ø is not considered a word character, the behavior of \b is the reverse of what you think it does:
alert ("østergård".match(/\bøster/)); // null
It fails because \b sees a non-word character on its right (the ø) and therefore needs a word character on its left. There isn't one; there is nothing there.
A small test suite with your sample cases:
var sterm = [
  [ "asbjørn", "asbjør", "asb", "ørn" ],
  [ "østergård", "øster", "østergård" ],
  [ "østergård", "stergård", "ård", "rd" ],
  [ "jason", "j", "jas", "jaso" ],
  [ "jason", "ason", "son" ]
];
var r = '';
// The first entry of each row is the user name; the rest are search terms.
for (var s = 0; s < sterm.length; s++) {
  for (var s2 = 1; s2 < sterm[s].length; s2++) {
    var regxsearch = "\\b" + sterm[s][s2]; // stands in for the value of the search field
    var regx = new RegExp(regxsearch, "gi");
    var namecheck = regx.test(sterm[s][0]);
    r += "[" + sterm[s][s2] + "] on [" + sterm[s][0] + "] " + namecheck + '\n';
  }
}
alert(r);
This shows the same order of true and false that you reported. If you remove the \b in regxsearch, you will see that they all return true.
Why does your own 'temporary fix' fix it?
You replace the non-word characters (nothing personal, that is just according to JavaScript!) with valid word characters, and so the expected behavior of \b is back.
A better fix is to not rely on the specific behavior of \b (and, by extension, \w). If these user names may appear anywhere in a text (so not only at the beginning of a string), you can use this:
var regxsearch = "(^|[^\\wøå])"+sterm;
where the regex (^|[^\\wøå]) stands for
- ^ : beginning of string
- | : or
- [^...] : not (^) any of the characters \w, ø, å
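As a rough sketch of how that prefix could be wired into the search check: the helper names escapeRegex and matchesWordStart are hypothetical and not part of the original code. Adding æ to the character class and escaping the search term are extra assumptions that go beyond the pattern shown above.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a
// regex, so a search term such as "a.b" is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true if sterm occurs at the start of the string or
// right after a character that is neither \w nor one of æ, ø, å.
// (æ is added here as an assumption; the pattern above lists only ø and å.)
function matchesWordStart(name, sterm) {
  var regx = new RegExp("(^|[^\\wæøå])" + escapeRegex(sterm), "i");
  return regx.test(name);
}

console.log(matchesWordStart("østergård", "øster")); // start of the name
console.log(matchesWordStart("jason", "ason"));      // middle of the name
```

With this pattern a term only matches at the start of a name or word, so "ørn" no longer matches "asbjørn". If matching in the middle of a name is wanted, dropping the prefix altogether is the simpler option.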