Elasticsearch pattern_capture filter also emits tokens that do not match the pattern
I have a use case where I need to extract the domain part of emails found in text. I used the uax_url_email tokenizer to emit each email as a single token, and I have a pattern_capture filter that emits the string captured by the pattern "@(.+)". However, uax_url_email also returns the words that are not emails, and the pattern_capture filter does not filter those out. Any suggestions?
"custom_analyzer": {
  "tokenizer": "uax_url_email",
  "filter": ["email_domain_filter"]
}

"filter": {
  "email_domain_filter": {
    "type": "pattern_capture",
    "preserve_original": false,
    "patterns": ["@(.+)"]
  }
}
Input string: "my email id is xyz@gmail.com"
Output tokens: my, email, id, is, gmail.com
But I need only gmail.com.
"If none of the patterns match, or if preserveOriginal is true, the original token will be preserved."
Try adding a pattern that matches the other tokens but does not contain a capture group (e.g. ".*").
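The pass-through rule quoted above explains the output in the question. Here is a rough Python simulation of that rule. It is a sketch, not the Lucene implementation: the function name pattern_capture and its preserve_original parameter are made up for illustration, and a plain whitespace split stands in for the uax_url_email tokenizer.

```python
import re


def pattern_capture(tokens, patterns, preserve_original=False):
    """Simulate the quoted rule (hypothetical helper, not the Lucene code).

    For each token, emit the non-empty capture groups of every pattern match.
    If nothing was captured, or preserve_original is true, keep the original.
    """
    compiled = [re.compile(p) for p in patterns]
    out = []
    for token in tokens:
        captures = []
        for rx in compiled:
            for match in rx.finditer(token):
                captures.extend(g for g in match.groups() if g)
        if preserve_original or not captures:
            out.append(token)  # non-matching tokens pass through unchanged
        out.extend(captures)
    return out


# Assumption: whitespace splitting approximates uax_url_email for this input.
tokens = "my email id is xyz@gmail.com".split()
result = pattern_capture(tokens, ["@(.+)"])
print(result)
```

Only the email token matches "@(.+)", so only it is replaced by its captured domain. The other four tokens match nothing and are therefore preserved, which is why they still appear in the analyzer output.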