c++ - How to find whether a byte read is Japanese or English?
I have an array that contains Japanese and ASCII characters. I am trying to find out whether each character read is an English character or a Japanese character.
To solve this, I did the following:
- Read the first byte. If the multibyte character width is not equal to one, move the pointer to the next byte, display the whole 2-byte character, and report that a Japanese character has been read.
- If the multibyte character width is equal to one, display the byte and show a message that an English character has been read.
The above algorithm works fine, but it fails for the half-width forms of Japanese characters, e.g. シ, ァ etc., which are 1 byte. How can I find out whether the characters are Japanese or English?
**Note:** What I tried: I read on the web that the first byte tells whether a character is Japanese or not, which I have covered in step 1 of my algorithm. It won't work for half-width characters, though.
Edit: The problem I am solving includes a control character, 0x80, at the start and end of the characters to identify the string of characters. I wrote the following to identify the closing control character.
cntlchar.....(my characters, which can be Japanese).....cntlchar
    if ((buf[*p+1] & 0x80) && (mbmbcs_charwidth(&buf[*p]) == 1))
        // end of control characters reached
    else
        // *p++
It worked fine when the text was English, but it didn't work for Japanese half-width characters.
How can I handle this?
Your data must be using Windows codepage 932. That is a guess, but examining the codepoints shows what you are describing.
The codepage shows that characters in the range 00-7F are "English" (a better description is "7-bit ASCII"), characters in the ranges 81-9F and E0-FF are the first byte of a multibyte code, and everything between A1 and DF is a half-width kana character.
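As a rough illustration of those ranges, here is a minimal sketch that classifies a byte at a character boundary and walks a buffer with it. It assumes the data really is codepage 932, as guessed above. The names `ByteKind`, `classify`, `Counts` and `count_chars` are made up for this example and are not part of any library.

```cpp
#include <cstddef>

// Byte classes, following the codepage 932 ranges given in the answer.
// These names are hypothetical and exist only for this sketch.
enum class ByteKind { Ascii, HalfWidthKana, LeadByte, Other };

// Classify one byte that sits at a character boundary.
// Assumption: the buffer is encoded in Windows codepage 932.
constexpr ByteKind classify(unsigned char b) {
    return (b <= 0x7F)                             ? ByteKind::Ascii
         : (b >= 0xA1 && b <= 0xDF)                ? ByteKind::HalfWidthKana
         : ((b >= 0x81 && b <= 0x9F) || b >= 0xE0) ? ByteKind::LeadByte
                                                   : ByteKind::Other; // 0x80, 0xA0
}

struct Counts {
    std::size_t ascii = 0;     // 7-bit ASCII ("English") characters
    std::size_t japanese = 0;  // half-width kana plus 2-byte characters
    std::size_t other = 0;     // bytes outside all three ranges
};

// Walk the buffer one character at a time. A lead byte consumes two bytes,
// so its trail byte is never classified on its own.
constexpr Counts count_chars(const unsigned char* buf, std::size_t len) {
    Counts c;
    std::size_t i = 0;
    while (i < len) {
        switch (classify(buf[i])) {
        case ByteKind::Ascii:         ++c.ascii;    i += 1; break;
        case ByteKind::HalfWidthKana: ++c.japanese; i += 1; break;
        case ByteKind::LeadByte:      ++c.japanese; i += 2; break;
        case ByteKind::Other:         ++c.other;    i += 1; break;
        }
    }
    return c;
}
```

With these ranges, 0x80 falls into none of the three groups, so the delimiter byte from the question comes back as `Other` rather than being mistaken for a character. Skipping the trail byte matters because it can fall in the ASCII range: the 2-byte sequence 0x83 0x56 would otherwise be read as a Japanese lead byte followed by an English `V`.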