OK. I'm an idiot.
So here's today's BIG Unicode lesson; understand this and, maybe, half your troubles will evaporate.
Unicode is NOT a "code".
No. Unicode is a kind of Platonic ideal: an abstract catalogue of characters (code points), of which everything else is an "encoding".
ASCII is an encoding. UTF-8 is an encoding. That weird character set you got with Portuguese accented letters is an encoding.
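To make that concrete, here is a minimal sketch, assuming Python 3, where `str` holds Unicode and `bytes` holds encoded data. One Portuguese word is a single sequence of code points, but each encoding turns it into different bytes, and ASCII can't represent it at all. Latin-1 is used here only as an example of a Portuguese-friendly legacy character set.

```python
# One abstract string made of Unicode code points (Portuguese "ação").
text = "ação"

# The code points themselves, independent of any encoding.
print([hex(ord(ch)) for ch in text])  # ['0x61', '0xe7', '0xe3', '0x6f']

# Two encodings of the same string give two different byte strings.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")
print(utf8_bytes)    # b'a\xc3\xa7\xc3\xa3o'
print(latin1_bytes)  # b'a\xe7\xe3o'

# ASCII has no byte for "ç" or "ã", so this encoding simply fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("ascii can encode it:", ascii_ok)  # ascii can encode it: False
```

Same text, different bytes: the Unicode string is the "ideal", and the byte strings are its encodings.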
Hence the verb "encode" means to turn a Unicode string into a byte string.
And "decode" means to turn a byte string (say one imported from another application) back into the pure Unicode.
I repeat. You DO NOT encode byte strings into Unicode strings. You decode them into Unicode. And then you re-encode them when you want to export them (as, say, XML or JSON).
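Python 3 bakes this direction into its types, which makes a handy sanity check (a sketch assuming Python 3; in Python 2 both methods existed on both types, which is exactly how people got confused). The input bytes below are assumed to be UTF-8, as if handed over by another application.

```python
raw = b"caf\xc3\xa9"             # bytes from outside (assumed to be UTF-8)

text = raw.decode("utf-8")       # decode: bytes -> Unicode
print(text, len(text))           # café 4
print(len(raw))                  # 5  (the "é" takes two bytes in UTF-8)

out = text.encode("utf-8")       # encode: Unicode -> bytes
print(out == raw)                # True

# The wrong direction doesn't even exist in Python 3.
print(hasattr(raw, "encode"))    # False: bytes have no encode()
print(hasattr(text, "decode"))   # False: str has no decode()

# Decoding with the wrong encoding "works" but gives garbage (mojibake).
wrong = raw.decode("latin-1")
print(wrong)                     # cafÃ©
```

The last two lines are the classic symptom: if you see "Ã©" where "é" should be, something was decoded with the wrong encoding.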
read --> decode --> do stuff in your app --> encode --> write
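Here is that pipeline as a minimal Python 3 sketch. The function name `convert`, the file names, the Latin-1 input encoding and the "do stuff" step (upper-casing words) are all hypothetical, chosen just for illustration. The point is that bytes exist only at the edges, and everything in the middle is Unicode.

```python
import json
import os
import tempfile


def convert(in_path, out_path, in_encoding="latin-1"):
    """Hypothetical example: read -> decode -> do stuff -> encode -> write."""
    # read: raw bytes from disk, no interpretation yet
    with open(in_path, "rb") as f:
        raw = f.read()
    # decode: bytes -> Unicode, using the encoding the producer actually used
    text = raw.decode(in_encoding)
    # do stuff: inside the app, work only with Unicode strings
    words = [w.upper() for w in text.split()]
    # encode: Unicode -> bytes, choosing the output encoding explicitly
    payload = json.dumps({"words": words}, ensure_ascii=False).encode("utf-8")
    # write: raw bytes back to disk
    with open(out_path, "wb") as f:
        f.write(payload)
    return words


# Usage: fake an input file written by some Latin-1 application.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.txt")
    dst = os.path.join(tmp, "out.json")
    with open(src, "wb") as f:
        f.write("ação pão".encode("latin-1"))
    words = convert(src, dst)
    with open(dst, "rb") as f:
        exported = json.loads(f.read().decode("utf-8"))

print(words)  # ['AÇÃO', 'PÃO']
```

Notice the input arrives as Latin-1 and leaves as UTF-8 JSON; the app in between never has to care, because it only ever sees Unicode.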
Thanks ... that's all.