Proper handling of Unicode is too often forgotten until a user reports a bug. When we developed Sync 4, the latest version of our desktop application, we wanted to make certain that it would work from the start with our user base around the world, and proper handling of Unicode was critical to that commitment.
Sync 4 is primarily written in Python 2.7. Unfortunately, Python 2.7's support for Unicode characters is not as comprehensive as we needed, and we were forced to handle a lot of cases on our own. We started using non-ASCII strings in our tests to cover the new code we'd written, but developers often forgot to include non-ASCII characters in test strings because such strings were not easy to construct. That's when RotUnicode was born.
RotUnicode is a Python codec that converts between ASCII strings and Unicode strings made of non-ASCII look-alike characters, so the result stays readable. It lets developers keep writing ASCII strings and easily convert them to Unicode strings with non-ASCII characters.
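To make the idea concrete, here is a minimal sketch of a readable ASCII-to-Unicode mapping. It is not RotUnicode's implementation: the table holds only the seven character pairs visible in the 'Hello World!' example in this post, and the names PAIRS, to_lookalike, and from_lookalike are hypothetical.

```python
# -*- coding: utf-8 -*-
# Illustrative only: these pairs are read off the 'Hello World!' example in
# this post. The real RotUnicode table covers many more characters.
PAIRS = {
    u'H': u'Ĥ',
    u'e': u'ȅ',
    u'l': u'ľ',
    u'o': u'ő',
    u'W': u'Ŵ',
    u'r': u'ŕ',
    u'd': u'ď',
}

# translate() expects a mapping of code point -> code point.
ENCODE_TABLE = {ord(a): ord(b) for a, b in PAIRS.items()}
DECODE_TABLE = {ord(b): ord(a) for a, b in PAIRS.items()}


def to_lookalike(text):
    """Swap mapped ASCII characters for non-ASCII look-alikes."""
    return text.translate(ENCODE_TABLE)


def from_lookalike(text):
    """Reverse the mapping; unmapped characters pass through unchanged."""
    return text.translate(DECODE_TABLE)


encoded = to_lookalike(u'Hello World!')
decoded = from_lookalike(encoded)
```

Because the mapping is one-to-one, decoding restores the original string, and any character missing from the table (the space and '!' here) is left untouched.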
To use it, first register the rotunicode codec:
>>> import codecs
>>> from box.util.rotunicode import RotUnicode
>>> codecs.register(RotUnicode.search_function)
You can then use it as follows:
>>> 'Hello World!'.encode('rotunicode')
>>> 'Ĥȅľľő Ŵőŕľď!'.decode('rotunicode')
Here's a sample bug in Python 2.7 that RotUnicode makes easy to expose:
>>> import os
>>> name = 'foo'.encode('rotunicode')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
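The error in that traceback comes from an ASCII encode of a three-character non-ASCII string. Python 2.7 does that encode implicitly when a unicode string is passed somewhere bytes are expected. As a rough sketch, the same error can be triggered explicitly. The string below is an assumed stand-in for the encoded name, not necessarily what RotUnicode produces for 'foo'.

```python
# -*- coding: utf-8 -*-
# Assumed stand-in: any three non-ASCII characters reproduce the error.
name = u'ƒőő'

try:
    # Python 2.7 does this step implicitly when it coerces unicode to bytes.
    name.encode('ascii')
except UnicodeEncodeError as exc:
    error = exc
    message = str(exc)
```

Every character in the name is outside the ASCII range, which is why the reported position spans the whole string (0-2).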
The Meaning Behind the Name
RotUnicode stands for rotate-to-unicode. Or rotten-unicode, for those who have nightmares about Unicode =). It was inspired by ROT13.
RotUnicode is available on GitHub. We welcome feedback and pull requests, and we hope that you find it useful!
Check out some of the other projects we've open sourced at opensource.box.com.