Discussion:
[Bitpim-devel] Mac issue with vcard import and UTF-16 versus UTF-8
Peter Dufault
2004-07-06 00:32:25 UTC
I created a fake contact list using Apple's address book in order to
generate some Mac screen shots for Roger (maybe I'll get them to you
tomorrow, Roger), and I added one of my favorite museums, the Musée
d'Orsay, together with the appropriate accent mark. The resulting
vCard file was UTF-16 and BitPim didn't handle it at all; I got many
complaints similar to:

Fixing up bad line
PyxoUzpuWDxwXENxXD90X0JsWDtsW0FpWD5iUjhgUDpdUTtWSzddUTxcTzhbUTxZUDpXTzxW
TzxV
Fixing up bad line
TjtKRC9RSjdJQjBNRjZKRTJKRDNKRDRHQTFLSTZDQDAzLyYzLyUzLSQzLiQzLiU0LyU2Mic2
Mik1

which is obviously confusion over the Mac photo object, and nothing
was imported at all.

I puzzled over it a bit, and then when I did a "file" command on the
.vcf file realized it was UTF-16.

I used the Mac's TextEdit to convert the file from UTF-16 to UTF-8, and
then it imported OK.
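
For anyone without TextEdit to hand, the same conversion can be sketched
in a few lines of Python. This is only an illustration of the workaround;
the function name is made up here and is not part of BitPim, and it
assumes the input really is UTF-16 with a byte order mark at the start.

```python
def utf16_to_utf8(data):
    """Re-encode UTF-16 bytes (with a leading BOM) as UTF-8 bytes.

    Hypothetical helper for illustration only. The "utf-16" codec reads
    the byte order mark to pick the endianness and drops it from the
    decoded text.
    """
    text = data.decode("utf-16")
    return text.encode("utf-8")


def convert_file(src_path, dst_path):
    # Read and write in binary mode so no platform default encoding
    # or newline translation gets in the way.
    with open(src_path, "rb") as src:
        converted = utf16_to_utf8(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(converted)
```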

This is outside of my realm of expertise, but I'm sure some non-US
users will bump into this issue.

Peter

Peter Dufault
HD Associates, Inc.
Roger Binns
2004-07-06 01:19:52 UTC
Post by Peter Dufault
Fixing up bad line
PyxoUzpuWDxwXENxXD90X0JsWDtsW0FpWD5iUjhgUDpdUTtWSzddUTxcTzhbUTxZUDpXTzxW
TzxV
That message is printed if the line itself doesn't comply with
vcard standards (ie a text string followed by a colon).
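
As a rough sketch of that kind of check (not BitPim's actual code; the
regular expression and function name are assumptions made for
illustration), a line counts as well formed if it starts with a property
name, optional parameters, and then a colon. Stray base64 lines like the
ones above have no colon and so fail the test:

```python
import re

# Approximation of a vCard content line: a property name (letters,
# digits, "-" and an optional "group." prefix), optional ";parameters",
# then a colon introducing the value. This is deliberately loose.
_CONTENT_LINE = re.compile(r"^[A-Za-z0-9.-]+(;[^:]*)?:")


def looks_like_content_line(line):
    """Return True if the line has the name[;params]:value shape."""
    return _CONTENT_LINE.match(line) is not None
```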
Post by Peter Dufault
I puzzled over it a bit, and then when I did a "file" command on the
.vcf file realized it was UTF-16.
I believe the vcf files are required to be in ASCII, using the various
encoding schemes (base64, quoted printable) to represent non-ascii
characters.
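
To illustrate what that looks like, here is a small sketch using Python's
standard quopri module to keep a property line pure ASCII. The helper
names are hypothetical, and the CHARSET/ENCODING parameter layout follows
the vCard 2.1 style; whether a given reader accepts it is not something
this example can promise.

```python
import quopri


def qp_property(name, text, charset="utf-8"):
    """Build an ASCII-only property line with a quoted-printable value."""
    encoded = quopri.encodestring(text.encode(charset)).decode("ascii")
    return "%s;CHARSET=%s;ENCODING=QUOTED-PRINTABLE:%s" % (
        name, charset.upper(), encoded)


def qp_decode(value, charset="utf-8"):
    """Turn a quoted-printable value back into text."""
    return quopri.decodestring(value.encode("ascii")).decode(charset)
```

Only short values are handled here; long ones would also need the soft
line breaks that quoted-printable uses.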

Anyway can you send me the vcf so I can make BitPim handle it?
Post by Peter Dufault
This is outside of my realm of expertise, but I'm sure some non-US
users will bump into this issue.
There are a whole host of issues. Python handles the character sets
fine, but a set has to be used for the gui and other interfacing
(the phones, serial connection, USB, filesystem). Currently the
character set of the host is used.

When we make the change to wxPython 2.5.2, I am also considering
moving to the Unicode version of that at the same time (ie take
all the pain at once). It will require a little more work in
the packaging for Win9x/ME but all the other platforms should
be fine.

We may also finally have to address the issue of what character
sets the phones actually support (which mainly seems to be ASCII
plus some accented characters for Spanish).

BitPim also needs localisation (all visible text has to go
through a routine that loads the local language version of
the string).
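
One common way to do that in Python is the standard gettext module. The
sketch below is only an assumption about how it might be wired up; the
domain name "bitpim_example", the locale directory, and the helper name
are all made up for illustration.

```python
import gettext


def load_translator(language, localedir="locale", domain="bitpim_example"):
    """Return a function that maps source strings to translated strings.

    With fallback=True a missing catalogue is not an error: the
    returned function simply hands back the original string.
    """
    catalog = gettext.translation(
        domain, localedir=localedir, languages=[language], fallback=True)
    return catalog.gettext


# Every user-visible string would then be wrapped like this:
_ = load_translator("fr")
label = _("Import vCards")
```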

I won't be doing that myself, but will help out anyone who
wants to do it. However to my knowledge there are no users
of BitPim outside the US anyway (CDMA phones aren't used
quite as widely as GSM :-)

Roger
Roger Binns
2004-07-06 22:41:03 UTC
Post by Roger Binns
Anyway can you send me the vcf so I can make BitPim handle it?
Peter sent me the vcard.

The file was a "plain text unicode" file encoded with two bytes
per character. It did also begin with a UNICODE byte order marker.
Personally I believe this is a violation of the vCard spec (along
with Apple using CATEGORY instead of CATEGORIES). More fodder for
the debate Tom and I were having :-)

Anyway I have added a common function for opening text files.
It scans the beginning of the file for a Unicode byte order
marker, and if one is present it uses other bits of the Python
library to decode the data; otherwise it falls back on
the default open-as-a-text-file function.
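
The approach can be sketched roughly as follows. This is not the code
that was committed to BitPim; the function names are hypothetical and the
sketch is written against the current Python standard library, using the
BOM constants from the codecs module.

```python
import codecs

# Check the 4-byte UTF-32 marks before the 2-byte UTF-16 ones, because
# the UTF-32 little-endian mark starts with the UTF-16 little-endian one.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def sniff_encoding(head):
    """Return a codec name if the bytes start with a BOM, else None."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None


def open_text_file(path):
    """Open a text file, honouring a Unicode byte order mark if present."""
    with open(path, "rb") as f:
        encoding = sniff_encoding(f.read(4))
    if encoding is None:
        # No marker found: fall back on the default text-mode open.
        return open(path, "r")
    # These codecs consume the BOM themselves while decoding.
    return open(path, "r", encoding=encoding)
```

A file with no marker is opened exactly as before, so existing ASCII
vCards should be unaffected.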

It has worked fine in my testing, but you can never have too
much testing :-)

Roger
Peter Dufault
2004-07-06 23:58:35 UTC
Post by Roger Binns
It has worked fine in my testing, but you can never have too
much testing :-)
I don't have time to look at this tonight, but preliminary results are
bad on the Mac.

Importing the UTF-16 Vcard export results in a seg-fault. Not the
nothing import we had before from UTF-16, but a seg-fault, so that's an
improvement from my point of view.

Importing the associated UTF-8 file created by "TextEdit save-as utf-8"
works fine.

Bitpim's dying gasp is "*** malloc[697]: error for object 0x64873e0:
Incorrect checksum for freed object - object was probably modified
after being freed; break at szone_error
Segmentation fault.
AAARRRGGGG"

(I made up that last line in line with Python's mission statement. I
don't know Python but I've been reading up a bit.)

The last time I saw this in bitpim I provided the default OS X trace
back (that is, the trace back saved by the OS due to the crash and not
with a special GDB run) and it was attributed to "Typical python mac
widget issue problems, all should be well when we update to the new
pythonw release". This time I'll run it in gdb tomorrow, and break at
szone_error, and see if I can't tickle out more info.

Peter Dufault
HD Associates, Inc.
Roger Binns
2004-07-07 02:00:13 UTC
Post by Peter Dufault
Importing the UTF-16 Vcard export results in a seg-fault. Not the
nothing import we had before from UTF-16, but a seg-fault, so that's an
improvement from my point of view.
I had all sorts of problems after a while. It turns out that the
non-unicode version of wxPython causes all sorts of memory corruption
if you feed it any unicode strings (on Windows and Linux anyway).

I have committed a fix. You should delete ~/.bitpim-files/phonebook/index.idx
since loading that with Unicode strings (even though all the characters
are ascii) will cause the corruption.

I have also removed the import confirmation dialog on all platforms
(the one that was coming up with no contents for some Mac folks).
I'd be happy to hear suggestions on how to achieve the same goals
with a better user interface.
Post by Peter Dufault
The last time I saw this in bitpim I provided the default OS X trace
back (that is, the trace back saved by the OS due to the crash and not
with a special GDB run) and it was attributed to "Typical python mac
widget issue problems, all should be well when we update to the new
pythonw release". This time I'll run it in gdb tomorrow, and break at
szone_error, and see if I can't tickle out more info.
Normally if the error is in wx functions then that is the case.
This time it is random memory corruption which makes things
very interesting. Fortunately I eventually managed to figure it
out with gdb on Linux.

Roger
