[Bitpim-devel] Vcards
Roger Binns
2004-06-12 17:50:49 UTC
Is there anyone who wants to write Vcard code?

I've been delving into Evolution, and its native storage format
is actually vcards inside a db database. (Somewhat insane if
you ask me).

I think the preferred format for Apple Addressbook is also Vcard.

If you code and want to help out, let me know. If you don't
I would like to build up some test data, in particular what
the various programs export.

The vcard format itself is underspecified. It is pretty obvious
to a human what it means, but to a program things can be
really difficult since stuff is vague.

In the examples directory there is already a vcards.vcf file where
I gathered 8 examples of various vcards. Most actually come from
the standards documents themselves. Feel free to try it:
http://cvs.sf.net/viewcvs.py/bitpim/bitpim/examples/vcards.vcf?view=markup

Evolution only imports 3 of the entries, and doesn't mention the
5 it ignored! I have yet to encounter any program that actually
conforms 100% to the standard for things like escaping semi-colons
and dealing with newlines and backslashes.

Roger
Tom Pollard
2004-06-14 01:03:31 UTC
Post by Roger Binns
Is there anyone who wants to write Vcard code?
Well, I still want to. I made a fair amount of progress in
understanding the VCard spec and in looking for existing
implementations (in C++, Perl and Python). As far as producing Python
code that would be useful for bitpim, I was just getting started (a
couple of months ago) when work
pressure intervened to draw me away from it. What is it you're hoping
for, exactly, in terms of a timeframe and functionality? My goal was
to produce VFile, VCard and VCalendar classes that would be generally
useful. I'm writing the parser to the spec, rather than to a
particular set of sample files.

TomP
Roger Binns
2004-06-14 05:00:56 UTC
Post by Tom Pollard
Well, I still want to. I made a fair amount of progress in
understanding the VCard spec and in looking for existing
implementations (in C++, Perl and Python).
There are effectively 3 parts

- The VFile format (how stuff is formatted and encoded)
- The specific schemas (VCard, VCalendar)
- Understanding the meaning of the schemas (eg how are
fax, default fax and business fax represented)

There is an abandoned pdi library that implements the first
one and bits of the second. The last part would have to
be BitPim specific code.
Post by Tom Pollard
What is it you're hoping
for, exactly, in terms of a timeframe
In the short term (preferably in time to make the next build :-)
I need parsing of vcards, especially those produced by Evolution
and preferably Apple if anyone sends me a complete sample.
Post by Tom Pollard
and functionality?
In the longer term, we will also need import of calendar and
export of Vcards.
Post by Tom Pollard
I'm writing the parser to the spec, rather than to a
particular set of sample files.
A noble effort, but I recommend you try and find a single
program that produces 100% compliant vcards! I couldn't.
They are all fairly ok for trivial information, but try
putting in long fields, filling every entry, having semi-colons
in the fields etc.

That is ultimately why I started collecting interesting vcards
to use as a test/coding suite :-)

I would greatly appreciate any code you want to contribute in
the short term. The current plans are to get the mechanism
behind several different import sources in place, and then
get the tweaking of the actual import process (matching records
and figuring out changes) done. Then 0.7 will be considered
done.

Roger
Tom Pollard
2004-06-14 15:34:57 UTC
Post by Roger Binns
There are effectively 3 parts
- The VFile format (how stuff is formatted and encoded)
- The specific schemas (VCard, VCalendar)
- Understanding the meaning of the schemas (eg how are
fax, default fax and business fax represented)
There is an abandoned pdi library that implements the first
one and bits of the second. The last part would have to
be BitPim specific code.
Yes, I found that early on and started to work with it, but it's
incomplete (as you noted) and it's fairly clumsy code (IMO). I don't
think it's a reasonable basis for further work.

The most nicely structured and flexible code I found was a set of Perl
modules, Text::VFile, Text::VCard, and their subordinate modules. I
had been working on building Python classes that followed the basic
structure of those modules. A direct port didn't look easy because of
the way they use the Perl AUTOLOAD functionality to streamline method
dispatch.
Post by Roger Binns
A noble effort, but I recommend you try and find a single
program that produces 100% compliant vcards! I couldn't.
They are all fairly ok for trivial information, but try
putting in long fields, filling every entry, having semi-colons
in the fields etc.
I haven't looked at any vcards closely enough to notice that yet. At
this point, I'm just using the vcard and vcalendar exports from my own
Mac as samples. Your complaints earlier were about the vagueness of
the spec (which I also don't see yet), rather than about violations of
the spec by people exporting vcards.
Post by Roger Binns
That is ultimately why I started collecting interesting vcards
to use as a test/coding suite :-)
I would greatly appreciate any code you want to contribute in
the short term. The current plans are to get the mechanism
behind several different import sources in place, and then
get the tweaking of the actual import process (matching records
and figuring out changes) done. Then 0.7 will be considered
done.
Noted. I'll see what I can do.


Tom
Roger Binns
2004-06-14 18:20:42 UTC
Post by Tom Pollard
I haven't looked at any vcards closely enough to notice that yet. At
this point, I'm just using the vcard and vcalendar exports from my own
Mac as samples. Your complaints earlier were about the vagueness of
the spec (which I also don't see yet), rather than about violations of
the spec by people exporting vcards.
Ok, which of these are invalid according to the spec (and note that this is
a simple field)?

TEL;WORK:908908908098098
TEL;WORK;VOICE:bus 903218490809
tel;type=work,voice,msg:+1 303 555-5555
work.tel;type=fax,voice,msg:+49 3581 123456
VOICE;WORK:+1-515-555-1234

And then when dealing with something more complex like an address,
the components are separated by semi-colons. See what happens
if you put semicolons as part of a component value. Try having
multiple "Street" lines. Try making \ part of a value.

The output of EVERY program I have tried is different.
That is why I started collecting samples, and you should
look at them :-)

Given the choice of pedantically following the standard or
accepting the output of all popular/common programs, I pick
the latter.

As for what BitPim will generate when exporting, that should
pedantically follow the standard.
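
A minimal sketch (not from the thread) of what "accept everything" could
look like at the lowest level, assuming modern Python 3. The function name
parse_content_line and the returned dictionary layout are hypothetical and
are not taken from BitPim's vcards.py. It only splits one unfolded line
into group, property name, raw parameters and value, and leaves all
interpretation to the caller:

```python
def parse_content_line(line):
    """Split one unfolded vCard content line into its raw parts.

    Hypothetical helper for illustration. It assumes the first ':' ends
    the property name and parameters, so it does not handle quoted
    parameter values that themselves contain ':'.
    """
    head, sep, value = line.partition(":")
    if not sep:
        raise ValueError("no ':' separator in %r" % line)
    # "work.tel;type=fax,voice" -> name "work.tel", params ["type=fax,voice"]
    name, *params = head.split(";")
    # An optional group label precedes the property name, separated by '.'
    group, _dot, prop = name.rpartition(".")
    return {
        "group": group or None,
        "name": prop.upper(),   # property names are case-insensitive
        "params": params,       # left raw; normalised by the caller
        "value": value,         # treated as an opaque string here
    }


for sample in (
    "TEL;WORK:908908908098098",
    "TEL;WORK;VOICE:bus 903218490809",
    "tel;type=work,voice,msg:+1 303 555-5555",
    "work.tel;type=fax,voice,msg:+49 3581 123456",
    "VOICE;WORK:+1-515-555-1234",
):
    print(parse_content_line(sample))
```

Note that the made-up VOICE;WORK line splits just as cleanly as the others.
Rejecting it needs a list of known property names, which is a decision for
the layer above this one.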

Roger
Tom Pollard
2004-06-14 20:00:10 UTC
Post by Roger Binns
Post by Tom Pollard
I haven't looked at any vcards closely enough to notice that yet. At
this point, I'm just using the vcard and vcalendar exports from my own
Mac as samples. Your complaints earlier were about the vagueness of
the spec (which I also don't see yet), rather than about violations of
the spec by people exporting vcards.
Ok, which of these are invalid according to the spec (and note that this is
a simple field)?
TEL;WORK:908908908098098
This is valid according to the 2.1 spec, assuming the phone number
itself is legal. The 3.0 spec would require the 'WORK' parameter to be
prefixed by "TYPE=".
Post by Roger Binns
TEL;WORK;VOICE:bus 903218490809
If 'bus ' is part of the phone number, then the same comments apply.
The 3.0 spec says a phone number must be "a single text value as
defined in [CCITT E.163] and [CCITT X.121]", but I'm unfamiliar with
those. At the level of the file parser, I'd treat phone numbers as
opaque strings and let the client code make sense of them (or not).
Post by Roger Binns
tel;type=work,voice,msg:+1 303 555-5555
That looks fine.
Post by Roger Binns
work.tel;type=fax,voice,msg:+49 3581 123456
That conforms to the spec, although it might be confusing to interpret
correctly if they really meant
'tel;type=work,fax,voice,msg:...'. Again, I would take it at face
value and report that this phone number belonged to group 'work' and
let the client code make sense of it (or not).
Post by Roger Binns
VOICE;WORK:+1-515-555-1234
That's obviously bogus. Does that actually appear in an exported vcard
somewhere?
Post by Roger Binns
And then when dealing with something more complex like an address,
the components are separated by semi-colons. See what happens
if you put semicolons as part of a component value. Try having
multiple "Street" lines. Try making \ part of a value.
The output of EVERY program I have tried is different.
Sure, the spec allows for a variety of styles, to some extent. As long
as they conform to the spec, that shouldn't bother you. It would be
more of an issue if each of those programs accepted only their own
idiosyncratic style of vcard on import.
Post by Roger Binns
That is why I started collecting samples, and you should
look at them :-)
I've got them, though I haven't looked at them in a while. They don't
include the 'VOICE:' example, I'm happy to see.


Tom
Roger Binns
2004-06-17 06:39:35 UTC
(The numbers I supplied were random gunk typed in, mainly so that I
could tell what the programs did with the fields. Their actual
validity doesn't matter).
Post by Tom Pollard
Post by Roger Binns
VOICE;WORK:+1-515-555-1234
That's obviously bogus. Does that actually appear in an exported vcard
somewhere?
I made that one up. It would look just as valid to a human as the
others though.

All the others were out of the examples vcards file. That was no
less than 4 valid ways of expressing the same thing - a business
phone number. The problem with standards that allow such latitudes
is that programs turn out not to be interoperable because they only
parse some of the valid ways.

We didn't throw encoding into the mix either.
Post by Tom Pollard
I've got them, though I haven't looked at them in a while. They don't
include the 'VOICE:' example, I'm happy to see.
Can you send me a fully featured Apple vcard I can add? There is
an Evolution one in there (the second entry).

I believe Evolution gets the Quoted-Printable continuation lines
wrong (they should be indented with at least one space IIRC).

None of the programs quote semi-colons in data values. Consequently
the N and ADR fields aren't correctly parseable if any of the components
include semi-colons.
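
To illustrate why unquoted semi-colons matter, here is a minimal sketch
(not from the thread) of splitting a structured value such as N or ADR on
unescaped semi-colons. The name split_structured is hypothetical, and the
escape handling follows the vCard 3.0 rules (backslash before ';', ',',
'\' and 'n'); unknown escapes are kept verbatim as a lenient assumption:

```python
def split_structured(value, sep=";"):
    """Split a structured value on unescaped separators.

    Hypothetical helper for illustration. Backslash escapes are decoded
    while splitting; an unknown escape or a trailing backslash is kept
    as-is rather than rejected.
    """
    parts, current = [], []
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            if nxt in ("n", "N"):
                current.append("\n")        # escaped newline
            elif nxt and nxt in "\\;,":
                current.append(nxt)         # escaped literal character
            else:
                current.append("\\" + nxt)  # unknown escape: keep verbatim
        elif ch == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


# An escaped semi-colon stays inside the family-name component...
print(split_structured(r"Smith\;Jones;John;;;"))
# ...while an unescaped one silently shifts every later component.
print(split_structured("Smith;Jones;John;;;"))
```

The second call shows the failure mode described above: with no escaping,
the parser cannot tell a component boundary from data, so "John" ends up in
the wrong slot.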

I have written the code to get the vcards out of evolution. Just
need to parse them hopefully in time for the build this weekend.
I'd appreciate any code you have.

Roger
Tom Pollard
2004-06-17 09:03:48 UTC
Post by Roger Binns
Post by Tom Pollard
Post by Roger Binns
VOICE;WORK:+1-515-555-1234
That's obviously bogus. Does that actually appear in an exported vcard
somewhere?
I made that one up. It would look just as valid to a human as the
others though.
Humans don't read and write vcards, of course - computers do.
Post by Roger Binns
All the others were out of the examples vcards file. That was no
less than 4 valid ways of expressing the same thing - a business
phone number. The problem with standards that allow such latitudes
is that programs turn out not to be interoperable because they only
parse some of the valid ways.
Is that a fact (that programs aren't interoperable), or a fear? The
point of a spec, of course, is to make it possible for programs to
interoperate, and if people write their import parsers according to the
spec, the variation in styles is not an issue.
Post by Roger Binns
Can you send me a fully featured Apple vcard I can add? There is
an Evolution one in there (the second entry).
I've attached a collection of vcards from my addressbook. Between
them, I think they probably use most of the fields that the Apple
Address Book supports.
Roger Binns
2004-06-17 21:25:04 UTC
Post by Tom Pollard
Humans don't read and write vcards, of course - computers do.
The whole point of the VCard design was that it would also look
useful to a human. And before the software was more widespread
it was predominantly read by humans.
Post by Tom Pollard
Is that a fact (that programs aren't interoperable), or a fear?
As a simple example, Evolution only imported 3 out of the 8 from
the examples file. (Silently I might add). Palm Desktop gave
an error without saying what the actual problem is. Specifically
it said: "An error occurred while importing the VCard file. The
file may not be a valid vCard file or it may contain too many
entries". It said that even after I removed the evolution entry
which is known to be invalid. Outlook only imported the first
entry. The Windows Address Book has the same behaviour. Neither
Mozilla nor Thunderbird appear to have the ability to import vcards.
Post by Tom Pollard
The
point of a spec, of course, is to make it possible for programs to
interoperate, and if people write their import parsers according to the
spec, the variation in styles is not an issue.
The problem with specs that allow multiple ways of doing things or
are vague is that it gets very difficult for programs to be
interoperable. In addition to the 4 different formats for specifying
a business phone number I already showed, there is also this which I believe is
also valid:

TEL;type=WORK;value=1234567890:

In my experience, standards that let you do the same thing in multiple
ways cause problems once there is more than one implementation.
Having 5 different ways of saying business number is a really
really bad thing. Additionally there doesn't appear to be a
standard test suite which makes differing implementations far
more likely as we have already seen.

The standard rule is "be lenient in what you accept, conservative
with what you generate". http://www.trmk.org/~adam/blog/archive/000022.html
There are also many people who argue against that since it allows
non-conformant stuff to still work.
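
The "conservative with what you generate" half can be sketched as follows
(not from the thread). The names escape_text and join_structured are
hypothetical, and the escaping rules assumed here are the vCard 3.0 ones
for text values; vCard 2.1 output would need a different rule set:

```python
def escape_text(component):
    """Escape one component of a text value, assuming vCard 3.0 rules."""
    return (
        component.replace("\\", "\\\\")   # backslash first, so later
        .replace(";", "\\;")              # escapes are not doubled
        .replace(",", "\\,")
        .replace("\r\n", "\\n")
        .replace("\n", "\\n")
    )


def join_structured(components):
    """Build a structured value (e.g. N or ADR) from its components."""
    return ";".join(escape_text(c) for c in components)


# Family name containing a semi-colon, then the given name.
print("N:" + join_structured(["Smith;Jones", "John", "", "", ""]))
```

Escaping the backslash before anything else is the important design
choice: doing it last would double the backslashes just added for ';' and
','.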
Post by Tom Pollard
Post by Roger Binns
Can you send me a fully featured Apple vcard I can add? There is
an Evolution one in there (the second entry).
I've attached a collection of vcards from my addressbook. Between
them, I think they probably use most of the fields that the Apple
Address Book supports.
I just wanted to double check you had munged the information since
it looks really genuine! I intend to add it to the examples file.
Post by Tom Pollard
Here's an Address Book entry I just made up, which contains a few
gratuitous semicolons.
And Apple actually correctly quotes them!

However it looks like none of the vcards actually use the standard
ADR for addresses and instead they are stuck in extension fields
with a whole bunch of other gunk.
Post by Tom Pollard
Yes, continuation lines need to be indented with at least one space. I
don't see any cards in your vcards.vcf set that violate this.
The Evolution ones (second entry). In fact Evolution doesn't even
bother with ENCODING=QUOTED-PRINTABLE and just says QUOTED-PRINTABLE
(ie no ENCODING=).
Post by Tom Pollard
Ok, but I don't think I'll have anything usable that soon, unless you
just want to see the direction I'm headed in.
I would suggest doing it the other way round then. I am working off
the examples in examples/vcards.vcf and the parsing code is in
vcards.py. You can run it by itself.

Shout if you see any problems or have any suggestions (eg if I am digging
myself a hole, point it out before it gets too deep :-) I am only trying
to parse the vcards at the moment (and mainly only Evolution). Later
on the code will need to be improved for compatibility and to write
vcards.

Roger
Tom Pollard
2004-06-18 04:55:59 UTC
Post by Roger Binns
The problem with specs that allow multiple ways of doing things or
are vague is that it gets very difficult for programs to be
interoperable. In addition to the 4 different formats for specifying
a business phone number I already showed, there is also this which I believe is
Up to a point, the variation in styles is harmless; most of the
equivalent examples you gave could trivially be reduced to a common
canonical format. If there's a problem with VCard, I think it might be
that some people have adopted it as a format without understanding the
semantics properly. For instance, the spec clearly states that the
group label (the word before the dot in the property name) is devoid of
information, other than to identify grouped properties; the label
itself does not need to be retained. That means it's illegal to use it
to convey that something is a business address, by using a group called
"work.", for instance.
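
A rough sketch (not from the thread) of reducing the parameter styles to
one canonical form. It assumes the caller has already dropped any group
label and split the parameters into raw strings. The name normalize_params
and the KNOWN_ENCODINGS set are hypothetical; the set lists the encoding
names assumed to appear as bare vCard 2.1 parameters:

```python
# Assumed bare encoding names from vCard 2.1; anything else bare is a TYPE.
KNOWN_ENCODINGS = {"QUOTED-PRINTABLE", "BASE64", "8BIT", "7BIT"}


def normalize_params(params):
    """Map raw parameter strings to {NAME: set of upper-cased values}.

    Hypothetical helper for illustration. "WORK" (2.1 style) and
    "type=work" (3.0 style) both end up under the "TYPE" key.
    """
    result = {}
    for raw in params:
        name, eq, val = raw.partition("=")
        if not eq:
            # Bare 2.1-style parameter: the text is the value, so guess
            # which parameter name it belongs to.
            val = name
            name = "ENCODING" if val.upper() in KNOWN_ENCODINGS else "TYPE"
        for item in val.split(","):
            item = item.strip()
            if item:
                result.setdefault(name.upper(), set()).add(item.upper())
    return result


print(normalize_params(["WORK", "VOICE"]))
print(normalize_params(["type=work,voice,msg"]))
print(normalize_params(["QUOTED-PRINTABLE"]))
```

After this step "TEL;WORK;VOICE" and "tel;type=work,voice" compare equal
on their TYPE sets, which is the sense in which the styles are equivalent.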
Post by Roger Binns
The standard rule is "be lenient in what you accept, conservative
with what you generate".
http://www.trmk.org/~adam/blog/archive/000022.html
Personally, I also subscribe to that philosophy.
Post by Roger Binns
There are also many people who argue against that since it allows
non-conformant stuff to still work.
I've encountered that attitude. I take it to be a sign that someone's
an unconfident programmer, or that they're writing code that doesn't
actually have to serve people in the real world.
Post by Roger Binns
Post by Tom Pollard
Post by Roger Binns
Can you send me a fully featured Apple vcard I can add? There is
an Evolution one in there (the second entry).
I've attached a collection of vcards from my addressbook. Between
them, I think they probably use most of the fields that the Apple
Address Book supports.
I just wanted to double check you had munged the information since
it looks really genuine! I intend to add it to the examples file.
Yes, I altered the phone numbers and addresses.
Post by Roger Binns
However it looks like none of the vcards actually use the standard
ADR for addresses and instead they are stuck in extension fields
with a whole bunch of other gunk.
They do use ADR; they're just always grouped with other properties, so
the ADR names always have group labels. You can ignore the group
labels.
Post by Roger Binns
Post by Tom Pollard
Yes, continuation lines need to be indented with at least one space. I
don't see any cards in your vcards.vcf set that violate this.
The Evolution ones (second entry).
Ok, I see what you mean. (My bitpim source directory wasn't up to
date.) In this case they're using the quoted-printable "soft line
break" convention of ending a line with an '=' (RFC 2045). I could
imagine this is legal, too. I mean, you could argue, if the value
encoding is quoted-printable, then soft line breaks ought to be
allowed.
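
A lenient reader could accept both conventions. The sketch below (not from
the thread) joins physical lines into logical ones, treating either a
leading space or tab, or a trailing '=' on a quoted-printable property, as
a continuation. The names unfold and decode_qp are hypothetical. Dropping
exactly one leading whitespace character follows the RFC 2425 unfolding
rule; keeping the text after a soft line break verbatim is an assumption:

```python
import quopri


def _is_quoted_printable(line):
    # Accept both "ENCODING=QUOTED-PRINTABLE" and the bare form.
    head = line.partition(":")[0]
    return "QUOTED-PRINTABLE" in head.upper()


def unfold(lines):
    """Join folded physical lines into logical content lines."""
    logical = []
    qp_continues = False
    for line in lines:
        line = line.rstrip("\r\n")
        if logical and qp_continues:
            # Soft line break: drop the trailing '=' and append verbatim.
            logical[-1] = logical[-1][:-1] + line
        elif logical and line[:1] in (" ", "\t"):
            # Folded line: drop exactly one leading whitespace character.
            logical[-1] += line[1:]
        else:
            logical.append(line)
        qp_continues = (
            _is_quoted_printable(logical[-1]) and logical[-1].endswith("=")
        )
    return logical


def decode_qp(value):
    """Decode a quoted-printable value to bytes; charset is left to the caller."""
    return quopri.decodestring(value.encode("latin-1"))


lines = [
    "NOTE;QUOTED-PRINTABLE:line one=0D=0A=",
    "line two",
    "TEL;WORK:123",
    " 456",
]
for entry in unfold(lines):
    print(entry)
```

decode_qp returns bytes on purpose, since the character set depends on a
separate CHARSET parameter that this sketch does not look at.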
Post by Roger Binns
In fact Evolution doesn't even
bother with ENCODING=QUOTED-PRINTABLE and just says QUOTED-PRINTABLE
(ie no ENCODING=).
That's legal.
Post by Roger Binns
Post by Tom Pollard
Ok, but I don't think I'll have anything usable that soon, unless you
just want to see the direction I'm headed in.
I would suggest doing it the other way round then. I am working off
the examples in examples/vcards.vcf and the parsing code is in
vcards.py. You can run it by itself.
That sounds reasonable.
Post by Roger Binns
Shout if you see any problems or have any suggestions (eg if I am digging
myself a hole, point it out before it gets too deep :-) I am only trying
to parse the vcards at the moment (and mainly only Evolution). Later
on the code will need to be improved for compatibility and to write
vcards.
Ok. Looking forward to seeing it. I'll continue my own efforts at my
own pace, anyway. This is my "learning Python by doing a real project"
project. ;-)


Cheers,

Tom
