Discussion:
[Bitpim-devel] Duplicating email addresses on VX4400
Michael Casteel
2004-09-15 07:28:40 UTC
If I download my phone to BitPim and then merge in my address book (with
a VCF file), BitPim is creating extra email address entries where there
is already an email address.

I'm inclined to try to correct this unless somebody already has.

The new entries are coming in with an address type (e.g. 'Business' or
'Home') from the VCF file and those don't get matched against the
default type downloaded from the phone.

Any hints where I should look for the code that merges the VCF and
BitPim data?
--
Mike Casteel
***@casteel.org Seattle, WA
Adit Panchal
2004-09-15 13:20:40 UTC
Hi Mike,

This may be due to some code that I submitted earlier this summer. You
can find it in the mergefields method starting on line 2231. I checked
my code at the time, but there may be errors that I didn't catch. I
will try to take a closer look at it later tonight after school.

Adit
-------------------------------------------------------
This SF.Net email is sponsored by: thawte's Crypto Challenge Vl
Crack the code and win a Sony DCRHC40 MiniDV Digital Handycam
Camcorder. More prizes in the weekly Lunch Hour Challenge.
Sign up NOW http://ad.doubleclick.net/clk;10740251;10262165;m
_______________________________________________
Bitpim-devel mailing list
https://lists.sourceforge.net/lists/listinfo/bitpim-devel
Adit Panchal
2004-09-15 15:20:03 UTC
I forgot to mention that the code was in phonebook.py.

Also, if you could send an example of an entry that does not match the
phone type, I could test it and see where the error occurs. I would need
both an existing entry in the BitPim phonebook and also the entry that is
imported from your phone.

Thanks,

Adit
Adit Panchal
2004-09-16 12:56:46 UTC
I am also using OS X 10.3.5 with the latest CVS copy as of today and
cleaned out my bitpim data directory before proceeding. I used your
phonebook index.idx file and the entry showed up with a "Data" phone
number and an email address with type "none" (i.e. blank field). When I
imported the vCard, I did not get 2 emails as you mentioned earlier,
but the email type got updated to match the vCard type of "work"
("business" in bitpim).

I double checked the code to see if this was the right behavior and it
does look ok to me. The default behavior that I decided to use was that
if the field value already exists in bitpim, check to see if the type
is different. If the type is different, update it to the appropriate
one that is being imported. In your example that you provided, there
was no type for the email in the bitpim phonebook, but there was a type
of "work" in the vCard. Therefore the entry in the bitpim phonebook got
updated with the new type.

The following is a snippet of where the code catches the email address
and decides to update the type:

    if (comparestrings(resfield, impfield) > threshold):
        # an existing entry was matched so we stop
        found=True

        # since new item matches, we don't need to replace the
        # original value, but we should update the type of item
        # to reflect the imported value
        # for example home --> business
        if i.has_key('type'):
            r['type'] = i['type']

        # break out of original item loop
        break

I also tried the reverse and set the bitpim entry to have a type of
"business" and the vCard to have no type. That resulted in no change to
the bitpim entry, since there is a type already present (it doesn't get
overwritten by a blank type).
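Here is a minimal sketch of the rule described above that can be tried outside BitPim. It is not the code in phonebook.py. merge_field_list and clean_email are hypothetical names. Entries are assumed to be dicts with the value under a field name plus an optional 'type'. An exact comparison of cleaned values stands in for comparestrings(...) > threshold.

```python
def clean_email(value):
    """Hypothetical cleaner: ignore surrounding whitespace and case."""
    return value.strip().lower()


def merge_field_list(existing, imported, field, cleaner=clean_email):
    """Merge imported entries into existing ones (hypothetical helper).

    A matching value keeps the existing entry but takes the imported
    'type', unless the imported type is missing or blank. Values with
    no match are appended as new entries.
    """
    result = [dict(e) for e in existing]  # copy so the inputs are untouched
    for imp in imported:
        found = False
        for r in result:
            if cleaner(r[field]) == cleaner(imp[field]):
                # an existing entry was matched
                found = True
                if imp.get('type'):  # never blank out an existing type
                    r['type'] = imp['type']
                break
        if not found:
            result.append(dict(imp))
    return result


# Untyped email from the phone, 'business' email from the vCard:
print(merge_field_list([{'email': 'tom@example.com'}],
                       [{'email': 'Tom@Example.com', 'type': 'business'}],
                       'email'))
# [{'email': 'tom@example.com', 'type': 'business'}]
```

The type check relies on truthiness, so an empty 'type' counts as blank and never overwrites an existing one. The quoted snippet only checks has_key('type'), so it would copy an empty string if an importer stored one.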

Let me know what you think. If you have another example, we could try
testing that as well.

Thanks,

Adit
Post by Michael Casteel
I just created a fresh bitpim phonebook in bitpim, the attached file
index.idx, with one entry, for a 'Tom Hall'. It pretty much matches my
Address Book entry (OS X 10.3.5).
Tom Hall's Address Book entry, exported as a vcard, is also attached as
THall.vcf. It contains a matching telephone number and email address.
If I import this vcf file, the import merge/confirm dialog contains the
one entry for Tom Hall, with one occurrence of the telephone number but
TWO occurrences of the email address.
Michael Casteel
2004-09-16 16:32:34 UTC
Post by Adit Panchal
I am also using OS X 10.3.5 with the latest CVS copy as of today and
cleaned out my bitpim data directory before proceeding. I used your
phonebook index.idx file and the entry showed up with a "Data" phone
number and an email address with type "none" (i.e. blank field). When I
imported the vCard, I did not get 2 emails as you mentioned earlier,
but the email type got updated to match the vCard type of "work"
("business" in bitpim).
I will try again with the latest CVS. As of a few days ago, it was
resulting in two email entries, one blank and one 'business'.

If it is still happening, I will track it down and get back to you when
I have found what causes it on my machine.

What you describe is just how I was thinking it should work: just update
the email type to what is being imported--unless the import is 'blank',
good catch.
--
Mike Casteel
***@casteel.org Seattle, WA
Michael Casteel
2004-09-17 00:42:46 UTC
I see why I get duplication of emails on import--the MergeEntries method
has no case for 'emails' and sends them to common.list_union. This is as
of the CVS update I just did 10 minutes ago.

I added a case to MergeEntries (under 'for key in intersect'),

    elif key=="emails":
        result[key]=mergefields(o['emails'], i['emails'], 'email')

(didn't know what cleaner to use) and then I saw the results you
reported, i.e. the imported 'type': 'business' email overrode the
untyped email address in the BitPim data file.

Looks like the CVS is missing this case in MergeEntries in order to
utilize your mergefields logic for email addresses.
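The sketch below reproduces the duplication Mike describes. list_union is a hypothetical stand-in that assumes common.list_union compares whole entries for equality; the real implementation may differ. merge_entries and merge_emails are also hypothetical, simplified versions of MergeEntries and of routing 'emails' through mergefields.

```python
def list_union(a, b):
    """Hypothetical stand-in: keep all of a, plus items of b not equal to one in a."""
    result = list(a)
    for item in b:
        if item not in result:
            result.append(item)
    return result


def merge_emails(existing, imported):
    """Match on the address; take the imported type only if it is non-blank."""
    result = [dict(e) for e in existing]
    for imp in imported:
        for r in result:
            if r['email'].lower() == imp['email'].lower():
                if imp.get('type'):
                    r['type'] = imp['type']
                break
        else:  # no existing entry matched
            result.append(dict(imp))
    return result


def merge_entries(o, i):
    """Hypothetical per-key dispatch in the spirit of MergeEntries."""
    result = {}
    for key in set(o) & set(i):  # keys present in both records
        if key == 'emails':
            result[key] = merge_emails(o[key], i[key])
        else:
            result[key] = list_union(o[key], i[key])
    return result


phone = {'emails': [{'email': 'tom@example.com'}]}
vcard = {'emails': [{'email': 'tom@example.com', 'type': 'business'}]}

print(len(list_union(phone['emails'], vcard['emails'])))  # 2: duplicated
print(merge_entries(phone, vcard)['emails'])
# [{'email': 'tom@example.com', 'type': 'business'}]
```

The phone's entry has no 'type' key and the vCard's entry does. The two dicts are therefore unequal, and a plain union keeps both.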

I noticed that the imported phone number was of type 'office' while the
BitPim file had type 'data'. After the merge, it was left at 'data'
instead of being updated by the imported value. According to the logic
you explained, shouldn't this have been updated from the import to
'business'?
--
Mike Casteel
***@casteel.org Seattle, WA
Adit Panchal
2004-09-17 03:33:50 UTC
Mistake on my part: it works in my copy and not in yours because the
missing code is something I had added while testing. I never sent the
patch to enable it to Roger, since I didn't think it was ready for
prime-time. It does seem to work, though, so I have attached a patch
which should enable the fix. If you want to add the appropriate lines
to your copy of the code, you can help me test it, and then I can
submit it to be included.

Thanks for reminding me to fix the missing code.

Adit
Post by Michael Casteel
I see why I get duplication of emails on import--the MergeEntries method
has no case for 'emails' and sends them to common.list_union. This is as
of the CVS update I just did 10 minutes ago.
I added a case to MergeEntries (under 'for key in intersect'),
result[key]=mergefields(o['emails'], i['emails'], 'email')
(didn't know what cleaner to use) and then I saw the results you
reported, i.e. the imported 'type': 'business' email overrode the
untyped email address in the BitPim data file.
Roger Binns
2004-09-17 05:14:52 UTC
Post by Adit Panchal
was ready for prime-time. It does seem to work though so I have
attached a patch which should enable the fix. If you want to add the
I have committed your code with the following changes:

- I didn't remove the URLs from the fields used when comparing two
records to see if they are similar

- The @ is not removed from the cleaned email address, since the
@ sign is very significant in email addresses
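Here is a small illustration of why the @ matters, assuming a cleaner that simply strips punctuation. Both functions are hypothetical and are not BitPim's actual cleaner. Once the @ is gone, two different addresses can collapse to the same string.

```python
import re


def clean_strip_punctuation(value):
    """Hypothetical over-eager cleaner: keep only letters and digits."""
    return re.sub(r'[^a-z0-9]', '', value.lower())


def clean_keep_at(value):
    """Hypothetical cleaner that only normalises whitespace and case."""
    return value.strip().lower()


a, b = 'ab@c.org', 'a@bc.org'  # two different mailboxes
print(clean_strip_punctuation(a) == clean_strip_punctuation(b))  # True: false match
print(clean_keep_at(a) == clean_keep_at(b))                      # False
```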

Roger
Michael Casteel
2004-09-17 00:50:17 UTC
Post by Michael Casteel
I noticed that the imported phone number was of type 'office' while
the BitPim file had type 'data'. After the merge, it was left at
'data' instead of being updated by the imported value. According to
the logic you explained, shouldn't this have been updated from the
import to 'business'?
Ah, number merging is in 'mergenumberlists' rather than 'mergefields'.
Maybe it should be made consistent with mergefields? Or vice versa?
--
Mike Casteel
***@casteel.org Seattle, WA
Adit Panchal
2004-09-17 04:46:47 UTC
Post by Michael Casteel
Post by Michael Casteel
I noticed that the imported phone number was of type 'office' while
the BitPim file had type 'data'. After the merge, it was left at
'data' instead of being updated by the imported value. According to
the logic you explained, shouldn't this have been updated from the
import to 'business'?
Ah, number merging is in 'mergenumberlists' rather than 'mergefields'.
Maybe it should be made consistent with mergefields? Or vice versa?
My logic is only currently applicable to the url field (and the email
field - included in the patch). The number field merging was written by
Roger prior to my submission and he may have had a different idea in
mind. At the time when I submitted the code, I had made the decision to
update the type if it does not exist, but don't blank an existing type.
You are right though, there should be some consistency between all the
field merging. I think the way I have it makes sense to me, but if it
needs to be changed to be consistent with the number matching, I can make
the appropriate changes.

Any thoughts?

Adit
Michael Casteel
2004-09-17 04:58:01 UTC
Post by Adit Panchal
Any thoughts?
The one thing I am certain of is that however it's done it won't be
right for everybody, or for every circumstance.

However, the merge operations for urls, emails and phone numbers are so
analogous that the apparent inconsistency of 'type' handling feels
wrong, dissonant somehow. I always much prefer software that feels more
harmonious.

I would DEFINITELY like to see them made consistent. And, from my
perspective (I'm maintaining my 'real' phone book in the computer's
Address Book, which I wish I could 'sync' with my phone but will settle
for 'importing'), would prefer the imported 'type' to take precedence.
--
Mike Casteel
***@casteel.org Seattle, WA
Roger Binns
2004-09-17 05:26:38 UTC
Post by Michael Casteel
However, the merge operations for urls, emails and phone numbers are
so analogous that the apparent inconsistency of 'type' handling feels
wrong, dissonant somehow.
Actually, the phone numbers are not even remotely close to everything
else.

Note that all this stuff is going to be done in two stages. The first
stage, which is what you are currently seeing, is merging data 'cold'
(i.e. you have never spoken to the source before).

Later on a transaction log/deltas will be generated with data sources
that have been seen before and looking at how they have changed.
Those deltas are then applied, assuming they haven't already been
applied to the data.
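Here is a rough sketch of the delta idea. It rests on several assumptions: each source snapshot is a dict keyed by a hypothetical record id, a changed record is replaced wholesale, and a set of delta ids records what has already been applied. It only shows the shape of the technique. It is not how BitPim implements it or will implement it.

```python
def compute_delta(old, new):
    """Return (added, removed, changed) between two snapshots of one source."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = old.keys() - new.keys()
    changed = {k: new[k] for k in new.keys() & old.keys() if new[k] != old[k]}
    return added, removed, changed


def apply_delta(data, delta, delta_id, applied):
    """Apply delta to data once; a delta_id already in applied is skipped."""
    if delta_id in applied:
        return data
    added, removed, changed = delta
    result = dict(data)
    for k in removed:
        result.pop(k, None)
    result.update(added)
    result.update(changed)  # assumption: a changed record is replaced wholesale
    applied.add(delta_id)
    return result


old = {'tom': {'name': 'Tom Hall'}, 'joe': {'name': 'Joe'}}
new = {'tom': {'name': 'Tom Hall', 'email': 'tom@example.com'},
       'ann': {'name': 'Ann'}}
applied = set()
delta = compute_delta(old, new)
data = apply_delta(old, delta, 'phone-sync-2', applied)
data = apply_delta(data, delta, 'phone-sync-2', applied)  # no-op second time
print(sorted(data))  # ['ann', 'tom']
```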

For the first stage, there is no right answer. There is no real way
of knowing how accurate the incoming and existing data are. The
code basically has to guess. For example one guess is that the name
is more likely to be accurate in the existing data. Other guesses are
made with the different types of incoming data. In some cases the
incoming data replaces the existing data, in some cases it is added
together etc.

Additionally every single data source is lossy. For example the phones
truncate names and won't store unicode names. Outlook is crappy and will
only store 3 email addresses and one or two URLs. Palm Desktop is even
more restricted. So the main thing we can guarantee is that the data
is mangled from whoever gives it to us. BitPim's data representation is
designed to not have arbitrary limits - for example it can store any number
of numbers, emails and URLs.

Doing all this stuff is *really* hard. Well, doing it well is hard :-)
It isn't something I really want to spend much coding time on.

To move to the second stage is going to require a complete change in
how BitPim stores the underlying data. (It needs to know what the
existing and imported data from sources looked like at various points
in the past in order to generate deltas). I think I have figured out
how to do this, and how to implement it. It will require moving to
SQLite as the underlying data storage as well as some neat tricks I
have picked up.
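One way to keep "what the data looked like at various points in the past" in SQLite is to store a full snapshot per source per sync, using Python's built-in sqlite3 module. The schema below (the syncs and snapshots tables and the helper names) is hypothetical and is not Roger's design. It only shows that past states become queryable, so any two of them can be diffed.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS syncs (
    source TEXT NOT NULL, sync INTEGER NOT NULL,
    PRIMARY KEY (source, sync));
CREATE TABLE IF NOT EXISTS snapshots (
    source TEXT NOT NULL, sync INTEGER NOT NULL,
    rec_id TEXT NOT NULL, data TEXT NOT NULL,
    PRIMARY KEY (source, sync, rec_id));
"""


def open_store(path=':memory:'):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_snapshot(conn, source, records):
    """Record what source looks like now as its next sync; return the sync number."""
    (last,) = conn.execute(
        'SELECT COALESCE(MAX(sync), 0) FROM syncs WHERE source = ?',
        (source,)).fetchone()
    sync = last + 1
    with conn:  # one transaction, committed on success
        conn.execute('INSERT INTO syncs VALUES (?, ?)', (source, sync))
        conn.executemany(
            'INSERT INTO snapshots VALUES (?, ?, ?, ?)',
            [(source, sync, rid, json.dumps(rec)) for rid, rec in records.items()])
    return sync


def load_snapshot(conn, source, sync):
    """Return {rec_id: record} exactly as source looked at that sync."""
    rows = conn.execute(
        'SELECT rec_id, data FROM snapshots WHERE source = ? AND sync = ?',
        (source, sync))
    return {rid: json.loads(data) for rid, data in rows}


conn = open_store()
save_snapshot(conn, 'phone', {'tom': {'email': 'tom@example.com'}})
save_snapshot(conn, 'phone', {'tom': {'email': 'tom@example.com', 'type': 'business'}})
print(load_snapshot(conn, 'phone', 1))  # {'tom': {'email': 'tom@example.com'}}
```

The separate syncs table means that an empty snapshot still gets its own sync number.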

Roger
Michael Casteel
2004-09-17 05:00:24 UTC
Post by Michael Casteel
[...I] would prefer the imported 'type' to take
precedence.
Which means that I'd prefer that number merging be changed to be
consistent with the url/email merging you describe.
--
Mike Casteel
***@casteel.org Seattle, WA
Roger Binns
2004-09-17 05:29:11 UTC
Post by Michael Casteel
Which means that I'd prefer that number merging be changed to be
consistent with the url/email merging you describe.
Can you please give me a simple test case to explain this? Do it
in this form:

Existing:

Number 1: 424987237498, home
Number 2: 0943280943, fax

Imported:

Number 1: 983789733, home

Current result:

Number 1: 983789733, home
Number 2: 424987237498, home
Number 3: 0943280943, fax

What I think it should do:

Number 1: 983789733, home
Number 2: 0943280943, fax

Do something similar for any other fields.
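Roger's example can also be written as an executable check. The "proposed" rule below, where an imported number replaces existing numbers of the same type, is only one reading of his "What I think it should do" result. The "current" function is a hypothetical approximation that reproduces his "Current result". Neither function is BitPim's mergenumberlists.

```python
def merge_numbers_current(existing, imported):
    """Hypothetical approximation of 'Current result': imported, then all existing."""
    return imported + [n for n in existing if n not in imported]


def merge_numbers_proposed(existing, imported):
    """Hypothetical rule: an imported number replaces existing numbers of its type."""
    imported_types = {ntype for _, ntype in imported}
    return imported + [(num, ntype) for num, ntype in existing
                       if ntype not in imported_types]


# Numbers are strings so that leading zeros survive.
existing = [('424987237498', 'home'), ('0943280943', 'fax')]
imported = [('983789733', 'home')]

assert merge_numbers_current(existing, imported) == [
    ('983789733', 'home'), ('424987237498', 'home'), ('0943280943', 'fax')]
assert merge_numbers_proposed(existing, imported) == [
    ('983789733', 'home'), ('0943280943', 'fax')]
```

One caveat: a contact with two legitimate home numbers would lose the older one under this rule.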

Roger
Michael Casteel
2004-09-17 06:33:37 UTC
Post by Roger Binns
Can you please give me a simple test case to explain this? Do it in
My concern here is only with data items which 'match', be it URL, email
address or phone number--i.e. once 'cleaned' they 'appear' to be the
same. How should the corresponding 'type' attributes be handled?

Here is how it is now working, from my test case.

First, for phone number...

Existing:

#1: 2062560900, Home

Import:

#1: (206) 256-0900, Business

Result:

#1: 2062560900, Home
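The two numbers only "match" after cleaning. Here is a minimal sketch, assuming a hypothetical cleaner that keeps digits only. BitPim's real number cleaning may differ, for example in how it treats a leading country code.

```python
import re


def clean_number(number):
    """Hypothetical cleaner: keep digits only, so punctuation and spaces are ignored."""
    return re.sub(r'\D', '', number)


existing = ('2062560900', 'Home')
imported = ('(206) 256-0900', 'Business')

print(clean_number(imported[0]))                               # 2062560900
print(clean_number(existing[0]) == clean_number(imported[0]))  # True
```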

On the other hand, the *email* imports as follows:

Existing

#1: ***@schmo.org, Home

Import:

#1: ***@schmo.org, Business

Result in 0.7.18 and earlier:

#1: ***@schmo.org, Home
#2: ***@schmo.org, Business

Result with Adit's mergefields logic activated:

#1: ***@schmo.org, Business

The result in 0.7.18 and earlier just looks wrong to me--it's
'duplicating' an email address. The result with Adit's mergefields logic
appears inconsistent with the handling of phone numbers.

In other words, either both numbers and emails should retain the
(non-empty) Existing type, or both should adopt the imported type.
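Here is a sketch of how the two behaviours could be made consistent. One hypothetical merge routine for (value, type) pairs is used for numbers, emails and URLs alike, with the type policy as a parameter. PREFER_IMPORTED is the behaviour Mike asks for. KEEP_EXISTING only fills in a blank type. Under either policy, a blank imported type never overwrites an existing one. The names and the digits-only and lowercase cleaners are assumptions, not BitPim code.

```python
import re

PREFER_IMPORTED = 'prefer_imported'  # a non-blank imported type wins
KEEP_EXISTING = 'keep_existing'      # imported type only fills a blank one


def clean_number(value):
    return re.sub(r'\D', '', value)  # digits only


def clean_email(value):
    return value.strip().lower()


def merge_typed(existing, imported, clean, policy=PREFER_IMPORTED):
    """Merge lists of (value, type) pairs, matching values on clean(value)."""
    result = list(existing)
    for value, itype in imported:
        for idx, (evalue, etype) in enumerate(result):
            if clean(evalue) == clean(value):
                # a blank imported type never overwrites anything
                if itype and (policy == PREFER_IMPORTED or not etype):
                    result[idx] = (evalue, itype)
                break
        else:  # no match: add as a new entry
            result.append((value, itype))
    return result


phones = [('2062560900', 'Home')]
new_phones = [('(206) 256-0900', 'Business')]
print(merge_typed(phones, new_phones, clean_number))
# [('2062560900', 'Business')]
print(merge_typed(phones, new_phones, clean_number, KEEP_EXISTING))
# [('2062560900', 'Home')]
```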
--
Mike Casteel
***@casteel.org Seattle, WA
Roger Binns
2004-09-18 03:21:58 UTC
Post by Michael Casteel
First, for phone number...
#1: 2062560900, Home
#1: (206) 256-0900, Business
#1: 2062560900, Home
Ok, I see what your issue is. I think the reason why I kept it
that way was because BitPim is able to store richer category
information than most data sources, so the code was erring on
the side of assuming the incoming data is wrong. However I
do agree with you and will change it to what you want.

Roger
