[Bitpim-devel] Another little project

Discussion:

Roger Binns

2003-12-18 09:09:20 UTC

I have another little project for anyone who wants something
well defined and simple to learn Python and help bitpim.

I need code that can parse the following file:

http://www.linux-usb.org/usb.ids

I need to be able to look up vendor, product and interface information.
Note also that about 2/3 of the way down it has definitions for
device classes which would be nice to parse as well. The audio and
HID stuff doesn't matter.

The idea will be to hook this code into the com/usb port browser so
that meaningful names are displayed for the various devices found.
I will probably ship a copy of that file with bitpim.

Roger

Steven Palm

2003-12-29 21:37:07 UTC

Post by Roger Binns
I have another little project for anyone who wants something
well defined and simple to learn Python and help bitpim.
http://www.linux-usb.org/usb.ids

Sorry, Roger, I forgot to follow up on this one... I'm going to start
on this as an adjunct Python learning tool. Even if someone else is
already doing it or has done it, I'll enjoy the experience. :-) Looks to be
fairly simple, but useful as well, just like you stated.

-. ----. -.-- - -.--
Steve Palm - ***@n9yty.com
-. ----. -.-- - -.--

Roger Binns

2003-12-31 12:38:02 UTC

Post by Steven Palm

Post by Roger Binns
http://www.linux-usb.org/usb.ids

Excellent. I haven't heard from anyone else working on it.

I did send the maintainer some updates for the VX4400/6000, the SCP-4900
and friendlier names for the USB to serial cables. Never got a response.

What I am envisaging is that a copy of that file will be in the BitPim
resources subdirectory.

We will also have a second file of our own with more detail and
friendlier names.

In the port browser, devices are looked up in our file first, and if
not found, we fall back on the bigger file. The class information
will also be useful.
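A minimal sketch of that lookup order, written for current Python 3 rather than the Python 2.2 the thread targets. The flat tables keyed by (vendor ID, product ID) integer pairs, and the names internet_ids, bitpim_ids and describe, are hypothetical; the two PL2303 names are the ones Roger quotes later in this thread. collections.ChainMap searches its maps left to right, which gives exactly the "our file first, then usb.ids" fallback.

```python
from collections import ChainMap

# Hypothetical tables keyed by (vendor ID, product ID) as integers.
internet_ids = {
    (0x067b, 0x2303): "Prolific 2303 USB serial bridge",
    (0x05ac, 0x0206): "Apple Extended Keyboard [Mitsumi]",
}
bitpim_ids = {
    (0x067b, 0x2303): "FutureDial USB to serial cable for LGVX1/VX10/VX4400 (Prolific PL2303)",
}

# ChainMap checks bitpim_ids first and only falls back to internet_ids
# for keys that our own file does not define.
device_names = ChainMap(bitpim_ids, internet_ids)

def describe(vendor, product):
    """Return the friendliest known name for a device, or None if unknown."""
    return device_names.get((vendor, product))
```

Keeping the two files as separate maps means neither is modified, and the priority order is visible in a single line.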

Roger

Steven Palm

2004-01-05 18:27:18 UTC

Post by Roger Binns
http://www.linux-usb.org/usb.ids
I need to be able to look up vendor, product and interface information.
Note also that about 2/3 of the way down it has definitions for
device classes which would be nice to parse as well.

I have a parser done that will handle the file and create objects like
so...

A master vendor list object that contains all of the vendor objects.
Each vendor object contains a list of associated device objects, and
information about itself.
Each device object contains a list of associated interface objects,
and information about itself.
Each interface object simply contains information about itself.

I have a similar set of relationships for USBClass/Subclass/Protocol
information.
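The vendor/device/interface tree and the class/subclass/protocol tree described above have the same shape, so one possible sketch shares a single generic node type between them. This is only an illustration: Node and add_entry are hypothetical names, not the classes in usb_ids.py. The IDs are integer keys, which is what Roger asks for further down.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A vendor, device, interface, class, subclass or protocol entry."""
    name: str
    children: dict = field(default_factory=dict)  # integer ID -> Node

def add_entry(root, path, name):
    """Add `name` under `root` at the given path of integer IDs.

    Every ID except the last must already exist, so a device can only be
    added once its vendor is present. Re-adding an existing path replaces
    that node together with its children.
    """
    parent = root
    for ident in path[:-1]:
        parent = parent.children[ident]   # KeyError if a parent is missing
    parent.children[path[-1]] = Node(name)
    return parent.children[path[-1]]

vendors = Node("USB vendors")
add_entry(vendors, [0x05ac], "Apple Computer, Inc.")
add_entry(vendors, [0x05ac, 0x0206], "Apple Extended Keyboard [Mitsumi]")

classes = Node("USB classes")
add_entry(classes, [0x08], "Mass Storage")
add_entry(classes, [0x08, 0x04], "Floppy (UFI)")
add_entry(classes, [0x08, 0x04, 0x00], "Control/Bulk/Interrupt")
```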

I have accessors set up to work in the following fashion....

import usb_ids as usbids

myUSBids = usbids.usb_ids("resources/usb.ids")

# PRINT OUT THE WHOLE TREE AS A TEST CASE
print_vendor_info(myUSBids)
print_class_info(myUSBids)

# Test lookup for various bits of USB Vendor/Device/Interface information
vlist = myUSBids.getVendorList()
print vlist.getVendorInfo("05ac")
print vlist.getVendorInfo("05ac", "0206")
print vlist.getVendorInfo("05ac", "0206", "1")

# Test lookup for various bits of USB Class/Subclass/Protocol information
clist = myUSBids.getUSBClassList()
print clist.getClassInfo("08")
print clist.getClassInfo("08", "04")
print clist.getClassInfo("08", "04", "00")

OUTPUT: (except the whole tree, I've not included that here)

Apple Computer, Inc.
('Apple Computer, Inc.', 'Apple Extended Keyboard [Mitsumi]')
('Apple Computer, Inc.', 'Apple Extended Keyboard [Mitsumi]', 'Unknown Interface')

Mass Storage
('Mass Storage', 'Floppy (UFI)')
('Mass Storage', 'Floppy (UFI)', 'Control/Bulk/Interrupt')

Is that about right?

One small bit of (perhaps faulty) logic that may cause a problem... I
am making the assumption that any time you hit a comment or blank line
you are no longer processing a given vendor or class entry, so it will
stop looking for devices/interfaces/subclasses/protocols. This was
necessary because (although none are defined in the file at present) an
interface entry probably looks exactly like a protocol entry, and I
didn't want to start adding things to the wrong parent object.

Am I correct in assuming (there's that word!) the following?

vendor IDs are ALWAYS 4-char hex
device IDs are ALWAYS 4-char hex
iface IDs are ALWAYS 2-char hex

Class/Subclass/Protocol IDs are ALWAYS 2-char hex

If so, then this should work. I am not doing anything to handle various
line endings, though, so that may be a problem. If it is, I'll rework
things. I'm working with the file as downloaded, so it most likely has
UNIX-style line endings.

Roger Binns

2004-01-05 19:43:38 UTC

Post by Steven Palm
I have a parser done that will handle the file and create objects like
so...

Cool!

Post by Steven Palm
print vlist.getVendorInfo("05ac")
print vlist.getVendorInfo("05ac", "0206")
print vlist.getVendorInfo("05ac", "0206", "1")

Can you please make those work on integers instead?

You can turn a hex string into an integer with int("05ac", 16)

Post by Steven Palm
('Apple Computer, Inc.', 'Apple Extended Keyboard [Mitsumi]')
('Apple Computer, Inc.', 'Apple Extended Keyboard [Mitsumi]', 'Unknown Interface')

You should return None for fields (eg "Unknown Interface") that can't
be found.
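Here is a sketch of that behaviour using integer arguments. It assumes a hypothetical tree in which each integer ID maps to a (name, children) pair. It mirrors the output format shown earlier (a bare string for one level, a tuple otherwise), but returns None rather than "Unknown Interface" for anything missing. USB_VENDORS and get_vendor_info are illustrative names only.

```python
# Hypothetical tree: integer ID -> (name, children); children has the same shape.
USB_VENDORS = {
    0x05ac: ("Apple Computer, Inc.", {
        0x0206: ("Apple Extended Keyboard [Mitsumi]", {}),
    }),
}

def get_vendor_info(vendor, device=None, interface=None, tree=USB_VENDORS):
    """Look up names by integer ID; levels that can't be found come back as None."""
    ids = [vendor]
    if device is not None:            # compare with None: 0 is a valid ID
        ids.append(device)
        if interface is not None:
            ids.append(interface)
    names = []
    for ident in ids:
        # An unknown ID gives None here and an empty tree for the next level.
        name, tree = tree.get(ident, (None, {}))
        names.append(name)
    return names[0] if len(names) == 1 else tuple(names)
```

For example, get_vendor_info(0x05ac, 0x0206, 1) gives ('Apple Computer, Inc.', 'Apple Extended Keyboard [Mitsumi]', None).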

Post by Steven Palm
Is that about right?

Yes :-)

Post by Steven Palm
One small bit of (perhaps faulty) logic that may cause a problem... I
am making the assumption that any time you hit a comment or blank line
you are no longer processing a given vendor or class entry, so it will
stop looking for devices/interfaces/subclasses/protocols. This was
necessary because (although none are defined in the file at present) an
interface entry probably looks exactly like a protocol entry, and I
didn't want to start adding things to the wrong parent object.

Since we own the file, you either need to enhance your code to cope,
or generate an error, or have something that munges the file before
it is committed.

Post by Steven Palm
vendor IDs are ALWAYS 4-char hex
device IDs are ALWAYS 4-char hex
iface IDs are ALWAYS 2-char hex

Always use integers for them and then it won't matter :-)

Post by Steven Palm
I am not doing anything to handle various line endings,

In the call to open(), make sure "t" is specified, ie open("filename", "rt").
CVS ensures that each platform has text files in its own line endings.
Python 2.3 does allow "rU" for "universal newlines", but we are stuck
with Python 2.2 on Linux, so that can't be used (yet).

The additional change needed is that we have to read in a second file
that contains USB stuff we know about (eg the phones). The choice
is to create two objects to represent each file, or add a function
that allows reading in the second file (overwriting any entries
that were in the first).

On a small stylistic point, you don't need parentheses for if statements
in Python. It is the C programmer in you :-) Of course I keep forgetting
them in C code now, but that is a syntax error!

Roger

Steven Palm

2004-01-05 22:11:09 UTC

Post by Roger Binns

Post by Steven Palm
print vlist.getVendorInfo("05ac")
print vlist.getVendorInfo("05ac", "0206")
print vlist.getVendorInfo("05ac", "0206", "1")

Can you please make those work on integers instead?
You can turn a hex string into an integer with int("05ac", 16)

Fair enough... I store the ID code as the original hex string, but
things are organized by integer now, and all inquiries require this
format. So if in code you want to search/inquire by hex, you'll just
have to put the above noted code fragment in.

Post by Roger Binns

Post by Steven Palm
('Apple Computer, Inc.', 'Apple Extended Keyboard [Mitsumi]')
('Apple Computer, Inc.', 'Apple Extended Keyboard [Mitsumi]', 'Unknown Interface')

You should return None for fields (eg "Unknown Interface") that can't
be found.

Simple enough.

Post by Roger Binns

Post by Steven Palm
Is that about right?

Yes :-)

Hey, I got one right! :-) LoL

Post by Roger Binns

Since we own the file, you either need to enhance your code to cope,
or generate an error, or have something that munges the file before
it is committed.

I suppose an alternate approach would be to always test each line to see
whether it matches as a new vendor or new class line, and just interpret
each subline according to the last major entry type you saw... yeah,
that's it... ;-) Then we don't care about blank lines or comments.
This is now implemented and seems to work properly.

Post by Roger Binns
Always use integers for them and then it won't matter :-)

I was referring to the input file, actually, for pattern matching.
Yes, for storage/internal use it doesn't matter.
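Steve's point is about matching the input lines. One way to make the pattern independent of ID width is to accept any run of hex digits and convert it with int(text, 16) straight away. The following sketch rests on that assumption. LINE_RE and match_line are hypothetical names, and only the "C " class tag is recognised, so other section headers, such as the HID ones, simply don't match.

```python
import re

# Optional leading tabs (nesting depth), an optional "C " class tag,
# an ID of ANY number of hex digits, whitespace, then the name.
LINE_RE = re.compile(r"^(\t*)(?:(C) )?([0-9A-Fa-f]+)\s+(.*?)\s*$")

def match_line(line):
    """Return (depth, tag, id, name) for an entry line, or None otherwise."""
    m = LINE_RE.match(line.rstrip("\r\n"))
    if m is None:
        return None          # comments, blank lines, other section headers
    tabs, tag, ident, name = m.groups()
    return len(tabs), tag or "", int(ident, 16), name
```

Because the ID becomes an integer immediately, "1", "01" and "0001" all produce the same key.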

Post by Roger Binns

Post by Steven Palm
I am not doing anything to handle various line endings,

In the call to open(), make sure "t" is specified, ie open("filename", "rt").

Okay, I will do this and also just expand on how a line is handled and
be more rigorous about end-of-line trimming, removing all \n and \r
characters at the end of the line before doing a pattern match on it.
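The trimming Steve describes is a one-liner in Python; clean_line below is a hypothetical name. Note that in Python 3 a text-mode open() already translates "\r\n" and "\r" to "\n" by default (universal newlines). This therefore mainly matters for Python 2.2-era code like this module, or for files read in binary mode.

```python
def clean_line(raw):
    """Drop every trailing CR and LF, whichever platform wrote the file."""
    return raw.rstrip("\r\n")   # strips any mix of \r and \n, but keeps spaces

# Unix, DOS/Windows and classic Mac OS endings all reduce to the same text.
for raw in ("05ac  Apple Computer, Inc.\n",
            "05ac  Apple Computer, Inc.\r\n",
            "05ac  Apple Computer, Inc.\r"):
    assert clean_line(raw) == "05ac  Apple Computer, Inc."
```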

Post by Roger Binns
The additional change needed is that we have to read in a second file
that contains USB stuff we know about (eg the phones). The choice
is to create two objects to represent each file, or add a function
that allows reading in the second file (overwriting any entries
that were in the first).

I thought of that, and thought you would have an "internet" database
object and "bitpim" database object. If you fail to find it in the
internet db you would consult the bitpim object. However, it is trivial
to add a method and have it merge in (with the assumption each
subsequent file has more accurate data) as many additional files as you
would like, so that's done. The test fragment at the bottom of the
module shows a test for this.
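Here is a sketch of such a merge. It assumes each file has already been parsed into a hypothetical tree where every integer ID maps to a (name, children) pair. Entries from the file merged later win, and children are merged recursively, so renaming one device keeps the rest of that vendor's devices. Merging the BitPim file last therefore gives its friendlier names priority. The sample data is illustrative; the 0x1234 device is a placeholder.

```python
def merge(base, extra):
    """Merge tree `extra` into `base` in place and return `base`.

    Names from `extra` replace those in `base`; child entries are merged
    recursively rather than replaced wholesale.
    """
    for ident, (name, children) in extra.items():
        if ident in base:
            _, base_children = base[ident]
            base[ident] = (name, merge(base_children, children))
        else:
            base[ident] = (name, children)   # new entry: take it as-is
    return base

# Hypothetical data: the Internet file knows two Prolific devices,
# while BitPim's own file only renames the PL2303.
internet = {0x067b: ("Prolific Technology, Inc.", {
    0x2303: ("Prolific 2303 USB serial bridge", {}),
    0x1234: ("Example device (placeholder)", {}),
})}
bitpim = {0x067b: ("Prolific Technology, Inc.", {
    0x2303: ("FutureDial USB to serial cable for LGVX1/VX10/VX4400 (Prolific PL2303)", {}),
})}
names = merge(internet, bitpim)
```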

Post by Roger Binns
On a small stylistic point, you don't need parentheses for if statements
in Python. It is the C programmer in you :-) Of course I keep forgetting
them in C code now, but that is a syntax error!

Yes, but I like the (), as it helps me focus. LOL

Any other suggestions for the module, or is it pretty much OK now? :-)

- Steve

Roger Binns

2004-01-06 00:25:17 UTC

Post by Steven Palm
I store the ID code as the original hex string,

That is a really bad idea! What is the difference between the
following:

005ac
5ac
5Ac
05aC

It will use less memory and be quicker to access if the internals
use pure integers!
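Roger's point takes only a few lines of Python. int(text, 16) ignores leading zeros and letter case, so all four spellings collapse to one integer key. As strings, they would be four different dictionary keys. Formatting back to a fixed width is only needed for display.

```python
spellings = ["005ac", "5ac", "5Ac", "05aC"]

# As strings these would be four distinct dictionary keys...
assert len(set(spellings)) == 4

# ...but as integers they are one and the same vendor ID.
keys = {int(s, 16) for s in spellings}
assert keys == {0x05ac}

# Format back to the canonical 4-digit lowercase form only for display.
assert "%04x" % 0x05ac == "05ac"
```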

Post by Steven Palm
Hey, I got one right! :-) LoL

Wait for a year, and then look at the code you wrote. I always
get embarrassed by my old code. But hey, it even happens to Linus :-)

http://lkml.org/lkml/2003/12/22/137

Post by Steven Palm
I suppose an alternate approach would be to always test each line to see
whether it matches as a new vendor or new class line, and just interpret
each subline according to the last major entry type you saw... yeah,
that's it... ;-) Then we don't care about blank lines or comments.
This is now implemented and seems to work properly.

The usual way of implementing this sort of thing is a state machine
which is what it seems like you have done. The error cases and
some of the transitions get really tricky :-)
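Here is a minimal sketch of such a state machine, written for Python 3. It assumes the layout documented at the top of usb.ids: tabs for nesting, "C " for class lines, and vendor lines that start with a 4-digit hex ID. The state is the kind of the last top-level line. Blank lines and comments leave the state unchanged, and children of sections we don't care about (HID, audio and so on) are skipped. An entry indented deeper than its parent allows raises an error rather than being attached to the wrong object. parse_usb_ids and the (name, children) tree layout are hypothetical, not the code in usb_ids.py.

```python
import string

def parse_usb_ids(lines):
    """Parse usb.ids lines into (vendors, classes) trees.

    Each tree maps an integer ID to a (name, children) pair, where children
    is a tree of the same shape.
    """
    vendors, classes = {}, {}
    state = None   # "vendor", "class", or None inside a section we skip
    stack = []     # stack[d] collects the entries found at depth d + 1
    for lineno, raw in enumerate(lines, 1):
        line = raw.rstrip("\r\n")
        if not line.strip() or line.lstrip().startswith("#"):
            continue                      # blank lines and comments keep the state
        depth = len(line) - len(line.lstrip("\t"))
        fields = line[depth:].split(None, 1)
        if depth == 0:
            if fields[0] == "C" and len(fields) > 1:        # e.g. "C 08  Mass Storage"
                state, parent = "class", classes
                fields = fields[1].split(None, 1)
            elif len(fields[0]) == 4 and all(c in string.hexdigits for c in fields[0]):
                state, parent = "vendor", vendors            # e.g. "05ac  Apple ..."
            else:
                state = None                                 # other section: skip it
                continue
            stack = []
        elif state is None:
            continue                      # child of a skipped section
        elif depth > len(stack):
            raise ValueError("line %d: entry at depth %d has no parent" % (lineno, depth))
        else:
            del stack[depth:]             # pop back to this entry's parent level
            parent = stack[depth - 1]
        name = fields[1].strip() if len(fields) > 1 else ""
        children = {}
        parent[int(fields[0], 16)] = (name, children)
        stack.append(children)
    return vendors, classes
```

Usage would be along the lines of vendors, classes = parse_usb_ids(open("resources/usb.ids", "rt")), with the path taken from Steve's test code.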

Post by Steven Palm
Okay, I will do this and also just expand on how a line is handled and
be more rigorous about end-of-line trimming, removing all \n and \r
characters at the end of the line before doing a pattern match on it.

That will work.

Post by Steven Palm
I thought of that, and thought you would have an "internet" database
object and "bitpim" database object. If you fail to find it in the
internet db you would consult the bitpim object.

The search actually needs to be the other way. For example the Internet
database has things like "Prolific 2303 USB serial bridge" when we
want to display "FutureDial USB to serial cable for LGVX1/VX10/VX4400 (Prolific PL2303)"

Post by Steven Palm
Yes, but I like the (), as it helps me focus. LOL

I always prefer less code and punctuation, except when there is ambiguity.

Post by Steven Palm
Any other suggestions for the module, or is it pretty much OK now? :-)

Apart from the keys needing to be integers throughout, epydoc reported this:

=======================================================
C:\projects\bitpim\usb_ids.py
In usb_ids.VendorList docstring (line 190):
-------------------------------------------------------
L192: Error: Improper paragraph indentation.

pychecker is happy. I'll get around to hooking it into the browser at some
point. A one-line change also results in modems being listed on Windows
in comscan. Unfortunately more code will be needed, since it will currently
consider the modem interface valid for the 4400/6000, which it certainly isn't.

Roger

Steven Palm

2004-01-06 04:35:08 UTC

Post by Roger Binns

Post by Steven Palm
I store the ID code as the original hex string,

That is a really bad idea! What is the difference between the

For informational purposes only, it is not used for any
calculations/etc... Sorry if that wasn't clear. For actual indexing of
the values in the internal lists used to represent the objects, it is
using pure integers. I suppose there really isn't much value to having
this form of the ID stored, so it can go away and save a bit of memory.
I'll make it so.

Post by Roger Binns

Post by Steven Palm
Hey, I got one right! :-) LoL

Wait for a year, and then look at the code you wrote. I always
get embarrassed by my old code. But hey, it even happens to Linus :-)

Oh, I know all about this, and being my first Python code, I have NO
DOUBT I'll wonder what I was thinking...

Post by Roger Binns
=======================================================
C:\projects\bitpim\usb_ids.py
-------------------------------------------------------
L192: Error: Improper paragraph indentation.
pychecker is happy.

I'll have to read up on those checkers, I don't know anything about
either one. ;-)

If you find anything else that seems wrong or needs changing, let me
know or fix it yourself depending on your mood at the time. :-)

Roger Binns

2004-01-06 06:30:07 UTC

Post by Steven Palm
Oh, I know all about this, and being my first Python code, I have NO
DOUBT I'll wonder what I was thinking...

The very first Python code I wrote was a script I converted from TCL.
TCL doesn't really have objects, so it was this massive mess of
pseudo-object-oriented procedural code mixing two different languages'
"normal" ways of doing things, coupled with not really knowing what
is "normal" in Python. You will be pleased to know the project was
not BitPim :-)

You might find the O'Reilly Python Cookbook quite useful. It is
also available online at

http://aspn.activestate.com/ASPN/Cookbook/Python/

The file c.bat runs all the checkers and documentation generators.
It is quite a feat given the primitiveness of NT command scripting.
Unfortunately pychecker generates huge numbers of false positives.
wxPython 2.5 should remove a large number of them.

Roger