Component Databases

One of the most stubborn and difficult problems the gEDA/PCB workflow has is that of mapping symbols to footprints accurately. We call this the "light/heavy symbol problem" or the "transistor problem". This document describes my thoughts on solving this problem. First, some definitions:
Component
A specific instance of a physical part.
Symbol
A graphical representation of the functionality of a component. Note that there are global symbol files, which apply to all symbols of that type, and specific symbol instances in a schematic, which apply to a single component. Where it matters, it will be specified which is meant.
Footprint
A pattern of copper to which a component is soldered.
Element
A specific footprint, along with attributes, which indicates where a specific component will be soldered.
Light Symbol
A symbol which contains the minimum amount of information needed. Such symbols are usually very generic and can apply to many components.
Heavy Symbol
A symbol which contains as much information as is available. Such symbols usually reflect a specific part from a specific manufacturer.

So, the problem is... how heavy should a symbol be? If the symbol is too light, the user must do extra work to associate a symbol with a component accurately, and the risk of getting it wrong is higher (a common problem is getting the pinouts on transistors right - hence the "transistor problem" name). If a symbol is too heavy, making changes to the design becomes more complicated (for example, to change an op-amp, you need to change the symbol in the schematic - despite both symbols looking identical).

Currently, there are only two places where component information goes. Most of it goes in the symbol - pin numbers, vendor names, component values, etc. Some goes in the PCB element. The workflow takes some information from the symbol and copies it into the element, like pin names and component values. However, all the information must be in one of those two places.

My idea is to have a third repository of information, which contains the difference between a light and a heavy symbol. This extra database, which I'll call the component database or "partdb" (because "componentdb" just doesn't roll off the tongue), contains all the info needed to turn a light (generic) symbol into a heavy (specific) symbol. For example, if your schematic called for a 3.3uF 16v capacitor, the database would let you find all the manufacturers who make such a part, what the available packages are, and what PCB footprints they'd use. Based on some heuristics, a specific component would be chosen to be used, and the additional information added to the symbol and element.

There are a couple of issues that come up when you design such a database:

First I'll describe the mechanics of the database interactions, then I'll talk about how it applies to the gschem->pcb flow.

My idea for the database is that it is a plug-in for gschem and gattrib. Given a set of pre-existing attributes, the API finds all the other attributes which may be set, and what values those attributes may be set to. For example, if you asked for a 3.3uF 16v capacitor, the database may return various packages that capacitors with those values may be found in, such as 0805 or radial-16-7.5 (names may be fictitious), and the vendors who sell such capacitors. Based on pre-determined rules (like, 0805 is the preferred package, or Digikey is the preferred vendor), the component selection will be narrowed down as much as possible. The user may need to make further choices in order to reduce the set of possibilities to a specific component, but the gschem->pcb flow only needs enough choices made to narrow down the footprint and pin mapping, as those are all that affect the board layout.
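To make the query idea concrete, here is a minimal sketch in Python. It is not an actual gschem or gattrib interface. The part table, the attribute names, and the `query` function are all invented for illustration, and a real plug-in could get its data from anywhere.

```python
# Hypothetical in-memory part table. Every row and attribute name here is
# made up for illustration; none of it comes from a real parts database.
PARTS = [
    {"device": "CAPACITOR", "value": "3.3uF", "voltage": "16v",
     "footprint": "0805", "vendor": "Digikey"},
    {"device": "CAPACITOR", "value": "3.3uF", "voltage": "16v",
     "footprint": "radial-16-7.5", "vendor": "Digikey"},
    {"device": "CAPACITOR", "value": "3.3uF", "voltage": "25v",
     "footprint": "1206", "vendor": "Mouser"},
]


def query(known, parts=PARTS):
    """Return {attribute: set of values} for every attribute not in `known`,
    looking only at parts that match all of the known attributes."""
    matches = [p for p in parts
               if all(p.get(k) == v for k, v in known.items())]
    options = {}
    for part in matches:
        for name, value in part.items():
            if name not in known:
                options.setdefault(name, set()).add(value)
    return options


# Ask about a 3.3uF 16v capacitor: which other attributes can be set, and to what?
result = query({"value": "3.3uF", "voltage": "16v"})
```

With this made-up table, `result["footprint"]` holds both 0805 and radial-16-7.5, so a rule or the user still has to pick one before the footprint is known.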

Yes, I said pin mapping - this is something else I think belongs in this new design. See Pin Mapping for details.

One thing that needs to be done, enhancement-wise, is to be able to mark each attribute in a schematic symbol with some information about where it came from - an attribute set by the user is more important than an inferred attribute based on a database search. Also, it would be nice to be able to tag attributes which are set via back-annotation from PCB. That way, when the user changes a user-specified attribute, the inferred attributes can be recalculated. Perhaps the attribute name could be suffixed with "?" for inferred attributes or "%" for back-annotated ones. That way, apps that don't know about the scheme just ignore the "special" attributes. Alternately, storing the flags in the T line lets apps that don't know about the scheme see the attributes.
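The suffix variant of that idea could be handled with something as small as the following Python sketch. The function names and the origin labels ("user", "inferred", "back-annotated") are my own placeholders, not anything gschem defines.

```python
# Suffix convention from the text: "?" marks an inferred attribute,
# "%" marks a back-annotated one, no suffix means the user set it.
# The origin labels themselves are hypothetical names for this sketch.
SUFFIX_FOR_ORIGIN = {"user": "", "inferred": "?", "back-annotated": "%"}


def split_origin(stored_name):
    """Split a stored attribute name into (base name, origin)."""
    if stored_name.endswith("?"):
        return stored_name[:-1], "inferred"
    if stored_name.endswith("%"):
        return stored_name[:-1], "back-annotated"
    return stored_name, "user"


def tag(base_name, origin):
    """Build the stored attribute name for a base name and an origin."""
    return base_name + SUFFIX_FOR_ORIGIN[origin]
```

An app that doesn't know the scheme would see "footprint?" as an unknown attribute name and skip it, which is the behavior described above.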

For example, changing a capacitor from 16v to 25v may require the package be changed from 0805 to 1206. This can be done automatically if the package was originally inferred, but if the package was originally user-specified, this would lead to an impossible constraint and require user help.

Preferably, the GUI would not allow the user to create such an impossible constraint in the first place - the choice of 25v would be inaccessible until the package were changed (or unset), or choosing 25v would require the user to select which attribute would be unset (package, or value, or don't change voltage, etc).
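Here is a rough Python sketch of the recalculation described above, assuming each attribute is stored as a (value, origin) pair. The two-row part table and the `change_attribute` function are invented for illustration only.

```python
# Hypothetical part table: the 16v part comes in 0805, the 25v part in 1206.
PARTS = [
    {"value": "3.3uF", "voltage": "16v", "footprint": "0805"},
    {"value": "3.3uF", "voltage": "25v", "footprint": "1206"},
]


def change_attribute(attrs, name, new_value, parts=PARTS):
    """Apply a user edit to attrs ({name: (value, origin)}) and recompute
    the inferred attributes. Raises ValueError when the user-set attributes
    cannot all be satisfied by any part."""
    updated = dict(attrs)
    updated[name] = (new_value, "user")
    user_set = {k: v for k, (v, origin) in updated.items() if origin == "user"}
    matches = [p for p in parts
               if all(p.get(k) == v for k, v in user_set.items())]
    if not matches:
        raise ValueError("no part satisfies the user-set attributes: %r" % user_set)
    for key, (_, origin) in list(updated.items()):
        if origin == "inferred":
            candidates = {p[key] for p in matches if key in p}
            if len(candidates) == 1:
                # Exactly one possibility left, so re-infer it.
                updated[key] = (candidates.pop(), "inferred")
            else:
                # Ambiguous again; drop it and let a later step decide.
                del updated[key]
    return updated
```

If the footprint was inferred, changing the voltage to 25v quietly moves it to 1206. If the footprint was set by the user, the same edit raises an error, which is the point where a GUI would have to ask which attribute to unset.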

So what are the heuristics? Well, there are two types - implicit and explicit. Implicit heuristics are those built into the system, such as "if you search for the possible values of an attribute, and you get exactly one result, that's the value". Explicit ones are things like preferred packages, approved vendor lists, etc. Such rules could be stored in the schematics, a project file like gafrc, or in the database itself.
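A small Python sketch of how the two kinds of heuristic might combine. The `choose` function and the shape of the preference list are assumptions of mine, not a defined rule format.

```python
def choose(candidates, preferred=()):
    """Pick one value from `candidates`, or return None if the user must decide.

    Implicit heuristic: a single candidate is the answer.
    Explicit heuristic: otherwise take the first entry of the ordered
    `preferred` list that is actually available."""
    candidates = set(candidates)
    if len(candidates) == 1:
        return next(iter(candidates))
    for value in preferred:
        if value in candidates:
            return value
    return None
```

For example, `choose({"0805", "1206"}, preferred=["0805"])` picks 0805, while the same call with no preference list returns None and leaves the choice to the user.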

How is the database stored? I'm not going to specify that - it's irrelevant. What's important is how the app interacts with the database. My idea is that there's a well-documented ABI for a plug-in that provides the data. That way, the user can provide whatever back-end they prefer - perl script, CSV files, SQL database, web query - whatever. Even an aggregator that merges multiple plug-ins into a single one. The ABI works something like this:

I suppose optionally each value could have a flag sent with it, that says if the value is possible or not. That way the user can see what values *might* be possible, if other attributes were set right. The app could then query the plugin, omitting various attributes, to try to determine how the user could legitimately choose an otherwise impossible value.
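The "possible" flag and the retry-with-omissions idea could look roughly like this in Python. The table, the function names, and the dict-of-booleans return shape are all assumptions made for the sketch.

```python
# Hypothetical part table, made up for illustration.
PARTS = [
    {"value": "3.3uF", "voltage": "16v", "footprint": "0805"},
    {"value": "3.3uF", "voltage": "25v", "footprint": "1206"},
]


def flagged_values(attribute, known, parts=PARTS):
    """Return {value: possible} for every value of `attribute` in the table.
    A value is possible if some part has it and also matches the other
    attributes in `known`."""
    others = {k: v for k, v in known.items() if k != attribute}
    flags = {}
    for part in parts:
        if attribute not in part:
            continue
        ok = all(part.get(k) == v for k, v in others.items())
        flags[part[attribute]] = flags.get(part[attribute], False) or ok
    return flags


def attributes_to_unset(attribute, wanted, known, parts=PARTS):
    """List the known attributes which, if omitted one at a time from the
    query, would make `wanted` a possible value of `attribute`."""
    result = []
    for name in known:
        if name == attribute:
            continue
        relaxed = {k: v for k, v in known.items() if k != name}
        if flagged_values(attribute, relaxed, parts).get(wanted, False):
            result.append(name)
    return result
```

With the value and an 0805 footprint already set, 25v shows up flagged as not possible, and the second function reports that unsetting the footprint would make it available.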

There are a couple of ways the plugin data gets used. First, in gschem/gattrib, the GUI has a way of querying the database for potential values of attributes - such as choosing variants, picking parts from official part lists, sticking to on-hand inventory, etc. Second, the sch->pcb flow tools query the database to fill in missing information that might be needed for the layout. At least, it needs to find the footprint and pin mapping. If it cannot determine a single footprint/mapping from the data given, the conversion fails and the user must further specify the part.

Also, this scheme allows various ways of storing the data. At one extreme, all the values are stored in the symbols - making them heavy within the schematic. At the other extreme, the refdes is the only thing in the symbol, and an external table maps refdes to all the other information, to be used only by the netlister and sch->pcb tool. In the first case, the plugin is used by the user to heavify the symbol; in the second case, the plugin is used by the sch->pcb tool to generate the layout and netlist.

Behind the Scenes

One aspect of the database layout I think is useful is the concept of mapping attributes into "classes". For example, there may be multiple graphical representations of a NAND gate. The database doesn't map symbols to components, it maps symbols to symbol classes, and maps symbol classes to components. Given three NAND symbols and four physical chips, you'd need twelve direct mappings (3 x 4), or seven classed mappings (3 + 4). Also, given symbol classes, it allows gschem to swap symbols to others within the same class without the overhead of deleting the symbol and re-creating it. Similarly, footprints can be swapped within a class - for example, the various RESC*{L,N,M} footprints.
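The arithmetic is easy to see with two small lookup tables. In this Python sketch the symbol file names, the class name "NAND2", and the list of chips are only illustrative, not real database contents.

```python
# Hypothetical symbol names and class name; the chips are just examples
# of parts that could share a class.
symbol_to_class = {
    "nand-ansi.sym": "NAND2",
    "nand-iec.sym": "NAND2",
    "nand-small.sym": "NAND2",
}
class_to_components = {
    "NAND2": ["7400", "74LS00", "74HC00", "CD4011"],
}


def components_for(symbol):
    """Go from a symbol to its class, then from the class to components."""
    return class_to_components[symbol_to_class[symbol]]


def symbols_in_same_class(symbol):
    """Symbols gschem could swap in without re-creating the component."""
    cls = symbol_to_class[symbol]
    return sorted(s for s, c in symbol_to_class.items() if c == cls and s != symbol)


components = class_to_components["NAND2"]
direct_count = len(symbol_to_class) * len(components)   # 3 x 4 direct mappings
classed_count = len(symbol_to_class) + len(components)  # 3 + 4 classed mappings
```

Adding a fifth chip costs one new entry in the classed scheme, versus three new entries if every symbol were mapped directly.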

In a more generic sense, the database should contain multiple tables. Each table specifies some attributes which are constant for all entries, plus a set of entries that specify the variants. So one table specifies all the 0603 resistors from Rohm (symbol, device, footprint class constant, value and Rohm's part number vary), another maps Rohm part numbers to Digikey part numbers (vendor and manufacturer constant, both part numbers vary). This grouping is the core of the CSV table format used by gedasymbols.org. Given a set of attributes, you find all the tables that mention those attributes and combine them into a large synthetic table on which you do the query. You'd have to be smart about adding "in-between" attributes, like if you had a value and wanted a footprint, you'd have to include footprint-class in order to get from one to another.
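As a sketch of the table-combining step, here is one way it could work in Python. The table layout (a "constant" dict plus a list of "rows"), the attribute names, and all the part numbers are made up. This is not the gedasymbols.org CSV format itself, just the same grouping idea.

```python
# Two hypothetical tables: constant attributes plus varying rows.
# All part numbers below are invented placeholders.
TABLES = [
    {"constant": {"device": "RESISTOR", "footprint-class": "RESC0603",
                  "manufacturer": "Rohm"},
     "rows": [{"value": "10k", "mfg-part": "R-0603-10K"},
              {"value": "4.7k", "mfg-part": "R-0603-4K7"}]},
    {"constant": {"vendor": "Digikey", "manufacturer": "Rohm"},
     "rows": [{"mfg-part": "R-0603-10K", "vendor-part": "DK-0001"},
              {"mfg-part": "R-0603-4K7", "vendor-part": "DK-0002"}]},
]


def expand(table):
    """Turn a table into full rows by copying its constants into each row."""
    return [{**table["constant"], **row} for row in table["rows"]]


def join(left, right):
    """Combine rows that agree on every attribute they have in common.
    Rows with nothing in common are not combined."""
    combined = []
    for a in left:
        for b in right:
            shared = set(a) & set(b)
            if shared and all(a[k] == b[k] for k in shared):
                combined.append({**a, **b})
    return combined


synthetic = join(expand(TABLES[0]), expand(TABLES[1]))
hits = [row for row in synthetic if row["value"] == "10k"]
```

Here the manufacturer's part number plays the "in-between" role: the query starts from a value and ends at a vendor part number, but only because both tables carry the attribute that links them.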

  Copyright 2009   by DJ Delorie     Updated Dec 2009