> I'm trying to estimate a conversion factor for a particular Ada 95
> compiler on a particular target.  The conversion is from SLOC (loosely
> translated:  number of semi-colons) to bytes of executable code.
> Does anyone have any guidance on how I could do this, given that I have
> little to no past history to base it on (Ada 95, that is)?  Is there any
> research published on the WWW that speaks to this?

Yes, there is lots of stuff on the web that speaks to this, but why
would you want to do it?  There is no correlation between SLOC and bytes
of code. For example, I have an Ada program consisting of a single
generic instantiation (your SLOC definition would compute it at 4
semicolons long), and it generates more than 20 million bytes of code.
I have another Ada program which your SLOC definition would compute at
approximately 200,000 semicolons long, which generates less than
32,000 bytes of code. As soon as you permit a language to have templates,
macros, generics, or other artifacts that permit one or two semicolons
to bring in an UNLIMITED number of bytes of object code (in other words,
as soon as you permit any software reuse), you lose the correlation
completely.
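
To make the semicolon definition concrete, here is a rough sketch in Python
of how such a count might be taken. It assumes Ada-style "--" comments and
double-quoted string literals, and it special-cases the character literal
';'. The function name count_semicolons and the sample unit (Big_Generic,
Inst, Main) are made up for illustration; this is not a real Ada lexer.

```python
def count_semicolons(source: str) -> int:
    """Count semicolons outside Ada comments, string literals and ';'."""
    total = 0
    for line in source.splitlines():
        in_string = False
        i = 0
        while i < len(line):
            ch = line[i]
            if in_string:
                # A doubled quote ("") toggles off and on again, which is fine.
                if ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif line.startswith("--", i):
                break  # the rest of the line is a comment
            elif line.startswith("';'", i):
                i += 2  # skip the character literal ';'
            elif ch == ";":
                total += 1
            i += 1
    return total


# Hypothetical unit: one generic instantiation, four semicolons in all.
SAMPLE = '''with Big_Generic;
procedure Main is
   package Inst is new Big_Generic (Integer);  -- one line; may expand hugely
begin
   Inst.Run;
end Main;
'''

if __name__ == "__main__":
    print(count_semicolons(SAMPLE))
```

The count says nothing about what the instantiation expands to, which is
exactly why it cannot predict the size of the object code.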

There are two main reasons people on the web claim to measure lines of code.
The first is to approximate the volume of work involved in developing
software. The second is to approximate the volume of work involved in
maintaining software.

At software development time, after factoring out the main correlation,
which is how long the organization took to develop their previous
projects, and the second largest correlation, which is the coupling of
the design, there is a correlation with ELOC, but not SLOC defined by
semicolons. ELOC is executable lines of code, and is not related to
development cost. Development cost is correlated with SLOC because
management systems base themselves on SLOC counts; however, a higher
correlation with cost can be obtained by including weightier factors
such as coupling through global variables, aliasing, etc.
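
If you do have some project history, one way to check which measure tracks
cost better is to compute the correlation yourself. The sketch below is a
plain Pearson coefficient in Python; it does not do the "factoring out"
described above (that would need partial correlations). The figures in it
are invented placeholders, not measurements, and the names sloc,
global_refs and cost are my own.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sample has no defined correlation")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Placeholder history for five past projects (NOT real data).
    sloc = [12000, 40000, 8000, 95000, 30000]   # semicolon counts
    global_refs = [150, 900, 60, 400, 1200]     # references to global variables
    cost = [14, 70, 6, 55, 90]                  # staff-months

    print("cost vs SLOC:       ", round(pearson(sloc, cost), 2))
    print("cost vs global refs:", round(pearson(global_refs, cost), 2))
```

Substitute your own organization's numbers; with only a handful of projects
the coefficients will be noisy, so treat them as a hint rather than a
conversion factor.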

At software maintenance time, after factoring out the main cost correlation,
which is how long the organization took to implement a change to their
previous projects, and the second largest correlation, the coupling
of the code, and the third largest correlation, the average distance between
the open and close of each structure, there is no correlation with ELOC or
with SLOC, because one change (whether bug or enhancement) could
potentially result in any number of lines of code from zero to
a million times the current project size. However, at software
maintenance time, after factoring out the two largest cost correlations,
there is a correlation between SLOC and the amount of time it takes to
analyze the impact of a change on the rest of the system. (Note that
this correlation is not fully statistically independent of the coupling:
when the coupling is large, SLOC can appear to have more impact on
analysis time than it actually does.)

To be realistic, it is true that much software is paid for by the
semicolon, but to be equally realistic, some of the aberrations we
see in software today are caused by this way of paying for it. When
we find ourselves without tools to measure previous performance, or to
track the global variables causing coupling, we measure what we can
measure. However, even when standardizing on measures that are
not as correlated as we wish at development time, such as ELOC,
we should attempt to use the best information we can. If possible,
measure the use of global variables as a more direct measure of
maintenance and development cost. If possible, measure the past
performance per change. If possible, do not limit your volume
measurement to counting semicolons. However, recognize that by
counting only semicolons, you are in the company of the majority
of software developers.