> I'm trying to estimate a conversion factor for a particular Ada 95
> compiler on a particular target. The conversion is from SLOC (loosely
> translated: number of semi-colons) to bytes of executable code.
>
> Does anyone have any guidance on how I could do this, given that I have
> little to no past history to base it on (Ada 95, that is)? Is there any
> research published on the WWW that speaks to this?

Yes, there is lots of stuff on the web that speaks to this, but why would you want to do it? There is no correlation between SLOC and bytes of code. For example, I have an Ada program consisting of a single generic instantiation (your SLOC definition would compute it at 4 semicolons long), and it generates more than 20 million bytes of code; a sketch of what such a unit looks like appears below. I have another Ada program which your SLOC definition would compute at approximately 200,000 semicolons long, and it generates less than 32,000 bytes of code.

As soon as you permit a language to have templates, macros, generics, or other artifacts that permit one or two semicolons to bring in an UNLIMITED number of bytes of object code (in other words, as soon as you permit any software reuse), you lose the correlation completely.

There are two main reasons people on the web claim to measure lines of code. The first is to approximate the volume of work involved in developing software. The second is to approximate the volume of work involved in maintaining software.

At software development time, after factoring out the main correlation, which is how long the organization took to develop its previous projects, and the second largest correlation, which is the coupling of the design, there is a correlation with ELOC (executable lines of code), but not with SLOC as defined by semicolons. ELOC approximates the volume of work, not development cost directly. Development cost is correlated to SLOC mainly because systems of management base themselves on SLOC counts; a higher correlation to cost can be obtained by including weightier factors such as coupling through global variables, aliasing, etc.

At software maintenance time, after factoring out the main cost correlation, which is how long the organization took to implement a change to its previous projects, the second largest correlation, which is the coupling of the code, and the third largest correlation, which is the average distance between the open and close of each structure, there is no correlation with ELOC or with SLOC, because one change (whether bug fix or enhancement) can result in anything from zero lines of code to a million times the current project size.

However, at software maintenance time, after factoring out the two largest cost correlations, there is a correlation between code volume and the amount of time it takes to analyze the impact of a change on the rest of the system. (Note that this correlation is not fully statistically independent of the coupling; when coupling is large, SLOC appears to have more impact on analysis time than it actually does.)

To be realistic, it is true that much software is paid for by the semicolon; but, to be equally realistic, some of the aberrations we see in software today are caused by this way of paying for it. When we find ourselves without tools to measure previous performance, or to track the global variables causing coupling, we measure what we can measure. However, even when standardizing on measures that are not as well correlated as we would wish at development time, such as ELOC, we should attempt to use the best information we can.
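To make the generic-instantiation point above concrete, here is a minimal sketch (not my actual program; the byte figures quoted above come from real projects, not from this toy). The complete Ada 95 library unit below is two semicolons long by the count-the-semicolons definition, yet the compiler must generate code for the entire instantiated body:

    --  A complete compilation unit: one with clause, one instantiation.
    --  By the SLOC-as-semicolons definition this is 2 "lines" long, but
    --  the object code contains the whole body of Integer_IO.
    with Ada.Text_IO;
    package Int_IO is new Ada.Text_IO.Integer_IO (Num => Integer);

Whether that instantiation costs two kilobytes or twenty megabytes depends on the generic body and on whether the compiler shares generic bodies or expands each instance, which is exactly why the SLOC-to-bytes ratio is unbounded.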
If possible, measure the use of global variables as a more direct measure of maintenance and development cost. If possible, measure past performance per change. If possible, do not limit your volume measurement to counting semicolons. However, recognize that by counting only semicolons, you are in the company of the majority of software developers.
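For what it is worth, if you do end up counting semicolons, the count is easy to automate. Below is a rough Ada 95 sketch of such a counter (the file name is a placeholder, and it deliberately ignores lexical subtleties such as the character literal ';', so treat it as an approximation, which is all a semicolon count ever is):

    with Ada.Text_IO; use Ada.Text_IO;

    procedure Count_Semicolons is
       File      : File_Type;
       Line      : String (1 .. 1024);  --  assumes lines fit in 1024 chars
       Last      : Natural;
       Count     : Natural := 0;
       In_String : Boolean;
    begin
       Open (File, In_File, "example.adb");  --  placeholder file name
       while not End_Of_File (File) loop
          Get_Line (File, Line, Last);
          In_String := False;
          for I in 1 .. Last loop
             if Line (I) = '"' then
                --  Toggle on string quotes; a doubled "" inside a string
                --  literal toggles twice, so the state comes out right.
                In_String := not In_String;
             elsif not In_String then
                if I < Last and then Line (I .. I + 1) = "--" then
                   exit;  --  rest of the line is a comment
                elsif Line (I) = ';' then
                   Count := Count + 1;
                end if;
             end if;
          end loop;
       end loop;
       Close (File);
       Put_Line ("Semicolons:" & Natural'Image (Count));
    end Count_Semicolons;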