Counting lines of source code

Filed under: Methodology | Posted by Consultant on July 14, 2008 @ 2:21 pm

If you have ever been involved in scoping a source code audit project, you have probably had to figure out how to count the code. Several questions come up:

  • What tool are you going to use to count it?
  • Should the tool be able to understand the language?
  • Or will you simply use a line-counting tool such as wc -l?
  • What will you include in your count: code lines, blank lines, comments?

I’m going to cover each of the points above based on my own experience, and I invite you to share your own experiences in the comments as well.

I believe in providing [potential] customers with accurate information, and I understand that sometimes that may not even be possible. When it comes to counting lines of source code, though, I would rather use a tool that can parse the code than simply run “wc -l”. However, any functional bugs in the tool you use can end up skewing your estimate considerably, so you need to test the tool first (or make sure it has already been tested). The main benefit of parsing the source code, rather than simply counting the raw lines in a file, is the ability to identify source code comments (the syntax for comments varies from language to language).
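To make the difference concrete, here is a minimal Python sketch, assuming a hypothetical language whose only comment syntax is a full-line # comment. It reports a raw newline count (what “wc -l” gives you) next to blank, comment and code lines. The names count_lines and LineCounts are made up for this example, and it deliberately ignores block comments, multi-line strings and similar cases; those corner cases are exactly where a real counting tool can have bugs, which is why it is worth testing first.

```python
from dataclasses import dataclass


@dataclass
class LineCounts:
    raw: int      # number of newline characters, as `wc -l` reports
    blank: int    # lines containing only whitespace
    comment: int  # lines whose first non-blank character is '#'
    code: int     # everything else


def count_lines(text: str) -> LineCounts:
    """Naive counter for a hypothetical language whose only comment
    syntax is a full-line '#' comment."""
    blank = comment = code = 0
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            blank += 1
        elif stripped.startswith("#"):
            comment += 1
        else:
            # Lines with a trailing comment still count as code.
            code += 1
    return LineCounts(raw=text.count("\n"), blank=blank,
                      comment=comment, code=code)


sample = (
    "# Print a greeting.\n"
    "\n"
    "name = 'world'\n"
    "print('hello', name)  # trailing comment: counted as code\n"
)
print(count_lines(sample))  # → LineCounts(raw=4, blank=1, comment=1, code=2)
```

Like wc -l, the raw count comes out one lower than the number of lines when the file has no trailing newline, which is one more reason not to rely on it alone.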

Now, why would you be interested in identifying source code comments? Good question! It depends on how you do your scoping. The reasons I can think of are:

  • You have one metric for source code lines and a different one for comments (on average, comments should be easier and faster to read).
  • You want to exclude comments from the estimate, and you are OK with putting in a little extra work to cover them.
  • You want to exclude comments from the estimate, and you will simply blink, look away, or close your eyes whenever you hit a commented line during the review.
  • You simply want to give your client a breakdown of source code lines and comment lines.
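The first two options boil down to a small calculation. Here is a minimal sketch, assuming review time scales linearly with line count; estimate_hours is a hypothetical helper, and the review rates (lines per hour) are made-up placeholders, not recommendations.

```python
def estimate_hours(code_lines: int, comment_lines: int,
                   code_rate: float, comment_rate: float,
                   include_comments: bool = True) -> float:
    """Review-time estimate in hours, assuming a simple linear model.

    code_rate and comment_rate are lines reviewed per hour; separate
    rates reflect that comments are usually faster to read.
    """
    hours = code_lines / code_rate
    if include_comments:
        hours += comment_lines / comment_rate
    return hours


# Hypothetical project size and rates, for illustration only.
code, comments = 12_000, 3_000
print(f"code lines: {code}, comment lines: {comments}")
print(estimate_hours(code, comments, code_rate=250, comment_rate=1000))  # → 51.0
print(estimate_hours(code, comments, code_rate=250, comment_rate=1000,
                     include_comments=False))  # → 48.0
```

Excluding comments only changes the number you quote; the comments still get read during the review.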

I have been on both sides of the court, including comments in the estimate and excluding them. Either way, I have always read the comments while reviewing code! You could argue that comments don’t get compiled or run, and that there could be plenty of dead code lying around. But comments are not only fun, they are a window into the programmer’s mind, and what is more valuable than that?! Yes, you can find dead code, but why is that code even there? From one version to the next, dead code can come back to life, and boom! Flexibility is the key: if you know of a directory filled with nothing but dead code, why not look at it carefully, or talk to your client about excluding it from your estimate?

So whether you include comments in your time estimate is up to you, but whether you review them should not be.

So this all ends up being:

source_code_lines = lines_in_file - blank_lines - comment_lines
comment_lines = lines_in_file - source_code_lines - blank_lines
total = source_code_lines + comment_lines
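Across a whole code base, the formulas above are simply applied per file and summed. A small Python sketch, assuming your counting tool reports lines_in_file, blank_lines and comment_lines for each file; FileCount and breakdown are hypothetical names, and the per-file numbers are invented.

```python
from typing import NamedTuple


class FileCount(NamedTuple):
    lines_in_file: int
    blank_lines: int
    comment_lines: int


def breakdown(files: list[FileCount]) -> dict[str, int]:
    """Apply the formulas per file and sum the results."""
    totals = {"source_code_lines": 0, "comment_lines": 0, "total": 0}
    for f in files:
        source_code_lines = f.lines_in_file - f.blank_lines - f.comment_lines
        totals["source_code_lines"] += source_code_lines
        totals["comment_lines"] += f.comment_lines
        totals["total"] += source_code_lines + f.comment_lines
    return totals


# Invented per-file counts, as a parsing tool might report them.
files = [FileCount(120, 20, 30), FileCount(400, 50, 50)]
print(breakdown(files))
# → {'source_code_lines': 370, 'comment_lines': 80, 'total': 450}
```

However you derive the two counts, total always equals lines_in_file minus blank_lines (520 - 70 = 450 here), which makes a handy sanity check on whatever tool you use.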

I recently found a very nice tool called CLOC (http://cloc.sourceforge.net/), which handles a wide range of programming languages. Test it out!

Later.
