CLOC - Count Lines of Code

I have mentioned the tool in my previous post but the tool deserves an entire dedicated post! If you were looking for a tool to count source code lines, here’s a nice one.

Take a look at CLOC (http://cloc.sourceforge.net/), an excerpt from its website:

“cloc counts blank lines, comment lines, and physical lines of source code in many programming languages. It is written entirely in Perl with no dependencies outside the standard distribution of Perl v5.6 and higher (code from some external modules is embedded within cloc) and so is quite portable. cloc is known to run on many flavors of Linux, AIX, Solaris, IRIX, z/OS, and Windows. (To run the Perl source version of cloc on Windows one needs ActiveState Perl 5.6.1 or higher, or Cygwin installed. Alternatively one can use the Windows binary of cloc generated with perl2exe to run on Windows computers that have neither Perl nor Cygwin.)

cloc contains code from David Wheeler’s SLOCCount, Damian Conway and Abigail’s Perl module Regexp::Common, and Sean M. Burke’s Perl module Win32::Autoglob, … ”

Hope you find it useful

 

Counting lines of source code

If you were ever involved in the process of scoping out a source code audit project, you have probably run into the situation where you have to figure out how to count the code. There are several things involved:

  • What tool are you going to use to count it?
  • Should the tool be able to understand the language?
  • Or will you simply use a line counting tool such as wc -l?
  • What will you consider in your count? Code lines, blank lines, what about comments?

I’m going to go ahead and cover each of the points I mention above, based in my own experience. I therefore invite you to submit and share your comments based on your own xperiences as well.

I believe in providing [potential] customers with accurate information and I understand that some times that may not even be possible - but when it comes to counting lines of source code I rather use a tool that can parse the code than simply running a “wc -l” - However any functional bugs within the tool you use may end up impacting your estimate considerably and for that reason you need to test the tool first (or make sure it’s got some testing already.) The main benefit you would get from parsing the source code other than simply counting raw lines within a file is the ability of identifying source code comments (the way of specifying comments varies according to each language.)

Now, why would you be interested in identifying source code comments? Good question! And it is up to how you perform your scoping. The reasons I can think of are:

  • You have a certain metric for source code lines and a different one for comments (in average, comments should be easier/faster to read.)
  • You want to exclude comments from the estimate. You are ok with working an extra bit to cover for any comments.
  • You want to exclude comments from the estimate. You will simply blink, look away, close your eyes whenever you go through a commented line while performing the review.
  • You simply want to provide your client with a break down containing lines of source code and comments.

I have been in both sides of the court, considering comments within the estimate and excluding them. But either way, I have always looked at comments while reviewing code! You could argue that comments don’t get compiled/run, that there could be plenty of dead code laying around - but comments not only can be fun! they are a window into the programmer’s mind and what’s more valuable than that?! Yes, you can find dead code - but why is that code even there? From a version to the other dead becomes alive and boom! And flexibility is the key! If you do know of a directory just filled with dead code, why not look carefully or talk to your client and about excluding that piece from your estimation?

So it is up to whether you decide to include comments in your time estimate or not - but it shouldn’t be your choice to decide whether to review them or not.

So this all ends up being:

source_code_lines = lines_in_file - blank_lines - comment_lines | comment_lines = lines_in_file - source_code_lines - blank_lines | total = source_code_lines + comment_lines

I have recently found a very nice tool called CLOC (http://cloc.sourceforge.net/) - which deals with a wide set of programming languages. Test it out!

Later.

 
  • © 2009 penetrationtests.com