Release: 2016-07-22 08:56:26
This article optimizes CPLineCounter, the C/Python line-counting tool implemented in the previous articles of this series, by rewriting its core algorithm in C behind a C extension interface, and compares it with commonly used line-counting tools. Measurements show that CPLineCounter is both more accurate and faster than the other tools tested. On a code base of nearly ten million lines, CPLineCounter running under CPython and PyPy is 14.5 times and 29 times faster than cloc 1.64, respectively, and 1.8 times and 3.6 times faster than SourceCounter 3.4.

Test environment
The code examples in this article were run and tested on Windows. The platform information is as follows:

>>> import sys, platform
>>> print '%s %s, Python %s' %(platform.system(), platform.release(), platform.python_version())
Windows XP, Python 2.7.11
>>> sys.version
'2.7.11 (v2.7.11:6d1b6a68f775, Dec 5 2015, 20:32:19) [MSC v.1500 32 bit (Intel)]' 

Note that Python syntax differs between versions, so some code examples in this article need minor changes to run on older Python versions.
1. Code implementation and optimization
To avoid fragmentation, this section gives the complete implementation. Some variable and function definitions differ slightly from those in the previous articles of this series, so take care to tell them apart.
1.1 Code Implementation
First, define two lists to store statistical results:

import os, sys
rawCountInfo = [0, 0, 0, 0, 0]
detailCountInfo = [] 

rawCountInfo stores the overall totals. Its elements are, in order, the total number of file lines, code lines, comment lines, and blank lines, followed by the number of files. detailCountInfo stores the detailed statistics: one entry per file, holding the file name and that file's line counts. The sums over all files are computed from it when the detailed report is printed.
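As a rough illustration of this layout, the sketch below folds the counts of two files into a running total. The names raw_totals and add_file_counts are hypothetical stand-ins (they do not appear in the tool), and the numbers are made up.

```python
# Assumed layout, following the description above:
# [file lines, code lines, comment lines, blank lines, number of files]
raw_totals = [0, 0, 0, 0, 0]

def add_file_counts(totals, per_file):
    """Add one file's [file, code, comment, blank] counts to totals in place."""
    totals[:-1] = [x + y for x, y in zip(totals[:-1], per_file)]
    totals[-1] += 1  # one more file processed
    return totals

add_file_counts(raw_totals, [10, 6, 3, 2])  # made-up counts for a first file
add_file_counts(raw_totals, [5, 4, 1, 0])   # made-up counts for a second file
print(raw_totals)
```

The last element is kept apart from the element-wise sum because it counts files rather than lines.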

The implementation follows. To avoid long uninterrupted stretches of code, each group of functions is followed by a brief description.

def CalcLinesCh(line, isBlockComment):
    lineType, lineLen = 0, len(line)
    if not lineLen:
        return lineType

    line = line + '\n' # guard character so that iChar+1 stays in bounds
    iChar, isLineComment = 0, False
    while iChar < lineLen:
        if line[iChar] == ' ' or line[iChar] == '\t': # whitespace
            iChar += 1; continue
        elif line[iChar] == '/' and line[iChar+1] == '/': # line comment
            isLineComment = True
            lineType |= 2; iChar += 1 # skip the second '/'
        elif line[iChar] == '/' and line[iChar+1] == '*': # block comment start
            isBlockComment[0] = True
            lineType |= 2; iChar += 1
        elif line[iChar] == '*' and line[iChar+1] == '/': # block comment end
            isBlockComment[0] = False
            lineType |= 2; iChar += 1
        else:
            if isLineComment or isBlockComment[0]:
                lineType |= 2
            else:
                lineType |= 1
        iChar += 1

    return lineType # Bitmap: 0 blank, 1 code, 2 comment, 3 code and comment

def CalcLinesPy(line, isBlockComment):
    #isBlockComment[single quotes, double quotes]
    lineType, lineLen = 0, len(line)
    if not lineLen:
        return lineType

    line = line + '\n\n' # two guard characters so that iChar+2 stays in bounds
    iChar, isLineComment = 0, False
    while iChar < lineLen:
        if line[iChar] == ' ' or line[iChar] == '\t': # whitespace
            iChar += 1; continue
        elif line[iChar] == '#': # line comment
            isLineComment = True
            lineType |= 2
        elif line[iChar:iChar+3] == "'''": # single-quoted block comment
            if isBlockComment[0] or isBlockComment[1]:
                isBlockComment[0] = False
            else:
                isBlockComment[0] = True
            lineType |= 2; iChar += 2
        elif line[iChar:iChar+3] == '"""': # double-quoted block comment
            if isBlockComment[0] or isBlockComment[1]:
                isBlockComment[1] = False
            else:
                isBlockComment[1] = True
            lineType |= 2; iChar += 2
        else:
            if isLineComment or isBlockComment[0] or isBlockComment[1]:
                lineType |= 2
            else:
                lineType |= 1
        iChar += 1

    return lineType # Bitmap: 0 blank, 1 code, 2 comment, 3 code and comment


CalcLinesCh() and CalcLinesPy() classify a single line according to C and Python syntax, respectively. The return value is a bitmap: 0 for a blank line, 1 for code, 2 for comment, and 3 for a line that contains both code and a comment.
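To make the bitmap concrete, here is a minimal sketch of how a sequence of line types turns into counts. The names CODE, COMMENT, and tally are hypothetical; the line types in the example are what CalcLinesCh() would be expected to return for the C lines named in the comment.

```python
CODE, COMMENT = 1, 2  # bit flags; a value of 0 means a blank line

def tally(line_types):
    """Count lines by category; a type-3 line counts as both code and comment."""
    counts = {'total': 0, 'code': 0, 'comment': 0, 'blank': 0}
    for line_type in line_types:
        counts['total'] += 1
        if line_type == 0:
            counts['blank'] += 1
        if line_type & CODE:
            counts['code'] += 1
        if line_type & COMMENT:
            counts['comment'] += 1
    return counts

# 'int a; // x' -> 3, '/* note */' -> 2, '' -> 0, 'return 0;' -> 1
result = tally([3, 2, 0, 1])
print(result)
```

Because a mixed line is counted twice, the code, comment, and blank counts can add up to more than the total number of lines.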

from ctypes import c_uint, c_ubyte, CDLL
CFuncObj = None
def LoadCExtLib():
    global CFuncObj
    try:
        CFuncObj = CDLL('CalcLines.dll')
    except Exception: # does not catch SystemExit or KeyboardInterrupt
        pass

def CalcLines(fileType, line, isBlockComment):
    try:
        bCmmtArr = (c_ubyte * len(isBlockComment))(*isBlockComment)
        CFuncObj.CalcLinesCh.restype = c_uint
        if fileType == 'ch': # compare by value; 'is' would depend on string interning
            lineType = CFuncObj.CalcLinesCh(line, bCmmtArr)
        else:
            lineType = CFuncObj.CalcLinesPy(line, bCmmtArr)

        # copy the flags updated by the C function back into the Python list
        isBlockComment[0] = True if bCmmtArr[0] else False
        isBlockComment[1] = True if bCmmtArr[1] else False
        #isBlockComment = [True if i else False for i in bCmmtArr]
    except Exception, e:
        #print e
        # library not loaded or call failed: fall back to the Python versions
        if fileType == 'ch':
            lineType = CalcLinesCh(line, isBlockComment)
        else:
            lineType = CalcLinesPy(line, isBlockComment)

    return lineType


To improve speed, the author rewrote CalcLinesCh() and CalcLinesPy() in C and compiled them into a dynamic-link library. See Section 1.2 for the implementation and use of the C versions. LoadCExtLib() loads the library, and CalcLines() calls the corresponding C function; if the library cannot be loaded, the slower Python versions are used instead.

The code above runs under CPython and loads the C library through ctypes, which has been part of the standard library since Python 2.5. ctypes is a foreign function library for Python: it provides C-compatible data types and allows calling functions in DLLs or shared libraries, so it is often used to wrap external libraries in pure Python code.
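The ctypes calling pattern can be tried without CalcLines.dll. The sketch below is only an illustration: it loads the platform's C runtime as a stand-in library and calls strlen and memset, which are not part of the tool. memset plays the role of a C function that writes into a small flag array, the way the C counting functions update isBlockComment. It is written to run on Python 3 as well as 2.7.

```python
import ctypes
import ctypes.util
import sys

# Stand-in for CalcLines.dll (an assumption made for this demo): the C runtime.
if sys.platform.startswith('win'):
    libc = ctypes.CDLL('msvcrt')
else:
    # find_library may return None; CDLL(None) then exposes the symbols
    # already loaded into the current process, which include the C library.
    libc = ctypes.CDLL(ctypes.util.find_library('c'))

# Declare the C signatures so that ctypes converts arguments and results.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t
libc.memset.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_size_t]
libc.memset.restype = ctypes.c_void_p

line_length = libc.strlen(b'int a; // x')

# A Python list cannot be modified by C code, so copy it into a C array,
# let the C function write to the array, then copy the values back.
flags = [False, False]
c_flags = (ctypes.c_ubyte * len(flags))(*flags)
libc.memset(c_flags, 1, len(flags))
flags = [bool(value) for value in c_flags]

print(line_length, flags)
```

The copy-in, copy-out step mirrors what CalcLines() does with bCmmtArr.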

If the code runs under PyPy, the C functions are called through the cffi interface instead:

from cffi import FFI
CFuncObj, ffiBuilder = None, FFI()
def LoadCExtLib():
    global CFuncObj
    try:
        # declare the C functions, then load the library in ABI mode
        ffiBuilder.cdef('''
        unsigned int CalcLinesCh(char *line, unsigned char isBlockComment[2]);
        unsigned int CalcLinesPy(char *line, unsigned char isBlockComment[2]);
        ''')
        CFuncObj = ffiBuilder.dlopen('CalcLines.dll')
    except Exception: # does not catch SystemExit or KeyboardInterrupt
        pass

def CalcLines(fileType, line, isBlockComment):
    try:
        bCmmtArr = ffiBuilder.new('unsigned char[2]', isBlockComment)
        if fileType == 'ch': # compare by value; 'is' would depend on string interning
            lineType = CFuncObj.CalcLinesCh(line, bCmmtArr)
        else:
            lineType = CFuncObj.CalcLinesPy(line, bCmmtArr)

        # copy the flags updated by the C function back into the Python list
        isBlockComment[0] = True if bCmmtArr[0] else False
        isBlockComment[1] = True if bCmmtArr[1] else False
        #isBlockComment = [True if i else False for i in bCmmtArr]
    except Exception, e:
        #print e
        # library not loaded or call failed: fall back to the Python versions
        if fileType == 'ch':
            lineType = CalcLinesCh(line, isBlockComment)
        else:
            lineType = CalcLinesPy(line, isBlockComment)

    return lineType


cffi is used much like ctypes, but it can also take C source directly and call the functions in it, compiling the source automatically at run time. For consistency, the dynamic library is still loaded here.

def SafeDiv(dividend, divisor):
    if divisor: return float(dividend)/divisor
    elif dividend: return -1
    else: return 0

gProcFileNum = 0
def CountFileLines(filePath, isRawReport=True, isShortName=False):
    fileExt = os.path.splitext(filePath)
    if fileExt[1] == '.c' or fileExt[1] == '.h':
        fileType = 'ch'
    elif fileExt[1] == '.py': # == (comparison operator) tests whether the values are equal
        fileType = 'py'
    else: # not a C or Python source file
        return

    global gProcFileNum; gProcFileNum += 1
    sys.stderr.write('%d files processed...\r'%gProcFileNum)

    isBlockComment = [False]*2 # could also be a global, to keep the previous value
    lineCountInfo = [0]*5 # [total lines, code lines, comment lines, blank lines, comment rate]
    with open(filePath, 'r') as file:
        for line in file:
            lineType = CalcLines(fileType, line.strip(), isBlockComment)
            lineCountInfo[0] += 1
            if lineType == 0: lineCountInfo[3] += 1
            elif lineType == 1: lineCountInfo[1] += 1
            elif lineType == 2: lineCountInfo[2] += 1
            elif lineType == 3: lineCountInfo[1] += 1; lineCountInfo[2] += 1
            else:
                assert False, 'Unexpected lineType: %d(0~3)!' %lineType

    if isRawReport:
        global rawCountInfo
        rawCountInfo[:-1] = [x+y for x,y in zip(rawCountInfo[:-1], lineCountInfo[:-1])]
        rawCountInfo[-1] += 1
    elif isShortName:
        lineCountInfo[4] = SafeDiv(lineCountInfo[2], lineCountInfo[2]+lineCountInfo[1])
        detailCountInfo.append([os.path.basename(filePath), lineCountInfo])
    else:
        lineCountInfo[4] = SafeDiv(lineCountInfo[2], lineCountInfo[2]+lineCountInfo[1])
        detailCountInfo.append([filePath, lineCountInfo])


Note the "%d files processed..." progress message. The script does not learn from sys.stdout or sys.argv whether its output has been redirected to a file on the command line (sys.stdout is the same object, and sys.argv does not contain ">out"). If progress were written to standard output, it would therefore end up in the output file: with N code files, the file would contain N lines of progress information. For now, the script relies on the fact that redirection affects only standard output by default and writes the progress message to standard error, so that it still appears on the console. It also provides the -o option to separate console output from file output explicitly, which reduces the need for the user to redirect at all.
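A small sketch of the stderr approach is shown below, together with one alternative that the article does not use: file objects have an isatty() method that reports whether a stream is attached to a terminal, which gives a reasonable hint that output has been redirected. The helper names is_redirected and show_progress are hypothetical, and io.StringIO objects stand in for the real streams so the example can run anywhere (Python 3).

```python
import io
import sys

def is_redirected(stream):
    """Return True when the stream is not attached to an interactive terminal."""
    return not stream.isatty()

def show_progress(count, stream=None):
    """Write a progress message that overwrites itself on a console."""
    stream = sys.stderr if stream is None else stream
    stream.write('%d files processed...\r' % count)
    stream.flush()

# In-memory streams standing in for a redirected stdout and for stderr.
fake_stdout = io.StringIO()
fake_stderr = io.StringIO()

redirected = is_redirected(fake_stdout)
show_progress(3, fake_stderr)
print(redirected, repr(fake_stderr.getvalue()))
```

Writing progress to a separate stream keeps the report itself clean whether or not the check is used.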

In addition, when calling the CalcLines() function, the strip() method is used to remove blank characters at the beginning and end of the file line. Therefore, there is no need for line terminator judgment branches in CalcLinesCh() and CalcLinesPy().

SORT_ORDER = (lambda x:x[0], False)
def SetSortArg(sortArg=None):
    global SORT_ORDER
    if not sortArg: # no sort option given: keep the default order
        return
    if any(s in sortArg for s in ('file', '0')): # deliberately lenient match
    #if sortArg in ('rfile', 'file', 'r0', '0'):
        keyFunc = lambda x:x[1][0]
    elif any(s in sortArg for s in ('code', '1')):
        keyFunc = lambda x:x[1][1]
    elif any(s in sortArg for s in ('cmmt', '2')):
        keyFunc = lambda x:x[1][2]
    elif any(s in sortArg for s in ('blan', '3')):
        keyFunc = lambda x:x[1][3]
    elif any(s in sortArg for s in ('ctpr', '4')):
        keyFunc = lambda x:x[1][4]
    elif any(s in sortArg for s in ('name', '5')):
        keyFunc = lambda x:x[0]
    else: # if argparse restricts the sort choices, an assert would also do here
        print >>sys.stderr, 'Unsupported sort order(%s)!' %sortArg
        return

    isReverse = sortArg[0]=='r' # False: ascending; True: descending
    SORT_ORDER = (keyFunc, isReverse)

def ReportCounterInfo(isRawReport=True, stream=sys.stdout):
    # comment rate = comment lines / (comment lines + code lines)
    print >>stream, 'FileLines  CodeLines  CommentLines  BlankLines  CommentPercent  %s'\
          %(not isRawReport and 'FileName' or '')

    if isRawReport:
        print >>stream, '%-11d%-11d%-14d%-12d%-16.2f<Total:%d Code Files>' %(rawCountInfo[0],\
              rawCountInfo[1], rawCountInfo[2], rawCountInfo[3], \
              SafeDiv(rawCountInfo[2], rawCountInfo[2]+rawCountInfo[1]), rawCountInfo[4])
        return

    total = [0, 0, 0, 0]
    detailCountInfo.sort(key=SORT_ORDER[0], reverse=SORT_ORDER[1])
    for item in detailCountInfo:
        print >>stream, '%-11d%-11d%-14d%-12d%-16.2f%s' %(item[1][0], item[1][1], item[1][2], \
              item[1][3], item[1][4], item[0])
        total[0] += item[1][0]; total[1] += item[1][1]
        total[2] += item[1][2]; total[3] += item[1][3]
    print >>stream, '-' * 90 # a row of 90 hyphens
    print >>stream, '%-11d%-11d%-14d%-12d%-16.2f<Total:%d Code Files>' \
          %(total[0], total[1], total[2], total[3], \
          SafeDiv(total[2], total[2]+total[1]), len(detailCountInfo))


ReportCounterInfo() outputs statistical reports. Note that before the detailed report is output, the output content will be sorted according to the specified sorting rules. Additionally, the terminology for blank lines has been changed from EmptyLines to BlankLines. The former means that the line does not contain any other characters except the line terminator, and the latter means that the line only contains whitespace characters (spaces, tabs, line terminators, etc.).

To support counting multiple directories and/or files at the same time, use ParseTargetList() to parse the directory-file mixed list, and store its elements in the directory and file lists respectively:

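def ParseTargetList(targetList):
    fileList, dirList = [], []
    if targetList == []:
        targetList.append(os.getcwd()) # assumed default: count the current directory
    for item in targetList:
        if os.path.isfile(item):
            fileList.append(os.path.abspath(item)) # abspath is assumed from the full paths in the report
        elif os.path.isdir(item):
            dirList.append(os.path.abspath(item))
        else:
            print >>sys.stderr, "'%s' is neither a file nor a directory!" %item
    return [fileList, dirList]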

The LineCounter() function performs statistics based on directory and file lists:

def CountDir(dirList, isKeep=False, isRawReport=True, isShortName=False):
    for dir in dirList:
        if isKeep: # -k: only the files directly inside the directory
            for file in os.listdir(dir):
                CountFileLines(os.path.join(dir, file), isRawReport, isShortName)
        else: # default: walk down into subdirectories as well
            for root, dirs, files in os.walk(dir):
                for file in files:
                    CountFileLines(os.path.join(root, file), isRawReport, isShortName)

def CountFile(fileList, isRawReport=True, isShortName=False):
    for file in fileList:
        CountFileLines(file, isRawReport, isShortName)

def LineCounter(isKeep=False, isRawReport=True, isShortName=False, targetList=[]):
    fileList, dirList = ParseTargetList(targetList)
    if fileList != []:
        CountFile(fileList, isRawReport, isShortName)
    if dirList != []:
        CountDir(dirList, isKeep, isRawReport, isShortName)
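The difference between the two traversal modes can be seen with a throwaway directory. The sketch below is illustrative only: list_files is a hypothetical helper that lists files instead of counting them, and the directory layout is created just for the demo.

```python
import os
import shutil
import tempfile

def list_files(top, keep=False):
    """Return the sorted paths of files under top, relative to top.

    keep=True looks only at top itself (like the -k option); otherwise
    subdirectories are walked as well.
    """
    found = []
    if keep:
        for name in os.listdir(top):
            path = os.path.join(top, name)
            if os.path.isfile(path):
                found.append(os.path.relpath(path, top))
    else:
        for root, dirs, files in os.walk(top):
            for name in files:
                found.append(os.path.relpath(os.path.join(root, name), top))
    return sorted(found)

demo_dir = tempfile.mkdtemp()
try:
    # Layout: demo_dir/a.c and demo_dir/subdir/b.c
    os.mkdir(os.path.join(demo_dir, 'subdir'))
    for rel in ('a.c', os.path.join('subdir', 'b.c')):
        with open(os.path.join(demo_dir, rel), 'w') as handle:
            handle.write('int x;\n')
    top_only = list_files(demo_dir, keep=True)
    recursive = list_files(demo_dir)
finally:
    shutil.rmtree(demo_dir)

print(top_only, recursive)
```

Unlike this sketch, CountDir() passes every os.listdir() entry on and leaves the filtering to the extension check in CountFileLines().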


Then, add command line parsing processing:

import argparse
def ParseCmdArgs(argv=sys.argv):
 parser = argparse.ArgumentParser(usage='%(prog)s [options] target',
      description='Count lines in code files.')
 parser.add_argument('target', nargs='*',
   help='space-separated list of directories AND/OR files')
 parser.add_argument('-k', '--keep', action='store_true',
   help='do not walk down subdirectories')
 parser.add_argument('-d', '--detail', action='store_true',
   help='report counting result in detail')
 parser.add_argument('-b', '--basename', action='store_true',
   help='do not show file\'s full path')
## sortWords = ['0', '1', '2', '3', '4', '5', 'file', 'code', 'cmmt', 'blan', 'ctpr', 'name']
## parser.add_argument('-s', '--sort',
##  choices=[x+y for x in ['','r'] for y in sortWords],
##  help='sort order: {0,1,2,3,4,5} or {file,code,cmmt,blan,ctpr,name},' \
##    "prefix 'r' means sorting in reverse order")
 parser.add_argument('-s', '--sort',
   help='sort order: {0,1,2,3,4,5} or {file,code,cmmt,blan,ctpr,name}, ' \
    "prefix 'r' means sorting in reverse order")
 parser.add_argument('-o', '--out',
   help='save counting result in OUT')
 parser.add_argument('-c', '--cache', action='store_true',
   help='use cache to count faster(unreliable when files are modified)')
 parser.add_argument('-v', '--version', action='version',
   version='%(prog)s 3.0 by xywang')

 args = parser.parse_args()
 return (args.keep, args.detail, args.basename, args.sort, args.out, args.cache, args.target) 


Note the -s option added in ParseCmdArgs(). It specifies the field by which the output is sorted; without a prefix the order is ascending, and an r prefix selects descending order. For example, -s 0 or -s file sorts the output by number of file lines in ascending order, while -s r0 or -s rfile sorts it by number of file lines in descending order.
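As an illustration of the naming scheme, the following sketch splits a sort argument into a field name and a direction. It is not the tool's SetSortArg(): parse_sort_arg and SORT_FIELDS are hypothetical names, and the matching here is strict, whereas SetSortArg() uses a more lenient substring test.

```python
# Field names in the order of their numeric aliases 0..5.
SORT_FIELDS = ('file', 'code', 'cmmt', 'blan', 'ctpr', 'name')

def parse_sort_arg(sort_arg):
    """Return (field_name, is_reverse), or None if the argument is not recognised."""
    is_reverse = sort_arg.startswith('r')
    body = sort_arg[1:] if is_reverse else sort_arg
    if body.isdigit() and int(body) < len(SORT_FIELDS):
        return SORT_FIELDS[int(body)], is_reverse
    if body in SORT_FIELDS:
        return body, is_reverse
    return None

for arg in ('0', 'file', 'r0', 'rfile', 'rcode', 'bogus'):
    print(arg, parse_sort_arg(arg))
```

Stripping the r prefix first is safe here only because none of the field names begins with r.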
The -c (cache) option is most useful when only the sort order of the output changes. To support it, the statistics are persisted with the json module:

CACHE_FILE = 'Counter.dump'
CACHE_DUMPER, CACHE_GEN = None, None

from json import dump, JSONDecoder
def CounterDump(data):
    global CACHE_DUMPER
    if CACHE_DUMPER == None: # open the cache file on first use
        CACHE_DUMPER = open(CACHE_FILE, 'w')
    dump(data, CACHE_DUMPER)

def ParseJson(jsonData):
    # yield the JSON documents stored back to back in jsonData, one at a time
    endPos = 0
    while True:
        jsonData = jsonData[endPos:].lstrip()
        try:
            pyObj, endPos = JSONDecoder().raw_decode(jsonData)
            yield pyObj
        except ValueError:
            break

def CounterLoad():
    global CACHE_GEN
    if CACHE_GEN == None:
        CACHE_GEN = ParseJson(open(CACHE_FILE, 'r').read())

    try:
        return next(CACHE_GEN)
    except StopIteration, e:
        return []

def shouldUseCache(keep, detail, basename, cache, target):
    if not cache: # caching was not requested
        return False

    try:
        (_keep, _detail, _basename, _target) = CounterLoad()
    except (IOError, EOFError, ValueError): # cache file missing, empty, or invalid
        return False

    if keep == _keep and detail == _detail and basename == _basename \
       and sorted(target) == sorted(_target):
        return True
    else:
        return False


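Note that JSON persistence raises character-encoding issues. For example, when a source file name contains GBK-encoded Chinese characters, the name should be converted to Unicode with unicode(os.path.basename(filePath), 'gbk') before it is written to detailCountInfo; otherwise dump raises an error. Fortunately, only source files used for testing are likely to contain Chinese characters, so encoding can usually be ignored.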
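The cache file holds several JSON documents written back to back, which json.load cannot read in one call. The sketch below shows, under that assumption, how JSONDecoder.raw_decode can read them one at a time. The name iter_json_documents is hypothetical, an in-memory buffer stands in for Counter.dump, and the cached values are sample data. Unlike the article's ParseJson(), this version lets a ValueError from malformed data propagate instead of stopping silently.

```python
import io
import json

def iter_json_documents(text):
    """Yield each JSON document from a string that holds several in a row."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Write two documents one after the other, as two CounterDump() calls would.
buf = io.StringIO()
json.dump((True, False, False, ['lctest']), buf)   # the command-line options
json.dump(([397, 259, 66, 83, 6], []), buf)        # the statistics
docs = list(iter_json_documents(buf.getvalue()))
print(docs)
```

Tuples come back as lists after the round trip, which is why the comparison in shouldUseCache() works on unpacked values rather than on the tuple itself.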


def main():
    global rawCountInfo, detailCountInfo
    (keep, detail, basename, sort, out, cache, target) = ParseCmdArgs()
    stream = sys.stdout if not out else open(out, 'w')
    SetSortArg(sort); LoadCExtLib()
    cacheUsed = shouldUseCache(keep, detail, basename, cache, target)
    if cacheUsed:
        try:
            (rawCountInfo, detailCountInfo) = CounterLoad()
        except (EOFError, ValueError), e: # unlikely to happen
            print >>sys.stderr, 'Unexpected Cache Corruption(%s), Try Counting Directly.'%e
            LineCounter(keep, not detail, basename, target)
    else:
        LineCounter(keep, not detail, basename, target)

    ReportCounterInfo(not detail, stream)
    CounterDump((keep, detail, basename, target))
    CounterDump((rawCountInfo, detailCountInfo))



if __name__ == '__main__':
    from time import clock
    startTime = clock()
    main()
    endTime = clock()
    print >>sys.stderr, 'Time Elasped: %.2f sec.' %(endTime-startTime)

1.2 Code Optimization

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define TRUE 1
#define FALSE 0

unsigned int CalcLinesCh(char *line, unsigned char isBlockComment[2]) {
    unsigned int lineType = 0;
    unsigned int lineLen = strlen(line);
    if(!lineLen)
        return lineType;

    char *expandLine = calloc(lineLen + 1/*\n*/, 1);
    if(NULL == expandLine)
        return lineType;
    memmove(expandLine, line, lineLen);
    expandLine[lineLen] = '\n'; // guard character so that iChar+1 stays in bounds

    unsigned int iChar = 0;
    unsigned char isLineComment = FALSE;
    while(iChar < lineLen) {
        if(expandLine[iChar] == ' ' || expandLine[iChar] == '\t') { // whitespace
            iChar += 1; continue;
        }
        else if(expandLine[iChar] == '/' && expandLine[iChar+1] == '/') { // line comment
            isLineComment = TRUE;
            lineType |= 2; iChar += 1; // skip the second '/'
        }
        else if(expandLine[iChar] == '/' && expandLine[iChar+1] == '*') { // block comment start
            isBlockComment[0] = TRUE;
            lineType |= 2; iChar += 1;
        }
        else if(expandLine[iChar] == '*' && expandLine[iChar+1] == '/') { // block comment end
            isBlockComment[0] = FALSE;
            lineType |= 2; iChar += 1;
        }
        else {
            if(isLineComment || isBlockComment[0])
                lineType |= 2;
            else
                lineType |= 1;
        }
        iChar += 1;
    }

    free(expandLine); // release the temporary copy
    return lineType; // Bitmap: 0 blank, 1 code, 2 comment, 3 code and comment
}

unsigned int CalcLinesPy(char *line, unsigned char isBlockComment[2]) {
    //isBlockComment[single quotes, double quotes]
    unsigned int lineType = 0;
    unsigned int lineLen = strlen(line);
    if(!lineLen)
        return lineType;

    char *expandLine = calloc(lineLen + 2/*\n\n*/, 1);
    if(NULL == expandLine)
        return lineType;
    memmove(expandLine, line, lineLen);
    // two guard characters so that iChar+2 stays in bounds
    expandLine[lineLen] = '\n'; expandLine[lineLen+1] = '\n';

    unsigned int iChar = 0;
    unsigned char isLineComment = FALSE;
    while(iChar < lineLen) {
        if(expandLine[iChar] == ' ' || expandLine[iChar] == '\t') { // whitespace
            iChar += 1; continue;
        }
        else if(expandLine[iChar] == '#') { // line comment
            isLineComment = TRUE;
            lineType |= 2;
        }
        else if(expandLine[iChar] == '\'' && expandLine[iChar+1] == '\''
                && expandLine[iChar+2] == '\'') { // single-quoted block comment
            if(isBlockComment[0] || isBlockComment[1])
                isBlockComment[0] = FALSE;
            else
                isBlockComment[0] = TRUE;
            lineType |= 2; iChar += 2;
        }
        else if(expandLine[iChar] == '"' && expandLine[iChar+1] == '"'
                && expandLine[iChar+2] == '"') { // double-quoted block comment
            if(isBlockComment[0] || isBlockComment[1])
                isBlockComment[1] = FALSE;
            else
                isBlockComment[1] = TRUE;
            lineType |= 2; iChar += 2;
        }
        else {
            if(isLineComment || isBlockComment[0] || isBlockComment[1])
                lineType |= 2;
            else
                lineType |= 1;
        }
        iChar += 1;
    }

    free(expandLine); // release the temporary copy
    return lineType; // Bitmap: 0 blank, 1 code, 2 comment, 3 code and comment
}



/* Variant without the temporary copy: the terminating '\0' of line
   serves as the guard character for the iChar+1 and iChar+2 lookups. */
#define TRUE 1
#define FALSE 0
unsigned int CalcLinesCh(char *line, unsigned char isBlockComment[2]) {
    unsigned int lineType = 0;

    unsigned int iChar = 0;
    unsigned char isLineComment = FALSE;
    while(line[iChar] != '\0') {
        if(line[iChar] == ' ' || line[iChar] == '\t') { // whitespace
            iChar += 1; continue;
        }
        else if(line[iChar] == '/' && line[iChar+1] == '/') { // line comment
            isLineComment = TRUE;
            lineType |= 2; iChar += 1; // skip the second '/'
        }
        else if(line[iChar] == '/' && line[iChar+1] == '*') { // block comment start
            isBlockComment[0] = TRUE;
            lineType |= 2; iChar += 1;
        }
        else if(line[iChar] == '*' && line[iChar+1] == '/') { // block comment end
            isBlockComment[0] = FALSE;
            lineType |= 2; iChar += 1;
        }
        else {
            if(isLineComment || isBlockComment[0])
                lineType |= 2;
            else
                lineType |= 1;
        }
        iChar += 1;
    }

    return lineType; // Bitmap: 0 blank, 1 code, 2 comment, 3 code and comment
}

unsigned int CalcLinesPy(char *line, unsigned char isBlockComment[2]) {
    //isBlockComment[single quotes, double quotes]
    unsigned int lineType = 0;

    unsigned int iChar = 0;
    unsigned char isLineComment = FALSE;
    while(line[iChar] != '\0') {
        if(line[iChar] == ' ' || line[iChar] == '\t') { // whitespace
            iChar += 1; continue;
        }
        else if(line[iChar] == '#') { // line comment
            isLineComment = TRUE;
            lineType |= 2;
        }
        else if(line[iChar] == '\'' && line[iChar+1] == '\''
                && line[iChar+2] == '\'') { // single-quoted block comment
            if(isBlockComment[0] || isBlockComment[1])
                isBlockComment[0] = FALSE;
            else
                isBlockComment[0] = TRUE;
            lineType |= 2; iChar += 2;
        }
        else if(line[iChar] == '"' && line[iChar+1] == '"'
                && line[iChar+2] == '"') { // double-quoted block comment
            if(isBlockComment[0] || isBlockComment[1])
                isBlockComment[1] = FALSE;
            else
                isBlockComment[1] = TRUE;
            lineType |= 2; iChar += 2;
        }
        else {
            if(isLineComment || isBlockComment[0] || isBlockComment[1])
                lineType |= 2;
            else
                lineType |= 1;
        }
        iChar += 1;
    }

    return lineType; // Bitmap: 0 blank, 1 code, 2 comment, 3 code and comment
}



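The author's Windows system initially did not have the Microsoft VC++ tools installed, so the DLL was compiled with the MinGW environment that was already present. Save the C code above as CalcLines.c and compile it with the following command: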
gcc -shared -o CalcLines.dll CalcLines.c

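Along the way, the author also tried other C extension tools such as PyInline. Download the archive from http://pyinline.sourceforge.net/, unpack it, and copy the PyInline-0.03 directory to Lib\site-packages. In a command prompt window, change to that directory and run python setup.py install to install PyInline.
Running the example produced the message BuildError: error: Unable to find vcvarsall.bat. After consulting online resources, the author downloaded and installed Microsoft Visual C++ Compiler for Python 2.7. In practice, however, PyInline proved very hard to use, so the attempt was abandoned.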

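Because of doubts about the quality of the MinGW build, the author finally decided to install VS2008 Express Edition. The 2008 version was chosen because the Windows build of CPython 2.7 is based on the VS2008 runtime library. After installation, cl.exe (the compiler) and link.exe (the linker) can be found in C:\Program Files\Microsoft Visual Studio 9.0\VC\bin. Once the environment variables are set as described in online tutorials, programs can be compiled and linked from the Visual Studio 2008 Command Prompt. Enter cl /help or cl -help to see the compiler options.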

_declspec(dllexport) unsigned int CalcLinesCh(char *line, unsigned char isBlockComment[2]) {...
_declspec(dllexport) unsigned int CalcLinesPy(char *line, unsigned char isBlockComment[2]) {...

cl /Ox /Ot /Wall /LD /FeCalcLines.dll CalcLines.c




Python 2.7.10 (b0a649e90b66, Apr 28 2016, 13:11:00)
[PyPy 5.1.1 with MSC v.1500 32 bit] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>> import cffi
>>>> cffi.__version__


D:\pytest>CPLineCounter -d lctest -s code
FileLines  CodeLines  CommentLines  BlankLines  CommentPercent  FileName
6          3          4             0           0.57            D:\pytest\lctest\hard.c
27         7          15            5           0.68            D:\pytest\lctest\file27_code7_cmmt15_blank5.py
33         19         15            4           0.44            D:\pytest\lctest\line.c
44         34         3             7           0.08            D:\pytest\lctest\test.c
44         34         3             7           0.08            D:\pytest\lctest\subdir\test.c
243        162        26            60          0.14            D:\pytest\lctest\subdir\CLineCounter.py
397        259        66            83          0.20            <Total:6 Code Files>
Time Elasped: 0.04 sec.
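The CommentPercent column and the totals row of this report can be checked by hand. The sketch below recomputes them from the per-file rows using the definition comment rate = comment lines / (comment lines + code lines). The helper comment_rate is a simplified, hypothetical stand-in for SafeDiv() that returns 0.0 for an empty file.

```python
def comment_rate(comment_lines, code_lines):
    """Comment lines divided by (comment lines + code lines)."""
    total = comment_lines + code_lines
    return float(comment_lines) / total if total else 0.0

# (file lines, code lines, comment lines, blank lines), copied from the report
rows = [
    (6, 3, 4, 0),
    (27, 7, 15, 5),
    (33, 19, 15, 4),
    (44, 34, 3, 7),
    (44, 34, 3, 7),
    (243, 162, 26, 60),
]

totals = [sum(column) for column in zip(*rows)]
rates = ['%.2f' % comment_rate(row[2], row[1]) for row in rows]
total_rate = '%.2f' % comment_rate(totals[2], totals[1])
print(totals, rates, total_rate)
```

For hard.c, code, comment, and blank lines add up to 7 although the file has only 6 lines, because a line holding both code and a comment is counted in both columns.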

2. Accuracy and performance evaluation





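Finally, counting performance was tested. On the author's Windows XP machine (Pentium G630, 2.7 GHz, 2 GB of RAM), 5857 C source files totalling close to ten million lines were counted. The performance of the tools above is shown in the table below. The table shows only the totals, although the line counts of each individual file were still computed. Note that for this test, linecount must have "include files with the same name when counting directories" checked, and cloc must be given the --skip-uniqueness and --by-file options.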







E:\PyTest>kernprof -l -v CPLineCounter.py source -d > out.txt
140872  93736  32106   16938  0.26   <Total:82 Code Files>
Wrote profile results to CPLineCounter.py.lprof
Timer unit: 2.79365e-07 s

Total time: 5.81981 s
File: CPLineCounter.py
Function: CountFileLines at line 143

Line #      Hits         Time  Per Hit   % Time  Line Contents
   143                                           @profile
   144                                           def CountFileLines(filePath, isRawReport=True, isShortName=False):
... ... ... ... ... ... ... ...
   162        82      7083200  86380.5     34.0      with open(filePath, 'r') as file:
   163    140954      1851877     13.1      8.9          for line in file:
   164    140872      6437774     45.7     30.9              lineType = CalcLines(fileType, line.strip(), isBlockComment)
   165    140872      1761864     12.5      8.5              lineCountInfo[0] += 1
   166    140872      1662583     11.8      8.0              if lineType == 0: lineCountInfo[3] += 1
   167    123934      1499176     12.1      7.2              elif lineType == 1: lineCountInfo[1] += 1
   168     32106       406931     12.7      2.0              elif lineType == 2: lineCountInfo[2] += 1
   169      1908        27634     14.5      0.1              elif lineType == 3: lineCountInfo[1] += 1; lineCountInfo[2] += 1
... ... ... ... ... ... ... ...


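line_profiler can be installed with pip install line_profiler. After adding the @profile decorator before the function to be evaluated, run the kernprof command; it reports the time spent on each line of the decorated function. The -l option requests line-by-line profiling, and the -v option prints the timing information to the screen after the run. Lines with large Hits (execution count) or Time (execution time) values have the most room for optimization.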
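One practical detail: the name profile exists only while the script runs under kernprof, so a decorated script fails with NameError when started normally. A common workaround, sketched below, is to define a no-op fallback. The function count_nonblank is a hypothetical example target, not part of CPLineCounter.

```python
try:
    profile  # kernprof injects this name into builtins when it runs the script
except NameError:
    def profile(func):
        """No-op stand-in so the script also runs without kernprof."""
        return func

@profile
def count_nonblank(lines):
    """Count the lines that contain something other than whitespace."""
    total = 0
    for line in lines:
        if line.strip():
            total += 1
    return total

nonblank = count_nonblank(['int a;', '', '  ', '// note'])
print(nonblank)
```

Run directly, the decorator does nothing; run through kernprof -l -v, the same file produces the per-line timing table.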


Finally, if all you need is a line count, the following shell commands can be used on Linux or Mac:
find ./codeDir -name "*.c" -or -name "*.h" | xargs cat | grep -v '^[[:space:]]*$' | wc -l #Total number of lines, excluding blank lines
find ./codeDir -name "*.c" -or -name "*.h" | xargs wc -l #Line count of each file, and the total


