Albedo Systems




COMMCONV.EXE V1.0

Conversion and simplification of Common NCSA/CERN format website logs

Copyright Albedo Systems Ltd 1998
http://www.albedo.co.uk/
24 March 1998

CONTENTS SUMMARY
COMMCONV.EXE is a stand-alone DOS program, written in C++, that runs under Windows NT/95. It converts conventional website logs in the Common NCSA/CERN format into page-only logs in a simpler format, retaining all useful information but rejecting all non-HTML elements and failed pages (for example, 404 Not Found). It also lets you exclude sub-domains of your site, which may actually be separate websites in their own right, and it identifies the point at which visitors first hit the site.

This program was written to support our commercial website log analysis program, LogSite, which we (naturally) commend strongly - it offers many unique features. See our website for details. However, you may well find COMMCONV.EXE useful in some other way - in which case, bon appetit.

Note: a Cold Fusion version of this program (CFX_) is available at http://www.albedo.co.uk/goodies/logsite/cflsite.cfm

Further note: if you are using a Common NCSA log and have a choice, move to the Combined NCSA/CERN format (for which we have also written a conversion program - COMBCONV.EXE). The combined format also returns useful information on the User Agents (e.g. browsers) hitting your site, and can tell you which pages your visitors are coming from.

CONDITIONS
COMMCONV.EXE is freeware. As such, it may be freely distributed in the form of this zip file, provided this readme file is retained. All technical queries, bug reports etc. with regard to its use should be addressed to admin@albedo.co.uk.

We'd, of course, appreciate it if you credited us/linked to us, but you know you don't have to. We'll be doing more stuff soon, so please do drop by our website at http://www.albedo.co.uk/

For more information on Cold Fusion itself and other custom tags, check out Allaire's site at http://www.allaire.com/.

WHAT IT DOES
The input file for this program must be in the Common NCSA/CERN format. A typical line of such a file would look like this:

255.255.255.255 www.mysite.co.uk - [24/Mar/1998:18:44:37 +0000] "GET /stuff/index.cfm HTTP/1.0" 200 12167

Not all of this information is frightfully relevant to a site user/owner, though a webmaster may find it useful. The 200, for example, simply means that this was a successful request. COMMCONV.EXE will convert this information into a simpler format:

"24-Mar-98", "18:44 ", "255.255.255.255", "", "d:\mysite\htdocs\stuff\index.cfm", "", "http://www.mysite.co.uk/"

...which breaks down into:

Date, Time, IP Address of visitor, Agent (not available, but included for compatibility), Page Accessed, Page Description (not available, but included for compatibility), Referring page (not strictly available, but see below).
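
The conversion above can be sketched in a few lines of Python. This is only an illustration of the mapping, not the C++ code inside COMMCONV.EXE: the field layout is inferred from the single sample line, and the names parse_line and format_record are hypothetical.

```python
import re
from datetime import datetime

# Layout inferred from the sample line only: host, server name, user,
# [timestamp], "request", status, bytes. This is an assumption.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) (?P<server>\S+) (?P<user>\S+) '
    r'\[(?P<stamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)(?: (?P<proto>[^"]*))?" '
    r'(?P<status>\d{3}) (?P<size>\S+)$'
)

def parse_line(line):
    """Return the fields of one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    when = datetime.strptime(m.group("stamp"), "%d/%b/%Y:%H:%M:%S %z")
    return {
        "ip": m.group("ip"),
        "when": when,
        "url": m.group("url"),
        "status": int(m.group("status")),
    }

def format_record(rec, path="", referrer=""):
    """Build the seven-field output line; Agent and Description stay empty."""
    local = path + rec["url"].lstrip("/").replace("/", "\\")
    fields = [
        rec["when"].strftime("%d-%b-%y"),
        rec["when"].strftime("%H:%M "),  # trailing space as in the sample
        rec["ip"],
        "",        # Agent: not available in Common logs
        local,     # Page Accessed
        "",        # Page Description: not available
        referrer,  # Referring page
    ]
    return ", ".join('"%s"' % f for f in fields)
```

Feeding the sample input line to parse_line, then calling format_record with a path of d:\mysite\htdocs\ and a referrer of http://www.mysite.co.uk/, reproduces the sample output line.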

Additionally, non-page lines (for example gifs) are ignored - only file extensions .htm* (includes .htm, .html, .html-ssi etc.) and .cfm are allowed through. All failed page requests are ignored, too. Optional parameters can be set so that all files in a particular sub-domain (i.e. a directory) are ignored as well.
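
As a rough sketch of that filtering rule, in Python: the function name is_page is hypothetical, and two details are assumptions rather than documented behaviour, namely that extensions are compared without regard to case and that a status of 400 or above counts as a failed request.

```python
def is_page(url, status):
    """Return True if a request looks like a successful page hit."""
    # Drop any query string, then isolate the file name and extension.
    path = url.split("?", 1)[0].lower()   # case-insensitive: an assumption
    name = path.rsplit("/", 1)[-1]
    dot = name.rfind(".")
    ext = name[dot:] if dot != -1 else ""

    # Only .htm* (.htm, .html, .html-ssi ...) and .cfm are let through.
    if not (ext.startswith(".htm") or ext == ".cfm"):
        return False

    # Treating 400 and above as "failed" is an assumption of this sketch.
    return status < 400
```

Note that a request for a bare directory such as / has no extension and is rejected by this sketch; how the real program treats such lines is not stated here.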

Although, as we have noted, Common logs do not contain page referral information, COMMCONV.EXE uses this field in the output to show which pages it believes are being accessed from within the site, and which from outside. It does this by assuming that any pages accessed by a user within 15 minutes of each other are part of the same visit. Our logging program, LogSite, needs this information.
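
The 15-minute rule might look something like the Python sketch below. It is a guess at the idea, not the program's actual logic: it assumes a "user" is identified by IP address, that the gap is measured from that user's previous page, and that outside hits get an empty referral field. The name tag_referrals is hypothetical.

```python
from datetime import datetime, timedelta

VISIT_GAP = timedelta(minutes=15)

def tag_referrals(hits, domain_name):
    """hits: (ip, when) pairs in chronological order.

    Returns one referral string per hit: domain_name when the hit is
    judged to be part of an ongoing visit, otherwise an empty string.
    """
    last_seen = {}   # ip -> time of that user's previous page
    referrals = []
    for ip, when in hits:
        previous = last_seen.get(ip)
        internal = previous is not None and when - previous <= VISIT_GAP
        referrals.append(domain_name if internal else "")
        last_seen[ip] = when
    return referrals
```

With this approach, a visitor who returns after a 30-minute pause is counted as arriving from outside again, which is how the first hit of each visit can be picked out.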

The next version, when we get around to writing it, will allow the user to select permissible extensions, and add page description fields.

INSTALLATION
The zip file that contains this file should contain:
  • commdocd.cfm: this file.
  • commconv.exe: the conversion program.
  • Albedo6.gif, back.gif, rule_pnk.gif, dot_clr.gif: Formatting graphics for this readme file.
Just extract the .exe file and put it where you want to run it.

USAGE
Commconv is run from the DOS prompt. It also requires a parameter file to run successfully. You may put this wherever you want, but the program defaults to a file called Combconv.ini placed in the same directory as the program. To use a different parameter file, the usage is commconv [filename], where [filename] is the full pathname of the parameter file.

Here's an example of a parameter file - all the attributes must be on separate lines. (Bear in mind that not all of these parameters are mandatory.)

LOG_IN="c:\webs\mysite\logs\mysite.log"
LOG_OUT="c:\webs\mysite\logs\new.log"
DOMAIN_NAME="http://www.mysite.co.uk/"
SUB_DOMAIN1="sites/"
SUB_DOMAIN2="homepages/"
PATH="c:\webs\mysite\htdocs\"

Most of these are easy to understand (PATH is there to preserve compatibility with LogSite). They are:

LOG_IN (mandatory) path specification for the log file that you wish to convert.

LOG_OUT (mandatory) path specification for the output log file.

DOMAIN_NAME (optional, defaults to NULL) If a page is determined by the program to be an internal referral (i.e. someone has clicked through to a page from one of your other pages) then this is placed in the referral field to tag the line as such. It allows our log analysis program, LogSite, to do a somewhat more thorough job than would otherwise be possible (it analyses visits by checking for outside referrals).

SUB_DOMAIN1 and SUB_DOMAIN2 (optional, default to NULL) search strings that allow you to exclude parts of your site from the log. Use with caution - for example, "sites/" would not merely exclude everything in http://www.mysite.co.uk/sites/ but also: http://www.mysite.co.uk/other_stuff/sites/. Just make sure they're unique.
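
To see why caution is needed, here is a small Python sketch of plain substring exclusion. It assumes the search strings are matched anywhere in the requested URL, as the warning above suggests; the name is_excluded is hypothetical and the real matching code may differ.

```python
def is_excluded(url, sub_domains):
    """Return True if any non-empty search string occurs in the URL."""
    return any(s and s in url for s in sub_domains)

subs = ["sites/", "homepages/"]

print(is_excluded("/sites/fred/index.htm", subs))         # intended match
print(is_excluded("/other_stuff/sites/index.htm", subs))  # also matches
print(is_excluded("/websites/index.htm", subs))           # matches too
print(is_excluded("/stuff/index.cfm", subs))              # kept
```

In this sketch even a directory called websites/ is caught by "sites/", so it pays to check your search strings against your real directory names before relying on them.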

PATH (optional, defaults to NULL). The function of this parameter is simply to provide output compatible with our LogSite program, which itself was designed to work with Cold Fusion user logs. Basically, it helps to convert the server page template name to a local path, hence "stuff/index.cfm" could become "d:\mysite\htdocs\stuff\index.cfm".
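
For anyone wanting to generate or check parameter files from a script, the format and the PATH mapping can be sketched in Python as below. The names read_params and to_local_path are hypothetical, and the parsing rules (one KEY="value" pair per line, surrounding quotes stripped, blank lines skipped) are inferred from the example file rather than taken from the program itself. What the program does when PATH is left empty is likewise an assumption here.

```python
def read_params(text):
    """Parse KEY="value" lines into a dict; LOG_IN and LOG_OUT are required."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        # Strip one pair of surrounding double quotes, if present.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        params[key.strip().upper()] = value
    missing = [k for k in ("LOG_IN", "LOG_OUT") if k not in params]
    if missing:
        raise ValueError("missing mandatory parameter(s): " + ", ".join(missing))
    return params

def to_local_path(url, path=""):
    """Turn a server page name into a local path by prefixing PATH."""
    return path + url.lstrip("/").replace("/", "\\")

SAMPLE = r'''
LOG_IN="c:\webs\mysite\logs\mysite.log"
LOG_OUT="c:\webs\mysite\logs\new.log"
DOMAIN_NAME="http://www.mysite.co.uk/"
PATH="c:\webs\mysite\htdocs\"
'''

params = read_params(SAMPLE)
print(to_local_path("/stuff/index.cfm", params.get("PATH", "")))
```

With the sample parameters this prints c:\webs\mysite\htdocs\stuff\index.cfm.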

OTHER ALBEDO SOFTWARE
If you liked this, check out our other software:

LogSite, which this program was designed to work with, is a comprehensive site logging program. It is commercial, but inexpensive, with some unique features. Take a look at its output, or download a demo from http://www.albedo.co.uk/goodies/logsite/doslsite.cfm

You may also like our CFX_FONT Cold Fusion tag, downloadable from http://www.albedo.co.uk/goodies/cfgfx.cfm. It allows you to create masses of anti-aliased text graphics and has oodles of options.

Oh, and of course, don't forget to pick up our other log conversion programs. COMBCONV.EXE converts the NCSA/CERN Combined format, and there are Cold Fusion versions (CFX_) of both programs at http://www.albedo.co.uk/goodies/logsite/cflsite.cfm

Fin Fahey
Fiona Daly

LEGAL DISCLAIMER
Neither Albedo Systems Ltd. nor anyone else who has been involved in the creation, production or delivery of this product shall be liable for any direct, indirect, consequential or incidental damages (including damages for loss of business profits, business interruption, loss of business information, and the like) arising out of the use or inability to use this product even if Albedo Systems Ltd. has been advised of the possibility of such damages.