Albedo Systems




CFX_COMMCONV V1.0

Conversion and simplification of Common NCSA/CERN format website logs

Copyright Albedo Systems Ltd 1998
http://www.albedo.co.uk/
24 March 1998

SUMMARY
CFX_COMMCONV is an extension tag for Allaire's Cold Fusion active web page language, written in C++ and running under Windows NT/95. It converts conventional website logs in the Common NCSA/CERN format into page-only logs in a simpler format, retaining all useful information but rejecting all non-HTML elements and failed pages (for example, 404 Not Found). It also lets you exclude sub-domains of your site (which may actually be separate websites in their own right), and it identifies the point at which visitors first hit the site.

This program was written to support our commercial website log analysis program, LogSite, which we (naturally) commend strongly - it offers many unique features. See our website for details. However, you may well find CFX_COMMCONV useful in some other way - in which case, bon appetit.

Note: a stand-alone DOS version of this program (.exe) is available at http://www.albedo.co.uk/goodies/logsite/doslsite.cfm

Further note: if you are using a Common NCSA log and have a choice, move to the Combined NCSA/CERN format (for which we have also written a conversion program - CFX_COMBCONV). The combined format also returns useful information on the User Agents (e.g. browsers) hitting your site, and can tell you which pages your visitors are coming from.

CONDITIONS
CFX_COMMCONV is freeware. As such, it may be freely distributed in the form of this zip file, provided this readme file is retained. All technical queries, bug reports etc. with regard to its use should be addressed to admin@albedo.co.uk.

We'd, of course, appreciate it if you credited us/linked to us, but you know you don't have to. We'll be doing more stuff soon, so please do drop by our website at http://www.albedo.co.uk/

For more information on Cold Fusion itself and other custom tags, check out Allaire's site at http://www.allaire.com/.

WHAT IT DOES
The input file for this tag must be in the Common NCSA/CERN format. A typical line of such a file would look like this:

255.255.255.255 www.mysite.co.uk - [24/Mar/1998:18:44:37 +0000] "GET /stuff/index.cfm HTTP/1.0" 200 12167

Not all of this information is frightfully relevant to a site user/owner, though a webmaster may find it useful. The 200, for example, simply means that this was a successful request. CFX_COMMCONV will convert this information into a simpler format:

"24-Mar-98", "18:44 ", "255.255.255.255", "", "d:\mysite\htdocs\stuff\index.cfm", "", "http://www.mysite.co.uk/"

...which breaks down into:

Date, Time, IP Address of visitor, Agent (not available, but included for compatibility), Page Accessed, Page Description (not available, but included for compatibility), Referring page (not strictly available, but see below).
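
The conversion described above can be sketched in a few lines of Python. This is only an illustration of the field mapping, not the code inside the DLL: the regular expression, the function name convert_line and its path and referrer arguments are all assumptions made for the example.

```python
import re
from datetime import datetime

# Assumed layout, taken from the sample line above:
# IP, host, ident, [timestamp], "METHOD url protocol", status, bytes.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) (?P<host>\S+) (?P<ident>\S+) '
    r'\[(?P<stamp>[^\]]+)\] "(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)$'
)

def convert_line(line, path="", referrer=""):
    """Return the simplified output line, or None if the input does not parse."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Drop the time-zone offset and parse the rest of the timestamp.
    stamp = datetime.strptime(m.group("stamp").split()[0], "%d/%b/%Y:%H:%M:%S")
    # Turn the URL path into a local Windows-style path (hypothetical helper logic).
    page = path + m.group("url").lstrip("/").replace("/", "\\")
    fields = [
        stamp.strftime("%d-%b-%y"),
        stamp.strftime("%H:%M") + " ",  # the sample output keeps a trailing space
        m.group("ip"),
        "",        # Agent: not available in Common logs
        page,
        "",        # Page Description: not available
        referrer,  # filled in by the caller
    ]
    return ", ".join('"%s"' % f for f in fields)

sample = ('255.255.255.255 www.mysite.co.uk - [24/Mar/1998:18:44:37 +0000] '
          '"GET /stuff/index.cfm HTTP/1.0" 200 12167')
print(convert_line(sample, path="d:\\mysite\\htdocs\\",
                   referrer="http://www.mysite.co.uk/"))
```

With the sample line and the arguments shown, the sketch reproduces the output line given above.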

Additionally, non-page lines (for example, GIFs) are ignored: only file extensions .htm* (which includes .htm, .html, .html-ssi etc.) and .cfm are allowed through. All failed page requests are ignored too. Optional parameters can be set so that all files in a particular sub-domain (i.e. a directory) are also ignored.
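
As a rough illustration of this filtering rule, the Python sketch below keeps a hit only if its extension starts with "htm" or is "cfm" and the request succeeded. The function name is_page_hit is hypothetical, and treating only 2xx status codes as success is an assumption; the tag's own definition of a failed request may differ.

```python
def is_page_hit(url, status):
    """True if the request looks like a successful page view."""
    path = url.split("?", 1)[0].lower()   # ignore any query string
    name = path.rsplit("/", 1)[-1]        # last path component
    ext = name.rsplit(".", 1)[-1] if "." in name else ""
    wanted = ext.startswith("htm") or ext == "cfm"
    succeeded = 200 <= status < 300       # assumption: 2xx means success
    return wanted and succeeded

print(is_page_hit("/stuff/index.cfm", 200))   # page, kept
print(is_page_hit("/images/logo.gif", 200))   # graphic, dropped
print(is_page_hit("/missing.htm", 404))       # failed request, dropped
```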

Although, as we have noted, Common logs do not contain page referral information, CFX_COMMCONV uses this field in the output to show which pages it believes are being accessed from within the site, and which from outside. It does this by assuming that any pages accessed by a user within 15 minutes of each other are part of the same visit. Our logging program, LogSite, needs this information.
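
The 15-minute rule can be sketched as follows. This is a minimal Python illustration, assuming that a "user" is identified by IP address and that hits arrive in chronological order; the name tag_referrals is hypothetical and the real tag may handle edge cases differently.

```python
from datetime import datetime, timedelta

VISIT_GAP = timedelta(minutes=15)

def tag_referrals(hits, domain_name):
    """hits: (ip, timestamp) pairs in time order.
    Returns one referrer string per hit: domain_name for hits judged
    internal, "" for hits judged to start a new visit."""
    last_seen = {}
    referrers = []
    for ip, when in hits:
        previous = last_seen.get(ip)
        internal = previous is not None and when - previous <= VISIT_GAP
        referrers.append(domain_name if internal else "")
        last_seen[ip] = when
    return referrers

hits = [
    ("255.255.255.255", datetime(1998, 3, 24, 18, 44)),  # first hit: new visit
    ("255.255.255.255", datetime(1998, 3, 24, 18, 50)),  # 6 minutes later: same visit
    ("255.255.255.255", datetime(1998, 3, 24, 19, 10)),  # 20 minutes later: new visit
]
print(tag_referrals(hits, "http://www.mysite.co.uk/"))
```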

The next version, when we get around to writing it, will allow the user to select permissible extensions, and add page description fields.

INSTALLATION
The zip file that contains this file should contain:
  • commdoc.cfm: this file.
  • cfx_commconv.dll: the Cold Fusion extension dll
  • Albedo6.gif, back.gif, rule_pnk.gif, dot_clr.gif: Formatting graphics for this readme file.
To get started, decide, or get your server admin person to decide, where you are going to put extension Cold Fusion tags on your server, then put CFX_COMMCONV.dll in that directory. Then fire up the Cold Fusion administrator program. With Cold Fusion 2 on O'Reilly Website, it's called cfmadm20.exe and lives in website/cfusion/bin. Cold Fusion 3 uses a different method: the administrator program is itself written in Cold Fusion, and is in website/cfide/administrator.

Click on the 'cfx tags' tab or button, then click on Add. Enter the required info, and make sure the library is not persistent in memory.

USAGE
Here's an example, with all the attributes on separate lines to make it clear. (Bear in mind that not all of these parameters are mandatory.)

<CFX_COMMCONV
LOG_IN="c:\webs\mysite\logs\mysite.log"
LOG_OUT="c:\webs\mysite\logs\new.log"
DOMAIN_NAME="http://www.mysite.co.uk/"
SUB_DOMAIN1="sites/"
SUB_DOMAIN2="homepages/"
PATH="c:\webs\mysite\htdocs\"
>

Most of these are easy to understand (PATH is there to preserve compatibility with CFX_LOGSITE). They are:

LOG_IN (mandatory) path specification for the log file that you wish to convert.

LOG_OUT (mandatory) path specification for the output log file.

DOMAIN_NAME (optional, defaults to NULL) If a page is determined by the program to be an internal referral (i.e. someone has clicked through to a page from one of your other pages) then this is placed in the referral field to tag the line as such. It allows our log analysis program, LogSite, to do a somewhat more thorough job than would otherwise be possible (it analyses visits by checking for outside referrals).

SUB_DOMAIN1 and SUB_DOMAIN2 (optional, default to NULL) search strings that allow you to exclude parts of your site from the log. Use with caution - for example, "sites/" would not merely exclude everything in http://www.mysite.co.uk/sites/ but also: http://www.mysite.co.uk/other_stuff/sites/. Just make sure they're unique.
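
To see why the search strings need care, here is a small Python sketch of a plain substring match, which is how the behaviour described above reads. The function name is_excluded is hypothetical, and case-sensitive matching is an assumption.

```python
def is_excluded(url, sub_domains):
    """True if any non-empty search string occurs anywhere in the URL."""
    return any(s and s in url for s in sub_domains)

excluded = ["sites/", "homepages/"]
for url in ("/sites/fred/index.htm",
            "/other_stuff/sites/index.htm",
            "/websites/index.htm",
            "/stuff/index.cfm"):
    print(url, is_excluded(url, excluded))
```

Note that under this reading "/websites/index.htm" is caught too, because "sites/" appears inside "websites/". A longer, more specific string such as "/sites/" would avoid that kind of accidental match.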

PATH (optional, defaults to NULL). The function of this parameter is simply to provide output compatible with our CFX_LOGSITE tag, which itself was designed to work with Cold Fusion user logs. Basically, it helps to convert the server page template name to a local path, hence "stuff/index.cfm" could become "d:\mysite\htdocs\stuff\index.cfm".

OTHER ALBEDO SOFTWARE
If you liked this, check out our other software:

CFX_LOGSITE, which this tag was designed to work with, is a comprehensive site logging program. It is commercial, but inexpensive, with some quite unique features. Take a look at its output, or download a demo from http://www.albedo.co.uk/goodies/logsite/cflsite.cfm

You may also like our CFX_FONT tag, downloadable from http://www.albedo.co.uk/goodies/cfgfx.cfm. It allows you to create masses of anti-aliased text graphics and has oodles of options.

Oh, and of course, don't forget to pick up our other log conversion programs. CFX_COMBCONV converts the NCSA/CERN Combined format, and there are stand-alone DOS versions (.exe) of both programs at http://www.albedo.co.uk/goodies/logsite/doslsite.cfm

Fin Fahey
Fiona Daly

LEGAL DISCLAIMER
Neither Albedo Systems Ltd. nor anyone else who has been involved in the creation, production or delivery of this product shall be liable for any direct, indirect, consequential or incidental damages (including damages for loss of business profits, business interruption, loss of business information, and the like) arising out of the use or inability to use this product even if Albedo Systems Ltd. has been advised of the possibility of such damages.