Albedo Systems




COMBCONV.EXE V1.0

Conversion and simplification of Combined NCSA/CERN format website logs

Copyright Albedo Systems Ltd 1998
http://www.albedo.co.uk/
24 March 1998

CONTENTS SUMMARY
COMBCONV.EXE is a stand-alone DOS program, written in C++, that runs under Windows NT/95. It converts conventional website logs in the Combined NCSA/CERN format into page-only logs in a simpler format. It retains all the useful information but rejects non-page elements (such as images) and failed requests (for example, 404 Not Found errors). It also allows you to exclude sub-domains (directories) of your site, which may actually be separate websites in their own right.

This program was written to support our commercial website log analysis program, LogSite, which we (naturally) commend strongly - it offers many unique features. See our website for details. However, you may well find COMBCONV.EXE useful in some other way - in which case, bon appetit.

Note: a Cold Fusion version of this program (CFX_) is available at http://www.albedo.co.uk/goodies/logsite/cflsite.cfm

CONDITIONS
COMBCONV.EXE is freeware. As such, it may be freely distributed in the form of this zip file, provided this readme file is retained. All technical queries, bug reports etc. with regard to its use should be addressed to admin@albedo.co.uk.

We'd, of course, appreciate it if you credited us/linked to us, but you know you don't have to. We'll be doing more stuff soon, so please do drop by our website at http://www.albedo.co.uk/


WHAT IT DOES
The input file for this program must be in the Combined NCSA/CERN format. A typical line of such a file would look like this:

127.0.0.1 255.255.255.255 - [24/Mar/1998:18:44:37 +0000] "GET /stuff/index.cfm HTTP/1.0" 200 8300 "http://someone_else/links.htm" "Mozilla/4.04"

Not all of this information is frightfully relevant to a site user/owner, though a webmaster may find it useful. The 200 (the HTTP status code), for example, simply means that this was a successful request. COMBCONV.EXE will convert this information into a simpler format:

"24-Mar-98", "18:44 ", "255.255.255.255", "Mozilla/4.04", "d:\mysite\htdocs\stuff\index.cfm", "", "http://someone_else/links.htm"

...which breaks down into:

Date, Time, IP Address of visitor, Browser (or other agent) used, Page Accessed, Page Description (not available, but included for compatibility), Referring page

Additionally, non-page lines (for example, GIFs) are ignored: only the file extensions .htm* (which includes .htm, .html, .html-ssi etc.) and .cfm are allowed through. All failed page requests are ignored, too. Optional parameters can be set so that all files in a particular sub-domain (i.e. a directory) are ignored as well.

The next version, when we get around to writing it, will allow the user to select permissible extensions and will add page description fields.

INSTALLATION
The zip file that contains this file should contain:
  • combdocd.cfm: this file.
  • combconv.exe: the log conversion program.
  • Albedo6.gif, back.gif, rule_pnk.gif, dot_clr.gif: Formatting graphics for this readme file.
Just extract the .exe file and put it where you want to run it.

USAGE
Combconv runs from the DOS prompt. It also requires a parameter file to run successfully. You may put this file wherever you want, but by default the program looks for a file called Combconv.ini in the same directory as the program. If you wish to use a different parameter file, the usage is combconv [filename], where [filename] is the full pathname of the parameter file.

Here's an example of a parameter file. Each parameter must be on a separate line. (Bear in mind that not all of these parameters are mandatory.)

LOG_IN="c:\webs\mysite\logs\mysite.log"
LOG_OUT="c:\webs\mysite\logs\new.log"
SUB_DOMAIN1="sites/"
SUB_DOMAIN2="homepages/"
PATH="c:\webs\mysite\htdocs\"

Most of these are easy to understand (PATH is there to preserve compatibility with LogSite). They are:

LOG_IN (mandatory) path specification for the log file that you wish to convert.

LOG_OUT (mandatory) path specification for the output log file.

SUB_DOMAIN1 and SUB_DOMAIN2 (optional, default to NULL) search strings that allow you to exclude parts of your site from the log. Use with caution: a string is matched anywhere in the page's path. For example, "sites/" would exclude not merely everything in http://www.mysite.co.uk/sites/ but also everything in http://www.mysite.co.uk/other_stuff/sites/. Just make sure your strings are unique to the directories you want to exclude.

PATH (optional, defaults to NULL). This parameter exists simply to provide output compatible with our LogSite program, which itself was designed to work with Cold Fusion user logs. Basically, it converts the server page template name to a local path: for example, "stuff/index.cfm" could become "d:\mysite\htdocs\stuff\index.cfm".

OTHER ALBEDO SOFTWARE
If you liked this, check out our other software:

LogSite, which this program was designed to work with, is a comprehensive site logging program. It is commercial, but inexpensive, with some quite unique features. Take a look at its output, or download a demo from http://www.albedo.co.uk/goodies/logsite/doslsite.cfm

You may also like our Cold Fusion CFX_FONT tag, downloadable from http://www.albedo.co.uk/goodies/cfgfx.cfm. It allows you to create masses of anti-aliased text graphics and has oodles of options.

Oh, and of course, don't forget to pick up our other log conversion programs. COMMCONV.EXE converts the NCSA/CERN common format, and there are Cold Fusion versions (CFX_) of both programs at http://www.albedo.co.uk/goodies/logsite/cflsite.cfm

Fin Fahey
Fiona Daly

LEGAL DISCLAIMER
Neither Albedo Systems Ltd. nor anyone else who has been involved in the creation, production or delivery of this product shall be liable for any direct, indirect, consequential or incidental damages (including damages for loss of business profits, business interruption, loss of business information, and the like) arising out of the use or inability to use this product even if Albedo Systems Ltd. has been advised of the possibility of such damages.