Log Analysis

After normalisation comes the analysis process. The analysis process is responsible for extracting, inferring or deriving other information from the logged data. Since the superservice's logged data is in a standard format, the analysers are generic in the sense that they can operate on all of the superservice's supported log formats, provided that the product logged the information the analyser requires. The analysis process is shown in Figure 1.3.

Figure 1.3. The Log Analysis Process

Since each analyser can either add information to an existing DLF or create a new one, each analyser generates data according to a special kind of schema.

Lire's framework includes two kinds of analysers. The difference between the two lies in the mapping between the source data and the new data they generate. Extended analysers generate new data for each DLF record, whereas derived analysers are used when the new data does not have a one-to-one mapping with the source data.

The analysers produce data according to a data model which is specified in other DLF schemas. There are extended schemas and derived schemas. An extended schema simply adds new fields to the base superservice's schema. For example, in the web superservice's schema, a lot of information can be obtained from the referer field. From this information, it is possible to guess the user's browser, language or operating system. Those fields are specified in the www-referer extended schema; one analyser is responsible for extracting this information from the referer field.
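The one-to-one mapping of an extended analyser can be illustrated with a minimal sketch. It is written in Python for brevity and is not Lire's Perl API. The record layout, the `dlf_id` key and the `referer_host` field are hypothetical names chosen for illustration; they are not the actual fields of the www-referer schema.

```python
from urllib.parse import urlsplit


def extend_records(records):
    """Extended-style analyser: yield exactly one new record per source record."""
    for record in records:
        referer = record.get("referer") or ""
        # hostname is None when the referer is empty or has no network location
        host = urlsplit(referer).hostname or ""
        # dlf_id (hypothetical) ties the added field back to its source record
        yield {"dlf_id": record["dlf_id"], "referer_host": host}


# Hypothetical base records; a real DLF record carries many more fields.
base = [
    {"dlf_id": 1, "referer": "http://www.example.com/search?q=lire"},
    {"dlf_id": 2, "referer": None},
]
extended = list(extend_records(base))
```

The point of the sketch is only that the output has as many records as the input, each keyed to its source record, so the new fields can be read as extra columns of the base schema.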

Sometimes, however, the analysis cannot simply add information to each event record; an altogether different schema is then needed. The derived schema covers those cases. An example of the use of such a schema in the current Lire distribution is the analyser which creates user sessions based on the logged client IP address and user agent. This analyser defines the www-session derived schema.
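To show why session building does not fit a one-to-one mapping, here is a minimal Python sketch of a derived-style analyser. It is not the Lire implementation. The field names (`time`, `client_ip`, `user_agent`), the output layout and the 30-minute inactivity timeout are all assumptions made for the example.

```python
SESSION_TIMEOUT = 1800  # seconds of inactivity that close a session (assumed value)


def derive_sessions(records, timeout=SESSION_TIMEOUT):
    """Derived-style analyser: group request records into session records."""
    sessions = []
    open_sessions = {}  # (client_ip, user_agent) -> most recent session for that client
    for record in sorted(records, key=lambda r: r["time"]):
        key = (record["client_ip"], record["user_agent"])
        session = open_sessions.get(key)
        # Start a new session for an unseen client or after a long silence
        if session is None or record["time"] - session["end"] > timeout:
            session = {
                "client_ip": key[0],
                "user_agent": key[1],
                "start": record["time"],
                "end": record["time"],
                "requests": 0,
            }
            open_sessions[key] = session
            sessions.append(session)
        session["end"] = record["time"]
        session["requests"] += 1
    return sessions


# Hypothetical request records with times in seconds
requests = [
    {"time": 0, "client_ip": "192.0.2.1", "user_agent": "A"},
    {"time": 60, "client_ip": "192.0.2.1", "user_agent": "A"},
    {"time": 90, "client_ip": "192.0.2.2", "user_agent": "B"},
    {"time": 4000, "client_ip": "192.0.2.1", "user_agent": "A"},
]
sessions = derive_sessions(requests)
```

Here four request records yield three session records, so the output cannot be expressed as extra fields on each source record and needs a schema of its own.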

Analysers are simple Perl modules that receive the base superservice's DLF records and output DLF records in the extended or derived schema. The architecture supports cascading of schemas, although this feature is not currently used anywhere.