jBASE General Index Information

Definition	Description
External Representation	This part describes how the index data is stored in "internal" or "external" format and when you are doing selections against the data whether it uses "internal" or "external" representation to do the query against.
Multi-values	This part describes how multi-values are handled in jBASE and the limitations of using a multi-valued field when creating an index definition
Sync	This part describes what is know as "in sync" for a jBASE index
Regular Expressions	This part describes regular expressions, which is a Unix based mechanism for doing pattern matching when searching for a text string.
Universal Co-ordinated Time	This part describes what Universal Co-ordinate Time is, or confusingly UTC as the acronym.
jQL Usage	This part shows how the jQL commands such as SELECT or COUNT can use the index data to satisfy the request.

In jBASE there is a concept of internal and external representation of data. These differences usually materialise in fields where dates and times are stored, but are not necessarily limited to dates and times.

An "external" representation of a date is that which a user might understand, such as "15 JULY 1956" or "8/22/87". The data in the file would normally be stored in "internal" representation, and in the case of dates this is the number of days since 31 DEC 1967.

When you create an index definition you might, for example, want to create it on a date field that is stored in the internal representation of a date. The following example creates an index on attribute 7, and it does so as a right justified (i.e. numeric) sort so that the numbers in attribute 7 are sorted numerically (i.e. sorted in ascending dates)

If we assume attribute 7 is the order date stored as a numeric value (i.e. count of days since 31 DEC 1967) then the following command would select all orders received on and after 1st January 1997.

Obviously this isn't a particularly friendly sort of command ! The answer is to use the "look-up" code definition when an index is first created. When this is defined it means that any subsequent queries against the index from key-select or query-index will have that input string converted from an external representation to an internal representation. The following is a modification of the above example to show the index being created with the -lD option (which causes input data to be passed through the "D" conversion) and the modified, easier to use, version of key-select

The look-up code only applies to the key-select or query-index command. Statements inside jBC code such as SELECTINDEX or SELECT .... ATKEY will continue to use the internal representation and no lookup code will be applied. The following is an example of jBC code to do the equivalent of the above key-select statement.

When an index is created, by default jBASE assumes it is a multi-value index. Consider the following creation of an index.

Let us further assume DESCRIPTION is a DICT record that defines attribute 1. Finally let us assume just a single record exists like this :

This shows there is one record called "Jim" who has 3 products assigned to him, SLIPPERS, PIPE and BOOKS. As we have created an index on attribute 1, there will be 3 index keys created, one for each value.

This default usage of multi-values breaks down when multiple attributes are defined in the index definition. Consider the following index definition.

In the above example we concatenate attribute 1 with attribute 2 to create the index definition. If we assume both attributes are multi-valued, then what does the above actually mean? Does it mean we take the 3 values in attribute 1, add the 3 values in attribute 2, this creates 6 values and so we create a total of 6 index keys ? Or does it mean we take value 1 from both attribute 1 and 2 and concatenate them to create a single value ? If so, what happens when they have differing number of values in each attribute ?

As you can see, once we use 2 or more attributes the multi-value approach becomes difficult to solve in a logical and consistent manner. Therefore we will flag an error and the above example of create-index will fail.

If you want to do a concatenation or some other complex definition, you can only do this in non-multi-value mode. This is performed using the -m or (M) option to create-index. With this option we will not do a multi-value extract but instead treat the entire attribute as a single entity as though it had no multi-value marks. Therefore the above create-index example could be replaced by :

If you really want to have a go at multi-value support , then you will need to call a subroutine to perform it, and do the coding yourself in jBC code. In the following example we call a subroutine called CONCAT. This subroutine will be called with attribute 5 (in this example) passed. However the subroutine will also have the entire record passed to it in another parameter, and in your SUBROUTINE code you can do whatever you like. The resulting output can have as many values as you like, created however you like.

Once an index has been created with create-index , and assuming the index data is in sync, then you can start to use the index data with special commands such as key-select or with jBC statements such as SELECTINDEX.

Another way the index data can be used is from jQL commands such as COUNT or SELECT. This occurs automatically and will dramatically speed up the operation of the jQL statements.

For example, let us assume you create an index called DESCR on a file called PRODUCTS like this

This usage of create-index uses an existing DICT definition called DESCR and create and index name of DESCR. If you now do a SELECT against the file like this

Then in the above example the SELECT command will automatically use the index instead of the usual mechanism of searching through the entire file. You could have similarly used the following command instead :

Note the index could equally have been created with a command like the following, it is just we showed the alternative syntax for create-index as this used an existing DICT definition to create the index (the command syntax is easier but less powerful)

There are some limitations to this usage of the index. If the jQL statement is too complex or the index data is not in sync, the jQL command will still work but will use the normal mechanism of searching the file rather than using any index data for improved performance.

Limited compound selection criteria is supported -- for example the following first example will still use the index data whereas the second example will not even if an index exists for both BIRTHDAY and SURNAME.

If you want to be absolutely sure the index data is used to speed up your query, then you should change your application to use the key-select command.

When an index is created using the create-index command, you specify a file name ( (1) above) and the create-index command will create both the index definition (2) and the index data (3). When this is done the index data is said to be "in sync" with the file data. In other words, the index data matches the actual file data.

There may be occasions when this is not true and the index is not "in sync". This could occur following a restore from an archive using jrestore or maybe the -a option to create-index was used.

When a index is "in sync" it means commands such as key-select can be used against the index. It also means that jQL commands such as SELECT and COUNT may be able to use the index to speed up the command.

If the index is not "in sync" then the commands key-select and query-index will fail with an error message. The jQL commands such as SELECT and COUNT will simply avoid using the index and will use the regular mechanism of searching the file data. The jBC statements such as SELECTINDEX will simply fail to give any keys and the OPENINDEX statement will take the ELSE clause.

If a file is not "in sync" then you can use the rebuild-index command to rebuild the index data from the file data and once rebuild-index is run the status will change to "in sync"

Regular expressions are the name given to a set of pattern matching characters. The term is derived from Unix environment. The regular expressions can be used to great effect using the query-index command (or key-select command) to decide what records to select.

A full description of regular expressions can be obtained on Unix systems by entering the command :

For Windows/NT systems only a limited subset of regular expressions are available. The following characters inside a regular expression have special meaning :

Pattern	Description
^	The text must match the start of the index key
$	The text must match the end of the index key
.*	Any number of characters of any value may follow
\x	This escapes the character x meaning it just evaluates to x. This is useful if you want to include say the ^ character as part of your text string rather than a character with special meaning.

For example, on either a Unix or Windows/NT system you could use key-select to find a product description that has the text SLIPPER at the start of the description. This can be done using a jQL style syntax you might be familiar with using or by using regular expressions. The two methods are therefore :

As a more complicate regular expression, the following example looks for a product that begins with the string BIG , has the words RED somewhere in the text, and then must end with the words ENGINE :

The Unix implementation uses the operating system supplied version of regular expressions and these are far more powerful than the jBASE supplied version of regular expressions on Windows/NT systems. As already mentioned , use man 5 regexp to find more details. The following example looks for a product description that begins with the words PIPE , any number of spaces, then one or more numeric characters follow (including optionally a decimal point), any number of spaces, and finally the characters �mm� , which are case insensitive :

A standard time and date format exists on many operating systems including Unix and Windows/NT. This is called Universal Co-ordinated Time and is the number of seconds since 00:00:00 January 1st 1970. Confusingly it appears to have been given the acronym UTC (instead of UCT as you might expect !).

Within a jBASE application the normal convention adopted by jBASE , for the purposes of legacy code compatibility, is that times are stored as seconds past 00:00:00 and dates stored as the number of days since 31st December 1967.

The way to convert from UTC to jBASE time and dates and vice-versa can be demonstrated by the following code segment

One important aspect to remember is that the UTC is often in the base time of the operating system without any time zone applied. For example on Unix systems you set the time and date of your system in UTC but then individual accounts may have different time zones applied. Thus if you create an index at what appears to be a time of say 10:40:29 then this could actually be a time of 11:40:29 but with a time zone of minus one hour applied.

http://807199.827977/r5/knowledgebase/manuals/3.0/SecondaryIndexes/html/general.htm last modified on 06/19/09 05:29 AM