11
CTX_REPORT

This chapter describes how to use the CTX_REPORT package to create various index reports.

Procedures in CTX_REPORT

The CTX_REPORT package contains the following procedures:

Name	Description
`DESCRIBE_INDEX`	Creates a report describing the index.
`DESCRIBE_POLICY`	Creates a report describing a policy.
`CREATE_INDEX_SCRIPT`	Creates a SQL*Plus script to duplicate the named index.
`CREATE_POLICY_SCRIPT`	Creates a SQL*Plus script to duplicate the named policy.
`INDEX_SIZE`	Creates a report to show the internal objects of an index, their tablespaces and used sizes.
`INDEX_STATS`	Creates a report to show the various statistics of an index.
`TOKEN_INFO`	Creates a report showing the information for a token, decoded.
`TOKEN_TYPE`	Translates a name and returns a numeric token type.

Using the Function Versions

Some of the procedures in the CTX_REPORT package have function variants. You can call these functions as follows:

select ctx_report.describe_index('MYINDEX') from dual;

In SQL*Plus, to generate an output file to send to support, spool the result of the function call to a file:

set long 64000
set pages 0
set heading off
set feedback off
spool outputfile
select ctx_report.describe_index('MYINDEX') from dual;
spool off

DESCRIBE_INDEX

Creates a report describing the index. This includes the settings of the index meta-data, the indexing objects used, the settings of the attributes of the objects, and index partition descriptions, if any.

You can call this operation as a procedure with an IN OUT CLOB parameter or as a function that returns the report as a CLOB.

Syntax

procedure CTX_REPORT.DESCRIBE_INDEX(
  index_name IN VARCHAR2,
  report     IN OUT NOCOPY CLOB
);

function CTX_REPORT.DESCRIBE_INDEX(
  index_name IN VARCHAR2
) return CLOB;

index_name

Specify the name of the index to describe.

report

Specify the CLOB locator to which to write the report.

If report is NULL, a session-duration temporary CLOB will be created and returned. It is the caller's responsibility to free this temporary CLOB as needed.

The report CLOB is truncated before the report is generated, so this call overwrites any existing contents.
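The procedure variant can be sketched as the following anonymous PL/SQL block. This is a minimal illustration, not a definitive usage pattern: MYINDEX is a placeholder index name, the variable name rpt is an assumption, and only the first 2000 characters are printed because DBMS_OUTPUT line lengths are limited.

```sql
set serveroutput on

declare
  rpt clob;  -- left NULL so that the call allocates a temporary CLOB
begin
  -- MYINDEX is a placeholder; substitute the name of an existing text index
  ctx_report.describe_index('MYINDEX', rpt);

  -- Print the beginning of the report
  dbms_output.put_line(dbms_lob.substr(rpt, 2000, 1));

  -- The temporary CLOB has session duration, so release it explicitly
  dbms_lob.freetemporary(rpt);
end;
/
```

Passing a NULL locator keeps the block short; the price is that the caller must remember the DBMS_LOB.FREETEMPORARY call.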

DESCRIBE_POLICY

Creates a report describing the policy. This includes the settings of the policy meta-data, the indexing objects used, the settings of the attributes of the objects.

You can call this operation as a procedure with an IN OUT CLOB parameter or as a function that returns the report as a CLOB.

Syntax

procedure CTX_REPORT.DESCRIBE_POLICY(
  policy_name IN VARCHAR2,
  report     IN OUT NOCOPY CLOB
);

function CTX_REPORT.DESCRIBE_POLICY(
  policy_name IN VARCHAR2
) return CLOB;

policy_name

Specify the name of the policy to describe.

report

Specify the CLOB locator to which to write the report.

If report is NULL, a session-duration temporary CLOB will be created and returned. It is the caller's responsibility to free this temporary CLOB as needed.

The report CLOB is truncated before the report is generated, so this call overwrites any existing contents.
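As a sketch of the function variant used from PL/SQL rather than from a query, the block below assigns the returned CLOB to a local variable. MYPOLICY is a hypothetical policy name, and the check with DBMS_LOB.ISTEMPORARY is a defensive assumption about how the returned CLOB is allocated, not something this chapter states.

```sql
set serveroutput on

declare
  rpt clob;
begin
  -- MYPOLICY is a placeholder; substitute the name of an existing policy
  rpt := ctx_report.describe_policy('MYPOLICY');

  dbms_output.put_line('report length: ' || dbms_lob.getlength(rpt));

  -- Free the result only if it really is a temporary LOB
  if dbms_lob.istemporary(rpt) = 1 then
    dbms_lob.freetemporary(rpt);
  end if;
end;
/
```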

CREATE_INDEX_SCRIPT

Creates a SQL*Plus script which will create a text index that duplicates the named text index.

The created script will include creation of preferences identical to those used in the named text index. However, the names of the preferences will be different.

You can call this operation as a procedure with an IN OUT CLOB parameter or as a function that returns the report as a CLOB.

Syntax

procedure CTX_REPORT.CREATE_INDEX_SCRIPT(
  index_name      in varchar2,
  report          in out nocopy clob,
  prefname_prefix in varchar2 default null
);

function CTX_REPORT.CREATE_INDEX_SCRIPT(
  index_name      in varchar2,
  prefname_prefix in varchar2 default null
) return clob;

index_name

Specify the name of the index.

report

Specify the CLOB locator to which to write the script.

If report is NULL, a session-duration temporary CLOB will be created and returned. It is the caller's responsibility to free this temporary CLOB as needed.

The report CLOB is truncated before the report is generated, so this call overwrites any existing contents.

prefname_prefix

Specify an optional prefix to use for preference names.

If prefname_prefix is omitted or NULL, the index name is used. The prefname_prefix follows index length restrictions.
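One way to capture the generated script in a file is to spool the function variant from SQL*Plus, in the same manner as the DESCRIBE_INDEX example earlier in this chapter. The prefix COPY, the file name create_myindex.sql, and the larger LONG setting are assumptions chosen for illustration.

```sql
set long 500000
set pages 0
set heading off
set feedback off
spool create_myindex.sql
-- MYINDEX is a placeholder index name; COPY is a hypothetical preference-name prefix
select ctx_report.create_index_script('MYINDEX', 'COPY') from dual;
spool off
```

Review the spooled file before running it, since the generated preference names differ from those of the original index.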

CREATE_POLICY_SCRIPT

Creates a SQL*Plus script which will create a text policy that duplicates the named text policy.

The created script will include creation of preferences identical to those used in the named text policy.

You can call this operation as a procedure with an IN OUT CLOB parameter or as a function that returns the report as a CLOB.

Syntax

procedure CTX_REPORT.CREATE_POLICY_SCRIPT(
  policy_name      in varchar2,
  report          in out nocopy clob,
  prefname_prefix in varchar2 default null
);

function CTX_REPORT.CREATE_POLICY_SCRIPT(
  policy_name      in varchar2,
  prefname_prefix in varchar2 default null
) return clob;

policy_name

Specify the name of the policy.

report

Specify the CLOB locator to which to write the script.

If report is NULL, a session-duration temporary CLOB will be created and returned. It is the caller's responsibility to free this temporary CLOB as needed.

The report CLOB is truncated before the report is generated, so this call overwrites any existing contents.

prefname_prefix

Specify the optional prefix to use for preference names. If prefname_prefix is omitted or NULL, the policy name is used. The prefname_prefix follows policy length restrictions.
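The procedure variant with an explicit prefix might be called as follows. This is a sketch under stated assumptions: MYPOLICY and the prefix COPY are hypothetical names, and only the first 2000 characters of the script are printed.

```sql
set serveroutput on

declare
  script clob;  -- NULL on entry, so a temporary CLOB is allocated by the call
begin
  ctx_report.create_policy_script(
    policy_name     => 'MYPOLICY',  -- placeholder policy name
    report          => script,
    prefname_prefix => 'COPY'       -- hypothetical prefix for preference names
  );

  dbms_output.put_line(dbms_lob.substr(script, 2000, 1));

  -- Release the session-duration temporary CLOB
  dbms_lob.freetemporary(script);
end;
/
```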

INDEX_SIZE

Creates a report showing the internal objects of the text index or text index partition, along with their tablespaces, allocated sizes, and used sizes.

You can call this operation as a procedure with an IN OUT CLOB parameter or as a function that returns the report as a CLOB.

Syntax

procedure CTX_REPORT.INDEX_SIZE(
  index_name IN VARCHAR2,
  report     IN OUT NOCOPY CLOB,
  part_name  IN VARCHAR2 DEFAULT NULL
);

function CTX_REPORT.INDEX_SIZE(
  index_name  IN VARCHAR2,
  part_name   IN VARCHAR2 DEFAULT NULL
) return clob;

index_name

Specify the name of the index to describe.

report

Specify the CLOB locator to which to write the report.

If report is NULL, a session-duration temporary CLOB will be created and returned. It is the caller's responsibility to free this temporary CLOB as needed.

The report CLOB is truncated before the report is generated, so this call overwrites any existing contents.

part_name

Specify the name of the index partition (optional). If part_name is NULL, and the index is a local partitioned text index, then all objects of all partitions will be displayed. If part_name is provided, then only the objects of a particular partition will be displayed.
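The effect of part_name can be illustrated with two calls to the function variant from SQL*Plus. MYINDEX is a placeholder index name and P1 is a hypothetical partition name; the second query assumes a local partitioned text index.

```sql
set long 64000
set pages 0
set heading off
set feedback off

-- All internal objects of the index (every partition, if it is locally partitioned)
select ctx_report.index_size('MYINDEX') from dual;

-- Only the objects of one partition; P1 is a hypothetical partition name
select ctx_report.index_size('MYINDEX', 'P1') from dual;
```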

INDEX_STATS

Creates a report showing various calculated statistics about the text index.

This procedure will fully scan the text index tables, so it may take a long time to run for large indexes.

INDEX_STATS creates and uses a session-duration temporary table, which is created in the CTXSYS temporary tablespace.

procedure index_stats(
  index_name in varchar2,
  report     in out nocopy clob,
  part_name  in varchar2 default null,
  frag_stats in boolean default TRUE,
  list_size  in number  default 100
);

index_name

Specify the name of the index to describe. You can specify a CONTEXT, CTXCAT, CTXRULE, or CTXXPATH index.

report

Specify the CLOB locator to which to write the report. If report is NULL, a session-duration temporary CLOB will be created and returned. It is the caller's responsibility to free this temporary CLOB as needed.

The report CLOB is truncated before the report is generated, so this call overwrites any existing contents.

part_name

Specify the name of the index partition. If the index is a local partitioned index, then part_name must be provided. INDEX_STATS will calculate the statistics for that index partition.

frag_stats

Specify TRUE to calculate fragmentation statistics. If frag_stats is FALSE, the report will not show any statistics relating to size of index data. However, the operation should take less time and resources to calculate the token statistics.

list_size

Specify the number of elements in each compiled list. list_size has a maximum value of 1000.
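The syntax above shows only a procedure form, so a call needs a PL/SQL block. The following sketch skips the fragmentation statistics and shortens each compiled list to reduce the cost of the run. MYINDEX is a placeholder index name and the values chosen for frag_stats and list_size are illustrative assumptions.

```sql
set serveroutput on

declare
  rpt clob;
begin
  ctx_report.index_stats(
    index_name => 'MYINDEX',  -- placeholder; a non-partitioned index is assumed
    report     => rpt,
    part_name  => null,
    frag_stats => false,      -- omit size and fragmentation statistics
    list_size  => 20          -- 20 entries per compiled list instead of 100
  );

  -- Print the beginning of the report
  dbms_output.put_line(dbms_lob.substr(rpt, 4000, 1));

  -- Release the session-duration temporary CLOB
  dbms_lob.freetemporary(rpt);
end;
/
```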

Example

The following is sample output for INDEX_STATS on a context index. This report has been truncated for clarity. It shows some of the token statistics and all of the fragmentation statistics.

The fragmentation statistics are at the end of the report. They show the estimated row fragmentation, an estimated amount of garbage data in the index, and a list of the most fragmented tokens. Running CTX_DDL.OPTIMIZE_INDEX cleans up the index.

=================================================================
              STATISTICS FOR "DR_TEST"."TDRBPRX21"
=================================================================

indexed documents:                                          53
allocated docids:                                           68
$I rows:                                                16,259

-----------------------------------------------------------------
                        TOKEN STATISTICS
-----------------------------------------------------------------

unique tokens:                                          13,445
average $I rows per token:                                1.21
tokens with most $I rows:
  telecommunications industry (THEME)                         6
  science and technology (THEME)                             6
  EMAIL (FIELD SECTION "SOURCE")                             6
  DEC (FIELD SECTION "TIMESTAMP")                            6
  electronic mail (THEME)                                    6
  computer networking (THEME)                                6
  communications (THEME)                                     6
  95 (FIELD SECTION "TIMESTAMP")                             6
  15 (FIELD SECTION "TIMESTAMP")                             6
  HEADLINE (ZONE SECTION)                                    6

average size per token:                                      8
tokens with largest size:
  T (NORMAL)                                               405
  SAID (NORMAL)                                            313
  HEADLINE (ZONE SECTION)                                  272
  NEW (NORMAL)                                             267
  I (NORMAL)                                               230
  MILLION (PREFIX)                                         222
  D (NORMAL)                                               219
  MILLION (NORMAL)                                         215
  U (NORMAL)                                               192
  DEC (FIELD SECTION "TIMESTAMP")                          186

average frequency per token:                              2.00
most frequent tokens:
  HEADLINE (ZONE SECTION)                                   68
  DEC (FIELD SECTION "TIMESTAMP")                           62
  95 (FIELD SECTION "TIMESTAMP")                            62
  15 (FIELD SECTION "TIMESTAMP")                            62
  T (NORMAL)                                                61
  D (NORMAL)                                                59
  881115 (THEME)                                            58
  881115 (NORMAL)                                           58
  I (NORMAL)                                                55
  geography (THEME)                                         52

token statistics by type:
  token type:                                           NORMAL
    unique tokens:                                       6,344
    total rows:                                          7,631
    average rows:                                         1.20
    total size:                              67,445 (65.86 KB)
    average size:                                           11
    average frequency:                                    2.33
    most frequent tokens:
      T                                                     61
      D                                                     59
      881115                                                58
      I                                                     55
      SAID                                                  45
      C                                                     43
      NEW                                                   36
      MILLION                                               32
      FIRST                                                 28
      COMPANY                                               27

  token type:                                            THEME
    unique tokens:                                       4,563
    total rows:                                          5,523
    average rows:                                         1.21
    total size:                              21,930 (21.42 KB)
    average size:                                            5
    average frequency:                                    2.40
    most frequent tokens:
      881115                                                58
      political geography                                   52
      geography                                             52
      United States                                         51
      business and economics                                50
      abstract ideas and concepts                           48
      North America                                         48
      science and technology                                46
      NKS                                                   34
      nulls                                                 34

The fragmentation portion of this report is as follows:

-----------------------------------------------------------------
                    FRAGMENTATION STATISTICS
-----------------------------------------------------------------

total size of $I data:                     116,772 (114.04 KB)

$I rows:                                                16,259
estimated $I rows if optimal:                           13,445
estimated row fragmentation:                              17 %

garbage docids:                                             15
estimated garbage size:                      21,379 (20.88 KB)

most fragmented tokens:
  telecommunications industry (THEME)                      83 %
  science and technology (THEME)                          83 %
  EMAIL (FIELD SECTION "SOURCE")                          83 %
  DEC (FIELD SECTION "TIMESTAMP")                         83 %
  electronic mail (THEME)                                 83 %
  computer networking (THEME)                             83 %
  communications (THEME)                                  83 %
  95 (FIELD SECTION "TIMESTAMP")                          83 %
  HEADLINE (ZONE SECTION)                                 83 %
  15 (FIELD SECTION "TIMESTAMP")                          83 %

TOKEN_INFO

Creates a report showing the decoded information for a token. This procedure fully scans the index data for the token, so it may take a long time to run for tokens with a large amount of index data.

You can call this operation as a procedure with an IN OUT CLOB parameter or as a function that returns the report as a CLOB.

Syntax

procedure CTX_REPORT.TOKEN_INFO(
  index_name      in varchar2,
  report          in out nocopy clob,
  token           in varchar2,
  token_type      in number,
  part_name       in varchar2 default null,
  raw_info        in boolean  default FALSE,
  decoded_info    in boolean  default TRUE
);

function CTX_REPORT.TOKEN_INFO(
  index_name      in varchar2,
  token           in varchar2,
  token_type      in number,
  part_name       in varchar2 default null,
  raw_info        in varchar2 default 'N',
  decoded_info    in varchar2 default 'Y'
) return clob;

index_name

Specify the name of the index.

report

Specify the CLOB locator to which to write the report.

If report is NULL, a session-duration temporary CLOB will be created and returned. It is the caller's responsibility to free this temporary CLOB as needed.

The report CLOB is truncated before the report is generated, so this call overwrites any existing contents.

token

Specify the token text. The token may be case-sensitive, depending on the passed-in token type.

token_type

Specify the token type. THEME, ZONE, ATTR, PATH, and PATH ATTR tokens are case-sensitive.

Everything else gets passed through the lexer, so if the index's lexer is case-sensitive, the token input is case-sensitive.

part_name

Specify the name of the index partition.

If the index is a local partitioned index, then part_name must be provided. TOKEN_INFO will apply to just that index partition.

raw_info

Specify TRUE to include a hex dump of the index data. If raw_info is TRUE, the report will include a hex dump of the raw data in the token_info column.

decoded_info

Specify TRUE to decode and include docid and offset data. If decoded_info is FALSE, ctx_report will not attempt to decode the token information. This is useful when you just want a dump of data.

resolve_docids

Specify TRUE to resolve docids to rowids.

To facilitate inline invocation, the boolean arguments are varchar2 in the function variant. You can pass in 'Y', 'N', 'YES', 'NO', 'T', 'F', 'TRUE', or 'FALSE'.
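An inline call to the function variant might look like the sketch below, which passes the boolean arguments as strings. MYINDEX is a placeholder index name and ORACLE is a hypothetical token; the token type 0 denotes a normal text token, as listed in the TOKEN_TYPE section.

```sql
set long 64000
set pages 0
set heading off
set feedback off

select ctx_report.token_info(
         'MYINDEX',  -- index_name (placeholder)
         'ORACLE',   -- token (hypothetical)
         0,          -- token_type: normal text token
         null,       -- part_name: a non-partitioned index is assumed
         'Y',        -- raw_info: include the hex dump
         'N'         -- decoded_info: skip decoding
       )
from dual;
```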

TOKEN_TYPE

This is a helper function which translates an English name into a numeric token type. This is suitable for use with token_info, or any other CTX API which takes in a token_type.

function token_type(
  index_name in varchar2,
  type_name  in varchar2
) return number;

TOKEN_TYPE_TEXT      constant number := 0;
TOKEN_TYPE_THEME     constant number := 1;
TOKEN_TYPE_ZONE_SEC  constant number := 2;
TOKEN_TYPE_ATTR_TEXT constant number := 4;
TOKEN_TYPE_ATTR_SEC  constant number := 5;
TOKEN_TYPE_PREFIX    constant number := 6;
TOKEN_TYPE_PATH_SEC  constant number := 7;
TOKEN_TYPE_PATH_ATTR constant number := 8;
TOKEN_TYPE_STEM      constant number := 9;

index_name

Specify the name of the index.

type_name

Specify an English name for token_type. The following strings are legal input. All input is case-insensitive.

Input	Meaning	Type Returned
TEXT	Normal text token.	0
THEME	Theme token.	1
ZONE SEC	Zone token.	2
ATTR TEXT	Text that occurs in attribute.	4
ATTR SEC	Attribute section.	5
PREFIX	Prefix token.	6
PATH SEC	Path section.	7
PATH ATTR	Path attribute section.	8
STEM	Stem form token.	9
FIELD <name> TEXT	Text token in field section <name>	16-79
FIELD <name> PREFIX	Prefix token in field section <name>	616-679
FIELD <name> STEM	Stem token in field section <name>	916-979

For FIELD types, the index meta-data must be read. If you call token_type frequently for FIELD types, consider caching the returned values in local variables rather than calling token_type repeatedly.

The fixed token types (0 through 9) are also defined as constants in this package.

Example

      typenum := ctx_report.token_type('myindex', 'field author text');
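Building on the call above, the caching advice for FIELD types could be sketched as follows: the token type is looked up once and then reused for several TOKEN_INFO calls. The index name myindex and the field section author come from the example above; the tokens SMITH and JONES and the variable names are hypothetical.

```sql
set serveroutput on

declare
  author_text_type number;
  rpt              clob;
begin
  -- Read the index meta-data once and keep the result in a local variable
  author_text_type := ctx_report.token_type('myindex', 'field author text');

  for i in 1 .. 2 loop
    ctx_report.token_info(
      index_name => 'myindex',
      report     => rpt,  -- truncated and overwritten on each call
      token      => case i when 1 then 'SMITH' else 'JONES' end,  -- hypothetical tokens
      token_type => author_text_type
    );
    dbms_output.put_line(dbms_lob.substr(rpt, 2000, 1));
  end loop;

  -- Release the session-duration temporary CLOB allocated by the first call
  dbms_lob.freetemporary(rpt);
end;
/
```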
