
SINO CGI API

* Full SINO Documentation
* AustLII VC List


Overview

...and so welcome to the SINO CGI API.

Physical Concordances

A concordance is an index of words, built over a group of documents. For example, the entire AustLII primary and secondary materials collection is indexed in a single physical concordance. A concordance acts much like the index at the back of a book. You look up a word, and get a list of documents (or in a book index, a list of page numbers).

A physical concordance ("PhC") is one that actually exists on the system. That is to say, there is a real file that corresponds to that PhC. In the context of virtual concordances below, a PhC is sometimes also known as a "database" (but don't confuse this with an AustLII database -- they're separate concepts of "database").

PhC's have a file location on the web server, but you can't access them directly. You can only access their contents via the virtual concordance ("VC"). This is actually much more convenient, but sometimes you want to be able to refer to a specific PhC within a search form. In such cases, you use the PhC alias to "single out" the PhC within the VC.

Virtual Concordances

A virtual concordance ("VC") is a collection of one or more physical concordances ("PhC's") that you can search in one hit. When building a search interface into AustLII, you must specify one (and only one) VC. Your user's search is then conducted over the documents in that VC.

As stated above, a VC is one or more PhC's. So in searching a VC, you may in fact be moving over multiple concordances. For example, here are three PhC's that exist on AustLII:

  1. /au: Concordance of all Australian materials under www.austlii.edu.au/au/;

  2. /nz: Concordance of all New Zealand materials under www.austlii.edu.au/nz/;

  3. /sp: Concordance of all South Pacific materials under www.austlii.edu.au/sp/.

Each of these has a corresponding VC, so they can be searched individually. However there is also a VC called /austlii which contains all three PhC's. Specifying a VC of "/austlii" therefore searches material from Australia, New Zealand and the South Pacific -- all from the one search.

A PhC doesn't necessarily have a corresponding VC, but generally it does. The specific VC's and PhC's available at a SINO web site will vary. The AustLII VC List is available here.

Note that, in search results, the results from the PhC's are not merged -- each PhC appears under a separate heading in the results for the VC.

Mask Paths

So let's say we want our users to only search High Court cases. That's an Australian database so we set our VC to "/au" since that's a valid VC name (it's in the AustLII VC List). How do we single out the High Court?

The High Court is located at /au/cases/cth/high_ct/ on the AustLII server. What we do is specify a mask path that begins with that location. For example:

<input type="hidden" name="mask_path" value="au/cases/cth/high_ct">

Now anything our user searches for will be restricted to the High Court database. We can list multiple mask paths, in order to select multiple databases:

<input type="hidden" name="mask_path" value="au/cases/cth/high_ct">
<input type="hidden" name="mask_path" value="au/cases/cth/federal_ct">

This allows us to search both High Court and Federal Court cases at the same time. You can build any combination you want -- provided those mask paths are part of that VC.
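As a rough illustration of what such a form sends, the sketch below encodes the same fields the way a browser encodes a submitted form. It is only a sketch: the query text "negligence" is a made-up example, and no endpoint URL is shown because the form's action is not given here. The point is that a repeated field name such as mask_path is carried as several separate name=value pairs.

```python
from urllib.parse import urlencode, parse_qsl

# Each (name, value) pair stands for one form field. A list of pairs is used
# rather than a dict because the name "mask_path" appears more than once.
fields = [
    ("meta", "/au"),
    ("mask_path", "au/cases/cth/high_ct"),
    ("mask_path", "au/cases/cth/federal_ct"),
    ("query", "negligence"),  # hypothetical user input
]

# Standard form encoding (application/x-www-form-urlencoded).
encoded = urlencode(fields)
print(encoded)

# Decoding again shows that both mask paths survive as separate values.
mask_paths = [value for name, value in parse_qsl(encoded) if name == "mask_path"]
print(mask_paths)
```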

A problem arises when a VC contains multiple PhC's. The mask paths are hard to set, because they apply to every PhC, when in fact you want one set of mask paths for one PhC and another set for the other. You can do this by referring to the PhC alias when setting the mask path. For example, here is a set of mask paths to search only superior courts in all available Australasian databases:

<input type="hidden" name="meta" value="/austlii">
<input type="hidden" name="mask_au" value="au/cases/cth/high_ct">
<input type="hidden" name="mask_nz" value="nz/cases/NZCA">
<input type="hidden" name="mask_sp" value="fj/cases/">
<input type="hidden" name="mask_sp" value="vu/cases/">

Recall that the "/austlii" VC contains three PhC's: /au, /nz and /sp. We can't just list lots of "mask_path"s, because all mask_paths are applied to all PhC's. Instead we want to single out each PhC and set a mask path unique to it. So in place of "path" in the field name we use the PhC alias (or database alias), which we get from the VC List (chop off the leading "/"). And so we set the High Court mask path for the /au database, the New Zealand Court of Appeal for the /nz database, and the Fiji and Vanuatu case law databases (they only have superior courts) for the /sp database.
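The naming rule can be sketched in a few lines. The helper below, mask_fields, is a hypothetical name used only for illustration and is not part of SINO; it assumes the alias is simply the PhC name from the VC List with the leading "/" removed, as described above.

```python
from urllib.parse import urlencode

def mask_fields(masks_by_phc):
    """Turn {PhC name: [mask paths]} into (field name, value) pairs.

    The field name is "mask_" plus the PhC alias, taken here to be the
    PhC name with its leading "/" chopped off.
    """
    pairs = []
    for phc, paths in masks_by_phc.items():
        alias = phc.lstrip("/")
        for path in paths:
            pairs.append((f"mask_{alias}", path))
    return pairs

# The same masks as the form above, grouped by physical concordance.
masks = {
    "/au": ["au/cases/cth/high_ct"],
    "/nz": ["nz/cases/NZCA"],
    "/sp": ["fj/cases/", "vu/cases/"],
}

fields = [("meta", "/austlii")] + mask_fields(masks)
print(urlencode(fields))
```

Grouping the masks by PhC makes it harder to attach a path to the wrong concordance by accident.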

Parameter Reference

callback

Either "on", "off" or blank (or undefined). If on, triggers the display of word counts (shows words found (and how many), words not found (check spelling) and words ignored (common words)). The SINO CGI must have been compiled with support for this option built in (this is the case on AustLII).

db

Used to isolate a physical concordance (or "database") from a search of a virtual concordance. Generally not set in a search form. Since a virtual concordance is made up of a number of physical databases, it is sometimes necessary to search the VC but only get results for a specific component database. Set this option to the database alias (if the database is not aliased, this option is not available).

legisopt

Either "toponly" or blank (or undefined). If set to "toponly", then only the table of provisions is printed for any act that appears in the search results; all other sections within the act are ignored. This can cause some odd counting inconsistencies in search results, and so it generally isn't available on AustLII, but it can be a useful feature if you know what you're doing.

mask_path

Specifies that the search scope should be limited to those documents whose path begins with mask_path. A mask path is applied to each physical concordance within a virtual concordance -- regardless of whether or not such documents even exist in the physical concordance. Note that the effect of applying a non-existent mask path is to apply no mask path at all (ie the search is unrestricted).

mask_XXX

Specifies that the search scope should be limited to those documents in the physical concordance named in "XXX" that begin with the given mask path. In this case the mask path is only applied to those physical concordances that have the alias "XXX".

meta

Contains the value of the virtual concordance to perform the search over (see above for list of valid VC's). Usually set in a hidden field. Note that if you allow the user to change this on the form you may not be able to support mask paths (this may not be required depending on the VC you chose).

method

The method parameter sets a context for interpreting the user's search input (passed in query, below). The method argument can take any one of these values: If any method other than "boolean" is chosen then the SINO CGI modifies the search before passing it to the search engine.

On AustLII you may see a method labeled "case name". This is just a title method with a different label.

The effect of supplying more than one method parameter is undefined.

offset

The starting offset for search results. Usually not set in the form, but is used to "get next page" of results. For example, an offset of "50" means "skip the first 50 results" and starts displaying results at #51.
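The arithmetic behind a "next page" link can be sketched as follows. The function names are hypothetical, and the page size of 50 is only an assumed default; the actual default depends on the server configuration.

```python
def offset_for_page(page, results_per_page=50):
    """Return the offset that skips every result shown on earlier pages.

    Pages are numbered from 1, so page 1 has offset 0.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * results_per_page

def first_result_number(offset):
    """Return the 1-based number of the first result shown for an offset."""
    return offset + 1

# Page 2 with 50 results per page skips the first 50 results.
print(offset_for_page(2), first_result_number(offset_for_page(2)))
```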

query

The text of the user's search query needs to be put here. The simplest method is to use <input type="text" name="query">; however, any text-based form element will do (you could even use radio buttons to allow certain canned searches to be done).

Depending on the method chosen, the user's actual search query will be modified before being passed on to the search engine.

rank

Either on, off or blank (or undefined). If on, then search results are relevance ranked. It is off by default.

results

Set to the number of search results to show per page. The default is usually 50 but this can be changed by the server administrator. Values over 1000 are ignored.

stats

Either on, off or blank (or undefined). If on, then displays cumulative performance stats for that CGI process at the end of the search results. The SINO CGI must have been compiled with support for this option built in. Generally only used in debugging.

syns

Either on, off or blank (or undefined). If on, then SINO will use the synonyms file (thesaurus) during the search. AustLII doesn't have a thesaurus yet so this option is currently disabled.
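To pull the parameters in this reference together, here is a minimal sketch of assembling one search request as a list of form fields. build_search_fields is a hypothetical helper, not part of SINO, and the query text is a made-up example. Optional parameters are left out entirely when not supplied, which corresponds to the "undefined" case described for several of them.

```python
from urllib.parse import urlencode

def build_search_fields(meta, query, method=None, rank=None,
                        results=None, offset=None, mask_paths=()):
    """Return (name, value) pairs for one search request.

    Only meta and query are always sent; every other parameter is
    omitted unless a value is given.
    """
    fields = [("meta", meta), ("query", query)]
    optional = [("method", method), ("rank", rank),
                ("results", results), ("offset", offset)]
    for name, value in optional:
        if value is not None:
            fields.append((name, str(value)))
    # One mask_path field per path, so several databases can be selected.
    fields.extend(("mask_path", path) for path in mask_paths)
    return fields

fields = build_search_fields(
    "/au", "native title",  # hypothetical query text
    method="boolean", rank="on", results=20,
    mask_paths=["au/cases/cth/high_ct"],
)
print(urlencode(fields))
```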


URL: http://www.austlii.edu.au/techlib/webdev/cgiapi.html