CAT Dictionary File Types

From Plover Wiki
Revision as of 07:35, 20 March 2026 by BTackt (comes from Dictionary format - just didn't want this in the other page.)

This page is outdated and possibly incomplete.

dct (aka Stentura, Jet, MDB, Microsoft Access)

The .dct file format is used by DigitalCAT.

These are standard Microsoft Access databases (also known as Microsoft Jet databases, or MDB files). They contain a single table named "Steno", which has columns named "Steno", "English", "Flags", and others. Each row represents one translation.

"Steno" is a text column containing the steno outline, encoded as a concatenated sequence of six-digit hex strings, apparently one per stroke. Each string is a 24-bit bitmask in which each bit represents one key in the standard steno order.
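The following Python sketch shows one way to decode the "Steno" column. It rests on assumptions this page does not confirm: that each six-digit group is one stroke, that the 23-key list below (number bar first) is the exact key order, and that the first key occupies the most significant of the 24 bits. The helper names are hypothetical; check the assumptions against a real DigitalCAT dictionary before relying on them.

```python
# ASSUMPTION: key order and bit placement are guesses based on "standard steno
# order"; the number bar is assumed to be the first (most significant) key.
KEYS = ["#", "S-", "T-", "K-", "P-", "W-", "H-", "R-", "A-", "O-", "*",
        "-E", "-U", "-F", "-R", "-P", "-B", "-L", "-G", "-T", "-S", "-D", "-Z"]


def split_steno_field(field: str) -> list[int]:
    """Split the concatenated hex string into one 24-bit mask per group of six digits."""
    if len(field) % 6 != 0:
        raise ValueError(f"length {len(field)} is not a multiple of 6")
    return [int(field[i:i + 6], 16) for i in range(0, len(field), 6)]


def mask_to_keys(mask: int) -> list[str]:
    """Return the keys whose bits are set; KEYS[0] is assumed to be bit 23."""
    return [key for i, key in enumerate(KEYS) if mask & (1 << (23 - i))]


def decode_steno_field(field: str) -> list[list[str]]:
    """Decode a whole "Steno" column value into a list of strokes."""
    return [mask_to_keys(m) for m in split_steno_field(field)]
```

Under these assumptions, `decode_steno_field("400000")` yields `[["S-"]]`, and a two-stroke value such as `"800000400000"` yields `[["#"], ["S-"]]`.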

"English" is a text column giving the text translation of that stroke.

"Flags" is an integer bitmap. In DigitalCAT's dictionary editor, checkboxes set flags such as prefix, suffix, capitalize next stroke, and glue, among others; the combination of all selected flags is stored in this column.
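To illustrate how several checkbox settings combine into a single integer, here is a Python sketch using `enum.IntFlag`. The bit values and the class name are hypothetical placeholders; the actual DigitalCAT flag assignments are not documented on this page.

```python
from enum import IntFlag


class HypotheticalFlags(IntFlag):
    # HYPOTHETICAL bit values, chosen only for illustration.
    PREFIX = 1
    SUFFIX = 2
    CAP_NEXT = 4
    GLUE = 8


def describe_flags(value: int) -> list[str]:
    """List the names of the (hypothetical) flags set in an integer bitmap."""
    return [flag.name for flag in HypotheticalFlags if value & flag]


# Checking "prefix" and "glue" would store their bitwise OR in the column.
stored = HypotheticalFlags.PREFIX | HypotheticalFlags.GLUE
```

With these placeholder values, `stored` equals 9 and `describe_flags(9)` returns `["PREFIX", "GLUE"]`.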

A .dct dictionary also records the date each entry was created, the date it was last translated, and the number of times it has been used within DigitalCAT.

The English field supports a limited set of embedded commands, such as {Q} and {A} for switching the paragraph type to question or answer. Otherwise, it is treated mainly as plain text.
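A minimal Python sketch of separating embedded commands from plain text in the English field. It assumes that any brace-delimited span is a command, which goes beyond what is documented here (only {Q} and {A} are named); the helper name `split_english` is hypothetical.

```python
import re

# ASSUMPTION: every {...} span is treated as a command, not just {Q} and {A}.
COMMAND = re.compile(r"\{([^{}]*)\}")


def split_english(english: str) -> list[tuple[str, str]]:
    """Split an English field into ("command", name) and ("text", chunk) parts."""
    parts = []
    pos = 0
    for m in COMMAND.finditer(english):
        if m.start() > pos:
            parts.append(("text", english[pos:m.start()]))
        parts.append(("command", m.group(1)))
        pos = m.end()
    if pos < len(english):
        parts.append(("text", english[pos:]))
    return parts
```

For example, `split_english("{Q}Where were you?")` returns `[("command", "Q"), ("text", "Where were you?")]`.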

sgdct (CaseCat)

CaseCATalyst dictionaries have the extension .sgdct. There is often a corresponding .sgxml file, but this contains no dictionary data.

Much of the detail of the file format remains unknown.

Each file begins with a 640-byte header, which starts with the magic number SGCAT32. Nothing is known about the remaining header fields at present.
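A Python sketch of validating the file header and skipping to the records, using only what is stated above (640 bytes, starting with SGCAT32). The function name `records_region` is hypothetical.

```python
HEADER_SIZE = 640
MAGIC = b"SGCAT32"


def records_region(data: bytes) -> bytes:
    """Check the magic number and return the bytes that follow the 640-byte header."""
    if len(data) < HEADER_SIZE:
        raise ValueError("file is shorter than the 640-byte header")
    if not data.startswith(MAGIC):
        raise ValueError("missing SGCAT32 magic number")
    return data[HEADER_SIZE:]
```

Typical use would be `with open("dictionary.sgdct", "rb") as f: body = records_region(f.read())`, where the file name is only an example.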

One or more records follow the header. Each record gives a single translation from steno to text.

Each record begins with a 21-byte header. header[18] holds the number of strokes, and header[19] holds the number of characters in the text; each is an unsigned byte. The purpose of the other header bytes is unknown at present.

The strokes follow as a sequence of four-byte unsigned integers, one per stroke. Each integer is a bitmap of keys in the standard steno order, with the initial "S" as the most significant bit.
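A Python sketch of decoding stroke integers. Two assumptions are not settled by this page: the key list below omits the number bar, and the byte order of the four-byte integers is undocumented, so `byteorder` defaults to a guessed `"little"`. The names `SG_KEYS`, `stroke_keys`, and `read_strokes` are hypothetical.

```python
# ASSUMPTION: standard steno order without the number bar. The page states only
# that the initial "S" is the most significant bit.
SG_KEYS = ["S-", "T-", "K-", "P-", "W-", "H-", "R-", "A-", "O-", "*", "-E",
           "-U", "-F", "-R", "-P", "-B", "-L", "-G", "-T", "-S", "-D", "-Z"]


def stroke_keys(value: int) -> list[str]:
    """Return the keys set in one 32-bit stroke bitmap (SG_KEYS[0] is bit 31)."""
    return [key for i, key in enumerate(SG_KEYS) if value & (1 << (31 - i))]


def read_strokes(buf: bytes, count: int, byteorder: str = "little") -> list[list[str]]:
    """Decode `count` consecutive four-byte stroke integers from `buf`."""
    # ASSUMPTION: the byte order is undocumented; "little" is only a guess.
    if len(buf) < 4 * count:
        raise ValueError("buffer is shorter than the stated stroke count")
    return [stroke_keys(int.from_bytes(buf[4 * i:4 * i + 4], byteorder))
            for i in range(count)]
```

Under these assumptions, `stroke_keys(0xC0000000)` returns `["S-", "T-"]`.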

The text follows as ordinary ASCII. The encoding of text outside the ASCII range is not yet known; various non-ASCII bytes appear, apparently as control codes.

Finally, zero to three padding bytes bring the record up to a four-byte boundary.
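Putting the record layout together, here is a Python sketch that walks the records following the file header. It assumes the 640-byte file header has already been removed (for example `body = data[640:]`), that the stroke integers are little-endian (unconfirmed), and it decodes text as Latin-1 purely so that unexplained non-ASCII bytes survive intact instead of raising errors. The names `Record` and `iter_records` are hypothetical.

```python
from typing import Iterator, NamedTuple

RECORD_HEADER_SIZE = 21  # per the description above


class Record(NamedTuple):
    strokes: list[int]  # raw 32-bit stroke bitmaps
    text: str


def iter_records(body: bytes, byteorder: str = "little") -> Iterator[Record]:
    """Yield each record from `body`, the file contents after the 640-byte header."""
    pos = 0
    while pos + RECORD_HEADER_SIZE <= len(body):
        start = pos
        header = body[pos:pos + RECORD_HEADER_SIZE]
        n_strokes, n_chars = header[18], header[19]  # unsigned bytes
        pos += RECORD_HEADER_SIZE
        if pos + 4 * n_strokes + n_chars > len(body):
            raise ValueError(f"record at offset {start} runs past end of data")
        # ASSUMPTION: the byte order of the stroke integers is a guess.
        strokes = [int.from_bytes(body[pos + 4 * i:pos + 4 * i + 4], byteorder)
                   for i in range(n_strokes)]
        pos += 4 * n_strokes
        # Latin-1 maps each byte to one character, keeping unexplained control codes.
        text = body[pos:pos + n_chars].decode("latin-1")
        pos += n_chars
        pos += -pos % 4  # skip zero to three padding bytes to a four-byte boundary
        yield Record(strokes, text)
```

Because the file header is 640 bytes (a multiple of four), aligning relative to the start of `body` is the same as aligning relative to the start of the file. Latin-1 is a convenience here, not a claim about CaseCATalyst's actual text encoding.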

.dix (Eclipse)


This page is incomplete. If you know about this subject, please contribute to the wiki by adding more information.