CIF format, CIFTAB and ShredCIF

The CIF format is designed for the archiving, validation and publication of crystallographic data. The ACTA instruction tells SHELXL to write two CIF-format files: name.fcf contains the reflection data and name.cif all other information. In general these files are based on CoreCIF version 2.4.3, with a couple of extensions (e.g. for defining Friedel completeness) that it is hoped will become standard, and several SHELX-specific identifiers that begin with 'shelx'. SHELXL tries to put into the CIF files only the information that it knows must be correct; other items are given the value '?'. To maximize the information that it can use, the reflection data should not be merged before running SHELXL. Details of the hardware used and other items that SHELXL cannot know for certain (like the color of the crystal) can be added using programs such as CIFTAB, XCIF, PublCIF or EnCIFer, which can combine the CIF file from SHELXL with other CIF files; PublCIF and EnCIFer can also edit the resulting CIF file. Important recent changes in the CIFs generated by SHELXL are discussed here.
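The idea of filling in the '?' items from a second CIF file can be illustrated with a short Python sketch. This is not how CIFTAB, XCIF, PublCIF or EnCIFer work internally; it simply assumes that both files contain single-line items of the form '_tag value' and copies a value from a second, user-written CIF wherever the SHELXL CIF has '?'. Loops and multi-line text fields are ignored, and the function names are hypothetical.

```python
import re
from typing import Dict

# Single-line data items of the form "_tag value"; loop headers (tag only),
# loop rows and multi-line text fields do not match and are left alone.
ITEM = re.compile(r"^(_\S+)\s+(.*\S)\s*$")


def read_items(text: str) -> Dict[str, str]:
    """Collect single-line '_tag value' items from a CIF file."""
    items: Dict[str, str] = {}
    for line in text.splitlines():
        m = ITEM.match(line)
        if m:
            items[m.group(1)] = m.group(2)
    return items


def fill_unknowns(shelxl_cif: str, extra_cif: str) -> str:
    """Replace '?' values in the SHELXL CIF by values given in another CIF."""
    extra = read_items(extra_cif)
    out = []
    for line in shelxl_cif.splitlines():
        m = ITEM.match(line)
        if m and m.group(2) == "?" and m.group(1) in extra:
            line = f"{m.group(1)} {extra[m.group(1)]}"
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, a second file containing the line '_exptl_crystal_colour colourless' would supply the crystal color that SHELXL left as '?'.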

It is important to realise that the small-molecule CoreCIF is incompatible with mmCIF, a CIF format used internally by the PDB and by some macromolecular programs; even the unit-cell dimensions have different names! Since many items in the dialect of CIF used by SHELX-97 have in the meantime been 'deprecated' and replaced by new CIF identifiers, most programs that were designed to work with CIF files from SHELXL-97 no longer work with the current SHELXL, though for important programs such as CheckCIF, PLATON and Coot this is just a question of installing an up-to-date release. It has to be said that the august IUCr COMCIFS committee had no idea of the amount of chaos it would cause when it replaced _symmetry_equiv_pos_as_xyz by _space_group_symop_operation_xyz, thereby causing many older programs to fail!

SHELXL now embeds the .res and .hkl files into the output .cif file. This provides a particularly efficient way of archiving a structure determination. The program ShredCIF may be started with:

shredcif name

in order to shred the file name.cif. The output consists of the file name.res renamed as name.ins and the file name.hkl, plus a file name_x.cif that contains the remainder of the shredded CIF file. ShredCIF also checks the checksums of the embedded files to verify that there were no transmission errors. The .ins and .hkl files may be used immediately for a refinement with SHELXL.
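The extraction step can be sketched in Python to show the idea. The sketch assumes that the embedded files are stored as semicolon-delimited CIF text fields under the tags _shelx_res_file and _shelx_hkl_file; it is not ShredCIF itself, it does not verify the checksums, and it does not write the name_x.cif remainder file. The function names are hypothetical.

```python
from typing import Optional


def extract_text_field(cif_text: str, tag: str) -> Optional[str]:
    """Return the contents of the ;-delimited text field that follows `tag`."""
    lines = cif_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() != tag:
            continue
        # The value should open on the next line with ';' in column 1.
        if i + 1 >= len(lines) or not lines[i + 1].startswith(";"):
            return None
        body = []
        rest = lines[i + 1][1:]
        if rest.strip():
            body.append(rest)  # text after the opening ';' is part of the value
        for field_line in lines[i + 2:]:
            if field_line.startswith(";"):  # closing delimiter
                return "\n".join(body) + "\n"
            body.append(field_line)
        return None  # field was never closed
    return None


def shred(name: str) -> None:
    """Write name.ins and name.hkl from name.cif (no checksum test, no name_x.cif)."""
    with open(f"{name}.cif") as f:
        text = f.read()
    # The embedded .res file is written out as the new .ins file.
    for tag, suffix in (("_shelx_res_file", ".ins"), ("_shelx_hkl_file", ".hkl")):
        contents = extract_text_field(text, tag)
        if contents is not None:
            with open(name + suffix, "w") as out:
                out.write(contents)
```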

CIFTAB reads CIF files and converts them into tables. Bruker users may prefer to rename CIFTAB to XCIF and the template files ciftab.* to xcif.*. The tables produced by CIFTAB/XCIF may prove useful for padding out Ph.D. theses. These programs can also be used to add site-specific CIF items etc., stored in a separate CIF-format file, to a CIF file written by SHELXL. CIFTAB is started by the command:

ciftab name

to read the file name.cif. In order to make tables of crystal data, atom parameters, bond lengths and angles, anisotropic displacement parameters and hydrogen atom coordinates, CIFTAB reads a template file that defines how tables should be formatted. Users are encouraged to modify these templates for their own purposes. A 'plain text' template ciftab.def is provided, together with two template files [ciftab.rta (Å) and ciftab.rtm (pm)] for making Rich Text Format (RTF) tables that can be read and edited with most word processors. These templates have not changed since earlier versions of SHELXL; the new CIFTAB (and XCIF) can read both old and new CIF files (but of course the old CIFTAB cannot read the new CIF files). Note that if you wish to use template files that you prepared yourself for use with the older CIF files, you may need to edit them to update any deprecated CIF names that they use!
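For home-made templates, updating deprecated names is essentially a search and replace on whole identifiers. The Python sketch below assumes a hand-maintained table of old and new names; only the first pair is taken from this document, the second is given as a further example that should be checked against the current CIF core dictionary, and the table is far from complete. The function name is hypothetical.

```python
import re
from typing import Dict

# Partial, illustrative table of deprecated names and their replacements.
DEPRECATED: Dict[str, str] = {
    "_symmetry_equiv_pos_as_xyz": "_space_group_symop_operation_xyz",
    "_symmetry_space_group_name_H-M": "_space_group_name_H-M_alt",
}


def update_template(text: str, mapping: Dict[str, str] = DEPRECATED) -> str:
    """Rename whole CIF identifiers; CIFTAB qualifiers such as '>12' are kept."""
    for old, new in mapping.items():
        # The old name must not be part of a longer identifier on either side.
        pattern = r"(?<![A-Za-z0-9_\-])" + re.escape(old) + r"(?![A-Za-z0-9_\-])"
        text = re.sub(pattern, new, text)
    return text
```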

The CIFTAB template files
This information is provided for those who wish to modify the supplied template files to add a personal touch to their tables; other CIFTAB users may skip this section. The template file is simply copied to the output file, with the exceptions described below; in addition, directives (lines beginning with '?' or '$') have special meanings:

'\n\' (where n is a number) is replaced by the ASCII character n (e.g. \12\ starts a new page), and CIF identifiers (which begin with the character '_') are replaced by the appropriate number or string from the CIF file. CIF identifiers may optionally be followed (without an intervening space) by one or more of the qualifiers '<n', '>n', ':n' and '=n', where n is an integer; the CIF identifier (including qualifiers) must be terminated by one space that is not copied to the output file.

'<n': left justifies the CIF item so that it starts in column n; usually used for strings.

'>n': right justifies a string, or justifies a number so that the figure immediately to the left of the decimal point appears in column n; if there is no decimal point, the last digit appears in column n. In either case the standard deviation (if any) extends to the right in brackets, without intervening spaces.

If '<n' and '>n' are both absent, the CIF item is inserted at the current position.

':n': if absent, the item is treated as a string; otherwise it is treated as a number and multiplied by 10 to the power n. This is useful for converting Å to pm or printing coordinates as integers; n may be negative, zero or positive.

'=n': rounds the CIF item (after application of ':n') so that there are not more than n figures after the decimal point; n must be zero or positive.
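The numeric qualifiers can be modelled in a few lines of Python. This is a sketch, not CIFTAB's code: it uses exact decimal arithmetic so that a shift by a power of 10 does not change the digits, it assumes the item starts the output line when justifying, and it assumes that when '=n' drops digits the standard uncertainty is rounded up (the text above does not say what CIFTAB does with it). The function names are hypothetical.

```python
import re
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

# A CIF number such as -0.1234(5); the bracketed standard uncertainty is optional.
CIF_NUMBER = re.compile(r"^(-?(?:\d+(?:\.\d*)?|\.\d+))(?:\((\d+)\))?$")


def scale_and_round(item: str, power: int = 0, places: Optional[int] = None) -> str:
    """Mimic ':n' (multiply by 10**power) followed by '=n' (at most `places` decimals)."""
    m = CIF_NUMBER.match(item.strip())
    if m is None:
        raise ValueError(f"not a CIF number: {item!r}")
    value = Decimal(m.group(1)).scaleb(power)  # exact shift of the decimal point
    esd = int(m.group(2)) if m.group(2) else None

    exponent = value.as_tuple().exponent
    if exponent > 0:
        # e.g. 1.5(3) times 10**3: write 1500 and scale the esd to match.
        value = value.quantize(Decimal(1))
        if esd is not None:
            esd *= 10 ** exponent

    decimals = -value.as_tuple().exponent
    if places is not None and decimals > places:
        value = value.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)
        if esd is not None:
            # Assumption: round the esd up so that it is never understated.
            esd = -(-esd // 10 ** (decimals - places))

    text = format(value, "f")
    return f"{text}({esd})" if esd is not None else text


def justify_right(text: str, column: int) -> str:
    """Mimic '>n' for a number at the start of an output line (columns count from 1)."""
    number = text.split("(", 1)[0]  # the esd, if any, simply extends to the right
    anchor = number.index(".") - 1 if "." in number else len(number) - 1
    return " " * max(column - 1 - anchor, 0) + text


def justify_left(text: str, column: int) -> str:
    """Mimic '<n' for an item at the start of an output line."""
    return " " * (column - 1) + text
```

With these helpers, a bond length of 1.5432(3) Å printed with ':2' becomes 154.32(3), i.e. picometres, and '>6' then places the digit 4 in column 6.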

A line beginning with 'loop_' is repeated until the corresponding loop in the CIF file is exhausted; all the CIF items in the line must be in the same loop in the CIF input file. A line containing at least 4 consecutive underscore characters is copied to the output file unchanged, and may be used for drawing a horizontal line. There are also two pseudo-CIF identifiers: '_tabno' is the number of the table, and '_comno' is a number or text string that identifies the compound. Both may be set via the CIFTAB menu. '_tabno' is incremented each time it is used, but '_comno' is not.
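The loop, rule-line and pseudo-identifier rules can be sketched in Python as a small template expander. This is a simplified model, not CIFTAB's code: loops are assumed to be already parsed into lists of dictionaries, qualifiers such as '>n' are recognized but not applied, the 'loop_' keyword itself is assumed not to be copied, an identifier at the very end of a line is accepted as if it were followed by a space, and a missing item is written as '?'. The class and method names are hypothetical.

```python
import re
from typing import Dict, List, Optional

# A CIF identifier, optional qualifiers such as '>12' or ':2', and the single
# terminating space (end of line is also accepted here, as an assumption).
IDENT = re.compile(r"(?<![A-Za-z0-9])(_[^\s<>:=]+)((?:[<>:=]-?\d+)*)(?: |$)")


class TemplateExpander:
    def __init__(self, items: Dict[str, str], loops: List[List[Dict[str, str]]],
                 tabno: int = 1, comno: str = "1"):
        self.items = items    # single-valued CIF items
        self.loops = loops    # each loop is a list of rows (tag -> value)
        self.tabno = tabno
        self.comno = comno

    def _value(self, name: str, row: Optional[Dict[str, str]]) -> str:
        if name == "_tabno":
            self.tabno += 1   # '_tabno' is incremented each time it is used
            return str(self.tabno - 1)
        if name == "_comno":
            return self.comno  # '_comno' is not incremented
        if row is not None and name in row:
            return row[name]
        return self.items.get(name, "?")

    def _substitute(self, text: str, row: Optional[Dict[str, str]]) -> str:
        # The whole match, including the terminating space, is replaced.
        return IDENT.sub(lambda m: self._value(m.group(1), row), text)

    def expand(self, line: str) -> List[str]:
        if "____" in line:
            return [line]  # 4+ underscores: copied unchanged (horizontal rule)
        if line.startswith("loop_"):
            body = line[len("loop_"):].lstrip()
            names = [m.group(1) for m in IDENT.finditer(body)]
            # All items on a loop_ line must come from the same CIF loop.
            loop = next((rows for rows in self.loops
                         if rows and all(n in rows[0] for n in names)), [])
            return [self._substitute(body, row) for row in loop]
        return [self._substitute(line, None)]
```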

An underscore '_' followed by a space may be used to continue on the next line without creating a new line in the output file. Lines beginning with question marks are output to the console (without the leading question mark) as questions; if the answer to the question is not 'Y' or 'y', everything in the format file is skipped until the next line which begins with a question mark. The directive $symops:n, where n is an integer, prints the symmetry operations used to generate equivalent atoms, starting each line of text in column n. These operators are referenced by '#m' (where m is an integer) after the atom name. The line beginning '$symops:n' usually follows the tables of selected bond lengths and angles, torsion angles and hydrogen bonds.
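The question and continuation rules amount to a small pre-processing pass over the template. The Python sketch below is a simplified model under stated assumptions: the continuation marker '_ ' is taken to sit at the very end of a line, the answer must be exactly 'Y' or 'y' (after trimming blanks), '$' directives are passed through untouched, and the console prompt is replaced by a hypothetical `ask` callback so that the logic can be tested.

```python
from typing import Callable, Iterable, List


def preprocess(template: Iterable[str], ask: Callable[[str], str]) -> List[str]:
    """Apply '?' question lines and '_ ' continuations; returns logical output lines."""
    out: List[str] = []
    skipping = False
    pending = ""  # text carried over from a continued line
    for raw in template:
        line = raw.rstrip("\n")
        if line.startswith("?"):
            # Asked without the leading '?'; any answer except Y/y skips
            # everything up to the next question line.
            skipping = ask(line[1:]).strip() not in ("Y", "y")
            continue
        if skipping:
            continue
        if line.endswith("_ "):
            pending += line[:-2]  # joined to the next line, no newline emitted
            continue
        out.append(pending + line)
        pending = ""
    if pending:
        out.append(pending)
    return out
```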

The remaining directives may appear at any point in the format file except immediately after a continuation line marker, but always on a line beginning with '$'.

'h=none': leave out all hydrogen atoms.

'h=only': leave out all non-hydrogen atoms.

'h=free': leave out riding or rigid group hydrogens but include the rest.

'h=all': include all hydrogen and all other atoms.

The hydrogen atom directives apply only to tables of coordinates; hydrogen atoms are recognized by the type symbol 'H' (the CIF item ending in _type_symbol). A common user error when writing format files is to forget that 'h=only' etc. remains in force until it is replaced by another 'h=...' directive! The publication flags can be used to control which hydrogen atoms appear in tables of bond lengths, angles etc.
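A possible model of the 'h=' directives is sketched below in Python. How CIFTAB decides that a hydrogen atom is riding or part of a rigid group is not stated here; the sketch assumes it is marked by an _atom_site_refinement_flags_posn value of 'R' or 'G', which should be checked against the CIF actually written by SHELXL. The function name is hypothetical.

```python
from typing import Dict, List


def filter_atoms(atoms: List[Dict[str, str]], mode: str) -> List[Dict[str, str]]:
    """Apply an 'h=' mode ('none', 'only', 'free' or 'all') to coordinate rows."""
    def is_h(atom: Dict[str, str]) -> bool:
        return atom.get("_atom_site_type_symbol") == "H"

    def constrained(atom: Dict[str, str]) -> bool:
        # Assumption: riding ('R') or rigid-group ('G') positions are flagged here.
        return atom.get("_atom_site_refinement_flags_posn", "") in ("R", "G")

    if mode == "none":
        return [a for a in atoms if not is_h(a)]
    if mode == "only":
        return [a for a in atoms if is_h(a)]
    if mode == "free":
        return [a for a in atoms if not (is_h(a) and constrained(a))]
    if mode == "all":
        return list(atoms)
    raise ValueError(f"unknown h= mode: {mode!r}")
```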

'brack': Atom names should include brackets (if present in the CIF file).

'nobrack': Brackets are deleted from the atom names.

'flag': Only output items for which the publication flag is 'Y' or 'y'.

'noflag': Output all items, ignoring the publication flag.

The default settings are '$h=none,brack,flag'. The standard tables file ciftab.def illustrates the use of most of these facilities. CIFTAB extends some of the standard CIF codes to make them more suitable for tables, and also takes special action when items such as _refine_ls_extinction_coef are missing or undefined. The simplest method of altering the contents and format of results tables is to create a different ciftab.??? format file (or a collection of such files for various purposes), using the standard file ciftab.def as a starting model. Thus the output can be tailored to different journals, doctoral theses, reports, etc.
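Because these settings persist from one '$' line to the next, they behave like a small piece of state, as the Python sketch below illustrates. It assumes that several directives on one '$' line are separated by commas, as in the default '$h=none,brack,flag', that "brackets" means round brackets in atom names such as C(12), and that an item's publication flag (for example _geom_bond_publ_flag) is passed in as a string; other '$' directives such as '$symops:n' are ignored, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    # Defaults correspond to '$h=none,brack,flag'.
    h: str = "none"
    brack: bool = True
    flag: bool = True

    def apply(self, line: str) -> None:
        """Update from a directive line such as '$h=only,nobrack'.

        Nothing is reset between lines, so a setting stays in force until
        another directive changes it.
        """
        for word in line.lstrip("$").split(","):
            word = word.strip()
            if word.startswith("h="):
                self.h = word[2:]
            elif word in ("brack", "nobrack"):
                self.brack = word == "brack"
            elif word in ("flag", "noflag"):
                self.flag = word == "flag"
            # anything else (e.g. 'symops:n') is outside this sketch


def atom_name(label: str, s: Settings) -> str:
    """'nobrack' deletes round brackets from atom names, e.g. C(12) -> C12."""
    return label if s.brack else label.replace("(", "").replace(")", "")


def include_item(publ_flag: str, s: Settings) -> bool:
    """With 'flag', only items whose publication flag is 'Y' or 'y' are output."""
    return not s.flag or publ_flag in ("Y", "y")
```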

Using SHELXL CIF files for publication in Acta Crystallographica
The process of converting a virgin SHELXL .cif file into an electronic manuscript for Acta Cryst. Section C may seem at first rather complex and daunting, but fortunately the IUCr provides extensive documentation on using CIF, and a program PublCIF to automate the procedure. Prior to writing up the structure, it is strongly recommended that a CIF file be submitted to the CheckCIF server!