SHELXC keywords

SHELXC is usually started from a script or a GUI. On the command line, 'shelxc' should be followed by the filename stem 'name', which defines the names of the three files that it writes:

name.hkl - merged native reflection data h, k, l, I and σ(I) in SHELX HKLF4 format

name_fa.hkl - h, k, l, FA and σ(FA) in SHELX HKLF3 format

name_fa.ins - instruction file for substructure solution with SHELXD
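Both .hkl files use the standard SHELX fixed-column reflection layout (3I4, 2F8.2): h, k and l in four-character integer fields, followed by two eight-character real fields holding I and σ(I) for HKLF4, or FA and σ(FA) for HKLF3. The Python sketch below writes and reads that layout. The function names are hypothetical and not part of SHELX, the sketch assumes every value fits in an F8.2 field (at most 99999.99), and it ignores the optional batch-number column.

```python
def format_hklf_line(h: int, k: int, l: int, value: float, sigma: float) -> str:
    """Format one reflection as 3I4, 2F8.2 (h, k, l, I or F, sigma)."""
    # Assumes value and sigma fit in F8.2; larger numbers would overflow
    # the fixed-width field.
    return f"{h:4d}{k:4d}{l:4d}{value:8.2f}{sigma:8.2f}"


def parse_hklf_line(line: str) -> tuple[int, int, int, float, float]:
    """Read h, k, l, value, sigma back from the fixed columns."""
    h, k, l = (int(line[i:i + 4]) for i in (0, 4, 8))
    return h, k, l, float(line[12:20]), float(line[20:28])


def write_hklf(reflections) -> str:
    """Return the text of a reflection file from (h, k, l, value, sigma) tuples."""
    lines = [format_hklf_line(*refl) for refl in reflections]
    # SHELX reflection files are terminated by a 0 0 0 reflection.
    lines.append(format_hklf_line(0, 0, 0, 0.0, 0.0))
    return "\n".join(lines) + "\n"
```

For example, format_hklf_line(1, 2, 3, 1234.5, 12.3) gives '   1   2   3 1234.50   12.30'.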

The last two of these are needed as input to SHELXD to determine the substructure, and the first two are input to SHELXE for phasing. The native reflection data are also in a suitable format for SHELXL, but the free-R reflections will first need to be flagged (e.g. with XPREP). SHELXC reads keywords from standard input. The keywords may be given in any order, and only the first four characters are significant, so 'SIRA' is the same as 'SIRAS'.
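These input rules can be illustrated with a small Python sketch of a parser for SHELXC-style keyword input. It is not SHELXC's own code: the function name is hypothetical, and treating keywords as case-insensitive is an assumption. Each keyword is cut to its first four characters, and blank lines and lines beginning with REM (see REM below) are skipped.

```python
def parse_keywords(text: str) -> list[tuple[str, list[str]]]:
    """Split keyword input into (keyword, arguments) pairs.

    Only the first four characters of a keyword are kept, so SIRA and SIRAS
    give the same key. Upper-casing the keyword is an assumption.
    """
    commands = []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue  # blank line
        keyword = fields[0].upper()
        if keyword.startswith("REM"):
            continue  # comment line, ignored
        commands.append((keyword[:4], fields[1:]))
    return commands
```

Arguments such as filenames are left untouched; only the keyword itself is normalized.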


Keywords to identify the input reflection data

At least one input data file must be named, but there will often be more. The input data files can be in SHELX .hkl, SCALEPACK .sca or XDS XDS_ASCII.HKL format. SHELXC decides on the file format by reading the first few lines, not from the filename extension. To read more than one file in XDS format, they should either be read from different folders or be renamed, e.g. to XDS_PEAK.HKL etc. If a SHELX HKLF3 format .hkl file is read, the filename should be followed by '-f' on the same line to indicate that it contains F-values rather than intensities. If only .mtz files are available, Tim Grüne's mtz2sca can be used to convert them to .sca format.

In general, none of the data files input to SHELXC need to be merged or sorted, and they may or may not contain systematically absent reflections, because SHELXC does all the necessary scaling etc. itself. If at all possible, unmerged data should be input. XDS files have the advantage that they are always unmerged; for .sca files, 'OUTPUT POLISH UNMERGED' (SCALA) or 'NO MERGE ORIGINAL INDEX' (HKL2000 / SCALEPACK) should be used when making them. It is best not to allow other programs to maul the data first and upset the statistics!

Examples of data input keywords for a SAD experiment are:

SAD XDS_ASCII.HKL - anomalous data (used as native too unless NAT is also specified)
NAT native -f - native data (optional)

The native data can be very useful for getting good maps if their resolution is higher than that of the SAD data. In this example the default extension .hkl is appended to 'native' to make native.hkl, and -f specifies that F rather than intensity is read. Friedel pairs must be present in the SAD data but are not required for the native data. For a SIR experiment the NAT file is essential:

NAT nat.sca - native data
SIR derivative.sca - derivative data

In practice, the derivative (e.g. an iodide soak) will give an anomalous signal, so SIRAS is normally better (the files were renamed from XDS_ASCII.HKL here to avoid a clash of names):

NAT XDS_NAT.HKL - native data
SIRAS XDS_DERIV.HKL - derivative data
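The data-input lines in these examples share one shape: a keyword, a filename and an optional -f flag. A minimal Python sketch of reading such a line follows. The function name and keyword set are illustrative, and appending .hkl only when the filename has no extension is an assumption based on the 'native' to native.hkl example above.

```python
import os

# First four characters of the data-input keywords named in this document.
DATA_KEYWORDS = {"SAD", "NAT", "SIR", "SIRA", "PEAK", "INFL", "HREM", "LREM",
                 "BEFO", "AFTE", "RIP", "RIPA"}


def parse_data_line(line: str) -> tuple[str, str, bool]:
    """Return (keyword, filename, is_f) for one data-input line."""
    fields = line.split()
    if len(fields) < 2 or fields[0].upper()[:4] not in DATA_KEYWORDS:
        raise ValueError(f"not a data-input line: {line!r}")
    keyword = fields[0].upper()[:4]
    filename = fields[1]
    # Assumption: the default extension .hkl is added only when none is given.
    if not os.path.splitext(filename)[1]:
        filename += ".hkl"
    # '-f' marks HKLF3 data containing F rather than intensities.
    is_f = "-f" in fields[2:]
    return keyword, filename, is_f
```

Note that the format of the file itself would still be recognized from its contents, not from this extension.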

For SHELXC, a MAD experiment is restricted to at most four wavelengths, identified by the keywords PEAK, INFL, HREM and LREM, plus optionally NAT. If only two wavelengths are specified, they must include the peak or the inflection point (or both). HREM stands for 'high energy remote' and LREM for 'low energy remote'.
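The wavelength rules for MAD can be expressed as a short check. In this hypothetical Python sketch the keywords are assumed to be already upper-cased and cut to four characters, NAT is ignored, and rejecting a repeated wavelength keyword is an assumption rather than documented SHELXC behaviour.

```python
MAD_KEYWORDS = ("PEAK", "INFL", "HREM", "LREM")


def check_mad_wavelengths(keywords: list[str]) -> int:
    """Check the MAD wavelength rules and return the number of wavelengths.

    NAT and any other non-wavelength keywords are ignored.
    """
    wavelengths = [k for k in keywords if k in MAD_KEYWORDS]
    if len(set(wavelengths)) != len(wavelengths):
        raise ValueError("each MAD wavelength keyword may appear only once")
    if len(wavelengths) < 2:
        raise ValueError("MAD needs at least two wavelengths (use SAD for one)")
    if len(wavelengths) == 2 and not {"PEAK", "INFL"} & set(wavelengths):
        raise ValueError("two-wavelength MAD must include PEAK or INFL")
    return len(wavelengths)
```

With only four distinct wavelength keywords available, the limit of four wavelengths follows automatically.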


Further keywords

The SHELXD instructions SHEL, SFAC, ESEL, FIND, MIND, DSUL and NTRY may be input for passing on to SHELXD; see the SHELXD keywords for more information about them. The CELL and SPAG keywords are always required for SHELXC.

CELL - the unit-cell parameters a, b, c, alpha, beta, gamma. If there are seven items, the first is assumed to be the wavelength (to be compatible with other SHELX programs).
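The six-or-seven-item rule for CELL can be sketched as follows; the function name is hypothetical and the sketch only reproduces the rule stated above.

```python
from typing import Optional


def parse_cell(args: list[str]) -> tuple[Optional[float], tuple[float, ...]]:
    """Return (wavelength or None, (a, b, c, alpha, beta, gamma)) from CELL arguments."""
    values = [float(x) for x in args]
    if len(values) == 7:
        # Seven items: the first is the wavelength, as in other SHELX programs.
        return values[0], tuple(values[1:])
    if len(values) == 6:
        return None, tuple(values)
    raise ValueError(f"CELL needs 6 or 7 numbers, got {len(values)}")
```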

SPAG - the name of the space group. Only Sohncke space groups are permitted, but some common non-standard settings are allowed, e.g. 'P22121'. Embedded spaces are ignored. SPAG is used to generate the LATT and SYMM instructions that are written to name_fa.ins. If the space group is specified as 'R3' or 'R32' the program checks the cell dimensions to see whether the hexagonal or primitive rhombohedral setting is required.
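Two of the SPAG details above, ignoring embedded spaces and choosing the setting for R3 or R32 from the cell, can be illustrated in Python. This is a sketch, not SHELXC's code: the function names are hypothetical, upper-casing the name is an assumption, and the tolerance tol is an arbitrary illustrative value.

```python
import math


def normalize_spag(name: str) -> str:
    """Remove embedded spaces, e.g. 'P 2 21 21' becomes 'P22121'."""
    return "".join(name.split()).upper()  # upper-casing is an assumption


def r_setting(a, b, c, alpha, beta, gamma, tol=0.01) -> str:
    """Classify an R3/R32 cell as hexagonal or primitive rhombohedral.

    tol (in Angstrom and degrees) is a hypothetical tolerance.
    """
    def close(x, y):
        return math.isclose(x, y, abs_tol=tol)

    # Hexagonal setting: a = b, alpha = beta = 90, gamma = 120.
    if close(a, b) and close(alpha, 90) and close(beta, 90) and close(gamma, 120):
        return "hexagonal"
    # Primitive rhombohedral setting: a = b = c, alpha = beta = gamma.
    if close(a, b) and close(b, c) and close(alpha, beta) and close(beta, gamma):
        return "rhombohedral"
    raise ValueError("cell is not consistent with R3 or R32")
```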

MAXM - allocates working space for reflections, in units of a million reflections summed over all datasets; e.g. the default 'MAXM 2' reserves space for 2000000 reflections.
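Because MAXM counts millions of reflections over all datasets together, a suitable value can be estimated from the reflection counts of the input files. A hypothetical Python helper, assuming the relevant count is the number of (unmerged) reflections read from all files:

```python
import math


def suggest_maxm(reflection_counts: list[int], default: int = 2) -> int:
    """Smallest MAXM (in millions of reflections) covering all datasets together."""
    total = sum(reflection_counts)
    # Never go below the default of 2 (i.e. 2000000 reflections).
    return max(default, math.ceil(total / 1_000_000))
```

For example, datasets of 1500000, 1200000 and 400000 reflections (3100000 in total) would call for 'MAXM 4'.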

DSCA - the factor (default 0.98) by which the native data for SIR and SIRAS, or the AFTER or RIPAS data for RIP, are multiplied after all the data have been put onto a common scale (this allows for the extra scattering power of the heavy atoms etc.).
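The role of DSCA can be illustrated with an isomorphous difference between two datasets that are already on a common scale. In the minimal Python sketch below, f_scaled is the dataset that DSCA multiplies (the native data for SIR and SIRAS, or the AFTER or RIPAS data for RIP). The function name is hypothetical, and SHELXC's own difference estimation, including σ handling, is not reproduced.

```python
def scaled_differences(f_ref: dict, f_scaled: dict, dsca: float = 0.98) -> dict:
    """Return f_ref[hkl] - dsca * f_scaled[hkl] for reflections present in both.

    f_ref:    derivative data (SIR/SIRAS) or BEFORE data (RIP), keyed by (h, k, l).
    f_scaled: native data (SIR/SIRAS) or AFTER/RIPAS data (RIP); DSCA applies here.
    """
    common = f_ref.keys() & f_scaled.keys()
    return {hkl: f_ref[hkl] - dsca * f_scaled[hkl] for hkl in common}
```

A DSCA slightly below 1 reduces the dataset lacking the extra scatterers, so that the differences are not biased by the overall difference in scattering power.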

ASCA - a scale factor (default 1.0) that is applied to the anomalous signal in a MAD experiment. To apply MAD to a small molecule, ASCA and DSCA should both be between 0 and 1; the best values have to be found by trial and error.
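Since the best ASCA and DSCA values have to be found by trial and error, it can help to generate one SHELXC input per pair of values. A Python sketch; the function name and the grid of values are purely illustrative. Each generated input should be run with its own filename stem so that the output files of different trials do not overwrite each other.

```python
from itertools import product


def trial_inputs(base_lines: list[str], values=(0.6, 0.7, 0.8, 0.9)):
    """Yield (asca, dsca, input_text) for every pair of trial values."""
    for asca, dsca in product(values, repeat=2):
        lines = [*base_lines, f"ASCA {asca}", f"DSCA {dsca}"]
        yield asca, dsca, "\n".join(lines) + "\n"
```

Here base_lines would hold the data-input, CELL and SPAG lines shared by all trials.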

SMAD - (without a number) sets the dispersive term to zero in a MAD experiment. This is equivalent to SAD using weighted mean anomalous differences from all the MAD datasets. This can be useful when MAD appears to fail (especially if the wavelengths were labeled wrongly).
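A weighted mean of the anomalous differences measured at several wavelengths can be sketched for a single reflection as below. Inverse-variance weighting is an assumption made for illustration; the weights SHELXC actually uses for SMAD are not specified here, and the function name is hypothetical.

```python
import math


def weighted_mean_anomalous(diffs: list[float], sigmas: list[float]) -> tuple[float, float]:
    """Inverse-variance weighted mean of one reflection's anomalous differences
    from several wavelengths, with its standard uncertainty."""
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * d for w, d in zip(weights, diffs)) / total
    return mean, math.sqrt(1.0 / total)
```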

REM - lines beginning with these three letters are ignored.


MAD phasing example

SHELXC can be called in different ways, but in this example we will store the instructions in a separate file gere_mad (so that it is also Windows compatible). SHELXC is then started with:

shelxc gere <gere_mad

Linux or Mac users might prefer to use:

shelxc gere <gere_mad | tee gere.lis

so that they have a permanent record of the console output. The file gere_mad (for gere, a well-known CCP4 test structure) could contain:

NAT gere_nat.sca
PEAK gere_peak.hkl
INFL gere_infl.hkl
HREM gere_hrem.hkl
LREM gere_lrem.hkl
CELL 109.02 61.75 71.74 90.00 97.08 90.00
SPAG C2
FIND 12
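On systems without tee (e.g. a plain Windows console), a similar permanent record can be kept with a short Python wrapper. This is a sketch, assuming shelxc is on the PATH; the function name is hypothetical.

```python
import subprocess
import sys


def run_and_tee(cmd: list[str], stdin_path: str, log_path: str) -> int:
    """Run cmd with stdin from stdin_path, copying its output to the console and log_path."""
    with open(stdin_path) as stdin, open(log_path, "w") as log:
        proc = subprocess.Popen(cmd, stdin=stdin, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, text=True)
        for line in proc.stdout:
            sys.stdout.write(line)  # console copy
            log.write(line)         # permanent record
        return proc.wait()
```

For this example it would be called as run_and_tee(["shelxc", "gere"], "gere_mad", "gere.lis").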


Special instructions for RIP phasing

BEFORE or NAT - the dataset collected before UV or X-ray radiation damage.

AFTER or RIP - the dataset collected after UV or X-ray radiation damage. In this case RIP phasing is applied analogously to SIR, without the use of anomalous scattering. DSCA (see above) can be critical for RIP experiments and various values around 0.98 should be tried.

RIPAS - the dataset collected after UV or X-ray radiation damage, for processing analogous to SIRAS (but see RIPW).

RIPW - gives the weight w (default 0.6) to be assigned to the NAT or BEFORE data in the estimation of the anomalous signal (a weight of 1-w is applied to the 'RIPAS' data).
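The RIPW weighting amounts to a weighted combination of the anomalous differences from the two datasets. A minimal Python sketch, assuming per-reflection anomalous differences keyed by (h, k, l) are already available; the function name is hypothetical and SHELXC's own estimate may differ in detail.

```python
def rip_anomalous(anom_before: dict, anom_ripas: dict, w: float = 0.6) -> dict:
    """Combine anomalous differences: weight w for NAT/BEFORE, 1 - w for RIPAS."""
    common = anom_before.keys() & anom_ripas.keys()
    return {hkl: w * anom_before[hkl] + (1.0 - w) * anom_ripas[hkl] for hkl in common}
```

With w = 1 only the NAT or BEFORE data contribute; with w = 0 only the RIPAS data do.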