## Macromolecular substructure solution with SHELXD

SHELXD was originally written for *ab initio*
small molecule direct methods,
but it turned out to be even more useful for the location of the
heavy atoms in the experimental phasing of macromolecules by the SAD,
SIRAS, MAD and similar methods. Whereas the *ab initio*
solution of small molecules requires two dual-space stages
(

### Running SHELXD for substructure solution

The input to SHELXD consists of two files *name_fa.ins*
and *name_fa.hkl* that usually will both
have been set up by SHELXC, possibly
under the control of a GUI such as
hkl2map,
*name_fa.hkl* contains F*.ins* file for a straightforward
selenomethionine MAD experiment follows:

CELL 0.98000 96.00 120.00 166.13 90.00 90.00 90.00

ZERR 16.00 0.019 0.024 0.033 0.0 0.0 0.0

LATT -7

SYMM -X, -Y, 0.5+Z

SYMM -X, Y, 0.5-Z

SYMM X, -Y, -Z

SFAC SE

UNIT 128

PATS

FIND 8

MIND -3.5 3

NTRY 1000

HKLF 3

END
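An instruction file like the one above can be read mechanically using the free-format rules given in the instruction list later in this document (keyword first, upper and lower case mixed freely, '!' comments, '=' continuation, lines starting with a space treated as comments). The following Python function is a minimal sketch of such a reader, not SHELXD's own parser. The name `parse_ins` is hypothetical, and the exceptions to the '!' and '=' rules that the instruction list mentions are not handled.

```python
def parse_ins(text):
    """Split SHELX-style instruction text into (KEYWORD, [arguments]) tuples.

    Illustrative sketch only: it follows the free-format rules described in
    this document and stops at the END instruction.
    """
    instructions = []
    buffer = ""          # accumulated text of the instruction being read
    continuing = False   # True if the previous line ended with '='
    for raw in text.splitlines():
        # Blank lines and lines starting with a space are comments,
        # unless they continue the previous instruction.
        if not continuing and raw[:1] in ("", " "):
            continue
        body = raw.split("!", 1)[0]      # drop everything after '!'
        if "=" in body:
            # '=' flags a continuation; the rest of this line is ignored.
            buffer += " " + body.split("=", 1)[0]
            continuing = True
            continue
        buffer += " " + body
        continuing = False
        tokens = buffer.split()
        buffer = ""
        if not tokens:
            continue
        keyword = tokens[0].upper()      # case is not significant
        instructions.append((keyword, tokens[1:]))
        if keyword == "END":
            break
    return instructions
```

Each instruction is returned with its arguments as strings, so a caller can convert the numbers it needs, for example `float(args[0])` for the first argument of `MIND`.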

Patterson seeding (*.res**.pdb*

For SAD data, for which the experimental phase information tends to
be weaker than for MAD, the critical parameters are the resolution d at
which to truncate the data (

d can be particularly
critical for sulfur-SAD phasing. If d>2.0Å the disulfide bonds
may not be fully resolved, but in the range 2.8>d>2.0 the

For the classical case of
rhombohedral insulin, which has two zinc ions on threefold axes and six
disulfides in the asymmetric unit, for CuKα radiation for which
f"(Zn) = 0.68 and f"(S) = 0.56, and significant
anomalous signal at 2.0Å, the instructions between

UNIT 500

SHEL 999 2.0

PATS

FIND 14

MIND -1.5 -0.1

NTRY 5000

In this example, if the anomalous data only extended to about 2.5Å, it would be better to search for two zincs and six disulfides:

UNIT 500

SHEL 999 2.5

PATS

FIND 8

DSUL 6

MIND -3.5 -0.1

NTRY 1000

At the zinc absorption edge of 1.283Å, f"(Zn) = 3.89
is so much greater than for sulfur (f" = 0.39) that it might be
advisable just to search for the two zinc sites with

In difficult cases it may well be worth increasing the number of trials: some large substructures have been solved only once in 50000 trials or more. In such cases one should try to use a computer with as many CPUs as possible, since SHELXD will take full advantage of them.

At the end of the dual-space direct methods SHELXD refines the site occupancies assuming that all atoms are of the same type. This provides an adequate approximation in the case where different anomalous scatterers are present (e.g. Ca²⁺ and S in trypsin). For a SeMet MAD or sulfur-SAD experiment there should be a clear drop in occupancy after the last site. For halide soaks there is often a continuous descent to the noise level, which is usually assumed to be at an occupancy of about 0.15 relative to the site with the highest occupancy. This can be used to fine-tune the number of sites, which should be within about 20% of the true value for the best results.
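The two rules of thumb above (a clear drop after the last site, or a noise level at about 0.15 of the highest occupancy) can be applied to a list of refined occupancies as in the following sketch. The function names and the occupancy values in the usage note are invented for illustration; how the occupancies are read from the program output is left out.

```python
def count_sites_above_noise(occupancies, noise_fraction=0.15):
    """Count sites whose occupancy exceeds noise_fraction times the highest one."""
    if not occupancies:
        return 0
    threshold = noise_fraction * max(occupancies)
    return sum(1 for occ in occupancies if occ > threshold)


def sites_before_largest_drop(occupancies):
    """Return the number of sites that precede the largest occupancy drop."""
    ordered = sorted(occupancies, reverse=True)
    if len(ordered) < 2:
        return len(ordered)
    drops = [ordered[i] - ordered[i + 1] for i in range(len(ordered) - 1)]
    return drops.index(max(drops)) + 1
```

For a made-up list such as `[1.0, 0.95, 0.9, 0.85, 0.3, 0.2, 0.1]`, the largest drop follows the fourth site, whereas the 0.15 threshold keeps six sites. When the two estimates disagree like this, the number of sites to search for is a judgment call.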

### Alphabetical list of SHELXD instructions

All instructions in the *.ins* file commence with a word of four letters or fewer
(which may be an atom name) followed by numbers and other information in free
format, separated by one or more spaces. Upper and lower case input may be freely
mixed. Defaults are given in square brackets; '#' indicates that the program will
generate a suitable default value based on the rest of the available information.
Continuation lines are flagged by '=' at the end of a line, the instruction being
continued on the next line which must start with at least one space. Other lines
beginning with one or more spaces are treated as comments, so blank lines may be
added to improve readability. All characters following '!' or '=' in an instruction
are ignored, except after

These instructions define PDB format atoms for use by GROP.

All correlation coefficients (CC) are calculated using weights
w = 1/[1+**g**σ²(E)]. If the σ(E) values
read from the *.hkl* file are known to be very unreliable, it might
be better to set **g** to zero. The correlation coefficients between
E
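The effect of the weighting scheme w = 1/[1+gσ²(E)] can be illustrated with a weighted correlation coefficient. The sketch below assumes a standard weighted Pearson correlation. The exact expression used by SHELXD is not given in this text, and the function name `weighted_cc` is hypothetical.

```python
import math


def weighted_cc(e_obs, e_calc, sigma_e, g):
    """Weighted Pearson correlation with weights w = 1 / (1 + g * sigma^2).

    Setting g = 0 gives every reflection unit weight.
    """
    weights = [1.0 / (1.0 + g * s * s) for s in sigma_e]
    total = sum(weights)
    mean_obs = sum(w * x for w, x in zip(weights, e_obs)) / total
    mean_calc = sum(w * y for w, y in zip(weights, e_calc)) / total
    cov = sum(w * (x - mean_obs) * (y - mean_calc)
              for w, x, y in zip(weights, e_obs, e_calc))
    var_obs = sum(w * (x - mean_obs) ** 2 for w, x in zip(weights, e_obs))
    var_calc = sum(w * (y - mean_calc) ** 2 for w, y in zip(weights, e_calc))
    return cov / math.sqrt(var_obs * var_calc)
```

With a positive g, reflections with large σ(E) contribute less to the coefficient. With g = 0 the σ(E) values have no influence at all, which is why that setting is suggested when they are unreliable.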

Wavelength and unit-cell dimensions in Ångstroms and degrees.

Converts the most suitable **nss** peaks into disulfide units
with S-S distances of 2.06Å. This is an improvement on treating
these atoms as super-sulfurs. Each disulfide counts as a single
peak for

This is the last instruction in the rare cases when the
*.ins* file is not terminated by the

Minimum E and high-resolution limit for **Emin** defaults to 1.2 for *ab initio* structure solution
and to 1.5 for heavy atom location (the appropriate value is set as
default depending on whether a **Emin** if the resolution
is low.

Search for **na** atoms in **ncy** dual space cycles. If
**na** is the number of atoms remaining
after the random omit procedure. **ncy** defaults to the largest
of (20 or **na**) or, if **na** and 20). If

Resolution of all Fourier syntheses (including the PSMF but excluding
the Patterson itself) in terms of the minimum ratio of the number of
grid points along an axis to the maximum reflection index used along
that axis.

The dual-space direct methods are seeded by a 6D search for a small rigid
group to find a high value (not necessarily the global maximum) of
ΣE**Eg** and d > **dg**, where d is the
resolution in Ångstroms. For each of **nor** random orientations,
the local maxima of this function are found starting from **ntr**
random translations, and the atom positions corresponding to the
orientation/translation combination that gives the highest value for this
function are used to initiate the dual-space recycling (*.ins* file. All other PDB records should be removed. The atomic
number is deduced from the atom name applying PDB rules. A short piece of
alpha-helix might be used for solving small proteins and a diglucose
fragment might be suitable for cyclodextrins. In practice, a thorough
6-dimensional search (with a large **nor** value and
**Eg** = 0) using

**m** = 4 for F² in *.hkl* file,
**m** = 3 for F (or F

**nh** is the number of (heavy) atoms to retain as fixed atoms
during

Lattice type: 1=P, 2=I, 3=rhombohedral obverse on hexagonal axes, 4=F,
5=A, 6=B, 7=C. **N** must be made negative if the structure is
non-centrosymmetric.
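The lattice code can be decoded with a small lookup table, as in the following sketch. The function name `decode_latt` and the returned tuple layout are illustrative choices, not part of SHELXD.

```python
LATTICE_TYPES = {
    1: "P",
    2: "I",
    3: "R (obverse, hexagonal axes)",
    4: "F",
    5: "A",
    6: "B",
    7: "C",
}


def decode_latt(n):
    """Return (lattice symbol, is_centrosymmetric) for a lattice code N."""
    code = abs(n)
    if code not in LATTICE_TYPES:
        raise ValueError(f"unknown lattice code: {n}")
    # A negative N marks a non-centrosymmetric structure.
    return LATTICE_TYPES[code], n > 0
```

For the selenomethionine example earlier in this document, `decode_latt(-7)` gives a C-centred, non-centrosymmetric lattice.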

|**mdis**| is the shortest distance allowed between atoms for
**mdis** is negative PATFOM is
calculated, and the crossword table for the best PATFOM value so far
is output to the *.lst* file. In this case the solution is
passed on to the **mdeq** is the
minimum distance between symmetry equivalents for **mdis**| distance is used). The default value of
-0.1 for **mdeq** allows heavy atom sites on special positions,
which is normally recommended for small molecules or for heavy atom
soaks for macromolecular phasing.
For the location of selenium or sulfur in macromolecular phasing
it is advisable to use a value of 3.0 to avoid spurious solutions
such as *uranium atom solutions* that are incorrect but fit
the tangent formula. For

The coordinates of the atoms that follow this instruction
are changed to:

x' = **dx** + **sign**⋅x

y' = **dy** + **sign**⋅y

z' = **dz** + **sign**⋅z
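The coordinate change defined above is a shift combined with an optional sign flip of the fractional coordinates. A minimal sketch follows. The function name `move_atoms` and the representation of atoms as (x, y, z) tuples are assumptions made for the example.

```python
def move_atoms(coords, dx=0.0, dy=0.0, dz=0.0, sign=1.0):
    """Apply x' = dx + sign*x (and likewise for y and z) to each atom."""
    return [(dx + sign * x, dy + sign * y, dz + sign * z)
            for x, y, z in coords]
```

With `sign=-1` and zero shifts the coordinates are inverted through the origin. With `dx = dy = dz = 1` and `sign=-1`, an atom at (0.1, 0.2, 0.3) moves to (0.9, 0.8, 0.7).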

Maximum number of (largest) TPR (triple phase relations) per reflection.
If **ntpr** is negative, E is replaced by E/[1+σ²(E)] in
the estimation of probabilities involved in the tangent formula and
minimal function, as recommended by Giacovazzo, Siliqi & Garcia-Rodriguez
(2001).

Number of global tries if starting from random atoms, **ntry** is zero or absent, the program runs until
it is interrupted by creating a *name.fin* file in the current
working directory (e.g. using the UNIX command *touch*).

Calculates and stores Patterson. A random search is performed for
**np** two-atom vectors corresponding to Patterson peaks or for a
random orientation vector of length |**dis**|, using **npt**
random translations, selecting the one with the best Patterson minimum
function PMF (see **nf** random selections. This favors the
highest peaks but (if **nf** is not too large) also allows lower
peaks a chance. For example, with the default **np** = 100
and **nf** = 5, the chance is 39.5% that one of the first
10 vectors will be chosen and 91.9% that one of the first 50 will be
chosen. The default value of **npt** is 9999 for space groups with
a floating origin and 99999 for other space groups. When the space
group is P1, an extra atom is placed on the origin in addition to the
two-atom vector employed for the translation search. In the special
case when **nf** randomly
oriented vectors of length |**dis**| are compared on the basis of
the corresponding Patterson densities and the best used for the
translation search. If **ncy** greater than zero (or *full-symmetry Patterson
superposition minimum function* (i.e. a superposition based on
the two peaks and all their symmetry equivalents) is used to locate
the starting atoms for the first

**maxb** is the maximum number of bonds to atoms or higher
peaks; the peak is deleted if there are more. Peaks are also deleted
if they are less than **dsp** Ångstroms from their
equivalents. Atoms are not output to the final *.res* file if
they are in a molecule that consists of fewer than **mf** atoms.

**pres** is the resolution of the Patterson in terms of minimum
ratio of the number of grid points along an axis and the maximum
reflection index along that axis. If **nres** is negative a
*supersharp Patterson* with coefficients √(E³F)
is calculated (in which case a finer grid is advisable, i.e.

**psfac** is the fraction of the lowest values in the sorted list of Patterson heights that is summed to get the PMF.

Followed by a comment on the same line. This comment is ignored by the
program but is copied to the results file (*.res*).

**nrand** defines a different sequence of random numbers. If
**nrand** is omitted or zero, the seed is randomized so a new
sequence is always generated.

These element symbols define the order of scattering factors to be
employed by the program. The first 94 elements of the periodic system
are recognized. For some options, e.g. substructure solution, only
the first element type is used.

Resolution limits in Å for all calculations. Both limits must be
specified but it does not matter which is given first.
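Because the order of the two limits does not matter, a resolution filter can simply sort them before testing a reflection. The sketch below also computes the d-spacing for the special case of a cell with all angles equal to 90°, as in the selenomethionine example in this document. The function names are hypothetical, and treating the limits as inclusive is an assumption.

```python
import math


def d_spacing_orthogonal(h, k, l, a, b, c):
    """d-spacing in Angstroms for a cell with all angles 90 degrees."""
    inv_d2 = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    return math.inf if inv_d2 == 0 else 1.0 / math.sqrt(inv_d2)


def within_limits(d, limit1, limit2):
    """True if d lies between the two limits, whichever is given first."""
    low, high = sorted((limit1, limit2))
    return low <= d <= high
```

For example, `within_limits(2.5, 999, 2.0)` and `within_limits(2.5, 2.0, 999)` give the same answer, while a reflection at 1.9 Å is excluded by either ordering.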

During **min2** times the first, the first peak is rejected (before
applying *uranium atom solutions*. For large equal-atom
structures in space group P1, where there is a danger of a
uranium-atom pseudo-solution, it might be a good idea to specify

Symmetry operators, i.e. coordinates of the general positions as given in the
International Tables, volume A. The operator

Fraction |**ftan**| of the ncy dual space (**fex** is the fraction of reflections with the largest Ecalc values to
hold fixed when doing tangent expansion to find the remaining phases.
**ftan**|−**ncy**
cycles. If **ftan** is negative, the occupancies are refined for
the final (1−|**ftan**|)⋅**ncy** cycles. This is
particularly useful for the anomalous sites in halide soak experiments,
since these often have partial occupancies, but for other substructure
problems it also provides a good check as to how many heavy atom sites
are present. It is not recommended for normal *ab initio*
applications of SHELXD because the algorithm employed uses a large
amount of memory (in the interests of speed).

After **CCmin**, **delCC** of best CC value so far. If PATFOM is
calculated, then only solutions with either the best initial CC
(i.e. after *.res* and *.pdb* files. If
*.res* and *.pdb* files
are written after the **CCmin** and
**delCC** are 45 and 1, respectively, for full *ab initio* solutions,
and 10 and 5, respectively, for substructure solution (i.e. when

Title of up to 76 characters, to appear at suitable places in the output.

Expand data to non-centrosymmetric triclinic for all calculations.

Number of atoms of each type in the cell, in

Randomly omit fraction **fr** of the atoms in the dual space
recycling (except in the last cycle and the cycles for which no tangent
refinement is performed - see

Z-value (number of formula units per cell) followed by the estimated errors in
the unit-cell dimensions. This information is not actually required by SHELXD
but is allowed for compatibility with SHELXL.