## SHELXS - classical direct methods for small molecules

SHELXS is a program of some antiquity for solving small (up to about 100
unique non-hydrogen atom) structures by *direct methods*. It is
very fast because in the 1980s computers were so slow. SHELXS is based on the
classical **tangent formula** of Karle and Hauptman, but uses phase annealing
and includes information from the weak reflections via the
**negative quartets**. For further details see the

### Running SHELXS direct methods

SHELXS requires two input files: an instruction file *name.ins*
and a reflection file *name.hkl*, and outputs the 'best' solution to
*name.res*. A summary is output to the console and a full listing
to *name.lst*. SHELXS is started with the command line:

A SHELXS job that is already running may be terminated
gracefully by creating a file *name.fin* in the directory in which
SHELXS is running. Many GUIs can generate the *.ins* file.
The following example *ylid.ins* was set up, together with
*ylid.hkl*, by the Bruker AXS program XPREP:

```
TITL ylid in P2(1)2(1)2(1)
CELL 0.71073 5.9645 9.0412 18.3971 90.000 90.000 90.000
ZERR 4.00 0.0004 0.0005 0.0022 0.000 0.000 0.000
LATT -1
SYMM 0.5-X, -Y, 0.5+Z
SYMM -X, 0.5+Y, 0.5-Z
SYMM 0.5+X, 0.5-Y, -Z
SFAC C H O S
UNIT 44 40 8 4
TREF
HKLF 4
END
```
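As a quick consistency check on the example above, the UNIT counts can be divided by the Z value on the ZERR line to give the composition of one formula unit. The following is a minimal Python sketch, not part of SHELXS; it assumes that the UNIT numbers are listed in the same order as the SFAC elements, and the function name `formula_per_unit` is hypothetical.

```python
def formula_per_unit(ins_text):
    """Return {element: atoms per formula unit} from SFAC, UNIT and ZERR lines.

    Assumption: UNIT counts follow the order of the elements on SFAC.
    """
    sfac, unit, z = [], [], None
    for line in ins_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        key = fields[0].upper()
        if key == "SFAC":
            sfac = fields[1:]
        elif key == "UNIT":
            unit = [float(v) for v in fields[1:]]
        elif key == "ZERR":
            z = float(fields[1])          # first number on ZERR is Z
    return {el: n / z for el, n in zip(sfac, unit)}


ylid = """SFAC C H O S
UNIT 44 40 8 4
ZERR 4.00 0.0004 0.0005 0.0022 0.000 0.000 0.000"""

print(formula_per_unit(ylid))
# {'C': 11.0, 'H': 10.0, 'O': 2.0, 'S': 1.0}
```

For the ylid example this corresponds to four C11H10O2S units in the cell.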

The resulting *ylid.res* file will require editing, to define the
atom names and scattering factor numbers and delete noise peaks, before
it can be renamed to *ylid.ins* for SHELXL refinement using the
same reflection data file *ylid.hkl*.
Many possible instructions may be included in the *.ins* file
(see list below), but the only options that are really worth trying if
the above fails to solve the structure are:

**1.** Increase the number of tries from the default

**2.** If more than one solution has good Rα and NQUAL values,
it is possible that the structure has been solved but the program has
chosen the wrong solution. The list of +/- signs (the seminvariant
phases) can then be examined to see which solutions are likely to be
equivalent or not. Other solutions can then be generated with

**3.** If heavy atoms are present, it might be better to try to
locate them first by replacing

### Patterson interpretation and partial structure expansion with SHELXS

For Patterson interpretation, *.ins* file begins with

The

### Patterson interpretation algorithm

The algorithm used to interpret the Patterson to find the heavier atoms is as follows:

**1.** One peak is selected from the sharpened Patterson (or input
by means of a

**2.** The Patterson function is calculated twice, displaced from
the origin by +U and -U, where U is the superposition vector. At each
grid point the lower of the two values is taken, and the resulting
*superposition minimum function* is interpolated to find the
peak positions. This is a much cleaner map than the original Patterson
and contains only 2N (or 4N etc. if the superposition vector was multiple)
peaks rather than N². The superposition map should ideally consist
of one image of the structure and its inverse; it has an effective
space group of P-1 (or C-1 for a centered lattice etc.).

**3.** Possible origin shifts are found which place one of the
images correctly with respect to the cell origin, i.e. most of the
symmetry equivalents can be found in the peak-list. The SYMFOM figure
of merit (normalized so that the largest value for a given
superposition vector is 99.9) indicates how well the space group
symmetry is satisfied for this image.

**4.** For each acceptable origin shift, atomic numbers are
assigned to the potential atoms based on average peak heights, and
a *crossword table* is generated. This gives the minimum
distance and Patterson minimum function for each possible pair of
unique atoms, taking symmetry into account. This table should be
interpreted by hand to find a subset of the atoms making chemically
sensible minimum interatomic distances linked by consistently large
Patterson minimum function values. The PATFOM figure of merit measures
the internal consistency of these minimum function values and is also
normalised to a maximum of 99.9 for a given superposition vector.
The Patterson values are recalculated from the original F
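Step 2 of the algorithm can be illustrated with a one-dimensional toy model. The sketch below is only an illustration under stated assumptions, not the SHELXS implementation: it uses point atoms on a periodic integer grid, hypothetical function names, and no interpolation. How SHELXS relates the displacement to the selected Patterson peak is not modelled; the toy simply uses a shift equal to half of one interatomic vector, so that the two displaced copies coincide on images of the structure.

```python
def patterson(atoms, n):
    """1-D point-atom Patterson on a periodic grid of n points.

    atoms is a list of (grid_position, weight) pairs.
    """
    p = [0.0] * n
    for xi, wi in atoms:
        for xj, wj in atoms:
            p[(xj - xi) % n] += wi * wj
    return p


def superposition_minimum(p, shift):
    """Lower of the two Patterson copies displaced by +shift and -shift."""
    n = len(p)
    return [min(p[(x + shift) % n], p[(x - shift) % n]) for x in range(n)]


n = 40
atoms = [(2, 1.0), (8, 1.0), (21, 1.0)]   # toy structure, equal weights
p = patterson(atoms, n)
smf = superposition_minimum(p, shift=3)   # 3 = half the vector from 2 to 8

patterson_peaks = [x for x in range(n) if p[x] > 0]
smf_peaks = [x for x in range(n) if smf[x] > 0]
print(patterson_peaks)   # [0, 6, 13, 19, 21, 27, 34]
print(smf_peaks)         # [3, 16, 24, 37]
```

In this toy case the peaks at 37, 3 and 16 are the structure expressed relative to the midpoint (grid point 5) of the two atoms defining the shift, and 3, 37 and 24 are its inverse, so the map is centrosymmetric about the origin and contains fewer peaks than the Patterson itself.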

### Alphabetical list of instructions in the SHELXS *.ins* file

All instructions in the *.ins* file commence with a four (or fewer) letter
word (which may be an atom name) followed by numbers and other information in free
format, separated by one or more spaces. Upper and lower case input may be freely
mixed. Defaults are given in square brackets; '#' indicates that the program will
generate a suitable default value based on the rest of the available information.
Continuation lines are flagged by '=' at the end of a line, the instruction being
continued on the next line which must start with at least one space. Other lines
beginning with one or more spaces are treated as comments, so blank lines may be
added to improve readability. All characters following '!' or '=' in an instruction
are ignored, except after *.ins* file for compatibility with SHELXL, but will be ignored.
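The free-format rules described above can be sketched as a small reader. This is a hedged illustration in Python, not code from SHELXS: the function name `read_instructions` is hypothetical, and the sketch deliberately ignores the exceptions to the '!' and '=' rule, so it treats every instruction the same way.

```python
def read_instructions(text):
    """Split .ins text into (KEYWORD, [fields]) pairs.

    Simplifications (assumptions of this sketch): no instruction is exempt
    from the '!' / '=' rule, and keyword length is not checked.
    """
    instructions = []
    continued = False                     # previous line ended with '='
    for raw in text.splitlines():
        if not raw.strip():
            continue                      # blank line: treated as a comment
        indented = raw[0] == " "
        if indented and not (continued and instructions):
            continue                      # indented, not a continuation: comment
        line = raw.split("!", 1)[0]       # drop everything after '!'
        line, sep, _ = line.partition("=")  # drop everything after '='
        fields = line.split()
        if indented:
            instructions[-1][1].extend(fields)   # continuation line
        elif fields:
            instructions.append((fields[0].upper(), fields[1:]))
        continued = bool(sep)
    return instructions


sample = "\n".join([
    "TITL ylid in P2(1)2(1)2(1)",
    "  this indented line is a comment",
    "sfac C H =",
    "  O S   ! four element types",
    "UNIT 44 40 8 4",
])
for keyword, fields in read_instructions(sample):
    print(keyword, fields)
```

Keywords are folded to upper case because upper and lower case input may be mixed; the remaining fields are left untouched.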

Wavelength and unit-cell dimensions in Angstroms and degrees.

All missing reflections in the resolution range d(min) to d(max) Å
(the order of d(min) and d(max) is unimportant) are generated on a
statistical basis, assuming that they were skipped during the data
collection because a prescan indicated that they were weak (only
relevant for a 1-D detector!). These reflections will then be flagged
as 'unobserved', but improve the estimation of the remaining E-values
and enable an increased number of negative quartets to be identified.
d(min) should be safely inside the resolution limit of the data and
d(max) should be set so that there is no danger of regenerating strong
reflections (as weak) which were cut off by the beam stop etc.

This is the last instruction in the rare cases when the
*.ins* file is not terminated by the

Emin sets the minimum E-value for the list of largest E-values
that the program normally retains in memory; it should be set so as to
give more than enough reflections for TREF etc. It is also the threshold
used for tangent expansion and 'peak-list optimisation'. It is advisable
to reduce Emin to about 1.0 for triclinic structures and pseudosymmetry
problems. If Emin is negative, acentric triclinic data are generated for
use in all calculations. The other parameters control the normalisation
of the E-values:

$$
\mathrm{new}(E) = \frac{\mathrm{old}(E)\,\exp\!\left[8\pi^{2}\,dU\,(\sin\theta/\lambda)^{2}\right]}{\left[\mathrm{old}(E)^{-4} + \mathrm{Emax}^{-4}\right]^{0.25}}
$$

renorm is a factor to control the parity group renormalisation; 0.0 implies no renormalisation, 1.0 sets full renormalisation, i.e. the mean value of E² becomes unity for each parity group. If axis is 1, 2 or 3, an additional similar renormalisation is applied for groups defined by the absolute value of the h, k or l index respectively.

The unique unit of the cell for performing the Fourier calculation
is set up automatically unless specified by the user using

**code** is ignored by SHELXS
but is included for compatibility with PATSEE and SHELXL
(where it is used for different purposes).

Fourier grid, when not set automatically. Starting points and
increments are multiplied by 100. **s** means starting value,
**d** increment, **l** is the direction perpendicular to the
layers, **a** is across the paper from left to right, and
**d** is down the paper from top to bottom. Note that the
grid is 53 x 53 x nl points, and that **sl** and **dl** need
not be integral. The 103 x 103 x nl grid is only available when
it is set automatically by the program (see above).

Before running SHELXS, a reflection data file *name.hkl*
must usually be prepared. The **r11..r33**
(which should have a positive determinant). **n** is negative if
reflection data follow, otherwise they are read from the
*.hkl* file. The data are read in fixed format (3I4,2F8.2)
(except for **n** = 1) subject to FORTRAN conventions.
The data are terminated by a record with h, k and l all zero (except
**n** = 1, which contains a terminator and checksum). If batch
numbers, direction cosines or wavelengths are present in the
*.hkl* file they will be ignored. The multiplicative
scale **s** multiplies both F² and σ(F²) (or F and
σ(F) for **n** = 1 or 3). The multiplicative weight
**wt** multiplies all 1/σ² values and **m** is
an integer offset needed to *read condensed data*
(

**n = 1:** SHELX-76 condensed data, now deprecated.

**n = 3:** h k l F

**n = 4:** h k l F² σ(F²). The
recommended format for nearly all purposes.

**n = 7:** h k l E or h k l P (Patterson coefficient)
depending on

*There may only be one HKLF instruction and it must
come last!*
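The fixed format (3I4,2F8.2) means three 4-column integers followed by two 8-column real numbers. A minimal Python sketch of reading such records for **n** = 4 is given below; it is an illustration only, the function names are hypothetical, and blank fields are read as zero on the assumption that this matches the FORTRAN convention mentioned above. Anything beyond column 28 is ignored.

```python
def _int(field):
    """Blank integer field is read as zero (assumed FORTRAN-style behaviour)."""
    return int(field.strip() or 0)


def _float(field):
    """Blank real field is read as zero (assumed FORTRAN-style behaviour)."""
    return float(field.strip() or 0)


def read_hklf4(lines):
    """Parse (3I4,2F8.2) records until the h = k = l = 0 terminator.

    Returns a list of (h, k, l, F2, sigma_F2) tuples.
    """
    reflections = []
    for line in lines:
        line = line.rstrip("\n").ljust(28)
        h, k, l = (_int(line[i:i + 4]) for i in (0, 4, 8))
        if h == 0 and k == 0 and l == 0:
            break                          # terminator record
        f2 = _float(line[12:20])
        sigma = _float(line[20:28])
        reflections.append((h, k, l, f2, sigma))
    return reflections


fmt = "%4d%4d%4d%8.2f%8.2f"
sample = [
    fmt % (1, 0, 0, 123.45, 6.78),
    fmt % (-2, 3, 1, 10.00, 1.50),
    fmt % (0, 0, 0, 0.0, 0.0),             # terminator
    fmt % (5, 5, 5, 99.00, 9.00),          # never reached
]
print(read_hklf4(sample))
```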

The first stage involves five cycles of weighted tangent formula
refinement (based on triplet phase relations only) starting from
**nn** reflections with random phases and weights of 1. Single phase
seminvariants which have Σ**s-**
or greater than **s+** are included with their predicted phases and
unit weights. All these reflections are held fixed during the
**wr**, but both the phases and the weights are allowed to
vary. If **nf** is non-zero, the **nf** 'best' (based on the negative
quartet and triplet consistency) phase sets are retained and the
process repeated for (**npp-nf**) parallel phase sets, where **npp**
is the previous number of phase sets processed in parallel (often 128).
This is repeated for **nf** fewer phase sets each time until only a
quarter of the original number are processed in parallel. This rather
involved algorithm is required to make efficient use of available
computer memory. Typically **nf** should be 8 or 16 for 128 parallel
permutations. The purpose of the **ns** reflections and the strongest
**mtpr** triplets for each reflection (or less, if not so many can
be found) are used in the

Lattice type: 1=P, 2=I, 3=rhombohedral obverse on hexagonal axes, 4=F,
5=A, 6=B, 7=C. **N** must be made negative if the structure is
non-centrosymmetric.
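The encoding can be summarised in a few lines of Python. This is only a sketch with hypothetical names, decoding the integer **N** into a lattice symbol and a centrosymmetry flag as described above.

```python
LATTICE_TYPES = {
    1: "P",
    2: "I",
    3: "R (obverse, hexagonal axes)",
    4: "F",
    5: "A",
    6: "B",
    7: "C",
}


def decode_latt(n):
    """Return (lattice symbol, is_centrosymmetric) for the lattice code n."""
    return LATTICE_TYPES[abs(n)], n > 0


print(decode_latt(-1))   # ('P', False)
print(decode_latt(7))    # ('C', True)
```

The `LATT -1` line in the *ylid.ins* example therefore describes a primitive, non-centrosymmetric structure.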

If **m** = 1 or **m** = 2 writes h, k, l, A and B lists to the
*name.res* file, where A and B are the real and imaginary parts
of a point-atom structure factor respectively. If **m** = 1
the list corresponds to the phased E-values for the 'best' direct methods
solution, before partial structure expansion (if any). If
**m** = 2 the list is produced after the
final cycle of partial structure expansion, and corresponds to the
weighted E-values used for the final Fourier synthesis. These options
enable other Fourier programs to be used, e.g. for graphical display
of 3D-Fouriers for data to less than atomic resolution. After data
reduction and merging equivalent reflections, a list of h, k, l,
F and σ(F) (**m** = 3) or
h, k, l, F² and σ(F²) (**m** = 4) is written to the *name.res* file. This
provided a useful input file for programs such as DIRDIF and MULTAN
that did not provide sort/merge and rejection of systematic absences
etc. SHELXS always averages Friedel opposites. In all four cases the
output format is (3I4,2F8.2), and the list is terminated by a dummy
reflection 0,0,0.

Forces the following atoms, and atoms or peaks that are bonded
to them, into molecule **n** of the **n** may not be greater than 99.

More sets the amount of (printer) output; **verbosity** takes a
value in the range 0 (least) to 3 (most verbose).

The coordinates of the atoms that follow this instruction
are changed to:

x' = **dx** + **sign**⋅x

y' = **dy** + **sign**⋅y

z' = **dz** + **sign**⋅z
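The coordinate change above is a shift combined with an optional inversion. A minimal Python sketch follows; the function name and the default argument values are assumptions made for illustration.

```python
def move(xyz, dx=0.0, dy=0.0, dz=0.0, sign=1):
    """Apply x' = dx + sign*x (and likewise for y and z) to one atom."""
    x, y, z = xyz
    return (dx + sign * x, dy + sign * y, dz + sign * z)


# Invert a fragment through the point (0.5, 0.5, 0.5):
print(move((0.25, 0.5, 0.125), dx=1.0, dy=1.0, dz=1.0, sign=-1))
# (0.75, 0.5, 0.875)
```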

Thresholds for flagging reflections as 'unobserved'. Note that if no
**s**⋅σ(F), the reflection is
considered to be 'unobserved'. If **2θ(lim)** is POSITIVE,
it specifies a 2θ value above which the data are treated as
'unobserved'; if it is negative, the absolute value is used as a
lower 2θ cutoff.

The reflection **h k l** is flagged as 'unobserved' in the list of
merged reflections after data reduction. It will not be used directly
in phase refinement or Fourier calculations, but is retained for statistical
purposes and as a possible cross-term in a negative quartet. Thus if it
is known that a strong reflection has been included accidentally in the
*.hkl* file with a very small intensity (e.g. because it was cut
off by the beam stop), it is advisable to delete it from the *.hkl*
file rather than using

The second stage of phase refinement is based on 'phase
annealing'
(Sheldrick, 1990).
This has proved to be an efficient
search method for large structures, and possesses a number of
beneficial side-effects. It is based on **nsteps** cycles of tangent
formula refinement (one cycle is a pass through all **ns** phases),
in which a correction is applied to the tangent formula phase. The
phase annealing algorithm gives the magnitude of the correction
(it is larger when the 'temperature' is higher; this corresponds
to a larger value of **Boltz**), and the sign is chosen to give
the best agreement with the negative quartets (if there are no
negative quartets involving the reflection in question, a random
sign is used instead). After each cycle through all **ns** phases,
a new value for **Boltz** is obtained by multiplying the old
value by **cool**; this corresponds to a reduction in the
'temperature'. To save time, only **ns** reflections are
refined using the strongest **mtpr** triplets and **mnqr**
quartets for each reflection (or less, if not so many
phase relations can be found). The phase annealing parameters
chosen by the program will rarely need to be altered; however
if poor convergence is observed, the **Boltz** value should be
reduced; it should usually be in the range 0.2 to 0.5. When
the '**Boltz** should be set at a
somewhat higher value (0.4 to 0.7) so that not too many solutions
are duplicated.
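The cooling schedule described above is geometric: after each cycle **Boltz** is multiplied by **cool**. The short Python sketch below only tabulates that schedule; the function name and the numerical values of `cool` and `nsteps` are hypothetical, chosen for illustration rather than taken from the program defaults.

```python
def boltz_schedule(boltz, cool, nsteps):
    """Return the Boltz value in force during each of nsteps cycles."""
    values = []
    for _ in range(nsteps):
        values.append(boltz)
        boltz *= cool          # lower the 'temperature' for the next cycle
    return values


# Illustrative values only: Boltz = 0.4 lies in the 0.2 to 0.5 range
# mentioned above; cool = 0.9 and nsteps = 5 are assumptions.
for cycle, b in enumerate(boltz_schedule(0.4, 0.9, 5), start=1):
    print(cycle, round(b, 4))
```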

If **npeaks** is positive it is the number of highest unique
Fourier peaks that are written to the *.res* and *.lst*
files; the remaining parameters are ignored. If **npeaks** is
given as
negative, the program attempts to arrange the peaks into unique
molecules taking the space group symmetry into account, and to
'plot' a projection of each such molecule on the printer (i.e.
the *.lst* file). Distances involving peaks which are less
than r1+r2+**d1** (the covalent radii r are defined via
*bonds* for purposes of the molecule
assembly and tables. Distances involving atoms and/or peaks that
are less than r1+r2+**d2** are considered to be *non-bonded
interactions*. Such
interactions are ignored when defining molecules, but the
corresponding atoms and distances are included in the
line-printer output. Thus an atom may appear in more than
one map, or more than once on the same map. Negative **d2**
includes hydrogen atoms in these non-bonds, otherwise they are
ignored (the absolute value of **d2** is used in the test).
Peaks are always assigned the radius of *.res* file they are given names
beginning with 'Q' and followed by the same numbers. To
simplify interpretation of the line-printer plots, extra
symmetry-generated atoms are added, so that atoms or peaks may
appear more than once. A table of the appropriate coordinates
and symmetry transformations appears at the end of the output.
See also

The largest |**m**| E-values and the complete Patterson map
are dumped into the *name.res* file in fixed format for
use by the Patterson search program PATSEE. **2θ(max)**
should be used to limit the resolution of the E-values
generated; the default value corresponds to
sinθ=λ/2. The **2θ(max)** value is also
written to the *.res* file, so it is possible to restrict
the resolution of the E-values actually used by PATSEE to a
lower **2θ(max)** by editing this file without rerunning
SHELXS; of course the E-values with higher 2θ than the
value used in SHELXS were not written to the *.res*
file and so cannot be recovered in this way. When **m** is
negative a *super-sharp* Patterson with coefficients
√(E³F) is used; if **m** is positive a standard
sharpened Patterson with coefficients (EF) is employed.
The resulting *name.res* file must be renamed
*name.inp* (or *name.pat* if the search fragment and
encoded Patterson are to be read from separate files)
for use by PATSEE. After a

Followed by a comment on the same line. This comment is ignored by the
program but is copied to the results file (*.res*).

These element symbols define the order of scattering factors to be employed by
the program. The first 94 elements of the periodic system are recognized.
SHELXS uses absorption coefficients from International Tables
(1991) volume C. For organic structures the first two SFAC types should be C
and H, in that order; the E-Fourier recycling generally assigns the first

Scattering factor in the form of an exponential series, followed by real and
imaginary corrections, linear absorption coefficient, covalent radius and
atomic weight. In addition, a 'label' consisting of up to 4 characters
beginning with a letter (e.g. Ca2+) may be included before **a1**. The two
*.ins* file; the order of
the SFAC instructions (and the order of element names in the first type of

The following fragment (which should begin with a

Symmetry operators, i.e. coordinates of the general positions as given in the
International Tables, volume A. The operator

If the time **t** (measured in seconds from the start of the job) is
exceeded, SHELXS performs no further blocks of phase permutations (direct
methods), but goes on to the final E-map recycling etc. In the case of
Patterson interpretation, no further vector superpositions are performed
after this time has expired. This instruction is a relic from the days
when a SHELXS job took hours rather than a fraction of a second!

Title of up to 76 characters, to appear at suitable places in the output.

**np** is the number of direct methods attempts; if negative, only
the solution with code number |**np**| is generated (the code number is in
fact a random number seed). Since the random number generation is very
machine dependent, this can only be relied upon to generate the same
results when run on the same model of computer. This facility is used
to generate E-maps for solutions which do not have the 'best' combined
figure of merit. No other parameter may be changed if it is desired to
repeat a solution in this way. **nE** reflections are employed in
the full tangent formula phase refinement. Values of **nE** that give
fewer than 20 unique phase relations per reflection for the full phase
refinement are not recommended. **kapscal** multiplies the products
of the three E-values used in triplet phase relations; it may be
regarded as a fudge factor to allow for experimental errors and also to
discourage overconsistent (uranium atom) solutions in symmorphic space
groups. If it is negative the cross-term criteria for the negative
quartets are relaxed (but all three cross-term reflections must still
be measured), and more negative quartets are used in the phase
refinement, which is also useful for symmorphic space groups.
**ntan** is the number of cycles of full tangent formula refinement,
which follows the phase annealing stage and involves all **nE**
reflections; it may be increased (at the cost of CPU time) if there is
evidence that the refinement is not converging well.

To avoid overconsistency, cos^{-1}(<α>/α)
is added to the modified tangent formula phase when <α> is
less than α. α is the weighted sum of the
cosines of the triple phase invariants and <α> is its
statistically predicted value; the sign of the correction is
chosen to give the best agreement with the negative quartets (a random
sign is used if there are no negative quartets involving the phase in
question). This tends to drive the figures of merit Rα and NQUAL
simultaneously to desirable values. If **ntan** is negative, a
penalty function (<Σ**wn** is a parameter used in calculating
the combined figure of merit CFOM:

$$
\mathrm{CFOM} =
\begin{cases}
R_{\alpha} & (\mathrm{NQUAL} < wn) \\
R_{\alpha} + (wn - \mathrm{NQUAL})^{2} & (\mathrm{NQUAL} \ge wn)
\end{cases}
$$

**wn** should be about 0.1 more negative than the anticipated value
of NQUAL. Only the
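The combined figure of merit can be written as a two-branch function. The Python sketch below transcribes the expression given above; the function name and the numerical inputs are hypothetical and serve only to show the two branches.

```python
def cfom(r_alpha, nqual, wn):
    """Combined figure of merit: Ralpha, plus a penalty when NQUAL >= wn."""
    if nqual < wn:
        return r_alpha
    return r_alpha + (wn - nqual) ** 2


# Illustrative numbers only (not taken from a real SHELXS run):
print(cfom(0.05, -0.98, -0.95))   # NQUAL below wn: no penalty
print(cfom(0.05, -0.50, -0.95))   # NQUAL above wn: penalty added
```

A solution whose NQUAL is more negative than **wn** is judged on Rα alone, whereas a less negative NQUAL is penalised quadratically.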

Number of atoms of each type in the cell, in

Z-value (number of formula units per cell) followed by the estimated errors in
the unit-cell dimensions. This information is not actually required by SHELXS
but is allowed for compatibility with SHELXL.