SHELXS - classical direct methods for small molecules
SHELXS is a program of some antiquity for solving small (up to about 100
unique non-hydrogen atom) structures by direct methods. It is
very fast because in the 1980s computers were so slow. SHELXS is based on the
classical tangent formula of Karle and Hauptman, but uses
and includes information from the weak reflections via the negative
quartets. For further details see the
Running SHELXS direct methods
SHELXS requires two input files: an instruction file name.ins and a reflection file name.hkl, and outputs the 'best' solution to name.res. A summary is output to the console and a full listing to name.lst. SHELXS is started with the command line:
A SHELXS job that is already running may be terminated gracefully by creating a file name.fin in the directory in which SHELXS is running. Many GUIs can generate the .ins file. The following example ylid.ins was set up, together with ylid.hkl, by the Bruker AXS program XPREP:
TITL ylid in P2(1)2(1)2(1)
CELL 0.71073 5.9645 9.0412 18.3971 90.000 90.000 90.000
ZERR 4.00 0.0004 0.0005 0.0022 0.000 0.000 0.000
SYMM 0.5-X, -Y, 0.5+Z
SYMM -X, 0.5+Y, 0.5-Z
SYMM 0.5+X, 0.5-Y, -Z
SFAC C H O S
UNIT 44 40 8 4
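The file conventions described above (name.ins and name.hkl as input, name.fin as a stop request) can be checked with a small helper before and during a job. This is only a sketch: the function names are hypothetical, and it assumes that an empty name.fin file is enough to request termination, which the text does not state.

```python
import tempfile
from pathlib import Path


def missing_inputs(directory, name):
    """Return the required input files (name.ins, name.hkl) that are absent."""
    d = Path(directory)
    return [f"{name}{ext}" for ext in (".ins", ".hkl")
            if not (d / f"{name}{ext}").is_file()]


def request_stop(directory, name):
    """Create name.fin in the job directory (assumed: an empty file suffices)."""
    fin = Path(directory) / f"{name}.fin"
    fin.touch()
    return fin


# Demonstration in a throwaway directory: only ylid.ins exists.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "ylid.ins").write_text("TITL ylid in P2(1)2(1)2(1)\n")
    missing = missing_inputs(tmp, "ylid")
    fin_created = request_stop(tmp, "ylid").is_file()

print(missing, fin_created)
```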
The resulting ylid.res file will require editing, to define the atom names and scattering factor numbers and delete noise peaks, before it can be renamed to ylid.ins for SHELXL refinement using the same reflection data file ylid.hkl. Many possible instructions may be included in the .ins file (see list below), but the only options that are really worth trying if the above fails to solve the structure are:
1. Increase the number of tries from the default
2. If more than one solution has good Rα and NQUAL values,
it is possible that the structure has been solved but the program has
chosen the wrong solution. The list of +/- signs (the seminvariant
phases) can then be examined to see which solutions are likely to be
equivalent or not. Other solutions can then be generated with
3. If heavy atoms are present, it might be better to try to
locate them first by replacing
Patterson interpretation and partial structure expansion
For Patterson interpretation,
Patterson interpretation algorithm
The algorithm used to interpret the Patterson to find the heavier atoms is as follows:
1. One peak is selected from the sharpened Patterson (or input
by means of a
2. The Patterson function is calculated twice, displaced from the origin by +U and -U, where 2U is the superposition vector. At each grid point the lower of the two values is taken, and the resulting superposition minimum function is interpolated to find the peak positions. This is a much cleaner map than the original Patterson and contains only 2N (or 4N etc. if the superposition vector was multiple) peaks rather than N². The superposition map should ideally consist of one image of the structure and its inverse; it has an effective space group of P-1 (or C-1 for a centered lattice etc.).
3. Possible origin shifts are found which place one of the images correctly with respect to the cell origin, i.e. most of the symmetry equivalents can be found in the peak-list. The SYMFOM figure of merit (normalized so that the largest value for a given superposition vector is 99.9) indicates how well the space group symmetry is satisfied for this image.
4. For each acceptable origin shift, atomic numbers are
assigned to the potential atoms based on average peak heights, and
a crossword table is generated. This gives the minimum
distance and Patterson minimum function for each possible pair of
unique atoms, taking symmetry into account. This table should be
interpreted by hand to find a subset of the atoms making chemically
sensible minimum interatomic distances linked by consistently large
Patterson minimum function values. The PATFOM figure of merit measures
the internal consistency of these minimum function values and is also
normalised to a maximum of 99.9 for a given superposition vector.
The Patterson values are recalculated from the original F
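Step 2 of the algorithm can be illustrated in one dimension. The sketch below is not the SHELXS implementation: it uses a hypothetical point-atom Patterson on a periodic integer grid with unit weights, displaces it by +u and -u, and keeps the lower value at each grid point. The two displaced copies end up 2u apart, and u is chosen here so that this separation equals an interatomic vector (8 - 2 = 6).

```python
def patterson_1d(positions, n):
    """Point-atom Patterson on a periodic 1-D grid of n points (unit weights)."""
    p = [0] * n
    for a in positions:
        for b in positions:
            p[(a - b) % n] += 1
    return p


def superposition_minimum(p, u):
    """Lower of the two Patterson copies displaced by +u and -u."""
    n = len(p)
    return [min(p[(x + u) % n], p[(x - u) % n]) for x in range(n)]


positions = [2, 8, 15]   # three hypothetical atoms
n = 40                   # grid points in the 1-D cell
p = patterson_1d(positions, n)
smf = superposition_minimum(p, u=3)

patterson_peaks = [x for x, v in enumerate(p) if v > 0]
smf_peaks = [x for x, v in enumerate(smf) if v > 0]
print(len(patterson_peaks), smf_peaks)
```

The surviving peaks occur in pairs x and n - x, which reflects the centrosymmetric character of the superposition map described in step 2.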
Alphabetical list of instructions in the SHELXS .ins file
All instructions in the .ins file commence with a word of four or fewer letters
(which may be an atom name) followed by numbers and other information in free
format, separated by one or more spaces. Upper and lower case input may be freely
mixed. Defaults are given in square brackets; '#' indicates that the program will
generate a suitable default value based on the rest of the available information.
Continuation lines are flagged by '=' at the end of a line, the instruction being
continued on the next line which must start with at least one space. Other lines
beginning with one or more spaces are treated as comments, so blank lines may be
added to improve readability. All characters following '!' or '=' in an instruction
are ignored, except after
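The syntax rules above can be sketched as a small tokenizer. This is an illustration, not the parser used by SHELXS: the function name is hypothetical, the exception mentioned at the end of the preceding paragraph is not handled, and no check is made that a continuation line really starts with a space.

```python
def parse_ins(text):
    """Split .ins text into (KEYWORD, [arguments]) pairs."""
    instructions = []
    pending = None  # tokens of an instruction still awaiting a continuation
    for raw in text.splitlines():
        indented_or_blank = raw[:1].isspace() or not raw.strip()
        if pending is None and indented_or_blank:
            continue  # comment line or blank line
        # Everything after the first '!' or '=' is ignored;
        # '=' additionally flags that the next line continues this one.
        cut, continued = len(raw), False
        for i, ch in enumerate(raw):
            if ch in "!=":
                cut, continued = i, ch == "="
                break
        tokens = raw[:cut].split()
        if pending is not None:
            pending.extend(tokens)
        elif tokens:
            pending = [tokens[0].upper()] + tokens[1:]  # case-insensitive keyword
        else:
            continue
        if not continued:
            instructions.append((pending[0], pending[1:]))
            pending = None
    if pending is not None:
        instructions.append((pending[0], pending[1:]))
    return instructions


example = (
    "TITL ylid in P2(1)2(1)2(1)\n"
    "cell 0.71073 5.9645 9.0412 18.3971 =\n"
    "  90.000 90.000 90.000\n"
    "  this indented line is a comment\n"
    "\n"
    "UNIT 44 40 8 4 ! trailing remark\n"
)
parsed = parse_ins(example)
print(parsed)
```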
Wavelength and unit-cell dimensions in Angstroms and degrees.
All missing reflections in the resolution range d(min) to d(max) Å (the order of d(min) and d(max) is unimportant) are generated on a statistical basis, assuming that they were skipped during the data collection because a prescan indicated that they were weak (only relevant for a 1-D detector!). These reflections will then be flagged as 'unobserved', but improve the estimation of the remaining E-values and enable an increased number of negative quartets to be identified. d(min) should be safely inside the resolution limit of the data and d(max) should be set so that there is no danger of regenerating strong reflections (as weak) which were cut off by the beam stop etc.
This is the last instruction in the rare cases when the .ins file is not terminated by the
Emin sets the minimum E-value for the list of largest E-values that the program normally retains in memory; it should be set so as to give more than enough reflections for TREF etc. It is also the threshold used for tangent expansion and 'peak-list optimisation'. It is advisable to reduce Emin to about 1.0 for triclinic structures and pseudosymmetry problems. If Emin is negative, acentric triclinic data are generated for use in all calculations. The other parameters control the normalisation of the E-values:
/ [ old(E)⁻⁴ + Emax⁻⁴ ]^0.25
renorm is a factor to control the parity group renormalisation; 0.0 implies no renormalisation, 1.0 sets full renormalisation, i.e. the mean value of E² becomes unity for each parity group. If axis is 1, 2 or 3, an additional similar renormalisation is applied for groups defined by the absolute value of the h, k or l index respectively.
The unique unit of the cell for performing the Fourier calculation is set up automatically unless specified by the user using
Fourier grid, when not set automatically. Starting points and increments are multiplied by 100. s means starting value, d increment, l is the direction perpendicular to the layers, a is across the paper from left to right, and d is down the paper from top to bottom. Note that the grid is 53 x 53 x nl points, and that sl and dl need not be integral. The 103 x 103 x nl grid is only available when it is set automatically by the program (see above).
Before running SHELXS, a reflection data file name.hkl must usually be prepared. The
n = 1: SHELX-76 condensed data, now deprecated.
n = 3: h k l F
n = 4: h k l F² σ(F²). The recommended format for nearly all purposes.
n = 7: h k l E or h k l P (Patterson coefficient)
There may only be one HKLF instruction and it must come last!
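A reader for the recommended n = 4 layout (h k l F² σ(F²)) might look like the sketch below. The column widths are an assumption not stated in the text above: the conventional fixed-width layout of three 4-character integers followed by two 8-character reals is used, and a record with h = k = l = 0 is assumed to end the data. The function names are hypothetical.

```python
def parse_hklf4_record(line):
    """Split one fixed-width record into (h, k, l, F^2, sigma(F^2))."""
    h, k, l = (int(line[i:i + 4]) for i in (0, 4, 8))
    f2 = float(line[12:20])
    sigma = float(line[20:28])
    return h, k, l, f2, sigma


def read_hklf4(lines):
    """Collect reflections up to the assumed 0 0 0 terminator record."""
    reflections = []
    for line in lines:
        if not line.strip():
            continue
        record = parse_hklf4_record(line)
        if record[:3] == (0, 0, 0):
            break
        reflections.append(record)
    return reflections


# Made-up records, written with the same assumed widths.
sample = [
    f"{1:4d}{0:4d}{-2:4d}{123.45:8.2f}{2.10:8.2f}",
    f"{0:4d}{3:4d}{5:4d}{-0.87:8.2f}{1.02:8.2f}",
    f"{0:4d}{0:4d}{0:4d}{0.0:8.2f}{0.0:8.2f}",
    f"{9:4d}{9:4d}{9:4d}{1.0:8.2f}{1.0:8.2f}",   # after the terminator, ignored
]
reflections = read_hklf4(sample)
print(reflections)
```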
The first stage involves five cycles of weighted tangent formula refinement (based on triplet phase relations only) starting from nn reflections with random phases and weights of 1. Single phase seminvariants which have Σ
Lattice type: 1=P, 2=I, 3=rhombohedral obverse on hexagonal axes, 4=F, 5=A, 6=B, 7=C. N must be made negative if the structure is non-centrosymmetric.
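The lattice code can be decoded mechanically. The following sketch uses a hypothetical helper name and simply restates the table above: the absolute value selects the lattice type and the sign carries the centrosymmetry flag.

```python
LATTICE_TYPES = {
    1: "P",
    2: "I",
    3: "rhombohedral obverse on hexagonal axes",
    4: "F",
    5: "A",
    6: "B",
    7: "C",
}


def decode_lattice(n):
    """Return (lattice type, is_centrosymmetric) for a lattice code N."""
    if abs(n) not in LATTICE_TYPES:
        raise ValueError(f"unknown lattice code: {n}")
    return LATTICE_TYPES[abs(n)], n > 0


print(decode_lattice(-1))   # primitive, non-centrosymmetric
print(decode_lattice(7))    # C-centred, centrosymmetric
```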
If m = 1 or m = 2 writes h, k, l, A and B lists to the name.res file, where A and B are the real and imaginary parts of a point-atom structure factor respectively. If m = 1 the list corresponds to the phased E-values for the 'best' direct methods solution, before partial structure expansion (if any). If m = 2 the list is produced after the final cycle of partial structure expansion, and corresponds to the weighted E-values used for the final Fourier synthesis. These options enable other Fourier programs to be used, e.g. for graphical display of 3D-Fouriers for data to less than atomic resolution. After data reduction and merging equivalent reflections, a list of h, k, l, F
Forces the following atoms, and atoms or peaks that are bonded to them, into molecule n of the
More sets the amount of (printer) output; verbosity takes a value in the range 0 (least) to 3 (most verbose).
The coordinates of the atoms that follow this instruction are changed to:
x' = dx + sign⋅x
y' = dy + sign⋅y
z' = dz + sign⋅z
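The transformation above is simple enough to apply by hand, but a short sketch makes its typical use clearer. It assumes that x is treated in the same way as y and z. The function name and the atom tuple layout are hypothetical. With dx = dy = dz = 1 and sign = -1, it inverts a structure through the origin while keeping the coordinates inside the cell.

```python
def apply_move(atoms, dx=0.0, dy=0.0, dz=0.0, sign=1.0):
    """Apply x' = dx + sign*x (and likewise for y, z) to (name, x, y, z) tuples."""
    return [(name, dx + sign * x, dy + sign * y, dz + sign * z)
            for name, x, y, z in atoms]


atoms = [("S1", 0.25, 0.10, 0.40)]          # made-up fractional coordinates
inverted = apply_move(atoms, 1.0, 1.0, 1.0, -1.0)
print(inverted)
```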
Thresholds for flagging reflections as 'unobserved'. Note that if no
The reflection h k l is flagged as 'unobserved' in the list of merged reflections after data reduction. It will not be used directly in phase refinement or Fourier calculations, but is retained for statistical purposes and as a possible cross-term in a negative quartet. Thus if it is known that a strong reflection has been included accidentally in the .hkl file with a very small intensity (e.g. because it was cut off by the beam stop), it is advisable to delete it from the .hkl file rather than using
The second stage of phase refinement is based on 'phase annealing' (Sheldrick, 1990). This has proved to be an efficient search method for large structures, and possesses a number of beneficial side-effects. It is based on nsteps cycles of tangent formula refinement (one cycle is a pass through all ns phases), in which a correction is applied to the tangent formula phase. The phase annealing algorithm gives the magnitude of the correction (it is larger when the 'temperature' is higher; this corresponds to a larger value of Boltz), and the sign is chosen to give the best agreement with the negative quartets (if there are no negative quartets involving the reflection in question, a random sign is used instead). After each cycle through all ns phases, a new value for Boltz is obtained by multiplying the old value by cool; this corresponds to a reduction in the 'temperature'. To save time, only ns reflections are refined using the strongest mtpr triplets and mnqr quartets for each reflection (or less, if not so many phase relations can be found). The phase annealing parameters chosen by the program will rarely need to be altered; however if poor convergence is observed, the Boltz value should be reduced; it should usually be in the range 0.2 to 0.5. When the '
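The cooling schedule described above is geometric: the Boltz value used in each cycle is the previous one multiplied by cool. A minimal sketch, with a hypothetical function name and illustrative values (the starting value is taken from the 0.2 to 0.5 range mentioned above, and cool = 0.9 is an arbitrary choice):

```python
def boltz_schedule(boltz, cool, nsteps):
    """Boltz value in force during each of nsteps annealing cycles."""
    values = []
    for _ in range(nsteps):
        values.append(boltz)
        boltz *= cool   # 'temperature' reduction after each full cycle
    return values


schedule = boltz_schedule(boltz=0.4, cool=0.9, nsteps=4)
print([round(b, 4) for b in schedule])
```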
If npeaks is positive it is the number of highest unique Fourier peaks that are written to the .res and .lst files; the remaining parameters are ignored. If npeaks is given as negative, the program attempts to arrange the peaks into unique molecules taking the space group symmetry into account, and to 'plot' a projection of each such molecule on the printer (i.e. the .lst file). Distances involving peaks which are less than r1+r2+d1 (the covalent radii r are defined via
The largest |m| E-values and the complete Patterson map are dumped into the name.res file in fixed format for use by the Patterson search program PATSEE. 2θ(max) should be used to limit the resolution of the E-values generated; the default value corresponds to sinθ=λ/2. The 2θ(max) value is also written to the .res file, so it is possible to restrict the resolution of the E-values actually used by PATSEE to a lower 2θ(max) by editing this file without rerunning SHELXS; of course the E-values with higher 2θ than the value used in SHELXS were not written to the .res file and so cannot be recovered in this way. When m is negative a super-sharp Patterson with coefficients √(E³F) is used; if m is positive a standard sharpened Patterson with coefficients (EF) is employed. The resulting name.res file must be renamed name.inp (or name.pat if the search fragment and encoded Patterson are to be read from separate files) for use by PATSEE. After a
Followed by a comment on the same line. This comment is ignored by the program but is copied to the results file (.res).
These element symbols define the order of scattering factors to be employed by the program. The first 94 elements of the periodic system are recognized. SHELXS uses absorption coefficients from International Tables (1991) volume C. For organic structures the first two SFAC types should be C and H, in that order; the E-Fourier recycling generally assigns the first
Scattering factor in the form of an exponential series, followed by real and imaginary corrections, linear absorption coefficient, covalent radius and atomic weight. In addition, a 'label' consisting of up to 4 characters beginning with a letter (e.g. Ca2+) may be included before a1. The two
The following fragment (which should begin with a
Symmetry operators, i.e. coordinates of the general positions as given in the International Tables, volume A. The operator
If the time t (measured in seconds from the start of the job) is exceeded, SHELXS performs no further blocks of phase permutations (direct methods), but goes on to the final E-map recycling etc. In the case of Patterson interpretation, no further vector superpositions are performed after this time has expired. This instruction is a relic from the days when a SHELXS job took hours rather than a fraction of a second!
Title of up to 76 characters, to appear at suitable places in the output.
np is the number of direct methods attempts; if negative, only the solution with code number |np| is generated (the code number is in fact a random number seed). Since the random number generation is very machine dependent, this can only be relied upon to generate the same results when run on the same model of computer. This facility is used to generate E-maps for solutions which do not have the 'best' combined figure of merit. No other parameter may be changed if it is desired to repeat a solution in this way. nE reflections are employed in the full tangent formula phase refinement. Values of nE that give fewer than 20 unique phase relations per reflection for the full phase refinement are not recommended. kapscal multiplies the products of the three E-values used in triplet phase relations; it may be regarded as a fudge factor to allow for experimental errors and also to discourage overconsistent (uranium atom) solutions in symmorphic space groups. If it is negative the cross-term criteria for the negative quartets are relaxed (but all three cross-term reflections must still be measured), and more negative quartets are used in the phase refinement, which is also useful for symmorphic space groups. ntan is the number of cycles of full tangent formula refinement, which follows the phase annealing stage and involves all nE reflections; it may be increased (at the cost of CPU time) if there is evidence that the refinement is not converging well.
To avoid overconsistency, cos⁻¹(<α>/α) is added to the modified tangent formula phase when <α> is less than α. α is the weighted sum of the cosines of the triple phase invariants and <α> is its statistically predicted value; the sign of the correction is chosen to give the best agreement with the negative quartets (a random sign is used if there are no negative quartets involving the phase in question). This tends to drive the figures of merit Rα and NQUAL simultaneously to desirable values. If ntan is negative, a penalty function (<Σ
CFOM = Rα                      (NQUAL < wn), or
CFOM = Rα + (wn−NQUAL)²        (NQUAL >= wn)
wn should be about 0.1 more negative than the anticipated value
of NQUAL. Only the
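The combined figure of merit defined above can be written as a two-branch function. The sketch below uses a hypothetical function name and made-up Rα and NQUAL values. It also assumes that a lower CFOM indicates a better solution, which the text above implies but does not state outright.

```python
def cfom(r_alpha, nqual, wn):
    """Combined figure of merit: R-alpha plus a penalty when NQUAL >= wn."""
    if nqual < wn:
        return r_alpha
    return r_alpha + (wn - nqual) ** 2


wn = -0.8   # illustrative threshold
# Made-up (R-alpha, NQUAL) pairs keyed by a solution label.
solutions = {"A": (0.05, -0.90), "B": (0.04, -0.50), "C": (0.09, -0.95)}
scores = {label: cfom(ra, nq, wn) for label, (ra, nq) in solutions.items()}
best = min(scores, key=scores.get)   # assumption: lower CFOM is better
print(scores, best)
```

Solution B has the lowest Rα, but its NQUAL is not negative enough, so the quadratic penalty pushes it behind A.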
Number of atoms of each type in the cell, in
Z-value (number of formula units per cell) followed by the estimated errors in the unit-cell dimensions. This information is not actually required by SHELXS but is allowed for compatibility with SHELXL.
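Since the Z-value is the number of formula units per cell and the atom counts are given per cell, dividing one by the other recovers the formula unit. A short sketch using the ylid example from earlier in this page (element order C H O S, cell contents 44 40 8 4, Z = 4), with a hypothetical function name:

```python
def formula_unit(elements, cell_contents, z):
    """Atoms of each element per formula unit, from per-cell counts and Z."""
    return {el: count / z for el, count in zip(elements, cell_contents)}


ylid = formula_unit(["C", "H", "O", "S"], [44, 40, 8, 4], z=4)
print(ylid)
```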