The HKL file
All measured reflection data are saved in your starting file momonewunmerged.hkl. An HKL file consists of X-ray reflections, each related to a set of lattice planes (hkl) of the crystal. Each of these planes obeys Bragg's equation. The experimental reflections are characterized by the indices h k l, a measured intensity I and its error sigma(I).
To view the HKL file with the text editor nedit, type the command nedit momonewunmerged.hkl &. 
The following image shows a part of the file opened with nedit: 

Each line corresponds to a measured reflection.
The first three columns contain the indices
h, k and l, thus defining the crystal lattice plane of the corresponding reflection
(see above). The next two columns contain the intensities I of the reflections
and the corresponding errors sigma(I). Finally, the last column shows in
which 'run' of the experiment the reflection has been measured. A run is
a set of X-ray exposures during which the crystal is rotated about a particular
axis of the diffractometer.
Example one: The reflection (326) appears three times in the list, i.e. it has been measured three times during the experiment. It was first observed in run 1 with an intensity of 57.80 (and an error of about 6%), the second time in run 3 with an intensity of 50.34 (7% error) and the third time in run 4 with an intensity of 53.71 (6% error). For reflections measured more than once, the intensities may (and should) be averaged. In this way, the determined values become more accurate and the errors are reduced.
Example two: The reflections (2 2 6) and (-2 -2 -6) are related to each other by the centrosymmetry of the diffraction pattern and are called a Friedel pair. For such reflections Friedel's law, I(h k l) = I(-h -k -l), is strictly true in the case of centrosymmetric space
groups, where the crystal's unit cell contains an inversion centre.
In that case, the intensities of 'Friedel mates' can also be averaged. If the
structure is not centrosymmetric, anomalous
scattering of any heavy atoms present has to be considered and
Friedel's law becomes an approximation, so that merging Friedel mates
is not recommended.
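The averaging described in example one can be sketched in a few lines of Python. This is only an illustration and not the procedure XPREP uses internally: it assumes whitespace-separated columns (h k l I sigma run), the sigma values are rough figures back-calculated from the quoted percentage errors, and all function names are hypothetical.

```python
from collections import defaultdict

# The three measurements of reflection (3 2 6) from example one.
# Sigmas are approximations derived from the quoted 6%, 7% and 6% errors.
SAMPLE_LINES = [
    "   3   2   6   57.80    3.47   1",
    "   3   2   6   50.34    3.52   3",
    "   3   2   6   53.71    3.22   4",
]


def parse_hkl_line(line):
    """Split one line into ((h, k, l), intensity, sigma, run).

    Assumes whitespace-separated columns; real SHELX HKL files are
    fixed-width, so adjacent fields may touch in other data sets.
    """
    h, k, l, intensity, sigma, run = line.split()[:6]
    return (int(h), int(k), int(l)), float(intensity), float(sigma), int(run)


def plain_mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


def average_repeats(lines):
    """Average repeated measurements with inverse-variance weights (1/sigma^2)."""
    groups = defaultdict(list)
    for line in lines:
        hkl, intensity, sigma, _run = parse_hkl_line(line)
        groups[hkl].append((intensity, sigma))

    merged = {}
    for hkl, observations in groups.items():
        weights = [1.0 / sigma ** 2 for _, sigma in observations]
        total_weight = sum(weights)
        mean = sum(w * i for w, (i, _) in zip(weights, observations)) / total_weight
        merged[hkl] = (mean, total_weight ** -0.5)  # sigma of the weighted mean
    return merged


if __name__ == "__main__":
    for hkl, (mean, sigma) in average_repeats(SAMPLE_LINES).items():
        print(hkl, round(mean, 2), round(sigma, 2))
```

The unweighted mean of the three intensities is 53.95. With the weighting chosen here, the error of the averaged value is smaller than each individual error, which is the improvement the text refers to.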
Starting XPREP and getting the cell geometry

XPREP is started with the command xprep name, in our case xprep momonewunmerged. The HKL file to be analyzed will be read in directly. 
The opened program window looks like this: 
36064 reflections are read from file momonewunmerged.hkl. The mean intensity of all data divided by its error, mean (I/sigma), is calculated. This value roughly indicates how strong the data are; one factor is how well the crystal has scattered the X-ray beam. (A value of one means that the intensity equals its error, i.e. that only background noise has been measured.)
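As a rough illustration of this quantity, the following sketch averages the ratio I/sigma over a list of (intensity, sigma) pairs. Whether XPREP averages the individual ratios in exactly this way is an assumption, and the function name is hypothetical.

```python
def mean_i_over_sigma(reflections):
    """Mean of I/sigma over (intensity, sigma) pairs, skipping sigma <= 0."""
    ratios = [intensity / sigma for intensity, sigma in reflections if sigma > 0]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Two made-up reflections: one strong (I/sigma = 5), one at noise level (1).
    print(mean_i_over_sigma([(10.0, 2.0), (3.0, 3.0)]))
```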
Now the cell parameters determined during the experiment have to be given in the form a b c alpha beta gamma [enter]. The values shown here are also found in the text file momo.cell. 
Which crystal system would you expect looking at the cell parameters? Correct, it is orthorhombic. Next, the program checks whether a centered crystal lattice is present. This is done by comparison of expected systematic absences (missing reflections) with actually existing ones: 
The first line (N total) tells us how many of
the total reflections should be absent according to the absence
law for a certain centered lattice. The line N (int>3sigma) specifies
how many of those theoretically absent reflections are actually observed,
i.e. how many systematic absence violations exist. The remaining lines
list the intensities of the violating reflections.
A primitive lattice (first column) generates no absences, so there are no numbers given for analysis. If, for example, the lattice were body-centered (I, 5th column), 18037 reflections should be absent. But of these 18037 reflections, 12180 are present with a high intensity (I > 3sigma), thus violating the absence law for I-centering. Looking at the remaining columns, one comes to the conclusion that no centered lattice is likely, so that the unit cell must be primitive (as proposed by the program).
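The counting behind the two lines N total and N (int>3sigma) can be sketched as follows. The reflection conditions are the standard ones for the centering types P, A, B, C, I and F; the function name and the demo reflections are hypothetical, and the further columns that XPREP prints are not covered.

```python
# Conditions under which a reflection (h, k, l) is systematically absent.
ABSENT_IF = {
    "P": lambda h, k, l: False,
    "A": lambda h, k, l: (k + l) % 2 != 0,
    "B": lambda h, k, l: (h + l) % 2 != 0,
    "C": lambda h, k, l: (h + k) % 2 != 0,
    "I": lambda h, k, l: (h + k + l) % 2 != 0,
    "F": lambda h, k, l: not (h % 2 == k % 2 == l % 2),
}


def count_absence_violations(reflections, lattice, cutoff=3.0):
    """Return (N total, N observed) for one centering type.

    reflections: iterable of ((h, k, l), intensity, sigma).
    N total counts reflections that should be absent; N observed counts
    those among them with intensity > cutoff * sigma.
    """
    absent = ABSENT_IF[lattice]
    n_total = n_observed = 0
    for (h, k, l), intensity, sigma in reflections:
        if absent(h, k, l):
            n_total += 1
            if intensity > cutoff * sigma:
                n_observed += 1
    return n_total, n_observed


# Made-up reflections, for illustration only.
DEMO = [
    ((1, 0, 0), 1.0, 1.0),
    ((1, 1, 0), 0.5, 1.0),
    ((2, 1, 0), 100.0, 2.0),
    ((1, 1, 1), 50.0, 1.0),
]

if __name__ == "__main__":
    for lattice in ABSENT_IF:
        print(lattice, count_absence_violations(DEMO, lattice))
```

A centering type is plausible only if nearly all of its expected absences are really unobserved. In the tutorial data, 12180 of 18037 expected I-centering absences are observed, which rules that lattice out.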
Confirm the suggestion [P] [enter] to get to XPREP's main menu: 
The top half of the main screen gives various
information. The first line shows which data set is currently being worked with
(momonewunmerged.hkl), at which wavelength it was measured (molybdenum
radiation, 0.71073 Å) and whether the structure is chiral (this is not clear
yet). The next lines list two cells, first the original cell given at the
beginning, together with volume, errors and lattice type (P). The second
cell (current cell) is the one you are working with at the moment. Usually the
'original cell' and the 'current cell' are the same, but XPREP will change
the setting for original cells that were set up unconventionally, which is
sometimes not wanted by the user. If in such a case the 'current cell' needs to
be reset, a matrix transformation has to be done (this would be option
U in the menu).
The bottom part of the screen is the options menu. The program suggests the most useful order of operations: 
First it should be checked whether the crystal system is of a higher symmetry than for the original (primitive) setting, so the program proposes option [H]; just press [enter].
In this case the primitive orthorhombic cell
seems to be correct (highest symmetry possible).
Determining the space group
After that check, the space group can be determined. Confirming option [S] takes you into a submenu ([enter], as usual; pressing the enter key to confirm will not be mentioned any more):
The proposed option [S] is the one you should choose to find a space group with no prior knowledge about the compound. If you already know the space group, as in the case of a structure isomorphous to one already determined, you could choose [I]. If assumptions about the chirality of the crystal/sample are to be made, choose [C] or [N].
Select the given option [S]. After the crystal system has been confirmed to be orthorhombic [O], the absences will be analyzed once again, still resulting in a primitive lattice [P]. 

Now the space group determination starts. There
are several criteria that can serve as evidence for the most likely spacegroup(s):
1. The E-value statistics. The distribution of E-values can be used as a hint (but not as a proof) for or against a centrosymmetric space group. The statistic is based on E-values, which are normalized structure factors (derived from the square roots of the intensities), scaled so that the mean value of E^{2} is 1 in all resolution shells. For centrosymmetric structures the statistical frequency of particularly strong and weak E-values is greater than for non-centrosymmetric ones. To express this fact numerically, the mean value of |E^{2}-1| is calculated: Theoretically expected values are 0.736 for non-centrosymmetric space groups and 0.968 if the space group is centrosymmetric. In our case it is 0.789, so a non-centrosymmetric space group is most likely.
2. The systematic absences. Absences are not only generated by centered lattices, but also by the presence of translational symmetry elements, either screw axes or glide planes. In this case the analysis of absences has to be done for the three axial directions of the unit cell (or reciprocal lattice, respectively). The way in which the directions are defined differs depending on the crystal system. In our case twofold screw axes 2_{1} along all three directions are observed in columns 4, 8 and 12 of the table: Along the first direction of the reciprocal lattice (corresponding to cell edge a) 13 absences are expected (column 4, line 1), and indeed there are no reflections (N=0) at the absence positions, which indicates the existence of the screw axis. The other two directions behave similarly: 17 expected absences with 0 observed, and 94 expected with 2 observed. Besides the three screw axes, there are no further translational elements present for this structure. Summing up, we now know that there is a primitive orthorhombic lattice with no centrosymmetry and (only) three 2_{1} screw axes orthogonal to each other. The only possible space group combining these attributes is P2_{1}2_{1}2_{1}.
It is worth mentioning that in general the systematic absences are more important (and often more reliable) than the E-value statistics for distinguishing between space groups. The latter criterion may, for example, be disturbed by the presence of heavy atoms. In our case, we would in principle not have needed the E-values, because the absences clearly define the space group.
3. The frequency of the space group. Should several space groups still be possible after the first two criteria, the number of known structures which crystallize in a certain space group becomes an important indicator for the probability that this space group is also correct for the new structure. The number of known structures is taken from the CSD (Cambridge Structural Database) and listed in the line of the proposed space group(s): In our case, P2_{1}2_{1}2_{1} is suggested by XPREP as option [A]. The space group is listed in the International Tables for Crystallography under the given number, 19. After the column for the CSD frequency and some other information, the CFOM value (combined figure of merit) is given, summing up all criteria explained before. The lower this value is (ideally around or even under one), the higher the probability for a space group.
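The first two criteria can be illustrated with a short Python sketch. It is deliberately simplified: the intensities are normalized over the whole data set rather than per resolution shell as described above, and the function names and demo reflections are hypothetical.

```python
ACENTRIC_EXPECTED = 0.736  # theoretical <|E^2-1|>, non-centrosymmetric
CENTRIC_EXPECTED = 0.968   # theoretical <|E^2-1|>, centrosymmetric


def mean_abs_e2_minus_1(intensities):
    """<|E^2 - 1|> with E^2 = I / <I> (single shell, a simplification)."""
    mean_intensity = sum(intensities) / len(intensities)
    e_squared = [i / mean_intensity for i in intensities]
    return sum(abs(e2 - 1.0) for e2 in e_squared) / len(e_squared)


def closer_to(value):
    """Say which theoretical value a measured <|E^2-1|> lies nearer to."""
    if abs(value - ACENTRIC_EXPECTED) < abs(value - CENTRIC_EXPECTED):
        return "non-centrosymmetric"
    return "centrosymmetric"


def screw_21_absences(reflections, axis, cutoff=3.0):
    """Count expected and violated absences for a 2_1 axis.

    axis is 0, 1 or 2 for a, b or c. Axial reflections (both other
    indices zero) with an odd index along the axis should be absent.
    Returns (N expected, N observed with I > cutoff * sigma).
    """
    n_expected = n_observed = 0
    for hkl, intensity, sigma in reflections:
        others = [hkl[j] for j in range(3) if j != axis]
        if others == [0, 0] and hkl[axis] % 2 != 0:
            n_expected += 1
            if intensity > cutoff * sigma:
                n_observed += 1
    return n_expected, n_observed


# Made-up axial reflections, for illustration only.
DEMO_AXIAL = [
    ((1, 0, 0), 0.5, 1.0),
    ((2, 0, 0), 100.0, 1.0),
    ((3, 0, 0), 10.0, 1.0),
    ((0, 1, 0), 0.1, 1.0),
]

if __name__ == "__main__":
    print(closer_to(0.789))
    print(screw_21_absences(DEMO_AXIAL, axis=0))
```

For the tutorial value of 0.789 the comparison favours the non-centrosymmetric case, in line with the text. As stated above, this is a hint only, and the absences carry more weight.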
Confirm [A], the only option present 
Analyzing and merging the data
Knowing the space group, symmetry-equivalent reflections can be deduced and their intensities averaged. The necessary operations, together with many analytical options and more specialized data modifications, are found in a separate submenu.
With main menu option [D] (read, modify or merge datasets) you enter the submenu: 

This new screen first lists the available datasets, followed by the options menu. During the operations, new datasets (e.g. merged and truncated ones) are generated. With option [C] one can switch between these datasets to work with the selected one. At the moment there is only the original dataset from our file.
Before the reflections are merged, it is important to look at the data statistics, using option [S]: 

To create the statistical table, XPREP merges the data in advance. The user therefore has to decide what type of merging should be done, considering the possible centrosymmetry of the space group. As mentioned, merging Friedel-paired reflections together with other symmetry equivalents (option [A]) is recommended only for centrosymmetric space groups. In non-centrosymmetric cases like ours, option [S] is preferable, where only symmetry-equivalent (and of course identical) reflections are merged.
Select option [S]. (This is the first time that a program suggestion is not confirmed.) 
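The difference between the two merging types can be sketched for our space group P2_{1}2_{1}2_{1}, whose point group is 222. The sketch maps every reflection to one representative of its set of equivalents, so that reflections with the same key would be merged. The function names are hypothetical and this is not XPREP's own implementation.

```python
def equivalents_222(hkl):
    """Indices related by the three twofold axes of point group 222."""
    h, k, l = hkl
    return {(h, k, l), (h, -k, -l), (-h, k, -l), (-h, -k, l)}


def merge_key(hkl, merge_friedel=False):
    """Representative index under which a reflection would be merged.

    merge_friedel=False: symmetry equivalents only.
    merge_friedel=True:  Friedel mates (-h, -k, -l) are added as well.
    """
    candidates = equivalents_222(hkl)
    if merge_friedel:
        candidates |= {(-h, -k, -l) for h, k, l in candidates}
    return max(candidates)


if __name__ == "__main__":
    print(merge_key((3, -2, -6)))                       # equivalent of (3 2 6)
    print(merge_key((-3, -2, -6)))                      # Friedel mate, kept apart
    print(merge_key((-3, -2, -6), merge_friedel=True))  # merged with (3 2 6)
```

Without Friedel merging, (3 2 6) and its Friedel mate (-3 -2 -6) keep different keys and therefore stay separate reflections in the merged list.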
The data statistics look like this: 

The reflections are divided into resolution
shells, given in the first column of the table. The next two columns
list the number of reflections measured in the respective shell (data)
and the corresponding number expected due to the space group symmetry (theory).
The resulting
completeness is given in the fourth column: A high
completeness is essential for a successful structure determination, because
in principle every theoretical reflection is needed for a complete Fourier
synthesis describing the electron density map. Should the data collection lack
sufficient completeness, the measurement has to be extended! Another
quantity derived from the number of experimental data is the redundancy
(column five). For each unique reflection measured more than once (identically
or as a symmetry equivalent) the redundancy is greater than one. Thus, for
the same number of experimental reflections, the higher the symmetry,
the more redundant the data. Since averaged quantities are in general more
precise if more data are used to calculate the mean, redundancy increases
the quality of the whole structure determination.
Next, the mean intensity I and the mean I/sigma are given, of which the latter is more important, representing the signal-to-noise ratio of the data. The last two columns list the quality factors R(int) and R(sigma), both of which should be as low as possible. In the case of our data, the quality is good, except for the highest resolution shell (0.80-0.78 Å). The completeness is otherwise almost 100 %, the redundancy greater than fourfold, and the mean intensity over sigma does not drop below five. It is normal that the data gradually get worse with increasing resolution, but if there are sudden jumps, it is better to exclude the very high resolution reflections, which are then too weak for good quality.
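The table columns can be illustrated with a small sketch. It assumes the usual definitions: completeness = unique measured / theoretical, redundancy = measurements / unique reflections, R(int) = sum|I - <I>| / sum I over reflections measured more than once, and R(sigma) = sum sigma(I) / sum I. For simplicity R(sigma) is summed here over the individual measurements, so the numbers will not reproduce XPREP's output exactly. The function name and the demo data are hypothetical.

```python
def shell_statistics(groups, n_theory):
    """Statistics for one resolution shell.

    groups:   dict mapping a unique (h, k, l) to a list of (I, sigma)
              measurements that belong to it.
    n_theory: number of unique reflections expected in the shell.
    """
    n_unique = len(groups)
    n_measured = sum(len(obs) for obs in groups.values())

    deviation_sum = 0.0   # sum of |I - <I>| for repeated reflections
    repeated_sum = 0.0    # sum of I for repeated reflections
    sigma_sum = 0.0
    intensity_sum = 0.0
    for obs in groups.values():
        intensities = [i for i, _ in obs]
        mean = sum(intensities) / len(intensities)
        intensity_sum += sum(intensities)
        sigma_sum += sum(s for _, s in obs)
        if len(obs) > 1:
            deviation_sum += sum(abs(i - mean) for i in intensities)
            repeated_sum += sum(intensities)

    return {
        "completeness": 100.0 * n_unique / n_theory,
        "redundancy": n_measured / n_unique,
        "r_int": deviation_sum / repeated_sum if repeated_sum else 0.0,
        "r_sigma": sigma_sum / intensity_sum,
    }


# Made-up shell: one reflection measured twice, one measured once.
DEMO_GROUPS = {
    (1, 0, 0): [(10.0, 1.0), (12.0, 1.0)],
    (2, 0, 0): [(20.0, 2.0)],
}

if __name__ == "__main__":
    print(shell_statistics(DEMO_GROUPS, n_theory=4))
```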
Note the large differences in value, especially for completeness, redundancy and R(sigma), in the third-last line of the table. To truncate the data at 0.8 Å, select option [H] (apply high/low resolution cutoffs):

Set the high resolution limit to 0.80 Å. The low resolution limit is not changed, so confirm [inf].
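What such a cutoff does can be sketched as follows. The resolution of a reflection is its d-spacing, which for an orthorhombic cell follows from 1/d^2 = h^2/a^2 + k^2/b^2 + l^2/c^2. The cell edges used in the demo are made-up values (the real ones are in momo.cell), and the function names are hypothetical.

```python
import math


def d_spacing_orthorhombic(hkl, a, b, c):
    """d-spacing in Å for an orthorhombic cell with edges a, b, c in Å."""
    h, k, l = hkl
    inv_d_squared = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    if inv_d_squared == 0:
        return math.inf  # (0 0 0) has no finite d-spacing
    return 1.0 / math.sqrt(inv_d_squared)


def apply_cutoff(reflections, cell, d_min=0.80, d_max=math.inf):
    """Keep reflections with d_min <= d <= d_max.

    reflections: iterable of tuples whose first element is (h, k, l).
    cell:        (a, b, c) in Å.
    """
    a, b, c = cell
    return [
        r for r in reflections
        if d_min <= d_spacing_orthorhombic(r[0], a, b, c) <= d_max
    ]


if __name__ == "__main__":
    demo_cell = (10.0, 12.0, 20.0)  # hypothetical cell edges
    demo = [((1, 1, 1), 50.0, 1.0), ((13, 0, 0), 2.0, 1.5)]
    print(apply_cutoff(demo, demo_cell))  # (13 0 0) lies beyond 0.80 Å
```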
Looking again at the data statistics, a clear improvement for each overall mean value can be observed (last line, whole resolution range inf-0.80):

Create a new dataset of merged reflections, using option [M] (sort-merge current data). Take care to treat the Friedel pairs correctly: again, select merging type [S]. Finally, save the merged dataset to a file with option [W]:

First, XPREP asks for the (SHELX-specific) format of the HKL file. For the other SHELXTL programs the HKLF 4 format is needed, i.e. option [4]. The new file is called momonew.hkl. After that, confirm [0], so that none of the reflections is excluded from the dataset (for macromolecular datasets, 5% of the reflections should be excluded instead).
The new HKL file only contains 5580 merged reflections, which are now unique except for Friedel mates. You have given the file a new name in order to save the old, unmerged one from being overwritten. The newly created datasets are now shown at the top of the screen with indices #2 for the non-merged, truncated dataset and #3 for the merged one:

Note: If you should have forgotten to use the [W] option this time, you can still write the file later. 
Return to the main menu with option [E]. 
Preparing the instructions file for SHELXS
There is still one thing to
do: The instructions file needed by the program SHELXS to solve
the structure has to be written. To create this file, you have to tell
the program what atom types you have reason to expect for the structure.
It is not very serious if you get this wrong; naming all possible elements is more important than the number of atoms you specify for each.

The type and number of atoms that may be present are entered after choosing option [C] (define unit-cell contents):

The program asks for a kind of sum formula, where abbreviations for certain groups are allowed. 
Type C13 H22 O6, which is the sum formula of the known tutorial molecule. In real life, use any information you have about your compound, e.g. the sum formula of an expected reaction product. Note that the number of hydrogen atoms is not important here.
From this information, XPREP calculates Z,
the number of formula units (which may be identical to the number of molecules) per
crystal unit cell. Dividing the cell volume by the total volume of all
formula atoms (theoretical value 18 Å³ per atom, hydrogen atoms are
not counted) gives 8 formula units.
In space group P2_{1}2_{1}2_{1}, the unit cell is divided into four identical asymmetric units. Therefore, each of these cell fractions should contain two molecules, which are not related by symmetry to one another (but related to the corresponding molecules of the three other asymmetric units). Back on the main screen, the new information is given in the information lines:
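The estimate of Z can be reproduced with a short calculation. The 18 Å³ per non-hydrogen atom is the rule-of-thumb value quoted above; the cell volume of 2736 Å³ is a made-up figure chosen so that the arithmetic comes out at exactly 8 (the real volume is shown on the XPREP screen), and the function names are hypothetical.

```python
import re

VOLUME_PER_NON_H_ATOM = 18.0  # Å^3, rule-of-thumb value quoted in the text


def parse_formula(formula):
    """'C13 H22 O6' -> {'C': 13, 'H': 22, 'O': 6}; a missing count means 1."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def non_hydrogen_atoms(formula):
    """Number of atoms per formula unit, hydrogen not counted."""
    return sum(n for element, n in parse_formula(formula).items() if element != "H")


def estimate_z(cell_volume, formula):
    """Estimated number of formula units per unit cell."""
    return cell_volume / (VOLUME_PER_NON_H_ATOM * non_hydrogen_atoms(formula))


if __name__ == "__main__":
    z = estimate_z(2736.0, "C13 H22 O6")  # hypothetical cell volume in Å^3
    print(z)      # formula units per cell
    print(z / 4)  # per asymmetric unit in P2(1)2(1)2(1)
```

C13 H22 O6 has 19 non-hydrogen atoms, so one formula unit is assigned 19 × 18 = 342 Å³. In practice the quotient is not an exact integer and is rounded to a value compatible with the space group.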

In the second-last line you find the crystal system, the space group and the Laue group. Next comes the sum formula, from which most of the following quantities are derived: the molecular weight, the Z-value, the crystal density, the atomic volume and the total number of electrons per unit cell F(000). Finally Mu, a calculated absorption coefficient, is given.
Now select option [F] to create the INS file. It should get the same name as the new HKL file, type momonew here. 

The contents of the INS text file momonew.ins are displayed automatically: The first line is always the title (TITL) of the structure: momonew in space group P2_{1}2_{1}2_{1}. In the cell-defining line (CELL) the X-ray wavelength is given first, then the cell parameters. The following line, ZERR, lists the Z-value together with the errors of the respective cell parameters. LATT defines the type of the crystal lattice, 1 for primitive cells. If the structure is non-centrosymmetric, a minus sign precedes the number (otherwise not). Next, the symmetry operators belonging to the actual space group are listed in the SYMM lines. The basic operator (x,y,z) is never given, nor are operators resulting from possible centrosymmetry or centered lattices, because the SHELX programs derive these operators from the LATT code. After SFAC, the atom types presumed to be present are stated. From the atom types, the atomic scattering factors are calculated; these are important for structure solution and refinement. The UNIT line corresponds to the atom list, giving the number of atoms of each type in the cell. TREF specifies the phasing method for structure solution (see next chapter). HKLF 4 selects the standard HKL file format, which lists squared structure factors (as in the unmerged starting file, see beginning of this chapter). END usually closes the instructions.
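How the LATT, SFAC and UNIT lines follow from the information entered so far can be sketched in Python. The LATT numbering is the standard SHELX one (1 = P, 2 = I, 3 = R, 4 = F, 5 = A, 6 = B, 7 = C). The function names are hypothetical, and the sketch does not attempt to reproduce the complete file written by XPREP.

```python
# SHELX lattice codes; the sign is added according to centrosymmetry.
LATT_CODES = {"P": 1, "I": 2, "R": 3, "F": 4, "A": 5, "B": 6, "C": 7}


def latt_instruction(lattice, centrosymmetric):
    """LATT line: positive code if centrosymmetric, negative otherwise."""
    code = LATT_CODES[lattice]
    return f"LATT {code if centrosymmetric else -code}"


def sfac_unit_instructions(formula_counts, z):
    """SFAC and UNIT lines for a formula (element -> count) and Z.

    UNIT gives atoms per unit cell, i.e. Z times the formula counts,
    in the same order as the elements on the SFAC line.
    """
    elements = list(formula_counts)
    sfac = "SFAC " + " ".join(elements)
    unit = "UNIT " + " ".join(str(z * formula_counts[e]) for e in elements)
    return sfac, unit


if __name__ == "__main__":
    print(latt_instruction("P", centrosymmetric=False))
    for line in sfac_unit_instructions({"C": 13, "H": 22, "O": 6}, z=8):
        print(line)
```

For the tutorial structure (primitive, non-centrosymmetric, C13 H22 O6 with Z = 8) this gives LATT -1 and a cell content of 104 C, 176 H and 48 O atoms.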
The merged HKL file should already have been written, so answer the next
question with [N].
As mentioned before, the file can also be created here with [Y]; the HKL
file would then get the same file name stem as the INS file.
You have now finished the preparations for the structure solution and created two new files for SHELXS. Leave XPREP with main menu option [Q]. Continue the tutorial with the chapter about SHELXS. 