XPREP: Data analysis and modification
The HKL file
All measured reflection data are saved in your starting file momo-new-unmerged.hkl. An HKL file consists of X-ray reflections related to reflection planes (hkl) of the crystal lattice. Each of these planes obeys Bragg's equation. The experimental reflections are characterized by the indices h k l, a measured intensity I and its error sigma(I).
To view the HKL file with the text editor nedit, type the command nedit momo-new-unmerged.hkl &
The following image shows a part of the file opened with nedit: 
Each line corresponds to a measured reflection. The first three columns contain the indices h, k and l, thus defining the crystal lattice plane of the corresponding reflection (see above). The next two columns contain the intensity I of the reflection and the corresponding error sigma(I). Finally, the last column shows in which 'run' of the experiment the reflection has been measured. A run is a set of X-ray exposures during which the crystal is rotated about a particular axis of the diffractometer.
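The column layout described above can be read with a few lines of Python. This is a minimal sketch, assuming the fixed-width SHELX HKLF 4 layout (three 4-character index fields, two 8-character fields for I and sigma(I), one 4-character run field). The names Reflection, format_hkl_line and parse_hkl_line are hypothetical helpers, and the sigma values are illustrative numbers, not values read from the tutorial file.

```python
from typing import NamedTuple


class Reflection(NamedTuple):
    h: int
    k: int
    l: int
    intensity: float
    sigma: float
    run: int


def format_hkl_line(r: Reflection) -> str:
    """Write one reflection in the assumed fixed-width layout (3I4, 2F8.2, I4)."""
    return f"{r.h:4d}{r.k:4d}{r.l:4d}{r.intensity:8.2f}{r.sigma:8.2f}{r.run:4d}"


def parse_hkl_line(line: str) -> Reflection:
    """Read one reflection back by slicing the fixed-width columns."""
    return Reflection(
        h=int(line[0:4]),
        k=int(line[4:8]),
        l=int(line[8:12]),
        intensity=float(line[12:20]),
        sigma=float(line[20:28]),
        run=int(line[28:32]),
    )


# Illustrative reflection: indices, intensity and run follow the (326)
# example of this section; the sigma value is an assumption.
example = Reflection(3, 2, 6, 57.80, 3.47, 1)
line = format_hkl_line(example)
print(repr(line))
print(parse_hkl_line(line))
```

Slicing by column position rather than splitting on whitespace matters for real files, because large negative indices or intensities can fill a field completely and leave no separating blank.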
Example one: The reflection (326) appears three times in the list, i.e. it has been measured three times during the experiment. First, the reflection has been observed in run 1 with an intensity of 57.80 (and an error of about 6%), the second time in run 3 with an intensity of 50.34 (7% error) and the third time in run 4 with an intensity of 53.71 (6% error). For reflections measured more than once, the intensities may (and should) be averaged. In this way, the determined values become more accurate and the errors are reduced. 
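How averaging reduces the error can be illustrated with the three measurements of (326). The sketch below uses an inverse-variance weighted mean, which is a common choice but only an assumption here; the source does not state which averaging scheme XPREP uses. The sigma values are estimated from the quoted relative errors (6%, 7%, 6%) and are therefore approximate.

```python
import math


def weighted_mean(intensities, sigmas):
    """Inverse-variance weighted mean and the standard error of that mean."""
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * i for w, i in zip(weights, intensities)) / total
    sigma_mean = math.sqrt(1.0 / total)
    return mean, sigma_mean


# Intensities from the text; sigmas approximated from the relative errors.
intensities = [57.80, 50.34, 53.71]
sigmas = [0.06 * 57.80, 0.07 * 50.34, 0.06 * 53.71]

mean, sigma_mean = weighted_mean(intensities, sigmas)
print(f"merged I = {mean:.2f}, sigma = {sigma_mean:.2f}")
```

The error of the merged value is smaller than any of the individual errors, which is the improvement referred to above.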
Example two: The reflections (226) and (-2-2-6) are related to each other by the centrosymmetry of the diffraction pattern and are called a Friedel pair. For such reflections Friedel's Law

I(hkl) = I(-h-k-l)

is strictly true in case of centrosymmetric space groups, where the crystal's unit cell contains an inversion centre. In that case, also intensities of 'Friedel mates' can be averaged. If the structure is not centrosymmetric, anomalous scattering of heavy atoms possibly present has to be considered and Friedel's law becomes an approximation, so that it is not recommended to merge Friedel mates. 
Example three: The set of reflections (226), (22-6), (2-2-6), (2-26), (-2-26) and (-2-2-6) is apparently symmetry-equivalent, i.e. the reflections are related to each other by symmetry operators of the crystal space group. Once the underlying symmetry is known, such reflections can also be merged. 
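To make the idea of symmetry equivalents concrete, the following sketch lists the equivalents of (226) for an orthorhombic crystal. It assumes point group 222 (consistent with the space group found later in this tutorial) and Laue group mmm for the case where Friedel mates are included; the function names are hypothetical.

```python
def equivalents_222(h, k, l):
    """Reflections equivalent under point group 222 (Friedel mates excluded)."""
    return {(h, k, l), (h, -k, -l), (-h, k, -l), (-h, -k, l)}


def equivalents_mmm(h, k, l):
    """Equivalents under Laue group mmm: the 222 set plus all Friedel mates."""
    base = equivalents_222(h, k, l)
    friedel = {(-a, -b, -c) for (a, b, c) in base}
    return base | friedel


print(sorted(equivalents_222(2, 2, 6)))
print(sorted(equivalents_mmm(2, 2, 6)))
```

Under these assumptions all reflections listed in Example three belong to the mmm set of (226), but only some of them belong to its 222 set; the others are Friedel mates of those.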

Starting XPREP and getting the cell geometry
To analyze the symmetry of the data (i.e. also the crystal symmetry) and to merge the reflections according to the found space group, the program XPREP is used. If the structure is not a problem case and has produced a data set of high quality, XPREP can be operated using default options, so that the user just needs to confirm the suggested commands ('enter' through the program) for most of the steps. Needless to say, it is still important to understand what one is doing!

XPREP is started with the command xprep name, in our case xprep momo-new-unmerged. The HKL file to be analyzed will be read in directly.
The opened program window looks like this: 
36064 reflections are read from file momo-new-unmerged.hkl. The mean intensity of all data divided by its error - mean (I/sigma) - is calculated. This value roughly indicates how strong the data are; one contributing factor is how well the crystal has scattered the X-ray beam. (A value of one means that the intensity equals its error, i.e. that only background noise has been measured.) 
Now the cell parameters determined during the experiment have to be given in the form a b c alpha beta gamma [enter]. The values shown here are also found in the text file momo.cell.
Which crystal system would you expect looking at the cell parameters? Correct, it is orthorhombic. Next, the program checks whether a centered crystal lattice is present. This is done by comparison of expected systematic absences (missing reflections) with actually existing ones: 
The first line (N total) tells us how many of the total reflections should be absent according to the absence law for a certain centered lattice. The line N (int>3sigma) specifies how many of those theoretically absent reflections are actually observed, i.e. how many systematic absence violations exist. The remaining lines list the intensities of the violating reflections. 
A primitive lattice (first column) generates no absences, so there are no numbers given for analysis. 
If, for example, the lattice was body centered (I, 5th column), 18037 reflections should be absent. But of these 18037 reflections, 12180 are present with a high intensity (I > 3sigma), thus violating the absence law for I-centering. Looking at the remaining columns, one comes to the conclusion that no centered lattice is likely, so that the unit cell must be primitive (as proposed by the program).
Confirm the suggestion [P] [enter] to get to XPREP's main menu: 
The top half of the main screen gives various information. The first line shows which data set is currently being worked with (momo-new-unmerged.hkl), at which wavelength it was measured (molybdenum radiation 0.71073 A) and if the structure is chiral - this is not clear yet. The next lines list two cells, first the original cell given at the beginning, together with volume, errors and lattice type (P). The second cell (current cell) is the one you work with at the moment. Mostly the 'original cell' and the 'current cell' are the same, but XPREP will change the setting for unconventionally set-up original cells - sometimes this is not wanted by the user. So if in such a case the 'current cell' needs to be reset, a matrix transformation has to be done (this would be option U in the menu). 
The bottom part of the screen is the options menu. The program suggests the most useful order of operations:
First it should be checked whether the crystal system is of a higher symmetry than for the original (primitive) setting, so the program proposes option [H] - just press [enter].
In this case the primitive orthorhombic cell seems to be correct (highest symmetry possible).

Determining the spacegroup

After that check, the space group can be determined. With the confirmed option [S] ([enter], as usual - pressing the enter key to confirm will not be mentioned any more) you get into a sub-menu:

The proposed option [S] is the one you should choose to find a space group with no prior knowledge about the compound. If you already knew the spacegroup, for example because the structure is isomorphous to a previously determined one, you could choose [I]. If assumptions about the chirality of the crystal/sample are to be made, one should choose [C] or [N].
Select the given option [S]. After the crystal system has been confirmed to be orthorhombic [O], the absences will be analyzed once again, still resulting in a primitive lattice [P].

Now the space group determination starts. There are several criteria that can serve as evidence for the most likely spacegroup(s): 
1. The E-value statistics. The distribution of E-values can be used as a hint (but not as a proof) for or against a centrosymmetric space group. E-values are normalized structure factors (structure factors being the square roots of the intensities), scaled so that the mean value of E² is 1 in all resolution shells. For centrosymmetric structures the statistical frequency of particularly strong and weak E-values is greater than for non-centrosymmetric ones. To express this fact numerically, the mean value of |E²-1| is calculated: Theoretically expected values are 0.736 for non-centrosymmetric space groups and 0.968 if the space group is centrosymmetric. In our case it is 0.789, so a non-centrosymmetric space group is most likely. 
2. The systematic absences. Absences are not only generated by centered lattices, but also by the presence of translational symmetry elements - either screw axes or glide planes. In this case the analysis of absences has to be done for the three axial directions of the unit cell (or reciprocal lattice, respectively). The way in which the directions are defined differs depending on the crystal system. In our case two-fold screw axes (2₁) along all three directions are observed in columns 4, 8 and 12 of the table: Along the first direction of the reciprocal lattice (corresponding to cell edge a) 13 absences are expected (column 4, line 1), and indeed there are no reflections (N=0) at the absence positions, which is strong evidence for the screw axis. (The same holds for the other two directions, with 17 expected absences and 0 violations, and 94 expected absences and only 2 violations.) Besides the three screw axes, there are no further translational elements present for this structure. 
Summing up, we now know that there is a primitive orthorhombic lattice with no centrosymmetry and (only) three 2₁ screw axes orthogonal to each other. The only possible space group combining these attributes is P212121. It is worth mentioning that in general the systematic absences are more important (and often more reliable) than the E-value statistics to distinguish between spacegroups. The latter criterion may for example be disturbed by the presence of heavy atoms. In our case, we would in principle not have needed the E-values, because the absences clearly define the space group. 
3. The frequency of the spacegroup. Should several spacegroups still be possible after the first two criteria, the number of known structures which crystallize in a certain spacegroup becomes an important indicator for the probability that this spacegroup is also correct for the new structure. The number of known structures is taken from the CSD (Cambridge Structural Database) and listed in the line of the proposed spacegroup(s):
In our case, P212121 is suggested by XPREP as option [A]. The number given, 19, is the number under which the spacegroup is listed in the International Tables for Crystallography. After the column for the CSD frequency and some other information, the CFOM value (combined figure of merit) is given, summing up all criteria explained before. The lower this value is (ideally around or even under one), the higher the probability for a spacegroup.
Confirm [A], the only option present.
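The two theoretical values quoted for the E-value statistics (criterion 1) can be reproduced numerically. The sketch below assumes ideal Wilson statistics: for a non-centrosymmetric structure E² follows an exponential distribution with mean 1, for a centrosymmetric structure E follows a standard normal distribution. It draws random values rather than using real data, and the function name is hypothetical.

```python
import random


def mean_abs_e2_minus_1(e_squared):
    """Mean value of |E^2 - 1| for a list of E^2 values."""
    return sum(abs(x - 1.0) for x in e_squared) / len(e_squared)


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
n = 200_000

# Acentric case: E^2 is exponentially distributed with mean 1.
acentric = [rng.expovariate(1.0) for _ in range(n)]
# Centric case: E is standard normal, so E^2 is its square.
centric = [rng.gauss(0.0, 1.0) ** 2 for _ in range(n)]

print(f"acentric: {mean_abs_e2_minus_1(acentric):.3f}")
print(f"centric:  {mean_abs_e2_minus_1(centric):.3f}")
```

The simulated values should come out close to 0.736 and 0.968. Real data rarely match either ideal value exactly, which is why the statistic is only a hint.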

Analyzing and merging the data
Knowing the spacegroup, symmetry equivalent reflections can be deduced and their intensities averaged. The necessary operations together with many analytical options and more special data modifications are found in a separate sub-menu. 
With main menu option [D] (read, modify or merge datasets) you enter the sub-menu: 

This new screen first lists the available datasets, followed by the options menu. During the operations, new datasets (e.g. merged and truncated ones) are generated. With option C one can switch between these datasets to work with the selected one. At the moment there is only the original dataset from our file.
Before the reflections are merged, it is important to look at the data statistics, using option [S]

To create the statistical table, XPREP merges the data in advance. The user therefore has to decide what type of merging shall be done, considering the possible centrosymmetry of the spacegroup. As mentioned, only for centrosymmetric spacegroups, it is recommended to merge Friedel-paired reflections together with other symmetry equivalents - option [A]. In non-centrosymmetric cases like ours, option [S] is preferable, where only symmetry equivalent (and of course identical) reflections are merged.
Select option [S]. (This is the first time that a program suggestion is not confirmed.) 
The data statistics look like this: 

The reflections are divided into resolution shells, given in the first column of the table. The next two columns list the number of reflections measured in the respective shell (data) and the corresponding number expected due to the spacegroup symmetry (theory). The resulting completeness is given in the fourth column: A high completeness is essential for a successful structure determination, because in principle every theoretical reflection is needed for a complete Fourier synthesis describing the electron density map. Should the data collection lack sufficient completeness, the measurement has to be extended! Another quantity derived from the number of experimental data is redundancy (column five). For each unique reflection measured more than once (identically or as a symmetry equivalent) the redundancy is greater than one. Thus, for the same number of experimental reflections, the higher the symmetry, the more redundant the data are. Since averaged quantities are in general more precise if more data are used to calculate the mean, redundancy increases the quality of the whole structure determination. 
Next, the mean intensity I and the mean I/sigma are given, of which the latter is more important, representing the signal-to-noise ratio of the data. The last two columns list the quality factors R(int) and R(sigma), both of which should be as low as possible. 
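The quantities in this table can be written down compactly. The sketch below uses the definitions commonly given for the SHELX programs, R(int) = Σ|I - <I>| / Σ I (summed over reflections with more than one contributing measurement) and R(sigma) = Σ sigma(I) / Σ I; it is a simplified illustration, not XPREP's actual implementation. The function names and the single-measurement reflection (1, 0, 0) are hypothetical.

```python
def completeness(n_unique_measured, n_unique_expected):
    """Fraction of the theoretically expected unique reflections that were measured."""
    return n_unique_measured / n_unique_expected


def redundancy(n_measured, n_unique_measured):
    """Average number of measurements per unique reflection."""
    return n_measured / n_unique_measured


def r_int(groups):
    """Sum |I - <I>| / sum I over all groups with more than one measurement."""
    numerator = denominator = 0.0
    for observations in groups.values():
        if len(observations) < 2:
            continue  # a single measurement carries no consistency information
        mean = sum(observations) / len(observations)
        numerator += sum(abs(i - mean) for i in observations)
        denominator += sum(observations)
    return numerator / denominator if denominator else 0.0


def r_sigma(intensities, sigmas):
    """Sum sigma(I) / sum I."""
    return sum(sigmas) / sum(intensities)


# (326) intensities from Example one; (1, 0, 0) is a made-up single measurement.
groups = {(3, 2, 6): [57.80, 50.34, 53.71], (1, 0, 0): [12.0]}
n_measured = sum(len(obs) for obs in groups.values())

print(f"R(int)     = {r_int(groups):.4f}")
print(f"redundancy = {redundancy(n_measured, len(groups)):.1f}")
```

R(int) measures how well repeated and symmetry-equivalent measurements agree with each other, while R(sigma) measures how large the errors are relative to the intensities.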
In the case of our data, the quality is good, except for the highest resolution shell (0.80-0.78 A). The completeness is otherwise almost at 100 %, the redundancy greater than four-fold and the mean intensity over sigma does not drop below five. It is normal that the data gradually get worse with increasing resolution, but if there are sudden jumps, it is better to exclude the very high resolution reflections which then are too weak for good quality.
Note the great value differences - especially for completeness, redundancy and R(sigma) - in the third-last line of the table. To truncate the data at 0.8 A, select option [H] (apply high/low resolution cutoffs): 

Set the high resolution limit to 0.80 A. The low resolution limit is not changed - confirm [inf].
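What such a cutoff does can be sketched for an orthorhombic cell, where the resolution d of a reflection follows from 1/d² = h²/a² + k²/b² + l²/c². The cell used below is a hypothetical example (a = 10, b = 12, c = 20 A), not the tutorial cell, and the function names are illustrative.

```python
import math


def d_spacing_orthorhombic(h, k, l, a, b, c):
    """Resolution d (in A) of reflection (h k l) for an orthorhombic cell."""
    inv_d_squared = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    if inv_d_squared == 0:
        return math.inf  # (0 0 0) has no finite d-spacing
    return 1.0 / math.sqrt(inv_d_squared)


def apply_high_resolution_cutoff(indices, cell, d_min):
    """Keep only reflections with d >= d_min."""
    a, b, c = cell
    return [hkl for hkl in indices if d_spacing_orthorhombic(*hkl, a, b, c) >= d_min]


cell = (10.0, 12.0, 20.0)  # hypothetical cell edges in A
indices = [(1, 0, 0), (12, 0, 0), (13, 0, 0)]
kept = apply_high_resolution_cutoff(indices, cell, d_min=0.80)
print(kept)
```

Note that "high resolution" means small d, so the high resolution limit removes the reflections with the smallest d-spacings.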
Looking again at the data statistics, a clear improvement for each overall mean value can be observed (last line, whole resolution range inf-0.80): 

Create a new dataset of merged reflections, using option [M] (sort-merge current data). Take care of correct treatment for the Friedel pairs - again, select merging type [S]. Finally, save the merged dataset to a file with option [W].
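The principle of a sort-merge that keeps Friedel mates separate can be sketched as follows. The sketch assumes point group 222, maps every reflection to one representative of its set of equivalents, and averages with inverse-variance weights; both the choice of representative and the weighting scheme are assumptions, not a description of XPREP's internals. The intensities are made-up numbers.

```python
import math
from collections import defaultdict


def canonical_222(h, k, l):
    """Pick one representative (the largest tuple) among the 222 equivalents."""
    return max([(h, k, l), (h, -k, -l), (-h, k, -l), (-h, -k, l)])


def sort_merge(observations):
    """Merge (h, k, l, I, sigma) observations; Friedel mates stay separate."""
    groups = defaultdict(list)
    for h, k, l, intensity, sigma in observations:
        groups[canonical_222(h, k, l)].append((intensity, sigma))

    merged = {}
    for key in sorted(groups):
        weights = [1.0 / s ** 2 for _, s in groups[key]]
        total = sum(weights)
        mean = sum(w * i for w, (i, _) in zip(weights, groups[key])) / total
        merged[key] = (mean, math.sqrt(1.0 / total))
    return merged


# Hypothetical measurements: two 222 equivalents and one Friedel mate.
observations = [
    (2, 2, 6, 100.0, 2.0),
    (2, -2, -6, 104.0, 2.0),
    (-2, -2, -6, 98.0, 2.0),
]
for hkl, (intensity, sigma) in sort_merge(observations).items():
    print(hkl, f"{intensity:.2f}", f"{sigma:.2f}")
```

Merging Friedel mates as well would amount to using the eight equivalents of Laue group mmm when choosing the representative.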

First, the (SHELX specific) format of the HKL file is asked. For the other SHELXTL programs the HKLF4 format is needed, i.e. option [4]. The new file is called momo-new.hkl. After that, confirm [0], so that none of the reflections is excluded from the dataset (which should be done with 5% of the reflections in case of macromolecular datasets). 
The new HKL file only contains 5580 merged reflections, which are now unique except for Friedel mates. You have given a new name to the file in order to save the old, unmerged one from being overwritten. The newly created datasets are now shown on top of the screen with indices #2 for the non-merged, truncated dataset and #3 for the merged one: 

Note: If you should have forgotten to use the [W] option this time, you can still write the file later. 
Return to the main menu with option [E].
Preparing the instructions file for SHELXS
There is still one thing to do: The instructions file needed by the program SHELXS to solve the structure has to be written. To create this file, you have to tell the program what atom types you have reason to expect for the structure. It is not very serious if you get this wrong, though naming all possible elements is more important than how many atoms of each you specify. 
The type and number of possibly present atoms is given after choosing option [C] (define unit-cell contents): 

The program asks for a kind of sum formula, where abbreviations for certain groups are allowed. 
Type C13 H22 O6, which is the sum formula of the known tutorial molecule. In real life, use any information you have about your compound, e.g. the sum formula of an expected reaction product. Note that the numbers of hydrogen atoms are not important here. 
From this information, XPREP calculates Z, the number of formula units (may be identical to the molecule number) per crystal unit cell. Dividing the cell volume by the total volume of all formula atoms (theoretical value 18 A³ per atom, hydrogen atoms are not counted), a number of 8 formula units results. 
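The estimate described above is a simple division. In the sketch below the rule of 18 A³ per non-hydrogen atom is taken from the text; the function name is hypothetical and the cell volume of 2736 A³ is an illustrative value chosen for the example, not the volume of the tutorial cell.

```python
def estimate_z(cell_volume, formula, volume_per_atom=18.0):
    """Estimate the number of formula units per cell; hydrogen atoms are ignored."""
    non_h_atoms = sum(count for element, count in formula.items() if element != "H")
    return cell_volume / (non_h_atoms * volume_per_atom)


formula = {"C": 13, "H": 22, "O": 6}   # sum formula from the text
cell_volume = 2736.0                   # hypothetical cell volume in A^3

z = estimate_z(cell_volume, formula)
print(f"estimated Z = {z:.2f}, rounded to {round(z)}")
```

With 19 non-hydrogen atoms the formula unit is assigned 19 x 18 = 342 A³, so a cell of roughly eight times that volume gives Z = 8.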
In spacegroup P212121, the unit cell is divided into four identical asymmetric units. Therefore, each of these cell fractions should contain two molecules, which are not related by symmetry to one another (but related to the corresponding molecules of the three other asymmetric units). Back on the main screen, the new information is given in the information lines: 

In the second-last line you find the crystal system, the space group and the Laue group. Next comes the sum formula, from which most of the following quantities are derived: the molecular weight, the Z-value, the crystal density, the atomic volume and the total number of electrons per unit cell F(000). Finally Mu, a calculated absorption coefficient, is given.
Now select option [F] to create the INS file. It should get the same name as the new HKL file, type momo-new here.

The contents of the INS text file momo-new.ins are displayed automatically: The first line is always the title (TITL) of the structure: momo-new in spacegroup P212121. In the cell defining line (CELL) the X-ray wavelength is given first, then the cell parameters. The following line, ZERR, lists the Z-value together with the errors of the respective cell parameters. LATT defines the type of the crystal lattice, 1 for primitive cells. If the structure is non-centrosymmetric, a minus sign precedes the number (otherwise not). Next the symmetry operators belonging to the actual spacegroup are listed in the SYMM lines. The basic operator (x,y,z) is never given, and neither are operators resulting from possible centrosymmetry or centered lattices, because the SHELX programs derive these operators from the LATT code. After SFAC, the presumably present atom types are stated. From the atom types, the atomic scattering factors are obtained; these are important for structure solution and refinement. The UNIT line corresponds to the SFAC list, giving the number of atoms of each type in the cell. TREF specifies the phasing method for structure solution (see next chapter). HKLF 4 is the standard HKL file format, listing squared structure factors (as in the unmerged starting file - see beginning of this chapter). END usually closes the instructions.
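How the LATT, SFAC and UNIT lines follow from the information collected so far can be sketched in a few lines. The LATT numbering used here (1 = P, 2 = I, 3 = R, 4 = F, 5 = A, 6 = B, 7 = C, negative for non-centrosymmetric structures) follows the SHELX convention; the function names are hypothetical, and the sketch does not try to reproduce the exact spacing of a file written by XPREP.

```python
LATT_CODES = {"P": 1, "I": 2, "R": 3, "F": 4, "A": 5, "B": 6, "C": 7}


def latt_code(lattice, centrosymmetric):
    """LATT number: positive if centrosymmetric, negative otherwise."""
    code = LATT_CODES[lattice]
    return code if centrosymmetric else -code


def sfac_unit_lines(formula, z):
    """Build SFAC and UNIT lines; UNIT holds atoms per cell (formula count x Z)."""
    elements = list(formula)
    sfac = "SFAC " + " ".join(elements)
    unit = "UNIT " + " ".join(str(formula[element] * z) for element in elements)
    return sfac, unit


formula = {"C": 13, "H": 22, "O": 6}  # sum formula from the text
sfac, unit = sfac_unit_lines(formula, z=8)

print(f"LATT {latt_code('P', centrosymmetric=False)}")
print(sfac)
print(unit)
```

For the tutorial structure (primitive, non-centrosymmetric, Z = 8) this gives LATT -1 and 104 carbon, 176 hydrogen and 48 oxygen atoms per cell.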
The merged HKL file should already have been written, so the next question will be answered with [N]. As said before, the file creation can also be done here [Y], then the HKL file would get the same file name stem as the INS file. 
You have now finished the preparations for the structure solution and created two new files for SHELXS. Leave XPREP with main menu option [Q]. Continue the tutorial with the chapter about SHELXS.