Refinement of Triclinic Lysozyme at 1.1 A
since 21-Jul-00
Copyright by
In this tutorial, we will refine the structure of Lysozyme in space group P1 to a resolution of 1.1 A. The data were collected at 120 K on beamline X11 at EMBL c/o DESY to a resolution of 0.92 A. The structure has in fact been refined to this resolution by Walsh et al. (Acta Cryst. D54:522 (1998)), and these coordinates are available from the PDB under entry code 3LZT. For our exercise, the data were cut at 1.1 A. To make things more interesting, we also pretend that we do not have a starting model of the protein in space group P1. Instead, we take a model of Lysozyme in space group P43212 as the search model for molecular replacement. A good one can be found under PDB code 2LYM (Kundrot et al. J.Mol.Biol. 193:157-170 (1987)).
In the following, user input is marked in red and program output in blue. All SHELXL input and output files are available in a gzipped file p1lys.tar.gz. To look at intermediate models and maps as we go along, it is useful to have XtalView installed on your computer.
The rest of the document describes the following steps:
To prepare the model for molecular replacement, the original PDB file has to be modified, i.e. waters, hydrogens, and hetero compounds should be removed and B-factors reset. All this can be done using the program SHELXPRO. This program is started from the command line by typing shelxpro. It comes up with a menu, and you have to answer some questions:
trs/p1lys> shelxpro
SHELXPRO - SHELX interface for protein applications - Version 97-2
Copyright(C) George M. Sheldrick 1996-7
[F] New output filename [V] R(free) files
[A] Anisotropic scaling (Hope & Parkin) [I] .ins from PDB file
[P] Progress of LS refinement diagram [L] Luzzati plot
[T] Thermal displacement analysis [E] Esd analysis
[U] Update .res (and .pdb) to .ins file [N] NCS analysis
[R] Ramachandran Phi-Psi plot [K] Kleywegt NCS plot
[M] Map file for O from .fcf [O] PDB file for O
[H] .hkl file from other data formats [Y] X-PLOR/CNS .fob to .hkl
[D] Convert DENZO/SCALEPACK .sca to .hkl [C] Color plots (now on)
[X] Write XtalView map coefficients [W] Write Turbo-Frodo map
[S] Reflection statistics from .fcf [Z] Least-squares fit
[J] Generate restraints from model [B] PDB deposition
[G] Generate PDB file from .res or .pdb [Q] Quit
Enter option: G
Reads a .ins, .res or .pdb format file and generates a new PDB format file.
This file may be used for input to standard protein programs such as AMoRe,
or re-read by SHELXPRO for least-squares fitting. B-values may be reset to
typical values, disorder, solvent and H-atoms removed, chain IDs created,
and multiple copies of chains generated by (non-)crystallographic symmetry.
In the new PDB file all atoms are isotropic.
Enter N to abort option, <Enter> to continue: <Enter>
Read PDB (P) or SHELX .ins or .res (S) file [S]: P
Name of file to read [shelxpro.pdb]: 2lym.pdb
Replace B-values with standard values (Y or N) ? [N]: y
Remove hydrogen atoms (Y or N) ? [Y]: y
Reset PART 1 occ. to 1, delete other disorder components (Y or N) ? [Y]: y
Remove all residues except standard amino-acids (Y or N) ? [N]: y
1001 atoms stored
PDB file to write (may be same as read) [shelxpro.pdb]: 2lym_mod.pdb
Name of protein for PDB file:
tetragonal lysozyme
Now the atoms are written to the PDB file, starting with chains, followed
by the remaining atoms. In both cases residues may be selected by number;
symmetry transformations may also be applied.
Select chain ('$' if chain ID blank, to exit): $
New ID for this chain in PDB file [ ]: <Enter>
The symmetry operator may be specifed using decimals or fractions
Symmetry operator [x,y,z]: <Enter>
First and last residues to process [1 129]: <Enter>
New residue number for the first of these [1]: <Enter>
1001 atoms written to PDB file
Select chain ('$' if chain ID blank, to exit): <Enter>
SHELXPRO - SHELX interface for protein applications - Version 97-2
Copyright(C) George M. Sheldrick 1996-7
[F] New output filename [V] R(free) files
[A] Anisotropic scaling (Hope & Parkin) [I] .ins from PDB file
[P] Progress of LS refinement diagram [L] Luzzati plot
[T] Thermal displacement analysis [E] Esd analysis
[U] Update .res (and .pdb) to .ins file [N] NCS analysis
[R] Ramachandran Phi-Psi plot [K] Kleywegt NCS plot
[M] Map file for O from .fcf [O] PDB file for O
[H] .hkl file from other data formats [Y] X-PLOR/CNS .fob to .hkl
[D] Convert DENZO/SCALEPACK .sca to .hkl [C] Color plots (now on)
[X] Write XtalView map coefficients [W] Write Turbo-Frodo map
[S] Reflection statistics from .fcf [Z] Least-squares fit
[J] Generate restraints from model [B] PDB deposition
[G] Generate PDB file from .res or .pdb [Q] Quit
Enter option: Q
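If SHELXPRO is not at hand, the clean-up that option G performed above can be approximated with a few lines of Python. This is a minimal sketch, not SHELXPRO's code: it assumes a fixed-column PDB file, keeps only ATOM records of the 20 standard amino acids, drops hydrogens and every alternate conformer except blank or 'A', and resets occupancies to 1. The names `clean_pdb_lines` and `is_hydrogen` are made up for this example, and the default B value of 20 A^2 is an arbitrary placeholder, not SHELXPRO's "standard value".

```python
STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def is_hydrogen(line):
    """Use the element column (77-78) if present, else guess from the atom name."""
    element = line[76:78].strip() if len(line) >= 78 else ""
    if element:
        return element in ("H", "D")
    name = line[12:16].strip().lstrip("0123456789")
    return name.startswith("H")


def clean_pdb_lines(lines, b_value=20.0):
    """Return cleaned ATOM records (hypothetical helper, arbitrary default B)."""
    cleaned = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith("ATOM  "):
            continue  # drops HETATM records (waters, hetero compounds) and all others
        line = line.ljust(66)
        if line[17:20].strip() not in STANDARD_RESIDUES:
            continue
        if is_hydrogen(line):
            continue
        if line[16] not in (" ", "A"):
            continue  # keep only the first disorder component
        # blank the altloc (column 17), set occupancy (55-60) and B (61-66)
        line = line[:16] + " " + line[17:54] + f"{1.0:6.2f}{b_value:6.2f}" + line[66:]
        cleaned.append(line)
    return cleaned
```

Note that CRYST1 and all other non-ATOM records are dropped by this sketch, so they would have to be copied over separately if a program needs them.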
The resulting file 2lym_mod.pdb needs a short massage to keep the molecular replacement program EPMR from crashing: we have to put some spaces at the end of all lines. After loading the file into my favourite editor, vi, this can be done by telling the editor :%s/.$/& /g - handy, isn't it?
Now we have the search model ready in the file 2lym_mod.pdb.
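The same padding can be done outside vi. A small Python sketch that appends one space to every non-empty line, mirroring the vi substitution above (the function name `pad_lines` is made up for this example):

```python
def pad_lines(text, pad=" "):
    """Append `pad` to every non-empty line of `text`, keeping line endings.

    Empty lines are left alone, as in vi, where the pattern .$ needs at
    least one character to match."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        out.append(body + pad + ending if body else line)
    return "".join(out)
```

For example, reading 2lym_mod.pdb with `pathlib.Path.read_text`, passing the text through `pad_lines` and writing it back with `write_text` would rewrite the file in place.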
EPMR needs a file containing structure factors, not intensities. The easiest way to make F's out of I's is to use the program XPREP (available from Bruker AXS). Alternatively, you can use programs from the CCP4 suite; a script to do the conversion (sca2hkl3.csh) and its output (sca2hkl3.out) are available. In any case, we will end up with a file p1lys.hkl3 containing h, k, l, F, sig(F). truncate will also tell us that the Wilson B factor of the data is 5.3 A^2.
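For the curious, the naive way of turning an intensity into an amplitude is F = sqrt(I), with sig(F) = sig(I)/(2F) from first-order error propagation. The sketch below does just that and simply drops non-positive intensities. It is cruder than what truncate does (the French & Wilson treatment, which also yields sensible amplitudes for weak and negative intensities), so treat it as an illustration only; the function names are hypothetical.

```python
import math


def intensity_to_amplitude(i_obs, sig_i):
    """Naive I -> F conversion: F = sqrt(I), sig(F) = sig(I) / (2 F).

    Returns None for I <= 0, where no real square root exists; truncate's
    French & Wilson treatment is what handles such reflections properly."""
    if i_obs <= 0.0:
        return None
    f_obs = math.sqrt(i_obs)
    sig_f = sig_i / (2.0 * f_obs)  # first-order propagation: dF/dI = 1 / (2 sqrt(I))
    return f_obs, sig_f


def convert(reflections):
    """reflections: iterable of (h, k, l, I, sigI) -> list of (h, k, l, F, sigF)."""
    rows = []
    for h, k, l, i_obs, sig_i in reflections:
        result = intensity_to_amplitude(i_obs, sig_i)
        if result is not None:  # weak/negative intensities are simply dropped here
            rows.append((h, k, l) + result)
    return rows
```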
EPMR is a very easy-to-use molecular replacement program. To solve this structure, we have to put the unit cell and the space group number into a file p1lys.cell. Then we run the program:
trs/p1lys> epmr p1lys.cell 2lym_mod.pdb p1lys.hkl3 > epmr.out &
After about 1 minute, it comes up with a solution, which is dumped to a PDB file: epmr.1.best.pdb. The log file of this EPMR run can be found in epmr.out. For the solution found, the correlation coefficient and the R value for data between 4.0 and 15.0 A are 45.6% and 44.5%, respectively.
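The two numbers EPMR reports can be written down compactly. This is a sketch under stated assumptions: the R value uses a single linear scale k between Fo and Fc, and the correlation coefficient is the plain Pearson coefficient on amplitudes. EPMR's exact definitions (weighting, and the 4.0-15.0 A resolution range, which is assumed to have been applied before calling these functions) may differ.

```python
import math


def r_factor(f_obs, f_calc):
    """Conventional R = sum|Fo - k*Fc| / sum Fo, with one linear scale k."""
    k = sum(f_obs) / sum(f_calc)
    return sum(abs(fo - k * fc) for fo, fc in zip(f_obs, f_calc)) / sum(f_obs)


def correlation(f_obs, f_calc):
    """Pearson correlation coefficient between observed and calculated amplitudes."""
    n = len(f_obs)
    mean_o = sum(f_obs) / n
    mean_c = sum(f_calc) / n
    cov = sum((fo - mean_o) * (fc - mean_c) for fo, fc in zip(f_obs, f_calc))
    var_o = sum((fo - mean_o) ** 2 for fo in f_obs)
    var_c = sum((fc - mean_c) ** 2 for fc in f_calc)
    return cov / math.sqrt(var_o * var_c)
```

A perfectly proportional Fc gives R = 0 and CC = 1; the 44.5% and 45.6% quoted above show how far from that a fresh molecular replacement solution still is.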
Now we have a starting model. The next step is to convert the pdb file to something useful for SHELXL and to put Rfree flags onto the reflections which also have to be converted from SCALEPACK format to SHELX format.
The file epmr.1.best.pdb is a pretty normal pdb file. We use SHELXPRO to convert this pdb file to a SHELXL ins file, which contains both instructions and coordinates for a refinement job:
trs/p1lys> shelxpro
SHELXPRO - SHELX interface for protein applications - Version 97-2
Copyright(C) George M. Sheldrick 1996-7
. . . (main menu as shown above) . . .
Enter option: I
Reads a PDB file and generates a SHELXL .ins file. The PDB file is assumed
to conform strictly to the PDB format as defined by the Protein Data Bank,
but closely related non-standard formats (e.g. CCP4 and XPLOR) can usually
be understood. The program will ask for the missing cell and symmetry
information etc. Engh and Huber restraints are included in the .ins file
for standard residues, and extra restraints are added for disulfide bridges
and C-terminal carboxyl groups. A summary of the residue and atom names is
written to the .pro file for subsequent reference.
** The 'I' option is intended for initial input of a structure to SHELXL,
NOT for updating between refinement jobs, for which 'U' should be used. **
Enter N to abort option, <Enter> to continue: I
Enter name of .ins file [shelxpro.ins]: p1lys_mr.ins
Enter name of PDB file [shelxpro.ent]: epmr.1.best.pdb
Enter title [shelxpro]: P1 lysozyme after mol rep
CELL in Angstroms and deg.
[ ]: 26.650 30.800 33.630 89.300 72.600 67.800
Enter Z (number of molecules per cell) [4]: 1
Enter space group in PDB or XPREP notation [P212121]: P1
Enter wavelength in Angstroms [1.54178]: 0.927
SCALE instructions not found in PDB file - standard transformation applied
using current cell
Enter old residue numbers (modified by chain ID, if any) for all N-terminii
(<Enter> if none). To continue on the next line, put "=" at the end of the
line : 1
Enter old residue numbers for all C-terminii in the same way: 129
Enter old residue numbers in the same way at which renumbering of a block of
residues should start. The block continues until the next residue specified
here (<Enter> if none):
New residue number for first solvent water [1001]:
Reset water occupancies to unity (Y or N) ? [Y]:
HKLF code (3 for F, 4 for F-squared) [4]:
The .ins file has been written successfully. The U option in SHELXPRO may
be used for further checking of occupancies etc.
<Enter> to continue:
. . . (main menu; Q to quit) . . .
The file p1lys_mr.ins now contains a SHELXL instruction file.
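When the PDB file carries no SCALE records, SHELXPRO applies a "standard transformation" derived from the cell. Presumably this is the usual PDB orthogonalization convention (x along a, y in the a,b plane, z along c*); that is an assumption, not something the transcript states. A Python sketch of that matrix and its inverse (function names hypothetical):

```python
import math


def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """Fractional -> Cartesian matrix, PDB convention (x || a, y in the a,b plane).
    Cell angles are given in degrees."""
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    volume = a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, volume / (a * b * sg)],
    ]


def frac_to_cart(m, frac):
    """Multiply the 3x3 matrix m by a fractional coordinate vector."""
    return [sum(m[i][j] * frac[j] for j in range(3)) for i in range(3)]


def cart_to_frac(m, xyz):
    """Invert the upper-triangular matrix m by back substitution."""
    z = xyz[2] / m[2][2]
    y = (xyz[1] - m[1][2] * z) / m[1][1]
    x = (xyz[0] - m[0][1] * y - m[0][2] * z) / m[0][0]
    return [x, y, z]
```

For the triclinic cell used here, `frac_to_cart(orthogonalization_matrix(26.650, 30.800, 33.630, 89.300, 72.600, 67.800), [1, 0, 0])` returns the a axis as `[26.65, 0.0, 0.0]`.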
The only thing missing now is an Rfree-flagged list of reflections in a format suitable for SHELXL.
trs/p1lys> shelxpro
SHELXPRO - SHELX interface for protein applications - Version 97-2
Copyright(C) George M. Sheldrick 1996-7
. . . (main menu as shown above) . . .
Enter option: D
Reads DENZO/SCALEPACK .sca file created with or without the "anomalous"
option and writes SHELX .hkl file for input to SHELXS or SHELXL with HKLF 4.
If the .sca file was created with the "anomalous" option, an anomalous
delta-F file may be created for heavy-atom location with SHELXS.
Enter N to abort option, <Enter> to continue:
Name of .sca file created using DENZO/SCALEPACK [shelxpro.sca]: p1lys.sca
Cell: 26.650 30.800 33.630 89.300 72.600 67.800
Space group: P1
Enter name of .hkl output file [shelxpro.hkl]: p1lys.hkl
Copy all data including Friedel opposites (C), merge Friedel opposites if
any (M) or prepare anomalous delta-F file (A) [M]:
35836 Reflections written in HKLF 4 format to file p1lys.hkl
<Enter> to continue:
This time we stay in the program, as the freshly written file p1lys.hkl is not what we really want. We have to do one more round to put Rfree flags:
SHELXPRO - SHELX interface for protein applications - Version 97-2
Copyright(C) George M. Sheldrick 1996-7
. . . (main menu as shown above) . . .
Enter option: V
Reads a file in SHELX HKLF 3 or 4 format and creates a new .hkl file in
which P% of the data are flagged for use in an R(free) test by SHELXL using
CGLS N -1 or L.S. N -1. These reflections may be chosen either at random or
in thin resolution shells. The latter option is recommended when NCS (non-
crystallographic symmetry) or twinning is present. CGLS or L.S. without the
second parameter may be used for the final refinement against all data.
See A.T. Brunger, Nature 355 (1992) 472-475 for a discussion of R(free).
Enter N to abort option, <Enter> to continue:
Input reflection data file [shelxpro.hkl]: p1lys.hkl
Filename for .hkl file to write [shelxprot.hkl]: p1lys_rf.hkl
Percentage of data to be flagged for R(free) [5]:
R(free) reflections random (R) or in thin shells (S) [R]:
35836 Reflections copied, of which 1795 flagged for R(free)
<Enter> to continue:
. . . (main menu; Q to quit) . . .
Finally, we have a file containing Rfree-flagged intensities in SHELXL format: p1lys_rf.hkl. We should make a backup of this file and, to play it safe, make it not writeable:
trs/p1lys> cp p1lys_rf.hkl ../backup
trs/p1lys> chmod -w p1lys_rf.hkl
trs/p1lys> cp p1lys_mr.ins p1lys_0.ins
The following changes have to be done:
trs/p1lys> ln p1lys_rf.hkl p1lys_0.hkl
And we start our first refinement job:
trs/p1lys> shelxl p1lys_0
and the program immediately starts to complain:
trs/p1lys> shelxl p1lys_0
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ SHELXL-97 - CRYSTAL STRUCTURE REFINEMENT - UNIX VERSION +
+ Copyright(C) George M. Sheldrick 1993-7 Release 97-2 +
+ p1lys_0 started at 09:50:21 on 27-Jun-2000 +
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Read instructions and data
** Warning: no match for 2 atoms in DFIX **
** Warning: 118 distances involving residues not restrained **
** Warning: 11 bad CHIV instructions ignored **
It is always important to understand the warnings that SHELXL gives you - normally they point you to something interesting or worrying. In this case, the unrestrained distances are due to symmetry crashes, which put atoms that should have nothing to do with each other at short distances. SHELXL then thinks that these atoms should be bonded, cannot find a restraint for this bond, and complains. The symmetry crashes will simultaneously invoke a lot of anti-bumping restraints, which may substantially confuse the minimizer. So we should fix the symmetry crashes.
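In P1 the only symmetry operations are lattice translations, so every symmetry crash of this kind is a contact between the molecule and a copy of itself shifted by whole unit cells. A sketch of how such short contacts can be found with the metric tensor of the triclinic cell; the function names are hypothetical, and only shifts of -1, 0 or +1 cells are searched, which assumes the coordinates lie close to the unit cell.

```python
import itertools
import math


def metric_tensor(a, b, c, alpha, beta, gamma):
    """Real-space metric tensor G of a unit cell (angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    return [
        [a * a, a * b * cg, a * c * cb],
        [a * b * cg, b * b, b * c * ca],
        [a * c * cb, b * c * ca, c * c],
    ]


def distance(g, d):
    """Length in A of a difference vector d given in fractional coordinates."""
    q = sum(d[i] * g[i][j] * d[j] for i in range(3) for j in range(3))
    return math.sqrt(max(q, 0.0))  # guard against tiny negative rounding errors


def shortest_p1_contact(g, frac1, frac2, search=1):
    """Shortest distance between atom 1 and any lattice translate of atom 2.

    Only shifts of -search..+search cells along each axis are tried.
    Returns (distance, (n1, n2, n3))."""
    best = None
    for shift in itertools.product(range(-search, search + 1), repeat=3):
        d = [frac2[i] + shift[i] - frac1[i] for i in range(3)]
        dist = distance(g, d)
        if best is None or dist < best[0]:
            best = (dist, shift)
    return best
```

Pairs of atoms from residues that should not touch, whose shortest distance comes out at bonding range, are exactly the candidates for the "not restrained" list below.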
To find out what is going on, we have a look at the following table in the lst-file p1lys_0.lst:
Following 1,2- or 1,3-distances involving residues not restrained
CA_2 NE_73$5    C_2 NE_73$5    C_2 NE_73$5    C_2 CD_73$5
O_2 NE_73$5    N_3 NE_73$5    N_3 NH1_73$5    N_3 NE_73$5
N_3 CZ_73$5    CA_3 NE_73$5    C_3 NH1_73$5    C_3 NE_73$5
C_3 CZ_73$5    CB_3 NH1_73$5    CB_3 NE_73$5    CB_3 CZ_73$5
CB_3 NH1_73$5    CD1_3 NH1_73$5    CD2_3 NH1_73$5    CZ_5 O_101$5
CD_7 NH2_73$5    CZ_23 NH2_68$4    CA_68 CZ_125$7    CA_68 NH2_125$7
CA_68 NE_125$7    C_68 NH2_125$7    C_68 NH2_125$7    C_68 CZ_125$7
C_68 CZ_125$7    O_68 NH2_125$7    O_68 NE_125$7    O_68 CZ_125$7
CZ_68 OH_23$2    N_69 NH2_125$7    N_69 CZ_125$7    N_69 NE_125$7
N_69 CZ_125$7    N_69 NH1_125$7    CA_69 NH2_125$7    CA_69 CZ_125$7
CA_69 NH1_125$7    C_69 CZ_125$7    C_69 NH1_125$7    C_69 NH1_125$7
O_69 NH1_125$7    CB_69 CZ_125$7    CB_69 NH1_125$7    N_70 NH1_125$7
N_70 NH1_125$7    CA_70 NH1_125$7    CG_70 NH1_125$7    CD_70 NH1_125$7
CG_73 O_2$1    CD_73 CA_3$1    CD_73 O_2$1    CD_73 N_3$1
CD_73 C_2$1    NE_73 O_2$1    NE_73 CA_3$1    CZ_73 CA_3$1
CZ_73 CA_3$1    CZ_73 CG_3$1    CZ_73 N_3$1    CZ_73 OE1_7$1
CZ_73 O_2$1    CZ_73 C_2$1    NH1_73 CA_3$1    NH2_73 CA_3$1
C_101 NH2_5$1    N_109 NH2_128$8    CA_109 NE_128$8    CA_109 NH2_128$8
CA_109 CZ_128$8    C_109 NH2_128$8    CB_109 NH2_128$8    CB_109 NH2_128$8
CB_109 NE_128$8    CB_109 CZ_128$8    CG1_109 NE_128$8    CG1_109 NH2_128$8
CG1_109 CZ_128$8    CG2_109 NH2_128$8    CG2_109 CZ_128$8    CG2_109 NE_128$8
CD_125 C_68$3    NE_125 CA_69$3    NE_125 N_69$3    NE_125 O_68$3
NE_125 C_68$3    CZ_125 C_68$3    CZ_125 CD_70$3    CZ_125 N_70$3
CZ_125 CA_69$3    CZ_125 C_69$3    CZ_125 N_69$3    CZ_125 C_68$3
CZ_125 O_68$3    NH1_125 CA_69$3    NH1_125 C_68$3    NH1_125 O_68$3
NH1_125 N_69$3    NH2_125 C_68$3    NH2_125 O_68$3    NH2_125 N_69$3
NH2_125 CA_69$3    CD_128 CB_109$6    CD_128 CG2_109$6    NE_128 CG2_109$6
NE_128 CB_109$6    CZ_128 CG2_109$6    CZ_128 CA_109$6    CZ_128 CG2_109$6
CZ_128 CB_109$6    CZ_128 CB_109$6    NH1_128 CG2_109$6    NH1_128 CB_109$6
NH2_128 CB_109$6    NH2_128 CG2_109$6
Most of these are Arg side chains making trouble. The easiest solution is to simply cut all the bad ones after CB, i.e. to delete the respective atoms from the ins file using an editor.
Doing this for residues 5, 68, 73, 125, and 128 gets rid of most of the complaints:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ SHELXL-97 - CRYSTAL STRUCTURE REFINEMENT - UNIX VERSION +
+ Copyright(C) George M. Sheldrick 1993-7 Release 97-2 +
+ p1lys_0a started at 10:07:03 on 27-Jun-2000 +
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Read instructions and data
** Warning: no match for 167 atoms in DFIX RTAB FLAT CHIV **
Data: 3174 unique, 0 suppressed R(int) = 0.0000 R(sigma) = 0.0278
Systematic absence violations: 0 Bad equivalents: 0
wR2 = 0.8144 before cycle 1 for 699 data and 9 / 980 parameters
GooF = S = 99.999; Restrained GooF = 47.287 for 3042 restraints
Mean shift/esd = 1.532 Maximum = -10.240 for OSF at 10:07:26
Max. shift = 0.207 A for CB_70
The remaining 'no match' warning concerns restraints that refer to atoms now missing from the model. At this stage, this warning can safely be ignored, as we do not expect our model to be complete right now anyway. The program happily finishes, producing an Rwork of 46.4 and an Rfree of 47.4 percent. The result of the refinement is stored in the file p1lys_0a.res. Diagnostic output can be found in p1lys_0a.lst.
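The wR2 and GooF values that SHELXL prints are computed on F^2. Below is a sketch of the textbook definitions, with R1 on amplitudes for comparison. In SHELXL the weights w come from its weighting scheme; the example below simply uses unit weights, and the function names are made up for illustration.

```python
import math


def wr2(fo2, fc2, weights):
    """wR2 = sqrt( sum w (Fo^2 - Fc^2)^2 / sum w (Fo^2)^2 )."""
    num = sum(w * (o - c) ** 2 for o, c, w in zip(fo2, fc2, weights))
    den = sum(w * o ** 2 for o, w in zip(fo2, weights))
    return math.sqrt(num / den)


def goof(fo2, fc2, weights, n_params):
    """S = sqrt( sum w (Fo^2 - Fc^2)^2 / (n - p) ) for n data and p parameters."""
    num = sum(w * (o - c) ** 2 for o, c, w in zip(fo2, fc2, weights))
    return math.sqrt(num / (len(fo2) - n_params))


def r1(fo2, fc2):
    """R1 = sum ||Fo| - |Fc|| / sum |Fo|, with amplitudes taken from F^2."""
    fo = [math.sqrt(max(x, 0.0)) for x in fo2]
    fc = [math.sqrt(max(x, 0.0)) for x in fc2]
    return sum(abs(o - c) for o, c in zip(fo, fc)) / sum(fo)
```

A GooF far above 1, like the 99.999 printed above, simply says that the model does not yet explain the data within their errors.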
trs/p1lys> cp p1lys_0a.res p1lys_1.ins
and apply the following changes:
Have a look at the models and the maps now, if you want!
A good criterion for finding problematic places is the Max(SIMU) deviation given in the list of reliability criteria towards the end of the lst file. If this number is larger than 0.15 (i.e. the B factors of two neighbouring atoms differ by more than 0.15 * 8 * pi^2 = 11.8, or roughly 12 A^2), there is a good chance that the respective residues need some rebuilding. Based on this criterion, the following residues were identified and rebuilt: Lys1, Lys13, Arg14, Arg21, Asn44, Arg45, Asn46, Thr47, Arg61, Trp62, Thr69, Pro70, Leu75, Ser85, Lys97, Lys116. Some residues were removed altogether: Asp48, Val99-Asp101, Leu129. The previously deleted side chains of Arg5 and Arg73 were built.
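The conversion behind this criterion is B = 8 pi^2 U. A tiny helper, with two numbers from this tutorial as examples (the function names are made up for illustration):

```python
import math

EIGHT_PI_SQ = 8.0 * math.pi ** 2  # about 78.96


def u_to_b(u):
    """Isotropic displacement parameter U (A^2) -> B-factor (A^2)."""
    return EIGHT_PI_SQ * u


def b_to_u(b):
    """B-factor (A^2) -> isotropic displacement parameter U (A^2)."""
    return b / EIGHT_PI_SQ
```

`u_to_b(0.15)` gives about 11.8 A^2, the threshold quoted above, and `b_to_u(5.3)` turns the Wilson B of the data into a U of about 0.067 A^2.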
As there were drastic changes, the 'I' option of SHELXPRO was used to create a new ins-file:
. . . (main menu, option I) . . .
** The 'I' option is intended for initial input of a structure to SHELXL,
NOT for updating between refinement jobs, for which 'U' should be used. **
Enter N to abort option, <Enter> to continue:
Enter name of .ins file [shelxpro.ins]: p1lys_2.ins
Enter name of PDB file [shelxpro.ent]: p1lys_1_mod.pdb
Enter title [shelxpro]: triclinic lysozyme after first rebuilbing
CELL in Angstroms and deg. [26.650 30.800 33.630 89.30 72.60 67.80]:
Enter Z (number of molecules per cell) [4]: 1
Enter space group in PDB or XPREP notation [P212121]: P1
Enter wavelength in Angstroms [1.54178]: 0.927
Generate atom coordinates using SCALE instructions from PDB file (P) or use
current cell to calculate transformation matrix (C) [C]:
Enter old residue numbers (modified by chain ID, if any) for all N-terminii
(<Enter> if none). To continue on the next line, put "=" at the end of the
line : 1
Enter old residue numbers for all C-terminii in the same way:
Enter old residue numbers in the same way at which renumbering of a block of
residues should start. The block continues until the next residue specified
here (<Enter> if none): 1
New residue number for first solvent water [1001]:
Reset water occupancies to unity (Y or N) ? [Y]:
Current old residue number is 1. Enter new residue number. This defines the
offset to be applied to residue numbers for the rest of this block:
Current old residue number is 1. Enter new residue number. This defines the
offset to be applied to residue numbers for the rest of this block: 1
HKLF code (3 for F, 4 for F-squared) [4]:
The .ins file has been written successfully. The U option in SHELXPRO may
be used for further checking of occupancies etc.
<Enter> to continue:
. . . (main menu; Q to quit) . . .
Two small changes have to be done in the ins file produced by SHELXPRO:
To prepare for solvent divining using SHELXWAT, the .res file from the previous run needs only minor changes:
RESI 1001 HOH O 4 -0.6579 0.9624 0.6135 11.00000 0.1
Then we run shelxwat with the following parameters:
-n10     Number of overall cycles
-s4      Scattering factor number for oxygen
-u0.100  Starting isotropic U for waters
-r0.200  Water rejected or halved if U exceeds this value
-m50     Maximum number of waters to be added in one cycle
-w4.000  Minimum height/sigma for added water
-f       Full occupancies only [use -h for full and half occupancies]
The corresponding command line is:
shelxwat -n10 -s4 -u0.1 -r0.2 -m50 p1lys_3 > p1lys_3.out
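To make the meaning of the -w, -m and -r switches concrete, here is a hypothetical sketch of one cycle of water selection. It is not SHELXWAT's algorithm: a real water-picking program would also need to check distances to protein atoms and to symmetry equivalents, which this sketch leaves out, and it only rejects waters (the -f behaviour) instead of halving their occupancy.

```python
def pick_waters(peaks, waters, min_height_sigma=4.0, max_new=50, u_reject=0.2):
    """One hypothetical water-picking cycle.

    peaks  -- list of (label, peak height / sigma) from a difference map
    waters -- list of (label, U_iso) for waters already in the model
    Returns (waters_kept, labels_of_peaks_accepted_as_new_waters)."""
    kept = [(label, u) for label, u in waters if u <= u_reject]  # like -r
    strong = [p for p in peaks if p[1] >= min_height_sigma]     # like -w
    strong.sort(key=lambda p: p[1], reverse=True)               # highest peaks first
    new = [label for label, _ in strong[:max_new]]              # like -m
    return kept, new
```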
If you want to compare the previous res file and the new ins file you have to run the UNIX diff command on the new bak file (this file is a copy of the initial ins file - the actual ins file is overwritten all the time when shelxwat is running):
diff p1lys_2.res p1lys_3.bak
Using the 'P' option of SHELXPRO, we can make a nice plot monitoring the progress made: p1lys_3.ps
Overall, some 90 plus waters were added lowering Rwork and Rfree to 20.2 and 24.6%, respectively. We keep the waters and continue.
The following residues are possibly disordered. As we are not using all data yet, we simply ignore them: Lys13, Arg21, Leu25, Arg112, Ile124
The following residues were missing and we can build them: Asp48, Val99, Asn103, Gly102.
We save the resulting model from within XtalView as p1lys_3_mod.pdb and this time, use the 'U' option of SHELXPRO to produce the next ins file:
. .
[S] Reflection statistics from .fcf [Z] Least-squares fit
[J] Generate restraints from model [B] PDB deposition
[G] Generate PDB file from .res or .pdb [Q] Quit
Enter option: U
Converts SHELXL .res file to a new .ins file by including new or changed
atoms from PDB format files such as those written by the graphics program
"O". All other SHELXL commands are retained unchanged. This instruction also
provides for setting up disorder refinement and updating the list of solvent
molecules. The .res file should not contain instructions other than RESI,
AFIX, PART and atoms between FVAR AND HKLF, and both FVAR and HKLF must be
present. Note that although it is possible to set up threefold or multiple
disorders in this way, the necessary SUMP restraints must be edited into the
.ins file later by hand. This option may also be used without a .pdb file to
update .res to .ins and apply various checks.
Enter N to abort option, <Enter> to continue:
Enter Name of .res (or .ins) file to read [shelxpro.res]: p1lys_3.res
1.0 0.0 0.0
0.0 1.0 0.0
0.0 0.0 1.0
0.00000 0.00000 0.00000
1027 atoms and 0 peaks read
I don't really know what the 1.0 0.0 0.0 etc. mean, but they do no harm. We simply continue ...
There are now two alternative approaches to updating the atom list. If a
graphics program such as XtalView that understands disorder and anisotropy
has been used to prepare a PDB format file, ALL atoms may be taken from
this file. With other graphics programs such as O it is better to start
with atoms from a .res file and update individual residues interactively.
Replace ALL atoms and peaks with atoms from a PDB file (Y/N)? [Y]: y
Name of PDB file to read [shelxpro.pdb]: p1lys_3_mod.pdb
Renumber residues (other than waters) ? [N]: N
Add, halve or delete waters (Y or N) ? [Y]: Enter
Should occupancies be halved for waters with high U-values (Y or N) ? [N]: Enter
Ueq-threshold for rejecting waters [0.8]: Enter
Renumber residues for waters ? [Y]: yy
Starting residue number for waters [1001]: Enter
84 full and 0 partly occupied waters plus 960 other atoms in list
Transform waters to equivalent nearest to a non-water (Y or N) ? [Y]: Enter
Emulate WHAT-IF bug of ignoring PART numbers greater than 1? This only works
if PART 1 atoms come before PART 2 etc.! [N]:
3.096 O_1016 NH1_Arg73
3.003 #O_1016 NZ_Lys33
Transform water to symmetry equivalent # [Y]:
3.349 O_1018 CE1_Phe3
3.211 #O_1018 CB_Asp48
Transform water to symmetry equivalent # [Y]:
.
.
... more waters to be put into the right place ...
. .
3.459 O_1078 NZ_Lys13
3.424 #O_1078 CE_Lys33
Transform water to symmetry equivalent # [Y]:
3.696 O_1082 ND2_Asn37
3.646 #O_1082 NH1_Arg61
Transform water to symmetry equivalent # [Y]:
Repeat water reorganization (Y or N) ? [N]: N
.ins file to write (may be same as read) [shelxpro.ins]: p1lys_4.ins
SHELXPRO - SHELX interface for protein applications - Version 97-2
Copyright(C) George M. Sheldrick 1996-7
. . . (main menu as shown above) . . .
We start the next job:
/home/trs> shelxl p1lys_4
and get some warnings:
Following 1,2- or 1,3-distances involving residues not restrained
CG_7 O_1045    CD_7 O_1045    OE1_7 O_1045    OE2_7 O_1045
CA_99 O_1064    CB_99 O_1064    CB_99 O_1064    CB_99 O_1064
CG1_99 O_1064    CG1_99 O_1064    CG2_99 O_1064    CG2_99 O_1064
These warnings are caused by waters that SHELXWAT placed into electron density belonging to side chains. As we forgot to delete them when modelling the side chains, they are now causing problems. After deleting HOH1045 and HOH1064, we call the new ins file p1lys_4a.ins and restart the job, again using shelxwat:
shelxwat -n10 -s4 -u0.1 -r0.2 -m50 p1lys_4a
Before including all data, we have a quick look at p1lys_4a.pdb and its maps: Arg21 is clearly in the wrong rotamer, Ser100 and Asp101 are missing. Thr43 is definitely disordered. We ignore all these. As we also do not find any real problems in 'Disagreeable restraints':
Disagreeable restraints before cycle 6
Observed Target Error Sigma Restraint
1.8777 0.5000 FLAT O_3 CA_3 N_4 CA_4
1.6648 0.5000 FLAT O_53 CA_53 N_54 CA_54
2.5451 0.5000 FLAT O_62 CA_62 N_63 CA_63
,
we can safely include all data into the refinement by copying p1lys_4a.res to p1lys_5.ins
and applying some small changes:
To include anisotropic displacement parameters in the refinement, the following changes have to be made to p1lys_6.ins (which is a copy of p1lys_5.res).
Including ADPs in the refinement more than doubled the number of parameters in the model (from 4204 to 9453). This new parametrization caused a significant drop in both Rwork (3.1%) and Rfree (2.5%) and is therefore justified.
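The jump in the parameter count follows from going from four parameters per atom (x, y, z, Uiso) to nine (x, y, z and six Uij). A back-of-the-envelope sketch; it ignores occupancies and other refined quantities, so it will not reproduce SHELXL's 4204 and 9453 exactly, and the function names are made up for illustration.

```python
def n_parameters(n_atoms, anisotropic, n_extra=0):
    """3 positional + 1 (Uiso) or 6 (Uij) parameters per atom, plus any
    extra overall parameters (scale factor, free variables, ...)."""
    per_atom = 3 + (6 if anisotropic else 1)
    return n_atoms * per_atom + n_extra


def data_to_parameter_ratio(n_reflections, n_params):
    """Number of observations per refined parameter."""
    return n_reflections / n_params
```

With roughly 1000 atoms the count goes from about 4000 to about 9000, in line with the jump quoted above. If all 34041 working reflections (35836 minus the 1795 test reflections) entered the refinement, which is an assumption here, `data_to_parameter_ratio(34041, 9453)` would give a ratio of about 3.6.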
Disagreeable restraints before cycle 21
Observed Target Error Sigma Restraint
2.6298 2.4620 0.1678 0.0400 DANG C_103 N_103
2.3186 2.4710 -0.1524 0.0400 DANG CG_21 NE_21
2.3656 2.5040 -0.1384 0.0400 DANG C_103 CB_103
2.3818 2.5040 -0.1222 0.0400 DANG CG1_99 CG2_99
1.9181 0.5000 FLAT O_3 CA_3 N_4 CA_4
2.3906 0.5000 FLAT O_62 CA_62 N_63 CA_63
-0.3100 0.1000 SIMU U33 CD_112 NE_112
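With longer lists, it can help to sort the disagreeable restraints by how many sigmas they are off. A hypothetical helper, assuming the column layout shown above (Observed, Target, Error, Sigma, restraint); lines with only two numbers, like the FLAT and SIMU entries, are skipped by this sketch.

```python
def parse_restraint(line):
    """Return (error / sigma, restraint text) for a line with four numbers,
    or None if the line does not start with four numbers."""
    fields = line.split()
    try:
        _observed, _target, error, sigma = map(float, fields[:4])
    except ValueError:
        return None  # e.g. FLAT or SIMU lines, which list fewer numbers
    return error / sigma, " ".join(fields[4:])


def worst_first(lines):
    """Parse the table lines and sort them by |error / sigma|, largest first."""
    parsed = [r for r in (parse_restraint(l) for l in lines) if r is not None]
    return sorted(parsed, key=lambda r: abs(r[0]), reverse=True)
```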
Now we will look at the places indicated in the list. Please open
Xtalview and display the model p1lys_6.pdb and
the corresponding electron density using the file
p1lys_6.fcf. In my opinion, the best choice of
maps is to look at a SIGMAA-weighted 2mFo-DFc map at 1.0sigma (in blue)
together with a straight 1Fo-1Fc difference map at +/- 2.5 sigma
(in green and red, respectively).
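For reference, the Fourier coefficients behind these two maps can be written down per reflection. A sketch, assuming the figure of merit m and the scale D are already available (estimating them SIGMAA-style is not shown) and that both maps use the calculated phase; in P1 all reflections are acentric, so no centric special case is needed. The function name is hypothetical.

```python
import cmath


def map_coefficients(f_obs, f_calc, m, d):
    """Return the (2mFo - DFc, Fo - Fc) Fourier coefficients of one reflection.

    f_obs  -- observed amplitude
    f_calc -- complex calculated structure factor (supplies amplitude and phase)
    m, d   -- figure of merit and scale, assumed to come from a SIGMAA estimate"""
    phase = cmath.exp(1j * cmath.phase(f_calc))  # unit vector along the model phase
    fc_amp = abs(f_calc)
    two_mfo_dfc = (2.0 * m * f_obs - d * fc_amp) * phase
    fo_fc = (f_obs - fc_amp) * phase
    return two_mfo_dfc, fo_fc
```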
Disagreeable FLAT restraints on residues 3 and 62: There is nothing special to be seen in the electron density. It is actually quite normal to see violated FLAT restraints for omega angles, as these are not as flat as people thought (MacArthur & Thornton, J.Mol.Biol. 264:1180-1195 (1996)).
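Whether a peptide really deviates from planarity can be checked by computing omega, the CA(i)-C(i)-N(i+1)-CA(i+1) torsion angle, from the model coordinates; planar trans peptides sit at +/-180 degrees. A self-contained sketch of the standard torsion-angle formula (IUPAC sign convention); the helper names are made up for this example.

```python
import math


def _sub(p, q):
    return [p[i] - q[i] for i in range(3)]


def _cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def _dot(u, v):
    return sum(u[i] * v[i] for i in range(3))


def torsion(p1, p2, p3, p4):
    """Dihedral angle p1-p2-p3-p4 in degrees, in the range -180..180."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def omega(ca_i, c_i, n_next, ca_next):
    """Peptide torsion omega; close to +/-180 degrees for a planar trans peptide."""
    return torsion(ca_i, c_i, n_next, ca_next)
```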
Disagreeable SIMU restraint for Arg112: There is no clear indication of what to do; we leave this one as it is.
Fourier peaks appended to .res file
x y z sof U Peak Distances to nearest atoms (including symmetry equivalents)
Q1 1 -0.3243 0.8430 1.0186 1.00000 0.05 1.50 2.17 O_1004 3.02 NE1_62 3.11 O_1081 3.27 NH1_73
Q2 1 -0.7717 0.6415 0.3616 1.00000 0.05 1.44 1.42 C_99 2.23 O_99 2.52 CA_99 2.75 O_96
Q3 1 -0.6273 0.6219 0.1878 1.00000 0.05 1.31 2.26 N_102 2.58 O_1073 2.84 CA_102 3.37 OD1_103
Q4 1 -0.5204 0.4281 0.1610 1.00000 0.05 1.22 2.72 O_102 2.95 O_67 3.12 O_125 3.18 NH2_5
Q5 1 -0.2483 0.7984 0.4732 1.00000 0.05 1.19 2.56 OE1_35 2.73 O_1028 2.76 O_1023 3.48 CD_35
Now the other peaks:
Q2-Q4, Q6: close to Val99-Asp102, done before.
Q5: a water molecule, ignore in this phase.
Q8: another nitrate, apply the same procedure as above
and some more waters that we ignore.
Arg68: Acceptable density, put it in.
Arg125: Nice density, put it in.
Residues Arg128 and Leu129 are still a mess.
So far, we have not modelled any disorder. We write the model from XtalView to a PDB file called p1lys_6_mod.pdb. Then we use the 'U' option in SHELXPRO to update p1lys_6.res using p1lys_6_mod.pdb and obtain the file p1lys_7.ins. Some small modifications are necessary before we start the next job:
Furthermore, we have to add some restraints for the nitrate molecules. Note that the SADI restraints only impose that all bond lengths should be the same, but not which length.
FLAT_NO3 N O1 O2 O3
We decrease the occupancy to 0.65 for all atoms between CA_102 and N_104:
RESI 102 GLY
shelxwat -n5 -s4 -u0.15 -r0.3 -m50 p1lys_7a > p1lys_7a.out
The job converges at R-values of 15.8 (free) and 13.5 (work).
2.5139 2.3730 0.1409 0.0400 DANG OG1_43 CG2_43
2.6230 2.4620 0.1610 0.0400 DANG C_103a N_103a
2.5872 2.4660 0.1212 0.0400 DANG CD_68 CZ_68
2.6218 2.4970 0.1248 0.0400 DANG C_99 CB_99
2.3396 2.5040 -0.1644 0.0400 DANG CG1_99 CG2_99
1.9074 0.5000 FLAT O_3 CA_3 N_4 CA_4
2.4837 0.5000 FLAT O_62 CA_62 N_63 CA_63
1.5070 0.5000 FLAT O_101 CA_101 N_102 CA_102a
-0.3195 0.1000 SIMU U33 CB_114 CG_114
Some of the more interesting places to look at are: a wrong Chi1 rotamer for Val99, a potential second conformation for Arg114 (a little bit of density for every atom, very nice).
At places where you have to modify the model, do the following:
For p1lys_8 to p1lys_13, we now look at some interesting situations:
FVAR 0.14409 0.60127 0.47703 0.5 0.5 0.5 0.5 0.5
.
.
RESI 45 ARG
N 3 -0.253501 1.101474 0.452285 11.00000 0.07022 0.07846 =
0.09258 0.04009 -0.03593 -0.04887
CA 1 -0.252002 1.124127 0.413177 11.00000 0.07121 0.07364 =
0.07850 0.03471 -0.03840 -0.05241
C 1 -0.196467 1.096687 0.379008 11.00000 0.05700 0.09054 =
0.09636 0.04182 -0.03807 -0.04355
O 4 -0.151361 1.094247 0.383487 11.00000 0.06484 0.15748 =
0.15236 0.04420 -0.05401 -0.06088
PART 1 41.0
CB 1 -0.261554 1.175634 0.423002 10.50000 0.1
CG 1 -0.303197 1.196700 0.466914 10.50000 0.1
CD 1 -0.320214 1.249627 0.478206 10.50000 0.1
NE 3 -0.383742 1.274377 0.498092 10.50000 0.1
CZ 1 -0.411667 1.288025 0.538303 10.50000 0.1
NH1 3 -0.384939 1.279152 0.566438 10.50000 0.1
NH2 3 -0.467895 1.310270 0.550692 10.50000 0.1
PART 2 -41.0
CB 1 -0.261124 1.177588 0.424570 10.50000 0.1
CG 1 -0.318260 1.204679 0.458320 10.50000 0.1
CD 1 -0.333151 1.257347 0.468953 10.50000 0.1
NE 3 -0.384572 1.277852 0.507407 10.50000 0.1
CZ 1 -0.438420 1.290365 0.509540 10.50000 0.1
NH1 3 -0.452909 1.287381 0.475916 10.50000 0.1
NH2 3 -0.478846 1.307200 0.546176 10.50000 0.1
PART 0
If you look at the model, you will notice a small source of confusion: the originally known conformation has become PART 2 and the new one PART 1. This is the way XtalView splits residues (somewhat confusing, but it works).
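In the fragment above, the values 41.0 and -41.0 on the PART lines tie the occupancies of the two conformers to free variable 4 (0.5 on the FVAR line), so that they always add up to one. This uses SHELXL's 10*m + p coding of parameters. The Python sketch below decodes such a value against an FVAR line; it reflects our reading of that convention, the function name is hypothetical, and the range m = -1 is deliberately left out.

```python
import math


def decode_shelxl_value(x, fvar):
    """Decode a SHELXL parameter coded as 10*m + p with -5 <= p < 5.

    fvar lists the numbers on the FVAR line: fvar[0] is free variable 1
    (the overall scale factor), fvar[1] is free variable 2, and so on.
    """
    m = math.floor((x + 5.0) / 10.0)
    p = x - 10.0 * m
    if m > 1:        # p times free variable m
        return p * fvar[m - 1]
    if m < -1:       # p times (free variable |m| minus one)
        return p * (fvar[-m - 1] - 1.0)
    if m in (0, 1):  # 0: refined freely, starting at p; 1: fixed at p
        return p
    raise ValueError("values between -15 and -5 are not handled in this sketch")


fvar = [0.14409, 0.60127, 0.47703, 0.5, 0.5, 0.5, 0.5, 0.5]
occ_part1 = decode_shelxl_value(41.0, fvar)    # PART 1 41.0
occ_part2 = decode_shelxl_value(-41.0, fvar)   # PART 2 -41.0
print(occ_part1, occ_part2, occ_part1 + occ_part2)  # prints 0.5 0.5 1.0
```

If free variable 4 refines away from 0.5, the PART 1 occupancy follows it and the PART 2 occupancy becomes one minus that value.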
p1lys_9: Nothing special.
p1lys_10, Lys145: Another clear double conformation. Although only one atom is disordered, I prefer to split the entire side chain. This costs some parameters but in most cases gives a more stable refinement. For practical reasons, it is also much easier to do the final analysis if all disordered residues are treated the same way. Look at p1lys_11.ins if you want to see how the disorder is described.
If you want to know what happened in p1lys_12 and p1lys_13, have a look at the ins files and the maps yourself.
Make sure that you do not add the hydrogens you want to see (e.g. the protonation state of histidines). For histidines, I also do not activate the generation of hydrogens on CE1 and CD2: these hydrogens must be there, and the corresponding electron density gives you a nice calibration of what you can expect for the hydrogens on the nitrogen atoms. For histidine we have:
HFIX_HIS 13 CA
HFIX_HIS 23 CB
HFIX_HIS 43 N
REM HFIX_HIS 43 N ND1 CE1 CD2
Also do not activate the generation of hydroxyl hydrogens on Thr, Ser, and Tyr residues (i.e. keep the REM cards in front of e.g. HFIX_TYR 83 OH). Placing these hydrogens is not trivial, and automatic placement will often put them in the wrong place.
The N-terminal NH3 group needs special treatment. Simply put HFIX 33 N_1 before any other HFIX statements.
For incomplete residues (in our case Arg121 and Arg128 are cut after CB), the easiest way to stop SHELXL from complaining is to cheat a bit and treat the truncated side-chain ends as methyl groups, in this case:
HFIX 33 CB_121
HFIX 33 CG_128
Again, not elegant, but it works. Make a note of this trick so that you do not submit wrong hydrogens to the PDB.
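To keep track of how many riding hydrogens a set of HFIX instructions will generate, including the methyl trick and the effect of REM cards, a small Python sketch can tabulate them. The meanings of the m digit below are our summary of the SHELXL HFIX/AFIX codes that appear on this page (1 tertiary CH, 2 secondary CH2, 3 staggered methyl, 4 aromatic or amide H, 8 hydroxyl); the table and function names are hypothetical, and only riding hydrogens (n = 3) are covered.

```python
# m digit of HFIX mn -> (description, number of H atoms generated);
# only the codes that appear in this tutorial are listed.
HFIX_M = {
    1: ("tertiary C-H", 1),
    2: ("secondary CH2", 2),
    3: ("staggered CH3 (also used for NH3+)", 3),
    4: ("aromatic or amide X-H", 1),
    8: ("hydroxyl O-H", 1),
}


def riding_hydrogens(ins_lines):
    """Map each target atom name to the number of H atoms its HFIX line adds."""
    counts = {}
    for line in ins_lines:
        tokens = line.split()
        # REM lines (and anything that is not an HFIX instruction) are skipped
        if not tokens or not tokens[0].upper().startswith("HFIX"):
            continue
        m, n = divmod(int(tokens[1]), 10)
        if n != 3 or m not in HFIX_M:
            raise ValueError(f"HFIX {tokens[1]} is not covered by this sketch")
        for atom in tokens[2:]:
            counts[atom] = counts.get(atom, 0) + HFIX_M[m][1]
    return counts


print(riding_hydrogens([
    "HFIX 33 N_1",
    "HFIX 33 CB_121",
    "HFIX_HIS 43 N",
    "REM HFIX_HIS 43 N ND1 CE1 CD2",
]))  # prints {'N_1': 3, 'CB_121': 3, 'N': 1}
```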
There are some more notes on hydrogens in the SHELXL-FAQ.
The final ins file is p1lys_14.ins. The job runs without any major problems and finishes at R(work,free) = (9.74, 12.81). Note that we did not add a single extra parameter to the refinement to achieve this gain. The job runs about 40 percent longer than an equivalent job without hydrogens, because of the larger number of atoms that have to be included in the structure factor calculation.
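The numbers behind this statement can be read off the summary table at the end of this section, taking job 13 as the closest equivalent without hydrogens (an approximation, since the model also changed slightly between the two jobs). The ratio of the CPU times gives the extra cost, and the parameter count shows that the drop in R comes without additional parameters:

```latex
\begin{aligned}
\frac{t_{\mathrm{CPU}}(14)}{t_{\mathrm{CPU}}(13)} &= \frac{2500}{1800} \approx 1.39 \\
N_{\mathrm{par}} &: 11774 \rightarrow 11765 \\
(R_{\mathrm{work}}, R_{\mathrm{free}}) &: (10.7, 13.8) \rightarrow (9.7, 12.8)\ \%
\end{aligned}
```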
The inclusion of hydrogens has changed the model and we get quite a long list of disagreeable restraints:
Observed Target Error Sigma Restraint
2.7274 2.8000 -0.0726 0.0200 BUMP O_18 CB_19a
1.9863 2.1000 -0.1137 0.0200 BUMP HA_19a HD2B_19a
1.9670 2.1000 -0.1330 0.0200 BUMP HD2A_19a HA_81
1.9552 2.1000 -0.1448 0.0200 BUMP HD2B_19a HD1A_84
2.0308 2.1000 -0.0692 0.0200 BUMP HB2_15 HG1B_92
1.8944 2.1000 -0.2056 0.0200 BUMP HA_93 HD2B_93
2.1057 2.2450 -0.1393 0.0400 DANG OD1_37 ND2_37
2.5222 2.3930 0.1292 0.0400 DANG CB_106b OD1_106b
2.1403 2.4190 -0.2787 0.0400 DANG CB_19a ND2_19a
2.2771 2.4300 -0.1529 0.0400 DANG CA_43a OG1_43a
2.3127 2.4350 -0.1223 0.0400 DANG C_71 CA_72b
2.3252 2.4550 -0.1298 0.0400 DANG CB_19a N_19
2.5889 2.4660 0.1229 0.0400 DANG CD_68b CZ_68b
2.3488 2.4710 -0.1222 0.0400 DANG CG_114b NE_114b
2.6482 2.5040 0.1442 0.0400 DANG C_19 CB_19a
0.0725 0.0200 SAME/SADI O1_504 O3_504a O1_505 O2_505
2.0640 0.5000 FLAT O_3 CA_3 N_4 CA_4
2.4586 0.5000 FLAT O_62 CA_62 N_63 CA_63
1.5779 0.5000 FLAT O_71 CA_71 N_72b CA_72b
1.5126 0.5000 FLAT O_101 CA_101 N_102b CA_102b
-0.3327 0.1000 ISOR U22 O_1149
0.3206 0.1000 ISOR U33 O_1149
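With a list this long, it helps to sort the entries by how many standard deviations they are off target before deciding where to look first. A minimal Python sketch, assuming the listing has been pasted into a list of strings: for entries with four numbers (Observed, Target, Error, Sigma) it uses |Error|/Sigma; for entries with only two numbers (FLAT, SIMU, ISOR, SAME/SADI) it assumes the first number is the deviation, which is our reading of the listing rather than a documented format. The function name is hypothetical.

```python
def rank_restraints(lines, top=5):
    """Return the `top` entries of a restraint listing, worst first."""
    ranked = []
    for line in lines:
        tokens = line.split()
        numbers = []
        for tok in tokens:  # collect the leading numeric columns
            try:
                numbers.append(float(tok))
            except ValueError:
                break
        if len(numbers) == 4:    # Observed Target Error Sigma
            deviation, sigma = numbers[2], numbers[3]
        elif len(numbers) == 2:  # assumed layout: deviation Sigma
            deviation, sigma = numbers[0], numbers[1]
        else:                    # header line or unknown layout
            continue
        label = " ".join(tokens[len(numbers):])
        ranked.append((abs(deviation) / sigma, label))
    ranked.sort(reverse=True)
    return ranked[:top]


listing = """\
Observed Target Error Sigma Restraint
1.8944 2.1000 -0.2056 0.0200 BUMP HA_93 HD2B_93
2.1403 2.4190 -0.2787 0.0400 DANG CB_19a ND2_19a
2.4586 0.5000 FLAT O_62 CA_62 N_63 CA_63
-0.3327 0.1000 ISOR U22 O_1149
""".splitlines()

for ratio, label in rank_restraints(listing):
    print(f"{ratio:6.2f}  {label}")
```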
There are still things to do:
Sorry, but I ran out of steam here ...
The final job should include all data (i.e. both work and free set). This can be done by removing the -1 from the CGLS statement. In this case, the job blows up (see p1lys_15.out and p1lys_15.lst) because of the badly modelled double conformation of Arg68. Normally we would go back to the previous step and build a better model. Because I am lazy and this is only a tutorial, I simply deleted the respective atoms from the .ins file and made a new one: p1lys_15a.ins. This refinement runs through happily (p1lys_15.out) and finishes with an R-value of 9.84 percent for all reflections with F > 4 sigma(F).
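Switching on the test set is a one-token change to the .ins file. A minimal Python sketch, assuming the CGLS line has exactly the form "CGLS n -1" as in the jobs above; lines with further parameters are left alone rather than guessed at. The function name is hypothetical, and writing the result to a new file such as p1lys_15.ins is left to you.

```python
def include_free_set(ins_lines):
    """Drop the trailing -1 from a 'CGLS n -1' line; other lines are unchanged."""
    out = []
    for line in ins_lines:
        tokens = line.split()
        if (len(tokens) == 3 and tokens[0].upper() == "CGLS"
                and tokens[2] == "-1"):
            line = " ".join(tokens[:2])
        out.append(line)
    return out


print(include_free_set(["CGLS 10 -1", "FVAR 0.14409 0.60127"]))
# prints ['CGLS 10', 'FVAR 0.14409 0.60127']
```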
step                        data used   #par   #obs    Rw    Rf   Rd   CPU job#
--------------------------------------------------------------------------------
molecular replacement       15.0-4.0       3    775  44.5     -    -    30
rigid body                  10.0-2.5       9   2997  46.4  47.4  1.0   140
first round                 10.0-1.5    3887  13818  30.2  34.3  4.1   500
after first rebuild 10cgls  "           3735      "  26.2  31.4  5.2   360
SHELXWAT                    "           4091      "  20.2  24.6  4.4  2000
bld + SHELXWAT              "           4231      "  18.7  22.8  4.1  2000
include all data 20cgls     10.0-1.1    4203  33993  19.1  21.7  2.6  1450    5
ANIS 20cgls                 "           9453      "  16.0  19.2  3.2  2850    6
Rebuild 10 SHELXWAT         "           9557      "  13.4  15.8  1.8  1480   7a
rebuild 10 cgls             "          10481      "  12.3  15.3  3.0  1610    8
rebuild 10 cgls             "          10819      "  12.1  15.1  3.0  1650    9
rebuild 10 cgls             "          10838      "  11.6  14.6  3.0  1800   10
rebuild 10 cgls             "          11494      "  11.3  14.4  3.1  1800   11
rebuild 10 cgls             "          11576      "  11.0  14.0  3.0  1800   12
rebuild 10 cgls             "          11774      "  10.7  13.8  3.1  1800   13
put hydrogens               10.0-1.1   11765      "   9.7  12.8  3.1  2500   14
include test set            10.0-1.1   11693  35786   9.8     -    -  2600  15a