FreeR Information (CCP4: General)

NAME

freerunique - Convert FreeRflags Between CCP4 and Other Formats (XPLOR/CNS/TNT/SHELX)

Contents

Creating a full unique set of reflections with the correct FreeRflags

For successful cross validation:

  1. It is important to select the same FreeR reflections for all related data sets (e.g. mutants, higher resolution data collected half-way through refinement, etc.).
  2. It is important to preserve the same FreeR set as you move from program to program.
  3. The FreeR set should itself be unbiased by prior refinement.
  4. The FreeR set should be representative of the full data set with respect to the distribution of structure factor amplitudes and the distribution of reflection resolution.
  5. Different programs have different philosophies for dealing with FreeR reflections:

    CCP4 first expands the data set to include all possible HKLs to the resolution given, marking those which are unmeasured. It then divides the data set into n partitions randomly, assigning a FreeRflag with values (0 1 2 ... (n-1)) to each set. These cross validation sets are used during density modification, and for refinement. The default FreeR set used within refinement is flagged as 0, but this can be changed by setting a KEYWORD FREE x.
    XPLOR assigns the flag TEST=x. The only acceptable values are:
    x=1 for the free set
    x=0 for the working set
    CNS assigns the flag TEST=x. The acceptable values range from x=0,1,...,n-1. The defaults are:
    x=1 for the free set
    x=0,2,...,n-1 for the working set
    SHELX has a flag, following the format (3I4,2F8.2,I2). The values are:
    -1 for the free set
    1 for the working set
    TNT separates the data into different files: one for the free set, and one for the working set. Old versions of SHELX also separated the data into different files.
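
    The conventions above can be summarised as a small lookup. The following is only an illustrative Python sketch: the function name from_ccp4 is hypothetical, and the CNS mapping (swapping the CCP4 free value with 1 and leaving the other partitions unchanged) is an assumption made for illustration, not a description of what any CCP4 program does internally.

```python
def from_ccp4(flag, free_value=0):
    """Translate one CCP4 FreeR_flag into the other conventions listed above.

    free_value is the CCP4 partition used as the free set (default 0).
    """
    is_free = flag == free_value
    # Assumed CNS mapping: the free partition becomes 1, partition 1 takes
    # the old free value, every other partition keeps its number.
    if is_free:
        cns = 1
    elif flag == 1:
        cns = free_value
    else:
        cns = flag
    return {
        "XPLOR": 1 if is_free else 0,          # TEST=1 free, TEST=0 working
        "CNS": cns,                            # TEST=1 free, others working
        "SHELX": -1 if is_free else 1,         # -1 free, 1 working
        "TNT": "free" if is_free else "work",  # which of the two files
    }


if __name__ == "__main__":
    for flag in range(4):
        print(flag, from_ccp4(flag))
```

    With the default free_value of 0, only CCP4 flag 0 ends up in the free set of every target format; all other partitions are treated as working data.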

    Choosing a FreeR fraction

    It is important to choose a fraction that is large enough so that the statistics are sensible (at least 500 reflections seems to be the consensus at the moment), but small enough so that as many reflections as possible are still used for the refinement. This is of course always true, whichever philosophy is chosen for the selection of the FreeR reflections!
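
    The arithmetic behind this trade-off is simple enough to sketch. The Python below is illustrative only: the helper names are hypothetical, the 500-reflection threshold is the rule of thumb quoted above, and 0.05 is the uniqueify default mentioned later in this document.

```python
MIN_FREE = 500  # rule-of-thumb minimum number of free reflections


def free_count(n_reflections, fraction=0.05):
    """Approximate number of reflections flagged free for a given fraction."""
    return int(n_reflections * fraction)


def min_fraction(n_reflections, min_free=MIN_FREE):
    """Smallest fraction that still gives min_free free reflections."""
    return min(1.0, min_free / n_reflections)


if __name__ == "__main__":
    for n in (3000, 10208, 50000):
        print(n, free_count(n), round(min_fraction(n), 4))
```

    For a small data set the default 5% may fall short of 500 reflections, in which case a larger fraction is worth considering; for a large data set a smaller fraction leaves more reflections for refinement.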

    How to Convert Files?

    Starting from CCP4
    Convert to other formats from CCP4
    Examples
    MTZ to CNS/XPLOR
    MTZ to SHELX Intensities
    MTZ to TNT - working set
    MTZ to TNT - free set
    Starting from other formats
    Examples
    Starting from CNS/XPLOR
    Starting from SHELX Intensities
    Starting from TNT or old SHELX
    Starting from SHELX I and FC

    Starting from CCP4

    When you are ready to start the first refinement, or preferably as soon as you collect the native data:

    If this is a new data set

    Run uniqueify mydata.mtz.

    This script generates an output file mydata-unique.mtz which contains
    (H K L F SIGF ( I SIGI ) .. FreeR_flag) for all observed reflections to the resolution limit available, plus entries for any unobserved reflection, all with FreeR_flags assigned.

    The percentage flagged defaults to 5%, but this can be reset using
    uniqueify {-p fraction} mydata.mtz.

    The default label is FreeR_flag but this can be reset using
    uniqueify {-f FreeLABel} mydata.mtz.

    If this is an isomorphous data set which should preserve the same FreeR_flags

    A complete set of FreeR_flags (similar to that produced for a new data set, see above) can be added to any other related data set using CAD:

    cad hklin1 new.mtz hklin2 olddata-unique.mtz hklout new-unique.mtz <<eof
    LABI FILE 1 ALLIN
    LABI FILE 2 E1=FreeR_flag
    END
    eof
    

    If the new data is to higher resolution, you will now need to run uniqueify again to pad out the FreeR_flags:
    uniqueify {-f FreeLABel} new-unique.mtz new-uniquer.mtz
    (the default label for the free set is FreeR_flag, but you can use whatever you like).

    The script estimates the percentage of data previously used as the test set, then assigns FreeR_flags at the same rate to any reflections in the higher resolution shell where the previous set of FreeR_flags is missing. Existing flags are left unchanged.
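
    The idea of padding out an existing set of flags can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the uniqueify script itself: the function names are hypothetical, reflections are represented as (h, k, l) tuples in a dictionary, and new flags are drawn uniformly at random.

```python
import random


def estimate_partitions(old_flags, free_value=0):
    """Estimate the number of partitions from the fraction flagged free."""
    n_free = sum(1 for f in old_flags.values() if f == free_value)
    if n_free == 0:
        raise ValueError("no free reflections found in the old flags")
    return max(2, round(len(old_flags) / n_free))


def extend_flags(old_flags, all_hkls, free_value=0, seed=0):
    """Keep every existing flag; assign random flags to new reflections."""
    n_partitions = estimate_partitions(old_flags, free_value)
    rng = random.Random(seed)
    new_flags = {}
    for hkl in all_hkls:
        if hkl in old_flags:
            new_flags[hkl] = old_flags[hkl]      # preserve the old free set
        else:
            new_flags[hkl] = rng.randrange(n_partitions)
    return new_flags


if __name__ == "__main__":
    old = {(h, 0, 0): h % 20 for h in range(1, 201)}
    hkls = list(old) + [(h, 1, 0) for h in range(1, 51)]
    new = extend_flags(old, hkls)
    print(len(new), estimate_partitions(old))
```

    The essential property is that no reflection that already carried a flag changes partition, so the free set stays unbiased across the old and new data.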

    Convert to Other Formats from CCP4

    You can use the jiffy MTZ2VARIOUS to convert from MTZ to XPLOR/CNS, TNT or SHELX formats quite simply. These formats all have different conventions, but MTZ2VARIOUS attempts to reproduce them (see program documentation: MTZ2VARIOUS).

    XPLOR output will have TEST=0 for working set; TEST=1 for free set
    CNS output will have TEST=1 for free set; TEST=0,2,...,(n-1) for working set
    SHELX output will have 1 as the flag for the working set, and -1 for free set
    TNT output may be split into two files

    Examples

    MTZ to CNS/XPLOR

    #  test set flagged with TEST=1, working set with TEST=0
    #
    mtz2various     \
    hklin pc553_19f-unique.mtz \
    HKLOUT xplor.hkl \
    <<eof
    #  All these labels can be set and will be handled appropriately:
    #
    LABIN  FP=F SIGFP=SIGF [FPART PHIPART  PA PB PC PD  PHIB WEIGHT ] FREE=FreeR_flag
    OUTPUT CNS/XPLOR
    #
    END
    eof
    exit
    

    MTZ to SHELX Intensities

    mtz2various     \
     hklin lmw.mtz \
    HKLOUT shelxout.hkl \
    <<eof
    OUTPUT SHELX
    LABIN  FP=FRBP SIGFP=SIGFRBP [IP SIGIP FP(+) FP(-) IP(+) IP(-) ] FREE=FreeR_flag
    #  This will always output Is; and will rescale the data to fit the format.
    #  You can override the default by setting SCAL yourself.
    SCALE 0.01
    #
    END
    eof
    

    MTZ to TNT - working set

    # TNT uses a different asymmetric unit of reciprocal space to CCP4. Dale has
    # programs to convert the data if necessary.
    # The data is separated into a free set and a working set.
    #
    mtz2various     \
     hklin lisa.wright/lmw.mtz \
    HKLOUT lisa.wright/tnt_work.hkl \
    <<eof
    LABIN  FP=FP SIGFP=SIGFP FREE=FreeR_flag
    OUTPUT TNT
    EXCLUDE FREER  0
    #
    END
    eof
    #
    

    MTZ to TNT - free set

    mtz2various     \
     hklin lisa.wright/lmw.mtz \
    HKLOUT lisa.wright/tnt_free.hkl \
    <<eof
    LABIN  FP=FP SIGFP=SIGFP FREE=FreeR_flag
    OUTPUT TNT
    INCLUDE FREER  0
    #
    END
    eof
    exit
    

    Convert to CCP4 from Other Formats

    These are all ASCII formats, so F2MTZ can be used in a straightforward way. After all these conversions you need to uniqueify the MTZ file.

    Run uniqueify {-f FreeLABel} mydata.mtz
    This will

    - fill out the missing data slots
    - sort out the variety of FreeR_flags
    - re-sort the data into CCP4 standard order

    The script guesses what style of file is being imported, by looking at the distribution of FreeR_flags:

    XPLOR or TNT
    a few 1s, many 0s
    CNS
    either (0,1,..,(n-1)) or a few 1s, many 0s
    SHELX
    a few -1s, many 1s

    It estimates the percentage of reflections flagged as the FreeR set, and then pads out the missing reflections and converts the flags to the CCP4 style of (0, 1,...,(n-1)).
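
    A guess of this kind can be sketched in a few lines of Python. This is a hypothetical illustration of the heuristic described above and not the logic of the uniqueify script: the function name, the returned labels and the choice of 1 as the free value for multi-partition CNS flags (the CNS default) are assumptions.

```python
from collections import Counter


def guess_style(flags):
    """Guess the flag convention and the free fraction from flag counts."""
    counts = Counter(flags)
    values = set(counts)
    total = len(flags)
    if values == {-1, 1} and counts[-1] < counts[1]:
        # a few -1s, many 1s
        return "SHELX", counts[-1] / total
    if values == {0, 1} and counts[1] < counts[0]:
        # a few 1s, many 0s
        return "XPLOR, TNT or two-valued CNS", counts[1] / total
    if len(values) > 2 and values == set(range(len(values))):
        # partitions 0,1,...,(n-1); CNS uses 1 as the free set by default
        return "CNS partitions", counts[1] / total
    return "unknown", None


if __name__ == "__main__":
    print(guess_style([1] * 95 + [-1] * 5))
    print(guess_style([0] * 90 + [1] * 10))
    print(guess_style(list(range(10)) * 3))
```

    Any distribution that fits none of the listed patterns is reported as unknown rather than guessed at.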

    SHELX "input"
    Use F2MTZ and TRUNCATE to convert (H K L I SIGI FreeR_flag) to an MTZ file. See example.

    SHELX "output"
    Use F2MTZ (and TRUNCATE) to convert (H K L I SIGI FC PHIC FreeR_flag) to an MTZ file. See example.

    TNT
    The easiest way is to insert a final column of 1 into the working and 0 into the free set, 'cat' the two files together and use F2MTZ. See example.

    CNS/XPLOR
    See example.

    Examples

    Starting from CNS/XPLOR (complicated CNS/XPLOR to MTZ)

    #
    # NREFlection=     10208
    # ANOMalous=FALSe { equiv. to HERMitian=TRUE}
    # DECLare NAME=FOBS         DOMAin=RECIprocal   TYPE=COMP END
    # DECLare NAME=SIGMA        DOMAin=RECIprocal   TYPE=REAL END
    # DECLare NAME=FPART        DOMAin=RECIprocal   TYPE=COMP END
    # DECLare NAME=WEIGHT       DOMAin=RECIprocal   TYPE=REAL END
    # DECLare NAME=TEST         DOMAin=RECIprocal   TYPE=INTE END
    # INDE     6    0    0 FOBS=  1259.884     0.000 SIGMA=    38.561
    #                   FPART=     0.000     0.000 WEIGHT=     1.000 TEST=         0
    # INDE     8    0    0 FOBS=   827.600     0.000 SIGMA=    30.983
    #                   FPART=     0.000     0.000 WEIGHT=     1.000 TEST=         0
    #!/bin/csh -f 
    #
    f2mtz \
    hklin suying/b-over.hkl \
    hklout  suying/b-over.mtz \
    <<eof
    # skip the NREF and DECLARE lines
    SKIP 7
    #  For XPLOR you would probably need: SKIP 0
    CELL     55.19   79.73   66.68   90.00   90.00   90.00
    SYMM C2221
    #
    # f2mtz assumes a free format without any character data
    #  So you must either remove these from the file, or design
    # a format statement to skip the labels.
    #
    # You have to get this format right! nX ignores n characters.
    # Count characters
    FORMAT '(6x,3F5.0,6X,2f10.0,7X,f10.0,/,25X,2f10.0,8X,F10.0,6x,F10.0)'
    #
    #1234561234512345123451234561234567890123456789012345671234567890
    # INDE     6    0    0 FOBS=  1259.884     0.000 SIGMA=    38.561
    #1234567890123456789012345123456789012345678901234567812345678901234561234567890
    #                   FPART=     0.000     0.000 WEIGHT=     1.000 TEST=         0
    #
    #
    LABO H K L FRBP PHIB SIGFRBP FPART PHIPART WEIGHT FreeR_flag
    #
    CTYPO H H H F P Q F P W I
    END
    eof
    #
    uniqueify suying/b-over.mtz
    exit
    

    Starting from SHELX Intensities

    f2mtz \
    hklin pc553_19.hkl \
    hklout  pc553_19i.mtz \
    <<eof
    CELL    37.144   39.422   44.021  90.00  90.00  90.00
    SYMM P212121
    LABO H K L I SIGI [ FreeR_flag ]
    CTYPO H H H J Q   [    I       ]
    END
    eof
    #
    #      To reduce Is to Fs - use truncate
    #
    truncate \
    hklin pc553_19i.mtz \
    hklout pc553_19f.mtz \
    <<eof
    LABI IMEAN=I SIGIMEAN=SIGI
    END
    eof
    #
    #  If you read a FreeR_flag, you will now have to rescue it -
    #  TRUNCATE ignores it.
    #
    cad hklin1 pc553_19f.mtz \
        hklin2 pc553_19i.mtz \
        hklout pc553_19f-free.mtz \
    <<eof
    LABI FILE 1 ALLIN
    LABI FILE 2 E1=FreeR_flag
    END
    eof
    #
    # Modify FreeR_flags
    uniqueify pc553_19f-free.mtz
    #
    

    Starting from TNT or old SHELX (FreeR assigned to 10%)

    #   First edit the TNT files to assign flag 1 to the working set and 0 to
    #   the free set, then concatenate the two flagged files:
    #
    #    ( sed 's/$/   1/' $SCRATCH/tnt-work.hkl ; \
    #      sed 's/$/   0/' $SCRATCH/tnt-free.hkl ) > $SCRATCH/tnt-all.hkl
    #
    #  Example piece:
    HKL  -22   0   4  2010.9   134.7  1000.0  0.0000   1
    HKL  -22   0   5  4005.2    83.1  1000.0  0.0000   1
    HKL  -22   0   6  3661.5    91.1  1000.0  0.0000   1
    HKL  -22   0   7  2321.9    59.7  1000.0  0.0000   1
    ....
    HKL  -21   1   9   488.4   143.9  1000.0  0.0000   0
    HKL  -20   0   6   329.5   202.9  1000.0  0.0000   0
    HKL  -20   0  11  1009.2   146.7  1000.0  0.0000   0
    HKL  -20   4  10  1989.1    46.5  1000.0  0.0000   0
    ....
    #
    f2mtz \
    hklin tnt_all.hkl \
    hklout  tnt_all.mtz \
    <<eof
    CELL    37.144   39.422   44.021  90.00  90.00  90.00
    SYMM P212121
    LABO  H K L F SIGF  FreeRflag
    CTYPO H H H F Q    I
    #
    #  See above comments about formats. You need to skip the HKL label.
    #
    FORMAT '(4x,3F4.0,2F8.0,16X,F4.0)'
    #
    #  or, if PHI and FOM are given, use these three lines instead:
    #
    #  LABO  H K L F SIGF PHIB FOM  FreeRflag
    #  CTYPO H H H F Q    P    W    I
    #  FORMAT '(4x,3F4.0,4F8.0,F4.0)'
    END
    eof
    #
    #    uniqueify will now complete the hkl list and add FreeRflags
    #
    uniqueify -f FreeRflag  tnt_all.mtz
    #
    

    Starting from SHELX I and FC

    f2mtz HKLIN ./1bxo*-sf.hkl \
    hklout  $CCP4_SCR/junk.mtz \
    <<eof
    TITLE X-PLOR to MTZ
    CELL  96.980   46.650   65.710  90.00 115.57  90.00
    LABOUT H   K  L   I   SIGI   FC PHIC 
    CTYPE  H   H  H   J     Q    F P
    SKIP 2
    SYMM C2
    eof
    if($status) exit
    truncate \
    hklin   $CCP4_SCR/junk.mtz \
    hklout  $CCP4_SCR/junk1.mtz \
    <<eof
    LABI IMEAN=I SIGIMEAN=SIGI
    TRUNCATE YES
    END
    eof
    #
    if($status) exit
    cad \
    hklin1  $CCP4_SCR/junk1.mtz \
    hklin2  $CCP4_SCR/junk.mtz \
    hklout ./1bxo-sf.mtz \
    <<eof
    LABI FILE 1 ALLIN
    LABI FILE 2 E1=FC E2=PHIC 
    END
    eof
    

    AUTHORS

    Eleanor Dodson, University of York, England
    Maria Turkenburg, University of York, England