Assignment : Protein Sequence Assignment

Find and Link Sequentially Related Protein Backbone Assignments

This popup window is designed to expedite the process of sequentially assigning a polypeptide backbone. The tool is semi-automated, in that possibilities for sequentially related peaks and spin systems are automatically found and presented, but it is still up to the user to determine which are really sequentially linked. Likewise, the tool will show where a sequential stretch of assignments best fits with a molecular sequence, but its is the users’ decision to commit the final choice. A fully automated tool is available using the Automated Seq. Assignment option, although this is best suited to relatively complete and clean data sets.

This system works with pairs of spectra where one spectrum has peaks that are through-carbonyl specific; to show the previous (i-1) residue relative to the amide (like with HNcoCA/CB), the other spectrum is non-specific in that it shows both peaks for the amides current residue (i) and the previous residue in the chain (i-1) (link with HNCA/CB). The general principle is that the combination of the carbonyl-specific and non-specific experiment enables identification of the resonances for the current residue and the residue one previous in the sequence. The sequential assignment is made by finding amide locations where the “current residue” resonances of one (hopefully unambiguously) match the “previous residue” resonances of another. Fitting of chains of amides linked in this way to the molecular sequence, by considering chemical shift values, then gives the final residue assignment.

For any of the spectra to be used by this system they must be peak picked and corresponding positions must be linked to the relevant spin system and resonance assignments, albeit in an anonymous way. This setup is readily achieved by the Pick & Assign From Roots option. The peaks of different spectra that relate to the same amide are only known to be from the same amide because of assignment. Several different kinds of spectra may be used, including those that detect 13C resonances (like HNCO, HNcaCO, HNcoCA, HNCA etc) and those that detect 1H resonances (like HNcocaHA, HNHA, etc). This tool can match multiple types of spectra at the same time to improve the assignment process, for example matching HA, CO, CA and CB positions to minimise ambiguity. However, the spectra that are involved with matching 1H positions are naturally displayed in a different spectrum window to those using 13C positions.

User Operation

To perform the sequence assignment the user first selects spectra, and spectrum windows to display them in, categorising them as “query”; selecting in the upper panel, or “match”; selecting in the lower panel (toggling the “Use?” column). The “query” spectra are those that remain at a set position according to a selected amide. The “match” spectra are those that the query peaks are compared against and represent the, potentially multiple, positions of candidates for amides that might be sequence neighbours to the query. Query spectra can be of through-carbonyl types (e.g, HNcoCA), in which case matches represent the preceding residue in the sequence (an i to i-1 step), or the query spectra can be non-specific, where the matches will be to the following residue in the sequence (i to i+1). When making sequential links the tool automatically knows which sequence direction to make links with.

The “Options” tab may be viewed to control how the matching of peaks works, but the default values will be acceptable in most instances. Most options control how the spectrum window strip regions are made and displayed. Occasionally it may be useful to consider the “Filter By Inter/Intra Type” options. By default matches between “previous residue” peaks and other “previous residue” peaks from different amides are discarded. However, if a “current residue” peak overlaps with a “previous residue” peak for the same amide (e.g. very similar 13C positions) then the default rule may miss a legitimate match. Toggling the “Filter By Inter/Intra Type” option off will allow positions with such overlap to be successfully matched, although this may increase the number of spurious matches and add ambiguity.

The actual sequential assignment is performed in the “Spin System Table” tab. The general idea is that the user selects a spin system row, whereupon the query window navigates to the amide position of that row and the match window displays all of the other amide positions, in as many strips as required, that have the relevant matches to at least some of the query peaks. The identity of the potential sequential matches is also listed in the middle “Matched Peak Positions” table. In this table the user may discard certain amide positions/strips from consideration, but the objective is to select one match position that represents a real sequential neighbour. If such a choice can be made [Set Seq Link] sets the appropriate connection between the query and matched amide spin systems. Not that this is merely a statement of sequence relationship, no residue assignment will be made unless one of the amides is assigned, whereupon the assignment will spread along the chain in the appropriate direction.

Once the user has connected sufficient spin systems into regions of sequential connectivity (visible in the “i+1” and “i-1” columns) the next task is to associate the linked but unassigned amides with particular residues. Full residue assignment is made by selecting a region of sequence in the lower left “Sequence Locations” table and clicking [Assign Selected]. This table and the “Residue Types” table to the right are designed to show you how the amides may fit into the residue sequence. The right hand table shows the probable residue types for the query amide, based upon the chemical shifts in the spin system (use [Predict Type] to see details of the prediction for the current spin system), and any sequentially linked neighbours. The left hand table shows where in the sequence the stretch of sequentially linked amides best fits, according to residue type predictions. Those residues that have already been assigned are coloured blue.

Caveats & Tips

When working with multiple polypeptide chains, make sure the correct one is selected in the “options” tab to give the right sequence.

The peak-peak match tolerance is set on a per-spectrum basis (for the relevant dimension) in the “Tolerance” columns of the “Windows and Spectra” tab. Note that this is actually the same attribute as the regular peak assignment tolerance.

It should be noted that mixing of carbonyl-specific and non-specific in the query or match spectra is not permitted; otherwise it is not clear in which direction the sequential linking is made.

The ability to use spectra that match in a 15N dimension, e.g. HNcaN, HNcoCAN will be added in due course.

Main Panel

button Clone: Clone popup window

button Help: Show popup help document

button Close: Close popup

Windows & Spectra

Selects which spectra are used as the source for sequential spin system queries and which are used to find match possibilities

Query Windows and Spectra

The spectra that are the fixed source of peak positions for a spin system, and the windows to display them in

pulldown 13C Window: Selects which spectrum window is used to display peaks that provide the source 13C positions; to match against potential sequence neighbours

pulldown 1H Window: Selects which spectrum window is used to display peaks that provide the source 1H positions; to match against potential sequence neighbours

pulldown Root Window: Selects which spectrum window is used to follow the root (amide) position of the query spin system

Table 1
Spectrum The “experiment:spectrum” name of the spectrum for a potential query peak list; used as the source of peak positions for a spin system (Editable)
Peak List The serial number of a potential query peak list within its spectrum; used as the source of peak positions for a spin system
Use? Sets whether the peak list is currently used as a source of spin systems for peak-peak matching to find sequence neighbours (Editable)
Tolerance Sets maximum ppm difference for peaks to be considered as potentially coming from the same resonance; in the spectrum dimension being matched (Editable)
Expt Type The full CCPN experiment type for the peak list; should correspond to a partner experiment type in the match spectra

Match Windows and Spectra

The spectra that matches to the query peaks are found within, and the windows to display them in

pulldown 13C Window: Selects which spectrum window is used to display potential (sequence neighbour) spin system matches to 13C query locations

pulldown 1H Window: Selects which spectrum window is used to display potential (sequence neighbour) spin system matches to 1H query locations

Table 2
Spectrum The “experiment:spectrum” name of the spectrum that may be searched for potential matches to the peak positions from the query spin systems (Editable)
Peak List The serial number of potential match peak list within its spectrum; may be searched for matches to query spin system peak positions
Use? Sets whether the peak list is currently used as a target for peak-peak matching to find sequence neighbours (Editable)
Tolerance Sets maximum ppm difference for peaks to be considered as potentially coming from the same resonance; in the spectrum dimension being matched. The value also weights peak closeness scores (Editable)
Expt Type The full CCPN experiment type for the peak list; should correspond to a partner experiment type in the query spectra

Spin System Table

Lists the spin systems from the query spectra, how each matches to potential sequence neighbours and fits in a protein sequence

Table 3
# The serial number of the query spin system; used as a source of peak positions to match against
Name The assignment annotation name for the query spin system
i-1 The spin system that is currently linked as being one position later in the sequence, relative to the spin system for the row
i+1 The spin system that is currently linked as being one position earlier in the sequence, relative to the spin system for the row
i+0 The identity of any extra spin system that is considered as being at the same sequential position; could be an NH2 side chain group etc.
Seq, Segment For sequentially connected runs of spin systems, the number of the run and the position within the run at which the spin system is found
Position The “root” location of the spin system (e.g. amide); the matching to find sequential matches used non-root peak positions

button Next Row: Show peak matches locations for the next spin system in the table, assuming a spin system is currently selected

button Previous Row: Show peak matches locations for the previous spin system in the table, assuming a spin system is currently selected

button Goto i-1: Show peak matches locations for the spin system connected as one earlier in the sequence, relative to the currently selected one

button Goto i+1: Show peak matches locations for the spin system connected as one later in the sequence, relative to the currently selected one

button Clear i-1 Links: For the selected spin system, clear any connections to spin systems set as being one earlier in the sequence

button Clear i+1 Links: For the selected spin system, clear any connections to spin systems set as being one later in the sequence

button Clear All Links: For the selected spin system, clear all connections to any sequentially linked spin systems

button Add HA/CO/CA/CB: Considering how peaks in different spectra overlap, add non-root (HA/CO/CA/CB) resonance assignments to peaks in the selected spin system

button Show Peaks: Show a table of peaks currently assigned to the selected spin system

button Predict Type: Show a table of the resonances currently contained within the selected spin system

Matched Peak Positions

The locations of the amide spin systems that are potential sequence neighbours to the query spin system

Table 4
Rank The rank of the spin system with regards to how well its peaks match the peaks in the query
Mean Peak Score The average match score for the peaks matched in the spin system
Total Peak Score The total of the match score for all peaks matched in the spin system
Spin System The identity (assignment annotation) for the spin system that was matched by considering peaks in the selected spectra
i-1 The identity of any spin system connected to the matched spin system one earlier in the sequence
i+1 The identity of any spin system connected to the matched spin system one later in the sequence
Num Peaks The number of peaks involved in the position comparison between the query and the matched spin system

button Refresh Matches: Manually force a refresh of the above table

button Discard Selected: Remove the currently selected match spin system from the table; only removes from display does not change anything else

button Set Seq Link: Set the selected spin system as being sequentially related to the query; sets up the appropriate sequence link and any inherited assignment

button Clear Rulers: Clear all ruler lines, that are use to mark peak positions, from the spectrum windows

Sequence Locations

A ranked list of the sequence locations that best match the residue type possibilities of the query spin system and any sequential links

Residue Types

The predicted residue types for the selected spin system and any sequence neighbours

Table 5
Rank The ranking how well the query and selected match spin system fits a region of protein sequence
Score The score for how well the query and selected match spin system, considering any set sequence links, fits the protein sequence
i-2 For a potential sequence location, the residue two positions before the residue paired with the query spin system
i-1 For a potential sequence location, the residue one position before the residue paired with the query spin system
i For a potential sequence location, the residue paired with the query spin system
i+1 For a potential sequence location, the residue two positions after the residue paired with the query spin system
i+2 For a potential sequence location, the residue two positions after the residue paired with the query spin system
Table 6
i-2 The score for predicting the residue type of the i-2 spin system; connected as two sequence positions earlier than the query spin system
i-1 The score for predicting the residue type of the i-1 spin system; connected as one sequence position earlier than the query spin system
i The score for predicting the residue type of the query spin system
i+1 The score for predicting the residue type of the i+1 spin system; connected as one sequence position later than the query spin system
i+2 The score for predicting the residue type of the i+2 spin system; connected as two sequence positions later than the query spin system

button Assign: Assign the query spin system and any sequentially linked spin systems to the residue sequence at the selected location

button Deassign: Remove any spin system assignments to the selected residue (position i), including any sequentially linked neighbours

button Strip: Display the spin systems in the selected sequence region as strips (within the match window)

Options

Specifies the various parameters that are used when matching peaks to fins potential sequentially linked spin systems

Sequence

pulldown Chain: Selects which molecular chain the assignment is done for; sets which sequence spin systems match and are assigned to

Peak Matching

int 7: Sets the maximum number of matching spin systems to display in strips of the selected spectrum window

float 5.0: When the “Focus 13C matches?” option is on, specifies how closely to zoom in on the 13C peak locations; the width of the window region

float 0.5: When the “Focus 1HC matches?” option is on, specifies how closely to zoom in on the 1H peak locations; the width of the window region

check Auto Match: Whether to automatically get sequential spin system match candidates when selecting in the list of query spin systems

check Filter 13C By Inter/Intra Type: Whether to exclude peak matches for 13C positions where the i-1 resonance matches another i-1 resonance; gives more spurious matches but useful if i & i-1 peaks have the same shift

check Filter 1H By Inter/Intra Type: Whether to exclude peak matches for 1H positions where the i-1 resonance matches another i-1 resonance; gives more spurious matches but useful if i & i-1 peaks have the same shift

check Focus 13C matches?: Whether to zoom in on the 13C peak locations using the above stated focus width for the window region

check Focus 1H matches?: Whether to zoom in on the 1H peak locations using the above stated focus width for the window region

pulldown Match Order: Sets whether potential sequential spin system matches should be ranked by the average peak position difference or total difference