Usage

Context dependence test

creme.context_dependence_test(model, x, tile_pos, num_shuffle, mean=True, drop_wt=False)

This test embeds a sequence pattern, bounded by the start and end indices in tile_pos, in shuffled background contexts – in line with a global importance analysis. The background contexts are generated by dinucleotide shuffling the original sequence.

Parameters

model : keras.Model
A keras model.

x : np.array
Single one-hot sequence of shape (L, A).

tile_pos : list
List with start index and end index of pattern-of-interest along L (i.e. [start, end]).

num_shuffle : int
Number of shuffles to apply and average over.

mean : bool
If True, return the mean predictions across shuffles, otherwise return full predictions.

drop_wt : bool
If True, do not run predictions on the WT sequence. Use this to avoid the computational cost if predictions are already available.

Returns

np.array : prediction of the wild-type sequence.
np.array : prediction of the mutant sequences.
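The embedding step described for this test can be illustrated without a model. The following is a minimal sketch, not the CREME implementation: it works on plain strings rather than one-hot arrays, the helper name embed_tile_in_shuffles is hypothetical, and a simple per-base shuffle stands in for the dinucleotide shuffle.

```python
import random


def embed_tile_in_shuffles(seq, tile_pos, num_shuffle, seed=0):
    """Return num_shuffle shuffled copies of seq with the original tile restored.

    Hypothetical helper for illustration only.
    """
    start, end = tile_pos
    rng = random.Random(seed)
    mutants = []
    for _ in range(num_shuffle):
        bases = list(seq)
        # Assumption: per-base shuffle used as a stand-in for a dinucleotide shuffle.
        rng.shuffle(bases)
        # Put the pattern-of-interest back at its original position.
        bases[start:end] = seq[start:end]
        mutants.append("".join(bases))
    return mutants


wt = "ACGTTGCAGATAAGGCTTACGATCGGCTA"
mutants = embed_tile_in_shuffles(wt, tile_pos=[8, 14], num_shuffle=5)
```

Each mutant keeps the tile at positions 8 to 14 while the surrounding context is randomized, so a model's predictions on these sequences would reflect the tile's effect averaged over backgrounds.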

Context swap test

creme.context_swap_test(model, x_source, x_target, tile_pos)

This test places a source sequence pattern, bounded by the start and end indices in tile_pos, in a target sequence context at the same position – in line with a global importance analysis.

inputs:

model: keras model: A keras model.
x_source: np.array: Source sequence(s), one-hot with shape (L, A) or (N, L, A), from which a pattern will be taken.
x_target: np.array: Target sequence(s), one-hot with shape (L, A) or (N, L, A), that will inherit the source pattern.
tile_pos: list: List of start and end index of pattern along L.
mean: bool: If True, return the mean predictions across shuffles, otherwise return full predictions.

Returns

np.array : prediction of the wild-type sequence.
np.array : prediction of the mutant sequences.
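The swap itself amounts to splicing one slice into another sequence. A minimal sketch follows, assuming string sequences of equal length instead of one-hot arrays; swap_tile is a hypothetical helper name and not part of the creme API.

```python
def swap_tile(source, target, tile_pos):
    """Copy source[start:end] into the same position of target (illustrative helper)."""
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    start, end = tile_pos
    # Target context on both sides, source pattern in the middle.
    return target[:start] + source[start:end] + target[end:]


source = "AAAAGATAAGAAAA"
target = "CCCCCCCCCCCCCC"
mutant = swap_tile(source, target, tile_pos=[4, 10])
```

Here mutant carries the source pattern GATAAG at positions 4 to 10 and the target sequence everywhere else.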

Necessity test

creme.necessity_test(model, x, tiles, num_shuffle, mean=True, return_seqs=False)

This test systematically measures how shuffling individual tiles affects model predictions.

Parameters

model : keras.Model
A keras model.

x : np.array
Single one-hot sequence of shape (L, A).

tiles : list
List of tile positions (start, end) to shuffle (i.e. [[start1, end1], [start2, end2],…]).

num_shuffle : int
Number of shuffles to apply and average over.

mean : bool
If True, return the mean predictions across shuffles, otherwise return full predictions.

return_seqs : bool
If True, return generated sequences for future use.

Returns

list : WT sequence prediction, followed by either the mean and standard deviation of the mutant predictions (with shuffled tile) or all mutant predictions without averaging, and optionally the generated sequences.
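To make the procedure concrete, here is a small sketch of the same idea with a toy scoring function in place of a keras model. Everything in it is an assumption made for illustration: toy_model, shuffle_tile and necessity_sketch are hypothetical names, sequences are strings, and tiles are shuffled per base rather than by dinucleotide.

```python
import random
import statistics


def toy_model(seq):
    """Stand-in for a model prediction: counts occurrences of the motif GATA."""
    return float(seq.count("GATA"))


def shuffle_tile(seq, tile, rng):
    """Shuffle only the bases inside [start, end); the rest is untouched."""
    start, end = tile
    middle = list(seq[start:end])
    rng.shuffle(middle)
    return seq[:start] + "".join(middle) + seq[end:]


def necessity_sketch(seq, tiles, num_shuffle, seed=0):
    """Return the WT score and a (mean, std) pair per shuffled tile."""
    rng = random.Random(seed)
    wt_pred = toy_model(seq)
    results = []
    for tile in tiles:
        preds = [toy_model(shuffle_tile(seq, tile, rng)) for _ in range(num_shuffle)]
        results.append((statistics.mean(preds), statistics.pstdev(preds)))
    return wt_pred, results


seq = "CCCCCCCCGATACCCCCCCCCCCC"
tiles = [[0, 8], [8, 16], [16, 24]]
wt_pred, results = necessity_sketch(seq, tiles, num_shuffle=10)
```

Only the middle tile contains the motif, so shuffling the first or last tile leaves the toy score unchanged, while shuffling the middle tile can only lower it.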

Sufficiency test

creme.sufficiency_test(model, x, tss_tile, tiles, num_shuffle, tile_seq=None, mean=True, return_seqs=False)

This test measures whether a region of the sequence, together with the TSS tile, is sufficient to reproduce the model prediction for the WT sequence.

Parameters

model : keras.Model
A keras model.

x : np.array
Single one-hot sequence of shape (L, A).

tss_tile : list
Start and end index of the TSS tile to embed in shuffled sequences, i.e. [start, end].

tiles : list
List of tile positions (start, end) to embed in shuffled sequences (i.e. [[start1, end1], [start2, end2],…]).

num_shuffle : int
Number of dinuc shuffles to apply to sequence context and average over.

tile_seq : np.array
Sequence of the TSS to be embedded. If provided, this overrides the TSS tile coordinates.

mean : bool
If True, return the mean predictions across shuffles, otherwise return full predictions.

return_seqs : bool
If True, return the generated mutant sequences.

Returns

list of numpy arrays. Depending on the arguments, returns either the WT prediction together with the mean and standard deviation of the mutant predictions (dinuc shuffled sequence with TSS and tile) and of the control predictions (dinuc shuffled sequence with TSS only), or all the predictions without averaging, and optionally the constructed mutant sequences.
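The distinction between control and mutant sequences in this test can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: sequences are strings, the background is shuffled per base instead of by dinucleotide, and embed and sufficiency_sequences are hypothetical helper names.

```python
import random


def embed(background, seq, tile):
    """Copy seq[start:end] into the same position of background."""
    start, end = tile
    return background[:start] + seq[start:end] + background[end:]


def sufficiency_sequences(seq, tss_tile, tile, num_shuffle, seed=0):
    """Build control (TSS only) and mutant (TSS + tile) sequences."""
    rng = random.Random(seed)
    controls, mutants = [], []
    for _ in range(num_shuffle):
        bases = list(seq)
        rng.shuffle(bases)  # assumption: per-base shuffle as a stand-in
        background = "".join(bases)
        control = embed(background, seq, tss_tile)  # shuffled context + TSS
        controls.append(control)
        mutants.append(embed(control, seq, tile))   # same context + TSS + tile
    return controls, mutants


wt = "ACGTACGTTATAAAGGCCGATAAGTTCCAGCT"
controls, mutants = sufficiency_sequences(wt, tss_tile=[8, 14], tile=[18, 24], num_shuffle=4)
```

Because each mutant shares its background with the matching control, the difference between their predictions can be attributed to the embedded tile.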

Distance test

creme.distance_test(model, x, tile_fixed_coord, tile_var_coord, test_positions, num_shuffle, mean=True, seed=False)

This test maps out the distance dependence between tile 1 (anchored) and tile 2 (variable position). Tiles are placed in dinuc shuffled background contexts, in line with global importance analysis.

Parameters

model : keras.Model
A keras model.

x : np.array
Single one-hot sequence of shape (L, A).

tile_fixed_coord : list
List with start index and end index of tile that is anchored (i.e. [start, end]).

tile_var_coord : list
List with start index and end index of tile that is to be tested.

test_positions : list
List of start indices at which to place and test the variable tile.

num_shuffle : int
Number of shuffles to apply and average over.

mean : bool
If True, return the mean predictions across shuffles, otherwise return full predictions.

seed : bool
If True, set a seed for the random dinuc shuffle of the sequence and use the same background sequences for all position tests (per sequence).

Returns

dict: results organized as dictionary containing control (i.e. sequence with TSS and tile in original position) and mutant (variable tile location) predictions (either summarized as mean and standard deviation or all the predictions).
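One way to build a test_positions list is sketched below. It assumes, as a design choice that the documentation does not state, that the variable tile should stay inside the sequence and should not overlap the anchored tile; candidate_positions is a hypothetical helper.

```python
def candidate_positions(seq_len, tile_fixed_coord, tile_len, step):
    """Start indices where a tile of tile_len fits without touching the fixed tile."""
    fixed_start, fixed_end = tile_fixed_coord
    positions = []
    for start in range(0, seq_len - tile_len + 1, step):
        end = start + tile_len
        # Keep the position only if the variable tile lies fully before
        # or fully after the anchored tile.
        if end <= fixed_start or start >= fixed_end:
            positions.append(start)
    return positions


test_positions = candidate_positions(
    seq_len=50, tile_fixed_coord=[20, 30], tile_len=10, step=10
)
```

With these toy values the candidate starts are 0, 10, 30 and 40; the start at 20 is dropped because it coincides with the anchored tile.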

Higher order interaction test

creme.higher_order_interaction_test(model, x, cre_tiles_to_test, optimization, num_shuffle=10, num_rounds=None)

This test performs a greedy search to identify which set of tiles leads to the optimal change in model predictions. In each round, given the tiles selected in previous rounds, each remaining tile is shuffled and the tile with the biggest effect is selected (similar to the necessity test).

Parameters

model : keras.Model
A keras model.

x : np.array
Single one-hot sequence of shape (L, A).

cre_tiles_to_test : list
List of tile coordinates to be tested, each given as a list of start index and end index.

optimization : np.argmax or np.argmin
Function that identifies/selects tile index for each round of greedy search.

num_shuffle : int
Number of shuffles to apply and average over.

num_rounds : int
Number of rounds to perform greedy search.

Returns

dict: keys are iteration numbers and values are dictionaries with the results for that iteration. These include: initial predictions for that iteration (in iteration 0 this is WT, in iteration 2 this is for a sequence with 2 tiles shuffled already); predictions for newly generated mutants; selected tile based on predictions of this iteration; per tile mean of shuffles for the selected best tile.
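The greedy loop can be sketched independently of any model. In the illustration below, toy_prediction stands in for the mean model prediction after shuffling a set of tiles, a plain-Python argmin stands in for np.argmin, and the keys of the returned dictionary are invented for the example; none of these names come from the creme API.

```python
def argmin(values):
    """Index of the smallest value (plain-Python stand-in for np.argmin)."""
    return min(range(len(values)), key=values.__getitem__)


def greedy_tile_search(score_fn, tiles, optimization, num_rounds):
    """Pick one tile per round, conditioning on the tiles already selected."""
    selected, remaining, trace = [], list(tiles), {}
    for round_idx in range(num_rounds):
        if not remaining:
            break
        # Score every remaining tile when shuffled on top of the current set.
        scores = [score_fn(selected + [t]) for t in remaining]
        best = optimization(scores)
        trace[round_idx] = {
            "initial_pred": score_fn(selected),
            "mut_preds": scores,
            "selected_tile": remaining[best],
        }
        selected.append(remaining.pop(best))
    return trace


# Toy setup: each tile contributes a fixed amount that is lost when shuffled.
contributions = {(0, 5): 1.0, (5, 10): 4.0, (10, 15): 2.5}


def toy_prediction(shuffled_tiles):
    return 10.0 - sum(contributions[t] for t in shuffled_tiles)


trace = greedy_tile_search(toy_prediction, list(contributions), argmin, num_rounds=2)
```

With argmin the search picks the tile whose shuffling lowers the toy prediction the most in each round; passing an argmax-style function would instead select the tile whose shuffling raises it the most.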

Multiplicity test

creme.multiplicity_test(model, x, tss_tile_coord, cre_tile_coord, cre_tile_seq, test_coords, num_shuffle, num_copies, optimization)

Parameters

model : keras.Model
A keras model.

x : np.array
Single one-hot sequence of shape (L, A).

tss_tile_coord : list
Start and end coordinates of the TSS tile (which is fixed from the beginning).

cre_tile_coord : list
Start and end coordinates for where to insert the CRE as a control.

cre_tile_seq : np.array
Single one-hot sequence of the CRE with shape (L, A), where L equals the length of the CRE.

test_coords : np.array
Tile start positions to test. In iteration 0 all the positions in the array will be tested and the one with the best prediction will be selected (and removed from the set of positions for subsequent iterations).

num_shuffle : int
Number of shuffles to apply and average over.

num_copies : int
Number of copies to insert, i.e. iterations to run.

optimization : np.argmax or np.argmin
Function that identifies tile index for each round of greedy search.

Returns

dict: results organized as dictionary containing: (i) TSS activity on its own (in dinuc shuffled backgrounds), (ii) TSS and CRE activity when CRE is positioned at the specified position, (iii) list of the best prediction in each iteration, showing the steps of the optimization process, (iv) list of tile positions that were selected in each iteration, (v) list of all mutant predictions: for each iteration, the predictions for each of the tested positions.
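The iterative insertion described for test_coords can be sketched as below. The example is a toy under stated assumptions: sequences are strings, toy_model is an invented scoring function in which each GATA motif counts more the closer it lies to the TSS, a plain-Python maximum stands in for np.argmax, and greedy_copies is a hypothetical helper.

```python
def insert(seq, cre, start):
    """Overwrite seq with cre beginning at start (length is preserved)."""
    return seq[:start] + cre + seq[start + len(cre):]


def toy_model(seq, tss_pos):
    """Stand-in prediction: each GATA adds 1 / (1 + distance to the TSS)."""
    score, pos = 0.0, seq.find("GATA")
    while pos != -1:
        score += 1.0 / (1 + abs(pos - tss_pos))
        pos = seq.find("GATA", pos + 1)
    return score


def greedy_copies(background, cre, test_coords, num_copies, tss_pos):
    """Insert one CRE copy per iteration at the best remaining position."""
    seq, remaining = background, list(test_coords)
    chosen, best_preds = [], []
    for _ in range(num_copies):
        preds = [toy_model(insert(seq, cre, s), tss_pos) for s in remaining]
        best = max(range(len(preds)), key=preds.__getitem__)  # stand-in for np.argmax
        seq = insert(seq, cre, remaining[best])  # keep the winning copy in place
        best_preds.append(preds[best])
        chosen.append(remaining.pop(best))       # position is no longer available
    return chosen, best_preds


chosen, best_preds = greedy_copies("C" * 40, "GATA", [0, 12, 30], num_copies=2, tss_pos=20)
```

Each selected position is removed from the candidate set, so later iterations build on the sequence that already contains the earlier copies.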

Fine-tile search

creme.prune_sequence(model, wt_seq, control_sequences, mut, whole_tile_start, whole_tile_end, scales, thresholds, frac, N_batches, cre_type='enhancer')

This function prunes a tile through greedy search to find the most enhancing subset of sub-tiles that explains a set fraction of the original enhancement. Pruning proceeds in stages. In each stage, sub-tiles of a specified scale are shuffled, and the N least enhancing sub-tiles (N defined by N_batches) are kept shuffled, i.e. pruned. The TSS activity with only the remaining sub-tiles is then computed and compared, as a ratio, to the activity when the entire tile is inserted; this ratio is the ‘score’. While the score stays above a set threshold, the optimization continues with more iterations. When the threshold is reached, the last step is reverted, and a new stage can begin if defined.

Parameters

model : keras.Model
A keras model.

wt_seq : np.array
Single one-hot sequence of shape (L, A).

control_sequences : np.array
One-hot background sequences of shape (N, L, A).

mut : float
Prediction when only the TSS and the entire CRE are embedded in background sequences. This is used to compute the fraction restored by an embedded subset of tile sequences.

whole_tile_start : int
Start coordinate of the CRE to prune.

whole_tile_end : int
End coordinate of the CRE to prune.

scales : list
Window sizes to use for sub-tiles. Each stage of pruning can have a different window size, e.g. 500bp in the first and 50bp in the second, to speed up the optimization.

thresholds : list
Score thresholds to use to determine when to stop a given optimization.

frac : float
Fraction of scale or window size to use to compute step size.

N_batches : list
List of integer batch sizes to use in each stage of the optimization. Batch size determines the number of tiles that are pruned out in each iteration. For example, a batch size of 1 means that only 1 tile (if searching for enhancers, the most silencing tile) will be pruned. This parameter makes it possible to prune discontinuous patches of sequence.

cre_type : string
‘enhancer’ or ‘silencer’ - defines the optimization type, i.e. whether to prune the least enhancing or the least silencing elements.

Returns

dict: returns a dictionary with a summary of results for each iteration. The information of each stage is saved under its window size (key) as a dictionary with ‘scores’ - the fraction of tile activity recovered, bps - number of bps embedded, ‘all_removed_tiles’ - np.array of all the removed sub-tiles, ‘insert_coords’ - set of remaining/surviving sub-tiles.
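How scales, frac and thresholds might interact can be sketched with two small helpers. Both are hypothetical and rest on assumptions the documentation does not confirm: the step size is taken as int(scale * frac), and the score is taken as the plain ratio of the prediction with the remaining sub-tiles to mut.

```python
def sub_tile_coords(whole_tile_start, whole_tile_end, scale, frac):
    """Sliding windows of width `scale` across the tile (assumed layout)."""
    step = max(1, int(scale * frac))  # assumption: step = scale * frac
    last_start = whole_tile_end - scale
    return [[s, s + scale] for s in range(whole_tile_start, last_start + 1, step)]


def restored_fraction(pred_with_subtiles, mut):
    """Assumed score: activity with remaining sub-tiles relative to the whole tile."""
    return pred_with_subtiles / mut


coords = sub_tile_coords(1000, 2000, scale=500, frac=0.5)
score = restored_fraction(3.0, mut=4.0)
keep_pruning = score > 0.7  # continue only while the score stays above the threshold
```

In this toy case a 1000 bp tile yields three overlapping 500 bp windows, and a score of 0.75 against a threshold of 0.7 would let the pruning continue for another iteration.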

Extra functions

creme.generate_tile_shuffles(x, tile_set, num_shuffle)

inputs:

x: np.array: Source sequence (one-hot with shape (L, A)) from which a pattern will be taken.
tile_set: list: List of start and end positions.
num_shuffle: int: Number of shuffles to apply and average over.

Returns

Mutant sequences with shuffled tile(s).