Usage

Context dependence test

_images/context_dependence_test.svg
creme.context_dependence_test(model, x, tile_pos, num_shuffle, mean=True, drop_wt=False)

This test embeds a sequence pattern, bounded by the start and end indices given in tile_pos, in shuffled background contexts, in line with a global importance analysis. The background contexts are generated by dinucleotide-shuffling the original sequence.

Parameters

model: keras.Model

A keras model.

x: np.array

Single one-hot sequence with shape (L, A).

tile_pos: list

List with start index and end index of the pattern of interest along L (i.e. [start, end]).

num_shuffle: int

Number of shuffles to apply and average over.

mean: bool

If True, return the mean predictions across shuffles, otherwise return full predictions.

drop_wt: bool

If True, do not run predictions on the WT sequence. Use this to avoid the computational cost if predictions are already available.

Returns

np.array: prediction of the wild-type (WT) sequence.

np.array: prediction of the mutant sequences.
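The embedding step described above can be sketched with NumPy alone. This is a minimal illustration, not creme's implementation: `toy_predict` is a hypothetical stand-in for `model.predict`, `embed_in_shuffled_contexts` is a made-up helper name, and the background uses a simple position shuffle rather than a dinucleotide-preserving one.

```python
import numpy as np

def toy_predict(batch):
    """Hypothetical stand-in for model.predict: counts 'A' (channel 0) at positions 4 to 7."""
    return batch[:, 4:8, 0].sum(axis=1)

def embed_in_shuffled_contexts(x, tile_pos, num_shuffle, rng):
    """Return (num_shuffle, L, A) sequences: shuffled x with the original tile restored."""
    start, end = tile_pos
    seqs = np.empty((num_shuffle,) + x.shape, dtype=x.dtype)
    for i in range(num_shuffle):
        # Simple position shuffle (assumption): NOT dinucleotide-preserving.
        shuffled = rng.permutation(x, axis=0)
        shuffled[start:end] = x[start:end]  # re-embed the pattern of interest
        seqs[i] = shuffled
    return seqs

rng = np.random.default_rng(0)
L, A = 20, 4
x = np.eye(A)[rng.integers(0, A, size=L)]  # random one-hot sequence, shape (L, A)
tile_pos = [4, 8]

mutants = embed_in_shuffled_contexts(x, tile_pos, num_shuffle=5, rng=rng)
pred_wt = toy_predict(x[np.newaxis])[0]
pred_mut = toy_predict(mutants).mean()  # analogous to mean=True
```

Because the toy model only reads the tile itself, the mutant mean equals the WT prediction here; a real model would typically show some dependence on the surrounding context.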

Context swap test

_images/context_swap_test.svg
creme.context_swap_test(model, x_source, x_target, tile_pos)

This test places a source sequence pattern, bounded by the start and end indices given in tile_pos, in a target sequence context at the same position, in line with a global importance analysis.

Parameters

model: keras.Model

A keras model.

x_source(s): np.array

Source sequence (one-hot with shape (L, A) or (N, L, A)) from which a pattern will be taken.

x_target(s): np.array

Target sequence with shape (L, A) or (N, L, A) that will inherit a source pattern.

tile_pos: list

List of start and end index of pattern along L.

mean: bool

If True, return the mean predictions across shuffles, otherwise return full predictions.

Returns

np.array: prediction of the wild-type (WT) sequence.

np.array: prediction of the mutant sequences.
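The swap itself is a single slice assignment. The sketch below shows one way to write it so that it accepts both (L, A) and (N, L, A) inputs; `swap_tile` is a hypothetical helper name and the A, C, G, T channel order is an assumption, so this should not be read as creme's internal code.

```python
import numpy as np

def swap_tile(x_source, x_target, tile_pos):
    """Copy x_target and overwrite [start, end) with the same span from x_source."""
    start, end = tile_pos
    x_mut = np.array(x_target, copy=True)
    # The ellipsis lets the same line handle (L, A) and (N, L, A) inputs.
    x_mut[..., start:end, :] = x_source[..., start:end, :]
    return x_mut

L, A = 10, 4
x_source = np.tile(np.eye(A)[0], (L, 1))  # all 'A' (assumed channel order A, C, G, T)
x_target = np.tile(np.eye(A)[1], (L, 1))  # all 'C'

x_mut = swap_tile(x_source, x_target, [2, 5])
x_mut_batch = swap_tile(x_source[np.newaxis], x_target[np.newaxis], [2, 5])
```

The target array is copied before the assignment, so the original target sequence stays available for the WT prediction.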

Necessity test

_images/necessity_test.svg
creme.necessity_test(model, x, tiles, num_shuffle, mean=True, return_seqs=False)

This test systematically measures how shuffling each tile affects model predictions.

Parameters

model: keras.Model

A keras model.

x: np.array

Single one-hot sequence with shape (L, A).

tiles: list

List of tile positions (start, end) to shuffle (i.e. [[start1, end1], [start2, end2], …]).

num_shuffle: int

Number of shuffles to apply and average over.

mean: bool

If True, return the mean predictions across shuffles, otherwise return full predictions.

return_seqs: bool

If True, return generated sequences for future use.

Returns

list: WT sequence prediction, followed by either the mean and standard deviation of the mutant predictions (with shuffled tile) or all mutant predictions without averaging, and (optionally) the generated sequences.
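As a rough illustration of the idea, the sketch below shuffles one tile at a time and compares the mean mutant prediction with the WT prediction. `toy_predict` and `shuffle_tile` are hypothetical names, the motif-matching "model" is invented for the example, and the shuffle is a plain permutation of positions within the tile.

```python
import numpy as np

MOTIF = np.eye(4)[[0, 1, 2]]  # one-hot "ACG" (channel order A, C, G, T is an assumption)

def toy_predict(batch):
    """Hypothetical stand-in for model.predict: motif matches at positions 6 to 8."""
    return (batch[:, 6:9] * MOTIF).sum(axis=(1, 2))

def shuffle_tile(x, tile, num_shuffle, rng):
    """Return num_shuffle copies of x with the positions inside the tile permuted."""
    start, end = tile
    seqs = np.repeat(x[np.newaxis], num_shuffle, axis=0)
    for i in range(num_shuffle):
        seqs[i, start:end] = rng.permutation(x[start:end], axis=0)
    return seqs

rng = np.random.default_rng(1)
x = np.eye(4)[rng.integers(0, 4, size=20)]
x[5:10] = np.eye(4)[[3, 0, 1, 2, 3]]  # tile "TACGT" carries the motif

pred_wt = toy_predict(x[np.newaxis])[0]
effects = {}
for tile in ([5, 10], [12, 17]):
    preds = toy_predict(shuffle_tile(x, tile, num_shuffle=10, rng=rng))
    effects[tuple(tile)] = pred_wt - preds.mean()  # drop caused by shuffling this tile
```

In this toy setup only the tile that contains the motif produces a drop, which is the signature one would read as "necessary" for the prediction.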

Sufficiency test

_images/sufficiency_test.svg
creme.sufficiency_test(model, x, tss_tile, tiles, num_shuffle, tile_seq=None, mean=True, return_seqs=False)

This test measures whether a region of the sequence, together with the TSS tile, is sufficient to reproduce the model predictions obtained for the WT sequence.

Parameters

model: keras.Model

A keras model.

x: np.array

Single one-hot sequence with shape (L, A).

tss_tile: list

List of the tss_tile position to embed in shuffled sequences, i.e. [start, end].

tiles: list

List of tile positions (start, end) to embed in shuffled sequences (i.e. [[start1, end1], [start2, end2], …]).

num_shuffle: int

Number of dinucleotide shuffles to apply to the sequence context and average over.

tile_seq: np.array

Sequence of the TSS to be embedded. If provided, this overrides the TSS tile coordinates.

mean: bool

If True, return the mean predictions across shuffles, otherwise return full predictions.

return_seqs: bool

If True, return the generated mutant sequences.

Returns

list of numpy arrays. Depending on the arguments, returns the WT prediction together with either (i) the mean and standard deviation of the mutant predictions (dinucleotide-shuffled sequence with TSS and tile) and of the control predictions (dinucleotide-shuffled sequence with TSS only), or (ii) all the predictions without averaging, and (optionally) the constructed mutant sequences.
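The control and mutant constructs described above differ only in which tiles are restored after shuffling. The following sketch makes that explicit under simplifying assumptions: `make_toy_predict` and `embed` are hypothetical helpers, the toy model simply doubles its output when the tile is intact next to an intact TSS, and the background is a position shuffle rather than a dinucleotide shuffle.

```python
import numpy as np

def make_toy_predict(x, tss_tile, cre_tile):
    """Build a hypothetical stand-in for model.predict from the WT sequence."""
    (ts, te), (cs, ce) = tss_tile, cre_tile
    def predict(batch):
        tss_ok = (batch[:, ts:te] == x[ts:te]).all(axis=(1, 2))
        cre_ok = (batch[:, cs:ce] == x[cs:ce]).all(axis=(1, 2))
        return tss_ok * (1.0 + cre_ok)  # an intact tile doubles the output of an intact TSS
    return predict

def embed(x, tiles, num_shuffle, rng):
    """Shuffle x position-wise, then restore the listed tiles from the WT sequence."""
    seqs = np.stack([rng.permutation(x, axis=0) for _ in range(num_shuffle)])
    for start, end in tiles:
        seqs[:, start:end] = x[start:end]
    return seqs

rng = np.random.default_rng(2)
x = np.eye(4)[rng.integers(0, 4, size=40)]
tss_tile, cre_tile = [18, 24], [4, 10]
predict = make_toy_predict(x, tss_tile, cre_tile)

pred_wt = predict(x[np.newaxis])[0]
pred_control = predict(embed(x, [tss_tile], 10, rng)).mean()           # TSS only
pred_mutant = predict(embed(x, [tss_tile, cre_tile], 10, rng)).mean()  # TSS + tile
```

Comparing `pred_mutant` with `pred_wt`, relative to `pred_control`, is what indicates whether the tile plus TSS is sufficient in this toy setting.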

Distance test

_images/distance_test.svg
creme.distance_test(model, x, tile_fixed_coord, tile_var_coord, test_positions, num_shuffle, mean=True, seed=False)

This test maps out the distance dependence between tile 1 (anchored) and tile 2 (variable position). Tiles are placed in dinucleotide-shuffled background contexts, in line with global importance analysis.

Parameters

model: keras.Model

A keras model.

x: np.array

Single one-hot sequence with shape (L, A).

tile_fixed_coord: list

List with start index and end index of the tile that is anchored (i.e. [start, end]).

tile_var_coord: list

List with start index and end index of the tile that is to be tested.

test_positions: list

List with start indices of the positions at which to test tile_var.

num_shuffle: int

Number of shuffles to apply and average over.

mean: bool

If True, return the mean predictions across shuffles, otherwise return full predictions.

seed: bool

If True, set a seed for the random dinucleotide shuffle of the sequence and use the same background sequences for all position tests (per sequence).

Returns

dict: results organized as a dictionary containing control (i.e. sequence with TSS and tile in the original position) and mutant (variable tile location) predictions, either summarized as mean and standard deviation or given as all the predictions.
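To make the construction concrete, here is a small sketch of how the anchored and variable tiles could be placed into a shared set of shuffled backgrounds. The helper `place_tiles`, the result keys, and the 'A'-counting `toy_predict` are all hypothetical, and the backgrounds use a position shuffle instead of a dinucleotide shuffle.

```python
import numpy as np

def place_tiles(background, x, tile_fixed, tile_var, new_start):
    """Keep the fixed tile at its own coordinates and move the variable tile to new_start."""
    seq = background.copy()
    fs, fe = tile_fixed
    vs, ve = tile_var
    seq[fs:fe] = x[fs:fe]
    seq[new_start:new_start + (ve - vs)] = x[vs:ve]
    return seq

def toy_predict(batch):
    """Hypothetical stand-in for model.predict: total 'A' content (channel 0)."""
    return batch[:, :, 0].sum(axis=1)

rng = np.random.default_rng(3)
x = np.eye(4)[rng.integers(0, 4, size=60)]
tile_fixed, tile_var = [28, 32], [10, 14]
test_positions = [0, 10, 20, 40, 50]  # chosen so they do not overlap the fixed tile
num_shuffle = 5

# One set of backgrounds reused for every position, similar in spirit to seed=True.
backgrounds = [rng.permutation(x, axis=0) for _ in range(num_shuffle)]

results = {}
for pos in test_positions:
    seqs = np.stack([place_tiles(b, x, tile_fixed, tile_var, pos) for b in backgrounds])
    preds = toy_predict(seqs)
    results[pos] = {"mean": preds.mean(), "std": preds.std(), "seqs": seqs}
```

Reusing the same backgrounds across positions means that differences between entries of `results` come from the tile placement rather than from background noise.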

Higher order interaction test

_images/greedy_greedy_hippo.svg
creme.higher_order_interaction_test(model, x, cre_tiles_to_test, optimization, num_shuffle=10, num_rounds=None)

This test performs a greedy search to identify which sets of tiles lead to optimal changes in model predictions. In each round, a new tile is selected given the previously selected tiles: each remaining tile is shuffled in turn, and the tile with the biggest effect is chosen (similar to the necessity test).

Parameters

model: keras.Model

A keras model.

x: np.array

Single one-hot sequence with shape (L, A).

cre_tiles_to_test: list

List of tile coordinates to be tested, each a list that consists of start index and end index.

optimization: np.argmax or np.argmin

Function that identifies/selects the tile index for each round of greedy search.

num_shuffle: int

Number of shuffles to apply and average over.

num_rounds: int

Number of rounds to perform greedy search.

Returns

dict: keys are iteration numbers and values are dictionaries with the results for that iteration. These include: the initial predictions for that iteration (in iteration 0 this is WT; in iteration 2 this is for a sequence with 2 tiles already shuffled); the predictions for the newly generated mutants; the tile selected based on the predictions of this iteration; and the per-tile mean of shuffles for the selected best tile.
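The greedy loop can be sketched as follows. This is an illustrative reading of the description above, not creme's code: `greedy_tile_search` and `toy_predict` are hypothetical, the toy model gives each tile a fixed weight that is lost once the tile is no longer intact, and tiles are shuffled by permuting positions.

```python
import numpy as np

def greedy_tile_search(predict, x, tiles, optimization, num_shuffle, num_rounds, rng):
    """Each round, shuffle every remaining tile and keep the one picked by `optimization`."""
    current = np.repeat(x[np.newaxis], num_shuffle, axis=0)  # carries earlier shuffles forward
    remaining = list(range(len(tiles)))
    selected = []
    for _ in range(num_rounds):
        candidates, means = [], []
        for k in remaining:
            start, end = tiles[k]
            seqs = current.copy()
            for i in range(num_shuffle):
                seqs[i, start:end] = rng.permutation(seqs[i, start:end], axis=0)
            candidates.append(seqs)
            means.append(predict(seqs).mean())
        best = int(optimization(means))  # index into `remaining`
        current = candidates[best]
        selected.append(remaining.pop(best))
    return selected

tiles = [[0, 8], [10, 18], [20, 28]]
weights = [3.0, 2.0, 1.0]
x = np.eye(4)[np.arange(30) % 4]  # repeating "ACGT", shape (30, 4)

def toy_predict(batch):
    """Hypothetical stand-in for model.predict: a tile adds its weight only while intact."""
    score = np.zeros(len(batch))
    for (start, end), w in zip(tiles, weights):
        score += w * (batch[:, start:end] == x[start:end]).all(axis=(1, 2))
    return score

rng = np.random.default_rng(4)
order = greedy_tile_search(toy_predict, x, tiles, np.argmin,
                           num_shuffle=10, num_rounds=3, rng=rng)
```

With np.argmin the search picks, in each round, the tile whose shuffling lowers the prediction most, so the tiles come out in order of decreasing weight in this toy example; passing np.argmax would instead look for the tile whose shuffling raises the prediction most.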

Multiplicity test

_images/multiplicity_test.svg
creme.multiplicity_test(model, x, tss_tile_coord, cre_tile_coord, cre_tile_seq, test_coords, num_shuffle, num_copies, optimization)

Parameters

model: keras.Model

A keras model.

x: np.array

Single one-hot sequence with shape (L, A).

tss_tile_coord: list

Start and end coordinates of the TSS tile (which is fixed from the beginning).

cre_tile_coord: list

Start and end coordinates for where to insert the CRE as a control.

cre_tile_seq: np.array

Single one-hot sequence of the CRE with shape (L, A), where L equals the length of the CRE.

test_coords: np.array

Tile start positions to test. In iteration 0 all the positions in the array will be tested and the one with the most optimal prediction will be selected (and removed from the set of positions for subsequent iterations).

num_shuffle: int

Number of shuffles to apply and average over.

num_copies: int

Number of copies to insert, i.e. iterations to run.

optimization: np.argmax or np.argmin

Function that identifies the tile index for each round of greedy search.

Returns

dict: results organized as a dictionary containing: (i) TSS activity on its own (in dinucleotide-shuffled backgrounds); (ii) TSS and CRE activity when the CRE is positioned at the specified position; (iii) a list of the most optimal prediction in each iteration, showing the steps of the optimization process; (iv) a list of the tile positions that were selected in each iteration; (v) a list of all mutant predictions, i.e. for each iteration, the predictions for each of the tested positions.

Extra functions

creme.generate_tile_shuffles(x, tile_set, num_shuffle)
Parameters

x: np.array

Source sequence (one-hot with shape (L, A)) from which a pattern will be taken.

tile_set: list

List of start and end positions.

num_shuffle: int

Number of shuffles to apply and average over.

Returns

Mutant sequences with shuffled tile(s).