Context dependence test
[1]:
from creme import creme
from creme import utils
import custom_model
import pandas as pd
import matplotlib.pyplot as plt
Load Enformer and example sequences
[2]:
data_dir = '../../../data/'
track_index = [5111]
model = custom_model.Enformer(track_index=track_index)
[3]:
fasta_path = f'{data_dir}/GRCh38.primary_assembly.genome.fa'
seq_parser = utils.SequenceParser(fasta_path)
genes = ['ABCA8_chr17_68955392_-', 'NFKBIZ_chr3_101849513_+']
gene_seqs = {}
for gene in genes:
    gene_name, chrom, start, strand = gene.split('_')
    seq = seq_parser.extract_seq_centered(chrom, int(start), strand, model.seq_length)
    gene_seqs[gene_name] = seq
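As a quick sanity check, each extracted sequence should be a one-hot array spanning the full model input. The exact shape convention depends on utils.SequenceParser, so treat the expected shape below as an assumption rather than a guarantee.
[ ]:
# Hedged sanity check: extract_seq_centered is assumed to return a one-hot
# array of shape roughly (model.seq_length, 4); adjust if your parser differs.
for gene_name, seq in gene_seqs.items():
    print(gene_name, seq.shape)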
[12]:
# TSS bin indices
bins = [447, 448]
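Enformer predicts 896 output bins of 128 bp centered on the input sequence, so a TSS placed at the sequence center by extract_seq_centered falls into the two central bins. A minimal sketch of where these indices come from, assuming the standard Enformer output geometry:
[ ]:
# Two central bins of Enformer's 896-bin output (0-indexed)
n_bins = 896
center_bins = [n_bins // 2 - 1, n_bins // 2]
print(center_bins)  # [447, 448]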
[4]:
abca8_wt = model.predict(gene_seqs['ABCA8'])[0,:,0]
nfkbiz_wt = model.predict(gene_seqs['NFKBIZ'])[0,:,0]
[13]:
utils.plot_track([abca8_wt], color='green', zoom=[0, 896], marks=bins)
utils.plot_track([nfkbiz_wt], color='red', zoom=[0, 896], marks=bins)
[13]:
[Figure: WT Enformer predictions across all 896 bins for ABCA8 (green) and NFKBIZ (red), with the TSS bins marked]
Context dependence test
To run the context dependence test we need:

- a loaded model
- a one-hot encoded sequence (WT)
- a coordinate interval containing the TSS
- the number of shuffles to run
- optionally, whether to summarize the results or return the prediction from each shuffle
- optionally, whether to skip the WT predictions to save time

A simplified sketch of the shuffle idea follows the next cell.
[6]:
seq_halflen = model.seq_length // 2  # TSS position (centre of the input sequence)
half_window_size = 2500              # keep +/- 2.5 kb around the TSS intact; shuffle the rest
N_shuffles = 10                      # number of independent context shuffles
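The intuition behind the test: hold the window around the TSS fixed, shuffle the flanking context, predict, and compare the TSS bins with and without shuffling. creme.context_dependence_test does this internally; the sketch below is a simplified, position-wise shuffle for illustration only, and CREME's own shuffling strategy may differ (e.g. in how it preserves sequence composition).
[ ]:
import numpy as np

def shuffle_context(onehot, tss_interval, rng):
    """Shuffle positions outside tss_interval, keeping the TSS window intact.
    Simplified position-wise shuffle, not necessarily CREME's strategy."""
    start, end = tss_interval
    x = onehot.copy()
    flank_idx = np.concatenate([np.arange(0, start), np.arange(end, len(x))])
    x[flank_idx] = x[rng.permutation(flank_idx)]
    return x

# Hypothetical single-shuffle usage (the actual test below runs N_shuffles of these):
# rng = np.random.default_rng(0)
# mut_seq = shuffle_context(gene_seqs['ABCA8'],
#                           [seq_halflen - half_window_size, seq_halflen + half_window_size], rng)
# pred_once = model.predict(mut_seq)[0, :, 0]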
Enhancing context: for this example sequence the surrounding context enhances expression, so shuffling the context leads to a drop in predicted TSS activity.
[7]:
_, pred_mut = creme.context_dependence_test(model, gene_seqs['ABCA8'],
[seq_halflen - half_window_size, seq_halflen + half_window_size],
N_shuffles, mean=False, drop_wt=True)
[15]:
utils.plot_track([abca8_wt], color='green', zoom=[400, 500], marks=bins)
ax=utils.plot_track(pred_mut[:,:,0], zoom=[400, 500], marks=bins)
utils.plot_track([pred_mut[:,:,0].mean(axis=0)], alpha=1, color='k', zoom=[400, 500], marks=bins, ax=ax)
[15]:
[Figure: ABCA8 WT track (green), individual shuffled-context predictions, and their mean (black), zoomed in around the TSS bins]
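To quantify the effect rather than eyeball the tracks, we can compare mean activity at the TSS bins before and after shuffling. The fold-change below is an illustrative summary, not CREME's official metric, and it should be run directly after the ABCA8 test above since pred_mut is overwritten by the next test. For an enhancing context the ratio should fall below 1; for the NFKBIZ silencing example below it should exceed 1.
[ ]:
wt_tss = abca8_wt[bins].mean()               # WT activity at the TSS bins
mut_tss = pred_mut[:, bins, 0].mean(axis=1)  # per-shuffle activity at the TSS bins
print(f'shuffled/WT TSS activity: {mut_tss.mean() / wt_tss:.2f}  (< 1 suggests an enhancing context)')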
Silencing context: for this example sequence the surrounding context silences expression, so shuffling the context leads to an increase in predicted TSS activity.
[16]:
_, pred_mut = creme.context_dependence_test(model, gene_seqs['NFKBIZ'],
[seq_halflen - half_window_size, seq_halflen + half_window_size],
N_shuffles, mean=False, drop_wt=True)
[18]:
utils.plot_track([nfkbiz_wt], color='green', zoom=[400, 500], marks=bins)
ax=utils.plot_track(pred_mut[:,:,0], zoom=[400, 500], marks=bins)
utils.plot_track([pred_mut[:,:,0].mean(axis=0)], alpha=1, color='k', zoom=[400, 500], marks=bins, ax=ax)
[18]:
[Figure: NFKBIZ WT track, individual shuffled-context predictions, and their mean (black), zoomed in around the TSS bins]