首页 > 其他分享 >10X空间转录组数据分析之CNV轨迹层级

10X空间转录组数据分析之CNV轨迹层级

时间:2024-06-06 13:59:05浏览次数:12  
标签:CNV cnv palette 层级 False ad 10X id ads

作者,Evil Genius
今天我们要完成一项工程,那就是CNV层级轨迹,如下图:

首先来看一下什么是CNV谱系
拷贝数变异(CNVs)在细胞增殖过程中在基因组中积累,提供了系统发育谱系的特征。通过转录组数据上使用intercnv生成CNV谱,能够以空间和时间分辨率跟踪克隆系统发育。
文章空间转录组数据分析之CNV burden和CNV聚类(python版本)已经做了基础的CNV分析和聚类,这个是前提。
CNV谱系如下:

我们来完成这项工程,示例数据大家自行下载,大家都是生信资深人员了,中间过程就不啰嗦了。
from IPython.display import display, HTML, Image
display(HTML("<style>.container { width:95% !important; }</style>"))

%load_ext autoreload
%autoreload 2

import warnings
warnings.filterwarnings("ignore")

import os
import copy

import numpy as np
import pandas as pd
import scanpy as sc
import scanpy.external as sce
from anndata import AnnData

import matplotlib
from matplotlib import cm
import matplotlib.pyplot as plt
import matplotlib.patheffects as path_effects

import sys
sys.path.append("../../lib")
from stpalette import palette_WM4007_rna, palette_WM4237_rna
from plots import plotSpatialAll, plotSpatialEdge, euclidean_p1
from cheatmap import sMED, cdist, cplot
from utils import loadAdImage, filterCNV

model = 'WM4007'
palette_rna = palette_WM4007_rna if model=='WM4007' else palette_WM4237_rna

preprocessedStDataPath = 'd:/from HPCC 11 28 2022/results_NF1-nod-t2t-k35/%s/' % model
dataPath = '../../data/'
ids = sorted(np.loadtxt(dataPath + 'ids_%s_ST.txt' % model, dtype=str))
sids = [id[7:12] for id in ids]
from scipy.spatial.distance import cdist, pdist, squareform

ad_all = sc.read(dataPath + 'ad_all_human_clustered_st_%s.h5ad' % model)
ad_all = ad_all[ad_all.obs['human_ratio']>=0.25]
filter_spots_index = ad_all.obs.index.copy()
读取空间数据

以下内容为付费内容,定价 ¥200.00点击修改

ad_cnv = sc.read(dataPath + 'ad_all_human_clustered_cnv_%s.h5ad' % model)
ad_cnv = ad_cnv[ad_cnv.obs.index.intersection(filter_spots_index), :]

plt.rcParams["figure.figsize"] = (4, 3)
sc.pl.embedding(ad_cnv[ad_cnv.obs.index.intersection(filter_spots_index), :], color=['cluster', 'cnv_clusters'], basis="X_umap", title='time', palette=palette_rna, size=25)

dfc = ad_cnv.obs[['cnv_clusters', 'sample']].value_counts().unstack().fillna(0).astype(int)
dfc

cnv Burden
use_raw_distances = False
metric = 'cosine'

if use_raw_distances:
    df_cnv = ad_cnv.to_df()
    df_cnv.index = ad_cnv.obs['cnv_clusters']
else:
    df_cnv = pd.DataFrame(ad_cnv.obsm['X_pca'], index=ad_cnv.obs['cnv_clusters'])
    
uclusters = list(ad_cnv.obs['cnv_clusters'].unique())

res = dict()
for c1 in uclusters:
    for c2 in uclusters:
        if (c1!=c2) and (not (c1, c2) in res.keys()) and (not (c2, c1) in res.keys()):
            print(c1, '\t', c2, '\t', df_cnv.loc[c1].shape[0], df_cnv.loc[c2].shape[0])
            d = cdist(df_cnv.loc[c1], df_cnv.loc[c2], metric=metric).mean() # euclidean cosine
            res.update({(c1, c2): d})
df_pdist = pd.Series(res).unstack()
df_ppdist = df_pdist.reindex(uclusters).T.reindex(uclusters).fillna(0.) + df_pdist.reindex(uclusters).T.reindex(uclusters).T.fillna(0.)

df_ppdist = df_pdist.reindex(uclusters).T.reindex(uclusters).fillna(0.) + df_pdist.reindex(uclusters).T.reindex(uclusters).T.fillna(0.)
df_styled = df_ppdist.style.background_gradient(axis=None, vmin=df_ppdist[df_ppdist!=0].min().min(), cmap='bwr')
df_styled.to_excel(f'{model}_PCA_cosine_distance_average_over_cnv_clusters.xlsx')
df_styled

Load spatial
images = {id: [(ad_temp := sc.read(preprocessedStDataPath + '%s/st_adata_processed.h5ad' % id)).uns['spatial'], ad_temp.obs.index, ad_temp.obsm['spatial']] for id in ids}

ads = dict()
for sample in ids:
    print(sample, end='\t')
    ads[sample] = ad_cnv[ad_cnv.obs['id']==sample, :].copy()    
    ads[sample].uns['spatial'] = images[sample][0]
    ads[sample].obsm['spatial'] = pd.DataFrame(index=images[sample][1], data=images[sample][2]).reindex(ads[sample].obs['original_barcode']).values
    ads[sample][:, :] = filterCNV(ads[sample].to_df().T, n=10).T

plotSpatialAll(ads, identity='cnv_clusters', palette=palette_rna, f=0.75)

plotSpatialAll({id: ads[id] for id in ids if not '_TE_' in id and '_S2_' in id}, identity='cnv_cosine_distance', palette=palette_rna, nx=6, f=0.6, cmap='nipy_spectral', uniformColorScale=True, wspace=0.3)

params = dict(fontsize=12, markerscale=1.5, linewidth_factor=2.0, alpha=0.5, alpha_img=0.5, decay_power=4, absolute_cutoff=None, R=1, # 20
              alpha_quantile=0.99, method='cosine', cacheEdges=False, useSavedCache=False)

res=plotSpatialEdge({id: ads[id] for id in ids if not '_TE_' in id}, identity='cluster', palette=palette_rna, f=1.0, invert=False, use_raw_distances=False, noplot=False, **params)

params = dict(fontsize=12, markerscale=1.25, maintainScale=False, linewidth_factor=2.0, alpha=0.65, alpha_img=0.5, decay_power=4, absolute_cutoff=None, R=1, # 20
              alpha_quantile=0.99, method='cosine', cacheEdges=False, useSavedCache=False)

res=plotSpatialEdge({id: ads[id] for id in ids if (('_T0_' in id) and (not '_S2_' in id)) or  (('_T1_' in id) and (not '_S2_' in id)) or (('_TC_' in id) and (not '_S1_' in id))},
                    identity='cluster', palette=palette_rna, f=0.75, nx=3, invert=False, use_raw_distances=False, noplot=False, suptitle=False,  **params)

params = dict(fontsize=12, markerscale=1.25, maintainScale=False, linewidth_factor=2.0, alpha=0.65, alpha_img=0.5, decay_power=4, absolute_cutoff=0.8, R=1, # 20
              alpha_quantile=0.99, method='cosine', cacheEdges=False, useSavedCache=False)

res=plotSpatialEdge({id: ads[id] for id in ids if (('_T0_' in id) and (not '_S2_' in id)) or  (('_T1_' in id) and (not '_S2_' in id)) or (('_TC_' in id) and (not '_S1_' in id))},
                    identity='cluster', palette=palette_rna, f=0.75, nx=3, invert=False, use_raw_distances=False, noplot=False, suptitle=False,  **params)

params = dict(fontsize=12, markerscale=1.25, maintainScale=False, linewidth_factor=2.0, alpha=0.65, alpha_img=0.5, decay_power=4, absolute_cutoff=0.8, R=1, # 20
              alpha_quantile=0.99, method='cosine', cacheEdges=False, useSavedCache=False)

res=plotSpatialEdge({id: ads[id] for id in ids if (('_T0_' in id) and (not '_S2_' in id)) or  (('_T1_' in id) and (not '_S2_' in id)) or (('_TC_' in id) and (not '_S1_' in id))},
                    identity='cnv_clusters', palette=palette_rna, f=0.75, nx=3, invert=False, use_raw_distances=False, noplot=False, suptitle=False,  **params)

cnv_palette = {str(i): cm.Set3(c-1) for i, c in enumerate([2,1,7,9,5,11,6,10,8,4,3])}
cnv_palette.update({'T0': '#DE8C00', 'T1': '#7CAE00', 'T2': '#00C08B', 'T3': '#B79F00', 'T4': '#C77CFF', 'TC': '#FF64B0', '-1': 'white'})

from vplot import wrapVplotRatio, wrapVplotScore, wrapVplotGene, vplot

params = dict(fontsize=12, markerscale=1.25, maintainScale=False, linewidth_factor=2.0, alpha=0.95, alpha_img=0.5, decay_power=4, absolute_cutoff=2.0, R=1, # 20
              alpha_quantile=0.99, method='cosine', cacheEdges=False, useSavedCache=False)

res=plotSpatialEdge({id: ads[id] for id in ids if (('_T0_' in id) and (not '_S2_' in id)) or  (('_T1_' in id) and (not '_S2_' in id)) or (('_TC_' in id) and (not '_S1_' in id))},
                    identity='cnv_clusters', palette=cnv_palette, f=0.75, nx=3, invert=False, use_raw_distances=False, noplot=False, suptitle=False,  **params)

params = dict(fontsize=12, markerscale=1.25, maintainScale=False, linewidth_factor=2.0, alpha=0.95, alpha_img=0.5, decay_power=4, absolute_cutoff=0.8, R=1, # 20
              alpha_quantile=0.99, method='cosine', cacheEdges=False, useSavedCache=False)

res=plotSpatialEdge(ads, identity='cnv_clusters', palette=cnv_palette, f=0.75, nx=4, invert=False, use_raw_distances=False, noplot=False, suptitle=False,  **params)

def wrapVplotScoreLoc(score, genesDicts=None, identity=None, selected=[], palette=None, ads=None, ids=None, vprefix='score_', **kwargs):

    if len(selected)==0:
        selected = pd.Series(np.hstack([ads[id].obs[identity] for id in ids])).sort_values().unique().tolist()

    df = pd.concat([pd.Series(ads[id][ads[id].obs[identity].isin(selected), :].obs[vprefix + score].fillna(0).values) for id in ids], keys=ids, axis=1)
    dfc = pd.concat([pd.Series(ads[id][ads[id].obs[identity].isin(selected), :].obs[identity].values.astype(int)) for id in ids], keys=ids, axis=1)
    
    return vplot(df, dfc, name=vprefix + score, palette=palette, **kwargs)

for sample in ids:
    ads[sample].obs['score_avgdist'] = ads[sample].obs['avgdist']
wrapVplotScoreLoc('avgdist', identity='cnv_clusters', palette=cnv_palette, ads=ads, ids=ids, vmin=-0.1)
# wrapVplotScoreLoc('avgdist', identity='cluster', palette=palette_rna, ads=ads, ids=ids)

def sMED(v1, v2):

    """Simple Minimum Event Distance (MED), similar (but not equivalent!) to the one implemented in MEDALT.
    MED is the minimal number and the series of single-copy gains or losses that are required to evolve one genome to another.
    """
    a = v1 - v2
    a = a[np.roll(a, -1) != a]
    a = np.abs(a[a != 0]).sum()
    return int(a)

v1 = np.array([0,0,0,0,1,1,1,1,0,0,0,0,-1,-1,-1,0,0,0,1,1,0,0,0,1])
v2 = np.array([0,0,0,0,0,1,1,0,0,0,0,0,-1,-1,-1,0,0,0,-1,-1,0,0,0,1])

plt.rcParams["figure.figsize"] = (4, 4)
ad_cnv.obs['resistant'] = ((ad_cnv.obs['cluster'].isin(['8'])) & (ad_cnv.obs['time'].isin(['T4']))).astype(int).astype(str).astype('category')
ad_cnv.obs['resistantEarly'] = ((ad_cnv.obs['cluster'].isin(['8'])) & (ad_cnv.obs['time'].isin(['T1', 'T2', 'T3']))).astype(int).astype(str).astype('category')
sc.pl.embedding(ad_cnv, color=['time', 'cluster', 'resistant', 'resistantEarly'], basis="X_umap", palette=palette_rna, size=25)

identity = 'resistant'
sc.tl.embedding_density(ad_cnv, basis='umap', groupby=identity)
sc.pl.embedding_density(ad_cnv, groupby=identity, ncols=10, s=5)

ad_all = sc.read(dataPath + 'ad_all_human_clustered_st_%s.h5ad' % model)

ad_all[(ad_all.obs['cluster'].isin(['8'])) & (ad_cnv.obs['time'].isin(['T1', 'T2', 'T3', 'T4']))

identity = 'resistantEarly'
sc.tl.embedding_density(ad_cnv, basis='umap', groupby=identity)
sc.pl.embedding_density(ad_cnv, groupby=identity, ncols=10, s=50)

sc.pl.embedding(ad_cnv, color=['resistant', 'resistantEarly'], basis="X_umap", palette=palette_rna, size=5)

Load images and create ADs
images = {id: [(ad_temp := sc.read(preprocessedStDataPath + '%s/st_adata_processed.h5ad' % id)).uns['spatial'], ad_temp.obs.index, ad_temp.obsm['spatial']] for id in ids}

ads = dict()
for sample in ids:
    print(sample, end='\t')
    ads[sample] = ad_cnv[ad_cnv.obs['id']==sample, :].copy()    
    ads[sample].uns['spatial'] = images[sample][0]
    ads[sample].obsm['spatial'] = pd.DataFrame(index=images[sample][1], data=images[sample][2]).reindex(ads[sample].obs['original_barcode']).values
    ads[sample][:, :] = filterCNV(ads[sample].to_df().T, n=10).T


params2 = dict(fontsize=12, markerscale=1.5, linewidth_factor=1.0, alpha=1.0, alpha_img=1.0, decay_power=1,
               absolute_cutoff=0.4, alpha_quantile=0.75, method='pearson', cacheEdges=False) 

df_info = plotSpatialEdge({(id:='WM4007_T3_S1_ST'): ads[id]}, identity='cluster', palette=palette_rna, f=2, invert=False, **params2)

ad = ads['WM4007_T3_S1_ST']

df_temp, df_meta_temp = ad.to_df().T, ad.obs[['time', 'cluster']].sort_values(by=['time', 'cluster'], ascending=False)
df_temp = df_temp[df_meta_temp.index]
cplot(df_temp, df_meta_temp, sampleMED=1000, clusterObsByGroups=False, optimalOrderingForObs=False,
            palette=palette_rna, clusterVar=False, clusterObs=True, addLinesOnHeatmap=False, addLinesOnGroups=False, useMEDforObsGroups=True, useMEDforObs=True,
            colorbarLabel='InferCNV CNV', colorbarLabels=['Ampl.', 'Del.'], safetyLimit=2000, figsize=[14,8], dendrogramLineWidth=1.0);

dataPath = '../../data/'
dfc_WM4007 = sc.read(dataPath + 'ad_all_scaled_unfiltered_st_WM4007.h5ad').obs.set_index(['T', 'sample'])['in_tissue'].groupby(level=[0,1]).count().replace(0, np.nan).dropna()
dfc_WM4007_h = sc.read(dataPath + 'ad_all_human_clustered_st_WM4007.h5ad').obs.set_index(['T', 'sample'])['in_tissue'].groupby(level=[0,1]).count().replace(0, np.nan).dropna()

params2 = dict(fontsize=12, markerscale=1.5, linewidth_factor=1.0, alpha=1.0, alpha_img=1.0, decay_power=2, absolute_cutoff=70, alpha_quantile=0.75, method='MED', cacheEdges=False)

plotSpatialEdge(ads, identity='cluster', palette=palette_rna, f=1., invert=False, **params2)

params = dict(fontsize=12, markerscale=1.5, linewidth_factor=1.0, alpha=1.0, alpha_img=1.0, decay_power=2, absolute_cutoff=70, method='MED', alpha_quantile=0.75, cacheEdges=True)

plotSpatialEdge(ads, identity='cluster', palette=palette_rna, f=1., invert=False, **params)

示例数据上传到了https://images.jax.org/webclient/?show=project-1451,大家有意随意下载测试。
生活很好,有你更好

标签:CNV,cnv,palette,层级,False,ad,10X,id,ads
From: https://blog.csdn.net/weixin_53637133/article/details/139429943

相关文章

  • react中推荐使用发布订阅模式,进行跨多层级的组件间通信和事件传递吗?
    在React中,虽然发布订阅模式(Pub/Sub)可以作为一种实现跨多层级组件间通信的方法,但它并不是React官方推荐的主要手段,尤其是在ReactHooks和ContextAPI普及之后。React推荐的跨组件通信方法主要包括:Propsdrilling:最直接的方式,通过props从父组件向子组件传递数据,适合简单的数据流......
  • YOLOv10涨点改进:卷积魔改 | 可变形条带卷积(DSCN),魔改轻量DCNv3二次创新
     ......
  • 3DS MAX备忘笔记(命令-全层级通用)
    全层级可用命令重置变换:l 清除对模型变换操作的记录。(防止变换层级滞留影响后续精细操作)l 快速添加重置变换修改器:选定对象-实用程序-重置变换-重置选定内容,(添加重置变换后再CTRLz撤回可能出错)l 重置变换可以同时对多对象使用l 修改器搜索添加的重置变换不好使(???l ......
  • 3DS MAX备忘笔记(命令-点层级)
    点层级焊接:l 修复破面,可以设置阈值,全选点后一键焊接所有挨得近的重复点l 合并两物体,可以设置阈值,全选两破面所含的点后一键焊接所有挨得近的重复点,连接两个破面 目标焊接:可以跨物体焊接两个破面的点(无阻挡)移除backspace退格:保留面删除点删除delete:直接连带删除点周......
  • unity通过Transform:管理游戏对象的层级关系
    unity中可以通过Transform类来管理游戏对象的层级关系,查询相关组件。1.获取游戏对象的父类,打印出来Debug.Log(transform.parent);设置父对象,以照相机为例子,查询主摄像机(main代表主摄像机transform.SetParent(Camera.main.transform);2. 当前对象,获取根对象(最上方的那个......
  • AJ-Report 认证绕过与远程代码执行漏洞(CNVD-2024-15077)
    AJ-Report是全开源的一个BI平台。在其1.4.0版本及以前,存在一处认证绕过漏洞,攻击者利用该漏洞可以绕过权限校验并执行任意代码。补丁对比方法一从docker拖出代码,去gitee下载发行版,便于对比编译后的class。方法二查看git的commit记录,可以直接看到修改了哪些内容!后面要去学习......
  • 记录一次cnvd事件型证书漏洞挖掘
    事件起因是因为要搞毕设了,在为这个苦恼,突然负责毕设的老师说得到cnvd下发的证书结合你的漏洞挖掘的过程是可以当成毕设的,当时又学习了一段时间的web渗透方面的知识,于是踏上了废寝忘食的cnvd证书漏洞挖掘的日子。前言:听群友们说,一般可以获得cnvd事件型的证书要三大运营商、铁塔的......
  • 视觉语音识别挑战赛 CNVSRC 2024
        CNVSRC2024由NCMMSC2024组委会发起,清华大学、北京邮电大学、海天瑞声、语音之家共同主办。竞赛的目标是通过口唇动作来推断发音内容,进一步推动视觉语音识别技术的发展。视觉语音识别(也称为读唇技术)是一种通过观察唇部动作推断发音内容的技术,广泛应用于公共安全、......
  • 用层级理解冲突
    概述当上层软件提供了名称相同,但功能不同的api时,下层使用该api的软件就会不知道如何选择,或随机选择。进而可能造成软件故障,崩溃。所以起冲突的是上层软件,造成影响的却是下层软件。示例背景假设有一个小型的软件生态系统,包含两个上层库——LibraryA和LibraryB,以及一个下层的应......
  • 世微 AP510X 单路低压差线性恒流芯片 LED手电筒景观亮化台灯车灯照明
    AP510X是一系列电路简洁的单路线性LED恒流芯片,适用于3-60V电压范围的LED恒流调光领域。AP510X采用我司算法,可以实现高精度的恒流效果,输出电流恒流精度≤±3%,电源供电工作范围为3-40V,可以轻松满足锂电池以及市场上面中低压的应用需求。PWM调光支持高辉应用,可以支持20K以上的调光频......