highly variable gene | selecting highly variable genes | feature selection
Published: 2019-06-14

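In single-cell analysis, many genes are just noise: their expression varies without any pattern, or it does not change significantly. We need to remove these genes before any downstream analysis.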

Below is one method for finding highly variable genes:

The feature selection procedure is based on the largest difference between the observed coefficient of variation (CV) and the predicted CV (estimated by a non-linear noise model learned from the data). See Figure S1C. In particular, Support Vector Regression (SVR, Smola and Vapnik, 1997) was used for this purpose (scikit-learn python implementation, default parameters with gamma = 0.06; Pedregosa et al., 2011).

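import numpy as np
import matplotlib.pyplot as plt

# Pre-filtering (rows are genes, columns are cells)
df_f = df_merge.copy()
df_f = df_f.loc[(df_f >= 1).sum(axis=1) >= 5, :]  # value is at least 1 in at least 5 cells
df_f = df_f.loc[(df_f >= 2).sum(axis=1) >= 2, :]  # value is at least 2 in at least 2 cells
df_f = df_f.loc[(df_f >= 3).sum(axis=1) >= 1, :]  # value is at least 3 in at least 1 cell

# Fitting
mu = df_f.mean(axis=1).values
sigma = df_f.std(axis=1, ddof=1).values
cv = sigma / mu
# fit_CV comes from the original analysis code and is not defined in this post
score, mu_linspace, cv_fit, params = fit_CV(mu, cv, 'SVR', svr_gamma=0.005)

# Plotting
def plot_cvmean():
    # thrs (number of top-scoring genes to highlight) must be defined beforehand;
    # it is not set anywhere in this post
    order = np.argsort(score)[::-1]  # highest score first
    mu_sorted = mu[order]
    cv_sorted = cv[order]
    plt.figure()
    plt.scatter(np.log2(mu), np.log2(cv), marker='o', edgecolor='none', alpha=0.1, s=5)
    plt.scatter(np.log2(mu_sorted[:thrs]), np.log2(cv_sorted[:thrs]),
                marker='o', edgecolor='none', alpha=0.15, s=8, c='r')
    plt.plot(mu_linspace, cv_fit, '-k', linewidth=1, label='$Fit$')
    plt.plot(np.linspace(-9, 7), -0.5 * np.linspace(-9, 7), '-r', label='$Poisson$')
    plt.ylabel('log2 CV')
    plt.xlabel('log2 mean')
    plt.grid(alpha=0.3)
    plt.xlim(-8.6, 6.5)
    plt.ylim(-2, 6.5)
    plt.legend(loc=1, fontsize='small')
    plt.gca().set_aspect(1.2)

plot_cvmean()
# Adjusting plot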

For each gene, compute the mean and the CV of its expression across cells and draw them as a scatter plot, then fit the noise curve with SVR.
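The `fit_CV` helper used above is not shown in this post, so here is a minimal sketch of what such a fit could look like with scikit-learn's `SVR`. The function name `fit_cv_svr`, the choice to fit in log2 space, the definition of the score as observed minus predicted log2 CV, and the synthetic Poisson data are all assumptions made for illustration. Only the use of SVR and gamma = 0.06 come from the description quoted earlier.

```python
import numpy as np
from sklearn.svm import SVR


def fit_cv_svr(mu, cv, gamma=0.06):
    """Hypothetical stand-in for fit_CV: model log2(CV) as a function of log2(mean)."""
    log2_mu = np.log2(mu)
    log2_cv = np.log2(cv)
    model = SVR(gamma=gamma)  # RBF kernel, other parameters left at their defaults
    model.fit(log2_mu[:, None], log2_cv)
    # Score = how far each gene lies above the fitted noise curve (assumed definition)
    score = log2_cv - model.predict(log2_mu[:, None])
    # Evaluate the fitted curve on an even grid, e.g. for plotting
    mu_grid = np.linspace(log2_mu.min(), log2_mu.max(), 50)
    cv_fit = model.predict(mu_grid[:, None])
    return score, mu_grid, cv_fit


# Synthetic demo data: 200 genes x 50 cells of Poisson counts (illustration only)
rng = np.random.default_rng(0)
lam = rng.uniform(0.5, 20, size=(200, 1))
counts = rng.poisson(lam=lam, size=(200, 50)).astype(float)

mu = counts.mean(axis=1)
sigma = counts.std(axis=1, ddof=1)
keep = (mu > 0) & (sigma > 0)  # log2 needs strictly positive mean and CV
mu, sigma = mu[keep], sigma[keep]
cv = sigma / mu

score, mu_grid, cv_fit = fit_cv_svr(mu, cv)
```

Genes with a large positive `score` sit well above the fitted curve, that is, they vary more than the noise model predicts for their expression level.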

The highly variable genes are then the ones with the largest difference between the observed coefficient of variation (CV) and the predicted CV (estimated by a non-linear noise model learned from the data).
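As a small sketch of this selection step: given one score per gene (observed CV minus predicted CV), sort the scores in descending order and keep the top N genes. The function name `top_variable_genes`, the gene names, and the score values below are made up for illustration.

```python
import numpy as np


def top_variable_genes(gene_names, score, n_top):
    """Return the n_top gene names with the highest score (hypothetical helper)."""
    order = np.argsort(score)[::-1]  # indices sorted by score, highest first
    return [gene_names[i] for i in order[:n_top]]


# Toy scores: observed CV minus predicted CV for four genes
genes = ["geneA", "geneB", "geneC", "geneD"]
score = np.array([0.1, 1.5, -0.3, 0.8])

selected = top_variable_genes(genes, score, n_top=2)
print(selected)  # ['geneB', 'geneD']
```

How many genes to keep is a judgment call that depends on the dataset. The post does not state which threshold was used.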

Reposted from: https://www.cnblogs.com/leezx/p/8631812.html
