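Cancer Research on Prevention and Treatment, 2015, Vol. 42, Issue 5: 446-449
This journal is supervised by the National Health and Family Planning Commission and sponsored by the Hubei Provincial Health Department, the China Anti-Cancer Association, and Hubei Cancer Hospital.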

Article information

ZHANG Xinli, ZHAO Yanhua, LI Shu'na, ZHANG Wenling. 2015.
Screening of Long Non-coding RNA in Colon Cancer by Bioinformatics
Cancer Research on Prevention and Treatment, 2015, 42(05): 446-449
http://www.zlfzyj.com/CN/10.3971/j.issn.1000-8578.2015.05.005

Article history

Received: 2014-08-14
Revised: 2014-10-27
Screening of Long Non-coding RNA in Colon Cancer by Bioinformatics
ZHANG Xinli, ZHAO Yanhua, LI Shu'na, ZHANG Wenling
Department of Medical Laboratory, Xiangya School of Medicine, Central South University, Changsha 410013, China
Abstract: Objective To screen for differentially expressed long non-coding RNAs (lncRNAs) in colon cancer and examine their expression. Methods The publicly available gene chip dataset GSE41328, comprising 10 pairs of colon cancer and normal tissues, was downloaded from the Gene Expression Omnibus (GEO) of the National Center for Biotechnology Information (NCBI). The data were analyzed with Significance Analysis of Microarrays (SAM) software to identify differentially expressed probes. The probes were re-annotated using online databases to obtain lncRNA names, and the Gene Cluster and TreeView software were then used to further verify lncRNA expression. Results Compared with normal tissues, 66 lncRNAs were differentially expressed in colon cancer tissues, of which 22 were up-regulated and 44 were down-regulated. For every lncRNA, the ratio of expression in colon cancer to that in normal tissue was greater than 2 or less than 0.5, and the differences were statistically significant. Conclusion Bioinformatics screening of colon cancer-related lncRNAs offers a new route to finding tumor biomarkers, but the results require further experimental validation.
Key words: Long non-coding RNA (lncRNA)    Colon cancer    Gene microarray
0 Introduction

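Colon cancer is one of the most common malignant tumors and ranks third among causes of cancer-related death worldwide [1]. In developed countries, nearly one quarter of colon cancer patients are already at an advanced stage when they are definitively diagnosed, and treatment outcomes are poor. Identifying suitable tumor markers is therefore important for early diagnosis and treatment.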

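Less than 2% of the human genome encodes proteins, whereas as much as 98% of the transcriptional output is non-coding RNA (ncRNA) [2]. By length, ncRNAs can be divided into short ncRNAs, medium-length ncRNAs, and long non-coding RNAs (lncRNAs) [3]. lncRNAs are transcripts longer than 200 nt that do not encode proteins [4, 5]. They take part in epigenetic regulation, chromatin remodeling, and the regulation of gene expression [6, 7], and play an important role in tumor initiation and progression. In this study, Significance Analysis of Microarrays (SAM) software was used to analyze colon cancer gene chip expression data from the NCBI GEO database. lncRNAs with significantly different expression in colon cancer were identified and re-annotated, and their expression was analyzed in order to screen for colon cancer-related lncRNAs.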

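1 Materials and Methods

1.1 Source of gene chip data

The raw data archive of the colon cancer gene chip dataset GSE41328 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE41328) was downloaded from the Gene Expression Omnibus (GEO), the public data platform of the National Center for Biotechnology Information (NCBI). The dataset comprises 10 colon cancer tissues and their paired normal tissues, and the chip platform is the Affymetrix Human Genome U133 Plus 2.0 Array (GPL570).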

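1.2 Data analysis with SAM software

The R language environment and the SAM software were downloaded from the Stanford University website, installed, and loaded into Microsoft Excel. The raw data archive was decompressed into TXT files, opened in Microsoft Excel, and arranged in the format required for analysis according to the SAM user instructions. SAM was then run with the two-class unpaired comparison selected, a t-test was performed, and the result plot was output. With the false discovery rate (FDR) controlled at about 1% and the fold change set above 2, the significantly differentially expressed genes were output.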
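The fold-change criterion can be illustrated with a minimal sketch. It assumes linear-scale, strictly positive intensities and compares group means; the function name `classify_fold_change` and the example values are hypothetical and are not taken from GSE41328 or from the SAM software itself.

```python
from statistics import mean


def classify_fold_change(tumor, normal, threshold=2.0):
    """Label a probe by the ratio of mean tumor to mean normal expression.

    Assumes linear-scale, strictly positive intensities (an assumption;
    log-scale data would need a difference instead of a ratio).
    """
    ratio = mean(tumor) / mean(normal)
    if ratio > threshold:
        return ratio, "up"
    if ratio < 1.0 / threshold:
        return ratio, "down"
    return ratio, "unchanged"


# Hypothetical intensities for three probes (not real GSE41328 values).
probes = {
    "probe_A": ([400.0, 420.0, 380.0], [100.0, 90.0, 110.0]),
    "probe_B": ([50.0, 60.0, 40.0], [200.0, 210.0, 190.0]),
    "probe_C": ([100.0, 105.0, 95.0], [100.0, 98.0, 102.0]),
}
calls = {name: classify_fold_change(t, n) for name, (t, n) in probes.items()}
```

With a threshold of 2, a probe is called up-regulated when the ratio exceeds 2 and down-regulated when it falls below 0.5, which matches the criterion stated for the reported lncRNAs.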

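1.3 Probe re-annotation

The genes output by SAM are all identified by probe ID. Genes were re-annotated by querying the probe IDs on the Ensembl, GeneCards, and DAVID websites. Alternatively, the TXT file for the GPL570 platform (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL570) was downloaded, the GB_ACC (GenBank Accession Number) of the corresponding probe was located in it, and the gene name was found from the GB_ACC on the above websites or in GenBank. A gene was regarded as an lncRNA only if its gene type was lncRNA, antisense, lincRNA, sense_overlapping, sense_intronic, or processed_transcript [8].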
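The biotype rule can be expressed as a simple filter. The sketch below is illustrative only: the annotation layout (probe ID mapped to a gene name and a biotype label), the function names, and the placeholder probe and gene names are all assumptions, not the export format of any of the websites named above.

```python
# Gene types accepted as lncRNA, as listed in the text (lower-cased here).
LNCRNA_BIOTYPES = {
    "lncrna", "antisense", "lincrna",
    "sense_overlapping", "sense_intronic", "processed_transcript",
}


def normalize_biotype(label):
    """Lower-case a biotype label and unify separators and the plural form."""
    key = label.strip().lower().replace(" ", "_")
    if key == "processed_transcripts":
        key = "processed_transcript"
    return key


def select_lncrna_probes(annotations):
    """Keep probe IDs whose annotated biotype is in the lncRNA set.

    `annotations` maps probe ID -> (gene name, biotype); this layout is
    a hypothetical one chosen for illustration.
    """
    return {
        probe: gene
        for probe, (gene, biotype) in annotations.items()
        if normalize_biotype(biotype) in LNCRNA_BIOTYPES
    }


# Hypothetical annotation table; probe IDs and gene names are placeholders.
annotations = {
    "probe_1": ("LNC_A", "lincRNA"),
    "probe_2": ("LNC_B", "antisense"),
    "probe_3": ("CODING_C", "protein_coding"),
    "probe_4": ("LNC_D", "sense intronic"),
}
lnc = select_lncrna_probes(annotations)
```

Normalizing the label before the lookup is a design choice that tolerates the spelling variants different annotation sources may use for the same gene type.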

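1.4 Further verification of lncRNA expression with Cluster and TreeView

The original data for the clearly annotated lncRNAs described above were extracted from the gene chip data, imported into Microsoft Excel, and saved in TXT format. The file was opened in the Cluster software with appropriate parameters (cut-off 0.1) and the Hierarchical analysis type selected. The resulting CDT file was imported into TreeView to obtain a heat map of the differentially expressed genes.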
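The idea behind hierarchical analysis can be sketched in a few lines. The text does not state which similarity metric or linkage was chosen in Cluster, so the sketch assumes a distance of 1 minus the Pearson correlation and average linkage; the function names and the four sample profiles are hypothetical, and this is not the Cluster implementation.

```python
from itertools import combinations
from statistics import mean


def pearson_distance(x, y):
    """Return 1 - Pearson correlation (profiles must not be constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return 1.0 - cov / (sx * sy)


def average_linkage(profiles):
    """Agglomerative clustering; returns the merge tree as nested tuples."""
    clusters = {name: [name] for name in profiles}  # label -> member names
    trees = {name: name for name in profiles}       # label -> nested tuple
    while len(clusters) > 1:
        def linkage(pair):
            # Mean pairwise distance between the members of two clusters.
            a, b = pair
            return mean(
                pearson_distance(profiles[i], profiles[j])
                for i in clusters[a] for j in clusters[b]
            )
        a, b = min(combinations(sorted(clusters), 2), key=linkage)
        merged = a + "+" + b
        clusters[merged] = clusters.pop(a) + clusters.pop(b)
        trees[merged] = (trees.pop(a), trees.pop(b))
    return next(iter(trees.values()))


def leaves(tree):
    """Flatten a nested-tuple tree into a list of leaf names."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


# Hypothetical expression of four lncRNAs in two tumor and two normal samples.
profiles = {
    "T1": [8.0, 7.5, 2.0, 1.5],
    "T2": [7.8, 7.9, 2.2, 1.0],
    "N1": [2.0, 1.8, 8.1, 7.7],
    "N2": [1.5, 2.2, 7.9, 8.3],
}
tree = average_linkage(profiles)
```

If the selected lncRNAs really separate the two tissue types, the two top-level branches of the tree should contain the tumor samples and the normal samples respectively, which is what a heat map with a sample dendrogram shows visually.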

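1.5 Statistical methods

Gene chip data were analyzed with SAM software. SAM is based on the t-test: it permutes the samples of the gene chip to derive the theoretical sampling distribution of the test statistic, and identifies significantly differentially expressed genes by controlling the FDR.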
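A simplified sketch of this kind of permutation analysis follows. It computes a t-like relative difference d = (mean1 - mean2) / (s + s0) for the two-class unpaired case and estimates the FDR for a fixed cutoff on |d| by shuffling sample labels. This is an assumption-laden simplification: the actual SAM software derives asymmetric cutoffs from a tuning parameter and corrects for the proportion of null genes, neither of which is reproduced here. The function names and the example rows are hypothetical.

```python
import random
from statistics import mean


def sam_d(group1, group2, s0=0.0):
    """Two-class unpaired relative difference d = (mean1 - mean2) / (s + s0)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    # Pooled standard error of the difference in means.
    s = ((1.0 / n1 + 1.0 / n2) * ss / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / (s + s0)


def permutation_fdr(rows, n1, cutoff, n_perm=200, s0=0.0, seed=0):
    """Estimate the FDR for |d| >= cutoff by shuffling sample labels.

    Simplified: symmetric cutoff, no correction for the null proportion.
    """
    rng = random.Random(seed)
    observed = sum(abs(sam_d(r[:n1], r[n1:], s0)) >= cutoff for r in rows)
    if observed == 0:
        return 0.0
    n_samples = len(rows[0])
    false_calls = 0
    for _ in range(n_perm):
        order = list(range(n_samples))
        rng.shuffle(order)  # the same relabeling is applied to every gene
        for r in rows:
            shuffled = [r[i] for i in order]
            if abs(sam_d(shuffled[:n1], shuffled[n1:], s0)) >= cutoff:
                false_calls += 1
    # Average number of calls under permutation divided by observed calls.
    return min(1.0, (false_calls / n_perm) / observed)


# Hypothetical data: each row is one probe, first 3 values are group 1.
rows = [
    [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],
    [1.0, 1.2, 0.9, 1.1, 1.0, 0.8],
]
fdr = permutation_fdr(rows, n1=3, cutoff=4.0, s0=0.1)
```

The small constant `s0` keeps probes with very low variance from producing extreme statistics; its value here is arbitrary.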

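2 Results

2.1 Differentially expressed genes between colon cancer and normal tissues

Compared with normal tissues, colon cancer showed 6,801 differentially expressed genes, of which 66 were clearly annotated lncRNAs after analysis: 22 were up-regulated and 44 were down-regulated. For all lncRNAs, the ratio of expression in colon cancer to that in normal tissue was greater than 2 or less than 0.5, and the differences were statistically significant (all P < 0.05); see Figure 1 and Table 1 (omitted).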

Figure 1 The SAM plot of GSE41328. The red dots and green dots represent up-regulated and down-regulated probes, respectively.
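2.2 Comparison of Cluster and TreeView results with SAM results

After the differentially expressed probes from the SAM analysis were re-annotated, 66 lncRNAs were found to have definite official names. The gene chip data for these 66 lncRNAs were collected and analyzed with Cluster and TreeView, and the resulting heat map was consistent with the SAM results; see Figure 2.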

Figure 2 Clustering heat map of 66 differentially expressed lncRNAs in 10 paired colon cancer tissues and normal tissues. Samples GSM1014798 to GSM1014802 and GSM1014808 to GSM1014812 were colon cancer tissues; samples GSM1014803 to GSM1014807 and GSM1014813 to GSM1014817 were normal colon tissues.
3 Discussion

Studies have linked lncRNAs to many human diseases, including tumors [9], cardiovascular diseases [10], inflammation [11], and autoimmune diseases [12]. Tumor-related lncRNAs are currently the most studied, for example metastasis-associated lung adenocarcinoma transcript 1 (MALAT1), HOX transcript antisense RNA (HOTAIR), and X-inactive specific transcript (Xist) [13, 14, 15].

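At present, lncRNAs are detected mainly with tiling arrays, RNA-seq, and lncRNA-specific arrays, but these methods are expensive and require special instruments. In recent years, studies have found that lncRNA expression can also be mined from existing gene chip data, because commercial gene chips carry some probes that specifically match lncRNAs [16, 17]; re-annotating these probes yields the names of the matching lncRNAs. This approach makes full use of the abundant existing gene chip data and also reduces research costs.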

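The Affymetrix Human Genome U133 Plus 2.0 Array contains 54,675 probes covering predicted exon sequences. These probes can be used to find protein-coding RNAs, non-coding RNAs, and expressed sequence tags (ESTs) that have not been clearly annotated. Each probe usually corresponds to one independent exon, while longer exons may be matched by several probes. Analyzing the raw chip data with SAM gives a fold change value for each probe; differentially expressed probes are selected by this value, which in turn gives the expression of the corresponding transcripts. Cluster and TreeView are then used to produce a visual heat map of gene expression, further verifying the expression of the differentially expressed genes.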

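Analysis of the colon cancer chip data identified 66 differentially expressed lncRNAs. Although these lncRNAs have definite names, the functions of most of them are unknown. Previous studies have found that colorectal neoplasia differentially expressed (CRNDE) is highly expressed in glioma, colorectal cancer, and adenoma [18, 19], and that it participates in the PI3K/Akt/mTOR and Raf/MAPK signaling pathways, regulating cell proliferation and apoptosis [20]. Plasmacytoma variant translocation 1 (PVT1) is highly expressed in tumor tissues including liver, gastric, colon, and breast cancer [21, 22, 23, 24]; the gene lies adjacent to c-myc, and the two cooperate to promote tumor cell growth [25]. LINC00152 is highly expressed in gastric cancer tissue and is associated with tumor invasion [26], but its role in colon cancer has not been reported. This study may therefore provide a convenient and rapid approach to screening tumor-related lncRNAs.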

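In summary, this study used bioinformatics methods to analyze existing colon cancer chip data and preliminarily screened lncRNAs differentially expressed in colon cancer, suggesting that lncRNA expression changes markedly in colon cancer. Although the results await further experimental validation, they offer new ideas for finding early tumor markers and new therapeutic targets, and lay a foundation for subsequent experiments.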

References
[1] Jemal A, Bray F, Center MM, et al. Global cancer statistics[J]. CA Cancer J Clin, 2011, 61(2): 69-90.
[2] Nie L, Wu HJ, Hsu JM, et al. Long non-coding RNAs: versatile master regulators of gene expression and crucial players in cancer[J]. Am J Transl Res, 2012, 4(2): 127-50.
[3] Da Sacco L, Baldassarre A, Masotti A. Bioinformatics tools and novel challenges in long non-coding RNAs (lncRNAs) functional analysis[J]. Int J Mol Sci, 2012, 13(1): 97-114.
[4] Cao WJ, Wu HL, He BS, et al. Analysis of long non-coding RNA expression profiles in gastric cancer[J]. World J Gastroenterol, 2013, 19(23): 3658-64.
[5] Maruyama R, Suzuki H. Long noncoding RNA involvement in cancer[J]. BMB Rep, 2012, 45(11): 604-11.
[6] Guttman M, Rinn JL. Modular regulatory principles of large non-coding RNAs[J]. Nature, 2012, 482(7385): 339-46.
[7] Nagano T, Fraser P. No-nonsense functions for long noncoding RNAs[J]. Cell, 2011, 145(2): 178-81.
[8] Derrien T, Johnson R, Bussotti G, et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression[J]. Genome Res, 2012, 22(9): 1775-89.
[9] Yang G, Lu X, Yuan L. LncRNA: A link between RNA and cancer[J]. Biochim Biophys Acta, 2014, 1839(11): 1097-109.
[10] Wang N, Luo YH, Deng JC, et al. LncRNAs and cardiovascular diseases[J]. Sheng Li Ke Xue Jin Zhan, 2014, 45(3): 172-6. (in Chinese)
[11] Wang H, Cao Q, Ge J, et al. LncRNA-regulated Infection and Inflammation Pathways Associated with Pregnancy Loss: Genome Wide Differential Expression of lncRNAs in Early Spontaneous Abortion[J]. Am J Reprod Immunol, 2014, 72(4): 359-75.
[12] Mirza AH, Kaur S, Brorsson CA, et al. Effects of GWAS-Associated Genetic Variants on lncRNAs within IBD and T1D Candidate Loci[J]. PLoS One, 2014, 9(8): e105723.
[13] Gutschner T, Hämmerle M, Diederichs S. MALAT1 -- a paradigm for long noncoding RNA function in cancer[J]. J Mol Med (Berl), 2013, 91(7): 791-801.
[14] Wu Y, Zhang L, Wang Y, et al. Long noncoding RNA HOTAIR involvement in cancer[J]. Tumour Biol, 2014, 35(10): 9531-8.
[15] Chaligné R, Heard E. X-chromosome inactivation in development and cancer[J]. FEBS Lett, 2014, 588(15): 2514-22.
[16] Pérot P, Cheynet V, Decaussin-Petrucci M, et al. Microarray-based identification of individual HERV loci expression: application to biomarker discovery in prostate cancer[J]. J Vis Exp, 2013, (81): e50713.
[17] Michelhaugh SK, Lipovich L, Blythe J, et al. Mining Affymetrix microarray data for long non-coding RNAs: altered expression in the nucleus accumbens of heroin abusers[J]. J Neurochem, 2011, 116(3): 459-66.
[18] Ellis BC, Molloy PL, Graham LD. CRNDE: A Long Non-Coding RNA Involved in CanceR, Neurobiology, and DEvelopment[J]. Front Genet, 2012, 3: 270.
[19] Graham LD, Pedersen SK, Brown GS, et al. Colorectal Neoplasia Differentially Expressed (CRNDE), a Novel Gene with Elevated Expression in Colorectal Adenomas and Adenocarcinomas[J]. Genes Cancer, 2011, 2(8): 829-40.
[20] Ellis BC, Graham LD, Molloy PL. CRNDE, a long non-coding RNA responsive to insulin/IGF signaling, regulates genes involved in central metabolism[J]. Biochim Biophys Acta, 2014, 1843(2): 372-86.
[21] Wang F, Yuan JH, Wang SB, et al. Oncofetal long noncoding RNA PVT1 promotes proliferation and stem cell-like property of hepatocellular carcinoma cells by stabilizing NOP2[J]. Hepatology, 2014, 60(4): 1278-90.
[22] Ding J, Li D, Gong M, et al. Expression and clinical significance of the long non-coding RNA PVT1 in human gastric cancer[J]. Onco Targets Ther, 2014, 7: 1625-30.
[23] Takahashi Y, Sawada G, Kurashige J, et al. Amplification of PVT-1 is involved in poor prognosis via apoptosis inhibition in colorectal cancers[J]. Br J Cancer, 2014, 110(1): 164-71.
[24] Zhang Z, Zhu Z, Zhang B, et al. Frequent mutation of rs13281615 and its association with PVT1 expression and cell proliferation in breast cancer[J]. J Genet Genomics, 2014, 41(4): 187-95.
[25] Riquelme E, Suraokar MB, Rodriguez J, et al. Frequent coamplification and cooperation between C-MYC and PVT1 oncogenes promote malignant pleural mesothelioma[J]. J Thorac Oncol, 2014, 9(7): 998-1007.
[26] Pang Q, Ge J, Shao Y, et al. Increased expression of long intergenic non-coding RNA LINC00152 in gastric cancer and its clinical significance[J]. Tumour Biol, 2014, 35(6): 5441-7.