Peking University Bioinformatics Platform Forum

Speeding up R code and reading large data sets

licheng posted on 2015-12-23 13:41:16
From: Cheng Li
Date: Thu, Nov 19, 2015 at 10:56 AM

Ways to speed up R code:
http://www.seekingqed.com/programming/r/accelerate_r

http://blog.sciencenet.cn/blog-285393-887611.html

http://www.zhihu.com/question/29263036


There is a book called The R Inferno, by Patrick Burns, that catalogs the many small pitfalls of R programming and how to work around them:
http://www.burns-stat.com/documents/books/the-r-inferno/
www.burns-stat.com/pages/Tutor/R_inferno.pdf


http://arxiv.org/abs/1503.00855v1
Mini-book, a guide to speeding up R: "How to speed up R code: an introduction", Nathan Uyttendaele (2015)

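In R we often create a data.frame and then append new rows to it. Estimating how many rows the data frame will need and allocating that memory in advance makes this faster: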

http://stackoverflow.com/questions/20689650/how-to-append-rows-to-an-r-data-frame
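
As a rough illustration of the preallocation idea, the sketch below compares growing a data.frame row by row with rbind() against preallocating the columns as vectors and assembling the data.frame once at the end. The function names grow_df and prealloc_df, the columns id and value, and the row count are made up for this example; they do not come from the linked pages.

```r
# Slow pattern: grow the data frame one row at a time with rbind().
# Every iteration copies all rows accumulated so far.
grow_df <- function(n) {
  df <- data.frame(id = integer(0), value = numeric(0))
  for (i in seq_len(n)) {
    df <- rbind(df, data.frame(id = i, value = i^2))
  }
  df
}

# Faster pattern: allocate full-length columns up front, fill them
# in place, and build the data frame only once.
prealloc_df <- function(n) {
  id <- integer(n)      # preallocated integer column
  value <- numeric(n)   # preallocated double column
  for (i in seq_len(n)) {
    id[i] <- i
    value[i] <- i^2
  }
  data.frame(id = id, value = value)
}

n <- 2000  # assumed row count, known or estimated in advance
t_grow <- system.time(a <- grow_df(n))
t_prealloc <- system.time(b <- prealloc_df(n))
print(rbind(grow = t_grow, prealloc = t_prealloc))
```

If the final number of rows is not known exactly, one option is to allocate a generous upper bound and trim the unused rows afterwards.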



Load large data set into R
2010

Q: I am trying to load a 193 by ~1 million data set into R. Apparently it is too big for R to handle. I tried the "FF" package, but had no luck with that either. Is there an easy way to load this data into R?

A: See this link for memory management in R:
http://ggorjan.blogspot.com/2008/12/memory-limit-management-in-r.html

This matrix needs about 1.5 GB of memory: 193 x 1,000,000 values at 8 bytes each, since R stores every number as a double-precision float. So a computer with only 2 GB of RAM will struggle with it.

> memory.size()
[1] 10.58218
> memory.limit()
[1] 3583.875
> a <- matrix(0, nrow=193, ncol=1e6)
> memory.size()
[1] 1482.903
> memory.limit()
[1] 3583.875
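
The 1.5 GB figure can be checked without allocating the full matrix. The sketch below repeats the arithmetic and confirms the 8-bytes-per-value rule on a much smaller matrix using object.size(). The variable names are made up for this example. Note that memory.size() and memory.limit() in the session above are Windows-specific, and recent versions of R may no longer support them, whereas object.size() works on any platform.

```r
# Estimated size of a 193 x 1e6 numeric matrix at 8 bytes per double.
n_row <- 193
n_col <- 1e6
bytes <- n_row * n_col * 8

size_gb <- bytes / 1e9      # decimal gigabytes
size_mb <- bytes / 1024^2   # megabytes, the unit reported by memory.size()
print(c(GB = size_gb, MB = size_mb))

# Check the 8-bytes-per-value rule on a small matrix of the same height.
# object.size() also counts a small fixed header, so it is slightly
# larger than rows * cols * 8.
small <- matrix(0, nrow = n_row, ncol = 1000)
small_bytes <- as.numeric(object.size(small))
print(small_bytes / (n_row * 1000))  # bytes per value, close to 8
```

The estimate comes to roughly 1,472 MB, which is in line with the jump in memory.size() between the two calls in the session above.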


You can use these read.table() options to read only a subset of the data (e.g. 1000 rows) while you design and write your code:
nrows
integer: the maximum number of rows to read in. Negative and other invalid values are ignored.
skip
integer: the number of lines of the data file to skip before beginning to read data.
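
A minimal sketch of how nrows and skip might be used together. The temporary file, its tab-separated layout, and the column names id and x are stand-ins invented for this example; the real data set would have its own path and format.

```r
# Hypothetical demo file standing in for the real data set.
path <- tempfile(fileext = ".txt")
demo <- data.frame(id = 1:5000, x = rnorm(5000))
write.table(demo, path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read only the first 1000 data rows; the header line is not
# counted against nrows.
head_part <- read.table(path, header = TRUE, sep = "\t", nrows = 1000)

# Read 1000 rows from the middle of the file: skip the header line
# plus the first 2000 data rows, and reuse the column names read above.
mid_part <- read.table(path, header = FALSE, sep = "\t",
                       skip = 1 + 2000, nrows = 1000,
                       col.names = names(head_part))

print(dim(head_part))
print(range(mid_part$id))
```

Because skip counts lines from the top of the file, the header has to be included in the count and the column names supplied separately when reading a later slice.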

When the code is ready, apply it to the full data set. In the meantime, let's explore how to handle large data sets in R, e.g. with an SQL database or the ff package:

http://n4.nabble.com/Large-data-sets-with-R-binding-to-hadoop-available-td850849.html

http://yusung.blogspot.com/2007/09/dealing-with-large-data-set-in-r.html

Cheng


