Peking University Bioinformatics Platform Forum


Data safety: save intermediate data, back up data

licheng, posted 2015-12-23 13:33:56
Use a cloud drive to back up, sync, and share data, documents, photos, and so on.

Backing up files and data, 3/22/2014
【Cheng, 2014.3】

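My advice is to back up newly created files, programs, and run results every week. I often use "Super Flexible File Synchronizer" to mirror all my directories and files to a USB drive. Some lessons learned: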

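  • Keep backups in several different places, for example one copy at the office and one at home.
  • You can use online cloud storage, such as Baidu Cloud drive (百度云盘) or Evernote (印象笔记): pack the whole directory into an archive and upload it.
  • If data is deleted by accident, recover it with software as soon as possible. Deletion only removes the index entry in the disk's directory, so the physical file may still be recoverable until new data overwrites it. One example is the Android app "数据恢复神器" (Hexamob Recovery), which I used today to recover photos accidentally deleted from my phone; similar PC software includes EasyRecovery (which I have not tried).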

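【2016.1】Students who use our own servers must back up data (raw and intermediate) and code every week to a cloud drive or a USB hard drive. Otherwise, lost data or code could delay a project (and graduation) by months.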
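
As an illustration of the weekly mirror copy described above, here is a minimal R sketch. The helper name `backup_dir`, the date-stamped folder layout, and the example paths are assumptions made for this example; they are not part of the original post or of Super Flexible File Synchronizer.

```r
# Hypothetical helper: copy a project directory into a date-stamped folder
# under a backup location (for example a mounted USB drive).
backup_dir <- function(src, backup_root,
                       stamp = format(Sys.Date(), "%Y-%m-%d")) {
  stopifnot(dir.exists(src))
  dest <- file.path(backup_root, stamp)
  dir.create(dest, recursive = TRUE, showWarnings = FALSE)
  # recursive = TRUE copies the whole directory tree into dest
  ok <- file.copy(from = src, to = dest, recursive = TRUE, overwrite = TRUE)
  if (!all(ok)) stop("backup failed for: ", src)
  # Return the path of the copied directory without printing it
  invisible(file.path(dest, basename(src)))
}

# Example call (paths are placeholders, adjust to your own machine):
# backup_dir("~/projects/my_analysis", "/media/usb/backups")
```

Keeping one folder per date means an accidental overwrite in this week's copy does not destroy last week's copy, at the cost of more disk space.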


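【Cheng, 2014.3】The following advice concerns computations that take a long time. First, get the run working on a small amount of data before computing on all of it. Second, monitor the program while it runs and write partial results to a file at regular intervals. That way, even after a power failure or a crash, the partial results survive and a rerun does not have to start from scratch.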
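
The two points above can be sketched in R as follows. This is only one possible arrangement: `compute_one` is a hypothetical stand-in for the real per-item computation, and `run_with_checkpoints` and the `.rds` checkpoint file are names chosen for this example.

```r
# Hypothetical per-item computation standing in for the real analysis.
compute_one <- function(i) i^2

run_with_checkpoints <- function(n, out_file, every = 1000) {
  # Resume: if a checkpoint exists, continue after the last saved item.
  if (file.exists(out_file)) {
    results <- readRDS(out_file)
  } else {
    results <- numeric(0)
  }
  start <- length(results) + 1
  if (start <= n) {
    for (i in start:n) {
      results[i] <- compute_one(i)
      if (i %% every == 0 || i == n) {
        # Write to a temporary file first, then rename it, so that a crash
        # in the middle of writing does not damage the previous checkpoint.
        tmp <- paste0(out_file, ".tmp")
        saveRDS(results, tmp)
        file.rename(tmp, out_file)
      }
    }
  }
  results
}

# Try a small run first, then the full run with a separate checkpoint file:
# small <- run_with_checkpoints(10, "trial.rds", every = 5)
# full  <- run_with_checkpoints(10000, "full.rds", every = 1000)
```

If the full run is interrupted, calling it again with the same file picks up after the last checkpoint, so at most `every` items are recomputed.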


【Cheng】Protecting computed data
Jun 2010


Recent experiences highlight the importance of protecting our computed results, which can take days to generate but are vulnerable to power outages, hard disk failures, or code crashes. Here are some tips; your suggestions are welcome.
  • Don't wait till the end of computation to write your results to a file. Instead, periodically (e.g. for every 1000 genes) write out the current data and replace the output file. Or use the "append" option of write.table() in R to incrementally write out the newly computed results. This can be done with R code like:
for (i in 1:10000) {
     # your computation code
     if (i %% 1000 == 0) { # modulo operation to get remainder
         print(i)        # can be replaced by write.table
     }
}
[1] 1000
[1] 2000
[1] 3000
[1] 4000
  • Use code similar to the above to monitor progress (print i when looping through genes). This lets you estimate how long the code will run and prompts you to use efficient algorithms to cut the computing time. A 10-fold reduction from a clever algorithm is not uncommon, and less computing time means less chance of interruption. For example, use vector and matrix operations in R rather than loops or gene-wise R functions.
  • Have your code write copies of the result file to different physical hard disks.
  • Back up your code, write-ups, and result files regularly. I use Super Flexible File Synchronizer. You can also email important files to yourself via Gmail.
  • Sometimes we also receive email notices about a power outage on a particular night or a system upgrade of the cluster. Err on the side of caution.
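
The "append" tip and the second-copy tip from the list above could be combined roughly like this. It is a sketch under assumptions: `compute_gene`, `append_results`, and the `mirror_file` argument are hypothetical names for this example, and the mirror path is assumed to point at a different physical disk.

```r
# Hypothetical per-gene computation returning one row of results.
compute_gene <- function(i) data.frame(gene = i, score = sqrt(i))

append_results <- function(n, out_file, every = 1000, mirror_file = NULL) {
  buffer <- vector("list", every)
  k <- 0
  for (i in seq_len(n)) {
    k <- k + 1
    buffer[[k]] <- compute_gene(i)
    if (k == every || i == n) {
      chunk <- do.call(rbind, buffer[seq_len(k)])
      # Write the header only when the file is first created,
      # then append the later chunks without a header.
      first <- !file.exists(out_file)
      write.table(chunk, out_file, append = !first, col.names = first,
                  row.names = FALSE, sep = "\t", quote = FALSE)
      # Optionally refresh a second copy, ideally on another physical disk.
      if (!is.null(mirror_file)) {
        file.copy(out_file, mirror_file, overwrite = TRUE)
      }
      k <- 0
    }
  }
  invisible(out_file)
}

# Example call (file names are placeholders):
# append_results(10000, "scores.tsv", every = 1000,
#                mirror_file = "/mnt/disk2/scores.tsv")
```

Start from a file that does not exist yet; otherwise the new rows are appended to whatever the old file already contains.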


Date: 2015-06-12 22:40 GMT+08:00
Campus network notice: security notice on guarding against CryptoLocker and other ransomware
https://its.pku.edu.cn/announce/tz20150603135011.jsp


(WeChat official account: wegenome)


yup, posted 2016-1-13 08:37:27 from mobile
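Personally, I think backups should also have a hierarchical structure. Making a habit of keeping a clear, detailed work log and writing regular progress summaries helps capture the essentials. For your most important data, publishing it in a paper is the best backup of all: if it is good enough, databases around the world will back it up for you.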