Data? What to do with it.

Are you new to analyzing a lot of data?

If you are new to working with large batches of data, such as next-generation sequencing (NGS) data, then you may want to start with the Discovery Environment. It gives you a GUI (graphical user interface) for many analysis tools that run in Linux. Not used to working on a Linux or Unix system? Think of a time, maybe you saw it in some old movies, when minicomputers (yes, mini) looked like this:

They ran Unix or something close to it. Then operating systems and computers evolved, and today a lot of people carry more powerful systems than this one in their pockets, as phones. So maybe you aren't used to thinking in terms of how computers worked back in simpler times. But a simpler operating system and simpler software are a strength when it comes to analyzing large data files. Bigger, more complicated software often gets in the way. You can have the advantage of running software built for Unix, or for its more recently developed open source relative, Linux, without having to learn to use command line software. Running command line software looks like this:

[rogerab@hratius test3]$ bedtools intersect -wo -f 0.9 -r  -a features1.gff -b BAgenomeRay41_mkr_all5.gff > BAgenome_mkr_ovlp.gff

[rogerab@hratius test3]$ less BAgenome_mkr_ovlp.gff 

[rogerab@hratius test3]$ cut BAgenome_mkr_ovlp.gff -f 1,2,3,4,5,6,7,8,16 > BAgenome_mkr_ovrlap.gff

[rogerab@hratius test3]$ less BAgenome_mkr_ovrlap.gff 
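To give a feel for one step in the session above, here is a minimal sketch of what `cut -f` does: it keeps only the tab-separated columns you list. The two rows of data and the variable names `toy_rows` and `subset` are made up for illustration and are not taken from the real GFF files.

```shell
# Two made-up rows with four tab-separated columns each (not real GFF data).
toy_rows=$(printf 'chr1\tgene\t100\t200\nchr2\texon\t300\t400\n')

# Keep only columns 1, 3, and 4, the same way `cut -f` selects columns above.
subset=$(printf '%s\n' "$toy_rows" | cut -f 1,3,4)

# Show the rows that remain after the second column has been dropped.
printf '%s\n' "$subset"
```

The real session does the same thing on a much larger file, keeping nine of the columns that `bedtools intersect` wrote out.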

 

You're a scientist, so you can learn how to do this, but if it's not what you're used to, and you don't want to take the time right now, then the Discovery Environment is for you. It works with your data, and if you are reading this, chances are you have moved your data into the Data Store, or are in the process of doing that.

The Discovery Environment – work with a graphical user interface (GUI)

Using the Discovery Environment gives you an easy to use work environment, but...

You can't ignore that the analyses you are doing in the Discovery Environment actually run in the command line environment of Linux, so certain basic Linux rules still apply. Don't use spaces in file or directory names. Linux treats a space as a separator, so if you name a directory "My Sequence Files", a plain reference to it is read as "My". Everything after the first space is treated as something else, not as part of your directory name, and the system will not find your directory. Name it something like "My_Sequence_Files" or "MySequenceFiles" instead. The same is true for file names: no spaces. Avoid odd characters too, such as &, %, #, or $, because these often mean something different in the Linux environment.
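If you are curious why spaces cause trouble, the following is a small sketch you could try in a Linux shell. The helper names `count_args` and `safe_name` are hypothetical, invented here for illustration; they are not part of the Discovery Environment or of Linux itself.

```shell
# Hypothetical helper: report how many separate arguments the shell passed in.
count_args() {
    echo "$#"
}

count_args My Sequence Files      # unquoted: the shell sees three arguments
count_args "My Sequence Files"    # quoted: the shell sees one argument

# Hypothetical helper: swap spaces and a few special characters for underscores.
safe_name() {
    printf '%s\n' "$1" | tr ' &%#$' '_____'
}

safe_name "My Sequence Files"
```

Quoting does work on the command line, but analysis tools do not always quote names for you, so choosing names without spaces in the first place is the safer habit.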

Oh, and to actually start using the Discovery Environment, go here:

Using the Discovery Environment

 

Atmosphere - If you really need or prefer the command line environment.

 

If you prefer to work with applications in their native setting, and need to use a lot of detailed settings, then maybe Atmosphere is the best place to get started. Take a look here:

About Atmosphere

 

AGAVE - If you need a very powerful HPC system

 

If you have a very large volume of data that requires an HPC (high-performance computing) system, or if a supercomputer is absolutely what you need, look at Agave. Agave works through the Discovery Environment, through Atmosphere, or can even be accessed by running jobs from your own desktop or laptop computer. It provides a way of taking your data from the Data Store, setting up an appropriate analysis system, running the job, and then returning the results to your Data Store. Accessing it directly is generally not for new users, but since many applications in the Discovery Environment make use of Agave, you can get the advantage without using a command line system.

http://agaveapi.co/documentation/

 

 
