Dockerizing Your Tools for the CyVerse Discovery Environment
Steps
The following steps serve as a guide for Dockerizing a tool in the DE (see Figure 2 below).
Sample Dockerfile 1: hisat2
Dockerfile for installing hisat2 in a Docker container based on the ubuntu:14.04.3 image:
FROM ubuntu:14.04.3
MAINTAINER Eric Lyons
RUN apt-get update && apt-get install -y \
build-essential \
git \
python
ENV BINPATH /usr/bin
ENV SRCPATH /usr/src
ENV HISAT2GIT https://github.com/infphilo/hisat2.git
ENV HISAT2PATH $SRCPATH/hisat2
RUN mkdir -p $SRCPATH
WORKDIR $SRCPATH
# Clone and checkout the 2.0.3-beta release version of the git repo
RUN git clone "$HISAT2GIT" \
&& cd $HISAT2PATH \
&& git checkout 3f8c81375700d4107fdfd1caeaec01b5719ae4b8
RUN make -C $HISAT2PATH \
&& cp $HISAT2PATH/hisat2 $BINPATH \
&& cp $HISAT2PATH/hisat2-* $BINPATH
ENTRYPOINT ["/usr/bin/hisat2"]
Sample Dockerfile 2: NCBI SRA Submission pipeline
Dockerfile for installing NCBI SRA Submission in a Docker container based on the python:2.7-slim base image:
FROM python:2.7-slim
WORKDIR /root
COPY requirements.txt ./
RUN set -x \
&& apt-get update \
&& apt-get install -y gcc libxml2-dev libxslt1-dev lib32z1-dev --no-install-recommends \
&& rm -rf /var/lib/apt/lists/* \
&& pip install -r requirements.txt \
&& apt-get purge -y --auto-remove gcc lib32z1-dev
# Download Aspera Connect client from http://downloads.asperasoft.com/connect2/
ADD http://download.asperasoft.com/download/sw/connect/3.5/aspera-connect-3.5.1.92523-linux-64.sh aspera-connect-install.sh
# Install Aspera Connect client to ~/.aspera
RUN chmod 755 aspera-connect-install.sh
RUN ./aspera-connect-install.sh \
&& rm aspera-connect-install.sh
COPY ncbi_sra_submit.py metadata_client.py ncbi_sra_report_download.py ./
VOLUME [ "/root/config", "/root/templates", "/root/schemas" ]
ENTRYPOINT [ "python", "/root/ncbi_sra_submit.py" ]
CMD [ "--help" ]
Why are these good Dockerfiles? Both Dockerfiles adhere to best practices by using official Docker images and installing the tool from a reliable source.
Example of a poor Dockerfile
Here is a hypothetical example of what a poor Dockerfile looks like. Let's assume we have a tool named FooBar.
FROM test/Ubuntu
RUN apt-get upgrade
RUN apt-get update
RUN apt-get install -y emboss python
RUN wget https://someuniversity.edu/test_tool/test.py
Why is this a poor Dockerfile? It uses an untested, unofficial Docker image, and it contains a step that fetches files from a server at a university.
In addition, the Dockerfile has no fail-fast checks. Suppose the server is later taken offline, despite assurances that it would remain online forever. The image build then fails because the files cannot be retrieved, and nothing in the Dockerfile reports a clear error.
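One possible way to add a fail-fast check to a download step is to compare the fetched file against a known checksum and stop on a mismatch. The following is a minimal sketch, not the method used by the DE: the function name verify_sha256 is hypothetical, and it assumes that sha256sum from GNU coreutils is available in the image.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE exists and its SHA-256 digest
# matches EXPECTED. Assumes sha256sum (GNU coreutils) is installed.
# Usage: verify_sha256 FILE EXPECTED
verify_sha256() {
    file=$1
    expected=$2
    if [ ! -f "$file" ]; then
        echo "missing file: $file" >&2
        return 1
    fi
    # sha256sum prints "<digest>  <name>"; keep only the digest.
    actual=$(sha256sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" != "$expected" ]; then
        echo "checksum mismatch for $file" >&2
        return 1
    fi
}
```

Chaining a check like this after the download with && makes a RUN step stop with a non-zero exit status as soon as the file is missing or has changed, rather than failing later in a less obvious place.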
Step 3: Build and test the Dockerized tool.
Before you request the installation of the Dockerized tool:
Build a new image: Use the docker build command as shown below to build your new image:
docker build -t <your/docker-image> .
Test the new image: Use the docker run command as shown below to test your new image:
docker run --rm <your/docker-image> <entrypoint arguments>
Test the inputs and outputs. Your tool will most likely require inputs and produce outputs. Use a docker run command like the following example to ensure your tool will run inside the DE. If the tool's image was built from a Dockerfile with an ENTRYPOINT and tagged your/docker-image, place some test input files into a scratch directory (~/my-scratch-dir in this example), and run a command like the following:
docker run --rm -v ~/my-scratch-dir:/working-dir -w /working-dir your/docker-image user-input-1 user-input-2 ...
The -v option mounts the scratch directory on the host machine at /working-dir inside the container, and the -w option sets the working directory inside the container to that same /working-dir directory.
Note: The DE will run a tool's Docker image, using a combination of the docker run flags -w and -v in order to mount the Condor node's working directory to some arbitrary working directory inside the container. All inputs will be placed inside this working directory, and the DE expects the tool to generate outputs under this working directory as well.
Additionally, all arguments entered by the user in the DE's app interface will be passed as command-line arguments to the docker run command following the your/docker-image image name. From the example command above, these would be user-input-1 user-input-2 ... The exceptions are the Environment Variable fields, which will be passed to the docker run command as -e flags.
Also note that Reference Genome/Sequence/Annotation input arguments are passed to the tool differently from other arguments, so if your tool requires these types of inputs, please include that requirement on the form when you request installation.
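To see how the pieces described in the note fit together, the sketch below prints a docker run command with the -v and -w options, one -e flag per environment variable, and the user arguments after the image name. It is only an illustration: the function name de_style_run_cmd and the THREADS variable are hypothetical, /working-dir is taken from the example above (the DE chooses its own working directory), and the exact command the DE generates may differ. The helper only prints the command and does not handle values containing spaces.

```shell
#!/bin/sh
# Hypothetical helper that prints a docker run command similar in shape to
# the one described above. It only prints; it does not run anything.
# Usage: de_style_run_cmd SCRATCH_DIR IMAGE "ENV1=val1 ENV2=val2" ARG...
de_style_run_cmd() {
    scratch=$1
    image=$2
    env_pairs=$3
    shift 3
    cmd="docker run --rm -v $scratch:/working-dir -w /working-dir"
    # Each NAME=value pair becomes one -e flag (pairs must not contain spaces).
    for pair in $env_pairs; do
        cmd="$cmd -e $pair"
    done
    cmd="$cmd $image"
    # Remaining arguments follow the image name, as user inputs do in the DE.
    for arg in "$@"; do
        cmd="$cmd $arg"
    done
    printf '%s\n' "$cmd"
}

# Example: one environment variable and two user inputs.
de_style_run_cmd "$HOME/my-scratch-dir" your/docker-image "THREADS=4" user-input-1 user-input-2
```

Running the printed command by hand against a scratch directory is a reasonable way to check that the tool reads its inputs from, and writes its outputs to, the working directory.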
If the tool's container produced outputs in that host's scratch directory, then this tool is ready for the next step (Request installation of the Dockerized tool in the DE).
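A simple way to confirm that the container produced outputs is to list the files in the scratch directory that are newer than a marker file created just before the run. The sketch below assumes a find implementation that supports -newer (GNU and BSD find both do); the function name new_outputs is hypothetical and is not part of Docker or the DE.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under DIR modified after MARKER.
# Returns success only if at least one such file exists.
# Usage: new_outputs DIR MARKER
new_outputs() {
    dir=$1
    marker=$2
    found=$(find "$dir" -type f -newer "$marker")
    [ -n "$found" ] && printf '%s\n' "$found"
}

# Intended use (commented out because it needs Docker and a real image):
#   marker=$(mktemp)
#   docker run --rm -v ~/my-scratch-dir:/working-dir -w /working-dir \
#       your/docker-image user-input-1 user-input-2
#   new_outputs ~/my-scratch-dir "$marker" || echo "no outputs produced" >&2
```

The marker file is kept outside the scratch directory so that it is never reported as an output itself.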
Step 4: Request installation of the Dockerized tool.
Read the main steps in the DE user manual for submitting your request for installation of the new tool (executable) in the DE. Once the tool is installed, you will receive an email notification.
As noted in the previous step, if your tool requires Reference Genome/Sequence/Annotation input arguments, note that in your request.
Step 5: Create and save the new app interface in the DE.
Once the Dockerized tool is installed, go to the CyVerse wiki to learn how to design a new interface, preview it, and save the new app within the DE.
Step 6: Test your app in the DE.
After creating the new app according to your design, test your app in the DE to make sure it works properly.
If your app works the way you expect it to, skip to Optional steps.
If your app still needs work and the changes you make affect your Dockerfile (for example, it uses a newer version of the software that introduces new dependencies), go back to step 2 and repeat.
Optional steps
Complete the additional optional steps as needed for your tool.
Sharing (publishing) your app in the DE
Once the app is working to your satisfaction and you have published it, it is immediately available in your personal workspace in the DE and you can begin using it to run your own analyses. If you want to share it with other users, you can either keep it in your personal workspace and share it with selected users (including defining their permissions in the app), or share it with the public. For more information, see Sharing your App or Workflow and Editing the User Manual.
Editing an unshared app
If you have not yet shared the app with the public (that is, it is still listed in your Apps under development folder in your personal workspace), you can still edit the file and create a new Dockerfile. Then email CyVerse Support to replace the Dockerfile.
Deleting/editing a publicly shared app
Once you have shared an app with the public, it cannot be deleted because of CyVerse's commitment to supporting reproducible science. Public apps also cannot be edited, so if you need to change the app you must create a new version of the app and then create a new Dockerfile. Learn more about editing apps.
Requesting a different category for your app
When you share your app with the public, you will indicate the category or categories into which you think it should be placed. To request that your app be moved or added to a different or additional category, email CyVerse Support with the app name, current category or categories, and desired target category or categories.
Examples of Dockerization of tools in the DE
Before you Dockerize a tool, make sure you understand its dependencies (check the program documentation or manual thoroughly).
Example 1: Dockerizing a simple bioinformatics tool - Kallisto
The Kallisto Docker image was built on a 64-bit Ubuntu virtual machine using VirtualBox.
1. Install Docker:
wget -qO- https://get.docker.com/ | sudo sh
2. Create a Dockerfile:
FROM ubuntu:14.04.3
MAINTAINER Kapeel Chougule
LABEL Description="This image is used for running Kallisto RNA seq quantification tool"
# Install dependencies
RUN apt-get update && apt-get install -y build-essential cmake zlib1g-dev libhdf5-dev
# Install git and clone the kallisto tool
RUN apt-get install --yes git
RUN git clone https://github.com/pachterlab/kallisto.git \
&& cd kallisto \
&& git checkout 5c5ee8a45d6afce65adf4ab18048b40d527fcf5c \
&& mkdir build \
&& cd build \
&& cmake .. \
&& make \
&& make install
ENTRYPOINT ["kallisto"]
3. Build a Docker image:
docker build -t ubuntu/kallisto .