Data Integration for Local Instances

Built-in data files are critical for many Galaxy tools. This page describes, in general terms, how to add such data to your local Galaxy instance. For instructions specific to our NGS (next-generation sequencing) tools, see NGS Local Setup.

Adding a genome to Galaxy takes three main steps. First, obtain the data and put it in a directory that the Galaxy instance can read. Next, create the appropriate .loc ("location") file. Finally, make sure that the genome is listed in the $GALAXYROOT/tool-data/shared/ucsc/builds.txt file.

Get the data

First, determine what type of data you need. Usually these are .fasta, .nib, .2bit, or special index files, but each tool has its own requirements. Open the XML for the particular tool and identify the .loc file referred to in either a validator tag or an options tag. Then open the $GALAXYROOT/tool-data/<filename>.loc.sample file and read it to find out which files are required. Once you know what you need, you can obtain it.
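As a rough illustration of the lookup described above, the sketch below scans a tool's XML for any attribute value ending in ".loc". The tool definition in EXAMPLE_TOOL_XML is made up for this example, and its attribute names (filename, from_file) are assumptions for illustration; the scan itself does not depend on any particular attribute name.

```python
import xml.etree.ElementTree as ET


def find_loc_references(tool_xml_text):
    """Return the sorted, unique attribute values ending in '.loc'."""
    root = ET.fromstring(tool_xml_text)
    found = set()
    # Walk every element and inspect all of its attribute values.
    for element in root.iter():
        for value in element.attrib.values():
            if value.endswith(".loc"):
                found.add(value)
    return sorted(found)


# Hypothetical tool definition, invented for illustration only.
EXAMPLE_TOOL_XML = """
<tool id="example_tool" name="Example">
  <inputs>
    <param name="input1" type="data">
      <validator type="example_validator" filename="alignseq.loc" />
    </param>
    <param name="ref" type="select">
      <options from_file="faseq.loc" />
    </param>
  </inputs>
</tool>
"""

print(find_loc_references(EXAMPLE_TOOL_XML))
```

For a real tool, read the XML file from disk and pass its text to find_loc_references, then open the matching .loc.sample file by hand.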

If you don't already have the right files on your system, you will need to download them from a site such as UCSC.

Make sure that the files are in a location accessible to your Galaxy instance.

Set up the loc file

Now look again at the .loc.sample file referred to in the previous section to find instructions for setting up the actual .loc file. In every case, each line describes one sequence, with the fields separated by tabs. Create the relevant .loc file, add your sequences, and place it in the $GALAXYROOT/tool-data directory.

A couple of examples (with tabs separating the information):

alignseq.loc

align	anoGam1	dm1	/depot/data2/galaxy/anoGam1/align/dm1
align	anoGam1	dm2	/depot/data2/galaxy/anoGam1/align/dm2
align	canFam1	hg17	/depot/data2/galaxy/canFam1/align/hg17

faseq.loc

hg18	/depot/data2/galaxy/faseq/hg18
mm9	/depot/data2/galaxy/faseq/mm9
Arabidopsis	/depot/data2/galaxy/faseq/Arabidopsis
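A quick sanity check of a .loc file can catch stray spaces and wrong paths before the server is restarted. The sketch below is a minimal example, assuming that lines starting with "#" are comments (as in the .loc.sample files) and that the last field on each line is a path. The helper names read_loc_lines and missing_paths are hypothetical and not part of Galaxy. Some tools store a file prefix rather than a full path, in which case the existence check does not apply.

```python
import os


def read_loc_lines(lines):
    """Parse tab-separated .loc lines, skipping blank and '#' comment lines."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        rows.append(line.split("\t"))
    return rows


def missing_paths(rows, path_column=-1):
    """Return the path field of every row whose path is not on disk."""
    return [row[path_column] for row in rows
            if not os.path.exists(row[path_column])]


# Sample lines taken from the faseq.loc example above.
example = [
    "# comment line\n",
    "hg18\t/depot/data2/galaxy/faseq/hg18\n",
    "mm9\t/depot/data2/galaxy/faseq/mm9\n",
]
rows = read_loc_lines(example)
print(rows)                 # each row is a list of fields
print(missing_paths(rows))  # paths that Galaxy would not be able to find
```

A row with only one field usually means that spaces were typed where a tab was intended.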

Add new genome as Galaxy build

The last thing you need to do is make sure that the genome you have added is in the Galaxy builds list. This is the list of builds that appears in the Genome or Database/Build search box when you upload a file or change a file's metadata. To add your sequence, add a line to the $GALAXYROOT/tool-data/shared/ucsc/builds.txt file. Each line consists of the short form of the genome build (the dbkey), followed by a tab and then a longer, more readable genome build name.

The following is what a few entries look like (with tabs separating the first and second parts):

hg18	Human Mar. 2006 (hg18)
mm9	Mouse July 2007 (mm9)
felCat3	Cat Mar. 2006 (felCat3)

Note that the dbkey that appears here must be identical to the one that appears in the .loc file, including case.
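Because the match is case-sensitive, it can help to compare the two files mechanically. The sketch below is one way to do that, assuming that the dbkey is the first field of each builds.txt line and that "#" lines are comments. The function names and the dbkey_column parameter are hypothetical; which column of a .loc file holds the dbkey depends on the tool, so check the .loc.sample file first.

```python
def read_build_keys(build_lines):
    """Collect dbkeys (the first tab-separated field) from builds.txt lines."""
    keys = set()
    for line in build_lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        keys.add(line.split("\t")[0])
    return keys


def unknown_dbkeys(loc_lines, build_keys, dbkey_column=0):
    """Return dbkeys used in a .loc file that are absent from builds.txt."""
    unknown = []
    for line in loc_lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        dbkey = line.split("\t")[dbkey_column]
        # Exact, case-sensitive comparison: 'MM9' does not match 'mm9'.
        if dbkey not in build_keys:
            unknown.append(dbkey)
    return unknown


builds = ["hg18\tHuman Mar. 2006 (hg18)\n", "mm9\tMouse July 2007 (mm9)\n"]
loc = ["hg18\t/depot/data2/galaxy/faseq/hg18\n",
       "MM9\t/depot/data2/galaxy/faseq/mm9\n"]  # deliberately wrong case

print(unknown_dbkeys(loc, read_build_keys(builds)))
```

In this sample the second .loc entry is reported because its dbkey differs from the builds.txt entry in case only.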

Restart the server

Any time you change a .loc file or the builds.txt file, the server has to be restarted for the change to take effect.