Datasets are the inputs and outputs of each step in an analysis project in Galaxy. Datasets are associated with at least one History, which can be labeled, manipulated, and shared with anyone, whether they have a Galaxy account or not.
The tracking information associated with Datasets in a History represents an experimental record of the methods, parameters, and other inputs. These methods are easily extracted into Workflows, making an analysis pathway transparent, reproducible, and reusable.
Effectively managing datasets is important for general organization, collaboration, publishing, and for staying within the quotas set by the Main, Test, and other host instances.
Dataset Icons & Text
- Upper right corner
- Display data in browser "eye icon"
- Edit attributes "pencil icon"
- Delete "'X' icon"
- Lower right corner
- Edit dataset tags
- Edit dataset annotation
- Upper left corner
- Dataset name
- Dataset size/number of lines (actual or estimated)
- format datatypes
- database
- Info (optional)
- Lower left corner
- Download
- View Details
- Run this job again (optional)
- Display in trackster "Galaxy Track Browser (GTB)" (optional)
- display at UCSC main (optional)
- view in GeneTrack (optional)
- display at Ensembl Current (optional)
Data size and disk Quotas
- The size limit for a file loaded using FTP is 50 GB.
- The size limit for a job's output (unrelated to quotas) is:
- The size limit for all data (quotas) on the Galaxy public servers is explained at:
- Administrative instructions for disk quotas
Format
- The format of a dataset is ideally defined by the assigned datatype attribute. Deviations in input dataset format are the first variable to examine when a tool (job) fails. Many of the tools in the "Text Manipulation" tool group can be used to both examine and correct a dataset's format to bring it into alignment with the assigned datatype attribute specification.
- To initially assign a dataset's datatype attribute, specify the datatype in the import tool (where the tool supports it) or name the uploaded/imported file with the appropriate file extension. To specify, modify, or correct a dataset's datatype attribute after upload, click on the "pencil" icon in the right corner of the dataset's box to reach the "Edit Attributes" form. Use the "Change data type" section of the form to make changes and click on "Save". Galaxy will modify the datatype and metadata.
- To transform a dataset format (original -> new datatype attribute), use one of the many tools in the "Convert Formats" group.
- TIP The quickest way to locate tools that manipulate specific formats is to use the Tool Search (top of left Galaxy Tool panel, "Options" menu). For example, type in "M-A-F" to locate tools in the tool group 'Convert Formats' that transform to/from Multiple Alignment Format.
Visualize
- For many datatypes, clicking on the eye icon for "Display data in browser" will display the contents, or a preview of the contents, as unformatted text in the center pane (exceptions include compressed datatypes such as BAM).
- Direct links to view a dataset within a browser may include:
Copy
- To copy the datasets within a history to another history, from the right history pane's top "Options" menu select "Copy Datasets". On the form in the center pane, specify the "From" and "To" history/histories.
- From: Select the datasets to be copied in the left column "Source History:".
- To: Select the location to copy the datasets in the right column "Destination History:".
- Options include a single existing history, multiple existing histories, or a newly created and named history.
- TIP To "Copy" a Hidden dataset (see below), in the "From" history's right pane, use "Options -> Show Hidden Datasets", then once the datasets refresh, use "This dataset has been hidden. Click _here_ to unhide."
Clone
- To clone a history is to create an exact copy of the prior history in one step. The new history will be named with the original history's name prefixed by "Clone of". Clone is the simplest way to manage datasets when some items in a history need to be retained but the remainder can be deleted (permanently, to reduce disk usage).
- Options are:
- "Clone all history items, including deleted items"
- "Clone only items that are not deleted"
- TIP One use of this option is to quickly retain some datasets and permanently delete others (to reduce disk use counted in user quota on Main or Test). First, in the History pane of the original history, click on the "X" delete icon for each dataset that should not be cloned; remember to delete unwanted Hidden datasets as well (see below). Next, "Clone" the original History using the option "Clone only items that are not deleted". Once complete, the cloned History will contain the datasets to be retained, and the original History can be deleted permanently: use "Options -> Saved Histories", select the original History from the list, and click the button "Delete Permanently".
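The two Clone options can be pictured with a small model. This is only an illustrative sketch, not Galaxy's own code: the `Dataset` and `History` classes, the `clone_history` function, and the example names and sizes are all hypothetical. It shows the "Clone of" naming rule and how deleted items are dropped while hidden items are still carried along.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Dataset:
    # Hypothetical stand-in for a history item; not a Galaxy class.
    name: str
    size_bytes: int
    deleted: bool = False
    hidden: bool = False


@dataclass
class History:
    name: str
    datasets: List[Dataset] = field(default_factory=list)


def clone_history(original: History, include_deleted: bool) -> History:
    """Model the two Clone options described above.

    include_deleted=True  -> "Clone all history items, including deleted items"
    include_deleted=False -> "Clone only items that are not deleted"
    Hidden items are kept either way unless they were deleted first.
    """
    kept = [d for d in original.datasets if include_deleted or not d.deleted]
    return History(name=f"Clone of {original.name}", datasets=kept)


original = History(
    "RNA-seq run",
    [
        Dataset("reads.fastq", 4_000),
        Dataset("scratch.tabular", 500, deleted=True),
        Dataset("intermediate.bam", 2_000, hidden=True),
    ],
)

clone = clone_history(original, include_deleted=False)
print(clone.name)                        # Clone of RNA-seq run
print([d.name for d in clone.datasets])  # ['reads.fastq', 'intermediate.bam']
```

In this model the hidden `intermediate.bam` survives the clone because it was never deleted, which is why the tip above recommends deleting unwanted hidden datasets before cloning.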
Hidden
- Datasets may be hidden in the default History view as a Workflow option. If you have run a workflow with hidden datasets, choose "Options -> Show Hidden Datasets" to view them.
- When using Clone (see above) to manage datasets to reduce disk usage for quotas, viewing and deleting hidden datasets can be a very important step. Unless deleted, hidden datasets are carried over to the new cloned history.
- When using Copy (see above) to manage datasets to reduce disk usage for quotas, hidden datasets will not be in the "From" list of datasets available to transfer unless they are unhidden using "Options -> Show Hidden Datasets", then "This dataset has been hidden. Click _here_ to unhide."
Delete vs Delete Permanently
- Deleting Datasets and Histories
- Watch how it works in the Managing Histories screencast.
- Deleted datasets and histories can be recovered by users as they are retained in Galaxy for a time period set by the instance administrator. For the Galaxy public instances Main and Test, this is currently several months.
- Permanently deleted datasets and histories cannot be recovered by the user or administrator.
- Deleted datasets can be undeleted or permanently deleted from the History pane using "Options -> Show Deleted Datasets", and then: "This dataset has been deleted. Click _here_ to undelete or _here_ to immediately remove it from disk."
- Quotas for Datasets and Histories
- Deleted datasets and deleted histories containing datasets are considered when calculating quotas on Main or Test.
- Permanently deleted datasets and permanently deleted histories containing datasets are not considered.
- Imported native Data Library datasets are not considered.
- Datasets can be associated with one or more Histories, but are only considered once.
- All copies of a dataset must be permanently deleted for it to not be considered.
- Active and Deleted histories can be permanently deleted from the History pane using "Options -> Saved Histories", then click on "Advanced Search", then click on "status: all". Check the box for the histories to be discarded and then click on the button "Permanently delete".
- WARNING Permanently deleted datasets and histories cannot be recovered by the user or administrator. The best way to avoid losing important data by accident is to clearly name all histories and important datasets.
- Name a dataset:
- Click on the "pencil icon" (in the right History pane) to reach the "Edit Attributes" form. Here a dataset's primary "Name:", "Info:", and "Annotation / Notes:" can be adjusted.
- TIP Copying the Galaxy default "Name:" into the "Info:" field, then adding a custom "Name:", is one way to preserve the tool output's original "Name:" while still distinguishing one similarly named dataset from another. This can be useful when reviewing analysis steps and choosing which datasets to retain and which to remove when an analysis is under review or completed.
- Name a history:
- Click near the top of the right history pane where the default text "Unnamed history" is located. Enter the new name and use the "enter/return" key on your computer.
- From the History pane use "Options -> Saved Histories", check the histories (one or more) to be renamed, then click on the bottom button "Rename". On the "Rename" form, "Current Name" is on the left, "New Name" is on the right. Edit "New Name" for each history then click on the button "Rename Histories".
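The quota rules listed under "Quotas for Datasets and Histories" can be summarized in a short model. This is a sketch under stated assumptions, not how Galaxy computes usage internally: the `Copy` class, the `quota_usage` function, the state names, and the sizes are all hypothetical. It encodes the rules that deleted copies still count, permanently deleted copies do not, imported Data Library datasets do not, and a dataset shared by several histories is counted once.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# Hypothetical state labels for one copy of a dataset in one history.
ACTIVE = "active"
DELETED = "deleted"
PURGED = "permanently deleted"


@dataclass(frozen=True)
class Copy:
    dataset_id: str
    history: str
    state: str
    from_library: bool = False  # imported native Data Library dataset


def quota_usage(copies: Iterable[Copy], sizes: Dict[str, int]) -> int:
    """Sum the sizes of datasets that still count toward the quota."""
    counted = set()
    for c in copies:
        if c.from_library:
            continue  # Data Library imports are not considered
        if c.state == PURGED:
            continue  # this copy no longer holds the dataset
        counted.add(c.dataset_id)  # a set counts each dataset only once
    return sum(sizes[d] for d in counted)


sizes = {"d1": 100, "d2": 40, "d3": 25, "lib1": 1000}
copies = [
    Copy("d1", "History A", ACTIVE),
    Copy("d1", "History B", PURGED),   # one live copy remains, so d1 counts
    Copy("d2", "History A", DELETED),  # deleted but not purged: still counts
    Copy("d3", "History A", PURGED),
    Copy("d3", "History B", PURGED),   # every copy purged: d3 is not counted
    Copy("lib1", "History A", ACTIVE, from_library=True),
]

print(quota_usage(copies, sizes))  # 140
```

Here `d1` still counts because one of its two copies has not been permanently deleted, which mirrors the rule that all copies of a dataset must be permanently deleted before it stops counting.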

