Galaxy Experiences

If adding to the list, please add your institution here and flag your comment

  • Default (not flagged): Bioinformatics Core, Wellcome Trust Centre for Cell Biology, Edinburgh, UK.
    • Installed in our centre in 2007; the first production server was rolled out in April 2008

Hardware on which it is installed

  • Main server: 2 × 6-core CPUs (24 logical cores), 64 GB RAM
    • Two instances running on different ports, one for testing and the other for production (see the sketch after this list)
  • Cluster: Under development
  • Various desktop machines for the development of new tools
  • May eventually add a cloud instance
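
A minimal sketch of how the two instances coexist, assuming each is a separate checkout whose universe_wsgi.ini sets a distinct port in its [server:main] section (the paths and port numbers here are hypothetical):

  # Each instance has its own checkout, config and database
  cd /opt/galaxy-production && sh run.sh --daemon   # universe_wsgi.ini: port = 8080
  cd /opt/galaxy-test && sh run.sh --daemon         # universe_wsgi.ini: port = 8081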

Key uses in the core facility

  • Rapid prototyping: ability to add a tool that is under development and push the optimisation back to the user, who can experiment with parameters and different data sets.
  • Generic workflows: Publish workflows for common tasks that anyone can import
  • Galaxy pages: Create tutorials and training materials with embedded Galaxy objects
  • Data Sharing: Use Galaxy's data libraries to store and share data with users. Because Galaxy has a concept of 'groups', we can share data with specific labs and projects. We have implemented a dedicated file directory for each group so that command-line users can place their data there for easy upload into Galaxy's libraries without any data duplication (see the sketch after this list)
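
A rough sketch of those per-group drop directories, assuming Galaxy runs as a dedicated galaxy user that belongs to each group and that library uploads link to the files rather than copy them (the group name and paths are hypothetical):

  # One drop directory per lab; the setgid bit keeps new files group-owned
  mkdir -p /data/galaxy_drop/smith_lab
  chgrp smith_lab /data/galaxy_drop/smith_lab
  chmod 2775 /data/galaxy_drop/smith_lab
  # In the library upload form, choose "Upload files from filesystem paths"
  # and "Link to files without copying into Galaxy" to avoid duplication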

Additional Benefits

  • NGS centric: many tools come with Galaxy wrappers
  • Metadata on genome build (optional) and data type forces good data practices
  • Any command line tool can be added fairly quickly: from a few minutes for a simple XML wrapper to a morning for a more complicated interface (a minimal wrapper is sketched after this list).
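
A minimal sketch of such an XML wrapper, here for a hypothetical line-counting tool; the tool id and paths are made up, and $input/$output are the placeholders Galaxy fills in with dataset paths at run time:

  mkdir -p tools/local
  cat > tools/local/line_count.xml <<'EOF'
  <tool id="line_count" name="Line Count">
    <command>grep -c '' $input > $output</command>
    <inputs>
      <param name="input" type="data" format="txt" label="Input file"/>
    </inputs>
    <outputs>
      <data name="output" format="txt"/>
    </outputs>
    <help>Counts the number of lines in a text file.</help>
  </tool>
  EOF
  # Register it with a <tool file="local/line_count.xml"/> entry in
  # tool_conf.xml, then restart Galaxy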

Unresolved Issues

  • Login via Apache:
    • At the moment, if authentication comes from Apache, Galaxy assumes that the user has permission to use Galaxy and will set up an account with the email given by Apache (the settings involved are sketched after this list). This is why our group has not yet implemented it on our university cluster.
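
For context, the universe_wsgi.ini options involved; with these set, Galaxy trusts whatever REMOTE_USER header the proxy forwards and silently creates an account for it (the mail domain below is a placeholder):

  # Inspect the remote-user settings of the running instance
  grep -E '^(use_remote_user|remote_user_maildomain)' universe_wsgi.ini
  # use_remote_user = True
  # remote_user_maildomain = example.ac.uk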

Advice for Initial Setup

  • Database for logging jobs: can use SQLite (default), MySQL, or PostgreSQL.
    • SQLite will start to break as the load on the server increases
    • MySQL support lacks many of the reporting features
    • PostgreSQL is fully supported (and is used on the main Galaxy site), so I would recommend setting it up from the get-go, as transferring data between schemas is non-trivial (see the sketch after this list)
  • Do not run the Galaxy process as root: all jobs run by Galaxy are executed as the user that launched the process, and having every job run as root is unsafe. We create a galaxy user account, run the process as that user, and have all files owned by that user.
  • Genome data: Galaxy should have scripts available to download these. We only download the genomes relevant to our users and create new chain files and 2bit files for custom genomes.
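
A condensed sketch of this initial setup on a generic Linux host; the account name, database name, password and paths are illustrative only:

  # 1. Dedicated, unprivileged user so jobs never run as root
  useradd -r -m galaxy
  chown -R galaxy:galaxy /opt/galaxy

  # 2. PostgreSQL role and database owned by that user
  sudo -u postgres createuser --no-superuser --no-createdb --no-createrole galaxy
  sudo -u postgres createdb -O galaxy galaxydb

  # 3. In universe_wsgi.ini, point Galaxy at the new database:
  #    database_connection = postgres://galaxy:secret@localhost/galaxydb

  # 4. Start the server as the galaxy user
  su - galaxy -c 'cd /opt/galaxy && sh run.sh --daemon'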

Data Clean Up

  • Data is not automatically deleted when the user deletes files from their history. Scripts are available to purge this data: run them from cron (see the sketch after this list)
  • There is an optimal order in which to execute these scripts; refer to the Galaxy wiki
  • Problem with users not deleting files: it is not trivial to link files in the data store to individual users
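
A possible crontab for the galaxy user, assuming a checkout in /opt/galaxy and the wrapper scripts shipped under scripts/cleanup_datasets/; the schedule is arbitrary, and the wiki documents the recommended order (histories before the datasets they reference):

  0 2 * * *  cd /opt/galaxy && sh scripts/cleanup_datasets/delete_userless_histories.sh
  30 2 * * * cd /opt/galaxy && sh scripts/cleanup_datasets/purge_histories.sh
  0 3 * * *  cd /opt/galaxy && sh scripts/cleanup_datasets/purge_datasets.sh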

Updating Galaxy

  • Fetch Galaxy updates from a Mercurial repository. Learn the Mercurial commands and how to merge/fork if implementing your own local changes to the Galaxy code (a typical update cycle is sketched at the end of this list).
  • Use the diff command on the .sample files to view changes to available tools, datatypes, environment parameters etc. after each update
  • The Galaxy Tool Shed contains repositories of third-party tools to download and add to local instances
  • Read through the Galaxy wiki, particularly the Deploy Galaxy pages.
  • Add your own datatypes, external data sources and export links
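
A typical update cycle against the galaxy-dist Mercurial repository, assuming a checkout in /opt/galaxy with no local code changes (merge rather than update if you carry your own patches):

  cd /opt/galaxy
  hg incoming        # preview the incoming changesets
  hg pull -u         # fetch them and update the working copy
  # Compare live configs against the refreshed .sample files
  diff tool_conf.xml tool_conf.xml.sample
  diff datatypes_conf.xml datatypes_conf.xml.sample
  diff universe_wsgi.ini universe_wsgi.ini.sample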