Galaxy Experiences
Latest revision as of 11:19, 27 January 2012
If adding to the list, please add your institution here and flag your comment
- Default (not flagged): Bioinformatics Core, Wellcome Trust Centre for Cell Biology, Edinburgh, UK.
- Galaxy was installed in our centre in 2007; the first production server was rolled out in April 2008.
Hardware on which it is installed
- Main server: 2 x 6 cores (= 24 logical cores), 64 GB RAM
- Two instances running on different ports, one for testing and the other for production
- Cluster: Under development
- Various desktop machines for the development of new tools
- May eventually add a cloud instance
Key uses in the core facility
- Rapid prototyping: we can add a tool that is still under development and hand the optimisation back to the users, who can experiment with parameters and different data sets themselves.
- Generic workflows: Publish workflows for common tasks that anyone can import
- Galaxy pages: Create tutorials and training materials with embedded Galaxy objects
- Data sharing: we use Galaxy's libraries to store and share data with users. Because Galaxy has a concept of 'groups', we can share data with specific labs and projects. We have set up a dedicated file directory for each group so that command-line users can place their data there for easy upload to Galaxy's libraries without any data duplication.
Additional Benefits
- NGS centric: many tools come with Galaxy wrappers
- Metadata on genome build (optional) and data type forces good data practices
- Any command-line tool can be added fairly quickly: from a few minutes for a simple XML wrapper to a morning for a more complicated interface.
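As a rough illustration of what a "simple XML wrapper" can look like, the sketch below holds a minimal tool definition in a Python string and checks that it is well formed. The tool id, name, the `cut -f 1` command and the parameter names are all hypothetical; the Galaxy wiki describes the tool syntax supported by a given release.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal wrapper: tool id, name and command are illustrative only.
TOOL_XML = """<tool id="first_column" name="Extract first column">
  <description>from a tab-delimited file</description>
  <command>cut -f 1 $input &gt; $output</command>
  <inputs>
    <param name="input" type="data" format="tabular" label="Input file"/>
  </inputs>
  <outputs>
    <data name="output" format="tabular"/>
  </outputs>
</tool>"""


def summarise_tool(xml_text):
    """Parse a wrapper and return its id, command line, and input/output names."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "command": root.findtext("command").strip(),
        "inputs": [p.get("name") for p in root.findall("inputs/param")],
        "outputs": [d.get("name") for d in root.findall("outputs/data")],
    }


if __name__ == "__main__":
    print(summarise_tool(TOOL_XML))
```

Parsing the wrapper before adding it to a test instance is a cheap way to catch malformed XML early; it does not check that Galaxy will accept the tool.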
Unresolved Issues
- Login via Apache:
- At the moment, if authentication comes from Apache, Galaxy assumes that the user has permission to use Galaxy and will set up an account with the email address given by Apache. This is why our group has not yet implemented it on our university cluster.
Advice for Initial Setup
- Database for logging jobs: Galaxy can use SQLite (the default), MySQL or PostgreSQL.
- SQLite will start to break as the load on the server increases.
- MySQL support lacks many of the reporting features.
- PostgreSQL is fully supported (and is used on the main Galaxy site), so I would recommend setting it up from the start, as transferring data between schemas is non-trivial.
- Turn off debugging on the production server; otherwise paster.log can become very large.
- Do not run the Galaxy process as root: all jobs run by Galaxy are run as the user that launched the process, and having all jobs run as root is unsafe. We create a galaxy user account, run the process as that user and have all files owned by that user.
- Genome data: Galaxy should have scripts available to download these. We only download the genomes relevant to our users and create new chain files and 2bit files for custom genomes.
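The database, debugging and root-user points above can be checked automatically. The following is a minimal sketch, assuming an ini-style Galaxy configuration with an `[app:main]` section and `database_connection` and `debug` options; these names, the example file content, and the treatment of an unset `debug` option as off are assumptions to verify against your own installation.

```python
import configparser
import os

# Hypothetical excerpt of a Galaxy ini file; section and option names are
# assumptions and should be checked against your own configuration.
EXAMPLE_INI = """
[app:main]
database_connection = sqlite:///./database/universe.sqlite
debug = True
"""


def production_warnings(ini_text, uid):
    """Return a list of warnings for the setup points listed above."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    app = parser["app:main"] if parser.has_section("app:main") else {}
    warnings = []
    # An unset connection string is treated as the sqlite default.
    connection = app.get("database_connection", "sqlite://")
    if not connection.startswith("postgres"):
        warnings.append("database is not postgres")
    # An unset debug option is treated as off (an assumption of this sketch).
    if app.get("debug", "False").strip().lower() == "true":
        warnings.append("debug is enabled; paster.log may grow very large")
    if uid == 0:
        warnings.append("running as root; use a dedicated galaxy account")
    return warnings


if __name__ == "__main__":
    # os.getuid() is available on Unix-like systems only.
    for warning in production_warnings(EXAMPLE_INI, os.getuid()):
        print("WARNING:", warning)
```

Running a check like this before starting the production instance makes it harder to promote a test configuration by accident.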
Data Clean Up
- Data is not automatically deleted from disk when users delete files from their history. Scripts are available to purge this data: run them from cron.
- There is an optimal order in which to execute these scripts; refer to the Galaxy wiki.
- Users who do not delete files remain a problem: it is not trivial to link files in the data store to individual users.
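Because the purge scripts should run in a fixed order, a cron job can call a small driver that stops as soon as one step fails, rather than listing each script as a separate cron entry. The sketch below shows one way to do this; the command lines are placeholders, not the real script names, and the correct order must be taken from the Galaxy wiki.

```python
import subprocess


def run_in_order(commands, runner=subprocess.call):
    """Run each command in sequence and stop at the first non-zero exit code.

    Returns the list of commands that completed successfully.
    """
    completed = []
    for command in commands:
        if runner(command) != 0:
            break
        completed.append(command)
    return completed


# Placeholder command lines: substitute the purge scripts and the order
# recommended on the Galaxy wiki for your release.
CLEANUP_STEPS = [
    ["sh", "step_1_placeholder.sh"],
    ["sh", "step_2_placeholder.sh"],
    ["sh", "step_3_placeholder.sh"],
]
```

Calling `run_in_order(CLEANUP_STEPS)` from the cron-driven script and comparing the length of the result with `len(CLEANUP_STEPS)` shows whether every step ran.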
Updating Galaxy
- Fetch Galaxy updates from the project's Mercurial repository. Learn the Mercurial commands, including how to merge and fork, if you maintain your own local changes to the Galaxy code.
- Run diff on the .sample files after each update to see changes to the available tools, datatypes, environment parameters etc.
- The Galaxy Tool Shed contains repositories of third-party tools that can be downloaded and added to local instances.
- Read through the Galaxy wiki, particularly the Deploy Galaxy pages.
- Add your own datatypes, external data sources and export links.
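The comparison of .sample files against the local copies can also be scripted. This is a minimal sketch using Python's standard difflib module instead of the diff command; the file name and contents in the usage example are made up for illustration.

```python
import difflib


def sample_changes(local_text, sample_text, name):
    """Return unified-diff lines between a local config file and its .sample."""
    return list(difflib.unified_diff(
        local_text.splitlines(),
        sample_text.splitlines(),
        fromfile=name,
        tofile=name + ".sample",
        lineterm="",
    ))


if __name__ == "__main__":
    # Hypothetical contents: the updated sample lists one tool the local copy lacks.
    local = "tool_a\ntool_b"
    sample = "tool_a\ntool_b\ntool_c"
    for line in sample_changes(local, sample, "tool_conf.xml"):
        print(line)
```

Lines prefixed with `+` appear only in the updated .sample file and are candidates for merging into the local configuration; an empty result means the two files are identical.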
Some Galaxy Resources
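- Project home: http://galaxyproject.org
- Free public server @ Penn State: http://usegalaxy.org
- Local and Cloud installs: http://getgalaxy.org
- Support: http://galaxyproject.org/wiki/Support
- Project Wiki: http://galaxyproject.org/wiki
- Project wide web search: http://galaxyproject.org/search
- 2012 Galaxy Community Conference, July 25-27, Chicago, Illinois, US: http://galaxyproject.org/GCC2012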