An interactive course in 10 units for the members of the MPUSP (Max-Planck-Unit for the Science of Pathogens).
This GitHub repository contains all course materials. Issues can be reported here.
For comments, criticism, and general feedback, please contact the authors at bioinformatics@mpusp.mpg.de.
The course materials are a blend of the authors' own work and code examples from Justin Bois' Introduction to Programming in the Biological Sciences Bootcamp. The code examples from the Bootcamp course are released under the MIT License.
All other code examples and documentation added by the authors are also released under the MIT License, except where explicitly noted.
All example datasets are taken from published sources which are referenced in Data sources. These datasets are not covered by the MIT License, but are licensed under their respective terms.
In order to work with the course materials, you need to have Python, R, and optionally Quarto (a notebook framework) installed.
With pixi as package manager (recommended), you can create an environment with the required dependencies by running:
pixi init
pixi add python pandas matplotlib jupyter r-tidyverse r-irkernel
Or simply use the provided pixi.toml file to activate the environment:
pixi shell
With conda/mamba, you can create an environment with all dependencies by running:
mamba create -p <dirname>/python-course python pandas matplotlib jupyter r-tidyverse r-irkernel
mamba activate <dirname>/python-course
When using Jupyter Notebooks (in VSCode), make sure to select the correct kernel (Python or R) for each notebook.
The R kernel for Jupyter is installed with the r-irkernel package; a detailed setup for VSCode can be found here.
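To confirm that a notebook or terminal is really using the course environment, a short check from Python can help. The sketch below uses only the standard library (importlib.util.find_spec and sys.executable); the list of module names is an assumption based on the packages installed above, with jupyter_core standing in for the jupyter metapackage.

```python
import importlib.util
import sys

# Assumed import names for the Python packages installed above;
# the "jupyter" metapackage is checked via its jupyter_core component.
REQUIRED = ["pandas", "matplotlib", "jupyter_core"]


def check_packages(names):
    """Return a dict mapping each module name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Path of the interpreter running this script; this is the kind of
    # path that RETICULATE_PYTHON should point to for Quarto notebooks.
    print("Python:", sys.executable)
    for name, found in check_packages(REQUIRED).items():
        print(f"{name:15s} {'ok' if found else 'MISSING'}")
```

If any package is reported as MISSING, the notebook is most likely attached to a different kernel or environment than the one created above.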
When running a Quarto notebook, make sure that the R package reticulate is installed and configured to use the correct Python environment. You can specify the path to the Python binary by running the following command in the terminal:
export RETICULATE_PYTHON="/path/to/your/env/bin/python"
- The course is structured into 10 lessons
- Each lesson covers a specific topic in bioinformatics with Python and R
- Each lesson contains background information, hands-on code examples, and exercises for self-study at the end
The repository contains the following files and folders:
- data/: Contains datasets used in the course
- lessons/: Contains the lesson notebooks, e.g. in Jupyter format (.ipynb)
- solutions/: Contains solutions to the exercises provided in the lessons
- templates: Contains templates for creating new lessons or exercises
- README.md: Instructions and information about the course
- .gitignore: Specifies files and directories to be ignored by Git
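Because the lesson notebooks live in lessons/ while the datasets live in data/, a notebook that runs with lessons/ as its working directory has to reach one level up to load a file. The helper below is a hypothetical sketch (find_data_file is not part of the course materials): it walks up from the current working directory until it finds a data/ folder, so the same call works whether a notebook is started from the repository root or from lessons/.

```python
from pathlib import Path
from typing import Optional


def find_data_file(name: str, start: Optional[Path] = None) -> Path:
    """Return <repo>/data/<name>, where <repo> is the first directory at or
    above `start` (default: current working directory) that has a data/ folder.

    Only the data/ folder is checked; the file itself is not required to exist.
    """
    here = (start or Path.cwd()).resolve()
    # Check the starting directory first, then each parent in turn.
    for folder in (here, *here.parents):
        candidate = folder / "data"
        if candidate.is_dir():
            return candidate / name
    raise FileNotFoundError(f"no data/ directory found at or above {here}")
```

For example, assuming the file sits directly in data/, a notebook could load a dataset with pandas.read_csv(find_data_file("Jahn_eLife_2021.csv")).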
Data sources:
- gfmt_sleep*: Beattie, L., Walsh, D., McLaren, J., Biello, S.M. and White, D., 2016. Perceptual impairment in face identification with poor sleep. Royal Society Open Science, 3(10). Released into the public domain (CC0).
- iris.data: Anderson, E., 1935. The irises of the Gaspe Peninsula. Bulletin of the American Iris Society, 59, pp.2-5. CC-BY-4.0 License.
- ls_orchids.*: Cox, A.V., Pridgeon, A.M., Albert, V.A. and Chase, M.W., 1997. Phylogenetics of the slipper orchids (Cypripedioideae, Orchidaceae): nuclear rDNA ITS sequences. Plant Systematics and Evolution, 208(3), pp.197-223. Freely available on NCBI.
- NC_005816.gb: Zhou, D., Tong, Z., Song, Y., Han, Y., Pei, D., Pang, X., Zhai, J., Li, M., Cui, B., Qi, Z. and Jin, L., 2004. Genetics of metabolic variations between Yersinia pestis biovars and the proposal of a new biovar, microtus. Journal of Bacteriology, 186(15), pp.5147-5152. Freely available on NCBI.
- Jahn_eLife_2021.csv: Jahn, M., Crang, N., Janasch, M., Hober, A., Forsström, B., Kimler, K., Mattausch, A., Chen, Q., Asplund-Samuelsson, J. and Hudson, E.P., 2021. Protein allocation and utilization in the versatile chemolithoautotroph Cupriavidus necator. eLife, 10, e69019, pp.1-26. Freely available on GitHub, GPL v3.
Contributions to the course materials are very welcome! If you have suggestions for improvements, or want to report issues, please feel free to open an issue or submit a pull request on GitHub.