From dd754c43aad79e3cc04eba1a2e34af7057c54194 Mon Sep 17 00:00:00 2001 From: jackdfranklin Date: Wed, 1 Apr 2026 12:03:35 +0100 Subject: [PATCH 01/21] Add initial rough outline of slides --- src/slides.md | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 56 insertions(+) create mode 100644 src/slides.md diff --git a/src/slides.md b/src/slides.md new file mode 100644 index 0000000..de5fc1c --- /dev/null +++ b/src/slides.md @@ -0,0 +1,56 @@ +--- +title: Reproducibility in Scientific Computing +authors: Jack Franklin & Marion Weinzierl +--- + +# Introduction: What is reproducibility? +- Reproducing results +- Portability +- + +# A likely scenario + +- You have just joined a new research group as a Student/Researcher/PI. +- The group use a custom pipeline/setup to perform their data analysis/simulations. +- You try to get the setup working on your local system/a new hpc system and... + It doesn't work! + +# Version Control + +- First put things into VC +- Then any changes/fixes can be tracked + +$-- Since git will be covered during the week we shouldn't need to do too much +$-- here. + +# Documentation + +- README +- User Docs +- Dev Docs +- Comments? + +# Dependencies + +- Basic documentation +- Project files (e.g. project.toml for python etc) +- System dependencies (nix/guix/docker?) + +# Testing +- Unit tests +- Integration tests +- Automating tests (CI etc) + +# FAIR Principles +- Findability +- Accessibility +- Interoperability +- Reuse + +$-- Maybe we should look at this retrospectively, and see what elements we covered and +$-- where we could improve on. 
+$-- Also a good way to talk about why these are good principles to start a project with +$-- since we can show that they avoid most/all the problems that we had to solve + +# Conclusion/Outlook + From 40b5d60a0343c1b02a1ee56f4663299e9895c607 Mon Sep 17 00:00:00 2001 From: jackdfranklin Date: Wed, 15 Apr 2026 13:20:14 +0100 Subject: [PATCH 02/21] Switch to quarto and separate slides out --- src/introduction.qmd | 12 ++++++++++++ src/{slides.md => slides.qmd} | 23 +++++++++++------------ 2 files changed, 23 insertions(+), 12 deletions(-) create mode 100644 src/introduction.qmd rename src/{slides.md => slides.qmd} (66%) diff --git a/src/introduction.qmd b/src/introduction.qmd new file mode 100644 index 0000000..b4075a5 --- /dev/null +++ b/src/introduction.qmd @@ -0,0 +1,12 @@ +## Introduction: What is reproducibility? + +- Reproducing results +- Portability + +## A likely scenario + +- You have just joined a new research group as a Student/Researcher/PI. +- The group use a custom pipeline/setup to perform their data analysis/simulations. +- You try to get the setup working on your local system/a new hpc system and... + It doesn't work! + diff --git a/src/slides.md b/src/slides.qmd similarity index 66% rename from src/slides.md rename to src/slides.qmd index de5fc1c..70ca9d4 100644 --- a/src/slides.md +++ b/src/slides.qmd @@ -1,24 +1,23 @@ --- title: Reproducibility in Scientific Computing -authors: Jack Franklin & Marion Weinzierl ---- -# Introduction: What is reproducibility? -- Reproducing results -- Portability -- +format: + revealjs: + theme: dark -# A likely scenario +authors: + - name: Jack Franklin + - name: Marion Weinzierl +--- -- You have just joined a new research group as a Student/Researcher/PI. -- The group use a custom pipeline/setup to perform their data analysis/simulations. -- You try to get the setup working on your local system/a new hpc system and... - It doesn't work! 
+{{< include introduction.qmd >}} # Version Control -- First put things into VC +- First put things into version control - Then any changes/fixes can be tracked +- The repository can then also be hosted a remote service (e.g. GitHub, GitLab, Codeberg, Bitbucket) +- This will make collaboration with other people a lot easier! $-- Since git will be covered during the week we shouldn't need to do too much $-- here. From 31fdea6dd859650c9f0d6523b0163a6d271bce09 Mon Sep 17 00:00:00 2001 From: jackdfranklin Date: Wed, 15 Apr 2026 16:08:59 +0100 Subject: [PATCH 03/21] Remove comments --- src/slides.qmd | 3 --- 1 file changed, 3 deletions(-) diff --git a/src/slides.qmd b/src/slides.qmd index 70ca9d4..44ea811 100644 --- a/src/slides.qmd +++ b/src/slides.qmd @@ -19,9 +19,6 @@ authors: - The repository can then also be hosted a remote service (e.g. GitHub, GitLab, Codeberg, Bitbucket) - This will make collaboration with other people a lot easier! -$-- Since git will be covered during the week we shouldn't need to do too much -$-- here. 
- # Documentation - README From 4b1111fee7df0063476d76a3aeeeb52f6302c273 Mon Sep 17 00:00:00 2001 From: jackdfranklin Date: Wed, 15 Apr 2026 16:09:36 +0100 Subject: [PATCH 04/21] Add all html files to gitignore --- .gitignore | 1 + 1 file changed, 1 insertion(+) create mode 100644 .gitignore diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..2d19fc7 --- /dev/null +++ b/.gitignore @@ -0,0 +1 @@ +*.html From 184bf3b9a6ba97b4f2346dc32a78ac3a942c7c5b Mon Sep 17 00:00:00 2001 From: jackdfranklin Date: Wed, 22 Apr 2026 09:49:57 +0100 Subject: [PATCH 05/21] Update theme and add logo --- src/slides.qmd | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/slides.qmd b/src/slides.qmd index 44ea811..8df48a5 100644 --- a/src/slides.qmd +++ b/src/slides.qmd @@ -3,7 +3,8 @@ title: Reproducibility in Scientific Computing format: revealjs: - theme: dark + theme: night + logo: https://iccs.cam.ac.uk/sites/default/files/iccs_ucam_combined_reverse_colour.png authors: - name: Jack Franklin From 6671578b8dc0624a2169180d6250f517e0320581 Mon Sep 17 00:00:00 2001 From: jackdfranklin Date: Wed, 22 Apr 2026 09:59:36 +0100 Subject: [PATCH 06/21] Move slide sections to separate files --- src/dependencies.qmd | 5 +++++ src/documentation.qmd | 6 ++++++ src/fair_principles.qmd | 10 ++++++++++ src/slides.qmd | 38 ++++++-------------------------------- src/testing.qmd | 4 ++++ src/version_control.qmd | 11 +++++++++++ 6 files changed, 42 insertions(+), 32 deletions(-) create mode 100644 src/dependencies.qmd create mode 100644 src/documentation.qmd create mode 100644 src/fair_principles.qmd create mode 100644 src/testing.qmd create mode 100644 src/version_control.qmd diff --git a/src/dependencies.qmd b/src/dependencies.qmd new file mode 100644 index 0000000..f934df5 --- /dev/null +++ b/src/dependencies.qmd @@ -0,0 +1,5 @@ +## Dependencies + +- Basic documentation +- Project files (e.g. project.toml for python etc) +- System dependencies (nix/guix/docker?) 
diff --git a/src/documentation.qmd b/src/documentation.qmd new file mode 100644 index 0000000..9dae8ac --- /dev/null +++ b/src/documentation.qmd @@ -0,0 +1,6 @@ +## Documentation + +- README +- User Docs +- Dev Docs +- Comments? diff --git a/src/fair_principles.qmd b/src/fair_principles.qmd new file mode 100644 index 0000000..f9402f5 --- /dev/null +++ b/src/fair_principles.qmd @@ -0,0 +1,10 @@ +## FAIR Principles +- Findability +- Accessibility +- Interoperability +- Reuse + +$-- Maybe we should look at this retrospectively, and see what elements we covered and +$-- where we could improve on. +$-- Also a good way to talk about why these are good principles to start a project with +$-- since we can show that they avoid most/all the problems that we had to solve diff --git a/src/slides.qmd b/src/slides.qmd index 8df48a5..3206db6 100644 --- a/src/slides.qmd +++ b/src/slides.qmd @@ -13,41 +13,15 @@ authors: {{< include introduction.qmd >}} -# Version Control +{{< include version_control.qmd >}} -- First put things into version control -- Then any changes/fixes can be tracked -- The repository can then also be hosted a remote service (e.g. GitHub, GitLab, Codeberg, Bitbucket) -- This will make collaboration with other people a lot easier! +{{< include dependencies.qmd >}} -# Documentation +{{< include testing.qmd >}} -- README -- User Docs -- Dev Docs -- Comments? +{{< include documentation.qmd >}} -# Dependencies +{{< include fair_principles.qmd >}} -- Basic documentation -- Project files (e.g. project.toml for python etc) -- System dependencies (nix/guix/docker?) - -# Testing -- Unit tests -- Integration tests -- Automating tests (CI etc) - -# FAIR Principles -- Findability -- Accessibility -- Interoperability -- Reuse - -$-- Maybe we should look at this retrospectively, and see what elements we covered and -$-- where we could improve on. 
-$-- Also a good way to talk about why these are good principles to start a project with -$-- since we can show that they avoid most/all the problems that we had to solve - -# Conclusion/Outlook +## Conclusion/Outlook diff --git a/src/testing.qmd b/src/testing.qmd new file mode 100644 index 0000000..4b4b2ce --- /dev/null +++ b/src/testing.qmd @@ -0,0 +1,4 @@ +## Testing +- Unit tests +- Integration tests +- Automating tests (CI etc) diff --git a/src/version_control.qmd b/src/version_control.qmd new file mode 100644 index 0000000..75ee39a --- /dev/null +++ b/src/version_control.qmd @@ -0,0 +1,11 @@ +## Version Control + +- The first thing we should do is move our project into version control + +--- + +- The repository can then also be hosted a remote service (e.g. GitHub, GitLab, Codeberg, Bitbucket) +- This will make collaboration with other people a lot easier! + - It will also mean that any work done can be accessed + + From 53f6e21b6ea67cdfc43965aa37e4dbb2c0c89bce Mon Sep 17 00:00:00 2001 From: jackdfranklin Date: Wed, 29 Apr 2026 10:45:11 +0100 Subject: [PATCH 07/21] Add extra context to intro --- src/introduction.qmd | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/src/introduction.qmd b/src/introduction.qmd index b4075a5..09804bc 100644 --- a/src/introduction.qmd +++ b/src/introduction.qmd @@ -1,12 +1,19 @@ ## Introduction: What is reproducibility? -- Reproducing results -- Portability +There are different, conflicting definitions of what reproducibility is. +However, for this course we will take the following definition: + +*Reproducible*: Performing the same analysis on the same data produces the same +results ## A likely scenario - You have just joined a new research group as a Student/Researcher/PI. - The group use a custom pipeline/setup to perform their data analysis/simulations. - You try to get the setup working on your local system/a new hpc system and... - It doesn't work! 
+ *It doesn't work!* + +## Where do we go from here... +Throughout the rest of this session, we will walk through the steps that we can +take to an ad-hoc collection of scripts into a reproducible scientific workflow From bc98ddd38b37394c0aa6971e05b826aba107467a Mon Sep 17 00:00:00 2001 From: jackdfranklin Date: Wed, 29 Apr 2026 10:46:03 +0100 Subject: [PATCH 08/21] Add build dir to gitignore --- .gitignore | 1 + 1 file changed, 1 insertion(+) diff --git a/.gitignore b/.gitignore index 2d19fc7..725afe5 100644 --- a/.gitignore +++ b/.gitignore @@ -1 +1,2 @@ *.html +src/slides_files From e2adb0b3fcdadbb6bbe294bd0842e175180a742c Mon Sep 17 00:00:00 2001 From: jackdfranklin Date: Wed, 29 Apr 2026 12:57:30 +0100 Subject: [PATCH 09/21] Add section on what (not) to add to version control --- src/version_control.qmd | 28 ++++++++++++++++++++++++++-- 1 file changed, 26 insertions(+), 2 deletions(-) diff --git a/src/version_control.qmd b/src/version_control.qmd index 75ee39a..f42e0e7 100644 --- a/src/version_control.qmd +++ b/src/version_control.qmd @@ -1,11 +1,35 @@ ## Version Control - The first thing we should do is move our project into version control +- This way we never lose the original state of the project +- We can then try things without worrying about breaking anything! +- This will also benefit any later development, so the sooner the better ---- +## What to add to VC + +- DON'T do this: +``` bash +git add . +``` + +- Our repository should only contain: + - Code/scripts + - Documentation + - Metadata + - i.e. just text files + +There will be some exceptions to this rule, but for the vast majority of cases +it will be true. + +## What to add to VC + +- If you have large datafiles that are needed for your work, you should host + them separately (e.g. on Zenodo) and link to them in your repository + +## What to do next? - The repository can then also be hosted a remote service (e.g. 
GitHub, GitLab, Codeberg, Bitbucket) - This will make collaboration with other people a lot easier! - - It will also mean that any work done can be accessed +- It will also mean that any work done can be accessed by collaborators From fab00517a33df85dc1e01ec5e176ae8f1e99de47 Mon Sep 17 00:00:00 2001 From: jackdfranklin Date: Wed, 29 Apr 2026 13:51:47 +0100 Subject: [PATCH 10/21] Move scenario intro to separate file By separating out the walkthrough scenario, we can reuse these slides for shorter talks more easily. --- src/introduction.qmd | 5 ----- src/introduction_walkthrough.qmd | 7 +++++++ 2 files changed, 7 insertions(+), 5 deletions(-) create mode 100644 src/introduction_walkthrough.qmd diff --git a/src/introduction.qmd b/src/introduction.qmd index 09804bc..8f28d68 100644 --- a/src/introduction.qmd +++ b/src/introduction.qmd @@ -6,12 +6,7 @@ However, for this course we will take the following definition: *Reproducible*: Performing the same analysis on the same data produces the same results -## A likely scenario -- You have just joined a new research group as a Student/Researcher/PI. -- The group use a custom pipeline/setup to perform their data analysis/simulations. -- You try to get the setup working on your local system/a new hpc system and... - *It doesn't work!* ## Where do we go from here... diff --git a/src/introduction_walkthrough.qmd b/src/introduction_walkthrough.qmd new file mode 100644 index 0000000..7208b3a --- /dev/null +++ b/src/introduction_walkthrough.qmd @@ -0,0 +1,7 @@ +## A likely scenario + +- You have just joined a new research group as a Student/Researcher/PI. +- The group use a custom pipeline/setup to perform their data analysis/simulations. +- You try to get the setup working on your local system/a new hpc system and... 
+ *It doesn't work!* + From 914a9a870d8d58d8c25c68f2536156b961ddd32d Mon Sep 17 00:00:00 2001 From: jackdfranklin Date: Wed, 29 Apr 2026 13:52:59 +0100 Subject: [PATCH 11/21] Add motivation for reproducibility + fixes --- src/introduction.qmd | 27 +++++++++++++++++++++------ 1 file changed, 21 insertions(+), 6 deletions(-) diff --git a/src/introduction.qmd b/src/introduction.qmd index 8f28d68..0f2bd65 100644 --- a/src/introduction.qmd +++ b/src/introduction.qmd @@ -1,14 +1,29 @@ -## Introduction: What is reproducibility? +## What is reproducibility? -There are different, conflicting definitions of what reproducibility is. -However, for this course we will take the following definition: +For this course we will take the following definition: -*Reproducible*: Performing the same analysis on the same data produces the same -results +- *Reproducible*: + Performing the same analysis on the same data produces the same results +## Why is reproducibility important? +In the context of scientific computing/analysis, we want to be able to: + +- Verify our own results +- Verify the results of others + +By making our work reproducible, we ensure that both these things are not just +possible, but straightforward + +## Additional benefits + +- Safely implement changes +- Can perform workflow on different inputs more easily +- Simpler for new team members to get started +- Better collaboration ## Where do we go from here... Throughout the rest of this session, we will walk through the steps that we can -take to an ad-hoc collection of scripts into a reproducible scientific workflow +take to go from an ad hoc collection of scripts into a reproducible scientific +workflow! 
From 4df5a934693d3c23ccb9500fb030d15b4ab60fa3 Mon Sep 17 00:00:00 2001 From: jackdfranklin Date: Wed, 29 Apr 2026 14:09:52 +0100 Subject: [PATCH 12/21] Add extra points to "What to add to VC" --- src/version_control.qmd | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/src/version_control.qmd b/src/version_control.qmd index f42e0e7..79c0ac2 100644 --- a/src/version_control.qmd +++ b/src/version_control.qmd @@ -23,8 +23,11 @@ it will be true. ## What to add to VC -- If you have large datafiles that are needed for your work, you should host - them separately (e.g. on Zenodo) and link to them in your repository +- Large datafiles should be hosted separately (e.g. on Zenodo) +- External dependencies should be declared + - e.g. link to Zenodo dataset in docs and code +- Use .gitignore to automatically ignore any unwanted files + - e.g. build outputs ## What to do next? From 4bb4bdf5fab04bd16d26282187cbab2d892e4142 Mon Sep 17 00:00:00 2001 From: jackdfranklin Date: Wed, 29 Apr 2026 14:10:29 +0100 Subject: [PATCH 13/21] Add slide on git worktrees I have found these really useful when dealing with messy repositories in the past. --- src/version_control.qmd | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/src/version_control.qmd b/src/version_control.qmd index 79c0ac2..d684040 100644 --- a/src/version_control.qmd +++ b/src/version_control.qmd @@ -29,6 +29,17 @@ it will be true. - Use .gitignore to automatically ignore any unwanted files - e.g. build outputs +## Aside - testing with worktrees + +- git worktrees are like "local clones" of a repository +- Create a worktree: +``` bash +git worktree add -b +``` +- Will make a new directory, with only files that are tracked +- Can use as a cleanroom to ensure all dependencies are there +- For more info: `git worktree add --help` + ## What to do next? - The repository can then also be hosted a remote service (e.g. 
GitHub, GitLab, Codeberg, Bitbucket) From c4a7cef8ee2e7ba2d86f5af8cf1fc0e8f17a1e7b Mon Sep 17 00:00:00 2001 From: jackdfranklin Date: Wed, 29 Apr 2026 15:08:53 +0100 Subject: [PATCH 14/21] Add GitHub CI to build and deploy slides Used https://github.com/Cambridge-ICCS/Summer-school-Intro-Git as reference --- .github/workflows/deploy.yml | 43 ++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) create mode 100644 .github/workflows/deploy.yml diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml new file mode 100644 index 0000000..c1194c1 --- /dev/null +++ b/.github/workflows/deploy.yml @@ -0,0 +1,43 @@ +name: Build and deploy slides + +on: + pull_request: + branches: [ "main" ] + push: + branches: [ "main" ] + + # Allows manual run + workflow_dispatch: + +jobs: + # Builds slides with quarto and deploys them to a branch + build: + runs-on: ubuntu-latest + + steps: + - name: Checkout code + uses: actions/checkout@v4 + + - name: Set up Quarto + uses: quarto-dev/quarto-actions/setup@v2 + + - name: Render Quarto Project + run: | + cd src + quarto render slides.qmd + cd ../ + + - name: Test pages build + if: github.ref != 'refs/heads/main' + uses: JamesIves/github-pages-deploy-action@v4 + with: + branch: test-pages + folder: src + dry-run: true + + - name: Deploy pages for main + if: github.ref == 'refs/heads/main' + uses: JamesIves/github-pages-deploy-action@v4 + with: + branch: gh-pages + folder: src From bb3e37d1bd68757ef4218c74d1c79d6f97dd782f Mon Sep 17 00:00:00 2001 From: jackdfranklin Date: Wed, 29 Apr 2026 17:24:42 +0100 Subject: [PATCH 15/21] Fill out dependencies slides --- src/dependencies.qmd | 36 ++++++++++++++++++++++++++++++++++-- 1 file changed, 34 insertions(+), 2 deletions(-) diff --git a/src/dependencies.qmd b/src/dependencies.qmd index f934df5..ebcd772 100644 --- a/src/dependencies.qmd +++ b/src/dependencies.qmd @@ -1,5 +1,37 @@ ## Dependencies -- Basic documentation -- Project files (e.g. 
project.toml for python etc) +- All software has dependencies +- Some are more obvious than others: +  - Data/input +  - Packages/libraries e.g. numpy, Eigen +  - System libraries +  - Compiler/Interpreter +- If your code can't run without it, it's a dependency! + +## How to discover dependencies + +- Some dependencies may be "implicit" +- For example, you may have a library installed on your system +- Since the code "just works", you may not be aware of the dependency +- To find these, try running on a different system (or multiple) and see what breaks + +## How to declare dependencies + +- List them in a tracked file in the repository +  - e.g. add a "Dependencies" section to your README.md +- Specify: +  - Versions of each dependency e.g. numpy 2.3.9 +  - Where/how to acquire the dependency + +## Dependency metadata + +- There are automated ways of resolving dependencies +- Usually language/tool specific +- Some tools automatically update dependency metadata +  - e.g. Rust's cargo, Julia's Pkg, uv for Python +  - Project file: Dependencies and compatible versions +  - Lock file: Write exact version (plus other metadata e.g. source) of *every* +    dependency you are using +  - Important to track both - lock files record the exact environment you use + - System dependencies (nix/guix/docker?) From ce67dd0244fb391e0eeda8485663dd3523c7d25a Mon Sep 17 00:00:00 2001 From: jackdfranklin Date: Wed, 29 Apr 2026 17:36:54 +0100 Subject: [PATCH 16/21] Add description of FAIR principles --- src/fair_principles.qmd | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/src/fair_principles.qmd b/src/fair_principles.qmd index f9402f5..d08bf7e 100644 --- a/src/fair_principles.qmd +++ b/src/fair_principles.qmd @@ -1,10 +1,13 @@ ## FAIR Principles -- Findability -- Accessibility -- Interoperability -- Reuse -$-- Maybe we should look at this retrospectively, and see what elements we covered and -$-- where we could improve on. 
-$-- Also a good way to talk about why these are good principles to start a project with -$-- since we can show that they avoid most/all the problems that we had to solve +- Findable: Software, and its metadata, are easy for humans and machines to +  find. +- Accessible: Software, and its metadata, are retrievable via standardised +  protocols. +- Interoperable: Software interoperates with other software by exchanging +  data and/or metadata, and/or through interaction via application +  programming interfaces (APIs), described through standards. +- Reusable: Software is both usable (can be executed) and reusable (can be +  understood, modified, built upon, or incorporated into other software). + +See: https://www.nature.com/articles/s41597-022-01710-x From 0f099dd5511f17d5bde49f4b94a0de3a5ffba5c4 Mon Sep 17 00:00:00 2001 From: jackdfranklin Date: Wed, 6 May 2026 18:00:04 +0100 Subject: [PATCH 17/21] Add content to testing slides --- src/dependencies.qmd | 6 +++++- src/testing.qmd | 42 +++++++++++++++++++++++++++++++++++++++--- 2 files changed, 44 insertions(+), 4 deletions(-) diff --git a/src/dependencies.qmd b/src/dependencies.qmd index ebcd772..1092566 100644 --- a/src/dependencies.qmd +++ b/src/dependencies.qmd @@ -34,4 +34,8 @@ dependency you are using - Important to track both - lock files record the exact environment you use -- System dependencies (nix/guix/docker?) 
+## System dependencies + +- Conda +- Docker +- Nix/Guix diff --git a/src/testing.qmd b/src/testing.qmd index 4b4b2ce..b959387 100644 --- a/src/testing.qmd +++ b/src/testing.qmd @@ -1,4 +1,40 @@ ## Testing -- Unit tests -- Integration tests -- Automating tests (CI etc) + +- Important to test code +- Check that code does what it should +- Test on inputs outside of the "normal" range +- Verify that results of code do not change +- Can also be used to check dependency changes + +## Unit tests + +- Test the smallest logical unit of the code +- Ensure each component works as intended +- Test functions for known results +- Compare to previously produced results + +## Integration tests + +- Test that components work together +- Try to have a range of complexity of tests +- Can use previous results to validate model +- Ensure no regression of results + +## Adding tests to a project + +- Often we inherit large projects with no unit tests +- How do we improve test coverage in this case? +  1. Create integration tests - use previous results or create "golden outputs" +  2. Identify and extract parts of the code which can be split apart +  3. Create unit tests for the new functions +  4. Run the integration tests to ensure results have not changed +  5. Repeat 2-4 until all code has unit tests + +- Whenever you change a part of the code, try to use this method +- Code coverage will slowly improve, with less extra work + +## Automating tests (CI etc) + +- Automate testing to ensure tests pass for every commit +- Also useful for tests that can take a long time/need lots of resources +- If hosting code on e.g. 
GitHub, GitLab etc, can use Continuous Integration (CI) From 39281f188bceb6db90e73810e678e81a4f928593 Mon Sep 17 00:00:00 2001 From: jackdfranklin Date: Wed, 6 May 2026 18:00:19 +0100 Subject: [PATCH 18/21] Add content on documentation --- src/documentation.qmd | 32 ++++++++++++++++++++++++++++---- 1 file changed, 28 insertions(+), 4 deletions(-) diff --git a/src/documentation.qmd b/src/documentation.qmd index 9dae8ac..c3bcfcf 100644 --- a/src/documentation.qmd +++ b/src/documentation.qmd @@ -1,6 +1,30 @@ +# Documentation + ## Documentation -- README -- User Docs -- Dev Docs -- Comments? +- Not all information can be conveyed in code +- We need to tell other people how to use our projects +- And sometimes ourselves! +- Documentation covers anything outside of the code/metadata + +## README + +- Markdown file at the project root +- Should contain: +  - Description of project +  - Dependencies +  - Instructions on building/running + +## Comments + +- Comments in code are also another form of documentation +- Comments should: +  - Explain *why* the code is doing something +  - Give context that is external to the scope + +## Generating Docs + +- Use tools that generate docs from source code +- Single source of truth +- Comments/Docstrings embedded in code +- Reduce separation between code and docs From 531d2da4a5e0c8dd08698302d16742858f605ec8 Mon Sep 17 00:00:00 2001 From: jackdfranklin Date: Wed, 6 May 2026 18:14:56 +0100 Subject: [PATCH 19/21] Fixups on slides --- src/dependencies.qmd | 2 ++ src/fair_principles.qmd | 13 ++++++++++++- src/testing.qmd | 5 +++++ src/version_control.qmd | 4 +++- 4 files changed, 22 insertions(+), 2 deletions(-) diff --git a/src/dependencies.qmd b/src/dependencies.qmd index 1092566..a241419 100644 --- a/src/dependencies.qmd +++ b/src/dependencies.qmd @@ -1,3 +1,5 @@ +# Dependencies + ## Dependencies - All software has dependencies diff --git a/src/fair_principles.qmd b/src/fair_principles.qmd index d08bf7e..70754eb 100644 --- 
a/src/fair_principles.qmd +++ b/src/fair_principles.qmd @@ -1,12 +1,23 @@ -## FAIR Principles +# FAIR Principles + +--- - Findable: Software, and its metadata, are easy for humans and machines to find. + +--- + - Accessible: Software, and its metadata, are retrievable via standardised protocols. + +--- + - Interoperable: Software interoperates with other software by exchanging data and/or metadata, and/or through interaction via application programming interfaces (APIs), described through standards. + +--- + - Reusable: Software is both usable (can be executed) and reusable (can be understood, modified, built upon, or incorporated into other software). diff --git a/src/testing.qmd b/src/testing.qmd index b959387..e5f30e6 100644 --- a/src/testing.qmd +++ b/src/testing.qmd @@ -1,3 +1,5 @@ +# Testing + ## Testing - Important to test code @@ -24,6 +26,9 @@ - Often we inherit large projects with no unit tests - How do we improve test coverage in this case? + +## Adding tests to a project + 1. Create integration tests - use previous results or create "golden outputs" 2. Identify and extract parts of the code which can be split apart 3. Create unit tests for the new functions diff --git a/src/version_control.qmd b/src/version_control.qmd index d684040..f8ac915 100644 --- a/src/version_control.qmd +++ b/src/version_control.qmd @@ -1,6 +1,8 @@ +# Version Control + ## Version Control -- The first thing we should do is move our project into version control +- The first thing we should do is move our project into version control (VC) - This way we never lose the original state of the project - We can then try things without worrying about breaking anything! 
- This will also benefit any later development, so the sooner the better From 8bef0a677787c141a5b46f7e593d85edb95e9f46 Mon Sep 17 00:00:00 2001 From: jackdfranklin Date: Thu, 7 May 2026 08:45:15 +0100 Subject: [PATCH 20/21] Add short conclusion section --- src/slides.qmd | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/src/slides.qmd b/src/slides.qmd index 3206db6..37d8ade 100644 --- a/src/slides.qmd +++ b/src/slides.qmd @@ -23,5 +23,25 @@ authors: {{< include fair_principles.qmd >}} -## Conclusion/Outlook +# Conclusion/Outlook +## Ingredients for reproducibility: + +- Version Control +- Dependency Metadata +- Public Accessibility + +## Even better if + +- Testing for: +  - Verification +  - Regression checks + +## Make it easy! + +- When starting from scratch, much easier to implement these as you go +- For a large project: +  - Add to VC +  - Document dependencies +  - Follow best practice for new code +  - Implement small improvements whenever modifying From 9bb254cfcdf7aba93600dc6ce6fce104ae13f854 Mon Sep 17 00:00:00 2001 From: jackdfranklin Date: Thu, 7 May 2026 08:50:47 +0100 Subject: [PATCH 21/21] Add summary of benefits to conclusion --- src/slides.qmd | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/src/slides.qmd b/src/slides.qmd index 37d8ade..ecfa7f9 100644 --- a/src/slides.qmd +++ b/src/slides.qmd @@ -25,6 +25,16 @@ authors: # Conclusion/Outlook +## Reproducibility is important + +Primary benefits: +- Confidence in scientific results +- Peer review/cross analysis + +Additional benefits: +- Allows for code reuse +- Better collaboration + ## Ingredients for reproducibility: - Version Control