Skip to content

Implement a global lock#318

Draft
rwb27 wants to merge 8 commits intomainfrom
global-lock
Draft

Implement a global lock#318
rwb27 wants to merge 8 commits intomainfrom
global-lock

Conversation

@rwb27
Copy link
Copy Markdown
Collaborator

@rwb27 rwb27 commented Apr 22, 2026

This PR adds:

  • A server config key for the global lock
  • A class to implement the global lock
  • Context managers used when running actions or setting properties, to acquire said global lock
  • Arguments allowing actions or properties to opt out of the lock
  • Tests for global locking, in isolation and in a server
  • Error handling so that global lock related errors in properties result in helpful HTTP responses (these errors are already handled for Actions).

Properties will either use the lock for setting, or not use the lock. If functional properties should use the lock for reading, they can acquire it in the getter. This is the default behaviour I discussed briefly with @julianstirling based on the idea that reading properties shouldn't change state. If getting a property needs to access hardware, this should be protected by a lock specific to the hardware - the stage and camera already have this in OFM.

It's possible that something similar to the GlobalLock class would be useful to Thing subclasses for other locks, because it supports use as a context manager with a default timeout. Vanilla RLock instances will block forever, which can cause undesirable queuing.

We still need

  • some tests of the lock in the context of properties and actions, including the argument to opt out.
  • tests in the context of a ThingServer and using a TestClient so it's running in a real event loop.
  • error handling for HTTP requests.
  • create an OFM feature branch that enables locking.
  • a way for a Thing to require global locking (e.g. a thing setting?) - that may be best done after we decide on Remove feature flags #316 .
  • documentation!

Closes #101
OFM-Feature-Branch: v3-labthings-global-lock

@barecheck
Copy link
Copy Markdown

barecheck Bot commented Apr 23, 2026

@rwb27
Copy link
Copy Markdown
Collaborator Author

rwb27 commented Apr 27, 2026

I've now got passing tests with a TestClient however there is more to be done. Checking manually with a ThingClient and an HTTP server in a separate process gives different errors to the ones the test checks for. This suggests I may have found a difference between TestClient and actual HTTP, or that my test logic may be wrong.

Either way, I need to figure out how to handle the GlobalLockBusyError neatly, especially in the case of properties where there's not currently good error handling.

@rwb27
Copy link
Copy Markdown
Collaborator Author

rwb27 commented Apr 28, 2026

Either way, I need to figure out how to handle the GlobalLockBusyError neatly, especially in the case of properties where there's not currently good error handling.

Before I tidied up error handling for properties, the error handling behaved differently between the testclient and the HTTP server (the testclient-based test was raising GlobalLockBusyError instead of ClientPropertyError). I've now fixed this, and the behaviour is consistent. I believe this is down to how TestClient deals with unhandled exceptions, which is to propagate them back to the test code. This makes sense for a test suite, and the solution is not to have unhandled exceptions in property getter/setter code. My error handler means that GlobalLockBusyError exceptions now get handled by the FastAPI app, so there's no need to catch them in the Python code.

As has been mentioned elsewhere, it would be nice to add an integration test suite in addition to the OFM suite - largely because we can include more edge cases designed to test more rigorously. I am not proposing to do that here, as I think the TestClient version is already good enough, and adding an integration test suite will want proper review on its own.

rwb27 added 8 commits April 28, 2026 12:19
This commit adds:
* A server config key for the global lock
* A class to implement the global lock
* Context managers used when running actions or setting properties, to acquire said global lock

Properties will either use the lock for setting, or not use the lock. If functional properties should use the lock for reading, they can acquire it in the getter.

No documentation yet, but there are some tests for the lock.

We still need some tests of the lock in the context of properties and actions, including the argument to opt out.
I've added tests for the ThingServerInterface functions related to the lock. This revealed a bug that is now fixed.

I also added a convenience function to mock Thing instances, which provides a few attributes that are needed by the various descriptors and should help keep test suites tidy.
This now checks both directly and in the context of a `ThingServer` (simulated by a `TestClient`) that the global lock works as expected, if it is enabled.
I've now installed a FastAPI error handler that will provide a more
graceful error when the global lock is not available.

Errors are handled for actions by the normal mechanism, but for property
code any exceptions previously resulted in a 500 Internal Server Error
with no details.

I now explicitly catch `GlobalLockBusyError` and provide a 409 response with a message.
I thought it was elegant to call the lock context manager and bind just
the generator object it returns to the wrapped function. Unfortunately,
said generator object may not be re-used.

I have now added a test (that started off failing) to reproduce that
problem, and have changed the offending code to use a fresh
generator each time.

This was caught by the OFM test suite :)
Copy link
Copy Markdown
Contributor

@julianstirling julianstirling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice. I think the point about hold_global_lock(False) we should discuss in the call.

if global locking is enabled. That is appropriate if the action does
not have side effects that would cause problems for other actions, or
if more nuanced locking behaviour is required meaning the lock is
acquired directly in the action code.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be good to point to which function is used to acquire it directly?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refer to ThingServerInterface.hold_global_lock

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've linked to hold_global_lock and also the section on the global lock (in preference to repeating that explanation here).

"""


class GlobalLockBusyError(TimeoutError):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this log as error or exception? Seems like the sort of thing where we wouldn't want the traceback as it is an "expected" error during use (or reasonably foreseeable misuse!).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. It doesn't subclass invocation error so currently it will dump a stack trace. You're right it would be better to handle this differently, at least if it occurs in the action wrapper (i.e. if an action can't start because the server is busy).

If the exception happens in user code I think a stack trace probably is helpful, and if not the user should probably handle it.

return self._get_server().global_lock

@contextmanager
def hold_global_lock(self, enabled: bool | None = True) -> Iterator[None]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This confuses me, I am not really sure the context where I would want to run with anything except True?

I can see that within internal LabThings code it might be useful i the enabled is set at the variable level. I think the exposed close to the user base shouldn't have an enable as the line:

with tsi.hold_global_lock(False):

does not look like it would not try to get the lock.

I'd suggest the public function may to better to have the signature:

def hold_global_lock(self, *, error_if_unavailable:bool=True) -> Iterator[None]:

This would hold the None and then True case of the current implementation.

@rwb27 rwb27 added this to the v0.2.0 milestone Apr 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Not clear how to lock a thing?

2 participants