
Application crashes with AttributeError when embeddings are missing (S3 download returns 403) #114

@AaryanCode69

Description

While following the README instructions to set up the project locally, I encountered an issue where the application crashes on startup if the embeddings cannot be downloaded.

The failure appears to originate from two related problems:

  1. The embeddings cannot currently be downloaded from the public S3 bucket.
  2. When embeddings are missing, the application crashes with an opaque error instead of providing a helpful message.

Steps to Reproduce

  1. Clone the repository.
  2. Install dependencies:
poetry install
  3. Configure environment variables (.env) as described in the README.
  4. Attempt to list available embeddings:
./bin/embeddings_manager ls-remote

This returns a boto3 error:

403 Forbidden / AccessDenied

from the S3 bucket download.reactome.org.

  5. Attempt to install embeddings:
./bin/embeddings_manager install openai/text-embedding-3-large/reactome/Release91

This fails with the same error.

  6. Start the application:
docker compose up

Both containers exit with the following traceback:

AttributeError: 'NoneType' object has no attribute 'glob'

Observed Behavior

  • embeddings_manager ls-remote fails with a boto3 403 AccessDenied.
  • embeddings_manager install fails during the S3 download.
  • When the application starts without embeddings present, it crashes during initialization with:
AttributeError: 'NoneType' object has no attribute 'glob'

The error message does not indicate that the underlying issue is missing embeddings.


Expected Behavior

Ideally, one of the following would occur:

  1. If the embeddings cannot be downloaded from S3, embeddings_manager should produce a clear error explaining the access failure.

Example:

ERROR: Unable to access embedding archive from S3 (403 AccessDenied).
Please verify bucket permissions or install embeddings manually.
  2. If embeddings are missing when the application starts, the server should fail with a clear message such as:
ERROR: No embeddings configured for 'reactome'.
Run 'bin/embeddings_manager install <model>/<db>/<version>' to install embeddings.

Alternatively, the affected profile could be disabled while allowing the server to start with reduced functionality.
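
As a rough illustration of the reduced-functionality option, the sketch below splits profiles into enabled and disabled ones at startup. The names `partition_profiles` and `get_dir` are hypothetical and not taken from the codebase; `get_dir` stands in for whatever lookup returns None when embeddings are not configured.

```python
import logging
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Optional, Tuple

logger = logging.getLogger(__name__)


def partition_profiles(
    profiles: Iterable[str],
    get_dir: Callable[[str], Optional[Path]],
) -> Tuple[Dict[str, Path], List[str]]:
    """Split profiles into those with an embeddings directory and those without.

    `get_dir` is an assumed lookup that returns None for unconfigured profiles.
    """
    enabled: Dict[str, Path] = {}
    disabled: List[str] = []
    for name in profiles:
        directory = get_dir(name)
        if directory is None:
            # Log the reason so the reduced functionality is not silent.
            logger.warning(
                "No embeddings configured for '%s'; disabling this profile.", name
            )
            disabled.append(name)
        else:
            enabled[name] = directory
    return enabled, disabled
```

The server could then build retrievers only for the enabled profiles and report the disabled ones in its startup log.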


Likely Cause

From debugging the startup process, it appears the crash occurs because:

  • EmbeddingEnvironment.get_dir() returns None when embeddings are not configured.
  • That value eventually propagates into the retriever initialization.
  • A later call to directory.glob() assumes the directory exists and triggers the AttributeError.
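
A minimal sketch of a guard that would turn this into a descriptive failure, assuming the directory arrives as an optional `pathlib.Path`. `MissingEmbeddingsError`, `require_embeddings_dir`, and `list_embedding_files` are hypothetical names used for illustration only; the actual retriever code may be structured differently.

```python
from pathlib import Path
from typing import List, Optional


class MissingEmbeddingsError(RuntimeError):
    """Hypothetical error for a profile that has no embeddings directory."""


def require_embeddings_dir(directory: Optional[Path], profile: str) -> Path:
    """Return `directory` if it is usable, otherwise raise an actionable error."""
    if directory is None or not directory.is_dir():
        raise MissingEmbeddingsError(
            f"No embeddings configured for '{profile}'. "
            "Run 'bin/embeddings_manager install <model>/<db>/<version>' "
            "to install embeddings."
        )
    return directory


def list_embedding_files(directory: Optional[Path], profile: str) -> List[Path]:
    # Validate before calling .glob(), which is where the AttributeError surfaced.
    checked = require_embeddings_dir(directory, profile)
    return sorted(checked.glob("*"))
```

With a check like this placed before the first `glob()` call, a None directory would produce the message from Expected Behavior instead of the AttributeError.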

Impact

This currently blocks new contributors from running the application locally using the documented workflow:

ls-remote → install → docker compose up

Since the embeddings cannot be downloaded and the startup error does not explain the root cause, diagnosing the issue requires tracing through several internal modules.


Possible Improvements

Changes that could make this easier for users to diagnose:

  • Add clearer error handling in embeddings_manager for S3 access failures.
  • Validate that embeddings exist during application startup and fail with a descriptive message.
  • Optionally allow the application to start while disabling profiles that require missing embeddings.
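
For the first bullet, one possible shape is a small helper that maps an access-denied failure to the message proposed under Expected Behavior. This is a sketch under assumptions: `describe_s3_failure` is a hypothetical name, and it relies only on the `response` dictionary that botocore's `ClientError` exposes (`Error.Code` and `ResponseMetadata.HTTPStatusCode`), read via duck typing so the snippet runs without boto3 installed.

```python
from typing import Optional


def describe_s3_failure(exc: Exception, bucket: str) -> Optional[str]:
    """Return a user-facing message for an access-denied error, else None."""
    # botocore's ClientError carries a `response` dict; other exceptions do not.
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return None
    code = response.get("Error", {}).get("Code", "")
    status = response.get("ResponseMetadata", {}).get("HTTPStatusCode")
    # Some S3 operations report the code as "403" rather than "AccessDenied".
    if code in ("AccessDenied", "403") or status == 403:
        return (
            f"ERROR: Unable to access embedding archive from S3 bucket "
            f"'{bucket}' (403 AccessDenied).\n"
            "Please verify bucket permissions or install embeddings manually."
        )
    return None
```

The `ls-remote` and `install` commands could call this from an `except` block around the S3 request, print the message when one is returned, and re-raise otherwise.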

Happy to help implement a fix if this approach makes sense.
