fix: retirement PII leaks by redacting pending secondary email/name data by ktyagiapphelix2u · Pull Request #38427 · openedx/openedx-platform

ktyagiapphelix2u · 2026-04-23T10:56:36Z

Summary

This PR closes privacy gaps in the retirement flow related to secondary email and associated pending records in the LMS.

It ensures all retirement entry points consistently redact sensitive fields before deletion, so no plaintext PII is retained after records are removed.

Ticket & Reference

https://2u-internal.atlassian.net/browse/BOMS-499

robrap

Thanks. Some comments to get started...

robrap · 2026-04-24T17:05:09Z

-    .. pii: Contains new_secondary_email, not currently retired
+    .. pii: Contains new_secondary_email, redacted in `DeactivateLogoutView`
    .. pii_types: email_address
-    .. pii_retirement: retained


@bmedx: [inform] This doesn't seem like it would have been intentionally retained, so I'm fine with calling this a bug and just fixing. Any objections?

robrap · 2026-04-24T17:20:10Z

+        account_recovery.secondary_email = f"redacted+{user_id}@redacted.com"
+        account_recovery.is_active = False
+        account_recovery.save(update_fields=['secondary_email', 'is_active'])
+        account_recovery.delete()


@ktyagiapphelix2u: Is it true that the recovery email itself is not yet redacted? I thought it was just pending secondary email updates. If so, I think secondary (recovery) email and pending secondary email should be separate PRs.

Also, it look like you wish to delete where we weren't previously doing so. @bmedx: Any strong opinions either way? In general are we redacting and leaving things in place?

@ktyagiapphelix2u: You did not respond on this comment in particular, but from elsewhere, I think you want this all to stay together, right?

Also, I missed the earlier cls.objects.get(user_id=user_id).delete(), so we're not changing anything, but just adding redaction, right? Again, maybe use the simpler redact-before-delete email I proposed earlier?

Thanks for the follow-up. To address your points:

Same PR- Yes, both changes (recovery email + pending secondary email) address the same retirement gap and follow the same redact-before-delete pattern, so keeping them together makes sense.

Simplified redact-before-delete, setting is_active = False before delete() was pointless (the row is gone immediately after). I've removed it, so retire_recovery_email now matches the simpler pattern used in redact_and_delete_pending_secondary_email, redact the email field, save, then delete.

robrap · 2026-04-24T17:23:41Z

        """
-        Retire user's recovery/secondary email as part of GDPR Phase I.
+        Redact user's recovery/secondary email as part of GDPR Phase I.
        Returns 'True'


No longer returns True in all cases. Fix.

got it implemented

robrap · 2026-04-24T17:24:24Z

    def retire_recovery_email(cls, user_id):
        """
-        Retire user's recovery/secondary email as part of GDPR Phase I.
+        Redact user's recovery/secondary email as part of GDPR Phase I.


This should "Redact and delete" if we leave the delete.

DONE IMPLEMENTED

robrap · 2026-04-24T17:28:18Z

        # Assert that there is no longer a secondary/recovery email for test user
        assert len(AccountRecovery.objects.filter(user_id=self.test_user.id)) == 0

+    def test_user_can_deactivate_pending_secondary_email_change(self):


I'm holding off on reviewing tests. I'm getting confused between the pending and actual records, and this will be simpler when the PRs are split. Also, feel free to get an approval from the team first.

Helpful Info

This PR fixes a PII leakage issue affecting two related models that store secondary email data at different lifecycle stages. The AccountRecovery model stores confirmed/active recovery emails, while PendingSecondaryEmailChange stores unconfirmed email change requests (before the user clicks the activation link).

During user retirement, we must handle both models because a user can have: (1) only a confirmed email, (2) only a pending change request, (3) both (when changing their existing recovery email to a new one), or (4) neither.

Both models require the same fix pattern: redact the email field to prevent PII from syncing to downstream systems save the redacted record, then delete it. The methods are idempotent, they return True if no record exists. For review clarity, I recommend examining the AccountRecovery changes first (confirmed emails), then PendingSecondaryEmailChange (pending emails), and finally the retirement flow that calls both methods.

Akanshu-2u · 2026-04-28T06:24:46Z

+    pending_secondary_email_change.new_secondary_email = get_retired_email_by_email(
+        pending_secondary_email_change.new_secondary_email
+    )
+    pending_secondary_email_change.save(update_fields=['new_secondary_email'])


activate_secondary_email is a normal user action (clicking the confirmation link), not a retirement path. get_retired_email_by_email() produces a retired_<hash>@retired.invalid value. If a downstream system snapshots this deletion, it will see a retirement-style hashed email even though the user was never retired — this will produce confusing audit trails and could trigger false-positive retirement detection in downstream systems.

This change does not belong in this view. The soft-delete protection goal is valid, but the redaction function must not be get_retired_email_by_email() on a non-retired user. Either use a neutral placeholder, or remove this from the activation path entirely.

right—activate_secondary_email is a normal user action, not a retirement flow. Using get_retired_email_by_email() here would incorrectly mark a regular user as “retired” in downstream systems, which could cause confusion or false positives.

I’ve removed the redaction from this path, so now the pending secondary email is just deleted after activation, with no retirement-style hashing. Redaction only happens in actual retirement flows.

What about something like using completed@delete-pending.com? If the user is later retired, this soft-delete would persist, and isn’t that the case we’re trying to fix?

@ktyagiapphelix2u: Did you see my comment? Can you acknowledge and respond? Thank you.

I've implemented your suggestion. The pending secondary email is now overwritten with "completed@delete-pending.com" before deletion.

does not use get_retired_email_by_email(), so it won't be mistaken for a retirement hash or trigger false-positive retirement detection downstream

Akanshu-2u · 2026-04-28T06:29:04Z

+            pending_secondary_email = cls.objects.get(user_id=user_id)
+        except cls.DoesNotExist:
+            return True
+        pending_secondary_email.new_secondary_email = f"redacted+{user_id}@redacted.com"


But PendingEmailChange redaction (PR #38426) and management.py activation both use get_retired_email_by_email(). The edX retirement system has a standardized hashing approach for a reason — it's deterministic, auditable, and consistent across the retirement pipeline. Using a raw format here:

Produces .com domain (use .invalid if using a custom format)
Is inconsistent with the rest of the retirement system in this same codebase
Means downstream audits see two different redaction patterns for the same retirement flow
Both retire_recovery_email and redact_and_delete_pending_secondary_email should use get_retired_email_by_email(record.secondary_email) / get_retired_email_by_email(record.new_secondary_email) to stay consistent.

I'm not finding the PR description useful. I wish you'd simply write yourself the simple list of cases that you are fixing.

I'm not clear if we are fixing the redact-before-delete for pending records that are activated during normal flow, and not during user retirement. I'm not seeing that (maybe I just missed it), but I thought that was a problem. and that we'd be calling the same code.

Because we are simply redacted a deleted record that should never be seen, unless someone goes and looks at the soft-deleted records in downstream (in our case Snowflake), the redacted value is less important to be using the get_retired_email_by_email() function. I could imagine a simple hard-coded string like redact-before-delete@redacted.com.

is_active = False removed pointless before delete(), dropped from retire_recovery_email

get_retired_email_by_email() used in both retirement methods — consistent with the rest of the retirement pipeline

Activation path uses neutral placeholder "completed@delete-pending.com" not a retirement hash, no false-positive signals

Docstrings updated "Redact and delete", correct return type, proper descriptions

except DoesNotExist now returns True consistent early-return pattern in both methods

Unused import removed get_retired_email_by_email import dropped from management.py

Akanshu-2u · 2026-04-28T06:32:48Z

+            pending_email = PendingEmailChange.objects.filter(user=user).first()
+            if pending_email:
+                pending_email.new_email = get_retired_email_by_email(pending_email.new_email)
+                pending_email.save(update_fields=['new_email'])


PR #38426 adds PendingEmailChange.redact_pending_email_by_user_value(user, field="user") to the same location. This PR re-implements the same logic inline instead of using that class method. These two PRs will conflict on merge and the inline approach bypasses the single-responsibility method created in #38426.

This inline block must be removed redact_pending_email_by_user_value() from #38426 is the correct abstraction. If this PR merges first, the inline code must be replaced with that method call once #38426 merges.

Done. The inline block is replaced with PendingEmailChange.redact_pending_email_by_user_value(user, field="user") from PR #38426, which is the correct abstraction. The get_retired_email_by_email import is kept since it's still used elsewhere in the file.

Akanshu-2u · 2026-04-28T06:40:21Z

+        PendingSecondaryEmailChange.redact_and_delete_pending_secondary_email(user_id=user.id)
+        assert len(PendingSecondaryEmailChange.objects.filter(user_id=user.id)) == 0


Only checks the record was deleted. Doesn't assert:

The email was actually redacted (not just deleted directly)

Redaction happened before deletion (the ordering guarantee this PR is built around)

The test should verify the redacted value was persisted and consistent.

Both tests now capture the value passed to save() via a mock side-effect, then assert:

The record is deleted
The saved value differs from the original (redaction happened)
The original domain is absent from the saved value (it's not a partial redaction)
Also applied the same improvement to test_retire_recovery_email, which had the identical gap.

Akanshu-2u · 2026-04-28T06:42:09Z

+    call_command('retire_user', username=user.username, user_email=user.email)
+
+    assert not PendingSecondaryEmailChange.objects.filter(user=user).exists()


Only verifies deletion — does not verify the email was redacted before the delete happened. Use the same mock.patch.object side_effect pattern from PR #38425's updated test_retire_user_redacts_sso_pii_before_deletion .

sorted implemented your suggestion

robrap

@ktyagiapphelix2u: I added some top-level comments before starting this review. Please respond to those as well. Thank you.

robrap · 2026-04-29T23:12:05Z

+        Note: "Retire" here refers to retiring this user's data, not the recovery email
+        feature itself, which remains available for new accounts.


Not sure what "the recovery email feature itself" means. Here is my proposed update:

Suggested change

Note: "Retire" here refers to retiring this user's data, not the recovery email

feature itself, which remains available for new accounts.

Note: In retire_recovery_email, "retire" means this is part of user retirement.

The recovery email itself remains available for use with future accounts.

robrap · 2026-04-29T23:19:21Z

+        account_recovery.secondary_email = f"redacted+{user_id}@redacted.com"
+        account_recovery.is_active = False
+        account_recovery.save(update_fields=['secondary_email', 'is_active'])
+        account_recovery.delete()


@ktyagiapphelix2u: You did not respond on this comment in particular, but from elsewhere, I think you want this all to stay together, right?

Also, I missed the earlier cls.objects.get(user_id=user_id).delete(), so we're not changing anything, but just adding redaction, right? Again, maybe use the simpler redact-before-delete email I proposed earlier?

robrap · 2026-04-29T23:21:20Z

+        account_recovery.is_active = False
+        account_recovery.save(update_fields=['secondary_email', 'is_active'])


Messing with is_active is simply confusing. We're about to delete this row, so in theory we don't need to make any changes. The only reason we are touching the record before deleting is to not leave PII in soft-deleted records downstream. Let's keep this simple.

Suggested change

account_recovery.is_active = False

account_recovery.save(update_fields=['secondary_email', 'is_active'])

account_recovery.save(update_fields=['secondary_email'])

Implemented

ktyagiapphelix2u added 2 commits April 23, 2026 10:55

fix: retirement PII leaks by redacting pending secondary email/name data

1e6614b

fix: retirement PII leaks by redacting pending secondary email/name data

da2bbfd

ktyagiapphelix2u marked this pull request as ready for review April 23, 2026 11:29

ktyagiapphelix2u requested a review from a team as a code owner April 23, 2026 11:29

robrap reviewed Apr 24, 2026

View reviewed changes

fix: retirement PII leaks by redacting pending secondary email/name data

4326d69

Akanshu-2u reviewed Apr 28, 2026

View reviewed changes

robrap reviewed Apr 29, 2026

View reviewed changes

ktyagiapphelix2u added 5 commits May 4, 2026 09:21

fix: retirement PII leaks by redacting pending secondary email/name data

f9aa35a

fix: retirement PII leaks by redacting pending secondary email/name data

d05cd5a

fix: retirement PII leaks by redacting pending secondary email/name data

48e5e17

fix: retirement PII leaks by redacting pending secondary email/name data

d929ff9

Merge branch 'master' into ktyagi/secondaryemail

84d0cf9

		PendingSecondaryEmailChange.redact_and_delete_pending_secondary_email(user_id=user.id)
		assert len(PendingSecondaryEmailChange.objects.filter(user_id=user.id)) == 0

		call_command('retire_user', username=user.username, user_email=user.email)

		assert not PendingSecondaryEmailChange.objects.filter(user=user).exists()

		Note: "Retire" here refers to retiring this user's data, not the recovery email
		feature itself, which remains available for new accounts.

		account_recovery.is_active = False
		account_recovery.save(update_fields=['secondary_email', 'is_active'])

Conversation

ktyagiapphelix2u commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Ticket & Reference

Uh oh!

robrap left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

robrap left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ktyagiapphelix2u May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

ktyagiapphelix2u commented Apr 23, 2026 •

edited

Loading

ktyagiapphelix2u May 4, 2026 •

edited

Loading