Forum Discussion
Strategy to Identify Duplicate Profiles
We needed to identify and merge duplicate Braze profiles (same email, different user IDs) across our entire user base. We evaluated two solutions.
Solution 1: Identify duplicate profiles via /users/export/ids and attach a custom attribute for segmentation
Export each profile, identify duplicates externally, then update profiles with a custom attribute is_duplicated=true and capture them via a segment.
{% connected_content
https://rest.au-01.braze.com/users/export/ids
:method post
:headers {
"Authorization": "Bearer XXXX"
}
:body {"email_address":"{{${email_address}}}"}
:content_type application/json
:no_cache
:save result
%}
{% if result.users.size == 1 %}
{% abort_message('User is unique.') %}
{% endif %}
{
"attributes" : [
{
"braze_id" : "{{${braze_id}}}",
"is_duplicated": true,
"_update_existing_only" : true
}
]
}One caveat with this solution is that we need to export the entire user base, and it will take a couple of days to identify the duplicates.
Solution 2: Identify the duplicate profiles via a Segment extension
Key Assumptions:
- Every profile in Braze has at least one entry in USERS_BEHAVIORS_SUBSCRIPTION_GLOBALSTATECHANGE_SHARED (i.e., all profiles have a subscription state change event)
- Every deleted profile appears in USERS_USERDELETEREQUEST_SHARED
Here's the query we're using:
-- Purpose: Identify duplicate Braze profiles (same email, different user IDs) for merging via /users/merge endpoint
-- Excludes deleted/orphaned users and only returns duplicates where 2+ active profiles remain
WITH active_users AS (
-- Get unique user/email pairs, excluding deleted/orphaned users
SELECT u.USER_ID, u.EMAIL_ADDRESS
FROM USERS_BEHAVIORS_SUBSCRIPTION_GLOBALSTATECHANGE_SHARED u
LEFT JOIN USERS_USERDELETEREQUEST_SHARED d ON u.USER_ID = d.USER_ID
WHERE u.EMAIL_ADDRESS IS NOT NULL
AND d.USER_ID IS NULL -- Exclude deleted/orphaned users
GROUP BY u.USER_ID, u.EMAIL_ADDRESS -- Dedupe since source table contains multiple events per user
)
SELECT DISTINCT a.USER_ID
FROM active_users a
INNER JOIN (
-- Find emails with more than one active user profile
SELECT EMAIL_ADDRESS
FROM active_users
GROUP BY EMAIL_ADDRESS
HAVING COUNT(*) > 1
) dup ON a.EMAIL_ADDRESS = dup.EMAIL_ADDRESS;
Key findings:
- This query only takes a couple of minutes to identify duplicates
- USERS_USERDELETEREQUEST_SHARED contains both explicitly deleted users AND users orphaned via /users/merge - this means the query automatically excludes already-merged profiles. Looks like Braze prunes orphaned profiles after /users/merge is performed.
Has anyone else taken this approach? Curious if there are edge cases we should watch out for.
2 Replies
- NirnayMentor
Hi kethiri,
Have you explored Merge duplicates feature on the braze dashboard? There is also an option to generate the list of all duplicates present in the workspace. There is also an option to automate this process i.e., finding the duplicates and merging them using preconfigured rules. Find more information here : https://www.braze.com/docs/user_guide/engagement_tools/segments/user_profiles/duplicate_users/#step-1-go-to-manage-audience.
Hope this helps! - kethiriCollaborator
Hi. We prefer not to enable this because we need the profiles merged as early as possible. Since our emails include braze_id references, delaying the merge increases the risk that an email is sent before the profiles are merged. In that scenario, Braze may randomly select one of the duplicate profiles during send-out, which could result in invalid links.
We have already enabled a Canvas that performs the merge as soon as a profile is created. However, we still need a way to identify and merge the existing duplicates.
The export feature is handy, though. I am looking for a solution to segment these users within Braze without external integrations.
Related Content
- 1 year ago