Google Tag Manager

Forum Discussion

kethiri's avatar
kethiri
Collaborator
2 months ago

Strategy to Identify Duplicate Profiles

We needed to identify and merge duplicate Braze profiles (same email, different user IDs) across our entire user base. We evaluated two solutions.

Solution 1: Identify duplicate profiles via /users/export/ids and attach a custom attribute for segmentation

Export each profile, identify duplicates externally, then update profiles with a custom attribute is_duplicated=true and capture them via a segment.

{% connected_content 
https://rest.au-01.braze.com/users/export/ids
:method post
:headers {
"Authorization": "Bearer XXXX"
} 
:body {"email_address":"{{${email_address}}}"}
:content_type application/json
:no_cache 
:save result
%}
{% if result.users.size == 1 %} 
{% abort_message('User is unique.') %}
{% endif %}
{
 "attributes" : [
    {
    "braze_id" : "{{${braze_id}}}",
    "is_duplicated": true,
    "_update_existing_only" : true
    }
  ]
}

One caveat with this solution is that we need to export the entire user base, and it will take a couple of days to identify the duplicates.

Solution 2: Identify the duplicate profiles via a Segment extension

Key Assumptions:

  • Every profile in Braze has at least one entry in USERS_BEHAVIORS_SUBSCRIPTION_GLOBALSTATECHANGE_SHARED (i.e., all profiles have a subscription state change event)
  • Every deleted profile appears in USERS_USERDELETEREQUEST_SHARED

Here's the query we're using:

-- Purpose: Identify duplicate Braze profiles (same email, different user IDs) for merging via /users/merge endpoint
-- Excludes deleted/orphaned users and only returns duplicates where 2+ active profiles remain

WITH active_users AS (
    -- Get unique user/email pairs, excluding deleted/orphaned users
    SELECT u.USER_ID, u.EMAIL_ADDRESS
    FROM USERS_BEHAVIORS_SUBSCRIPTION_GLOBALSTATECHANGE_SHARED u
    LEFT JOIN USERS_USERDELETEREQUEST_SHARED d ON u.USER_ID = d.USER_ID
    WHERE u.EMAIL_ADDRESS IS NOT NULL
      AND d.USER_ID IS NULL  -- Exclude deleted/orphaned users
    GROUP BY u.USER_ID, u.EMAIL_ADDRESS  -- Dedupe since source table contains multiple events per user
)

SELECT DISTINCT a.USER_ID
FROM active_users a
INNER JOIN (
    -- Find emails with more than one active user profile
    SELECT EMAIL_ADDRESS
    FROM active_users
    GROUP BY EMAIL_ADDRESS
    HAVING COUNT(*) > 1
) dup ON a.EMAIL_ADDRESS = dup.EMAIL_ADDRESS;


Key findings:

  • This query only takes a couple of minutes to identify duplicates
  • USERS_USERDELETEREQUEST_SHARED contains both explicitly deleted users AND users orphaned via /users/merge - this means the query automatically excludes already-merged profiles. Looks like Braze prunes orphaned profiles after /users/merge is performed.

Has anyone else taken this approach? Curious if there are edge cases we should watch out for.

2 Replies

  • Hi kethiri​
    Have you explored Merge duplicates feature on the braze dashboard? There is also an option to generate the list of all duplicates present in the workspace. There is also an option to automate this process i.e., finding the duplicates and merging them using preconfigured rules.  Find more information here : https://www.braze.com/docs/user_guide/engagement_tools/segments/user_profiles/duplicate_users/#step-1-go-to-manage-audience
    Hope this helps! 

  • Hi. We prefer not to enable this because we need the profiles merged as early as possible. Since our emails include braze_id references, delaying the merge increases the risk that an email is sent before the profiles are merged. In that scenario, Braze may randomly select one of the duplicate profiles during send-out, which could result in invalid links.

    We have already enabled a Canvas that performs the merge as soon as a profile is created. However, we still need a way to identify and merge the existing duplicates.

    The export feature is handy, though. I am looking for a solution to segment these users within Braze without external integrations.