
How to Quickly Deduplicate HubSpot Contacts with Python (for Free!)

by Tristan Donnally

Introduction

When it comes to automated deduplication of HubSpot contacts, there are many great options, such as Koalify. Like any marketplace integration, though, you are charged a monthly fee based on the number of objects used ($5 per 10,000 at the time of writing). That is fine if you want an out-of-the-box solution or the many features it offers, but if you are tech-savvy and just need something simple and free, this article is for you!

Getting Started

While this article requires no previous coding knowledge, you will still need Python and Git installed on your device. Git lets you grab the source files, which are written in Python. You can also browse the code on GitHub (the repository URL appears in the clone command below).

Once you have both installed, open a directory and run the following commands in a bash terminal to grab the code and set up the environment:

# Clone the repository and move into it
git clone https://github.com/TDonnally/hubspot-deduper.git
cd hubspot-deduper

# Create a virtual environment
python -m venv venv

# Activate the environment (pick one of the following)
# Windows (Git Bash)
source venv/Scripts/activate
# macOS or Linux
source venv/bin/activate

# Install requirements
pip install -r requirements.txt

# Create the .env file
cat > .env << EOF
HUBSPOT_ACCESS_TOKEN=your_access_token_here
EOF

Next, head over to HubSpot to create an access token. Navigate to https://app.hubspot.com/legacy-apps/{your-portal-id} and click Create. Open the Scopes tab and enable crm.objects.contacts.read and crm.objects.contacts.write. Create the access token, then open the Auth tab, reveal the token, and copy it. Finally, open .env and paste the token in as the HUBSPOT_ACCESS_TOKEN value.
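To make the token's role concrete, here is a minimal sketch of how a script could read it from .env and page through contacts with HubSpot's CRM v3 contacts endpoint. This is not the repository's code: read_env_value, fetch_contacts, and the injectable get_json parameter are hypothetical names chosen for illustration, and only the standard library is used.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path

API_URL = "https://api.hubapi.com/crm/v3/objects/contacts"


def read_env_value(path, key):
    """Return the value for `key` from a simple KEY=value .env file, or None."""
    for line in Path(path).read_text().splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None


def http_get_json(url, token):
    """GET `url` with a Bearer token and decode the JSON response."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_contacts(token, properties, get_json=http_get_json):
    """Collect every contact by following the paging.next.after cursor."""
    contacts, after = [], None
    while True:
        params = {"limit": 100, "properties": ",".join(properties)}
        if after:
            params["after"] = after
        page = get_json(f"{API_URL}?{urllib.parse.urlencode(params)}", token)
        contacts.extend(page.get("results", []))
        after = page.get("paging", {}).get("next", {}).get("after")
        if not after:
            return contacts
```

With a valid token in place, read_env_value(".env", "HUBSPOT_ACCESS_TOKEN") followed by fetch_contacts(token, ["firstname", "lastname", "phone"]) would return a list of contact dictionaries. A 401 response at this point usually means the token or its scopes need another look.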

Now open the main.py file. Near the top you will see an array called DEDUP_FIELDS, which contains firstname, lastname, and phone by default. Think of these as filters: they are the fields checked for duplicates. If two or more contacts have identical values for all of these fields, they will be merged. Feel free to add fields to make the deduplication more restrictive, or remove fields to relax the merge requirements.
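As a rough illustration of what matching on DEDUP_FIELDS means, the sketch below groups contacts by the tuple of their values for those fields. The function name find_duplicate_groups is hypothetical, and two behaviors are assumptions rather than facts about the script: values are trimmed and lowercased before comparison, and contacts with a blank dedup field are skipped so that empty values never match each other.

```python
from collections import defaultdict

DEDUP_FIELDS = ["firstname", "lastname", "phone"]


def find_duplicate_groups(contacts, fields=DEDUP_FIELDS):
    """Group contacts whose values match on every field in `fields`."""
    groups = defaultdict(list)
    for contact in contacts:
        props = contact.get("properties", {})
        # Assumption: compare trimmed, lowercased values.
        key = tuple((props.get(f) or "").strip().lower() for f in fields)
        # Assumption: skip contacts with any blank dedup field.
        if all(key):
            groups[key].append(contact)
    # Only keys shared by two or more contacts are duplicates.
    return [group for group in groups.values() if len(group) > 1]


contacts = [
    {"id": "1", "properties": {"firstname": "Ada", "lastname": "Lovelace", "phone": "555-0100"}},
    {"id": "2", "properties": {"firstname": "ada", "lastname": "Lovelace", "phone": "555-0100"}},
    {"id": "3", "properties": {"firstname": "Ada", "lastname": "Lovelace", "phone": "555-0199"}},
    {"id": "4", "properties": {"firstname": "Grace", "lastname": "Hopper", "phone": None}},
]

strict = find_duplicate_groups(contacts)
relaxed = find_duplicate_groups(contacts, ["firstname", "lastname"])
```

Here strict holds one group (contacts 1 and 2), while relaxed also pulls in contact 3 because phone is no longer required to match. That is the trade-off behind adding or removing fields.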

You will also need to set the merge rule via the DEDUP_PRIMARY variable. By default (when the string is empty), the most recently updated contact becomes the primary contact, and its values overwrite those of the other contact wherever both have non-null fields. To change this, enter a field name in the DEDUP_PRIMARY string; the contact whose value for that field was updated most recently then becomes the primary.
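The primary-selection rule can be sketched as follows. How the script actually obtains per-field update times is not shown in this article, so the field_updated_at mapping on each contact is a hypothetical shape used only for illustration, as is the pick_primary name. The sketch also assumes every timestamp is ISO 8601 with a UTC offset or a trailing Z.

```python
from datetime import datetime, timezone

DEDUP_PRIMARY = ""  # empty string: the most recently updated contact wins

# Sentinel so contacts without a timestamp sort last.
OLDEST = datetime.min.replace(tzinfo=timezone.utc)


def parse_ts(value):
    """Parse an ISO 8601 timestamp such as '2024-05-01T12:00:00Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def pick_primary(group, primary_field=DEDUP_PRIMARY):
    """Return (primary, others) for one group of duplicate contacts."""

    def updated_at(contact):
        if primary_field:
            # Hypothetical shape: per-field update times keyed by field name.
            stamp = contact.get("field_updated_at", {}).get(primary_field)
        else:
            stamp = contact.get("properties", {}).get("lastmodifieddate")
        return parse_ts(stamp) if stamp else OLDEST

    primary = max(group, key=updated_at)
    others = [c for c in group if c is not primary]
    return primary, others
```

Calling pick_primary(group) applies the default rule, while pick_primary(group, "phone") prefers whichever contact changed its phone value last, even if another contact was edited more recently overall.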

Running the Script

You can run the script in either read or write mode by passing a --mode argument on the command line.

# Read mode will show you the amount of duplicate contacts
python main.py --mode="read"
# Write mode will merge duplicate contacts
python main.py --mode="write"

It is always a good idea to run the script in read mode first to see how many contacts would be affected. If you are unsure about the results, I would avoid running write mode, because merges are irreversible.

Whichever mode you run, the script finishes by generating a CSV in /output listing the contacts that would be, or were, merged.
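For readers curious how a read/write switch and the CSV report might fit together, here is a minimal sketch. It is an illustration under stated assumptions, not the script's actual implementation: the function names, the CSV column names, the report file name, and the choice of read as the default mode are all hypothetical.

```python
import argparse
import csv
from pathlib import Path


def parse_mode(argv=None):
    """Parse --mode, accepting only 'read' or 'write'."""
    parser = argparse.ArgumentParser(description="Deduplicate HubSpot contacts")
    # Assumption: default to the safe, non-destructive mode.
    parser.add_argument("--mode", choices=["read", "write"], default="read")
    return parser.parse_args(argv).mode


def run(groups, mode, merge=lambda primary_id, merged_id: None):
    """Build report rows for every pair; call `merge` only in write mode."""
    rows = []
    for primary_id, merged_ids in groups:
        for merged_id in merged_ids:
            if mode == "write":
                merge(primary_id, merged_id)
            rows.append({"primary_id": primary_id, "merged_id": merged_id})
    return rows


def write_report(rows, mode, output_dir="output"):
    """Write one CSV row per duplicate contact and return the file path."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"duplicates_{mode}.csv"  # hypothetical file name
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["primary_id", "merged_id"])
        writer.writeheader()
        writer.writerows(rows)
    return path
```

Because run produces the same rows in both modes, the read-mode CSV doubles as a preview of exactly what write mode would merge.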

Conclusion

Feel free to build on this code and do whatever you like with it. I believe it could be useful to host it and run it as a cron job, or to take the merge contacts method and incorporate it into a custom code block within a HubSpot workflow. If you are interested in a tutorial on either, feel free to reach out via the contact form on this site.
