I am facing a data duplication problem with saved searches. I have a Node.js-based application in which I have integrated Infusionsoft using this JavaScript library (infusionsoft-api - npm).
I’m using the getSavedSearchResultsAllFields method of SearchService (xml-rpc - Keap Developer Portal) to retrieve the contacts of saved searches. It returns duplicate data, with the same email and Infusionsoft ID, for some contacts (e.g. 8 thousand duplicate records out of 78 thousand). I’ve double-checked this in Infusionsoft and against CSV files exported from Infusionsoft.
The issue is that when I check the records in the CSV exported from Infusionsoft, I don’t see duplicate records for a contact, but when I check the API result I do see duplicate records for that contact.
Firstly, can you edit your pictures please and blank out all the Email Addresses.
I have not seen that issue before, but I wonder how complicated the Saved Search you are calling is.
The Saved Search would be doing joins on several database tables, and it is possible that multiple records are being returned for one of the tables. There may also be a bug within the Saved Searches.
@TomScott can provide you with a more detailed explanation of the Saved Searches.
My recommendation would be to adjust your code and use the Contact Id as an Array Key. If any duplicates are present then they can update the previous stored value.
One thing to ask, is there any data loss between the Saved Search and CSV file? Or is it just only duplicates you are seeing?
@martinc For this statement - “My recommendation would be to adjust your code and use the Contact Id as an Array Key. If any duplicates are present then they can update the previous stored value.”
I apologise; I don’t comprehend. I don’t already have a contact ID. For some of the previously stored searches, I’m attempting to retrieve contacts for the first time.
For this statement - “One thing to ask, is there any data loss between the Saved Search and CSV file? Or is it just only duplicates you are seeing?”
As there are approximately 138,000 contacts (1 lakh 38 thousand) in total across the seven saved searches, I am unable to check for duplicate contacts by manually comparing the API result with the CSV file exported from Infusionsoft, but I have not yet discovered any data loss between the two.
One thing that has me a little confused is that the total number of contacts is the same in both CSV files (the one created from the API result and the one exported from Infusionsoft). At the same time, there is duplicate data in the CSV built from the API result but not in the CSV exported from Infusionsoft, which suggests that I may be losing some data between the two. I will need to run a script to verify this.
Also, could you please let me know which table Infusionsoft uses internally to store the saved search contacts, so that I can query that table directly rather than going through the SDK’s search service?
I have tested the data loss scenario with the contacts of one of the saved searches. The saved search has 78412 contacts in total. I compared the contacts in two different CSV files (one exported from Infusionsoft, the other created from the API result of the SDK method that I’m using to fetch the saved search contacts in code).
Exported CSV => exported from Infusionsoft
SDK CSV => generated from the API result using the SDK method
Exported CSV - count of total contacts = 78411
SDK CSV - count of total contacts = 78410
Number of records missing from the SDK CSV compared to the exported CSV = 8247
Number of duplicate records in the SDK CSV = 8247 (Problem: the counts for missing records and duplicate records are the same.)
Number of records present in both the SDK CSV and the exported CSV = 70155
8247 + 70155 = 78402 (assume 10 records are missing an email, so we skipped those)
Conclusion - It may be that the SDK method drops 8247 records and adds the same number of duplicate records in their place. That would explain why it shows nearly the same total count as the exported CSV but contains duplicate data.
Note: For contacts that are duplicated, both the email and the Infusionsoft ID fields are duplicated.
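For anyone wanting to repeat this comparison, here is a minimal sketch of the kind of script described above. It assumes the contact IDs from each CSV have already been read into arrays. The function name compareContactIds and the sample IDs are made up for illustration. Comparing by ID rather than email would also avoid skipping records that have no email.

```javascript
// Hypothetical helper, not part of any SDK: compares two lists of contact IDs.
// exportedIds: IDs from the CSV exported from Infusionsoft.
// sdkIds: IDs from the rows returned by the API.
function compareContactIds(exportedIds, sdkIds) {
  const exported = new Set(exportedIds);
  const seen = new Set();
  let duplicateRows = 0;

  // Count every row whose ID has already appeared earlier in the API result.
  for (const id of sdkIds) {
    if (seen.has(id)) {
      duplicateRows += 1;
    } else {
      seen.add(id);
    }
  }

  // IDs that are in the export but never appear in the API result.
  const missing = [...exported].filter((id) => !seen.has(id));
  // IDs that appear in both files.
  const commonCount = [...exported].filter((id) => seen.has(id)).length;

  return { duplicateRows, missing, commonCount };
}

// Tiny made-up example: IDs 3 and 5 are missing, IDs 2 and 4 come back twice.
const report = compareContactIds([1, 2, 3, 4, 5], [1, 2, 2, 4, 4]);
console.log(report);
```

If duplicateRows and missing.length come out equal, as they did in the numbers above, that fits the picture of duplicates taking the place of other records.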
I cannot comment on what Keap is doing here between the Export and SDK functionality.
To safeguard your code, I would store the list of Contacts in an array indexed by the Contact Id.
So if any duplicates appear you can either check for it, or update the previous entry.
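A minimal sketch of what that could look like in Node.js. No Contact Id needs to be known in advance, because the key is taken from each row as it comes back from the API. The field name Id and the sample rows are assumptions, so adjust them to whatever your saved search actually returns.

```javascript
// Hypothetical helper: keeps one row per contact ID.
// A Map is used as the "array indexed by Contact Id".
function dedupeById(rows, idField = 'Id') {
  const byId = new Map();
  for (const row of rows) {
    // If the ID was seen before, this replaces the previously stored row.
    byId.set(row[idField], row);
  }
  return [...byId.values()];
}

// Made-up rows standing in for the API result.
const rows = [
  { Id: 1, Email: 'a@example.com' },
  { Id: 2, Email: 'b@example.com' },
  { Id: 1, Email: 'a@example.com' },
];
const unique = dedupeById(rows);
console.log(unique.length, 'unique contacts out of', rows.length, 'rows');
```

Note that this only removes the duplicates. It cannot bring back contacts that the API never returned.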
Is anyone from this thread still experiencing this, or does anyone have a workaround? Basically, duplicate contacts are still replacing existing data. So the row count is okay, however there’s data loss.
A workaround solution would be to pull the data from the Contacts table directly instead of using the Saved Search. You can apply filters in the Contacts search. Although if your Saved Search is using fields that are linked to other tables, then it may become problematic using this solution if the field is not available.
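A rough sketch of that workaround, as a page-by-page loop over the Contacts table. Here fetchPage is a hypothetical wrapper you would write around whichever query call your library exposes (for example DataService.query on the Contact table). Its signature, the page size, and the Id field name are assumptions, not the library's API. If the call you use accepts an order-by field, sorting by Id is one way to keep pages from shifting between requests, though nothing in this thread confirms that unstable paging is the cause of the duplicates.

```javascript
// Hypothetical sketch: pulls all contacts page by page and keys them by Id,
// so a row that shows up on two pages is only kept once.
// fetchPage(page, limit) must return a Promise of an array of contact rows.
async function fetchAllContacts(fetchPage, pageSize = 1000) {
  const byId = new Map();
  for (let page = 0; ; page += 1) {
    const rows = await fetchPage(page, pageSize);
    for (const row of rows) {
      byId.set(row.Id, row);
    }
    // A short (or empty) page means there is nothing left to fetch.
    if (rows.length < pageSize) break;
  }
  return [...byId.values()];
}

// Stand-in for the real API call, using made-up in-memory data.
const fakeContacts = Array.from({ length: 25 }, (_, i) => ({ Id: i + 1 }));
const fakeFetchPage = async (page, limit) =>
  fakeContacts.slice(page * limit, page * limit + limit);

fetchAllContacts(fakeFetchPage, 10).then((contacts) => {
  console.log('fetched', contacts.length, 'contacts');
});
```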
While we were trying to find a fix with support, they advised our third-party dashboard provider to filter by Tag and then by Contact Id in the Saved Report itself. This eliminated the duplicate and missing data. So far so good.
Hope this reaches people who are also in the same situation.