Attention: Confluence is not suitable for the storage of highly confidential data. Please ensure that any data classified as Highly Protected is stored using a more secure platform.
If you have any questions, please refer to the University's data classification guide or contact ict.askcyber@sydney.edu.au
7.2.1 Alternate Channel Phone De-duplication
Analyst Notes
This page contains documentation on analysis, proposed steps to de-dupe phone numbers in Alternate Channels (Contact) object
Approach:
Phone numbers linked to contact records were isolated and cleaned to remove -
Any special characters, blank spaces etc.,
+61 was replaced with 0
00s & 000s were replace with 0
00011 was replace with blank
Most recently created record of a unique phone number on each contact was considered ‘Primary Phone’ and any previous occurrence of the same number was considered ‘Duplicates’.
‘Primary Phone’ was used as an anchor to look up potential duplicates for each ‘Primary Phone’.
Duplicates found were grouped by ‘Primary Phone’ and segmented into two:
‘Primary Phone’ created by mulesoft
‘Primary Phone’ not created by mulesoft
Exclusions: Contact Type - ‘Student’ , 'Graduand'
Overarching Rule:
Given the Status of Primary Record is Former/Good:
For records that have been modified:
Given a TAP user modified any of the duplicate records and the last modified date of that record is the latest, the record with the latest last modified date will be retained.
For records that have never been modified:
Given a TAP user created the most recent record, the data will be considered as recently validated by a user and will be retained.
Given, the record was created by a System user and not modified by a TAP user, the oldest record will be retained.
Count of 'Primary Phone' in Alternate Channel with duplicates (09.5.2024)
Segment | Count of Unique Phone | Count of Duplicates | Total Records in Alternate Channel (Contact) |
---|---|---|---|
'Primary Phone' Created by mulesoft | 77,114 | 109,693 | 186,807 |
'Primary Phone' Not created by mulesoft | 156,086 | 223,790 | 379,876 |
Grand Total | 165,242 | 236,145 | 566,683 |
Segment 1 : De-duplication of Alternate Channel Phone records linked to Contacts - Not Created by Mulesoft
De-duplication of Alternate Channel Phone records linked to Contacts that are not Created by Mulesoft will be done in number of phases as follows due to the complexity in data format(Status, last modified etc.,)
Segment 2 : De-duplication of Alternate Channel Phone records linked to Contacts - Created by Mulesoft (In progress)
Records Created by Mulesoft will be de-duped separately as SITS integration(mulesoft) upserts records previously created by itself in Alternate Channels. These records will be reviewed in detail and include Alternate channel History object records as part of the analysis to enable us to determine the record to be retained.
Appendix
Snapshot of Segment Summary taken from Snowflake -
Snowflake Script used for data prep, analysis & cleanse: