Attention: Confluence is not suitable for the storage of highly confidential data. Please ensure that any data classified as Highly Protected is stored using a more secure platform.
If you have any questions, please refer to the University's data classification guide or contact ict.askcyber@sydney.edu.au
7.3 DQ Email Data Assessment
Summary |
---|
Outlined are the steps in support of the Email Simplification initiative. The primary goal of these steps is to clean, deduplicate, and enhance the overall data quality of emails stored in Salesforce, making it easier to transition to a single email field on the Contact Page, with alternate emails stored in Alternate Channels. Please note that Student contact pages and Accounts are outside of the scope of this initiative. |
Step 1. Identify and fix emails with typos in domains (on contact page and in alternate channels)
Possible typos on contact pages were already reviewed and fixed as part of DCA Action 25, 28, 29, 36 (please see respective tickets in Jira). Total number of instances flagged by the 3rd party, reviewed and fixed where needed: 519.
Possible typos in alternate channels: 441 instances flagged by the 3rd party.
Further actions:
Review and fix (where needed) 441 issues with typos in alternate channels as identified by the 3rd party;
Run the analysis again as the above does not guarantee completeness; moreover, the information might not be up to date; review and fix typos identified.
Results:
Total number of emails with typos fixed on contact page level: 706
Total number of emails with typos fixed on alternate channels level: 2,220
For more details and associated manual work done please visit: https://sydneyuni.atlassian.net/browse/MA-293
Step 2. Analyse emails stored on account level and ensure that these emails are also stored on contact level
The proposal will be put forward to remove some of the info currently stored on the account level/account's alternate channels. Therefore, we need to ensure that no info that is stored on account level/account's alternate channels, but missing from contact page/contact's alternate channels.
Results: The following emails stored on account level/account’s alternate channels were excluded: emails that are already stored on contact page/contact’s alternate channels; invalid emails (emails with typos with fixed version of emails already stored on contact level), emails belonging to Organizations/Corporations etc. Account Type. After all these emails were excluded, remaining 5,460 were uploaded to respective Contacts' alternate channels.
Please note that more in-depth analysis will be required one the decision to remove alternate channels on Account level will be made.
For more details and associated manual work done please visit: https://sydneyuni.atlassian.net/browse/MA-297
Step 3. Identify & clean school emails; transfer school emails to Alternate IDs
Steps:
Move all school emails (@uni.sydney.edu.au) from Contact page level to Alternate Channels (unless it’s a preferred email).
In Alternate Channels: Identify and fix all school emails with incorrectly written domains (e.g. @uni.sydney.com.au, @uni.sydney.edu etc).
In Alternate Channels: Delete duplicate school emails (leave 1 entry regardless of its status).
Transfer School Emails to Alternate IDs.
Please note that Contacts with “Student” primary contact type are out of scope of this ticket.
Results: All 229,876 unique school emails were transferred to Alternate IDs as per below:
Nonunique Alternate Id: <School Email value>
Type: “Student Email”
Account referenced: <Respective account referenced>
Contact referenced: <Respective contact referenced>
Unique Alternate ID: <School Email value>
Issue identified & to be discussed: 1 inactive school domain (70,285 emails) was identified (@mail.usyd.edu.au) in the alternate channels.Resolution: These emails were analysed, transferred to Alternate IDs and deleted from Alternate Channels. |
---|
Issue identified & to be discussed: Integration adds school emails to Alternate Channels instead of Alternate IDs. |
---|
For more details and associated manual work done please visit: https://sydneyuni.atlassian.net/browse/MA-294
Step 4. Identify and remove duplicate emails from contact page level
Steps:
Identify duplicate emails at the contact page level.
Analyse email types.
Leave emails in the correct email type field.
Remove duplicate emails from incorrect email type fields.
Total number of contacts with duplicate emails on contact page: 5,035- all these instances were analysed, moved to the correct email type and duplicates were deleted.
For more details and associated manual work done please visit: https://sydneyuni.atlassian.net/browse/MA-296
Step 5. Identify and remove duplicate emails from alternate channels level
Steps:
Delete records with emails that are already on contact page level from alternate channels.
Delete records with duplicate emails and same status from alternate channels, retaining 1 unique record with the oldest “Created Date”.
For the remaining records - duplicate emails with different statuses - apply the following logic:
Status combination | Record to be retained | |||
Bad | Deceased |
|
| Check if the record if deceased -> Retain "Deceased", delete "Bad" |
Bad | Deceased | Former |
| Check if the record if deceased -> Retain "Deceased", delete "Bad" |
Bad | Former |
|
| Retain the record that was most recently updated by a user |
Bad | Former | Good |
| Retain the record that was most recently updated by a user |
Bad | Former | Good | Last Known | Retain the record that was most recently updated by a user |
Bad | Former | Good | Needs Verification | Retain the record that was most recently updated by a user. UNLESS it’s a record that was updated by @Anastasia Chekhova AND the status is “Needs Verification”. These records were updated as part of “Step 1. Identify and fix emails with typos in domains”, hence the status. |
Bad | Former | Last Known |
| Retain the record that was most recently updated by a user |
Bad | Former | Needs Verification |
| Retain the record that was most recently updated by a user UNLESS it’s a record that was updated by @Anastasia Chekhova AND the status is “Needs Verification”. These records were updated as part of “Step 1. Identify and fix emails with typos in domains”, hence the status. |
Bad | Former | Good | Last Known | Retain the record that was most recently updated by a user |
Bad | Former | Good | Needs Verification | Retain the record that was most recently updated by a user UNLESS it’s a record that was updated by @Anastasia Chekhova AND the status is “Needs Verification”. These records were updated as part of “Step 1. Identify and fix emails with typos in domains”, hence the status. |
Bad | Good |
|
| Retain the record that was most recently updated by a user |
Bad | Good | Last Known |
| Retain the record that was most recently updated by a user |
Bad | Good | Needs Verification |
| Retain the record that was most recently updated by a user UNLESS it’s a record that was updated by @Anastasia Chekhova AND the status is “Needs Verification”. These records were updated as part of “Step 1. Identify and fix emails with typos in domains”, hence the status. |
Bad | Last Known |
|
| Retain the record that was most recently updated by a user |
Bad | Needs Verification |
|
| Retain the record that was most recently updated by a user UNLESS it’s a record that was updated by @Anastasia Chekhova AND the status is “Needs Verification”. These records were updated as part of “Step 1. Identify and fix emails with typos in domains”, hence the status. |
Deceased | Former |
|
| Check if the record if deceased -> Retain "Deceased", delete "Bad" |
Deceased | Good |
|
| Check if the record if deceased -> Retain "Deceased", delete "Bad" |
Former | Good |
|
| Retain the record that was most recently updated by a user |
Former | Good | Last Known |
| Retain the record that was most recently updated by a user |
Former | Good | Needs Verification |
| Retain the record that was most recently updated by a user UNLESS it’s a record that was updated by @Anastasia Chekhova AND the status is “Needs Verification”. These records were updated as part of “Step 1. Identify and fix emails with typos in domains”, hence the status. |
Good | Last Known |
|
| Retain the record that was most recently updated by a user |
Good | Needs Verification |
|
| Retain the record that was most recently updated by a user UNLESS it’s a record that was updated by @Anastasia Chekhova AND the status is “Needs Verification”. These records were updated as part of “Step 1. Identify and fix emails with typos in domains”, hence the status. |
Last Known | Needs Verification |
|
| Retain the record that was most recently updated by a user UNLESS it’s a record that was updated by @Anastasia Chekhova AND the status is “Needs Verification”. These records were updated as part of “Step 1. Identify and fix emails with typos in domains”, hence the status. |
Potential issue identified & to be discussed: New email statutes in alternate channels should be suggested due to current statuses being confusing. New mapping suggested & to be discussed (work in progress): | ||
---|---|---|
Current status | New status | Comments |
Good | Active | Good, active, current, but not preferred email |
Bad | Invalid | Emails with typos or somehow corrupted emails - a discussion should be held on whether we would like to keep them in the system at all |
Former | Former | Former, currently not in use email. To be kept for matching purposes |
Last Known | Needs Verification | Everything that is "Last Known" needs to be verified |
Needs Verification | Needs Verification | These emails should be monitored, verified and status updated to either Current/Good/Active/In Use or Former/Not In Use |
Deceased | Deceased | Deceased record |
Results:
406,255 are alternate channels records with emails that are already stored on Contact Page level (as either Personal, Business, School or Other email). They were deleted from alternate channels.
56,740 alternate channels are records with duplicate emails that have the same status. I suggest we delete these duplicate records, leaving 1 record with the oldest "Created Date".
Approx. 24k duplicated alternate channels with different statuses were deleted from alternate channels.
For more details and associated manual work done please visit: https://sydneyuni.atlassian.net/browse/MA-295
Step 6. Analyse and fix email types
<BLOCKED>
New Email Type picklist is still work in progress as per discussion with @rommel.ngo. The ticket is therefore Blocked.
The suggested mapping is as follows:
Current Email Type | New Email Type |
Business | Business |
Business Address | Business |
Business Email | Business |
Contact Email | Personal |
Personal | |
Email - Alternate | Personal |
Home Email | Personal |
Institution Email | Business |
Optional Email | Other |
Other Email | Other |
Permanent Email | Personal |
Residential Email | Personal |
Sydney Alumni Email | Other |
<blank> | Other |
Potential issue identified & to be discussed: New email types should be aligned with email types used in SITS. Otherwise the integration will be adding emails with incorrect email types to alternate channels. |
---|
For more details and associated manual work done please visit: https://sydneyuni.atlassian.net/browse/MA-298
Step 7. Identify, analyse and fix/remove emails that are used by multiple contacts
<IN PROGRESS>
For more details and associated manual work done please visit: https://sydneyuni.atlassian.net/browse/MA-300
Step 8. Analyse and fix emails in alternate channels that are stored as other type of record (eg. Email stored as Address, LinkedIn, Personal Website etc)
Results: 142 records containing emails were found in alternate channels with the wrong Alternate Channels' types (e.g. Address, Internet Website etc). These records were fixed and correct Alternate Channel type was assigned accordingly.
For more details and associated manual work done please visit: https://sydneyuni.atlassian.net/browse/MA-312
Summary of issues identified
Link to the issue identified | Status |
---|---|
Unresolved - In progress: https://sydneyuni.atlassian.net/browse/MA-435 | |
Resolved. These emails were analysed, transferred to Alternate IDs and deleted from Alternate Channels | |
Unresolved | |
Unresolved |