/
7.3 DQ Email Data Assessment

Attention: Confluence is not suitable for the storage of highly confidential data. Please ensure that any data classified as Highly Protected is stored using a more secure platform.
If you have any questions, please refer to the University's data classification guide or contact ict.askcyber@sydney.edu.au

7.3 DQ Email Data Assessment

Summary

Outlined are the steps in support of the Email Simplification initiative. The primary goal of these steps is to clean, deduplicate, and enhance the overall data quality of emails stored in Salesforce, making it easier to transition to a single email field on the Contact Page, with alternate emails stored in Alternate Channels.

Please note that Student contact pages and Accounts are outside of the scope of this initiative.

Step 1. Identify and fix emails with typos in domains (on contact page and in alternate channels)

Possible typos on contact pages were already reviewed and fixed as part of DCA Action 25, 28, 29, 36 (please see respective tickets in Jira). Total number of instances flagged by the 3rd party, reviewed and fixed where needed: 519.

Possible typos in alternate channels: 441 instances flagged by the 3rd party.

Further actions:

  1. Review and fix (where needed) 441 issues with typos in alternate channels as identified by the 3rd party;

  2. Run the analysis again as the above does not guarantee completeness; moreover, the information might not be up to date; review and fix typos identified.

Results:

  1. Total number of emails with typos fixed on contact page level: 706

  2. Total number of emails with typos fixed on alternate channels level: 2,220

For more details and associated manual work done please visit: https://sydneyuni.atlassian.net/browse/MA-293

Step 2. Analyse emails stored on account level and ensure that these emails are also stored on contact level

The proposal will be put forward to remove some of the info currently stored on the account level/account's alternate channels. Therefore, we need to ensure that no info that is stored on account level/account's alternate channels, but missing from contact page/contact's alternate channels.

Results: The following emails stored on account level/account’s alternate channels were excluded: emails that are already stored on contact page/contact’s alternate channels; invalid emails (emails with typos with fixed version of emails already stored on contact level), emails belonging to Organizations/Corporations etc. Account Type. After all these emails were excluded, remaining 5,460 were uploaded to respective Contacts' alternate channels.

Please note that more in-depth analysis will be required one the decision to remove alternate channels on Account level will be made.

For more details and associated manual work done please visit: https://sydneyuni.atlassian.net/browse/MA-297

Step 3. Identify & clean school emails; transfer school emails to Alternate IDs

Steps:

  1. Move all school emails (@uni.sydney.edu.au) from Contact page level to Alternate Channels (unless it’s a preferred email).

  2. In Alternate Channels: Identify and fix all school emails with incorrectly written domains (e.g. @uni.sydney.com.au, @uni.sydney.edu etc).

  3. In Alternate Channels: Delete duplicate school emails (leave 1 entry regardless of its status).

  4. Transfer School Emails to Alternate IDs.

Please note that Contacts with “Student” primary contact type are out of scope of this ticket.

Results: All 229,876 unique school emails were transferred to Alternate IDs as per below:

  • Nonunique Alternate Id: <School Email value>

  • Type: “Student Email”

  • Account referenced: <Respective account referenced>

  • Contact referenced: <Respective contact referenced>

  • Unique Alternate ID: <School Email value>

Issue identified & to be discussed: 1 inactive school domain (70,285 emails) was identified (@mail.usyd.edu.au) in the alternate channels.

Resolution: These emails were analysed, transferred to Alternate IDs and deleted from Alternate Channels.

Issue identified & to be discussed: 1 inactive school domain (70,285 emails) was identified (@mail.usyd.edu.au) in the alternate channels.

Resolution: These emails were analysed, transferred to Alternate IDs and deleted from Alternate Channels.

Issue identified & to be discussed: Integration adds school emails to Alternate Channels instead of Alternate IDs.

Issue identified & to be discussed: Integration adds school emails to Alternate Channels instead of Alternate IDs.

For more details and associated manual work done please visit: https://sydneyuni.atlassian.net/browse/MA-294

Step 4. Identify and remove duplicate emails from contact page level

Steps:

  1. Identify duplicate emails at the contact page level.

  2. Analyse email types.

  3. Leave emails in the correct email type field.

  4. Remove duplicate emails from incorrect email type fields.

Total number of contacts with duplicate emails on contact page: 5,035- all these instances were analysed, moved to the correct email type and duplicates were deleted.

For more details and associated manual work done please visit: https://sydneyuni.atlassian.net/browse/MA-296

Step 5. Identify and remove duplicate emails from alternate channels level

Steps:

  1. Delete records with emails that are already on contact page level from alternate channels.

  2. Delete records with duplicate emails and same status from alternate channels, retaining 1 unique record with the oldest “Created Date”.

  3. For the remaining records - duplicate emails with different statuses - apply the following logic:

Status combination

Record to be retained

Bad

Deceased

 

 

Check if the record if deceased -> Retain "Deceased", delete "Bad"

Bad

Deceased

Former

 

Check if the record if deceased -> Retain "Deceased", delete "Bad"

Bad

Former

 

 

 Retain the record that was most recently updated by a user

Bad

Former

Good

 

 Retain the record that was most recently updated by a user

Bad

Former

Good

Last Known

 Retain the record that was most recently updated by a user

Bad

Former

Good

Needs Verification

 Retain the record that was most recently updated by a user. UNLESS it’s a record that was updated by @Anastasia Chekhova AND the status is “Needs Verification”. These records were updated as part of “Step 1. Identify and fix emails with typos in domains”, hence the status.

Bad

Former

Last Known

 

 Retain the record that was most recently updated by a user

Bad

Former

Needs Verification

 

 Retain the record that was most recently updated by a user UNLESS it’s a record that was updated by @Anastasia Chekhova AND the status is “Needs Verification”. These records were updated as part of “Step 1. Identify and fix emails with typos in domains”, hence the status.

Bad

Former

Good

Last Known

 Retain the record that was most recently updated by a user

Bad

Former

Good

Needs Verification

 Retain the record that was most recently updated by a user UNLESS it’s a record that was updated by @Anastasia Chekhova AND the status is “Needs Verification”. These records were updated as part of “Step 1. Identify and fix emails with typos in domains”, hence the status.

Bad

Good

 

 

 Retain the record that was most recently updated by a user

Bad 

Good

Last Known

 

 Retain the record that was most recently updated by a user

Bad 

Good

Needs Verification

 

 Retain the record that was most recently updated by a user UNLESS it’s a record that was updated by @Anastasia Chekhova AND the status is “Needs Verification”. These records were updated as part of “Step 1. Identify and fix emails with typos in domains”, hence the status.

Bad

Last Known

 

 

 Retain the record that was most recently updated by a user

Bad

Needs Verification

 

 

 Retain the record that was most recently updated by a user UNLESS it’s a record that was updated by @Anastasia Chekhova AND the status is “Needs Verification”. These records were updated as part of “Step 1. Identify and fix emails with typos in domains”, hence the status.

Deceased

Former

 

 

Check if the record if deceased -> Retain "Deceased", delete "Bad"

Deceased

Good

 

 

Check if the record if deceased -> Retain "Deceased", delete "Bad"

Former

Good

 

 

 Retain the record that was most recently updated by a user

Former

Good

Last Known

 

 Retain the record that was most recently updated by a user

Former

Good

Needs Verification

 

 Retain the record that was most recently updated by a user UNLESS it’s a record that was updated by @Anastasia Chekhova AND the status is “Needs Verification”. These records were updated as part of “Step 1. Identify and fix emails with typos in domains”, hence the status.

Good

Last Known

 

 

 Retain the record that was most recently updated by a user

Good

Needs Verification

 

 

 Retain the record that was most recently updated by a user UNLESS it’s a record that was updated by @Anastasia Chekhova AND the status is “Needs Verification”. These records were updated as part of “Step 1. Identify and fix emails with typos in domains”, hence the status.

Last Known

Needs Verification

 

 

 Retain the record that was most recently updated by a user UNLESS it’s a record that was updated by @Anastasia Chekhova AND the status is “Needs Verification”. These records were updated as part of “Step 1. Identify and fix emails with typos in domains”, hence the status.

 

Potential issue identified & to be discussed: New email statutes in alternate channels should be suggested due to current statuses being confusing. New mapping suggested & to be discussed (work in progress):

Potential issue identified & to be discussed: New email statutes in alternate channels should be suggested due to current statuses being confusing. New mapping suggested & to be discussed (work in progress):

Current status

New status

Comments

Good

Active

Good, active, current, but not preferred email

Bad

Invalid

Emails with typos or somehow corrupted emails - a discussion should be held on whether we would like to keep them in the system at all

Former

Former

Former, currently not in use email. To be kept for matching purposes

Last Known

Needs Verification

Everything that is "Last Known" needs to be verified

Needs Verification

Needs Verification

These emails should be monitored, verified and status updated to either Current/Good/Active/In Use or Former/Not In Use

Deceased

Deceased

Deceased record

Results:

  1. 406,255 are alternate channels records with emails that are already stored on Contact Page level (as either Personal, Business, School or Other email). They were deleted from alternate channels.

  2. 56,740 alternate channels are records with duplicate emails that have the same status. I suggest we delete these duplicate records, leaving 1 record with the oldest "Created Date".

  3. Approx. 24k duplicated alternate channels with different statuses were deleted from alternate channels.

For more details and associated manual work done please visit: https://sydneyuni.atlassian.net/browse/MA-295

Step 6. Analyse and fix email types

<BLOCKED>

New Email Type picklist is still work in progress as per discussion with @rommel.ngo. The ticket is therefore Blocked.

The suggested mapping is as follows:

Current Email Type

New Email Type

Business

Business

Business Address

Business

Business Email

Business

Contact Email

Personal

Email

Personal

Email - Alternate

Personal

Home Email

Personal

Institution Email

Business

Optional Email

Other 

Other Email

Other 

Permanent Email

Personal

Residential Email

Personal

Sydney Alumni Email

Other 

<blank>

Other 

Potential issue identified & to be discussed: New email types should be aligned with email types used in SITS. Otherwise the integration will be adding emails with incorrect email types to alternate channels.

Potential issue identified & to be discussed: New email types should be aligned with email types used in SITS. Otherwise the integration will be adding emails with incorrect email types to alternate channels.

For more details and associated manual work done please visit: https://sydneyuni.atlassian.net/browse/MA-298

Step 7. Identify, analyse and fix/remove emails that are used by multiple contacts

<IN PROGRESS>

For more details and associated manual work done please visit: https://sydneyuni.atlassian.net/browse/MA-300

Step 8. Analyse and fix emails in alternate channels that are stored as other type of record (eg. Email stored as Address, LinkedIn, Personal Website etc)

Results: 142 records containing emails were found in alternate channels with the wrong Alternate Channels' types (e.g. Address, Internet Website etc). These records were fixed and correct Alternate Channel type was assigned accordingly. 

For more details and associated manual work done please visit: https://sydneyuni.atlassian.net/browse/MA-312

Summary of issues identified

Related content

7.2.7 Simplify Phone related Picklist field values
7.2.7 Simplify Phone related Picklist field values
Read with this
2. Data Quality Rules and DQ Dashboards
2. Data Quality Rules and DQ Dashboards
More like this
7.1.2 AS-IS Scenarios for AQ Account Mailing Address flow
7.1.2 AS-IS Scenarios for AQ Account Mailing Address flow
More like this