Duplicates

Duplicate data will accumulate as the database is used, and it becomes increasingly likely once Syncing and Importing Data are in use. A good UI for identifying and resolving these duplicates is essential.

Data Model

A new field will have to be added to tables:

  • new_id

If the record is a duplicate, “deleted” should be set to True, and this field will be used to record the ID of the correct record, so that the references in ALL records which refer to the duplicate can be updated to point to the original record.

Question:
Are references done with ID or UUID? If UUID, this field will be “new_uuid”
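
As a rough illustration of the new_id field, the sketch below uses Python's built-in sqlite3 module rather than the Sahana database layer. The table names (person, shelter), the contact_id column and the mark_duplicate helper are all hypothetical, and integer IDs are assumed because the ID/UUID question above is still open.

```python
import sqlite3

def mark_duplicate(conn, table, duplicate_id, original_id, references):
    """Flag a record as a duplicate and repoint references at the original.

    `references` is a list of (table, column) pairs holding foreign keys
    into `table`. It is supplied by the caller here (an assumption); a real
    implementation would discover these from the schema. Table and column
    names are interpolated, so they must come from trusted code, not users.
    """
    cur = conn.cursor()
    # Soft-delete the duplicate and remember which record replaces it.
    cur.execute(
        f"UPDATE {table} SET deleted = 1, new_id = ? WHERE id = ?",
        (original_id, duplicate_id),
    )
    updated = 0
    for ref_table, ref_column in references:
        # Point every reference to the duplicate at the original record.
        cur.execute(
            f"UPDATE {ref_table} SET {ref_column} = ? WHERE {ref_column} = ?",
            (original_id, duplicate_id),
        )
        updated += cur.rowcount
    conn.commit()
    return updated

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT,
                         deleted INTEGER DEFAULT 0, new_id INTEGER);
    CREATE TABLE shelter (id INTEGER PRIMARY KEY, name TEXT,
                          contact_id INTEGER);
    INSERT INTO person (id, name) VALUES (1, 'John Smith'), (2, 'Jon Smith');
    INSERT INTO shelter (id, name, contact_id)
        VALUES (10, 'North', 2), (11, 'South', 1);
""")

# Record 2 is a duplicate of record 1; shelter 10 is repointed to record 1.
updated = mark_duplicate(conn, "person", 2, 1, [("shelter", "contact_id")])
```

Keeping the duplicate row (flagged as deleted, with new_id set) rather than removing it means that references arriving later, for example after a Sync, can still be redirected to the original record.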

API

Args:
Pass any of the following, alone or together, to be compared:

  • A field table to look for duplicates in
  • A list of ID:Value pairs (the IDs are generated internally within the Import code)
  • A record from the conflict table (which contains both the original record, and the conflict value).


Process:

  • Find duplicate records based on similarity, using the Levenshtein function, and display them in a UI for the user to select. It could also be useful to show how many times each record is referred to. (Skip this step for conflict records)
  • Match fields between the duplicate records (a new record may be a duplicate, but may also have additional data in other fields)
  • Select which record to save.
  • The database should then ensure that all references to the duplicate record are updated to refer to the new record. This step should be callable as a separate function, as it will need to be run after Sync.
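
The first two steps of the process can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the Sahana implementation: records are assumed to be dictionaries with an "id" key, and the function names, the normalised similarity score and the 0.8 threshold are all illustrative choices.

```python
def levenshtein(a, b):
    """Edit distance between two strings (dynamic programming, two rows)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def similarity(a, b):
    """Scale the distance to a score from 0.0 (different) to 1.0 (identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest

def find_candidates(records, field, threshold=0.8):
    """Return (id_a, id_b, score) for pairs whose `field` values are similar."""
    pairs = []
    for i, first in enumerate(records):
        for second in records[i + 1:]:
            score = similarity(first[field].lower(), second[field].lower())
            if score >= threshold:
                pairs.append((first["id"], second["id"], score))
    # Most similar pairs first, so the UI can list the likeliest duplicates.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)

def merge_fields(keep, duplicate):
    """Copy values from the duplicate into empty fields of the kept record."""
    merged = dict(keep)
    for key, value in duplicate.items():
        if key != "id" and merged.get(key) in (None, ""):
            merged[key] = value
    return merged

# Hypothetical records, for illustration only.
records = [
    {"id": 1, "name": "John Smith", "phone": ""},
    {"id": 2, "name": "Jon Smith", "phone": "555"},
    {"id": 3, "name": "Mary Jones", "phone": "123"},
]
candidates = find_candidates(records, "name")
merged = merge_fields(records[0], records[1])
```

Comparing every pair of records is quadratic in the table size, so a real implementation would probably need to narrow the candidates first. In this sketch the kept record's own values always win, whereas the process above leaves that choice to the user.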

Return:
If a field table is passed, then it will be updated automatically.
If a list of ID:Value pairs is passed and the fields are compared with a Sahana table, then the Sahana internal IDs (or UUIDs?) should be returned so that when this record is imported, Sahana knows that it refers to an existing record.
If only a list of ID:Value pairs is passed, then the ID of the record which the duplicate record has been replaced with should be matched.
This will need to be better defined once the Importer Function is more developed.

Wireframes

Identifying Duplicates

Clicking “resolve” will open this screen/popup:

Resolving Duplicates

