Fsseudonymizer (FSS Edition)

This page provides abbreviated documentation for Fsseudonymizer (FSS Edition), a computer program developed by the Epidemiology Research Unit for generating pseudonymous data. For more detailed information and source code, please see the project's GitHub page.

Background

In our work, we occasionally encounter situations in which we might wish to use data that identifies individual people, businesses, animals, or facilities. It is often essential, for example, that we are able to follow events that occur on an individual premises, or to pull together information from several different sources that pertain to a particular person, animal, or location.

In many of these cases, however, such data is sensitive or confidential: information provided by businesses, for example, might be confidential, and personal information pertaining to individuals is often protected by law.

In order for our work to be successful, then, two seemingly contradictory criteria must be fulfilled:

  • Personal or confidential data that might reveal the specific identity of a person or the precise location of a particular premises must not be disclosed.
  • However, in order to ensure the validity of our work, it must be possible to uniquely identify all individuals, premises, etc., across multiple data sets.

In order to address both of these requirements, we have developed several approaches for processing data that will allow the assignment of unique pseudonymous identifiers to individuals or entities without revealing confidential or personal information. These approaches can be applied to data from multiple sources by the data providers themselves, thus never revealing any personal or confidential information to any unauthorized individual or organization.

Fsseudonymizer is a computer program that implements a straightforward but highly effective, secure, and consistent approach for pseudonymizing data for subsequent analysis.

System requirements

Fsseudonymizer should work with any recent release of Microsoft Windows (7, 8, or 10)*. Fsseudonymizer is available both as a 32-bit application and a 64-bit application. Provided that you have a 64-bit version of Microsoft Windows, the 64-bit version of Fsseudonymizer is recommended, as it is capable of handling substantially larger data files.

Installation

Installation is straightforward. Simply select and download an installer package from the list below, and double-click to run the installer. Follow the on-screen prompts to customize the desired installation options. By default, an icon for Fsseudonymizer will be created on your Windows Start menu.

Using Fsseudonymizer

Fsseudonymizer is intended to be run by organizations and individuals that are authorized to hold confidential or sensitive data (the data providers). Pseudonymized output generated by Fsseudonymizer can then be passed on to data users, for whom pseudonymous data is suitable for further analysis.

The pass phrase

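Fsseudonymizer passes an original piece of data (for example, a name) through a computational algorithm known as a cryptographic hash function to generate a pseudonymous representation of the original data, called a "hash value". The same input always produces the same hash value, and it is not feasible to recover the original data from the hash value alone.

In order to add an extra level of security, Fsseudonymizer also uses a pass phrase: a string of several randomly selected words that would be difficult to guess. Without one, anyone could hash a list of likely names with the same algorithm and compare the results against the pseudonymized data.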
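The idea of combining a value with a pass phrase can be sketched in a few lines of Python. This is only an illustration: the choice of HMAC-SHA256, the function name pseudonymize, and the sample pass phrase are assumptions made for this example, and the algorithm that Fsseudonymizer actually uses may differ.

```python
import hashlib
import hmac


def pseudonymize(value: str, pass_phrase: str) -> str:
    """Return a hexadecimal hash value that stands in for `value`.

    Assumption: HMAC-SHA256 keyed with the pass phrase. This is a
    common way to build a keyed hash, not necessarily the method
    implemented by Fsseudonymizer.
    """
    return hmac.new(
        pass_phrase.encode("utf-8"),  # the secret key
        value.encode("utf-8"),        # the data to pseudonymize
        hashlib.sha256,
    ).hexdigest()


# Hypothetical pass phrase, for illustration only.
hash_value = pseudonymize("Homer Simpson", "correct horse battery staple")
print(hash_value)  # 64 hexadecimal characters
```

Because the pass phrase acts as a secret key, someone who knows the algorithm but not the pass phrase cannot reproduce the hash values by hashing a list of candidate names.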

Pass phrases should be known only to the data provider(s), treated as sensitive information, and held securely.

In the event that different data sets are to be pseudonymized for the same purpose or the same data user, the same pass phrase should be used to ensure consistency of results.
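The effect of the pass phrase on consistency can be demonstrated with a small sketch. The keyed hash (HMAC-SHA256), the helper names, and the sample data are all assumptions chosen for illustration and do not describe Fsseudonymizer's internals.

```python
import hashlib
import hmac


def pseudonymize(value, pass_phrase):
    # Assumed keyed hash (HMAC-SHA256); the real algorithm may differ.
    return hmac.new(
        pass_phrase.encode("utf-8"), value.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def shared_ids(data_a, data_b, phrase_a, phrase_b):
    """Count pseudonymous identifiers that appear in both data sets."""
    ids_a = {pseudonymize(v, phrase_a) for v in data_a}
    ids_b = {pseudonymize(v, phrase_b) for v in data_b}
    return len(ids_a & ids_b)


# Two hypothetical data sets with two premises in common.
premises_register = ["Farm A", "Farm B", "Farm C"]
movement_records = ["Farm B", "Farm C", "Farm D"]

same = shared_ids(premises_register, movement_records, "phrase one", "phrase one")
different = shared_ids(premises_register, movement_records, "phrase one", "phrase two")

print(same)       # 2: the shared premises can still be linked
print(different)  # 0: different pass phrases break the link
```

With a shared pass phrase, the data user can match records across both files without ever seeing the original names. With different pass phrases, the same premises receives unrelated identifiers in each file.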

Input file formats

Fsseudonymizer currently processes input files of the following types:

  • Microsoft Excel 97-2003 (*.xls) spreadsheets
  • Microsoft Excel 2007 or later (*.xlsx) spreadsheets
  • CSV (comma-separated values) files

It is essential that input data is clean and consistent to begin with. Consider a data file that contains the names "Homer Simpson", "Homer J Simpson", and "Homer J. Simpson". In their not-yet-pseudonymous form, it is apparent that these three names likely all refer to the same person, but their pseudonymous identifiers will be completely different from one another. The data user would have no way of knowing that these identifiers represent the same individual, and any subsequent analysis would be incomplete or incorrect.
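A short sketch makes the problem concrete. It assumes a keyed hash (HMAC-SHA256) and a made-up pass phrase purely for illustration. Fsseudonymizer's own algorithm may differ, but any cryptographic hash behaves the same way when its input changes even slightly.

```python
import hashlib
import hmac


def pseudonymize(value, pass_phrase):
    # Assumed keyed hash (HMAC-SHA256); the real algorithm may differ.
    return hmac.new(
        pass_phrase.encode("utf-8"), value.encode("utf-8"), hashlib.sha256
    ).hexdigest()


variants = ["Homer Simpson", "Homer J Simpson", "Homer J. Simpson"]
digests = [pseudonymize(name, "example pass phrase") for name in variants]

# Show the first 16 characters of each hash value side by side.
for name, digest in zip(variants, digests):
    print(f"{name:18} -> {digest[:16]}...")

print(len(set(digests)))  # 3: each spelling yields a different identifier
```

Nothing in the three hash values indicates that they are related, so inconsistencies of this kind have to be resolved in the input file before it is processed.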

The user interface

Fsseudonymizer is designed to be simple to use. The image below shows the graphical user interface. To process a data file, simply complete the requested information:

Screen shot from Fsseudonymizer

Input file: Click the Browse button to select an input file to process.

Pass phrase: Enter your pass phrase here. A good pass phrase should be a random sequence of words which would be difficult to guess, should be treated securely by data providers, and should not be provided to data users.

Once you have entered your pass phrase, Fsseudonymizer will store it for future use. If you generate pseudonymous data for different data users or projects, consider using a different pass phrase for each user or project.

User name and Email address: These provide contact details that will be incorporated into the metadata section of the Fsseudonymizer output file.

Output file: Select the name and location for the output file to be generated by Fsseudonymizer.

Once all of the settings above have been specified, the Process file button will be enabled. Click this button to process the input file.

If all goes well, a message will be displayed indicating that an output file was successfully generated.

If any problems are encountered with the input file, they will be displayed. Any such errors must be corrected before an output file can be produced.