Day 12

For next time

Ethics in text mining and analysis

We’ll discuss the reading from the previous RJ as a large group. Some of the key points are summarized here.

Personally Identifiable Information protection principles

| Principle | PII protection rationale | Incompatibility with “Big Data” analysis |
| --- | --- | --- |
| Collection Limitation | There should be limits to the collection of PII; it should be obtained lawfully and fairly and, ideally, with the knowledge/consent of the data subject. | The larger the data collection, the better the potential for identifying interesting correlations. |
| Data Quality | PII should be relevant to the purposes for which it is to be used, and should be accurate, complete and up-to-date enough for those purposes. | “Messy data” is fine; it’s not clear what is relevant until it’s analysed, and even inaccurate or incomplete data can be useful. |
| Purpose Specification | Purposes for which PII are collected should be specified at the time of data collection. Subsequent use should be limited to those purposes or such others compatible with those purposes and specified on each change of purpose. | Data may have been collected for a particular purpose, but analysis may indicate further unrelated and previously unknown, but valuable, purposes. Data as collected may not be obviously PII, but analysis of it may identify individuals. |
| Use Limitation | PII should not be disclosed, made available or otherwise used for unspecified purposes except with data subject consent or by authority of law. | There may be value in sharing and aggregating data that may not be apparent at the time of collection. |
| Security Safeguards | PII should be protected by reasonable security safeguards against such risks as loss or unauthorised access, destruction, use, modification or disclosure of data. | It may be unclear what security issues, if any, arise from a particular collection of data or its analysis. |
| Openness | Data subjects should be able to establish the existence and nature of PII, and the main purposes of its use, and the identity and location of the data controller. | Where data is collected and analysed, it may not be obvious that it is PII, and even in circumstances where it is, the researcher may have no way of informing the data subject of its use. |
| Individual Participation | An individual should be able to be informed by a data controller whether it holds PII relating to him or her; to have the PII communicated to him or her in meaningful form, in reasonable time and at reasonable cost; to be informed if the PII will not be communicated and to be able to challenge that denial; and, where the PII is not lawfully held, to have it erased, rectified, completed or amended. | Data that is anonymous may still be utilised in ways that can cause risk/harm to an individual. |
| Accountability | A data controller should be accountable for complying with measures that give effect to the other principles. | How and when might a researcher be held accountable, and for what? |

Failure modes of data analysis

Alignment and bias

Let’s examine a reflection write-up from a previous semester’s MP3. It has been adapted to serve as a prompt for building up numerous “ask-analyze-assess” alignment possibilities. We will do so today with an eye toward biases that could have played a role in the data and algorithms used by the systems this project incorporated, or in the people who generated them.

Work with one or two people near you as you read the reflection below and do exercises 1 through 5 related to alignment. Examining an already-completed MP3 can give you practice considering limitations and biases in complex systems before you reflect upon your own assignment. The exercise also enables instructors to introduce some considerations that might be less obvious.