Long-term email network data
Undirected email correspondence between users of a large organization with over 1,000 individuals for four consecutive years (2007-2010). For this period, we have information on the sender, the receiver, and the total number of emails sent within the organization using the corporate email address. To preserve users' privacy, individuals are completely anonymized and we do not have access to email content (see Ethics statement).
The data is in the following format:
user1ID user2ID #emails
Here, #emails is the total number of emails exchanged (sent and received) between the two users in one calendar year. There is one file per year.
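As an illustration of how this format might be read, the following is a minimal Python sketch, not part of the dataset itself. It assumes the three fields are separated by whitespace and that the count is an integer; the function names `parse_edges` and `node_strength` are hypothetical. Because the network is undirected, each pair is stored in a canonical order. If a pair were to appear more than once in a file, this sketch sums the counts, which is an assumption and may not match how the files were built.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def parse_edges(lines: Iterable[str]) -> Dict[Edge, int]:
    """Parse 'user1ID user2ID #emails' lines into undirected edge weights."""
    weights: Dict[Edge, int] = defaultdict(int)
    for line in lines:
        parts = line.split()  # assumption: whitespace-separated fields
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        u, v, count = parts[0], parts[1], int(parts[2])
        # Store each undirected pair in a canonical (sorted) order.
        edge = (u, v) if u <= v else (v, u)
        # Assumption: repeated pairs are summed rather than overwritten.
        weights[edge] += count
    return dict(weights)


def node_strength(weights: Dict[Edge, int]) -> Dict[str, int]:
    """Total number of emails exchanged by each user over all contacts."""
    strength: Dict[str, int] = defaultdict(int)
    for (u, v), count in weights.items():
        strength[u] += count
        strength[v] += count
    return dict(strength)
```

To load one year, the lines of the corresponding file can be passed directly, for example `parse_edges(open(path))`, where `path` points to whichever yearly file was downloaded.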
This data is exempt from IRB review because: i) the research involves the study of existing data, namely email logs from 2007 to 2010, which the IT service of the organization archived routinely, as mandated by law; ii) the information is recorded by the investigators in such a manner that subjects cannot be identified, directly or through identifiers linked to the subjects. Indeed, subjects were assigned a "hash" by the IT service prior to the start of our research, so that none of the investigators can link the "hash" back to the subject. We have no demographic information of any kind, so de-anonymization is also impossible.
The dataset is permanently stored on Figshare.