Copyright Exceptions and Data Mining

On July 5th, 2018, we had the opportunity to interview two esteemed individuals within the field of copyright: Dr. Jane Secker and Mr. Chris Morrison. Dr. Secker is a senior lecturer in educational development at City, University of London, and Mr. Morrison is the copyright, software licensing and information services policy manager at the University of Kent. Both Dr. Secker and Mr. Morrison sit on the Universities UK / GuildHE Copyright Negotiation and Advisory Committee and are co-founders of the UK Copyright Literacy blog.

To provide some context to their responses, two key elements about European law should be noted. First, on June 1st, 2014, an exception was made to UK copyright law through the implementation of the Copyright and Rights in Performances (Research, Education, Libraries and Archives) Regulations 2014. This included the removal of barriers for Text and Data Mining (TDM) for non-commercial purposes. Second, the European Union (EU) is in the process of modernizing its copyright laws, and recently had a vote on the proposal for a Directive of the European Parliament and of the Council on Copyright in the Digital Single Market, which was rejected and will be revisited in September of 2018.

We asked Dr. Secker and Mr. Morrison the following questions:

  • What are some examples of the kinds of data mining that researchers should legitimately be able to do?
  • What legal barriers stand in the way of this and, if you could, tell us about the exception for TDM that’s proposed in Europe.
  • Given the current controversies about data mining by social media companies and political consulting companies, privacy issues have risen to prominence. How would the proposed copyright exception intersect with privacy law and what types of research would not be permitted given European privacy regulation?

According to Mr. Morrison, the right to read should be the right to mine. Dr. Secker reiterated this notion and also stressed the importance of being able to legitimately mine various forms of data, whether it be full text subscription databases, abstracts, digitized collections, social media content, etc.

The legal barriers identified by Mr. Morrison include the various licensing terms, terms and conditions of websites, and differing laws around the world on data mining that are often very complicated even for researchers to fully grasp. Further, many researchers find themselves under pressure from external sources—such as those funding the research—to openly license the data set, which can be troublesome especially if the researcher is working in collaboration with a commercial organization. According to Dr. Secker, TDM has been recognized in UK law since 2014, and it is not something that is currently available as a copyright exception in other European countries. This makes it difficult when working in partnership with others who may not have similar legal restrictions on how they can interact with specific datasets. Also, the barriers are not just legal—they can be technical as well, especially when considering factors such as Digital Rights Management (DRM) protection. This poses a conundrum for copyright—on the one hand, there’s an exception that indicates your ability to engage in TDM, but there are also technical limitations such as DRM or other technical protection measures which may prevent you from obtaining access.

When it comes to research, Dr. Secker reminds us that there are pre-existing ethical codes of practice that researchers must adhere to. Any researcher working in the field of copyright or TDM would have to get ethical clearance before conducting their research. Mr. Morrison also reminds us that intellectual property laws are not implemented for privacy purposes, but to incentivize creativity and investment in information goods. Privacy concerns are a separate issue from copyright, and it’s important to keep them separate when addressing them.

See below for a transcript of the interview (transcript has been edited for clarity and readability).

What are some examples of the kinds of data mining that researchers should legitimately be able to do?

Mr. Morrison: Well, I think they should be able to mine legitimately acquired sources of data, specifically subscription databases that academic institutions subscribe to. In our view, and the view of many information professionals, we have paid to get legitimate access, and we should be able to run computational analysis and algorithms on those datasets in order to understand the facts and the underlying patterns within that information source. But also, beyond that, anything that has value to pure research, whether that be science, social science, or even humanities, anything where new knowledge can be created, and new understandings can be created after the information source, that should be something that researchers should be able to do without having to go into a very complex and potentially expensive process of getting additional permissions. In summary, the right to read should be the right to mine.

Dr. Secker: The only thing I would add to this is that the law should cover data in all sorts of formats. It should cover full text subscription databases, but the researcher might be mining abstracts as well, such as the case in large scale systematic reviews, so it should cover abstracts, image data as well, where you’ve got digitized collections. In my previous role as the copyright and digital literacy adviser at the London School of Economics (LSE), we had historical sources that had been digitized and they were mainly image-based, although some of them had been converted to text, but being able to mine all sorts of different data is crucial to researchers, and there was a lot of interest in this from researchers.

What legal barriers stand in the way of this and, if you could, tell us about the exception for TDM that’s proposed in Europe.

Mr. Morrison: Well, I think the legal barrier to this from the perspective of the researcher is the numerous licensing terms, terms and conditions, and different laws that for most people are very complicated and worrying. So, the area of research that Jane and I are most interested in is how copyright is perceived and how it’s experienced by those involved in research and education. In our experience, most of them are under a lot of pressure from many different sources that have funded them to make their research available in certain ways and to publish on an open access basis. At the same time, there are ethical concerns that they have to abide by and therefore copyright and associated rights, such as database rights, are just another aspect of a great many things that they have to make sure they get right and it’s something they find hugely complicated. Questions such as what is commercial and what is non-commercial can also become a barrier when they’re working with other partners in what could be regarded as commercial organizations.

Dr. Secker: We’ve had TDM in UK law since 2014, which obviously, other European countries don’t have at the moment. So, if we might want to work with a partner that is outside the UK, and the fact that this would be harmonized as something across Europe, it would help for those kinds of projects because at the moment, it is only something we’ve had for four years in the UK and there’s still been quite a lot of difficulty getting the message out there that it is something that is permitted. The barriers aren’t necessarily legal; a lot of them are technical, so they could be related to things like DRM. That has caused some problems in examples I know of where, essentially, databases or some kind of web-based source will have some sort of mechanism to stop you from downloading the amount of data that you need to perform TDM and if they use DRM, then you get into quite a difficult situation legally because you can’t circumvent the DRM because that’s illegal to do. So, what takes the precedence? You’ve got an exception that says you’re allowed to do TDM but if you’ve got a DRM on there in some form and you need to apply to have it taken off, you can’t just sort of hack into the system, which would be a way around it. But the kind of issue about Europe I think is significant that, where it’s a project that might be working across more than one country, having that exception only in the UK, I think it’s potentially meant that there haven’t been large-scale projects to look at from a sort of European level yet.

Mr. Morrison: Yes, and also to add at the European level that question about DRM or Technical Protection Measures (TPM): we’re obviously part of a process and there’s been some developments today on what’s happening with that final vote that’s going to the vote in September. But there are potential worrying provisions in there around fixing that situation with the TPM in law so that there is no way to kind of get around that at all even at a local level. Jane has had the experience of referring a potential TDM example to the UK Intellectual Property Office because we wanted to remove the TPM, and that’s possibly going to be changed at the European level which would make that impossible to do. Also, the European proposal which is to limit it to research institutions only could be problematic where we are working, as I mentioned earlier on, in partnership with other organizations, that will potentially limit what researchers can do.

Given the current controversies about data mining by social media companies and political consulting companies, privacy issues have risen to prominence. How would the proposed copyright exception intersect with privacy law and what types of research would not be permitted given European privacy regulation?

Dr. Secker: This is an interesting question. I think in terms of social media data for example, I’ve run into a number of situations about using social media in research, how to sort of harvest data out of Facebook and Twitter particularly. There’s a lot of interest from researchers in doing new types of research and I think one of the things to remember is that there are ethical codes of practice that already exist. So, the Association of Internet Researchers have a strict code of conduct if you’re doing this type of research where privacy and the use of personal data is really clearly considered. I had a number of examples where people would come, often Ph.D. students, where they might have harvested data out of blogs or from social media and a lot of this came down to informed consent and what that means when you are taking data that somebody’s put out on the web. It doesn’t mean it’s fair game to do what you want with it. Obviously, there are huge concerns at the moment with changes to data protection, that privacy should somehow trump copyright and become the kind of thing that we always have to be mindful of. But, I think for any researcher that’s working in this space, they would be getting ethical clearance and I think privacy would be a massive concern. I think if you’re doing a project that involves a very sensitive area, perhaps you’re using a hashtag exposing people’s identity and things that they say as individuals; that’s just kind of unethical from the start really.

Mr. Morrison: Yes, I think when having conversations with people about how to overcome the potential barriers that intellectual property laws provide, the conversation often turns towards privacy, and people will say well, does copyright stop me from doing this in order to protect people’s privacy? I think we’re very clear that intellectual property laws are not there for privacy purposes; they are there to incentivize creativity or the investment in information goods, and the recent General Data Protection Regulation (GDPR) does create a challenge for researchers using TDM. For example, if they decide they have lawful access to an information source which involves lots of personal data, they would be allowed to do that under copyright law or database rights and the TDM provisions certainly in the UK, but they wouldn’t necessarily have permission to use that personal data for a secondary purpose. For example, to provide their dataset to somebody else to then go and look at and draw their own conclusions because that original data subject would only have given their permission for it to be used by the original service, the original party that had taken it. So, researchers have this issue, but in a way that’s a separate issue from copyright and it’s quite important I think to keep those separate when addressing them.

Dr. Secker: But I think it is about looking at the data while getting ethical clearance. Just because you’re not talking to individuals and interviewing them or getting the data from a questionnaire because you might be doing some sort of large scale mining of something like Twitter, it doesn’t mean that those people’s identities are fair game to be sort of reproduced completely un-anonymized. But it is something people that do social research, I think if they’ve moved into this space and they haven’t done research using these types of sources before, it’s something you can cover in research training and that was certainly what we were trying to do in my previous role. We ran a couple of really successful workshops where we got them to understand what the legal issues were, but really importantly what the ethical issues were with using that type of data.

Recap of the 35th session of the IGC

Rice field in Madagascar (Photo: UN Photo/Lucien Rajaonina).

From March 19th to 23rd, 2018, the World Intellectual Property Organization’s (WIPO) IGC met for its 35th session in Geneva, Switzerland. The Draft Agenda for the session outlines the tentative topics of discussion, such as an update on the operation of the Voluntary Fund pertaining to the participation of Indigenous peoples and local communities, a summary of which can be found here. Also on the draft agenda for discussion are reports, recommendations, and proposals pertaining to genetic resources, which can also be found in the summary of documents.

In regard to the Voluntary Fund, the Information Note on Contributions and Applications for Support provides an exhaustive list of the voluntary contributions paid to the fund by nations, the amount of resources available, and the list of persons who were recommended for funding as of January 26th, 2018, as well as those who are seeking support to attend the IGC’s 36th session. Here, it is worth noting that Canada proposed in its 2018 Federal Budget Plan that it will be allocating an investment of $1 million over the span of five years to allow Indigenous peoples to attend and participate in WIPO meetings pertaining to traditional knowledge and cultural expressions as a way of promoting intellectual property rights amongst its Indigenous communities.

During the course of the meetings, new proposals were submitted, and while some nations accepted them, others resisted. For instance, a revised proposal for a potential treaty preventing the misappropriation of genetic resources received much resistance from developing countries on the grounds that the U.S. had introduced new issues during the week that were not mentioned in the previous version (Saez). In response, a second version was created for consideration by member states. The committee chair also put forward a proposal indicating the need to create an expert group to prevent the misappropriation of genetic resources prior to the next session of the IGC, and this proposal was positively received (Saez).

Interview with Ms. Teresa Hackett

Last week we had the pleasure of interviewing Ms. Teresa Hackett, Copyright and Libraries Programme Manager at Electronic Information for Libraries (EIFL), which works with libraries to enable access to knowledge in developing and transition economy countries in Europe, Africa, Asia Pacific, and Latin America. The Copyright and Libraries programme aims to build the capacity of librarians in copyright issues, develop useful resources, and advocate for national and international copyright law reform.

We asked the following questions:

  • What are the three biggest problems for international copyright that you hope WIPO’s work can address?
  • Is the Standing Committee on Copyright and Related Rights (SCCR) making progress in solving those problems?
  • What hurdles do you see in the SCCR’s work toward solving those problems?

The three biggest problems identified by Ms. Hackett were inequalities between nations on the right to legally access and use information for education, research, and personal development; barriers to cross-border access and use of information; and the replacement of copyright law with licenses for electronic resources. She stated that the SCCR is making progress addressing these problems, albeit quite slowly, which is often the case in international law. Currently the focus is on the important issue of agreeing on a workplan for the next biennium to set out a roadmap for the topics. As far as hurdles go, she indicated that there is some contention between developing and developed countries as to what the solution should be; while developing countries want a solution that’s international and binding, like an international treaty, for example, developed countries do not see a need for an international solution.

See below for a transcript of the interview.

What are the three biggest problems for international copyright that you hope WIPO’s work can address?

First, the biggest problem is inequalities between nations on the right to legally access and use information for education, research, personal development, and so on, in particular for digital information. So, it’s inequalities: a lack of equality between nations on the copyright laws of nations. There’s a big divergence around the world in copyright laws as to what libraries are and are not allowed to do for their activities.

The second problem is that there are barriers to cross-border access and use of information. That’s due to the territorial nature of copyright. As you know, the Internet is global, and information needs don’t stop at the border. But copyright laws often prevent libraries from sharing or providing information services across borders. In fact, because that’s an international problem, only an international organization like WIPO has the scope and the mandate to properly address it.

The third problem is that copyright law is being, to a large degree, supplanted or replaced by licenses for electronic resources. These licenses often take away user rights that are set out in the copyright law. We view that as kind of undermining copyright laws, so we would like to see some way to protect the limitations and exceptions that are set out in copyright law in the licenses so that in the future copyright law still has a very strong place in how we access and use information.

Would you say that the SCCR is making progress in solving these problems? 

I would say yes overall. The Committee adopted a list of eleven topics for discussion which were debated over two years in the Committee. So, we had a list of eleven topics related to library and archive activities, such as preservation, right of reproduction, legal deposit, lending, parallel importation, cross-border uses, orphan works, TPMs, contracts, liability, and translation as well.

The resulting document was known as the ‘Chair’s chart’. Then the Chair proposed to reduce the eleven topics to nine and in fact, he also took out another two sub-topics. A suggested approach was made on seven topics, with further discussion needed on two topics (contracts and translation).

So, that phase of the work has been completed, and under the guidance of a new Committee Chair, the Committee is discussing a workplan for the next biennium, so for 2018 to 2019 when we hope we will be able to make further progress on the topics and to look at what the possible solutions might be.

So, we are making progress, but the progress is quite slow, as is the case in international law making. But I believe progress is being made.

You already said that one of the biggest hurdles would be the speed at which the changes would occur. Aside from that, what other hurdles do you see in the SCCR’s work moving forward to address these issues?

Well I think it’s fair to say that all member states support the work of libraries, understand the value of libraries, and how libraries contribute to providing access to information and knowledge. Libraries contribute, for example, by preserving the memory of the world, providing access to our cultural and linguistic heritage, and supporting learning, education and research.

The problem is finding an agreement on a solution to the problems that the library and the archive community are presenting to the Committee and the member states. You could say that there is a split between industrialized countries and developing countries. Developing countries want a solution that’s binding and effective—likely along the lines of an international treaty or other binding international instrument—whereas the industrialized countries don’t see the need for an international solution at all. They believe that all the problems can be resolved at a national level and they only want to discuss best practices and national experiences. So, we have a difference of opinion and to some extent, an impasse as to what the solution should be.

Therefore, the biggest hurdle is really lack of political support from industrialized countries even though some of those same countries are themselves going through copyright reform processes. We hope that maybe when they have completed their own copyright reforms, they might be more ready to engage in discussions on what the solutions might be at the international level, not just at their own national level or regional level, as in the case of the European Union.

Nigeria Ratifies Copyright Treaties while the IGC Attempts to Reach Agreement on Proposals

As the UN World Intellectual Property Organization’s (WIPO) General Assembly gets underway and the budget for the upcoming biennium is determined, there are two developments that occurred during the first week of meetings that are worth mentioning: (1) Nigeria’s ratification of WIPO copyright treaties and (2) the debates surrounding the interpretation of the Intergovernmental Committee on Intellectual Property and Genetic Resources, Traditional Knowledge and Folklore’s (IGC) proposed programme of work.

As of October 4th, 2017, Nigeria has adopted and ratified the WIPO Copyright Treaty (WCT), WIPO Performances and Phonograms Treaty (WPPT), Marrakesh Treaty to Facilitate Access to Published Works for Persons Who Are Blind, Visually Impaired, or Otherwise Print Disabled (Marrakesh Treaty), and the Beijing Treaty on Audiovisual Performances (Beijing Treaty). Audu Ayinla Kadiri, a Permanent Representative of Nigeria to the UN and other international organizations in Geneva, stated that “we [Nigeria] have a very creative Nollywood industry, we have young and enterprising entrepreneurs, we have new businesses coming up here and there, we have a lot of innovative hubs in Nigeria. So, these treaties will serve as a boost to all these trends, which are very positive”.

As discussed in our previous post, the Delegation of Senegal, on behalf of the African Group, submitted a proposal for the IGC regarding a potential work program for the 2018/19 Biennium. In addition, the European Union (EU) created its own proposal for the mandate of the IGC. The main difference between the two proposals is that the African Group’s proposal “envisages the convening of a high-level negotiating meeting (diplomatic conference) in the first quarter of 2019 to conclude and adopt a legally binding instrument to protect genetic resources”, whereas the EU proposal “is based on further studies and examples of national experiences to narrow gaps on core issues, such as definitions, subject matter, objectives, and the relationship with the public domain”. More importantly, the EU proposal states that until everything in its proposed mandate is accepted, it will be understood that nothing has been accepted. This has led to contention among member states, particularly developing countries, on how to interpret and adopt the EU proposal’s core premises.

The IGC will aim to reach an agreement on the proposal’s main objectives and will report its progress in 2019.

WIPO’s move to open access laudable among international organizations

It’s important, in the interests of transparency, accountability, and access to knowledge, that international organizations adopt open licensing of their publications and records. WIPO’s recent move to implement a new Open Access policy puts it at the forefront of international organizations adopting Open Access policies.

In 2013, Creative Commons announced a new Creative Commons licence specifically designed for intergovernmental organizations (IGOs). The Creative Commons-IGO licences are similar to other Creative Commons licences, but are tailored to international organizations, which have special copyright licensing requirements due to their privileges and immunities in national legal processes.

In adopting the Creative Commons-IGO licence, WIPO is at the forefront of UN agencies and other international organizations. WIPO joins UNESCO and a number of UN-funded programs and portals, as well as the World Health Organization (WHO) and the World Bank Group in adopting the licence. WIPO and other UN agencies must go further.

The 2012 Hague Conference on Private International Law unanimously endorsed a set of recommendations that included a set of 18 guiding principles on access to law (Greenleaf, Mowbray, & Chung, 2013). The principles of access to law widely endorsed by states call for not just open access to legislation and case law, but also open access to “relevant historical materials, including preparatory work and legislation that has been amended or repealed, as well as relevant explanatory materials.” Further, they call on states to “permit and facilitate the reproduction and re-use of legal materials[…]by other bodies, in particular for the purpose of securing free public access to the materials, and to remove any impediments to such reproduction and re-use.”

While not perfect, WIPO has been noted for its relative transparency and the accessibility of its documents. WIPO’s new Open Access initiative is a part of this. At present, however, the licence applies only to WIPO publications published online on or after November 15, 2016, and other select content. In the interests of continued openness, transparency and accountability, WIPO should apply the Creative Commons-IGO licence to all of its publications, including historical materials, meeting documents, and explanatory materials, and including documents published prior to November 2016. In an era of increasing skepticism of globalization, WIPO can build on its own best practices and take the lead on transparency and free access to law.


With thanks to Coralie Zaza for her extensive research on the Open Access policies of international organizations.