Copyright Exceptions and Data Mining

On July 5th, 2018, we had the opportunity to interview two esteemed individuals within the field of copyright: Dr. Jane Secker and Mr. Chris Morrison. Dr. Secker is a senior lecturer in educational development at City, University of London, and Mr. Morrison is the Copyright, Software Licensing and Information Services Policy Manager at the University of Kent. Both Dr. Secker and Mr. Morrison sit on the Universities UK / GuildHE Copyright Negotiation and Advisory Committee and are co-founders of the UK Copyright Literacy blog.

To provide some context to their responses, two key elements about European law should be noted. First, on June 1st, 2014, an exception was made to UK copyright law through the implementation of the Copyright and Rights in Performances (Research, Education, Libraries and Archives) Regulations 2014. This included the removal of barriers for Text and Data Mining (TDM) for non-commercial purposes. Second, the European Union (EU) is in the process of modernizing its copyright laws, and recently had a vote on the proposal for a Directive of the European Parliament and of the Council on Copyright in the Digital Single Market, which was rejected and will be revisited in September of 2018.

We asked Dr. Secker and Mr. Morrison the following questions:

  • What are some examples of the kinds of data mining that researchers should legitimately be able to do?
  • What legal barriers stand in the way of this and, if you could, tell us about the proposed exception for TDM that’s proposed in Europe.
  • Given the current controversies about data mining by social media companies and political consulting companies, privacy issues have risen to prominence. How would the proposed copyright exception intersect with privacy law and what types of research would not be permitted given European privacy regulation?

According to Mr. Morrison, the right to read should be the right to mine. Dr. Secker reiterated this notion and also stressed the importance of being able to legitimately mine various forms of data, whether it be full text subscription databases, abstracts, digitized collections, social media content, etc.

The legal barriers identified by Mr. Morrison include the various licensing terms, terms and conditions of websites, and differing laws around the world on data mining that are often very complicated even for researchers to fully grasp. Further, many researchers find themselves under pressure from external sources—such as those funding the research—to openly license the data set, which can be troublesome especially if the researcher is working in collaboration with a commercial organization. According to Dr. Secker, TDM has been recognized in UK law since 2014, and it is not something that is currently available as a copyright exception in other European countries. This makes it difficult when working in partnership with others who may not have similar legal restrictions on how they can interact with specific datasets. Also, the barriers are not just legal—they can be technical as well, especially when considering factors such as Digital Rights Management (DRM) protection. This poses a conundrum for copyright—on the one hand, there’s an exception that indicates your ability to engage in TDM, but there are also technical limitations such as DRM or other technical protection measures which may prevent you from obtaining access.

When it comes to research, Dr. Secker reminds us that there are pre-existing ethical codes of practice that researchers must adhere to. For any researcher working in the field of copyright or TDM, they would have to get ethical clearance before conducting their research. Mr. Morrison also reminds us that intellectual property laws are not implemented for privacy purposes, but to incentivize creativity and investment in information goods. Privacy concerns are a separate issue from copyright and it’s important to keep them separate when addressing them.

See below for a transcript of the interview (transcript has been edited for clarity and readability).

What are some examples of the kinds of data mining that researchers should legitimately be able to do?

Mr. Morrison: Well, I think they should be able to mine legitimately acquired sources of data, specifically subscription databases that academic institutions subscribe to. In our view, and the view of many information professionals, we have paid to get legitimate access, and we should be able to run computational analysis and algorithms on those datasets in order to understand the facts and the underlying patterns within that information source. But also, beyond that, anything that has value to pure research, whether that be science, social science, or even humanities, anything where new knowledge can be created, and new understandings can be created after the information source, that should be something that researchers should be able to do without having to go into a very complex and potentially expensive process of getting additional permissions. In summary, the right to read should be the right to mine.

Dr. Secker: The only thing I would add to this is that the law should cover data in all sorts of formats. It should cover full text subscription databases, but the researcher might be mining abstracts as well, such as the case in large scale systematic reviews, so it should cover abstracts, image data as well, where you’ve got digitized collections. In my previous role as the copyright and digital literacy adviser at the London School of Economics (LSE), we had historical sources that had been digitized and they were mainly image-based, although some of them had been converted to text, but being able to mine all sorts of different data is crucial to researchers, and there was a lot of interest in this from researchers.

What legal barriers stand in the way of this and, if you could, tell us about the proposed exception for TDM that’s proposed in Europe.

Mr. Morrison: Well, I think the legal barrier to this from the perspective of the researcher is the numerous licensing terms, terms and conditions, and different laws that for most people are very complicated and worrying. So, the area of research that Jane and I are most interested in is how copyright is perceived and how it’s experienced by those involved in research and education. In our experience, most of them are under a lot of pressure from many different sources that have funded them to make their research available in certain ways, to publish on an open access basis. At the same time, there are ethical concerns that they have to abide by and therefore copyright and associated rights, such as database rights, are just another aspect of a great many things that they have to make sure they get right and it’s something they find hugely complicated. Questions such as what is commercial and what is non-commercial can also become a barrier when they’re working with other partners in what could be regarded as commercial organizations.

Dr. Secker: We’ve had TDM in UK law since 2014 [https://www.gov.uk/government/news/new-exceptions-to-copyright-reflect-digital-age], which obviously, other European countries don’t have at the moment. So, if we might want to work with a partner that is outside the UK, and the fact that this would be harmonized as something across Europe, it would help for those kinds of projects because at the moment, it is only something we’ve had for four years in the UK and there’s still been quite a lot of difficulty getting the message out there that it is something that is permitted. The barriers aren’t necessarily legal; a lot of them are technical, so they could be related to things like DRM. That has caused some problems in examples I know of where, essentially, databases or some kind of web-based source will have some sort of mechanism to stop you from downloading the amount of data that you need to perform TDM and if they use DRM, then you get into quite a difficult situation legally because you can’t circumvent the DRM because that’s illegal to do. So, what takes the precedence? You’ve got an exception that says you’re allowed to do TDM but if you’ve got a DRM on there in some form and you need to apply to have it taken off, you can’t just sort of hack into the system, which would be a way around it. But the kind of issue about Europe I think is significant that, where it’s a project that might be working across more than one country, having that exception only in the UK, I think it’s potentially meant that there haven’t been large-scale projects to look at from a sort of European level yet.

Mr. Morrison: Yes, and also to add at the European level that question about DRM or Technical Protection Measures (TPM): we’re obviously part of a process and there’s been some developments today on what’s happening with that final vote that’s going to the vote in September [https://www.bbc.com/news/technology-44712475]. But there are potential worrying provisions in there around fixing that situation with the TPM in law so that there is no way to kind of get around that at all even at a local level. Jane has had the experience of referring a potential TDM example to the UK Intellectual Property Office because we wanted to remove the TPM, and that’s possibly going to be changed at the European level which would make that impossible to do. Also, the European proposal which is to limit it to research institutions only could be problematic where we are working, as I mentioned earlier on, in partnership with other organizations, that will potentially limit what researchers can do.

Given the current controversies about data mining by social media companies and political consulting companies, privacy issues have risen to prominence. How would the proposed copyright exception intersect with privacy law and what types of research would not be permitted given European privacy regulation?

Dr. Secker: This is an interesting question. I think in terms of social media data for example, I’ve run into a number of situations about using social media in research, how to sort of harvest data out of Facebook and Twitter particularly. There’s a lot of interest from researchers in doing new types of research and I think one of the things to remember is that there are ethical codes of practice that already exist. So, the Association of Internet Researchers have a strict code of conduct if you’re doing this type of research where privacy and the use of personal data is really clearly considered. I had a number of examples where people would come, often Ph.D. students, where they might have harvested data out of blogs or from social media and a lot of this came down to informed consent and what that means when you are taking data that somebody’s put out on the web. It doesn’t mean it’s fair game to do what you want with it. Obviously, there are huge concerns at the moment with changes to data protection, that privacy should somehow trump copyright and become the kind of thing that we always have to be mindful of. But, I think for any researcher that’s working in this space, they would be getting ethical clearance and I think privacy would be a massive concern. I think if you’re doing a project that involves a very sensitive area, perhaps you’re using a hashtag exposing people’s identity and things that they say as individuals; that’s just kind of unethical from the start really.

Mr. Morrison: Yes, I think when having conversations with people about how to overcome the potential barriers that intellectual property laws provide, the conversation often turns towards privacy, and people will say well, does copyright stop me from doing this in order to protect people’s privacy? I think we’re very clear that intellectual property laws are not there for privacy purposes; they are there to incentivize creativity or the investment in information goods, and the recent General Data Protection Regulation (GDPR) does create a challenge for researchers using TDM. For example, if they decide they have lawful access to an information source which involves lots of personal data, they would be allowed to do that under copyright law or database rights and the TDM provisions certainly in the UK, but they wouldn’t necessarily have permission to use that personal data for a secondary purpose. For example, to provide their dataset to somebody else to then go and look at and draw their own conclusions because that original data subject would only have given their permission for it to be used by the original service, the original party that had taken it. So, researchers have this issue, but in a way that’s a separate issue from copyright and it’s quite important I think to keep those separate when addressing them.

Dr. Secker: But I think it is about looking at the data while getting ethical clearance. Just because you’re not talking to individuals and interviewing them or getting the data from a questionnaire because you might be doing some sort of large scale mining of something like Twitter, it doesn’t mean that those people’s identities are fair game to be sort of reproduced completely un-anonymized. But it is something people that do social research, I think if they’ve moved into this space and they haven’t done research using these types of sources before, it’s something you can cover in research training and that was certainly what we were trying to do in my previous role. We ran a couple of really successful workshops where we got them to understand what the legal issues were, but really importantly what the ethical issues were with using that type of data.

Interview with Ms. Teresa Hackett

Last week we had the pleasure of interviewing Ms. Teresa Hackett, Copyright and Libraries Programme Manager at Electronic Information for Libraries (EIFL), an organization that works with libraries to enable access to knowledge in developing and transition economy countries in Europe, Africa, Asia Pacific, and Latin America. The Copyright and Libraries programme aims to build the capacity of librarians in copyright issues, develop useful resources, and advocate for national and international copyright law reform.

We asked the following questions:

  • What are the three biggest problems for international copyright that you hope WIPO’s work can address?
  • Is the Standing Committee on Copyright and Related Rights (SCCR) making progress in solving those problems?
  • What hurdles do you see in the SCCR’s work toward solving those problems?

The three biggest problems identified by Ms. Hackett were inequalities between nations on the right to legally access and use information for education, research, and personal development; barriers to cross-border access and use of information; and the replacement of copyright law with licenses for electronic resources. She stated that the SCCR is making progress addressing these problems, albeit quite slowly, which is often the case in international law. Currently the focus is on the important issue of agreeing on a workplan for the next biennium to set out a roadmap for the topics. As far as hurdles go, she indicated that there is some contention between developing and developed countries as to what the solution should be; while developing countries want a solution that’s international and binding, like an international treaty, for example, developed countries do not see a need for an international solution.

See below for a transcript of the interview.

What are the three biggest problems for international copyright that you hope WIPO’s work can address?

First, the biggest problem is inequalities between nations on the right to legally access and use information for education, research, personal developments, and so on, in particular for digital information. So, it’s inequalities—a lack of equality between nations on the copyright laws of nations. There’s a big divergence around the world in copyright laws as to what libraries are and are not allowed to do for their activities.

The second problem is that there are barriers to cross-border access and use of information. That’s due to the territorial nature of copyright. As you know, the Internet is global, and information needs don’t stop at the border. But copyright laws often prevent libraries from sharing or providing information services across borders. In fact, because that’s an international problem, only an international organization like WIPO has the scope and the mandate to properly address it.

The third problem is that copyright law is being, to a large degree, supplanted or replaced by licenses for electronic resources. These licenses often take away user rights that are set out in the copyright law. We view that as kind of undermining copyright laws, so we would like to see some way to protect the limitations and exceptions that are set out in copyright law in the licenses so that in the future copyright law still has a very strong place in how we access and use information.

Would you say that the SCCR is making progress in solving these problems?

I would say yes overall. The Committee adopted a list of eleven topics for discussion which were debated over two years in the Committee. So, we had a list of eleven topics related to library and archive activities, such as preservation, right of reproduction, legal deposit, lending, parallel importation, cross-border uses, orphan works, TPMs, contracts, liability, and translation as well.

The resulting document was known as the ‘Chair’s chart’. Then the Chair proposed to reduce the eleven topics to nine and in fact, he also took out another two sub-topics. A suggested approach was made on seven topics, with further discussion needed on two topics (contracts and translation).

So, that phase of the work has been completed, and under the guidance of a new Committee Chair, the Committee is discussing a workplan for the next biennium, so for 2018 to 2019 when we hope we will be able to make further progress on the topics and to look at what the possible solutions might be.

So, we are making progress, but the progress is quite slow, as is the case in international law making. But I believe progress is being made.

You already said that one of the biggest hurdles would be the speed at which the changes would occur. Aside from that, what other hurdles do you see in the SCCR’s work moving forward to address these issues?

Well I think it’s fair to say that all member states support the work of libraries, understand the value of libraries, and how libraries contribute to providing access to information and knowledge. Libraries contribute, for example, by preserving the memory of the world, providing access to our cultural and linguistic heritage, and supporting learning, education and research.

The problem is finding an agreement on a solution to the problems that the library and the archive community are presenting to the Committee and the member states. You could say that there is a split between industrialized countries and developing countries. Developing countries want a solution that’s binding and effective—likely along the lines of an international treaty or other binding international instrument—whereas the industrialized countries don’t see the need for an international solution at all. They believe that all the problems can be resolved at a national level and they only want to discuss best practices and national experiences. So, we have a difference of opinion and to some extent, an impasse as to what the solution should be.

Therefore, the biggest hurdle is really lack of political support from industrialized countries even though some of those same countries are themselves going through copyright reform processes. We hope that maybe when they have completed their own copyright reforms, they might be more ready to engage in discussions on what the solutions might be at the international level, not just at their own national level or regional level, as in the case of the European Union.

Recap of the 35th session of the Standing Committee on Copyright and Related Rights (SCCR)

Daren Tang, Director General of the World Intellectual Property Organization

From November 13-17, 2017, the World Intellectual Property Organization’s (WIPO) SCCR met for its 35th session in Geneva, Switzerland. The Draft Agenda for the session outlines the various topics and objectives that were to be discussed at these meetings. The most pressing and longstanding of these topics were the protection of broadcasting organizations and the limitations and exceptions for libraries and archives, as well as the limitations and exceptions for educational and research institutions and for persons with other disabilities.

The Draft Action Plans on Limitations and Exceptions for the 2018-19 Biennium, which were also presented at the session, outlined the list of limitations and exceptions that were to be made for the selected actors. As stated by Teresa Hackett, the EIFL Copyright and Libraries Programme manager, an “action plan is important to give the Committee direction on its future work, as well as helping library groups prepare for their work ahead” (Hackett). Yet, despite widespread acknowledgment by the members of the progress shown through the draft action plans presented by the secretariat, they were not formally adopted. Instead, they will be revised and presented at the SCCR’s 36th session in April of 2018.

Many studies presented at the 35th session outlined outstanding problems on copyright in the digital age. Among those worth noting, Professor Kenneth Crews, an attorney, presented his study on Copyright Limitations and Exceptions for Libraries and Archives, indicating that “a number of countries have revised their copyright laws and the exceptions they provide to libraries and archives … fewer countries have no exception, and fewer countries are relying on general exception” (Saez). A Proposal to Advance Discussions was prepared by the Delegations of Argentina, Brazil and Chile, outlining a number of exceptions that “should not conflict with a normal exploitation of the programme-carrying signal and not unreasonably prejudice the legitimate interests of broadcasters and cablecasters” (Saez). Lastly, a Study and Additional Analysis of Study on Copyright, Limitations and Exceptions for Educational Activities was presented by Daniel Seng, a law professor at the National University of Singapore, which examined “WIPO member states’ legislation as of August 2017 … to understand whether and how member states relied on the existing exceptions and limitations in the Berne Convention for the Protection of Literary and Artistic Works to construct their own limitations and exceptions in their national laws” (Saez).

So, what can we expect at the 36th session of the SCCR? According to the Draft Action Plans on Limitations and Exceptions for the 2018-19 Biennium, the five categories of limitations and exceptions remain intact: libraries, archives, museums, educational and research institutions, and persons with other disabilities (Balasubramaniam). The work plan suggests studies, brainstorming exercises, seminars, and conferences to take place in the upcoming year; however, no agreement has been reached between countries on whether WIPO will establish new international rules in the aforementioned areas.

Upcoming agenda of next week’s SCCR meeting

Next week the World Intellectual Property Organization (WIPO) Standing Committee on Copyright and Related Rights (SCCR) is set to meet in Geneva, Switzerland for its 35th session from Nov. 13-17, 2017. Among the issues expected to be discussed at this meeting are the SCCR’s campaign to protect broadcasting organizations; accreditation of non-governmental organizations including the Center for Information Policy Research (CIPR) and the Canadian Museums Association (CMA); the scoping study on access to copyright related works by persons with disabilities; the scoping study on the impact of the digital environment on copyright legislation adopted between 2008 and 2016; and the updated study and additional analysis of study on copyright limitations and exceptions for educational activities. Other matters that will be discussed include limitations and exceptions for libraries and archives, as well as a proposal from Senegal and Congo to include the Resale Right (droit de suite) in the Agenda of Future work by the SCCR.

Reports and studies from the latest SCCR meeting

Photo: WIPO – Emmanuel Berrod.

As discussed in our previous post, the SCCR meeting held in November in Geneva discussed, among other topics, the limitations and exceptions for libraries and archives. Most notably, WIPO released a Study on Copyright Limitations and Exceptions for Educational Activities prepared by Professor Daniel Seng from the National University of Singapore. The extended study reviews 2,048 pieces of copyright legislation from 189 WIPO member states. It was a necessary addition to the other studies commissioned by WIPO mapping different limitations and exceptions in many different countries. As EIFL relates, the study focuses on eight categories of limitations and exceptions that relate to educational activities, forming the basis of an informal chart prepared by the Chair for further discussion on the topics by the committee. The revised chart will provide the basis for future discussion at the next SCCR meeting in May 2017.

Additionally, a preliminary presentation of the scoping study on limitations and exceptions for persons with disabilities, other than print disabilities, and a description of topics were discussed and will be studied further (Item 22 of the Summary). The complete study is expected to be presented at the next SCCR meeting as well.

Of note, the Committee discussed the Proposal for Analysis of Copyright Related to the Digital Environment submitted in 2015 by the Group of Latin American and Caribbean Countries (GRULAC). Delegates of Chile delivered a powerful statement on behalf of GRULAC, acknowledging the digital work and help of archivists and museologists in Brazil and the U.K. in ensuring gender equity in the UN Charter of 1945.

In this statement, Chile explains that Brazilian scientist and diplomat “Bertha Lutz – with the help of delegates from Uruguay, Mexico, Dominican Republic and Australia – demanded the inclusion of women’s rights in the Charter and the creation of an intergovernmental body for the promotion of gender equality, while the plenipotentiary delegate of the United States and the British delegate opposed.”

The Committee supported the proposals that were made by some delegations to commission a scoping study on the impact of digital developments on the evolution of national legal frameworks over the last ten years. A proposal was made to add the topic to the SCCR agenda as a standing agenda item.

EIFL concludes, on behalf of librarians and archivists everywhere, that “with a busy agenda ahead, we will have to work hard to ensure that limitations and exceptions for libraries, archives and museums get their rightful attention and that we keep moving forward in 2017 for the benefit of libraries everywhere.”