Legal Obstacles for Big Data Processing in the Service of Artificial Intelligence (acronym BIG DATA 4 AI)

Interdisciplinary research project funded by the Technology Agency of the Czech Republic (2021-2023) No. TL05000550


The goal of the project is to support Czech industry and the development of AI by means of direct assistance to relevant companies. Firstly, we aim to define the areas in which AI start-ups and companies with industrial applications of AI face legislative obstacles. For that reason, we distributed an online questionnaire to companies and research centres in the field of AI. allowed us to contact many of them given the network they created. Subsequently, our researchers conducted a number of qualitative dialogues with representatives of the companies and centres. The aim is to identify the most relevant issues, e.g. issues with big data processing, storage, and access, and to create guidelines that address them from a legal standpoint.

Aside from the guidelines, several instructional videos are being created in cooperation with media experts from the Faculty of Social Sciences. As a result of cooperation with and experts from Czech Technical University, we can approach companies developing AI and create partially automated advisory tools for better orientation in the legal jungle. The project fulfils some of the goals of the Czech National Strategy on artificial intelligence, where legal and ethical issues are among the obstacles that need to be tackled to promote the use and development of AI in the Czech Republic.

An interdisciplinary team was assembled, including experts from the Institute of State and Law of the Czech Academy of Sciences (ÚSP AV), the Faculty of Social Sciences of Charles University (FSS UK) and


The instructional video Big Data provides basic information about the concept. It shows examples of the use of big data and of tools for its analysis and processing, such as large language models, which are used for text translation, among other things, but which also carry the danger of reinforcing prejudice. The video concludes with a reminder that it is always the content of the training data that fully determines the model results. When the training data contains biases, the model learns them, and they then show up when the model is used.

The instructional video Artificial Intelligence explains what the term encompasses. Artificial intelligence refers to a scientific field, but the term is often used to refer to a specific algorithm or artificial neural network. It is a broad and overused term, but that does not change the fact that it covers a wide range of activities. The video provides information on the development of artificial general intelligence, which in theory should solve any problem, but whose results in practice depend heavily on the intent and, most importantly, on the sufficiency and quality of the training data. You will also learn why regulation is essential for the future impact of AI on society.

The instructional video Legal Challenges and Risks of Big Data discusses in particular how to use data legally for AI training. It highlights the different options for protecting data. These include copyright law, which includes database rights. Here, the processor must pay attention to the content of the licence, which can also take the form of a Creative Commons or other open licence. The content and type of licence determine the conditions and scope of the right to use the work. The video explains the concepts of personal and non-personal data and confidential information, and how they can be handled. For inspiration, we list open data resources.

The instructional video Compliant Data Processing covers risk analysis and record-keeping for the processing of data, including personal data. It provides guidance on how to approach the selection of an appropriate dataset at the outset, how to set up contractual relationships for data handling, and what other documents are needed. It also explains how a developer can find out whether other obligations apply, for example under cybersecurity or sectoral legislation.


An article on legal offences in the area of personal data processing by Ján Matejka and Pavel Mates was published in the journal Právník (issue 10/2021).

Workshop on the European proposal for the Data Act, held at the Institute of State and Law on 18 May 2022.

Questionnaire in Czech and English, distributed among companies and research centres that develop and use AI in their operations. An excerpt of the research report based on the collected data.



    Lenka Kučerová

    Lenka Kučerová joined in the spring of 2019 and led the organization for three years. Since 2022, she has focused on community development and partnerships. Lenka has been actively developing the Czech entrepreneurial environment since 2010, when she founded the CzechAccelerator program at CzechInvest. She ran two Prague-based startup accelerators, StartupYard and Wayra CEE, and in 2015 co-founded the StarLift Foundation. At the startup studio CEAi she was responsible for community development, marketing and corporate culture.


    Pavel Kordík

    doc. Ing. Pavel Kordík, PhD works for CTU in Prague as a vice dean for industrial collaboration, researcher and educator. His research activities range from data mining automation to artificial intelligence, meta-learning, optimization and information visualization. As an entrepreneur, he cofounded, to build a bridge between academia and industry. He also commercializes outcomes of AI research in the Recombee company, which provides recommender systems as a service. Pavel is also co-founder of and member of its executive committee.


    Vojtěch Vančura

    Ing. Vojtěch Vančura is a machine learning researcher at Recombee and a Ph.D. student at the Faculty of Information Technology at the Czech Technical University. His research focuses on the scalability of shallow methods for collaborative filtering and the use of large language models and computer vision for the cold-start problem in recommender systems.


    Václav Moravec

    PhDr. Václav Moravec et PhD works at the Institute of Communication Studies and Journalism at the Faculty of Social Sciences, Charles University in Prague and at the Department of Production at FAMU. He specializes in the transformation of audiovisual media, journalistic ethics, automated journalism and artificial intelligence journalism. He is also known for the TV discussion programmes Otázky Václava Moravce and Fokus Václava Moravce, which air on Czech Television. He is a member of the Executive Committee of the Artificial Intelligence Initiative.


    Veronika Příbaň Žolnerčíková

    Mgr. Veronika Příbaň Žolnerčíková, PhD is an expert in ICT law and a research fellow at the Institute of State and Law of the Academy of Sciences of the Czech Republic as well as at the Institute of Law and Technology, Faculty of Law, Masaryk University. Her research topic is artificial intelligence from a legal perspective. She also focused on the law of new technologies in her previous positions at the Legislative Department of the Ministry of Justice and at a law firm. Veronika occasionally contributes to teaching at Masaryk University and Charles University.


    Ján Matejka

    JUDr. Ján Matejka, PhD is the Director of the Institute of State and Law of the Czech Academy of Sciences. His professional experience spans intellectual property, ICT, civil, contract and business law, and also includes e-government, electronic signatures, information security and telecommunications. He is a senior lecturer in Data Protection Law at the Faculty of Mathematics and Physics of Charles University in Prague. He is the author of many works on Internet and computer law.


    Eva Fialová

    JUDr. Eva Fialová, LL.M., PhD works as a researcher at the Institute of State and Law of the Czech Academy of Sciences as well as a practising lawyer. She has experience in data protection law and information technology law in general. At the Institute of State and Law, Eva participates in projects analysing legal aspects of AI (autonomous systems and autonomous vehicles) and Big Data. She publishes in professional journals and gives lectures on ICT law. She is a member of expert groups on this topic.


    Tereza Novotná

    Mgr. Tereza Novotná, PhD is an assistant at the Institute of Law and Technology at Masaryk University and a research fellow at the Institute of State and Law of the Czech Academy of Sciences. Her primary research topic is legal informatics, specifically legal information retrieval and legal text processing. At the same time, she is interested in the algorithmisation of legal rules and their translation into machine-readable form. She further focuses her research on interaction with the user, i.e. the addressee of the law, which she considers a key component of her research.


    Kateřina Turková

    Ing. et Mgr. Kateřina Turková, PhD is a researcher and associate lecturer at the Institute of Communication Studies and Journalism of Charles University, Czech Republic. She obtained her PhD and Master’s degree in media studies from the Faculty of Social Sciences of Charles University and her Master’s degree in economics and economic administration from the University of Economics in Prague. In her academic research, she focuses on issues associated with social media, sports, and quantitative research.


    Irena Prázová

    Mgr. Irena Prázová, PhD is the Head of the CEMES Center for Media Studies at the Institute of Communication Studies and Journalism of the Faculty of Social Sciences of Charles University, Czech Republic. She is also the director of the faculty library. In her academic research, she focuses on issues associated with media and information literacy.