AI & Law: dataset for machine learning


The potential of artificial intelligence and associated technologies for law is a central concern of the legal community.

However, applications remain limited today: legaltech professionals lack data specific to the French language and to legal language with which to train and improve their algorithms. This shortage of training data also makes it impossible to address the ethical risks associated with developing artificial intelligence tools.

This is why the Open Law* association, within the framework of the CNIL's "Ethics and Algorithms" mission, has decided to call on its community for a project on training data.


Objectives
  • Demonstrate the feasibility and usefulness of building quality training datasets;
  • Explore methodologies for creating training data;
  • Document the process to make it replicable for other datasets;
  • In accordance with the principles established by the association, make the dataset(s) produced available to all in an open format.


Problems Identified
  • Lack of training data specific to the French language and legal language.
  • Ethical dangers related to the development of artificial intelligence tools without quality training data.
Proposed Solutions
  • Draw on the Open Law community, which brings together the necessary legal and technical skills, to build one or more training datasets that will then be usable by all.
  • Start with court decision data, as an extension of the work undertaken under the Open Case Law program.

Camille Ledouaron - ELS Francis Lefebvre

Program resources