Talk: BigCode: Open and responsible research on code-generating AI systems
While code-generating AI systems like Copilot have emerged as powerful tools for professional developers, there are growing legal and ethical concerns around the development of these models. Questions have been raised as to whether these AI models respect current free software licenses, both in model training and in code generation, and what the social impact of this technology is on the free software community. The BigCode project is a scientific collaboration (with over 350 participants) working on the responsible development of code-generating AI systems. In this talk, we discuss how we navigate the legal, ethical, and governance aspects of developing these models, including developing a permissively licensed code dataset, giving developers the option to remove their code from the training data, redacting personally identifiable information (PII), and attributing generated programs to the original code snippets.
Harm de Vries is a staff research scientist at ServiceNow.
Leandro von Werra is a machine learning engineer at Hugging Face.
If you want to see us host more sessions like this, please consider donating to support the organization of LibrePlanet.