Hugging Face and ServiceNow launch BigCode, a project to build open source code-generating AI systems

Code generation systems such as DeepMind's AlphaCode, Amazon's CodeWhisperer, and OpenAI's Codex, which powers GitHub's Copilot service, offer a fascinating glimpse of what's possible with AI today in the field of computer programming. But so far, only a handful of these AI systems have been made freely available to the public and open sourced, reflecting the commercial incentives of the companies building them.

In a bid to change that, AI startup Hugging Face and ServiceNow Research, the R&D arm of ServiceNow, today launched BigCode, a new project that aims to develop "state-of-the-art" AI systems for code in an "open and responsible" way. The goal is to eventually release a dataset large enough to train a code generation system, which will then be used to create a prototype, a 15-billion-parameter model, larger than Codex (12 billion parameters) but smaller than AlphaCode (~41.4 billion parameters), using ServiceNow's in-house GPU cluster. In machine learning, parameters are the parts of an AI system learned from historical training data and essentially define the skill of the system on a problem, such as code generation.

Inspired by Hugging Face's BigScience effort to open source highly sophisticated text-generating systems, BigCode will be open to anyone with a professional background in AI research who can commit time to the project, organizers say. The application form went live this afternoon.

"In general, we expect candidates to be affiliated with a research organization (academic or industrial) and to work on the technical/ethical/legal aspects of [large language models] for coding applications," ServiceNow wrote in a blog post. "Once the [code generation system] is trained, we will evaluate its capabilities... We will strive to make the evaluation easier and broader so that we can learn more about the [system's] capabilities."

By collaboratively developing a code generation system, which will be open sourced under a license that allows developers to reuse it subject to certain terms and conditions, BigCode seeks to resolve some of the controversies that have arisen around the practice of AI-powered code generation – especially when it comes to fair use. The nonprofit Software Freedom Conservancy, among others, has criticized GitHub and OpenAI for using public source code, not all of which is permissively licensed, to train and monetize Codex. Codex is available through OpenAI's paid API, while GitHub recently started charging for access to Copilot. For their part, GitHub and OpenAI continue to assert that Codex and Copilot do not violate any license terms.

The organizers of BigCode say they will work to ensure that only files from repositories with permissive licenses make it into the aforementioned training dataset. Along the way, they say, they will strive to establish “responsible” AI practices for training and sharing code-generating systems of all types, seeking feedback from relevant stakeholders before making policy decisions.

ServiceNow and Hugging Face did not provide any timelines for when the project would be completed. But they expect it to explore several forms of code generation over the next few months, including systems that automatically complete and synthesize code from snippets and natural language descriptions and work in a wide range of domains, tasks and programming languages.

Assuming the ethical, technical, and legal issues are ever resolved, AI-based coding tools could significantly reduce development costs while allowing coders to focus on more creative tasks. According to a Cambridge University study, at least half of developer effort goes into debugging and not active programming, costing the software industry an estimated $312 billion a year.