Andrew Marble
marble.onl
andrew@willows.ai
August 30, 2023
The Electronic Frontier Foundation, a digital civil liberties advocacy group, recently published a good article entitled “ISPs Should Not Police Online Speech—No Matter How Awful It Is.”1 It argues that providers within the internet “stack” should refrain from censoring traffic according to their own values: laws and regulations already exist to deal with norms around content; subjective, single-chokepoint censorship will inevitably be abused and will not stay limited to things universally agreed to be worth blocking; and once infrastructure providers start applying their own judgements, pressure will mount to censor more and more. Importantly, the EFF is not advocating a free-for-all but against vigilante justice: legal and regulatory mechanisms, not private gatekeepers, should deal with infringing content.
The AI industry would be well served by taking a similar stance with respect to model licensing. A core principle of open source software licenses is non-discrimination in how the software is used. Copyright holders who open source their code rightly do not concern themselves with end uses. This is not an invitation to illegal use (which is already illegal); it is an implicit acknowledgement that a license is the wrong place to try to enforce opinions, norms, laws, whatever you want to call them, about use.
Recently we’ve seen so-called “ethical” and commercially oriented licenses pushed as alternatives to open source. I’m talking specifically here about AI models whose weights are released for download, not about served models or infrastructure. For example, the OpenRAIL++ license used by Stable Diffusion XL includes the following schedule of use restrictions2:
You agree not to use the Model or Derivatives of the Model:
In any way that violates any applicable national, federal, state, local or international law or regulation;
For the purpose of exploiting, harming or attempting to exploit or harm minors in any way;
To generate or disseminate verifiably false information and/or content with the purpose of harming others;
To generate or disseminate personal identifiable information that can be used to harm an individual;
To defame, disparage or otherwise harass others;
For fully automated decision making that adversely impacts an individual’s legal rights or otherwise creates or modifies a binding, enforceable obligation;
For any use intended to or which has the effect of discriminating against or harming individuals or groups based on online or offline social behavior or known or predicted personal or personality characteristics;
To exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;
For any use intended to or which has the effect of discriminating against individuals or groups based on legally protected characteristics or categories;
To provide medical advice and medical results interpretation;
To generate or disseminate information for the purpose to be used for administration of justice, law enforcement, immigration or asylum processes, such as predicting an individual will commit fraud/crime commitment (e.g. by text profiling, drawing causal relationships between assertions made in documents, indiscriminate and arbitrarily-targeted use).
This combines things that are already illegal, things that are subject to current regulation, things that arguably should be subject to regulation, and the copyright holder’s own judgements about what AI should not be used for. Likewise, Facebook’s Llama 2 model ships with various usage restrictions3, including the usual enumeration of illegal things but also, notably, the following:
2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 2 related to the following:
a. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State
b. Guns and illegal weapons (including weapon development)
c. Illegal drugs and regulated/controlled substances
d. Operation of critical infrastructure, transportation technologies, or heavy machinery
These are clearly value judgements about specific, often perfectly legal, activities. While we don’t know the motivations, the result is that Facebook has acted as a kind of vigilante censor, telling us in an extra-legal way how we can and cannot use the released model.
The choice of license resides with the companies releasing the models. However, the resources required to build these models put them firmly in the same category as backbone infrastructure providers, so imposing their own values on use has the same negative effect as internet censorship. They have already crossed from restricting purely illegal uses to restricting specific legal ones (guns, the military), and have restricted “disinformation,” which has become a synonym for disagreement. It is only a matter of time until special interests push for more restrictions, including ones that you don’t agree with. To make matters worse, past open source licenses were uniform (other than the permissive/copyleft divide), making it straightforward to assemble a system out of different software components. If we have to start taking the morals imposed by every model provider into account when building AI solutions, as in the sketch below, we add a complication that will surely benefit some bureaucrat but is otherwise dead weight on the ecosystem.
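To make that friction concrete, here is a minimal sketch, assuming the huggingface_hub Python client library, of what auditing a stack of downloadable models might look like. The repository IDs and the “unrestricted” license set are illustrative, and gated repositories may require an access token.

    # A sketch of the compliance burden bespoke licenses add: every model in a
    # stack must be checked against its provider's own use restrictions, not
    # just matched to a familiar open source license. Repo IDs are illustrative.
    from huggingface_hub import HfApi

    STACK = [
        "stabilityai/stable-diffusion-xl-base-1.0",  # OpenRAIL++ use restrictions
        "mistralai/Mistral-7B-v0.1",                 # Apache-2.0, no use restrictions
    ]

    # Licenses with no use restrictions; anything else needs a human to read
    # the provider's terms before the component can be cleared for a given use.
    UNRESTRICTED = {"apache-2.0", "mit", "bsd-3-clause"}

    api = HfApi()
    for repo_id in STACK:
        info = api.model_info(repo_id)  # gated repos may need token=...
        licenses = [t.split(":", 1)[1] for t in info.tags if t.startswith("license:")]
        license_id = licenses[0] if licenses else "unknown"
        verdict = "ok" if license_id in UNRESTRICTED else "manual review of use terms"
        print(f"{repo_id}: {license_id} -> {verdict}")

The point is not the handful of lines of code but the “manual review” branch: because each provider writes its own restrictions, that review is different for every component and cannot be automated away.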
Again, existing laws and regulations already deal with inappropriate uses, and there are current efforts to enact more that deal specifically with AI. Advocating against corporate censorship or restriction doesn’t mean agreeing with all uses; it means imposing restrictions democratically rather than through vigilantism. So far these license restrictions have received little attention in the AI community. We need stronger norms and more pressure calling out restrictive licenses, even when we agree with the current restrictions.
1. https://www.eff.org/deeplinks/2023/08/isps-should-not-police-online-speech-no-matter-how-awful-it
2. https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md Note in the source how the model terms are pasted as a difficult-to-read unformatted blob.
3. https://ai.meta.com/resources/models-and-libraries/llama-downloads/