Terminological Rigor

More rigorous definitions of what the term open-source means, and criticism against models that claim openness without its preconditions—or “open w”.

More pedantic mental models/definitions of open source.

  • Open weights alone do not make a model open source. (open_source_weight, 13)
  • “Open source” gets used to describe models that in fact do not have their training data (or some other element that the commenter requires for something to be open source), which is not only a misleading description for users but also a dishonest marketing tactic. (open_source_marketing, 8)
  • For a model to earn the title of open source, it must be replicable, meaning all of its training data and steps should be well documented for anyone else to recreate the model. (open_source_replicability, 7)
  • A model (or piece of software) is only open source if it comes with an open source compatible license. Oftentimes, such licenses must grant the freedom to use, study, modify, and distribute. Source available ≠ open source. (open_through_license, 7)
  • This is related to ^opensourcereplicability7, in that limitations in documentation mean the model isn’t replicable, or at least is difficult to replicate. (limited_documentation_open, 4)
  • There are legal implications of open source licenses, in that they defend certain rights. Highly similar to ^openthroughlicense7, but more specifically on the point of legal implications. (open_legal_implication, 4)
  • Models should be labeled open source according to definitions from trusted sources, like the Open Source Initiative. (open_trusted_sources, 3) ^opentrustedsources3
    • In direct opposition is ^criticalownershipopen2, which calls into question whether these “authorities” are trustworthy.

Less pedantic mental models/definitions of open source. Generally speaking, these could be considered closer to Openness Pragmatism, since they are more concerned with the practical elements of running a model than they are about the technical elements of what makes up a model.

  • Inference code is the only “source” needed to make the weights useful, so it is a sufficient condition for open source. (inference_code_source, 3)
  • Models are qualitatively different from the “software” in OSS. The comment under this subsubtheme isn’t detailed enough to infer implicit meaning, but it is likely saying that open weights in a model are not the equivalent of open code in software. In some ways, both this subsubtheme and the one below are connected to ^opensourceweight13, as both boil down to the question of whether weights are enough or not. (models_not_oss, 1)
  • In direct opposition, models are not qualitatively different from the “software” in OSS. The point of the one comment in this subsubtheme is that being open source in software usually means having the code freely shared, and if the weights are the equivalent of code, open weights are enough to qualify a model as open source. (models_are_software, 1)

Criticism of the state of affairs, i.e. how things are being done in the open model space right now.

  • It is better for standards like APIs to come from open source. These posts are specifically talking about the fact that OpenAI’s APIs are the de facto standard even for open source/weight model inference. (open_source_standard, 3)
  • There is an open research problem in ML, not just for frontier models. (open_no_replicability, 1)
  • Real open source is not waiting for the next open weight model. (open_big_tech, 1)