Economies of Scale

Besides convenience, cost is a factor that makes cloud offerings more attractive than local ones.

Tradeoffs in fine-tuning

  • Fine-tuning models is computationally demanding, so people use cloud compute for it, though some do a proof of concept on their local machines. (cloud_fine_tuning, 8)
  • Some with a lot of local compute use their local computers for fine-tuning. (local_fine_tuning, 3)

Support for Cloud

Cloud’s advantages

  • Cloud chatbot/LLM service providers have a scaling advantage: given the same compute, a company hosting models for multiple users can typically use it more efficiently than an individual. (cloud_chatbot_scaling, 10)
  • The cost-to-token ratio favors API services over purchasing a local computer. (cost_to_token, 4)
  • When a large open model is needed, some choose to get compute from the cloud, balancing privacy, cost, and need. (cloud_llm_hosting, 6)
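
The scaling and cost-to-token points above can be illustrated with a rough calculation. The sketch below assumes that the only difference between a shared provider and an individual is utilization, meaning the fraction of time the hardware actually spends generating tokens. The function name and all numbers are hypothetical and chosen for illustration; they are not measurements from the source.

```python
def cost_per_million_tokens(hourly_cost_usd, tokens_per_second, utilization):
    """Effective cost (USD) per one million generated tokens.

    utilization: fraction of each hour the hardware is busy generating
    tokens, in the range (0, 1]. Idle time is still paid for.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical figures: same hardware, same hourly cost, same raw throughput.
HOURLY_COST_USD = 2.0
TOKENS_PER_SECOND = 1000.0

# A provider serving many users keeps the hardware busy most of the time.
shared = cost_per_million_tokens(HOURLY_COST_USD, TOKENS_PER_SECOND, 0.80)

# A single user leaves the same hardware idle for most of the day.
individual = cost_per_million_tokens(HOURLY_COST_USD, TOKENS_PER_SECOND, 0.05)

print(f"shared:     ${shared:.2f} per million tokens")
print(f"individual: ${individual:.2f} per million tokens")
```

Under these assumed numbers the per-token cost differs by the ratio of the two utilization values, which is one simple way to read the scaling argument. Real deployments differ in other ways that this sketch ignores.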

Local’s disadvantages

  • Some open-weight models are prohibitively large to run on local compute. (model_too_large, 2)
  • Size is not the only factor that makes models difficult to run; context length matters too. (not_just_size, 2)
  • Smaller models (typical of open models) can suffer more from hallucination. (small_hallucination_problem, 2)
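
The points about model size and context length can be made concrete with a back-of-the-envelope memory estimate. The sketch below assumes a standard transformer whose attention layers store keys and values for every token in the context; it ignores activations and runtime overhead. The function names and the model shape are hypothetical, not a description of any specific model.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Memory needed just to hold the model weights, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


def kv_cache_memory_gb(num_layers, num_kv_heads, head_dim,
                       context_tokens, bytes_per_value):
    """Memory for the key/value cache of one sequence, in GB.

    The leading factor of 2 accounts for storing both keys and values.
    """
    values = 2 * num_layers * num_kv_heads * head_dim * context_tokens
    return values * bytes_per_value / 1e9


# Hypothetical 70-billion-parameter model stored at 16 bits per weight.
weights_16bit = weight_memory_gb(70e9, 2)    # 140.0 GB
# The same weights quantized to 4 bits (0.5 bytes) per parameter.
weights_4bit = weight_memory_gb(70e9, 0.5)   # 35.0 GB

# Hypothetical shape: 80 layers, 8 key/value heads, head dimension 128,
# 16-bit cache entries.
kv_32k = kv_cache_memory_gb(80, 8, 128, 32_768, 2)
kv_128k = kv_cache_memory_gb(80, 8, 128, 131_072, 2)

print(f"weights (16-bit): {weights_16bit:.1f} GB")
print(f"weights (4-bit):  {weights_4bit:.1f} GB")
print(f"KV cache at 32k tokens:  {kv_32k:.1f} GB")
print(f"KV cache at 128k tokens: {kv_128k:.1f} GB")
```

In this estimate the cache grows linearly with context length, so a long context can add tens of gigabytes on top of the weights even after the weights themselves have been quantized.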

Support for Local

Local’s advantages

  • Local inference has economic advantages, often argued by contrasting a fixed cost (buying a personal computer) with recurring costs (paying for a subscription or borrowing compute). (local_more_economical, 9)
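
The fixed-versus-recurring argument above amounts to a break-even calculation. The sketch below assumes a one-time hardware purchase, a constant monthly running cost for local use (for example, electricity), and a constant monthly cloud bill. The function name and all figures are hypothetical.

```python
def break_even_months(hardware_cost, monthly_local_running_cost,
                      monthly_cloud_cost):
    """Months until a one-time hardware purchase pays for itself.

    Returns None when local use never breaks even, i.e. when the
    monthly running cost is at least as high as the cloud bill.
    """
    monthly_savings = monthly_cloud_cost - monthly_local_running_cost
    if monthly_savings <= 0:
        return None
    return hardware_cost / monthly_savings


# Hypothetical figures: a 2000 USD machine, 10 USD/month of electricity,
# compared against 60 USD/month of subscriptions or rented compute.
months = break_even_months(2000, 10, 60)
print(f"break-even after {months:.0f} months")  # 40 months
```

With these assumed figures the purchase pays off after 40 months. A lighter user with a smaller cloud bill would break even later or never, which is why the argument depends heavily on usage.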

Cloud’s disadvantages

  • Running models in the cloud does not get any cheaper, because we will continue consuming more tokens for “reasoning” and high-token tasks. (reasoning_token_expensive, 1)

How to make local more economical

  • Local inference is as cheap as the best deal you can get. (local_inference_cheap, 4)
  • Since you are borrowing someone else’s computer when using cloud compute, the persistence of storage and the ability to extend the loan are not guaranteed. (persistence_extension_liability, 1) ^persistenceextensionliability1