We employed a modified version of Gauthier et al.’s “topic-guided thematic analysis” methodology, which uses topic modeling to purposively sample from a large pool of social data \cite{gauthier_2022}. In our initial round of coding, we focused on answering the first two research questions: how members of r/LocalLLaMA make sense of openness in AI, and what drivers and deterrents shape their adoption of open AI. In this process, we first read the top 50 posts of each topic, ordered by the number of comments they had received. This resulted in a selection of six topics: -1, 1, 2, 7, 20, and 25.
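As an illustration of this sampling step, the sketch below shows one way the per-topic selection could be implemented. It assumes a table of submissions with hypothetical columns \texttt{topic} (the topic-model label) and \texttt{num\_comments}; the file name and column names are placeholders rather than part of our actual pipeline.
\begin{verbatim}
import pandas as pd

# Hypothetical post-level table: one row per submission, with the topic
# label assigned by the topic model and the comment count from the dump.
posts = pd.read_parquet("localllama_posts.parquet")

# Topics retained after reading the top posts of each topic
# (first coding round).
selected_topics = [-1, 1, 2, 7, 20, 25]

# For each retained topic, keep the 50 posts with the most comments.
sample = (
    posts[posts["topic"].isin(selected_topics)]
    .sort_values("num_comments", ascending=False)
    .groupby("topic")
    .head(50)
)
\end{verbatim}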
We then conducted an additional round of coding focused on posts that discuss development practices, to answer RQ3: How does the community collectively solve problems that arise in the development of open AI systems? We included 17 more topics, resulting in a total of 23 topics; altogether, 134 posts and 2,270 comments were coded. These samples cover 41% of all topics, 0.2% of all posts, and 2.1% of all comments in the subreddit.
The comments within each post were coded according to the following comment-specific inclusion criteria:
\begin{itemize}
    \item Must actively participate in the thread’s discussion rather than merely be referential or off-topic.
    \item Must express a perspective, share an experience, or extend the conversation within the thread in a meaningful way.
    \item Must be intelligible and contextually grounded within the parent post or comment.
\end{itemize}
Similar to Gauthier et al., we employed Braun and Clarke’s six-phase reflexive thematic analysis \cite{braun_2006}. Familiarization with the data began during the topic modeling stage, where we iteratively adjusted parameters and qualitatively inspected the outputs to arrive at a coherent and representative set of 56 topics. This process not only ensured topic quality but also deepened our engagement with the dataset. In both rounds of coding, the first 200 comments were used to generate initial codes, which served as the basis for constructing preliminary themes. These themes guided inductive coding of the remaining comments, with ongoing refinement (adding, removing, and merging themes) to best reflect the dataset. All comments associated with each theme were then re-analyzed to create subthemes. Finally, we clustered the subthemes to align with key aspects of our research questions, allowing us to trace patterns that cut across themes. The resulting subthemes and illustrative quotes form the backbone of the findings section.
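For reference, the sketch below outlines a minimal topic-modeling workflow of the kind described above, assuming a BERTopic-based pipeline (consistent with the outlier topic labelled -1 in the selection above). The embedding model, file name, and parameter values are illustrative assumptions, not our exact configuration.
\begin{verbatim}
import pandas as pd
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer

# Hypothetical dump of r/LocalLLaMA submissions with title and body text.
posts = pd.read_parquet("localllama_posts.parquet")
docs = (posts["title"] + "\n" + posts["selftext"].fillna("")).tolist()

# Illustrative parameters only; in practice such settings are adjusted
# iteratively and the resulting topics inspected qualitatively until a
# coherent and representative topic set (here, 56 topics) is reached.
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
topic_model = BERTopic(
    embedding_model=embedding_model,
    min_topic_size=50,   # hypothetical value, tuned by inspection
)
topics, probs = topic_model.fit_transform(docs)

# Inspect topic keywords and sizes during familiarization; topic -1
# collects outlier documents that fit no cluster.
print(topic_model.get_topic_info().head(20))
\end{verbatim}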