1. LLM-Generated Knowledge Base

WEAVER uses Large Language Models (LLMs) to generate a knowledge base of concepts related to the testing task.

  • Seed Concept: The process begins with a user-provided “seed concept,” which is a high-level term representing the task (e.g., “online toxicity”).
  • Structured Querying: The tool iteratively prompts the LLM to list entities or concepts related to the seed.
  • ConceptNet Relations: To ensure the concepts are semantically meaningful, WEAVER uses 25 specific relations from ConceptNet (such as MotivatedBy, LocatedAt, or TypeOf) to structure the prompts. For example, it might ask the LLM, “List some types of online toxicity”.
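The relation-templated querying described above can be sketched as follows. This is a minimal illustration, not WEAVER's actual implementation: the template wordings and the `build_prompts` helper are assumptions, and a real system would send each prompt to an LLM API.

```python
# Sketch of relation-templated prompt construction. The templates below
# are hypothetical examples of ConceptNet-style relations, not WEAVER's
# actual prompt set.

RELATION_TEMPLATES = {
    "TypeOf": "List some types of {concept}.",
    "MotivatedBy": "List some motivations behind {concept}.",
    "LocatedAt": "List some places where {concept} occurs.",
}

def build_prompts(seed_concept):
    """Instantiate one LLM prompt per relation for a seed concept."""
    return {
        relation: template.format(concept=seed_concept)
        for relation, template in RELATION_TEMPLATES.items()
    }

prompts = build_prompts("online toxicity")
print(prompts["TypeOf"])  # List some types of online toxicity.
```

Each prompt would then be sent to the LLM, and the returned entities become child concepts of the seed, which can themselves be expanded iteratively.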

2. Diverse and Relevant Recommendations

To avoid overwhelming the user with too much information, WEAVER uses a graph-based recommendation system to present a manageable subset of concepts.

  • Balancing Relevance and Diversity: The system aims to recommend concepts that are both relevant to the user’s query and diverse enough to offer new perspectives.
  • Scoring:
    • Relevance is measured using the perplexity of sentences connecting the concept to the query, calculated via GPT-2; a lower perplexity indicates a more natural connection and hence a more relevant concept.
    • Diversity is measured by calculating the cosine distance between concept embeddings using SentenceBERT.
  • Selection Algorithm: It treats selection as a graph problem, seeking a subgraph that maximizes a weighted sum of diversity (edge weights) and relevance (node weights). A greedy peeling algorithm approximates this objective in linear time.
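The scoring and selection steps above can be sketched together. This is an illustrative reconstruction under stated assumptions: the toy relevance and diversity scores stand in for GPT-2 perplexity and SentenceBERT cosine distances, and the `alpha` trade-off parameter and removal rule are assumptions rather than WEAVER's exact formulation.

```python
# Sketch of greedy peeling over a concept graph. Node weights play the
# role of relevance (e.g., derived from GPT-2 perplexity) and edge
# weights the role of diversity (e.g., cosine distance of SentenceBERT
# embeddings); the numeric values here are made up for illustration.

def greedy_peel(relevance, diversity, k, alpha=0.5):
    """Iteratively remove the weakest concept until only k remain.

    relevance: {concept: node weight}
    diversity: {(concept_a, concept_b): edge weight}
    alpha:     trade-off between relevance and diversity (assumed knob)
    """
    remaining = set(relevance)

    def contribution(c):
        # A node's contribution: its relevance plus the diversity edges
        # it still shares with the remaining concepts.
        edges = sum(w for (a, b), w in diversity.items()
                    if c in (a, b) and a in remaining and b in remaining)
        return alpha * relevance[c] + (1 - alpha) * edges

    while len(remaining) > k:
        remaining.remove(min(remaining, key=contribution))
    return remaining

relevance = {"harassment": 0.9, "spam": 0.4,
             "hate speech": 0.8, "trolling": 0.6}
diversity = {("harassment", "spam"): 0.7,
             ("harassment", "hate speech"): 0.2,
             ("spam", "trolling"): 0.6,
             ("hate speech", "trolling"): 0.5}
print(greedy_peel(relevance, diversity, k=2))  # {'harassment', 'spam'}
```

Peeling away the lowest-contribution node one at a time, rather than searching all subgraphs, is what keeps the selection linear-time while still favoring concepts that are individually relevant and mutually diverse.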

3. Interactive User Interface

WEAVER provides a visual interface for users to navigate the generated knowledge base and move toward creating actual tests.

  • Tree Structure: The interface visualizes the knowledge base as a tree, starting with the seed concept.
  • Exploration: Users can expand nodes to see child concepts, select specific concepts to test, or manually add their own concepts.
  • Test Case Integration: Once a requirement (concept) is identified, WEAVER integrates with tools like AdaTest (which uses LLMs to suggest test cases) to help the user generate specific input-output pairs for testing the model.
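The tree navigation described above can be modeled with a simple node structure. This is a minimal sketch: the `ConceptNode` class, its fields, and the `expand` hook are illustrative assumptions, not WEAVER's actual interface code.

```python
# Minimal sketch of the concept tree the interface visualizes. Children
# are attached lazily, mirroring user-driven expansion of a node.

class ConceptNode:
    def __init__(self, name, relation=None):
        self.name = name           # concept label, e.g. "online toxicity"
        self.relation = relation   # relation linking it to its parent
        self.children = []         # filled in when the user expands the node
        self.selected = False      # marked as a requirement to test

    def expand(self, related):
        """Attach child concepts, e.g. (relation, name) pairs from the LLM."""
        self.children = [ConceptNode(name, rel) for rel, name in related]
        return self.children

root = ConceptNode("online toxicity")
root.expand([("TypeOf", "harassment"), ("TypeOf", "hate speech")])
print([c.name for c in root.children])  # ['harassment', 'hate speech']
```

A selected node (`selected = True`) would then be handed off to a test-generation tool such as AdaTest to produce concrete input-output test cases for that requirement.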