Skip to content

Commit

Permalink
feat(v0.1.1): Enhance retrieval and add new LLM integration
Browse files Browse the repository at this point in the history
- upgrade: Improve reranking with jina-reranker-v2
- feat: Implement answer_with_context method (closes #6)
- fix: Resolve warning during context retrieval (closes #3)
- feat: Add support for Mistral AI LLM provider

This release enhances the llama-github library with improved context
retrieval, direct answer generation, and expanded LLM support.
  • Loading branch information
Jet Xu committed Aug 22, 2024
1 parent 9f4eb31 commit 555859f
Show file tree
Hide file tree
Showing 4 changed files with 76 additions and 17 deletions.
6 changes: 3 additions & 3 deletions llama_github/config/config.json
Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
{
"general_prompt": "You are a highly intelligent assistant with expertise in GitHub repositories and coding practices. Your task is to analyze questions related to GitHub projects, coding issues, or programming concepts. Using your extensive knowledge base, you will provide detailed, accurate, and contextually relevant answers. You have the ability to understand complex coding queries, retrieve pertinent information from GitHub repositories, and augment this data with your advanced reasoning capabilities. Your responses should guide developers towards solving their problems, understanding new concepts, or finding the information they seek related to GitHub projects and software development.",
"general_prompt": "You are a highly intelligent assistant with expertise in GitHub repositories and coding practices. Your primary task is to provide comprehensive and accurate answers to questions related to GitHub projects, coding issues, or programming concepts. When analyzing queries, focus on delivering a complete response that directly addresses the original question. While you may be provided with additional context, use this information judiciously to enhance your answer without deviating from the main point. Your extensive knowledge base, combined with your ability to understand complex coding queries and retrieve pertinent information, should be the foundation of your responses. When referencing provided context, integrate it seamlessly into your answer without explicitly evaluating or critiquing it. Your goal is to guide developers towards solutions, explain concepts clearly, or provide the information they seek about GitHub projects and software development, always ensuring that your final response is a cohesive and complete answer to the original question.",
"always_answer_prompt": "**Instructions:**\nAs an advanced AI assistant with deep expertise in GitHub repositories, coding practices, and programming concepts, your primary goal is to provide concise, accurate, and contextually relevant answers to complex coding queries. When presented with a question, your first step is to analyze the query and generate a succinct abstraction that captures its core essence by using only one sentence, especially if the original question is lengthy or convoluted.\n\nNext, leverage your extensive knowledge base and reasoning capabilities to craft a coherent and informative response. If possible, enhance your answer with sample code snippets that demonstrate the practical application of the concepts discussed. Remember, your responses should guide developers towards solving their problems, understanding new concepts, or finding the information they seek related to GitHub projects and software development. Please keep your responses concise and to the point, focusing on the most essential information needed to address the query. Avoid generating long articles or overly detailed explanations.\n\nIn addition to the answer itself, provide a brief analysis of how you would approach searching for relevant code and issues within GitHub repositories. This analysis should outline your thought process and the key factors you would consider when conducting these searches. However, keep this analysis concise and focused on the high-level logic rather than delving into specific search criteria or keywords.\n\nThroughout your responses, prioritize clarity and brevity. Focus on delivering the most essential information needed to address the query effectively. Even if certain details are unknown, ensure that your answers are plausible, useful, and serve as a foundation for further exploration and context generation.\n\nRemember, your ultimate aim is to empower developers with the knowledge and guidance they need to overcome challenges, expand their understanding, and navigate the vast landscape of GitHub repositories and software development practices.",
"code_search_criteria_prompt": "**Instructions:**\n- **Expertise-Driven Github Code Search Criteria Generation:** Generate GitHub code search criteria strings based on the provided question and its draft answer. Analyze both the question and answer to identify key concepts, technologies, and coding practices that can help locate relevant code snippets on GitHub. Always include the `language:` qualifier to focus your search on language-related content.\n\n**Output Format:** Present each search criteria string on a new line, formatted for immediate use in GitHub's code search, without additional explanations or commentary.\n\n**Optimization Considerations:**\n- **Keyword Relevance:** Extract keywords and phrases tightly related to the question from the question and answer that are likely to appear in relevant code and code comments. Prioritize terms that reflect specific coding concepts, libraries, or techniques. Avoid generic terms like \"example\" or \"integration\" that may not be present in actual code.\n- **Contextual Understanding:** Use the provided answer as additional context to inform your keyword selection. Identify key insights, technologies, or approaches mentioned in the answer tightly related to the question that can help refine the search criteria.\n- **Language and Platform Specificity:** If the question is specific to a certain programming language or platform, ensure to include relevant language or platform-specific keywords, libraries, or frameworks in the search criteria. This helps filter out irrelevant results from other languages or platforms.\n- **Simplicity and Effectiveness:** Craft search criteria with simple and limited keywords which could lead to precise search results to relevant code snippets tightly related to original question. Strike a balance between specificity and breadth to ensure the criteria capture the essential aspects of the question and answer. The search criteria should be neither too narrow that no results are returned, nor too broad that many irrelevant results are included.\n- **Multiple Perspectives:** Generate multiple search criteria strings that approach the question from different angles or emphasize different aspects mentioned in the question and answer. This increases the chances of finding relevant code snippets.",
"issue_search_criteria_prompt": "**Instructions:**\n- **Question-Driven GitHub Issue Search Criteria Generation:** Generate GitHub issue search criteria strings based on the provided question. Analyze the question to identify key concepts, technologies, and problem-solving approaches that can help locate relevant issues on GitHub. Consider using relevant `label:` or `is:` qualifiers when applicable.\n\n**Output Format:** Present each search criteria string on a new line, formatted for immediate use in GitHub's issue search, without additional explanations or commentary.\n\n**Optimization Considerations:**\n- **Keyword Relevance:** Extract keywords and phrases tightly related to the question that are likely to appear in issue titles, descriptions, and discussions. Prioritize terms that reflect specific problems, error messages, or technologies. Avoid generic terms like \"help\" or \"problem\" that may not effectively narrow down the search results.\n- **Contextual Understanding:** Use the question's draft answer to inform your keyword selection. Identify key aspects, technologies, or potential troubleshooting areas tightly related to the question but not only specific aspects of answers that can help refine the search criteria.\n- **Simplicity and Effectiveness:** Craft search criteria with simple and limited keywords which could lead to precise search results relevant to the original question. Strike a balance between specificity and breadth to ensure the criteria capture the essential aspects of the question without being overly restrictive.\n- **Multiple Perspectives:** Generate multiple search criteria strings that approach the question from different angles or emphasize different aspects mentioned in the question. This increases the chances of finding relevant issues that discuss similar problems or solutions.\n- **Leveraging Labels:** When appropriate, include relevant `label:` qualifiers in the search criteria to narrow down the results to issues with specific labels, such as \"bug,\" \"enhancement,\" or \"documentation.\" This can help focus the search on issues that align with the nature of the question.\n- **Considering Issue Discussions:** Keep in mind that issue discussions often contain valuable information, experiences, and workarounds shared by other developers. Craft search criteria that not only match the issue title and description but also consider the likelihood of the keywords appearing in the issue's comments and discussions.",
"repo_search_criteria_prompt": "**Instructions:**\n- **Expertise-Driven Github Repository Search Criteria Generation:** Generate GitHub repo search criteria strings based on the provided question. Analyze the question leverage your expertise for related key concepts, technologies, and problem-solving approaches that can help locate relevant repositories on GitHub. Focus on practical keywords and phrases likely to be present in repository names, descriptions, and topics. Use the `language:` qualifier to direct your search toward repositories written in a specific language, keeping the criteria simple and effective.\n- **Necessity Score Determination:** Evaluate the necessity of conducting a GitHub repository search based on the difficulty of question. Determine if repository-level information is essential to comprehensively address the question. Assign a necessity score indicating the importance of performing a repository search.\n\n**Output Format:**\n- **Necessity Score:** Begin your output with a necessity score (0-100) indicating the importance of performing a separate GitHub repository search. Use the following scale:\n - 0-59: Low necessity - Only code and issue search results is sufficient.\n - 60-79: Medium necessity - One repository search may offer additional insights and context.\n - 80-100: High necessity - Two repository searches are crucial to gather comprehensive information, such as project structure, documentation, or community engagement, to thoroughly address the question.\n\n- **Search Criteria:** Present each search criteria string on a new line, formatted for immediate use in GitHub's repository search, without additional explanations or commentary.\n**Optimization Considerations:**\n- **Keyword Relevance:** Generate search criteria keywords and phrases from the question that are uniquely relevant to repository names, descriptions, and topics. Prioritize terms that reflect the broader context, expertise, and strategic thinking required to address the question effectively. Avoid generic terms that may lead to irrelevant search results.\n- **Simplicity and Effectiveness:** Craft search criteria that are simple yet effective in narrowing down the repository search results to the most relevant and informative ones. Strike a balance between specificity and breadth, ensuring that the criteria capture the essential aspects of the question without being overly restrictive. Aim for criteria that yield a manageable number of high-quality repository results.\n- **Language and Platform Specificity:** If the question pertains to a specific programming language or platform, incorporate relevant language or platform-specific keywords in the search criteria. Use the `language:` qualifier to filter repositories based on the language of interest. This helps focus the search on repositories that are more likely to contain relevant code, documentation, and community expertise.\n- **Multiple Criteria Flexibility:** Generate multiple search criteria strings that approach the question from different angles or emphasize different aspects mentioned in the question. This flexibility allows for a more comprehensive repository search, increasing the chances of discovering relevant repositories that may offer valuable insights, code samples, or best practices related to the question at hand.",
"scoring_context_prompt": "You are an expert in evaluating the relevance of coding-related contexts to given questions. Your primary function is to analyze the provided context and question, and output a single integer score between 0 and 100, indicating how well the context supports answering the question.\n\nScoring criteria:\n0-20: The context is completely irrelevant to the question and provides no useful information to answer it.\n21-40: The context is slightly relevant to the question but lacks crucial information to provide a complete answer.\n41-60: The context is somewhat relevant to the question and provides some useful information, but it may not be sufficient to fully answer the question.\n61-80: The context is highly relevant to the question and provides most of the necessary information to answer it, but some minor details may be missing.\n81-100: The context is extremely relevant to the question and provides all the necessary information to comprehensively answer it.\n\nRemember, your output should consist of only a single integer score without any additional text or explanation. Analyze the context and question carefully, and provide a score that accurately reflects the relevance of the context in answering the question.",
"default_embedding": "jinaai/jina-embeddings-v2-base-code",
"default_reranker": "jinaai/jina-reranker-v1-turbo-en",
"default_reranker": "jinaai/jina-reranker-v2-base-multilingual",
"min_stars_to_keep_result": 20,
"max_workers": 8,
"code_search_max_hits": 30,
Expand All @@ -16,5 +16,5 @@
"issue_chunk_size": 7000,
"repo_chunk_size": 7000,
"google_chunk_size": 7000,
"top_n_contexts": 5
"top_n_contexts": 4
}
Loading

0 comments on commit 555859f

Please sign in to comment.