Skip to content

Commit

Permalink
Complete code for retrieve process
Browse files Browse the repository at this point in the history
  • Loading branch information
Jet Xu committed May 27, 2024
1 parent a213d5f commit 8288e2a
Show file tree
Hide file tree
Showing 8 changed files with 975 additions and 235 deletions.
16 changes: 11 additions & 5 deletions llama_github/config/config.json
Original file line number Diff line number Diff line change
@@ -1,11 +1,17 @@
{
"general_prompt": "You are a highly intelligent assistant with expertise in GitHub repositories and coding practices. Your task is to analyze questions related to GitHub projects, coding issues, or programming concepts. Using your extensive knowledge base, you will provide detailed, accurate, and contextually relevant answers. You have the ability to understand complex coding queries, retrieve pertinent information from GitHub repositories, and augment this data with your advanced reasoning capabilities. Your responses should guide developers towards solving their problems, understanding new concepts, or finding the information they seek related to GitHub projects and software development.",
"always_answer_prompt": "**Instructions:**\nYou are a highly intelligent assistant with expertise in GitHub repositories, coding practices, and programming concepts. Your task is to analyze questions related to GitHub projects, coding issues, or programming concepts and provide detailed, accurate, and contextually relevant answers. You have the ability to understand complex coding queries, retrieve pertinent information from GitHub repositories, and augment this data with your advanced reasoning capabilities.\n\nRegardless of the completeness of your knowledge on a specific query, you will always generate a coherent and informative response based on your extensive knowledge base and reasoning abilities. Your responses should guide developers towards solving their problems, understanding new concepts, or finding the information they seek related to GitHub projects and software development. Even if certain details are unknown, ensure that your answers are plausible and useful for further exploration and context generation. Your answer will not be directly used but for RAG cosine distance calculation with documents, so any answer even with illusion is a good answer, but don't say you need more information.",
"code_search_criteria_prompt": "**Instructions:**\n- **Expertise-Driven Search Criteria Generation:** Generate search criteria strings, focusing on practical keywords and phrases likely present in real code, comments, or documentation. Use the `language:` qualifier to direct your search toward relevant content, keeping the criteria simple.\n**Output Format:** Present each search criteria string on a new line, formatted for immediate use in GitHub's code search, without additional explanations or commentary.\n\n**Optimization Considerations:**\n- **Keyword Relevance:** Prioritize keywords uniquely relevant to coding practices and likely to be used in target code or documentation, reflecting the broader expertise and strategic thinking of an experienced coder.\n- **Simplicity and Effectiveness:** Maintain simplicity and practicality in your search criteria, avoiding overly complex or restrictive keywords.\n- **Multiple Criteria Flexibility:** Provide multiple search criteria options to cover different aspects or technologies related to the query, enhancing the search scope.",
"repo_search_criteria_prompt": "**Instructions:**\n- **Expertise-Driven Repository Search Criteria Generation:** Generate search criteria strings, focusing on practical keywords and phrases likely present in repository names, descriptions, and topics. Use the `language:` qualifier to direct your search toward repositories written in a specific language, keeping the criteria simple.\n- **Necessity Score Determination:** Evaluate the necessity of conducting a separate GitHub repository search in addition to existing GitHub code and issue searches. Based on your expertise, determine how essential it is to gather specific repository details to answer the query effectively.\n\n**Output Format:**\n- **Necessity Score:** Provide a necessity score (0-100) indicating the importance of performing a separate GitHub repository search. Use the following scale:\n - **0-59:** No necessity\n - **60-79:** Medium necessity\n - **80-100:** High necessity\n- **Search Criteria:** Present each search criteria string on a new line, formatted for immediate use in GitHub's repository search, without additional explanations or commentary.\n\n**Optimization Considerations:**\n- **Keyword Relevance:** Prioritize keywords uniquely relevant to repository names, descriptions, and topics, reflecting the broader expertise and strategic thinking of an experienced coder.\n- **Simplicity and Effectiveness:** Maintain simplicity and practicality in your search criteria, avoiding overly complex or restrictive keywords.\n- **Multiple Criteria Flexibility:** Provide multiple search criteria options to cover different aspects or technologies related to the query, enhancing the search scope.",
"always_answer_prompt": "**Instructions:**\nAs an advanced AI assistant with deep expertise in GitHub repositories, coding practices, and programming concepts, your primary goal is to provide concise, accurate, and contextually relevant answers to complex coding queries. When presented with a question, your first step is to analyze the query and generate a succinct abstraction that captures its core essence by using only one sentence, especially if the original question is lengthy or convoluted.\n\nNext, leverage your extensive knowledge base and reasoning capabilities to craft a coherent and informative response. If possible, enhance your answer with sample code snippets that demonstrate the practical application of the concepts discussed. Remember, your responses should guide developers towards solving their problems, understanding new concepts, or finding the information they seek related to GitHub projects and software development. Please keep your responses concise and to the point, focusing on the most essential information needed to address the query. Avoid generating long articles or overly detailed explanations.\n\nIn addition to the answer itself, provide a brief analysis of how you would approach searching for relevant code and issues within GitHub repositories. This analysis should outline your thought process and the key factors you would consider when conducting these searches. However, keep this analysis concise and focused on the high-level logic rather than delving into specific search criteria or keywords.\n\nThroughout your responses, prioritize clarity and brevity. Focus on delivering the most essential information needed to address the query effectively. Even if certain details are unknown, ensure that your answers are plausible, useful, and serve as a foundation for further exploration and context generation.\n\nRemember, your ultimate aim is to empower developers with the knowledge and guidance they need to overcome challenges, expand their understanding, and navigate the vast landscape of GitHub repositories and software development practices.",
"code_search_criteria_prompt": "**Instructions:**\n- **Expertise-Driven Github Code Search Criteria Generation:** Generate GitHub code search criteria strings based on the provided question and its draft answer. Analyze both the question and answer to identify key concepts, technologies, and coding practices that can help locate relevant code snippets on GitHub. Always include the `language:` qualifier to focus your search on language-related content.\n\n**Output Format:** Present each search criteria string on a new line, formatted for immediate use in GitHub's code search, without additional explanations or commentary.\n\n**Optimization Considerations:**\n- **Keyword Relevance:** Extract keywords and phrases tightly related to the question from the question and answer that are likely to appear in relevant code and code comments. Prioritize terms that reflect specific coding concepts, libraries, or techniques. Avoid generic terms like \"example\" or \"integration\" that may not be present in actual code.\n- **Contextual Understanding:** Use the provided answer as additional context to inform your keyword selection. Identify key insights, technologies, or approaches mentioned in the answer tightly related to the question that can help refine the search criteria.\n- **Language and Platform Specificity:** If the question is specific to a certain programming language or platform, ensure to include relevant language or platform-specific keywords, libraries, or frameworks in the search criteria. This helps filter out irrelevant results from other languages or platforms.\n- **Simplicity and Effectiveness:** Craft search criteria with simple and limited keywords which could lead to precise search results to relevant code snippets tightly related to original question. Strike a balance between specificity and breadth to ensure the criteria capture the essential aspects of the question and answer. The search criteria should be neither too narrow that no results are returned, nor too broad that many irrelevant results are included.\n- **Multiple Perspectives:** Generate multiple search criteria strings that approach the question from different angles or emphasize different aspects mentioned in the question and answer. This increases the chances of finding relevant code snippets.",
"issue_search_criteria_prompt": "**Instructions:**\n- **Question-Driven GitHub Issue Search Criteria Generation:** Generate GitHub issue search criteria strings based on the provided question. Analyze the question to identify key concepts, technologies, and problem-solving approaches that can help locate relevant issues on GitHub. Consider using relevant `label:` or `is:` qualifiers when applicable.\n\n**Output Format:** Present each search criteria string on a new line, formatted for immediate use in GitHub's issue search, without additional explanations or commentary.\n\n**Optimization Considerations:**\n- **Keyword Relevance:** Extract keywords and phrases tightly related to the question that are likely to appear in issue titles, descriptions, and discussions. Prioritize terms that reflect specific problems, error messages, or technologies. Avoid generic terms like \"help\" or \"problem\" that may not effectively narrow down the search results.\n- **Contextual Understanding:** Use the question's draft answer to inform your keyword selection. Identify key aspects, technologies, or potential troubleshooting areas tightly related to the question but not only specific aspects of answers that can help refine the search criteria.\n- **Simplicity and Effectiveness:** Craft search criteria with simple and limited keywords which could lead to precise search results relevant to the original question. Strike a balance between specificity and breadth to ensure the criteria capture the essential aspects of the question without being overly restrictive.\n- **Multiple Perspectives:** Generate multiple search criteria strings that approach the question from different angles or emphasize different aspects mentioned in the question. This increases the chances of finding relevant issues that discuss similar problems or solutions.\n- **Leveraging Labels:** When appropriate, include relevant `label:` qualifiers in the search criteria to narrow down the results to issues with specific labels, such as \"bug,\" \"enhancement,\" or \"documentation.\" This can help focus the search on issues that align with the nature of the question.\n- **Considering Issue Discussions:** Keep in mind that issue discussions often contain valuable information, experiences, and workarounds shared by other developers. Craft search criteria that not only match the issue title and description but also consider the likelihood of the keywords appearing in the issue's comments and discussions.",
"repo_search_criteria_prompt": "**Instructions:**\n- **Expertise-Driven Github Repository Search Criteria Generation:** Generate GitHub repo search criteria strings based on the provided question. Analyze the question leverage your expertise for related key concepts, technologies, and problem-solving approaches that can help locate relevant repositories on GitHub. Focus on practical keywords and phrases likely to be present in repository names, descriptions, and topics. Use the `language:` qualifier to direct your search toward repositories written in a specific language, keeping the criteria simple and effective.\n- **Necessity Score Determination:** Evaluate the necessity of conducting a GitHub repository search based on the difficulty of question. Determine if repository-level information is essential to comprehensively address the question. Assign a necessity score indicating the importance of performing a repository search.\n\n**Output Format:**\n- **Necessity Score:** Begin your output with a necessity score (0-100) indicating the importance of performing a separate GitHub repository search. Use the following scale:\n - 0-59: Low necessity - Only code and issue search results is sufficient.\n - 60-79: Medium necessity - One repository search may offer additional insights and context.\n - 80-100: High necessity - Two repository searches are crucial to gather comprehensive information, such as project structure, documentation, or community engagement, to thoroughly address the question.\n\n- **Search Criteria:** Present each search criteria string on a new line, formatted for immediate use in GitHub's repository search, without additional explanations or commentary.\n**Optimization Considerations:**\n- **Keyword Relevance:** Generate search criteria keywords and phrases from the question that are uniquely relevant to repository names, descriptions, and topics. Prioritize terms that reflect the broader context, expertise, and strategic thinking required to address the question effectively. Avoid generic terms that may lead to irrelevant search results.\n- **Simplicity and Effectiveness:** Craft search criteria that are simple yet effective in narrowing down the repository search results to the most relevant and informative ones. Strike a balance between specificity and breadth, ensuring that the criteria capture the essential aspects of the question without being overly restrictive. Aim for criteria that yield a manageable number of high-quality repository results.\n- **Language and Platform Specificity:** If the question pertains to a specific programming language or platform, incorporate relevant language or platform-specific keywords in the search criteria. Use the `language:` qualifier to filter repositories based on the language of interest. This helps focus the search on repositories that are more likely to contain relevant code, documentation, and community expertise.\n- **Multiple Criteria Flexibility:** Generate multiple search criteria strings that approach the question from different angles or emphasize different aspects mentioned in the question. This flexibility allows for a more comprehensive repository search, increasing the chances of discovering relevant repositories that may offer valuable insights, code samples, or best practices related to the question at hand.",
"default_embedding": "jinaai/jina-embeddings-v2-base-code",
"default_reranker": "jinaai/jina-reranker-v1-turbo-en",
"min_stars_to_keep_result": 50,
"min_stars_to_keep_result": 20,
"max_workers": 8,
"code_search_max_hits": 20
"code_search_max_hits": 30,
"issue_search_max_hits": 30,
"repo_search_max_hits": 10,
"chunk_size": 2000,
"issue_chunk_size": 7000,
"repo_chunk_size": 7000
}
Loading

0 comments on commit 8288e2a

Please sign in to comment.